ISVLSI-2014 invited talk, 140710

Toward Holistic Modeling, Margining and Tolerance of IC Variability

Andrew B. Kahng
UCSD CSE and ECE Departments
abk@ucsd.edu
http://vlsicad.ucsd.edu

IC Variability

• In manufacturing process
  • FEOL
  • BEOL
• During operation
  • Voltage
  • Temperature
• Across lifetime
  • Aging
  • Breakdown

Challenge: Value of Technology

[Figure: design quality (e.g., frequency) vs. technology generation, comparing nominal scaling with design with margins. Margin = lost benefits of technology.]

Solutions: Modeling, Margining, Tolerance

• BEOL Corner Optimization √
• Process-Aware Vdd Scaling √
• BTI, EM-AVS Interactions √
• Overdrive Signoff √
• Min Cost of Resilience √

• Holistic mitigation of variability spans models, margins, tolerance mechanisms
• Signoff criteria, monitors, adaptivity/resilience, approximate computing, …

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

BEOL Corner Optimization

• 20nm and below: increased timing variation due to interconnect R, C
  • Design closure becomes much more difficult
• Costs of BEOL variations
  • More design effort (e.g., “last month” of manual ECO iteration)
  • Compromised circuit performance at high Vdd
• Recent work: reduce signoff margin by using tightened BEOL corners without sacrificing parametric yield
  • Signoff at conventional BEOL corners is pessimistic for most timing-critical paths
  • We identify paths which can be safely signed off using tightened BEOL corners (TBC)
• Joint work with Sorin Dobre (Qualcomm) and Tuck-Boon Chan

Proposed Timing Signoff Flow

[Flowchart, conventional signoff: routed design → timing analysis using conventional BEOL corners (CBC) → if violation = 0, done; if not, ECO using CBC and repeat.]

[Flowchart, this work: routed design → classify timing-critical paths into GTBC and GCBC → timing analysis using TBC for GTBC and using CBC for GCBC → if violation = 0, done; if not, ECO using TBC or CBC respectively and repeat.]

Conventional BEOL Corners

• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
• BEOL RC variations are modeled in interconnect technology file (itf)

[Figure: cross-section of metal layers M1, M2, M3 with labels S2, W2, T1 to T3, H1 to H3, inter-layer dielectric and inter-metal dielectric.]

Corner   ΔW        ΔT        ΔH
Ytyp     typical   typical   typical
Ycb      min       min       max
Ycw      max       max       min
Yrcb     max       max       max
Yrcw     min       min       min

Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
  • A 9-layer metal stack has 27 variation sources z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT or ΔH)
  • zu and zv are in the same process module

Layer   ΔW    ΔT    ΔH
M1      z1    z2    z3
M2      z4    z5    z6
M3      z7    z8    z9
M4      z10   z11   z12
M5      z13   z14   z15
M6      z16   z17   z18
M7      z19   z20   z21
M8      z22   z23   z24
M9      z25   z26   z27

[Figure: the metal stack M1 to M9 grouped into process modules 1, 2 and 3.]

Examples
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6 and M7
• But ΔW in layer M4 is not correlated with ΔT in M4

Pessimism of Conventional BEOL Corners (CBC)

• Assumption: a max (setup) path pj is “safe” when the delay evaluated at a given CBC is larger than nominal delay + 3σj

  dj(YCBC) ≥ 3σj + dj(Ytyp)

• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC:

  αj = 3σj / Δdj(YCBC)
  Δdj(YCBC) = dj(YCBC) − dj(Ytyp),  YCBC ∈ {Ycw, Ycb, Yrcw, Yrcb}

• Small αj: large pessimism of CBC

[Figure: delay axis comparing 3σj with dj(YCBC) − dj(Ytyp); a large gap between them indicates large pessimism.]
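The safety condition and the scaling factor above can be sketched in a few lines of Python. This is only an illustration: the function names and the example delays are hypothetical and do not come from the talk.

```python
def scaling_factor(sigma_j, d_cbc, d_typ):
    """alpha_j = 3*sigma_j / (d_j(Y_CBC) - d_j(Y_typ))."""
    delta = d_cbc - d_typ
    if delta <= 0:
        raise ValueError("corner delay must exceed typical delay")
    return 3.0 * sigma_j / delta


def is_safe(sigma_j, d_cbc, d_typ):
    """A setup path is 'safe' at a corner when d_j(Y_CBC) >= d_j(Y_typ) + 3*sigma_j."""
    return d_cbc >= d_typ + 3.0 * sigma_j


# Hypothetical path: typical delay 1.00 ns, corner delay 1.12 ns, sigma 0.02 ns.
alpha = scaling_factor(0.02, 1.12, 1.00)
safe = is_safe(0.02, 1.12, 1.00)
```

With these made-up numbers the corner adds 0.12 ns while 3σ is only 0.06 ns, so α is about 0.5: the corner is safe, and roughly half of its margin is pessimism.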

Intuition on Delay Variability Across Cw, RCw

[Figure: two scatter plots of α vs. Δdelay relative to typical, one at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and one at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp). Paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst).]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
  • Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
  • α < 1.0 there: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations

In the following, α is defined at the dominant corner

Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
• E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α plotted against Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp).]

Find Paths for Which TBCs Can Be Used

[Figure: the same plot with thresholds Acw and Arcw marked on the Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp) axes.]

• GTBC = set of paths that can be safely signed off using TBC: (path with Δdcw larger than Acw) OR (path with Δdrcw larger than Arcw)
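A minimal sketch of the GTBC selection rule, assuming each path is described by its two relative delay changes in percent. The path names, threshold values and function name are hypothetical.

```python
def classify_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc).

    paths maps a path name to (dd_cw, dd_rcw): the delay increase over
    typical at the C-worst and RC-worst corners, in percent.
    """
    g_tbc, g_cbc = [], []
    for name, (dd_cw, dd_rcw) in paths.items():
        # A path may use tightened corners if either delta exceeds its threshold.
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Hypothetical paths and thresholds (percent of typical delay).
example_paths = {"p1": (5.0, 2.0), "p2": (1.0, 1.5), "p3": (2.0, 7.5)}
g_tbc, g_cbc = classify_paths(example_paths, a_cw=4.0, a_rcw=6.0)
```

Here p1 and p3 land in GTBC and p2 stays in GCBC, because both of its deltas are small and such paths tend to have large α.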

Determining α, Arcw and Acw

• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
  • Plot α vs. Δdelay, find Arcw and Acw for a given α
  • Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α: larger thresholds (Arcw and Acw), fewer paths in GTBC

[Figures: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw marked.]
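One way to read the threshold extraction is sketched below. It rests on an assumption that the slides do not spell out: the threshold is taken as the largest Δdelay among representative paths whose αj still exceeds the target α, plus the +1% margin. The sample data and function name are hypothetical.

```python
def extract_threshold(samples, alpha_target, margin=1.0):
    """Return a delta-delay threshold (percent) for one corner.

    samples is a list of (delta_d_percent, alpha_j) pairs from representative
    paths. Paths whose alpha_j exceeds alpha_target are not covered by the
    tightened corner, so the threshold must sit above all of them.
    """
    uncovered = [dd for dd, a in samples if a > alpha_target]
    return (max(uncovered) if uncovered else 0.0) + margin


# Hypothetical samples: alpha falls as delta-delay grows.
samples = [(1.0, 0.9), (2.5, 0.7), (3.0, 0.55), (4.0, 0.45), (6.0, 0.3)]
a_for_05 = extract_threshold(samples, 0.5)
a_for_06 = extract_threshold(samples, 0.6)
```

Under this reading a smaller target α yields a larger threshold, which matches the trend stated above.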

Benefits of Tightened BEOL Corners

• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
  • Tradeoff between reduced margin vs. paths which use TBC

[Figures: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-0.5, TBC-0.6 and TBC-0.7. Correlation factor γ = 0.5.]

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

How to Minimize Cost of Resilience?
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                    Razor          Razor-Lite     TIMBER
Power penalty       30 [Das08]     ~0 [Kim13]     100 [Choudhury09]
Area penalty        182 [Kim13]    33 [Kim13]     255 [Chen13]
Recovery cycles     5 [Wan09]      11 [Kim13]     0 [Choudhury09]

Tradeoff: Resilience Cost vs. Datapath Cost

[Figure: endpoints implemented as Razor FFs (with error outputs) or normal FFs. Optimizing the fanin cone of an endpoint with a tighter constraint lets its Razor FF be replaced by a normal FF, trading area (power) of the fanin cone against area (power) with Razor overhead.]

[Plot: energy (mJ) vs. Razor FFs, showing total energy, energy of the non-resilient part, and resilience cost.]

• Tradeoff: Razor FFs (resilience cost) vs. power/area of fanin circuits

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint with tighter constraints
  • Allows replacement of Razor FF with normal FF
• Pick endpoints based on heuristic sensitivity functions
  • Vary endpoints, compare area/power penalty

Candidate Sensitivity Functions

SF1 = |slack(p)|
SF2 = |slack(p)| × Numcri(p)
SF3 = |slack(p)| × Numcri(p) / Numtotal(p)
SF4 = |slack(p)| × Σ_{c ∈ fanin(p)} Pwr(c)
SF5 = Σ_{c ∈ fanin(p)} |slack(c)| × Pwr(c)

p: negative-slack endpoint
c: cells within fanin cone
Numcri: number of negative-slack cells
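The five candidate sensitivity functions can be evaluated for one endpoint as sketched below. The sketch assumes that the garbled delimiters around slack in the extracted formulas were absolute-value bars, and that Numtotal is the number of cells in the fanin cone. The data layout and function name are hypothetical.

```python
def sensitivity_functions(slack_p, fanin):
    """Evaluate SF1..SF5 for an endpoint p.

    slack_p is the (negative) slack of endpoint p.
    fanin is a list of (cell_slack, cell_power) for the cells in p's fanin cone.
    """
    if not fanin:
        raise ValueError("fanin cone must contain at least one cell")
    num_total = len(fanin)
    num_cri = sum(1 for s, _ in fanin if s < 0)  # negative-slack cells
    total_pwr = sum(p for _, p in fanin)
    a = abs(slack_p)
    return {
        "SF1": a,
        "SF2": a * num_cri,
        "SF3": a * num_cri / num_total,
        "SF4": a * total_pwr,
        "SF5": sum(abs(s) * p for s, p in fanin),
    }


# Hypothetical endpoint with four fanin cells: (slack in ns, power in mW).
sf = sensitivity_functions(-0.1, [(-0.1, 2.0), (0.05, 1.0), (-0.02, 3.0), (0.2, 4.0)])
```

Endpoints could then be ranked by any one of these values before choosing which Razor FFs to replace.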

Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight, adjust clock latencies, contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized

Wpq = Slackpq / (1 + β × TG(p, q))

  Slackpq: setup slack of path p-q
  β: weighting factor
  TG(p, q): toggle rate of path p-q

[Figure: sequential graph of FF1, FF2 and FF3 with data-path edges W12, W23, W31 and the clock tree. After adjustment each edge on the cycle has weight W′ = average weight on cycle.]
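A small sketch of the edge weight and of the averaged weight W′ on a cycle, reading the extracted formula as the fraction Slackpq / (1 + β × TG(p, q)). The slack, toggle-rate and β values are hypothetical.

```python
def edge_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def average_cycle_weight(weights):
    """W' assigned to every edge of a cycle after clock latencies are adjusted."""
    return sum(weights) / len(weights)


# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1: (setup slack in ns, toggle rate).
cycle = [(0.10, 0.5), (0.30, 0.0), (0.20, 1.0)]
weights = [edge_weight(s, tg, beta=1.0) for s, tg in cycle]
w_avg = average_cycle_weight(weights)
```

A frequently toggling path gets a smaller weight for the same slack, so cycles through such paths are picked up earlier by the minimum-weight search.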

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs with normal FFs → SkewOpt: activity-aware clock skew optimization → OR-tree insertion → if energy < min energy, save current solution → iterate.]

Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figures: energy (mJ) of PM, BF and CO for designs MUL and EXU at small, medium and large margin, broken down into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation.]

Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm

Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figures: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt on designs MUL and EXU, with the minimum achievable energy marked. Technology: foundry 28nm.]

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS, in which an on-chip aging monitor reports circuit performance to a voltage regulator that sets the circuit Vdd. Plots of circuit frequency and Vdd over time, without AVS and with AVS, against the target.]

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction: both depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: loop of three steps. Step 1: VBTI gives |Vt| and, with Vlib, the derated library. Step 2: circuit implementation and signoff give the circuit. Step 3: BTI degradation and AVS give Vfinal.]

• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design/lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

Power vs. Area Across Different Signoffs

[Figure: power vs. area for different signoff corners, with a “knee” point for a balanced area and power tradeoff.]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Heuristic 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

VBTI = Vlib ≈ Vfinal

[Figure: Vdd vs. time, with NBTI and PBTI.]

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure: Vfinal for different gate types; annotated spread of 10mV.]

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

[Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → signoff circuit with derated library]

Power vs. Area for All Designs

• (4 designs × {DC, AC} × derating methods)

[Figure: power vs. area for circuits signed off using the proposed method and using other derated libraries, with a “knee” point for a balanced area and power tradeoff.]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign

[Figure: Vdd vs. time alternating between NOM and OD modes over tnom and tOD.]

[Plot: power of circuits with different overdrive modes. Different overdrive modes: 26% power range. Fix fOD: still 14% power range. fnom = 800MHz, Vnom = 0.8V.]

Also: Tunable Monitors, Less Margin

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law


Thank You


Backup

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) across implementations 1 to 9, annotated with the highest and least invested guardband and a 14% power penalty.]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• Tradeoff against the guardband invested in signoff

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2. Axes: M1 C, M2 C. Labels: C-3σ for layer M1, C-3σ for layer M2, 3σ, homogeneous Cw corner, pessimism.]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
          M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
    ΔT    0  0  0     0  0  0     0  0  0     0  1  0
    ΔH    0  0  0     0  0  0     0  0  0     0  0  1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
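The structure of Σ follows directly from the two rules above, so it can be generated from a layer-to-module assignment. The sketch below is illustrative only: the function name and the module assignment `[1, 1, 1, 2]` (M1 to M3 in one module, M4 in another, as the 12 × 12 example suggests) are assumptions.

```python
def correlation_matrix(layer_module, gamma, kinds=("dW", "dT", "dH")):
    """Build the correlation matrix for all variation sources.

    layer_module[i] is the process module of layer i (M1 first). Two sources
    are correlated with factor gamma only if they are the same kind and
    their layers share a process module.
    """
    k = len(kinds)
    n = len(layer_module) * k
    corr = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for a, mod_a in enumerate(layer_module):
        for b, mod_b in enumerate(layer_module):
            if a != b and mod_a == mod_b:
                for t in range(k):  # same variation type only
                    corr[a * k + t][b * k + t] = gamma
    return corr


# Four layers: M1..M3 share a module, M4 is in a different one.
corr = correlation_matrix([1, 1, 1, 2], gamma=0.5)
```

Row 0 (ΔW of M1) then reads 1 0 0 γ 0 0 γ 0 0 0 0 0, matching the first row of the matrix shown on the slide.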

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); cumulative probability 0.92 at 60%.]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation

Delay Variation

[Figure: two scatter plots of α vs. Δdelay, one at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and one at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp). Paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst).]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
  • α < 1.0 there: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2, axes M1 C and M2 C. Non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp.]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: 3σj/d(Ytyp) × 100% vs. Δdj(Yrcw)/dj(Ytyp) × 100%.]

Challenge: how to avoid underestimating delay variation to preserve parametric yield?

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%) of critical paths.]

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC → GTBC: timing analysis using TBC, ECO using TBC until violation = 0; GCBC: timing analysis using CBC, ECO using CBC until violation = 0 → done.]

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                       LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)      18        20        31
Gate count             232K      575K      1031K
Utilization (%)        84        79        82
Core area (mm2)        0.45      1.04      1.91
Max transition (ps)    330       330       330

Tightened corners (correlation factor = 0.5)

           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   43        73
TBC-0.6    0.6   33        50
TBC-0.7    0.7   30        34

• Implement another NETCARD (clock period = 23ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large: the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

Scaling Factor Results

[Figures: α for LEON3MP, NETCARD and SUPERBLUE12, each with the α > 0.5 region marked.]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figures: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2.]


Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient


Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = {DC, AC}

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax           1.05V
Vinit          0.90V
Vheur1 (DC)    0.97V
Vheur1 (AC)    0.95V
Vheur2 (DC)    0.95V
Vheur2 (AC)    0.93V

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs

[Flowchart labels: Vstep = AVS voltage step; Vfinal = converged voltage.]

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case        1       2       3      4        5    6        7
Vlib (V)    Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)    Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop among circuit, Vfinal, Vlib/VBTI and derated libraries.]

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time alternating ON/OFF [VattikondaWC06]. Device aging (|ΔVth|) accumulates over time [TCAS’14].]

Observation 1

• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure [Chan11]: Vdd approaching Vfinal over the lifetime; ≈70% of the Vdd increment in 1 year (remaining 30% over 9 years).]

Results for DC Scenario

[Figure: power vs. area of circuits signed off with setups 1 to 7, with the “good corners” marked.]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                                  Vlib for circuit performance estimation
VBTI for aging estimation         Min Vdd                             Max Vdd
Min Vdd                           Slowest circuit, less aging         Not applicable (optimistic)
Max Vdd                           Slowest circuit, worst-case aging   Faster circuit, worst-case aging
                                  (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for AES implementations 1 to 8, shown as non-EM-aware, after fixing (Mishra) and after fixing (Black's), with regions annotated “optimistic about AVS” and “pessimistic about AVS”.]

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

  MTTF(i) = MTTF(i−1) × (VDD(i−1) / VDD(i))²

[Figure: lifetime (year) and Vfinal (V) across implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation.]

>
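The MTTF update rule on this slide can be sketched in a few lines. This is only an illustration, reading the slide's formula as MTTF(i) = MTTF(i-1) × (VDD(i-1) / VDD(i))²; the function names, the initial MTTF of 10 years, and the voltage schedule are hypothetical values chosen for the example, not data from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One update step: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_over_schedule(mttf_initial, vdd_schedule):
    """Apply the update at every AVS voltage step and return the MTTF history."""
    mttf = mttf_initial
    history = [mttf]
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_new)
        history.append(mttf)
    return history


# Hypothetical AVS schedule (volts) and initial MTTF (years), for illustration only.
schedule = [0.90, 0.95, 1.00]
history = mttf_over_schedule(10.0, schedule)
print(history)
```

Because each step multiplies by a squared voltage ratio, the steps telescope: the final MTTF depends only on the first and last voltages of the schedule.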

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling
[Figure: VDD vs. year (0-12) for AVS schedules S1-S5, and MTTF (year) for S1-S5; design: DMA; annotation: "12 years MTTF penalty"]

71ISVLSI-2014 invited talk 140710

What is “Signoff”?
• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
“Margin stack” for voltage signoff, between nominal Vdd and signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI
[Figure: voltage axis showing nominal Vdd, the margin stack, signoff Vdd, and operating voltage]
Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv: $\Delta d_{jv} = [d_j(Y_v) - d_j(Y_{typ})] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv
Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda \lambda^T$ (Note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
$\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
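The two statistical timing slides can be tied together with a small sketch. It assumes the per-source sensitivity is (d(Yv) - d(Ytyp)) / 3 (i.e., each Yv is read as a 3σ skew of one source), factors the correlation matrix as Σ = λλᵀ with a hand-written Cholesky routine, and computes σj from the correlated sensitivities. The three-source example, the delay values, and all function names are hypothetical; the talk uses 27 sources and real RC extraction plus STA runs.

```python
import math


def sensitivity(d_corner, d_typ):
    """Per-sigma delay sensitivity, assuming the corner is a 3-sigma skew."""
    return (d_corner - d_typ) / 3.0


def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma must be SPD)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(delta_d, sigma):
    """sigma_j = sqrt(sum of squares of the row vector delta_d * lam)."""
    lam = cholesky(sigma)
    n = len(delta_d)
    correlated = [sum(delta_d[u] * lam[u][v] for u in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in correlated))


# Hypothetical 3-source example (delays in ps); sources 0 and 1 are correlated.
d_typ = 1000.0
d_corners = [1009.0, 1012.0, 1006.0]
delta_d = [sensitivity(d, d_typ) for d in d_corners]  # 3, 4, 2 ps per sigma
corr = [[1.0, 0.5, 0.0],
        [0.5, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
sigma_j = path_sigma(delta_d, corr)
print(sigma_j)
```

Since Δd·λ·λᵀ·Δdᵀ = Δd·Σ·Δdᵀ, the result equals the square root of the quadratic form in Σ, which gives a quick sanity check on the factorization.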

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V), 0.84-1.00 V, for conventional design and resilient design; annotation: 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy: $Energy = Power / Throughput$
• Throughput depends on the clock period T, the number of recovery cycles r, and the error rate ER [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches:
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process- and temperature-aware AVS, generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
[Figure: AVS classes and approaches arranged along a power axis]

78ISVLSI-2014 invited talk 140710

Challenge: Variability
[Figure: four panels, each comparing ideal scaling with the observed trend ("non-ideality")]
• DENSITY: transistor count [M] vs. MPU release date, 1998-2011. Source: [CPUDB]
• POWER: dynamic power (W), 2009-2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006-2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date, 1995-2016. Source: [CPUDB]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt designs; panels MUL and EXU; annotation: minimum achievable energy]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage vs. frequency plane showing the design cone of mode A and modes A, B, C; negative slacks = failed design; positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small, constant number of runs is enough: scalable
Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models (adaptive search)
[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V); curves: 1st, 2nd, real; design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Generic monitor does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages with 1-bit control pins selecting high or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: stacked energy (mJ) bars for PM, BF, and CO at small, medium, and large margin; panels MUL and EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt designs; panels MUL and EXU; annotation: minimum achievable energy]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt
Flow chart elements:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• Check: energy < min energy? If so, save current solution

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

6ISVLSI-2014 invited talk 140710

BEOL Corner Optimization
• 20nm and below: increased timing variation due to interconnect R, C
  • Design closure becomes much more difficult
• Costs of BEOL variations
  • More design effort (e.g., “last month” of manual ECO iteration)
  • Compromised circuit performance at high Vdd
• Recent work: reduce signoff margin by using tightened BEOL corners without sacrificing parametric yield
  • Signoff at conventional BEOL corners is pessimistic for most timing-critical paths
  • We identify paths which can be safely signed off using tightened BEOL corners (TBC)
  • Joint work with Sorin Dobre (Qualcomm) and Tuck-Boon Chan

7ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow
Conventional signoff: routed design → timing analysis using conventional BEOL corners (CBC) → violation = 0? If no, ECO using CBC and repeat; if yes, done
This work: routed design → classify timing-critical paths into GTBC and GCBC
• GTBC: timing analysis using TBC → violation = 0? If no, ECO using TBC and repeat
• GCBC: timing analysis using CBC → violation = 0? If no, ECO using CBC and repeat
• Done when neither group has violations

8ISVLSI-2014 invited talk 140710

Conventional BEOL Corners
• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
• BEOL RC variations are modeled in interconnect technology file (itf)
[Figure: cross-section of metal layers M1-M3 showing width W, spacing S, thickness T, and dielectric height H (inter-layer and inter-metal dielectric)]
Corner  ΔW       ΔT       ΔH
Ytyp    typical  typical  typical
Ycb     min      min      max
Ycw     max      max      min
Yrcb    max      max      max
Yrcw    min      min      min

9ISVLSI-2014 invited talk 140710

Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT or ΔH)
  • zu and zv are in the same process module
Layer  ΔW   ΔT   ΔH
M1     z1   z2   z3
M2     z4   z5   z6
M3     z7   z8   z9
M4     z10  z11  z12
M5     z13  z14  z15
M6     z16  z17  z18
M7     z19  z20  z21
M8     z22  z23  z24
M9     z25  z26  z27
Layers are grouped into process modules 1, 2 and 3
Examples
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6 and M7
• But ΔW in layer M4 is not correlated with ΔT in M4
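The correlation rule above (same variation type and same process module) can be written as a small predicate. The sketch below is illustrative only: the grouping of layers into modules (M1-M3, M4-M7, M8-M9) is an assumption inferred from the examples in the slides, the default γ = 0.5 is the correlation factor quoted elsewhere in the talk, and the names `describe`, `correlation`, and `correlation_matrix` are hypothetical.

```python
VARIATION_TYPES = ("dW", "dT", "dH")

# Assumed layer-to-module grouping (inferred, not stated explicitly in the slides).
PROCESS_MODULE = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2, 8: 3, 9: 3}


def describe(z):
    """Map a 1-based source index z (1..27) to (metal layer, variation type)."""
    layer = (z - 1) // 3 + 1
    vtype = VARIATION_TYPES[(z - 1) % 3]
    return layer, vtype


def correlation(zu, zv, gamma=0.5):
    """1 on the diagonal, gamma for same type and same module, else 0."""
    if zu == zv:
        return 1.0
    layer_u, type_u = describe(zu)
    layer_v, type_v = describe(zv)
    same_type = type_u == type_v
    same_module = PROCESS_MODULE[layer_u] == PROCESS_MODULE[layer_v]
    return gamma if (same_type and same_module) else 0.0


def correlation_matrix(n_sources=27, gamma=0.5):
    """Full correlation matrix over sources z1..zn."""
    idx = range(1, n_sources + 1)
    return [[correlation(u, v, gamma) for v in idx] for u in idx]


matrix = correlation_matrix()
print(describe(10), correlation(10, 13), correlation(10, 11))
```

With this indexing, z10 is ΔW in M4 and z13 is ΔW in M5, so the two examples from the slide (M4 ΔW vs. M5 ΔW, and M4 ΔW vs. M4 ΔT) map to `correlation(10, 13)` and `correlation(10, 11)`.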

10ISVLSI-2014 invited talk 140710

Pessimism of Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path pj is “safe” when the delay evaluated at a given CBC is larger than nominal delay + 3σj
$d_j(Y_{CBC}) \ge 3\sigma_j + d_j(Y_{typ})$
• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC: $\alpha_j = 3\sigma_j / \Delta d_j(Y_{CBC})$
$\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ})$, with $Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$
• Small αj: large pessimism of CBC
[Figure: delay axis comparing 3σj with dj(YCBC) - dj(Ytyp); the gap is annotated “Large pessimism”]
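As a worked illustration of the scaling factor, the sketch below evaluates αj = 3σj / Δdj(YCBC) and the “safe” condition dj(YCBC) ≥ dj(Ytyp) + 3σj for one path. The delay and σ values are hypothetical, as are the function names.

```python
def alpha(sigma_j, d_cbc, d_typ):
    """Scaling factor alpha_j = 3*sigma_j / (d_j(Y_CBC) - d_j(Y_typ))."""
    delta = d_cbc - d_typ
    if delta <= 0:
        raise ValueError("corner delay must exceed typical delay for a setup path")
    return 3.0 * sigma_j / delta


def is_safe(sigma_j, d_cbc, d_typ):
    """A setup path is 'safe' when the corner delay covers typical + 3 sigma."""
    return d_cbc >= d_typ + 3.0 * sigma_j


# Hypothetical path: typical delay 1000 ps, corner delay 1030 ps, sigma 5 ps.
a = alpha(5.0, 1030.0, 1000.0)
safe = is_safe(5.0, 1030.0, 1000.0)
print(a, safe)
```

Here the corner adds 30 ps while 3σ is only 15 ps, so α = 0.5: the corner covers the statistical variation with half of its margin to spare, which is the pessimism the slide refers to.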

11ISVLSI-2014 invited talk 140710

Intuition on Delay Variability Across Cw, RCw
[Figure: two scatter plots of α vs. Δdelay (vs. typ): at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp); paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
Annotations: the C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner; α < 1.0 here, so delay variations are covered by the RC-worst corner

12ISVLSI-2014 invited talk 140710

Intuition on Delay Variability Across Cw, RCw
[Figure: two scatter plots of α vs. Δdelay: at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp); paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
Annotations: the C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner; α < 1.0, so delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
In the following, α is defined at the dominant corner

13ISVLSI-2014 invited talk 140710

Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
• E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and α vs. Δd(Yrcw)/d(Ytyp)]

14ISVLSI-2014 invited talk 140710

Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
• E.g., there are paths with αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and α vs. Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw marked]
GTBC = set of paths that can be safely signed off using TBC: ((path with Δdcw larger than Acw) OR (path with Δdrcw larger than Arcw))
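The GTBC membership rule can be sketched as a simple filter over paths. The snippet assumes each path comes with its relative delay increase at the C-worst and RC-worst corners (in percent); the path names, delay values, and thresholds below are hypothetical, and `classify_paths` is an illustrative helper rather than part of any signoff tool.

```python
def classify_paths(paths, a_cw, a_rcw):
    """Split paths into G_TBC and G_CBC.

    A path goes to G_TBC when its delta-delay at C-worst exceeds A_cw
    OR its delta-delay at RC-worst exceeds A_rcw; otherwise it stays
    in G_CBC and is signed off at the conventional corners.
    """
    g_tbc, g_cbc = [], []
    for name, d_cw, d_rcw in paths:
        if d_cw > a_cw or d_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Hypothetical paths: (name, delta-delay at C-worst in %, at RC-worst in %).
paths = [
    ("p1", 1.5, 2.0),   # small variation at both corners
    ("p2", 4.2, 1.0),   # large at C-worst
    ("p3", 0.8, 5.1),   # large at RC-worst
]
g_tbc, g_cbc = classify_paths(paths, a_cw=3.0, a_rcw=3.0)
print(g_tbc, g_cbc)
```

Raising the thresholds shrinks G_TBC, which matches the tradeoff in the slides: a smaller α allows a tighter corner but applies to fewer paths.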

15ISVLSI-2014 invited talk 140710

Determining α, Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α: larger thresholds (Arcw and Acw), fewer paths in GTBC
[Figure: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw marked]

16ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
• Tradeoff between reduced margin vs. paths which use TBC
Correlation factor γ = 0.5
[Figure: three bar charts for designs LEON, SUPERBLUE12, and NETCARD comparing CBC, TBC-0.5, TBC-0.6, and TBC-0.7: WNS (ns), TNS (ns), and timing violations]

17ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

18ISVLSI-2014 invited talk 140710

How to Minimize Cost of Resilience?
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs
                 Razor        Razor-Lite  TIMBER
Power penalty    30 [Das08]   ~0 [Kim13]  100 [Choudhury09]
Area penalty     182 [Kim13]  33 [Kim13]  255 [Chen13]
Recovery cycles  5 [Wan09]    11 [Kim13]  0 [Choudhury09]

19ISVLSI-2014 invited talk 140710

Tradeoff: Resilience Cost vs. Datapath Cost
[Figure: fanin cone feeding an endpoint flip-flop; a Razor FF (with error output) can be replaced by a normal FF if the fanin cone is optimized w/ tighter constraint; area (power) of fanin cone vs. area (power) w/ Razor overhead]
[Figure: energy (mJ) vs. number of Razor FFs (300, 100, 50, 0): total energy, energy of non-resilient part, resilience cost]
Tradeoff: Razor FFs (resilience cost) vs. power/area of fanin circuits
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

20ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions: vary endpoints, compare area/power penalty
Candidate sensitivity functions:
$SF_1 = |slack(p)|$
$SF_2 = |slack(p)| \times Num_{cri}(p)$
$SF_3 = |slack(p)| \times Num_{cri}(p) / Num_{total}(p)$
$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$
p: negative-slack endpoint; c: cells within fanin cone; Numcri: number of negative-slack cells
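The five candidate sensitivity functions can be evaluated for one endpoint as sketched below. This is an illustration under stated assumptions: Numtotal is taken to be the total number of cells in the fanin cone (the slide does not define it), the `Cell` record and the slack and power values are hypothetical, and the sketch does not say how the scores are ranked when picking endpoints.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    slack: float  # timing slack of the cell (ns), negative if violating
    power: float  # power of the cell (arbitrary units)


def sensitivity_functions(endpoint_slack, fanin_cells):
    """Evaluate SF1..SF5 for one negative-slack endpoint p and its fanin cone."""
    abs_slack = abs(endpoint_slack)
    num_cri = sum(1 for c in fanin_cells if c.slack < 0)
    num_total = len(fanin_cells)  # assumed meaning of Num_total
    total_power = sum(c.power for c in fanin_cells)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total if num_total else 0.0,
        "SF4": abs_slack * total_power,
        "SF5": sum(abs(c.slack) * c.power for c in fanin_cells),
    }


# Hypothetical endpoint with slack -0.2 ns and a three-cell fanin cone.
cone = [Cell(-0.2, 1.0), Cell(-0.1, 2.0), Cell(0.3, 4.0)]
scores = sensitivity_functions(-0.2, cone)
print(scores)
```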

21ISVLSI-2014 invited talk 140710

Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized
[Figure: FF1, FF2, FF3 connected by data paths with weights W12, W23, W31 and driven by a clock tree; after optimization each path on the cycle has weight W′]
$W_{pq} = Slack_{pq} / (1 + \beta \times TG(p, q))$
Slack_pq: setup slack of path p-q; β: weighting factor; TG(p, q): toggle rate of path p-q
W′ = average weight on cycle
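The activity-aware weight can be illustrated on the three-flip-flop cycle from the figure. The sketch reads the slide's formula as W_pq = Slack_pq / (1 + β × TG(p, q)), which is an interpretation of the garbled original; the slack and toggle-rate values are hypothetical, and the code only computes weights and their cycle average W′, not the full cycle search or latency adjustment.

```python
def path_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q)), as read from the slide."""
    return slack / (1.0 + beta * toggle_rate)


def cycle_average_weight(cycle_edges, beta):
    """Average weight W' over the edges (slack, toggle_rate) of one cycle."""
    weights = [path_weight(slack, tg, beta) for slack, tg in cycle_edges]
    return sum(weights) / len(weights)


# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1: (setup slack in ns, toggle rate).
cycle = [(0.3, 0.5), (0.1, 0.0), (0.2, 1.0)]
w_avg = cycle_average_weight(cycle, beta=1.0)
print(w_avg)
```

With β > 0, a frequently toggling path gets a smaller weight than an idle path with the same slack, so it looks more critical to a minimum-weight cycle search.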

22ISVLSI-2014 invited talk 140710

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow chart elements:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• Check: energy < min energy? If so, save current solution
• OR-tree insertion

23ISVLSI-2014 invited talk 140710

Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: stacked energy (mJ) bars for PM, BF, and CO at small, medium, and large margin; panels MUL and EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm

24ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt designs; panels MUL and EXU; annotation: minimum achievable energy]
Technology: foundry 28nm

25ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS: an on-chip aging monitor reports circuit performance to a voltage regulator, which sets the circuit's Vdd]
[Figure: circuit frequency and Vdd vs. time, without AVS and with AVS, relative to the target]

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Figure: three-step loop: derated library characterization (using VBTI, Vlib, |Vt|), circuit implementation and signoff (producing the circuit), and BTI degradation and AVS (producing Vfinal)]

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: derated library (from Vlib, VBTI, |Vt|); Step 2: circuit implementation and signoff (produces the circuit); Step 3: BTI degradation and AVS (produces Vfinal)]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs. Area Across Different Signoffs
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI]
VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation
Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library

34ISVLSI-2014 invited talk 140710

Power vs. Area for All Designs
• (4 designs) × {DC, AC} × (derating methods)
[Figure: power vs. area; legend: proposed method, circuits signed off using other derated libraries]
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time for nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD]
[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V; annotations: different overdrive modes: 26% power range; fix fOD: still 14% power range]

36ISVLSI-2014 invited talk 140710

Also: Tunable Monitors, Less Margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

37ISVLSI-2014 invited talk 140710

Also: Tunable Monitors, Less Margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) across implementations 1-9; annotations: highest invested guardband, least invested guardband, 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

43ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
Example: worst-case capacitance corner
[Figure: C-3σ corners of layer M1 and layer M2, combined for an interconnect stack with M1 and M2 (axes: M1 C, M2 C); labels: 3σ, homogeneous Cw corner, pessimism]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2 (axes: M1 C, M2 C). The homogeneous Cw corner places both layer M1 and layer M2 at their C-3σ values; the gap to the 3σ contour of the stack is labeled as pessimism.]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers.
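The growth of pessimism with layer count can be illustrated with a minimal sketch. It assumes (this is not stated on the slide) that the stack capacitance variation is a plain sum of per-layer variations with a single pairwise correlation `gamma`; the function names are hypothetical.

```python
import math

def stack_sigma(sigmas, gamma):
    """Std. dev. of the summed layer variations with pairwise correlation gamma."""
    var = sum(s * s for s in sigmas)
    for i, si in enumerate(sigmas):
        for j, sj in enumerate(sigmas):
            if i != j:
                var += gamma * si * sj
    return math.sqrt(var)

def homogeneous_pessimism(sigmas, gamma):
    """Ratio of the homogeneous corner (every layer at +3 sigma) to the
    statistical 3-sigma point of the whole stack. 1.0 means no pessimism."""
    homogeneous = 3.0 * sum(sigmas)
    statistical = 3.0 * stack_sigma(sigmas, gamma)
    return homogeneous / statistical

if __name__ == "__main__":
    for n in (2, 4, 9):
        print(n, round(homogeneous_pessimism([1.0] * n, 0.5), 3))
```

Under these assumptions the ratio is 1.0 only for fully correlated layers and grows as layers are added.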

45ISVLSI-2014 invited talk 140710

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

             M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW     1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT     0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH     0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW     γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT     0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH     0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW     γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT     0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH     0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW     0  0  0     0  0  0     0  0  0     1  0  0
    ΔT     0  0  0     0  0  0     0  0  0     0  1  0
    ΔH     0  0  0     0  0  0     0  0  0     0  0  1
                                                    = Σ

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
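A minimal sketch of how such a matrix could be assembled from the two rules above (same variation type and same process module gives γ, everything else off the diagonal is 0). The layer-to-module mapping passed in and the function name are illustrative assumptions.

```python
def build_correlation(layer_modules, gamma, kinds=("dW", "dT", "dH")):
    """layer_modules: dict mapping layer name -> process module id (ordered).
    Returns (sources, matrix) where sources[i] = (layer, kind)."""
    sources = [(layer, kind) for layer in layer_modules for kind in kinds]
    n = len(sources)
    matrix = [[0.0] * n for _ in range(n)]
    for i, (layer_i, kind_i) in enumerate(sources):
        for j, (layer_j, kind_j) in enumerate(sources):
            if i == j:
                matrix[i][j] = 1.0
            elif (kind_i == kind_j
                  and layer_modules[layer_i] == layer_modules[layer_j]):
                matrix[i][j] = gamma
    return sources, matrix

if __name__ == "__main__":
    # Hypothetical mapping matching the 4-layer illustration: M1-M3 share a module.
    _, sigma = build_correlation({"M1": 1, "M2": 1, "M3": 1, "M4": 2}, 0.5)
    for row in sigma:
        print(" ".join(f"{v:3.1f}" for v in row))
```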

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the curve is at 0.92 for a ratio of 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two plots of α, one vs. Δdelay at C-worst, [d(Ycw) – d(Ytyp)] / d(Ytyp), and one vs. Δdelay at RC-worst, [d(Yrcw) – d(Ytyp)] / d(Ytyp). Paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst).]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 (axes: M1 C, M2 C); example non-homogeneous corner with M1 == Cw (3σ) and M2 == Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: Δdj(Yrcw)/dj(Ytyp) x 100 plotted against 3σj/d(Ytyp) x 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%) of critical paths]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group GTBC if Δdelay is larger than a threshold
• Fix only the paths in GTBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design → timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) → paths split into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. Then done.]
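The path classification step of this flow can be sketched as follows. This is an illustrative sketch only: it uses the rule stated elsewhere in the talk (a path goes to GTBC if its Δdelay at C-worst exceeds Acw or its Δdelay at RC-worst exceeds Arcw), and the threshold values and path delays in the usage lines are made-up numbers.

```python
def classify_paths(paths, a_cw, a_rcw):
    """paths: iterable of (name, d_typ, d_cw, d_rcw) delays.
    a_cw, a_rcw: thresholds in percent of typical delay.
    Returns (g_tbc, g_cbc) lists of path names."""
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        dd_cw = 100.0 * (d_cw - d_typ) / d_typ     # delta delay at C-worst (%)
        dd_rcw = 100.0 * (d_rcw - d_typ) / d_typ   # delta delay at RC-worst (%)
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)   # large corner delta: sign off with tightened corners
        else:
            g_cbc.append(name)   # small corner delta: keep conventional corners
    return g_tbc, g_cbc

if __name__ == "__main__":
    demo = [("p1", 1.0, 1.05, 1.02), ("p2", 1.0, 1.01, 1.02), ("p3", 2.0, 2.02, 2.2)]
    print(classify_paths(demo, a_cw=4.0, a_rcw=6.0))
```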

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     18        20        31
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5

          α     Acw (%)   Arcw (%)
TBC-05    0.5   43        73
TBC-06    0.6   33        50
TBC-07    0.7   30        34

Implement another NETCARD (clock period = 23ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor α for LEON3MP, SUPERBLUE12 and NETCARD, each with the region α > 0.5 marked]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: bar charts of WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library

Vstep: AVS voltage step
Vfinal: converged voltage

62ISVLSI-2014 invited talk 140710

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case        1       2       3      4        5    6        7
Vlib (V)    Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)    Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop between Circuit, Derated Libraries and Vfinal, linked through Vlib and VBTI]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time over alternating ON / OFF phases [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time [TCAS'14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd over lifetime approaching Vfinal; ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years)]
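A small numeric illustration of what "front-loaded" means. It assumes a generic power-law time dependence of the threshold shift, |ΔVth| ∝ t^n, which is a common modeling form but is not given on the slide; the exponent n = 0.3 below is chosen only because it reproduces the 50% figure, and the function name is hypothetical.

```python
def aging_fraction(t_years, lifetime_years, exponent):
    """Fraction of end-of-life |dVth| accumulated by time t,
    assuming |dVth| grows as t**exponent (assumed model)."""
    return (t_years / lifetime_years) ** exponent

if __name__ == "__main__":
    for year in (1, 2, 5, 10):
        print(year, round(aging_fraction(year, 10.0, 0.3), 3))
```

With these assumed numbers, about half of the ten-year shift is already present after the first year.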

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                                 Vlib for circuit performance estimation
                                 Min Vdd                      Max Vdd
VBTI for aging    Min Vdd        Slowest circuit,             Not applicable
estimation                       less aging                   (optimistic)
                  Max Vdd        Slowest circuit,             Faster circuit,
                                 worst-case aging             worst-case aging
                                 (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES; points are labeled by signoff case (1-8) and range from optimistic about AVS to pessimistic about AVS; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's)]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) for implementations 1-9; annotations: 30% MTTF penalty, 200mV voltage compensation]

• Assume no EM fix at signoff
• BTI degradation is checked at each step, and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left(\frac{V_{DD}(i-1)}{V_{DD}(i)}\right)^{2}$$
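The stepwise MTTF update on this slide, MTTF(i) = MTTF(i−1) × (VDD(i−1)/VDD(i))², can be traced with a short sketch. The starting MTTF and the voltage schedule in the usage lines are made-up illustration values, and the function name is hypothetical.

```python
def mttf_trajectory(mttf0, vdd_steps):
    """Apply MTTF(i) = MTTF(i-1) * (Vdd(i-1) / Vdd(i))**2 along a Vdd schedule.
    Returns the list of MTTF values, one per schedule entry."""
    values = [mttf0]
    for prev_vdd, cur_vdd in zip(vdd_steps, vdd_steps[1:]):
        values.append(values[-1] * (prev_vdd / cur_vdd) ** 2)
    return values

if __name__ == "__main__":
    # Hypothetical AVS schedule that raises Vdd from 0.90V to 1.00V.
    print(mttf_trajectory(10.0, [0.90, 0.95, 1.00]))
```

Because the per-step ratios telescope, the final value depends only on the first and last voltage of the schedule under this update rule.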

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year (0-12) for AVS schedules S1-S5, and MTTF (year) for S1-S5, design DMA; annotation: "12 years MTTF penalty"]

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff, from nominal Vdd down to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage marked on the voltage axis]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:
  Δdjv = [dj(Yv) - dj(Ytyp)] / 3
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:
[Δd'j1 … Δd'j27] = [Δdj1 … Δdj27] λ

Standard deviation of path delay:
σj = ((Δd'j1)^2 + … + (Δd'j27)^2)^0.5

Note: we use the delay variation from the statistical analysis as a reference
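The two formulas above can be checked numerically with a minimal pure-Python sketch: factor Σ = λλ^T with a textbook Cholesky routine, multiply the sensitivity row vector by λ, and take the root sum of squares. The function names and the two-source example are illustrative, not taken from the talk.

```python
import math

def cholesky(a):
    """Lower-triangular factor low such that low * low^T == a (a must be SPD)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def path_sigma(sensitivities, correlation):
    """sigma_j from per-source delay sensitivities and the correlation matrix."""
    lam = cholesky(correlation)
    n = len(sensitivities)
    # Row vector times lambda: correlated sensitivities.
    correlated = [sum(sensitivities[k] * lam[k][v] for k in range(n))
                  for v in range(n)]
    return math.sqrt(sum(d * d for d in correlated))

if __name__ == "__main__":
    print(path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]]))
```

For two unit sensitivities with correlation 0.5 this gives the square root of 1 + 1 + 2 × 0.5, and for independent sources it reduces to a plain root sum of squares.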

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality: e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]

Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$$\mathit{Energy} = \frac{\mathit{Power}}{\mathit{Throughput}}$$

Throughput is expressed on the slide in terms of 1 − ER, T and r × T, where ER = error rate [Kahng10], T = clock period, r = number of recovery cycles

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches

Open-Loop AVS
• Freq & Vdd LUT: pre-characterize LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]

Closed-Loop AVS
• Generic monitor, design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]

Error Tolerance
• Error detection system: error detection and correction system; Vdd scaling until error occurs [Das06, Tschanz10]

[Figure axis: power]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure, four panels, each contrasting ideal scaling with the observed non-ideality:
 DENSITY: transistor count [M] vs. MPU release date (1998-2011). Source: [CPUDB]
 POWER: dynamic power (W), 2009-2024. Source: [JeongK08]
 DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006-2016. Source: [ITRS]
 SUPPLY VOLTAGE: volt vs. MPU release date (1995-2016). Source: [CPUDB]]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, for designs MUL and EXU; minimum achievable energy marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage vs. frequency plane with modes A, B and C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable

[Flowchart, global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) → refine power models]

[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V), curves "1st", "2nd" and "real"; design: AES, f: 700MHz]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-Loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high resistance or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: stacked energy (mJ) bars for PM, BF and CO at small, medium and large margin, for designs MUL and EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, for designs MUL and EXU; minimum achievable energy marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, then replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution]

6ISVLSI-2014 invited talk 140710

BEOL Corner Optimization

• 20nm and below: increased timing variation due to interconnect R, C
  • Design closure becomes much more difficult
• Costs of BEOL variations
  • More design effort (e.g., "last month" of manual ECO iteration)
  • Compromised circuit performance at high Vdd
• Recent work: reduce signoff margin by using tightened BEOL corners without sacrificing parametric yield
  • Signoff at conventional BEOL corners is pessimistic for most timing-critical paths
  • We identify paths which can be safely signed off using tightened BEOL corners (TBC)
  • Joint work with Sorin Dobre (Qualcomm) and Tuck-Boon Chan

7ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

[Flowchart, conventional signoff: routed design → timing analysis using conventional BEOL corners (CBC) → violation = 0? If no: ECO using CBC and re-analyze; if yes: done]

[Flowchart, this work: routed design → classify timing-critical paths into GTBC and GCBC. GTBC: timing analysis using TBC → violation = 0? If no: ECO using TBC. GCBC: timing analysis using CBC → violation = 0? If no: ECO using CBC. Done when both report no violations]

8ISVLSI-2014 invited talk 140710

Conventional BEOL Corners

• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
• BEOL RC variations are modeled in interconnect technology file (itf)

[Figure: cross-section of layers M1, M2, M3 showing width W2, spacing S2, thickness T1-T3 and dielectric height H1-H3, with inter-layer and inter-metal dielectric]

Corner   ΔW        ΔT        ΔH
Ytyp     typical   typical   typical
Ycb      min       min       max
Ycw      max       max       min
Yrcb     max       max       max
Yrcw     min       min       min

9ISVLSI-2014 invited talk 140710

Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT or ΔH)
  • zu and zv are in the same process module

Layer   ΔW    ΔT    ΔH
M9      z25   z26   z27
M8      z22   z23   z24
M7      z19   z20   z21
M6      z16   z17   z18
M5      z13   z14   z15
M4      z10   z11   z12
M3      z7    z8    z9
M2      z4    z5    z6
M1      z1    z2    z3

(Layers are grouped into process modules 1, 2 and 3.)

Examples
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6 and M7
• But ΔW in layer M4 is not correlated with ΔT in M4

10ISVLSI-2014 invited talk 140710

Pessimism of Conventional BEOL Corners (CBC)

• Assumption: a max (setup) path pj is "safe" when delay evaluated at a given CBC is larger than nominal delay + 3σj
  dj(YCBC) ≥ 3σj + dj(Ytyp)
• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC:
  αj = 3σj / Δdj(YCBC)
  Δdj(YCBC) = [dj(YCBC) - dj(Ytyp)], YCBC ∈ {Ycw, Ycb, Yrcw, Yrcb}
• Small αj: large pessimism of CBC

[Figure: path delay distribution with its 3σ point compared against dj(YCBC) - dj(Ytyp); the gap between the two is marked as large pessimism]
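The definitions above translate directly into a few lines of code. This is a minimal sketch with hypothetical function names and made-up delay numbers; it only restates αj = 3σj / Δdj(YCBC) and the safety condition dj(YCBC) ≥ 3σj + dj(Ytyp).

```python
def alpha(sigma_j, d_cbc, d_typ):
    """Scaling factor: statistical 3-sigma variation over the corner's delay shift."""
    return 3.0 * sigma_j / (d_cbc - d_typ)

def is_safe(sigma_j, d_cbc, d_typ):
    """A setup path is 'safe' at this corner if the corner delay covers typ + 3 sigma."""
    return d_cbc >= 3.0 * sigma_j + d_typ

if __name__ == "__main__":
    # Hypothetical path: 1.00ns typical, 1.06ns at the corner, sigma = 0.01ns.
    print(alpha(0.01, 1.06, 1.00), is_safe(0.01, 1.06, 1.00))
```

In this example the corner shift is twice the 3σ variation (α ≈ 0.5), so the corner is safe but pessimistic; a path is safe exactly when α ≤ 1.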

12ISVLSI-2014 invited talk 140710

Intuition on Delay Variability Across Cw, RCw

[Figure: two plots of α, one vs. Δdelay at C-worst, [d(Ycw) – d(Ytyp)] / d(Ytyp), and one vs. Δdelay at RC-worst, [d(Yrcw) – d(Ytyp)] / d(Ytyp). Paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst).]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations

In the following, α is defined at the dominant corner

13ISVLSI-2014 invited talk 140710

Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
• E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α as a function of Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]

14ISVLSI-2014 invited talk 140710

Find Paths for Which TBCs Can Be Used

• Paths with small Δdrcw and Δdcw have large α
• E.g., there are αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α as a function of Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw marked]

GTBC = set of paths that can be safely signed off using TBC: ((path with Δdcw larger than Acw) OR (path with Δdrcw larger than Arcw))

15ISVLSI-2014 invited talk 140710

Determining α, Arcw and Acw

• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α: larger thresholds (Arcw and Acw), fewer paths in GTBC

[Figure: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw marked]
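One possible reading of "find Arcw and Acw for a given α" can be sketched as below. The rule used here is an assumption, not the talk's stated procedure: the threshold is the largest Δdelay (in %) at which any representative path still has α above the target, plus the +1% margin. The sample data are invented.

```python
def find_threshold(samples, alpha_target, margin_pct=1.0):
    """samples: list of (delta_delay_pct, alpha) for representative paths
    at one corner. Returns a threshold A such that every sampled path with
    delta_delay_pct > A - margin_pct has alpha <= alpha_target."""
    exceeding = [dd for dd, a in samples if a > alpha_target]
    worst = max(exceeding) if exceeding else 0.0
    return worst + margin_pct

if __name__ == "__main__":
    demo = [(1.0, 0.9), (2.0, 0.7), (3.5, 0.55), (5.0, 0.45), (8.0, 0.3)]
    for target in (0.5, 0.6, 0.7):
        print(target, find_threshold(demo, target))
```

With this reading, a smaller α target yields a larger threshold, which matches the trend noted on the slide.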

16ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners

• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-06: more benefits
• Tradeoff between reduced margin vs. paths which use TBC

Correlation factor γ = 0.5

[Figure: bar charts of WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-05, TBC-06 and TBC-07]

17ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

18ISVLSI-2014 invited talk 140710

How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                  Razor          Razor-Lite    TIMBER
Power penalty     30 [Das08]     ~0 [Kim13]    100 [Choudhury09]
Area penalty      182 [Kim13]    33 [Kim13]    255 [Chen13]
Recovery cycles   5 [Wan09]      11 [Kim13]    0 [Choudhury09]

Tradeoff: Resilience Cost vs. Datapath Cost

[Figure: a fanin cone feeding endpoint flip-flops. In one version the endpoints are Razor FFs with error outputs; in the other the fanin cone is optimized w/ a tighter constraint so that the endpoint Razor FF becomes a normal FF. Annotations: area (power) of fanin cone; area (power) w/ Razor overhead.]

Tradeoff: # Razor FFs (resilience cost) vs. power/area of fanin circuits

[Figure: energy (mJ) vs. # Razor FFs (300, 100, 50, 0), showing total energy, energy of the non-resilient part, and resilience cost]

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

Selective-Endpoint Optimization (SEOpt)

• Optimize fanin cone of an endpoint w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
• Vary # endpoints → compare area/power penalty

Candidate Sensitivity Functions

$SF_1 = |slack(p)|$
$SF_2 = |slack(p)| \times Num_{cri}(p)$
$SF_3 = |slack(p)| \times Num_{cri}(p) / Num_{total}(p)$
$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$

p: negative-slack endpoint
c: cells within fanin cone
Num_cri: number of negative-slack cells
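The five candidate sensitivity functions can be evaluated as in the sketch below, reading them as SF1 = |slack(p)|, SF2 = |slack(p)| × Numcri(p), SF3 = |slack(p)| × Numcri(p) / Numtotal(p), SF4 = |slack(p)| × (sum of Pwr(c) over the fanin cone), and SF5 = sum of |slack(c)| × Pwr(c) over the fanin cone. The `Cell` record and the sample numbers are hypothetical. Two further assumptions: Numcri counts negative-slack cells inside the endpoint's fanin cone, and Numtotal is the total number of cells in that cone (the slide does not define Numtotal).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cell:
    slack: float  # timing slack at the cell (ns); negative means violating
    power: float  # cell power (mW)


def sensitivity_functions(endpoint_slack: float, fanin: List[Cell]) -> Dict[str, float]:
    """Evaluate SF1..SF5 for one negative-slack endpoint p and its fanin cone."""
    abs_slack = abs(endpoint_slack)
    num_cri = sum(1 for c in fanin if c.slack < 0)  # negative-slack cells
    num_total = len(fanin)                          # assumed meaning of Num_total
    total_power = sum(c.power for c in fanin)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": (abs_slack * num_cri / num_total) if num_total else 0.0,
        "SF4": abs_slack * total_power,
        "SF5": sum(abs(c.slack) * c.power for c in fanin),
    }


# Made-up fanin cone with two violating and two non-violating cells.
fanin = [Cell(-0.10, 2.0), Cell(-0.05, 1.0), Cell(0.20, 4.0), Cell(0.30, 1.0)]
sf = sensitivity_functions(-0.10, fanin)
```

Endpoints could then be ranked by any one of the five values before choosing which fanin cones to optimize.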

Clock Skew Optimization (SkewOpt)

• Increase slacks on timing-critical and/or frequently-exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight → adjust clock latencies → contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized

$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p,q)}$

Slack_pq: setup slack of path p-q
β: weighting factor
TG(p,q): toggle rate of path p-q

[Figure: sequential graph of FF1, FF2 and FF3 with data-path edges W12, W23, W31 and the clock tree; after latency adjustment each edge on the cycle carries W′ = average weight on cycle]
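A small sketch of the two ingredients of Step 2 follows. It assumes the edge weight is the setup slack divided by (1 + β × toggle rate), and it uses the standard setup-slack relation slack′(p→q) = slack(p→q) + L_q − L_p for clock latencies L. The latency balancing shown here equalizes raw slacks around one cycle, i.e. the β = 0 special case. Function names and numbers are hypothetical, and the search for the minimum-weight cycle is not shown.

```python
def path_weight(slack, toggle_rate, beta):
    """Edge weight of a path p->q under the assumed fraction form."""
    return slack / (1.0 + beta * toggle_rate)


def balance_cycle(slacks):
    """Balance setup slacks around one cycle of flip-flops.

    slacks[i] is the setup slack of the edge FF_i -> FF_((i+1) mod n).
    Returns (latencies, average_slack) such that
    slacks[i] + latencies[(i+1) % n] - latencies[i] == average_slack.
    """
    n = len(slacks)
    avg = sum(slacks) / n
    latencies = [0.0] * n
    for i in range(n - 1):
        # Raising the capture-clock latency adds slack to this edge.
        latencies[i + 1] = latencies[i] + avg - slacks[i]
    return latencies, avg


slacks = [-0.2, 0.1, 0.4]  # made-up slacks (ns) on the cycle FF1 -> FF2 -> FF3 -> FF1
latencies, avg = balance_cycle(slacks)
n = len(slacks)
balanced = [slacks[i] + latencies[(i + 1) % n] - latencies[i] for i in range(n)]
```

The adjustments sum to zero around the cycle, so the last edge closes consistently without a separate equation.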

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart boxes: initial placement (all FFs = error-tolerant FFs); SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs); SkewOpt (activity-aware clock skew optimization); OR-tree insertion; check "energy < min energy?"; save current solution]

Benefit of Low-Cost Resilience

• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figure: stacked bar charts of energy (mJ) for PM, BF and CO on the MUL and EXU designs under small, medium and large margin; each bar splits into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm

Increased Benefit of Resilience with AVS

• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs → resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt on the MUL and EXU designs, with the minimum achievable energy marked]

Technology: foundry 28nm

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

Breaking Chicken-Egg Loops → Less Margin

• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS, in which an on-chip aging monitor reports circuit performance to a voltage regulator that sets the circuit's Vdd]

[Figure: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to the target]

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step flow (Step 1, Step 2, Step 3) linking VBTI and |Vt|, Vlib and the derated library, circuit implementation and signoff, the resulting circuit, and BTI degradation and AVS, which yields Vfinal]

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step flow (Step 1, Step 2, Step 3) linking VBTI and |Vt|, Vlib and the derated library, circuit implementation and signoff, the resulting circuit, and BTI degradation and AVS, which yields Vfinal]

→ No obvious guideline to define VBTI and Vlib
→ Inconsistency among Vfinal, Vlib, VBTI

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

Power vs. Area Across Different Signoffs

[Figure: power vs. area for different signoff corners, with a “knee” point for balanced area and power tradeoff]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Heuristic 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging with a flat Vfinal ≈ aging with an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI curves]

VBTI = Vlib ≈ Vfinal

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure: Vfinal for different gate types, annotated with 10mV]

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library
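The first two flow steps reduce to a simple average, sketched below. The cell names and Vfinal values are made up, and the helper name is hypothetical. In practice each per-cell Vfinal would come from aging and timing simulation of that cell, which is not modeled here.

```python
from statistics import mean


def derating_voltages(vfinal_by_cell):
    """Return the flat characterization voltages, with VBTI = Vlib = Vheur."""
    v_heur = mean(vfinal_by_cell.values())  # average Vfinal over cell types
    return {"V_BTI": v_heur, "V_lib": v_heur}


# Made-up end-of-life voltages (V) for three standard-cell types.
vfinal_by_cell = {"INV": 0.96, "NAND2": 0.97, "NOR2": 0.98}
corner = derating_voltages(vfinal_by_cell)
```

Using one voltage for both aging estimation and library characterization keeps the two consistent, which is the point of the heuristic.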

Power vs. Area for All Designs

• (4 designs × {DC, AC} × derating methods)

[Figure: power vs. area, comparing the proposed method against circuits signed off using other derated libraries, with a “knee” point for balanced area and power tradeoff]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  → Improve performance, power or area
  → Reduce overdesign

[Figure: Vdd vs. time, alternating between nominal (NOM) and overdrive (OD) modes for durations tnom and tOD]

[Figure: power of circuits w/ different overdrive modes (fnom = 800MHz, Vnom = 0.8V). Different overdrive modes: 26% power range. Fix fOD: still 14% power range.]


Also: Tunable Monitors → Less Margin

Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip → some chips will fail

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

Conclusions

• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law


Thank You


Backup

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1-9, with the highest and least invested guardband marked; annotated with a 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff


Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2: axes M1 C and M2 C, per-layer C-3σ points for layers M1 and M2, the homogeneous Cw corner, and the resulting pessimism]

When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with # layers
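A back-of-the-envelope model shows why the pessimism grows with the number of layers. Assume, purely for illustration, that each of n layers contributes equally with unit sigma and that every pair of layers has the same correlation γ. The homogeneous corner then adds up n separate 3σ excursions, while the statistical 3σ of the sum is 3·sqrt(n + n(n−1)γ). The function below returns the ratio of the two; it is not taken from the talk.

```python
import math


def homogeneous_pessimism(n_layers, gamma):
    """Ratio of homogeneous-corner excursion to statistical 3-sigma excursion."""
    homogeneous = 3.0 * n_layers  # every layer at its own 3 sigma simultaneously
    variance = n_layers + n_layers * (n_layers - 1) * gamma  # variance of the sum
    statistical = 3.0 * math.sqrt(variance)
    return homogeneous / statistical


ratio_4 = homogeneous_pessimism(4, 0.5)
ratio_9 = homogeneous_pessimism(9, 0.5)
```

Under these assumptions the ratio equals 1 for fully correlated layers (γ = 1) and rises with n for any γ < 1.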

Correlation Matrix

• Let Σ be the correlation matrix for variation sources

           M1           M2           M3           M4
           ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH
M1  ΔW     1  0  0      γ  0  0      γ  0  0      0  0  0
    ΔT     0  1  0      0  γ  0      0  γ  0      0  0  0
    ΔH     0  0  1      0  0  γ      0  0  γ      0  0  0
M2  ΔW     γ  0  0      1  0  0      γ  0  0      0  0  0
    ΔT     0  γ  0      0  1  0      0  γ  0      0  0  0
    ΔH     0  0  γ      0  0  1      0  0  γ      0  0  0
M3  ΔW     γ  0  0      γ  0  0      1  0  0      0  0  0
    ΔT     0  γ  0      0  γ  0      0  1  0      0  0  0
    ΔH     0  0  γ      0  0  γ      0  0  1      0  0  0
M4  ΔW     0  0  0      0  0  0      0  0  0      1  0  0
    ΔT     0  0  0      0  0  0      0  0  0      0  1  0
    ΔH     0  0  0      0  0  0      0  0  0      0  0  1

= Σ

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
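The structure of Σ follows two rules: sources of the same type in the same process module get correlation γ, and everything else off the diagonal is zero. A minimal sketch that builds such a matrix is given below. The function name, the module encoding (M1-M3 in module 0, M4 in module 1) and the type labels are illustrative assumptions.

```python
def correlation_matrix(modules, types=("dW", "dT", "dH"), gamma=0.5):
    """Build the correlation matrix for per-layer variation sources.

    modules[i] is the process-module id of layer i, e.g. [0, 0, 0, 1]
    for M1..M4 when M1-M3 share a module and M4 is separate.
    """
    sources = [(layer, t) for layer in range(len(modules)) for t in types]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for a, (layer_a, type_a) in enumerate(sources):
        for b, (layer_b, type_b) in enumerate(sources):
            if a == b:
                sigma[a][b] = 1.0
            elif type_a == type_b and modules[layer_a] == modules[layer_b]:
                sigma[a][b] = gamma  # same variation type, same process module
    return sigma


sigma = correlation_matrix([0, 0, 0, 1])
```

With the default arguments this yields a 12 × 12 matrix with the same block pattern as the four-layer example.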

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%), with the point (60%, 0.92) marked]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation


Delay Variation

[Figure: two scatter plots of α, one vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and one vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0 → a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Figure annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 (axes M1 C and M2 C); non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: scatter plot with axes Δdj(Yrcw)/dj(Ytyp) × 100 and 3σj/d(Ytyp) × 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC → timing analysis using TBC with ECO using TBC, and timing analysis using CBC with ECO using CBC, each repeated until violation = 0 → done]

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5

           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   4.3       7.3
TBC-0.6    0.6   3.3       5.0
TBC-0.7    0.7   3.0       3.4

• Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large → delay variations of ΔdR-M1 and ΔdC-M1 cancel out → Δd(Ycw) ≈ 0 < σj

Scaling Factor Results

[Figure: α scatter plots for LEON3MP, SUPERBLUE12 and NETCARD, each with the α > 0.5 region marked]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: bar charts of WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]


Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient


Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow → many STA runs

and library

Vstep: AVS voltage step
Vfinal: converged voltage
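The time-stepping loop of the reference flow can be sketched as follows. This is a toy illustration only: the delay model, the aging model and every constant are invented for the example and are not taken from the talk. In a real flow each delay evaluation would be an STA run against a freshly derated library.

```python
def reference_signoff(v_init, v_max, v_step, t_final, dt, target_delay,
                      delay, aging_step):
    """Return the end-of-life Vdd, or None if it would exceed v_max."""
    vdd, dvth = v_init, 0.0
    n_steps = round(t_final / dt)
    for i in range(1, n_steps + 1):
        t = i * dt
        dvth += aging_step(t - dt, t, vdd)      # estimate aging over this step
        while delay(vdd, dvth) > target_delay:  # update timing, raise Vdd by Vstep
            vdd += v_step
            if vdd > v_max + 1e-12:
                return None                     # modify circuit and start over
    return vdd


# Hypothetical toy models (not from the source).
VTH0, K = 0.3, 0.6


def toy_delay(vdd, dvth):
    return K / (vdd - VTH0 - dvth)


def toy_aging(t0, t1, vdd):
    # Front-loaded, voltage-dependent threshold shift accumulated over [t0, t1].
    return 0.03 * vdd * (t1 ** 0.2 - t0 ** 0.2)


v_final = reference_signoff(0.90, 1.05, 0.01, 10.0, 0.5, 1.0, toy_delay, toy_aging)
```

Because aging depends on the voltage history and the voltage depends on aging, the loop must be replayed step by step, which is why this reference flow is slow.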

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop linking circuit, Vfinal, Vlib / VBTI and derated libraries]

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time, alternating ON / OFF / ON / OFF [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time [TCAS’14]

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon
• 50% BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd approaching Vfinal over time; ≈70% Vdd increment in 1 year (remaining 30% over 9 years)]

Results for DC Scenario

[Figure: results for setups 1-7, with “Good corners” marked]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Setups
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Corner combinations (Vlib for circuit performance estimation, VBTI for aging estimation):

                   Vlib = min Vdd                          Vlib = max Vdd
VBTI = min Vdd     Slowest circuit, less aging             Not applicable (optimistic)
VBTI = max Vdd     Slowest circuit, worst-case aging       Faster circuit, worst-case aging
                   (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES; points labeled by signoff case 1-8 for three series (Non-EM Aware, After Fixing (Mishra), After Fixing (Black's)); regions annotated “Optimistic about AVS” and “Pessimistic about AVS”]

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) for implementations 1-9; annotated with a 30% MTTF penalty and 200mV voltage compensation]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^2$
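The per-step MTTF update, read here as MTTF(i) = MTTF(i−1) × (VDD(i−1) / VDD(i))², can be applied along an AVS voltage schedule as in the sketch below. The function name, the initial MTTF and the schedule are made-up values for illustration.

```python
def mttf_after_schedule(mttf_initial, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule."""
    mttf = mttf_initial
    for v_prev, v_now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_now) ** 2  # each voltage increase shortens the lifetime
    return mttf


schedule = [0.90, 0.95, 1.00]  # hypothetical AVS voltage steps (V)
mttf = mttf_after_schedule(10.0, schedule)
```

The product telescopes, so under this update the result depends only on the first and last voltages of the schedule.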

EM Impact on AVS Scheduling

[Figure: VDD vs. year (0-12) for AVS schedules S1-S5 on DMA, and MTTF (year) for S1-S5; annotated “12 years MTTF penalty”]

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work” → stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff, on a voltage axis running from nominal Vdd to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage marked]

Problem: margins = pessimism → overdesign, schedule delay

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$\Delta d_{jv} = [d_j(Y_v) - d_j(Y_{typ})] / 3$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^T$ (Note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$[\Delta d'_{j1} \ldots \Delta d'_{j27}] = [\Delta d_{j1} \ldots \Delta d_{j27}] \lambda$

Standard deviation of path delay:

$\sigma_j = ((\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
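The two formulas can be checked numerically on a small case. The sketch below uses a hand-written Cholesky factorization so that it has no dependencies, applies it to a made-up 2 × 2 correlation matrix, and computes σj from hypothetical sensitivities. Since Σ = λλᵀ, the result must equal the square root of Δd Σ Δdᵀ.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L * L^T, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def path_sigma(sensitivities, corr):
    """sigma_j from per-source sensitivities and a correlation matrix."""
    lam = cholesky(corr)
    n = len(sensitivities)
    # Row vector times lambda gives the decorrelated sensitivities.
    d_prime = [sum(sensitivities[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


corr = [[1.0, 0.5], [0.5, 1.0]]  # two sources with correlation 0.5 (made up)
delta_d = [1.0, 1.0]             # hypothetical sensitivities (ps per sigma)
sigma_j = path_sigma(delta_d, corr)
```

For this example the quadratic form is 1 + 1 + 2 × 0.5 = 3, so σj should come out as the square root of 3.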

Resilient Designs

• Detect and recover from timing errors → ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotated with a 15% reduction]

Conventional design: worst-case signoff → no Vdd downscaling
Resilient design: typical-case signoff → Vdd downscaling → reduced energy

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

Energy = Power / Throughput

Throughput is computed from the error rate ER [Kahng10], the number of recovery cycles r, and the clock period T

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches:
• Open-loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS: process- and temperature-aware AVS
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

[Figure: the AVS classes above arranged along a power axis]

Challenge: Variability

[Figure: four panels comparing ideal scaling with observed non-ideality. DENSITY: transistor count [M] vs. MPU release date (source: [CPUDB]). POWER: dynamic power (W), 2009-2024 (source: [JeongK08]). DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006-2016 (source: [ITRS]). SUPPLY VOLTAGE: volt vs. MPU release date (source: [CPUDB]).]

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs → resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt on the MUL and EXU designs, with the minimum achievable energy marked]

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design / overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: voltage vs. frequency plane showing the design cone of mode A and modes A, B and C; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone → reduce overdesign

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant # of runs is enough → scalable

Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search (sample (SP&R), refine power models)

[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V), for design AES at f = 700MHz]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

Classes of Closed-Loop AVS

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low series resistance]

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: stacked bar charts of energy (mJ) for PM, BF and CO on the MUL and EXU designs under small, medium and large margin; each bar splits into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5/10/15% of clock period

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs, thus reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels: MUL, EXU; annotation: minimum achievable energy]

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Initial placement (all FFs = error-tolerant FFs)

Energy < min energy?

Save current solution

Margin insertion on K paths based on sensitivity function

Replace error-tolerant FFs w/ normal FFs

SEOpt

Activity-aware clock skew optimization

SkewOpt

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • "Chicken and Egg" Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is "Signoff"
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)
Solutions Modeling Margining Tolerance

Solutions    Modeling    Margining    Tolerance

BEOL Corner Optimization √

Process-Aware Vdd Scaling √

BTI, EM-AVS Interactions √

Overdrive Signoff √

Min Cost of Resilience √

• Holistic mitigation of variability spans models, margins, tolerance mechanisms
  • Signoff criteria, monitors, adaptivity/resilience, approximate computing, …

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

BEOL Corner Optimization

• 20nm and below: increased timing variation due to interconnect R, C
  • Design closure becomes much more difficult
• Costs of BEOL variations
  • More design effort (e.g., "last month" of manual ECO iteration)
  • Compromised circuit performance at high Vdd
• Recent work: reduce signoff margin by using tightened BEOL corners without sacrificing parametric yield
  • Signoff at conventional BEOL corners is pessimistic for most timing-critical paths
  • We identify paths which can be safely signed off using tightened BEOL corners (TBC)
• Joint work with Sorin Dobre (Qualcomm) and Tuck-Boon Chan

Proposed Timing Signoff Flow

Conventional signoff:
Routed design -> timing analysis using conventional BEOL corners (CBC) -> violation = 0? (No: ECO using CBC, then re-analyze) -> done

This work:
Routed design -> classify timing-critical paths into GTBC and GCBC
  GTBC: timing analysis using TBC -> violation = 0? (No: ECO using TBC, then re-analyze)
  GCBC: timing analysis using CBC -> violation = 0? (No: ECO using CBC, then re-analyze)
-> done

Conventional BEOL Corners

• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
• BEOL RC variations are modeled in interconnect technology file (itf)

[Figure: cross-section of metal layers M1, M2, M3 with width W2, spacing S2, thicknesses T1 to T3, heights H1 to H3, inter-layer dielectric and inter-metal dielectric]

Corner   ΔW        ΔT        ΔH
Ytyp     typical   typical   typical
Ycb      min       min       max
Ycw      max       max       min
Yrcb     max       max       max
Yrcw     min       min       min

Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT or ΔH)
  • zu and zv are in the same process module

Layer   ΔW    ΔT    ΔH
M9      z25   z26   z27
M8      z22   z23   z24
M7      z19   z20   z21
M6      z16   z17   z18
M5      z13   z14   z15
M4      z10   z11   z12
M3      z7    z8    z9
M2      z4    z5    z6
M1      z1    z2    z3

[Figure labels: layers are grouped into process modules 1, 2 and 3]

Examples
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6 and M7
• But ΔW in layer M4 is not correlated with ΔT in M4

Pessimism of Conventional BEOL Corners (CBC)

• Assumption: a max (setup) path pj is "safe" when delay evaluated at a given CBC is larger than nominal delay + 3σj

  $d_j(Y_{CBC}) \ge 3\sigma_j + d_j(Y_{typ})$

• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC:

  $\alpha_j = 3\sigma_j / \Delta d_j(Y_{CBC})$

  $\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ})$, $Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$

• Small αj: large pessimism of CBC

[Figure: delay distribution comparing dj(YCBC) - dj(Ytyp) with 3σj; a large gap means large pessimism]
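
The scaling factor defined on this slide can be illustrated with a small Python sketch. Only the formula αj = 3σj / Δdj(YCBC) and the "safe" condition come from the slide; the function name `alpha` and the delay and σ values are made up for illustration.

```python
def alpha(sigma_j, d_cbc, d_typ):
    """Scaling factor alpha_j = 3*sigma_j / (d_j(Y_CBC) - d_j(Y_typ))."""
    delta_d = d_cbc - d_typ
    if delta_d <= 0:
        raise ValueError("corner delay must exceed the typical delay")
    return 3.0 * sigma_j / delta_d


# Hypothetical path: typical delay 1000 ps, delay at a CBC 1060 ps, sigma 10 ps.
a = alpha(sigma_j=10.0, d_cbc=1060.0, d_typ=1000.0)

# "Safe" condition from the slide: d_j(Y_CBC) >= 3*sigma_j + d_j(Y_typ).
is_safe = 1060.0 >= 3 * 10.0 + 1000.0

print(a, is_safe)
```

In this made-up case the corner adds 60 ps while 3σ is only 30 ps, so the corner is safe but carries twice the margin that the statistical model asks for.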

Intuition on Delay Variability Across Cw, RCw

[Figure: two scatter plots of α. Left x-axis: Δdelay (vs. typ) at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp). Right x-axis: Δdelay (vs. typ) at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp). Paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0 there: delay variations are covered by the RC-worst corner

Intuition on Delay Variability Across Cw, RCw

[Figure: two scatter plots of α. Left x-axis: Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp). Right x-axis: Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp). Paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations

In the following, α is defined at the dominant corner

Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
  • E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α plotted against Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]

Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
  • E.g., there are αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α plotted against Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw marked]

Gtbc = set of paths that can be safely signed off using TBC: ((path with Δdcw larger than Acw) OR (path with Δdrcw larger than Arcw))
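
The membership rule for Gtbc can be sketched in a few lines of Python. The rule itself (Δdcw larger than Acw, OR Δdrcw larger than Arcw) is from the slide; the function name `classify_paths`, the path names and all numeric values below are hypothetical.

```python
def classify_paths(paths, a_cw, a_rcw):
    """Split paths into (G_TBC, G_CBC).

    paths: iterable of (name, delta_d_cw, delta_d_rcw), deltas in percent
    of typical delay. A path goes to G_TBC when either delta exceeds its
    threshold; every other path stays with the conventional corners.
    """
    g_tbc, g_cbc = [], []
    for name, dd_cw, dd_rcw in paths:
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Made-up paths and thresholds, for illustration only.
paths = [("p1", 6.0, 2.0), ("p2", 1.5, 2.5), ("p3", 2.0, 9.0)]
g_tbc, g_cbc = classify_paths(paths, a_cw=4.0, a_rcw=7.0)
print(g_tbc, g_cbc)
```

Here `p2` has small deltas at both corners, which per the slide suggests a large α, so it remains in the CBC group.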

Determining α, Arcw and Acw

• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α: larger thresholds (Arcw and Acw), fewer paths in GTBC

[Figure: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw marked]
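
One plausible reading of the threshold extraction is sketched below in Python. The slide only states that thresholds are read off an α vs. Δdelay plot for a given α and then padded by +1%. The specific rule used here (take the largest Δdelay among representative paths whose α still exceeds the target) is an assumption, as are the function name `find_threshold` and the sample data.

```python
def find_threshold(samples, alpha_target, margin=1.0):
    """Return a delta-delay threshold (in %) for a target alpha.

    samples: list of (delta_d_percent, alpha) from representative paths.
    Assumed rule: every sampled path above the threshold must have
    alpha <= alpha_target; `margin` (in %) covers sampling error.
    """
    violating = [dd for dd, a in samples if a > alpha_target]
    base = max(violating) if violating else 0.0
    return base + margin


# Made-up representative paths at the C-worst corner.
samples_cw = [(1.0, 0.9), (2.5, 0.7), (3.2, 0.62), (5.0, 0.45), (8.0, 0.3)]

a_cw_06 = find_threshold(samples_cw, alpha_target=0.6)
a_cw_04 = find_threshold(samples_cw, alpha_target=0.4)
print(a_cw_06, a_cw_04)
```

With this data, the smaller target α gives the larger threshold, which matches the trend stated in the last bullet of the slide.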

Benefits of Tightened BEOL Corners

• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
  • Tradeoff between reduced margin vs. paths which use TBC

Correlation factor γ = 0.5

[Figure: bar charts of WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-0.5, TBC-0.6 and TBC-0.7]

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                  Razor         Razor-Lite    TIMBER
Power penalty     30 [Das08]    ~0 [Kim13]    100 [Choudhury09]
Area penalty      182 [Kim13]   33 [Kim13]    255 [Chen13]
Recovery cycles   5 [Wan09]     11 [Kim13]    0 [Choudhury09]

Tradeoff Resilience Cost vs Datapath Cost

[Figure: an endpoint implemented either as a Razor FF (with error signal) or as a normal FF whose fanin cone is optimized w/ tighter constraint; area (power) of fanin cone is compared with area (power) w/ Razor overhead]

[Figure: energy (mJ) vs. Razor FFs (ticks 300, 100, 50, 0), showing total energy, energy of non-resilient part and resilience cost; the tradeoff is between Razor FFs (resilience cost) and power/area of fanin circuits]

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints
  • Allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
  • Vary endpoints, compare area/power penalty

Candidate Sensitivity Functions

$SF_1 = |slack(p)|$

$SF_2 = |slack(p)| \times num_{cri}(p)$

$SF_3 = |slack(p)| \times num_{cri}(p) / num_{total}(p)$

$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$

$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$

p: negative-slack endpoint
c: cells within fanin cone
num_cri: number of negative-slack cells
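
The five candidate sensitivity functions can be evaluated with a short Python sketch. The slide does not define num_total, so it is assumed here to be the number of cells in the fanin cone. The `Cell` type, the `sensitivity` function and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    slack: float  # timing slack of the cell; negative means critical
    power: float  # power of the cell


def sensitivity(endpoint_slack, fanin):
    """Return SF1..SF5 for one negative-slack endpoint p and its fanin cells."""
    if not fanin:
        raise ValueError("fanin cone must contain at least one cell")
    s = abs(endpoint_slack)
    num_cri = sum(1 for c in fanin if c.slack < 0)  # negative-slack cells
    num_total = len(fanin)                          # assumed definition
    total_pwr = sum(c.power for c in fanin)
    return {
        "SF1": s,
        "SF2": s * num_cri,
        "SF3": s * num_cri / num_total,
        "SF4": s * total_pwr,
        "SF5": sum(abs(c.slack) * c.power for c in fanin),
    }


# Made-up endpoint with a two-cell fanin cone.
sf = sensitivity(-0.5, [Cell(slack=-0.5, power=2.0), Cell(slack=0.25, power=4.0)])
print(sf)
```

Endpoints would then be ranked by whichever SF is chosen. The slide presents these as candidates to compare, not as a single fixed metric.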

Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized

[Figure: sequential graph of FF1, FF2, FF3 with data-path weights W12, W23, W31 and the clock tree; after adjustment each edge on the cycle has weight W', where W' = average weight on cycle]

$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p,q)}$

Slack_pq: setup slack of path p-q
β: weighting factor
TG(p,q): toggle rate of path p-q
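
A minimal Python sketch of the edge weight and the cycle average follows. It assumes the garbled formula on the slide is the fraction W_pq = Slack_pq / (1 + β × TG(p,q)). The function names and the slack and toggle-rate values are hypothetical, and the cycle search and contraction steps are not shown.

```python
def path_weight(slack, toggle_rate, beta):
    """Edge weight W_pq = slack / (1 + beta * toggle_rate) (assumed form)."""
    return slack / (1.0 + beta * toggle_rate)


def cycle_average(weights):
    """W' = average weight over the edges of one cycle."""
    return sum(weights) / len(weights)


# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1.
w12 = path_weight(slack=1.5, toggle_rate=0.5, beta=1.0)
w23 = path_weight(slack=-0.5, toggle_rate=0.0, beta=1.0)
w31 = path_weight(slack=2.0, toggle_rate=1.0, beta=1.0)

w_avg = cycle_average([w12, w23, w31])
print(w12, w23, w31, w_avg)
```

In this example the negative-weight edge FF2 -> FF3 is brought up to the cycle average once clock latencies are adjusted, at the expense of the other two edges.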

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

Initial placement (all FFs = error-tolerant FFs)

Energy < min energy?

Save current solution

Margin insertion on K paths based on sensitivity function

Replace error-tolerant FFs w/ normal FFs

SEOpt

Activity-aware clock skew optimization

SkewOpt

OR-tree insertion

Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figure: stacked bars of energy (mJ) for PM, BF and CO at small, medium and large margin; panels: MUL, EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm

Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt; panels: MUL, EXU; annotation: minimum achievable energy]

Technology: foundry 28nm

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

Breaking Chicken-Egg Loops Less Margin
• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
  • AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS in which an on-chip aging monitor observes circuit performance and drives the voltage regulator supplying the circuit; plots of circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, against the target]

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: Step 1 (VBTI -> |Vt|), Step 2 (Vlib -> derated library), Step 3 (circuit implementation and signoff -> circuit), followed by BTI degradation and AVS -> Vfinal]

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: Step 1 (VBTI -> |Vt|), Step 2 (Vlib -> derated library), Step 3 (circuit implementation and signoff -> circuit), followed by BTI degradation and AVS -> Vfinal; annotations: no obvious guideline to define VBTI and Vlib; inconsistency among Vfinal, Vlib, VBTI]

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

Power vs Area Across Different Signoffs

"Knee" point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI curves]

VBTI = Vlib ≈ Vfinal

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation

Obtain Vheur (average of standard cells)

Obtain derated library with VBTI = Vlib = Vheur

Signoff circuit with derated library
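
The first two steps of this flow amount to an average followed by an assignment, as the Python sketch below shows. The cell names and per-cell Vfinal values are invented for illustration. In practice each Vfinal would come from aging-aware simulation of a standard cell, which is not modeled here.

```python
def v_heur(vfinal_by_cell):
    """Vheur = plain average of per-cell Vfinal estimates (in volts)."""
    values = list(vfinal_by_cell.values())
    if not values:
        raise ValueError("need at least one cell")
    return sum(values) / len(values)


# Hypothetical per-cell Vfinal values, assuming timing slack = 0.
vfinal = {"INV": 0.96, "NAND2": 0.97, "NOR2": 0.98}

v = v_heur(vfinal)

# A single flat voltage is used for both aging and library characterization.
derated_corner = {"VBTI": v, "Vlib": v}
print(derated_corner)
```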

Power vs Area for All Designs

• 4 designs × {DC, AC} × derating methods

Proposed method

Circuit signed off using other derated libraries

"Knee" point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Also Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign

[Figure: Vdd vs. time alternating between nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD]

[Figure: power of circuits w/ different overdrive modes; different overdrive modes: 26% power range; fix fOD: still 14% power range; fnom = 800MHz, Vnom = 0.8V]

Also Tunable Monitors Less Margin

Aggressive config: Vmin_est < Vmin_chip. Some chips will fail

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Also Tunable Monitors Less Margin

Aggressive config: Vmin_est < Vmin_chip. Some chips will fail

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law

Thank You

Backup

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1 to 9, spanning highest to least invested guardband; annotation: 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example for the worst-case capacitance corner. Capacitance distributions of layers M1 and M2 with their 3σ points; in the M1 C vs. M2 C plane for an interconnect stack with M1 and M2, the homogeneous Cw corner lies beyond the 3σ contour (pessimism)]

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example for the worst-case capacitance corner. Capacitance distributions of layers M1 and M2 with their 3σ points; in the M1 C vs. M2 C plane for an interconnect stack with M1 and M2, the homogeneous Cw corner lies beyond the 3σ contour (pessimism)]

When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
         M1         M2         M3         M4
         ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH
M1 ΔW    1  0  0    γ  0  0    γ  0  0    0  0  0
   ΔT    0  1  0    0  γ  0    0  γ  0    0  0  0
   ΔH    0  0  1    0  0  γ    0  0  γ    0  0  0
M2 ΔW    γ  0  0    1  0  0    γ  0  0    0  0  0
   ΔT    0  γ  0    0  1  0    0  γ  0    0  0  0
   ΔH    0  0  γ    0  0  1    0  0  γ    0  0  0
M3 ΔW    γ  0  0    γ  0  0    1  0  0    0  0  0
   ΔT    0  γ  0    0  γ  0    0  1  0    0  0  0
   ΔH    0  0  γ    0  0  γ    0  0  1    0  0  0
M4 ΔW    0  0  0    0  0  0    0  0  0    1  0  0
   ΔT    0  0  0    0  0  0    0  0  0    0  1  0
   ΔH    0  0  0    0  0  0    0  0  0    0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
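
The structure of Σ follows directly from the two rules stated on this slide, and the Python sketch below builds it for the four-layer example. The module assignment (M1 to M3 in one process module, M4 in another) is read off the matrix shown. The function name `correlation_matrix` and the index layout (three sources per layer, in the order ΔW, ΔT, ΔH) are illustrative choices.

```python
def correlation_matrix(module_of_layer, gamma):
    """Build the correlation matrix as a list of lists.

    module_of_layer[i] is the process module of layer i. Each layer
    contributes three sources in the order (dW, dT, dH). Two distinct
    sources are correlated with factor gamma only if they have the same
    type and their layers share a process module.
    """
    n = 3 * len(module_of_layer)
    sigma = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            if u == v:
                sigma[u][v] = 1.0
                continue
            same_type = (u % 3) == (v % 3)
            same_module = module_of_layer[u // 3] == module_of_layer[v // 3]
            if same_type and same_module:
                sigma[u][v] = gamma
    return sigma


# Four-layer example: M1, M2, M3 in module 1; M4 in module 2.
sigma = correlation_matrix([1, 1, 1, 2], gamma=0.5)
for row in sigma:
    print(row)
```

Passing a nine-entry module list produces the 27 × 27 matrix for the full stack in the same way.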

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point: 0.92 at 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation

Delay Variation

[Figure: two scatter plots of α. Left x-axis: Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp). Right x-axis: Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp). Paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner

Delay Variation

[Figure: two scatter plots of α. Left x-axis: Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp). Right x-axis: Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp). Paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: M1 C vs. M2 C for an interconnect stack with M1 and M2; non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: 3σj/d(Ytyp) × 100% vs. Δdj(Yrcw)/dj(Ytyp) × 100%]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield?

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flow: routed design -> timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) -> split paths into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0 -> done]

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   4.3       7.3
TBC-0.6   0.6   3.3       5.0
TBC-0.7   0.7   3.0       3.4

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • Annotations on the terms: nominal delay; delay sensitivity to unit change in M1 resistance; delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

Scaling Factor Results

[Figure: α maps for LEON3MP, SUPERBLUE12 and NETCARD, each with the α > 0.5 region marked]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: bar charts of WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI curves]

VBTI = Vlib ≈ Vfinal

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = {DC, AC}

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
• Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs

and library

Vstep: AVS voltage step
Vfinal: converged voltage

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

Circuit

Derated Libraries

Vfinal

Vlib VBTI

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time [TCAS'14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)

• Most of the Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
  • ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years)

Vfinal
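A quick way to see why BTI is front-loaded: if |ΔVth| grows as a power law in time, |ΔVth| ∝ t^n with n well below 1, most of the end-of-life shift is reached early. The sketch below assumes this power-law form and uses illustrative exponents; neither the form nor the values are taken from the slides.

```python
def aging_fraction(t_years, lifetime_years, n):
    """Fraction of end-of-life |dVth| reached at time t, assuming |dVth| ~ t**n."""
    return (t_years / lifetime_years) ** n

# Illustrative exponents (assumptions): smaller n means more front-loading.
for n in (0.16, 0.3):
    frac = aging_fraction(1.0, 10.0, n)
    print(f"n = {n}: {frac:.0%} of 10-year aging after 1 year")
```

Under this assumption an exponent near 0.3 puts about half of the 10-year aging in the first year, which is the same order as the observation stated on the slide.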

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations and aging

• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis

• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

                   Vlib = Min Vdd                                         Vlib = Max Vdd
VBTI = Min Vdd     Slowest circuit, less aging                            Not applicable (optimistic)
VBTI = Max Vdd     Slowest circuit, worst-case aging (too pessimistic)    Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for AES; data points labeled by implementation number for Non-EM Aware, After Fixing (Mishra), and After Fixing (Black's). Annotated regions: optimistic about AVS, pessimistic about AVS]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: EM lifetime (year) and Vfinal (V) vs. implementation 1-9. Annotations: 30% MTTF penalty; 200mV voltage compensation]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$
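The per-step update MTTF(i) = MTTF(i−1) × (VDD(i−1) / VDD(i))² can be applied along an AVS voltage schedule as in the sketch below. The initial MTTF and the voltage schedule are hypothetical values chosen only for illustration.

```python
def update_mttf(mttf_initial, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule."""
    mttf = mttf_initial
    for v_prev, v_now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_now) ** 2
    return mttf

# Hypothetical schedule: AVS raises Vdd from 0.90V to 1.10V in 50mV steps.
schedule = [0.90, 0.95, 1.00, 1.05, 1.10]
print(update_mttf(10.0, schedule))
```

Because the factors telescope, the result depends only on the first and last voltages: MTTF scales by (V_first / V_last)², about 0.67 for the 200mV increase in this example.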

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year for AVS schedules S1-S5, and MTTF (year) for S1-S5, design DMA. Annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is “Signoff”?

• Foundation of contract between design house and foundry
  • “Chip should work”: stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff along a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd, operating voltage]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv: Δdjv

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions

• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist, then RC extraction, then STA, then Δdjv

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

Note: Path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλᵀ (Note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

$$[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$$

Standard deviation of path delay

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
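The computation on this slide (factor the correlation matrix as Σ = λλᵀ, multiply the row vector of sensitivities by λ, then take the root sum of squares) is sketched below in plain Python. The three-source correlation matrix and sensitivities are made-up illustrative values, not the 27-source data from the talk.

```python
import math

def cholesky(a):
    """Lower-triangular lam with a = lam * lam^T (a must be positive definite)."""
    n = len(a)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(a[i][i] - s)
            else:
                lam[i][j] = (a[i][j] - s) / lam[j][j]
    return lam

def path_sigma(delta_d, corr):
    """sigma_j from per-source delay sensitivities and a correlation matrix."""
    lam = cholesky(corr)
    n = len(delta_d)
    # Row vector times lam: delta_d'[v] = sum_u delta_d[u] * lam[u][v]
    d_prime = [sum(delta_d[u] * lam[u][v] for u in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))

# Illustrative values: sources 0 and 1 correlated with factor 0.5, source 2 independent.
corr = [[1.0, 0.5, 0.0],
        [0.5, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
delta_d = [2.0, 1.0, 3.0]   # ps per sigma, hypothetical
print(path_sigma(delta_d, corr))
```

The result equals the square root of Δd Σ Δdᵀ, so the Cholesky step is simply a convenient way to fold correlation into a root-sum-of-squares.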

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)

• Trade off design robustness vs. design quality, e.g., enable margin reduction

• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design. Conventional design: worst-case signoff, no Vdd downscaling. Resilient design: typical-case signoff, Vdd downscaling, reduced energy. Annotation: 15% reduction]

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers

• Objective: implement design to minimize energy
  • Estimation of design energy

$$Energy = \frac{Power}{Throughput}$$

h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879

+1minus119864119877119903times119879

recovery cycles

Clock period

Error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization

• Question: which endpoint should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and AVS approaches (figure axis: power)

• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS: process- and temperature-aware AVS
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels, each contrasting ideal scaling with non-ideality.
DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
POWER: dynamic power (W) vs. year, 2009-2024. Source: [JeongK08]
DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006-2016. Source: [ITRS]
SUPPLY VOLTAGE: volt vs. MPU release date. Source: [CPUDB]]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs: resilience benefits increase in the context of an AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt, for MUL and EXU; minimum achievable energy is marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A. Negative slacks = failed design; positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small, constant number of runs is enough: scalable

Global optimization flow: sample (SP&R); construct power models; estimate optimal signoff modes; sample (SP&R); refine power models (adaptive search)

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V) for 1st, 2nd, and real. Design: AES, f = 700MHz]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)

• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages with 1-bit control pins selecting high resistance or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF, and CO at small, medium, and large margin, for MUL and EXU. Bars are split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs, thus reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs: resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt, for MUL and EXU; minimum achievable energy is marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Flow boxes:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• Energy < min energy? If so, save current solution


5ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

6ISVLSI-2014 invited talk 140710

BEOL Corner Optimization

• 20nm and below: increased timing variation due to interconnect R, C
  • Design closure becomes much more difficult

• Costs of BEOL variations
  • More design effort (e.g., “last month” of manual ECO iteration)
  • Compromised circuit performance at high Vdd

• Recent work: reduce signoff margin by using tightened BEOL corners without sacrificing parametric yield
  • Signoff at conventional BEOL corners is pessimistic for most timing-critical paths
  • We identify paths which can be safely signed off using tightened BEOL corners (TBC)
  • Joint work with Sorin Dobre (Qualcomm) and Tuck-Boon Chan

7ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

Conventional signoff
1. Routed design
2. Timing analysis using conventional BEOL corners (CBC)
3. # violations = 0? If no, ECO using CBC and repeat step 2
4. Done

This work
1. Routed design
2. Classify timing-critical paths into GTBC and GCBC
3. For paths in GTBC: timing analysis using TBC; # violations = 0? If no, ECO using TBC and repeat
4. For paths in GCBC: timing analysis using CBC; # violations = 0? If no, ECO using CBC and repeat
5. Done

8ISVLSI-2014 invited talk 140710

Conventional BEOL Corners

• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
  • BEOL RC variations are modeled in the interconnect technology file (itf)

[Figure: cross-section of metal layers M1-M3 with spacing S2, width W2, thicknesses T1-T3, heights H1-H3, inter-layer dielectric, and inter-metal dielectric]

Corner   ΔW        ΔT        ΔH
Ytyp     typical   typical   typical
Ycb      min       min       max
Ycw      max       max       min
Yrcb     max       max       max
Yrcw     min       min       min

9ISVLSI-2014 invited talk 140710

Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• A 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT, or ΔH)
  • zu and zv are in the same process module

Layer   ΔW    ΔT    ΔH
M9      z25   z26   z27
M8      z22   z23   z24
M7      z19   z20   z21
M6      z16   z17   z18
M5      z13   z14   z15
M4      z10   z11   z12
M3      z7    z8    z9
M2      z4    z5    z6
M1      z1    z2    z3

The layers are grouped into process modules 1, 2, and 3.

Examples
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6, and M7
• But ΔW in layer M4 is not correlated with ΔT in M4
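The correlation rule (two sources are correlated if and only if they have the same type and sit in the same process module) can be turned into a 27 x 27 matrix as sketched below. The layer-to-module grouping and the correlation value `gamma` are assumptions for illustration; the slide's example only implies that M4 through M7 share a module.

```python
TYPES = ("dW", "dT", "dH")

# Assumed grouping of layers into process modules (hypothetical except M4-M7).
MODULE_OF_LAYER = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2, 8: 3, 9: 3}

def build_correlation(gamma=0.5):
    """Return (sources, corr): sources[k] is (layer, type) for z_{k+1}."""
    sources = [(layer, t) for layer in range(1, 10) for t in TYPES]
    n = len(sources)
    corr = [[0.0] * n for _ in range(n)]
    for u, (layer_u, type_u) in enumerate(sources):
        for v, (layer_v, type_v) in enumerate(sources):
            if u == v:
                corr[u][v] = 1.0
            elif (type_u == type_v
                  and MODULE_OF_LAYER[layer_u] == MODULE_OF_LAYER[layer_v]):
                corr[u][v] = gamma
    return sources, corr

sources, corr = build_correlation()
m4_w = sources.index((4, "dW"))                  # z10
print(corr[m4_w][sources.index((5, "dW"))])      # same type, same module
print(corr[m4_w][sources.index((4, "dT"))])      # same layer, different type
```

A single shared `gamma` keeps the sketch short; a real model could use a different correlation per module or per source type.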

10ISVLSI-2014 invited talk 140710

Pessimism of Conventional BEOL Corners (CBC)

• Assumption: a max (setup) path pj is “safe” when the delay evaluated at a given CBC is larger than nominal delay + 3σj

$$d_j(Y_{CBC}) \ge 3\sigma_j + d_j(Y_{typ})$$

• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC

$$\alpha_j = \frac{3\sigma_j}{\Delta d_j(Y_{CBC})}, \qquad \Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ}), \qquad Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$$

• Small αj: large pessimism of CBC

[Figure: delay axis showing the 3σj delay vs. dj(YCBC) − dj(Ytyp); the gap between them is the (large) pessimism]
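As a small worked example of the scaling factor, αj = 3σj / (dj(YCBC) − dj(Ytyp)), the sketch below evaluates α and the “safe” condition for one path. The delay and sigma values are hypothetical.

```python
def alpha(sigma_j, d_cbc, d_typ):
    """Scaling factor: 3-sigma delay variation relative to the corner's delay shift."""
    return 3.0 * sigma_j / (d_cbc - d_typ)

def is_safe(sigma_j, d_cbc, d_typ):
    """Setup path is 'safe' at this corner when d(Y_CBC) >= d(Y_typ) + 3*sigma."""
    return d_cbc >= d_typ + 3.0 * sigma_j

# Hypothetical path: 500ps typical, 560ps at the corner, sigma = 10ps.
a = alpha(sigma_j=10.0, d_cbc=560.0, d_typ=500.0)
print(a, is_safe(10.0, 560.0, 500.0))
```

Here the corner adds 60ps while 3σ is only 30ps, so α = 0.5 and half of the corner's margin is pessimism for this path.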

11ISVLSI-2014 invited talk 140710

Intuition on Delay Variability Across Cw RCw

[Figure: α vs. Δdelay (vs. typ) at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay (vs. typ) at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp). Paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner

Annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0 here: delay variations are covered by the RC-worst corner

12ISVLSI-2014 invited talk 140710

Intuition on Delay Variability Across Cw, RCw

[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp). Paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner

Annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations

In the following, α is defined at the dominant corner

13ISVLSI-2014 invited talk 140710

Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
  • E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]

14ISVLSI-2014 invited talk 140710

Find Paths for Which TBCs Can Be Used

• Paths with small Δdrcw and Δdcw have large α
  • E.g., there are αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw]

Gtbc = set of paths that can be safely signed off using TBC: ((path with Δdcw larger than Acw) OR (path with Δdrcw larger than Arcw))
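The membership rule for Gtbc (a path qualifies when Δdcw exceeds Acw or Δdrcw exceeds Arcw) is a simple threshold test. In the sketch below the path names, the Δd percentages, and the threshold values are all hypothetical.

```python
def classify_paths(paths, a_cw, a_rcw):
    """Split paths into (G_TBC, G_CBC) using the thresholds A_cw and A_rcw.

    paths: iterable of (name, delta_d_cw, delta_d_rcw), with delta values in
    the same unit as the thresholds (here: percent of typical delay).
    """
    g_tbc, g_cbc = [], []
    for name, d_cw, d_rcw in paths:
        if d_cw > a_cw or d_rcw > a_rcw:
            g_tbc.append(name)    # corner shift is large: tightened corner is usable
        else:
            g_cbc.append(name)    # small shift, large alpha: keep conventional corner
    return g_tbc, g_cbc

paths = [("p1", 5.0, 1.0), ("p2", 1.0, 2.0), ("p3", 0.5, 6.0)]
print(classify_paths(paths, a_cw=3.0, a_rcw=3.0))
```

Raising the thresholds moves paths from the first list to the second, so fewer paths end up signed off with tightened corners.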

15ISVLSI-2014 invited talk 140710

Determining α Arcw and Acw

[Figure: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw marked]

• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
  • Plot α vs. Δdelay, find Arcw and Acw for a given α
  • Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α: larger thresholds (Arcw and Acw), fewer paths in GTBC

16ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners

• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
• Tradeoff between reduced margin vs. # paths which use TBC

Correlation factor γ = 0.5

[Figure: WNS (ns), TNS (ns), and # timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-0.5, TBC-0.6, and TBC-0.7]

17ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

18ISVLSI-2014 invited talk 140710

How to Minimize Cost of Resilience?
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                    Razor         Razor-Lite    TIMBER
Power penalty       30 [Das08]    ~0 [Kim13]    100 [Choudhury09]
Area penalty        182 [Kim13]   33 [Kim13]    255 [Chen13]
# recovery cycles   5 [Wan09]     11 [Kim13]    0 [Choudhury09]

19ISVLSI-2014 invited talk 140710

Tradeoff Resilience Cost vs Datapath Cost

[Figure: an endpoint with a Razor FF (with error output) vs. a normal FF. Optimizing the fanin cone w/ a tighter constraint allows a normal FF at the endpoint: area (power) of fanin cone vs. area (power) w/ Razor overhead]

[Figure: energy (mJ) vs. # Razor FFs (300, 100, 50, 0), showing total energy, energy of non-resilient part, and resilience cost. Tradeoff: Razor FFs (resilience cost) vs. power/area of fanin circuits]

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

20ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
• Vary # endpoints, compare area/power penalty

Candidate Sensitivity Functions

$$SF_1 = |slack(p)|$$
$$SF_2 = |slack(p)| \times num_{cri}(p)$$
$$SF_3 = |slack(p)| \times \frac{num_{cri}(p)}{num_{total}(p)}$$
$$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$$
$$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$$

p: negative-slack endpoint
c: cells within fanin cone
num_cri: number of negative-slack cells
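The five candidate sensitivity functions can be evaluated for one endpoint as in the sketch below. The data layout (a list of per-cell slack and power pairs) is hypothetical, and `num_total` is assumed to mean the total number of cells in the fanin cone, which the slide does not define explicitly.

```python
def sensitivity_functions(endpoint_slack, fanin_cells):
    """Evaluate SF1..SF5 for an endpoint p.

    endpoint_slack: slack(p) of a negative-slack endpoint.
    fanin_cells: list of (slack, power) pairs for cells c in the fanin cone of p.
    """
    abs_slack = abs(endpoint_slack)
    num_cri = sum(1 for slack, _ in fanin_cells if slack < 0)   # negative-slack cells
    num_total = len(fanin_cells)                                 # assumed definition
    total_power = sum(power for _, power in fanin_cells)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total,
        "SF4": abs_slack * total_power,
        "SF5": sum(abs(slack) * power for slack, power in fanin_cells),
    }

# Hypothetical endpoint with four fanin cells: (slack in ns, power in uW).
cells = [(-0.10, 2.0), (-0.05, 1.0), (0.20, 1.0), (0.30, 4.0)]
print(sensitivity_functions(-0.10, cells))
```

Endpoints could then be ranked by any one of these values when choosing which fanin cones to optimize.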

21ISVLSI-2014 invited talk 140710

Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight, adjust clock latencies, contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized

$$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p, q)}$$

Slack_pq: setup slack of path p-q
β: weighting factor
TG(p, q): toggle rate of path p-q

[Figure: sequential graph of FF1, FF2, FF3 with data-path edge weights W12, W23, W31 and the clock tree; after adjustment each edge on the cycle has weight W′, where W′ = average weight on cycle]
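The edge weight W_pq = Slack_pq / (1 + β × TG(p, q)) and the average weight on a cycle can be computed as in the sketch below. The slack values, toggle rates, and β are hypothetical, and the cycle is given by hand rather than found by a minimum-weight cycle search.

```python
def edge_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)

def average_cycle_weight(weights):
    """W' = average of the edge weights on a cycle."""
    return sum(weights) / len(weights)

# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1: (setup slack in ns, toggle rate).
beta = 2.0
edges = {"W12": (0.2, 0.5), "W23": (0.3, 0.0), "W31": (-0.1, 0.0)}
weights = {name: edge_weight(s, tg, beta) for name, (s, tg) in edges.items()}
print(weights, average_cycle_weight(list(weights.values())))
```

With a positive slack, a higher toggle rate lowers the weight, which makes frequently exercised paths look more critical to the optimizer.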

22ISVLSI-2014 invited talk 140710

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

Flow boxes:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• Energy < min energy? If so, save current solution
• OR-tree insertion

23ISVLSI-2014 invited talk 140710

Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figure: energy (mJ) of PM, BF, and CO at small, medium, and large margin, for MUL and EXU. Bars are split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm

24ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs: resilience benefits increase in the context of an AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt, for MUL and EXU; minimum achievable energy is marked. Technology: foundry 28nm]

25ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
  • AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS with circuit, on-chip aging monitor, circuit performance, voltage regulator, and Vdd]

[Figure: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to the target]

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step loop among derated library characterization (inputs VBTI, Vlib, |Vt|), circuit implementation and signoff (output: circuit), and BTI degradation and AVS (output: Vfinal)]

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step loop among derated library characterization (inputs VBTI, Vlib, |Vt|), circuit implementation and signoff (output: circuit), and BTI degradation and AVS (output: Vfinal). Annotations: no obvious guideline to define VBTI and Vlib; inconsistency among Vfinal, Vlib, VBTI]

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs Area Across Different Signoffs

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
  • Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic

[Figure: Vdd vs. time with NBTI and PBTI curves]

VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at an early design stage (the design has not been implemented)

• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)

• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types

• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

10mV

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

Flow:
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Signoff circuit with derated library
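The first step of this flow, obtaining Vheur as the average Vfinal over standard cells, reduces to a mean. The sketch below uses hypothetical per-cell Vfinal values (the talk does not list them) and sets VBTI = Vlib = Vheur as the second step describes.

```python
from statistics import mean

def v_heur(vfinal_by_cell):
    """Vheur = average of per-cell Vfinal estimates (volts)."""
    return mean(vfinal_by_cell.values())

# Hypothetical per-cell Vfinal estimates at zero timing slack.
vfinal_by_cell = {"INV": 0.95, "NAND2": 0.96, "NOR2": 0.94}

vheur = v_heur(vfinal_by_cell)
library_corner = {"VBTI": vheur, "Vlib": vheur}   # derated library uses one flat voltage
print(library_corner)
```

Averaging is only reasonable if the per-cell values are close together, which is what the observation that Vfinal is not sensitive to gate type is meant to justify.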

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• (4 designs) x (DC, AC) x (derating methods)

[Figure: power vs. area for all designs, comparing the proposed method with circuits signed off using other derated libraries. “Knee” point for balanced area and power tradeoff]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign

[Figure: Vdd vs. time alternating between nominal (NOM) and overdrive (OD) modes with durations tnom and tOD]

[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V. Different overdrive modes: 26% power range. Fix fOD: still 14% power range]

36ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config: Vmin_est < Vmin_chip; some chips will fail

Optimized config
• Increase # high-resistance passgates
• Vmin_est ≈ Vmin_chip

Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin

37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config: Vmin_est < Vmin_chip; some chips will fail

Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase # high-resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1-9. Annotations: highest invested guardband, least invested guardband, 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

43ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2. Axes: M1 C, M2 C. Labels: C-3σ of layer M1, C-3σ of layer M2, 3σ, homogeneous Cw corner, pessimism]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2. Axes: M1 C, M2 C. Labels: C-3σ of layer M1, C-3σ of layer M2, homogeneous Cw corner, pessimism]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
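As a rough illustration of why pessimism grows with the layer count, the sketch below compares a homogeneous corner (every layer at its own 3σ point at once) with the statistical 3σ of the summed variation. It assumes a hypothetical path that is equally sensitive to each of n layers, unit σ per layer, and one pairwise correlation `gamma` between any two layers. None of these simplifications come from the slides.

```python
import math

def homogeneous_pessimism(n_layers: int, gamma: float) -> float:
    """Ratio of homogeneous-corner delay shift to statistical 3-sigma shift.

    Assumes (hypothetically) unit sigma per layer, equal sensitivity to every
    layer, and the same pairwise correlation gamma between any two layers.
    """
    # Homogeneous corner: every layer sits at +3 sigma simultaneously.
    corner_shift = 3.0 * n_layers
    # Statistical view: Var(sum) = n + n*(n-1)*gamma for unit-sigma sources.
    sigma_sum = math.sqrt(n_layers + n_layers * (n_layers - 1) * gamma)
    return corner_shift / (3.0 * sigma_sum)

for n in (1, 2, 4, 9):
    print(n, round(homogeneous_pessimism(n, 0.5), 3),
          round(homogeneous_pessimism(n, 0.0), 3))
```

With full correlation (`gamma = 1`) the ratio stays at 1 for any layer count. For independent layers it grows as the square root of the number of layers.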

45ISVLSI-2014 invited talk 140710

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
            M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW     1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT     0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH     0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW     γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT     0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH     0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW     γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT     0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH     0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW     0  0  0     0  0  0     0  0  0     1  0  0
    ΔT     0  0  0     0  0  0     0  0  0     0  1  0
    ΔH     0  0  0     0  0  0     0  0  0     0  0  1

• Correlation between variation sources of the same variation type in the same process module: γ = 0.5
• Variation sources in different process modules are independent
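The rule behind this matrix can be sketched in a few lines. The function name, the label strings and the dictionary-based module assignment below are illustrative assumptions. The assignment itself (M1-M3 in one module, M4 in another) is read off the matrix on this slide.

```python
def build_correlation_matrix(layer_modules, gamma, types=("dW", "dT", "dH")):
    """Return (labels, matrix) for the variation sources of a metal stack.

    layer_modules maps layer name -> process module id (insertion-ordered).
    Two distinct sources are correlated (gamma) only if they have the same
    variation type and their layers are in the same process module.
    """
    labels = [(layer, t) for layer in layer_modules for t in types]
    n = len(labels)
    sigma = [[0.0] * n for _ in range(n)]
    for u, (layer_u, type_u) in enumerate(labels):
        for v, (layer_v, type_v) in enumerate(labels):
            if u == v:
                sigma[u][v] = 1.0
            elif type_u == type_v and layer_modules[layer_u] == layer_modules[layer_v]:
                sigma[u][v] = gamma
    return labels, sigma

# Hypothetical module ids: M1-M3 share module 1, M4 is alone in module 2.
labels, sigma = build_correlation_matrix(
    {"M1": 1, "M2": 1, "M3": 1, "M4": 2}, gamma=0.5)
for (layer, t), row in zip(labels, sigma):
    print(layer, t, row)
```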

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point at 60%, 0.92]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation gives smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two plots of α. Left: α vs. Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp). Right: α vs. Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
• Paths dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Paths dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two plots of α. Left: α vs. Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp). Right: α vs. Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
• Paths dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Paths dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2. Axes: M1 C, M2 C. Non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: 3σj / d(Ytyp) x 100 vs. Δdj(Yrcw) / dj(Ytyp) x 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
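The grouping step can be sketched as below. This is a minimal illustration, not the authors' implementation: the `PathDelays` record, the function name, the example delays and the threshold values are all hypothetical. Per-corner thresholds stand in for the single "threshold" named in the bullets.

```python
from dataclasses import dataclass

@dataclass
class PathDelays:
    name: str
    d_typ: float   # delay at typical BEOL corner (ns)
    d_cw: float    # delay at C-worst corner (ns)
    d_rcw: float   # delay at RC-worst corner (ns)

def classify_paths(paths, a_cw_pct, a_rcw_pct):
    """Split paths into (g_tbc, g_cbc) by relative delay shift in percent."""
    g_tbc, g_cbc = [], []
    for p in paths:
        dd_cw = 100.0 * (p.d_cw - p.d_typ) / p.d_typ
        dd_rcw = 100.0 * (p.d_rcw - p.d_typ) / p.d_typ
        # A path with a large shift at either corner goes to the TBC group.
        if dd_cw > a_cw_pct or dd_rcw > a_rcw_pct:
            g_tbc.append(p.name)
        else:
            g_cbc.append(p.name)
    return g_tbc, g_cbc

# Made-up delays and thresholds, for illustration only.
paths = [
    PathDelays("p1", 1.00, 1.06, 1.02),   # about 6% shift at C-worst
    PathDelays("p2", 1.00, 1.01, 1.02),   # small shift at both corners
    PathDelays("p3", 2.00, 2.02, 2.20),   # about 10% shift at RC-worst
]
g_tbc, g_cbc = classify_paths(paths, a_cw_pct=3.0, a_rcw_pct=5.0)
print(g_tbc, g_cbc)
```

Paths left in the second group keep the conventional corners, so only the first group benefits from the reduced margin.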

[Flowchart: routed design → timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) → paths split into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. Then done]

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   4.3       7.3
TBC-0.6   0.6   3.3       5.0
TBC-0.7   0.7   3.0       3.4

• Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• When a path has small Δdelays, the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor α for LEON3MP, NETCARD and SUPERBLUE12; region with α > 0.5 marked in each panel]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: three bar charts comparing CBC, TBC-1 and TBC-2 on LEON, SUPERBLUE and NETCARD: WNS (ns), TNS (ns), and number of timing violations]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time under NBTI and PBTI; VBTI = Vlib ≈ Vfinal]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step
Vfinal: converged voltage
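The loop structure of such a reference flow can be sketched as follows. The delay and aging models here are toy stand-ins chosen only so the example runs (an alpha-power-law delay and a power-law-in-time threshold shift with made-up coefficients). A real flow would re-characterize libraries and rerun STA at each step, which is what makes it slow. All function and parameter names are hypothetical.

```python
def aged_delay(vdd, dvth, vth0=0.30, k=1.0, a=1.3):
    """Toy alpha-power-law gate delay; dvth is the BTI-induced threshold shift."""
    return k * vdd / (vdd - vth0 - dvth) ** a

def bti_shift(t_years, vdd, coeff=0.03, n=1.0 / 6.0):
    """Toy front-loaded aging model: shift grows as t**n and with Vdd."""
    return coeff * vdd * t_years ** n

def reference_avs_flow(v_init, v_step, v_max, t_final, dt, target_delay):
    """Step through lifetime; raise Vdd by v_step whenever timing is violated.

    Returns the final Vdd, or None if Vdd would exceed v_max (the circuit
    would have to be modified and the flow restarted).
    """
    vdd, t = v_init, 0.0
    while t < t_final:
        t = min(t + dt, t_final)
        dvth = bti_shift(t, vdd)               # estimate aging at this step
        while aged_delay(vdd, dvth) > target_delay:
            vdd += v_step                      # AVS compensates the slowdown
            if vdd > v_max:
                return None
            dvth = bti_shift(t, vdd)           # aging depends on the new Vdd
    return vdd

# Zero timing slack at t = 0: the target is the fresh delay at Vinit.
target = aged_delay(0.90, 0.0)
v_final = reference_avs_flow(0.90, 0.01, 1.05, t_final=10.0, dt=0.5,
                             target_delay=target)
print(v_final)
```

As a simplification, the toy aging model depends only on the present Vdd and elapsed time, not on the full voltage history.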

62ISVLSI-2014 invited talk 140710

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop between circuit and derated libraries, linked by Vfinal and by Vlib, VBTI]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14]]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd approaching Vfinal over time; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

[Figure annotation: good corners]

Cases
1. VBTI = Vlib = Vinit: ignores AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                                  Vlib for circuit performance estimation
VBTI for aging estimation         Min Vdd                                  Max Vdd
Min Vdd                           Slowest circuit, less aging              Not applicable (optimistic)
Max Vdd                           Slowest circuit, worst-case aging        Faster circuit, worst-case aging
                                  (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES; points labeled by signoff case 1-8. Series: non-EM aware, after fixing (Mishra), after fixing (Black's). Annotations: optimistic about AVS, pessimistic about AVS]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) for implementations 1-9. Annotations: 30% MTTF penalty, 200mV voltage compensation]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
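The MTTF update on this slide, MTTF(i) = MTTF(i-1) x (VDD(i-1) / VDD(i))^2, can be applied step by step along an AVS voltage schedule. The sketch below does that. The schedule and the 10-year starting MTTF are made-up numbers, and the function names are illustrative.

```python
def update_mttf(mttf_prev: float, vdd_prev: float, vdd_now: float) -> float:
    """MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2

def mttf_over_schedule(mttf_initial: float, vdd_schedule):
    """Apply the update at every AVS step; returns the MTTF after each step."""
    history = [mttf_initial]
    for vdd_prev, vdd_now in zip(vdd_schedule, vdd_schedule[1:]):
        history.append(update_mttf(history[-1], vdd_prev, vdd_now))
    return history

# Hypothetical schedule: Vdd raised from 0.90V to 1.00V in four steps.
schedule = [0.90, 0.925, 0.95, 0.975, 1.00]
history = mttf_over_schedule(10.0, schedule)
print([round(m, 3) for m in history])
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last voltages under this update rule.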

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year for AVS schedules S1-S5 (design: DMA), and MTTF (year) for S1-S5. Annotation: "12 years MTTF penalty"]

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff, from nominal Vdd to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage marked]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$[\Delta d'_{j1} \ldots \Delta d'_{j27}] = [\Delta d_{j1} \ldots \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:

$\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
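The computation above can be sketched in plain Python on a toy problem with 3 variation sources instead of 27. The correlation matrix and the delays are made-up values, and the helper names are illustrative. `cholesky` is a textbook Cholesky-Banachiewicz routine, not code from the talk.

```python
import math

def cholesky(a):
    """Lower-triangular L with L * L^T = a (a symmetric positive-definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(low[i][m] * low[k][m] for m in range(k))
            if i == k:
                low[i][k] = math.sqrt(a[i][i] - s)
            else:
                low[i][k] = (a[i][k] - s) / low[k][k]
    return low

def sensitivity(d_skewed, d_typ):
    """Per-sigma sensitivity from a 3-sigma skewed run: [d(Yv) - d(Ytyp)] / 3."""
    return (d_skewed - d_typ) / 3.0

def path_sigma(sensitivities, corr):
    """sigma_j = || [dd_j1 ... dd_jn] * lambda ||, with corr = lambda * lambda^T."""
    lam = cholesky(corr)
    n = len(sensitivities)
    # Row vector times matrix: dd'_v = sum_u dd_u * lambda[u][v]
    correlated = [sum(sensitivities[u] * lam[u][v] for u in range(n))
                  for v in range(n)]
    return math.sqrt(sum(x * x for x in correlated))

# Toy data: sources 0 and 1 are correlated (0.5), source 2 is independent.
corr = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
d_typ = 1.000
dd = [sensitivity(d, d_typ) for d in (1.030, 1.060, 1.015)]
sigma_j = path_sigma(dd, corr)
print(sigma_j)
```

With an identity correlation matrix this reduces to a root-sum-square of the sensitivities. Positive correlation between sources with same-sign sensitivities increases σj.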

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V), 0.84-1.00V, for conventional design and resilient design. Annotation: 15% reduction]

• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy

$\mathrm{Energy} = \mathrm{Power} / \mathrm{Throughput}$

Throughput is expressed in terms of the error rate ER [Kahng10], the clock period T, and the number of recovery cycles r

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches (figure axis: power)

• Open-loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process- and temperature-aware AVS with generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels, each contrasting ideal scaling with observed non-ideality]
• DENSITY: transistor count [M] vs. MPU release date, 1998-2011. Source: [CPUDB]
• POWER: dynamic power (W), 2009-2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006-2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date, 1995-2016. Source: [CPUDB]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, on MUL and EXU; minimum achievable energy marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage-frequency plane showing the design cone of mode A with modes B and C; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

[Flowchart, global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R), refine power models]

[Figure: power estimation of adaptive search: power (mW) vs. signoff voltage (V), series 1st, 2nd, real. Design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: RO stages with 1-bit control pins selecting high or low series resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF and CO at small, medium and large margin, for MUL and EXU. Stacked bars: energy without resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5/10/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, on MUL and EXU; minimum achievable energy marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs) → SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs) → SkewOpt (activity-aware clock skew optimization) → if energy < min energy, save current solution]


6ISVLSI-2014 invited talk 140710

BEOL Corner Optimization

• 20nm and below: increased timing variation due to interconnect R, C
  • Design closure becomes much more difficult
• Costs of BEOL variations
  • More design effort (e.g., "last month" of manual ECO iteration)
  • Compromised circuit performance at high Vdd
• Recent work: reduce signoff margin by using tightened BEOL corners without sacrificing parametric yield
  • Signoff at conventional BEOL corners is pessimistic for most timing-critical paths
  • We identify paths which can be safely signed off using tightened BEOL corners (TBC)
  • Joint work with Sorin Dobre (Qualcomm) and Tuck-Boon Chan

7ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

[Flowchart, conventional signoff: routed design → timing analysis using conventional BEOL corners (CBC) → violation = 0? If no, ECO using CBC and repeat; otherwise done]

[Flowchart, this work: routed design → classify timing-critical paths into GTBC and GCBC. GTBC: timing analysis using TBC → violation = 0? If no, ECO using TBC and repeat. GCBC: timing analysis using CBC → violation = 0? If no, ECO using CBC and repeat. Then done]

8ISVLSI-2014 invited talk 140710

Conventional BEOL Corners

• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
• BEOL RC variations are modeled in interconnect technology file (itf)

[Figure: cross-section of layers M1, M2, M3 showing W2, S2, T1-T3, H1-H3, inter-layer dielectric and inter-metal dielectric]

Corner   ΔW        ΔT        ΔH
Ytyp     typical   typical   typical
Ycb      min       min       max
Ycw      max       max       min
Yrcb     max       max       max
Yrcw     min       min       min

9ISVLSI-2014 invited talk 140710

Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• 9-layer metal stack has 27 variation sources z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT or ΔH)
  • zu and zv are in the same process module

Layer   ΔW    ΔT    ΔH
M1      z1    z2    z3
M2      z4    z5    z6
M3      z7    z8    z9
M4      z10   z11   z12
M5      z13   z14   z15
M6      z16   z17   z18
M7      z19   z20   z21
M8      z22   z23   z24
M9      z25   z26   z27

[Figure groups the layers into process modules 1, 2 and 3]

Examples
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6 and M7
• But ΔW in layer M4 is not correlated with ΔT in M4

10ISVLSI-2014 invited talk 140710

Pessimism of Conventional BEOL Corners (CBC)

• Assumption: a max (setup) path pj is "safe" when delay evaluated at a given CBC is larger than nominal delay + 3σj

$d_j(Y_{CBC}) \ge 3\sigma_j + d_j(Y_{typ})$

• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC

$\alpha_j = 3\sigma_j / \Delta d_j(Y_{CBC})$

$\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ}), \quad Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$

• Small αj: large pessimism of CBC

[Figure: delay distribution comparing delay-3σ (3σj) with dj(YCBC) - dj(Ytyp); the gap between them is marked as large pessimism]
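The two definitions on this slide translate directly into code. The sketch below uses hypothetical function names and made-up delays in ns. It only restates the formulas and is not a signoff check.

```python
def alpha(sigma_j: float, d_cbc: float, d_typ: float) -> float:
    """alpha_j = 3*sigma_j / (d_j(Y_CBC) - d_j(Y_typ))."""
    return 3.0 * sigma_j / (d_cbc - d_typ)

def is_safe(sigma_j: float, d_cbc: float, d_typ: float) -> bool:
    """A setup path is 'safe' when d_j(Y_CBC) >= d_j(Y_typ) + 3*sigma_j."""
    return d_cbc >= d_typ + 3.0 * sigma_j

# Hypothetical path: 1.00ns typical, 1.08ns at the CBC, sigma = 0.01ns.
a = alpha(0.01, 1.08, 1.00)
print(round(a, 3), is_safe(0.01, 1.08, 1.00))
```

A value of α well below 1 means the corner covers far more than the statistical 3σ variation, which is the pessimism that tightened corners aim to recover.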

11ISVLSI-2014 invited talk 140710

Intuition on Delay Variability Across Cw RCw

[Figure: two plots of α. Left: α vs. Δdelay (vs. typ) at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp). Right: α vs. Δdelay (vs. typ) at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
• Paths dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Paths dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
• α < 1.0 here: delay variations are covered by the RC-worst corner

12ISVLSI-2014 invited talk 140710

Intuition on Delay Variability Across Cw, RCw

[Figure: two plots of α. Left: α vs. Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp). Right: α vs. Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
• Paths dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Paths dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations

In the following, α is defined at the dominant corner

13ISVLSI-2014 invited talk 140710

Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
• E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α as a function of Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]

14ISVLSI-2014 invited talk 140710

Find Paths for Which TBCs Can Be Used

• Paths with small Δdrcw and Δdcw have large α
• E.g., there are αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α as a function of Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw marked]

Gtbc = set of paths that can be safely signed off using TBC: ((path with Δdcw larger than Acw) OR (path with Δdrcw larger than Arcw))

15ISVLSI-2014 invited talk 140710

Determining α Arcw and Acw

• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay, find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α: larger thresholds (Arcw and Acw), fewer paths in GTBC

[Figure: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw marked]

Benefits of Tightened BEOL Corners

• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
• Tradeoff between reduced margin vs. number of paths which use TBC

Correlation factor γ = 0.5

[Figures: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-0.5, TBC-0.6 and TBC-0.7]

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                   Razor         Razor-Lite    TIMBER
Power penalty      30 [Das08]    ~0 [Kim13]    100 [Choudhury09]
Area penalty       182 [Kim13]   33 [Kim13]    255 [Chen13]
Recovery cycles    5 [Wan09]     11 [Kim13]    0 [Choudhury09]

Tradeoff: Resilience Cost vs. Datapath Cost

[Figure: fanin cones feeding endpoint flip-flops. Razor FFs produce an error signal, normal FFs do not. Optimizing a fanin cone w/ a tighter constraint allows an endpoint Razor FF to become a normal FF, trading area (power) of the fanin cone against area (power) w/ Razor overhead]

[Figure: Energy (mJ) vs. Razor FFs (300, 100, 50, 0), showing total energy, energy of the non-resilient part, and resilience cost; tradeoff between Razor FFs (resilience cost) and power/area of fanin circuits]

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints
  Allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
  Vary endpoints, compare area/power penalty

Candidate Sensitivity Functions

$SF_1 = |slack(p)|$
$SF_2 = |slack(p)| \times Num_{cri}(p)$
$SF_3 = |slack(p)| \times Num_{cri}(p) / Num_{total}(p)$
$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$

p: negative-slack endpoint
c: cells within fanin cone
Numcri: number of negative-slack cells

Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized

[Figure: sequential graph of FF1, FF2, FF3 (data path and clock tree) with edge weights W12, W23, W31; after adjusting clock latencies every edge on the cycle has weight W', where W' = average weight on cycle]

$W_{pq} = \dfrac{Slack_{pq}}{1 + \beta \times TG(p,q)}$

Slack_pq: setup slack of path p-q
β: weighting factor
TG(p,q): toggle rate of path p-q
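A small numeric sketch of the edge weight and of the cycle average W' follows. It assumes the weight reads W_pq = Slack_pq / (1 + β × TG(p,q)); the function names, the value of β and the three-edge cycle are hypothetical, and the sketch does not implement the cycle search or the clock latency adjustment.

```python
def path_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)

def cycle_average_weight(cycle_edges, beta):
    """Average weight W' over the (slack, toggle_rate) edges of one cycle."""
    weights = [path_weight(s, tg, beta) for s, tg in cycle_edges]
    return sum(weights) / len(weights)

# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1: (setup slack in ns, toggle rate)
cycle = [(0.30, 0.5), (-0.10, 0.0), (0.20, 1.0)]
w_avg = cycle_average_weight(cycle, beta=1.0)
print(round(w_avg, 4))
```

With β > 0, a frequently toggling path with positive slack gets a smaller weight than an idle path with the same slack, so it looks more critical to the optimizer.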

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

[Flowchart boxes]
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• Energy < min energy: save current solution
• OR-tree insertion

Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figures: Energy (mJ) of PM, BF and CO for MUL and EXU at small, medium and large margin, split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm

Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
  Resilience benefits increase in the context of AVS strategy

[Figures: Energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt on MUL and EXU, with the minimum achievable energy marked]

Technology: foundry 28nm

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS with circuit, on-chip aging monitor, circuit performance, and voltage regulator]

[Figure: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to the target]

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step flow. Step 1: VBTI (|Vt|) and Vlib give the derated library. Step 2: circuit implementation and signoff give the circuit. Step 3: BTI degradation and AVS give Vfinal]

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step flow. Step 1: VBTI (|Vt|) and Vlib give the derated library. Step 2: circuit implementation and signoff give the circuit. Step 3: BTI degradation and AVS give Vfinal]

• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

Power vs. Area Across Different Signoffs

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

[Flow]
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Sign off circuit with derated library

Power vs. Area for All Designs

• 4 designs x {DC, AC} x derating methods

[Figure legend: proposed method; circuits signed off using other derated libraries]

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
• Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  Improve performance, power or area
  Reduce overdesign

[Figure: Vdd vs. time for nominal (NOM) and overdrive (OD) operation, with durations tnom and tOD]

[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V]
• Different overdrive modes: 26% power range
• Fix fOD: still 14% power range

Also: Tunable Monitors, Less Margin

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law


Thank You


Backup

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1 to 9, annotated with highest invested guardband, least invested guardband, and 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2 (axes M1 C, M2 C). Layer M1 and layer M2 are each set to C-3σ, giving the homogeneous Cw corner; its distance from the 3σ contour is pessimism]

When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers
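The growth of pessimism with layer count can be illustrated with a toy calculation. The model below is an assumption made only for illustration and is not taken from the talk: delay depends linearly on one normalized variation source per layer, every layer has the same positive sensitivity, and distinct layers share one correlation factor γ. The homogeneous corner pushes every layer to 3σ at once, while the statistical 3σ shift accounts for partial correlation.

```python
import math

def homogeneous_corner_delta(sens):
    """Delay shift when every layer sits at its own 3-sigma point."""
    return 3.0 * sum(sens)

def statistical_3sigma_delta(sens, gamma):
    """3-sigma delay shift when distinct layers have correlation gamma."""
    var = 0.0
    for i, si in enumerate(sens):
        for j, sj in enumerate(sens):
            rho = 1.0 if i == j else gamma
            var += rho * si * sj
    return 3.0 * math.sqrt(var)

def pessimism_ratio(num_layers, gamma, per_layer_sens=1.0):
    """Homogeneous-corner shift divided by the statistical 3-sigma shift."""
    sens = [per_layer_sens] * num_layers
    return homogeneous_corner_delta(sens) / statistical_3sigma_delta(sens, gamma)

for n in (1, 2, 4, 8):
    print(n, round(pessimism_ratio(n, gamma=0.5), 3))
```

Under these assumptions the ratio equals 1 for a single layer or for fully correlated layers (γ = 1), and it grows as layers are added whenever γ < 1.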

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

              M1          M2          M3          M4
           ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW      1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT      0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH      0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW      γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT      0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH      0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW      γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT      0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH      0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW      0  0  0     0  0  0     0  0  0     1  0  0
    ΔT      0  0  0     0  0  0     0  0  0     0  1  0
    ΔH      0  0  0     0  0  0     0  0  0     0  0  1
= Σ

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the curve reaches 0.92 at a ratio of 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation


Delay Variation

[Figures: α vs. Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0, so a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

[Figure annotations]
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 (axes M1 C, M2 C); non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: 3σj / d(Ytyp) x 100 vs. Δdj(Yrcw) / dj(Ytyp) x 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; split into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. Done]

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     18        20        31
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5

          α      Acw (%)   Arcw (%)
TBC-0.5   0.5    43        73
TBC-0.6   0.6    33        50
TBC-0.7   0.7    30        34

Implement another NETCARD (clock period = 23ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  dj(Ytyp): nominal delay
  ΔdR-M1: delay sensitivity to unit change in M1 resistance
  ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

Scaling Factor Results

[Figures: α for LEON3MP, SUPERBLUE12 and NETCARD, each with the α > 0.5 region marked]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figures: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]


Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient


Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = {DC, AC}

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA and library runs

Vstep: AVS voltage step
Vfinal: converged voltage

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop linking circuit, Vfinal, and derated libraries (Vlib, VBTI)]

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time over alternating ON and OFF phases [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time [TCAS’14]

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

[Figure annotation: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years); Vfinal marked]

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Vlib (circuit performance estimation) x VBTI (aging estimation):
• Vlib = min Vdd, VBTI = min Vdd: slowest circuit, less aging
• Vlib = max Vdd, VBTI = min Vdd: not applicable (optimistic)
• Vlib = max Vdd, VBTI = max Vdd: faster circuit, worst-case aging
• Vlib = min Vdd, VBTI = max Vdd: slowest circuit, worst-case aging (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES, with points labeled 1 to 8 ranging from optimistic about AVS to pessimistic about AVS; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)]

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$MTTF(i) = MTTF(i-1) \times \left( \dfrac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
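The MTTF update can be traced over a voltage schedule with a few lines of Python. This sketch assumes the update reads MTTF(i) = MTTF(i-1) × (VDD(i-1) / VDD(i))²; the function names, the initial MTTF and the voltage schedule are hypothetical values chosen only to show the arithmetic.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i)) ** 2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2

def mttf_over_schedule(mttf_initial, vdd_schedule):
    """Apply the update along a list of supply voltages, one per AVS step."""
    mttf = mttf_initial
    history = [mttf]
    for v_prev, v_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_new)
        history.append(mttf)
    return history

# Hypothetical schedule: Vdd rises from 0.90 V to 1.00 V in three steps
history = mttf_over_schedule(10.0, [0.90, 0.95, 0.98, 1.00])
print([round(m, 3) for m in history])
```

Because the per-step ratios telescope, the final value depends only on the first and last voltages: MTTF_final = MTTF_initial × (VDD_first / VDD_last)².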

EM Impact on AVS Scheduling

[Figure: VDD vs. year for S1 to S5, and MTTF (year) of S1 to S5 (DMA); annotation: 12 years MTTF penalty]

What is “Signoff”

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff along the voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd; operating voltage marked]

Problem: margins = pessimism, overdesign, schedule delay

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$\Delta d_{jv} = [\, d_j(Y_v) - d_j(Y_{typ}) \,] / 3$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist; RC extraction; STA; Δdjv]

Note: path delay includes gate and wire delays

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$[\Delta d'_{j1} \ldots \Delta d'_{j27}] = [\Delta d_{j1} \ldots \Delta d_{j27}]\,\lambda$

Standard deviation of path delay:

$\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
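The two steps above can be sketched in plain Python: factor the correlation matrix as Σ = λλ^T with a textbook Cholesky routine, multiply the row vector of sensitivities by λ, and take the Euclidean norm. This is an illustrative sketch, not the authors' tooling. The function names and the three-source example (two sources with correlation 0.5, one independent) are hypothetical, and `sensitivity` assumes each corner Yv sits at 3σ of its source.

```python
import math

def sensitivity(d_corner, d_typ):
    """Per-sigma delay sensitivity, assuming the corner is a 3-sigma point."""
    return (d_corner - d_typ) / 3.0

def cholesky(sigma):
    """Return lower-triangular lam such that sigma = lam * lam^T."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(sigma[i][i] - acc)
            else:
                lam[i][j] = (sigma[i][j] - acc) / lam[j][j]
    return lam

def path_sigma(sens, sigma):
    """sigma_j = Euclidean norm of the row vector sens * lam."""
    lam = cholesky(sigma)
    n = len(sens)
    corr_sens = [sum(sens[k] * lam[k][v] for k in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in corr_sens))

# Hypothetical example with three variation sources
corr = [[1.0, 0.5, 0.0],
        [0.5, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
sens = [1.0, 2.0, 2.0]  # per-sigma delay sensitivities (ps)
print(round(path_sigma(sens, corr), 4))
```

The result equals the square root of sens · Σ · sens^T, so for an identity correlation matrix it reduces to the root-sum-square of the sensitivities.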

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: Energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]

Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy

$Energy = Power / Throughput$

h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879

+1minus119864119877119903times119879

recovery cycles

Clock period

Error rate [Kahng10]

Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

Process-Aware Vdd Scaling (PVS)

AVS classes and AVS approaches
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process and temperature-aware AVS, generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measure actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

[Figure: AVS classes arranged along a power axis]

Challenge: Variability

[Figures: ideal scaling vs. non-ideality in four panels]
• DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk (μA/μm), UTB FD (μA/μm), DG (μA/μm) and ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date. Source: [CPUDB]

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
  Resilience benefits increase in the context of AVS strategy

[Figures: Energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt on MUL and EXU, with the minimum achievable energy marked]

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design / overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: frequency vs. voltage, showing the design cone of mode A and modes A, B, C; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable

[Flowchart: global optimization flow. Sample (SP&R); construct power models; estimate optimal signoff modes; adaptive search: sample (SP&R), refine power models]

[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V), with curves 1st, 2nd and real. Design: AES, f: 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

Classes of Closed-Loop AVS

Closed-loop AVS: generic monitor, design-dependent replica, in-situ monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low series resistance]

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figures: energy (mJ) of PM, BF and CO at small, medium and large margin, for MUL and EXU; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figures: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, for MUL and EXU, with the minimum achievable energy marked]

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs) → SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs) → SkewOpt (activity-aware clock skew optimization) → if energy < min energy, save current solution; iterate]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

Proposed Timing Signoff Flow

[Flowchart, conventional signoff: routed design → timing analysis using conventional BEOL corners (CBC) → violation = 0? If no, ECO using CBC and repeat; if yes, done]

[Flowchart, this work: routed design → classify timing-critical paths into GTBC and GCBC → GTBC: timing analysis using TBC; if violation ≠ 0, ECO using TBC and repeat → GCBC: timing analysis using CBC; if violation ≠ 0, ECO using CBC and repeat → done]

Conventional BEOL Corners
• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
• BEOL RC variations are modeled in the interconnect technology file (itf)

[Figure: cross-section of metal layers M1, M2, M3 showing width W, spacing S, thickness T and dielectric height H, with inter-layer and inter-metal dielectric]

Corner   ΔW        ΔT        ΔH
Ytyp     typical   typical   typical
Ycb      min       min       max
Ycw      max       max       min
Yrcb     max       max       max
Yrcw     min       min       min

Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT or ΔH)
  • zu and zv are in the same process module

Layer   ΔW    ΔT    ΔH
M1      z1    z2    z3
M2      z4    z5    z6
M3      z7    z8    z9
M4      z10   z11   z12
M5      z13   z14   z15
M6      z16   z17   z18
M7      z19   z20   z21
M8      z22   z23   z24
M9      z25   z26   z27
(The layers are grouped into process modules 1, 2 and 3.)

Examples
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6 and M7
• But ΔW in layer M4 is not correlated with ΔT in M4
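The correlation rule above (same variation type and same process module) can be written as a small sketch. The module assignment is partly an assumption: M4 to M7 sharing a module follows from the example on this slide, M1 to M3 follows from the correlation-matrix backup slide, and placing M8 and M9 in module 3 is a guess. The value γ = 0.5 is the correlation factor used elsewhere in the talk.

```python
# Assumed layer-to-module map (M8/M9 in module 3 is a guess, see lead-in).
LAYER_TO_MODULE = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2, 8: 3, 9: 3}
VARIATION_TYPES = ("dW", "dT", "dH")  # order of z indices within a layer


def describe(z):
    """Map variation source index z (1..27) to (layer number, variation type)."""
    layer = (z - 1) // 3 + 1
    vtype = VARIATION_TYPES[(z - 1) % 3]
    return layer, vtype


def correlation(zu, zv, gamma=0.5):
    """Correlation between two variation sources under the slide's rule."""
    if zu == zv:
        return 1.0
    layer_u, type_u = describe(zu)
    layer_v, type_v = describe(zv)
    same_type = type_u == type_v
    same_module = LAYER_TO_MODULE[layer_u] == LAYER_TO_MODULE[layer_v]
    return gamma if (same_type and same_module) else 0.0


def correlation_matrix(n_sources=27, gamma=0.5):
    """Full n x n correlation matrix as a list of rows."""
    return [[correlation(u, v, gamma) for v in range(1, n_sources + 1)]
            for u in range(1, n_sources + 1)]


sigma_matrix = correlation_matrix()
```

For example, `correlation(10, 13)` (ΔW in M4 vs. ΔW in M5) gives γ, while `correlation(10, 11)` (ΔW vs. ΔT in M4) gives 0.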

Pessimism of Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path pj is “safe” when the delay evaluated at a given CBC is larger than nominal delay + 3σj

  $d_j(Y_{CBC}) \ge 3\sigma_j + d_j(Y_{typ})$

• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC

  $\alpha_j = 3\sigma_j / \Delta d_j(Y_{CBC})$

  $\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ}), \quad Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$

• Small αj → large pessimism of CBC

[Figure: delay distribution comparing 3σj with dj(YCBC) - dj(Ytyp); a large gap between the two indicates large pessimism]
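As a worked example of the definitions above, the following sketch computes αj and the "safe" check for a single path. The delay and σ values are made-up numbers for illustration only.

```python
def scaling_factor(d_typ, d_cbc, sigma):
    """alpha_j = 3*sigma_j / (d_j(Y_CBC) - d_j(Y_typ))."""
    delta = d_cbc - d_typ
    if delta <= 0:
        raise ValueError("corner delay must exceed typical delay")
    return 3.0 * sigma / delta


def is_safe(d_typ, d_cbc, sigma):
    """A setup path is 'safe' when the corner delay covers nominal + 3 sigma."""
    return d_cbc >= d_typ + 3.0 * sigma


# Hypothetical path: 1.00 ns typical delay, 1.20 ns at the corner, sigma = 0.02 ns.
alpha_pessimistic = scaling_factor(1.00, 1.20, 0.02)   # about 0.3: corner is pessimistic
# Hypothetical path whose corner delay does not cover 3 sigma.
alpha_optimistic = scaling_factor(1.00, 1.05, 0.02)    # about 1.2: corner underestimates
```

An α well below 1 means the corner adds much more delay than the 3σ statistical variation requires, which is the pessimism that tightened corners aim to recover.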

Intuition on Delay Variability Across Cw, RCw
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner

[Figures: α vs. Δdelay (vs. typ) at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and α vs. Δdelay (vs. typ) at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• C-worst corner underestimates delay variations for some paths, but these paths are dominated by the RC-worst corner
• α < 1.0 here: delay variations are covered by the RC-worst corner

Intuition on Delay Variability Across Cw, RCw (2)
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner

[Figures: α vs. Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• C-worst corner underestimates delay variations for some paths, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations

In the following, α is defined at the dominant corner

Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
• E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α as a function of Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]

Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
• E.g., we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α as a function of Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw marked]

Gtbc = set of paths that can be safely signed off using TBC: (paths with Δdcw larger than Acw) OR (paths with Δdrcw larger than Arcw)
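The Gtbc membership rule can be sketched as a simple classifier. The path delays and the threshold values passed in below are hypothetical placeholders, not the thresholds extracted in the talk.

```python
def classify_paths(paths, a_cw, a_rcw):
    """Split paths into G_tbc (tightened corners usable) and G_cbc (conventional corners).

    paths: iterable of (name, d_typ, d_cw, d_rcw) delays in ns.
    a_cw, a_rcw: thresholds on relative delay change, in percent.
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        delta_cw = 100.0 * (d_cw - d_typ) / d_typ     # delta-delay at C-worst (%)
        delta_rcw = 100.0 * (d_rcw - d_typ) / d_typ   # delta-delay at RC-worst (%)
        if delta_cw > a_cw or delta_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Hypothetical paths and thresholds, for illustration only.
example_paths = [
    ("p1", 1.0, 1.06, 1.02),   # large change at C-worst
    ("p2", 1.0, 1.01, 1.02),   # small change at both corners
    ("p3", 2.0, 2.02, 2.20),   # large change at RC-worst
]
g_tbc, g_cbc = classify_paths(example_paths, a_cw=4.0, a_rcw=6.0)
```

Paths left in `g_cbc` are the ones with small Δdelay at both corners, where α tends to be large and tightening the corner would risk underestimating variation.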

Determining α, Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC

[Figures: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw marked]

Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
  • Tradeoff between reduced margin vs. number of paths which use TBC

Correlation factor γ = 0.5

[Figures: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-0.5, TBC-0.6 and TBC-0.7]

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

How to Minimize Cost of Resilience
• Additional circuits → area and power penalties
• Recovery from errors → throughput degradation
• Large hold margin → short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                  Razor          Razor-Lite     TIMBER
Power penalty     30 [Das08]     ~0 [Kim13]     100 [Choudhury09]
Area penalty      182 [Kim13]    33 [Kim13]     255 [Chen13]
Recovery cycles   5 [Wan09]      11 [Kim13]     0 [Choudhury09]

Tradeoff: Resilience Cost vs Datapath Cost

[Figure: endpoints implemented with Razor FFs (with error output) vs. normal FFs. Optimizing the fanin cone w/ a tighter constraint allows a normal FF at the endpoint: area (power) of the fanin cone is traded against area (power) w/ Razor overhead]

[Figure: energy (mJ) vs. number of Razor FFs, showing total energy, energy of the non-resilient part (power/area of fanin circuits) and resilience cost (Razor FFs)]

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints
  → Allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
  → Vary endpoints, compare area/power penalty

Candidate Sensitivity Functions

$SF_1 = |\mathit{slack}(p)|$

$SF_2 = |\mathit{slack}(p)| \times \mathit{Num}_{cri}(p)$

$SF_3 = |\mathit{slack}(p)| \times \mathit{Num}_{cri}(p) / \mathit{Num}_{total}(p)$

$SF_4 = |\mathit{slack}(p)| \times \sum_{c \in \mathit{fanin}(p)} \mathit{Pwr}(c)$

$SF_5 = \sum_{c \in \mathit{fanin}(p)} |\mathit{slack}(c)| \times \mathit{Pwr}(c)$

p: negative-slack endpoint
c: cells within fanin cone
Numcri: number of negative-slack cells
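The five candidate sensitivity functions can be evaluated for one endpoint as in the sketch below. The `Cell` record and the example numbers are hypothetical, and Numtotal is assumed here to be the total number of cells in the fanin cone, which the slide does not define.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    slack: float  # timing slack in ns (negative = violating)
    power: float  # cell power in mW


def sensitivity_functions(endpoint_slack, fanin):
    """Evaluate SF1..SF5 for an endpoint p with the given fanin-cone cells."""
    s = abs(endpoint_slack)
    num_cri = sum(1 for c in fanin if c.slack < 0)   # negative-slack cells
    num_total = len(fanin)                           # assumption: all cells in the cone
    total_power = sum(c.power for c in fanin)
    return {
        "SF1": s,
        "SF2": s * num_cri,
        "SF3": s * num_cri / num_total,
        "SF4": s * total_power,
        "SF5": sum(abs(c.slack) * c.power for c in fanin),
    }


# Hypothetical endpoint with four fanin cells.
fanin_cone = [Cell(-0.1, 2.0), Cell(0.05, 1.0), Cell(-0.2, 3.0), Cell(0.0, 4.0)]
sf = sensitivity_functions(-0.2, fanin_cone)
```

Endpoints would then be ranked by one chosen function to decide where margin insertion is applied first.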

Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight → adjust clock latencies → contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized

$W_{pq} = \dfrac{\mathit{Slack}_{pq}}{1 + \beta \times TG(p, q)}$

where Slack_pq is the setup slack of path p-q, β is a weighting factor, and TG(p, q) is the toggle rate of path p-q

[Figure: sequential graph of FF1, FF2, FF3 (data paths and clock tree) with edge weights W12, W23, W31; after clock latency adjustment each edge on the cycle has weight W′ = average weight on cycle]
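A minimal sketch of the edge weighting, reading the slide's formula as the fraction Slack/(1 + β × TG). It computes the weights for the three-flip-flop cycle in the figure and the average weight W′ on that cycle; the slack and toggle-rate values are invented for illustration, and the cycle search and contraction steps are not shown.

```python
def edge_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q)), as read from the slide."""
    return slack / (1.0 + beta * toggle_rate)


def cycle_average_weight(cycle_edges, slacks, toggles, beta):
    """Average weight W' over the edges of one cycle in the sequential graph."""
    weights = [edge_weight(slacks[e], toggles[e], beta) for e in cycle_edges]
    return sum(weights) / len(weights)


# Hypothetical setup slacks (ns) and toggle rates for the FF1 -> FF2 -> FF3 -> FF1 cycle.
cycle = [(1, 2), (2, 3), (3, 1)]
slacks = {(1, 2): 0.3, (2, 3): -0.1, (3, 1): 0.4}
toggles = {(1, 2): 0.0, (2, 3): 1.0, (3, 1): 0.5}
w_avg = cycle_average_weight(cycle, slacks, toggles, beta=1.0)
```

With this weighting, positive slack on a frequently toggling path counts for less, so cycles containing such paths are picked up earlier by the minimum-weight search.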

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs) → SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs) → SkewOpt (activity-aware clock skew optimization) → if energy < min energy, save current solution; iterate → OR-tree insertion]

Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figures: energy (mJ) of PM, BF and CO at small, medium and large margin, for MUL and EXU; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm

Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figures: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt, for MUL and EXU, with the minimum achievable energy marked. Technology: foundry 28nm]

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS, in which an on-chip aging monitor tracks circuit performance and drives the voltage regulator that sets the circuit's Vdd]

[Figure: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to the target]

Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Flow: Step 1: VBTI → |Vt|; Step 2: Vlib → derated library; Step 3: circuit implementation and signoff → circuit → BTI degradation and AVS → Vfinal]

Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Flow: Step 1: VBTI → |Vt|; Step 2: Vlib → derated library; Step 3: circuit implementation and signoff → circuit → BTI degradation and AVS → Vfinal]

No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

Power vs Area Across Different Signoffs

[Figure: power vs. area across signoff corners, with a “knee” point for balanced area and power tradeoff]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time under NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure: Vfinal of different gate types, within a 10mV range]

Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

[Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → signoff circuit with derated library]

Power vs Area for All Designs
• 4 designs × {DC, AC} × derating methods

[Figure: power vs. area for circuits signed off with the proposed method and with other derated libraries, with a “knee” point for balanced area and power tradeoff]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  → Improve performance, power or area
  → Reduce overdesign

[Figure: Vdd vs. time, alternating between nominal (NOM, duration tnom) and overdrive (OD, duration tOD) modes]

[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V. Different overdrive modes: 26% power range. Fix fOD: still 14% power range]

Also: Tunable Monitors, Less Margin

Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Also: Tunable Monitors, Less Margin (2)

Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20-20-20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore's Law

Thank You

Backup

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1 to 9, ranging from highest to least invested guardband; 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example of worst-case capacitance corner. Capacitance distributions of layers M1 and M2 with their 3σ points; the homogeneous Cw corner for the interconnect stack with M1 and M2 combines the per-layer 3σ corners, which introduces pessimism]

Homogeneous Corners (2)
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example of worst-case capacitance corner. Capacitance distributions of layers M1 and M2 with their 3σ points; the homogeneous Cw corner for the interconnect stack with M1 and M2 combines the per-layer 3σ corners, which introduces pessimism]

When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers
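A small numeric sketch of why the pessimism grows with layer count. It assumes, for illustration only, that the stack quantity is a plain sum of per-layer contributions with equal pairwise correlation γ, and compares the homogeneous corner (sum of per-layer 3σ values) with the statistical 3σ of the sum.

```python
import math


def corner_pessimism(sigmas, gamma):
    """Ratio of homogeneous-corner variation to statistical 3-sigma variation.

    sigmas: per-layer standard deviations; gamma: pairwise correlation (assumed equal).
    """
    corner = 3.0 * sum(sigmas)                 # every layer skewed to its own 3 sigma
    variance = sum(s * s for s in sigmas)
    for i in range(len(sigmas)):
        for j in range(i + 1, len(sigmas)):
            variance += 2.0 * gamma * sigmas[i] * sigmas[j]
    statistical = 3.0 * math.sqrt(variance)    # 3 sigma of the summed quantity
    return corner / statistical


ratio_2_layers = corner_pessimism([1.0] * 2, gamma=0.5)   # about 1.15
ratio_9_layers = corner_pessimism([1.0] * 9, gamma=0.5)   # about 1.34
ratio_fully_correlated = corner_pessimism([1.0] * 9, gamma=1.0)
```

With full correlation the ratio is 1, so the homogeneous corner is exact; with partial correlation the ratio rises as layers are added.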

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

           M1           M2           M3           M4
           ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH
M1  ΔW     1  0  0      γ  0  0      γ  0  0      0  0  0
    ΔT     0  1  0      0  γ  0      0  γ  0      0  0  0
    ΔH     0  0  1      0  0  γ      0  0  γ      0  0  0
M2  ΔW     γ  0  0      1  0  0      γ  0  0      0  0  0
    ΔT     0  γ  0      0  1  0      0  γ  0      0  0  0
    ΔH     0  0  γ      0  0  1      0  0  γ      0  0  0
M3  ΔW     γ  0  0      γ  0  0      1  0  0      0  0  0
    ΔT     0  γ  0      0  γ  0      0  1  0      0  0  0
    ΔH     0  0  γ      0  0  γ      0  0  1      0  0  0
M4  ΔW     0  0  0      0  0  0      0  0  0      1  0  0
    ΔT     0  0  0      0  0  0      0  0  0      0  1  0
    ΔH     0  0  0      0  0  0      0  0  0      0  0  1
= Σ

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent

Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation

[Figure: cumulative probability vs. max wirelength ratio across all layers (%), marking cumulative probability 0.92 at 60%]

Delay Variation
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

[Figures: α vs. Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• C-worst corner underestimates delay variations for some paths, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner

Delay Variation (2)
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

[Figures: α vs. Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• C-worst corner underestimates delay variations for some paths, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

Non-Homogeneous Corner
• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: 3σj/d(Ytyp) × 100 vs. Δdj(Yrcw)/dj(Ytyp) × 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%) per layer]

Proposed Timing Signoff Flow (2)
• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC → GTBC: timing analysis using TBC; if violation ≠ 0, ECO using TBC and repeat → GCBC: timing analysis using CBC; if violation ≠ 0, ECO using CBC and repeat → done]

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     18        20        31
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Thresholds (correlation factor = 0.5)

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

Implement another NETCARD (clock period = 23ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1 term: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1 term: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large: the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

Scaling Factor Results
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

[Figures: α for LEON3MP, SUPERBLUE12 and NETCARD, with the α > 0.5 region marked in each]

Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figures: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

Heuristics 1 (2)
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time under NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

Vfinal Estimation (2)
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

Observation and Heuristic 2 (2)
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure: Vfinal of different gate types, within a 10mV range]

Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs

and library

Vstep: AVS voltage step; Vfinal: converged voltage
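The time-stepped loop of the reference flow can be sketched as follows. Everything numeric here is a toy assumption made for illustration: `toy_dvth` is a made-up power-law aging model that pessimistically applies the current Vdd over the whole elapsed time, and `toy_delay` is a made-up alpha-power-law delay model. A real flow would run library characterization and STA at each step instead.

```python
def toy_dvth(vdd, t_years):
    """Hypothetical |delta Vth| (V) after t_years of stress at supply vdd."""
    return 0.03 * vdd ** 3 * t_years ** 0.2


def toy_delay(vdd, dvth, vth0=0.3):
    """Hypothetical critical-path delay (arbitrary units), alpha-power-law style."""
    return vdd / (vdd - vth0 - dvth) ** 1.3


def reference_signoff(v_init, v_max, v_step, t_final, n_steps, t_clk):
    """Step through lifetime; raise Vdd by v_step whenever timing fails.

    Returns the converged Vfinal, or None if it would exceed v_max
    (the 'modify circuit and start over' case).
    """
    n_bumps = 0
    for k in range(1, n_steps + 1):
        t = t_final * k / n_steps
        while True:
            vdd = v_init + n_bumps * v_step
            if vdd > v_max + 1e-9:
                return None
            dvth = toy_dvth(vdd, t)               # estimate aging at this time step
            if toy_delay(vdd, dvth) <= t_clk:     # update circuit timing
                break
            n_bumps += 1                          # AVS raises Vdd by one step
    return v_init + n_bumps * v_step


t_clk = toy_delay(0.90, 0.0)   # zero timing slack at t = 0
v_final = reference_signoff(0.90, 1.05, 0.01, t_final=10.0, n_steps=10, t_clk=t_clk)
```

The inner loop is where the cost of the reference flow comes from: every time step may need several timing re-evaluations before Vdd settles.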

Experiment Setup (2)
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop between circuit, Vfinal, Vlib/VBTI and derated libraries]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

|ΔVth| increases when device is on (stressed)
|ΔVth| is partially recovered when device is off (relaxed)
NBTI: PMOS; PBTI: NMOS
Device aging (|ΔVth|) accumulates over time

[Figure: |Vgs| vs. time over alternating ON/OFF phases [VattikondaWC06]; [TCAS’14]]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure annotations: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years); Vfinal]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging

• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis

• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                             Vlib for circuit performance estimation
VBTI for aging estimation    Min Vdd                              Max Vdd
Min Vdd                      Slowest circuit, less aging          Not applicable (optimistic)
Max Vdd                      Slowest circuit, worst-case aging    Faster circuit, worst-case aging
                             (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for AES, with data points labeled by signoff case (1 to 8); legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks); annotations: Optimistic about AVS, Pessimistic about AVS]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
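A small worked sketch of the stepwise MTTF update, assuming the update rule reads MTTF(i) = MTTF(i-1) × (VDD(i-1) / VDD(i))². The function name and the sample voltage schedule are illustrative only.

```python
def update_mttf(mttf_initial, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule.

    vdd_schedule[0] is the voltage at which mttf_initial was evaluated.
    Returns the MTTF value after each step (same length as vdd_schedule).
    """
    mttf = [mttf_initial]
    for v_prev, v_cur in zip(vdd_schedule, vdd_schedule[1:]):
        mttf.append(mttf[-1] * (v_prev / v_cur) ** 2)
    return mttf


# Illustrative: a 10-year MTTF at 0.90 V, with AVS raising Vdd to 1.00 V.
history = update_mttf(10.0, [0.90, 0.90, 1.00])
```

Because the ratios telescope, the final value depends only on the first and last voltages: here 10 × (0.90 / 1.00)² = 8.1 years.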

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year and MTTF (year) for AVS schedules S1 to S5; design: DMA; annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

• Problem: margins = pessimism, overdesign, schedule delay

[Figure: “margin stack” for voltage signoff between operating voltage and signoff Vdd; components: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI]

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions

• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: routed netlist and 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλᵀ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

$$[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$$

Standard deviation of path delay

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
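The computation on this slide can be sketched in a few lines of plain Python. The function names and the 2-source example are illustrative assumptions; a production flow would use the full 27 × 27 matrix and a numerical library.

```python
import math


def cholesky(sigma):
    """Return lower-triangular lam with sigma = lam * lam^T (sigma must be positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(delta_d, sigma):
    """Standard deviation of path delay from per-source sensitivities delta_d."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times lam: sensitivities with correlation folded in.
    d_prime = [sum(delta_d[u] * lam[u][v] for u in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Two variation sources with correlation 0.5 and sensitivities of 3 and 4 (arbitrary units).
sigma_corr = path_sigma([3.0, 4.0], [[1.0, 0.5], [0.5, 1.0]])
sigma_indep = path_sigma([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
```

Since Δd' = Δd·λ and Σ = λλᵀ, the result equals sqrt(Δd Σ Δdᵀ): sqrt(37) for the correlated case and 5 for the independent case.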

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]

Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy

$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$

h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879

+1minus119864119877119903times119879

recovery cycles

Clock period

Error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints
  • Allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization

• Question: which endpoint to be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches

Open-Loop AVS
• Freq & Vdd LUT: pre-characterize LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]

Closed-Loop AVS
• Generic monitor and design-dependent replica: process- and temperature-aware AVS
  • Generic on-chip monitor [Burd00]
  • Design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]

Error Tolerance
• Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

[Figure axis: Power]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure, four panels, each contrasting ideal scaling with non-ideality:
 DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
 POWER: dynamic power (W) by year, 2009 to 2024. Source: [JeongK08]
 DRIVE CURRENT (μA/μm): Extended Planar Bulk, UTB FD, DG, and ideal scaling, 2006 to 2016. Source: [ITRS]
 SUPPLY VOLTAGE: volt vs. MPU release date. Source: [CPUDB]]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; annotation: minimum achievable energy]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: voltage-frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design; positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

Global optimization flow: Sample (SP&R); Construct power models; Estimate optimal signoff modes; Sample (SP&R); Refine power models (adaptive search)

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); series: 1st, 2nd, real; design: AES, f = 700MHz]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-Loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)

• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages with 1-bit control pins selecting high or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margin; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; annotation: minimum achievable energy]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart boxes: Initial placement (all FFs = error-tolerant FFs); Energy < min energy?; Save current solution; SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

8ISVLSI-2014 invited talk 140710

Conventional BEOL Corners

• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
• BEOL RC variations are modeled in interconnect technology file (itf)

[Figure: cross-section of metal layers M1 to M3 with labels S2, W2, T1 to T3, H1 to H3, inter-layer dielectric, inter-metal dielectric]

Corner   ΔW        ΔT        ΔH
Ytyp     typical   typical   typical
Ycb      min       min       max
Ycw      max       max       min
Yrcb     max       max       max
Yrcw     min       min       min

9ISVLSI-2014 invited talk 140710

Statistical RC Model

• 3 variation sources in each layer: ΔW, ΔT, ΔH
  • 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT, or ΔH)
  • zu and zv are in the same process module

Layer   ΔW    ΔT    ΔH
M9      z25   z26   z27
M8      z22   z23   z24
M7      z19   z20   z21
M6      z16   z17   z18
M5      z13   z14   z15
M4      z10   z11   z12
M3      z7    z8    z9
M2      z4    z5    z6
M1      z1    z2    z3

[Figure: layers grouped into process modules 1, 2, and 3]

Examples
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6, and M7
• But ΔW in layer M4 is not correlated with ΔT in M4

10ISVLSI-2014 invited talk 140710

Pessimism of Conventional BEOL Corners (CBC)

• Assumption: a max (setup) path pj is “safe” when delay evaluated at a given CBC is larger than nominal delay + 3σj

$$d_j(Y_{CBC}) \ge d_j(Y_{typ}) + 3\sigma_j$$

• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC

$$\alpha_j = \frac{3\sigma_j}{\Delta d_j(Y_{CBC})}, \qquad \Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ}), \qquad Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$$

• Small αj: large pessimism of CBC

[Figure: 3σj compared with dj(YCBC) − dj(Ytyp); the gap is marked as large pessimism]
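As a worked illustration of the scaling factor, the sketch below evaluates αj = 3σj / (dj(YCBC) − dj(Ytyp)) for one path. The function names and the numbers are made up for illustration.

```python
def alpha(sigma_j, d_cbc, d_typ):
    """Scaling factor alpha_j = 3*sigma_j / (d_j(Y_CBC) - d_j(Y_typ))."""
    return 3.0 * sigma_j / (d_cbc - d_typ)


def is_safe(sigma_j, d_cbc, d_typ):
    """A setup path is 'safe' at this corner when d_j(Y_CBC) >= d_j(Y_typ) + 3*sigma_j."""
    return d_cbc >= d_typ + 3.0 * sigma_j


# Hypothetical path: 1000 ps typical delay, 1060 ps at the corner, sigma of 10 ps.
a = alpha(10.0, 1060.0, 1000.0)
safe = is_safe(10.0, 1060.0, 1000.0)
```

Here the corner adds 60 ps while 3σ is only 30 ps, so α = 0.5: the corner covers the statistical variation with twice the margin that is needed.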

11ISVLSI-2014 invited talk 140710

Intuition on Delay Variability Across Cw RCw

[Figure: two scatter plots of α vs. Δdelay (vs. typ), one at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and one at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0 here: delay variations covered by RC-worst corner

Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

12ISVLSI-2014 invited talk 140710

Intuition on Delay Variability Across Cw RCw

[Figure: two scatter plots of α vs. Δdelay, one at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and one at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner

Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations

In the following, α is defined at the dominant corner

13ISVLSI-2014 invited talk 140710

Scaling Factor α and Delay Variation

• Paths with small Δdrcw and Δdcw have large α
  • E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]

14ISVLSI-2014 invited talk 140710

Find Paths for Which TBCs Can Be Used

• Paths with small Δdrcw and Δdcw have large α
  • E.g., αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw]

Gtbc = set of paths that can be safely signed off using TBC: ((path with Δdcw larger than Acw) OR (path with Δdrcw larger than Arcw))
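The Gtbc membership rule stated on this slide is a simple threshold test, sketched below. The path names, Δd values, and thresholds are hypothetical; in the talk Acw and Arcw are extracted from representative paths.

```python
def tbc_paths(paths, a_cw, a_rcw):
    """Return names of paths in G_tbc: delta_d_cw > A_cw OR delta_d_rcw > A_rcw.

    paths maps a path name to (delta_d_cw, delta_d_rcw), the relative delay
    increase at the C-worst and RC-worst corners (same units as the thresholds).
    """
    return [name for name, (dd_cw, dd_rcw) in paths.items()
            if dd_cw > a_cw or dd_rcw > a_rcw]


# Hypothetical relative delay variations (%) for three paths.
example_paths = {"p1": (5.0, 1.0), "p2": (1.0, 1.5), "p3": (2.0, 6.0)}
g_tbc = tbc_paths(example_paths, a_cw=4.0, a_rcw=4.0)
```

Paths outside the returned set (here `p2`) have small Δd at both corners, hence large α, and would still be signed off at the conventional corners.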

15ISVLSI-2014 invited talk 140710

Determining α Arcw and Acw

• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
  • Plot α vs. Δdelay, find Arcw and Acw for a given α
  • Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α: larger thresholds (Arcw and Acw), fewer paths in GTBC

[Figure: α vs. Δd at C-worst corner (%) and Δd at RC-worst corner (%), with thresholds Acw and Arcw marked]

16ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners

• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
• Tradeoff between reduced margin vs. paths which use TBC

Correlation factor γ = 0.5

[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-0.5, TBC-0.6, and TBC-0.7]

17ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

18ISVLSI-2014 invited talk 140710

How to Minimize Cost of Resilience

• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                  Razor          Razor-Lite    TIMBER
Power penalty     30 [Das08]     ~0 [Kim13]    100 [Choudhury09]
Area penalty      182 [Kim13]    33 [Kim13]    255 [Chen13]
Recovery cycles   5 [Wan09]      11 [Kim13]    0 [Choudhury09]

19ISVLSI-2014 invited talk 140710

Tradeoff Resilience Cost vs Datapath Cost

[Figure: endpoint Razor FFs (with error outputs) and normal FFs with their fanin cones; optimizing a fanin cone w/ tighter constraint lets an endpoint Razor FF be replaced by a normal FF, trading area (power) of the fanin cone against area (power) w/ Razor overhead]

[Figure: tradeoff between Razor FFs (resilience cost) and power/area of fanin circuits; energy (mJ) vs. Razor FFs; curves: total energy, energy of non-resilient part, resilience cost]

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

20ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization (SEOpt)

• Optimize fanin cone of an endpoint w/ tighter constraints
  • Allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
  • Vary endpoints, compare area/power penalty

Candidate Sensitivity Functions

$$SF_1 = |slack(p)|$$
$$SF_2 = |slack(p)| \times num_{cri}(p)$$
$$SF_3 = |slack(p)| \times \frac{num_{cri}(p)}{num_{total}(p)}$$
$$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$$
$$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$$

p: negative slack endpoint
c: cells within fanin cone
Numcri: number of negative slack cells
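A minimal sketch of how the five candidate sensitivity functions could be evaluated for one endpoint. It assumes that numtotal(p) is the total number of cells in the fanin cone of p (the slide does not define it), and the cell data are invented for illustration.

```python
def sensitivity_functions(slack_p, fanin_cells):
    """Evaluate SF1..SF5 for endpoint p.

    slack_p: timing slack of endpoint p (negative for a failing endpoint).
    fanin_cells: list of (slack, power) tuples, one per cell in the fanin cone.
    Assumption: num_total is the number of cells in the fanin cone.
    """
    num_cri = sum(1 for slack, _ in fanin_cells if slack < 0)   # negative slack cells
    num_total = len(fanin_cells)
    total_power = sum(power for _, power in fanin_cells)
    abs_slack = abs(slack_p)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total,
        "SF4": abs_slack * total_power,
        "SF5": sum(abs(slack) * power for slack, power in fanin_cells),
    }


# Hypothetical endpoint with slack -0.2 ns and four cells in its fanin cone.
cells = [(-0.2, 1.0), (-0.1, 2.0), (0.05, 1.0), (0.3, 4.0)]
sf = sensitivity_functions(-0.2, cells)
```

How endpoints are ranked from these values (ascending or descending) is a flow decision that this sketch deliberately leaves open.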

21ISVLSI-2014 invited talk 140710

Clock Skew Optimization (SkewOpt)

• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight, adjust clock latencies, contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized

$$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p, q)}$$

Slack_pq: setup slack of path p-q
β: weighting factor
TG(p, q): toggle rate of path p-q

[Figure: sequential graph of FF1, FF2, FF3 with data-path edge weights W12, W23, W31 and a clock tree; after adjustment each edge on the cycle has weight W’ = average weight on cycle]
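The edge weight and the per-cycle average W’ can be illustrated as below, assuming the weight reads W_pq = Slack_pq / (1 + β × TG(p, q)). The slack and toggle-rate values are hypothetical, and the cycle search and latency adjustment themselves are not shown.

```python
def edge_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q)); frequently toggling paths get smaller weights."""
    return slack / (1.0 + beta * toggle_rate)


def cycle_average(weights):
    """W' = average weight on a cycle, the common value after balancing the cycle."""
    return sum(weights) / len(weights)


# Hypothetical 3-FF cycle: (setup slack in ns, toggle rate) for FF1->FF2, FF2->FF3, FF3->FF1.
beta = 2.0
edges = [(0.30, 0.5), (0.05, 0.0), (0.20, 0.5)]
weights = [edge_weight(s, tg, beta) for s, tg in edges]
w_avg = cycle_average(weights)
```

With these numbers the weights are 0.15, 0.05, and 0.10, so W’ = 0.10.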

22ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart boxes: Initial placement (all FFs = error-tolerant FFs); Energy < min energy?; Save current solution; SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; OR-tree insertion]

23ISVLSI-2014 invited talk 140710

Benefit of Low-Cost Resilience

• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margin; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner
Technology: foundry 28nm

24ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience with AVS

• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; curves: pure-margin, brute-force, CombOpt; annotation: minimum achievable energy]

Technology: foundry 28nm

25ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops: Less Margin

• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
  • AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS with circuit, on-chip aging monitor (circuit performance), and voltage regulator (Vdd)]

[Figure: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to target]

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: Step 1: VBTI (|Vt|) and Vlib give the derated library; Step 2: circuit implementation and signoff gives the circuit; Step 3: BTI degradation and AVS gives Vfinal]

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: Step 1: VBTI (|Vt|) and Vlib give the derated library; Step 2: circuit implementation and signoff gives the circuit; Step 3: BTI degradation and AVS gives Vfinal]

No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs Area Across Different Signoffs

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
  • Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types

• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

10mV

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

Flow
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Sign off circuit with derated library
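The first two steps of this flow reduce to an average and an assignment, sketched below. The per-cell Vfinal values and the function names are hypothetical; obtaining each cell's Vfinal (by simulating aging with AVS) is outside this sketch.

```python
def v_heur(vfinal_by_cell):
    """Vheur: plain average of the end-of-life voltage Vfinal over standard-cell types."""
    return sum(vfinal_by_cell.values()) / len(vfinal_by_cell)


def derated_library_corner(vfinal_by_cell):
    """Characterization voltages for the derated library: VBTI = Vlib = Vheur."""
    v = v_heur(vfinal_by_cell)
    return {"VBTI": v, "Vlib": v}


# Hypothetical Vfinal (V) per cell type.
corner = derated_library_corner({"INV": 0.96, "NAND2": 0.97, "NOR2": 0.98})
```

Using a single flat voltage for both aging and performance is what breaks the loop: the library can be characterized once, before the circuit exists.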

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• (4 designs × {DC, AC} × derating methods)

[Figure: power vs. area; legend: proposed method, circuits signed off using other derated libraries; annotation: “knee” point for balanced area and power tradeoff]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign

[Figure: Vdd vs. time alternating between nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD]

[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V; annotations: different overdrive modes: 26% power range; fix fOD: still 14% power range]

36ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions

• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1 to 9; annotations: highest invested guardband, least invested guardband, 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner. Panels: Layer M1 (C-3σ), Layer M2 (C-3σ), and the interconnect stack with M1 and M2 (axes: M1 C, M2 C) showing the homogeneous Cw corner and its pessimism]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers.

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
           M1         M2         M3         M4
           ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH
   M1 ΔW   1  0  0    γ  0  0    γ  0  0    0  0  0
      ΔT   0  1  0    0  γ  0    0  γ  0    0  0  0
      ΔH   0  0  1    0  0  γ    0  0  γ    0  0  0
   M2 ΔW   γ  0  0    1  0  0    γ  0  0    0  0  0
      ΔT   0  γ  0    0  1  0    0  γ  0    0  0  0
      ΔH   0  0  γ    0  0  1    0  0  γ    0  0  0
   M3 ΔW   γ  0  0    γ  0  0    1  0  0    0  0  0
      ΔT   0  γ  0    0  γ  0    0  1  0    0  0  0
      ΔH   0  0  γ    0  0  γ    0  0  1    0  0  0
   M4 ΔW   0  0  0    0  0  0    0  0  0    1  0  0
      ΔT   0  0  0    0  0  0    0  0  0    0  1  0
      ΔH   0  0  0    0  0  0    0  0  0    0  0  1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
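The rule behind this matrix can be written as a short sketch. The helper name `build_correlation_matrix` and the module assignment passed to it are assumptions made for illustration; only the rule itself (1 on the diagonal, γ for same-type sources in the same process module, 0 otherwise) comes from the slide.

```python
# Illustrative sketch only: helper name and inputs are assumptions, not from the talk.
VARIATION_TYPES = ("dW", "dT", "dH")  # three variation sources per metal layer


def build_correlation_matrix(layer_modules, gamma):
    """Return Sigma as a list of rows.

    layer_modules: process-module id of each metal layer, bottom layer first.
    gamma: correlation between same-type sources within one process module.
    """
    sources = [(layer, vtype, module)
               for layer, module in enumerate(layer_modules)
               for vtype in VARIATION_TYPES]
    sigma = []
    for layer_u, type_u, mod_u in sources:
        row = []
        for layer_v, type_v, mod_v in sources:
            if layer_u == layer_v and type_u == type_v:
                row.append(1.0)      # a source is fully correlated with itself
            elif type_u == type_v and mod_u == mod_v:
                row.append(gamma)    # same type, same process module
            else:
                row.append(0.0)      # all other pairs are independent
        sigma.append(row)
    return sigma


# M1-M3 share one process module and M4 sits in another, as in the 12 x 12 table.
sigma = build_correlation_matrix([1, 1, 1, 2], gamma=0.5)
```

Passing nine module ids instead of four would give the 27 x 27 matrix used for a 9-layer stack.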

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the curve reaches 0.92 at 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation

Delay Variation

[Figure: α vs. Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions α is defined at the dominant corner

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 (axes: M1 C, M2 C). Non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: 3σj / d(Ytyp) x 100 vs. Δdj(Yrcw) / dj(Ytyp) x 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart boxes: routed design; timing analysis at BEOL corners (Ytyp, Ycw, Yrcw); path groups GTBC and GCBC; for GTBC, timing analysis using TBC and ECO using TBC until violation = 0; for GCBC, timing analysis using CBC and ECO using CBC until violation = 0; done]

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5

           α     Acw (%)   Arcw (%)
TBC-05     0.5   4.3       7.3
TBC-06     0.6   3.3       5.0
TBC-07     0.7   3.0       3.4

• Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays → the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  (terms: nominal delay; delay sensitivity to unit change in M1 resistance; delay sensitivity to unit change in M1 capacitance)
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

Scaling Factor Results

[Figure: α scatter plots for LEON3MP, NETCARD and SUPERBLUE12; each panel marks the region α > 0.5]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: three bar charts of WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD, comparing CBC, TBC-1 and TBC-2]

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library

Vstep: AVS voltage step
Vfinal: converged voltage

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Case       1      2      3      4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop between Circuit, Vfinal and Derated Libraries (Vlib, VBTI)]

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time over ON / OFF / ON / OFF intervals [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time

[TCAS'14]

Observation 1

[Chan11]

• BTI is a "front-loaded" phenomenon
• 50% BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

[Figure annotation: ≈70% Vdd increment in 1 year (remaining 30% over 9 years); Vfinal]

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                             Vlib for circuit performance estimation
VBTI for aging estimation    Min Vdd                                  Max Vdd
Min Vdd                      Slowest circuit, less aging              Not applicable (optimistic)
Max Vdd                      Slowest circuit, worst-case aging        Faster circuit, worst-case aging
                             (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES; points are labeled by signoff setup (1-8) and range from "optimistic about AVS" to "pessimistic about AVS". Legend as extracted: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)]

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) for implementations 1-9. Annotations: 30% MTTF penalty, 200mV voltage compensation]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

  $MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
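A minimal sketch of this MTTF update, applied step by step along a Vdd schedule. The function name, the schedule values and the 10-year starting MTTF are illustrative assumptions, not data from the talk.

```python
def mttf_after_avs(mttf_initial, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule."""
    mttf = mttf_initial
    for v_prev, v_next in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_next) ** 2
    return mttf


# Assumed schedule: AVS raises Vdd from 0.90 V to 1.10 V in 50 mV steps.
schedule = [0.90, 0.95, 1.00, 1.05, 1.10]
remaining = mttf_after_avs(10.0, schedule)  # assumed 10-year MTTF at the initial Vdd
penalty = 1.0 - remaining / 10.0            # fraction of MTTF lost to the voltage increase
```

Because the per-step ratios telescope, the result depends only on the first and last voltages: here `remaining` equals 10 x (0.90 / 1.10)^2, so the intermediate steps matter only when MTTF must be tracked over time.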

>

EM Impact on AVS Scheduling

[Figure: left, VDD vs. year (0-12) for schedules S1-S5; right, MTTF (year) for S1-S5; design DMA. Annotation: 12 years MTTF penalty]

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff on a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI, signoff Vdd; operating voltage]

Problem: margins = pessimism → overdesign, schedule delay

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

  $\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (Note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

  $[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}] \, \lambda$

Standard deviation of path delay

  $\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
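The two formulas above can be sketched in a few lines of plain Python. The function names and the two-source example values are assumptions for illustration; a real flow would use the 27 sensitivities obtained from the 28 extraction and STA runs.

```python
import math


def cholesky(sigma):
    """Return lower-triangular lam with sigma = lam * lam^T (sigma positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            partial = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - partial)
            else:
                lam[i][k] = (sigma[i][k] - partial) / lam[k][k]
    return lam


def path_sigma(delta_d, sigma):
    """Standard deviation of one path's delay from its per-source sensitivities."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times lam: delta_d'[v] = sum over u of delta_d[u] * lam[u][v]
    correlated = [sum(delta_d[u] * lam[u][v] for u in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in correlated))


# Made-up example: two correlated sources with sensitivities of 3 ps and 4 ps.
sigma_matrix = [[1.0, 0.5],
                [0.5, 1.0]]
sigma_j = path_sigma([3.0, 4.0], sigma_matrix)
```

With no correlation the same sensitivities give sqrt(3^2 + 4^2) = 5 ps; the correlation term 2 x 0.5 x 3 x 4 raises the variance to 37, which is why correlated sources widen the path delay distribution.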

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V), 0.84-1.00 V, for conventional design and resilient design; 15% reduction]

• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling → reduced energy

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy

  $Energy = \frac{Power}{Throughput}$

h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879

+1minus119864119877119903times119879

recovery cycles

Clock period

Error rate [Kahng10]

Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?

Process-Aware Vdd Scaling (PVS)

AVS classes and AVS approaches

Open-Loop AVS
• Freq & Vdd LUT: AVS w/ pre-characterized LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]

Closed-Loop AVS
• Generic monitor: process and temperature-aware AVS, generic on-chip monitor [Burd00]
• Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]

Error Tolerance
• Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

[Figure axis: power]

Challenge Variability

[Figure: four panels, each comparing the ideal trend with observed non-ideality]
• DENSITY: transistor count [M] vs. MPU release date, 1998-2011. Source: [CPUDB]
• POWER: dynamic power (W), 2009-2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006-2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date, 1995-2016. Source: [CPUDB]

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs. Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU, comparing brute-force, pure-margin and CombOpt; the minimum achievable energy is marked]

Our Concept Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design / overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: design cone of mode A in the voltage-frequency plane with modes A, B and C; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone → reduce overdesign

Our Method Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable

Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models (adaptive search)

[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V), with curves 1st, 2nd and real. Design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

Classes of Closed-Loop AVS

Closed-Loop AVS: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low resistance]

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at small, medium and large margin; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs. Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU, comparing brute-force, pure-margin and CombOpt; the minimum achievable energy is marked]

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart boxes: initial placement (all FFs = error-tolerant FFs); check energy < min energy and save current solution; SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • ldquoChicken and Eggrdquo Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is ldquoSignoffrdquo
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)
Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT or ΔH)
  • zu and zv are in the same process module

Layer   ΔW    ΔT    ΔH
M9      z25   z26   z27
M8      z22   z23   z24
M7      z19   z20   z21
M6      z16   z17   z18
M5      z13   z14   z15
M4      z10   z11   z12
M3      z7    z8    z9
M2      z4    z5    z6
M1      z1    z2    z3

The layers are grouped into process modules 1, 2 and 3.

Examples
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6 and M7
• But ΔW in layer M4 is not correlated with ΔT in M4

Pessimism of Conventional BEOL Corners (CBC)

• Assumption: a max (setup) path pj is "safe" when delay evaluated at a given CBC is larger than nominal delay + 3σj

  $d_j(Y_{CBC}) \ge 3\sigma_j + d_j(Y_{typ})$

• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC:

  $\alpha_j = \frac{3\sigma_j}{\Delta d_j(Y_{CBC})}$, where $\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ})$ and $Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$

• Small αj → large pessimism of CBC

[Figure: delay distribution comparing 3σj with dj(YCBC) - dj(Ytyp); the gap between them is the (large) pessimism]
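As a small worked example of the scaling factor, the sketch below computes α for one path. The function name and the delay numbers are made up for illustration.

```python
def alpha(sigma_j, d_cbc, d_typ):
    """alpha_j = 3 * sigma_j / (d_j(Y_CBC) - d_j(Y_typ))."""
    delta = d_cbc - d_typ
    if delta <= 0:
        raise ValueError("delay at the CBC must exceed the typical-corner delay")
    return 3.0 * sigma_j / delta


# Assumed path: 1000 ps at typical, 1060 ps at the CBC, statistical sigma of 10 ps.
a = alpha(sigma_j=10.0, d_cbc=1060.0, d_typ=1000.0)
is_safe = a <= 1.0  # the CBC delay covers nominal delay + 3 sigma
```

Here the CBC adds 60 ps while the statistical 3σ variation is only 30 ps, so α = 0.5: the corner is safe but carries twice the margin the statistics call for.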

Intuition on Delay Variability Across Cw RCw

[Figure: α vs. Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations

In the following, α is defined at the dominant corner

Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
  • E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]

Find Paths for Which TBCs Can Be Used

• Paths with small Δdrcw and Δdcw have large α
  • E.g., there are αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw]

Gtbc = set of paths that can be safely signed off using TBC: ((path with Δdcw larger than Acw) OR (path with Δdrcw larger than Arcw))
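The Gtbc membership rule can be sketched as a simple filter over per-path delays. The function name, the path data and the threshold values below are hypothetical; in the talk the thresholds Acw and Arcw are read off the α vs. Δdelay plots.

```python
def split_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc) using the Gtbc rule.

    paths: dict mapping path name -> (d_typ, d_cw, d_rcw), all in the same time unit.
    a_cw, a_rcw: thresholds on relative delay increase, in percent.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        delta_cw = 100.0 * (d_cw - d_typ) / d_typ    # % increase at C-worst
        delta_rcw = 100.0 * (d_rcw - d_typ) / d_typ  # % increase at RC-worst
        if delta_cw > a_cw or delta_rcw > a_rcw:
            g_tbc.append(name)   # large delta, small alpha: tightened corner is safe
        else:
            g_cbc.append(name)   # small delta, large alpha: keep conventional corner
    return g_tbc, g_cbc


# Hypothetical delays in ps and hypothetical thresholds in percent.
paths = {
    "p1": (1000.0, 1050.0, 1030.0),
    "p2": (1000.0, 1020.0, 1030.0),
    "p3": (1000.0, 1010.0, 1080.0),
}
g_tbc, g_cbc = split_paths(paths, a_cw=4.0, a_rcw=6.0)
```

In this example p1 exceeds the C-worst threshold and p3 the RC-worst threshold, so both go to Gtbc, while p2 stays on conventional corners.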

Determining α, Arcw and Acw

• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay, find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC

[Figure: α vs. Δd at RC-worst corner (%) and α vs. Δd at C-worst corner (%), with thresholds Arcw and Acw]

Benefits of Tightened BEOL Corners

• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-06: more benefits
  • Tradeoff between reduced margin vs. paths which use TBC

Correlation factor γ = 0.5

[Figure: three bar charts of WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE12 and NETCARD, comparing CBC, TBC-05, TBC-06 and TBC-07]

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

How to Minimize Cost of Resilience?
• Additional circuits → area and power penalties
• Recovery from errors → throughput degradation
• Large hold margin → short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                   Razor          Razor-Lite     TIMBER
Power penalty      30 [Das08]     ~0 [Kim13]     100 [Choudhury09]
Area penalty       182 [Kim13]    33 [Kim13]     255 [Chen13]
recovery cycles    5 [Wan09]      11 [Kim13]     0 [Choudhury09]

Tradeoff Resilience Cost vs Datapath Cost

[Figure: an endpoint Razor FF (D, Q and error outputs) can become a normal FF when its fanin cone is optimized w/ a tighter constraint. Tradeoff: area (power) of fanin cone vs. area (power) w/ Razor overhead, i.e., Razor FFs (resilience cost) vs. power/area of fanin circuits]

[Figure: energy (mJ) vs. number of Razor FFs (axis ticks 300, 100, 50, 0), showing total energy, energy of non-resilient part, and resilience cost]

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions: vary endpoints, compare area/power penalty

Candidate Sensitivity Functions

  $SF_1 = |slack(p)|$
  $SF_2 = |slack(p)| \times Num_{cri}(p)$
  $SF_3 = |slack(p)| \times \frac{Num_{cri}(p)}{Num_{total}(p)}$
  $SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
  $SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$

p: negative slack endpoint
c: cells within fanin cone
Numcri: number of negative slack cells
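The five candidates can be evaluated side by side for one endpoint, as in the sketch below. The function name and the sample numbers are hypothetical, and Numtotal is assumed here to be the total number of cells in the fanin cone, which the slide does not state.

```python
def sensitivity_functions(slack_p, fanin_cells):
    """Evaluate SF1..SF5 for one negative-slack endpoint p.

    slack_p: timing slack of endpoint p.
    fanin_cells: list of (slack, power) tuples, one per cell in p's fanin cone.
    """
    if not fanin_cells:
        raise ValueError("fanin cone must contain at least one cell")
    num_total = len(fanin_cells)  # assumption: Num_total = all cells in the cone
    num_cri = sum(1 for slack, _ in fanin_cells if slack < 0)
    total_power = sum(power for _, power in fanin_cells)
    abs_slack = abs(slack_p)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total,
        "SF4": abs_slack * total_power,
        "SF5": sum(abs(slack) * power for slack, power in fanin_cells),
    }


# Hypothetical endpoint with a four-cell fanin cone (slack in ns, power in mW).
cells = [(-0.2, 1.0), (-0.1, 2.0), (0.3, 1.0), (0.5, 4.0)]
sf = sensitivity_functions(slack_p=-0.2, fanin_cells=cells)
```

Ranking endpoints by one of these values and optimizing the top candidates first is one way to use them; which function ranks best is exactly what the slide proposes to compare.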

21ISVLSI-2014 invited talk 140710

Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight → adjust clock latencies → contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized

\[ W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p, q)} \]

where Slack_pq is the setup slack of path p-q, β is a weighting factor, and TG(p, q) is the toggle rate of path p-q.

[Figure: sequential graph of FF1, FF2, FF3 with data paths weighted W12, W23, W31, plus the clock tree. After clock latencies are adjusted, every path on the cycle has weight W’, where W’ = average weight on cycle.]
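As a small illustration of the edge weight and of the balancing step in the figure, the following sketch computes a path weight from slack and toggle rate and then replaces the weights on a cycle by their average. It assumes the weight is slack divided by (1 + β × toggle rate); the function names and the sample numbers are hypothetical, and no cycle search or graph contraction is attempted here.

```python
from typing import List


def path_weight(slack: float, toggle_rate: float, beta: float) -> float:
    """Weight of path p-q: setup slack scaled down by its toggle rate."""
    return slack / (1.0 + beta * toggle_rate)


def balance_cycle(weights: List[float]) -> List[float]:
    """After clock latencies are adjusted, each path on the cycle
    carries the average weight W' of the cycle."""
    w_avg = sum(weights) / len(weights)
    return [w_avg] * len(weights)


# Hypothetical 3-FF cycle FF1 -> FF2 -> FF3 -> FF1: (slack in ns, toggle rate).
paths = [(0.10, 0.5), (0.40, 0.1), (0.25, 0.0)]
beta = 2.0  # weighting factor, chosen arbitrarily for the example
weights = [path_weight(s, tg, beta) for s, tg in paths]
balanced = balance_cycle(weights)
```

Because the toggle rate appears in the denominator, a frequently exercised path with positive slack gets a smaller weight than an idle path with the same slack, so it is treated as more critical when cycles are selected.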


Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: Initial placement (all FFs = error-tolerant FFs); check Energy < min energy, then save current solution; SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; OR-tree insertion.]


Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figure: stacked-bar charts of energy (mJ) for PM, BF and CO at small, medium and large margin, for designs MUL and EXU. Bar components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation.]

Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm.


Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
  → Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt, for designs MUL and EXU, with the minimum achievable energy marked. Technology: foundry 28nm.]


Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions


Breaking Chicken-Egg Loops → Less Margin
• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
  • AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS. An on-chip aging monitor reports circuit performance to a voltage regulator, which sets the Vdd of the circuit. Plots of circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to the target.]


Derated Library Characterization and AVS

• VBTI = Voltage for BTI aging estimation
• Vlib = Voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step loop (Steps 1-3) linking BTI degradation and AVS (Vfinal), the derated library (VBTI, Vlib, |Vt|), and circuit implementation and signoff (circuit).]


Library Characterization for AVS

• VBTI = Voltage for BTI aging estimation
• Vlib = Voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: Step 1: derated library (Vlib, VBTI, |Vt|); Step 2: circuit implementation and signoff (circuit); Step 3: BTI degradation and AVS (Vfinal). Annotations: no obvious guideline to define VBTI and Vlib; inconsistency among Vfinal, Vlib, VBTI.]

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath


Power vs. Area Across Different Signoffs

[Figure: power vs. area for different signoff corners, with a “knee” point for balanced area and power tradeoff.]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead


Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic

[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal.]


Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization


Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]


Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

Flow: Obtain Vheur (average of standard cells) → Obtain derated library with VBTI = Vlib = Vheur → Signoff circuit with derated library


Power vs. Area for All Designs
• (4 designs × {DC, AC} × derating methods)

[Figure: power vs. area for circuits signed off with the proposed method and for circuits signed off using other derated libraries, with a “knee” point for balanced area and power tradeoff.]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead


Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  → Improve performance, power or area
  → Reduce overdesign

[Figure: Vdd vs. time, alternating between nominal (NOM, duration tnom) and overdrive (OD, duration tOD) modes.]

[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V. Different overdrive modes: 26% power range. Fix fOD: still 14% power range.]



Also: Tunable Monitors → Less Margin

Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip: some chips will fail

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process


Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions


Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law


Thank You


Backup


Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1-9, annotated with the highest and least invested guardband and a 14% power penalty.]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff



Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2 (axes M1 C, M2 C). The per-layer C-3σ corners of Layer M1 and Layer M2 combine into the homogeneous Cw corner, with the resulting pessimism marked.]

When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers.


Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
             M1           M2           M3           M4
          ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH
M1  ΔW     1  0  0      γ  0  0      γ  0  0      0  0  0
    ΔT     0  1  0      0  γ  0      0  γ  0      0  0  0
    ΔH     0  0  1      0  0  γ      0  0  γ      0  0  0
M2  ΔW     γ  0  0      1  0  0      γ  0  0      0  0  0
    ΔT     0  γ  0      0  1  0      0  γ  0      0  0  0
    ΔH     0  0  γ      0  0  1      0  0  γ      0  0  0
M3  ΔW     γ  0  0      γ  0  0      1  0  0      0  0  0
    ΔT     0  γ  0      0  γ  0      0  1  0      0  0  0
    ΔH     0  0  γ      0  0  γ      0  0  1      0  0  0
M4  ΔW     0  0  0      0  0  0      0  0  0      1  0  0
    ΔT     0  0  0      0  0  0      0  0  0      0  1  0
    ΔH     0  0  0      0  0  0      0  0  0      0  0  1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
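The structure of Σ follows from two rules: sources of the same type within the same process module are correlated by γ, and everything else is independent. A minimal sketch of that construction is shown below. The module labels "A" and "B" and the names `dW`, `dT`, `dH` are hypothetical placeholders; the grouping of M1-M3 into one module and M4 into another is read off the matrix above.

```python
from typing import Dict, List, Tuple


def build_correlation(
    layers: List[str],
    module_of: Dict[str, str],
    types: Tuple[str, ...] = ("dW", "dT", "dH"),
    gamma: float = 0.5,
) -> Tuple[List[Tuple[str, str]], List[List[float]]]:
    """Return the ordered variation sources and their correlation matrix."""
    sources = [(layer, t) for layer in layers for t in types]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (layer_i, type_i) in enumerate(sources):
        for j, (layer_j, type_j) in enumerate(sources):
            if i == j:
                sigma[i][j] = 1.0
            elif type_i == type_j and module_of[layer_i] == module_of[layer_j]:
                # Same variation type, same process module: correlated by gamma.
                sigma[i][j] = gamma
    return sources, sigma


# M1-M3 share a (hypothetically named) module "A"; M4 is in module "B".
layers = ["M1", "M2", "M3", "M4"]
module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}
sources, sigma = build_correlation(layers, module_of)
```

With four layers and three variation types per layer this yields the 12 × 12 matrix shown on the slide.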


Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the curve reaches 0.92 at 60%.]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation



Delay Variation

[Figure: two plots of α, one against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and one against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp). Paths dominated by C-worst have Δdelay at C-worst > Δdelay at RC-worst; paths dominated by RC-worst have Δdelay at RC-worst > Δdelay at C-worst.]

• Some paths have α > 1.0 → a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0 → delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner


Non-Homogeneous Corner
• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 (axes M1 C, M2 C); non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp.]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?


Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: 3σj / d(Ytyp) × 100% plotted against Δdj(Yrcw) / dj(Ytyp) × 100%.]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield?


Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%).]


Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design → timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) → paths split into GTBC and GCBC. GTBC: timing analysis using TBC; ECO using TBC until #violations = 0. GCBC: timing analysis using CBC; ECO using CBC until #violations = 0. Then done.]
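The path-grouping step of this flow can be sketched as a simple partition on relative delay change. In the sketch below, the thresholds `a_cw` and `a_rcw` stand in for Acw and Arcw; the numeric values, the tuple layout of a path record, and the choice to place a path in Gtbc when either corner exceeds its threshold are all assumptions made for illustration, not the talk's stated rule.

```python
from typing import List, Tuple

# (path name, delay at Ytyp, delay at Ycw, delay at Yrcw); hypothetical layout.
PathRecord = Tuple[str, float, float, float]


def relative_delta(d_corner: float, d_typ: float) -> float:
    """Delta-delay at a corner, normalized by the typical-corner delay."""
    return (d_corner - d_typ) / d_typ


def split_paths(
    paths: List[PathRecord], a_cw: float, a_rcw: float
) -> Tuple[List[str], List[str]]:
    """Partition paths into G_tbc (tightened corners) and G_cbc (conventional)."""
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        large_at_cw = relative_delta(d_cw, d_typ) > a_cw
        large_at_rcw = relative_delta(d_rcw, d_typ) > a_rcw
        # Assumption: exceeding either threshold qualifies the path for TBC.
        if large_at_cw or large_at_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Hypothetical delays in ns and hypothetical thresholds.
example_paths = [("p1", 1.00, 1.05, 1.10), ("p2", 1.00, 1.01, 1.02)]
g_tbc, g_cbc = split_paths(example_paths, a_cw=0.04, a_rcw=0.07)
```

Paths left in Gcbc keep the conventional corners, which matches the intent of fixing only the Gtbc paths with tightened corners.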


Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                       LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)      18        20        31
Gate count             232K      575K      1031K
Utilization (%)        84        79        82
Core area (mm2)        0.45      1.04      1.91
Max transition (ps)    330       330       330

Tightened BEOL corners (correlation factor = 0.5)

            α      Acw (%)   Arcw (%)
TBC-0.5     0.5    43        73
TBC-0.6     0.6    33        50
TBC-0.7     0.7    30        34

• Implement another NETCARD (clock period = 23ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5


Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large → delay variations of ΔdR-M1 and ΔdC-M1 cancel out → Δd(Ycw) ≈ 0 < σj


Scaling Factor Results

[Figure: scaling factor α for LEON3MP, NETCARD and SUPERBLUE12, with the α > 0.5 region marked in each plot.]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small


Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: bar charts of WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2.]



Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient



Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit    Frequency (GHz)
c5315      1.38
c7552      1.25
AES        0.89
MPEG2      1.05

Supply voltages
Vmax           1.05V
Vinit          0.90V
Vheur1 (DC)    0.97V
Vheur1 (AC)    0.95V
Vheur2 (DC)    0.95V
Vheur2 (AC)    0.93V


A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step
Vfinal: converged voltage


Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case        1       2       3      4        5    6        7
Vlib (V)    Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)    Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2


“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop linking Circuit, Vfinal, Vlib / VBTI and Derated Libraries.]


Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time alternating ON, OFF, ON, OFF [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS’14].]


Observation 1
• BTI is a “front-loaded” phenomenon [Chan11]
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd vs. time approaching Vfinal; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years).]
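To see what “front-loaded” implies numerically, one can assume that aging follows a simple power law in time, |ΔVth| ∝ t^n. This model is an assumption made here for illustration; the slide does not state the aging model. Under it, the claim that 50% of the 10-year aging occurs in the first year fixes the exponent at n = ln(0.5) / ln(0.1) ≈ 0.30. The function names below are hypothetical.

```python
import math


def power_law_exponent(fraction: float, t_early: float, t_total: float) -> float:
    """Exponent n such that (t_early / t_total) ** n == fraction."""
    return math.log(fraction) / math.log(t_early / t_total)


def aging_fraction(t: float, t_total: float, n: float) -> float:
    """Fraction of end-of-life aging accumulated by time t under t**n aging."""
    return (t / t_total) ** n


# 50% of aging within year 1 of a 10-year lifetime (numbers from the slide).
n = power_law_exponent(0.5, 1.0, 10.0)

# Fraction of total aging reached after each year under the assumed model.
yearly = [aging_fraction(year, 10.0, n) for year in range(1, 11)]
```

Under this assumed model the remaining half of the aging is spread over the following nine years, which is consistent with most of the Vdd increment happening early in life.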


Results for DC Scenario

[Figure: power vs. area for setups 1-7, with the “good corners” marked.]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03


Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Vlib for circuit performance estimation (columns) vs. VBTI for aging estimation (rows):

                  Vlib = Min Vdd                                          Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging                             Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging (too pessimistic)     Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious


AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for design AES, with points labeled 1-8. Series: non-EM-aware, after fixing (Mishra), after fixing (Black’s). Regions annotated “optimistic about AVS” and “pessimistic about AVS”.]


AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

\[ MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2} \]

[Figure: lifetime (year) and Vfinal (V) for implementations 1-9; annotations: 30% MTTF penalty, 200mV voltage compensation.]
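The MTTF update rule on the “AVS Impact on EM Lifetime” slide can be applied step by step along a Vdd schedule. The sketch below does only that; the function names, the starting MTTF, and the voltage schedule are hypothetical values chosen for illustration.

```python
from typing import List


def update_mttf(mttf_prev: float, vdd_prev: float, vdd_now: float) -> float:
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i)) ** 2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_trajectory(mttf0: float, vdd_schedule: List[float]) -> List[float]:
    """MTTF after each AVS step, starting from mttf0 at vdd_schedule[0]."""
    trajectory = [mttf0]
    for vdd_prev, vdd_now in zip(vdd_schedule, vdd_schedule[1:]):
        trajectory.append(update_mttf(trajectory[-1], vdd_prev, vdd_now))
    return trajectory


# Hypothetical schedule: AVS raises Vdd from 0.90V to 1.10V in two steps.
schedule = [0.90, 1.00, 1.10]
trajectory = mttf_trajectory(10.0, schedule)
```

Because the per-step ratios telescope, the final value depends only on the first and last voltages: MTTF_final = MTTF_0 × (V_first / V_last)^2.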

EM Impact on AVS Scheduling

[Figure: VDD vs. year for AVS schedules S1-S5 (design DMA), and MTTF (year) for S1-S5; annotation: 12 years MTTF penalty.]


What is “Signoff”?
• Foundation of contract between design house and foundry
• “Chip should work” → stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff on a voltage axis, between nominal Vdd (operating voltage) and signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI.]

Problem: margins = pessimism → overdesign, schedule delay


Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:

\[ \Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3} \]

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays


Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

\[ [\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}] \, \lambda \]

Standard deviation of path delay:

\[ \sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5} \]

Note: we use the delay variation from the statistical analysis as a reference
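The two formulas above amount to computing σj² = Δd Σ Δd^T through a Cholesky factor of Σ. The following sketch shows the computation on a tiny 2 × 2 example in plain Python; the function names and the sample sensitivities are hypothetical, and a production flow would more likely call a linear-algebra library.

```python
import math
from typing import List


def cholesky(sigma: List[List[float]]) -> List[List[float]]:
    """Lower-triangular lam with sigma = lam * lam^T (sigma must be positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(delta_d: List[float], sigma: List[List[float]]) -> float:
    """Standard deviation of path delay from per-source sensitivities."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times matrix: d'_v = sum_i delta_d[i] * lam[i][v].
    d_prime = [sum(delta_d[i] * lam[i][v] for i in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Two hypothetical variation sources with sensitivities 3 ps and 4 ps.
sens = [3.0, 4.0]
sigma_independent = path_sigma(sens, [[1.0, 0.0], [0.0, 1.0]])
sigma_correlated = path_sigma(sens, [[1.0, 0.5], [0.5, 1.0]])
```

With independent sources the result is the root-sum-square of the sensitivities; a positive correlation between sources with same-sign sensitivities increases σj.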


Resilient Designs
• Detect and recover from timing errors
  → Ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality
  → E.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction.]

Conventional design: worst-case signoff → no Vdd downscaling
Resilient design: typical-case signoff → Vdd downscaling → reduced energy


Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy

\[ Energy = \frac{Power}{Throughput} \]

[Equation: Throughput expressed in terms of the error rate ER [Kahng10], the number of recovery cycles r, and the clock period T.]


Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints
  → Allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?


Process-Aware Vdd Scaling (PVS)

AVS classes and approaches
• Open-loop AVS (Freq & Vdd LUT)
  • AVS: pre-characterize LUT [Martin02]
  • Process-aware AVS: post-silicon characterization [Tschanz03]
• Closed-loop AVS
  • Generic monitor, design-dependent replica: process- and temperature-aware AVS
    • Generic on-chip monitor [Burd00]
    • Design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

[Figure axis: Power]


Challenge: Variability

[Figure: four panels comparing ideal scaling with non-ideality. DENSITY: transistor count [M] vs. MPU release date (source: [CPUDB]). POWER: dynamic power (W), 2009-2024 (source: [JeongK08]). DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling (source: [ITRS]). SUPPLY VOLTAGE: volt vs. MPU release date (source: [CPUDB]).]


Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
  → Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, for designs MUL and EXU, with the minimum achievable energy marked.]


Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design / overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: frequency vs. voltage plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign.]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within design cone → reduce overdesign


Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough → scalable

Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Adaptive search: Sample (SP&R) → Refine power models

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V), curves 1st, 2nd and real. Design: AES, f = 700MHz.]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits


Classes of Closed-Loop AVS

Closed-loop AVS: generic monitor, design-dependent replica, in-situ monitor
• Generic monitor: does not capture design-specific performance variation
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance


Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high resistance or low resistance.]


Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: stacked-bar charts of energy (mJ) for PM, BF and CO at small, medium and large margin, for designs MUL and EXU. Bar components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation.]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period


Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
  → Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, for designs MUL and EXU, with the minimum achievable energy marked.]


Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: Initial placement (all FFs = error-tolerant FFs); check Energy < min energy, then save current solution; SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization.]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • "Chicken and Egg" Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is "Signoff"
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

10ISVLSI-2014 invited talk 140710

Pessimism of Conventional BEOL Corners (CBC)

• Assumption: a max (setup) path pj is "safe" when its delay evaluated at a given CBC is larger than nominal delay + 3σj

dj(YCBC) ≥ dj(Ytyp) + 3σj

• For a given path, we can compare the statistical delay variation with the delay obtained from a given CBC

αj = 3σj / Δdj(YCBC)

Δdj(YCBC) = dj(YCBC) − dj(Ytyp), YCBC ∈ {Ycw, Ycb, Yrcw, Yrcb}

• Small αj → large pessimism of CBC

[Figure: dj(YCBC) − dj(Ytyp) vs. 3σj (delay 3σ); region of large pessimism marked]
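As a minimal sketch of the comparison on this slide, the following Python computes αj as the ratio of 3σj to Δdj(YCBC) and checks the "safe" condition. The function names and the numeric delays are hypothetical and chosen only for illustration; they are not taken from the talk.

```python
def delta_delay(d_cbc: float, d_typ: float) -> float:
    """Delta d_j(Y_CBC) = d_j(Y_CBC) - d_j(Y_typ)."""
    return d_cbc - d_typ


def scaling_factor(d_cbc: float, d_typ: float, sigma: float) -> float:
    """alpha_j = 3*sigma_j / Delta d_j(Y_CBC); small alpha means a pessimistic corner."""
    delta = delta_delay(d_cbc, d_typ)
    if delta <= 0:
        raise ValueError("corner delay must exceed the typical delay")
    return 3.0 * sigma / delta


def is_safe_at_corner(d_cbc: float, d_typ: float, sigma: float) -> bool:
    """Safety assumption: d_j(Y_CBC) >= d_j(Y_typ) + 3*sigma_j."""
    return d_cbc >= d_typ + 3.0 * sigma


# Hypothetical path: 1.00 ns at typical, 1.20 ns at the corner, sigma = 0.02 ns.
alpha = scaling_factor(1.20, 1.00, 0.02)
print(round(alpha, 3), is_safe_at_corner(1.20, 1.00, 0.02))
```

With these illustrative numbers the corner adds 0.20 ns while 3σ is only 0.06 ns, so α is well below 1 and the corner is strongly pessimistic for this path.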

12ISVLSI-2014 invited talk 140710

Intuition on Delay Variability Across Cw, RCw

[Figure: scatter plots of α per path. Axes: Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp), and Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp). Regions: dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations

In the following, α is defined at the dominant corner
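The idea of defining α at the dominant corner can be sketched as follows: for each path, pick whichever of the C-worst and RC-worst corners gives the larger Δdelay, and compute α there. This is an illustrative sketch only; the function name and the sample delays are hypothetical.

```python
def alpha_at_dominant_corner(d_typ, d_cw, d_rcw, sigma):
    """Return (dominant corner name, alpha evaluated at that corner)."""
    delta_cw = d_cw - d_typ      # delay increase at the C-worst corner
    delta_rcw = d_rcw - d_typ    # delay increase at the RC-worst corner
    if delta_cw >= delta_rcw:
        corner, delta = "C-worst", delta_cw
    else:
        corner, delta = "RC-worst", delta_rcw
    if delta <= 0:
        raise ValueError("neither corner is slower than typical")
    return corner, 3.0 * sigma / delta


# Hypothetical R-sensitive path: C-worst adds 0.02 ns, RC-worst adds 0.10 ns.
corner, alpha = alpha_at_dominant_corner(1.00, 1.02, 1.10, 0.02)
print(corner, round(alpha, 2))
```

For this made-up path, α evaluated at C-worst alone would be 3.0 (the corner underestimates 3σ), while at the dominant RC-worst corner it is about 0.6, so the path is covered.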

13ISVLSI-2014 invited talk 140710

Scaling Factor α and Delay Variation

• Paths with small Δdrcw and Δdcw have large α
• E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]

14ISVLSI-2014 invited talk 140710

Find Paths for Which TBCs Can Be Used

• Paths with small Δdrcw and Δdcw have large α
• E.g., there are paths with αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

Gtbc = set of paths that can be safely signed off using TBC: ((path with Δdcw larger than Acw) OR (path with Δdrcw larger than Arcw))

[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw marked]
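The Gtbc membership rule can be sketched as a simple filter over paths. In this illustrative Python, the `Path` record, the function names, and the threshold values passed in are all assumptions made for the example; only the OR rule itself comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    d_typ: float   # delay at the typical corner (ns)
    d_cw: float    # delay at the C-worst corner (ns)
    d_rcw: float   # delay at the RC-worst corner (ns)


def rel_delta_pct(d_corner: float, d_typ: float) -> float:
    """Relative delay increase at a corner, in percent of typical delay."""
    return 100.0 * (d_corner - d_typ) / d_typ


def split_paths(paths, a_cw_pct, a_rcw_pct):
    """Return (G_tbc, G_cbc): a path joins G_tbc if either threshold is exceeded."""
    g_tbc, g_cbc = [], []
    for p in paths:
        over_cw = rel_delta_pct(p.d_cw, p.d_typ) > a_cw_pct
        over_rcw = rel_delta_pct(p.d_rcw, p.d_typ) > a_rcw_pct
        (g_tbc if (over_cw or over_rcw) else g_cbc).append(p.name)
    return g_tbc, g_cbc


paths = [
    Path("p1", 1.0, 1.06, 1.02),  # large C-worst delta
    Path("p2", 1.0, 1.01, 1.02),  # small deltas at both corners
    Path("p3", 1.0, 1.01, 1.08),  # large RC-worst delta
]
# Illustrative thresholds in percent, not values asserted by the talk.
print(split_paths(paths, a_cw_pct=3.3, a_rcw_pct=5.0))
```

Paths left in the second group keep the conventional corners, since their small Δdelay goes with a large α.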

15ISVLSI-2014 invited talk 140710

Determining α, Arcw and Acw

• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC

[Figure: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with Acw and Arcw marked]
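One possible reading of the threshold extraction is sketched below: given sampled (Δdelay %, α) pairs at one corner, take the largest Δdelay at which any sampled path still exceeds the target α, then add the margin. This interpretation, the function name, and the sample data are assumptions for illustration; the talk does not spell out the exact procedure.

```python
def find_threshold(samples, alpha_target, margin_pct=1.0):
    """samples: list of (delta_pct, alpha) for representative paths at one corner.

    Returns a threshold A such that every sampled path with delta > A - margin
    has alpha <= alpha_target, plus a margin for sampling error.
    """
    unsafe = [delta for delta, alpha in samples if alpha > alpha_target]
    base = max(unsafe) if unsafe else 0.0
    return base + margin_pct


# Hypothetical representative paths: alpha falls as delta grows.
samples = [(1.0, 0.9), (2.0, 0.7), (3.0, 0.65), (4.0, 0.5), (6.0, 0.4)]
print(find_threshold(samples, alpha_target=0.6))
print(find_threshold(samples, alpha_target=0.45))
```

In this sketch a smaller target α pushes the threshold up, which matches the trend stated on the slide: fewer paths qualify for the tightened corner.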

16ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners

• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
• Tradeoff between reduced margin vs. paths which use TBC

Correlation factor γ = 0.5

[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-0.5, TBC-0.6, and TBC-0.7]

17ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

18ISVLSI-2014 invited talk 140710

How to Minimize Cost of Resilience

• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                   Razor          Razor-Lite     TIMBER
Power penalty      30 [Das08]     ~0 [Kim13]     100 [Choudhury09]
Area penalty       182 [Kim13]    33 [Kim13]     255 [Chen13]
Recovery cycles    5 [Wan09]      11 [Kim13]     0 [Choudhury09]

19ISVLSI-2014 invited talk 140710

Tradeoff Resilience Cost vs Datapath Cost

[Figure: datapath with normal FFs and Razor FFs (with error outputs) at the endpoints of a fanin cone. Optimizing the fanin cone of an endpoint with a tighter constraint lets its Razor FF be replaced by a normal FF]

Tradeoff: area (power) of fanin cone vs. area (power) with Razor overhead

[Figure: energy (mJ) vs. Razor FFs (300, 100, 50, 0), showing total energy, energy of non-resilient part, and resilience cost]

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

20ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization (SEOpt)

• Optimize fanin cone of an endpoint with tighter constraints
  • Allows replacement of Razor FF with normal FF
• Pick endpoints based on heuristic sensitivity functions
  • Vary endpoints, compare area/power penalty

Candidate Sensitivity Functions

$SF_1 = |slack(p)|$

$SF_2 = |slack(p)| \times num_{cri}(p)$

$SF_3 = |slack(p)| \times num_{cri}(p) / num_{total}(p)$

$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$

$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$

p: negative-slack endpoint
c: cells within fanin cone
num_cri: number of negative-slack cells
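To make the candidate sensitivity functions concrete, here is a small Python sketch that evaluates all five for one endpoint. It assumes the formulas use absolute slack, that num_cri counts negative-slack cells in the fanin cone, and that num_total is the total number of cells in the cone; the `Cell` and `Endpoint` records and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    slack: float   # ns; negative means timing-violating
    power: float   # mW


@dataclass
class Endpoint:
    slack: float        # ns; endpoint p has negative slack
    fanin: List[Cell]   # cells c within the fanin cone of p


def num_cri(p: Endpoint) -> int:
    """Number of negative-slack cells in the fanin cone."""
    return sum(1 for c in p.fanin if c.slack < 0)


def sf1(p): return abs(p.slack)
def sf2(p): return abs(p.slack) * num_cri(p)
def sf3(p): return abs(p.slack) * num_cri(p) / len(p.fanin) if p.fanin else 0.0
def sf4(p): return abs(p.slack) * sum(c.power for c in p.fanin)
def sf5(p): return sum(abs(c.slack) * c.power for c in p.fanin)


p = Endpoint(-0.1, [Cell(-0.1, 2.0), Cell(0.05, 1.0), Cell(-0.02, 3.0), Cell(0.2, 4.0)])
for f in (sf1, sf2, sf3, sf4, sf5):
    print(f.__name__, round(f(p), 4))
```

Endpoints could then be ranked by any one of these scores to decide which fanin cones to optimize first; which function works best is an empirical question that the slide leaves to experiments.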

21ISVLSI-2014 invited talk 140710

Clock Skew Optimization (SkewOpt)

• Increase slacks on timing-critical and/or frequently-exercised paths

1. Generate sequential graph
2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized

$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p, q)}$

Slack_pq: setup slack of path p-q
β: weighting factor
TG(p, q): toggle rate of path p-q

[Figure: sequential graph FF1 → FF2 → FF3 → FF1 (data paths and clock tree) with edge weights W12, W23, W31; after adjusting clock latencies every edge on the cycle has weight W', where W' = average weight on cycle]
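The edge weighting and the per-cycle averaging can be sketched as below, assuming the weight is the setup slack divided by (1 + β × toggle rate), which is how the slide's formula reads. The function names, β, and the slack and toggle values are hypothetical; the cycle search itself is not shown.

```python
def edge_weight(slack: float, toggle_rate: float, beta: float) -> float:
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def cycle_average(weights):
    """W' = average weight on a cycle; after skew adjustment each edge gets W'."""
    return sum(weights) / len(weights)


# Hypothetical 3-FF cycle FF1 -> FF2 -> FF3 -> FF1: (setup slack in ns, toggle rate).
edges = {"W12": (0.30, 0.5), "W23": (0.05, 0.0), "W31": (0.20, 0.5)}
beta = 2.0
weights = {name: edge_weight(s, tg, beta) for name, (s, tg) in edges.items()}
print(weights, round(cycle_average(list(weights.values())), 4))
```

In this sketch a frequently toggling path with positive slack gets a smaller weight, so it looks more critical to the cycle search than an idle path with the same slack.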

22ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Initial placement (all FFs = error-tolerant FFs)

SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs

SkewOpt: activity-aware clock skew optimization

Energy < min energy? If yes, save current solution

OR-tree insertion

23ISVLSI-2014 invited talk 140710

Benefit of Low-Cost Resilience

• Reference flows
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU under small, medium, and large margins, broken down into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small / medium / large margin: 1σ / 2σ / 3σ for SS corner
Technology: foundry 28nm

24ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience with AVS

• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt designs; panels MUL and EXU; minimum achievable energy marked]

Technology: foundry 28nm

25ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops → Less Margin

• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS, in which an on-chip aging monitor tracks circuit performance and a voltage regulator sets the circuit Vdd. Plots of circuit frequency vs. time and Vdd vs. time, with and without AVS, relative to the target]

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: loop of Step 1 (BTI degradation and AVS), Step 2 (derated library), and Step 3 (circuit implementation and signoff), linked by VBTI, |Vt|, Vlib, the circuit, and Vfinal]

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: loop of Step 1 (BTI degradation and AVS), Step 2 (derated library), and Step 3 (circuit implementation and signoff), linked by VBTI, |Vt|, Vlib, the circuit, and Vfinal]

No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs Area Across Different Signoffs

"Knee" point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI curves]

VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
• Assume timing slack = 0

[Figure annotation: 10mV]

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation

Flow:
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Sign off circuit with derated library

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• (4 designs × {DC, AC} × derating methods)

[Figure: power vs. area for circuits signed off with the proposed method and with other derated libraries; "knee" point for balanced area and power tradeoff]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign

[Figure: Vdd vs. time schedules alternating between nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD]

[Figure: power of circuits with different overdrive modes; different overdrive modes: 26% power range; fix fOD: still 14% power range]

fnom = 800MHz, Vnom = 0.8V

37ISVLSI-2014 invited talk 140710

Also: Tunable Monitors → Less Margin

Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions

• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout, IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) across implementations 1 to 9, ranging from highest invested guardband to least invested guardband; 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

44ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

Example: worst-case capacitance corner

[Figure: distributions of M1 C and M2 C with the C-3σ point of layer M1 and of layer M2; for an interconnect stack with M1 and M2, the homogeneous Cw corner combines both per-layer 3σ corners, which introduces pessimism]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers

45ISVLSI-2014 invited talk 140710

Correlation Matrix

• Let Σ be the correlation matrix for variation sources

Σ =
           M1           M2           M3           M4
           ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH
M1  ΔW     1  0  0      γ  0  0      γ  0  0      0  0  0
    ΔT     0  1  0      0  γ  0      0  γ  0      0  0  0
    ΔH     0  0  1      0  0  γ      0  0  γ      0  0  0
M2  ΔW     γ  0  0      1  0  0      γ  0  0      0  0  0
    ΔT     0  γ  0      0  1  0      0  γ  0      0  0  0
    ΔH     0  0  γ      0  0  1      0  0  γ      0  0  0
M3  ΔW     γ  0  0      γ  0  0      1  0  0      0  0  0
    ΔT     0  γ  0      0  γ  0      0  1  0      0  0  0
    ΔH     0  0  γ      0  0  γ      0  0  1      0  0  0
M4  ΔW     0  0  0      0  0  0      0  0  0      1  0  0
    ΔT     0  0  0      0  0  0      0  0  0      0  1  0
    ΔH     0  0  0      0  0  0      0  0  0      0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
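A correlation matrix with this structure can be generated programmatically. The sketch below assumes, as the matrix pattern suggests, that M1 to M3 share one process module and M4 belongs to another; the module labels "A" and "B" and the function name are hypothetical.

```python
def build_correlation(layers, module_of, kinds=("W", "T", "H"), gamma=0.5):
    """Build the correlation matrix over (layer, variation type) sources.

    Same source: 1. Same variation type and same process module: gamma. Else: 0.
    """
    sources = [(layer, kind) for layer in layers for kind in kinds]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (li, ki) in enumerate(sources):
        for j, (lj, kj) in enumerate(sources):
            if i == j:
                sigma[i][j] = 1.0
            elif ki == kj and module_of[li] == module_of[lj]:
                sigma[i][j] = gamma
    return sources, sigma


layers = ["M1", "M2", "M3", "M4"]
module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}  # assumed grouping
sources, sigma = build_correlation(layers, module_of)
for (layer, kind), row in zip(sources, sigma):
    print(layer, kind, row)
```

Setting gamma to 0 in this sketch gives the fully independent model, in which the matrix reduces to the identity.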

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); cumulative probability 0.92 at a ratio of 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: scatter plots of α per path. Axes: Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp), and Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp). Regions: dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2; distributions of M1 C and M2 C; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: 3σj/d(Ytyp) × 100% vs. Δdj(Yrcw)/dj(Ytyp) × 100%]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield?

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%) per layer for critical paths]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

Flow:
1. Routed design
2. Timing analysis at BEOL corners (Ytyp, Ycw, Yrcw); split paths into GTBC and GCBC
3. GTBC: timing analysis using TBC; if violations remain, ECO using TBC and repeat
4. GCBC: timing analysis using CBC; if violations remain, ECO using CBC and repeat
5. Done when violations = 0

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                       LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)      1.8       2.0       3.1
Gate count             232K      575K      1031K
Utilization (%)        84        79        82
Core area (mm2)        0.45      1.04      1.91
Max transition (ps)    330       330       330

Correlation factor = 0.5

           α      Acw (%)   Arcw (%)
TBC-0.5    0.5    4.3       7.3
TBC-0.6    0.6    3.3       5.0
TBC-0.7    0.7    3.0       3.4

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1 term: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1 term: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: α vs. Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) for LEON3MP, NETCARD, and SUPERBLUE12, with the α > 0.5 region marked in each]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs

and library

Vstep: AVS voltage step
Vfinal: converged voltage
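The time-stepped loop of the reference flow can be sketched with toy models. Everything numeric below is an assumption for illustration: the power-law aging model, the alpha-power-law delay model, their constants, and the simplification that aging at each step is evaluated as if the current Vdd had been applied for the whole elapsed time. A real flow would instead re-characterize libraries and rerun STA at every step.

```python
def delta_vth(t_years, vdd, k=0.03, n=1.0 / 6.0, gamma=3.0):
    """Toy BTI model (hypothetical constants): shift grows with voltage and time."""
    return k * (vdd ** gamma) * (t_years ** n)


def delay(vdd, vth, a=1.3):
    """Toy alpha-power-law delay in arbitrary units (hypothetical exponent)."""
    return vdd / (vdd - vth) ** a


def reference_flow(v_init=0.90, v_max=1.05, v_step=0.01,
                   vth0=0.30, t_final=10.0, dt=0.5):
    """Step through the lifetime, raising Vdd by v_step until timing is met.

    Returns (Vfinal, history); Vfinal is None if Vdd would exceed v_max,
    meaning the circuit must be modified and the flow restarted.
    """
    target = delay(v_init, vth0)      # fresh circuit just meets timing at v_init
    vdd, history = v_init, []
    for i in range(1, int(round(t_final / dt)) + 1):
        t = i * dt
        while delay(vdd, vth0 + delta_vth(t, vdd)) > target:
            vdd = round(vdd + v_step, 6)   # AVS raises Vdd by one step
            if vdd > v_max:
                return None, history
        history.append((t, vdd))
    return vdd, history


v_final, history = reference_flow()
print(v_final, history[:3])
```

With these toy constants most of the Vdd increase occurs in the first few time steps, and lowering `v_max` makes the flow report that the circuit has to be modified.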

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

Case        1       2       3      4        5    6        7
Vlib (V)    Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)    Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop linking the circuit, Vfinal, Vlib and VBTI, and the derated libraries]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time, alternating ON and OFF phases [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time [TCAS'14]

65ISVLSI-2014 invited talk 140710

Observation 1

• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd vs. time approaching Vfinal; ≈70% of Vdd increment in 1 year, remaining 30% over 9 years] [Chan11]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                                   Vlib for circuit performance estimation
VBTI for aging estimation          Min Vdd                              Max Vdd
Min Vdd                            Slowest circuit, less aging          Not applicable (optimistic)
Max Vdd                            Slowest circuit, worst-case aging    Faster circuit, worst-case aging
                                   (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES; points labeled by signoff case (1 to 8) for non-EM-aware designs, after fixing (Mishra), and after fixing (Black's); the range runs from optimistic about AVS to pessimistic about AVS]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^2$

[Figure: lifetime (year) and Vfinal (V) across implementations 1 to 9; 30% MTTF penalty; 200mV voltage compensation]
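The stepwise MTTF update can be written as a short recurrence. This sketch applies the update from the slide over a Vdd schedule; the function names, the initial MTTF, and the schedule are hypothetical values chosen for illustration.

```python
def update_mttf(mttf_prev: float, vdd_prev: float, vdd_now: float) -> float:
    """MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_over_schedule(mttf0: float, vdd_schedule):
    """Apply the update for each consecutive pair of Vdd values in the schedule."""
    mttf = mttf0
    for prev, now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, prev, now)
    return mttf


# Hypothetical: 10-year MTTF at 0.90 V, with AVS stepping Vdd up to 1.00 V.
print(round(mttf_over_schedule(10.0, [0.90, 0.95, 1.00]), 3))
```

Because the per-step ratios telescope, the result depends only on the first and last Vdd in the schedule, so here the MTTF scales by (0.90/1.00)^2.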

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year for AVS schedules S1 to S5 (design DMA), and MTTF (year) for S1 to S5; annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff; starting from nominal Vdd, margins for static IR drop, power grid IR gradient, dynamic IR, and HCI/NBTI lead to the signoff Vdd; operating voltage marked on the voltage axis]

Problem: margins = pessimism → overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

Δdjv = [dj(Yv) − dj(Ytyp)] / 3

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv

Note: path delay includes gate and wire delays

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
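The two statistical timing slides can be sketched in a few lines of Python. This is only an illustration of the stated recipe (per-source sensitivities from 3σ corners, a Cholesky factor of the correlation matrix, then a root-sum-square); the function names and the small two-source example are assumptions, and a real flow would take the delays from RC extraction and STA runs.

```python
import math


def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma must be positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(d_corner, d_typ, sigma):
    """Standard deviation of one path's delay.

    d_corner[v] is the path delay at the 3-sigma corner of source v,
    d_typ is the delay at the typical corner.
    """
    # Sensitivity per source: [d(Yv) - d(Ytyp)] / 3
    delta = [(d - d_typ) / 3.0 for d in d_corner]
    lam = cholesky(sigma)
    n = len(delta)
    # Row vector times lam: sensitivities with correlation folded in
    delta_corr = [sum(delta[i] * lam[i][v] for i in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in delta_corr))


if __name__ == "__main__":
    independent = [[1.0, 0.0], [0.0, 1.0]]
    correlated = [[1.0, 0.5], [0.5, 1.0]]
    print(path_sigma([103.0, 103.0], 100.0, independent))
    print(path_sigma([103.0, 103.0], 100.0, correlated))
```

With positive correlation between sources, the same sensitivities give a larger path sigma than in the independent case.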

Resilient Designs

• Detect and recover from timing errors. Ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design]

Conventional design: worst-case signoff, no Vdd downscaling

Resilient design: typical-case signoff, Vdd downscaling, reduced energy

15% reduction

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
  • Estimation of design energy:

$$\mathit{Energy} = \frac{\mathit{Power}}{\mathit{Throughput}}$$

$$\mathit{Throughput} = \frac{1 - ER}{T} + \frac{1 - ER}{r \times T}$$

where r = recovery cycles, T = clock period, ER = error rate [Kahng10]

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints. Allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?

Process-Aware Vdd Scaling (PVS)

AVS classes and AVS approaches (figure, ordered by power):

• Open-loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process and temperature-aware AVS, generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

Challenge Variability

[Figures: ideal scaling vs. non-ideality in four metrics]

• DENSITY: transistor count [M] vs. MPU release date (1998 to 2011). Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date (1995 to 2016). Source: [CPUDB]

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs. Resilience benefits increase in the context of AVS strategy

[Figures: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU, with minimum achievable energy marked]

Our Concept Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design, overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: frequency vs. voltage with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

Our Method Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable

Global optimization flow (adaptive search): Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V) for 1st, 2nd, and real. Design: AES, f = 700MHz]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

Classes of Closed-Loop AVS

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: RO stages with 1-bit control pins selecting high resistance or low resistance]

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figures: energy (mJ) of PM, BF, and CO at small, medium, and large margin for MUL and EXU, split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5/10/15% of clock period

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs. Resilience benefits increase in AVS context

[Figures: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU, with minimum achievable energy marked]

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Flow (figure):
• Initial placement (all FFs = error-tolerant FFs)
• Energy < min energy? Save current solution
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • ldquoChicken and Eggrdquo Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is ldquoSignoffrdquo
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

Intuition on Delay Variability Across Cw RCw

[Figures: α vs. Δdelay (vs. typ) at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay (vs. typ) at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner

α < 1.0 here: delay variations covered by RC-worst corner

Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst

Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

Intuition on Delay Variability Across Cw RCw

[Figures: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner

α < 1.0: delay variations are covered by the RC-worst corner

Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst

Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations

In the following, α is defined at the dominant corner

Scaling Factor α and Delay Variation

• Paths with small Δdrcw and Δdcw have large α
  • E.g., here we see αj > 0.6 when ((Δdrcw < 3) AND (Δdcw < 3))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]

Find Paths for Which TBCs Can Be Used

• Paths with small Δdrcw and Δdcw have large α
  • E.g., there are αj > 0.6 when ((Δdrcw < 3) AND (Δdcw < 3))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw]

Gtbc = set of paths that can be safely signed off using TBC = ((paths with Δdcw larger than Acw) OR (paths with Δdrcw larger than Arcw))
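The Gtbc membership rule can be sketched as a simple filter. This is a minimal illustration, assuming Δd is expressed as a percentage of the typical-corner delay; the function name, the path tuples, and the threshold values are hypothetical.

```python
def split_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc) by the rule on this slide.

    paths: iterable of (name, d_typ, d_cw, d_rcw) delays at the typical,
           C-worst, and RC-worst corners.
    a_cw, a_rcw: thresholds on the relative delay change, in percent.
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        dd_cw = 100.0 * (d_cw - d_typ) / d_typ      # delta-delay at C-worst (%)
        dd_rcw = 100.0 * (d_rcw - d_typ) / d_typ    # delta-delay at RC-worst (%)
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)   # large corner delta: tightened corner is safe
        else:
            g_cbc.append(name)   # small corner delta: keep conventional corner
    return g_tbc, g_cbc


if __name__ == "__main__":
    example = [
        ("p1", 1.00, 1.06, 1.02),
        ("p2", 1.00, 1.01, 1.02),
        ("p3", 2.00, 2.02, 2.20),
    ]
    print(split_paths(example, a_cw=4.0, a_rcw=7.0))
```

Paths left out of Gtbc are the ones with small corner deltas, which the earlier slides associate with large α.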

Determining α Arcw and Acw

• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α: larger thresholds (Arcw and Acw), fewer paths in GTBC

[Figures: α vs. Δd at RC-worst corner (%) with threshold Arcw, and α vs. Δd at C-worst corner (%) with threshold Acw]

Benefits of Tightened BEOL Corners

• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
• Tradeoff between reduced margin vs. paths which use TBC

Correlation factor γ = 0.5

[Figures: WNS (ns), TNS (ns), and timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-0.5, TBC-0.6, and TBC-0.7]

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

How to Minimize Cost of Resilience

• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                   Razor         Razor-Lite   TIMBER
Power penalty      30 [Das08]    ~0 [Kim13]   100 [Choudhury09]
Area penalty       182 [Kim13]   33 [Kim13]   255 [Chen13]
Recovery cycles    5 [Wan09]     11 [Kim13]   0 [Choudhury09]

Tradeoff Resilience Cost vs Datapath Cost

[Figure: an endpoint Razor FF (with error output) is replaced by a normal FF after optimizing its fanin cone w/ tighter constraint; area (power) of fanin cone is traded against area (power) w/ Razor overhead]

[Figure: energy (mJ) vs. Razor FFs (300, 100, 50, 0): total energy, energy of non-resilient part, and resilience cost; tradeoff between Razor FFs (resilience cost) and power/area of fanin circuits]

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

Selective-Endpoint Optimization (SEOpt)

• Optimize fanin cone of an endpoint w/ tighter constraints. Allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions. Vary endpoints, compare area/power penalty

Candidate Sensitivity Functions

$$SF_1 = |slack(p)|$$

$$SF_2 = |slack(p)| \times Num_{cri}(p)$$

$$SF_3 = |slack(p)| \times \frac{Num_{cri}(p)}{Num_{total}(p)}$$

$$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$$

$$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$$

p: negative slack endpoint; c: cells within fanin cone; Numcri: number of negative slack cells
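For concreteness, the five candidate sensitivity functions can be evaluated for one endpoint as in the sketch below. The data layout (a list of per-cell slack and power pairs) and the function name are hypothetical, and Numtotal is assumed here to be the total number of cells in the fanin cone, which the slide does not state. How the resulting scores are ranked is also left open.

```python
def sensitivity_functions(slack_p, fanin):
    """Evaluate SF1..SF5 for an endpoint p.

    slack_p: timing slack of endpoint p (negative for a violating endpoint).
    fanin:   list of (slack_c, power_c) for cells c in the fanin cone of p.
    """
    abs_slack = abs(slack_p)
    num_total = len(fanin)                            # assumed meaning of Numtotal
    num_cri = sum(1 for s, _ in fanin if s < 0)       # negative-slack cells
    total_power = sum(p for _, p in fanin)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total if num_total else 0.0,
        "SF4": abs_slack * total_power,
        "SF5": sum(abs(s) * p for s, p in fanin),
    }


if __name__ == "__main__":
    cone = [(-0.1, 2.0), (0.3, 1.0), (-0.2, 4.0), (0.05, 1.0)]
    for name, value in sensitivity_functions(-0.2, cone).items():
        print(name, value)
```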

Clock Skew Optimization (SkewOpt)

• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized

$$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p, q)}$$

where Slack_pq is the setup slack of path p-q, β is a weighting factor, and TG(p, q) is the toggle rate of path p-q

[Figure: sequential graph of FF1, FF2, FF3 with data path weights W12, W23, W31 and the clock tree; after adjustment each edge carries W′ = average weight on cycle]
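The edge weight and the average cycle weight W′ can be illustrated with a short sketch. It covers only the weight arithmetic, not the cycle search or the clock latency adjustment; the dictionary layout, function names, and numeric values are hypothetical.

```python
def path_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def mean_cycle_weight(cycle, slack, toggle, beta):
    """Average weight W' over the edges of a cycle of flip-flops.

    cycle:  ordered list of FF names; the last FF connects back to the first.
    slack:  dict mapping (p, q) to the setup slack of path p-q.
    toggle: dict mapping (p, q) to the toggle rate of path p-q.
    """
    edges = list(zip(cycle, cycle[1:] + cycle[:1]))
    weights = [path_weight(slack[e], toggle[e], beta) for e in edges]
    return sum(weights) / len(weights)


if __name__ == "__main__":
    cycle = ["FF1", "FF2", "FF3"]
    slack = {("FF1", "FF2"): 0.3, ("FF2", "FF3"): -0.1, ("FF3", "FF1"): 0.2}
    toggle = {("FF1", "FF2"): 0.5, ("FF2", "FF3"): 1.0, ("FF3", "FF1"): 0.0}
    print(mean_cycle_weight(cycle, slack, toggle, beta=1.0))
```

For positive slack, a higher toggle rate lowers the weight, so frequently exercised paths look more critical than their raw slack suggests.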

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Flow (figure):
• Initial placement (all FFs = error-tolerant FFs)
• Energy < min energy? Save current solution
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• OR-tree insertion

Benefit of Low-Cost Resilience

• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figures: energy (mJ) of PM, BF, and CO at small, medium, and large margin for MUL and EXU, split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm

Increased Benefit of Resilience with AVS

• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs. Resilience benefits increase in the context of AVS strategy

[Figures: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt; panels MUL and EXU, with minimum achievable energy marked]

Technology: foundry 28nm

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

Breaking Chicken-Egg Loops Less Margin

• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS; an on-chip aging monitor tracks circuit performance and drives the voltage regulator that supplies the circuit]

[Figure: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to target]

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: Step 1: VBTI and Vlib give |Vt| and the derated library; Step 2: circuit implementation and signoff give the circuit; Step 3: BTI degradation and AVS give Vfinal]

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: Step 1: VBTI and Vlib give |Vt| and the derated library; Step 2: circuit implementation and signoff give the circuit; Step 3: BTI degradation and AVS give Vfinal]

No obvious guideline to define VBTI and Vlib

Inconsistency among Vfinal, Vlib, VBTI

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads? Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

Power vs Area Across Different Signoffs

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
• Assume timing slack = 0

10mV

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

Flow (figure): obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → signoff circuit with derated library

Power vs Area for All Designs

• (4 designs x {DC, AC} x derating methods)

[Figure: power vs. area; proposed method vs. circuits signed off using other derated libraries; “knee” point for balanced area and power tradeoff]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Also Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign

[Figure: Vdd vs. time alternating between NOM and OD modes (tnom, tOD)]

[Figure: power of circuits w/ different overdrive modes. Different overdrive modes: 26% power range. Fix fOD: still 14% power range. fnom = 800MHz, Vnom = 0.8V]


Also Tunable Monitors Less Margin

Aggressive config: Vmin_est < Vmin_chip. Some chips will fail

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

Conclusions

• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law


Thank You


Backup

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) vs. implementation (1 to 9), with the highest and least invested guardband marked]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

14% power penalty

>

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2; the homogeneous Cw corner combines C-3σ of layer M1 and C-3σ of layer M2, giving 3σ pessimism]

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2; the homogeneous Cw corner combines C-3σ of layer M1 and C-3σ of layer M2, giving pessimism]

When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers
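A small numeric sketch shows why the pessimism grows with the layer count. It assumes, purely for illustration, that the stack capacitance variation is the sum of per-layer contributions with a single pairwise correlation γ between layers; the function name and the sigma values are hypothetical.

```python
import math


def corner_vs_statistical(sigmas, gamma):
    """Compare a homogeneous 3-sigma corner with the statistical 3-sigma value.

    sigmas: per-layer standard deviations of the capacitance contribution.
    gamma:  assumed pairwise correlation between any two layers (0..1).
    Returns (homogeneous, statistical).
    """
    # Homogeneous corner: every layer sits at its own 3-sigma point at once.
    homogeneous = 3.0 * sum(sigmas)
    # Statistical view: variance of a sum of correlated contributions.
    variance = sum(s * s for s in sigmas)
    for i in range(len(sigmas)):
        for j in range(i + 1, len(sigmas)):
            variance += 2.0 * gamma * sigmas[i] * sigmas[j]
    statistical = 3.0 * math.sqrt(variance)
    return homogeneous, statistical


if __name__ == "__main__":
    for n_layers in (2, 4, 9):
        hom, stat = corner_vs_statistical([1.0] * n_layers, gamma=0.0)
        print(n_layers, hom / stat)
```

With fully correlated layers (γ = 1) the two values coincide; with uncorrelated, equal-sigma layers the ratio is the square root of the number of layers.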

Correlation Matrix

• Let Σ be the correlation matrix for variation sources

Σ =
            M1          M2          M3          M4
            ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW      1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT      0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH      0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW      γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT      0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH      0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW      γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT      0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH      0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW      0  0  0     0  0  0     0  0  0     1  0  0
    ΔT      0  0  0     0  0  0     0  0  0     0  1  0
    ΔH      0  0  0     0  0  0     0  0  0     0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
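The matrix on this slide follows a simple rule, which the sketch below encodes: 1 on the diagonal, γ between sources of the same type on different layers of the same process module, and 0 otherwise. The module names "A" and "B" and the function name are hypothetical; the grouping of M1 to M3 into one module and M4 into another is read off the matrix itself.

```python
def build_correlation(layers, var_types, module_of, gamma):
    """Return (labels, sigma) for the variation-source correlation matrix."""
    labels = [(layer, vt) for layer in layers for vt in var_types]
    n = len(labels)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (layer_i, type_i) in enumerate(labels):
        for j, (layer_j, type_j) in enumerate(labels):
            if i == j:
                sigma[i][j] = 1.0
            elif type_i == type_j and module_of[layer_i] == module_of[layer_j]:
                # Same variation type, same process module, different layer
                sigma[i][j] = gamma
    return labels, sigma


if __name__ == "__main__":
    layers = ["M1", "M2", "M3", "M4"]
    var_types = ["dW", "dT", "dH"]
    module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}  # hypothetical names
    labels, sigma = build_correlation(layers, var_types, module_of, gamma=0.5)
    for label, row in zip(labels, sigma):
        print(label, row)
```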

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); 0.92 at 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation


Delay Variation

[Figures: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner

α < 1.0: delay variations are covered by the RC-worst corner

Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst

Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 == Cw (3σ), M2 == Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic. Most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: 3σj / d(Ytyp) x 100 vs. Δdj(Yrcw) / dj(Ytyp) x 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

Flow (figure): routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC → timing analysis using TBC (GTBC) and using CBC (GCBC) → ECO using TBC or CBC until violation = 0 → done

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity):

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     18        20        31
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5:

           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   43        73
TBC-0.6    0.6   33        50
TBC-0.7    0.7   30        34

Implement another NETCARD (clock period = 23ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • Coefficient of ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • Coefficient of ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: α results for LEON3MP, SUPERBLUE12, and NETCARD; the region with α > 0.5 is marked in each plot]

• Similar trends in different designs

• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns

• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime

• Aging of a flat Vfinal ≈ aging of an adaptive Vdd

• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI stress; VBTI = Vlib ≈ Vfinal]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at the early design stage (the design has not been implemented)

• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0

• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types

• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout circuit lifetime

• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage

• No overhead in timing analysis, but very slow: many STA runs and library

Vstep: AVS voltage step
Vfinal: converged voltage
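The loop described on this slide can be sketched in Python as follows. This is only a toy illustration under stated assumptions, not the flow used in the talk: `toy_delay`, `toy_aging`, and every numeric constant are hypothetical stand-ins for STA with derated libraries and for a real BTI model.

```python
def reference_signoff(delay, aging, t_steps, v_init, v_step, v_max, t_clk):
    """Step through lifetime, raising Vdd whenever aged timing fails.

    delay(vdd, dvth) and aging(t, vdd) are caller-supplied models
    (hypothetical here). Returns a list of (t, Vdd) pairs, or None if
    the required Vdd exceeds v_max (the circuit must be modified).
    """
    vdd = v_init
    schedule = []
    for t in t_steps:
        # Estimate aging at this time step; raise Vdd by one AVS step
        # at a time until the aged circuit meets the clock period.
        while delay(vdd, aging(t, vdd)) > t_clk:
            vdd += v_step
            if vdd > v_max + 1e-9:
                return None
        schedule.append((t, vdd))
    return schedule


# Toy models (assumptions, for illustration only).
def toy_delay(vdd, dvth, vth0=0.3):
    return 1.0 / (vdd - vth0 - dvth)        # arbitrary delay units


def toy_aging(t_years, vdd, a=0.02):
    return a * vdd * t_years ** (1.0 / 6)   # |dVth| grows sublinearly in time


t_clk = 1.0 / 0.6                           # zero slack at Vdd = 0.90 V, t = 0
sched = reference_signoff(toy_delay, toy_aging, range(1, 11),
                          0.90, 0.01, 1.05, t_clk)
v_final = sched[-1][1]
```

In this toy setup most of the Vdd increase occurs in the first two time steps, and the loop returns `None` when the converged voltage would exceed the allowed maximum.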

62ISVLSI-2014 invited talk 140710

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop between circuit, Vfinal, and derated libraries (Vlib, VBTI)]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time, alternating ON / OFF / ON / OFF] [VattikondaWC06]

Device aging (|ΔVth|) accumulates over time [TCAS’14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon

• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)

• Most of the Vdd increment happens in early lifetime

• Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd over lifetime approaching Vfinal; ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years)]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

[Figure annotation: good corners]

Setups
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging

• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis

• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Columns: Vlib for circuit performance estimation. Rows: VBTI for aging estimation.

                 Vlib = Min Vdd                        Vlib = Max Vdd
VBTI = Min Vdd   Slowest circuit, less aging           Not applicable (optimistic)
VBTI = Max Vdd   Slowest circuit, worst-case aging     Faster circuit, worst-case aging
                 (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES, with numbered points 1 to 8 for non-EM-aware signoff, after fixing (Mishra), and after fixing (Black's); regions annotated as optimistic about AVS and pessimistic about AVS]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$

[Figure annotations: 30% MTTF penalty; 200mV voltage compensation]
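The per-step MTTF update on this slide can be illustrated with a short Python sketch. The starting MTTF and the Vdd schedule below are hypothetical values chosen for illustration, not data from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))^2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_over_schedule(mttf0, vdd_schedule):
    """Apply the update along a sequence of AVS voltages."""
    mttf = mttf0
    history = [mttf]
    for v_prev, v_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_new)
        history.append(mttf)
    return history


# Hypothetical example: 10-year initial MTTF, Vdd raised from 1.0 V to 1.2 V.
history = mttf_over_schedule(10.0, [1.0, 1.1, 1.2])
penalty = 1.0 - history[-1] / history[0]
```

Because the update telescopes, the final MTTF depends only on the first and last voltages. Under the assumed 1.0 V starting point, a 200mV increase gives (1.0/1.2)^2 ≈ 0.69, a penalty of roughly 30%.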

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year and MTTF (year) for AVS schedules S1 to S5 (DMA); annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff on a voltage axis, with levels for nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd, and operating voltage]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions

• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 x 27)

• Σ = λλ^T (Note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$[\Delta d'_{j1}, \dots, \Delta d'_{j27}] = [\Delta d_{j1}, \dots, \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:

$\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
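The two statistical timing slides can be combined into a small pure-Python sketch: per-source sensitivities from 3σ corner delays, a Cholesky factor of the correlation matrix, and the resulting path sigma. The two-source example data are hypothetical; the talk uses 27 variation sources.

```python
import math


def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (Cholesky-Banachiewicz)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(d_corner, d_typ, sigma):
    """Standard deviation of one path's delay.

    d_corner[v] is the path delay with variation source v at its 3-sigma
    corner; d_typ is the delay at the typical corner.
    """
    delta = [(d - d_typ) / 3.0 for d in d_corner]   # per-sigma sensitivities
    lam = cholesky(sigma)
    n = len(delta)
    # Row vector of sensitivities times lam gives correlated sensitivities.
    delta_corr = [sum(delta[v] * lam[v][u] for v in range(n)) for u in range(n)]
    return math.sqrt(sum(x * x for x in delta_corr))


# Hypothetical two-source example with correlation factor 0.5.
corr = [[1.0, 0.5], [0.5, 1.0]]
sigma_j = path_sigma([103.0, 104.0], 100.0, corr)
```

With an identity correlation matrix the result reduces to the root-sum-square of the individual sensitivities.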

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)

• Trade off design robustness vs. design quality, e.g., enable margin reduction

• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction]

Conventional design: worst-case signoff, no Vdd downscaling

Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers

• Objective: implement design to minimize energy
• Estimation of design energy:

$Energy = \frac{Power}{Throughput}$

where Throughput is determined by the error rate ER [Kahng10], the clock period T, and the number of recovery cycles r.

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints; allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization

• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches [Figure: classes arranged along a power axis]

• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process- and temperature-aware AVS, generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure, four panels, each contrasting ideal scaling with non-ideality:
DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
POWER: dynamic power (W) by year, 2009 to 2024. Source: [JeongK08]
DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling. Source: [ITRS]
SUPPLY VOLTAGE: volt vs. MPU release date. Source: [CPUDB]]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs; resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU, comparing brute-force, pure-margin, and CombOpt; minimum achievable energy marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each others’ design cone

[Figure: frequency vs. voltage, showing the design cone of mode A with modes A, B, and C; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable

[Flowchart: global optimization flow. Sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R), refine power models]

[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V), with curves 1st, 2nd, and real. Design: AES, f = 700MHz]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)

• Calibrating monitors at multiple modes/voltages requires long test time

• Generic monitor does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)

• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages with 1-bit control pins selecting high resistance or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints

• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods

• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margin, broken down into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs, hence reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs; resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU, comparing brute-force, pure-margin, and CombOpt; minimum achievable energy marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution]


12ISVLSI-2014 invited talk 140710

Intuition on Delay Variability Across Cw, RCw

[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0, so a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner

• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations

In the following, α is defined at the dominant corner

13ISVLSI-2014 invited talk 140710

Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α

• E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))

• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α as a function of Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]

14ISVLSI-2014 invited talk 140710

Find Paths for Which TBCs Can Be Used

• Paths with small Δdrcw and Δdcw have large α

• E.g., there are αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))

• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α as a function of Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw marked]

Gtbc = set of paths that can be safely signed off using TBC: (paths with Δdcw larger than Acw) OR (paths with Δdrcw larger than Arcw)
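The Gtbc membership rule on this slide amounts to a simple threshold test per path. The sketch below is a minimal illustration; the path names, Δdelay percentages, and threshold values are hypothetical.

```python
def split_paths(paths, a_cw, a_rcw):
    """Split paths into (G_tbc, G_cbc).

    paths: iterable of (name, dd_cw_pct, dd_rcw_pct), where the two numbers
    are the relative delta-delays (%) at the C-worst and RC-worst corners.
    A path goes to G_tbc if either delta-delay exceeds its threshold.
    """
    g_tbc, g_cbc = [], []
    for name, dd_cw, dd_rcw in paths:
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Hypothetical paths and thresholds, for illustration only.
paths = [("p1", 1.2, 2.0), ("p2", 5.1, 3.0), ("p3", 2.9, 6.0), ("p4", 3.3, 5.0)]
g_tbc, g_cbc = split_paths(paths, a_cw=3.3, a_rcw=5.0)
```

Paths that land in `g_cbc` keep the conventional corners; only those in `g_tbc` are fixed and signed off with the tightened corners.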

15ISVLSI-2014 invited talk 140710

Determining α Arcw and Acw

[Figure: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw marked]

• Assumption: critical paths in different designs have similar trends

• Extract Arcw and Acw from a set of representative paths

• Plot α vs. Δdelay, find Arcw and Acw for a given α

• Add +1% margin on Arcw and Acw to account for sampling error

• Smaller α: larger thresholds (Arcw and Acw), fewer paths in GTBC
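One plausible reading of the threshold extraction on this slide is sketched below. The slide does not spell out the exact procedure, so the rule used here (take the largest Δdelay among sampled paths whose α exceeds the target, then add the 1% margin) is an assumption, and the sample points are hypothetical.

```python
def find_threshold(samples, alpha_target, margin_pct=1.0):
    """Return a delta-delay threshold (%) for a target alpha.

    samples: list of (delta_delay_pct, alpha) for representative paths at
    one corner. Every sampled path whose alpha exceeds alpha_target falls
    at or below the returned threshold (assumed rule, see lead-in).
    """
    violating = [d for d, a in samples if a > alpha_target]
    base = max(violating) if violating else 0.0
    return base + margin_pct


# Hypothetical (delta-delay %, alpha) samples at one corner.
samples = [(1.0, 0.9), (2.5, 0.7), (3.3, 0.62), (5.0, 0.4), (8.0, 0.3)]
a_thr = find_threshold(samples, alpha_target=0.6)
```

Lowering `alpha_target` can only raise the returned threshold, which matches the slide's remark that a smaller α gives larger thresholds and fewer paths in GTBC.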

16ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners

• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%

• TBC-06: more benefits
• Tradeoff between reduced margin vs. paths which use TBC

Correlation factor γ = 0.5

[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-05, TBC-06, and TBC-07]

17ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

18ISVLSI-2014 invited talk 140710

How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                  Razor          Razor-Lite    TIMBER
Power penalty     30 [Das08]     ~0 [Kim13]    100 [Choudhury09]
Area penalty      182 [Kim13]    33 [Kim13]    255 [Chen13]
Recovery cycles   5 [Wan09]      11 [Kim13]    0 [Choudhury09]

19ISVLSI-2014 invited talk 140710

Tradeoff Resilience Cost vs Datapath Cost

[Figure: endpoints of a fanin cone implemented as Razor FFs (with error outputs) or normal FFs; optimizing the fanin cone w/ tighter constraint lets an endpoint Razor FF become a normal FF, trading area (power) of the fanin cone against area (power) w/ Razor overhead]

[Figure: tradeoff between Razor FFs (resilience cost) and power/area of fanin circuits; energy (mJ) vs. number of Razor FFs (300, 100, 50, 0), showing total energy, energy of non-resilient part, and resilience cost]

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

20ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints; allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions; vary endpoints, compare area/power penalty

Candidate Sensitivity Functions

$SF_1 = |slack(p)|$

$SF_2 = |slack(p)| \times num_{cri}(p)$

$SF_3 = |slack(p)| \times \frac{num_{cri}(p)}{num_{total}(p)}$

$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$

$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$

p: negative-slack endpoint
c: cells within fanin cone
num_cri: number of negative-slack cells
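The five candidate sensitivity functions can be evaluated for one endpoint with a few lines of Python. This is a sketch under assumptions: `num_total` is taken to be the number of cells in the fanin cone (the slide does not define it), and the slack and power values in the example are hypothetical.

```python
def sensitivity_functions(endpoint_slack, fanin_cells):
    """Evaluate SF1..SF5 for a negative-slack endpoint p.

    fanin_cells: list of (slack, power) pairs, one per cell c in the
    fanin cone of p.
    """
    abs_slack = abs(endpoint_slack)
    num_total = len(fanin_cells)                            # assumed definition
    num_cri = sum(1 for s, _ in fanin_cells if s < 0)       # negative-slack cells
    total_pwr = sum(p for _, p in fanin_cells)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total,
        "SF4": abs_slack * total_pwr,
        "SF5": sum(abs(s) * p for s, p in fanin_cells),
    }


# Hypothetical endpoint with four fanin cells: (slack in ns, power in mW).
cells = [(-0.1, 2.0), (0.05, 1.0), (-0.2, 3.0), (0.3, 4.0)]
sf = sensitivity_functions(-0.2, cells)
```

Endpoints can then be ranked by any one of the five values when deciding where to insert margin.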

21ISVLSI-2014 invited talk 140710

Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized

$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p, q)}$

Slack_pq: setup slack of path p-q
β: weighting factor
TG(p, q): toggle rate of path p-q

[Figure: sequential graph of FF1, FF2, FF3 (data paths and clock tree) with edge weights W12, W23, W31; after adjustment each edge on the cycle has W’ = average weight on cycle]
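The edge weight and the averaged cycle weight W’ from this slide can be computed as below. The sketch covers only the weight arithmetic; it does not implement the minimum-weight cycle search or the clock latency adjustment, and the slack, toggle-rate, and β values are hypothetical.

```python
def path_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def cycle_average_weight(cycle_edges, beta):
    """Average weight W' over the edges of one cycle.

    cycle_edges: list of (setup_slack, toggle_rate), one per path on the cycle.
    """
    weights = [path_weight(s, tg, beta) for s, tg in cycle_edges]
    return sum(weights) / len(weights)


# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1: (slack in ns, toggle rate).
cycle = [(0.3, 0.5), (-0.1, 0.0), (0.2, 1.0)]
w_avg = cycle_average_weight(cycle, beta=2.0)
```

A larger β shrinks the weight magnitude of frequently toggled paths, so the weighting is sensitive to activity as well as slack.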

22ISVLSI-2014 invited talk 140710

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution; OR-tree insertion]

23ISVLSI-2014 invited talk 140710

Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints

• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods

• Resilience benefits increase with larger process variation

[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margin, broken down into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm

24ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs; resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU, comparing pure-margin, brute-force, and CombOpt; minimum achievable energy marked. Technology: foundry 28nm]

25ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs

• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax

• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS with circuit, on-chip aging monitor (circuit performance), and voltage regulator (Vdd)]

[Figure: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to the target]

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation

• Vlib = voltage for circuit performance estimation (library characterization)

• VBTI and Vlib are required in signoff

• VBTI and Vlib selection should consider BTI + AVS interaction

• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step loop. Step 1: BTI degradation and AVS (Vfinal, VBTI, |Vt|). Step 2: derated library (Vlib). Step 3: circuit implementation and signoff (circuit)]

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation

• Vlib = voltage for circuit performance estimation (library characterization)

• VBTI and Vlib are required in signoff

• VBTI and Vlib depend on aging during AVS

• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step loop. Step 1: BTI degradation and AVS (Vfinal, VBTI, |Vt|). Step 2: derated library (Vlib). Step 3: circuit implementation and signoff (circuit)]

No obvious guideline to define VBTI and Vlib

Inconsistency among Vfinal, Vlib, VBTI

• What is the design overhead when timing libraries are not properly characterized?

• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs Area Across Different Signoffs

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime

• Aging of a flat Vfinal ≈ aging of an adaptive Vdd

• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI stress; VBTI = Vlib ≈ Vfinal]

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at the early design stage (the design has not been implemented)

• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)

• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types

• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells

• Heuristic: use a “flat” Vheur to estimate BTI degradation

[Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library]
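The first step of this flow, averaging Vfinal over cell types to get a single Vheur, is sketched below. The per-cell Vfinal values are hypothetical placeholders; in practice they would come from per-cell aging and AVS analysis.

```python
def v_heur(vfinal_by_cell):
    """Average the per-cell Vfinal estimates into one flat signoff voltage."""
    return sum(vfinal_by_cell.values()) / len(vfinal_by_cell)


# Hypothetical per-cell Vfinal estimates (V), spread over about 10 mV.
vfinal = {"INV": 0.965, "NAND2": 0.970, "NOR2": 0.975}
v = v_heur(vfinal)

# The same voltage is then used for both aging and library characterization.
v_bti = v_lib = v
```

Using one voltage for both VBTI and Vlib is what removes the dependence on the not-yet-implemented circuit.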

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• (4 designs x {DC, AC} x derating methods)

[Figure: power vs. area for all designs, distinguishing the proposed method from circuits signed off using other derated libraries]

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair

• Multi-mode operation requires multi-mode signoff

• Example: nominal mode and overdrive mode

• Selection of signoff modes affects area, power

• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign

[Figure: Vdd vs. time alternating between NOM and OD modes over intervals tnom and tOD]

[Figure: power of circuits w/ different overdrive modes, with fnom = 800MHz and Vnom = 0.8V; different overdrive modes: 26% power range; fix fOD: still 14% power range]

36ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

37ISVLSI-2014 invited talk 140710

Also: Tunable Monitors, Less Margin (2)

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law


Thank You


Backup

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) vs. implementation (1-9), from highest invested guardband to least invested guardband; 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff


Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner. Per-layer distributions for Layer M1 and Layer M2, each marked C-3σ; for an interconnect stack with M1 and M2 (axes M1 C, M2 C), the homogeneous Cw corner is marked together with its 3σ pessimism]

When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers
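The growth of pessimism with layer count can be illustrated with a small numeric sketch. It assumes, purely for illustration, that every layer contributes the same standard deviation `sigma` to a path's capacitance and that all layer pairs share one correlation `gamma`; neither simplification comes from the talk. The homogeneous corner adds the per-layer 3σ excursions linearly, while the statistical 3σ of the sum grows more slowly.

```python
import math


def homogeneous_3sigma(sigmas):
    """Excursion when every layer sits at its own 3-sigma point at once."""
    return 3.0 * sum(sigmas)


def statistical_3sigma(sigmas, gamma):
    """3-sigma of the summed variation with uniform pairwise correlation gamma."""
    variance = 0.0
    for i, s_i in enumerate(sigmas):
        for j, s_j in enumerate(sigmas):
            rho = 1.0 if i == j else gamma
            variance += rho * s_i * s_j
    return 3.0 * math.sqrt(variance)


def pessimism_ratio(n_layers, gamma, sigma=1.0):
    """Homogeneous-corner excursion divided by the statistical 3-sigma."""
    sigmas = [sigma] * n_layers  # hypothetical: identical layers
    return homogeneous_3sigma(sigmas) / statistical_3sigma(sigmas, gamma)


if __name__ == "__main__":
    for n in (1, 2, 4, 9):
        print(n, round(pessimism_ratio(n, gamma=0.5), 3))
```

With full correlation (`gamma = 1.0`) the ratio stays at 1, so in this toy model the pessimism comes entirely from the missing correlation between layers.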

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

            M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW     1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT     0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH     0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW     γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT     0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH     0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW     γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT     0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH     0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW     0  0  0     0  0  0     0  0  0     1  0  0
    ΔT     0  0  0     0  0  0     0  0  0     0  1  0
    ΔH     0  0  0     0  0  0     0  0  0     0  0  1

= Σ

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
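A minimal sketch of how such a matrix could be assembled programmatically. The grouping of M1-M3 into one process module and M4 into another is read off the matrix above; the function name and data layout are hypothetical.

```python
VARIATION_TYPES = ("dW", "dT", "dH")  # width, thickness, height variation


def build_correlation_matrix(modules, gamma=0.5):
    """Return (labels, matrix) for layers grouped into process modules.

    Sources of the same type on different layers of one module get gamma;
    every other off-diagonal entry is 0 (independent sources).
    """
    labels = [(layer, vtype, k)
              for k, module in enumerate(modules)
              for layer in module
              for vtype in VARIATION_TYPES]
    matrix = []
    for layer_a, type_a, module_a in labels:
        row = []
        for layer_b, type_b, module_b in labels:
            if layer_a == layer_b and type_a == type_b:
                row.append(1.0)
            elif type_a == type_b and module_a == module_b:
                row.append(gamma)
            else:
                row.append(0.0)
        matrix.append(row)
    return [(layer, vtype) for layer, vtype, _ in labels], matrix


if __name__ == "__main__":
    labels, sigma = build_correlation_matrix([("M1", "M2", "M3"), ("M4",)])
    for (layer, vtype), row in zip(labels, sigma):
        print(layer, vtype, row)
```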

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the point (60, 0.92) is marked]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation


Delay Variation

[Figure: α per path, plotted against Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp) and against Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0 → CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Figure annotations
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 (axes M1 C, M2 C). Non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure axes: Δdj(Yrcw) / dj(Ytyp) x 100 and 3σj / d(Ytyp) x 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield?

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure axis: wirelength ratio (%)]

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart boxes: routed design; timing analysis at BEOL corners (Ytyp, Ycw, Yrcw); path groups GTBC and GCBC; for GTBC: timing analysis using TBC, check violation = 0, ECO using TBC; for GCBC: timing analysis using CBC, check violation = 0, ECO using CBC; done]

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Thresholds with correlation factor = 0.5

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   4.3       7.3
TBC-0.6   0.6   3.3       5.0
TBC-0.7   0.7   3.0       3.4

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1 term: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1 term: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large → delay variations of ΔdR-M1 and ΔdC-M1 cancel out → Δd(Ycw) ≈ 0 < σj

Scaling Factor Results

[Figure: scaling factor α for LEON3MP, NETCARD and SUPERBLUE12; regions with α > 0.5 are marked in each panel]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figures: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time with NBTI and PBTI stress; VBTI = Vlib ≈ Vfinal]

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis but very slow: many STA runs and library characterizations

Vstep: AVS voltage step
Vfinal: converged voltage

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Diagram: loop between the circuit and the derated libraries, linked by Vfinal and by Vlib, VBTI]

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time, alternating ON / OFF / ON / OFF [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time [TCAS’14]

Observation 1

• BTI is a “front-loaded” phenomenon [Chan11]
  • 50% BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure annotation: ≈70% Vdd increment in 1 year (remaining 30% over 9 years), approaching Vfinal]
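The front-loaded behavior can be illustrated with a simple power-law sketch. This assumes, for illustration only, that aging at fixed stress follows ΔVth ∝ t^n; the exponent values below are assumptions (values near 1/6 are commonly quoted for reaction-diffusion style NBTI models) and are not taken from the talk.

```python
def fraction_of_final_aging(t_years, lifetime_years=10.0, n=1.0 / 6.0):
    """Share of end-of-life |dVth| reached at time t, assuming dVth ~ t**n."""
    return (t_years / lifetime_years) ** n


if __name__ == "__main__":
    for exponent in (1.0 / 6.0, 0.3):
        share = fraction_of_final_aging(1.0, n=exponent)
        print(f"n = {exponent:.3f}: {share:.0%} of 10-year aging after year 1")
```

Under this model any exponent well below 1 puts most of the degradation, and hence most of the compensating Vdd increment, early in the lifetime.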

Results for DC Scenario

[Figure annotations]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Good corners

Setups
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Vlib (for circuit performance estimation) vs. VBTI (for aging estimation):

                  Vlib = Min Vdd                      Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging         Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging   Faster circuit, worst-case aging
                  (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES, with points labeled by signoff case 1-8 and three series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black’s); regions marked “Optimistic about AVS” and “Pessimistic about AVS”]

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

[Figure: lifetime (year) and Vfinal (V) vs. implementation (1-9); annotations: 30% MTTF penalty, 200mV voltage compensation]
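A short sketch of the MTTF update rule, MTTF(i) = MTTF(i−1) × (VDD(i−1) / VDD(i))^2, applied along an AVS voltage schedule. The initial MTTF and the voltage steps below are hypothetical values chosen only to show the arithmetic.

```python
def mttf_along_schedule(mttf_initial, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 for each AVS step."""
    history = [mttf_initial]
    for v_prev, v_next in zip(vdd_schedule, vdd_schedule[1:]):
        history.append(history[-1] * (v_prev / v_next) ** 2)
    return history


if __name__ == "__main__":
    # Hypothetical schedule: 200mV of total compensation in two steps.
    history = mttf_along_schedule(10.0, [0.90, 1.00, 1.10])
    penalty = 1.0 - history[-1] / history[0]
    print([round(m, 2) for m in history], f"penalty = {penalty:.0%}")
```

Because the per-step factors telescope, the final MTTF depends only on the first and last voltages: MTTF(n) = MTTF(0) × (VDD(0) / VDD(n))^2.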

EM Impact on AVS Scheduling

[Figure: VDD vs. year (0-12) for AVS schedules S1-S5, and MTTF (year) for S1-S5, design DMA; annotation: 12 years MTTF penalty]

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work” → stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff on a voltage axis, between nominal Vdd and signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage marked]

Problem: margins = pessimism → overdesign, schedule delay

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (Note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

[Δd’j1 … Δd’j27] = [Δdj1 … Δdj27] λ

Standard deviation of path delay

σj = ((Δd’j1)^2 + … + (Δd’j27)^2)^0.5

Note: we use the delay variation from the statistical analysis as a reference
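The two formulas above can be sketched in a few lines of standard-library Python. The hand-written `cholesky` helper and the two-source example are illustrative assumptions; a numerical library routine such as `numpy.linalg.cholesky` would serve the same purpose.

```python
import math


def cholesky(matrix):
    """Lower-triangular L with matrix = L * L^T (symmetric positive definite input)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - acc)
            else:
                lower[i][j] = (matrix[i][j] - acc) / lower[j][j]
    return lower


def path_sigma(sensitivities, correlation):
    """sigma_j = norm of [dd_j1 ... dd_jn] * lambda, with correlation = lambda * lambda^T."""
    lam = cholesky(correlation)
    n = len(sensitivities)
    # Row vector times lambda: dd'_v = sum_u dd_u * lambda[u][v]
    correlated = [sum(sensitivities[u] * lam[u][v] for u in range(n))
                  for v in range(n)]
    return math.sqrt(sum(d * d for d in correlated))


if __name__ == "__main__":
    # Hypothetical path with two variation sources correlated at 0.5.
    print(path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]]))
```

The result equals the square root of s Σ s^T for the sensitivity row vector s, so positive correlation between sources with same-sign sensitivities raises σj above the uncorrelated root-sum-square.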

Resilient Designs

• Detect and recover from timing errors → ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V), 0.84-1.00V, for conventional design and resilient design; 15% reduction]

Conventional design: worst-case signoff → no Vdd downscaling
Resilient design: typical-case signoff → Vdd downscaling → reduced energy

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy

$$Energy = \frac{Power}{Throughput}$$

Throughput is expressed in terms of the error rate ER [Kahng10], the clock period T and the number of recovery cycles r

Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints → allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches

• Open-loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS: process- and temperature-aware AVS
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

[Figure axis: power]

Challenge: Variability

[Figure: four panels comparing ideal scaling with observed non-ideality]
• DENSITY: transistor count [M] vs. MPU release date (1998-2011). Source: [CPUDB]
• POWER: dynamic power (W), 2009-2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006-2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date (1995-2016). Source: [CPUDB]

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
→ Resilience benefits increase in the context of AVS strategy

[Figures: energy (mJ) vs. supply voltage (V) for MUL and EXU, comparing brute-force, pure-margin and CombOpt; minimum achievable energy marked]

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design / overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone → reduce overdesign

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough → scalable

Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models (adaptive search)

[Figure: power estimation of adaptive search. Power (mW) vs. signoff voltage (V), 0.9-1.2V; series 1st, 2nd, real. Design: AES, f: 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

Classes of Closed-Loop AVS

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages, each with a 1-bit control pin selecting high resistance or low resistance]

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figures: energy (mJ) of PM, BF and CO for MUL and EXU at small, medium and large margin; bars split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
→ Resilience benefits increase in AVS context

[Figures: energy (mJ) vs. supply voltage (V) for MUL and EXU, comparing brute-force, pure-margin and CombOpt; minimum achievable energy marked]

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart boxes: initial placement (all FFs = error-tolerant FFs); check energy < min energy; save current solution; SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
• E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α plotted against Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]

Find Paths for Which TBCs Can Be Used

• Paths with small Δdrcw and Δdcw have large α
• E.g., there are αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α plotted against Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw marked]

Gtbc = set of paths that can be safely signed off using TBC: ((path with Δdcw larger than Acw) OR (path with Δdrcw larger than Arcw))
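The grouping rule for Gtbc can be sketched as a simple filter. The path names, Δdelay percentages and threshold values below are hypothetical illustration data, not results from the talk.

```python
def split_paths(paths, a_cw, a_rcw):
    """Split paths into (G_tbc, G_cbc) by their delay variation at each corner.

    paths maps a path name to (delta_cw_pct, delta_rcw_pct), the relative
    delay variation in percent at the C-worst and RC-worst corners.
    """
    g_tbc, g_cbc = [], []
    for name, (delta_cw, delta_rcw) in paths.items():
        if delta_cw > a_cw or delta_rcw > a_rcw:
            g_tbc.append(name)   # large variation: tightened corner is safe
        else:
            g_cbc.append(name)   # small variation: keep conventional corner
    return g_tbc, g_cbc


if __name__ == "__main__":
    example_paths = {          # hypothetical values, in percent
        "p1": (1.0, 2.0),
        "p2": (4.0, 1.0),
        "p3": (2.0, 6.5),
    }
    print(split_paths(example_paths, a_cw=3.3, a_rcw=5.0))
```

Paths left in the conventional group are those whose Δdelay is small at both corners, which is where large α values were observed.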

Determining α, Arcw and Acw

• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
  • Plot α vs. Δdelay, find Arcw and Acw for a given α
  • Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC

[Figures: α vs. Δd at RC-worst corner (%) with threshold Arcw, and α vs. Δd at C-worst corner (%) with threshold Acw]

Benefits of Tightened BEOL Corners

• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
  • Tradeoff between reduced margin vs. paths which use TBC

Correlation factor γ = 0.5

[Figures: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-0.5, TBC-0.6 and TBC-0.7]

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

How to Minimize Cost of Resilience
• Additional circuits → area and power penalties
• Recovery from errors → throughput degradation
• Large hold margin → short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                   Razor         Razor-Lite   TIMBER
Power penalty      30 [Das08]    ~0 [Kim13]   100 [Choudhury09]
Area penalty       182 [Kim13]   33 [Kim13]   255 [Chen13]
Recovery cycles    5 [Wan09]     11 [Kim13]   0 [Choudhury09]

Tradeoff: Resilience Cost vs. Datapath Cost

[Figure: an endpoint Razor FF and its fanin cone; optimizing the fanin cone with a tighter constraint lets the Razor FF be replaced by a normal FF. This trades area (power) of the fanin cone against area (power) with Razor overhead]

[Figure: energy (mJ) vs. Razor FFs (300, 100, 50, 0): total energy, energy of non-resilient part, and resilience cost. Tradeoff: Razor FFs (resilience cost) vs. power/area of fanin circuits]

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint with tighter constraints → allows replacement of Razor FF with normal FF
• Pick endpoints based on heuristic sensitivity functions
• Vary endpoints → compare area/power penalty

Candidate sensitivity functions

$$SF_1 = |slack(p)|$$
$$SF_2 = |slack(p)| \times num_{cri}(p)$$
$$SF_3 = |slack(p)| \times \frac{num_{cri}(p)}{num_{total}(p)}$$
$$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$$
$$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$$

p: negative-slack endpoint
c: cells within fanin cone
num_cri: number of negative-slack cells
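A minimal sketch that evaluates the five candidate sensitivity functions for one endpoint. It assumes that num_total(p) is the total number of cells in the fanin cone of p, which the slide does not define, and the cell data are hypothetical.

```python
def sensitivity_functions(endpoint_slack, fanin_cells):
    """Evaluate SF1..SF5 for one negative-slack endpoint p.

    fanin_cells is a list of (slack, power) tuples for cells in the fanin
    cone of p. num_total is ASSUMED to be the number of cells in that cone.
    """
    abs_slack = abs(endpoint_slack)
    num_cri = sum(1 for slack, _ in fanin_cells if slack < 0)
    num_total = len(fanin_cells)
    cone_power = sum(power for _, power in fanin_cells)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total,
        "SF4": abs_slack * cone_power,
        "SF5": sum(abs(slack) * power for slack, power in fanin_cells),
    }


if __name__ == "__main__":
    # Hypothetical cone: (slack in ns, power in arbitrary units) per cell.
    cells = [(-0.10, 2.0), (0.30, 1.0), (-0.20, 4.0), (0.05, 1.0)]
    for name, value in sensitivity_functions(-0.20, cells).items():
        print(name, round(value, 4))
```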

Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight → adjust clock latencies → contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized

[Figure: sequential graph over FF1, FF2, FF3 with edge weights W12, W23, W31, showing data paths and the clock tree; after adjustment each edge on the cycle has weight W’ = average weight on cycle]

$$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p,q)}$$

Slack_pq: setup slack of path p-q
β: weighting factor
TG(p,q): toggle rate of path p-q
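The edge weight and the cycle average W’ can be sketched as follows. The fraction form W_pq = Slack_pq / (1 + β × TG(p,q)) is a reading of the garbled slide equation, and the slack, toggle-rate and β values are hypothetical.

```python
def path_weight(setup_slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q)), as read from the slide."""
    return setup_slack / (1.0 + beta * toggle_rate)


def average_cycle_weight(cycle_edges, beta):
    """W' = average of the edge weights around one cycle of the sequential graph.

    cycle_edges is a list of (setup_slack, toggle_rate) tuples, one per edge.
    """
    weights = [path_weight(slack, rate, beta) for slack, rate in cycle_edges]
    return sum(weights) / len(weights)


if __name__ == "__main__":
    # Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1: (slack in ns, toggle rate).
    cycle = [(0.10, 0.5), (-0.04, 0.0), (0.30, 1.0)]
    print(average_cycle_weight(cycle, beta=2.0))
```

With positive slack, a higher toggle rate lowers the weight of a path, so frequently exercised paths are favored when the minimum-weight cycle is selected.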

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

[Flowchart boxes: initial placement (all FFs = error-tolerant FFs); check energy < min energy; save current solution; SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; OR-tree insertion]

Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figures: energy (mJ) of PM, BF and CO for MUL and EXU at small, medium and large margin; bars split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner
Technology: foundry 28nm

24ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience with AVS

• Adaptive voltage scaling (AVS) allows a lower supply voltage for resilient designs, and thus reduced power

• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage

• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of an AVS strategy

Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt designs, with one panel for MUL and one for EXU; the minimum achievable energy is marked. Technology: foundry 28nm.

25ISVLSI-2014 invited talk 140710

Outline

• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops: Less Margin

• Example: interaction between reliability margin and AVS designs

• Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax

• AVS can be used to compensate for performance degradation

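Figure: closed-loop AVS. An on-chip aging monitor tracks circuit performance and drives the voltage regulator that supplies the circuit. Plots of circuit frequency vs. time and Vdd vs. time compare operation without AVS and with AVS against the target.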

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation

• Vlib = voltage for circuit performance estimation (library characterization)

• VBTI and Vlib are required in signoff

• VBTI and Vlib selection should consider the BTI + AVS interaction

• Aging and Vfinal are unknown before circuit implementation

Figure: three-step loop. Step 1: VBTI gives |Vt|, which together with Vlib gives the derated library. Step 2: circuit implementation and signoff give the circuit. Step 3: BTI degradation and AVS give Vfinal.

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation

• Vlib = voltage for circuit performance estimation (library characterization)

• VBTI and Vlib are required in signoff

• VBTI and Vlib depend on aging during AVS

• Aging and Vfinal are unknown before circuit implementation

Figure: three-step loop. Step 1: VBTI gives |Vt|, which together with Vlib gives the derated library. Step 2: circuit implementation and signoff give the circuit. Step 3: BTI degradation and AVS give Vfinal.

No obvious guideline to define VBTI and Vlib

Inconsistency among Vfinal, Vlib and VBTI

• What is the design overhead when timing libraries are not properly characterized?

• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs Area Across Different Signoffs

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if the desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime

• Aging at a flat Vfinal ≈ aging under an adaptive Vdd

• But slightly pessimistic

Figure: Vdd vs. time, with NBTI and PBTI.

VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at an early design stage (the design has not been implemented)

• Vfinal = Vdd at end of life (to compensate for BTI aging)

• Gates along critical path
• Timing slack at t = 0
• Circuit activity (BTI aging)

• BTI aging depends on circuit activity
• Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types

• Heuristic 2: use the average Vfinal of different gate types
• Vfinal is a function of timing slack

• Assume timing slack = 0

Figure annotation: 10mV

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells

• Heuristic: use a “flat” Vheur to estimate BTI degradation

Flow:

1. Obtain Vheur (average of standard cells)

2. Obtain derated library with VBTI = Vlib = Vheur

3. Sign off circuit with derated library
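As a minimal sketch of the first two steps of the flow above: the per-cell Vfinal values below are hypothetical placeholders (in practice they would come from per-cell aging and AVS analysis), and `estimate_v_heur` is an illustrative name, not part of any tool.

```python
def estimate_v_heur(vfinal_by_cell: dict[str, float]) -> float:
    """Average the end-of-life voltage Vfinal over the standard cells."""
    if not vfinal_by_cell:
        raise ValueError("need at least one cell")
    return sum(vfinal_by_cell.values()) / len(vfinal_by_cell)


# Hypothetical Vfinal (V) per cell type, assuming timing slack = 0.
vfinal_by_cell = {"INV": 0.95, "NAND2": 0.96, "NOR2": 0.94}

v_heur = estimate_v_heur(vfinal_by_cell)

# The same flat voltage is then used for both aging estimation and
# library characterization: VBTI = Vlib = Vheur.
v_bti = v_lib = v_heur
```

Because Vfinal varies little across gate types (about 10mV on the slide), a single averaged voltage is a reasonable stand-in for every cell.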

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• 4 designs × {DC, AC} × derating methods

Figure legend: proposed method; circuits signed off using other derated libraries

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair

• Multi-mode operation requires multi-mode signoff

• Example: nominal mode and overdrive mode

• Selection of signoff modes affects area and power

• ASP-DAC 2013: optimization of signoff modes

Improve performance, power or area

Reduce overdesign

Figure: Vdd vs. time schedules alternating between nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD.

Figure: power of circuits with different overdrive modes (fnom = 800MHz, Vnom = 0.8V). Different overdrive modes: 26% power range. With fOD fixed: still 14% power range.

37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability
• Compensate for differences between model and silicon
• Recover margin when variation is reduced due to an improved process

38ISVLSI-2014 invited talk 140710

Outline

• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions

• Variability severely challenges IC value
• In the manufacturing process, during operation, and across lifetime

• Benefit of the “next node” is increasingly hard to find
• An entire node is a “20/20/20” value proposition
• 5-10% in PPA metrics is now substantial at the leading edge

• Variability is connected to taped-out IC properties by the models, margins and tolerances used in signoff

• Some takeaways from this talk
• Substantial benefit from tightening BEOL corners (= signoff)
• “Minimum cost of resilience” is a rich optimization challenge
• Chicken-egg loops in signoff definition can be broken
• Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

Figure: core power (mW) and PG power (mW) across implementations 1-9, annotated with “highest invested guardband”, “least invested guardband” and “14% power penalty”.

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

44ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use the corners from each layer to construct a homogeneous corner for an interconnect stack

Figure: example worst-case capacitance corner. The C-3σ corners of layer M1 and layer M2 are combined into a homogeneous Cw corner for the interconnect stack with M1 and M2 (axes: M1 C, M2 C); the gap to the joint distribution is marked as pessimism.

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
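The growth of pessimism with layer count can be illustrated numerically. This is a sketch under stated assumptions, not the talk's model: the total variation is taken as a plain sum of per-layer contributions, every pair of layers shares one correlation factor `gamma`, and the function names are hypothetical.

```python
import math


def homogeneous_corner(sigmas: list[float], k: float = 3.0) -> float:
    """Every layer sits at its own k-sigma corner simultaneously."""
    return k * sum(sigmas)


def statistical_corner(sigmas: list[float], gamma: float, k: float = 3.0) -> float:
    """k-sigma of the summed variation with pairwise correlation gamma."""
    variance = sum(s * s for s in sigmas)
    for i in range(len(sigmas)):
        for j in range(i + 1, len(sigmas)):
            variance += 2.0 * gamma * sigmas[i] * sigmas[j]
    return k * math.sqrt(variance)


def pessimism_ratio(num_layers: int, gamma: float) -> float:
    sigmas = [1.0] * num_layers  # equal per-layer sigma, illustrative only
    return homogeneous_corner(sigmas) / statistical_corner(sigmas, gamma)


ratios = {n: pessimism_ratio(n, gamma=0.5) for n in (1, 2, 4, 9)}
```

With `gamma = 1.0` the ratio stays at 1 for any layer count, while for `gamma < 1.0` it grows as layers are added, which matches the statement on the slide.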

45ISVLSI-2014 invited talk 140710

Correlation Matrix

• Let Σ be the correlation matrix for variation sources

              M1          M2          M3          M4
            ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
    M1 ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
       ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
       ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
    M2 ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
Σ =    ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
       ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
    M3 ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
       ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
       ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
    M4 ΔW    0  0  0     0  0  0     0  0  0     1  0  0
       ΔT    0  0  0     0  0  0     0  0  0     0  1  0
       ΔH    0  0  0     0  0  0     0  0  0     0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
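The structure of Σ follows from two rules, which a short sketch can make explicit. The grouping of M1-M3 into one process module and M4 into another is read off the matrix above; the module labels "A" and "B" and the function name are hypothetical.

```python
LAYERS = ["M1", "M2", "M3", "M4"]
TYPES = ["dW", "dT", "dH"]  # width, thickness, height variation

# Assumed grouping: M1-M3 share a process module, M4 is in its own module.
MODULE = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}


def build_correlation(gamma: float):
    """Return (sources, sigma) for all layer/variation-type pairs."""
    sources = [(layer, vtype) for layer in LAYERS for vtype in TYPES]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (layer_i, type_i) in enumerate(sources):
        for j, (layer_j, type_j) in enumerate(sources):
            if i == j:
                sigma[i][j] = 1.0
            elif type_i == type_j and MODULE[layer_i] == MODULE[layer_j]:
                # same variation type, same process module
                sigma[i][j] = gamma
    return sources, sigma


sources, sigma = build_correlation(gamma=0.5)
```

Changing the `MODULE` map is enough to describe a stack whose layers are grouped differently.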

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of their wirelength on any single layer

Figure: cumulative probability vs. max wirelength ratio across all layers (%); the curve reaches 0.92 at 60%.

• Variations in different layers are not fully correlated

• Averaging uncorrelated variations → smaller RC variation

48ISVLSI-2014 invited talk 140710

Delay Variation

Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp).

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

Figure: interconnect stack with M1 and M2 (axes: M1 C, M2 C). Non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp.

• Less pessimism with non-homogeneous corners
• Challenge
• Many feasible combinations
• A corner can only cover certain paths
• How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in the itf with α = 0.5

Figure: 3σj / d(Ytyp) × 100 vs. Δdj(Yrcw) / dj(Ytyp) × 100.

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar

• Wires on critical paths are routed on many layers

• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

Figure: wirelength ratio (%)

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at the RC-worst, C-worst and typical corners

• Calculate Δdelay of critical paths

• Put path j in the group Gtbc if Δdelay is larger than a threshold

• Fix only the paths in Gtbc using tightened BEOL corners

• Since tightened corners have smaller delay variations, timing closure is easier

Flowchart: routed design → timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) → paths split into GTBC and GCBC. GTBC: timing analysis using TBC, with ECO using TBC until violation = 0. GCBC: timing analysis using CBC, with ECO using CBC until violation = 0. Then done.

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Tightened BEOL corners (correlation factor = 0.5)

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   4.3       7.3
TBC-0.6   0.6   3.3       5.0
TBC-0.7   0.7   3.0       3.4

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation, and (2) variation sources of the same kind in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α

• A path has small Δdelays when it is equally sensitive to R and C

• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1, where dj(Ytyp) is the nominal delay and the two 0.5 coefficients are the delay sensitivities to a unit change in M1 resistance and in M1 capacitance

• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

Figure: scaling factor α for LEON3MP, SUPERBLUE12 and NETCARD, with the α > 0.5 region marked in each panel.

• Similar trends in different designs

• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns

• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2.

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at an early design stage (the design has not been implemented)

• Vfinal = Vdd at end of life (to compensate for BTI aging)

• Gates along critical path
• Timing slack at t = 0

• Circuit activity is not an issue
• Because the BTI effect is not sensitive to circuit activity
• DC or AC stress model is sufficient

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages

Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep VBTI, Vlib and Vdd consistent throughout circuit lifetime

• Signoff flow
• Estimate aging at each time step
• Update circuit timing and Vdd
• Repeat until t = tfinal
• Modify circuit and start over if Vfinal > maximum allowed voltage

• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step; Vfinal: converged voltage
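The loop structure of this reference flow can be sketched as follows. Everything numerical here is an assumption for illustration: `toy_delay` is a made-up delay model whose aging term depends only on time (real BTI aging also depends on the Vdd history), and a real flow would replace it with library re-characterization and STA at every step.

```python
def reference_signoff(delay_fn, target_delay, v_init, v_max, v_step,
                      n_steps, dt_years):
    """Step through lifetime, raising Vdd by Vstep whenever timing fails.

    Returns the converged Vfinal, or None when the required voltage
    exceeds v_max (the circuit must be modified and the flow restarted).
    """
    vdd = v_init
    for step in range(1, n_steps + 1):
        t = step * dt_years
        # Estimate aging at this time step and update timing and Vdd.
        while delay_fn(vdd, t) > target_delay:
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None
    return vdd


def toy_delay(vdd, t_years):
    """Hypothetical model: delay grows as t^(1/6) and shrinks with Vdd."""
    return (1.0 + 0.05 * t_years ** (1.0 / 6.0)) / vdd


v_final = reference_signoff(toy_delay, target_delay=1.15, v_init=0.90,
                            v_max=1.05, v_step=0.01, n_steps=10, dt_years=1.0)
v_relaxed = reference_signoff(toy_delay, 2.0, 0.90, 1.05, 0.01, 10, 1.0)
v_failed = reference_signoff(toy_delay, 0.9, 0.90, 1.05, 0.01, 10, 1.0)
```

Each pass through the inner loop stands for one timing analysis, which is why the reference flow is accurate but slow.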

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries

• Evaluate impact of library characterization
• Seven setups

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

Case       1       2       3      4        5     6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    N/A   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   N/A   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
• Derated library characterization is related to BTI + AVS
• AVS is affected by circuit implementation
• Timing constraints, critical paths, etc.
• Circuit is affected by library characterization

Figure: loop in which the circuit determines Vfinal, Vfinal determines Vlib and VBTI, these determine the derated libraries, and the derated libraries determine the circuit.

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

Figure: |Vgs| vs. time, alternating ON, OFF, ON, OFF [VattikondaWC06]

Device aging (|ΔVth|) accumulates over time

[TCAS’14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon

• 50% of BTI aging happens within the first year of circuit lifetime (total lifetime = 10 years)

• Most of the Vdd increment happens in early lifetime

• The gap between Vdd and Vfinal reduces rapidly

Figure: Vdd over lifetime approaching Vfinal; ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years).

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure the circuit meets its performance target under PVT variations and aging

• Conventional signoff approach
• Analyze circuit timing at worst-case corners
• Fix timing violations, re-run timing analysis

• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

                   Vlib = Min Vdd                                       Vlib = Max Vdd
VBTI = Min Vdd     Slowest circuit, less aging                          Not applicable (optimistic)
VBTI = Max Vdd     Slowest circuit, worst-case aging: too pessimistic   Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

Figure: power (mW) vs. area (μm²) for AES, with points labeled by signoff case 1-8, ranging from “optimistic about AVS” to “pessimistic about AVS”. Legend: Non-EM Aware; After Fixing (Mishra); After Fixing (Black's).

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

Figure: lifetime (year) and Vfinal (V) across implementations 1-9, annotated with “30% MTTF penalty” and “200mV voltage compensation”.

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$
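A minimal sketch of the stepwise MTTF update follows. The Vdd schedule and the initial MTTF are hypothetical numbers chosen for illustration; only the update rule itself comes from the slide.

```python
def update_mttf(mttf_prev: float, vdd_prev: float, vdd_now: float) -> float:
    """MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))^2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_after_schedule(mttf_initial: float, vdd_schedule: list[float]) -> float:
    """Apply the update once per AVS step along a Vdd schedule."""
    mttf = mttf_initial
    for vdd_prev, vdd_now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_now)
    return mttf


# Hypothetical AVS schedule: Vdd rises from 0.90V to 1.00V over lifetime.
schedule = [0.90, 0.94, 0.97, 0.99, 1.00]
mttf_final = mttf_after_schedule(10.0, schedule)
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last voltages: here 10 × (0.90 / 1.00)² = 8.1 in the same units as the initial MTTF.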

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

Figure: VDD vs. year for AVS schedules S1-S5 (design: DMA), and MTTF (year) for S1-S5, annotated with “12 years MTTF penalty”.

71ISVLSI-2014 invited talk 140710

What is “Signoff”?

• Foundation of the contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

Figure: “margin stack” for voltage signoff, on a voltage axis. Margins for static IR drop, power grid IR gradient, dynamic IR, HCI and NBTI separate the nominal Vdd, the signoff Vdd and the operating voltage.

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
• Δdjv is linear with respect to variation sources
• Variation sources are normally distributed

• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

Flow: 28 itf files (27 variation sources + Ytyp) and the routed netlist → RC extraction → STA → Δdjv

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 × 27)

• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}] \, \lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
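The two formulas above can be checked on a small example. This sketch uses a hand-written Cholesky factorization so that it needs only the standard library; the two-source sensitivities and the function names are hypothetical.

```python
import math


def cholesky(a: list[list[float]]) -> list[list[float]]:
    """Lower-triangular lam with a = lam * lam^T (a must be positive definite)."""
    n = len(a)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(a[i][i] - s)
            else:
                lam[i][j] = (a[i][j] - s) / lam[j][j]
    return lam


def path_sigma(delta_d: list[float], corr: list[list[float]]) -> float:
    """sigma_j = sqrt(sum_v (dd'_jv)^2), with dd' = dd * lam (row vector)."""
    lam = cholesky(corr)
    n = len(delta_d)
    d_prime = [sum(delta_d[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Two variation sources with unit sensitivities (illustrative values).
sigma_correlated = path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
sigma_independent = path_sigma([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
```

Since Δd′ Δd′ᵀ = Δd Σ Δdᵀ, the correlated case gives √(1 + 1 + 2 × 0.5) = √3, and the independent case gives √2.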

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)

• Trade off design robustness vs. design quality, e.g., enable margin reduction

• Improve performance (i.e., timing speculation)

Figure: energy (mJ) vs. supply voltage (V) for a conventional design and a resilient design; 15% reduction.

Conventional design: worst-case signoff, no Vdd downscaling

Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers

• Objective: implement the design to minimize energy
• Estimation of design energy

$$Energy = \frac{Power}{Throughput}$$

h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879

+1minus119864119877119903times119879

recovery cycles

Clock period

Error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization

• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches (figure axis: power)

Open-Loop AVS
• Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
• Post-silicon characterization: process-aware AVS with post-silicon characterization [Tschanz03]

Closed-Loop AVS
• Generic monitor, design-dependent replica: process- and temperature-aware AVS. Generic on-chip monitor [Burd00]; design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]

Error Tolerance
• Error detection system: error detection and correction system, with Vdd scaling until an error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

Figure: four scaling trends, each comparing ideal scaling with the observed non-ideality.

• DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
• POWER: dynamic power (W), 2009-2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006-2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date. Source: [CPUDB]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power

• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage

• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of an AVS strategy

Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt designs, with one panel for MUL and one for EXU; the minimum achievable energy is marked.

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
• Modes are in each other's design cones

Figure: voltage vs. frequency plane with the design cone of mode A and modes A, B and C. Negative slacks = failed design; positive slacks = overdesign.

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone → reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
• Avoid circuit implementation at each mode
• A small, constant number of runs is enough: scalable

Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models (adaptive search)

Figure: power estimation of adaptive search. Power (mW) vs. signoff voltage (V) for design AES, f = 700MHz, with curves “1st”, “2nd” and “real”.

• Ovals indicate sample points
• 1st, 2nd: power from the power models at the first and second iterations
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

• Critical path may be difficult to identify (IP from 3rd party)

• Calibrating monitors at multiple modes/voltages requires long test time

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
• Series resistance
• Cell types (INV, NAND, NOR)

• Proposed circuit
• Different cell types cover different process corners
• Tune series resistance of each stage to high or low

Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low series resistance.

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
• Pure-margin (PM): conventional methodology with only margin insertion
• Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints

• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods

• Resilience benefits increase with safety margin

Figure: energy (mJ) of PM, BF and CO at small, medium and large margins, with one panel for MUL and one for EXU. Each bar is split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation.

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs → reduced power

• We trade off timing-error penalty against reduced power at a lower supply voltage

• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the AVS context

Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt designs, with one panel for MUL and one for EXU; the minimum achievable energy is marked.

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Flowchart boxes:

• Initial placement (all FFs = error-tolerant FFs)
• Energy < min energy?
• Save current solution
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • ldquoChicken and Eggrdquo Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is ldquoSignoffrdquo
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)
Page 14: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

14ISVLSI-2014 invited talk 140710

Find Paths for Which TBCs Can Be Used

• Paths with small Δdrcw and Δdcw have large α
• E.g., there are αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α versus Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw marked]

Gtbc = Set of paths that can be safely signed off using TBC ( (Path with Δdcw larger than Acw) OR (Path with Δdrcw larger than Arcw) )
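The Gtbc membership rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Path` record, the function name `split_paths` and the sample numbers are hypothetical, and delay variations are assumed to be given as percentages of the typical-corner delay.

```python
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    dd_cw: float   # delay variation at C-worst corner, % of typical delay
    dd_rcw: float  # delay variation at RC-worst corner, % of typical delay

def split_paths(paths, a_cw, a_rcw):
    """Return (g_tbc, g_cbc): paths that may use TBC, and all other paths."""
    g_tbc, g_cbc = [], []
    for p in paths:
        # Rule from the slide: (dd_cw larger than Acw) OR (dd_rcw larger than Arcw)
        if p.dd_cw > a_cw or p.dd_rcw > a_rcw:
            g_tbc.append(p)
        else:
            g_cbc.append(p)
    return g_tbc, g_cbc

# Hypothetical sample paths and thresholds, for illustration only
paths = [Path("p0", 1.2, 2.0), Path("p1", 5.1, 3.0), Path("p2", 2.0, 8.4)]
g_tbc, g_cbc = split_paths(paths, a_cw=4.3, a_rcw=7.3)
tbc_names = [p.name for p in g_tbc]
cbc_names = [p.name for p in g_cbc]
print(tbc_names, cbc_names)
```

Paths left in the second list would stay on the conventional BEOL corners.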


15ISVLSI-2014 invited talk 140710

Determining α, Arcw and Acw

• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α means larger thresholds (Arcw and Acw) and fewer paths in GTBC
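One way to read the threshold extraction is sketched below, under stated assumptions: each representative path contributes a sample (Δdelay in %, α), the threshold is the largest Δdelay at which a sampled path still exceeds the target α, and the sampling margin is added as one percentage point. The function name `find_threshold` and the sample values are hypothetical.

```python
def find_threshold(samples, alpha_target, margin=1.0):
    """samples: list of (delta_d_percent, alpha) pairs from representative paths.

    Returns the smallest threshold A such that every sampled path with
    delta_d > A has alpha <= alpha_target, plus a sampling margin.
    """
    violating = [dd for dd, alpha in samples if alpha > alpha_target]
    base = max(violating) if violating else 0.0
    return base + margin

# Hypothetical samples: small delta_d goes with large alpha
samples = [(1.0, 0.9), (2.5, 0.65), (3.3, 0.55), (4.0, 0.45), (6.0, 0.3)]
a_for_05 = find_threshold(samples, 0.5)
a_for_06 = find_threshold(samples, 0.6)
print(a_for_05, a_for_06)
```

With these samples, the smaller target α yields the larger threshold, matching the trend stated on the slide.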

[Figure: α versus Δd at the C-worst corner (%) and at the RC-worst corner (%), with thresholds Arcw and Acw marked]

16ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners

• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
  • Tradeoff between reduced margin vs. number of paths which use TBC

Correlation factor γ = 0.5

[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-0.5, TBC-0.6 and TBC-0.7]

17ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

18ISVLSI-2014 invited talk 140710

How to Minimize Cost of Resilience

• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                  Razor          Razor-Lite    TIMBER
Power penalty     30 [Das08]     ~0 [Kim13]    100 [Choudhury09]
Area penalty      182 [Kim13]    33 [Kim13]    255 [Chen13]
Recovery cycles   5 [Wan09]      11 [Kim13]    0 [Choudhury09]

19ISVLSI-2014 invited talk 140710

Tradeoff Resilience Cost vs Datapath Cost

[Figure: fanin cones ending in Razor FFs (with error outputs) versus normal FFs. Optimizing the fanin cone of an endpoint w/ a tighter constraint allows the endpoint Razor FF to become a normal FF; area (power) of the fanin cone is traded against area (power) w/ Razor overhead]

[Figure: energy (mJ) versus number of Razor FFs (300, 100, 50, 0), showing total energy, energy of non-resilient part, and resilience cost]

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

20ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization (SEOpt)

• Optimize fanin cone of an endpoint w/ tighter constraints
  • Allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
  • Vary endpoints, compare area/power penalty

Candidate Sensitivity Functions

$SF_1 = |slack(p)|$
$SF_2 = |slack(p)| \times Num_{cri}(p)$
$SF_3 = |slack(p)| \times \dfrac{Num_{cri}(p)}{Num_{total}(p)}$
$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$

p: negative-slack endpoint
c: cells within fanin cone
Numcri: number of negative-slack cells
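The five candidate sensitivity functions (SF1 to SF5: endpoint slack magnitude, optionally weighted by the count or fraction of negative-slack cells, or by fanin-cone power) can be evaluated as in the sketch below. The `Cell` and `Endpoint` records are hypothetical, and Numtotal is assumed here to be the total number of cells in the fanin cone, which the slide does not define.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Cell:
    slack: float  # ns; negative means the cell is timing-critical
    power: float  # mW

@dataclass
class Endpoint:
    slack: float       # ns; slack of the negative-slack endpoint p
    fanin: List[Cell]  # cells c within the fanin cone of p

def sensitivity(p: Endpoint) -> dict:
    """Evaluate SF1..SF5 for one endpoint."""
    num_total = len(p.fanin)  # assumption: all cells in the fanin cone
    num_cri = sum(1 for c in p.fanin if c.slack < 0)
    s = abs(p.slack)
    return {
        "SF1": s,
        "SF2": s * num_cri,
        "SF3": s * num_cri / num_total if num_total else 0.0,
        "SF4": s * sum(c.power for c in p.fanin),
        "SF5": sum(abs(c.slack) * c.power for c in p.fanin),
    }

# Hypothetical endpoint with four fanin cells, two of them critical
p = Endpoint(-0.2, [Cell(-0.1, 2.0), Cell(0.05, 1.0),
                    Cell(-0.2, 4.0), Cell(0.3, 1.0)])
sf = sensitivity(p)
print(sf)
```

How the scores are ranked to pick endpoints is not specified on the slide, so the sketch stops at evaluation.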

21ISVLSI-2014 invited talk 140710

Clock Skew Optimization (SkewOpt)

• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized

$W_{pq} = \dfrac{Slack_{pq}}{1 + \beta \times TG(p, q)}$

Slack_pq: setup slack of path p-q
β: weighting factor
TG(p, q): toggle rate of path p-q

[Figure: sequential graph of FF1, FF2 and FF3 with data-path weights W12, W23 and W31 and the clock tree; after adjustment every edge on the cycle has weight W′, where W′ = average weight on cycle]

22ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart with steps: initial placement (all FFs = error-tolerant FFs); SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs); SkewOpt (activity-aware clock skew optimization); OR-tree insertion; check Energy < min energy; save current solution]

23ISVLSI-2014 invited talk 140710

Benefit of Low-Cost Resilience

• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at small, medium and large margin, broken down into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm

24ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience with AVS

• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) versus supply voltage (V) for pure-margin, brute-force and CombOpt on MUL and EXU, with the minimum achievable energy marked]

Technology: foundry 28nm

25ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops: Less Margin

• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
  • AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS, in which an on-chip aging monitor reports circuit performance to a voltage regulator that sets the circuit Vdd; plots of circuit frequency and Vdd versus time, with and without AVS, against the target]

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS

• VBTI = Voltage for BTI aging estimation
• Vlib = Voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Flow diagram: Step 1, VBTI gives |Vt|; Step 2, Vlib and |Vt| give the derated library; Step 3, circuit implementation and signoff give the circuit; BTI degradation and AVS of the circuit give Vfinal]

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = Voltage for BTI aging estimation
• Vlib = Voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Flow diagram: Step 1, VBTI gives |Vt|; Step 2, Vlib and |Vt| give the derated library; Step 3, circuit implementation and signoff give the circuit; BTI degradation and AVS of the circuit give Vfinal. Annotations: no obvious guideline to define VBTI and Vlib; inconsistency among Vfinal, Vlib, VBTI]

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design/lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs Area Across Different Signoffs

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic

[Figure: Vdd versus time, with NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

Flow:
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Signoff circuit with derated library

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• (4 designs x {DC, AC} x derating methods)

[Figure: power versus area; legend: proposed method, and circuits signed off using other derated libraries]

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign

[Figure: Vdd versus time for nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD]

[Figure: power of circuits w/ different overdrive modes. Different overdrive modes: 26% power range. Fix fOD: still 14% power range. fnom = 800MHz, Vnom = 0.8V]

36ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions

• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) across implementations 1 to 9, annotated with the highest and least invested guardband and a 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

43ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2 (axes: M1 C, M2 C). The C-3σ corners of layer M1 and layer M2 combine into the homogeneous Cw corner; its distance from the 3σ contour is pessimism]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2 (axes: M1 C, M2 C). The C-3σ corners of layer M1 and layer M2 combine into the homogeneous Cw corner; its distance from the 3σ contour is pessimism]

When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers
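A back-of-the-envelope model shows why the pessimism grows with layer count. It is a sketch under simplifying assumptions that are not from the slides: the quantity of interest is a sum of equal per-layer contributions, each normal with standard deviation σ, with the same pairwise correlation γ between layers. The homogeneous corner pushes every layer to 3σ at once, while the true 3σ of the sum is smaller.

```python
import math

def homogeneous_pessimism(n_layers, gamma=0.0, sigma=1.0):
    """Ratio of the homogeneous-corner excursion to the true 3-sigma of the sum."""
    # Homogeneous corner: every layer sits at its own 3-sigma corner
    corner = n_layers * 3.0 * sigma
    # Variance of a sum of n equally correlated terms
    var = n_layers * sigma ** 2 * (1.0 + (n_layers - 1) * gamma)
    return corner / (3.0 * math.sqrt(var))

ratio_independent = homogeneous_pessimism(4, gamma=0.0)  # sqrt(4)
ratio_correlated = homogeneous_pessimism(4, gamma=1.0)   # no pessimism
ratio_half = homogeneous_pessimism(9, gamma=0.5)
print(ratio_independent, ratio_correlated, ratio_half)
```

In this toy model the ratio is 1 for fully correlated layers and grows as the square root of the number of layers when they are independent.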

45ISVLSI-2014 invited talk 140710

Correlation Matrix

• Let Σ be the correlation matrix for variation sources

Σ =
          M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
    ΔT    0  0  0     0  0  0     0  0  0     0  1  0
    ΔH    0  0  0     0  0  0     0  0  0     0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
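The structure of Σ (same-type sources within one process module correlated by γ, everything else independent) can be generated rather than typed by hand. The sketch below assumes, as the matrix entries suggest, that M1 to M3 share one process module and M4 forms its own; the function name `build_sigma` is hypothetical.

```python
import numpy as np

def build_sigma(gamma=0.5):
    """12 x 12 correlation matrix ordered as (M1..M4) x (dW, dT, dH)."""
    # Layer-to-layer correlation: M1..M3 in one module, M4 independent
    layer_corr = np.eye(4)
    for a in range(3):
        for b in range(3):
            if a != b:
                layer_corr[a, b] = gamma
    # Only sources of the same variation type are correlated across layers
    return np.kron(layer_corr, np.eye(3))

sigma = build_sigma(0.5)
lam = np.linalg.cholesky(sigma)  # succeeds because sigma is positive definite
print(sigma.shape)
```

The Kronecker product keeps the layer-major ordering used in the table, so entry (M1 ΔW, M2 ΔW) lands at row 0, column 3.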

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability versus max wirelength ratio across all layers (%); marked point at 60%, 0.92]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation gives smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner

[Figure: two scatter plots of α.
 Left: α versus Δdelay at C-worst, [d(Ycw) – d(Ytyp)] / d(Ytyp).
 Right: α versus Δdelay at RC-worst, [d(Yrcw) – d(Ytyp)] / d(Ytyp).
 Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst.
 Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst.
 Annotations: C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner; α < 1.0, delay variations are covered by the RC-worst corner]

48ISVLSI-2014 invited talk 140710

Delay Variation

• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner

[Figure: two scatter plots of α.
 Left: α versus Δdelay at C-worst, [d(Ycw) – d(Ytyp)] / d(Ytyp).
 Right: α versus Δdelay at RC-worst, [d(Yrcw) – d(Ytyp)] / d(Ytyp).
 Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst.
 Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst.
 Annotations: C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner; α < 1.0, delay variations are covered by the RC-worst corner]

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 (axes: M1 C, M2 C); non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: 3σj / d(Ytyp) x 100 versus Δdj(Yrcw) / dj(Ytyp) x 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield?

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
  • Wires on critical paths are routed on many layers
  • Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; paths split into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. Then done]

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Correlation factor = 0.5
          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   4.3       7.3
TBC-0.6   0.6   3.3       5.0
TBC-0.7   0.7   3.0       3.4

Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 are cancelled out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor α for LEON3MP, SUPERBLUE12 and NETCARD, with the α > 0.5 region marked in each plot]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic

[Figure: Vdd versus time, with NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = {DC, AC}

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library

Vstep: AVS voltage step
Vfinal: converged voltage

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
  • Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2

63ISVLSI-2014 invited talk 140710

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop linking circuit, Vfinal, (Vlib, VBTI) and derated libraries]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| versus time with alternating ON and OFF phases [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time [TCAS’14]

65ISVLSI-2014 invited talk 140710

Observation 1

• BTI is a “front-loaded” phenomenon
  • 50% BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd versus time approaching Vfinal; ≈70% Vdd increment in 1 year (remaining 30% over 9 years) [Chan11]]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

[Figure annotation: good corners]

Cases
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                        Vlib for circuit performance estimation
VBTI for aging          Min Vdd                             Max Vdd
estimation
Min Vdd                 Slowest circuit, less aging         Not applicable (optimistic)
Max Vdd                 Slowest circuit, worst-case aging   Faster circuit, worst-case aging
                        (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) versus area (μm2) for AES, with points labeled by signoff case 1 to 8; series: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks); regions marked optimistic about AVS and pessimistic about AVS]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) across implementations 1 to 9]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$MTTF(i) = MTTF(i-1) \times \left( \dfrac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
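The per-step MTTF update, MTTF(i) = MTTF(i-1) x (VDD(i-1)/VDD(i))^2, is easy to trace numerically. The sketch below is illustrative only: the function name `mttf_schedule`, the 10-year starting MTTF and the 1.0 V starting voltage are assumptions, not values from the talk.

```python
def mttf_schedule(mttf0, vdd_steps):
    """vdd_steps[0] is the initial Vdd; return the MTTF after each AVS step."""
    mttf = [mttf0]
    for prev, cur in zip(vdd_steps, vdd_steps[1:]):
        # Each voltage increase scales MTTF by (V_prev / V_cur)^2
        mttf.append(mttf[-1] * (prev / cur) ** 2)
    return mttf

# Hypothetical schedule: 1.0 V raised to 1.2 V in two steps
trace = mttf_schedule(10.0, [1.0, 1.1, 1.2])
penalty = 1.0 - trace[-1] / trace[0]
print(trace, penalty)
```

Because the update telescopes, only the first and last voltages matter: a 200 mV rise from the assumed 1.0 V start gives roughly a 31% MTTF reduction, which is in line with the 30% penalty annotated on the slide.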

[Figure annotations: 30% MTTF penalty; 200mV voltage compensation]

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD versus year for AVS schedules S1 to S5, and MTTF (year) for S1 to S5 (label: DMA); annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is “Signoff”?

• Foundation of contract between design house and foundry
  • “Chip should work”: stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff on a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd; operating voltage shown for comparison]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$\Delta d_{jv} = \dfrac{d_j(Y_v) - d_j(Y_{typ})}{3}$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist; RC extraction; STA; Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (Note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$[\Delta d'_{j1} \; \ldots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \ldots \; \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:

$\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
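The statistical analysis amounts to a few lines of linear algebra: per-source sensitivities from the 3σ corner runs, a Cholesky factor of the correlation matrix, and a root-sum-square. The NumPy sketch below uses a two-source toy case instead of 27 sources; the function name `path_sigma` and the delay numbers are hypothetical.

```python
import numpy as np

def path_sigma(d_corner, d_typ, corr):
    """d_corner[v]: path delay with source v at its 3-sigma corner.
    d_typ: path delay at the typical corner. corr: correlation matrix."""
    sens = (np.asarray(d_corner, dtype=float) - d_typ) / 3.0  # per-sigma sensitivity
    lam = np.linalg.cholesky(np.asarray(corr, dtype=float))   # corr = lam @ lam.T
    sens_corr = sens @ lam                                    # sensitivities with correlation
    return float(np.sqrt(np.sum(sens_corr ** 2)))

# Toy case: two variation sources with correlation 0.5
corr = [[1.0, 0.5], [0.5, 1.0]]
sigma_corr = path_sigma([103.0, 106.0], 100.0, corr)
sigma_indep = path_sigma([103.0, 106.0], 100.0, np.eye(2))
print(sigma_corr, sigma_indep)
```

The result equals the square root of the quadratic form Δd Σ Δdᵀ, so positive correlation between sources that push delay in the same direction increases σj relative to the independent case.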

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors
  • Ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality
  • E.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) versus supply voltage (V) for conventional design and resilient design; 15% reduction]

Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy [Kahng10]:

$Energy = \dfrac{Power}{Throughput}$

where Throughput is a function of the error rate (ER), the clock period (T) and the number of recovery cycles (r)

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints
  • Allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

[Figure: AVS classes and approaches, arranged along a power axis]

• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor and design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measure actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels comparing ideal scaling with observed non-ideality.
 DENSITY: transistor count [M] versus MPU release date. Source: [CPUDB]
 POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
 DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) versus ideal scaling. Source: [ITRS]
 SUPPLY VOLTAGE: volt versus MPU release date. Source: [CPUDB]]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) versus supply voltage (V) for brute-force, pure-margin and CombOpt on MUL and EXU, with the minimum achievable energy marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design / overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: frequency versus voltage, showing the design cone of mode A with modes A, B and C; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable
Global optimization flow: sample (SP&R), construct power models, estimate optimal signoff modes, sample (SP&R), refine power models (adaptive search)
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); series: 1st, 2nd, real]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
Design: AES, f = 700MHz

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-Loop AVS
• Design-dependent replica
  • Critical path may be difficult to identify (IP from 3rd party)
• In-situ monitor
  • Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor
  • Does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages, each with a 1-bit control pin selecting high resistance or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: stacked-bar energy (mJ) of PM, BF and CO for MUL and EXU at small, medium and large margin; components: energy penalty of throughput degradation, energy penalty of additional circuits, energy w/o resilience]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU under the brute-force, pure-margin and CombOpt flows; the minimum achievable energy is marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt
[Flowchart boxes, in extraction order]
Initial placement (all FFs = error-tolerant FFs)
Energy < min energy?
Save current solution
SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
SkewOpt: activity-aware clock skew optimization

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

15ISVLSI-2014 invited talk 140710

Determining α Arcw and Acw

• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α: larger thresholds (Arcw and Acw), fewer paths in GTBC
[Figure: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with thresholds Arcw and Acw marked]

16ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners

• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
  • Tradeoff between reduced margin vs. paths which use TBC
Correlation factor γ = 0.5
[Figure: bar charts of WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-0.5, TBC-0.6 and TBC-0.7]

17ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

18ISVLSI-2014 invited talk 140710

How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                  Razor         Razor-Lite    TIMBER
Power penalty     30 [Das08]    ~0 [Kim13]    100 [Choudhury09]
Area penalty      182 [Kim13]   33 [Kim13]    255 [Chen13]
Recovery cycles   5 [Wan09]     11 [Kim13]    0 [Choudhury09]

19ISVLSI-2014 invited talk 140710

Tradeoff Resilience Cost vs Datapath Cost

[Figure: an endpoint implemented as a Razor FF (with error output) vs. a normal FF; optimizing the fanin cone w/ a tighter constraint trades area (power) of the fanin cone against area (power) w/ Razor overhead]
[Figure: tradeoff between Razor FFs (resilience cost) and power/area of fanin circuits; energy (mJ) vs. Razor FFs (300, 100, 50, 0); curves: total energy, energy of non-resilient part, resilience cost]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

20ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints
  • Allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
  • Vary endpoints, compare area/power penalty
Candidate Sensitivity Functions
$SF_1 = |slack(p)|$
$SF_2 = |slack(p)| \times num_{cri}(p)$
$SF_3 = |slack(p)| \times \frac{num_{cri}(p)}{num_{total}(p)}$
$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$
p: negative slack endpoint
c: cells within fanin cone
Numcri: number of negative slack cells
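The five candidate sensitivity functions can be evaluated side by side for one endpoint. The following is a minimal sketch, assuming the formulas read as SF1 = |slack(p)|, SF2 = |slack(p)| × numcri(p), SF3 = |slack(p)| × numcri(p)/numtotal(p), SF4 = |slack(p)| × Σ Pwr(c) and SF5 = Σ |slack(c)| × Pwr(c). The `Cell` record, the function name and the reading of numtotal as the number of cells in the fanin cone are illustrative assumptions, not part of the talk.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cell:
    """Hypothetical record for one cell in an endpoint's fanin cone."""
    slack: float  # timing slack seen at the cell; negative means violating
    power: float  # cell power, in any consistent unit


def sensitivity_functions(endpoint_slack: float, fanin: List[Cell]) -> Dict[str, float]:
    """Evaluate the five candidate sensitivity functions for one endpoint p."""
    abs_slack = abs(endpoint_slack)
    num_cri = sum(1 for c in fanin if c.slack < 0)  # negative-slack cells
    num_total = len(fanin)                          # assumed meaning of numtotal
    total_power = sum(c.power for c in fanin)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total if num_total else 0.0,
        "SF4": abs_slack * total_power,
        "SF5": sum(abs(c.slack) * c.power for c in fanin),
    }


if __name__ == "__main__":
    cone = [Cell(-0.5, 2.0), Cell(0.25, 1.0), Cell(-0.25, 4.0), Cell(1.0, 1.0)]
    for name, value in sensitivity_functions(-0.5, cone).items():
        print(name, value)
```

Ranking endpoints by any one of these values, then varying how many are selected, matches the "vary endpoints, compare area/power penalty" step on the slide.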

21ISVLSI-2014 invited talk 140710

Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized
$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p,q)}$
Slack_pq: setup slack of path p-q
β: weighting factor
TG(p,q): toggle rate of path p-q
[Figure: FF1, FF2, FF3 connected by data paths with weights W12, W23, W31 and driven by a clock tree; after adjustment every edge on the cycle has weight W', where W' = average weight on cycle]
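The edge weight used by SkewOpt can be computed directly from setup slack and toggle rate. The sketch below assumes the slide's formula reads W_pq = Slack_pq / (1 + β × TG(p,q)) and that W' is the plain mean of the edge weights on a cycle; the function names and the sample numbers are hypothetical.

```python
from typing import Sequence


def path_weight(slack: float, toggle_rate: float, beta: float) -> float:
    """Activity-aware weight of the path p-q: slack / (1 + beta * toggle rate)."""
    return slack / (1.0 + beta * toggle_rate)


def average_cycle_weight(weights: Sequence[float]) -> float:
    """W' for a cycle: the average of the edge weights on that cycle."""
    return sum(weights) / len(weights)


if __name__ == "__main__":
    beta = 1.0  # hypothetical weighting factor
    # (setup slack in ns, toggle rate) for FF1->FF2, FF2->FF3, FF3->FF1
    cycle = [(0.5, 1.0), (0.2, 0.0), (0.3, 0.5)]
    weights = [path_weight(s, tg, beta) for s, tg in cycle]
    print(weights, average_cycle_weight(weights))
```

Under this reading, a path with positive slack receives a smaller weight the more often it toggles, so frequently exercised paths look more critical to the cycle search.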

22ISVLSI-2014 invited talk 140710

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart boxes, in extraction order]
Initial placement (all FFs = error-tolerant FFs)
Energy < min energy?
Save current solution
SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
SkewOpt: activity-aware clock skew optimization
OR-tree insertion

23ISVLSI-2014 invited talk 140710

Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: stacked-bar energy (mJ) of PM, BF and CO for MUL and EXU at small, medium and large margin; components: energy penalty of throughput degradation, energy penalty of additional circuits, energy w/o resilience]
Small/medium/large margin: 1σ/2σ/3σ for SS corner
Technology: foundry 28nm

24ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU under the pure-margin, brute-force and CombOpt flows; the minimum achievable energy is marked]
Technology: foundry 28nm

25ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS with circuit, on-chip aging monitor (circuit performance) and voltage regulator (Vdd)]
[Figure: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to the target]

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS

• VBTI = Voltage for BTI aging estimation
• Vlib = Voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Figure: loop of three steps (Step 1, Step 2, Step 3) linking BTI degradation and AVS, the derated library, and circuit implementation and signoff through VBTI, Vlib, |Vt|, the circuit and Vfinal]

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = Voltage for BTI aging estimation
• Vlib = Voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Figure: loop of three steps (Step 1, Step 2, Step 3) linking the derated library, circuit implementation and signoff, and BTI degradation and AVS through Vlib, VBTI, |Vt|, the circuit and Vfinal]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design, lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs Area Across Different Signoffs

“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time under NBTI and PBTI]
VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

Obtain Vheur (average of standard cells)

Obtain derated library with VBTI = Vlib = Vheur

Signoff circuit with derated library

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• 4 designs × {DC, AC} × derating methods
[Figure legend: proposed method; circuits signed off using other derated libraries]
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
[Figure: Vdd vs. time alternating between nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD]
[Figure: power of circuits w/ different overdrive modes; different overdrive modes: 26% power range; fix fOD: still 14% power range]
fnom = 800MHz, Vnom = 0.8V

36ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config: Vmin_est < Vmin_chip. Some chips will fail
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config: Vmin_est < Vmin_chip. Some chips will fail
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) across implementations 1 to 9; annotations: highest invested guardband, least invested guardband, 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

43ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2; labels: C-3σ (layer M1), C-3σ (layer M2), M1 C, M2 C, 3σ, pessimism, homogeneous Cw corner]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2; labels: C-3σ (layer M1), C-3σ (layer M2), M1 C, M2 C, pessimism, homogeneous Cw corner]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
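A short worked estimate shows why the pessimism grows with the layer count. This is a simplified model, not taken from the talk: it assumes a wire capacitance that is a plain sum over k layers, each layer contributing a normal variation with the same standard deviation σ and the same pairwise correlation γ.

```latex
% Homogeneous corner: every layer is pushed to its own 3-sigma point at once
\Delta C_{hom} = \sum_{i=1}^{k} 3\sigma = 3k\sigma

% Statistical 3-sigma point of the sum, with pairwise correlation \gamma
\Delta C_{stat} = 3\sqrt{k\sigma^2 + k(k-1)\gamma\sigma^2}
                = 3\sigma\sqrt{k\,(1 + (k-1)\gamma)}

% Pessimism ratio of the homogeneous corner
\frac{\Delta C_{hom}}{\Delta C_{stat}} = \sqrt{\frac{k}{1 + (k-1)\gamma}}
```

With full correlation (γ = 1) the ratio is 1 and the homogeneous corner is exact. With independent layers (γ = 0) the ratio is √k, so under these assumptions a two-layer stack is over-margined by about 41% and a nine-layer stack by a factor of 3.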

45ISVLSI-2014 invited talk 140710

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

          M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
    ΔT    0  0  0     0  0  0     0  0  0     0  1  0
    ΔH    0  0  0     0  0  0     0  0  0     0  0  1
= Σ

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the curve reaches 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: Δdj(Yrcw)/dj(Ytyp) × 100 vs. 3σj/d(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%) per layer]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; split paths into GTBC and GCBC; GTBC: timing analysis using TBC, ECO using TBC until violation = 0; GCBC: timing analysis using CBC, ECO using CBC until violation = 0; done]
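The path-grouping step of this flow can be sketched in a few lines. The slide only says a path joins Gtbc "if Δdelay is larger than a threshold"; the sketch below assumes that exceeding either threshold (Acw at the C-worst corner, or Arcw at the RC-worst corner) is sufficient, which is an interpretation rather than something the slides state. Function name, path names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def partition_paths(
    delta_cw: Dict[str, float],   # relative delay change at C-worst, in %
    delta_rcw: Dict[str, float],  # relative delay change at RC-worst, in %
    a_cw: float,                  # threshold Acw, in %
    a_rcw: float,                 # threshold Arcw, in %
) -> Tuple[List[str], List[str]]:
    """Split paths into G_TBC (tightened corners) and G_CBC (conventional corners)."""
    g_tbc: List[str] = []
    g_cbc: List[str] = []
    for path in sorted(delta_cw):
        # Assumed rule: a large delay change at either corner qualifies the path.
        if delta_cw[path] > a_cw or delta_rcw[path] > a_rcw:
            g_tbc.append(path)
        else:
            g_cbc.append(path)
    return g_tbc, g_cbc


if __name__ == "__main__":
    d_cw = {"p1": 6.0, "p2": 1.0, "p3": 2.0}
    d_rcw = {"p1": 3.0, "p2": 9.0, "p3": 2.5}
    print(partition_paths(d_cw, d_rcw, a_cw=4.0, a_rcw=7.0))
```

Paths left in G_CBC are the ones with small delay change at both corners, which is where the later slides report large α, so they stay on conventional corners.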

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5
          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: α vs. delay change for LEON3MP, SUPERBLUE12 and NETCARD, with the α > 0.5 region marked in each panel]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: bar charts of WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time under NBTI and PBTI]
VBTI = Vlib ≈ Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis but very slow: many STA runs and library
Vstep: AVS voltage step
Vfinal: converged voltage

62ISVLSI-2014 invited talk 140710

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop between circuit and derated libraries, linked by Vfinal, Vlib and VBTI]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time, alternating ON, OFF, ON, OFF] [VattikondaWC06]
Device aging (|ΔVth|) accumulates over time [TCAS'14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon
• 50% BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
≈70% Vdd increment in 1 year (remaining 30% over 9 years)
[Figure: Vdd vs. time approaching Vfinal]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Good corners
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Vlib: for circuit performance estimation. VBTI: for aging estimation.

                   Vlib = Min Vdd                                       Vlib = Max Vdd
VBTI = Min Vdd     Slowest circuit, less aging                          Not applicable (optimistic)
VBTI = Max Vdd     Slowest circuit, worst-case aging: too pessimistic   Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for AES, with points labeled by signoff case 1 to 8; series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); regions: optimistic about AVS, pessimistic about AVS]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
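The MTTF update on this slide, MTTF(i) = MTTF(i-1) × (VDD(i-1) / VDD(i))^2, can be illustrated with a short Python sketch. The function names, the starting MTTF of 10 years and the voltage schedule are assumptions made for illustration; they are not values from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf_init, vdd_schedule):
    """Apply the update along a sequence of supply voltages chosen by AVS."""
    mttf = mttf_init
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_new)
    return mttf


# Hypothetical schedule: AVS raises Vdd from 0.90 V to 1.00 V in 50 mV steps.
schedule = [0.90, 0.95, 1.00]
mttf_final = mttf_after_schedule(10.0, schedule)
print(round(mttf_final, 2))
```

Under this rule the per-step ratios multiply out, so the final MTTF depends only on the first and last voltage of the schedule.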

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year (0 to 12) for AVS schedules S1 to S5, and MTTF (year) for S1 to S5; panel label: DMA; annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff on a voltage axis; components: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI, signoff Vdd; operating voltage marked]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path p_j to variation source z_v:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δd_jv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δd_jv using 28 runs of RC extraction and static timing analysis (STA)

Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δd_jv

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
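The two steps on this slide (decorrelate the sensitivities with the Cholesky factor λ of Σ, then take the root-sum-square) can be sketched in plain Python. This is an illustrative sketch only: the function names and the two-source example are assumptions, and a real flow would use all 27 variation sources.

```python
import math


def cholesky(sigma):
    """Return lower-triangular lam with sigma = lam * lam^T (sigma positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(delta_d, sigma):
    """sigma_j = sqrt(sum_v (delta_d'_jv)**2), with delta_d' = delta_d * lam."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times matrix: entry c is sum over r of delta_d[r] * lam[r][c].
    corr = [sum(delta_d[r] * lam[r][c] for r in range(n)) for c in range(n)]
    return math.sqrt(sum(x * x for x in corr))


# Hypothetical path with two variation sources (sensitivities in ps).
independent = path_sigma([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
correlated = path_sigma([3.0, 4.0], [[1.0, 0.5], [0.5, 1.0]])
print(independent, correlated)
```

Because Δd' Δd'^T = Δd Σ Δd^T, positive correlation between sources with same-sign sensitivities increases σ_j relative to the independent case.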

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]

• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$

Throughput is expressed in terms of the error rate ER [Kahng10], the number of recovery cycles r and the clock period T

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches

• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process- and temperature-aware AVS with generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure, four panels, each comparing ideal scaling with observed non-ideality]
• DENSITY: transistor count [M] vs. MPU release date (1998 to 2011). Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: drive current (μA/μm), 2006 to 2016, for extended planar bulk, UTB FD, DG and ideal scaling. Source: [ITRS]
• SUPPLY VOLTAGE: supply voltage (V) vs. MPU release date (1995 to 2016). Source: [CPUDB]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; series: brute-force, pure-margin, CombOpt; minimum achievable energy marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage and frequency axes showing the design cone of mode A with modes A, B and C; negative slacks = failed design; positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable

Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models (adaptive search)

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); series: 1st, 2nd, real; design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

Limitations
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages, each with a 1-bit control pin selecting high resistance or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF and CO at small, medium and large margin for MUL and EXU; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; series: brute-force, pure-margin, CombOpt; minimum achievable energy marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flow diagram]
• Initial placement (all FFs = error-tolerant FFs)
• Decision: energy < min energy? If so, save current solution
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization


16ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners

• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
• Tradeoff between reduced margin vs. paths which use TBC

Correlation factor γ = 0.5

[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-0.5, TBC-0.6 and TBC-0.7]

17ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

18ISVLSI-2014 invited talk 140710

How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                      Razor            Razor-Lite       TIMBER
Power penalty (%)     30 [Das08]       ~0 [Kim13]       100 [Choudhury09]
Area penalty (%)      182 [Kim13]      33 [Kim13]       255 [Chen13]
# recovery cycles     5 [Wan09]        11 [Kim13]       0 [Choudhury09]

19ISVLSI-2014 invited talk 140710

Tradeoff Resilience Cost vs Datapath Cost

[Figure: an endpoint Razor FF with error output and its fanin cone; optimizing the fanin cone w/ tighter constraint allows the Razor FF to be replaced by a normal FF, trading area (power) of the fanin cone against area (power) w/ Razor overhead]

[Figure: energy (mJ) vs. number of Razor FFs (300, 100, 50, 0); curves: total energy, energy of non-resilient part, resilience cost; tradeoff between Razor FFs (resilience cost) and power/area of fanin circuits]

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

20ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions: vary endpoints, compare area/power penalty

Candidate Sensitivity Functions

$$SF_1 = |slack(p)|$$
$$SF_2 = |slack(p)| \times Num_{cri}(p)$$
$$SF_3 = |slack(p)| \times \frac{Num_{cri}(p)}{Num_{total}(p)}$$
$$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$$
$$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$$

p: negative-slack endpoint
c: cells within fanin cone
Num_cri: number of negative-slack cells
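The five candidate sensitivity functions can be evaluated with a small Python sketch. It reads the slide's functions as SF1 = |slack(p)|, SF2 = |slack(p)| × Num_cri(p), SF3 = |slack(p)| × Num_cri(p) / Num_total(p), SF4 = |slack(p)| × Σ Pwr(c) and SF5 = Σ |slack(c)| × Pwr(c). The slide does not define Num_total; the sketch assumes it is the total number of cells in the fanin cone. The function name and the example numbers are hypothetical.

```python
def sensitivity_functions(endpoint_slack, cells):
    """Evaluate SF1..SF5 for one endpoint p.

    cells: list of (slack, power) tuples for the cells in the fanin cone of p.
    """
    abs_slack = abs(endpoint_slack)
    num_cri = sum(1 for slack, _ in cells if slack < 0)  # negative-slack cells
    num_total = len(cells)  # assumption: all cells in the fanin cone
    total_pwr = sum(pwr for _, pwr in cells)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total,
        "SF4": abs_slack * total_pwr,
        "SF5": sum(abs(slack) * pwr for slack, pwr in cells),
    }


# Hypothetical endpoint with slack -0.2 ns and four cells in its fanin cone.
sf = sensitivity_functions(-0.2, [(-0.2, 1.0), (-0.1, 2.0), (0.3, 1.0), (0.5, 4.0)])
print(sf)
```

Endpoints could then be ranked by any one of these values when choosing which fanin cones to optimize; which function ranks best is an empirical question the slide leaves to experiments.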

21ISVLSI-2014 invited talk 140710

Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight, adjust clock latencies, contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized

Edge weight of path p-q in the sequential graph:

$$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p, q)}$$

Slack_pq: setup slack of path p-q
β: weighting factor
TG(p, q): toggle rate of path p-q

[Figure: sequential graph FF1 → FF2 → FF3 → FF1 with edge weights W12, W23, W31 over data path and clock tree; after optimization every edge on the cycle has weight W']

W' = average weight on cycle
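A short Python sketch of the edge weighting and of the cycle average W' follows. It reads the slide's weight as W_pq = Slack_pq / (1 + β × TG(p, q)), which is an interpretation of the extracted formula, and it does not implement the minimum-weight cycle search itself. The function names, β = 1.0 and the three-flop example are assumptions.

```python
def edge_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def cycle_mean_weight(cycle_edges, beta):
    """Average weight W' over the (slack, toggle_rate) edges of one cycle."""
    weights = [edge_weight(s, tg, beta) for s, tg in cycle_edges]
    return sum(weights) / len(weights)


# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1: (setup slack in ns, toggle rate).
cycle = [(0.30, 0.5), (0.10, 0.0), (0.20, 1.0)]
w_prime = cycle_mean_weight(cycle, beta=1.0)
print(round(w_prime, 4))
```

With positive slacks, a higher toggle rate lowers the weight of a path, so frequently exercised paths look more critical to the cycle search.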

22ISVLSI-2014 invited talk 140710

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

[Flow diagram]
• Initial placement (all FFs = error-tolerant FFs)
• Decision: energy < min energy? If so, save current solution
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• OR-tree insertion

23ISVLSI-2014 invited talk 140710

Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figure: energy (mJ) of PM, BF and CO at small, medium and large margin for MUL and EXU; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner
Technology: foundry 28nm

24ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; series: pure-margin, brute-force, CombOpt; minimum achievable energy marked]

Technology: foundry 28nm

25ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS, in which an on-chip aging monitor reports circuit performance to a voltage regulator that sets the circuit's Vdd; plots of circuit frequency and Vdd vs. time, without AVS and with AVS, against the target]

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation

[Flow, Steps 1 to 3: VBTI and Vlib → |Vt|, derated library → circuit implementation and signoff → circuit → BTI degradation and AVS → Vfinal]

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation

[Flow, Steps 1 to 3: VBTI and Vlib → |Vt|, derated library → circuit implementation and signoff → circuit → BTI degradation and AVS → Vfinal]

No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs Area Across Different Signoffs

"Knee" point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation

Flow: Obtain Vheur (average of standard cells) → Obtain derated library with VBTI = Vlib = Vheur → Signoff circuit with derated library

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• (4 designs) x (DC, AC) x (derating methods)

[Figure: power vs. area for all designs; proposed method vs. circuits signed off using other derated libraries]

"Knee" point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign

[Figure: Vdd vs. time alternating between nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD]

[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V; annotations: different overdrive modes, 26% power range; fix fOD, still 14% power range]

36ISVLSI-2014 invited talk 140710

Also: Tunable Monitors, Less Margin

Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

37ISVLSI-2014 invited talk 140710

Also: Tunable Monitors, Less Margin (2)

Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20-20-20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1 to 9; annotations: highest invested guardband, least invested guardband, 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

43ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example of worst-case capacitance corner; the 3σ points of the C distributions of layers M1 and M2 are combined into a homogeneous Cw corner for the interconnect stack with M1 and M2; the gap to the 3σ point of the stack is marked as pessimism]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners (2)
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example of worst-case capacitance corner; the 3σ points of the C distributions of layers M1 and M2 are combined into a homogeneous Cw corner for the interconnect stack with M1 and M2; the gap to the 3σ point of the stack is marked as pessimism]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
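The growth of pessimism with layer count can be illustrated numerically. The Python sketch below assumes a simple additive model in which each layer contributes a zero-mean variation with standard deviation σ and every pair of layers has the same correlation γ; that model, the function names and the unit σ values are assumptions made for illustration, not the talk's extraction setup.

```python
import math


def homogeneous_corner(sigmas, k=3.0):
    """Every layer sits at its own k-sigma point at the same time."""
    return k * sum(sigmas)


def statistical_corner(sigmas, gamma, k=3.0):
    """k-sigma point of the summed variation with pairwise correlation gamma."""
    var = sum(s * s for s in sigmas)
    var += sum(2.0 * gamma * sigmas[i] * sigmas[j]
               for i in range(len(sigmas))
               for j in range(i + 1, len(sigmas)))
    return k * math.sqrt(var)


def pessimism_ratio(num_layers, gamma):
    """Homogeneous corner divided by statistical corner for equal-sigma layers."""
    sigmas = [1.0] * num_layers
    return homogeneous_corner(sigmas) / statistical_corner(sigmas, gamma)


for n in (2, 4, 9):
    print(n, round(pessimism_ratio(n, 0.0), 3), round(pessimism_ratio(n, 0.5), 3))
```

With fully correlated layers (γ = 1) the ratio is 1, so the homogeneous corner is exact; with independent layers the ratio grows as the square root of the number of layers.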

45ISVLSI-2014 invited talk 140710

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
            M1            M2            M3            M4
            ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH
M1  ΔW      1   0   0     γ   0   0     γ   0   0     0   0   0
    ΔT      0   1   0     0   γ   0     0   γ   0     0   0   0
    ΔH      0   0   1     0   0   γ     0   0   γ     0   0   0
M2  ΔW      γ   0   0     1   0   0     γ   0   0     0   0   0
    ΔT      0   γ   0     0   1   0     0   γ   0     0   0   0
    ΔH      0   0   γ     0   0   1     0   0   γ     0   0   0
M3  ΔW      γ   0   0     γ   0   0     1   0   0     0   0   0
    ΔT      0   γ   0     0   γ   0     0   1   0     0   0   0
    ΔH      0   0   γ     0   0   γ     0   0   1     0   0   0
M4  ΔW      0   0   0     0   0   0     0   0   0     1   0   0
    ΔT      0   0   0     0   0   0     0   0   0     0   1   0
    ΔH      0   0   0     0   0   0     0   0   0     0   0   1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
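The structure of this matrix follows from two rules: sources of the same type in the same process module have correlation γ, and everything else off the diagonal is zero. A minimal Python sketch that builds the 12 x 12 example is shown below; the function name and the module labels "A" and "B" are hypothetical, chosen only so that M1 to M3 share a module and M4 does not, as in the matrix on this slide.

```python
def build_correlation_matrix(layers, types, module_of, gamma):
    """Sources are ordered layer by layer: M1-dW, M1-dT, M1-dH, M2-dW, ..."""
    sources = [(layer, t) for layer in layers for t in types]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for a, (layer_a, type_a) in enumerate(sources):
        for b, (layer_b, type_b) in enumerate(sources):
            if a == b:
                sigma[a][b] = 1.0
            elif type_a == type_b and module_of[layer_a] == module_of[layer_b]:
                sigma[a][b] = gamma  # same variation type, same process module
    return sigma


layers = ["M1", "M2", "M3", "M4"]
types = ["dW", "dT", "dH"]
module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}  # hypothetical labels
sigma = build_correlation_matrix(layers, types, module_of, gamma=0.5)
for row in sigma:
    print(" ".join(f"{x:.1f}" for x in row))
```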

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point: 0.92 at 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: α vs. Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp); paths are grouped into those dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and those dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: α vs. Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp); paths are grouped into those dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and those dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: 3σj / d(Ytyp) x 100 vs. Δdj(Yrcw) / dj(Ytyp) x 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC → timing analysis using TBC (GTBC) and using CBC (GCBC), with ECO using TBC or CBC until violation = 0 → done
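The path-grouping step of this flow can be sketched in Python. The slide only says that a path goes into Gtbc when its Δdelay exceeds a threshold; the sketch assumes relative Δdelay in percent at each corner, separate thresholds for C-worst and RC-worst (named a_cw and a_rcw here by analogy with the Acw and Arcw columns in the experiment setup), and an "or" between the two tests. All of these choices, the function name and the example delays are assumptions.

```python
def split_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc).

    paths: dict mapping path name -> (d_typ, d_cw, d_rcw) delays in ns.
    a_cw, a_rcw: hypothetical thresholds on relative delta-delay, in percent.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        delta_cw = 100.0 * (d_cw - d_typ) / d_typ
        delta_rcw = 100.0 * (d_rcw - d_typ) / d_typ
        if delta_cw > a_cw or delta_rcw > a_rcw:
            g_tbc.append(name)  # large delta-delay: sign off with tightened corners
        else:
            g_cbc.append(name)  # otherwise keep conventional corners
    return g_tbc, g_cbc


# Hypothetical critical paths and thresholds.
paths = {"p1": (1.00, 1.05, 1.10), "p2": (1.00, 1.01, 1.02)}
g_tbc, g_cbc = split_paths(paths, a_cw=4.0, a_rcw=8.0)
print(g_tbc, g_cbc)
```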

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                        LEON3MP    NETCARD    SUPERBLUE12
Clock period (ns)       1.8        2.0        3.1
Gate count              232K       575K       1031K
Utilization (%)         84         79         82
Core area (mm2)         0.45       1.04       1.91
Max transition (ps)     330        330        330

Correlation factor = 0.5

            α      Acw (%)    Arcw (%)
TBC-0.5     0.5    43         73
TBC-0.6     0.6    33         50
TBC-0.7     0.7    30         34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1, where dj(Ytyp) is the nominal delay and the other two terms capture the delay sensitivity to a unit change in M1 resistance and in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out: Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor results for LEON3MP, SUPERBLUE12 and NETCARD, with the α > 0.5 region marked in each panel]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)

• Vfinal = Vdd at end of life (to compensate BTI aging)

• Gates along critical path
• Timing slack at t = 0

• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types

• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack

• Assume timing slack = 0

10mV

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages

Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime

• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage

bull No overhead in timing analysis but very slow Many STA runs

and library

Vstep: AVS voltage step; Vfinal: converged voltage
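The reference flow on this slide can be sketched as a simple loop. This is a minimal illustration, not the flow used in the talk: the function name `reference_signoff`, the callables `aging` and `delay`, and the toy models and constants below are all assumptions made up for the example.

```python
def reference_signoff(v_init, v_max, v_step, t_final, n_steps,
                      aging, delay, target_delay):
    """Step through lifetime, raising Vdd whenever aged timing misses target.

    aging(vdd, t) -> |dVth| and delay(vdd, dvth) -> path delay are
    caller-supplied models (hypothetical here, not taken from the talk).
    Returns the converged voltage Vfinal, or None if Vmax is exceeded.
    """
    vdd = v_init
    for i in range(1, n_steps + 1):
        t = t_final * i / n_steps
        # Estimate aging at this time step, then update timing and Vdd.
        while delay(vdd, aging(vdd, t)) > target_delay:
            vdd += v_step
            if vdd > v_max + 1e-9:
                return None  # "modify circuit and start over"
    return vdd


# Toy models, for illustration only.
VTH0 = 0.30

def toy_aging(vdd, t):
    # Power-law-in-time threshold shift that grows with stress voltage.
    return 0.04 * vdd * (t / 10.0) ** 0.3

def toy_delay(vdd, dvth):
    # Alpha-power-law style delay: slower when gate overdrive shrinks.
    return vdd / (vdd - VTH0 - dvth) ** 1.3

target = toy_delay(0.90, 0.0)  # zero timing slack at t = 0 and Vinit
v_final = reference_signoff(0.90, 1.05, 0.01, 10.0, 10,
                            toy_aging, toy_delay, target)
```

Each pass of the outer loop needs a fresh timing evaluation, which is why the slide calls this flow very slow: a real implementation would replace `delay` with an STA run per step.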

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries

• Evaluate impact of library characterization
• Seven setups

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

Case       1       2       3       4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax    Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax    Vfinal   NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop between Circuit (Vfinal) and Derated Libraries (Vlib, VBTI)]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time

[TCAS'14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a "front-loaded" phenomenon

• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)

• Most Vdd increment happens in early lifetime

• Gap between Vdd and Vfinal reduces rapidly

≈70% of Vdd increment in 1 year (remaining 30% over 9 years)

Vfinal

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations and aging

• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis

• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                          Vlib for circuit performance estimation
VBTI for aging            Min Vdd                    Max Vdd
estimation
Min Vdd                   Slowest circuit,           Not applicable
                          less aging                 (optimistic)
Max Vdd                   Slowest circuit,           Faster circuit,
                          worst-case aging           worst-case aging
                          (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES; series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); points are labeled with implementation numbers 1 to 8; regions annotated "Optimistic about AVS" and "Pessimistic about AVS"]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step, and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
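The MTTF update rule on this slide can be applied step by step along an AVS voltage schedule. The sketch below is illustrative only: the function names, the initial MTTF of 10 years and the voltage schedule are assumptions, not values from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One update step: MTTF(i) = MTTF(i-1) * (Vdd(i-1) / Vdd(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf_init, vdd_schedule):
    """Apply the update once per consecutive pair of Vdd values."""
    mttf = mttf_init
    for v_prev, v_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_new)
    return mttf


# Hypothetical AVS schedule (V) and initial MTTF (years).
schedule = [0.90, 0.95, 1.00, 1.05, 1.10]
final_mttf = mttf_after_schedule(10.0, schedule)
```

Because the per-step ratios telescope, the result depends only on the first and last voltages: the final MTTF equals the initial MTTF scaled by (Vdd(0) / Vdd(n))^2.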

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year (0 to 12) for AVS schedules S1 to S5, and MTTF (year) for S1 to S5; design: DMA; annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff along a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd, operating voltage]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed

• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Figure: flow from routed netlist and 28 itf files (27 variation sources + Ytyp) through RC extraction and STA to Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 x 27)

• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1} \;\dots\; \Delta d'_{j27}] = [\Delta d_{j1} \;\dots\; \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
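The two formulas above can be checked numerically on a small case. The following is a minimal pure-Python sketch, assuming a positive-definite correlation matrix; the helper names `cholesky` and `path_sigma` and the two-source example are illustrative, not from the talk.

```python
import math


def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(delta_d, sigma):
    """Standard deviation of one path's delay from per-source sensitivities."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times lam: correlated sensitivities d'_v.
    d_corr = [sum(delta_d[u] * lam[u][v] for u in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_corr))


# Two hypothetical variation sources with sensitivities 3 and 4 (arbitrary units).
sigma_indep = path_sigma([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
sigma_corr = path_sigma([3.0, 4.0], [[1.0, 0.5], [0.5, 1.0]])
```

With independent sources the result is the root-sum-square of the sensitivities; positive correlation adds the cross term, so `sigma_corr` is larger than `sigma_indep`.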

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)

• Trade off design robustness vs. design quality, e.g., enable margin reduction

• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V, 0.84 to 1.00) for conventional design and resilient design; annotation: 15% reduction]

Conventional design: worst-case signoff, no Vdd downscaling

Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers

• Objective: implement design to minimize energy
• Estimation of design energy:

$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$

where Throughput is a function of the error rate ER, the clock period T, and the number of recovery cycles r [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization

• Question: which endpoint to optimize?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches

• Open-Loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]

• Closed-Loop AVS
  • Generic monitor: process- and temperature-aware AVS, generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]

• Error Tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

[Figure axis: power]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels, each comparing ideal scaling against observed non-ideality.
 DENSITY: transistor count [M] vs. MPU release date (1998 to 2011); source [CPUDB].
 POWER: dynamic power (W), 2009 to 2024; source [JeongK08].
 DRIVE CURRENT: Extended Planar Bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016; source [ITRS].
 SUPPLY VOLTAGE: volt vs. MPU release date (1995 to 2016); source [CPUDB].]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs: resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; the minimum achievable energy is marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage-frequency plane with modes A, B and C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small, constant number of runs is enough: scalable

Global optimization flow (adaptive search): sample (SP&R), construct power models, estimate optimal signoff modes, sample (SP&R), refine power models

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V, 0.9 to 1.2); curves: 1st, 2nd, real; design: AES, f = 700MHz]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

• Critical path may be difficult to identify (IP from 3rd party)

• Calibrating monitors at multiple modes/voltages requires long test time

Closed-Loop AVS

Design-dependent replica

In-situ monitor

Generic monitor

• Does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)

• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low series resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints

• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods

• Resilience benefits increase with safety margin

[Figure: stacked bars of energy (mJ) for PM, BF and CO at small, medium and large margin, for MUL and EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs: resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; the minimum achievable energy is marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Initial placement (all FFs = error-tolerant FFs)

Energy lt min energy

Save current solution

Margin insertion on K paths based on sensitivity function

Replace error-tolerant FFs w normal FFs

SEOpt

Activity-aware clock skew optimization

SkewOpt

Page 17: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

17ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

18ISVLSI-2014 invited talk 140710

How to Minimize Cost of Resilience?

• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                    Razor          Razor-Lite    TIMBER
Power penalty       30 [Das08]     ~0 [Kim13]    100 [Choudhury09]
Area penalty        182 [Kim13]    33 [Kim13]    255 [Chen13]
Recovery cycles     5 [Wan09]      11 [Kim13]    0 [Choudhury09]

19ISVLSI-2014 invited talk 140710

Tradeoff Resilience Cost vs Datapath Cost

[Figure: fanin cones driving endpoint flip-flops; replacing an endpoint Razor FF with a normal FF requires optimizing its fanin cone w/ a tighter constraint, trading area (power) of the fanin cone against area (power) w/ Razor overhead]

[Figure: tradeoff between Razor FFs (resilience cost) and power/area of fanin circuits; energy (mJ) curves for total energy, energy of non-resilient part, and resilience cost; x-axis ticks: 300, 100, 50, 0 Razor FFs]

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

20ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization (SEOpt)

• Optimize fanin cone of an endpoint w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions: vary endpoints, compare area/power penalty

Candidate Sensitivity Functions

$$SF_1 = |slack(p)|$$
$$SF_2 = |slack(p)| \times num_{cri}(p)$$
$$SF_3 = |slack(p)| \times \frac{num_{cri}(p)}{num_{total}(p)}$$
$$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$$
$$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$$

p: negative-slack endpoint
c: cells within fanin cone
num_cri: number of negative-slack cells
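As a concrete reading of the five candidate sensitivity functions, the sketch below evaluates them for one endpoint. It assumes that num_total(p) is the total number of cells in the fanin cone of p, which the slide does not define; the function name and the sample slack and power values are hypothetical.

```python
def sensitivity_functions(endpoint_slack, fanin_cells):
    """Evaluate SF1..SF5 for one negative-slack endpoint p.

    fanin_cells: list of (slack, power) tuples, one per cell c in fanin(p).
    num_total is taken to be the fanin cell count (an assumption).
    """
    abs_slack = abs(endpoint_slack)
    num_total = len(fanin_cells)
    num_cri = sum(1 for slack, _ in fanin_cells if slack < 0)
    total_pwr = sum(pwr for _, pwr in fanin_cells)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total,
        "SF4": abs_slack * total_pwr,
        "SF5": sum(abs(slack) * pwr for slack, pwr in fanin_cells),
    }


# Hypothetical endpoint with slack -0.10 and four fanin cells (slack, power).
sf = sensitivity_functions(
    -0.10, [(-0.10, 2.0), (-0.05, 1.0), (0.20, 3.0), (0.30, 4.0)]
)
```

Endpoints could then be ranked by any one of these values before deciding which fanin cones to optimize; which function ranks best is the empirical question the slide raises.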

21ISVLSI-2014 invited talk 140710

Clock Skew Optimization (SkewOpt)

• Increase slacks on timing-critical and/or frequently-exercised paths

1. Generate sequential graph
2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized

$$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p, q)}$$

Slack_pq: setup slack of path p-q
β: weighting factor
TG(p, q): toggle rate of path p-q

[Figure: sequential graph FF1, FF2, FF3 with data-path edge weights W12, W23 and W31 and the clock tree; after adjusting clock latencies, each edge on the cycle has weight W' = average weight on cycle]
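The edge weight and the post-adjustment average W' can be illustrated on a single three-FF cycle. This sketch only evaluates the weight formula and the cycle average; it does not implement the cycle search or the contraction step, and the function names and sample values are assumptions.

```python
def path_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def average_cycle_weight(cycle_edges, beta):
    """W' for a cycle: the mean of its edge weights.

    cycle_edges: list of (setup slack, toggle rate) pairs, one per path.
    """
    weights = [path_weight(s, tg, beta) for s, tg in cycle_edges]
    return sum(weights) / len(weights)


# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1 as (slack, toggle rate) pairs.
cycle = [(0.30, 0.0), (-0.10, 0.5), (0.20, 1.0)]
w_avg = average_cycle_weight(cycle, beta=1.0)
```

Setting `beta` to 0 reduces each weight to the plain setup slack, so `beta` controls how strongly circuit activity influences the skew assignment.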

22ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Initial placement (all FFs = error-tolerant FFs)

Energy lt min energy

Save current solution

Margin insertion on K paths based on sensitivity function

Replace error-tolerant FFs w normal FFs

SEOpt

Activity aware clock skew optimization

SkewOpt

OR-tree insertion

23ISVLSI-2014 invited talk 140710

Benefit of Low-Cost Resilience

• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints

• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods

• Resilience benefits increase with larger process variation

[Figure: stacked bars of energy (mJ) for PM, BF and CO at small, medium and large margin, for MUL and EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm

24ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience with AVS

• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs: resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; curves: pure-margin, brute-force, CombOpt; the minimum achievable energy is marked]

Technology: foundry 28nm

25ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops: Less Margin

• Example: interaction between reliability margin and AVS designs

• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax

• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS, with an on-chip aging monitor, circuit performance, a voltage regulator and the circuit; plots of circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to the target]

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation

• Vlib = voltage for circuit performance estimation (library characterization)

• VBTI and Vlib are required in signoff

• VBTI and Vlib selection should consider BTI + AVS interaction

• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step loop. Step 1: derated library from VBTI (|Vt|) and Vlib. Step 2: circuit implementation and signoff, giving the circuit. Step 3: BTI degradation and AVS, giving Vfinal]

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation

• Vlib = voltage for circuit performance estimation (library characterization)

• VBTI and Vlib are required in signoff

• VBTI and Vlib depend on aging during AVS

• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step loop. Step 1: derated library from VBTI (|Vt|) and Vlib. Step 2: circuit implementation and signoff, giving the circuit. Step 3: BTI degradation and AVS, giving Vfinal]

No obvious guideline to define VBTI and Vlib

Inconsistency among Vfinal, Vlib, VBTI

• What is the design overhead when timing libraries are not properly characterized?

• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs Area Across Different Signoffs

"Knee" point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime

• Aging of a flat Vfinal ≈ aging of an adaptive Vdd

• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)

• Vfinal = Vdd at end of life (to compensate BTI aging)

• Gates along critical path
• Timing slack at t = 0
• Circuit activity (BTI aging)
  • BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types

• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack

• Assume timing slack = 0

10mV

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells

• Heuristic: use a "flat" Vheur to estimate BTI degradation

Obtain Vheur (average of standard cells)

Obtain derated library with VBTI = Vlib = Vheur

Signoff circuit with derated library
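The first two steps of this flow amount to an average followed by a single corner definition. The sketch below assumes per-cell Vfinal values are already available from some per-cell estimation; the cell names, voltages and function names are hypothetical.

```python
def v_heur(vfinal_by_cell):
    """Vheur: plain average of the per-cell Vfinal estimates."""
    return sum(vfinal_by_cell.values()) / len(vfinal_by_cell)


def derated_corner(vfinal_by_cell):
    """Signoff corner with VBTI = Vlib = Vheur, as in the proposed flow."""
    v = v_heur(vfinal_by_cell)
    return {"VBTI": v, "Vlib": v}


# Hypothetical per-cell Vfinal estimates (V) at zero timing slack.
vfinal_by_cell = {"INV": 0.95, "NAND2": 0.96, "NOR2": 0.94}
corner = derated_corner(vfinal_by_cell)
```

Using one flat voltage for both aging and performance keeps the library characterization to a single derated library, which is what removes the dependence on the not-yet-implemented circuit.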

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• (4 designs) x (DC, AC) x (derating methods)

[Figure legend: proposed method; circuits signed off using other derated libraries]

"Knee" point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign

[Figure: Vdd vs. time, alternating between nominal (NOM) and overdrive (OD) modes with durations tnom and tOD]

[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V; different overdrive modes: 26% power range; with fOD fixed: still 14% power range]

36ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config: Vmin_est < Vmin_chip, so some chips will fail

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config: Vmin_est < Vmin_chip, so some chips will fail

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions

• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime

• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge

• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff

• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1 to 9; annotations: highest invested guardband, least invested guardband, 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

43ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

Example: worst-case capacitance corner

[Figure: capacitance distributions of layers M1 and M2, each with its C-3σ corner; the homogeneous Cw corner for the interconnect stack with M1 and M2 combines both per-layer corners, which adds 3σ pessimism]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

Example: worst-case capacitance corner

[Figure: capacitance distributions of layers M1 and M2, each with its C-3σ corner; the homogeneous Cw corner for the interconnect stack with M1 and M2 combines both per-layer corners, which adds pessimism]

When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers
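The growth of pessimism with layer count can be seen in a simple model. The sketch below assumes the total capacitance is the sum of per-layer contributions with a common pairwise correlation γ; that additive model, the function names and the unit per-layer sigmas are assumptions for illustration, not part of the talk.

```python
import math


def homogeneous_3sigma(sigmas):
    """Homogeneous corner: every layer sits at its own 3-sigma value at once."""
    return sum(3.0 * s for s in sigmas)


def statistical_3sigma(sigmas, gamma):
    """3-sigma of the summed variation with pairwise correlation gamma."""
    n = len(sigmas)
    var = sum(s * s for s in sigmas)
    for i in range(n):
        for j in range(i + 1, n):
            var += 2.0 * gamma * sigmas[i] * sigmas[j]
    return 3.0 * math.sqrt(var)


def pessimism_ratio(sigmas, gamma):
    """How much the homogeneous corner overshoots the statistical 3-sigma."""
    return homogeneous_3sigma(sigmas) / statistical_3sigma(sigmas, gamma)


# Equal per-layer sigma, partially correlated layers (gamma = 0.5).
ratio_2_layers = pessimism_ratio([1.0] * 2, 0.5)
ratio_4_layers = pessimism_ratio([1.0] * 4, 0.5)
```

With fully correlated layers (γ = 1) the ratio is exactly 1, so the homogeneous corner is only pessimistic to the extent that layers vary independently.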

45ISVLSI-2014 invited talk 140710

Correlation Matrix

• Let Σ be the correlation matrix for variation sources

Σ =
          M1            M2            M3            M4
          ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH
M1  ΔW    1   0   0     γ   0   0     γ   0   0     0   0   0
    ΔT    0   1   0     0   γ   0     0   γ   0     0   0   0
    ΔH    0   0   1     0   0   γ     0   0   γ     0   0   0
M2  ΔW    γ   0   0     1   0   0     γ   0   0     0   0   0
    ΔT    0   γ   0     0   1   0     0   γ   0     0   0   0
    ΔH    0   0   γ     0   0   1     0   0   γ     0   0   0
M3  ΔW    γ   0   0     γ   0   0     1   0   0     0   0   0
    ΔT    0   γ   0     0   γ   0     0   1   0     0   0   0
    ΔH    0   0   γ     0   0   γ     0   0   1     0   0   0
M4  ΔW    0   0   0     0   0   0     0   0   0     1   0   0
    ΔT    0   0   0     0   0   0     0   0   0     0   1   0
    ΔH    0   0   0     0   0   0     0   0   0     0   0   1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
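The structure of Σ follows from two rules: same variation type within the same process module gives γ, and everything else off the diagonal gives 0. The sketch below builds the 12 x 12 example; the module labels "A" and "B" and the function name are hypothetical, chosen so that M1 to M3 share a module and M4 does not, as the matrix implies.

```python
def correlation_matrix(layer_modules, var_types, gamma):
    """Build the correlation matrix from (layer, module) pairs and variation types.

    Off-diagonal entries are gamma when both sources have the same variation
    type and belong to the same process module, and 0 otherwise.
    """
    sources = [(layer, module, vt)
               for layer, module in layer_modules
               for vt in var_types]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (_, mod_i, vt_i) in enumerate(sources):
        for j, (_, mod_j, vt_j) in enumerate(sources):
            if i == j:
                sigma[i][j] = 1.0
            elif mod_i == mod_j and vt_i == vt_j:
                sigma[i][j] = gamma
    return sigma


# M1..M3 in one (hypothetical) process module "A", M4 in another module "B".
layers = [("M1", "A"), ("M2", "A"), ("M3", "A"), ("M4", "B")]
sigma = correlation_matrix(layers, ["dW", "dT", "dH"], 0.5)
```

The same function extends to the 27-source case mentioned in the statistical timing analysis by listing nine layers with their module assignments.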

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point at 60%, 0.92]

• Variations in different layers are not fully correlated

• Averaging uncorrelated variation gives smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two scatter plots of α, against Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner

α < 1.0: delay variations are covered by the RC-worst corner

Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst

Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: Δdelay at C-worst vs. Δdelay at RC-worst, annotated with α]

Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp)
Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • The C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: Δdj(Yrcw)/dj(Ytyp) x 100 vs. 3σj/dj(Ytyp) x 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group GTBC if Δdelay is larger than a threshold
• Fix only the paths in GTBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; paths are split into GTBC and GCBC. GTBC: timing analysis using TBC, then ECO using TBC until violation = 0. GCBC: timing analysis using CBC, then ECO using CBC until violation = 0. Done.]
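The grouping step of the flow can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-corner thresholds `a_cw` and `a_rcw`, the tuple layout of `paths`, and the rule "exceeds the threshold at either corner" are assumptions made for the example; the slide only states that a path joins GTBC when its Δdelay is larger than a threshold.

```python
def relative_delta(d_corner, d_typ):
    """Relative delay change of a path at a BEOL corner versus the typical corner."""
    return (d_corner - d_typ) / d_typ


def split_paths(paths, a_cw, a_rcw):
    """Partition paths into (g_tbc, g_cbc).

    paths: iterable of (name, d_typ, d_cw, d_rcw) path delays.
    a_cw, a_rcw: hypothetical thresholds on the relative delta-delay
    at the C-worst and RC-worst corners.
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        dd_cw = relative_delta(d_cw, d_typ)
        dd_rcw = relative_delta(d_rcw, d_typ)
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)  # large delta-delay: sign off with tightened corners
        else:
            g_cbc.append(name)  # small delta-delay: keep conventional corners
    return g_tbc, g_cbc


# Made-up path delays in ns: (name, typical, C-worst, RC-worst)
example_paths = [
    ("p1", 1.00, 1.02, 1.09),
    ("p2", 1.00, 1.01, 1.02),
    ("p3", 2.00, 2.12, 2.04),
]
g_tbc, g_cbc = split_paths(example_paths, a_cw=0.05, a_rcw=0.07)
```

In this toy data, p1 exceeds the RC-worst threshold and p3 exceeds the C-worst threshold, so both land in the tightened-corner group, while p2 stays with the conventional corners.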

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

• Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• When a path has small Δdelays, the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • First 0.5: delay sensitivity to unit change in M1 resistance
  • Second 0.5: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: three panels (LEON3MP, SUPERBLUE12, NETCARD), each marking the region with α > 0.5]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: three bar charts of WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD, each comparing CBC, TBC-1 and TBC-2]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI aging; VBTI = Vlib ≈ Vfinal]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library

Vstep: AVS voltage step
Vfinal: converged voltage
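The loop structure of the reference flow can be illustrated with a toy simulation. Everything numeric here is an assumption made for the sketch: `toy_delay` and `toy_dvth` are hypothetical stand-ins for STA on a derated library and for a BTI model, and the constants are chosen only so that the example runs; only the control flow (age, re-time, raise Vdd by Vstep, stop at tfinal or when Vmax is exceeded) follows the slide.

```python
def toy_delay(vdd, dvth, vth0=0.3):
    """Hypothetical alpha-power-law-style delay model (arbitrary units)."""
    return vdd / (vdd - vth0 - dvth) ** 1.3


def toy_dvth(stress):
    """Hypothetical front-loaded aging: |dVth| grows as a power law of stress."""
    return 0.02 * stress ** 0.2


def reference_flow(v_init=0.90, v_max=1.05, v_step=0.01, t_final=10.0, dt=0.5):
    """Step through lifetime; return (v_final, schedule) or (None, schedule) on failure."""
    target = toy_delay(v_init, 0.0)  # delay that met timing at t = 0
    vdd, stress, t = v_init, 0.0, 0.0
    schedule = []
    while t < t_final:
        stress += dt * vdd ** 3       # estimate aging: higher Vdd stresses more
        dvth = toy_dvth(stress)
        while toy_delay(vdd, dvth) > target:   # update timing, raise Vdd if too slow
            vdd = round(vdd + v_step, 4)
            if vdd > v_max:
                return None, schedule  # caller must modify the circuit and start over
        t += dt
        schedule.append((t, vdd))
    return vdd, schedule


v_final, schedule = reference_flow()
```

Each pass of the outer loop corresponds to one time step of the slide's flow, which is why a real version needs many STA runs: timing has to be re-evaluated at every step and every candidate voltage.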

62ISVLSI-2014 invited talk 140710

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop between circuit, Vfinal, and derated libraries (Vlib, VBTI)]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time over ON/OFF/ON/OFF phases [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time [TCAS'14]

65ISVLSI-2014 invited talk 140710

Observation 1

• BTI is a "front-loaded" phenomenon [Chan11]
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd approaching Vfinal over lifetime; ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years)]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

Setups
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                              Vlib for circuit performance estimation
VBTI for aging estimation     Min Vdd                                   Max Vdd
Min Vdd                       Slowest circuit, less aging               Not applicable (optimistic)
Max Vdd                       Slowest circuit, worst-case aging:        Faster circuit, worst-case aging
                              too pessimistic

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES, with points labeled 1 to 8; regions marked "Optimistic about AVS" and "Pessimistic about AVS"; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

  $MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$

[Figure: lifetime (year) and Vfinal (V) vs. implementation (1 to 9); annotations: 30% MTTF penalty, 200mV voltage compensation]
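The MTTF update rule on this slide, MTTF(i) = MTTF(i-1) x (VDD(i-1) / VDD(i))^2, can be applied step by step along an AVS voltage schedule. The sketch below is only an illustration; the function names, the 10-year starting MTTF and the voltage values are hypothetical and are not taken from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One step of the update: scale MTTF by (Vdd(i-1) / Vdd(i)) squared."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_over_schedule(mttf_initial, vdd_schedule):
    """Apply the update along a list of supply voltages; return MTTF after each step."""
    mttf = mttf_initial
    history = [mttf]
    for v_prev, v_now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_now)
        history.append(mttf)
    return history


# Hypothetical schedule: Vdd raised from 1.0 V to 1.2 V in two steps.
history = mttf_over_schedule(10.0, [1.0, 1.1, 1.2])
penalty = 1.0 - history[-1] / history[0]
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last voltages: MTTF(n) = MTTF(0) x (VDD(0) / VDD(n))^2. For the hypothetical 1.0 V to 1.2 V rise this is a factor of (1.0/1.2)^2, roughly 0.69.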

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year for schedules S1 to S5, and MTTF (year) for S1 to S5 (design DMA); annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, ...

[Figure: "margin stack" for voltage signoff, from nominal Vdd through static IR drop, power grid IR gradient, dynamic IR, HCI and NBTI down to signoff Vdd; operating voltage marked on the voltage axis]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

  $\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist; RC extraction; STA; Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (Note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

  $[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:

  $\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
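The computation on this slide (factor Σ = λλ^T, rotate the sensitivities by λ, take the root sum of squares) can be written in a few lines. This is a minimal pure-Python sketch for illustration; the function names and the two-source example data are hypothetical, and a production flow would normally use a numerical library instead of the hand-written factorization.

```python
import math


def cholesky(a):
    """Return lower-triangular lam with a = lam * lam^T (a must be positive definite)."""
    n = len(a)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(a[i][i] - s)
            else:
                lam[i][j] = (a[i][j] - s) / lam[j][j]
    return lam


def path_sigma(delta_d, corr):
    """Standard deviation of path delay from per-source sensitivities delta_d."""
    lam = cholesky(corr)
    n = len(delta_d)
    # Row vector times matrix: delta_d' = delta_d * lam
    d_corr = [sum(delta_d[k] * lam[k][v] for k in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_corr))


# Two hypothetical variation sources with sensitivities 3 and 4 (arbitrary units).
sigma_indep = path_sigma([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
sigma_corr = path_sigma([3.0, 4.0], [[1.0, 0.5], [0.5, 1.0]])
```

Since Δd' Δd'^T = Δd λ λ^T Δd^T = Δd Σ Δd^T, the result equals the square root of the quadratic form in Σ: independent sources give sqrt(9 + 16), while a correlation of 0.5 adds the cross term 2 x 0.5 x 3 x 4 and gives sqrt(37).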

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]

• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

  $Energy = \frac{Power}{Throughput}$

h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879

+1minus119864119877119903times119879

recovery cycles

Clock period

Error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process and temperature-aware AVS, generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

[Figure axis: power]

78ISVLSI-2014 invited talk 140710

Challenge: Variability

[Figure: four panels, each contrasting ideal scaling with non-ideality.
 DENSITY: transistor count [M] vs. MPU release date (source: [CPUDB]).
 POWER: dynamic power (W) vs. year, 2009 to 2024 (source: [JeongK08]).
 DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016 (source: [ITRS]).
 SUPPLY VOLTAGE: volt vs. MPU release date (source: [CPUDB]).]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU, comparing brute-force, pure-margin and CombOpt; minimum achievable energy marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

[Flowchart, global optimization flow: sample (SP&R); construct power models; estimate optimal signoff modes; sample (SP&R); refine power models; adaptive search]

[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V), with curves "1st", "2nd" and "real". Design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-loop AVS
• Generic monitor
  • Does not capture design-specific performance variation
• Design-dependent replica
  • Calibrating monitors at multiple modes/voltages requires long test time
• In-situ monitor
  • Critical path may be difficult to identify (IP from 3rd party)

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages, each with a 1-bit control pin selecting high resistance or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: stacked energy (mJ) bars for PM, BF and CO at small, medium and large margin, for MUL and EXU; components: energy without resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5/10/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU, comparing brute-force, pure-margin and CombOpt; minimum achievable energy marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution]


18ISVLSI-2014 invited talk 140710

How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                  Razor          Razor-Lite    TIMBER
Power penalty     30% [Das08]    ~0% [Kim13]   100% [Choudhury09]
Area penalty      182% [Kim13]   33% [Kim13]   255% [Chen13]
Recovery cycles   5 [Wan09]      11 [Kim13]    0 [Choudhury09]

19ISVLSI-2014 invited talk 140710

Tradeoff: Resilience Cost vs. Datapath Cost

[Figure: endpoints implemented as Razor FFs (with error outputs) vs. normal FFs. Optimizing the fanin cone of an endpoint with a tighter constraint allows a Razor FF to be replaced by a normal FF. Compared: area (power) of the fanin cone vs. area (power) with Razor overhead.]

Tradeoff: Razor FFs (resilience cost) vs. power/area of fanin circuits

[Figure: energy (mJ) vs. Razor FFs (300, 100, 50, 0), showing total energy, energy of the non-resilient part, and resilience cost]

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

20ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint with tighter constraints: allows replacement of Razor FF with normal FF
• Pick endpoints based on heuristic sensitivity functions
• Vary endpoints, compare area/power penalty

Candidate Sensitivity Functions

  $SF_1 = |slack(p)|$
  $SF_2 = |slack(p)| \times Num_{cri}(p)$
  $SF_3 = |slack(p)| \times \frac{Num_{cri}(p)}{Num_{total}(p)}$
  $SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
  $SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$

p: negative-slack endpoint
c: cells within fanin cone
Numcri: number of negative-slack cells
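The five candidate sensitivity functions SF1 to SF5 on this slide can be evaluated for one endpoint as in the sketch below. The data layout (a list of `(slack, power)` pairs for the fanin cone) and the reading of Numtotal as the total number of cells in the fanin cone are assumptions for illustration; the slide defines only p, c and Numcri.

```python
def sensitivity_functions(slack_p, fanin_cells):
    """Evaluate SF1..SF5 for endpoint p.

    slack_p: timing slack of the endpoint (negative for a violating endpoint).
    fanin_cells: list of (slack_c, power_c) for each cell c in the fanin cone.
    """
    num_total = len(fanin_cells)  # assumed meaning of Num_total(p)
    num_cri = sum(1 for slack_c, _ in fanin_cells if slack_c < 0)
    total_pwr = sum(pwr for _, pwr in fanin_cells)
    abs_slack = abs(slack_p)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total if num_total else 0.0,
        "SF4": abs_slack * total_pwr,
        "SF5": sum(abs(slack_c) * pwr for slack_c, pwr in fanin_cells),
    }


# Made-up endpoint: slack -0.1 ns, four fanin cells given as (slack in ns, power in uW).
cells = [(-0.10, 2.0), (0.05, 1.0), (-0.02, 3.0), (0.20, 4.0)]
sf = sensitivity_functions(-0.10, cells)
```

SF1 to SF3 look only at timing criticality, while SF4 and SF5 also weigh the power of the fanin cone, so they rank endpoints differently when a mildly violating endpoint sits behind a large, power-hungry cone.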

21ISVLSI-2014 invited talk 140710

Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized

  $W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p, q)}$

Slackpq: setup slack of path p-q
β: weighting factor
TG(p, q): toggle rate of path p-q

[Figure: sequential graph FF1, FF2, FF3 with data-path edge weights W12, W23, W31 and the clock tree; after adjusting clock latencies each edge on the cycle has weight W', where W' = average weight on cycle]
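The edge weight used by SkewOpt, Wpq = Slackpq / (1 + β x TG(p, q)), and the cycle average W' can be computed as follows. This sketch covers only the weight formula and the averaging on one given cycle, not the cycle search or vertex contraction; the function names, the slack values, toggle rates and β are hypothetical.

```python
def edge_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def cycle_average_weight(cycle_edges, beta):
    """Average weight W' over the edges of one cycle.

    cycle_edges: list of (setup_slack, toggle_rate) for consecutive paths on the cycle.
    """
    weights = [edge_weight(slack, tg, beta) for slack, tg in cycle_edges]
    return sum(weights) / len(weights)


# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1: (setup slack in ns, toggle rate)
cycle = [(0.10, 0.5), (0.30, 0.0), (0.20, 1.0)]
w_avg = cycle_average_weight(cycle, beta=1.0)
```

For positive slacks, a higher toggle rate lowers the weight of a path, so frequently exercised paths look more critical and are favored when clock latencies are adjusted; β sets how strongly activity counts relative to slack.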

22ISVLSI-2014 invited talk 140710

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution; OR-tree insertion]

23ISVLSI-2014 invited talk 140710

Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figure: stacked energy (mJ) bars for PM, BF and CO at small, medium and large margin, for MUL and EXU; components: energy without resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner
Technology: foundry 28nm

24ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU, comparing pure-margin, brute-force and CombOpt; minimum achievable energy marked]

Technology: foundry 28nm

25ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS, in which an on-chip aging monitor reports circuit performance to a voltage regulator that sets the circuit's Vdd]

[Figure: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to the target]

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: loop of three steps. Step 1: BTI degradation and AVS (inputs VBTI, circuit; outputs |Vt|, Vfinal). Step 2: derated library (inputs |Vt|, Vlib). Step 3: circuit implementation and signoff (output: circuit).]

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: loop of three steps. Step 1: BTI degradation and AVS (Vfinal). Step 2: derated library (VBTI, Vlib, |Vt|). Step 3: circuit implementation and signoff (circuit).]

• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib, VBTI

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs. Area Across Different Signoffs

"Knee" point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI aging; VBTI = Vlib ≈ Vfinal]

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation

[Flow: obtain Vheur (average of standard cells); obtain derated library with VBTI = Vlib = Vheur; sign off circuit with derated library]

34ISVLSI-2014 invited talk 140710

Power vs. Area for All Designs

• (4 designs x {DC, AC} x derating methods)

[Figure: power vs. area, marking the proposed method and circuits signed off using other derated libraries]

"Knee" point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign

[Figure: Vdd vs. time alternating between nominal (NOM) and overdrive (OD) modes over intervals tnom and tOD]

[Figure: power of circuits with different overdrive modes, fnom = 800MHz, Vnom = 0.8V. Different overdrive modes: 26% power range. Fixed fOD: still 14% power range.]


37ISVLSI-2014 invited talk 140710

Also: Tunable Monitors, Less Margin

Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config: Vmin_est < Vmin_chip. Some chips will fail.

Benefits of tunability
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by the models, margins, and tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) across implementations 1-9, annotated with highest and least invested guardband; 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• Power penalty trades off against invested guardband in signoff

44ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

Example: worst-case capacitance corner
[Figure: C-3σ corners of layers M1 and M2, and the resulting homogeneous Cw corner for the interconnect stack with M1 and M2 (M1 C, M2 C); the gap is marked as pessimism]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
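
The growth of pessimism with layer count can be illustrated with a small calculation. The sketch below makes simplifying assumptions that are not from the talk: each layer contributes additively to total capacitance, every layer has the same σ, and layer variations are fully independent. Function names are hypothetical.

```python
import math

def homogeneous_3sigma(sigmas):
    """Homogeneous corner: every layer sits at its own 3-sigma point at once."""
    return sum(3.0 * s for s in sigmas)

def statistical_3sigma(sigmas):
    """3-sigma of the total when layer variations are independent (root-sum-square)."""
    return 3.0 * math.sqrt(sum(s * s for s in sigmas))

def pessimism_ratio(n_layers, sigma=1.0):
    """Ratio of homogeneous-corner variation to statistical variation."""
    sigmas = [sigma] * n_layers
    return homogeneous_3sigma(sigmas) / statistical_3sigma(sigmas)

for n in (1, 2, 4, 9):
    print(n, round(pessimism_ratio(n), 3))
```

Under these assumptions the ratio grows as the square root of the number of layers, so a 9-layer stack is three times more pessimistic than a single layer.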

45ISVLSI-2014 invited talk 140710

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
          M1         M2         M3         M4
          ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH
M1  ΔW    1  0  0    γ  0  0    γ  0  0    0  0  0
    ΔT    0  1  0    0  γ  0    0  γ  0    0  0  0
    ΔH    0  0  1    0  0  γ    0  0  γ    0  0  0
M2  ΔW    γ  0  0    1  0  0    γ  0  0    0  0  0
    ΔT    0  γ  0    0  1  0    0  γ  0    0  0  0
    ΔH    0  0  γ    0  0  1    0  0  γ    0  0  0
M3  ΔW    γ  0  0    γ  0  0    1  0  0    0  0  0
    ΔT    0  γ  0    0  γ  0    0  1  0    0  0  0
    ΔH    0  0  γ    0  0  γ    0  0  1    0  0  0
M4  ΔW    0  0  0    0  0  0    0  0  0    1  0  0
    ΔT    0  0  0    0  0  0    0  0  0    0  1  0
    ΔH    0  0  0    0  0  0    0  0  0    0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
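
As a sketch, the matrix on this slide can be generated from a grouping of layers into process modules. The function name, the `dW`/`dT`/`dH` labels, and the grouping of M1-M3 into one module with M4 in another are read off the matrix shown and are otherwise assumptions.

```python
PARAMS = ("dW", "dT", "dH")  # width, thickness, height variation per layer

def build_correlation_matrix(modules, gamma=0.5):
    """Build (labels, matrix) for the variation sources.

    modules: list of lists of layer names; layers in the same inner list
    share a process module. Same-type sources in the same module get
    correlation gamma; everything else off the diagonal is 0.
    """
    sources = [(layer, p, m)
               for m, layers in enumerate(modules)
               for layer in layers
               for p in PARAMS]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (_, p_i, m_i) in enumerate(sources):
        for j, (_, p_j, m_j) in enumerate(sources):
            if i == j:
                sigma[i][j] = 1.0
            elif p_i == p_j and m_i == m_j:
                sigma[i][j] = gamma
    labels = [f"{layer}.{p}" for layer, p, _ in sources]
    return labels, sigma

labels, sigma = build_correlation_matrix([["M1", "M2", "M3"], ["M4"]], gamma=0.5)
for label, row in zip(labels, sigma):
    print(f"{label:6s}", row)
```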

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); cumulative probability is 0.92 at a ratio of 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation yields smaller RC variation

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: α plotted against Δdelay at the C-worst corner, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at the RC-worst corner, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Plot annotations:
• α < 1.0: delay variations are covered by the RC-worst corner
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 (M1 C, M2 C); non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in the itf with α = 0.5

[Figure axes: Δdj(Yrcw) / dj(Ytyp) × 100 and 3σj / d(Ytyp) × 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure axis: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

Flow: routed design, then timing analysis at BEOL corners (Ytyp, Ycw, Yrcw), then split paths into GTBC and GCBC
• GTBC: timing analysis using TBC; ECO using TBC until violation = 0
• GCBC: timing analysis using CBC; ECO using CBC until violation = 0
• Done
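
The path-grouping step of this flow can be sketched as follows. The function names and the example delays are hypothetical, and taking Δdelay at the dominant corner (the larger of the C-worst and RC-worst deviations) is an assumption based on the statement elsewhere in the talk that α is defined at the dominant corner.

```python
def relative_delta(d_corner, d_typ):
    """Relative delay change of a path at a corner vs. the typical corner."""
    return (d_corner - d_typ) / d_typ

def classify_paths(paths, threshold):
    """Split paths into (g_tbc, g_cbc).

    paths: dict name -> (d_typ, d_cw, d_rcw) path delays at the typical,
    C-worst and RC-worst corners. A path joins g_tbc when its delta delay
    at the dominant corner exceeds the threshold.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        delta = max(relative_delta(d_cw, d_typ), relative_delta(d_rcw, d_typ))
        (g_tbc if delta > threshold else g_cbc).append(name)
    return g_tbc, g_cbc

# Hypothetical path delays in ns: (typical, C-worst, RC-worst)
paths = {"p1": (1.00, 1.02, 1.12), "p2": (1.00, 1.03, 1.04)}
g_tbc, g_cbc = classify_paths(paths, threshold=0.05)
print(g_tbc, g_cbc)
```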

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Tightened corners (correlation factor = 0.5)

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw, and Arcw

Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  (terms: nominal delay; delay sensitivity to unit change in M1 resistance; delay sensitivity to unit change in M1 capacitance)
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor α for LEON3MP, NETCARD, and SUPERBLUE12; the α > 0.5 region is marked in each plot]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types

• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
• Assume timing slack = 0

10mV

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step
Vfinal: converged voltage
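
The loop structure of this reference flow can be sketched with toy models. Everything numeric here is an assumption for illustration: the power-law BTI model, the alpha-power-law delay model, and all constants are hypothetical stand-ins for the derated-library characterization and STA runs of the real flow. The sketch also applies the current Vdd as the stress voltage for the whole elapsed lifetime, which is a simplification.

```python
def aged_vth_shift(t_years, vdd, a=0.03, n=0.2, g=3.0):
    """Toy BTI model: |dVth| follows a power law in time and stress voltage."""
    return a * (vdd ** g) * (t_years ** n)

def path_delay(vdd, dvth, vth0=0.3, k=1.0, alpha=1.3):
    """Toy alpha-power-law delay with an aged threshold voltage."""
    return k * vdd / (vdd - vth0 - dvth) ** alpha

def reference_flow(v_init, v_max, v_step, t_final, dt, target_delay):
    """Step through lifetime, raising Vdd by v_step until timing is met.

    Returns (v_final, trace). v_final is None when Vdd would exceed v_max,
    i.e. the circuit must be modified and the flow started over.
    """
    vdd, t, trace = v_init, 0.0, []
    while t < t_final:
        t = min(t + dt, t_final)
        # Re-estimate aging and timing at this step; adapt Vdd if too slow.
        while path_delay(vdd, aged_vth_shift(t, vdd)) > target_delay:
            vdd += v_step
            if vdd > v_max:
                return None, trace
        trace.append((t, round(vdd, 4)))
    return vdd, trace

target = path_delay(0.90, 0.0)  # zero timing slack at t = 0, fresh circuit
v_final, trace = reference_flow(0.90, 1.05, 0.01, 10.0, 1.0, target)
print(v_final, trace)
```

Each inner-loop evaluation stands in for a library characterization plus an STA run, which is why the reference flow is slow in practice.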

62ISVLSI-2014 invited talk 140710

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop among circuit, Vfinal, Vlib and VBTI, and derated libraries]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time over ON, OFF, ON, OFF phases [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time [TCAS'14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd approaching Vfinal over time; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

[Figure annotation: good corners]

Signoff cases
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                          Vlib (circuit performance estimation)
VBTI (aging estimation)   Min Vdd                                               Max Vdd
Min Vdd                   Slowest circuit, less aging                           Not applicable (optimistic)
Max Vdd                   Slowest circuit, worst-case aging (too pessimistic)   Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: AES power (mW) vs. area (μm2) for signoff cases 1-8, ranging from optimistic about AVS to pessimistic about AVS; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's)]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) across implementations 1-9; 30% MTTF penalty, 200mV voltage compensation]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
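
The MTTF update rule on this slide, MTTF(i) = MTTF(i−1) × (VDD(i−1) / VDD(i))², can be sketched as a short loop over an AVS voltage schedule. The function names and the example schedule are hypothetical.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of the update: MTTF scales with (Vdd_prev / Vdd_new)^2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2

def mttf_after_schedule(mttf0, vdd_schedule):
    """Apply the update across a sequence of AVS voltage steps."""
    mttf = mttf0
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_new)
    return mttf

# Hypothetical schedule: Vdd rises from 0.90V to 1.00V in two steps.
print(mttf_after_schedule(10.0, [0.90, 0.95, 1.00]))
```

Because the ratios telescope, the final MTTF under this rule depends only on the initial and final voltages, not on the intermediate steps.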

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year for AVS schedules S1-S5 (design: DMA), and MTTF (year) for S1-S5]

12 years MTTF penalty

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff on a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd; operating voltage]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$\Delta d_{jv} = \left[ d_j(Y_v) - d_j(Y_{typ}) \right] / 3$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist, then RC extraction, then STA, then Δdjv

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (Note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:

$\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
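
The σj computation on this slide (factor Σ = λλᵀ by Cholesky decomposition, multiply the sensitivity row vector by λ, then take the root-sum-square) can be sketched in plain Python. The function names and the two-source example are hypothetical; a real run would use the 27 sensitivities obtained from extraction and STA.

```python
import math

def cholesky(a):
    """Lower-triangular L with a = L * L^T (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def path_sigma(delta_d, corr):
    """Standard deviation of path delay from per-source sensitivities."""
    lam = cholesky(corr)
    n = len(delta_d)
    # Row vector times lambda: d'_k = sum_i delta_d[i] * lam[i][k]
    d_prime = [sum(delta_d[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))

# Two hypothetical variation sources with unit sensitivity.
print(path_sigma([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]))  # independent sources
print(path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]]))  # correlation 0.5
```

The result equals the square root of Δd Σ Δdᵀ, so positive correlation between sources increases σj relative to the independent case.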

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction]

• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$\mathrm{Energy} = \mathrm{Power} / \mathrm{Throughput}$

where Throughput is a function of the error rate ER [Kahng10], the clock period T, and the number of recovery cycles r

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches

Open-Loop AVS
• Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]

Closed-Loop AVS
• Generic monitor: process- and temperature-aware AVS with a generic on-chip monitor [Burd00]
• Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]

Error Tolerance
• Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four scaling trends, each contrasting ideal scaling with observed non-ideality]
• DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
• POWER: dynamic power (W), 2009-2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006-2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date. Source: [CPUDB]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU; minimum achievable energy marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization
• Iteratively sample and refine power models
• Avoid circuit implementation at each mode
• A small constant number of runs is enough: scalable

Global optimization flow: sample (SP&R), construct power models, estimate optimal signoff modes, sample (SP&R), refine power models (adaptive search)

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V) for 1st, 2nd, and real; design: AES, f: 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-Loop AVS: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: RO stages with 1-bit control pins selecting high or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margins, broken down into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5/10/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs, hence reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU; minimum achievable energy marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Flow:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
• Check: energy < min energy? If so, save current solution

19ISVLSI-2014 invited talk 140710

Tradeoff Resilience Cost vs Datapath Cost

[Figure: endpoints implemented as Razor FFs (with error outputs) vs. normal FFs; optimizing the fanin cone with a tighter constraint allows an endpoint Razor FF to be replaced by a normal FF; labels: area (power) of fanin cone, area (power) with Razor overhead]

Tradeoff: Razor FFs (resilience cost) vs. power/area of fanin circuits

[Figure: energy (mJ) vs. number of Razor FFs (300, 100, 50, 0): total energy, energy of non-resilient part, resilience cost]

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)

20ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint with tighter constraints: allows replacement of Razor FF with normal FF
• Pick endpoints based on heuristic sensitivity functions
• Vary endpoints, compare area/power penalty

Candidate Sensitivity Functions

$SF_1 = |slack(p)|$

$SF_2 = |slack(p)| \times Num_{cri}(p)$

$SF_3 = |slack(p)| \times Num_{cri}(p) / Num_{total}(p)$

$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$

$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$

p: negative-slack endpoint
c: cells within fanin cone
Numcri: number of negative-slack cells
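
The five candidate sensitivity functions can be evaluated for one endpoint as in the sketch below. The function name and the example data are hypothetical, and taking Numtotal as the number of cells in the fanin cone is an assumption, since the slide does not define it.

```python
def sensitivity_functions(endpoint_slack, cells):
    """Evaluate SF1..SF5 for one negative-slack endpoint p.

    endpoint_slack: slack of endpoint p.
    cells: list of (slack, power) pairs for the cells in p's fanin cone.
    """
    abs_slack = abs(endpoint_slack)
    num_cri = sum(1 for slack, _ in cells if slack < 0)  # negative-slack cells
    num_total = len(cells)  # assumption: all cells in the fanin cone
    total_power = sum(power for _, power in cells)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total,
        "SF4": abs_slack * total_power,
        "SF5": sum(abs(slack) * power for slack, power in cells),
    }

# Hypothetical endpoint with four fanin cells: (slack in ns, power in mW)
cells = [(-0.10, 2.0), (0.05, 1.0), (-0.02, 3.0), (0.20, 4.0)]
sf = sensitivity_functions(-0.10, cells)
print(sf)
```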

21ISVLSI-2014 invited talk 140710

Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight, adjust clock latencies, contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized

$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p, q)}$

where Slack_pq is the setup slack of path p-q, β is a weighting factor, and TG(p, q) is the toggle rate of path p-q

[Figure: sequential graph FF1, FF2, FF3 with edge weights W12, W23, W31 (data paths and clock tree); after optimization every edge on the cycle has weight W' = average weight on cycle]
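
The edge weight W_pq = Slack_pq / (1 + β × TG(p, q)) and the balanced weight W' on a cycle can be sketched as below. The function names and numeric values are hypothetical, and the sketch covers only the weight arithmetic for one given cycle, not the minimum-weight cycle search or the graph contraction.

```python
def path_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)

def balanced_cycle_weight(edges, beta):
    """W' = average weight over the edges of one cycle.

    edges: list of (setup_slack, toggle_rate) for each path on the cycle.
    """
    weights = [path_weight(slack, tg, beta) for slack, tg in edges]
    return sum(weights) / len(weights)

# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1: (slack in ns, toggle rate)
cycle = [(0.30, 0.5), (0.10, 0.0), (-0.10, 1.0)]
print(balanced_cycle_weight(cycle, beta=2.0))
```

With β > 0, a positive slack on a frequently toggling path counts for less, so such paths look more critical to the optimizer.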

22ISVLSI-2014 invited talk 140710

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

Flow:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
• Check: energy < min energy? If so, save current solution
• OR-tree insertion

23ISVLSI-2014 invited talk 140710

Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margins, broken down into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner
Technology: foundry 28nm

24ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt on MUL and EXU; minimum achievable energy marked]

Technology: foundry 28nm

25ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS among circuit, on-chip aging monitor (circuit performance), and voltage regulator (Vdd); plots of circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to target]

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation

[Figure: three-step loop among derated library (Vlib, VBTI, |Vt|), circuit implementation and signoff (circuit), and BTI degradation and AVS (Vfinal)]

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation

[Figure: three-step loop. Step 1: VBTI (|Vt|) and Vlib give the derated library. Step 2: circuit implementation and signoff give the circuit. Step 3: BTI degradation and AVS give Vfinal.]

No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib and VBTI

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs. Area Across Different Signoffs

"Knee" point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if the desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging at an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI curves.]

VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along the critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation

Flow:
1. Obtain Vheur (average over standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Sign off circuit with derated library
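
The first two steps of this flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-cell Vfinal values and the function name `flat_signoff_voltages` are hypothetical and do not come from the talk.

```python
from statistics import mean

# Hypothetical per-cell Vfinal values in volts (illustrative numbers only).
vfinal_by_cell = {"INV": 0.96, "NAND2": 0.97, "NOR2": 0.98}

def flat_signoff_voltages(vfinal_by_cell):
    """Return (VBTI, Vlib), both set to the average Vfinal over the cells."""
    v_heur = mean(list(vfinal_by_cell.values()))
    return v_heur, v_heur

v_bti, v_lib = flat_signoff_voltages(vfinal_by_cell)
print(f"Vheur = {v_bti:.3f} V (used for both VBTI and Vlib)")
```

The derated library would then be characterized once at this single voltage, which is what removes the dependence on the not-yet-implemented circuit.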

34ISVLSI-2014 invited talk 140710

Power vs. Area for All Designs

• (4 designs) × (DC, AC) × (derating methods)

Legend: proposed method; circuits signed off using other derated libraries

"Knee" point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
• Example: nominal mode and overdrive mode
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign

[Figure: Vdd vs. time for schedules alternating between nominal (NOM, duration tnom) and overdrive (OD, duration tOD) modes.]

[Figure: power of circuits w/ different overdrive modes. Different overdrive modes: 26% power range. Fix fOD: still 14% power range. fnom = 800MHz, Vnom = 0.8V.]

36ISVLSI-2014 invited talk 140710

Also: Tunable Monitors → Less Margin

Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

37ISVLSI-2014 invited talk 140710

Also: Tunable Monitors → Less Margin

Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Benefits of tunability
• Compensate for differences between model and silicon
• Recover margin when variation is reduced due to an improved process

38ISVLSI-2014 invited talk 140710

Outline

• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions

• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins and tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1 to 9; annotations: highest invested guardband, least invested guardband, 14% power penalty.]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

43ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

Example: worst-case capacitance corner

[Figure: interconnect stack with M1 and M2. Per-layer C-3σ corners for layer M1 and layer M2 on M1 C vs. M2 C axes; the homogeneous Cw corner and its 3σ pessimism are marked.]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

Example: worst-case capacitance corner

[Figure: interconnect stack with M1 and M2. Per-layer C-3σ corners for layer M1 and layer M2 on M1 C vs. M2 C axes; the homogeneous Cw corner and its pessimism are marked.]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers

45ISVLSI-2014 invited talk 140710

Correlation Matrix

• Let Σ be the correlation matrix for variation sources

              M1          M2          M3          M4
           ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW      1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT      0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH      0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW      γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT      0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH      0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW      γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT      0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH      0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW      0  0  0     0  0  0     0  0  0     1  0  0
    ΔT      0  0  0     0  0  0     0  0  0     0  1  0
    ΔH      0  0  0     0  0  0     0  0  0     0  0  1

(the matrix above is Σ)

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
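
As an illustration of the two rules above, the following NumPy sketch builds such a matrix from a grouping of layers into process modules. The function name `correlation_matrix` and the grouping (M1 to M3 in one module, M4 alone) are assumptions read off the matrix shown on the slide, not code from the talk.

```python
import numpy as np

def correlation_matrix(modules, types=("W", "T", "H"), gamma=0.5):
    """Build the correlation matrix for (layer, variation type) sources.

    modules: list of lists of layer names; layers in the same inner list
    share a process module. Same-type sources in the same module get
    correlation gamma; all other pairs are independent.
    """
    sources = [(layer, m, t) for m, layers in enumerate(modules)
               for layer in layers for t in types]
    n = len(sources)
    sigma = np.eye(n)
    for a in range(n):
        for b in range(n):
            (_, ma, ta), (_, mb, tb) = sources[a], sources[b]
            if a != b and ma == mb and ta == tb:
                sigma[a, b] = gamma
    return sigma

# Assumed grouping: M1-M3 share a process module, M4 is on its own.
sigma = correlation_matrix([["M1", "M2", "M3"], ["M4"]])
print(sigma.shape)
```

With γ = 0.5 the resulting matrix is symmetric and positive definite, so it can be factored by Cholesky decomposition.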

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point: 0.92 at 60%.]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: scatter plots of α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp).]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Annotations:
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2, on M1 C vs. M2 C axes. Example non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp.]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in the itf with α = 0.5

[Figure axes: Δdj(Yrcw) / dj(Ytyp) × 100 and 3σj / d(Ytyp) × 100.]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure axis: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design; timing analysis at BEOL corners (Ytyp, Ycw, Yrcw); paths split into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. Done.]
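
The path-grouping step can be sketched as follows. This is a minimal illustration, not the flow's actual implementation: the function name `split_paths`, the example numbers, and the choice to compare the larger of the two corner Δdelays against the threshold are all assumptions.

```python
def split_paths(delta_delay, threshold):
    """Split paths into (g_tbc, g_cbc) by relative delay variation.

    delta_delay: dict path -> (dd_cw, dd_rcw), the relative Δdelay at the
    C-worst and RC-worst corners. A path goes to G_tbc when its Δdelay
    exceeds the threshold; taking the larger of the two corner values is
    an assumption of this sketch.
    """
    g_tbc, g_cbc = [], []
    for path, (dd_cw, dd_rcw) in delta_delay.items():
        (g_tbc if max(dd_cw, dd_rcw) > threshold else g_cbc).append(path)
    return g_tbc, g_cbc

# Hypothetical relative Δdelay values for three critical paths.
paths = {"p1": (0.08, 0.12), "p2": (0.01, 0.02), "p3": (0.06, 0.03)}
g_tbc, g_cbc = split_paths(paths, threshold=0.05)
print("TBC group:", g_tbc, "CBC group:", g_cbc)
```

Each group would then go through its own timing analysis and ECO loop, as in the flowchart.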

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Tightened BEOL corners (correlation factor = 0.5)

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

• Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: α for LEON3MP, NETCARD and SUPERBLUE12, with the α > 0.5 region marked in each panel.]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2.]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep VBTI, Vlib and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step
Vfinal: converged voltage
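
The loop structure of this reference flow can be sketched as below. Everything numeric here is a toy stand-in: `delay_fn` and `aging_fn` are hypothetical closed-form models that replace STA with a derated library and a BTI model, and the function name `reference_signoff` is made up for illustration.

```python
def reference_signoff(v_init, v_max, v_step, years, delay_fn, aging_fn, t_clk):
    """Step through lifetime; raise Vdd by v_step whenever timing fails.

    delay_fn(vdd, dvth) -> critical-path delay; aging_fn(year, vdd) -> |ΔVth|.
    Returns Vfinal, or None if the required Vdd exceeds v_max (the circuit
    must then be modified and the flow restarted).
    """
    vdd = v_init
    for year in range(1, years + 1):
        dvth = aging_fn(year, vdd)            # estimate aging at this time step
        while delay_fn(vdd, dvth) > t_clk:    # update circuit timing and Vdd
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None
            dvth = aging_fn(year, vdd)        # aging depends on the new Vdd
    return vdd

# Toy models (assumptions): power-law aging and a simple overdrive delay model.
aging = lambda year, vdd: 0.03 * vdd * (year / 10) ** (1 / 6)
delay = lambda vdd, dvth: 1.0 / (vdd - 0.3 - dvth)

v_final = reference_signoff(0.90, 1.05, 0.01, 10, delay, aging, t_clk=1.70)
print("Vfinal =", v_final)
```

Every pass through the inner loop corresponds to a library characterization plus an STA run in the real flow, which is why the reference flow is slow.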

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5     6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    N/A   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   N/A   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop among circuit, Vfinal, (Vlib, VBTI) and derated libraries.]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14].]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure annotation: ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years), approaching Vfinal.]
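
To see why aging of this kind is front-loaded, consider a power-law model in which |ΔVth| grows as t^n. This model form is an assumption introduced here for illustration (the slide does not state the aging model), and the function names are hypothetical.

```python
import math

def aging_fraction(t, lifetime, n):
    """Fraction of end-of-life |ΔVth| reached at time t, assuming |ΔVth| ∝ t**n."""
    return (t / lifetime) ** n

def exponent_for_fraction(fraction, t, lifetime):
    """Exponent n for which the given fraction of aging is reached at time t."""
    return math.log(fraction) / math.log(t / lifetime)

# With n = 1/6, how much of the 10-year aging has occurred after 1 year?
print(f"{aging_fraction(1.0, 10.0, 1.0 / 6.0):.2f}")
# Which exponent would give exactly 50% of the aging in the first year?
print(f"{exponent_for_fraction(0.5, 1.0, 10.0):.2f}")
```

Any exponent well below 1 concentrates most of the degradation in the early years, which is the qualitative behavior the slide describes.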

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

[Figure annotation: good corners]

Setups
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                              Vlib for circuit performance estimation
VBTI for aging estimation     Min Vdd                              Max Vdd
Min Vdd                       Slowest circuit, less aging          Not applicable (optimistic)
Max Vdd                       Slowest circuit, worst-case aging:   Faster circuit, worst-case aging
                              too pessimistic

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for AES; points labeled by signoff case 1 to 8; series: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks); annotations: optimistic about AVS, pessimistic about AVS.]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation.]
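
The MTTF update rule can be exercised with a short sketch. The Vdd schedule and the 10-year starting MTTF below are hypothetical values chosen only to illustrate the recurrence; `update_mttf` is not a name from the talk.

```python
def update_mttf(mttf0, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule."""
    mttf = mttf0
    for v_prev, v_next in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_next) ** 2
    return mttf

# Hypothetical schedule: Vdd raised from 1.00 V to 1.20 V in 50 mV steps.
schedule = [1.00, 1.05, 1.10, 1.15, 1.20]
mttf = update_mttf(10.0, schedule)
print(f"MTTF = {mttf:.2f} years ({(1 - mttf / 10.0) * 100:.0f}% penalty)")
```

Because the per-step ratios telescope, only the first and last Vdd matter: the result equals MTTF(0) × (VDD(0) / VDD(n))². Under the assumed 1.0 V starting point, a 200 mV increase gives a reduction of roughly 30%, the same order as the penalty annotated on the slide.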

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year for schedules S1 to S5, and MTTF (year) for S1 to S5, for design DMA; annotation: 12 years MTTF penalty.]

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
  • "Chip should work": stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff. Voltage axis with levels: operating voltage, nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd.]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλᵀ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}] \, \lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
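
The two formulas above translate directly into NumPy. The sketch below uses a hypothetical two-source example (sensitivities of 3 and 4 ps, correlation 0.5) rather than the 27 sources of the talk; `path_sigma` is an illustrative name.

```python
import numpy as np

def path_sigma(delta_d, sigma):
    """Standard deviation of a path delay from per-source sensitivities.

    delta_d: 1-D array of Δd_jv, one entry per variation source.
    sigma:   correlation matrix Σ of the variation sources.
    """
    lam = np.linalg.cholesky(sigma)   # lower-triangular λ with Σ = λ λᵀ
    delta_d_corr = delta_d @ lam      # [Δd'_j1 ... Δd'_jn] = [Δd_j1 ... Δd_jn] λ
    return float(np.sqrt(np.sum(delta_d_corr ** 2)))

# Illustrative numbers only: two sources with correlation 0.5.
sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
delta_d = np.array([3.0, 4.0])
print(path_sigma(delta_d, sigma))
```

Since the squared norm of Δd·λ equals Δd Σ Δdᵀ, positive correlation between sources increases σj relative to the independent case (here √37 instead of 5).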

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for a conventional design and a resilient design; annotation: 15% reduction.]

• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$$\mathit{Energy} = \frac{\mathit{Power}}{\mathit{Throughput}}$$

Throughput = 1 − ER T
+ 1 − ER r × T

where r = recovery cycles, T = clock period, ER = error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches

Open-loop AVS
• Freq & Vdd LUT: pre-characterize LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]

Closed-loop AVS
• Generic monitor, design-dependent replica: process- and temperature-aware AVS
  • Generic on-chip monitor [Burd00]
  • Design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor, measure actual critical paths [Hartman06, Fick10]

Error tolerance
• Error detection system: error detection and correction system, Vdd scaling until an error occurs [Das06, Tschanz10]

[Figure axis: power]

78ISVLSI-2014 invited talk 140710

Challenge: Variability

[Figure: four panels comparing ideal scaling with the observed non-ideality.
• DENSITY: transistor count [M] vs. MPU release date, 1998 to 2011. Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date, 1995 to 2016. Source: [CPUDB]]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling (AVS) allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of an AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt designs, with the minimum achievable energy marked; panels: MUL, EXU.]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cones

[Figure: design cone of mode A in the voltage-frequency plane, with modes A, B and C; negative slacks = failed design; positive slacks = overdesign.]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

Global optimization flow:
1. Sample (SP&R)
2. Construct power models
3. Estimate optimal signoff modes
4. Adaptive search: sample (SP&R) and refine power models

[Figure: power estimation of adaptive search. Power (mW) vs. signoff voltage (V); design: AES, f = 700MHz.
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iteration
• real: power from real implemented circuits]
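
A sample-then-refine loop of this kind might look like the sketch below. The talk does not specify the form of its power models, so the quadratic fit is an assumption; `implement` is a hypothetical stand-in for one synthesis, place and route (SP&R) run that returns circuit power at a given signoff voltage.

```python
import numpy as np

def refine_signoff_voltage(implement, v_samples, iterations=2):
    """Fit a quadratic power model to sampled implementations and refine it.

    implement(v) -> power of the circuit signed off at voltage v.
    Each iteration refits the model, estimates its minimizer, and adds
    one new sample at that estimate.
    """
    volts = list(v_samples)
    power = [implement(v) for v in volts]
    v_opt = None
    for _ in range(iterations):
        a, b, _c = np.polyfit(volts, power, 2)
        # Use the parabola's vertex when the fit is convex, else the best sample.
        v_opt = -b / (2 * a) if a > 0 else volts[int(np.argmin(power))]
        v_opt = float(np.clip(v_opt, min(volts), max(volts)))
        volts.append(v_opt)
        power.append(implement(v_opt))
    return v_opt

# Toy power curve (assumption) with its minimum at 1.0 V.
toy_power = lambda v: 14.0 + 50.0 * (v - 1.0) ** 2
v_best = refine_signoff_voltage(toy_power, [0.9, 1.1, 1.2])
print(f"estimated optimal signoff voltage: {v_best:.3f} V")
```

The point of the approach is that only a handful of expensive implementation runs are needed, because new samples are placed where the current model predicts the optimum.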

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-loop AVS classes: generic monitor, design-dependent replica, in-situ monitor

Limitations
• Generic monitor: does not capture design-specific performance variation
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: RO stages with 1-bit control pins selecting high or low resistance.]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: stacked-bar charts of energy (mJ) for PM, BF and CO at small, medium and large margin; panels: MUL, EXU. Bar components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation.]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt designs, with the minimum achievable energy marked; panels: MUL, EXU.]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Flow:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• If energy < min energy: save current solution

20ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization (SEOpt)

• Optimize fanin cone of an endpoint w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
  • Vary endpoints, compare area/power penalty

Candidate sensitivity functions:

$$SF_1 = |slack(p)|$$
$$SF_2 = |slack(p)| \times num_{cri}(p)$$
$$SF_3 = |slack(p)| \times \frac{num_{cri}(p)}{num_{total}(p)}$$
$$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$$
$$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$$

where p is a negative-slack endpoint, c ranges over the cells within the fanin cone of p, and num_cri is the number of negative-slack cells.
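
For concreteness, the five candidate functions can be evaluated for a single endpoint as in the sketch below. The `Cell` record, the example numbers, and the reading of num_total as the total number of cells in the fanin cone are assumptions made for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    slack: float   # timing slack of the cell (negative = violating)
    power: float   # power of the cell

def sensitivity_functions(endpoint_slack, fanin):
    """Return SF1..SF5 for one negative-slack endpoint and its fanin cells."""
    s = abs(endpoint_slack)
    num_cri = sum(1 for c in fanin if c.slack < 0)   # negative-slack cells
    num_total = len(fanin)                           # assumed meaning of num_total
    total_pwr = sum(c.power for c in fanin)
    return {
        "SF1": s,
        "SF2": s * num_cri,
        "SF3": s * num_cri / num_total,
        "SF4": s * total_pwr,
        "SF5": sum(abs(c.slack) * c.power for c in fanin),
    }

# Hypothetical fanin cone with two violating cells out of four.
fanin = [Cell(-0.25, 2.0), Cell(-0.5, 1.0), Cell(0.5, 1.0), Cell(0.25, 4.0)]
sf = sensitivity_functions(-0.5, fanin)
print(sf)
```

Endpoints would then be ranked by the chosen function, and the ranking compared across functions by the resulting area and power penalty.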

21ISVLSI-2014 invited talk 140710

Clock Skew Optimization (SkewOpt)

• Increase slacks on timing-critical and/or frequently-exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight → adjust clock latencies → contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized

$W_{pq} = \dfrac{Slack_{pq}}{1 + \beta \times TG(p,q)}$

Slack_pq: setup slack of path p-q
β: weighting factor
TG(p,q): toggle rate of path p-q

[Figure: sequential graph of FF1, FF2, FF3 (data paths and clock tree) with path weights W12, W23, W31; after clock latency adjustment each path on the cycle has weight W’ = average weight on cycle.]
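A minimal sketch of the path weighting, assuming the weight is the setup slack divided by (1 + β × toggle rate), which is one reading of the slide's formula. The function names and the three-flip-flop example values are hypothetical.

```python
from typing import Sequence


def path_weight(setup_slack: float, toggle_rate: float, beta: float = 1.0) -> float:
    """Weight of path p-q: slack discounted by how often the path toggles."""
    return setup_slack / (1.0 + beta * toggle_rate)


def cycle_average_weight(weights: Sequence[float]) -> float:
    """W': the average weight over the paths of one cycle."""
    return sum(weights) / len(weights)


# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1: (setup slack in ns, toggle rate)
paths = {"FF1->FF2": (0.5, 1.0), "FF2->FF3": (1.0, 0.0), "FF3->FF1": (0.25, 0.0)}
weights = {name: path_weight(s, tg, beta=1.0) for name, (s, tg) in paths.items()}
w_avg = cycle_average_weight(list(weights.values()))
print(weights, w_avg)
```

With β > 0, a frequently toggling path gets a smaller weight than an idle path with the same slack, so it is treated as more critical when cycles are ranked.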

22ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart boxes: initial placement (all FFs = error-tolerant FFs); check "Energy < min energy", save current solution; SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; OR-tree insertion.]

23ISVLSI-2014 invited talk 140710

Benefit of Low-Cost Resilience

• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figure: stacked bar charts of energy (mJ) for PM, BF, and CO at small, medium, and large margin, for designs MUL and EXU. Bar components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation.]

Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm.

24ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience with AVS

• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs → resilience benefits increase in the context of an AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt designs, panels MUL and EXU, with the minimum achievable energy marked.]

Technology: foundry 28nm.

25ISVLSI-2014 invited talk 140710

Outline

• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops → Less Margin

• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS in which an on-chip aging monitor reports circuit performance to a voltage regulator that sets the circuit Vdd; plots of circuit frequency and Vdd vs. time, without AVS and with AVS, against the target.]

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step flow (Step 1, Step 2, Step 3) linking VBTI, |Vt|, Vlib, the derated library, circuit implementation and signoff, the circuit, BTI degradation and AVS, and Vfinal.]

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step flow (Step 1, Step 2, Step 3) linking Vlib, VBTI, |Vt|, the derated library, circuit implementation and signoff, the circuit, BTI degradation and AVS, and Vfinal. Annotations: "No obvious guideline to define VBTI and Vlib"; "Inconsistency among Vfinal, Vlib, VBTI".]

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design, lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs Area Across Different Signoffs

[Figure: power vs. area for circuits signed off at different corners, with a “knee” point for balanced area and power tradeoff.]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, NBTI and PBTI.]

VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

Flow:
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Sign off circuit with derated library

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• (4 designs) × (DC, AC) × (derating methods)

[Figure: power vs. area for the proposed method and for circuits signed off using other derated libraries, with a “knee” point for balanced area and power tradeoff.]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign

[Figure: Vdd vs. time alternating between nominal (NOM) and overdrive (OD) modes over intervals tnom and tOD; power of circuits w/ different overdrive modes. Different overdrive modes: 26% power range. Fix fOD: still 14% power range. fnom = 800MHz, Vnom = 0.8V.]

37ISVLSI-2014 invited talk 140710

Also: Tunable Monitors → Less Margin

Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline

• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions

• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout, IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1 through 9. Annotations: highest invested guardband, least invested guardband, 14% power penalty.]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

44ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner. Capacitance distributions of layers M1 and M2, each taken at its 3σ point (C-3σ), are combined into a homogeneous Cw corner for the interconnect stack with M1 and M2; the gap to the stack's own distribution is labeled as pessimism.]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
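To see why the pessimism grows with the number of layers, a small sketch can compare the two 3σ figures. The model here is an assumption made for illustration: the stack's capacitance shift is taken as a plain sum of per-layer shifts, each layer has standard deviation `sigmas[i]`, and every pair of layers shares one correlation `gamma`.

```python
import math
from typing import Sequence


def homogeneous_shift(sigmas: Sequence[float]) -> float:
    """All layers pushed to their own 3-sigma point at the same time."""
    return 3.0 * sum(sigmas)


def statistical_shift(sigmas: Sequence[float], gamma: float = 0.0) -> float:
    """3-sigma of the summed variation with pairwise correlation gamma."""
    n = len(sigmas)
    var = sum(s * s for s in sigmas)
    var += 2.0 * gamma * sum(
        sigmas[i] * sigmas[j] for i in range(n) for j in range(i + 1, n)
    )
    return 3.0 * math.sqrt(var)


for layers in (1, 2, 4, 9):
    s = [1.0] * layers  # hypothetical unit sigma per layer
    ratio = homogeneous_shift(s) / statistical_shift(s, gamma=0.0)
    print(layers, round(ratio, 3))
```

For independent, equal-sigma layers the ratio is sqrt(n); with full correlation (gamma = 1) the two figures coincide and the homogeneous corner is no longer pessimistic.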

45ISVLSI-2014 invited talk 140710

Correlation Matrix

• Let Σ be the correlation matrix for variation sources

Σ =
          M1         M2         M3         M4
          ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH
M1  ΔW    1  0  0    γ  0  0    γ  0  0    0  0  0
    ΔT    0  1  0    0  γ  0    0  γ  0    0  0  0
    ΔH    0  0  1    0  0  γ    0  0  γ    0  0  0
M2  ΔW    γ  0  0    1  0  0    γ  0  0    0  0  0
    ΔT    0  γ  0    0  1  0    0  γ  0    0  0  0
    ΔH    0  0  γ    0  0  1    0  0  γ    0  0  0
M3  ΔW    γ  0  0    γ  0  0    1  0  0    0  0  0
    ΔT    0  γ  0    0  γ  0    0  1  0    0  0  0
    ΔH    0  0  γ    0  0  γ    0  0  1    0  0  0
M4  ΔW    0  0  0    0  0  0    0  0  0    1  0  0
    ΔT    0  0  0    0  0  0    0  0  0    0  1  0
    ΔH    0  0  0    0  0  0    0  0  0    0  0  1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
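The structure of Σ can be generated from the two rules above. The sketch below is illustrative only: the module labels "A" and "B" are hypothetical, and the grouping of M1, M2, M3 into one process module with M4 in another is read off the matrix entries.

```python
from typing import Dict, List, Tuple


def correlation_matrix(
    layer_module: Dict[str, str],
    types: Tuple[str, ...] = ("W", "T", "H"),
    gamma: float = 0.5,
) -> Tuple[List[Tuple[str, str]], List[List[float]]]:
    """Build the correlation matrix over (layer, variation type) sources."""
    sources = [(layer, t) for layer in layer_module for t in types]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (li, ti) in enumerate(sources):
        for j, (lj, tj) in enumerate(sources):
            if i == j:
                sigma[i][j] = 1.0
            elif ti == tj and layer_module[li] == layer_module[lj]:
                sigma[i][j] = gamma  # same type, same process module
    return sources, sigma


# ASSUMPTION: M1-M3 share process module "A"; M4 is alone in module "B"
modules = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}
sources, sigma = correlation_matrix(modules, gamma=0.5)
for (layer, t), row in zip(sources, sigma):
    print(layer, "d" + t, row)
```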

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the point at 60% and 0.92 is marked.]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: α plotted against Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp). Paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst).]

• Some paths have α > 1.0 → a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp.]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: Δdj(Yrcw) / dj(Ytyp) × 100% plotted against 3σj / d(Ytyp) × 100%.]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure axis: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; paths split into Gtbc and Gcbc; timing analysis using TBC and ECO using TBC until #violations = 0; timing analysis using CBC and ECO using CBC until #violations = 0; done.]
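The path-grouping step can be sketched as below. The function name, the per-corner thresholds `a_cw` and `a_rcw` (in percent), the example delays, and the rule "exceeding either threshold puts the path in Gtbc" are all assumptions made for illustration; the slide only states that a path joins Gtbc when its Δdelay is larger than a threshold.

```python
from typing import Dict, List, Tuple


def split_paths(
    paths: Dict[str, Tuple[float, float, float]],
    a_cw: float,
    a_rcw: float,
) -> Tuple[List[str], List[str]]:
    """paths maps a path name to (d_typ, d_cw, d_rcw) delays in ns."""
    g_tbc: List[str] = []
    g_cbc: List[str] = []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        delta_cw = (d_cw - d_typ) / d_typ * 100.0    # % variation at C-worst
        delta_rcw = (d_rcw - d_typ) / d_typ * 100.0  # % variation at RC-worst
        # ASSUMPTION: either corner exceeding its threshold selects Gtbc
        if delta_cw > a_cw or delta_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Hypothetical delays (ns) at the typical, C-worst, and RC-worst corners
example = {"p1": (1.0, 1.10, 1.05), "p2": (1.0, 1.01, 1.02)}
g_tbc, g_cbc = split_paths(example, a_cw=4.0, a_rcw=7.0)
print(g_tbc, g_cbc)
```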

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5

           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   43        73
TBC-0.6    0.6   33        50
TBC-0.7    0.7   30        34

• Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw, and Arcw
• Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays → the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1 term: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1 term: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large → delay variations of ΔdR-M1 and ΔdC-M1 cancel out → Δd(Ycw) ~ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor α for LEON3MP, NETCARD, and SUPERBLUE12, with the α > 0.5 region marked in each panel.]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: three bar charts comparing CBC, TBC-1, and TBC-2 on LEON, SUPERBLUE, and NETCARD: WNS (ns), TNS (ns), and #timing violations.]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, Vlib, and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow → many STA runs and library characterizations

Vstep: AVS voltage step
Vfinal: converged voltage

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case        1       2       3      4        5    6        7
Vlib (V)    Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)    Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop among circuit, Vfinal, Vlib / VBTI, and derated libraries.]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
• Device aging (|ΔVth|) accumulates over time

[Figure: |Vgs| vs. time with alternating ON and OFF phases. Sources: [VattikondaWC06], [TCAS’14].]

65ISVLSI-2014 invited talk 140710

Observation 1

• BTI is a “front-loaded” phenomenon [Chan11]
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd over lifetime approaching Vfinal; ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years).]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

[Figure: power vs. area for signoff cases 1 through 7, with the good corners marked.]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Signoff cases
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Vlib (for circuit performance estimation) vs. VBTI (for aging estimation):

                  Vlib = Min Vdd                        Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging           Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging     Faster circuit, worst-case aging
                  (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for design AES, signoff cases 1 through 8. Series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's). Annotations: optimistic about AVS, pessimistic about AVS.]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$MTTF(i) = MTTF(i-1) \times \left( \dfrac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$

[Figure: lifetime (year) and Vfinal (V) for implementations 1 through 9. Annotations: 30% MTTF penalty, 200mV voltage compensation.]
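The MTTF update rule can be applied step by step along an AVS voltage schedule. The sketch below is illustrative: the function name, the initial MTTF, and the voltage schedule are hypothetical values, not data from the talk.

```python
from typing import List, Sequence


def update_mttf(mttf_initial: float, vdd_schedule: Sequence[float]) -> List[float]:
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 at every AVS step."""
    mttf = mttf_initial
    history = [mttf]
    for v_prev, v_cur in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_cur) ** 2
        history.append(mttf)
    return history


# Hypothetical schedule: Vdd raised from 0.90 V to 1.10 V in two steps
history = update_mttf(mttf_initial=10.0, vdd_schedule=[0.90, 1.00, 1.10])
print([round(m, 3) for m in history])
```

Because the per-step factors telescope, the final MTTF depends only on the ratio of the first and last voltages: raising Vdd from 0.90 V to 1.10 V scales MTTF by (0.9/1.1)² ≈ 0.67 under this rule.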

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year (0 to 12) for AVS schedules S1 through S5, and MTTF (year) for S1 through S5; design DMA. Annotation: 12 years MTTF penalty.]

71ISVLSI-2014 invited talk 140710

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work” → stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff on a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd, and the operating voltage.]

Problem: margins = pessimism → overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$\Delta d_{jv} = [\, d_j(Y_v) - d_j(Y_{typ}) \,] / 3$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:

$\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
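The two statistical timing steps can be sketched with a small pure-Python Cholesky factorization. This is a toy illustration with two variation sources instead of 27; the function names and the numeric values are hypothetical.

```python
import math
from typing import List, Sequence


def delay_sensitivity(d_corner: float, d_typ: float) -> float:
    """Per-sigma sensitivity from a 3-sigma corner run: [d(Yv) - d(Ytyp)] / 3."""
    return (d_corner - d_typ) / 3.0


def cholesky(a: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular lam with a = lam * lam^T (a must be positive definite)."""
    n = len(a)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(a[i][i] - s)
            else:
                lam[i][j] = (a[i][j] - s) / lam[j][j]
    return lam


def path_sigma(delta_d: Sequence[float], corr: Sequence[Sequence[float]]) -> float:
    """sigma_j = norm of the row vector delta_d multiplied by lam."""
    lam = cholesky(corr)
    n = len(delta_d)
    d_prime = [sum(delta_d[i] * lam[i][v] for i in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Two hypothetical sources with correlation 0.5 and unit sensitivities
sigma_j = path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
print(round(sigma_j, 4))
```

The result equals the square root of Δd Σ Δd^T, which is why multiplying by the Cholesky factor and taking the Euclidean norm accounts for the correlation between sources.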

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors → ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for a conventional design and a resilient design; 15% reduction.]

• Conventional design: worst-case signoff → no Vdd downscaling
• Resilient design: typical-case signoff → Vdd downscaling → reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy

$Energy = Power / Throughput$

Throughput is computed from the error rate ER [Kahng10], the number of recovery cycles r, and the clock period T.

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches (ordered along the power axis of the figure):

• Open-loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS: process- and temperature-aware AVS
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge: Variability

[Figure: four panels, each contrasting the ideal trend with the observed non-ideality.
DENSITY: transistor count [M] vs. MPU release date (1998 to 2011). Source: [CPUDB].
POWER: dynamic power (W) by year (2009 to 2024). Source: [JeongK08].
DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling (2006 to 2016). Source: [ITRS].
SUPPLY VOLTAGE: volts vs. MPU release date (1995 to 2016). Source: [CPUDB].]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs → resilience benefits increase in the context of an AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt designs, panels MUL and EXU, with the minimum achievable energy marked.]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: voltage vs. frequency plane showing the design cone of mode A together with modes A, B, and C; negative slacks = failed design, positive slacks = overdesign.]

• Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
• Guideline: search for signoff modes within design cone → reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough → scalable

Global optimization flow (adaptive search):
1. Sample (SP&R)
2. Construct power models
3. Estimate optimal signoff modes
4. Sample (SP&R)
5. Refine power models

[Figure: power estimation of adaptive search: power (mW) vs. signoff voltage (V). Design: AES, f: 700MHz.]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-loop AVS
• Generic monitor
  • Does not capture design-specific performance variation
• Design-dependent replica, in-situ monitor
  • Critical path may be difficult to identify (IP from 3rd party)
  • Calibrating monitors at multiple modes/voltages requires long test time

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages, each with a 1-bit control pin selecting high resistance or low resistance.]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: stacked bar charts of energy (mJ) for PM, BF, and CO at small, medium, and large margin, for designs MUL and EXU. Bar components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation.]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs → resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt designs, panels MUL and EXU, with the minimum achievable energy marked.]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

21ISVLSI-2014 invited talk 140710

Clock Skew Optimization (SkewOpt)

• Increase slacks on timing-critical and/or frequently-exercised paths

1. Generate sequential graph
2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized

[Figure: sequential graph of FF1, FF2 and FF3 with data paths weighted W12, W23 and W31, plus the clock tree; after optimization every edge on the cycle carries W′, where W′ = average weight on cycle]

$$W_{pq} = \frac{Slack_{p,q}}{1 + \beta \times TG(p,q)}$$

where Slack_{p,q} is the setup slack of path p-q, β is a weighting factor, and TG(p,q) is the toggle rate of path p-q.
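The weight formula and the cycle average W′ can be illustrated with a minimal Python sketch. The slack values, toggle rates and the value of β below are hypothetical numbers chosen for illustration; they are not taken from the talk, and the helper name `edge_weight` is an assumption.

```python
def edge_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)

# Hypothetical three-flop cycle FF1 -> FF2 -> FF3 -> FF1.
# Each entry: (setup slack in ns, toggle rate). Illustrative values only.
paths = {
    ("FF1", "FF2"): (0.30, 0.50),
    ("FF2", "FF3"): (0.10, 0.10),
    ("FF3", "FF1"): (0.20, 0.00),
}
BETA = 2.0  # assumed weighting factor

# Frequently toggling paths get a smaller weight for the same slack,
# so they look "more critical" to the optimizer.
weights = {edge: edge_weight(s, tg, BETA) for edge, (s, tg) in paths.items()}

# W' = average weight on the cycle.
avg_weight = sum(weights.values()) / len(weights)

for edge, w in weights.items():
    print(edge, round(w, 4))
print("W' =", round(avg_weight, 4))
```

The sketch only evaluates the weights on one cycle; finding the cycle and adjusting clock latencies (Steps 2 and 3) are not modeled here.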

22ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; OR-tree insertion; check "Energy < min energy" and save current solution]

23ISVLSI-2014 invited talk 140710

Benefit of Low-Cost Resilience

• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figure: energy (mJ) of PM, BF and CO for the MUL and EXU designs at small, medium and large margin; each bar is split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm

24ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience with AVS

• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for the MUL and EXU designs, comparing pure-margin, brute-force and CombOpt; the minimum achievable energy is marked]

Technology: foundry 28nm

25ISVLSI-2014 invited talk 140710

Outline

• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops: Less Margin

• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS, with an on-chip aging monitor tracking circuit performance and a voltage regulator setting the circuit Vdd; plots of circuit frequency and Vdd vs. time, with and without AVS, relative to the target]

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: BTI degradation and AVS loop. Step 1: VBTI, |Vt|; Step 2: Vlib, derated library; Step 3: circuit implementation and signoff, circuit; the circuit's BTI degradation and AVS give Vfinal]

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: BTI degradation and AVS loop. Step 1: VBTI, |Vt|; Step 2: Vlib, derated library; Step 3: circuit implementation and signoff, circuit; the circuit's BTI degradation and AVS give Vfinal]

No obvious guideline to define VBTI and Vlib

Inconsistency among Vfinal, Vlib, VBTI

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs Area Across Different Signoffs

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
• Assume timing slack = 0

10mV

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Signoff circuit with derated library
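The first two steps of this flow can be sketched in a few lines of Python. The per-cell Vfinal values and the function name `derated_library_corner` are hypothetical; in practice the per-cell Vfinal would come from aging-aware characterization at timing slack = 0, which is not modeled here.

```python
from statistics import mean

# Hypothetical per-cell Vfinal (V) at timing slack = 0. Illustrative only.
vfinal_by_cell = {"INV": 0.96, "NAND2": 0.97, "NOR2": 0.98}

def derated_library_corner(vfinal_by_cell):
    """Return (VBTI, Vlib), both set to Vheur = average Vfinal over cells."""
    v_heur = mean(vfinal_by_cell.values())
    return v_heur, v_heur

v_bti, v_lib = derated_library_corner(vfinal_by_cell)
print(f"VBTI = Vlib = Vheur = {v_bti:.3f} V")
```

Using one flat voltage for both VBTI and Vlib is what keeps the library characterization to a single run, at the cost of the slight pessimism noted under Heuristic 1.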

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• (4 designs) x (DC, AC) x (derating methods)

[Figure: power vs. area for all designs; points for the proposed method and for circuits signed off using other derated libraries]

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
• Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes

Improve performance, power or area

Reduce overdesign

[Figure: Vdd vs. time, alternating between nominal (NOM) and overdrive (OD) modes with durations tnom and tOD]

[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V. Different overdrive modes: 26% power range. Fix fOD: still 14% power range]

37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline

• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions

• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1-9, annotated with the highest and least invested guardband and a 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

44ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner. Per-layer corners (C-3σ for Layer M1, C-3σ for Layer M2) on the M1 C and M2 C axes define the homogeneous Cw corner for an interconnect stack with M1 and M2; the resulting pessimism is marked]

When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with layers
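A small numerical sketch can make the growth of pessimism concrete. It assumes, purely for illustration, that the per-layer contributions add linearly, are normally distributed with equal σ, and share one pairwise correlation ρ; none of these modeling choices or function names come from the talk.

```python
import math

def homogeneous_corner(sigmas, k=3.0):
    """Every layer sits at its own k-sigma corner at once: k * sum(sigma_i)."""
    return k * sum(sigmas)

def statistical_corner(sigmas, rho, k=3.0):
    """k-sigma of the summed variation, with pairwise correlation rho."""
    n = len(sigmas)
    var = sum(s * s for s in sigmas)
    var += sum(2.0 * rho * sigmas[i] * sigmas[j]
               for i in range(n) for j in range(i + 1, n))
    return k * math.sqrt(var)

def pessimism_ratio(n_layers, rho, sigma=1.0):
    """Homogeneous corner divided by the statistical k-sigma corner."""
    sigmas = [sigma] * n_layers
    return homogeneous_corner(sigmas) / statistical_corner(sigmas, rho)

for n in (1, 2, 4, 9):
    print(n, round(pessimism_ratio(n, rho=0.0), 3), round(pessimism_ratio(n, rho=0.5), 3))
```

Under these assumptions the ratio is exactly 1 for fully correlated layers and grows like the square root of the layer count for independent layers.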

45ISVLSI-2014 invited talk 140710

Correlation Matrix

• Let Σ be the correlation matrix for variation sources

            M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW     1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT     0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH     0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW     γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT     0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH     0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW     γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT     0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH     0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW     0  0  0     0  0  0     0  0  0     1  0  0
    ΔT     0  0  0     0  0  0     0  0  0     0  1  0
    ΔH     0  0  0     0  0  0     0  0  0     0  0  1
                                                        = Σ

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
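The structure of Σ follows from two rules, which the sketch below encodes directly. The grouping of M1-M3 into one process module and M4 into another is read off the matrix above; the module labels "A" and "B" and the function name are placeholders introduced here.

```python
LAYERS = ["M1", "M2", "M3", "M4"]
TYPES = ["ΔW", "ΔT", "ΔH"]  # variation types per layer
# Assumed process-module grouping, inferred from the matrix above.
MODULE = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}

def correlation_matrix(gamma):
    """Build Σ: 1 on the diagonal, gamma for same type in same module, else 0."""
    sources = [(layer, t) for layer in LAYERS for t in TYPES]
    sigma = []
    for (la, ta) in sources:
        row = []
        for (lb, tb) in sources:
            if (la, ta) == (lb, tb):
                row.append(1.0)
            elif ta == tb and MODULE[la] == MODULE[lb]:
                row.append(gamma)
            else:
                row.append(0.0)
        sigma.append(row)
    return sources, sigma

sources, sigma = correlation_matrix(gamma=0.5)
for src, row in zip(sources, sigma):
    print(src, row)
```

With 9 metal layers and three variation types per layer the same construction would give a 27 x 27 matrix.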

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the cumulative probability is 0.92 at a ratio of 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two scatter plots of α. Left: α vs. Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp). Right: α vs. Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp). Points are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner

α < 1.0: delay variations are covered by the RC-worst corner

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
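The notion of a dominant corner can be sketched as follows. The formula used for α here, 3σj divided by the Δdelay at the dominant corner, is an assumption made for this illustration (it is consistent with α > 1.0 meaning that the corner underestimates the statistical variation); the path delays and σj are hypothetical.

```python
def alpha_at_dominant_corner(d_typ, d_cw, d_rcw, sigma_j):
    """Pick the corner with the larger delta-delay and return (corner, alpha).

    alpha is ASSUMED here to be 3*sigma_j / delta-delay at that corner.
    """
    delta_cw = d_cw - d_typ
    delta_rcw = d_rcw - d_typ
    corner = "C-worst" if delta_cw > delta_rcw else "RC-worst"
    return corner, 3.0 * sigma_j / max(delta_cw, delta_rcw)

# Hypothetical path: delays in ns at Ytyp, Ycw, Yrcw and statistical sigma_j.
corner, alpha = alpha_at_dominant_corner(d_typ=1.00, d_cw=1.02, d_rcw=1.10, sigma_j=0.015)
print(corner, round(alpha, 3))
```

For this path the C-worst corner alone would give α above 1.0, while the dominant RC-worst corner covers the 3σ variation with room to spare.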

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 (axes M1 C, M2 C); non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: scatter plot of Δdj(Yrcw)/dj(Ytyp) x 100 (%) against 3σj/d(Ytyp) x 100 (%)]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%) of critical paths]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; paths split into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. Done]
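The path-grouping step of this flow can be sketched in Python. The per-corner thresholds (named `a_cw` and `a_rcw` after Acw and Arcw), the rule that exceeding either threshold is enough, and all path delays are assumptions for illustration, not values from the talk.

```python
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    d_typ: float  # delay at typical corner Ytyp (ns)
    d_cw: float   # delay at C-worst corner Ycw (ns)
    d_rcw: float  # delay at RC-worst corner Yrcw (ns)

def rel_delta(d_corner, d_typ):
    """Relative delta-delay in percent: [d(Y) - d(Ytyp)] / d(Ytyp) x 100."""
    return 100.0 * (d_corner - d_typ) / d_typ

def split_paths(paths, a_cw, a_rcw):
    """Paths whose delta-delay exceeds a threshold go to G_tbc, the rest to G_cbc."""
    g_tbc, g_cbc = [], []
    for p in paths:
        large = (rel_delta(p.d_cw, p.d_typ) > a_cw
                 or rel_delta(p.d_rcw, p.d_typ) > a_rcw)
        (g_tbc if large else g_cbc).append(p.name)
    return g_tbc, g_cbc

# Hypothetical critical paths and thresholds (in percent).
paths = [
    Path("p1", 1.00, 1.02, 1.10),
    Path("p2", 1.00, 1.01, 1.02),
    Path("p3", 2.00, 2.20, 2.05),
]
g_tbc, g_cbc = split_paths(paths, a_cw=5.0, a_rcw=8.0)
print("G_tbc:", g_tbc)
print("G_cbc:", g_cbc)
```

Paths in G_tbc would then be analyzed and fixed against the tightened corners, and the remaining paths against the conventional corners, as in the flowchart.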

53ISVLSI-2014 invited talk 140710

Experiment Setup

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     18        20        31
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Testcases for validation (45nm library with 8X wire resistivity)

Correlation factor = 0.5
           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   43        73
TBC-0.6    0.6   33        50
TBC-0.7    0.7   30        34

Implement another NETCARD (clock period = 23ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • First 0.5: delay sensitivity to unit change in M1 resistance
  • Second 0.5: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; delay variations of ΔdR-M1 and ΔdC-M1 are cancelled out: Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: α scatter plots for LEON3MP, NETCARD and SUPERBLUE12, with the region α > 0.5 marked in each]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step. Vfinal: converged voltage
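The loop structure of this reference flow can be illustrated with a toy simulation. Everything numeric here is an assumption: the power-law aging model, the alpha-power delay model, and all constants are placeholders standing in for real BTI models, library characterization and STA.

```python
def delay(vdd, dvth, vth0=0.35, alpha=1.3):
    """Toy alpha-power-law delay (arbitrary units); placeholder for STA."""
    return vdd / (vdd - vth0 - dvth) ** alpha

def aging(vdd, t_years, k=0.025, n=0.25):
    """Toy BTI model: |dVth| grows as a power law in time, scaled by Vdd."""
    return k * vdd * t_years ** n

def reference_signoff(v_init=0.90, v_max=1.05, v_step=0.01, t_final=10.0, dt=0.5):
    """Step through lifetime; raise Vdd by v_step whenever timing is missed.

    Returns (v_final, schedule); v_final is None if Vdd would exceed v_max,
    which corresponds to "modify circuit and start over".
    """
    target = delay(v_init, 0.0)  # zero timing slack at t = 0
    vdd, t = v_init, 0.0
    schedule = [(t, vdd)]
    while t < t_final:
        t += dt
        dvth = aging(vdd, t)               # estimate aging at this time step
        while delay(vdd, dvth) > target:   # update timing and Vdd
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None, schedule
            dvth = aging(vdd, t)
        schedule.append((t, vdd))
    return vdd, schedule

v_final, schedule = reference_signoff()
print("Vfinal:", v_final)
print("first steps:", schedule[:4])
```

Each iteration of the outer loop stands for one aging estimate plus at least one timing run, which is where the "very slow" remark comes from; the proposed heuristics replace the whole loop with one derated library.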

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Case        1       2       3      4        5    6        7
Vlib (V)    Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)    Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop among Circuit, Vfinal, Vlib / VBTI and Derated Libraries]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time with alternating ON and OFF phases] [VattikondaWC06]

Device aging (|ΔVth|) accumulates over time

[TCAS’14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon
• 50% BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd vs. time approaching Vfinal; ≈70% Vdd increment in 1 year (remaining 30% over 9 years)]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                              Vlib for circuit performance estimation
                              Min Vdd                              Max Vdd
VBTI for aging     Min Vdd    Slowest circuit, less aging          Not applicable (optimistic)
estimation         Max Vdd    Slowest circuit, worst-case aging    Faster circuit, worst-case aging
                              (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES, with data points labeled 1-8 by signoff case; series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); regions marked as optimistic about AVS and pessimistic about AVS]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) for implementations 1-9; annotated with a 30% MTTF penalty and 200mV voltage compensation]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$
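The MTTF update rule can be applied step by step along an AVS voltage schedule, as in the sketch below. The starting MTTF and the voltage steps are hypothetical values chosen for illustration; the function name is also an assumption.

```python
def mttf_schedule(mttf0, vdd_steps):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a schedule."""
    mttf = [mttf0]
    for prev, cur in zip(vdd_steps, vdd_steps[1:]):
        mttf.append(mttf[-1] * (prev / cur) ** 2)
    return mttf

# Hypothetical AVS voltage steps (V) and initial MTTF (years).
vdd = [0.90, 0.95, 1.00, 1.05, 1.10]
mttf = mttf_schedule(10.0, vdd)
penalty = 1.0 - mttf[-1] / mttf[0]

print([round(m, 3) for m in mttf])
print(f"MTTF penalty: {penalty:.1%}")
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last voltages of the schedule under this update rule.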

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year (0-12) for AVS schedules S1-S5, and MTTF (year) of S1-S5 for design DMA; annotated "12 years MTTF penalty"]

71ISVLSI-2014 invited talk 140710

What is “Signoff”

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff, from nominal Vdd through static IR drop, power grid IR gradient, dynamic IR and HCI/NBTI to the signoff Vdd; the operating voltage is marked on the voltage axis]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist; RC extraction; STA; Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (Note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

$$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}] \, \lambda$$

Standard deviation of path delay

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
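The two formulas above can be checked on a small case with a pure-Python Cholesky factorization. The 3 x 3 correlation matrix and the sensitivities below are hypothetical stand-ins for the 27 x 27 case; the function names are introduced here for illustration.

```python
import math

def cholesky(a):
    """Lower-triangular lam with a = lam lam^T (a symmetric positive definite)."""
    n = len(a)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(a[i][i] - s)
            else:
                lam[i][j] = (a[i][j] - s) / lam[j][j]
    return lam

def path_sigma(delta_d, corr):
    """sigma_j = 2-norm of the row vector delta_d multiplied by lam."""
    lam = cholesky(corr)
    n = len(delta_d)
    d_prime = [sum(delta_d[i] * lam[i][v] for i in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))

# Hypothetical example: three variation sources, the first two correlated.
corr = [[1.0, 0.5, 0.0],
        [0.5, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
delta_d = [3.0, 4.0, 12.0]  # per-sigma delay sensitivities (ps)

sigma_j = path_sigma(delta_d, corr)
print(round(sigma_j, 4))
```

Since Σ = λλ^T, the squared result equals the quadratic form of the sensitivities with Σ, so positive correlation between sources with same-sign sensitivities increases σj relative to the independent case.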

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction]

Conventional design: worst-case signoff, no Vdd downscaling

Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy

$$Energy = \frac{Power}{Throughput}$$

Throughput is expressed in terms of the error rate ER [Kahng10], the clock period T, and the recovery cycles r.

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches (arranged along a Power axis)

• Open-Loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-Loop AVS
  • Generic monitor, design-dependent replica: process and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measure actual critical paths [Hartman06, Fick10]
• Error Tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels comparing ideal scaling with non-ideality. DENSITY: transistor count [M] vs. MPU release date (source: [CPUDB]). POWER: dynamic power (W), 2009-2024 (source: [JeongK08]). DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006-2016 (source: [ITRS]). SUPPLY VOLTAGE: volt vs. MPU release date (source: [CPUDB])]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for the MUL and EXU designs, comparing brute-force, pure-margin and CombOpt; the minimum achievable energy is marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design, overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: voltage vs. frequency plane with modes A, B and C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable

[Flowchart: global optimization flow. Sample (SP&R); construct power models; estimate optimal signoff modes; adaptive search: sample (SP&R), refine power models]

[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V); design: AES, f: 700MHz]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-Loop AVS: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high resistance or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF and CO for the MUL and EXU designs at small, medium and large margin; each bar is split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs, and thus reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) vs supply voltage (V) for brute-force, pure-margin and CombOpt; panels: MUL, EXU; annotation: minimum achievable energy]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flow: initial placement (all FFs = error-tolerant FFs); if energy < min energy, save current solution; SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization]


22ISVLSI-2014 invited talk 140710

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

[Flow: initial placement (all FFs = error-tolerant FFs); if energy < min energy, save current solution; SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; OR-tree insertion]
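The flow keeps the lowest-energy solution seen so far while SEOpt gradually replaces error-tolerant FFs with normal FFs. A minimal Python sketch of that keep-the-best iteration is given here. The energy model, its coefficients, and the names `toy_energy`, `seopt_step` and `optimize` are hypothetical placeholders and not the cost functions used in the talk; SkewOpt and OR-tree insertion are not modeled.

```python
from typing import Callable, Tuple

N_ENDPOINTS = 10  # hypothetical number of timing-critical endpoints


def toy_energy(k: int) -> float:
    """Toy energy when k endpoints use normal FFs plus margin (assumed model).

    Each remaining error-tolerant FF costs 2.0 units; the cost of margin
    insertion grows quadratically with the number of converted endpoints.
    """
    return 2.0 * (N_ENDPOINTS - k) + 0.3 * k * k


def seopt_step(k: int) -> int:
    """Stand-in for one SEOpt pass: convert one more endpoint to a normal FF."""
    return min(k + 1, N_ENDPOINTS)


def optimize(start: int,
             step: Callable[[int], int],
             energy: Callable[[int], float],
             max_iters: int = 20) -> Tuple[int, float]:
    """Iterate and save the current solution whenever energy < min energy."""
    best, min_energy = start, energy(start)
    current = start
    for _ in range(max_iters):
        current = step(current)
        e = energy(current)
        if e < min_energy:  # "Energy < min energy": save current solution
            best, min_energy = current, e
    return best, min_energy


best_k, best_e = optimize(0, seopt_step, toy_energy)
print(best_k, round(best_e, 2))
```

With this toy model the saved solution is an intermediate point: converting every endpoint is not optimal once the margin cost outweighs the saved resilience cost.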

23ISVLSI-2014 invited talk 140710

Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at small, medium and large margins; each bar is split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm

24ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs supply voltage (V) for pure-margin, brute-force and CombOpt; panels: MUL, EXU; annotation: minimum achievable energy]

Technology: foundry 28nm

25ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS. An on-chip aging monitor observes circuit performance and drives a voltage regulator that sets the circuit Vdd. Plots of circuit frequency vs time and Vdd vs time, without AVS and with AVS, relative to the target]

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step loop (Step 1, Step 2, Step 3) linking BTI degradation and AVS (Vfinal, VBTI, |Vt|), the derated library (Vlib), and circuit implementation and signoff (circuit)]

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step loop (Step 1, Step 2, Step 3) linking BTI degradation and AVS (Vfinal, VBTI, |Vt|), the derated library (Vlib), and circuit implementation and signoff (circuit)]

No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs Area Across Different Signoffs

[Figure: power vs area across signoff corners]

"Knee" point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs time, with NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation

Flow: obtain Vheur (average of standard cells), obtain derated library with VBTI = Vlib = Vheur, then sign off the circuit with the derated library
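The averaging step of this flow is simple enough to sketch in a few lines of Python. The cell names and Vfinal values are hypothetical illustration data, and `v_heur` and `derated_corner` are names made up for this sketch; how the per-cell Vfinal values are obtained is outside its scope.

```python
from typing import Dict


def v_heur(vfinal_by_cell: Dict[str, float]) -> float:
    """Average the per-cell Vfinal estimates into a single flat voltage."""
    if not vfinal_by_cell:
        raise ValueError("need at least one cell")
    return sum(vfinal_by_cell.values()) / len(vfinal_by_cell)


def derated_corner(vfinal_by_cell: Dict[str, float]) -> Dict[str, float]:
    """Signoff corner with VBTI = Vlib = Vheur."""
    v = v_heur(vfinal_by_cell)
    return {"VBTI": v, "Vlib": v}


# Hypothetical per-cell Vfinal values in volts (not measured data).
cells = {"INV": 0.95, "NAND2": 0.96, "NOR2": 0.94}
corner = derated_corner(cells)
print(corner)
```

Using one flat voltage for both aging estimation and library characterization is what removes the dependence on the not-yet-implemented circuit.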

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• 4 designs × (DC, AC) × derating methods

[Figure: power vs area for all designs; markers: proposed method, circuits signed off using other derated libraries]

"Knee" point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign

[Figure: Vdd vs time, alternating between nominal (NOM) and overdrive (OD) modes over intervals tnom and tOD]

[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V]
• Different overdrive modes: 26% power range
• Fix fOD: still 14% power range

36ISVLSI-2014 invited talk 140710

Also: Tunable Monitors, Less Margin

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config: Vmin_est < Vmin_chip. Some chips will fail.

37ISVLSI-2014 invited talk 140710

Also: Tunable Monitors, Less Margin

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config: Vmin_est < Vmin_chip. Some chips will fail.

Benefits of tunability
• Compensate for difference between model vs silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) across implementations 1 to 9; annotations: highest invested guardband, least invested guardband, 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

43ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner. Layer M1 and layer M2 are each set to C-3σ; for an interconnect stack with M1 and M2, the homogeneous Cw corner combines the M1 C and M2 C extremes, which adds pessimism relative to the 3σ contour]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner. Layer M1 and layer M2 are each set to C-3σ; for an interconnect stack with M1 and M2, the homogeneous Cw corner combines the M1 C and M2 C extremes, which adds pessimism]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
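A small numerical sketch can make the growth of pessimism concrete. It assumes a deliberately simplified model that is not taken from the slides: every layer contributes equally, with the same standard deviation, and all layer pairs share one correlation value `gamma`. The function name `pessimism_ratio` is hypothetical.

```python
import math


def pessimism_ratio(n_layers: int, gamma: float) -> float:
    """Homogeneous-corner excursion divided by the statistical 3-sigma one.

    Simplifying assumptions: n_layers equal contributions with standard
    deviation sigma and a uniform pairwise correlation gamma.
    Homogeneous corner: every layer at +3 sigma -> 3 * sigma * n.
    Statistical value:  3 * sigma * sqrt(n + n * (n - 1) * gamma).
    """
    if n_layers < 1 or not 0.0 <= gamma <= 1.0:
        raise ValueError("need n_layers >= 1 and 0 <= gamma <= 1")
    variance_factor = n_layers + n_layers * (n_layers - 1) * gamma
    return n_layers / math.sqrt(variance_factor)


for n in (1, 2, 4, 9):
    print(n, round(pessimism_ratio(n, 0.0), 3), round(pessimism_ratio(n, 0.5), 3))
```

Under these assumptions the ratio is 1 only for a single layer or for fully correlated layers, and it grows with the layer count otherwise.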

45ISVLSI-2014 invited talk 140710

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
          M1            M2            M3            M4
          ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH
M1  ΔW    1   0   0     γ   0   0     γ   0   0     0   0   0
    ΔT    0   1   0     0   γ   0     0   γ   0     0   0   0
    ΔH    0   0   1     0   0   γ     0   0   γ     0   0   0
M2  ΔW    γ   0   0     1   0   0     γ   0   0     0   0   0
    ΔT    0   γ   0     0   1   0     0   γ   0     0   0   0
    ΔH    0   0   γ     0   0   1     0   0   γ     0   0   0
M3  ΔW    γ   0   0     γ   0   0     1   0   0     0   0   0
    ΔT    0   γ   0     0   γ   0     0   1   0     0   0   0
    ΔH    0   0   γ     0   0   γ     0   0   1     0   0   0
M4  ΔW    0   0   0     0   0   0     0   0   0     1   0   0
    ΔT    0   0   0     0   0   0     0   0   0     0   1   0
    ΔH    0   0   0     0   0   0     0   0   0     0   0   1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
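The two rules stated for Σ can be turned into a small builder. The sketch below is one way to do it in Python; the grouping of M1 to M3 into one process module and M4 into another is read off the matrix, while the module labels "A" and "B", the source names "dW", "dT", "dH", and the function name `correlation_matrix` are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def correlation_matrix(layers: Sequence[str],
                       module_of: Dict[str, str],
                       kinds: Sequence[str],
                       gamma: float) -> Tuple[List[Tuple[str, str]], List[List[float]]]:
    """Build the correlation matrix over (layer, variation kind) sources.

    Entries are 1 on the diagonal, gamma for two sources of the same kind
    in the same process module, and 0 otherwise.
    """
    sources = [(layer, kind) for layer in layers for kind in kinds]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for a, (layer_a, kind_a) in enumerate(sources):
        for b, (layer_b, kind_b) in enumerate(sources):
            if a == b:
                sigma[a][b] = 1.0
            elif kind_a == kind_b and module_of[layer_a] == module_of[layer_b]:
                sigma[a][b] = gamma
    return sources, sigma


# Assumed grouping: M1-M3 share a process module, M4 is separate.
modules = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}
sources, sigma = correlation_matrix(["M1", "M2", "M3", "M4"], modules,
                                    ["dW", "dT", "dH"], 0.5)
for row in sigma:
    print(" ".join(f"{x:g}" for x in row))
```

The same builder extends to a larger stack by listing more layers and their modules.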

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs max wirelength ratio across all layers (%); marked point: 0.92 at 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation gives smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two scatter plots of α. Left x-axis: Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp). Right x-axis: Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two scatter plots of α. Left x-axis: Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp). Right x-axis: Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2; axes M1 C and M2 C; non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: 3σj / d(Ytyp) × 100 vs Δdj(Yrcw) / dj(Ytyp) × 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flow: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; split paths into GTBC and GCBC; for GTBC, timing analysis using TBC and ECO using TBC until violation = 0; for GCBC, timing analysis using CBC and ECO using CBC until violation = 0; done]
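The grouping step of this flow can be sketched as a simple partition over critical paths. In the sketch, Δdelay is taken as the relative delay change against the typical corner, and a path goes to GTBC when either its C-worst or its RC-worst Δdelay exceeds the corresponding threshold. Combining the two tests with "or", the threshold values, the sample delays, and the names `PathDelay` and `partition_paths` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PathDelay:
    name: str
    d_typ: float  # path delay at the typical corner Ytyp
    d_cw: float   # path delay at the C-worst corner Ycw
    d_rcw: float  # path delay at the RC-worst corner Yrcw


def partition_paths(paths: List[PathDelay],
                    a_cw: float,
                    a_rcw: float) -> Tuple[List[str], List[str]]:
    """Split paths into (GTBC, GCBC) by relative Δdelay thresholds."""
    g_tbc: List[str] = []
    g_cbc: List[str] = []
    for p in paths:
        rel_cw = (p.d_cw - p.d_typ) / p.d_typ
        rel_rcw = (p.d_rcw - p.d_typ) / p.d_typ
        if rel_cw > a_cw or rel_rcw > a_rcw:
            g_tbc.append(p.name)   # large Δdelay: signed off with TBC
        else:
            g_cbc.append(p.name)   # small Δdelay: keep conventional corners
    return g_tbc, g_cbc


# Hypothetical delays in ns and thresholds as fractions.
paths = [
    PathDelay("p1", 1.00, 1.10, 1.02),
    PathDelay("p2", 1.00, 1.01, 1.02),
    PathDelay("p3", 2.00, 2.02, 2.20),
]
g_tbc, g_cbc = partition_paths(paths, a_cw=0.05, a_rcw=0.08)
print(g_tbc, g_cbc)
```

Paths left in GCBC are the ones with small Δdelay, which is where a tightened corner would be at risk of underestimating variation.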

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Tightened corners (correlation factor = 0.5)

           α      Acw (%)   Arcw (%)
TBC-0.5    0.5    43        73
TBC-0.6    0.6    33        50
TBC-0.7    0.7    30        34

• Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • Coefficient of ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • Coefficient of ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: α scatter plots for LEON3MP, SUPERBLUE12 and NETCARD, each with the region α > 0.5 marked]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs time, with NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterization

Vstep: AVS voltage step. Vfinal: converged voltage.
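The time-stepped loop of this reference flow can be illustrated with a short simulation. The loop structure follows the bullets (estimate aging, update timing and Vdd, repeat, give up if the voltage limit is exceeded), but the aging and delay models are toy stand-ins chosen only so the sketch runs: a power-law ΔVth and an alpha-power-style delay expression with made-up coefficients. All function names and numbers are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

VTH0 = 0.3  # hypothetical fresh threshold voltage (V)


def toy_delta_vth(year: int, vdd: float) -> float:
    """Toy BTI shift after `year` years at supply `vdd` (assumed model)."""
    return 0.03 * vdd * (year / 10.0) ** (1.0 / 6.0)


def toy_delay(vdd: float, dvth: float) -> float:
    """Toy gate delay in arbitrary units (alpha-power-style, assumed)."""
    return vdd / (vdd - VTH0 - dvth) ** 1.3


def reference_signoff(v_init: float, v_max: float, v_step: float,
                      n_steps: int,
                      delta_vth: Callable[[int, float], float],
                      delay: Callable[[float, float], float],
                      target: float) -> Tuple[Optional[float], List[float]]:
    """Step through lifetime, raising Vdd by v_step whenever timing fails.

    Returns (Vfinal, Vdd schedule); Vfinal is None when Vdd would exceed
    v_max, i.e. the circuit must be modified and the flow restarted.
    """
    vdd = v_init
    schedule: List[float] = []
    for step in range(1, n_steps + 1):
        # Estimate aging at this step, then update timing and Vdd.
        while delay(vdd, delta_vth(step, vdd)) > target:
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None, schedule
        schedule.append(vdd)
    return vdd, schedule


target = toy_delay(0.90, 0.0)  # zero timing slack at t = 0
v_final, schedule = reference_signoff(0.90, 1.05, 0.01, 10,
                                      toy_delta_vth, toy_delay, target)
print(v_final, schedule)
```

Each inner-loop evaluation stands in for a full timing run with a re-characterized library, which is why the reference flow is slow in practice.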

62ISVLSI-2014 invited talk 140710

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop between circuit, Vfinal, Vlib / VBTI, and derated libraries]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs time alternating ON, OFF, ON, OFF [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14]]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd vs time approaching Vfinal; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

[Figure: results for the seven setups listed below, with the good corners marked]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Setups
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                               Vlib for circuit performance estimation
VBTI for aging estimation      Min Vdd                                                Max Vdd
Min Vdd                        Slowest circuit, less aging                            Not applicable (optimistic)
Max Vdd                        Slowest circuit, worst-case aging (too pessimistic)    Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs area (μm2) for AES; series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); points labeled by signoff corner 1 to 8, ranging from optimistic about AVS to pessimistic about AVS]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) across implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
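The MTTF update rule is a running product, so it can be checked with a few lines of Python. The starting MTTF and the voltage schedule in the sketch are hypothetical; only the update formula comes from the slide. The function name `mttf_schedule` is made up for this example.

```python
from typing import List


def mttf_schedule(mttf0: float, vdd_steps: List[float]) -> List[float]:
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule."""
    out = [mttf0]
    for prev, cur in zip(vdd_steps, vdd_steps[1:]):
        out.append(out[-1] * (prev / cur) ** 2)
    return out


# Hypothetical schedule: AVS raises Vdd from 1.0 V to 1.2 V in two steps.
result = mttf_schedule(10.0, [1.0, 1.1, 1.2])
print([round(m, 3) for m in result])
print(round(1.0 - result[-1] / result[0], 3))  # fractional MTTF penalty
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last voltages. If the starting voltage were 1.0 V (an assumption), a 200mV increase would give a penalty of about 30%, in line with the annotation on the figure.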

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs year (0 to 12) and MTTF (year) for DMA under schedules S1 to S5; annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff on a voltage axis, between nominal Vdd / operating voltage and signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$\Delta d_{jv} = [\, d_j(Y_v) - d_j(Y_{typ}) \,] / 3$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist; RC extraction; STA; Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:

$\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
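The two formulas above can be exercised on a small case with plain Python. The sketch computes per-source sensitivities from corner delays (each corner assumed to sit at 3σ of one source), factors a correlation matrix with a hand-written Cholesky routine, and evaluates σj. The 2-source matrix and the delay numbers are hypothetical; a real run would use the full set of variation sources.

```python
import math
from typing import List


def sensitivities(d_typ: float, d_at_corner: List[float]) -> List[float]:
    """Per-source sensitivity: (d(Yv) - d(Ytyp)) / 3 for each 3-sigma corner."""
    return [(d - d_typ) / 3.0 for d in d_at_corner]


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular factor lam with a = lam * lam^T (a must be SPD)."""
    n = len(a)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(a[i][i] - s)
            else:
                lam[i][j] = (a[i][j] - s) / lam[j][j]
    return lam


def path_sigma(delta_d: List[float], corr: List[List[float]]) -> float:
    """sigma_j = || delta_d * lam ||_2 with corr = lam * lam^T."""
    lam = cholesky(corr)
    n = len(delta_d)
    d_prime = [sum(delta_d[i] * lam[i][j] for i in range(n)) for j in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Hypothetical example: two sources with correlation 0.5, delays in ps.
dd = sensitivities(100.0, [103.0, 106.0])
print(dd)
print(path_sigma(dd, [[1.0, 0.5], [0.5, 1.0]]))
```

The result equals the square root of Δd Σ Δd^T, so with an identity matrix it reduces to the root-sum-square of the sensitivities.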

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]

• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy: $Energy = Power / Throughput$, where throughput is computed from the error rate ER [Kahng10], the number of recovery cycles r, and the clock period T

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs data path optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches

• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor and design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00]; design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system; Vdd scaling until error occurs [Das06, Tschanz10]

[Figure: the AVS classes are arranged along a power axis]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels comparing ideal scaling with non-ideality]
• DENSITY: transistor count [M] vs MPU release date. Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs ideal scaling. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs MPU release date. Source: [CPUDB]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs supply voltage (V) for brute-force, pure-margin and CombOpt; panels: MUL, EXU; annotation: minimum achievable energy]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cones

[Figure: voltage vs frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design; positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small, constant number of runs is enough: scalable

Global optimization flow: Sample (SP&R), construct power models, estimate optimal signoff modes, sample (SP&R), refine power models (adaptive search)

[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V), with series "1st", "2nd" and "real"]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

Design: AES; f: 700MHz
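The sample-and-refine loop above can be sketched in a few lines. This is only an illustration under stated assumptions, not the method from the talk: `implement_and_measure` is a hypothetical stand-in for one SP&R run, the power model is assumed to be a parabola through the three lowest-power samples, and the toy power curve is invented for the example.

```python
def implement_and_measure(v):
    """Hypothetical stand-in for one SP&R run: power (mW) at signoff voltage v."""
    return 15.0 + 40.0 * (v - 1.03) ** 2  # toy convex curve, not real data

def parabola_vertex(p1, p2, p3):
    """Abscissa of the extremum of the parabola through three (x, y) points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    return x2 - 0.5 * num / den  # assumes the three points are not collinear

def adaptive_search(v_lo, v_hi, n_iter=2, tol=1e-3):
    # Initial samples: both ends and the middle of the voltage range.
    mid = 0.5 * (v_lo + v_hi)
    samples = [(v, implement_and_measure(v)) for v in (v_lo, mid, v_hi)]
    for _ in range(n_iter):
        # Build (or refine) the power model from the three lowest-power samples.
        best3 = sorted(sorted(samples, key=lambda s: s[1])[:3])
        v_est = min(max(parabola_vertex(*best3), v_lo), v_hi)
        # Stop once the model points at a voltage that was already sampled.
        if any(abs(v_est - v) < tol for v, _ in samples):
            break
        samples.append((v_est, implement_and_measure(v_est)))
    return min(samples, key=lambda s: s[1])

v_best, p_best = adaptive_search(0.9, 1.2)
```

Only four implementation runs are spent here (three initial samples plus one refinement), which mirrors the point that a small number of runs can suffice when the power model is smooth.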

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-loop AVS: generic monitor, design-dependent replica, in-situ monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring-oscillator stages with 1-bit control pins selecting high or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF and CO at small, medium and large margin for MUL and EXU, split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for the brute-force, pure-margin and CombOpt flows; panels MUL and EXU, with the minimum achievable energy marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart boxes: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; check "Energy < min energy" and save current solution]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

23ISVLSI-2014 invited talk 140710

Benefit of Low-Cost Resilience

• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figure: energy (mJ) of PM, BF and CO at small, medium and large margin for MUL and EXU, split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner
Technology: foundry 28nm

24ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience with AVS

• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for the pure-margin, brute-force and CombOpt flows; panels MUL and EXU, with the minimum achievable energy marked]

Technology: foundry 28nm

25ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops: Less Margin

• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS with circuit, on-chip aging monitor (circuit performance) and voltage regulator; plots of circuit frequency and Vdd vs. time, without AVS and with AVS, against the target]

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step flow (Step 1, Step 2, Step 3) linking VBTI (|Vt|) and Vlib, the derated library, circuit implementation and signoff, the circuit, BTI degradation and AVS, and Vfinal]

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step flow (Step 1, Step 2, Step 3) linking VBTI (|Vt|) and Vlib, the derated library, circuit implementation and signoff, the circuit, BTI degradation and AVS, and Vfinal]

No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs Area Across Different Signoffs

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI curves]

VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
• Gates along critical path
• Timing slack at t = 0
• Circuit activity (BTI aging)
• BTI aging depends on circuit activity
• Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
• Assume timing slack = 0

[Figure annotation: 10mV]

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

Flow:
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Signoff circuit with derated library

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• 4 designs x (DC, AC) x (derating methods)

[Figure legend: proposed method; circuits signed off using other derated libraries]

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
• Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign

[Figure: Vdd vs. time alternating between NOM and OD modes, with durations tnom and tOD]

[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V. Different overdrive modes: 26% power range. Fix fOD: still 14% power range]

36ISVLSI-2014 invited talk 140710

Also: Tunable Monitors, Less Margin

Aggressive config: Vmin_est < Vmin_chip. Some chips will fail

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

37ISVLSI-2014 invited talk 140710

Also: Tunable Monitors, Less Margin

Aggressive config: Vmin_est < Vmin_chip. Some chips will fail

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions

• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) across implementations 1-9, annotated with highest invested guardband, least invested guardband, and 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

43ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2; axes M1 C and M2 C, per-layer labels "C-3σ" (layer M1, layer M2), the homogeneous Cw corner, and the resulting 3σ pessimism]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2; axes M1 C and M2 C, per-layer labels "C-3σ" (layer M1, layer M2), the homogeneous Cw corner, and the resulting pessimism]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers

45ISVLSI-2014 invited talk 140710

Correlation Matrix

• Let Σ be the correlation matrix for variation sources

Σ =
          M1        M2        M3        M4
          ΔW ΔT ΔH  ΔW ΔT ΔH  ΔW ΔT ΔH  ΔW ΔT ΔH
M1  ΔW    1  0  0   γ  0  0   γ  0  0   0  0  0
    ΔT    0  1  0   0  γ  0   0  γ  0   0  0  0
    ΔH    0  0  1   0  0  γ   0  0  γ   0  0  0
M2  ΔW    γ  0  0   1  0  0   γ  0  0   0  0  0
    ΔT    0  γ  0   0  1  0   0  γ  0   0  0  0
    ΔH    0  0  γ   0  0  1   0  0  γ   0  0  0
M3  ΔW    γ  0  0   γ  0  0   1  0  0   0  0  0
    ΔT    0  γ  0   0  γ  0   0  1  0   0  0  0
    ΔH    0  0  γ   0  0  γ   0  0  1   0  0  0
M4  ΔW    0  0  0   0  0  0   0  0  0   1  0  0
    ΔT    0  0  0   0  0  0   0  0  0   0  1  0
    ΔH    0  0  0   0  0  0   0  0  0   0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
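As a rough illustration of the rule stated above, the matrix can be generated from a list of process modules instead of being written out by hand. This is a sketch under assumptions: the function name and the input format are made up for the example, and the grouping of M1-M3 into one module with M4 in another is read off the matrix itself.

```python
def build_correlation_matrix(modules, var_types=("dW", "dT", "dH"), gamma=0.5):
    """modules: list of process modules, each a list of layer names (illustrative format)."""
    # One variation source per (layer, variation type), ordered layer by layer.
    sources = [(layer, m, t)
               for m, layers in enumerate(modules)
               for layer in layers
               for t in var_types]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for a, (_, mod_a, type_a) in enumerate(sources):
        for b, (_, mod_b, type_b) in enumerate(sources):
            if a == b:
                sigma[a][b] = 1.0        # every source is fully correlated with itself
            elif mod_a == mod_b and type_a == type_b:
                sigma[a][b] = gamma      # same variation type, same process module
            # otherwise the sources are treated as independent (entry stays 0)
    return sources, sigma

sources, sigma = build_correlation_matrix([["M1", "M2", "M3"], ["M4"]])
```

With four layers and three variation types per layer this yields a 12 x 12 matrix; the same rule scaled to more layers gives larger matrices such as the 27 x 27 case.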

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%), marking 0.92 at 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: α plotted against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner

α < 1.0: delay variations are covered by the RC-worst corner

Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: α plotted against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner

α < 1.0: delay variations are covered by the RC-worst corner

Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 (axes M1 C, M2 C); non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic. Most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: Δdj(Yrcw)/dj(Ytyp) x 100 plotted against 3σj/d(Ytyp) x 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; paths split into GTBC and GCBC; GTBC: timing analysis using TBC and ECO using TBC until violation = 0; GCBC: timing analysis using CBC and ECO using CBC until violation = 0; done]
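The grouping step of this flow might look like the following sketch. Several details are assumptions made for illustration: the function name, the input layout, the use of percentage Δdelay relative to the typical corner, the rule that exceeding either threshold puts a path in Gtbc, and all numeric values.

```python
def partition_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc).

    paths: dict name -> (d_typ, d_cw, d_rcw), path delays at the typical,
           C-worst and RC-worst BEOL corners (illustrative layout).
    a_cw, a_rcw: thresholds in percent on the delay variation at each corner.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        # Delay variation relative to the typical corner, in percent.
        delta_cw = 100.0 * (d_cw - d_typ) / d_typ
        delta_rcw = 100.0 * (d_rcw - d_typ) / d_typ
        # Assumed rule: a large variation at either corner selects the tightened corners.
        if delta_cw > a_cw or delta_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc

example_paths = {
    "p1": (1.00, 1.10, 1.02),  # large variation at C-worst
    "p2": (1.00, 1.01, 1.02),  # small variation at both corners
    "p3": (2.00, 2.02, 2.30),  # large variation at RC-worst
}
g_tbc, g_cbc = partition_paths(example_paths, a_cw=5.0, a_rcw=8.0)
```

Paths left in the second group are then closed against the conventional corners, so only the first group benefits from the reduced delay variation of the tightened corners.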

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5

           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   43        73
TBC-0.6    0.6   33        50
TBC-0.7    0.7   30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out: Δd(Ycw) ~ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: α plots for LEON3MP, NETCARD and SUPERBLUE12, each marking the region α > 0.5]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: bar charts of WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI curves]

VBTI = Vlib ≈ Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
• Gates along critical path
• Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
• Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow (many STA runs and library …)

Vstep: AVS voltage step
Vfinal: converged voltage
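The time-stepped loop described above can be sketched as follows. Everything numeric here is invented for illustration: `toy_aging` and `toy_slack` are hypothetical placeholders for a BTI model and an STA run, and the rule of raising Vdd by one AVS step until slack is non-negative is an assumption about how "update circuit timing and Vdd" is carried out.

```python
def reference_signoff(v_init, v_max, v_step, t_final, n_steps, aging_model, slack_model):
    """Return the converged Vfinal, or None if the circuit must be modified and redone."""
    vdd, dvth = v_init, 0.0
    for k in range(1, n_steps + 1):
        t = t_final * k / n_steps
        # Estimate aging at this time step under the current Vdd (no recovery modeled).
        dvth = max(dvth, aging_model(vdd, t))
        # Update timing; raise Vdd in AVS steps until the aged circuit meets timing.
        while slack_model(vdd, dvth) < 0.0:
            vdd += v_step
            if vdd > v_max + 1e-12:
                return None  # Vfinal would exceed the maximum allowed voltage
    return vdd

def toy_aging(vdd, t_years):
    # Hypothetical front-loaded aging: grows with Vdd and sub-linearly with time.
    return 0.04 * (vdd / 0.9) ** 3 * (t_years / 10.0) ** 0.3

def toy_slack(vdd, dvth):
    # Hypothetical timing check: the circuit meets timing while Vdd - dVth >= 0.89 V.
    return (vdd - dvth) - 0.89

v_final = reference_signoff(0.90, 1.05, 0.01, 10.0, 10, toy_aging, toy_slack)
v_fail = reference_signoff(0.90, 0.90, 0.01, 10.0, 10, toy_aging, toy_slack)
```

In a real flow each call to the slack model would be a full timing analysis with a library matched to the current voltage and aging, which is why this consistent approach is accurate but slow.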

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Case       1      2      3     4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax  Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax  Vfinal  NA   Vheur1  Vheur2

63ISVLSI-2014 invited talk 140710

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop linking circuit, Vfinal, Vlib / VBTI and derated libraries]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

|ΔVth| increases when device is on (stressed)
|ΔVth| is partially recovered when device is off (relaxed)
NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time over alternating ON and OFF phases [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time

[TCAS’14]

65ISVLSI-2014 invited talk 140710

Observation 1

• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

[Figure [Chan11]: Vdd approaching Vfinal over time; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03

67ISVLSI-2014 invited talk 140710

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

                   Vlib = Min Vdd                                         Vlib = Max Vdd
VBTI = Min Vdd     Slowest circuit, less aging                            Not applicable (optimistic)
VBTI = Max Vdd     Slowest circuit, worst-case aging (too pessimistic)    Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES implementations 1-8, with series "Non-EM Aware", "After Fixing (Mishra)" and "After Fixing (Black's)"; annotations "Optimistic about AVS" and "Pessimistic about AVS"]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

  $MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$

[Figure: lifetime (year) and Vfinal (V) across implementations 1-9; 30% MTTF penalty, 200mV voltage compensation]
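A short sketch of the MTTF update rule above, applied along a schedule of supply voltages. The function name and the example voltages are illustrative assumptions; only the update rule itself comes from the slide.

```python
def update_mttf(mttf_init, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule."""
    mttf = [mttf_init]
    for v_prev, v_now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf.append(mttf[-1] * (v_prev / v_now) ** 2)
    return mttf

# Hypothetical schedule: AVS raises Vdd from 1.0 V to 1.2 V in two steps.
mttf_years = update_mttf(10.0, [1.0, 1.1, 1.2])
penalty = 1.0 - mttf_years[-1] / mttf_years[0]
```

Because the ratios telescope, the final MTTF depends only on the first and last voltages: with the assumed 1.0 V starting point, a 200mV increase gives (1.0/1.2)^2 ≈ 0.69, that is, a penalty of roughly 30%.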

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year and MTTF (year) for AVS schedules S1-S5 on design DMA; annotation "12 years MTTF penalty"]

71ISVLSI-2014 invited talk 140710

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff on a voltage axis, with nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd and operating voltage]

Problem: margins = pessimism: overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

  $\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist, then RC extraction, then STA, giving Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (Note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

  $[\Delta d'_{j1} \dots \Delta d'_{j27}] = [\Delta d_{j1} \dots \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:

  $\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
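The two formulas above can be checked numerically with a small pure-Python sketch. The function names and the 2 x 2 example are assumptions for illustration; the Cholesky routine is the textbook lower-triangular factorization, so that Σ = λλ^T.

```python
import math

def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma must be positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][j] = (sigma[i][j] - s) / lam[j][j]
    return lam

def path_sigma(sens, sigma):
    """Standard deviation of path delay from per-source sensitivities and correlations."""
    lam = cholesky(sigma)
    n = len(sens)
    # Row vector of sensitivities times lam: the sensitivities with correlation.
    corr = [sum(sens[k] * lam[k][v] for k in range(n)) for v in range(n)]
    return math.sqrt(sum(c * c for c in corr))

# Two sources with correlation 0.5 and sensitivities 3 and 4 (arbitrary units).
sigma_corr = path_sigma([3.0, 4.0], [[1.0, 0.5], [0.5, 1.0]])
sigma_indep = path_sigma([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
```

For independent sources the result reduces to the root-sum-square of the sensitivities; positive correlation between sources with same-sign sensitivities increases the path-delay standard deviation.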

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors. Ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality. E.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction]

Conventional design: worst-case signoff. No Vdd downscaling
Resilient design: typical-case signoff. Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy

  $Energy = \frac{Power}{Throughput}$

  where Throughput is expressed in terms of ER, r and T
  • ER: error rate [Kahng10]
  • r: recovery cycles
  • T: clock period
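To make the energy estimate concrete, here is a small sketch of Energy = Power / Throughput. The throughput model in it is an assumption chosen for illustration (each erroneous cycle costs a fixed number of extra recovery cycles) and is not claimed to be the expression used in the talk; function names and numbers are likewise hypothetical.

```python
def throughput(clock_period, error_rate, recovery_cycles):
    """Operations per second under an assumed error-recovery model."""
    # Assumption: every erroneous cycle costs `recovery_cycles` extra cycles.
    avg_cycles_per_op = 1.0 + error_rate * recovery_cycles
    return 1.0 / (clock_period * avg_cycles_per_op)

def energy_per_op(power, clock_period, error_rate, recovery_cycles):
    """Energy = Power / Throughput."""
    return power / throughput(clock_period, error_rate, recovery_cycles)

# 2 W design at a 1 ns clock, without errors and with a 10% error rate.
e_clean = energy_per_op(power=2.0, clock_period=1e-9, error_rate=0.0, recovery_cycles=5)
e_errors = energy_per_op(power=2.0, clock_period=1e-9, error_rate=0.1, recovery_cycles=5)
```

Under this model a lower supply voltage only pays off while the power saving outweighs the throughput lost to recovery, which is the tradeoff the optimization has to balance.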

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints. Allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and AVS approaches (arranged along a power axis):

• Open-loop AVS
  • Freq & Vdd LUT: AVS, pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS, post-silicon characterization [Tschanz03]
• Closed-loop AVS
  • Generic monitor, design-dependent replica: process and temperature-aware AVS. Generic on-chip monitor [Burd00]; design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor. Measure actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system. Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge: Variability

[Figure: four scaling trends, each contrasting ideal scaling with non-ideality. DENSITY: transistor count [M] vs. MPU release date, source [CPUDB]. POWER: dynamic power (W), 2009-2024, source [JeongK08]. DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, source [ITRS]. SUPPLY VOLTAGE: volts vs. MPU release date, source [CPUDB]]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt designs; panels: MUL, EXU; annotation: minimum achievable energy]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage-frequency plane showing the design cone of mode A with modes A, B and C; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search (sample (SP&R), refine power models)

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); series: 1st, 2nd, real]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
Design: AES, f = 700MHz
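The sample-and-refine loop above can be sketched roughly as follows. This is only an illustration under stated assumptions: the slide does not give the form of the power model, so a quadratic in signoff voltage is assumed here, and `measure_power` is a hypothetical stand-in for one full SP&R run at a given signoff voltage.

```python
def fit_parabola(pts):
    """Return (a, b, c) of p = a*v^2 + b*v + c through three (v, p) points."""
    (x0, y0), (x1, y1), (x2, y2) = pts
    d = (x0 - x1) * (x0 - x2) * (x1 - x2)
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / d
    b = (x2**2 * (y0 - y1) + x1**2 * (y2 - y0) + x0**2 * (y1 - y2)) / d
    c = y0 - a * x0**2 - b * x0
    return a, b, c


def adaptive_search(measure_power, v_lo, v_hi, iterations=2):
    """Sample, fit an assumed quadratic power model, re-sample at its minimum.

    measure_power is a hypothetical callable standing in for one
    implementation (SP&R) run; each call is assumed to be expensive.
    """
    samples = {}
    for v in (v_lo, 0.5 * (v_lo + v_hi), v_hi):   # initial samples
        samples[v] = measure_power(v)
    for _ in range(iterations):
        best3 = sorted(samples, key=samples.get)[:3]   # lowest-power samples
        a, b, _ = fit_parabola([(v, samples[v]) for v in best3])
        if a <= 0:                                     # model has no minimum
            break
        v_est = min(max(-b / (2 * a), v_lo), v_hi)     # model's optimum
        if any(abs(v_est - v) < 1e-9 for v in samples):
            break                                      # estimate has converged
        samples[v_est] = measure_power(v_est)          # refine with a new sample
    v_opt = min(samples, key=samples.get)
    return v_opt, samples[v_opt], len(samples)


# Toy power curve (made up for the example) with its minimum at 1.0 V.
toy_power = lambda v: 15.0 + 20.0 * (v - 1.0) ** 2
v_opt, p_opt, runs = adaptive_search(toy_power, 0.9, 1.2)
```

The point of the sketch is the run count: only a handful of expensive samples are taken, and the model is refined near the estimated optimum rather than implementing the circuit at every candidate mode.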

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring-oscillator stages with 1-bit control pins selecting high or low series resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: stacked energy (mJ) bars for PM, BF and CO at small, medium and large margins; panels: MUL, EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt designs; panels: MUL, EXU; annotation: minimum achievable energy]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Figure: optimization flowchart]
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• Check: energy < min energy? If so, save current solution

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

24ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt designs; panels: MUL, EXU; annotation: minimum achievable energy]
Technology: foundry 28nm

25ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS in which an on-chip aging monitor observes circuit performance and drives the voltage regulator; plots of circuit frequency and Vdd vs. time, without AVS and with AVS, relative to the target]

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step loop. Step 1: VBTI → |Vt|, Vlib → derated library. Step 2: circuit implementation and signoff → circuit. Step 3: BTI degradation and AVS → Vfinal]

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step loop. Step 1: VBTI → |Vt|, Vlib → derated library. Step 2: circuit implementation and signoff → circuit. Step 3: BTI degradation and AVS → Vfinal]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design, lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs Area Across Different Signoffs

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI curves]
VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library
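As a small illustration of the first two flow steps, the sketch below averages per-cell Vfinal values into Vheur and uses that single "flat" voltage for both VBTI and Vlib. The cell names and voltages are made up for the example, and `derated_corner` is a hypothetical helper, not part of any characterization tool.

```python
from statistics import mean


def derated_corner(vfinal_by_cell):
    """Average per-cell Vfinal into Vheur and use it for both VBTI and Vlib."""
    v_heur = mean(vfinal_by_cell.values())
    return {"V_BTI": v_heur, "V_lib": v_heur}


# Hypothetical per-cell Vfinal estimates (volts), assuming timing slack = 0.
corner = derated_corner({"INV": 0.96, "NAND2": 0.97, "NOR2": 0.98})
```

The resulting corner would then be handed to library characterization; that step is tool-specific and is not sketched here.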

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• (4 designs x {DC, AC} x derating methods)

[Figure: power vs. area scatter; markers: proposed method, circuits signed off using other derated libraries]
“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign

[Figure: Vdd vs. time schedules alternating nominal (NOM) and overdrive (OD) modes with durations tnom and tOD]
[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V]
• Different overdrive modes: 26% power range
• Fix fOD: still 14% power range

36ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout, IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1 to 9; annotations: highest invested guardband, least invested guardband, 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

43ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2; per-layer capacitance distributions for layer M1 and layer M2 with their 3σ points; the homogeneous Cw corner combines the M1 and M2 3σ values, which introduces pessimism]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2; per-layer capacitance distributions for layer M1 and layer M2 with their 3σ points; the homogeneous Cw corner combines the M1 and M2 3σ values, which introduces pessimism]

When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers
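A back-of-the-envelope sketch of why the pessimism grows with the layer count. It assumes, purely for illustration, that each layer contributes an additive term with the same standard deviation and that every pair of layers has the same correlation `gamma`; neither assumption comes from the slides. The homogeneous corner puts every layer at its own 3σ point at once, while the statistical 3σ of the sum grows more slowly.

```python
import math


def homogeneous_corner_pessimism(n_layers, gamma, sigma=1.0):
    """Ratio of the homogeneous-corner excursion to the statistical 3-sigma.

    Assumes additive per-layer contributions with equal sigma and a uniform
    pairwise correlation gamma (illustrative assumptions only).
    """
    corner = 3.0 * sigma * n_layers                    # all layers at 3-sigma
    variance = sigma**2 * (n_layers + n_layers * (n_layers - 1) * gamma)
    statistical = 3.0 * math.sqrt(variance)            # 3-sigma of the sum
    return corner / statistical


for n in (1, 2, 4, 9):
    print(n, round(homogeneous_corner_pessimism(n, gamma=0.5), 3))
```

With `gamma = 1` the ratio stays at 1 (no pessimism), and with `gamma = 0` it grows as the square root of the number of layers.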

45ISVLSI-2014 invited talk 140710

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
             M1         M2         M3         M4
          ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH
M1  ΔW     1  0  0    γ  0  0    γ  0  0    0  0  0
    ΔT     0  1  0    0  γ  0    0  γ  0    0  0  0
    ΔH     0  0  1    0  0  γ    0  0  γ    0  0  0
M2  ΔW     γ  0  0    1  0  0    γ  0  0    0  0  0
    ΔT     0  γ  0    0  1  0    0  γ  0    0  0  0
    ΔH     0  0  γ    0  0  1    0  0  γ    0  0  0
M3  ΔW     γ  0  0    γ  0  0    1  0  0    0  0  0
    ΔT     0  γ  0    0  γ  0    0  1  0    0  0  0
    ΔH     0  0  γ    0  0  γ    0  0  1    0  0  0
M4  ΔW     0  0  0    0  0  0    0  0  0    1  0  0
    ΔT     0  0  0    0  0  0    0  0  0    0  1  0
    ΔH     0  0  0    0  0  0    0  0  0    0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
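The structure of Σ can be generated from two rules: same variation type plus same process module gives γ, everything else off the diagonal gives 0. The sketch below rebuilds the 12 x 12 example; the grouping of M1 to M3 into one module and M4 into another is read off the matrix above, and the names `MODULE` and `build_correlation` are hypothetical.

```python
LAYERS = ["M1", "M2", "M3", "M4"]
TYPES = ["dW", "dT", "dH"]                 # width, thickness, height variation
# Process-module grouping as read off the example matrix (an assumption).
MODULE = {"M1": 0, "M2": 0, "M3": 0, "M4": 1}


def build_correlation(gamma=0.5):
    """Return (sources, sigma) where sigma is the correlation matrix."""
    sources = [(layer, vtype) for layer in LAYERS for vtype in TYPES]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (li, ti) in enumerate(sources):
        for j, (lj, tj) in enumerate(sources):
            if i == j:
                sigma[i][j] = 1.0          # each source with itself
            elif ti == tj and MODULE[li] == MODULE[lj]:
                sigma[i][j] = gamma        # same type, same process module
    return sources, sigma


sources, sigma = build_correlation(gamma=0.5)
```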

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point: 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two scatter plots of α vs. Δdelay, one at the C-worst corner and one at the RC-worst corner]
• Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp)
• Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner

Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two scatter plots of α vs. Δdelay, one at the C-worst corner and one at the RC-worst corner]
• Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp)
• Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner

Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: scatter plot with axes Δdj(Yrcw)/dj(Ytyp) x 100 (%) and 3σj/d(Ytyp) x 100 (%)]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%) per layer]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Figure: flowchart. Routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. Then done]
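The grouping step of the flow can be sketched as a simple threshold test on the relative delay change at each corner. This is a minimal sketch: the threshold values below are placeholders, the path delays are invented, and combining the two per-corner tests with a logical OR is an assumption, since the slide only says "larger than a threshold".

```python
def split_paths(paths, a_cw, a_rcw):
    """Partition paths into G_tbc (tightened corners) and G_cbc (conventional).

    paths: iterable of (name, d_typ, d_cw, d_rcw) path delays at the typical,
    C-worst and RC-worst corners. a_cw / a_rcw are relative-delay thresholds.
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        rel_cw = (d_cw - d_typ) / d_typ      # delta-delay at C-worst
        rel_rcw = (d_rcw - d_typ) / d_typ    # delta-delay at RC-worst
        if rel_cw > a_cw or rel_rcw > a_rcw:
            g_tbc.append(name)               # large delta: tightened corner
        else:
            g_cbc.append(name)               # small delta: keep conventional
    return g_tbc, g_cbc


# Made-up delays in ns; thresholds are placeholders, not values from the talk.
paths = [("p1", 1.00, 1.08, 1.02), ("p2", 1.00, 1.01, 1.02), ("p3", 2.00, 2.02, 2.30)]
g_tbc, g_cbc = split_paths(paths, a_cw=0.05, a_rcw=0.05)
```

Paths that land in `g_cbc` have small Δdelay at both corners and so continue to be checked at the conventional corners.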

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5
            α      Acw (%)   Arcw (%)
TBC-0.5     0.5    4.3       7.3
TBC-0.6     0.6    3.3       5.0
TBC-0.7     0.7    3.0       3.4

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays → the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor α vs. delay variation for LEON3MP, NETCARD and SUPERBLUE12; the α > 0.5 region is marked in each panel]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: three bar charts comparing CBC, TBC-1 and TBC-2 on LEON, SUPERBLUE and NETCARD: WNS (ns), TNS (ns), and number of timing violations]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI curves]
VBTI = Vlib ≈ Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd (and library)
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs

Vstep: AVS voltage step; Vfinal: converged voltage
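The loop structure of this reference flow can be sketched with toy models standing in for the expensive parts. Everything numeric here is an assumption made for illustration: the alpha-power-law delay model, the power-law BTI model and all of their constants are placeholders, whereas the real flow re-characterizes libraries and reruns STA at every step.

```python
def path_delay(vdd, dvth, vth0=0.30, alpha=1.3):
    """Toy alpha-power-law delay in arbitrary units (not from the talk)."""
    return vdd / (vdd - vth0 - dvth) ** alpha


def reference_signoff(v_init=0.90, v_max=1.05, v_step=0.01,
                      t_final=10.0, steps_per_year=10):
    """Step through lifetime, aging the circuit and raising Vdd as needed.

    Returns the converged voltage Vfinal, or None if Vdd would exceed v_max
    (the "modify circuit and start over" case).
    """
    target = path_delay(v_init, 0.0)        # assume zero timing slack at t = 0
    dvth, bumps, vdd = 0.0, 0, v_init
    for i in range(1, int(t_final * steps_per_year) + 1):
        t_prev, t_now = (i - 1) / steps_per_year, i / steps_per_year
        # Estimate aging over this time step at the current Vdd (toy BTI law).
        dvth += 0.02 * vdd**3 * (t_now ** (1 / 6) - t_prev ** (1 / 6))
        # Update timing; raise Vdd by Vstep until the target is met again.
        while path_delay(vdd, dvth) > target:
            bumps += 1
            vdd = v_init + bumps * v_step
            if vdd > v_max + 1e-12:
                return None
    return vdd


v_final = reference_signoff()
```

Even in this toy form the cost is visible: every time step needs a fresh timing check, which is why the slide calls the reference flow very slow.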

62ISVLSI-2014 invited talk 140710

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2

63ISVLSI-2014 invited talk 140710

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop linking derated libraries, circuit, Vfinal, and Vlib / VBTI]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time with ON, OFF, ON, OFF phases [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS'14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd vs. time approaching Vfinal; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

[Figure: power vs. area for signoff cases 1 to 7 in the DC scenario; “good corners” marked]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Cases
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

                   Vlib = Min Vdd                                        Vlib = Max Vdd
VBTI = Min Vdd     Slowest circuit, less aging                           Not applicable (optimistic)
VBTI = Max Vdd     Slowest circuit, worst-case aging (too pessimistic)   Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for AES, with points labeled by implementation 1 to 8; series: non-EM aware, after fixing (Mishra), after fixing (Black's); annotations: optimistic about AVS, pessimistic about AVS]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

  $MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
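The per-step MTTF update can be applied along an AVS voltage schedule as in the sketch below. The schedule values are invented for illustration, and `update_mttf` is a hypothetical helper name.

```python
def update_mttf(mttf_initial, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a schedule."""
    mttf = mttf_initial
    for v_prev, v_now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_now) ** 2
    return mttf


# Hypothetical schedule: AVS raises Vdd from 1.0 V to 1.2 V over the lifetime.
mttf_end = update_mttf(10.0, [1.0, 1.05, 1.1, 1.15, 1.2])
penalty = 1.0 - mttf_end / 10.0
```

Because the ratios telescope, the end result depends only on the first and last voltages: the final MTTF is the initial MTTF times (V_first / V_last)². For this made-up schedule, a rise from 1.0 V to 1.2 V gives a penalty of about 31%.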

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year (0 to 12) for AVS schedules S1 to S5, and MTTF (year) for S1 to S5; design label: DMA; annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is “Signoff”

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff on a voltage axis; labels: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd, operating voltage]
Problem: margins = pessimism → overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

  $\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv
Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

  $[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:

  $\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
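The two statistical-timing slides can be tied together in a short sketch: per-source sensitivities come from the 3σ corner delays, they are rotated by the Cholesky factor λ of Σ, and σj is the length of the rotated vector. The example uses three variation sources instead of 27 and made-up delays; the function names are hypothetical and the Cholesky routine is a plain textbook implementation.

```python
import math


def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma must be SPD)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            acc = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - acc)
            else:
                lam[i][k] = (sigma[i][k] - acc) / lam[k][k]
    return lam


def sensitivities(d_typ, d_corners):
    """Delta d_jv = (d_j(Y_v) - d_j(Y_typ)) / 3 for each variation source v."""
    return [(d - d_typ) / 3.0 for d in d_corners]


def path_sigma(sens, sigma):
    """sigma_j = || sens * lam ||, which equals sqrt(sens * Sigma * sens^T)."""
    lam = cholesky(sigma)
    n = len(sens)
    rotated = [sum(sens[r] * lam[r][c] for r in range(n)) for c in range(n)]
    return math.sqrt(sum(x * x for x in rotated))


# Three sources; the first two are correlated with factor 0.5 (made-up data).
corr = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
sens = sensitivities(1.00, [1.09, 1.12, 1.00])   # delays in ns
sigma_j = path_sigma(sens, corr)
```

Rotating by λ is what lets the correlated sources be summed in quadrature: the squared length of the rotated vector equals the quadratic form of the sensitivities with Σ.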

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]
• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling → reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

  $Energy = \frac{Power}{Throughput}$

  Throughput is estimated from the error rate ER [Kahng10], the number of recovery cycles r, and the clock period T

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization

bull Question Which endpoint to be optimized

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

Open-Loop AVS

Closed-Loop AVSP

ow

er

Freq amp Vdd LUT

Post-silicon characterization

AVS Pre-characterize LUT [Martin02]

Process-aware AVSPost-silicon characterization [Tschanz03]

Generic monitor

Design dependent replica

In-situmonitor

Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]

In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]

Error Detection System

Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]

Error Tolerance

AVS

approachesAVS classes

77

78ISVLSI-2014 invited talk 140710

Challenge Variability

1998 2000 2003 2006 2008 20111

10

MPU Release Date

Tran

sisto

r Cou

nt [M

]

Source [CPUDB]

DENSITY

IdealNon-ideality

2009

2010

2011

2012

2013

2014

2015

2016

2017

2018

2019

2020

2021

2022

2023

2024

0

100

200

300

400

500

600

700

0

02

04

06

08

1

12Dynamic Power (W)

POWER

Source [JeongK08]

IdealNon-ideality

2006 2008 2010 2012 2014 20161000

10000

100000

Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling

DRIVE CURRENT

Ideal

Source [ITRS]

Non-ideality

1995 2000 2005 2011 20160

05

1

15

2

25

3

MPU Release Date

Volt

SUPPLY VOLTAGE

Source [CPUDB]

Ideal

Non-ideality

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of an AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy is marked]

80ISVLSI-2014 invited talk 140710

Our Concept Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage/frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough → scalable

Global optimization flow (adaptive search):
1. Sample (SP&R)
2. Construct power models
3. Estimate optimal signoff modes
4. Sample (SP&R)
5. Refine power models

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); curves "1st", "2nd", "real"]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first and second iteration
• real: power from real implemented circuits

Design: AES, f = 700MHz
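The sample-and-refine loop above can be sketched in a few lines. This is only an illustration under stated assumptions: the talk does not say what form the power models take, so a quadratic fit is assumed here, and `toy_power_mw` is a hypothetical stand-in for a real SP&R run at a given signoff voltage.

```python
import numpy as np

def toy_power_mw(v):
    """Hypothetical stand-in for one SP&R run: power (mW) when signing off at voltage v."""
    return 14.0 + 6.0 * (v - 1.0) ** 2 + 0.8 * (v - 1.0) ** 4

def adaptive_search(measure, v_lo, v_hi, n_init=3, n_iter=2):
    """Sample a few signoff voltages, fit a power model, then refine near the estimated optimum."""
    volts = list(np.linspace(v_lo, v_hi, n_init))
    power = [measure(v) for v in volts]          # initial samples (one implementation run each)
    v_est = volts[int(np.argmin(power))]
    for _ in range(n_iter):
        a, b, _c = np.polyfit(volts, power, 2)   # quadratic power model (an assumption)
        if a > 0:                                # only trust a convex fit
            v_est = float(np.clip(-b / (2 * a), v_lo, v_hi))
        volts.append(v_est)                      # new sample at the estimated optimum
        power.append(measure(v_est))
    best = int(np.argmin(power))
    return volts[best], power[best], len(volts)

v_opt, p_opt, runs = adaptive_search(toy_power_mw, 0.9, 1.2)
```

The number of implementation runs is fixed by `n_init + n_iter`, which mirrors the point that a small constant number of runs suffices.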

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: RO stages with 1-bit control pins selecting high or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: stacked energy (mJ) bars for PM, BF, and CO at small, medium, and large margin; panels MUL and EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large safety margin = 5%, 10%, 15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy is marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart]
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
• Energy < min energy? Save current solution

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

25ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

26ISVLSI-2014 invited talk 140710

Breaking Chicken-Egg Loops Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS with circuit, on-chip aging monitor (circuit performance), and voltage regulator (Vdd); plots of circuit frequency and Vdd vs. time, without AVS and with AVS, relative to the target]

27ISVLSI-2014 invited talk 140710

Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step loop]
• Step 1: derated library, characterized from Vlib and VBTI (|Vt|)
• Step 2: circuit implementation and signoff, producing the circuit
• Step 3: BTI degradation and AVS, producing Vfinal

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step loop]
• Step 1: derated library, characterized from Vlib and VBTI (|Vt|)
• Step 2: circuit implementation and signoff, producing the circuit
• Step 3: BTI degradation and AVS, producing Vfinal

No obvious guideline to define VBTI and Vlib

Inconsistency among Vfinal, Vlib, VBTI

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs Area Across Different Signoffs

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

Flow:
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Sign off circuit with derated library

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• 4 designs × {DC, AC} × derating methods

[Figure: power vs. area; points for the proposed method and for circuits signed off using other derated libraries]

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign

[Figure: Vdd vs. time, alternating between NOM and OD modes over tnom and tOD intervals]

[Figure: power of circuits with different overdrive modes; fnom = 800MHz, Vnom = 0.8V]
• Different overdrive modes: 26% power range
• Fix fOD: still 14% power range

36ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Benefits of tunability
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20-20-20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) across implementations 1 to 9; annotations: least invested guardband, highest invested guardband, 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

43ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2; axes M1 C and M2 C; labels: C-3σ for layer M1, C-3σ for layer M2, 3σ, homogeneous Cw corner, pessimism]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2; axes M1 C and M2 C; labels: C-3σ for layer M1, C-3σ for layer M2, homogeneous Cw corner, pessimism]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers

45ISVLSI-2014 invited talk 140710

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
             M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW     1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT     0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH     0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW     γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT     0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH     0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW     γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT     0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH     0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW     0  0  0     0  0  0     0  0  0     1  0  0
    ΔT     0  0  0     0  0  0     0  0  0     0  1  0
    ΔH     0  0  0     0  0  0     0  0  0     0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
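A minimal sketch of how such a matrix could be assembled for the four-layer example. The module labels "A" and "B" are hypothetical names; the grouping (M1 to M3 together, M4 separate) is read off the matrix shown on the slide.

```python
import numpy as np

def build_correlation(layers, module_of, types=("W", "T", "H"), gamma=0.5):
    """Correlation matrix with gamma between same-type sources in layers of one process module."""
    names = [(layer, t) for layer in layers for t in types]
    sigma = np.eye(len(names))
    for i, (li, ti) in enumerate(names):
        for j, (lj, tj) in enumerate(names):
            if i != j and ti == tj and module_of[li] == module_of[lj]:
                sigma[i, j] = gamma
    return names, sigma

layers = ["M1", "M2", "M3", "M4"]
module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}  # hypothetical module labels
names, sigma = build_correlation(layers, module_of)

# Cholesky succeeds only for a positive-definite matrix, so this doubles as a sanity check.
lam = np.linalg.cholesky(sigma)
```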

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); 0.92 is marked at 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two panels plotting α against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0 → a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Figure annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two panels plotting α against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0 → a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Figure annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2; axes M1 C and M2 C; non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in the itf with α = 0.5

[Figure: axes Δdj(Yrcw) / dj(Ytyp) × 100 and 3σj / d(Ytyp) × 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. Then done]
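The grouping step in the flow above can be sketched as follows. This is a simplified illustration: it assumes a single relative threshold applied at each path's dominant corner, whereas the talk uses separate thresholds Acw and Arcw; the `PathDelay` type and the sample delays are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PathDelay:
    name: str
    d_typ: float   # delay at the typical corner Ytyp (ns)
    d_cw: float    # delay at the C-worst corner Ycw (ns)
    d_rcw: float   # delay at the RC-worst corner Yrcw (ns)

    def rel_delta(self) -> float:
        """Relative delta-delay at the dominant corner, as a fraction of d_typ."""
        return (max(self.d_cw, self.d_rcw) - self.d_typ) / self.d_typ

def group_paths(paths, threshold):
    """Split paths into G_tbc (above threshold) and G_cbc (the rest)."""
    g_tbc = [p.name for p in paths if p.rel_delta() > threshold]
    g_cbc = [p.name for p in paths if p.rel_delta() <= threshold]
    return g_tbc, g_cbc

paths = [PathDelay("p1", 1.00, 1.08, 1.03), PathDelay("p2", 1.00, 1.01, 1.02)]
g_tbc, g_cbc = group_paths(paths, threshold=0.05)
```

Paths in `g_tbc` would then be analyzed and fixed at tightened corners, and the rest at conventional corners.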

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                       LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)      1.8       2.0       3.1
Gate count             232K      575K      1031K
Utilization (%)        84        79        82
Core area (mm²)        0.45      1.04      1.91
Max transition (ps)    330       330       330

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw, and Arcw (correlation factor = 0.5)

           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   43        73
TBC-0.6    0.6   33        50
TBC-0.7    0.7   30        34

Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1, where dj(Ytyp) is the nominal delay and the two 0.5 coefficients are the delay sensitivities to a unit change in M1 resistance and in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations from ΔdR-M1 and ΔdC-M1 cancel out → Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor α for LEON3MP, SUPERBLUE12, and NETCARD; the region α > 0.5 is marked in each panel]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: three bar charts comparing CBC, TBC-1, and TBC-2 on LEON, SUPERBLUE, and NETCARD: WNS (ns), TNS (ns), and number of timing violations]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax           1.05V
Vinit          0.90V
Vheur1 (DC)    0.97V
Vheur1 (AC)    0.95V
Vheur2 (DC)    0.95V
Vheur2 (AC)    0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, Vlib, and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step
Vfinal: converged voltage
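The loop structure of this reference flow can be sketched as below. Everything numeric here is an assumption for illustration: `gate_delay` is a toy alpha-power-law delay and `bti_shift` a toy power-law aging model that uses only the current Vdd, standing in for the STA runs and derated libraries that the real flow would use.

```python
def gate_delay(vdd, dvth, vth0=0.30, alpha=1.3):
    """Toy alpha-power-law delay in arbitrary units (hypothetical, not a library model)."""
    return vdd / (vdd - vth0 - dvth) ** alpha

def bti_shift(vdd, t_years, a=0.03, n=0.2):
    """Toy BTI model: |dVth| grows with supply voltage and as a power law in time."""
    return a * vdd * t_years ** n

def reference_flow(v_init=0.90, v_max=1.05, v_step=0.01, t_final=10.0, steps=20):
    target = gate_delay(v_init, 0.0)           # zero timing slack at t = 0
    vdd, schedule = v_init, []
    for k in range(1, steps + 1):
        t = t_final * k / steps
        # Raise Vdd one AVS step at a time until the aged circuit meets timing again.
        while gate_delay(vdd, bti_shift(vdd, t)) > target:
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None, schedule           # modify the circuit and start over
        schedule.append((t, vdd))
    return vdd, schedule

v_final, schedule = reference_flow()
```

Each pass of the inner loop corresponds to one timing re-evaluation, which is why the flow is accurate but slow.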

62ISVLSI-2014 invited talk 140710

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3     4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax  Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax  Vfinal  NA   Vheur1  Vheur2

63ISVLSI-2014 invited talk 140710

ldquoChicken and Eggrdquo Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop among circuit, Vfinal, Vlib and VBTI, and derated libraries]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time, alternating ON, OFF, ON, OFF [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time [TCAS'14]

65ISVLSI-2014 invited talk 140710

Observation 1

• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

[Figure [Chan11]: Vdd approaching Vfinal over time; ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years)]
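As a rough worked example of what "front-loaded" implies, suppose aging follows a power law in time, |ΔVth| ∝ t^n. That functional form is an assumption made here for illustration, not something stated on the slide. Then "50% of aging in year 1 of 10" pins down the exponent:

```python
import math

def power_law_exponent(fraction, t_early, t_total):
    """Exponent n such that (t_early / t_total) ** n == fraction."""
    return math.log(fraction) / math.log(t_early / t_total)

# 50% of end-of-life aging reached after 1 year of a 10-year lifetime.
n = power_law_exponent(0.5, 1.0, 10.0)

# Under the same assumed power law, the fraction of aging reached after 3 years.
fraction_after_3_years = (3.0 / 10.0) ** n
```

An exponent well below 1 is what makes most of the degradation, and hence most of the Vdd increment, occur early in life.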

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

[Figure: results for setups 1 to 7; annotation: good corners]

Setups:
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

               Vlib = Min Vdd                        Vlib = Max Vdd
VBTI = Min Vdd  Slowest circuit, less aging           Not applicable (optimistic)
VBTI = Max Vdd  Slowest circuit, worst-case aging     Faster circuit, worst-case aging
                (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for AES at signoff corners 1 to 8; series: non-EM aware, after fixing (Mishra), after fixing (Black's); corners range from optimistic about AVS to pessimistic about AVS]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

  $MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$

[Figure: lifetime (year) and Vfinal (V) across implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
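The MTTF update rule can be applied step by step along a Vdd schedule, as in this short sketch. The schedule and the initial MTTF are illustrative values chosen here, not data from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i)) ** 2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2

def mttf_over_schedule(mttf0, vdd_schedule):
    """Apply the update once per AVS step and return the MTTF after each step."""
    mttf = [mttf0]
    for v_prev, v_now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf.append(update_mttf(mttf[-1], v_prev, v_now))
    return mttf

schedule = [1.00, 1.05, 1.10, 1.15, 1.20]   # illustrative Vdd steps (V), not from the talk
mttf = mttf_over_schedule(10.0, schedule)    # illustrative initial MTTF of 10 years
penalty = 1.0 - mttf[-1] / mttf[0]
```

Because the ratios telescope, the final MTTF depends only on the first and last voltage. With the assumed 1.0 V starting point, a 200mV increase gives a penalty of roughly 31%, the same order as the 30% annotated in the figure.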

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year (0 to 12) for schedules S1 to S5, and MTTF (year) for S1 to S5; annotation: "12 years MTTF penalty"]

71ISVLSI-2014 invited talk 140710

What is ldquoSignoffrdquo

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff on a voltage axis; labels: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd, operating voltage]

Problem: margins = pessimism → overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

  $\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

  $[\Delta d'_{j1} \ldots \Delta d'_{j27}] = [\Delta d_{j1} \ldots \Delta d_{j27}]\,\lambda$

Standard deviation of path delay:

  $\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
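The two steps of the statistical analysis can be sketched with NumPy as follows. The three-source correlation matrix and the corner delays are small made-up inputs for illustration; the talk uses 27 sources obtained from RC extraction and STA runs.

```python
import numpy as np

def sensitivities(d_corner, d_typ):
    """delta_d_jv = (d_j(Y_v) - d_j(Y_typ)) / 3 for each variation source v."""
    return (np.asarray(d_corner, dtype=float) - d_typ) / 3.0

def path_sigma(delta_d, corr):
    """Standard deviation of path delay from sensitivities and a correlation matrix."""
    lam = np.linalg.cholesky(corr)       # lower-triangular lam with corr = lam @ lam.T
    delta_d_corr = delta_d @ lam         # delay sensitivities with correlation
    return float(np.sqrt(np.sum(delta_d_corr ** 2)))

corr = np.array([[1.0, 0.5, 0.0],
                 [0.5, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])        # illustrative 3-source correlation matrix
delta_d = sensitivities([1.06, 1.03, 0.97], d_typ=1.00)   # illustrative path delays (ns)
sigma_j = path_sigma(delta_d, corr)
```

Multiplying by the Cholesky factor is equivalent to evaluating the quadratic form Δd Σ Δd^T, so the result matches a direct variance computation.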

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]
• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling → reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

Energy = Power / Throughput

Throughput is a function of the error rate ER [Kahng10], the clock period T, and the number of recovery cycles r.

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization

bull Question Which endpoint to be optimized

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

Open-Loop AVS

Closed-Loop AVSP

ow

er

Freq amp Vdd LUT

Post-silicon characterization

AVS Pre-characterize LUT [Martin02]

Process-aware AVSPost-silicon characterization [Tschanz03]

Generic monitor

Design dependent replica

In-situmonitor

Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]

In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]

Error Detection System

Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]

Error Tolerance

AVS

approachesAVS classes

77

78ISVLSI-2014 invited talk 140710

Challenge Variability

1998 2000 2003 2006 2008 20111

10

MPU Release Date

Tran

sisto

r Cou

nt [M

]

Source [CPUDB]

DENSITY

IdealNon-ideality

2009

2010

2011

2012

2013

2014

2015

2016

2017

2018

2019

2020

2021

2022

2023

2024

0

100

200

300

400

500

600

700

0

02

04

06

08

1

12Dynamic Power (W)

POWER

Source [JeongK08]

IdealNon-ideality

2006 2008 2010 2012 2014 20161000

10000

100000

Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling

DRIVE CURRENT

Ideal

Source [ITRS]

Non-ideality

1995 2000 2005 2011 20160

05

1

15

2

25

3

MPU Release Date

Volt

SUPPLY VOLTAGE

Source [CPUDB]

Ideal

Non-ideality

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy.
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt designs on MUL and EXU; the minimum achievable energy is marked.]

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage vs. frequency plane with modes A, B and C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign.]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign.
Guideline: search for signoff modes within the design cone to reduce overdesign.
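The dominance test can be illustrated with a toy model. The sketch below is not the talk's method: it assumes a single critical path whose delay follows an alpha-power law (the `vth` and `alpha` values are hypothetical), and it reads "mode A has positive timing slacks with respect to mode B" as "a circuit closed exactly at mode A has positive slack when evaluated at mode B".

```python
def gate_delay(vdd, vth=0.3, alpha=1.3):
    """Relative delay from the alpha-power law (assumed model, hypothetical constants)."""
    return vdd / (vdd - vth) ** alpha


def slack_at_mode(signoff_mode, other_mode):
    """Slack (ns) at other_mode of a circuit signed off exactly at signoff_mode.

    Each mode is a (voltage in V, frequency in GHz) pair.
    """
    v_a, f_a = signoff_mode
    v_b, f_b = other_mode
    # The critical path just meets timing at the signoff mode: delay == 1 / f_a.
    delay_at_b = (1.0 / f_a) * gate_delay(v_b) / gate_delay(v_a)
    return 1.0 / f_b - delay_at_b


def dominance(mode_a, mode_b, tol=1e-3):
    """Classify two modes as 'A dominates B', 'B dominates A' or 'equivalent'."""
    if slack_at_mode(mode_a, mode_b) > tol:   # closed at A, overdesigned at B
        return "A dominates B"
    if slack_at_mode(mode_b, mode_a) > tol:   # closed at B, overdesigned at A
        return "B dominates A"
    return "equivalent"


if __name__ == "__main__":
    nominal = (0.8, 0.8)
    print(dominance(nominal, (1.0, 0.9)))
    print(dominance(nominal, (1.0, 1.2)))
```

In a real flow the slacks would come from timing analysis of the implemented circuit at each mode rather than from a closed-form delay model.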

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
Global optimization flow:
• Sample (SP&R)
• Construct power models
• Adaptive search: estimate optimal signoff modes, sample (SP&R), refine power models
[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V); design: AES; f: 700 MHz.]
• Ovals indicate sample points
• "1st", "2nd": power from power models at the first and second iteration
• "real": power from real implemented circuits
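The sample-and-refine loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the talk's implementation: the power model is assumed to be a parabola in signoff voltage, and `implement` is a hypothetical stand-in for one SP&R run that returns the measured power.

```python
def fit_parabola(points):
    """Return (a, b, c) of p = a*v^2 + b*v + c through three (v, p) points."""
    (x0, y0), (x1, y1), (x2, y2) = points
    d0 = y0 / ((x0 - x1) * (x0 - x2))
    d1 = y1 / ((x1 - x0) * (x1 - x2))
    d2 = y2 / ((x2 - x0) * (x2 - x1))
    a = d0 + d1 + d2
    b = -(d0 * (x1 + x2) + d1 * (x0 + x2) + d2 * (x0 + x1))
    c = d0 * x1 * x2 + d1 * x0 * x2 + d2 * x0 * x1
    return a, b, c


def adaptive_search(implement, v_lo, v_hi, iterations=2):
    """Search for the signoff voltage with minimum power using few SP&R runs."""
    # Initial sampling: three implementations spread over the voltage range.
    samples = {v: implement(v) for v in (v_lo, 0.5 * (v_lo + v_hi), v_hi)}
    for _ in range(iterations):
        # Build the power model from the three lowest-power samples.
        pts = sorted(samples.items(), key=lambda kv: kv[1])[:3]
        a, b, _c = fit_parabola(pts)
        if a <= 0:
            break  # model has no interior minimum; keep the best sample
        v_est = min(max(-b / (2 * a), v_lo), v_hi)
        if v_est in samples:
            break  # the model's optimum has already been implemented
        samples[v_est] = implement(v_est)  # refine with one more run
    best_v = min(samples, key=samples.get)
    return best_v, samples


if __name__ == "__main__":
    # Hypothetical "real" power curve standing in for SP&R results (mW).
    best, runs = adaptive_search(lambda v: 15 + 20 * (v - 1.0) ** 2, 0.9, 1.2)
    print(best, len(runs))
```

The number of implementation runs is bounded by three plus the iteration count, which mirrors the slide's point that a small constant number of runs suffices.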

Classes of Closed-Loop AVS
• Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring-oscillator stages, each with a 1-bit control pin that selects high or low series resistance.]

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF and CO on MUL and EXU at small, medium and large margin, broken down into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation.]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period.

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context.
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt designs on MUL and EXU; the minimum achievable energy is marked.]

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs); check "energy < min energy" and save current solution; SEOpt (margin insertion on K paths based on sensitivity function, then replace error-tolerant FFs with normal FFs); SkewOpt (activity-aware clock skew optimization).]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • "Chicken and Egg" Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is "Signoff"
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS. An on-chip aging monitor senses circuit performance and drives the voltage regulator that supplies the circuit. Plots of circuit frequency vs. time and Vdd vs. time compare operation without AVS and with AVS against the target.]

Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1: VBTI (|Vt|) and Vlib give the derated library. Step 2: circuit implementation and signoff give the circuit. Step 3: BTI degradation and AVS give Vfinal.]

Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1: VBTI (|Vt|) and Vlib give the derated library. Step 2: circuit implementation and signoff give the circuit. Step 3: BTI degradation and AVS give Vfinal.]
No obvious guideline to define VBTI and Vlib: inconsistency among Vfinal, Vlib, VBTI.
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath.

Power vs Area Across Different Signoffs
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI.]
VBTI = Vlib ≈ Vfinal

Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]

Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation
Flow:
• Obtain Vheur (average of standard cells)
• Obtain derated library with VBTI = Vlib = Vheur
• Signoff circuit with derated library

Power vs Area for All Designs
• (4 designs x {DC, AC} x derating methods)
[Figure: power vs. area; the proposed method is compared with circuits signed off using other derated libraries. "Knee" point for balanced area and power tradeoff.]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time schedules alternating nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD.]
[Figure: power of circuits with different overdrive modes; fnom = 800 MHz, Vnom = 0.8 V. Different overdrive modes: 26% power range. With fOD fixed: still 14% power range.]

Also: Tunable Monitors, Less Margin
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Also: Tunable Monitors, Less Margin
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law


Thank You


Backup

Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) across implementations 1 to 9, annotated "highest invested guardband" and "least invested guardband"; 14% power penalty.]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
Example: worst-case capacitance corner for an interconnect stack with M1 and M2.
[Figure: M1 C vs. M2 C, with the C-3σ corner of layer M1 and the C-3σ corner of layer M2; the homogeneous Cw corner lies beyond the joint 3σ contour, and the gap is pessimism.]

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
Example: worst-case capacitance corner for an interconnect stack with M1 and M2.
[Figure: M1 C vs. M2 C, with the C-3σ corner of layer M1 and the C-3σ corner of layer M2; the gap between the homogeneous Cw corner and the joint distribution is pessimism.]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers.
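The growth of pessimism with layer count can be checked with a small calculation. The sketch below assumes, for illustration only, that the total capacitance shift is an unweighted sum of per-layer normal variations with a common pairwise correlation `rho`; the function names are hypothetical.

```python
import math


def homogeneous_corner(sigmas):
    """Shift when every layer sits at its own 3-sigma corner simultaneously."""
    return sum(3.0 * s for s in sigmas)


def statistical_corner(sigmas, rho):
    """3-sigma shift of the summed variation with pairwise correlation rho."""
    var = sum(s * s for s in sigmas)
    for i in range(len(sigmas)):
        for j in range(i + 1, len(sigmas)):
            var += 2.0 * rho * sigmas[i] * sigmas[j]
    return 3.0 * math.sqrt(var)


def pessimism(sigmas, rho):
    """Ratio of the homogeneous corner to the statistical 3-sigma corner."""
    return homogeneous_corner(sigmas) / statistical_corner(sigmas, rho)


if __name__ == "__main__":
    for layers in (1, 2, 4, 9):
        print(layers, round(pessimism([1.0] * layers, rho=0.0), 3))
```

With fully correlated layers (`rho = 1`) the ratio is 1, so the homogeneous corner is exact; with independent layers it grows as the square root of the number of layers.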

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
          M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
    ΔT    0  0  0     0  0  0     0  0  0     0  1  0
    ΔH    0  0  0     0  0  0     0  0  0     0  0  1

Correlation between variation sources of the same variation type in the same process module: γ = 0.5.
Variation sources in different process modules are independent.

Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the curve reaches 0.92 at 60%.]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation

Delay Variation
[Figure: α plotted against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp). Paths dominated by C-worst have Δdelay at C-worst > Δdelay at RC-worst; paths dominated by RC-worst have Δdelay at RC-worst > Δdelay at C-worst.]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • The C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner

Delay Variation
[Figure: α plotted against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp). Paths dominated by C-worst have Δdelay at C-worst > Δdelay at RC-worst; paths dominated by RC-worst have Δdelay at RC-worst > Δdelay at C-worst.]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • The C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2, M1 C vs. M2 C; non-homogeneous corner with M1 == Cw (3σ) and M2 == Ctyp.]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj / d(Ytyp) x 100 vs. Δdj(Yrcw) / dj(Ytyp) x 100.]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield?

Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%).]

Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC. GTBC: timing analysis using TBC, with ECO using TBC until violation = 0. GCBC: timing analysis using CBC, with ECO using CBC until violation = 0. Then done.]
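The path-grouping step of the flow can be sketched in a few lines. The slide only states "larger than a threshold"; the sketch assumes one threshold per corner (named `a_cw` and `a_rcw` here, in percent of typical delay) and treats a path as eligible for tightened corners when it exceeds the threshold at either corner. The path names and delays are made up for illustration.

```python
def split_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc).

    paths maps a path name to (d_typ, d_cw, d_rcw) delays in ns;
    a_cw and a_rcw are thresholds on relative delay increase, in percent.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        dd_cw = 100.0 * (d_cw - d_typ) / d_typ     # relative delta-delay at C-worst
        dd_rcw = 100.0 * (d_rcw - d_typ) / d_typ   # relative delta-delay at RC-worst
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)   # large delta-delay: sign off with tightened corners
        else:
            g_cbc.append(name)   # small delta-delay: keep conventional corners
    return g_tbc, g_cbc


if __name__ == "__main__":
    example = {
        "p1": (1.0, 1.10, 1.02),
        "p2": (1.0, 1.01, 1.02),
        "p3": (2.0, 2.02, 2.30),
    }
    print(split_paths(example, a_cw=4.0, a_rcw=7.0))
```

Paths left in the conventional group are the ones with small Δdelay, which the deck associates with large α, so conventional corners remain the safe choice for them.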

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity):

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5:

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw.
Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5.

Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • The 0.5 coefficients: delay sensitivity to unit change in M1 resistance and in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out → Δd(Ycw) ≈ 0 < σj

Scaling Factor Results
[Figure: α vs. delay variation for LEON3MP, NETCARD and SUPERBLUE12; the region α > 0.5 is marked in each panel.]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: three bar charts comparing CBC, TBC-1 and TBC-2 on LEON, SUPERBLUE and NETCARD: WNS (ns), TNS (ns), and number of timing violations.]

Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI.]
VBTI = Vlib ≈ Vfinal

Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]

Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages:
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step. Vfinal: converged voltage.
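The time-stepped loop above can be sketched with toy models. Everything numeric here is a labeled assumption: `path_delay` is a hypothetical alpha-power-law delay standing in for an STA run with a freshly derated library, and `aging_increment` is a hypothetical power-law BTI model; neither is taken from the talk.

```python
def path_delay(vdd, dvth, vth0=0.30, alpha=1.3):
    """Toy critical-path delay (ns); stands in for STA with a derated library."""
    return vdd / (vdd - vth0 - dvth) ** alpha


def aging_increment(vdd, t_prev, t_now, a=0.02, n=0.3, gamma=3.0):
    """Toy BTI model: extra |dVth| (V) accumulated between two times (years)."""
    return a * vdd ** gamma * (t_now ** n - t_prev ** n)


def reference_signoff(target_delay, v_init, v_max, v_step, t_final, steps):
    """Return the Vdd schedule over lifetime, or None if Vmax would be exceeded."""
    vdd, dvth, t_prev = v_init, 0.0, 0.0
    schedule = []
    for i in range(1, steps + 1):
        t_now = t_final * i / steps
        dvth += aging_increment(vdd, t_prev, t_now)     # estimate aging
        while path_delay(vdd, dvth) > target_delay:     # update timing and Vdd
            vdd = round(vdd + v_step, 6)                # AVS raises Vdd by Vstep
            if vdd > v_max:
                return None                             # modify circuit, start over
        schedule.append((t_now, vdd))
        t_prev = t_now
    return schedule


if __name__ == "__main__":
    result = reference_signoff(target_delay=1.75, v_init=0.90, v_max=1.05,
                               v_step=0.01, t_final=10.0, steps=10)
    print(result)
```

Each pass through the inner loop corresponds to one timing update, which is why the slide describes the reference flow as accurate but slow.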

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Case       1      2      3     4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax  Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax  Vfinal  NA   Vheur1  Vheur2

"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among circuit, Vfinal, (Vlib, VBTI) and derated libraries.]

Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time alternating ON and OFF [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14].]

Observation 1 [Chan11]
• BTI is a "front-loaded" phenomenon
  • 50% BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd approaching Vfinal over lifetime; ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years).]
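As a rough consistency check on the "front-loaded" claim, suppose (an assumption not stated on the slide) that BTI-induced threshold shift follows a power law in stress time. The quoted 50% figure then pins down the exponent:

```latex
\[
|\Delta V_{th}(t)| \propto t^{\,n}
\quad\Longrightarrow\quad
\frac{|\Delta V_{th}(1\,\text{yr})|}{|\Delta V_{th}(10\,\text{yr})|}
  = \left(\frac{1}{10}\right)^{n} = 0.5
\quad\Longrightarrow\quad
n = \log_{10} 2 \approx 0.30
\]
```

Under that assumed model, any exponent well below 1 concentrates most of the degradation early in the lifetime, which is what lets a flat end-of-life voltage approximate the adaptive schedule.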

Results for DC Scenario
[Figure: power vs. area for signoff cases 1 to 7 in the DC scenario, with a region marked "Good corners".]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Cases:
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                              Vlib for circuit performance estimation
VBTI for aging estimation     Min Vdd                               Max Vdd
Min Vdd                       Slowest circuit, less aging           Not applicable (optimistic)
Max Vdd                       Slowest circuit, worst-case aging     Faster circuit, worst-case aging
                              (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious.

AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for AES, with points labeled by signoff case 1 to 8; regions annotated "Optimistic about AVS" and "Pessimistic about AVS". Legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's).]

AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) across implementations 1 to 9; 30% MTTF penalty for 200mV voltage compensation.]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
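The MTTF update on this slide is a simple recurrence over the AVS voltage steps. A minimal sketch, with a hypothetical function name and made-up example voltages:

```python
def mttf_schedule(mttf0, vdd_steps):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule.

    mttf0 is the MTTF at the first voltage in vdd_steps; the returned list
    holds the MTTF after each step, starting with mttf0.
    """
    mttf = [mttf0]
    for v_prev, v_now in zip(vdd_steps, vdd_steps[1:]):
        mttf.append(mttf[-1] * (v_prev / v_now) ** 2)
    return mttf


if __name__ == "__main__":
    # Example: AVS raises Vdd from 1.0 V to 1.2 V in 50 mV steps.
    print(mttf_schedule(10.0, [1.0, 1.05, 1.10, 1.15, 1.20]))
```

Because the ratios telescope, the final value depends only on the first and last voltages. Assuming a start at 1.0 V, a 200 mV rise scales MTTF by (1.0/1.2)², a reduction of about 31%, which is in line with the roughly 30% penalty quoted for 200mV of compensation.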

EM Impact on AVS Scheduling
[Figure: VDD vs. year (0 to 12) for AVS schedules S1 to S5, and MTTF (year) for S1 to S5; design: DMA; annotation: 12 years MTTF penalty.]

What is "Signoff"
• Foundation of contract between design house and foundry
  • "Chip should work": stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff, from nominal Vdd to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage is shown on the voltage axis.]
Problem: margins = pessimism → overdesign, schedule delay

Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = [\, d_j(Y_v) - d_j(Y_{typ}) \,] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv.]
Note: path delay includes gate and wire delays.

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  $[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\, \lambda$
Standard deviation of path delay:
  $\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference.
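The two statistical-timing slides can be combined into a short numerical sketch. It uses a hand-written Cholesky factorization so that it needs only the standard library; the function names and the two-source example are illustrative assumptions, whereas the deck uses 27 variation sources.

```python
import math


def cholesky(sigma):
    """Lower-triangular lam with lam * lam^T == sigma (sigma positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][j] = (sigma[i][j] - s) / lam[j][j]
    return lam


def sensitivities(d_typ, d_corners):
    """Per-sigma sensitivities: each corner skews one source by 3 sigma."""
    return [(d - d_typ) / 3.0 for d in d_corners]


def path_sigma(delta_d, sigma):
    """sigma_j = norm of (delta_d * lam), where sigma = lam * lam^T."""
    lam = cholesky(sigma)
    n = len(delta_d)
    correlated = [sum(delta_d[i] * lam[i][v] for i in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in correlated))


if __name__ == "__main__":
    gamma = 0.5
    corr = [[1.0, gamma], [gamma, 1.0]]
    dd = sensitivities(1.00, [1.09, 1.12])   # hypothetical corner delays (ns)
    print(path_sigma(dd, corr))
```

The result equals the square root of Δd Σ Δdᵀ, so positive correlation between sources with same-sign sensitivities increases the path's standard deviation relative to the independent case.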

Resilient Designs
• Detect and recover from timing errors; ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V), 0.84 to 1.00, for conventional design and resilient design; 15% reduction.]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling → reduced energy

Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
  $Energy = \frac{Power}{Throughput}$
  where Throughput is expressed in terms of the error rate ER, the clock period T, and the number of recovery cycles r [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization

bull Question Which endpoint to be optimized

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

Open-Loop AVS

Closed-Loop AVSP

ow

er

Freq amp Vdd LUT

Post-silicon characterization

AVS Pre-characterize LUT [Martin02]

Process-aware AVSPost-silicon characterization [Tschanz03]

Generic monitor

Design dependent replica

In-situmonitor

Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]

In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]

Error Detection System

Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]

Error Tolerance

AVS

approachesAVS classes

77

78ISVLSI-2014 invited talk 140710

Challenge Variability

1998 2000 2003 2006 2008 20111

10

MPU Release Date

Tran

sisto

r Cou

nt [M

]

Source [CPUDB]

DENSITY

IdealNon-ideality

2009

2010

2011

2012

2013

2014

2015

2016

2017

2018

2019

2020

2021

2022

2023

2024

0

100

200

300

400

500

600

700

0

02

04

06

08

1

12Dynamic Power (W)

POWER

Source [JeongK08]

IdealNon-ideality

2006 2008 2010 2012 2014 20161000

10000

100000

Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling

DRIVE CURRENT

Ideal

Source [ITRS]

Non-ideality

1995 2000 2005 2011 20160

05

1

15

2

25

3

MPU Release Date

Volt

SUPPLY VOLTAGE

Source [CPUDB]

Ideal

Non-ideality

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient

designs thus reduced powerbull Proposed method trades off between timing-error penalty vs

reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction

compared to pure-margin designs Resilience benefits increase in the context of AVS strategy

084 088 092 096 10030

36

42

48

54

60brute-forcepure-marginCombOpt

Supply voltage (V)

En

erg

y (

mJ

)

084 086 088 09 092 094 096 098 1 10225

29

33

37

41

45brute-force

pure-margin

CombOpt

Supply voltage (V)

En

erg

y (

mJ

)

MUL EXU

Minimum achievable energy

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: design cone of mode A in the voltage-frequency plane, with modes A, B and C; negative slacks = failed design, positive slacks = overdesign.]

Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone to reduce overdesign

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

Global optimization flow (adaptive search):
1. Sample (SP&R)
2. Construct power models
3. Estimate optimal signoff modes
4. Sample (SP&R)
5. Refine power models

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V) for the 1st model, the 2nd model and real implementations. Design: AES, f = 700MHz.]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iterations
• real: power from real implemented circuits

Classes of Closed-Loop AVS

Closed-loop AVS: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring-oscillator stages with 1-bit control pins that select high or low series resistance.]

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: stacked energy (mJ) bars for PM, BF and CO at large, medium and small margins, for the MUL and EXU designs. Bar components: energy without resilience, energy penalty of additional circuits, energy penalty of throughput degradation.]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs, and hence reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for the brute-force, pure-margin and CombOpt flows on the MUL and EXU designs, with the minimum achievable energy marked.]

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution.]


Derated Library Characterization and AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step loop. Step 1: BTI degradation and AVS (Vfinal, VBTI, |Vt|). Step 2: derated library (Vlib). Step 3: circuit implementation and signoff (circuit).]

Library Characterization for AVS

• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: three-step loop. Step 1: BTI degradation and AVS (Vfinal, VBTI, |Vt|). Step 2: derated library (Vlib). Step 3: circuit implementation and signoff (circuit).]

No obvious guideline to define VBTI and Vlib

Inconsistency among Vfinal, Vlib, VBTI

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design/lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

Power vs Area Across Different Signoffs

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI curves.]

VBTI = Vlib ≈ Vfinal

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

Flow:
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Sign off circuit with derated library
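The three-step flow above can be sketched in a few lines. This is only an illustration: the cell names and per-cell Vfinal values are hypothetical, and the averaging helper `v_heur` is a name introduced here, not part of the original flow.

```python
def v_heur(vfinal_by_cell):
    """Average the per-cell Vfinal estimates to get a single flat Vheur."""
    values = list(vfinal_by_cell.values())
    return sum(values) / len(values)

# Hypothetical per-cell Vfinal values (V), each obtained assuming timing slack = 0.
vfinal_by_cell = {"INV": 0.95, "NAND2": 0.96, "NOR2": 0.94}

vheur = v_heur(vfinal_by_cell)
# The derated library is then characterized with VBTI = Vlib = Vheur.
derating = {"VBTI": vheur, "Vlib": vheur}
print(derating)
```

A single flat voltage keeps library characterization to one run, at the cost of the slight pessimism noted for Heuristic 1.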

Power vs Area for All Designs

• (4 designs) x (DC, AC) x (derating methods)

[Figure: power vs. area; points for the proposed method are marked against circuits signed off using other derated libraries.]

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign

[Figure: Vdd vs. time schedules alternating between nominal (NOM, tnom) and overdrive (OD, tOD) modes.]

[Figure: power of circuits with different overdrive modes; fnom = 800MHz, Vnom = 0.8V. Different overdrive modes: 26% power range. Fix fOD: still 14% power range.]

Also: Tunable Monitors, Less Margin

Aggressive config: Vmin_est < Vmin_chip. Some chips will fail

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Also: Tunable Monitors, Less Margin

Aggressive config: Vmin_est < Vmin_chip. Some chips will fail

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law

Thank You

Backup

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1-9, annotated with the highest and least invested guardband and a 14% power penalty.]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner. Per-layer corners (C-3σ) for layers M1 and M2 are combined into the homogeneous Cw corner of the interconnect stack with M1 and M2 (axes: M1 C, M2 C); the gap to the 3σ point of the stack is labeled pessimism.]

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: same example worst-case capacitance corner, with the pessimism of the homogeneous Cw corner highlighted.]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
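As a rough illustration of why this pessimism grows with the layer count, consider n layers whose variations are independent, with equal σ and equal unit sensitivity. These are simplifying assumptions made here, not figures from the talk. A homogeneous corner places every layer at its own 3σ point, while the 3σ of the summed variation grows only as √n.

```python
import math

def homogeneous_corner_pessimism(n_layers: int, sigma: float = 1.0) -> float:
    """Ratio of the homogeneous-corner shift to the statistical 3-sigma shift.

    Assumes n_layers independent variation sources with equal sigma and
    equal (unit) sensitivity; both are simplifying assumptions.
    """
    corner_shift = 3.0 * sigma * n_layers                   # every layer at its own 3-sigma
    statistical_shift = 3.0 * sigma * math.sqrt(n_layers)   # 3-sigma of the summed variation
    return corner_shift / statistical_shift

for n in (1, 2, 4, 9):
    print(n, round(homogeneous_corner_pessimism(n), 3))
```

Under these assumptions the ratio equals √n, so a nine-layer stack would be margined three times more than the statistical 3σ requires.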

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
          M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
    ΔT    0  0  0     0  0  0     0  0  0     0  1  0
    ΔH    0  0  0     0  0  0     0  0  0     0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
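A minimal sketch of how such a matrix could be assembled, assuming (as the matrix above suggests) that M1 to M3 share one process module and M4 belongs to another. The function name and the `module` mapping are illustrative choices, not taken from the talk.

```python
import numpy as np

def build_correlation_matrix(gamma: float = 0.5) -> np.ndarray:
    """12x12 correlation matrix for (dW, dT, dH) on layers M1..M4."""
    layers = ["M1", "M2", "M3", "M4"]
    module = {"M1": 0, "M2": 0, "M3": 0, "M4": 1}   # assumed module assignment
    kinds = ["dW", "dT", "dH"]
    sources = [(layer, kind) for layer in layers for kind in kinds]
    sigma = np.eye(len(sources))
    for a, (la, ka) in enumerate(sources):
        for b, (lb, kb) in enumerate(sources):
            # Same variation type, same process module, different layer -> gamma
            if a != b and ka == kb and module[la] == module[lb]:
                sigma[a, b] = gamma
    return sigma

sigma = build_correlation_matrix(0.5)
# Cholesky succeeds only for a positive-definite matrix, so this also checks validity.
lam = np.linalg.cholesky(sigma)
print(np.allclose(lam @ lam.T, sigma))
```

The Cholesky factor computed at the end is the same λ used later in the statistical timing analysis.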

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the point (60, 0.92) is marked.]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variations gives smaller RC variation

Delay Variation

[Figure: α plotted against Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp). Paths dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst. Paths dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst.]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner

α < 1.0: delay variations are covered by the RC-worst corner

Delay Variation

[Figure: same two α plots as on the previous slide.]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner

α < 1.0: delay variations are covered by the RC-worst corner

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 (axes: M1 C, M2 C); non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp.]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: scatter plot with axes Δdj(Yrcw) / dj(Ytyp) x 100 and 3σj / d(Ytyp) x 100.]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%) per layer for critical paths.]

Proposed Timing Signoff Flow

• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

Flow:
1. Routed design
2. Timing analysis at BEOL corners (Ytyp, Ycw, Yrcw); split paths into GTBC and GCBC
3. GTBC: timing analysis using TBC; ECO using TBC until violation = 0
4. GCBC: timing analysis using CBC; ECO using CBC until violation = 0
5. Done
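The path-grouping step of this flow can be sketched as below. The sketch assumes that a path joins Gtbc when its relative Δdelay at either conventional corner exceeds a per-corner threshold in percent; that reading, the `PathDelays` record, and the example delays and thresholds are all illustrative assumptions rather than details given on the slide.

```python
from dataclasses import dataclass

@dataclass
class PathDelays:
    name: str
    d_typ: float   # path delay at the typical corner (ns)
    d_cw: float    # path delay at the C-worst corner (ns)
    d_rcw: float   # path delay at the RC-worst corner (ns)

def split_paths(paths, a_cw_pct, a_rcw_pct):
    """Return (g_tbc, g_cbc) as lists of path names.

    Assumption: a path goes to g_tbc when its relative delta-delay at either
    conventional corner exceeds the corresponding threshold (in percent).
    """
    g_tbc, g_cbc = [], []
    for p in paths:
        dd_cw = 100.0 * (p.d_cw - p.d_typ) / p.d_typ
        dd_rcw = 100.0 * (p.d_rcw - p.d_typ) / p.d_typ
        if dd_cw > a_cw_pct or dd_rcw > a_rcw_pct:
            g_tbc.append(p.name)
        else:
            g_cbc.append(p.name)
    return g_tbc, g_cbc

# Hypothetical delays and thresholds, for illustration only.
paths = [PathDelays("p0", 1.00, 1.08, 1.05),
         PathDelays("p1", 1.00, 1.01, 1.02)]
g_tbc, g_cbc = split_paths(paths, a_cw_pct=4.0, a_rcw_pct=7.0)
print(g_tbc, g_cbc)
```

Each group is then analyzed and fixed against its own corner set, as in steps 3 and 4 of the flow.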

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Correlation factor = 0.5

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

Scaling Factor Results

[Figure: scaling factor α for the LEON3MP, SUPERBLUE12 and NETCARD designs, with the α > 0.5 region marked in each panel.]

• Similar trends in different designs
• Large α when Δd(Yrcw) / d(Ytyp) and Δd(Ycw) / d(Ytyp) are small

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: three bar charts comparing CBC, TBC-1 and TBC-2 on LEON, SUPERBLUE and NETCARD: WNS (ns), TNS (ns), and number of timing violations.]

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI curves.]

VBTI = Vlib ≈ Vfinal

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step. Vfinal: converged voltage
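The loop structure of this reference flow can be sketched as follows. The two callbacks stand in for STA and a BTI aging model; their toy formulas and all numeric values are hypothetical placeholders chosen only so the example runs, not models from the talk.

```python
def reference_avs_flow(delay_at, aging_step, v_init, v_max, v_step,
                       t_final, dt, clock_period):
    """Step through lifetime, raising Vdd whenever the aged circuit fails timing.

    delay_at(vdd, dvth) -> critical-path delay (stand-in for STA)
    aging_step(vdd, dvth, t, dt) -> updated threshold shift (stand-in for a BTI model)
    Returns Vfinal, or None if the required voltage would exceed v_max.
    """
    vdd, dvth, t = v_init, 0.0, 0.0
    while t < t_final:
        dvth = aging_step(vdd, dvth, t, dt)          # estimate aging over this time step
        while delay_at(vdd, dvth) > clock_period:    # update timing, then raise Vdd
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None                          # circuit must be modified; start over
        t += dt
    return vdd

# Toy stand-ins (assumptions): delay falls with gate overdrive, aging grows with Vdd.
def toy_delay(vdd, dvth):
    return 1.0 / (vdd - 0.3 - dvth)

def toy_aging(vdd, dvth, t, dt):
    return dvth + 0.005 * vdd * dt

v_final = reference_avs_flow(toy_delay, toy_aging, v_init=0.90, v_max=1.05,
                             v_step=0.01, t_final=10.0, dt=1.0, clock_period=1.7)
print(v_final)
```

In a real flow each inner iteration would be an STA run against a freshly derated library, which is why the slide calls this approach very slow.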

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5     6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    N/A   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   N/A   Vheur1   Vheur2

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop among circuit, derated libraries, Vfinal, and Vlib / VBTI.]

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time over alternating ON/OFF phases [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS’14].]

Observation 1

• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd approaching Vfinal over lifetime [Chan11]; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years).]
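To see what "front-loaded" implies numerically, one can assume the commonly used power-law form in which the threshold shift grows as t^n. The slide itself does not state an aging model, so the power law is an assumption here. Under that assumption, "50% of the aging in the first of 10 years" fixes the exponent n.

```python
import math

def power_law_exponent(fraction: float, t_early: float, t_total: float) -> float:
    """Solve (t_early / t_total) ** n == fraction for n."""
    return math.log(fraction) / math.log(t_early / t_total)

# 50% of end-of-life aging reached after 1 year out of 10.
n = power_law_exponent(0.5, 1.0, 10.0)
print(round(n, 3))
```

An exponent near 0.3 means aging is steep early on and then flattens, which is consistent with most of the Vdd increment happening in early lifetime.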

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

                 Vlib = Min Vdd                                        Vlib = Max Vdd
VBTI = Min Vdd   Slowest circuit, less aging                           Not applicable (optimistic)
VBTI = Max Vdd   Slowest circuit, worst-case aging (too pessimistic)   Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for AES implementations 1-8 in three series: non-EM-aware, after fixing (Mishra), and after fixing (Black's). The two ends of the curve are labeled optimistic about AVS and pessimistic about AVS.]

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) for implementations 1-9, annotated with a 30% MTTF penalty and 200mV voltage compensation.]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^2$
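The per-step MTTF update, MTTF(i) = MTTF(i-1) x (VDD(i-1) / VDD(i))^2, can be applied along a whole AVS voltage schedule as in the sketch below. The initial MTTF and the schedule values are hypothetical, and the helper names are introduced here for illustration.

```python
def update_mttf(mttf_prev: float, vdd_prev: float, vdd_new: float) -> float:
    """MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i)) ** 2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2

def mttf_after_schedule(mttf_init: float, vdd_schedule):
    """Apply the update for every consecutive pair of voltages in the schedule."""
    mttf = mttf_init
    for v_prev, v_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_new)
    return mttf

# Hypothetical schedule: supply raised from 0.90 V to 1.00 V in two steps.
result = mttf_after_schedule(10.0, [0.90, 0.95, 1.00])
print(result)
```

Because the ratios telescope, the result depends only on the first and last voltages: here roughly 10 x (0.90 / 1.00)^2.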

EM Impact on AVS Scheduling

[Figure: VDD vs. year (0-12) for schedules S1-S5, and MTTF (year) for schedules S1-S5, on design DMA.]

12 years MTTF penalty

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff on a voltage axis, with levels and margins labeled nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd, and operating voltage.]

Problem: margins = pessimism, overdesign, schedule delay

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$\Delta d_{jv} = [d_j(Y_v) - d_j(Y_{typ})] / 3$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist, then RC extraction, then STA, giving Δdjv

Note: path delay includes gate and wire delays

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^T$ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$[\Delta d'_{j1} \; \ldots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \ldots \; \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:

$\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
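The two formulas above (fold the correlation into the sensitivities through the Cholesky factor, then take the root-sum-square) can be sketched with NumPy. The three-source correlation matrix and the sensitivity values are made up for illustration; the talk uses 27 sources.

```python
import numpy as np

def path_sigma(delta_d: np.ndarray, corr: np.ndarray) -> float:
    """Standard deviation of one path's delay.

    delta_d : per-sigma delay sensitivities of the path, one per variation source
    corr    : correlation matrix of the variation sources (positive definite)
    """
    lam = np.linalg.cholesky(corr)        # corr = lam @ lam.T
    delta_d_corr = delta_d @ lam          # sensitivities with correlation folded in
    return float(np.sqrt(np.sum(delta_d_corr ** 2)))

# Toy example with three variation sources (values are illustrative only).
corr = np.array([[1.0, 0.5, 0.0],
                 [0.5, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
delta_d = np.array([2.0, 1.0, 2.0])       # ps per sigma
sigma_j = path_sigma(delta_d, corr)
print(sigma_j)
```

The result equals the square root of delta_d · corr · delta_dᵀ, so the Cholesky step is simply a convenient way to evaluate that quadratic form.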

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for a conventional design and a resilient design; 15% reduction.]

Conventional design: worst-case signoff, no Vdd downscaling

Resilient design: typical-case signoff, Vdd downscaling, reduced energy

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$Energy = Power / Throughput$

h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879

+1minus119864119877119903times119879

recovery cycles

Clock period

Error rate [Kahng10]

Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints; allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint should be optimized?

Process-Aware Vdd Scaling (PVS)

[Figure: AVS classes vs. AVS approaches, ordered along a power axis toward error tolerance.]

Open-loop AVS
• Freq & Vdd LUT: pre-characterize LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]

Closed-loop AVS
• Generic monitor, design-dependent replica: process- and temperature-aware AVS with a generic on-chip monitor [Burd00] or a design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]
• Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

1998 2000 2003 2006 2008 20111

10

MPU Release Date

Tran

sisto

r Cou

nt [M

]

Source [CPUDB]

DENSITY

IdealNon-ideality

2009

2010

2011

2012

2013

2014

2015

2016

2017

2018

2019

2020

2021

2022

2023

2024

0

100

200

300

400

500

600

700

0

02

04

06

08

1

12Dynamic Power (W)

POWER

Source [JeongK08]

IdealNon-ideality

2006 2008 2010 2012 2014 20161000

10000

100000

Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling

DRIVE CURRENT

Ideal

Source [ITRS]

Non-ideality

1995 2000 2005 2011 20160

05

1

15

2

25

3

MPU Release Date

Volt

SUPPLY VOLTAGE

Source [CPUDB]

Ideal

Non-ideality

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient

designs thus reduced powerbull Proposed method trades off between timing-error penalty vs

reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction

compared to pure-margin designs Resilience benefits increase in the context of AVS strategy

084 088 092 096 10030

36

42

48

54

60brute-forcepure-marginCombOpt

Supply voltage (V)

En

erg

y (

mJ

)

084 086 088 09 092 094 096 098 1 10225

29

33

37

41

45brute-force

pure-margin

CombOpt

Supply voltage (V)

En

erg

y (

mJ

)

MUL EXU

Minimum achievable energy

80ISVLSI-2014 invited talk 140710

Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for

circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly

threshold voltages)bull One mode is outside of the design cone of the other

failed design overdesignbull Mode A has positive timing slacks with respect to mode B

mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other

bull Modes are in each othersrsquo design cone

Voltage

Frequency

A

Negative Slacks = failed design

Positive Slacks = overdesign

B

C

Design Cone of mode A

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline search for signoff modes within design cone reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method Global Optimization

• Iteratively sample and refine power models

• Avoid circuit implementation at each mode

• Small constant number of runs is enough: scalable

[Flow: global optimization flow. Sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models (adaptive search)]

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); series: 1st, 2nd, real]

• Ovals indicate sample points

• 1st, 2nd: power from power models at first, second iteration

• real: power from real implemented circuits

Design: AES, f = 700MHz

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

• Critical path may be difficult to identify (IP from 3rd party)

• Calibrating monitors at multiple modes/voltages requires long test time

Closed-Loop AVS

Design-dependent replica

In-situ monitor

Generic monitor

• Does not capture design-specific performance variation

This work: Tunable monitor for closed-loop AVS

• Can be applied as a generic monitor

• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin

• Series resistance

• Cell types (INV, NAND, NOR)

• Proposed circuit

• Different cell type covers different process corners

• Tune series resistance of each stage to high or low

[Figure: ring oscillator stages, each with a 1-bit control pin selecting high resistance or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows

• Pure-margin (PM): conventional methodology w/ only margin insertion

• Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints

• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods

• Resilience benefits increase with safety margin

[Figure: stacked-bar energy (mJ) for PM, BF, CO at small, medium and large margin, for designs MUL and EXU; bar components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5/10/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power

• We trade off timing-error penalty vs. reduced power at a lower supply voltage

• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for designs MUL and EXU; curves: brute-force, pure-margin, CombOpt; annotation: minimum achievable energy]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flow chart boxes, in extraction order:]

Initial placement (all FFs = error-tolerant FFs)

Energy < min energy?

Save current solution

Margin insertion on K paths based on sensitivity function

Replace error-tolerant FFs w/ normal FFs (SEOpt)

Activity-aware clock skew optimization (SkewOpt)

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • "Chicken and Egg" Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is "Signoff"
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)
Page 28: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

28ISVLSI-2014 invited talk 140710

Library Characterization for AVS

• VBTI = Voltage for BTI aging estimation

• Vlib = Voltage for circuit performance estimation (library characterization)

• VBTI and Vlib are required in signoff

• VBTI and Vlib depend on aging during AVS

• Aging and Vfinal are unknowns before circuit implementation

[Flow: Step 1: derated library (from Vlib, VBTI, |Vt|) → Step 2: circuit implementation and signoff → Step 3: circuit BTI degradation and AVS → Vfinal]

No obvious guideline to define VBTI and Vlib

Inconsistency among Vfinal, Vlib, VBTI

• What is the design overhead when timing libraries are not properly characterized?

• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?

Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

29ISVLSI-2014 invited talk 140710

Power vs Area Across Different Signoffs

"Knee" point for balanced area and power tradeoff

Optimistic signoff corner

• AVS increases supply voltage aggressively to compensate aging

• Large lifetime energy overhead

• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner

• Overestimate aging and/or underestimate circuit performance

• Large area overhead

30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime

• Aging of a flat Vfinal ≈ aging of an adaptive Vdd

• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI curves]

VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)

• Vfinal = Vdd at end of life (to compensate BTI aging)

• Gates along critical path

• Timing slack at t = 0

• Circuit activity (BTI aging)

• BTI aging depends on circuit activity

• Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types

• Heuristic 2: use average Vfinal of different gate types

• Vfinal is a function of timing slack

• Assume timing slack = 0

10mV

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells

• Heuristic: use a "flat" Vheur to estimate BTI degradation

Obtain Vheur (average of standard cells)

Obtain derated library with VBTI = Vlib = Vheur

Signoff circuit with derated library
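The averaging step of this flow can be sketched in a few lines. This is only an illustration: the cell names and per-cell Vfinal numbers below are made up, and `flat_vheur` and `spread_mv` are hypothetical helper names, not part of any tool used in the talk.

```python
from statistics import mean


def flat_vheur(vfinal_by_cell):
    """Average per-cell Vfinal estimates (volts) into a single flat Vheur."""
    if not vfinal_by_cell:
        raise ValueError("need at least one cell")
    return mean(vfinal_by_cell.values())


def spread_mv(vfinal_by_cell):
    """Largest gap between per-cell Vfinal values, in millivolts."""
    values = list(vfinal_by_cell.values())
    return (max(values) - min(values)) * 1000.0


# Hypothetical per-cell Vfinal estimates at zero timing slack (made-up numbers).
example = {"INV_X1": 0.948, "NAND2_X1": 0.952, "NOR2_X1": 0.955}
vheur = flat_vheur(example)   # then VBTI = Vlib = vheur for the derated library
gap = spread_mv(example)      # a small gap is what makes averaging reasonable
```

If the spread across gate types were large, a single flat Vheur would be a poor stand-in and per-cell derating would be the safer choice.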

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• 4 designs × {DC, AC} × derating methods

Proposed method

Circuits signed off using other derated libraries

"Knee" point for balanced area and power tradeoff

Optimistic signoff corner

• AVS increases supply voltage aggressively to compensate aging

• Consume more power

• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner

• Overestimate aging and/or underestimate circuit performance

• Large area overhead

35ISVLSI-2014 invited talk 140710

Also Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair

• Multi-mode operation requires multi-mode signoff

• Example: nominal mode and overdrive mode

• Selection of signoff modes affects area, power

• ASP-DAC 2013: Optimization of signoff modes

Improve performance, power, or area

Reduce overdesign

[Figure: Vdd vs. time schedules alternating nominal (NOM, duration tnom) and overdrive (OD, duration tOD) modes]

[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V]

Different overdrive modes: 26% power range

Fix fOD: still 14% power range

36ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config: Vmin_est < Vmin_chip; some chips will fail

Optimized config

• Increase high resistance passgates

• Vmin_est ≈ Vmin_chip

Default config

• Low resistance passgates

• Guardband for worst-case

• Vmin_est > Vmin_chip

• 13mV margin

37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config: Vmin_est < Vmin_chip; some chips will fail

Default config

• Low resistance passgates

• Guardband for worst-case

• Vmin_est > Vmin_chip

• 13mV margin

Optimized config

• Increase high resistance passgates

• Vmin_est ≈ Vmin_chip

Benefits of tunability

• Compensate for difference between model vs. silicon

• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline

• Introduction

• Modeling of IC Variability

• Margining of IC Variability

• Tolerance of IC Variability

• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions

• Variability severely challenges IC value

• In manufacturing process, during operation, across lifetime

• Benefit of "next node" is increasingly hard to find

• Entire node is a "20/20/20" value proposition

• 5-10% in PPA metrics is now substantial at leading edge

• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff

• Some takeaways from this talk

• Substantial benefit from tightening BEOL corners (= signoff)

• "Minimum cost of resilience" is a rich optimization challenge

• Chicken-egg loops in signoff definition can be broken

• Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW, left axis) and PG power (mW, right axis) for implementations 1-9, ordered from highest invested guardband to least invested guardband; annotation: 14% power penalty]

• Core power increases due to elevated voltage

• PG power increases due to both elevated voltage and mesh degradation

• A tradeoff between invested guardband in signoff

43ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately

• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner. Capacitance distributions of Layer M1 and Layer M2 with their 3σ points, and of the interconnect stack with M1 and M2 (M1 C, M2 C); the homogeneous Cw corner lies beyond the stack's 3σ point, and the gap is labeled "Pessimism"]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately

• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner. Capacitance distributions of Layer M1 and Layer M2 with their 3σ points, and of the interconnect stack with M1 and M2 (M1 C, M2 C); the homogeneous Cw corner lies beyond the stack's 3σ point, and the gap is labeled "Pessimism"]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
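The growth of pessimism with layer count can be checked with a small calculation. The sketch below assumes, purely for illustration, that the stack's variation is a plain sum of per-layer variations with one pairwise correlation factor `gamma`; the function names are hypothetical.

```python
import math


def homogeneous_corner(sigmas, k=3.0):
    """Homogeneous corner: every layer sits at its own k-sigma point at once."""
    return k * sum(sigmas)


def statistical_corner(sigmas, gamma, k=3.0):
    """k-sigma point of the summed variation with pairwise correlation gamma."""
    variance = 0.0
    for i, s_i in enumerate(sigmas):
        for j, s_j in enumerate(sigmas):
            variance += s_i * s_j * (1.0 if i == j else gamma)
    return k * math.sqrt(variance)


def pessimism_ratio(num_layers, gamma, sigma=1.0):
    """Homogeneous corner divided by the statistical corner for equal layers."""
    sigmas = [sigma] * num_layers
    return homogeneous_corner(sigmas) / statistical_corner(sigmas, gamma)


for n in (2, 4, 9):
    print(n, round(pessimism_ratio(n, 0.5), 3))
```

Under these assumptions the ratio equals sqrt(n / (1 + (n - 1) * gamma)) for n equal layers: it is 1 only when gamma = 1 and grows with n otherwise.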

45ISVLSI-2014 invited talk 140710

Correlation Matrix

• Let Σ be the correlation matrix for variation sources

Σ =

         M1          M2          M3          M4
         ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW   1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT   0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH   0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW   γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT   0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH   0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW   γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT   0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH   0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW   0  0  0     0  0  0     0  0  0     1  0  0
    ΔT   0  0  0     0  0  0     0  0  0     0  1  0
    ΔH   0  0  0     0  0  0     0  0  0     0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point: 0.92 at 60%]

• Variations in different layers are not fully correlated

• Averaging uncorrelated variation yields smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: scatter plots of α vs. Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations

• But these paths have larger delays at the other corner

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner

α < 1.0: delay variations are covered by the RC-worst corner

Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst

Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: scatter plots of α vs. Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations

• But these paths have larger delays at the other corner

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner

α < 1.0: delay variations are covered by the RC-worst corner

Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst

Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C

• Using RC-worst or C-worst only will underestimate delay variations

• Need both RC- and C-worst corners to cover process variations

• In the following discussions, α is defined at the dominant corner
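A minimal sketch of evaluating α at the dominant corner follows. It assumes that α for path j is the ratio of the statistical 3σ delay variation to the Δdelay seen at the conventional corner, both relative to the typical delay; that definition is not restated on this slide and is an assumption here, as are all function names and the example numbers.

```python
def relative_delta(d_corner, d_typ):
    """Relative delay increase of a corner over the typical corner."""
    return (d_corner - d_typ) / d_typ


def dominant_corner(d_typ, d_cw, d_rcw):
    """Return the corner (C-worst or RC-worst) with the larger delta delay."""
    delta_cw = relative_delta(d_cw, d_typ)
    delta_rcw = relative_delta(d_rcw, d_typ)
    if delta_cw > delta_rcw:
        return "C-worst", delta_cw
    return "RC-worst", delta_rcw


def alpha_at_dominant_corner(d_typ, d_cw, d_rcw, sigma_j):
    """Assumed definition: alpha = (3 * sigma_j / d_typ) / delta at dominant corner."""
    name, delta = dominant_corner(d_typ, d_cw, d_rcw)
    if delta <= 0.0:
        raise ValueError("dominant corner shows no delay increase")
    return name, (3.0 * sigma_j / d_typ) / delta


# Made-up path: delays in ns at Ytyp, Ycw, Yrcw and a statistical sigma of 12 ps.
corner, alpha = alpha_at_dominant_corner(1.00, 1.02, 1.08, 0.012)
```

With these made-up numbers the path is RC-worst dominated and α is below 1.0, so the conventional corner over-covers its variation.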

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 (M1 C, M2 C); non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners

• Challenge

• Many feasible combinations

• A corner can only cover certain paths

• How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5

• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: 3σj / d(Ytyp) × 100 vs. Δdj(Yrcw) / dj(Ytyp) × 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar

• Wires on critical paths are routed on many layers

• Structure is an outcome of the design flow

Testcase

• 45nm foundry library (wire resistivity scaled by 8X)

• Netlist: NETCARD, 1mm2, 570K standard cell instances

• 9 metal layers

• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners

• Calculate Δdelay of critical paths

• Put path j in the group Gtbc if Δdelay is larger than a threshold

• Fix only the paths in Gtbc using tightened BEOL corners

• Since tightened corners have smaller delay variations, timing closure is easier

[Flow: routed design → timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) → paths split into GTBC and GCBC → timing analysis using TBC (for GTBC) and using CBC (for GCBC), each with ECO at the same corners until violation = 0 → done]
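The path-grouping step of this flow might be sketched as follows. The rule used here (a path goes to Gtbc when its relative Δdelay at either corner exceeds that corner's threshold) is an assumption made for illustration, and the `Path` record, the function name and the threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    d_typ: float   # delay at the typical BEOL corner (ns)
    d_cw: float    # delay at the C-worst corner (ns)
    d_rcw: float   # delay at the RC-worst corner (ns)


def split_paths(paths, a_cw_pct, a_rcw_pct):
    """Split paths into (Gtbc, Gcbc) by relative delta delay in percent."""
    g_tbc, g_cbc = [], []
    for p in paths:
        delta_cw = 100.0 * (p.d_cw - p.d_typ) / p.d_typ
        delta_rcw = 100.0 * (p.d_rcw - p.d_typ) / p.d_typ
        # Assumed rule: a large delta at either corner qualifies the path for TBC.
        if delta_cw > a_cw_pct or delta_rcw > a_rcw_pct:
            g_tbc.append(p.name)
        else:
            g_cbc.append(p.name)
    return g_tbc, g_cbc


# Made-up paths and thresholds, for illustration only.
paths = [Path("p1", 1.00, 1.06, 1.03), Path("p2", 1.00, 1.01, 1.02)]
g_tbc, g_cbc = split_paths(paths, a_cw_pct=5.0, a_rcw_pct=5.0)
```

Paths left in Gcbc are still signed off with the conventional corners, so only paths judged safe are moved to the tightened ones.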

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5

           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   43        73
TBC-0.6    0.6   33        50
TBC-0.7    0.7   30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α

• A path has small Δdelays: the path is equally sensitive to R and C

• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1

• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

In the example: dj(Ytyp) is the nominal delay, ΔdR-M1 is the delay sensitivity to unit change in M1 resistance, and ΔdC-M1 is the delay sensitivity to unit change in M1 capacitance

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor scatter plots for LEON3MP, SUPERBLUE12 and NETCARD, each with the region α > 0.5 marked]

• Similar trends in different designs

• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns

• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: three bar charts for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2: WNS (ns), TNS (ns), and number of timing violations]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime

• Aging of a flat Vfinal ≈ aging of an adaptive Vdd

• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI curves]

VBTI = Vlib ≈ Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)

• Vfinal = Vdd at end of life (to compensate BTI aging)

• Gates along critical path

• Timing slack at t = 0

• Circuit activity is not an issue

• Because BTI effect is not sensitive to circuit activity

• DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types

• Heuristic 2: use average Vfinal of different gate types

• Vfinal is a function of timing slack

• Assume timing slack = 0

10mV

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology

• Signoff for setup time violation

• Temperature = 125C

• Process corner = slow NMOS and PMOS

• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages

Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime

• Signoff flow

• Estimate aging at each time step

• Update circuit timing and Vdd

• Repeat until t = tfinal

• Modify circuit and start over if Vfinal > maximum allowed voltage

• No overhead in timing analysis, but very slow: many STA runs

and library

Vstep: AVS voltage step; Vfinal: converged voltage
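The loop structure of this reference flow can be illustrated with a toy model. Everything numeric below is invented: the aging model, the delay model and their constants are placeholders standing in for library characterization and STA, and the sketch evaluates aging as if the current Vdd had been applied for the whole elapsed time.

```python
def reference_signoff(v_init_mv, v_max_mv, v_step_mv, t_final, dt,
                      delta_vth, meets_timing):
    """Step through lifetime, raising Vdd by Vstep whenever timing fails."""
    vdd_mv = v_init_mv
    trace = []
    for k in range(1, int(round(t_final / dt)) + 1):
        t = k * dt
        # Estimate aging, re-check timing, and bump Vdd until timing is met.
        while not meets_timing(vdd_mv / 1000.0, delta_vth(t, vdd_mv / 1000.0)):
            vdd_mv += v_step_mv
            if vdd_mv > v_max_mv:
                raise RuntimeError("Vfinal > Vmax: modify circuit and start over")
        trace.append((t, vdd_mv))
    return vdd_mv, trace


VTH0 = 0.30  # hypothetical fresh threshold voltage (V)


def toy_delta_vth(t_years, vdd):
    """Invented front-loaded aging model: grows as t^(1/6) and with Vdd^2."""
    return 0.03 * (t_years / 10.0) ** (1.0 / 6.0) * (vdd / 0.9) ** 2


def toy_delay(vdd, dvth):
    """Invented alpha-power-law style delay model (arbitrary units)."""
    return vdd / (vdd - VTH0 - dvth) ** 1.3


TARGET = toy_delay(0.9, 0.0)  # zero slack for the fresh circuit at Vinit


def toy_meets_timing(vdd, dvth):
    return toy_delay(vdd, dvth) <= TARGET


v_final_mv, trace = reference_signoff(900, 1050, 10, 10.0, 1.0,
                                      toy_delta_vth, toy_meets_timing)
```

In a real flow each `meets_timing` call is an STA run against a library re-derated for the current aging, which is why the reference flow is accurate but slow.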

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries

• Evaluate impact of library characterization

• Seven setups

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff

• Derated library characterization is related to BTI + AVS

• AVS affected by circuit implementation

• Timing constraints, critical paths, etc.

• Circuit is affected by library characterization

[Figure: loop linking Circuit, Vfinal, Vlib / VBTI, and Derated Libraries]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)

• |ΔVth| is partially recovered when device is off (relaxed)

• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time with alternating ON / OFF phases [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time

[TCAS'14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a "front-loaded" phenomenon

• 50% BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)

• Most Vdd increment happens in early lifetime

• Gap between Vdd and Vfinal reduces rapidly

≈70% Vdd increment in 1 year (remaining 30% over 9 years)

Vfinal

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner

• AVS increases supply voltage aggressively to compensate aging

• Consume more power

• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner

• Overestimate aging and/or underestimate circuit performance

• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging

• Conventional signoff approach

• Analyze circuit timing at worst-case corners

• Fix timing violations, re-run timing analysis

• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

                   Vlib = Min Vdd                        Vlib = Max Vdd
VBTI = Min Vdd     Slowest circuit, less aging           Not applicable (optimistic)
VBTI = Max Vdd     Slowest circuit, worst-case aging     Faster circuit, worst-case aging
                   (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for design AES; points labeled by signoff case 1-8; series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); annotations: optimistic about AVS, pessimistic about AVS]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year, left axis) and Vfinal (V, right axis) for implementations 1-9; annotations: 30% MTTF penalty, 200mV voltage compensation]

• Assume no EM fix at signoff

• BTI degradation is checked at each step and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$
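The MTTF update rule on this slide, MTTF(i) = MTTF(i-1) × (VDD(i-1) / VDD(i))², can be stepped through numerically. The starting MTTF and the voltage schedule below are made-up values chosen only to show the arithmetic, and `mttf_trace` is a hypothetical helper name.

```python
def mttf_trace(mttf_initial, vdd_steps):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule."""
    if not vdd_steps:
        raise ValueError("need at least one Vdd value")
    trace = [mttf_initial]
    for prev_v, next_v in zip(vdd_steps, vdd_steps[1:]):
        trace.append(trace[-1] * (prev_v / next_v) ** 2)
    return trace


# Hypothetical AVS schedule that raises Vdd by 200 mV in total.
trace = mttf_trace(10.0, [1.00, 1.05, 1.10, 1.15, 1.20])
penalty = 1.0 - trace[-1] / trace[0]
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last voltages: MTTF(n) = MTTF(0) × (VDD(0) / VDD(n))². For this made-up 1.00 V to 1.20 V schedule the penalty is about 31%.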

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year (0-12) for AVS schedules S1-S5, design DMA; MTTF (year) for schedules S1-S5; annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is "Signoff"

• Foundation of contract between design house and foundry

• "Chip should work": stack of models, margins, analyses

• Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff on a voltage axis; labels: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd, operating voltage]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions

• Δdjv is linear with respect to variation sources

• Variation sources are normal distributions

• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 × 27)

• Σ = λλ^T (Note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

$$[\Delta d'_{j1} \;\dots\; \Delta d'_{j27}] = [\Delta d_{j1} \;\dots\; \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
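The two formulas above (rotate the sensitivities by the Cholesky factor, then take the root-sum-square) can be sketched in plain Python. The layer-to-module mapping, the helper names and the unit sensitivities are illustrative assumptions; a real flow would take the sensitivities from the per-source STA runs.

```python
import math


def build_correlation(layers, kinds, module_of, gamma):
    """Correlation gamma for same-kind sources in the same process module."""
    names = [(layer, kind) for layer in layers for kind in kinds]
    n = len(names)
    corr = [[0.0] * n for _ in range(n)]
    for a, (la, ka) in enumerate(names):
        for b, (lb, kb) in enumerate(names):
            if a == b:
                corr[a][b] = 1.0
            elif ka == kb and module_of[la] == module_of[lb]:
                corr[a][b] = gamma
    return corr


def cholesky_lower(a):
    """Lower-triangular factor low with a = low * low^T (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            acc = sum(low[i][m] * low[k][m] for m in range(k))
            if i == k:
                low[i][k] = math.sqrt(a[i][i] - acc)
            else:
                low[i][k] = (a[i][k] - acc) / low[k][k]
    return low


def path_sigma(sensitivities, corr):
    """sigma_j = norm of (sensitivity row vector times the Cholesky factor)."""
    low = cholesky_lower(corr)
    n = len(sensitivities)
    rotated = [sum(sensitivities[i] * low[i][k] for i in range(n))
               for k in range(n)]
    return math.sqrt(sum(x * x for x in rotated))


# Hypothetical setup: M1-M3 share one process module, M4 is in another.
modules = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}
corr = build_correlation(["M1", "M2", "M3", "M4"], ["W", "T", "H"], modules, 0.5)
sigma_j = path_sigma([1.0] * 12, corr)
```

Since Σ = λλ^T, the result equals the square root of s Σ s^T for the sensitivity row vector s, which gives a quick cross-check of the factorization.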

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)

• Trade off design robustness vs. design quality, e.g., enable margin reduction

• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]

Conventional design: worst-case signoff, no Vdd downscaling

Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers

• Objective: implement design to minimize energy

• Estimation of design energy

$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$

Throughput is expressed in terms of the error rate ER [Kahng10], the clock period T, and the number of recovery cycles r

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF

• Trade off cost of resilience vs. data path optimization

• Question: which endpoint to be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and AVS approaches (figure axis: power)

Open-Loop AVS

• Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]

• Post-silicon characterization: process-aware AVS [Tschanz03]

Closed-Loop AVS

• Generic monitor, design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00]; design-dependent monitor [Elgebaly07, Drake08, Chan12]

• In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]

Error Tolerance

• Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels, each contrasting ideal scaling with observed non-ideality]

• DENSITY: transistor count [M] vs. MPU release date, 1998-2011. Source [CPUDB]

• POWER: dynamic power (W), 2009-2024. Source [JeongK08]

• DRIVE CURRENT: (μA/μm) for Extended Planar Bulk, UTB FD and DG vs. ideal scaling, 2006-2016. Source [ITRS]

• SUPPLY VOLTAGE: volt vs. MPU release date, 1995-2016. Source [CPUDB]


Power vs Area Across Different Signoffs

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time under NBTI and PBTI stress; VBTI = Vlib ≈ Vfinal]

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging). Depends on:
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

Flow:
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Sign off circuit with derated library
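The first two steps of the flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `estimate_v_heur` and `derated_library_corner` and the per-cell Vfinal values are hypothetical and do not come from the talk.

```python
from statistics import mean


def estimate_v_heur(v_final_by_cell):
    """Average the per-cell end-of-life voltages into a single flat Vheur."""
    if not v_final_by_cell:
        raise ValueError("need at least one cell")
    return mean(v_final_by_cell.values())


def derated_library_corner(v_heur):
    """The derated library uses the same voltage for aging and for timing."""
    return {"V_BTI": v_heur, "V_lib": v_heur}


# Hypothetical per-cell Vfinal values in volts, each obtained with timing slack = 0.
v_final_by_cell = {"INV": 0.96, "NAND2": 0.97, "NOR2": 0.98}
v_heur = estimate_v_heur(v_final_by_cell)
corner = derated_library_corner(v_heur)
```

Because Vfinal is reported to be insensitive to gate type, a plain average is a reasonable stand-in for the per-design value that is not yet known at this stage.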

Power vs Area for All Designs

• (4 designs) × {DC, AC} × (derating methods)

[Figure: power vs. area; legend: proposed method, circuits signed off using other derated libraries]

“Knee” point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Also Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• [ASP-DAC 2013] Optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign

[Figure: Vdd vs. time, alternating between NOM and OD modes over intervals tnom and tOD]

[Figure: power of circuits with different overdrive modes; fnom = 800MHz, Vnom = 0.8V]
• Different overdrive modes: 26% power range
• Fix fOD: still 14% power range

Also Tunable Monitors Less Margin

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin

Also Tunable Monitors Less Margin

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by the models, margins, and tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law

Thank You

Backup

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1-9; annotations: highest invested guardband, least invested guardband, 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example, worst-case capacitance corner. Per-layer C distributions for layer M1 and layer M2 with their 3σ points; interconnect stack with M1 and M2 (M1 C, M2 C); homogeneous Cw corner; annotations: 3σ, pessimism]

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example, worst-case capacitance corner. Per-layer C distributions for layer M1 and layer M2 with their 3σ points; interconnect stack with M1 and M2 (M1 C, M2 C); homogeneous Cw corner; annotation: pessimism]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

             M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW     1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT     0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH     0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW     γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT     0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH     0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW     γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT     0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH     0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW     0  0  0     0  0  0     0  0  0     1  0  0
    ΔT     0  0  0     0  0  0     0  0  0     0  1  0
    ΔH     0  0  0     0  0  0     0  0  0     0  0  1

• γ = 0.5: correlation for variation sources with the same variation type and in the same process module
• Variation sources in different process modules are independent
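The structure of Σ can be generated from the two rules above. The sketch below is a minimal illustration; the grouping of M1-M3 into one process module with M4 in a separate one is an assumption read off the matrix entries, and the names `MODULE` and `build_correlation_matrix` are hypothetical.

```python
LAYERS = ["M1", "M2", "M3", "M4"]
TYPES = ["dW", "dT", "dH"]  # width, thickness, height variation

# Assumption inferred from the matrix: M1-M3 share a process module, M4 does not.
MODULE = {"M1": 0, "M2": 0, "M3": 0, "M4": 1}


def build_correlation_matrix(gamma):
    """Return (sources, sigma) with gamma between same-type, same-module sources."""
    sources = [(layer, kind) for layer in LAYERS for kind in TYPES]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (layer_i, kind_i) in enumerate(sources):
        for j, (layer_j, kind_j) in enumerate(sources):
            if i == j:
                sigma[i][j] = 1.0
            elif kind_i == kind_j and MODULE[layer_i] == MODULE[layer_j]:
                sigma[i][j] = gamma
    return sources, sigma


sources, sigma = build_correlation_matrix(0.5)
```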

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point: 0.92 at 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation

Delay Variation

[Figure: two plots of α. Axes: Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp), and Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp). Regions: dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst), dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner

Delay Variation

[Figure: two plots of α. Axes: Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp), and Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp). Regions: dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst), dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 (M1 C, M2 C); non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: 3σj / d(Ytyp) × 100 plotted against Δdj(Yrcw) / dj(Ytyp) × 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst, and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

Flow:
1. Routed design
2. Timing analysis at BEOL corners Ytyp, Ycw, Yrcw; split paths into GTBC and GCBC
3. GTBC: timing analysis using TBC; ECO using TBC until violation = 0
4. GCBC: timing analysis using CBC; ECO using CBC until violation = 0
5. Done
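The path-grouping step can be sketched as follows. This is an illustration, not the flow's actual implementation: the thresholds, the delay numbers, the `PathDelays` type, and in particular the rule of comparing each corner's relative Δdelay against its own threshold (joined with "or") are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PathDelays:
    name: str
    d_typ: float  # delay at the typical corner (ns)
    d_cw: float   # delay at the C-worst corner (ns)
    d_rcw: float  # delay at the RC-worst corner (ns)


def relative_delta(d_corner, d_typ):
    """Relative delay change at a corner: [d(Y) - d(Ytyp)] / d(Ytyp)."""
    return (d_corner - d_typ) / d_typ


def split_paths(paths, a_cw, a_rcw):
    """Put a path in G_TBC when its relative delta exceeds a threshold."""
    g_tbc, g_cbc = [], []
    for p in paths:
        over_cw = relative_delta(p.d_cw, p.d_typ) > a_cw
        over_rcw = relative_delta(p.d_rcw, p.d_typ) > a_rcw
        (g_tbc if over_cw or over_rcw else g_cbc).append(p.name)
    return g_tbc, g_cbc


# Hypothetical paths and thresholds (fractions, not percent).
paths = [
    PathDelays("p1", 1.00, 1.06, 1.02),
    PathDelays("p2", 1.00, 1.01, 1.01),
    PathDelays("p3", 2.00, 2.02, 2.20),
]
g_tbc, g_cbc = split_paths(paths, a_cw=0.04, a_rcw=0.07)
```

Paths left in G_CBC keep the conventional corners, so only the paths with large Δdelay are closed against the tightened ones.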

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw, and Arcw

Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1 term: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1 term: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

Scaling Factor Results

[Figure: α for LEON3MP, NETCARD, and SUPERBLUE12; region α > 0.5 marked in each panel]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging). Depends on:
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB, and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step
Vfinal: converged voltage
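The loop structure of the reference flow can be sketched with toy models. Everything numeric here is an assumption for illustration only: the alpha-power-law delay model, the t^n aging model with n = 1/6, and the constants `k`, `vth0`, and `a` are stand-ins, not the models used in the talk. In the real flow each iteration involves library characterization and STA.

```python
def path_delay(vdd, dvth, d0=1.0, v_ref=0.9, vth0=0.3, a=1.3):
    """Toy alpha-power-law delay (ns), equal to d0 at (v_ref, dvth = 0)."""
    ref = v_ref / (v_ref - vth0) ** a
    return d0 * (vdd / (vdd - vth0 - dvth) ** a) / ref


def reference_signoff(clock_ns, v_init, v_max, v_step, t_final, n_steps,
                      k=0.05, n=1 / 6):
    """Step through lifetime: age the circuit, then raise Vdd until timing is met.

    Returns (Vfinal, schedule), or (None, schedule) if Vdd would exceed v_max,
    which corresponds to "modify circuit and start over".
    """
    vdd, dvth, schedule = v_init, 0.0, []
    for i in range(1, n_steps + 1):
        t_prev = t_final * (i - 1) / n_steps
        t_now = t_final * i / n_steps
        # Toy BTI increment: front-loaded in time, scaled by the applied Vdd.
        dvth += k * vdd * ((t_now / t_final) ** n - (t_prev / t_final) ** n)
        while path_delay(vdd, dvth) > clock_ns:
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None, schedule
        schedule.append((t_now, vdd))
    return vdd, schedule


v_final, schedule = reference_signoff(
    clock_ns=1.0, v_init=0.90, v_max=1.05, v_step=0.01, t_final=10.0, n_steps=10
)
```

The cost noted on the slide is visible in the structure: every time step needs a fresh timing evaluation, and the whole loop restarts whenever the circuit has to be modified.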

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop linking circuit, Vfinal, (Vlib, VBTI), and derated libraries]

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time, alternating ON / OFF / ON / OFF [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time [TCAS’14]

Observation 1

• BTI is a “front-loaded” phenomenon [Chan11]
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd over lifetime approaching Vfinal; annotation: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]

Results for DC Scenario

[Figure: results for setups 1-7; “good corners” marked]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Setups
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                                 Vlib for circuit performance estimation
                                 Min Vdd                               Max Vdd
VBTI for aging      Min Vdd      Slowest circuit, less aging           Not applicable (optimistic)
estimation          Max Vdd      Slowest circuit, worst-case aging     Faster circuit, worst-case aging
                                 (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES, points labeled by signoff corner number; series: non-EM aware, after fixing (Mishra), after fixing (Black's); regions: optimistic about AVS, pessimistic about AVS]

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left(\frac{V_{DD}(i-1)}{V_{DD}(i)}\right)^{2}$$

[Figure: lifetime (year) and Vfinal (V) for implementations 1-9; annotations: 30% MTTF penalty, 200mV voltage compensation]
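The MTTF update rule is a simple recurrence over the AVS voltage schedule. A minimal sketch follows; the function names and the example schedule are hypothetical, and the starting MTTF of 10 years is only an illustrative number.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))^2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_after_schedule(mttf_initial, vdd_schedule):
    """Apply the update across consecutive voltages of an AVS schedule."""
    mttf = mttf_initial
    for v_prev, v_now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_now)
    return mttf


# Hypothetical schedule: Vdd raised from 0.90 V to 1.00 V in two steps.
vdd_schedule = [0.90, 0.95, 1.00]
mttf_end = mttf_after_schedule(10.0, vdd_schedule)
```

Because the ratios telescope, the result depends only on the first and last voltage: here MTTF scales by (0.90 / 1.00)^2 = 0.81.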

>

EM Impact on AVS Scheduling

[Figure: VDD vs. year for AVS schedules S1-S5 (design: DMA), and MTTF (year) for S1-S5; annotation: 12 years MTTF penalty]

What is “Signoff”

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff. Voltage axis from nominal Vdd down to signoff Vdd through static IR drop, power grid IR gradient, dynamic IR, and HCI/NBTI; operating voltage marked]

Problem: margins = pessimism, overdesign, schedule delay

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist, then RC extraction, then STA, then Δdjv

Note: path delay includes gate and wire delays

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλᵀ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

$$[\Delta d'_{j1} \;\dots\; \Delta d'_{j27}] = [\Delta d_{j1} \;\dots\; \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay

$$\sigma_j = \left((\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2\right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
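The two formulas above can be checked numerically with a small pure-Python sketch. The Cholesky routine below is the textbook lower-triangular factorization; the function names and the two-source example are illustrative and not taken from the talk.

```python
import math


def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][j] = (sigma[i][j] - s) / lam[j][j]
    return lam


def path_sigma(delta_d, sigma):
    """sigma_j = || delta_d * lam ||, with delta_d a row vector of sensitivities."""
    lam = cholesky(sigma)
    n = len(delta_d)
    d_prime = [sum(delta_d[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Two unit sensitivities with correlation 0.5: variance = 1 + 1 + 2 * 0.5 = 3.
sigma_corr = path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
# Independent sources reduce to a root-sum-square.
sigma_indep = path_sigma([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
```

Multiplying the sensitivities by λ before the root-sum-square is equivalent to evaluating the quadratic form ΔdΣΔdᵀ, which is why correlated sources increase σj relative to the independent case.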

Resilient Designs

• Detect and recover from timing errors. Ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]
• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy

$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$

• Throughput is expressed in terms of the error rate ER [Kahng10], the clock period T, and the number of recovery cycles r

Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints. Allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches

Open-Loop AVS
• Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]

Closed-Loop AVS
• Generic monitor and design-dependent replica: process and temperature-aware AVS
  • Generic on-chip monitor [Burd00]
  • Design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]

Error Tolerance
• Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

[Figure: AVS classes arranged along a power axis]

Challenge Variability

[Figure: four panels contrasting ideal scaling with observed non-ideality.
 DENSITY: transistor count [M] vs. MPU release date, 1998-2011. Source: [CPUDB]
 POWER: dynamic power (W), 2009-2024. Source: [JeongK08]
 DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006-2016. Source: [ITRS]
 SUPPLY VOLTAGE: volts vs. MPU release date, 1995-2016. Source: [CPUDB]]

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; series: brute-force, pure-margin, CombOpt; minimum achievable energy marked]

Our Concept Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: frequency vs. voltage with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

Our Method Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

Global optimization flow:
1. Sample (SP&R)
2. Construct power models
3. Estimate optimal signoff modes
4. Adaptive search: sample (SP&R), refine power models, and re-estimate

[Figure: power estimation of adaptive search. Power (mW) vs. signoff voltage (V); series: 1st, 2nd, real; design: AES, f: 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
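The sample-and-refine loop can be illustrated in one dimension (signoff voltage only). This is a sketch under strong assumptions: `measure_power` is a hypothetical toy curve standing in for a full SP&R run, and the power model is a plain quadratic fit through the three best samples, which is not necessarily the model form used in the talk.

```python
def measure_power(v):
    """Stand-in for one SP&R run at signoff voltage v (toy curve, mW)."""
    return 15.0 + 40.0 * (v - 1.05) ** 2 + 30.0 * (v - 1.05) ** 4


def fit_quadratic(points):
    """Coefficients (a, b) of the parabola a*v^2 + b*v + c through three samples."""
    (x0, y0), (x1, y1), (x2, y2) = points
    w0 = y0 / ((x0 - x1) * (x0 - x2))
    w1 = y1 / ((x1 - x0) * (x1 - x2))
    w2 = y2 / ((x2 - x0) * (x2 - x1))
    a = w0 + w1 + w2
    b = -(w0 * (x1 + x2) + w1 * (x0 + x2) + w2 * (x0 + x1))
    return a, b


def adaptive_search(v_lo, v_hi, iterations=3):
    """Sample, fit a power model, sample at its estimated optimum, and refine."""
    samples = {v: measure_power(v) for v in (v_lo, (v_lo + v_hi) / 2, v_hi)}
    for _ in range(iterations):
        best3 = sorted(samples.items(), key=lambda kv: kv[1])[:3]
        a, b = fit_quadratic(best3)
        if a <= 0:  # model has no interior minimum; stop refining
            break
        v_est = min(max(-b / (2 * a), v_lo), v_hi)
        if v_est in samples:
            break
        samples[v_est] = measure_power(v_est)
    v_best = min(samples, key=samples.get)
    return v_best, samples


v_best, samples = adaptive_search(0.9, 1.3)
```

The point of the structure is the run count: three initial samples plus at most one new implementation per iteration, instead of one implementation per candidate mode.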

Classes of Closed-Loop AVS

Closed-Loop AVS
• Design-dependent replica, in-situ monitor
  • Critical path may be difficult to identify (IP from 3rd party)
  • Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor
  • Does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: RO stages, each with a 1-bit control pin selecting high resistance or low resistance]
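Choosing a monitor configuration amounts to picking the control-bit setting whose estimated Vmin sits closest to the chip Vmin without falling below it. The sketch below illustrates that selection only; the three-stage monitor, the 5 mV effect per high-resistance stage, and all voltages are hypothetical numbers. The direction of the effect (more high-resistance stages giving a lower estimate) follows the default vs. optimized configurations described earlier in the talk.

```python
from itertools import product


def pick_config(vmin_est_by_config, vmin_chip_worst):
    """Lowest Vmin estimate that still does not underestimate the worst chip."""
    safe = {c: v for c, v in vmin_est_by_config.items() if v >= vmin_chip_worst}
    if not safe:
        raise ValueError("every configuration underestimates Vmin")
    return min(safe, key=safe.get)


# Hypothetical 3-stage monitor: one control bit per stage (1 = high resistance).
# All-low (default) estimates 0.620 V; each high-resistance stage lowers it by 5 mV.
configs = {
    bits: 0.620 - 0.005 * sum(bits) for bits in product((0, 1), repeat=3)
}
vmin_chip_worst = 0.607
best = pick_config(configs, vmin_chip_worst)
margin_default_mv = (configs[(0, 0, 0)] - vmin_chip_worst) * 1000
margin_best_mv = (configs[best] - vmin_chip_worst) * 1000
```

Configurations that would put the estimate below the chip Vmin are rejected outright, which mirrors the "aggressive config: some chips will fail" case.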

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margin; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; series: brute-force, pure-margin, CombOpt; minimum achievable energy marked]

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Flow:
1. Initial placement (all FFs = error-tolerant FFs)
2. SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
3. SkewOpt: activity-aware clock skew optimization
4. If energy < min energy, save current solution
5. Repeat from step 2


30ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI curves]

VBTI = Vlib ≈ Vfinal

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation

Flow:
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Sign off circuit with derated library
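The averaging step in this flow can be sketched in a few lines. This is only an illustration: the per-cell Vfinal values, the cell names, and the helper `heuristic_voltage` are hypothetical and do not come from the talk.

```python
from statistics import mean

# Hypothetical per-cell end-of-life voltages in volts (illustrative values only).
vfinal_by_cell = {"INV": 0.96, "NAND2": 0.97, "NOR2": 0.98}


def heuristic_voltage(vfinal_by_cell):
    """Average Vfinal over cell types to obtain a single 'flat' Vheur."""
    return mean(vfinal_by_cell.values())


v_heur = heuristic_voltage(vfinal_by_cell)

# The derated library is then characterized with the same voltage used
# for aging estimation (VBTI) and for timing characterization (Vlib).
corner = {"V_BTI": v_heur, "V_lib": v_heur}
```

A single averaged voltage is only reasonable because of Observation 2 (Vfinal varies little across gate types).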

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• (4 designs) × {DC, AC} × (derating methods)

[Figure: power vs. area. Legend: proposed method; circuits signed off using other derated libraries. Annotation: "knee" point for balanced area and power tradeoff]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• [ASP-DAC 2013] Optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign

[Figure: Vdd vs. time, alternating between nominal (NOM) and overdrive (OD) modes with durations tnom and tOD]

[Figure: power of circuits with different overdrive modes; fnom = 800MHz, Vnom = 0.8V. Different overdrive modes: 26% power range. With fOD fixed: still 14% power range]

36ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Benefits of tunability
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) across implementations 1-9. Annotations: highest invested guardband; least invested guardband; 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

43ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner. Per-layer C distributions for M1 and M2 with their 3σ points; for an interconnect stack with M1 and M2 (axes: M1 C, M2 C), the homogeneous Cw corner is shown together with its pessimism]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner. Per-layer C distributions for M1 and M2 with their 3σ points; for an interconnect stack with M1 and M2 (axes: M1 C, M2 C), the homogeneous Cw corner is shown together with its pessimism]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers

45ISVLSI-2014 invited talk 140710

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

              M1          M2          M3          M4
            ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
    M1 ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
       ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
       ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
    M2 ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
       ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
       ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
    M3 ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
       ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
       ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
    M4 ΔW    0  0  0     0  0  0     0  0  0     1  0  0
       ΔT    0  0  0     0  0  0     0  0  0     0  1  0
       ΔH    0  0  0     0  0  0     0  0  0     0  0  1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
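A small sketch of how such a matrix could be assembled programmatically. It assumes, as the matrix on this slide suggests, that M1-M3 share one process module and M4 sits in another; the names `LAYERS`, `TYPES`, `SAME_MODULE`, and `build_sigma` are illustrative, not from the talk.

```python
LAYERS = ["M1", "M2", "M3", "M4"]
TYPES = ["dW", "dT", "dH"]          # width, thickness, height variation
SAME_MODULE = {"M1", "M2", "M3"}    # assumption: M4 is in a different module


def correlation(layer_a, type_a, layer_b, type_b, gamma=0.5):
    """Correlation between two variation sources."""
    if layer_a == layer_b and type_a == type_b:
        return 1.0                  # a source is fully correlated with itself
    if type_a == type_b and layer_a in SAME_MODULE and layer_b in SAME_MODULE:
        return gamma                # same type, same process module
    return 0.0                      # everything else is independent


def build_sigma(gamma=0.5):
    """Return the 12 x 12 correlation matrix as a list of rows."""
    sources = [(layer, kind) for layer in LAYERS for kind in TYPES]
    return [[correlation(la, ta, lb, tb, gamma) for (lb, tb) in sources]
            for (la, ta) in sources]


sigma = build_sigma()
```

Row and column order follows the table: three variation types per layer, layers M1 through M4.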

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the curve reaches 0.92 at 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
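The averaging effect can be illustrated with a simplified model in which each layer contributes a unit-sigma variation weighted by its share of the path wirelength, with a common pairwise correlation γ. The model and the function name `relative_sigma` are assumptions made for illustration only.

```python
import math


def relative_sigma(weights, gamma):
    """Sigma of a weighted sum of unit-sigma sources with pairwise correlation gamma."""
    variance = 0.0
    for i, w_i in enumerate(weights):
        for j, w_j in enumerate(weights):
            variance += w_i * w_j * (1.0 if i == j else gamma)
    return math.sqrt(variance)


# All wire on a single layer: no averaging.
single_layer = relative_sigma([1.0], gamma=0.5)

# Wire spread evenly over four layers.
spread_independent = relative_sigma([0.25] * 4, gamma=0.0)  # 1/sqrt(4) = 0.5
spread_correlated = relative_sigma([0.25] * 4, gamma=0.5)   # sqrt(0.625)
```

Under these assumptions, spreading wire over more layers lowers the combined sigma, and the benefit shrinks as γ approaches 1.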

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two scatter plots of α. Left: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp). Right: α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
  • Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two scatter plots of α. Left: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp). Right: α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
  • Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 (axes: M1 C, M2 C). Non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in the itf file with α = 0.5

[Figure: scatter plot; axes Δdj(Yrcw) / dj(Ytyp) × 100 and 3σj / d(Ytyp) × 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst, and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. → done]
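The path-grouping step of this flow might look like the sketch below. The threshold values, the path data, and the rule that exceeding either corner's threshold places a path in Gtbc are all assumptions for illustration; the talk only states that a path goes to Gtbc when its Δdelay is larger than a threshold.

```python
def rel_delta(d_corner, d_typ):
    """Relative delay change at a corner versus typical, in percent."""
    return 100.0 * (d_corner - d_typ) / d_typ


def classify_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc) using per-corner thresholds in percent.

    paths maps a path name to (d_typ, d_cw, d_rcw) delays.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        # Assumed rule: large variation at either corner qualifies the path.
        if rel_delta(d_cw, d_typ) > a_cw or rel_delta(d_rcw, d_typ) > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Hypothetical delays in ns: (typical, C-worst, RC-worst).
paths = {"p1": (1.00, 1.05, 1.03), "p2": (1.00, 1.01, 1.02)}
g_tbc, g_cbc = classify_paths(paths, a_cw=4.0, a_rcw=7.0)
```

Paths in `g_tbc` would then be analyzed and fixed at tightened corners, the rest at conventional corners.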

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                          LEON3MP   NETCARD   SUPERBLUE12
    Clock period (ns)     1.8       2.0       3.1
    Gate count            232K      575K      1031K
    Utilization (%)       84        79        82
    Core area (mm2)       0.45      1.04      1.91
    Max transition (ps)   330       330       330

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw, and Arcw

Correlation factor = 0.5:

              α     Acw (%)   Arcw (%)
    TBC-0.5   0.5   43        73
    TBC-0.6   0.6   33        50
    TBC-0.7   0.7   30        34

Statistical models: (1) no correlation, and (2) variation sources of the same kind in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  (terms, in order: nominal delay; delay sensitivity to unit change in M1 resistance; delay sensitivity to unit change in M1 capacitance)
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large → delay variations of ΔdR-M1 and ΔdC-M1 cancel out → Δd(Ycw) ~ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor α plots for LEON3MP, SUPERBLUE12, and NETCARD, each with a marked α > 0.5 region]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: three bar charts comparing CBC, TBC-1, and TBC-2 on LEON, SUPERBLUE, and NETCARD: WNS (ns), TNS (ns), and number of timing violations]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

    Circuit   Frequency (GHz)
    c5315     1.38
    c7552     1.25
    AES       0.89
    MPEG2     1.05

Supply voltages

    Vmax          1.05V
    Vinit         0.90V
    Vheur1 (DC)   0.97V
    Vheur1 (AC)   0.95V
    Vheur2 (DC)   0.95V
    Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, Vlib, and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step; Vfinal: converged voltage
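The loop structure of such a reference flow can be sketched with toy models. Everything numeric here is an assumption: the power-law BTI model, the alpha-power delay model, and their coefficients merely stand in for library characterization and STA, and the stress voltage is simplified to the current Vdd.

```python
def delta_vth(t_years, v_bti, a=0.03, n=1 / 6):
    """Toy BTI model: threshold shift grows as t^n and with stress voltage."""
    return a * v_bti * t_years ** n


def path_delay(vdd, dvth, vth0=0.3, alpha=1.3):
    """Toy alpha-power-law delay model (arbitrary units)."""
    return vdd / (vdd - vth0 - dvth) ** alpha


def reference_signoff(t_final=10, v_init=0.90, v_max=1.05, v_step=0.01):
    """Return the converged Vfinal, or None if the circuit must be modified."""
    target = path_delay(v_init, 0.0)       # delay budget met at t = 0
    vdd = v_init
    for year in range(1, t_final + 1):     # one time step per year
        dvth = delta_vth(year, vdd)        # estimate aging at this step
        while path_delay(vdd, dvth) > target:
            vdd = round(vdd + v_step, 10)  # AVS raises Vdd by one step
            if vdd > v_max:
                return None                # Vfinal exceeds allowed maximum
            dvth = delta_vth(year, vdd)    # re-estimate aging at new Vdd
    return vdd


v_final = reference_signoff()
```

In the real flow each inner iteration would involve a timing analysis run, which is why the reference flow is slow.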

62ISVLSI-2014 invited talk 140710

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

    Case       1       2       3      4        5     6        7
    Vlib (V)   Vinit   Vinit   Vmax   Vinit    N/A   Vheur1   Vheur2
    VBTI (V)   Vinit   Vmax    Vmax   Vfinal   N/A   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop between circuit (Vfinal) and derated libraries (Vlib, VBTI)]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time over ON/OFF/ON/OFF phases [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14]]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd approaching Vfinal over lifetime; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]
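The front-loaded behavior follows naturally from a power-law aging model. As a rough illustration, assume the threshold shift grows as t^n; the exponent n = 1/6 is a value often quoted for reaction-diffusion BTI models and is an assumption here, not a number taken from the talk.

```python
def fraction_aged(t_years, lifetime_years=10.0, n=1 / 6):
    """Fraction of the end-of-life shift reached at time t, assuming dVth ~ t^n."""
    return (t_years / lifetime_years) ** n


first_year = fraction_aged(1.0)   # (1/10)^(1/6), roughly 0.68
half_life = fraction_aged(5.0)    # roughly 0.89 at mid-life
```

With this exponent, about two thirds of the total shift appears in the first year of a 10-year lifetime, which is in line with the front-loaded behavior described above.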

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

[Figure annotation: good corners]

Signoff cases
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

    VBTI \ Vlib   Min Vdd                                               Max Vdd
    Min Vdd       Slowest circuit, less aging                           Not applicable (optimistic)
    Max Vdd       Slowest circuit, worst-case aging (too pessimistic)   Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES; points labeled by signoff case 1-8. Legend: non-EM aware; after fixing (Mishra); after fixing (Black's). Annotations: optimistic about AVS; pessimistic about AVS]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) across implementations 1-9. Annotations: 30% MTTF penalty; 200mV voltage compensation]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$
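The MTTF update rule can be applied step by step along an AVS voltage schedule. A minimal sketch; the starting MTTF and the voltage schedule are hypothetical values chosen for illustration.

```python
def mttf_schedule(mttf_initial, vdd_steps):
    """Apply MTTF(i) = MTTF(i-1) * (Vdd(i-1) / Vdd(i))**2 along a Vdd schedule."""
    mttf = [mttf_initial]
    for v_prev, v_cur in zip(vdd_steps, vdd_steps[1:]):
        mttf.append(mttf[-1] * (v_prev / v_cur) ** 2)
    return mttf


# Hypothetical schedule: Vdd rises from 0.90 V to 1.00 V in two steps.
values = mttf_schedule(10.0, [0.90, 0.95, 1.00])
```

Because the ratios telescope, the final MTTF depends only on the first and last voltage: 10 × (0.90 / 1.00)² = 8.1 in this example.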

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year and MTTF (year) for AVS schedules S1-S5 (DMA). Annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff, from nominal Vdd to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI; operating voltage marked]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1} \; \ldots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \ldots \; \Delta d_{j27}] \, \lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
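The two formulas above can be checked on a small example. The sketch below uses a hand-written Cholesky factorization so that it needs only the standard library; the two-source correlation matrix and the sensitivities are made-up numbers, and the function names are illustrative.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L L^T (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def path_sigma(delta_d, corr):
    """sigma_j = norm of (delta_d * lambda), with corr = lambda lambda^T."""
    lam = cholesky(corr)
    n = len(delta_d)
    # Row vector times matrix: correlated sensitivities delta_d'.
    d_prime = [sum(delta_d[v] * lam[v][k] for v in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Two variation sources with correlation 0.5 and sensitivities 3 and 4 (ps).
sigma_j = path_sigma([3.0, 4.0], [[1.0, 0.5], [0.5, 1.0]])
sigma_independent = path_sigma([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
```

The result equals the square root of Δd Σ Δd^T, here sqrt(9 + 16 + 2 × 0.5 × 12) = sqrt(37); with independent sources it reduces to the root-sum-square, 5.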

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors → ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]

Conventional design: worst-case signoff → no Vdd downscaling
Resilient design: typical-case signoff → Vdd downscaling → reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$$Energy = \frac{Power}{Throughput}$$

where Throughput is estimated from the error rate ER [Kahng10], the clock period T, and the number of recovery cycles r

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints → allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches

Open-loop AVS
• Freq & Vdd LUT: pre-characterize LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]

Closed-loop AVS
• Generic monitor, design-dependent replica: process- and temperature-aware AVS
  • Generic on-chip monitor [Burd00]
  • Design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]

Error tolerance
• Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels, each contrasting ideal scaling with observed non-ideality.
 DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
 POWER: dynamic power (W) vs. year, 2009-2024. Source: [JeongK08]
 DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006-2016. Source: [ITRS]
 SUPPLY VOLTAGE: volts vs. MPU release date. Source: [CPUDB]]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt, on MUL and EXU; minimum achievable energy marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design / overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage vs. frequency plane showing the design cone of mode A and modes A, B, C; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone → reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough → scalable

[Flowchart (global optimization flow): sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models (adaptive search)]

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V). Design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time

[Figure: classes of closed-loop AVS: design-dependent replica, in-situ monitor, generic monitor]

• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low series resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margins; bars split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt, on MUL and EXU; minimum achievable energy marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs) → if energy < min energy, save current solution → margin insertion on K paths based on sensitivity function → replace error-tolerant FFs with normal FFs (SEOpt) → activity-aware clock skew optimization (SkewOpt) → repeat]
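The iteration in this flow can be outlined as a generic loop. This is a structural sketch only: the callables stand in for the real margin insertion, SEOpt, and SkewOpt steps, the stopping rule (stop when energy no longer improves) is an assumption, and the toy design model below is invented purely so the loop can run.

```python
def optimize(design, energy_of, insert_margin, seopt, skewopt, max_iters=20):
    """Iterate margin insertion, SEOpt, and SkewOpt, keeping the best solution."""
    best, best_energy = design, energy_of(design)
    current = design
    for _ in range(max_iters):
        current = skewopt(seopt(insert_margin(current)))
        energy = energy_of(current)
        if energy < best_energy:        # "energy < min energy": save solution
            best, best_energy = current, energy
        else:                           # assumed stopping rule: no improvement
            break
    return best, best_energy


# Toy stand-ins: a design is just its count of error-tolerant FFs.
best, best_energy = optimize(
    design=100,                                    # all FFs error-tolerant
    energy_of=lambda n: (n - 30) ** 2 + 100,       # made-up energy curve
    insert_margin=lambda n: n,                     # placeholder, no change
    seopt=lambda n: n - 10,                        # replace 10 FFs per pass
    skewopt=lambda n: n,                           # placeholder, no change
)
```

In the toy run the loop stops after the pass that first makes energy worse and returns the saved best solution.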

Page 31: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

31ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation

Flow:
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Sign off circuit with derated library
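The averaging step in this flow can be sketched in a few lines of Python. This is only an illustration: the cell names and Vfinal values below are hypothetical placeholders, and `flat_vheur` is a name introduced here, not something from the talk.

```python
from statistics import mean


def flat_vheur(vfinal_by_cell):
    """Average per-cell Vfinal values (in volts) into one flat Vheur."""
    if not vfinal_by_cell:
        raise ValueError("need at least one cell")
    return mean(vfinal_by_cell.values())


# Hypothetical per-cell Vfinal values at zero timing slack (illustrative only).
vfinal_by_cell = {"INV_X1": 0.97, "NAND2_X1": 0.96, "NOR2_X1": 0.98}

vheur = flat_vheur(vfinal_by_cell)

# The same voltage is used for aging (VBTI) and for delay characterization (Vlib).
derated_corner = {"VBTI": vheur, "Vlib": vheur}
print(derated_corner)
```

Because Observation 2 says Vfinal is not sensitive to gate type, a plain unweighted mean is used here; a weighted mean would be an equally plausible reading.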

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• (4 designs) × (DC, AC) × (derating methods)

[Figure: power vs. area for all designs. Legend: proposed method; circuits signed off using other derated libraries. Annotation: "knee" point for balanced area and power tradeoff.]

Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner:
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign

[Figure: Vdd vs. time, alternating between nominal (NOM) and overdrive (OD) modes with durations tnom and tOD.]

[Figure: power of circuits with different overdrive modes; fnom = 800MHz, Vnom = 0.8V. Annotations: different overdrive modes give a 26% power range; with fOD fixed, still a 14% power range.]

37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config:
• Vmin_est < Vmin_chip
• Some chips will fail

Default config:
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config:
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability:
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1 to 9. Annotations: highest invested guardband; least invested guardband; 14% power penalty.]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

44ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2 (axes: M1 C, M2 C). Labels: C-3σ for Layer M1; C-3σ for Layer M2; homogeneous Cw corner; pessimism.]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers.
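A small numerical sketch may help show why the pessimism grows with the layer count. It assumes, purely for illustration, that the stack's capacitance deviation is the sum of per-layer deviations with a common pairwise correlation `gamma`; the function names and the per-layer sigma values are hypothetical.

```python
import math


def homogeneous_3sigma(sigmas):
    """Every layer sits at its own 3-sigma corner simultaneously."""
    return sum(3.0 * s for s in sigmas)


def statistical_3sigma(sigmas, gamma=0.0):
    """3-sigma of the summed deviation with pairwise correlation gamma."""
    n = len(sigmas)
    var = sum(s * s for s in sigmas)
    var += 2.0 * gamma * sum(
        sigmas[i] * sigmas[j] for i in range(n) for j in range(i + 1, n)
    )
    return 3.0 * math.sqrt(var)


# Illustrative unit sigmas; the ratio is the pessimism of the homogeneous corner.
for layers in (1, 2, 4, 9):
    sig = [1.0] * layers
    ratio = homogeneous_3sigma(sig) / statistical_3sigma(sig, gamma=0.0)
    print(layers, round(ratio, 3))
```

With independent, equal-sigma layers the ratio equals the square root of the number of layers, and it collapses to 1 only when the layers are fully correlated (`gamma=1.0`).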

45ISVLSI-2014 invited talk 140710

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
          M1           M2           M3           M4
        ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH
M1 ΔW    1  0  0      γ  0  0      γ  0  0      0  0  0
   ΔT    0  1  0      0  γ  0      0  γ  0      0  0  0
   ΔH    0  0  1      0  0  γ      0  0  γ      0  0  0
M2 ΔW    γ  0  0      1  0  0      γ  0  0      0  0  0
   ΔT    0  γ  0      0  1  0      0  γ  0      0  0  0
   ΔH    0  0  γ      0  0  1      0  0  γ      0  0  0
M3 ΔW    γ  0  0      γ  0  0      1  0  0      0  0  0
   ΔT    0  γ  0      0  γ  0      0  1  0      0  0  0
   ΔH    0  0  γ      0  0  γ      0  0  1      0  0  0
M4 ΔW    0  0  0      0  0  0      0  0  0      1  0  0
   ΔT    0  0  0      0  0  0      0  0  0      0  1  0
   ΔH    0  0  0      0  0  0      0  0  0      0  0  1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point at 0.92 cumulative probability and 60%.]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: per-path Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), plotted against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp), with α indicated for each path.]

Regions:
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

Annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner

• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner
• Paths are more sensitive to R or to C
  • Using RC-worst or C-worst only will underestimate delay variations
  • Need both RC-worst and C-worst corners to cover process variations
  • In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 (axes: M1 C, M2 C). Non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp.]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure axes: Δdj(Yrcw) / dj(Ytyp) × 100 and 3σj / d(Ytyp) × 100.]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%).]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

Flow:
1. Routed design
2. Timing analysis at BEOL corners Ytyp, Ycw, Yrcw; split paths into GTBC and GCBC
3. GTBC: timing analysis using TBC, then ECO using TBC until violation = 0
4. GCBC: timing analysis using CBC, then ECO using CBC until violation = 0
5. Done
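The path-grouping step of this flow could look roughly like the sketch below. It is a hedged illustration: the tuple layout, the threshold value, and the choice to take Δdelay at the dominant (larger) of the two corners are assumptions made here, not details confirmed by the slides.

```python
def delta_delay(d_corner, d_typ):
    """Relative delay change of a path at a corner versus typical."""
    return (d_corner - d_typ) / d_typ


def split_paths(paths, threshold):
    """Split paths into (G_TBC, G_CBC) by Δdelay at the dominant corner.

    paths: iterable of (name, d_typ, d_cw, d_rcw) with delays in ns.
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        # Assumption: use the larger of the C-worst and RC-worst Δdelay.
        dd = max(delta_delay(d_cw, d_typ), delta_delay(d_rcw, d_typ))
        (g_tbc if dd > threshold else g_cbc).append(name)
    return g_tbc, g_cbc


# Hypothetical path delays (ns) at Ytyp, Ycw and Yrcw.
paths = [
    ("p1", 1.00, 1.10, 1.05),
    ("p2", 1.00, 1.01, 1.02),
]
g_tbc, g_cbc = split_paths(paths, threshold=0.04)
print("G_TBC:", g_tbc, "G_CBC:", g_cbc)
```

Paths in `g_tbc` would then be analyzed and fixed with the tightened corners, while the rest stay on the conventional corners.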

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity):

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Tightened corners (correlation factor = 0.5):

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

• Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • R-M1 term: delay sensitivity to unit change in M1 resistance
  • C-M1 term: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: α for LEON3MP, NETCARD and SUPERBLUE12; the region α > 0.5 is marked in each panel.]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figures: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD, each under CBC, TBC-1 and TBC-2.]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic

[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal.]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages:
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs

and library

Vstep: AVS voltage step
Vfinal: converged voltage
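The loop structure of such a reference flow can be sketched as follows. Everything numeric here is a toy assumption: `delay_at` and `aging_at` are hypothetical stand-ins for STA on a derated library and for a BTI model, chosen only so that the loop runs; they are not the models used in the talk.

```python
def reference_signoff(delay_at, aging_at, t_final, v_init, v_max, v_step, target):
    """Walk through the lifetime, raising Vdd by v_step whenever timing fails.

    Returns the converged Vfinal, or None if Vdd would exceed v_max
    (the circuit must then be modified and the flow restarted).
    """
    vdd = v_init
    dvth = 0.0
    for t in range(1, t_final + 1):
        dvth += aging_at(t, vdd)              # estimate aging in this time step
        while delay_at(vdd, dvth) > target:   # update circuit timing and Vdd
            vdd += v_step
            if vdd > v_max:
                return None
    return vdd


# Toy models (assumptions): delay falls with gate overdrive, aging is front-loaded.
def toy_delay(vdd, dvth):
    return 1.0 / (vdd - 0.3 - dvth)


def toy_aging(t, vdd):
    return 0.005 * vdd / t


v_final = reference_signoff(toy_delay, toy_aging, t_final=10, v_init=0.90,
                            v_max=1.05, v_step=0.01, target=1.7)
v_fail = reference_signoff(toy_delay, toy_aging, t_final=10, v_init=0.90,
                           v_max=0.90, v_step=0.01, target=1.7)
print(v_final, v_fail)
```

In a real flow each `delay_at` call would be an STA run, which is why the slide describes this reference approach as very slow.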

62ISVLSI-2014 invited talk 140710

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5     6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A   Vheur1  Vheur2

63ISVLSI-2014 invited talk 140710

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop linking Circuit, Vfinal, Vlib/VBTI and Derated Libraries.]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time with alternating ON/OFF periods [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14].]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure annotation: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years); Vfinal marked.]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner:
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

[Figure annotation: good corners]

Cases:
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

                 Vlib = Min Vdd                       Vlib = Max Vdd
VBTI = Min Vdd   Slowest circuit, less aging          Not applicable (optimistic)
VBTI = Max Vdd   Slowest circuit, worst-case aging    Faster circuit, worst-case aging
                 (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES; points are labeled by signoff case number (1 to 8), with regions marked "Optimistic about AVS" and "Pessimistic about AVS". Extracted legend text: Non-EM Aware; After Fixing (Mishra); After Fixing (Blacks).]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9. Annotations: 30% MTTF penalty; 200mV voltage compensation.]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left(\frac{V_{DD}(i-1)}{V_{DD}(i)}\right)^{2}$$
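The stepwise MTTF update above can be traced with a short Python sketch. The initial MTTF and the Vdd schedule are hypothetical numbers picked to show a 200mV increase; `update_mttf` and `mttf_after_schedule` are names introduced here for illustration.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf0, vdd_steps):
    """Apply the update across a sequence of Vdd values."""
    mttf = mttf0
    for v_prev, v_new in zip(vdd_steps, vdd_steps[1:]):
        mttf = update_mttf(mttf, v_prev, v_new)
    return mttf


# Hypothetical AVS schedule: 0.90V raised to 1.10V in 50mV steps.
schedule = [0.90, 0.95, 1.00, 1.05, 1.10]
mttf_end = mttf_after_schedule(10.0, schedule)
print(round(mttf_end, 3), "years; penalty", round(1 - mttf_end / 10.0, 3))
```

Because the per-step ratios telescope, only the first and last voltages matter under this model: the result equals the initial MTTF times (0.90/1.10)², a penalty of roughly one third for this illustrative schedule.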

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year for schedules S1 to S5 (DMA), and MTTF (year) for S1 to S5. Annotation as extracted: "12 years MTTF penalty".]

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff between nominal Vdd (operating voltage) and signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI.]

Problem: margins = pessimism → overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (Note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1} \;\ldots\; \Delta d'_{j27}] = [\Delta d_{j1} \;\ldots\; \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay:

$$\sigma_j = \left((\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2\right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
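As a rough check of the two formulas above, the following pure-Python sketch factors a small correlation matrix with a textbook Cholesky routine and computes σj from a sensitivity row vector. The 2 x 2 matrix and the sensitivities are toy values standing in for the 27 sources on the slide; the function names are introduced here.

```python
import math


def cholesky(sigma):
    """Lower-triangular lam with sigma == lam * lam^T (sigma must be SPD)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][j] = (sigma[i][j] - s) / lam[j][j]
    return lam


def path_sigma(sens, sigma):
    """sigma_j = || sens * lam ||_2 for a row vector of delay sensitivities."""
    lam = cholesky(sigma)
    n = len(sens)
    corr = [sum(sens[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(c * c for c in corr))


# Toy example: two variation sources with correlation gamma = 0.5.
sens = [3.0, 4.0]
independent = path_sigma(sens, [[1.0, 0.0], [0.0, 1.0]])
correlated = path_sigma(sens, [[1.0, 0.5], [0.5, 1.0]])
print(independent, correlated)
```

The result agrees with the direct quadratic form: the variance is sens·Σ·sensᵀ, which is 25 for the independent case and 37 when γ = 0.5.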

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction.]

• Conventional design: worst-case signoff; no Vdd downscaling
• Resilient design: typical-case signoff; Vdd downscaling → reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy [Kahng10]:

Energy = Power / Throughput

Throughput: expression in the error rate (ER), the clock period (T) and the number of recovery cycles (r); the fraction layout was lost in extraction, and the surviving symbols read "1 − ER T + 1 − ER r × T"

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to optimize?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

[Figure: AVS approaches and AVS classes, arranged along a power axis.]

Open-Loop AVS
• Freq & Vdd LUT: pre-characterize LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]

Closed-Loop AVS
• Generic monitor: process and temperature-aware AVS, generic on-chip monitor [Burd00]
• Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]

Error Tolerance
• Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure, four panels, each contrasting ideal scaling with observed non-ideality:
• DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
• POWER: dynamic power (W) vs. year (2009 to 2024). Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date. Source: [CPUDB]]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of AVS strategy

[Figures: energy (mJ) vs. supply voltage (V) for MUL and EXU, comparing brute-force, pure-margin and CombOpt; minimum achievable energy is marked.]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode outside of the design cone of the other leads to failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage vs. frequency plane with modes A, B and C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign.]

• Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
• Guideline: search for signoff modes within the design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Adaptive search: Sample (SP&R) → Refine power models

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V). Design: AES, f = 700MHz.]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-Loop AVS classes: generic monitor, design-dependent replica, in-situ monitor
• Generic monitor: does not capture design-specific performance variation
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages with 1-bit control pins selecting high or low resistance.]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figures: energy (mJ) of PM, BF and CO for MUL and EXU at small, medium and large margins; bars are split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation.]

Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in AVS context

[Figures: energy (mJ) vs. supply voltage (V) for MUL and EXU, comparing brute-force, pure-margin and CombOpt; minimum achievable energy is marked.]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Flow elements:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
• If energy < min energy, save current solution

Page 32: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

32ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

bull Observation 2 Vfinal is not sensitive to gate types

bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack

bull Assume timing slack = 0

10mV

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

bull Heuristic obtain Vheur by averaging Vfinal of different cells

bull Heuristic use a ldquoflatrdquo Vheur to estimate BTI degradation

Obtain Vheur (average of standard cells)

Obtain derated library with VBTI = Vlib = Vheur

Signoff circuit with derated library

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

bull 4 designs x DC AC x derating methods)

Proposed method

Circuit signed off usingother derated libraries

ldquoKneerdquo point for balanced area and power tradeoff

Optimistic signoff corner bull AVS increases supply voltage

aggressively to compensate aging

bull Consume more powerbull May fail to meet timing if

desired supply voltage gt Vmax

Pessimistic signoff corner bull Ovestimate aging andor

underestimate circuit performance

bull Large area overhead

35ISVLSI-2014 invited talk 140710

bull Signoff mode = (voltage frequency) pair

bull Multi-mode operation requires multi-mode signoff

bull Example nominal mode and overdrive mode

bull Selection of signoff modes affects area power

bull ASP-DAC 2013 Optimization of signoff modes

Improve performance power or area

Reduce overdesign

NOM

ODNOM

OD

time

Vdd

tnom tOD tnom tOD

Also Multi-Mode Signoff Choices Matter

12

Fix fOD still 14 power range

Power of circuits w different overdrive modes

Different overdrive modes 26 power range

fnom = 800MHz Vnom = 08V

36ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config Vmin_est lt Vmin_chip Some chips will fail

Optimized configbull Increase high

resistance passgatesbull Vmin_est asymp Vmin_chip

Default config

bull Low resistance passgates

bull Guardband for worst-case

bull Vmin_est gt Vmin_chip

bull 13mV margin

37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config Vmin_est lt Vmin_chip Some chips will fail

Default config

bull Low resistance passgates

bull Guardband for worst-case

bull Vmin_est gt Vmin_chip

bull 13mV margin

Optimized configbull Increase high

resistance passgatesbull Vmin_est asymp Vmin_chip

Benefits of tunability bull Compensate for difference

between model vs siliconbull Recover margin when variation is

reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
  • Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1-9; annotations: highest invested guardband, least invested guardband, 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• Tradeoff between the guardband invested in signoff and the power penalty

43ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2. Axes: M1 C, M2 C. Per-layer corners: C-3σ (layer M1), C-3σ (layer M2). Annotations: 3σ, homogeneous Cw corner, pessimism]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2. Axes: M1 C, M2 C. Per-layer corners: C-3σ (layer M1), C-3σ (layer M2). Annotations: homogeneous Cw corner, pessimism]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
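A rough worked example of why the pessimism grows with layer count. The simplification here is an assumption for illustration, not taken from the talk: a path's capacitance is the sum of contributions from n layers, each varying independently with the same standard deviation σ_C.

```latex
\begin{align*}
\Delta C_{\text{homogeneous}} &= \sum_{k=1}^{n} 3\sigma_C = 3n\,\sigma_C \\
\Delta C_{\text{statistical}} &= 3\sqrt{\sum_{k=1}^{n} \sigma_C^{2}} = 3\sqrt{n}\,\sigma_C \\
\frac{\Delta C_{\text{homogeneous}}}{\Delta C_{\text{statistical}}} &= \sqrt{n}
\end{align*}
```

Under these assumptions, a two-layer stack (M1 and M2) already overstates the 3σ excursion by about 41%, and a nine-layer stack by a factor of 3. Partial correlation between layers would narrow this gap without removing it.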

45ISVLSI-2014 invited talk 140710

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
            M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
  M1 ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
     ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
     ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
  M2 ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
     ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
     ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
  M3 ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
     ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
     ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
  M4 ΔW    0  0  0     0  0  0     0  0  0     1  0  0
     ΔT    0  0  0     0  0  0     0  0  0     0  1  0
     ΔH    0  0  0     0  0  0     0  0  0     0  0  1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
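The structure of Σ can be sketched in a few lines of Python. The module assignment below (M1 to M3 in one process module, M4 in another) is read off the zero blocks of the matrix and is an assumption; the names `LAYERS`, `TYPES`, `MODULE` and `build_sigma` are illustrative only.

```python
# Sketch: rebuild the 12 x 12 correlation matrix from the two rules on the slide.
LAYERS = ["M1", "M2", "M3", "M4"]
TYPES = ["dW", "dT", "dH"]  # stand-ins for the slide's ΔW, ΔT, ΔH
# Assumed grouping: M1-M3 share a process module, M4 is in a different one.
MODULE = {"M1": 0, "M2": 0, "M3": 0, "M4": 1}

def build_sigma(gamma=0.5):
    """Return (sources, sigma) where sigma[i][j] correlates sources i and j."""
    sources = [(layer, vtype) for layer in LAYERS for vtype in TYPES]
    sigma = []
    for layer_a, type_a in sources:
        row = []
        for layer_b, type_b in sources:
            if (layer_a, type_a) == (layer_b, type_b):
                row.append(1.0)    # a source is fully correlated with itself
            elif type_a == type_b and MODULE[layer_a] == MODULE[layer_b]:
                row.append(gamma)  # same variation type, same process module
            else:
                row.append(0.0)    # otherwise independent
        sigma.append(row)
    return sources, sigma

sources, sigma = build_sigma()
```

Setting `gamma=0.0` gives the identity matrix, which corresponds to the fully independent statistical model.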

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point at 60%, 0.92]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two plots of α against relative delay change]
• Δdelay at C-worst = [d(Ycw) - d(Ytyp)] / d(Ytyp)
• Δdelay at RC-worst = [d(Yrcw) - d(Ytyp)] / d(Ytyp)
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two plots of α against relative delay change]
• Δdelay at C-worst = [d(Ycw) - d(Ytyp)] / d(Ytyp)
• Δdelay at RC-worst = [d(Yrcw) - d(Ytyp)] / d(Ytyp)
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2; axes: M1 C, M2 C; non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: axes Δdj(Yrcw) / dj(Ytyp) × 100 and 3σj / d(Ytyp) × 100]

• Challenge: how to avoid underestimating delay variation to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst, and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: Routed design → timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) → paths split into GTBC and GCBC. GTBC branch: timing analysis using TBC, ECO using TBC until violation = 0. GCBC branch: timing analysis using CBC, ECO using CBC until violation = 0. Then done.]
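The grouping step of this flow can be sketched as below. This is a minimal illustration under stated assumptions: the path records, the threshold values, and the rule that exceeding either corner's threshold places a path in Gtbc are all hypothetical, and `split_paths` is not a function from any signoff tool.

```python
# Hypothetical path records: relative delay change (%) at the C-worst and
# RC-worst corners, as would come from timing analysis at Ytyp, Ycw, Yrcw.
paths = {
    "p1": {"cw": 6.0, "rcw": 2.0},
    "p2": {"cw": 1.0, "rcw": 1.5},
    "p3": {"cw": 2.0, "rcw": 9.0},
}

def split_paths(paths, a_cw, a_rcw):
    """Return (g_tbc, g_cbc).

    Assumed rule: a path joins g_tbc (signed off with tightened corners) when
    its delta-delay exceeds the threshold at either conventional corner; all
    other paths stay in g_cbc (signed off with conventional corners).
    """
    g_tbc, g_cbc = [], []
    for name, delta in sorted(paths.items()):
        if delta["cw"] > a_cw or delta["rcw"] > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc

# Illustrative thresholds only.
g_tbc, g_cbc = split_paths(paths, a_cw=4.0, a_rcw=7.0)
```

Each group would then go through its own timing-analysis and ECO loop until no violations remain.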

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor γ = 0.5

            α     Acw (%)   Arcw (%)
TBC-0.5     0.5   4.3       7.3
TBC-0.6     0.6   3.3       5.0
TBC-0.7     0.7   3.0       3.4

• Implement another NETCARD (clock period = 2.3 ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays → the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out → Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor plots for LEON3MP, NETCARD, and SUPERBLUE12; region α > 0.5 marked in each]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120 ps and 61 ns
• Timing violations are reduced by 31% to 100%
• Correlation factor γ = 0 (variation sources are independent)

[Figure: three bar charts comparing CBC, TBC-1, and TBC-2 on LEON, SUPERBLUE, and NETCARD: WNS (ns), TNS (ns), and number of timing violations]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic

[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10 mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05 V
Vinit         0.90 V
Vheur1 (DC)   0.97 V
Vheur1 (AC)   0.95 V
Vheur2 (DC)   0.95 V
Vheur2 (AC)   0.93 V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB, and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step
Vfinal: converged voltage
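The loop structure of this reference flow can be sketched as follows. Everything numeric here is a toy assumption: `toy_delay` is a made-up timing model standing in for an STA run with a re-characterized library, and the stress accumulation is not a BTI model from the talk. Only the control flow (age, re-time, raise Vdd by Vstep, give up above the maximum voltage) follows the bullets above.

```python
def reference_signoff(v_init, v_max, v_step, t_final, dt, delay_at, clock_period):
    """Step through lifetime, raising Vdd by v_step whenever timing fails.

    delay_at(vdd, stress) is a caller-supplied (hypothetical) timing model.
    The same vdd drives both aging and timing, mirroring the idea of keeping
    VBTI, Vlib and Vdd consistent. Returns the final Vdd, or None when the
    required Vdd exceeds v_max (the circuit must be modified and re-run).
    """
    vdd, stress, t = v_init, 0.0, 0.0
    while t < t_final:
        stress += vdd * dt  # toy aging estimate for this time step
        while delay_at(vdd, stress) > clock_period:
            vdd = round(vdd + v_step, 6)  # one AVS voltage step
            if vdd > v_max:
                return None
        t += dt
    return vdd

def toy_delay(vdd, stress):
    """Toy model: delay grows slowly with stress and shrinks with Vdd."""
    return (1.0 + 0.02 * stress ** 0.2) / vdd

v_final = reference_signoff(0.90, 1.05, 0.01, t_final=10.0, dt=1.0,
                            delay_at=toy_delay, clock_period=1.12)
```

In a real flow each call to `delay_at` would be a full timing run, which is why the slide describes this approach as very slow.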

62ISVLSI-2014 invited talk 140710

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2

63ISVLSI-2014 invited talk 140710

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop connecting Circuit, Vfinal, Vlib / VBTI, and Derated Libraries]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time alternating ON / OFF / ON / OFF [VattikondaWC06]; device aging (|ΔVth|) accumulates over time]

[TCAS’14]

65ISVLSI-2014 invited talk 140710

Observation 1

• BTI is a “front-loaded” phenomenon [Chan11]
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd approaching Vfinal over lifetime; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

[Figure annotation: good corners]

Signoff setups
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                               Vlib for circuit performance estimation
VBTI for aging estimation      Min Vdd                               Max Vdd
Min Vdd                        Slowest circuit, less aging           Not applicable (optimistic)
Max Vdd                        Slowest circuit, worst-case aging     Faster circuit, worst-case aging
                               (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for AES at signoff corners 1-8; series: Non-EM aware, After fixing (Mishra), After fixing (Black’s); corners range from optimistic about AVS to pessimistic about AVS]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) for implementations 1-9; annotations: 30% MTTF penalty, 200 mV voltage compensation]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

  $\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
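A small sketch of the MTTF update rule applied along a Vdd schedule. The schedule and the 10-year starting MTTF are hypothetical values chosen for illustration; `update_mttf` is an illustrative name.

```python
def update_mttf(mttf_initial, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (Vdd(i-1) / Vdd(i))**2 step by step.

    Returns the list of MTTF values, one per entry in vdd_schedule.
    """
    mttf = mttf_initial
    history = [mttf]
    for v_prev, v_next in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_next) ** 2
        history.append(mttf)
    return history

# Hypothetical AVS schedule: Vdd raised from 1.00 V to 1.20 V in 50 mV steps.
history = update_mttf(10.0, [1.00, 1.05, 1.10, 1.15, 1.20])
```

Because the ratios telescope, the final MTTF depends only on the first and last voltages. Assuming a 1.00 V starting point, 200 mV of compensation scales MTTF by (1.00/1.20)², a reduction of roughly 31%, which is in line with the approximately 30% penalty annotated on the slide.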

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year (0-12) for AVS schedules S1-S5, and MTTF (year) of schedules S1-S5, for design DMA; annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is “Signoff”

• Foundation of contract between design house and foundry
  • “Chip should work”: stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …
• Problem: margins = pessimism → overdesign, schedule delay

[Figure: “margin stack” for voltage signoff on a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd; operating voltage marked]

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

  $\Delta d_{jv} = \left[ d_j(Y_v) - d_j(Y_{typ}) \right] / 3$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda \lambda^{T}$ (Note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

  $[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:

  $\sigma_j = \left( (\Delta d'_{j1})^{2} + \dots + (\Delta d'_{j27})^{2} \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
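The two formulas above can be checked numerically with a short pure-Python sketch. The two-source example at the end is hypothetical; `cholesky` and `path_sigma` are illustrative helper names, and a production flow would more likely use a linear-algebra library.

```python
import math

def cholesky(sigma):
    """Return lower-triangular L with sigma = L * L^T (sigma positive definite)."""
    n = len(sigma)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                low[i][j] = (sigma[i][j] - s) / low[j][j]
    return low

def path_sigma(delta_d, sigma):
    """sigma_j = 2-norm of (delta_d * L), with delta_d the row of sensitivities."""
    low = cholesky(sigma)
    n = len(delta_d)
    # Correlated sensitivities: delta_d' = delta_d * L
    rotated = [sum(delta_d[i] * low[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in rotated))

# Two hypothetical variation sources with correlation 0.5 and unit sensitivities.
sigma_j = path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
```

For this example σj² equals Δd Σ Δdᵀ = 1 + 1 + 2(0.5) = 3, compared with 2 when the sources are independent, which shows how positive correlation widens the path-delay distribution.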

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors → ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
  • Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]
• Conventional design: worst-case signoff → no Vdd downscaling
• Resilient design: typical-case signoff → Vdd downscaling → reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy

  $\mathrm{Energy} = \mathrm{Power} / \mathrm{Throughput}$

  where throughput is expressed in terms of the error rate ER [Kahng10], the clock period T, and the number of recovery cycles r

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

[Figure: AVS classes and AVS approaches, plotted against power]

Open-loop AVS
• Freq & Vdd LUT: pre-characterize LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]

Closed-loop AVS
• Generic monitor: process- and temperature-aware AVS, generic on-chip monitor [Burd00]
• Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]

Error tolerance
• Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four scaling trends, each contrasting ideal scaling with observed non-ideality]
• DENSITY: transistor count [M] vs. MPU release date (1998-2011). Source: [CPUDB]
• POWER: dynamic power (W), 2009-2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006-2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date (1995-2016). Source: [CPUDB]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs → resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt, on MUL and EXU; minimum achievable energy marked]

80ISVLSI-2014 invited talk 140710

Our Concept Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design / overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: modes A, B, and C in the voltage-frequency plane with the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone → reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough → scalable

[Flowchart: global optimization flow. Sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models; adaptive search]

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); series: 1st, 2nd, real. Design: AES, f = 700 MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

[Figure: closed-loop AVS classes: generic monitor, design-dependent replica, in-situ monitor]

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages with 1-bit control pins selecting high or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: stacked energy (mJ) bars for PM, BF, and CO at small, medium, and large margin, for MUL and EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5/10/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs → resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt, on MUL and EXU; minimum achievable energy marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart boxes: initial placement (all FFs = error-tolerant FFs); SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs); SkewOpt (activity-aware clock skew optimization); check energy < min energy; save current solution]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

33ISVLSI-2014 invited talk 140710

Proposed Library Characterization Flow

• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation

[Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library]

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• 4 designs × (DC, AC) × derating methods

[Figure: power vs. area; proposed method compared with circuits signed off using other derated libraries; “knee” point for balanced area and power tradeoff]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

bull Signoff mode = (voltage frequency) pair

bull Multi-mode operation requires multi-mode signoff

bull Example nominal mode and overdrive mode

bull Selection of signoff modes affects area power

bull ASP-DAC 2013 Optimization of signoff modes

Improve performance power or area

Reduce overdesign

NOM

ODNOM

OD

time

Vdd

tnom tOD tnom tOD

Also Multi-Mode Signoff Choices Matter

12

Fix fOD still 14 power range

Power of circuits w different overdrive modes

Different overdrive modes 26 power range

fnom = 800MHz Vnom = 08V

36ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config Vmin_est lt Vmin_chip Some chips will fail

Optimized configbull Increase high

resistance passgatesbull Vmin_est asymp Vmin_chip

Default config

bull Low resistance passgates

bull Guardband for worst-case

bull Vmin_est gt Vmin_chip

bull 13mV margin

37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config Vmin_est lt Vmin_chip Some chips will fail

Default config

bull Low resistance passgates

bull Guardband for worst-case

bull Vmin_est gt Vmin_chip

bull 13mV margin

Optimized configbull Increase high

resistance passgatesbull Vmin_est asymp Vmin_chip

Benefits of tunability bull Compensate for difference

between model vs siliconbull Recover margin when variation is

reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions

39ISVLSI-2014 invited talk 140710

Conclusionsbull Variability severely challenges IC value

bull In manufacturing process during operation across lifetime

bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge

bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff

bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that

extends the value trajectory of Moorersquos Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

1 2 3 4 5 6 7 8 91200

1300

1400

1500

1600

1700

030

032

034

036

Core Power (mW) PG Power (mW)

Implemetation

Core

Pow

er (m

W)

PG

Pow

er (m

W)

bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff

Highest invested guardband

Least invested guardband

14 power penalty

>

43ISVLSI-2014 invited talk 140710

Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a

homogeneous corner for an interconnect stack

C-3σ

Layer M2

C-3σ

Layer M1

Interconnect stack with M1 and M2

M1 C

M2 C

3σ Pessimism

Example worst-case capacitance corner Homogeneous

Cw corner

44ISVLSI-2014 invited talk 140710

Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a

homogeneous corner for an interconnect stack

Interconnect stack with M1 and M2

M1 C

M2 C

Homogeneous Cw corner

C-3σ

Layer M2

C-3σ

Layer M1

Pessimism

Example worst-case capacitance corner

When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers

45ISVLSI-2014 invited talk 140710

Correlation Matrixbull Let Σ be the correlation matrix for variation sources

M1 M2 M3 M4

ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH

M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0

ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0

ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0

M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0

ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0

ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0

M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0

ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0

ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0

M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0

ΔT 0 0 0 0 0 0 0 0 0 0 1 0

ΔH 0 0 0 0 0 0 0 0 0 0 0 1

= Σ

Correlation for variation sources with the same variation type and in the process module γ 05

Variation sources in different process modules are independent

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

bull 92 of paths have lt 60 of wirelength on any single layer

Max wirelength ratio across all layers ()

Cum

ulati

ve p

roba

bilit

y

092

60

bull Variations in different layers are not fully correlated

bull Averaging uncorrelated variation smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

α α

Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)

bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner

C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner

α lt 10 delay variations are covered by the RC-worst corner

Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst

Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst

Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two scatter plots of α, one against Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp), one against Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
• Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 and their per-layer C distributions; non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: 3σj / dj(Ytyp) × 100 versus Δdj(Yrcw) / dj(Ytyp) × 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC → GTBC: timing analysis using TBC, ECO using TBC until violation = 0; GCBC: timing analysis using CBC, ECO using CBC until violation = 0 → done]
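The grouping step of this flow can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it applies a single threshold to the relative Δdelay at each path's dominant corner, and the names `group_paths`, the tuple layout and the example delays are all hypothetical.

```python
def group_paths(paths, threshold):
    """Split paths into G_tbc (tightened corners) and G_cbc (conventional).

    paths: iterable of (name, d_typ, d_cw, d_rcw) delays in the same unit.
    threshold: relative delay increase above which a path goes to G_tbc
               (assumed to be applied at the dominant corner).
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        # Relative delta-delay at the dominant (larger) corner.
        rel = max(d_cw - d_typ, d_rcw - d_typ) / d_typ
        (g_tbc if rel > threshold else g_cbc).append(name)
    return g_tbc, g_cbc

# Hypothetical delays in ns, for illustration only.
example = [
    ("p1", 1.00, 1.02, 1.10),
    ("p2", 1.00, 1.01, 1.02),
    ("p3", 2.00, 2.30, 2.10),
]
g_tbc, g_cbc = group_paths(example, threshold=0.05)
```

Paths with a large Δdelay land in G_tbc and are closed against the tightened corners; the rest stay on the conventional corners.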

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                         LEON3MP   NETCARD   SUPERBLUE12
    Clock period (ns)    1.8       2.0       3.1
    Gate count           232K      575K      1031K
    Utilization (%)      84        79        82
    Core area (mm2)      0.45      1.04      1.91
    Max transition (ps)  330       330       330

Correlation factor = 0.5

               α      Acw (%)   Arcw (%)
    TBC-0.5    0.5    43        73
    TBC-0.6    0.6    33        50
    TBC-0.7    0.7    30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α

• A path has small Δdelays → the path is equally sensitive to R and C

• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance

• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: α scatter plots for LEON3MP, SUPERBLUE12 and NETCARD, each with the α > 0.5 region marked]

• Similar trends in different designs

• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns

• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: bar charts of WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime

• Aging of a flat Vfinal ≈ aging of an adaptive Vdd

• But slightly pessimistic

[Figure: Vdd versus time, with NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)

• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0

• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types

• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

    Circuit   Frequency (GHz)
    C5315     1.38
    c7552     1.25
    AES       0.89
    MPEG2     1.05

Supply voltages

    Vmax          1.05V
    Vinit         0.90V
    Vheur1 (DC)   0.97V
    Vheur1 (AC)   0.95V
    Vheur2 (DC)   0.95V
    Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime

• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage

• No overhead in timing analysis but very slow: many STA runs and library

Vstep: AVS voltage step
Vfinal: converged voltage
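The loop structure of this reference flow can be sketched as below. It is only an outline under assumptions: `slack_at` is a hypothetical callback standing in for the aging estimation plus STA at one time step, voltages are integer millivolts to keep the arithmetic exact, and `toy_slack` is an invented linear model used purely to exercise the loop.

```python
def reference_signoff(v_init, v_max, v_step, t_final, slack_at):
    """Step through lifetime, raising Vdd by v_step whenever timing fails.

    slack_at(t, vdd, history) -> timing slack (hypothetical callback that
    would wrap aging estimation and STA in a real flow).
    Returns (v_final, history); v_final is None if Vdd would exceed v_max,
    in which case the circuit must be modified and the flow restarted.
    """
    vdd = v_init
    history = []  # (time step, Vdd) pairs seen so far
    for t in range(1, t_final + 1):
        while slack_at(t, vdd, history) < 0:
            vdd += v_step
            if vdd > v_max:
                return None, history
        history.append((t, vdd))
    return vdd, history

def toy_slack(t, vdd_mv, history):
    # Invented model: each time step costs 20 mV of headroom above 900 mV.
    return (vdd_mv - 900) - 20 * t

v_final, schedule = reference_signoff(900, 1050, 10, 5, toy_slack)
```

Every iteration of the inner loop corresponds to one timing analysis, which is why the slide calls the reference flow very slow.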

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

    Case       1       2       3       4        5     6        7
    Vlib (V)   Vinit   Vinit   Vmax    Vinit    NA    Vheur1   Vheur2
    VBTI (V)   Vinit   Vmax    Vmax    Vfinal   NA    Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Diagram: loop among Circuit, Vfinal and Derated Libraries (Vlib, VBTI)]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| versus time over ON / OFF / ON / OFF phases [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time

[TCAS'14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a "front-loaded" phenomenon

• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)

• Most Vdd increment happens in early lifetime

• Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd over lifetime approaching Vfinal; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]
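To make "front-loaded" concrete, one common way to picture it is a power-law time dependence, where accumulated aging scales as t^n. The slide does not state a model, so the power law, the function names and the exponent are assumptions here; the sketch only shows what exponent would be consistent with the 50%-in-the-first-year figure.

```python
import math

def aging_fraction(t, lifetime, n):
    """Fraction of end-of-life aging reached at time t, assuming aging ~ t**n."""
    return (t / lifetime) ** n

def implied_exponent(fraction, t, lifetime):
    """Exponent n for which aging_fraction(t, lifetime, n) equals fraction."""
    return math.log(fraction) / math.log(t / lifetime)

# Exponent consistent with 50% of aging after 1 year of a 10-year lifetime.
n_50 = implied_exponent(0.5, 1.0, 10.0)
```

Under this assumed model, any exponent well below 1 concentrates most of the degradation, and hence most of the AVS voltage increase, in the early part of the lifetime.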

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging

• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis

• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

    VBTI for aging     Vlib for circuit performance estimation
    estimation         Min Vdd                               Max Vdd
    Min Vdd            Slowest circuit, less aging           Not applicable (optimistic)
    Max Vdd            Slowest circuit, worst-case aging     Faster circuit, worst-case aging
                       (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) versus area (μm2) for AES, points labeled by implementation 1 to 8; series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); annotations: "Optimistic about AVS", "Pessimistic about AVS"]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
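The MTTF update rule can be applied step by step along an AVS voltage schedule. A minimal sketch, in which the function names and the example voltages are illustrative and not taken from the talk:

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2

def mttf_after_schedule(mttf0, vdd_steps):
    """Apply the update across consecutive Vdd values of a schedule."""
    mttf = mttf0
    for prev, new in zip(vdd_steps, vdd_steps[1:]):
        mttf = update_mttf(mttf, prev, new)
    return mttf

# Hypothetical schedule: Vdd raised from 0.90V to 1.00V in two steps.
mttf_end = mttf_after_schedule(10.0, [0.90, 0.95, 1.00])
```

Because the per-step ratios telescope, the final value depends only on the first and last voltages: MTTF scales by (Vdd_initial / Vdd_final)^2.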

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD versus year for scenarios S1 to S5 (DMA), and MTTF (year) for S1 to S5; annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff on a voltage axis, between nominal Vdd, operating voltage and signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions

• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays
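The sensitivity extraction amounts to one subtraction and a division by 3 per variation source. A small sketch, assuming each non-typical run skews exactly one source to its 3σ value (which is how the division by 3 is read here); the function name and the delays are made up for illustration.

```python
def sensitivities(d_typ, d_corners):
    """Per-sigma delay sensitivities of one path.

    d_typ: path delay from the STA run at Ytyp.
    d_corners: path delays from the runs at Y_v, one per variation source,
               each assumed to skew a single source by 3 sigma.
    """
    return [(d_v - d_typ) / 3.0 for d_v in d_corners]

# Hypothetical delays in ns for a path with two variation sources.
delta_d = sensitivities(1.0, [1.3, 0.7])
```

With 27 variation sources this needs 27 corner runs plus the typical run, matching the 28 runs of RC extraction and STA on the slide.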

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 × 27)

• $\Sigma = \lambda \lambda^{T}$ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$[\Delta d'_{j1} \; \ldots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \ldots \; \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:

$\sigma_j = \left( (\Delta d'_{j1})^{2} + \ldots + (\Delta d'_{j27})^{2} \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
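The two formulas above translate directly into a few lines of NumPy. This is a sketch with illustrative names (`path_sigma`, `delta_d`, `corr`), using a 2 × 2 correlation matrix instead of the 27 × 27 case to keep the example small.

```python
import numpy as np

def path_sigma(delta_d, corr):
    """Standard deviation of a path delay from per-source sensitivities.

    delta_d: per-sigma delay sensitivities, one per variation source.
    corr: correlation matrix of the variation sources (positive definite).
    """
    lam = np.linalg.cholesky(corr)            # lower-triangular, corr = lam @ lam.T
    delta_d_prime = np.asarray(delta_d, dtype=float) @ lam
    return float(np.sqrt(np.sum(delta_d_prime ** 2)))

corr = np.array([[1.0, 0.5],
                 [0.5, 1.0]])
sigma_corr = path_sigma([1.0, 1.0], corr)         # correlated sources
sigma_indep = path_sigma([1.0, 1.0], np.eye(2))   # independent sources
```

The result equals sqrt(Δd Σ Δdᵀ), so positive correlation between sources that push the delay in the same direction increases σj relative to the independent case.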

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)

• Trade off design robustness vs. design quality, e.g., enable margin reduction

• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) versus supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]

Conventional design: worst-case signoff, no Vdd downscaling

Resilient design: typical-case signoff, Vdd downscaling → reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers

• Objective: implement design to minimize energy
• Estimation of design energy

$Energy = \frac{Power}{Throughput}$

h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879

+1minus119864119877119903times119879

recovery cycles

Clock period

Error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization

• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS approaches and AVS classes

Open-Loop AVS
• Freq & Vdd LUT: pre-characterize LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]

Closed-Loop AVS
• Generic monitor: process- and temperature-aware AVS, generic on-chip monitor [Burd00]
• Design-dependent replica: process- and temperature-aware AVS, design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]

Error Tolerance
• Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

[Figure: AVS approaches arranged along a power axis]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels comparing ideal scaling with non-ideality. DENSITY: transistor count [M] versus MPU release date, source [CPUDB]. POWER: dynamic power (W), 2009 to 2024, source [JeongK08]. DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) versus ideal scaling, 2006 to 2016, source [ITRS]. SUPPLY VOLTAGE: volt versus MPU release date, source [CPUDB].]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) versus supply voltage (V) for brute-force, pure-margin and CombOpt, on MUL and EXU; minimum achievable energy marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage versus frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone → reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable

[Flowchart, global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) → refine power models]

[Figure: power estimation of adaptive search, power (mW) versus signoff voltage (V); curves: 1st, 2nd, real; design: AES, f = 700MHz]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-Loop AVS: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)

• Calibrating monitors at multiple modes/voltages requires long test time

• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)

• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: RO stages, each with a 1-bit control pin selecting high or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints

• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods

• Resilience benefits increase with safety margin

[Figure: stacked bars of energy (mJ) for PM, BF and CO at small, medium and large margin, for MUL and EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) versus supply voltage (V) for brute-force, pure-margin and CombOpt, on MUL and EXU; minimum achievable energy marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs → SkewOpt: activity-aware clock skew optimization → if energy < min energy, save current solution → iterate]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • "Chicken and Egg" Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is "Signoff"
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

34ISVLSI-2014 invited talk 140710

Power vs Area for All Designs

• (4 designs × {DC, AC} × derating methods)

[Figure: power versus area; legend: proposed method, circuits signed off using other derated libraries; "knee" point for balanced area and power tradeoff]

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

35ISVLSI-2014 invited talk 140710

Also: Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair

• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode

• Selection of signoff modes affects area, power

• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign

[Figure: Vdd versus time alternating between NOM and OD modes, with durations tnom and tOD]

[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V; different overdrive modes: 26% power range; fix fOD: still 14% power range]

37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config: Vmin_est < Vmin_chip, some chips will fail

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions

• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime

• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge

• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff

• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1 to 9; annotations: highest invested guardband, least invested guardband, 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

44ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: interconnect stack with M1 and M2; the 3σ C corners of layer M1 and layer M2 are combined into the homogeneous Cw corner, with the resulting pessimism marked]

Example: worst-case capacitance corner

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers

45ISVLSI-2014 invited talk 140710

Correlation Matrixbull Let Σ be the correlation matrix for variation sources

M1 M2 M3 M4

ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH

M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0

ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0

ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0

M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0

ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0

ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0

M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0

ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0

ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0

M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0

ΔT 0 0 0 0 0 0 0 0 0 0 1 0

ΔH 0 0 0 0 0 0 0 0 0 0 0 1

= Σ

Correlation for variation sources with the same variation type and in the process module γ 05

Variation sources in different process modules are independent

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

bull 92 of paths have lt 60 of wirelength on any single layer

Max wirelength ratio across all layers ()

Cum

ulati

ve p

roba

bilit

y

092

60

bull Variations in different layers are not fully correlated

bull Averaging uncorrelated variation smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

α α

Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)

bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner

C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner

α lt 10 delay variations are covered by the RC-worst corner

Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst

Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst

Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)

48ISVLSI-2014 invited talk 140710

Delay Variation

α α

Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)

bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner

C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner

α lt 10 delay variations are covered by the RC-worst corner

Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst

Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst

Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)

bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

bull Each layer can have different skewed variationsInterconnect stack with M1 and M2

M1 C

M2 C

Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp

bull Less pessimism with non-homogeneous cornersbull Challenge

bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in

itf with α = 05

Δdj(Yrcw)dj(Ytyp) x 100

3σjd(Ytyp) x 100

Challenge how to avoid underestimating delay variation to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

bull Critical paths are structurally similar

bull Wires on critical paths are routed on many layers

bull Structure is an outcome of the design flow

Testcasebull 45nm foundry library (wire

resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K

standard cell instancesbull 9 metal layersbull Extract critical paths from

different PVT and BEOL corners

Wirelength ratio ()

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

bull Extract RC at RC-worst C-worst and the typical corners

bull Calculate Δdelay of critical paths

bull Put path j in the group Gtbc if Δdelay is larger than a threshold

bull Fix only the paths in Gtbc using tightened BEOL corners

bull Since tightened corners have smaller delay variations timing closure is easier

Routed design

Timing analysis at BEOL corners Ytyp Ycw Yrcw

GTBC GCBC

ECOusing CBC

Timing analysis

using TBC

violation = 0

Timing analysis

using CBC

violation = 0

ECOusing TBC

done

53ISVLSI-2014 invited talk 140710

Experiment Setup

LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31

Gate count 232K 575K 1031KUtilization () 84 79 82

Core area (mm2) 045 104 191Max transition (ps) 330 330 330

Testcases for validation (45nm library with 8X wire resistivity)

αCorrelation factor = 05

Acw () Arcw ()

TBC-05 05 43 73

TBC-06 06 33 50

TBC-07 07 30 34

Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw

Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05

54ISVLSI-2014 invited talk 140710

Further Analysis

bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α

bull A path has small Δdelays the path is equally sensitive to R and C

bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1

bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj

Nominal delay

Delay sensitivity to unit change in M1 resistance

Delay sensitivity to unit change in M1 capacitance

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

LEON3MP

SUPERBLUE12NETCARD

α gt 05α gt 05

α gt 05

bull Similar trends in different designs

bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns, respectively
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: three bar charts comparing CBC, TBC-1 and TBC-2 on LEON, SUPERBLUE and NETCARD: WNS (ns), TNS (ns), and number of timing violations]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI curves]

VBTI = Vlib ≈ Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

10mV

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages:

Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep VBTI, VLIB and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step. Vfinal: converged voltage.
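The loop described above can be sketched as follows. This is a minimal illustration, not the flow used in the talk: the aging estimate and the timing check are folded into a caller-supplied callback `meets_timing(vdd, t)`, which is a hypothetical stand-in for library characterization plus STA, and all parameter names are assumptions.

```python
def reference_signoff(v_init, v_max, v_step, t_final, t_step, meets_timing):
    """Walk through the lifetime, raising Vdd in steps of v_step as needed.

    meets_timing(vdd, t) is a hypothetical callback that returns True when
    the aged circuit at time t meets timing at supply voltage vdd.
    Returns a list of (time, Vdd) pairs, or None when the required voltage
    exceeds v_max (the circuit must then be modified and the flow restarted).
    """
    steps_up = 0                      # number of AVS voltage steps applied so far
    schedule = []
    n_time_steps = int(round(t_final / t_step))
    for i in range(n_time_steps + 1):
        t = i * t_step
        vdd = v_init + steps_up * v_step
        while not meets_timing(vdd, t):
            steps_up += 1
            vdd = v_init + steps_up * v_step
            if vdd > v_max + 1e-12:
                return None           # Vfinal would exceed the allowed maximum
        schedule.append((t, vdd))
    return schedule


if __name__ == "__main__":
    # Toy timing model (an assumption): required Vdd grows as t**0.3.
    def toy_timing(vdd, t):
        return vdd >= 0.90 + 0.05 * (t / 10.0) ** 0.3

    result = reference_signoff(0.90, 1.05, 0.01, 10.0, 1.0, toy_timing)
    print(result[-1])                 # last entry holds the converged voltage
```

The last entry of the returned schedule plays the role of Vfinal. Every call to the callback stands for a full aging estimate and timing run, which is why the reference flow is slow.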

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation (timing constraints, critical paths, etc.)
  • Circuit is affected by library characterization

[Diagram: loop connecting Circuit, Vfinal, Vlib / VBTI, and Derated Libraries]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time, alternating ON / OFF / ON / OFF] [VattikondaWC06]

Device aging (|ΔVth|) accumulates over time

[TCAS'14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the first year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal shrinks rapidly

≈70% of the Vdd increment occurs in the first year (remaining 30% over 9 years)

[Figure: Vdd over lifetime, approaching Vfinal]
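One way to read the "front-loaded" observation is through a power-law aging model. The model form below is an assumption made for illustration; the slide does not state which aging model was used. If aging grows as a power of time, the fraction of end-of-life aging reached after one year of a ten-year lifetime fixes the exponent:

```latex
\Delta V_{th}(t) \propto t^{\,n}
\quad\Longrightarrow\quad
\frac{\Delta V_{th}(1\,\text{yr})}{\Delta V_{th}(10\,\text{yr})}
  = \left(\frac{1}{10}\right)^{n}

\left(\frac{1}{10}\right)^{n} = 0.5
\quad\Longrightarrow\quad
n = \frac{\ln 0.5}{\ln 0.1} = \log_{10} 2 \approx 0.30

% With the same exponent, the fraction reached at mid-life (5 of 10 years):
\left(\frac{5}{10}\right)^{0.30} \approx 0.81
```

Under this assumed model, about half of the aging is reached in year one and roughly 80% by mid-life, which is consistent with most of the Vdd increment being spent early.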

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignores AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

                  Vlib = Min Vdd                                       Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging                          Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging: too pessimistic   Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES; points labeled 1 to 8 for three series (legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)); annotations: "Optimistic about AVS" and "Pessimistic about AVS"]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

30% MTTF penalty

200mV voltage compensation
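The MTTF update rule above can be applied step by step along a Vdd schedule. The sketch below is illustrative only: the function names and the example voltages are assumptions, not values from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_over_schedule(mttf_initial, vdd_schedule):
    """Apply the update along a list of Vdd values; return the MTTF trace."""
    trace = [mttf_initial]
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        trace.append(update_mttf(trace[-1], vdd_prev, vdd_new))
    return trace


if __name__ == "__main__":
    # Hypothetical schedule: Vdd raised from 1.0 V to 1.2 V in two AVS steps.
    trace = mttf_over_schedule(10.0, [1.0, 1.1, 1.2])
    penalty = 1.0 - trace[-1] / trace[0]
    print(f"final MTTF = {trace[-1]:.2f}, penalty = {penalty:.1%}")
```

Because the ratios telescope, the final MTTF depends only on the first and last voltages. For a hypothetical rise from 1.0 V to 1.2 V the ratio is (1.0/1.2)² ≈ 0.69.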

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure (DMA): VDD vs. year for schedules S1 to S5, and MTTF (year) for S1 to S5]

12 years MTTF penalty

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Diagram: "margin stack" for voltage signoff along a voltage axis; labels: Nominal Vdd, Static IR drop, Power grid IR gradient, Dynamic IR, HCI, NBTI, Signoff Vdd, Operating voltage]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv: Δdjv
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1} \;\dots\; \Delta d'_{j27}] = [\Delta d_{j1} \;\dots\; \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^{2} + \dots + (\Delta d'_{j27})^{2} \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
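The two statistical timing steps can be put together in a short sketch: per-source sensitivities from corner delays, then the path sigma through a Cholesky factor of the correlation matrix. The function names are hypothetical, and the Cholesky routine is written out in plain Python so the example has no dependencies; a numerical library would normally be used instead.

```python
import math


def sensitivity(delay_at_corner, delay_typ):
    """Δd_jv = [d_j(Y_v) - d_j(Y_typ)] / 3, i.e. delay change per sigma."""
    return (delay_at_corner - delay_typ) / 3.0


def cholesky_lower(matrix):
    """Return lower-triangular L with L * L^T = matrix (positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def path_sigma(sensitivities, correlation):
    """sigma_j = || [Δd_j1 ... Δd_jn] * lambda ||, with correlation = lambda * lambda^T."""
    lam = cholesky_lower(correlation)
    n = len(sensitivities)
    # Row vector times lambda: Δd'_k = sum over v of Δd_v * lambda[v][k]
    d_prime = [sum(sensitivities[v] * lam[v][k] for v in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


if __name__ == "__main__":
    # Two hypothetical variation sources with correlation 0.5.
    sens = [sensitivity(103.0, 100.0), sensitivity(106.0, 100.0)]
    print(path_sigma(sens, [[1.0, 0.5], [0.5, 1.0]]))
```

The result equals the square root of Δdᵀ Σ Δd, so with an identity correlation matrix it reduces to the usual root-sum-square of the sensitivities.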

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for a conventional design and a resilient design; 15% reduction]

Conventional design: worst-case signoff, no Vdd downscaling

Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$$Energy = \frac{Power}{Throughput}$$

h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879

+1minus119864119877119903times119879

Symbols: r = recovery cycles, T = clock period, ER = error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone with tighter constraints, which allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches:

• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor and design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system; Vdd scaling until error occurs [Das06, Tschanz10]

[Figure axis: power]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels, each comparing ideal scaling with observed non-ideality]
• DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date. Source: [CPUDB]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of an AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, one panel each for MUL and EXU; minimum achievable energy is marked]

80ISVLSI-2014 invited talk 140710

Our Concept Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage vs. frequency plane showing the design cone of mode A with modes A, B and C; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R), refine power models

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); series: 1st, 2nd, real. Design: AES, f = 700MHz]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
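The sample-and-refine idea can be illustrated with a one-dimensional toy version. Everything specific here is an assumption: the slides do not give the form of the power models, so this sketch swaps in a parabola fitted through three samples, and `evaluate_power` is a hypothetical stand-in for an expensive implementation run (SP&R) at a given signoff voltage.

```python
def parabola_vertex(p1, p2, p3):
    """Abscissa of the vertex of the parabola through three (x, y) points."""
    (a, fa), (b, fb), (c, fc) = p1, p2, p3
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    if den == 0:
        raise ValueError("samples are collinear; no unique vertex")
    return b - 0.5 * num / den


def adaptive_search(evaluate_power, initial_voltages, rounds=2, tol=1e-3):
    """Sample, fit a simple model, sample again at the model's optimum.

    Returns (best_voltage, samples). The vertex is a minimum only when the
    sampled power is convex in voltage, which this sketch assumes.
    """
    samples = {v: evaluate_power(v) for v in initial_voltages}
    for _ in range(rounds):
        best_v = min(samples, key=samples.get)
        # Refine the model using the three samples closest to the current best.
        nearest = sorted(samples, key=lambda v: abs(v - best_v))[:3]
        points = sorted((v, samples[v]) for v in nearest)
        v_est = parabola_vertex(*points)
        if any(abs(v_est - v) < tol for v in samples):
            break                     # model optimum already sampled
        samples[v_est] = evaluate_power(v_est)
    return min(samples, key=samples.get), samples


if __name__ == "__main__":
    # Toy power curve (an assumption), minimal at 1.05 V.
    best, seen = adaptive_search(lambda v: 15.0 + 40.0 * (v - 1.05) ** 2,
                                 [0.9, 1.0, 1.2])
    print(best, len(seen))
```

The point of the sketch is the cost structure: only a handful of expensive evaluations are made, and each new sample is placed where the current model predicts the optimum.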

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Circuit diagram: control pins, 1 bit per stage, select high resistance or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: stacked energy (mJ) bars for PM, BF and CO at small, medium and large margin, one panel each for MUL and EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5/10/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, one panel each for MUL and EXU; minimum achievable energy is marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flow diagram: starts from initial placement (all FFs = error-tolerant FFs); SEOpt = margin insertion on K paths based on sensitivity function, and replacement of error-tolerant FFs with normal FFs; SkewOpt = activity-aware clock skew optimization; the current solution is saved when energy < min energy]


35ISVLSI-2014 invited talk 140710

Also Multi-Mode Signoff Choices Matter

• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign

[Figure: Vdd vs. time, alternating nominal (NOM) and overdrive (OD) modes with durations tnom and tOD]

[Figure: power of circuits with different overdrive modes; fnom = 800MHz, Vnom = 0.8V]
• Different overdrive modes: 26% power range
• Fix fOD: still 14% power range

36ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions

• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1 to 9, ordered from highest invested guardband to least invested guardband; 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

43ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

Example: worst-case capacitance corner

[Figure: capacitance distributions of layers M1 and M2, each taken at its 3σ point, combined into the homogeneous Cw corner of an interconnect stack with M1 and M2; the gap to the actual 3σ of the stack is marked as pessimism]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

Example: worst-case capacitance corner

[Figure: capacitance distributions of layers M1 and M2, each taken at its 3σ point, combined into the homogeneous Cw corner of an interconnect stack with M1 and M2; pessimism is marked]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
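A short worked estimate shows why the pessimism grows with the layer count. The setup is a simplifying assumption, not taken from the slides: a path quantity is the sum of n per-layer contributions with equal standard deviation σ and equal pairwise correlation γ.

```latex
X = \sum_{k=1}^{n} x_k, \qquad
\operatorname{std}(x_k) = \sigma, \qquad
\operatorname{corr}(x_k, x_l) = \gamma \;\; (k \neq l)

% Homogeneous corner: every layer is set to its own 3-sigma point
\Delta_{hom} = 3 n \sigma

% Statistical 3-sigma point of the sum
3\,\sigma_X = 3 \sigma \sqrt{\,n + n(n-1)\gamma\,}

% Pessimism ratio
\frac{\Delta_{hom}}{3\,\sigma_X} = \sqrt{\frac{n}{1 + (n-1)\gamma}}
```

For fully correlated layers (γ = 1) the ratio is 1 and the homogeneous corner is exact. For independent layers (γ = 0) it is √n, so about 1.41 for two layers and 2 for four layers under these assumptions.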

45ISVLSI-2014 invited talk 140710

Correlation Matrix

• Let Σ be the correlation matrix for variation sources

           M1           M2           M3           M4
           ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH
M1  ΔW     1  0  0      γ  0  0      γ  0  0      0  0  0
    ΔT     0  1  0      0  γ  0      0  γ  0      0  0  0
    ΔH     0  0  1      0  0  γ      0  0  γ      0  0  0
M2  ΔW     γ  0  0      1  0  0      γ  0  0      0  0  0
    ΔT     0  γ  0      0  1  0      0  γ  0      0  0  0
    ΔH     0  0  γ      0  0  1      0  0  γ      0  0  0
M3  ΔW     γ  0  0      γ  0  0      1  0  0      0  0  0
    ΔT     0  γ  0      0  γ  0      0  1  0      0  0  0
    ΔH     0  0  γ      0  0  γ      0  0  1      0  0  0
M4  ΔW     0  0  0      0  0  0      0  0  0      1  0  0
    ΔT     0  0  0      0  0  0      0  0  0      0  1  0
    ΔH     0  0  0      0  0  0      0  0  0      0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
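A matrix with this structure can be generated from the two rules stated above. The sketch below is illustrative: the function name and the module labels "A" and "B" are hypothetical, chosen so that M1 to M3 share a process module and M4 stands alone, as the example matrix suggests.

```python
def build_correlation_matrix(layers, kinds, module_of, gamma):
    """Build the correlation matrix for (layer, kind) variation sources.

    Two different sources get correlation gamma when they have the same
    variation kind and their layers belong to the same process module;
    all other off-diagonal entries are 0, and the diagonal is 1.
    """
    sources = [(layer, kind) for layer in layers for kind in kinds]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (layer_i, kind_i) in enumerate(sources):
        for j, (layer_j, kind_j) in enumerate(sources):
            if i == j:
                sigma[i][j] = 1.0
            elif kind_i == kind_j and module_of[layer_i] == module_of[layer_j]:
                sigma[i][j] = gamma
    return sources, sigma


if __name__ == "__main__":
    layers = ["M1", "M2", "M3", "M4"]
    kinds = ["dW", "dT", "dH"]          # width, thickness, height variation
    modules = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}  # hypothetical labels
    sources, sigma = build_correlation_matrix(layers, kinds, modules, 0.5)
    for (layer, kind), row in zip(sources, sigma):
        print(layer, kind, row)
```

The same function scales to the 27-source case mentioned in the statistical timing analysis by passing more layers, provided the module assignment is known.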

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point: 0.92 at 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation yields smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two plots of α, one against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and one against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Figure annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two plots of α, one against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and one against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Figure annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure axes: Δdj(Yrcw)/dj(Ytyp) x 100 and 3σj/d(Ytyp) x 100]

Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure axis: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

bull Extract RC at RC-worst C-worst and the typical corners

bull Calculate Δdelay of critical paths

bull Put path j in the group Gtbc if Δdelay is larger than a threshold

bull Fix only the paths in Gtbc using tightened BEOL corners

bull Since tightened corners have smaller delay variations timing closure is easier

Routed design

Timing analysis at BEOL corners Ytyp Ycw Yrcw

GTBC GCBC

ECOusing CBC

Timing analysis

using TBC

violation = 0

Timing analysis

using CBC

violation = 0

ECOusing TBC

done

53ISVLSI-2014 invited talk 140710

Experiment Setup

LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31

Gate count 232K 575K 1031KUtilization () 84 79 82

Core area (mm2) 045 104 191Max transition (ps) 330 330 330

Testcases for validation (45nm library with 8X wire resistivity)

αCorrelation factor = 05

Acw () Arcw ()

TBC-05 05 43 73

TBC-06 06 33 50

TBC-07 07 30 34

Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw

Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05

54ISVLSI-2014 invited talk 140710

Further Analysis

bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α

bull A path has small Δdelays the path is equally sensitive to R and C

bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1

bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj

Nominal delay

Delay sensitivity to unit change in M1 resistance

Delay sensitivity to unit change in M1 capacitance

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

LEON3MP

SUPERBLUE12NETCARD

α gt 05α gt 05

α gt 05

bull Similar trends in different designs

bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

bull WNS and TNS are reduced by up to 120ps and 61ns

bull Timing violations reduces by 31 to 100

Correlation factor γ = 0 (variation sources are independent)

LEON SUPERBLUE NETCARD

-0180-0160-0140-0120-0100-0080-0060-0040-002000000020

CBC TBC-1 TBC-2

WN

S (n

s)

LEON SUPERBLUE NETCARD

-90-80-70-60-50-40-30-20-10

0CBC TBC-1 TBC-2

TNS

(ns)

LEON SUPERBLUE NETCARD0

200400600800

1000120014001600

CBC TBC-1 TBC-2

Tim

ing

viol

ation

s

57ISVLSI-2014 invited talk 140710

Heuristics 1

bull Model BTI degradation with Vfinal throughout lifetime

bull Aging of a flat Vfinal asymp aging of an adaptive Vdd

bull But slightly pessimistic

Vdd

time

NBTI

PBTI

VBTI = Vlib asymp Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

bull Problem Vfinal is not available at early design stage (design has not been implemented)

bull Vfinal = Vdd end of life (to compensate BTI aging)

bull Gates along critical pathbull Timing slack at t = 0

bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

bull Observation 2 Vfinal is not sensitive to gate types

bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack

bull Assume timing slack = 0

10mV

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC

Supply voltages

Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105

Vmax105V

Vinit090V

Vheur1 (DC) 097V

Vheur1 (AC) 095V

Vheur2 (DC) 095V

Vheur2 (AC) 093V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime

bull Signoff flowbull Estimate aging at each time step

bull Update circuit timing and Vdd

bull Repeat until t = tfinal

bull Modify circuit and start over if Vfinal gt maximum allowed voltage

bull No overhead in timing analysis but very slow Many STA runs

and library

Vstep AVS voltage stepVfinal converged voltage

62ISVLSI-2014 invited talk 140710

Experiment Setupbull Characterize different derated libraries

bull Evaluate impact of library characterizationbull Seven setups

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2

VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2

63ISVLSI-2014 invited talk 140710

ldquoChicken and Eggrdquo Loop

bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation

bull Timing constraints critical paths etc

bull Circuit is affected by library characterization

Circuit

Derated Libraries

Vfinal

Vlib VBTI

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS

|Vgs|

time

ON OFF ON OFF

[VattikondaWC06]

Device aging (|ΔVth|) accumulates over time

[TCASrsquo14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

bull BTI is a ldquofront-loadedrdquo phenomenon

bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)

bull Most Vdd increment happens in early lifetime

bull Gap between Vdd and Vfinal reduces rapidly

asymp70 Vdd increment in 1 year(remaining 30 over 9 years)

Vfinal

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner bull AVS increases supply voltage

aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired

supply voltage gt Vmax

Pessimistic signoff corner bull Ovestimate aging andor

underestimate circuit performance

bull Large area overhead

Good corners

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner

for AVS4 Vbti = Vfinal Do not overestimate

aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                    | Vlib = Min Vdd                                       | Vlib = Max Vdd
  VBTI = Min Vdd    | Slowest circuit, less aging                          | Not applicable (optimistic)
  VBTI = Max Vdd    | Slowest circuit, worst-case aging (too pessimistic)  | Faster circuit, worst-case aging

(Vlib: voltage for circuit performance estimation; VBTI: voltage for aging estimation)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for AES; data points are labeled with signoff case numbers; regions annotated "Optimistic about AVS" and "Pessimistic about AVS"; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step, and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

[Figure: lifetime (years) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200 mV voltage compensation]
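The MTTF update rule on this slide, MTTF(i) = MTTF(i−1) × (VDD(i−1) / VDD(i))², can be sketched in a few lines. The function names and the example voltage schedule are hypothetical; only the update rule itself comes from the slide.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One AVS step: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    if vdd_prev <= 0 or vdd_new <= 0:
        raise ValueError("supply voltages must be positive")
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf_initial, vdd_schedule):
    """Apply the update across consecutive voltages in an AVS schedule."""
    mttf = mttf_initial
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_new)
    return mttf


if __name__ == "__main__":
    schedule = [0.90, 0.95, 1.00, 1.10]  # hypothetical AVS voltages (V)
    print(mttf_after_schedule(10.0, schedule))
```

Because the per-step ratios telescope, the result depends only on the first and last voltages of the schedule: MTTF_final = MTTF_initial × (V_first / V_last)².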

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year for AVS schedules S1 to S5, and MTTF (years) for S1 to S5 (design: DMA); annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff along a voltage axis; labels: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI / NBTI, signoff Vdd, operating voltage]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Figure: flow from 28 itf files (27 variation sources + Ytyp) and the routed netlist, through RC extraction and STA, to Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
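The computation of σj from the sensitivities and the correlation matrix can be sketched as follows. This is a minimal pure-Python illustration; the function names and the two-source example are assumptions, and a production flow would more likely use a numerical library.

```python
import math


def cholesky(sigma):
    """Lower-triangular L with sigma = L * L^T (sigma: symmetric positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def path_sigma(delta_d, sigma):
    """sigma_j = || delta_d * L ||_2, where sigma = L * L^T."""
    L = cholesky(sigma)
    n = len(delta_d)
    # Row vector times L: d_prime[v] = sum over u of delta_d[u] * L[u][v]
    d_prime = [sum(delta_d[u] * L[u][v] for u in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


if __name__ == "__main__":
    # Two hypothetical variation sources with correlation 0.5
    print(path_sigma([3.0, 4.0], [[1.0, 0.5], [0.5, 1.0]]))
```

Since Δd′ Δd′^T = Δd λλ^T Δd^T = Δd Σ Δd^T, the result equals the square root of the quadratic form in Σ, which is a convenient cross-check on the decomposition.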

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors. Ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]

• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$$\text{Energy} = \frac{\text{Power}}{\text{Throughput}}$$

Throughput is a function of the error rate (ER), the clock period (T), and the number of recovery cycles (r) [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone with tighter constraints. Allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches (figure axis: power):

• Open-Loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-Loop AVS (process and temperature-aware AVS)
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error Tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels comparing ideal scaling with observed non-ideality.
 DENSITY: transistor count [M] vs. MPU release date (source: [CPUDB]).
 POWER: dynamic power (W), 2009 to 2024 (source: [JeongK08]).
 DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006 to 2016 (source: [ITRS]).
 SUPPLY VOLTAGE: volts vs. MPU release date (source: [CPUDB]).]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs. Resilience benefits increase in the context of an AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels: MUL and EXU; the minimum achievable energy is marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: frequency vs. voltage plane showing the design cone of mode A with modes A, B, and C; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoids circuit implementation at each mode
  • A small constant number of runs is enough: scalable

Global optimization flow (adaptive search): Sample (SP&R), construct power models, estimate optimal signoff modes, sample (SP&R) again, refine power models

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V) for the 1st-iteration model, the 2nd-iteration model, and real implementations. Design: AES, f = 700 MHz]

• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-Loop AVS: generic monitor, design-dependent replica, in-situ monitor

• Generic monitor: does not capture design-specific performance variation
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high resistance or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: stacked energy (mJ) bars for PM, BF, and CO at large, medium, and small margins; panels: MUL and EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs. Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels: MUL and EXU; the minimum achievable energy is marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Figure: flow chart with the following elements.
 Initial placement (all FFs = error-tolerant FFs).
 SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs.
 SkewOpt: activity-aware clock skew optimization.
 Check: energy < min energy? If so, save current solution.]


36ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

• Default config
  • Low-resistance passgates
  • Guardband for worst case
  • Vmin_est > Vmin_chip
  • 13 mV margin
• Optimized config
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
• Aggressive config: Vmin_est < Vmin_chip. Some chips will fail

37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

• Default config
  • Low-resistance passgates
  • Guardband for worst case
  • Vmin_est > Vmin_chip
  • 13 mV margin
• Optimized config
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
• Aggressive config: Vmin_est < Vmin_chip. Some chips will fail

Benefits of tunability
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline

• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions

• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1 to 9; annotations: highest invested guardband, least invested guardband, 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

43ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2. Per-layer capacitance distributions (layer M1, layer M2) with their 3σ corners, and the combined M1 C + M2 C; the homogeneous Cw corner lies beyond the 3σ point of the combined distribution (pessimism)]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2. Per-layer capacitance distributions (layer M1, layer M2) with their 3σ corners, and the combined M1 C + M2 C; the homogeneous Cw corner is marked along with its pessimism]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
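A small numeric sketch makes the growth of pessimism concrete. It assumes, for illustration only, that the stack capacitance variation is a plain sum of per-layer variations with equal standard deviations and a uniform pairwise correlation γ; the function names are hypothetical.

```python
import math


def homogeneous_3sigma(layer_sigmas):
    """Homogeneous corner: every layer sits at its own 3-sigma point at once."""
    return 3.0 * sum(layer_sigmas)


def statistical_3sigma(layer_sigmas, gamma):
    """3-sigma of the summed variation with pairwise correlation gamma."""
    n = len(layer_sigmas)
    var = sum(s * s for s in layer_sigmas)
    for i in range(n):
        for j in range(i + 1, n):
            var += 2.0 * gamma * layer_sigmas[i] * layer_sigmas[j]
    return 3.0 * math.sqrt(var)


def pessimism_ratio(num_layers, gamma, sigma=1.0):
    """Homogeneous corner divided by the statistical 3-sigma value."""
    sigmas = [sigma] * num_layers
    return homogeneous_3sigma(sigmas) / statistical_3sigma(sigmas, gamma)


if __name__ == "__main__":
    for n in (1, 2, 4, 9):
        print(n, round(pessimism_ratio(n, gamma=0.5), 3))
```

With γ = 1 the ratio stays at 1 (no pessimism); for any γ below 1 it grows as layers are added, which is the effect the slide describes.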

45ISVLSI-2014 invited talk 140710

Correlation Matrix

• Let Σ be the correlation matrix for variation sources

              M1          M2          M3          M4
           ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
  M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
      ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
      ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
  M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
      ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
      ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
  M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
      ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
      ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
  M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
      ΔT    0  0  0     0  0  0     0  0  0     0  1  0
      ΔH    0  0  0     0  0  0     0  0  0     0  0  1
  = Σ

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
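The structure of Σ can be generated programmatically from the two rules stated on this slide. The sketch below is illustrative: the function name, the module labels "A" and "B", and the grouping of M1 to M3 into one module (as the matrix suggests) are assumptions.

```python
def build_correlation_matrix(layers, module_of, kinds=("W", "T", "H"), gamma=0.5):
    """Return (sources, sigma) where sigma[a][b] is the correlation of two sources.

    Rule 1: same variation type in the same process module -> gamma.
    Rule 2: everything else off the diagonal -> 0 (independent).
    """
    sources = [(layer, kind) for layer in layers for kind in kinds]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for a, (layer_a, kind_a) in enumerate(sources):
        for b, (layer_b, kind_b) in enumerate(sources):
            if a == b:
                sigma[a][b] = 1.0
            elif kind_a == kind_b and module_of[layer_a] == module_of[layer_b]:
                sigma[a][b] = gamma
    return sources, sigma


if __name__ == "__main__":
    layers = ["M1", "M2", "M3", "M4"]
    # Hypothetical module labels: M1 to M3 share a module, M4 is separate.
    module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}
    _, sigma = build_correlation_matrix(layers, module_of)
    for row in sigma:
        print(" ".join(f"{x:3.1f}" for x in row))
```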

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point: 0.92 at 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: α plotted against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp). Paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: α plotted against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp). Paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 (M1 C, M2 C); non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic. Most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: axes Δdj(Yrcw) / dj(Ytyp) × 100 and 3σj / d(Ytyp) × 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
  • Wires on critical paths are routed on many layers
  • Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%) of critical paths]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst, and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Figure: flow chart. Routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; paths split into GTBC and GCBC. GTBC branch: timing analysis using TBC, ECO using TBC until violation = 0. GCBC branch: timing analysis using CBC, ECO using CBC until violation = 0. Then done]
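The path-grouping step of this flow can be sketched as below. This is an illustration under stated assumptions: the function name `split_paths`, the input layout, and the choice to compare each path's relative Δdelay at its dominant corner against a per-corner threshold (`a_cw`, `a_rcw`) are hypothetical and may differ from the actual flow.

```python
def split_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc) lists of path names.

    paths: dict mapping name -> (d_typ, d_cw, d_rcw) path delays at the
           typical, C-worst, and RC-worst BEOL corners.
    a_cw, a_rcw: relative delta-delay thresholds (assumed parameters).
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        rel_cw = (d_cw - d_typ) / d_typ
        rel_rcw = (d_rcw - d_typ) / d_typ
        # Use the dominant corner, i.e. the one with the larger delta-delay.
        if rel_cw >= rel_rcw:
            use_tbc = rel_cw > a_cw
        else:
            use_tbc = rel_rcw > a_rcw
        (g_tbc if use_tbc else g_cbc).append(name)
    return g_tbc, g_cbc


if __name__ == "__main__":
    example = {
        "p1": (1.00, 1.10, 1.05),  # large delta-delay: tightened corner
        "p2": (1.00, 1.01, 1.02),  # small delta-delay: conventional corner
    }
    print(split_paths(example, a_cw=0.05, a_rcw=0.05))
```

Paths with small Δdelay stay on conventional corners because those are the paths for which a tightened corner could underestimate the statistical delay variation.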

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                        LEON3MP   NETCARD   SUPERBLUE12
  Clock period (ns)     1.8       2.0       3.1
  Gate count            232K      575K      1031K
  Utilization (%)       84        79        82
  Core area (mm²)       0.45      1.04      1.91
  Max transition (ps)   330       330       330

Tightened corners (correlation factor = 0.5)

            α     Acw (%)   Arcw (%)
  TBC-0.5   0.5   43        73
  TBC-0.6   0.6   33        50
  TBC-0.7   0.7   30        34

Implement another NETCARD (clock period = 2.3 ns) to obtain α, Acw, and Arcw

Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: α results for LEON3MP, NETCARD, and SUPERBLUE12; regions with α > 0.5 are marked in each panel]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120 ps and 61 ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI curves]

VBTI = Vlib ≈ Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10 mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

  Circuit   Frequency (GHz)
  C5315     1.38
  c7552     1.25
  AES       0.89
  MPEG2     1.05

Supply voltages

  Vmax          1.05 V
  Vinit         0.90 V
  Vheur1 (DC)   0.97 V
  Vheur1 (AC)   0.95 V
  Vheur2 (DC)   0.95 V
  Vheur2 (AC)   0.93 V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step; Vfinal: converged voltage
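The reference flow above can be sketched as a time-stepped loop. Everything numeric here is a stand-in: the alpha-power-style `delay` model, the power-law `delta_vth` model, and all constants are hypothetical and are not the talk's device models; only the loop structure (estimate aging, update Vdd, repeat until tfinal, give up if Vmax is exceeded) follows the slide.

```python
def delay(vdd, vth, k=1.0, a=1.3):
    """Toy gate-delay model (assumption): delay grows as Vdd nears Vth."""
    return k * vdd / (vdd - vth) ** a


def delta_vth(t_years, vbti, A=0.05, n=1.0 / 6.0):
    """Toy BTI shift in volts (assumption): power law in time, scaled by stress voltage."""
    return A * vbti * t_years ** n


def reference_signoff(v_init, v_max, v_step, t_final, vth0, steps=10):
    """Return the converged Vfinal, or None if Vdd would exceed v_max.

    None corresponds to "modify circuit and start over" in the flow.
    """
    target = delay(v_init, vth0)  # timing is met with zero slack at t = 0
    vdd = v_init
    for i in range(1, steps + 1):
        t = t_final * i / steps
        vth = vth0 + delta_vth(t, vdd)      # estimate aging at this time step
        while delay(vdd, vth) > target:     # update timing and raise Vdd (AVS)
            vdd += v_step
            if vdd > v_max:
                return None
            vth = vth0 + delta_vth(t, vdd)  # aging is re-evaluated at the new Vdd
    return vdd


if __name__ == "__main__":
    print(reference_signoff(v_init=0.90, v_max=1.20, v_step=0.01,
                            t_final=10.0, vth0=0.30))
```

In a real flow each inner iteration would require a new derated library and an STA run, which is why the slide calls this reference flow very slow.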

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

  Case       1      2      3      4       5    6        7
  Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1   Vheur2
  VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

ldquoChicken and Eggrdquo Loop

bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation

bull Timing constraints critical paths etc

bull Circuit is affected by library characterization

Circuit

Derated Libraries

Vfinal

Vlib VBTI

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS

|Vgs|

time

ON OFF ON OFF

[VattikondaWC06]

Device aging (|ΔVth|) accumulates over time

[TCASrsquo14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

bull BTI is a ldquofront-loadedrdquo phenomenon

bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)

bull Most Vdd increment happens in early lifetime

bull Gap between Vdd and Vfinal reduces rapidly

asymp70 Vdd increment in 1 year(remaining 30 over 9 years)

Vfinal

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner bull AVS increases supply voltage

aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired

supply voltage gt Vmax

Pessimistic signoff corner bull Ovestimate aging andor

underestimate circuit performance

bull Large area overhead

Good corners

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner

for AVS4 Vbti = Vfinal Do not overestimate

aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

bull Timing signoff ensure circuit meets performance target under PVT variations amp aging

bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis

bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis

Vlib for circuit performance estimation

Min Vdd Max Vdd

VBTI for aging

estimation

MinVdd

Not applicable (Optimistic)

Max Vdd

Slowest circuitLess aging

Faster circuitWorst-case aging

Slowest circuit Worst-case aging

Too pessimistic

With BTI aging and AVS the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

10000 12000 14000 16000 18000 20000 2200020

22

24

26

28

30

32

44

4

888

7776

66

555

3

33

2

22

11

1

Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)

Area (μm2)

Pow

er (m

W)

AES

Optimistic about AVS

Pessimistic about AVS

>

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

1 2 3 4 5 6 7 8 90

2

4

6

8

10

12

08

09

1

11

12Lifetime (year)

Implementation

Life

time

(yea

r)

Vfina

l (V)

Vfinal (V)

119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )

2

bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as

30 MTTF penalty

200mV voltage compensation

>

70ISVLSI-2014 invited talk 140710

0 2 4 6 8 10 12090092094096098100102104

S1 S2 S3 S4 S5

Year

VDD

DMA 3S1 S2 S3 S4 S5

78

79

80

81

MTT

F (Y

ear)

EM Impact on AVS Scheduling

12 years MTTF penalty

>

71ISVLSI-2014 invited talk 140710

What is ldquoSignoffrdquo

bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip

Nominal VddStatic IR drop

Power grid IR gradientDynamic IR

HCINBTI

Signoff Vdd

Voltage

Problem Margins = pessimism

overdesign schedule delay

ldquomargin stackrdquo for voltage signoff

Operating voltage

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

bull Delay sensitivity of path pj to variation source zv

bull Assumptions bull Δdjv is linear with respect to variation sources

bull Variation sources are normal distributions

bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

28 itf files (27 variation

sources + Ytyp)

Routed Netlist

RC extraction

STA

Δdjv

Δdjv = [ - ] 3dj(Yv) dj(Ytyp)

Note Path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)

bull Σ = λλT (Note λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ

Standard deviation of path delay

σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05

Note we use the delay variation from the statistical analysis as a reference

74ISVLSI-2014 invited talk 140710

Resilient Designs

bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)


37ISVLSI-2014 invited talk 140710

Also Tunable Monitors Less Margin

Aggressive config: Vmin_est < Vmin_chip. Some chips will fail.

Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process

38ISVLSI-2014 invited talk 140710

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions
• Variability severely challenges IC value
  • In the manufacturing process, during operation, and across lifetime
• Benefit of the “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at the leading edge
• Variability is connected to tapeout IC properties by the models, margins and tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore's Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) vs. implementation (1-9); annotations: highest invested guardband, least invested guardband, 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

43ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: M1 C vs. M2 C for an interconnect stack with M1 and M2, showing the C-3σ limits of Layer M1 and Layer M2, the homogeneous Cw corner, and the resulting 3σ pessimism]

Example: worst-case capacitance corner

44ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: M1 C vs. M2 C for an interconnect stack with M1 and M2, showing the C-3σ limits of Layer M1 and Layer M2, the homogeneous Cw corner, and the resulting pessimism]

Example: worst-case capacitance corner

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers

45ISVLSI-2014 invited talk 140710

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

              M1          M2          M3          M4
           ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
  M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
      ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
      ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
  M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
      ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
      ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
  M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
      ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
      ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
  M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
      ΔT    0  0  0     0  0  0     0  0  0     0  1  0
      ΔH    0  0  0     0  0  0     0  0  0     0  0  1
  = Σ

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
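As a sketch of how the matrix above could be assembled programmatically: the function name, the short source labels (`dW`, `dT`, `dH`) and the `modules` mapping are illustrative assumptions; the grouping of M1-M3 into one process module and M4 into another is read off the matrix itself.

```python
def build_correlation_matrix(layers, types, modules, gamma):
    """Build the correlation matrix for (layer, variation type) sources.

    Two distinct sources are correlated with factor gamma only when they
    share the variation type and their layers share a process module.
    """
    sources = [(layer, t) for layer in layers for t in types]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for a, (layer_a, type_a) in enumerate(sources):
        for b, (layer_b, type_b) in enumerate(sources):
            if a == b:
                sigma[a][b] = 1.0
            elif type_a == type_b and modules[layer_a] == modules[layer_b]:
                sigma[a][b] = gamma
    return sources, sigma


layers = ["M1", "M2", "M3", "M4"]
types = ["dW", "dT", "dH"]  # width, thickness, height variation
modules = {"M1": 0, "M2": 0, "M3": 0, "M4": 1}  # assumed module grouping
sources, sigma = build_correlation_matrix(layers, types, modules, gamma=0.5)
```

The same routine extends to the 27-source case (9 layers, 3 variation types) once a layer-to-module mapping is supplied.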

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the curve reaches 0.92 at 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation leads to smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two plots of α against Δdelay, one per corner]

Δdelay at C-worst: [d(Ycw) − d(Ytyp)] / d(Ytyp)
Δdelay at RC-worst: [d(Yrcw) − d(Ytyp)] / d(Ytyp)

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner

α < 1.0: delay variations are covered by the RC-worst corner

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two plots of α against Δdelay, one per corner]

Δdelay at C-worst: [d(Ycw) − d(Ytyp)] / d(Ytyp)
Δdelay at RC-worst: [d(Yrcw) − d(Ytyp)] / d(Ytyp)

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner

α < 1.0: delay variations are covered by the RC-worst corner

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: M1 C vs. M2 C for an interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure axes: Δdj(Yrcw) / dj(Ytyp) × 100 and 3σj / d(Ytyp) × 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC → GTBC: timing analysis using TBC, ECO using TBC until violation = 0; GCBC: timing analysis using CBC, ECO using CBC until violation = 0 → done]
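The grouping step of the flow above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the input layout, the threshold values, and the choice to place a path in Gtbc when either corner exceeds its threshold are all assumptions.

```python
def split_paths(delays, a_cw, a_rcw):
    """Split critical paths into the TBC and CBC groups.

    delays maps a path name to (d_typ, d_cw, d_rcw), the path delay at the
    typical, C-worst and RC-worst corners. a_cw and a_rcw are thresholds
    on the relative delay change, in percent (hypothetical values below).
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in delays.items():
        dd_cw = 100.0 * (d_cw - d_typ) / d_typ
        dd_rcw = 100.0 * (d_rcw - d_typ) / d_typ
        # Assumed rule: a large delta at either corner qualifies the path.
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


example = {
    "p1": (1.00, 1.10, 1.02),  # 10% slower at C-worst
    "p2": (1.00, 1.01, 1.02),  # small change at both corners
}
g_tbc, g_cbc = split_paths(example, a_cw=5.0, a_rcw=5.0)
```

Paths in `g_tbc` would then be closed against the tightened corners, and the rest against the conventional ones.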

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                        LEON3MP   NETCARD   SUPERBLUE12
  Clock period (ns)     1.8       2.0       3.1
  Gate count            232K      575K      1031K
  Utilization (%)       84        79        82
  Core area (mm²)       0.45      1.04      1.91
  Max transition (ps)   330       330       330

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Correlation factor = 0.5

             α     Acw (%)   Arcw (%)
  TBC-0.5    0.5   43        73
  TBC-0.6    0.6   33        50
  TBC-0.7    0.7   30        34

Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations from ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
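A rough numeric illustration of this cancellation, under assumptions that are not from the slides: suppose the Ycw corner shifts the M1 capacitance term by +3 units and the M1 resistance term by −3 units, and that the two sources are independent with unit standard deviation.

```latex
\Delta d(Y_{cw}) = 0.5 \cdot (-3) + 0.5 \cdot (+3) = 0,
\qquad
\sigma_j = \sqrt{0.5^2 + 0.5^2} \approx 0.71
```

The corner sees no delay change while the statistical spread is nonzero, which is why such paths show a large α.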

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor results for LEON3MP, SUPERBLUE12 and NETCARD; region α > 0.5 marked in each panel]

• Similar trends in different designs
• Large α when Δd(Yrcw) / d(Ytyp) and Δd(Ycw) / d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: three bar charts comparing CBC, TBC-1 and TBC-2 on LEON, SUPERBLUE and NETCARD: WNS (ns), TNS (ns), and number of timing violations]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging at an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

  Circuit   Frequency (GHz)
  C5315     1.38
  c7552     1.25
  AES       0.89
  MPEG2     1.05

Supply voltages

  Vmax          1.05V
  Vinit         0.90V
  Vheur1 (DC)   0.97V
  Vheur1 (AC)   0.95V
  Vheur2 (DC)   0.95V
  Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs

and library

Vstep: AVS voltage step; Vfinal: converged voltage
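The loop structure of this reference flow can be sketched as below. Everything here is a hypothetical stand-in: `slack_at` replaces the aging estimation plus STA run, and `toy_slack` is an invented model used only to exercise the loop, not a BTI model from the talk.

```python
def reference_signoff(v_init, v_max, v_step, t_final, slack_at, dt=1.0):
    """Step through lifetime, raising Vdd by v_step whenever timing fails.

    slack_at(t, vdd) is a caller-supplied (hypothetical) function that
    returns the worst timing slack of the aged circuit at time t and
    supply vdd. Returns the converged Vfinal, or None when Vdd would
    exceed v_max, i.e. the circuit must be modified and the flow restarted.
    """
    vdd, t = v_init, 0.0
    while t <= t_final:
        while slack_at(t, vdd) < 0:
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None
        t += dt
    return vdd


def toy_slack(t, vdd):
    # Invented model: slack grows with Vdd and shrinks as aging accumulates.
    return 0.02 + 0.5 * (vdd - 0.90) - 0.029 * (t / 10.0) ** 0.2


v_final = reference_signoff(0.90, 1.05, 0.01, 10.0, toy_slack)
v_fail = reference_signoff(0.90, 0.91, 0.01, 10.0, toy_slack)
```

Each call to `slack_at` corresponds to one timing analysis, which is what makes the reference flow slow in practice.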

62ISVLSI-2014 invited talk 140710

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

  Case       1       2       3      4        5     6        7
  Vlib (V)   Vinit   Vinit   Vmax   Vinit    N/A   Vheur1   Vheur2
  VBTI (V)   Vinit   Vmax    Vmax   Vfinal   N/A   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop connecting circuit, derated libraries, Vfinal, and Vlib / VBTI]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

|ΔVth| increases when device is on (stressed)
|ΔVth| is partially recovered when device is off (relaxed)
NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time, alternating ON / OFF / ON / OFF [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time [TCAS'14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd over lifetime approaching Vfinal; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                                Vlib for circuit performance estimation
  VBTI for aging estimation     Min Vdd                             Max Vdd
  Min Vdd                       Slowest circuit, less aging         Not applicable (optimistic)
  Max Vdd                       Slowest circuit, worst-case aging   Faster circuit, worst-case aging
                                (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for AES; points numbered by signoff case (1-8) for non-EM-aware, after fixing (Mishra) and after fixing (Black's); regions marked optimistic about AVS and pessimistic about AVS]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) vs. implementation (1-9); annotations: 30% MTTF penalty, 200mV voltage compensation]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$
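A minimal sketch of the MTTF update rule applied along a Vdd schedule. The function names and the example schedule are illustrative assumptions; only the quadratic voltage-ratio update comes from the slide.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))^2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_over_schedule(mttf0, vdd_schedule):
    """Apply the update across consecutive entries of an AVS Vdd schedule."""
    mttf = mttf0
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_new)
    return mttf


# Hypothetical schedule: Vdd raised from 0.90V to 1.00V in two steps.
mttf_end = mttf_over_schedule(10.0, [0.90, 0.95, 1.00])
```

Because the per-step ratios telescope, the result depends only on the first and last voltages of the schedule under this update rule.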

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year for schedules S1-S5, and MTTF (year) for S1-S5; design: DMA; annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff between nominal Vdd and signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI; operating voltage marked]

Problem: margins = pessimism, leading to overdesign and schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays
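The sensitivity extraction can be sketched as a one-line computation per variation source. The function name and the example delays are hypothetical; reading the division by 3 as "each Yv skews one source by 3σ, so the result is a per-σ sensitivity" is an interpretation, not something the slide states.

```python
def delay_sensitivities(d_typ, d_per_source):
    """Return the sensitivity of one path to each variation source.

    d_typ is the path delay at Ytyp; d_per_source[v] is the path delay
    from the run in which only source v is skewed (assumed to be a 3-sigma
    skew, hence the division by 3).
    """
    return [(d_v - d_typ) / 3.0 for d_v in d_per_source]


# Hypothetical delays (ns) for a path with three variation sources.
sens = delay_sensitivities(1.0, [1.3, 0.7, 1.0])
```

With 27 sources this needs the 28 extraction and STA runs listed above: one per source plus the typical run.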

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλᵀ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
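A small pure-Python sketch of the two steps above: factor Σ = λλᵀ, then take the norm of the sensitivity row vector multiplied by λ. The function names and the 2 × 2 examples are illustrative; a production flow would more likely call a linear-algebra library.

```python
import math


def cholesky(sigma):
    """Return lower-triangular lam with sigma = lam * lam^T.

    sigma must be symmetric positive definite, given as a list of rows.
    """
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(sens, sigma):
    """Standard deviation of path delay from per-source sensitivities."""
    lam = cholesky(sigma)
    n = len(sens)
    # Row vector times lam: correlated sensitivities.
    corr = [sum(sens[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(c * c for c in corr))


sigma_corr = [[1.0, 0.5], [0.5, 1.0]]
s_same = path_sigma([1.0, 1.0], sigma_corr)   # sensitivities add up
s_opp = path_sigma([1.0, -1.0], sigma_corr)   # sensitivities partly cancel
s_ind = path_sigma([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
```

The result equals the square root of the quadratic form of the sensitivities with Σ, so positive correlation widens the spread when sensitivities share a sign and narrows it when they oppose.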

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction marked]

Conventional design: worst-case signoff, no Vdd downscaling

Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy

$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$

Throughput is determined by the error rate [Kahng10], the number of recovery cycles, and the clock period

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches

• Open-loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor, design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor; measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system; Vdd scaling until error occurs [Das06, Tschanz10]

[Figure: the AVS classes arranged along a power axis]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels, each comparing ideal scaling with non-ideality]
• DENSITY: transistor count [M] vs. MPU release date, 1998-2011. Source: [CPUDB]
• POWER: dynamic power (W), 2009-2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006-2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date, 1995-2016. Source: [CPUDB]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of an AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage and frequency axes with modes A, B and C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable

Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Adaptive search: Sample (SP&R) → Refine power models

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V) for 1st, 2nd and real. Design: AES; f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time

[Figure: closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor]

• Does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF and CO at small, medium and large margin, for MUL and EXU; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs, hence reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs → SkewOpt: activity-aware clock skew optimization → if energy < min energy, save current solution → iterate]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

38ISVLSI-2014 invited talk 140710

Outline

• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

39ISVLSI-2014 invited talk 140710

Conclusions

• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1-9; labels mark the highest and the least invested guardband; annotation: 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

43ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner. Layers M1 and M2 are each placed at their own 3σ C corner; for an interconnect stack with M1 and M2, the resulting homogeneous Cw corner carries pessimism.]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner. Layers M1 and M2 are each placed at their own 3σ C corner; for an interconnect stack with M1 and M2, the resulting homogeneous Cw corner carries pessimism.]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
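The growth of pessimism with layer count can be checked numerically. The sketch below is illustrative only and rests on assumptions that are not in the slides: every layer has the same delay sensitivity and the same σ, and any two layers share one pairwise correlation `rho`. The homogeneous corner pushes every layer to its own 3σ point, while the statistical 3σ of the summed variation grows more slowly.

```python
import math

def homogeneous_pessimism(n_layers: int, rho: float) -> float:
    """Ratio of the homogeneous-corner shift to the statistical 3-sigma shift.

    Illustrative assumptions: unit sensitivity and unit sigma on every layer,
    and the same pairwise correlation rho between any two layers.
    """
    # Homogeneous corner: each layer sits at its own 3-sigma point at once.
    corner_shift = 3.0 * n_layers
    # Variance of a sum of n unit-variance terms with pairwise correlation rho.
    variance = n_layers + n_layers * (n_layers - 1) * rho
    statistical_shift = 3.0 * math.sqrt(variance)
    return corner_shift / statistical_shift

for n in (1, 2, 4, 9):
    print(n, round(homogeneous_pessimism(n, rho=0.5), 3))
```

With `rho = 1.0` the ratio stays at 1 (no pessimism), and with `rho = 0.0` it equals the square root of the layer count, which matches the qualitative statement on the slide.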

45ISVLSI-2014 invited talk 140710

Correlation Matrix

• Let Σ be the correlation matrix for variation sources

          M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
    ΔT    0  0  0     0  0  0     0  0  0     0  1  0
    ΔH    0  0  0     0  0  0     0  0  0     0  0  1
= Σ

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
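The two rules above are enough to generate Σ programmatically. A minimal sketch follows; the grouping of M1 to M3 into one process module and M4 into another is read off the matrix pattern and is an assumption, as are the names `PROCESS_MODULE` and `build_correlation_matrix`.

```python
from typing import List, Tuple

LAYERS = ["M1", "M2", "M3", "M4"]
KINDS = ["dW", "dT", "dH"]  # stand-ins for the slide's ΔW, ΔT, ΔH
# Assumed module grouping, inferred from which blocks of the matrix hold gamma.
PROCESS_MODULE = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}

def build_correlation_matrix(gamma: float) -> Tuple[List[Tuple[str, str]], List[List[float]]]:
    """Return (sources, sigma), with sources ordered layer by layer."""
    sources = [(layer, kind) for layer in LAYERS for kind in KINDS]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for a, (layer_a, kind_a) in enumerate(sources):
        for b, (layer_b, kind_b) in enumerate(sources):
            if a == b:
                sigma[a][b] = 1.0  # every source is fully correlated with itself
            elif kind_a == kind_b and PROCESS_MODULE[layer_a] == PROCESS_MODULE[layer_b]:
                sigma[a][b] = gamma  # same variation type, same process module
            # otherwise the entry stays 0.0 (independent sources)
    return sources, sigma

sources, sigma = build_correlation_matrix(gamma=0.5)
for (layer, kind), row in zip(sources, sigma):
    print(layer, kind, row)
```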

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the point (60, 0.92) is marked]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two scatter plots of α, one against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and one against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two scatter plots of α, one against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and one against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
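The notion of a dominant corner can be made concrete with a few lines of code. This is a sketch under stated assumptions: the relative Δdelay follows the axis labels on the slide, while the formula used for α (three times the statistical σ of the path divided by the delay shift at the dominant corner) is an assumed definition that is not spelled out in this part of the deck. The function names and the example numbers are hypothetical.

```python
from typing import Tuple

def relative_delta(d_corner: float, d_typ: float) -> float:
    """Relative delay shift [d(Y_corner) - d(Y_typ)] / d(Y_typ)."""
    return (d_corner - d_typ) / d_typ

def dominant_corner(d_typ: float, d_cw: float, d_rcw: float) -> Tuple[str, float]:
    """Pick the conventional corner with the larger relative delay shift."""
    delta_cw = relative_delta(d_cw, d_typ)
    delta_rcw = relative_delta(d_rcw, d_typ)
    if delta_cw > delta_rcw:
        return "C-worst", delta_cw
    return "RC-worst", delta_rcw  # ties go to RC-worst (arbitrary choice)

def alpha_at_dominant_corner(sigma_j: float, d_typ: float, d_cw: float, d_rcw: float) -> float:
    """Assumed definition: alpha = 3 * sigma_j / (delay shift at the dominant corner)."""
    _, delta = dominant_corner(d_typ, d_cw, d_rcw)
    return 3.0 * sigma_j / (delta * d_typ)  # undefined if the shift is zero

# Hypothetical path: 1.00 ns typical, 1.10 ns at C-worst, 1.06 ns at RC-worst.
print(dominant_corner(1.00, 1.10, 1.06))
print(alpha_at_dominant_corner(0.02, 1.00, 1.10, 1.06))
```

Under this assumed definition, a value above 1.0 means the conventional corner shifts the delay by less than the statistical 3σ, which is the underestimation case named in the bullets.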

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: scatter plot with axes Δdj(Yrcw) / dj(Ytyp) × 100 and 3σj / d(Ytyp) × 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield?

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design → timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) → paths split into GTBC and GCBC. GTBC branch: timing analysis using TBC, ECO using TBC until violation = 0. GCBC branch: timing analysis using CBC, ECO using CBC until violation = 0. Then done.]
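The grouping step of the flow can be sketched as a simple threshold test. The code below is a hypothetical illustration, not the authors' implementation: it assumes Δdelay is taken as the larger of the C-worst and RC-worst shifts relative to the typical delay, and the names `split_paths`, `path_delays` and `threshold` are made up for this example.

```python
from typing import Dict, List, Tuple

def split_paths(path_delays: Dict[str, Tuple[float, float, float]],
                threshold: float) -> Tuple[List[str], List[str]]:
    """Split paths into (g_tbc, g_cbc).

    path_delays maps a path name to (d_typ, d_cw, d_rcw). A path goes to
    g_tbc when its relative delta-delay exceeds the threshold, following
    the rule stated in the bullet list; all other paths go to g_cbc.
    """
    g_tbc: List[str] = []
    g_cbc: List[str] = []
    for name, (d_typ, d_cw, d_rcw) in path_delays.items():
        # Assumed metric: shift at the dominant conventional corner, relative to typical.
        delta = max(d_cw - d_typ, d_rcw - d_typ) / d_typ
        (g_tbc if delta > threshold else g_cbc).append(name)
    return g_tbc, g_cbc

# Hypothetical delays in ns: (typical, C-worst, RC-worst).
paths = {
    "p0": (1.00, 1.02, 1.03),
    "p1": (1.00, 1.08, 1.12),
    "p2": (2.00, 2.30, 2.10),
}
g_tbc, g_cbc = split_paths(paths, threshold=0.05)
print("G_tbc:", g_tbc)
print("G_cbc:", g_cbc)
```

Each group would then go through its own timing-analysis and ECO loop, as in the flowchart.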

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Tightened corners (correlation factor = 0.5)

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

• Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • First 0.5: delay sensitivity to unit change in M1 resistance
  • Second 0.5: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out → Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor plots for LEON3MP, NETCARD and SUPERBLUE12, each with a region of α > 0.5 marked]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs

and library

Vstep: AVS voltage step; Vfinal: converged voltage
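The loop structure of this reference flow can be sketched as follows. Everything numerical here is a toy assumption made only so the loop runs: `toy_delay` and `toy_aging` are invented stand-ins for STA on derated libraries and for a BTI model, and yearly time steps are an arbitrary choice. Only the control flow (estimate aging, raise Vdd in Vstep increments until timing is met, repeat until tfinal, give up if Vdd exceeds the maximum) follows the bullets.

```python
from typing import Callable, Optional

def reference_signoff(v_init: float, v_max: float, v_step: float,
                      clock_period: float, t_final: int,
                      delay_fn: Callable[[float, float], float],
                      aging_fn: Callable[[int, float], float]) -> Optional[float]:
    """Return the converged Vfinal, or None if Vdd would exceed v_max."""
    steps = 0  # number of AVS voltage steps applied so far
    for t in range(1, t_final + 1):
        vdd = v_init + steps * v_step
        dvth = aging_fn(t, vdd)  # estimate aging at this time step
        # Update timing and Vdd: raise Vdd until the clock period is met.
        while delay_fn(v_init + steps * v_step, dvth) > clock_period:
            steps += 1
            if v_init + steps * v_step > v_max + 1e-12:
                return None  # circuit would have to be modified and the flow restarted
    return v_init + steps * v_step

def toy_delay(vdd: float, dvth: float, k: float = 0.6, vth0: float = 0.3) -> float:
    """Toy critical-path delay: slower at lower Vdd and at larger threshold shift."""
    return k / (vdd - vth0 - dvth)

def toy_aging(t_years: int, vdd: float, a: float = 0.03, n: float = 0.2) -> float:
    """Toy threshold-voltage shift growing as a power law in time (ignores vdd)."""
    return a * t_years ** n

v_final = reference_signoff(0.90, 1.05, 0.01, 1.05, 10, toy_delay, toy_aging)
print("Vfinal:", None if v_final is None else round(v_final, 2))
```

In a real flow each call to the delay function would be a full STA run, which is why the slide calls this reference flow very slow.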

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2

63ISVLSI-2014 invited talk 140710

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop connecting Circuit, Vfinal, Vlib / VBTI and Derated Libraries]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time over ON / OFF / ON / OFF phases [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time

[TCAS’14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd over lifetime approaching Vfinal; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

Setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Vlib (circuit performance estimation)   VBTI (aging estimation)   Outcome
Min Vdd                                 Min Vdd                   Slowest circuit, less aging
Max Vdd                                 Min Vdd                   Not applicable (optimistic)
Max Vdd                                 Max Vdd                   Faster circuit, worst-case aging
Min Vdd                                 Max Vdd                   Slowest circuit, worst-case aging: too pessimistic

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for AES, with points labeled by signoff case 1-8, for three series: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks). Regions are marked as optimistic about AVS and pessimistic about AVS.]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) for implementations 1-9; annotations: 30% MTTF penalty, 200mV voltage compensation]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

  $MTTF(i) = MTTF(i-1) \times \left( \dfrac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
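The MTTF update rule on this slide, MTTF(i) = MTTF(i−1) × (V_DD(i−1) / V_DD(i))², can be applied step by step along an AVS voltage schedule. The sketch below does exactly that; the starting MTTF of 10 years and the voltage schedule are hypothetical values chosen only for illustration.

```python
from typing import List

def mttf_schedule(mttf_initial: float, vdd_steps: List[float]) -> List[float]:
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule.

    vdd_steps[0] is the voltage that mttf_initial refers to; the returned
    list has one MTTF value per entry of vdd_steps.
    """
    mttf = [mttf_initial]
    for v_prev, v_cur in zip(vdd_steps, vdd_steps[1:]):
        mttf.append(mttf[-1] * (v_prev / v_cur) ** 2)
    return mttf

# Hypothetical AVS schedule: Vdd raised from 0.90 V to 1.10 V over the lifetime.
schedule = [0.90, 0.95, 1.00, 1.10]
for vdd, mttf in zip(schedule, mttf_schedule(10.0, schedule)):
    print(f"Vdd = {vdd:.2f} V  ->  MTTF = {mttf:.2f} years")
```

Because the ratios telescope, the final value depends only on the first and last voltage: MTTF_final = MTTF_initial × (V_first / V_last)².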

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year and MTTF (year) for scenarios S1-S5 (design: DMA); annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is “Signoff”

• Foundation of contract between design house and foundry
  • “Chip should work” → stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff on a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd; the operating voltage is marked]

Problem: margins = pessimism → overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

  $\Delta d_{jv} = \dfrac{d_j(Y_v) - d_j(Y_{typ})}{3}$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^{T}$ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

  $[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:

  $\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
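The computation on this slide (factor Σ = λλᵀ by Cholesky decomposition, multiply the row of sensitivities by λ, then take the root of the sum of squares) can be written in a few lines of plain Python. This is a minimal sketch with a hand-written Cholesky routine to stay dependency-free; the two-source example values are hypothetical.

```python
import math
from typing import List

def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with a = L * L^T (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - partial)
            else:
                low[i][j] = (a[i][j] - partial) / low[j][j]
    return low

def path_sigma(sensitivities: List[float], correlation: List[List[float]]) -> float:
    """sigma_j = sqrt(sum_v (dd'_jv)^2), with [dd'_j1 ...] = [dd_j1 ...] * lambda."""
    lam = cholesky(correlation)
    n = len(sensitivities)
    # Row vector times lambda gives the sensitivities with correlation folded in.
    correlated = [sum(sensitivities[k] * lam[k][v] for k in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in correlated))

# Hypothetical example: two sources with equal sensitivity and correlation 0.5.
print(path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]]))
```

The result equals the square root of Δd Σ Δdᵀ, so with an identity correlation matrix it reduces to the plain root-sum-square of the sensitivities.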

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for a conventional design and a resilient design; annotation: 15% reduction]

• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling → reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

  $Energy = \dfrac{Power}{Throughput}$

  [Equation: Throughput expressed in terms of the error rate ER, the clock period T and the recovery cycles r] [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone with tighter constraints → allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS approaches and AVS classes

• Open-loop AVS (Freq & Vdd LUT)
  • AVS with pre-characterized LUT [Martin02]
• Post-silicon characterization
  • Process-aware AVS with post-silicon characterization [Tschanz03]
• Closed-loop AVS (generic monitor, design-dependent replica, in-situ monitor)
  • Process- and temperature-aware AVS: generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ performance monitor: measure actual critical paths [Hartman06, Fick10]
• Error tolerance (error detection system)
  • Error detection and correction system: Vdd scaling until error occurs [Das06, Tschanz10]

[Figure: the AVS classes are arranged along a power axis]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels, each contrasting ideal scaling with observed non-ideality]
• DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
• POWER: dynamic power (W) vs. year. Source: [JeongK08]
• DRIVE CURRENT: Extended Planar Bulk, UTB FD and DG drive current (μA/μm) vs. year, against ideal scaling. Source: [ITRS]
• SUPPLY VOLTAGE: supply voltage (V) vs. MPU release date. Source: [CPUDB]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs → resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for the brute-force, pure-margin and CombOpt flows, for the MUL and EXU designs; the minimum achievable energy is marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design / overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: voltage vs. frequency plane showing the design cone of mode A with modes A, B and C; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone → reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough → scalable

Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models (adaptive search)

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V) with series “1st”, “2nd” and “real”. Design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

[Figure: classes of closed-loop AVS (design-dependent replica, in-situ monitor, generic monitor), annotated with drawbacks]
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)

• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring-oscillator stages with 1-bit control pins; each bit selects high or low series resistance]


Page 39: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

39ISVLSI-2014 invited talk 140710

Conclusionsbull Variability severely challenges IC value

bull In manufacturing process during operation across lifetime

bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge

bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff

bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that

extends the value trajectory of Moorersquos Law

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

1 2 3 4 5 6 7 8 91200

1300

1400

1500

1600

1700

030

032

034

036

Core Power (mW) PG Power (mW)

Implemetation

Core

Pow

er (m

W)

PG

Pow

er (m

W)

bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff

Highest invested guardband

Least invested guardband

14 power penalty

>

43ISVLSI-2014 invited talk 140710

Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a

homogeneous corner for an interconnect stack

C-3σ

Layer M2

C-3σ

Layer M1

Interconnect stack with M1 and M2

M1 C

M2 C

3σ Pessimism

Example worst-case capacitance corner Homogeneous

Cw corner

44ISVLSI-2014 invited talk 140710

Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a

homogeneous corner for an interconnect stack

Interconnect stack with M1 and M2

M1 C

M2 C

Homogeneous Cw corner

C-3σ

Layer M2

C-3σ

Layer M1

Pessimism

Example worst-case capacitance corner

When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers

45ISVLSI-2014 invited talk 140710

Correlation Matrixbull Let Σ be the correlation matrix for variation sources

M1 M2 M3 M4

ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH

M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0

ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0

ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0

M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0

ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0

ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0

M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0

ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0

ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0

M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0

ΔT 0 0 0 0 0 0 0 0 0 0 1 0

ΔH 0 0 0 0 0 0 0 0 0 0 0 1

= Σ

Correlation for variation sources with the same variation type and in the process module γ 05

Variation sources in different process modules are independent

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability versus max wirelength ratio across all layers (%); the values 0.92 and 60% are marked]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two plots of α. Axis of the first plot: Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp). Axis of the second plot: Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Figure annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two plots of α. Axis of the first plot: Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp). Axis of the second plot: Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Figure annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
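A small sketch of evaluating α at the dominant corner. It assumes α is the ratio of the statistical 3σ delay variation to the Δdelay at the dominant conventional corner, which is how the "α > 1.0 means the CBC underestimates" statement reads. That definition, the function names and the example numbers are assumptions.

```python
def relative_delta(d_corner, d_typ):
    """Relative delay increase of a corner over the typical corner."""
    return (d_corner - d_typ) / d_typ

def dominant_corner(d_typ, d_cw, d_rcw):
    """Pick the corner with the larger relative delay increase."""
    delta_cw = relative_delta(d_cw, d_typ)
    delta_rcw = relative_delta(d_rcw, d_typ)
    if delta_cw > delta_rcw:
        return "C-worst", delta_cw
    return "RC-worst", delta_rcw

def scaling_factor(d_typ, d_cw, d_rcw, sigma_j):
    """Assumed definition: alpha = (3 * sigma_j / d_typ) / delta at dominant corner."""
    corner, delta = dominant_corner(d_typ, d_cw, d_rcw)
    if delta <= 0.0:
        raise ValueError("no positive delay increase at either corner")
    return corner, (3.0 * sigma_j / d_typ) / delta

# Hypothetical path delays in ns.
corner, alpha = scaling_factor(d_typ=1.00, d_cw=1.04, d_rcw=1.10, sigma_j=0.015)
print(corner, round(alpha, 2))
```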

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2, showing the M1 C and M2 C distributions. Non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: axes are Δdj(Yrcw)/dj(Ytyp) × 100 and 3σj/dj(Ytyp) × 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design → timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) → paths split into GTBC and GCBC. GTBC: timing analysis using TBC, with ECO using TBC until violation = 0. GCBC: timing analysis using CBC, with ECO using CBC until violation = 0. Then done]
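The grouping step of the flow can be sketched as follows. The slide only says "larger than a threshold"; using one threshold per corner (named after Acw and Arcw) and combining them with a logical OR is an assumption, and the threshold values and path delays below are hypothetical.

```python
def split_paths(paths, a_cw, a_rcw):
    """Split paths into (G_tbc, G_cbc) by relative delay increase at each corner.

    paths maps a path name to (d_typ, d_cw, d_rcw), all in the same time unit.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        delta_cw = (d_cw - d_typ) / d_typ
        delta_rcw = (d_rcw - d_typ) / d_typ
        # Assumed rule: a large delta at either corner qualifies the path for TBC.
        if delta_cw > a_cw or delta_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc

paths = {
    "p1": (1.00, 1.02, 1.09),
    "p2": (1.00, 1.01, 1.02),
    "p3": (1.00, 1.06, 1.03),
}
g_tbc, g_cbc = split_paths(paths, a_cw=0.05, a_rcw=0.07)
print("G_tbc:", g_tbc)
print("G_cbc:", g_cbc)
```

Paths in G_tbc would then be analyzed and fixed with the tightened corners, and the remaining paths with the conventional corners.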

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Correlation factor = 0.5
          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays → the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • First coefficient: delay sensitivity to unit change in M1 resistance
  • Second coefficient: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor α for LEON3MP, SUPERBLUE12 and NETCARD; a region with α > 0.5 is marked in each panel]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns, respectively
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: three bar charts for LEON, SUPERBLUE and NETCARD comparing CBC, TBC-1 and TBC-2: WNS (ns), TNS (ns) and number of timing violations]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd versus time, with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

Figure annotation: 10mV

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs

and library

Vstep: AVS voltage step; Vfinal: converged voltage
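The loop structure of the reference flow can be sketched with toy models. Everything numeric here is an assumption made for illustration: the power-law ΔVth model, the alpha-power-law delay model and all constants are placeholders, and a real flow would replace them with library re-characterization and STA runs at each step.

```python
def toy_delta_vth(t_years, vdd, k=0.03, n=1.0 / 6.0):
    """Hypothetical BTI model: threshold shift grows with stress voltage and time."""
    return k * vdd * t_years ** n

def toy_delay(vdd, vth, alpha=1.3):
    """Hypothetical alpha-power-law critical-path delay (arbitrary units)."""
    return vdd / (vdd - vth) ** alpha

def reference_signoff(v_init, v_max, v_step, t_final, n_steps, vth0, target_delay):
    """Step through lifetime, raising Vdd by v_step whenever timing is violated.

    Returns the converged Vfinal, or None when Vdd would exceed v_max
    (the case where the circuit must be modified and the flow restarted).
    """
    vdd = v_init
    for i in range(1, n_steps + 1):
        t = t_final * i / n_steps
        vth = vth0 + toy_delta_vth(t, vdd)   # estimate aging at this time step
        while toy_delay(vdd, vth) > target_delay:
            vdd += v_step                    # AVS raises the supply
            if vdd > v_max + 1e-12:
                return None
            vth = vth0 + toy_delta_vth(t, vdd)
    return vdd

target = toy_delay(0.90, 0.30)  # zero slack at t = 0
print(reference_signoff(0.90, 1.05, 0.01, 10.0, 20, 0.30, target))
```

The inner timing check is the part that makes the real flow slow, since each evaluation stands in for a full timing analysis.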

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop linking Circuit, Vfinal, Vlib / VBTI and Derated Libraries]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

|ΔVth| increases when device is on (stressed)
|ΔVth| is partially recovered when device is off (relaxed)
NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| versus time, alternating ON and OFF phases] [VattikondaWC06]

Device aging (|ΔVth|) accumulates over time [TCAS’14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

Figure annotations: ≈70% of the Vdd increment in 1 year (remaining 30% over 9 years); Vfinal

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

                 Vlib = Min Vdd                                       Vlib = Max Vdd
VBTI = Min Vdd   Slowest circuit, less aging                          Not applicable (optimistic)
VBTI = Max Vdd   Slowest circuit, worst-case aging: too pessimistic   Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) versus area (μm²) for AES; data points are labelled 1–8; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); regions marked "optimistic about AVS" and "pessimistic about AVS"]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) across implementations 1–9]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left(\dfrac{V_{DD}(i-1)}{V_{DD}(i)}\right)^{2}$

Figure annotations: 30% MTTF penalty; 200mV voltage compensation
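A short sketch of applying the MTTF update along an AVS voltage schedule. The initial MTTF, the starting voltage and the 50mV step size are hypothetical; only the update rule comes from the slide.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One update step: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2

def mttf_after_schedule(mttf_initial, vdd_schedule):
    """Apply the update along consecutive supply-voltage values."""
    mttf = mttf_initial
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_new)
    return mttf

# Hypothetical schedule: 200mV of compensation applied in 50mV steps.
schedule = [1.00, 1.05, 1.10, 1.15, 1.20]
final_mttf = mttf_after_schedule(10.0, schedule)
penalty = 1.0 - final_mttf / 10.0
print(round(final_mttf, 2), round(penalty, 3))
```

Because the ratios telescope, the result depends only on the first and last voltage. Under the assumed 1.00V starting point, 200mV of compensation gives a penalty of roughly 30%, the same order as the figure annotation.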

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD versus year (0–12) for S1–S5, and MTTF (year) for S1–S5; design: DMA]

Figure annotation: 12 years MTTF penalty

71ISVLSI-2014 invited talk 140710

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff on a voltage axis, between operating voltage and signoff Vdd: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$\Delta d_{jv} = \dfrac{d_j(Y_v) - d_j(Y_{typ})}{3}$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays
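The per-source sensitivity is a one-line computation once the corner delays are available. The sketch below assumes each corner run Yv places one variation source at its 3σ value; the source names and the delays are hypothetical.

```python
def delay_sensitivities(d_typ, d_at_corner):
    """Per-sigma delay sensitivity of one path to each variation source.

    d_at_corner maps a source name to the path delay obtained from the
    extraction/STA run with that source shifted to its 3-sigma value.
    """
    return {source: (d - d_typ) / 3.0 for source, d in d_at_corner.items()}

# Hypothetical path delays in ns from three of the 27 corner runs.
sens = delay_sensitivities(1.00, {"M1_dW": 1.03, "M2_dT": 0.994, "M3_dH": 1.00})
for source, value in sens.items():
    print(source, round(value, 4))
```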

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda\lambda^{T}$ (Note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$[\Delta d'_{j1} \;\dots\; \Delta d'_{j27}] = [\Delta d_{j1} \;\dots\; \Delta d_{j27}]\,\lambda$

Standard deviation of path delay:

$\sigma_j = \left((\Delta d'_{j1})^{2} + \dots + (\Delta d'_{j27})^{2}\right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
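A self-contained sketch of this computation on a 2 x 2 example. The Cholesky routine is a textbook implementation written out to avoid external dependencies; the sensitivities and the correlation value are hypothetical.

```python
import math

def cholesky(matrix):
    """Lower-triangular lam with matrix = lam * lam^T (matrix must be SPD)."""
    n = len(matrix)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lam[i][j] = (matrix[i][j] - partial) / lam[j][j]
    return lam

def path_sigma(delta_d, corr):
    """sigma_j from per-source sensitivities and the correlation matrix."""
    lam = cholesky(corr)
    n = len(delta_d)
    # Row vector times lam: sensitivities with correlation folded in.
    correlated = [sum(delta_d[r] * lam[r][c] for r in range(n)) for c in range(n)]
    return math.sqrt(sum(value * value for value in correlated))

corr = [[1.0, 0.5], [0.5, 1.0]]
print(path_sigma([3.0, 4.0], corr))
```

Since the sum of squares of Δd′ equals Δd Σ Δdᵀ, the result can be cross-checked directly against the quadratic form.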

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) versus supply voltage (V) for a conventional design and a resilient design]

Conventional design: worst-case signoff → no Vdd downscaling

Resilient design: typical-case signoff → Vdd downscaling → reduced energy

Figure annotation: 15% reduction

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$\mathit{Energy} = \dfrac{\mathit{Power}}{\mathit{Throughput}}$

$\mathit{Throughput} = \dfrac{1 - ER}{T} + \dfrac{1 - ER}{r \times T}$

where r is the number of recovery cycles, T is the clock period and ER is the error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

[Figure: AVS classes and AVS approaches, arranged along a power axis]

Open-Loop AVS
• Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]

Closed-Loop AVS
• Generic monitor and design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]

Error Tolerance
• Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels, each contrasting ideal scaling with non-ideality]
• DENSITY: transistor count [M] versus MPU release date. Source: [CPUDB]
• POWER: dynamic power (W), 2009–2024. Source: [JeongK08]
• DRIVE CURRENT: Extended Planar Bulk, UTB FD and DG (μA/μm) versus ideal scaling, 2006–2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt versus MPU release date. Source: [CPUDB]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) versus supply voltage (V) for MUL and EXU; series: brute-force, pure-margin, CombOpt; the minimum achievable energy is marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: frequency versus voltage with modes A, B and C and the design cone of mode A; negative slacks = failed design; positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone → reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough → scalable

Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search (sample (SP&R), refine power models)

[Figure: power estimation of adaptive search. Power (mW) versus signoff voltage (V); series: 1st, 2nd, real. Design: AES; f: 700MHz]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-Loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: proposed circuit with three 1-bit control pins, each selecting high resistance or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at small, medium and large margin; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) versus supply voltage (V) for MUL and EXU; series: brute-force, pure-margin, CombOpt; the minimum achievable energy is marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, then replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)
Page 40: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

40ISVLSI-2014 invited talk 140710

Thank You

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

1 2 3 4 5 6 7 8 91200

1300

1400

1500

1600

1700

030

032

034

036

Core Power (mW) PG Power (mW)

Implemetation

Core

Pow

er (m

W)

PG

Pow

er (m

W)

bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff

Highest invested guardband

Least invested guardband

14 power penalty

>

43ISVLSI-2014 invited talk 140710

Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a

homogeneous corner for an interconnect stack

C-3σ

Layer M2

C-3σ

Layer M1

Interconnect stack with M1 and M2

M1 C

M2 C

3σ Pessimism

Example worst-case capacitance corner Homogeneous

Cw corner

44ISVLSI-2014 invited talk 140710

Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a

homogeneous corner for an interconnect stack

Interconnect stack with M1 and M2

M1 C

M2 C

Homogeneous Cw corner

C-3σ

Layer M2

C-3σ

Layer M1

Pessimism

Example worst-case capacitance corner

When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers

45ISVLSI-2014 invited talk 140710

Correlation Matrixbull Let Σ be the correlation matrix for variation sources

M1 M2 M3 M4

ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH

M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0

ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0

ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0

M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0

ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0

ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0

M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0

ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0

ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0

M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0

ΔT 0 0 0 0 0 0 0 0 0 0 1 0

ΔH 0 0 0 0 0 0 0 0 0 0 0 1

= Σ

Correlation for variation sources with the same variation type and in the process module γ 05

Variation sources in different process modules are independent

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

bull 92 of paths have lt 60 of wirelength on any single layer

Max wirelength ratio across all layers ()

Cum

ulati

ve p

roba

bilit

y

092

60

bull Variations in different layers are not fully correlated

bull Averaging uncorrelated variation smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

α α

Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)

bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner

C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner

α lt 10 delay variations are covered by the RC-worst corner

Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst

Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst

Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)

48ISVLSI-2014 invited talk 140710

Delay Variation

α α

Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)

bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner

C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner

α lt 10 delay variations are covered by the RC-worst corner

Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst

Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst

Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)

bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

bull Each layer can have different skewed variationsInterconnect stack with M1 and M2

M1 C

M2 C

Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp

bull Less pessimism with non-homogeneous cornersbull Challenge

bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in

itf with α = 05

Δdj(Yrcw)dj(Ytyp) x 100

3σjd(Ytyp) x 100

Challenge how to avoid underestimating delay variation to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

bull Critical paths are structurally similar

bull Wires on critical paths are routed on many layers

bull Structure is an outcome of the design flow

Testcasebull 45nm foundry library (wire

resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K

standard cell instancesbull 9 metal layersbull Extract critical paths from

different PVT and BEOL corners

Wirelength ratio ()

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

bull Extract RC at RC-worst C-worst and the typical corners

bull Calculate Δdelay of critical paths

bull Put path j in the group Gtbc if Δdelay is larger than a threshold

bull Fix only the paths in Gtbc using tightened BEOL corners

bull Since tightened corners have smaller delay variations timing closure is easier

Routed design

Timing analysis at BEOL corners Ytyp Ycw Yrcw

GTBC GCBC

ECOusing CBC

Timing analysis

using TBC

violation = 0

Timing analysis

using CBC

violation = 0

ECOusing TBC

done

53ISVLSI-2014 invited talk 140710

Experiment Setup

LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31

Gate count 232K 575K 1031KUtilization () 84 79 82

Core area (mm2) 045 104 191Max transition (ps) 330 330 330

Testcases for validation (45nm library with 8X wire resistivity)

αCorrelation factor = 05

Acw () Arcw ()

TBC-05 05 43 73

TBC-06 06 33 50

TBC-07 07 30 34

Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw

Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05

54ISVLSI-2014 invited talk 140710

Further Analysis

bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α

bull A path has small Δdelays the path is equally sensitive to R and C

bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1

bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj

Nominal delay

Delay sensitivity to unit change in M1 resistance

Delay sensitivity to unit change in M1 capacitance

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

LEON3MP

SUPERBLUE12NETCARD

α gt 05α gt 05

α gt 05

bull Similar trends in different designs

bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

bull WNS and TNS are reduced by up to 120ps and 61ns

bull Timing violations reduces by 31 to 100

Correlation factor γ = 0 (variation sources are independent)

LEON SUPERBLUE NETCARD

-0180-0160-0140-0120-0100-0080-0060-0040-002000000020

CBC TBC-1 TBC-2

WN

S (n

s)

LEON SUPERBLUE NETCARD

-90-80-70-60-50-40-30-20-10

0CBC TBC-1 TBC-2

TNS

(ns)

LEON SUPERBLUE NETCARD0

200400600800

1000120014001600

CBC TBC-1 TBC-2

Tim

ing

viol

ation

s

57ISVLSI-2014 invited talk 140710

Heuristics 1

bull Model BTI degradation with Vfinal throughout lifetime

bull Aging of a flat Vfinal asymp aging of an adaptive Vdd

bull But slightly pessimistic

Vdd

time

NBTI

PBTI

VBTI = Vlib asymp Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

bull Problem Vfinal is not available at early design stage (design has not been implemented)

bull Vfinal = Vdd end of life (to compensate BTI aging)

bull Gates along critical pathbull Timing slack at t = 0

bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

bull Observation 2 Vfinal is not sensitive to gate types

bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack

bull Assume timing slack = 0

10mV

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC

Supply voltages

Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105

Vmax105V

Vinit090V

Vheur1 (DC) 097V

Vheur1 (AC) 095V

Vheur2 (DC) 095V

Vheur2 (AC) 093V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep VBTI, Vlib and Vdd consistent throughout the circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step
Vfinal: converged voltage
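The reference flow above can be sketched as a simple loop. This is only an illustrative toy, not the flow used in the talk: `toy_aging` (a power-law ΔVth model evaluated at the current Vdd), `toy_delay` (an alpha-power-law-like delay model) and every constant below are assumptions made up for the example. In a real flow, each delay evaluation would be an STA run against a library characterized at the current voltage and aging state.

```python
def toy_aging(vdd, t_years):
    """Hypothetical BTI model: |dVth| in volts after t_years at supply vdd."""
    return 0.05 * vdd * t_years ** (1.0 / 6.0)


def toy_delay(vdd, dvth, vth0=0.3):
    """Hypothetical critical-path delay (arbitrary units) for a given Vdd and Vth shift."""
    return vdd / (vdd - vth0 - dvth) ** 1.3


def reference_signoff(v_init, v_max, v_step, t_final, target_delay, n_steps=10):
    """Walk through the lifetime and raise Vdd by v_step whenever timing fails.

    Returns the converged Vfinal, or None if the required Vdd exceeds v_max
    (the case where the circuit must be modified and the flow restarted).
    """
    vdd = v_init
    for k in range(1, n_steps + 1):
        t = t_final * k / n_steps
        # Re-evaluate aging and timing until the path meets the target again.
        while toy_delay(vdd, toy_aging(vdd, t)) > target_delay:
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None
    return vdd


# Target: 5% slower than the fresh circuit at Vinit (an arbitrary choice).
target = 1.05 * toy_delay(0.90, 0.0)
print(reference_signoff(0.90, 1.05, 0.01, 10.0, target))
```

Each pass through the inner loop stands for one timing analysis, which is why a flow of this shape becomes slow when the time steps and voltage steps are fine.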

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case      1      2      3      4       5    6       7
Vlib (V)  Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
VBTI (V)  Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2

63ISVLSI-2014 invited talk 140710

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop between circuit, Vfinal, Vlib / VBTI and derated libraries]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| versus time over alternating ON / OFF phases [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time [TCAS'14]

65ISVLSI-2014 invited talk 140710

Observation 1

• BTI is a "front-loaded" phenomenon [Chan11]
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd versus time approaching Vfinal; ≈70% of the Vdd increment occurs in the first year (remaining 30% over 9 years)]
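A quick calculation shows why a front-loaded profile is plausible. The sketch below assumes a power-law aging model, |ΔVth| ∝ t^n, which is a common approximation for BTI but is not stated on the slide; the exponent values are assumptions chosen only to bracket the first-year fractions quoted above.

```python
def aging_fraction(t_years, lifetime_years, n):
    """Fraction of end-of-life |dVth| reached at t_years, assuming |dVth| ~ t**n."""
    return (t_years / lifetime_years) ** n


# Assumed exponents: 1/6 and 0.3 (illustrative values only).
for n in (1.0 / 6.0, 0.3):
    frac = aging_fraction(1.0, 10.0, n)
    print(f"n = {n:.3f}: {frac:.0%} of 10-year aging occurs in year 1")
```

Under this assumed model, any exponent well below 1 puts most of the degradation early in life, so most of the AVS voltage compensation is also spent early.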

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

Signoff setups shown in the figure:
1. VBTI = Vlib = Vinit: ignores AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: does not overestimate aging but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                           Vlib (for circuit performance estimation)
VBTI (for aging            Min Vdd                             Max Vdd
estimation)
Min Vdd                    Slowest circuit, less aging         Not applicable (optimistic)
Max Vdd                    Slowest circuit, worst-case aging   Faster circuit, worst-case aging
                           (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) versus area (μm²) for AES implementations 1 to 8; series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); annotated regions: optimistic about AVS, pessimistic about AVS]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

[Figure: lifetime (years) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200 mV voltage compensation]
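The MTTF update rule above can be applied step by step along an AVS voltage schedule. The sketch below is a direct transcription of that rule; the function names, the starting MTTF and the voltage schedule are hypothetical values chosen for illustration.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of the update: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf_initial, vdd_schedule):
    """Apply the update across consecutive voltages of an AVS schedule."""
    mttf = mttf_initial
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_new)
    return mttf


# Hypothetical schedule: 200 mV of total compensation in two steps.
schedule = [0.90, 1.00, 1.10]
final = mttf_after_schedule(10.0, schedule)
print(f"MTTF penalty: {1.0 - final / 10.0:.0%}")
```

Because the per-step ratios telescope, the result depends only on the first and last voltages of the schedule under this rule.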

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD (0.90 V to 1.04 V) versus year (0 to 12) for AVS schedules S1 to S5, and MTTF (years) for S1 to S5, on the DMA design; annotation: "12 years MTTF penalty"]

71ISVLSI-2014 invited talk 140710

What is "Signoff"

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

Problem: margins = pessimism, overdesign, schedule delay

[Figure: "margin stack" for voltage signoff along a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd; operating voltage marked]

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow chart: 28 itf files (27 variation sources + Ytyp) and routed netlist, then RC extraction, then STA, then Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλᵀ (Note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
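The two statistical timing slides can be combined into a short calculation: per-source sensitivities from corner delays, a Cholesky factor of the correlation matrix, and the resulting path-delay standard deviation. The sketch below uses pure Python with a textbook Cholesky routine; the function names and the two-source example data are assumptions for illustration, whereas the talk uses 27 variation sources.

```python
import math


def cholesky(a):
    """Lower-triangular L with L * L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def path_sigma(corner_delays, typ_delay, corr):
    """Sigma of one path's delay from its delays at each source's 3-sigma corner."""
    # Sensitivity per sigma: (d(Yv) - d(Ytyp)) / 3
    sens = [(d - typ_delay) / 3.0 for d in corner_delays]
    lam = cholesky(corr)
    n = len(sens)
    # Row vector times lambda: correlated sensitivities
    sens_corr = [sum(sens[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in sens_corr))


# Two hypothetical variation sources with correlation 0.5.
print(path_sigma([103.0, 103.0], 100.0, [[1.0, 0.5], [0.5, 1.0]]))
```

The result equals the square root of s Σ sᵀ for the sensitivity row vector s, so positive correlation between sources increases σj relative to the independent case.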

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) versus supply voltage (0.84 V to 1.00 V) for conventional design and resilient design; annotation: 15% reduction]

Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy [Kahng10]:

$$Energy = \frac{Power}{Throughput}$$

  where Throughput is determined by the error rate (ER), the clock period (T) and the number of recovery cycles (r)

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches

• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor and design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

[Figure axis: Power]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four scaling trends, each contrasting ideal scaling with observed non-ideality. DENSITY: transistor count [M] versus MPU release date, source [CPUDB]. POWER: dynamic power (W), 2009 to 2024, source [JeongK08]. DRIVE CURRENT: μA/μm for extended planar bulk, UTB FD, DG and ideal scaling, 2006 to 2016, source [ITRS]. SUPPLY VOLTAGE: volts versus MPU release date, source [CPUDB]]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of an AVS strategy

[Figure: energy (mJ) versus supply voltage (V) for brute-force, pure-margin and CombOpt on MUL and EXU; minimum achievable energy marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage versus frequency plane with modes A, B and C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
• Avoid circuit implementation at each mode
• A small, constant number of runs is enough: scalable

[Flow chart: global optimization flow. Sample (SP&R), construct power models, estimate optimal signoff modes, sample (SP&R), refine power models; adaptive search]

[Figure: power estimation of adaptive search; power (mW) versus signoff voltage (V) for the 1st-iteration model, the 2nd-iteration model and real implementations. Design: AES, f = 700 MHz]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first and second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: RO stages, each with a 1-bit control pin selecting high or low series resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: stacked energy (mJ) bars for PM, BF and CO at small, medium and large margin, for MUL and EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) versus supply voltage (V) for brute-force, pure-margin and CombOpt on MUL and EXU; minimum achievable energy marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flow chart elements: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; check "energy < min energy" and save current solution]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • "Chicken and Egg" Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is "Signoff"
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)
Page 41: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

41ISVLSI-2014 invited talk 140710

Backup

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) for implementations 1 to 9; annotations: highest invested guardband, least invested guardband, 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• Power penalty trades off against the guardband invested at signoff

44ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2: per-layer C-3σ corners for layer M1 and layer M2, the homogeneous Cw corner, the stack-level 3σ, and the resulting pessimism]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
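The growth of pessimism with layer count can be illustrated numerically. The sketch below assumes that the stack-level quantity is a plain sum of per-layer contributions and that every pair of layers has the same correlation γ; both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
import math


def homogeneous_corner(sigmas):
    """Homogeneous corner: every layer is pushed to its own 3-sigma value at once."""
    return sum(3.0 * s for s in sigmas)


def statistical_3sigma(sigmas, gamma=0.0):
    """3-sigma of the summed variation with pairwise correlation gamma between layers."""
    var = sum(s * s for s in sigmas)
    for i in range(len(sigmas)):
        for j in range(i + 1, len(sigmas)):
            var += 2.0 * gamma * sigmas[i] * sigmas[j]
    return 3.0 * math.sqrt(var)


def pessimism_ratio(n_layers, gamma=0.0):
    """Homogeneous corner divided by statistical 3-sigma, for equal-sigma layers."""
    sigmas = [1.0] * n_layers
    return homogeneous_corner(sigmas) / statistical_3sigma(sigmas, gamma)


for n in (1, 2, 4, 9):
    print(n, round(pessimism_ratio(n, gamma=0.0), 3), round(pessimism_ratio(n, gamma=0.5), 3))
```

With fully independent layers the ratio grows as the square root of the number of layers under these assumptions, and it falls back to 1 only when the layers are fully correlated.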

45ISVLSI-2014 invited talk 140710

Correlation Matrix

• Let Σ be the correlation matrix for variation sources

Σ =
            M1            M2            M3            M4
            ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH
M1   ΔW     1   0   0     γ   0   0     γ   0   0     0   0   0
     ΔT     0   1   0     0   γ   0     0   γ   0     0   0   0
     ΔH     0   0   1     0   0   γ     0   0   γ     0   0   0
M2   ΔW     γ   0   0     1   0   0     γ   0   0     0   0   0
     ΔT     0   γ   0     0   1   0     0   γ   0     0   0   0
     ΔH     0   0   γ     0   0   1     0   0   γ     0   0   0
M3   ΔW     γ   0   0     γ   0   0     1   0   0     0   0   0
     ΔT     0   γ   0     0   γ   0     0   1   0     0   0   0
     ΔH     0   0   γ     0   0   γ     0   0   1     0   0   0
M4   ΔW     0   0   0     0   0   0     0   0   0     1   0   0
     ΔT     0   0   0     0   0   0     0   0   0     0   1   0
     ΔH     0   0   0     0   0   0     0   0   0     0   0   1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
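A matrix with this block structure can be generated from two pieces of information: which process module each layer belongs to, and the variation types per layer. The sketch below reproduces the 12 × 12 pattern shown above; the function name and the module labels "A" and "B" are hypothetical, and the grouping of M1 to M3 into one module with M4 in another is read off the matrix itself.

```python
def build_correlation(layers, module_of, kinds=("W", "T", "H"), gamma=0.5):
    """Return (names, corr): source names and their correlation matrix.

    Two different sources are correlated with factor gamma only if they have
    the same variation type and their layers are in the same process module.
    """
    names = [(layer, kind) for layer in layers for kind in kinds]
    n = len(names)
    corr = [[0.0] * n for _ in range(n)]
    for a, (layer_a, kind_a) in enumerate(names):
        for b, (layer_b, kind_b) in enumerate(names):
            if a == b:
                corr[a][b] = 1.0
            elif kind_a == kind_b and module_of[layer_a] == module_of[layer_b]:
                corr[a][b] = gamma
    return names, corr


layers = ["M1", "M2", "M3", "M4"]
module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}  # hypothetical labels
names, corr = build_correlation(layers, module_of)
for name, row in zip(names, corr):
    print(name, row)
```

Setting `gamma=0.0` gives the identity matrix, which corresponds to the fully independent statistical model.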

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation

[Figure: cumulative probability versus max wirelength ratio across all layers (%); cumulative probability is 0.92 at a ratio of 60%]

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: α plotted against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Figure annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

[Figure: interconnect stack with M1 and M2 (M1 C, M2 C); non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: 3σj / dj(Ytyp) × 100 versus Δdj(Yrcw) / dj(Ytyp) × 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield?

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure axis: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flow chart: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; paths split into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. Done]
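The path-grouping step in this flow can be sketched as a small partition function. The slide only says that a path goes into Gtbc when its Δdelay exceeds a threshold; the specific rule used here (compare the Δdelay at the path's dominant corner against a per-corner percentage threshold) is an assumption, as are the function names, threshold values and example delays.

```python
def relative_delta(d_corner, d_typ):
    """Delay increase at a corner as a percentage of the typical-corner delay."""
    return 100.0 * (d_corner - d_typ) / d_typ


def split_paths(paths, a_cw, a_rcw):
    """Partition paths into (g_tbc, g_cbc).

    paths: iterable of (name, d_typ, d_cw, d_rcw) path delays.
    a_cw, a_rcw: hypothetical percentage thresholds for the C-worst and
    RC-worst corners. Assumed rule: test the dominant corner's delta.
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        rel_cw = relative_delta(d_cw, d_typ)
        rel_rcw = relative_delta(d_rcw, d_typ)
        if rel_cw >= rel_rcw:
            use_tbc = rel_cw > a_cw
        else:
            use_tbc = rel_rcw > a_rcw
        (g_tbc if use_tbc else g_cbc).append(name)
    return g_tbc, g_cbc


paths = [
    ("p1", 1.00, 1.02, 1.10),  # RC-dominated, large variation
    ("p2", 1.00, 1.06, 1.03),  # C-dominated, large variation
    ("p3", 1.00, 1.01, 1.02),  # small variation at both corners
]
print(split_paths(paths, a_cw=4.0, a_rcw=7.0))
```

Paths left in the CBC group keep the conventional corners, which matches the observation elsewhere in the talk that paths with small Δdelay tend to have large α and so are not safe to tighten.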

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                     LEON3MP  NETCARD  SUPERBLUE12
Clock period (ns)    1.8      2.0      3.1
Gate count           232K     575K     1031K
Utilization (%)      84       79       82
Core area (mm²)      0.45     1.04     1.91
Max transition (ps)  330      330      330

Correlation factor = 0.5

Corner   α    Acw (%)  Arcw (%)
TBC-0.5  0.5  43       73
TBC-0.6  0.6  33       50
TBC-0.7  0.7  30       34

Implement another NETCARD (clock period = 2.3 ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • First 0.5: delay sensitivity to unit change in M1 resistance
  • Second 0.5: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
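The cancellation in this example can be checked with a few lines of arithmetic. The sketch below assumes that the C-worst corner places M1 capacitance at +3σ and M1 resistance at −3σ, that the two sources are independent, and that sensitivities are expressed per σ; these are assumptions for illustration, with the 0.5 sensitivities taken from the example above.

```python
import math


def corner_delta(sens_r, sens_c, r_sigmas, c_sigmas):
    """Delay change at a corner where R and C are shifted by the given sigma counts."""
    return sens_r * r_sigmas + sens_c * c_sigmas


def sigma_independent(sens_r, sens_c):
    """Standard deviation of path delay for independent R and C variation."""
    return math.sqrt(sens_r ** 2 + sens_c ** 2)


sens_r, sens_c = 0.5, 0.5
# Assumed C-worst corner: resistance at -3 sigma, capacitance at +3 sigma.
delta_cw = corner_delta(sens_r, sens_c, r_sigmas=-3.0, c_sigmas=3.0)
sigma_j = sigma_independent(sens_r, sens_c)
print(f"delta at C-worst = {delta_cw:.3f}, sigma_j = {sigma_j:.3f}")
```

The corner sees no net delay change while the statistical σj is clearly nonzero, which is why the ratio between statistical and corner variation becomes large for such paths.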

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor α for LEON3MP, SUPERBLUE12 and NETCARD, with α > 0.5 regions marked in each plot]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

bull WNS and TNS are reduced by up to 120ps and 61ns

bull Timing violations reduces by 31 to 100

Correlation factor γ = 0 (variation sources are independent)

LEON SUPERBLUE NETCARD

-0180-0160-0140-0120-0100-0080-0060-0040-002000000020

CBC TBC-1 TBC-2

WN

S (n

s)

LEON SUPERBLUE NETCARD

-90-80-70-60-50-40-30-20-10

0CBC TBC-1 TBC-2

TNS

(ns)

LEON SUPERBLUE NETCARD0

200400600800

1000120014001600

CBC TBC-1 TBC-2

Tim

ing

viol

ation

s

57ISVLSI-2014 invited talk 140710

Heuristics 1

bull Model BTI degradation with Vfinal throughout lifetime

bull Aging of a flat Vfinal asymp aging of an adaptive Vdd

bull But slightly pessimistic

Vdd

time

NBTI

PBTI

VBTI = Vlib asymp Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

bull Problem Vfinal is not available at early design stage (design has not been implemented)

bull Vfinal = Vdd end of life (to compensate BTI aging)

bull Gates along critical pathbull Timing slack at t = 0

bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

bull Observation 2 Vfinal is not sensitive to gate types

bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack

bull Assume timing slack = 0

10mV

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC

Supply voltages

Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105

Vmax105V

Vinit090V

Vheur1 (DC) 097V

Vheur1 (AC) 095V

Vheur2 (DC) 095V

Vheur2 (AC) 093V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime

bull Signoff flowbull Estimate aging at each time step

bull Update circuit timing and Vdd

bull Repeat until t = tfinal

bull Modify circuit and start over if Vfinal gt maximum allowed voltage

bull No overhead in timing analysis but very slow Many STA runs

and library

Vstep AVS voltage stepVfinal converged voltage

62ISVLSI-2014 invited talk 140710

Experiment Setupbull Characterize different derated libraries

bull Evaluate impact of library characterizationbull Seven setups

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2

VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2

63ISVLSI-2014 invited talk 140710

ldquoChicken and Eggrdquo Loop

bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation

bull Timing constraints critical paths etc

bull Circuit is affected by library characterization

Circuit

Derated Libraries

Vfinal

Vlib VBTI

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS

|Vgs|

time

ON OFF ON OFF

[VattikondaWC06]

Device aging (|ΔVth|) accumulates over time

[TCASrsquo14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

bull BTI is a ldquofront-loadedrdquo phenomenon

bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)

bull Most Vdd increment happens in early lifetime

bull Gap between Vdd and Vfinal reduces rapidly

asymp70 Vdd increment in 1 year(remaining 30 over 9 years)

Vfinal

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner bull AVS increases supply voltage

aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired

supply voltage gt Vmax

Pessimistic signoff corner bull Ovestimate aging andor

underestimate circuit performance

bull Large area overhead

Good corners

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner

for AVS4 Vbti = Vfinal Do not overestimate

aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

bull Timing signoff ensure circuit meets performance target under PVT variations amp aging

bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis

bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis

Vlib for circuit performance estimation

Min Vdd Max Vdd

VBTI for aging

estimation

MinVdd

Not applicable (Optimistic)

Max Vdd

Slowest circuitLess aging

Faster circuitWorst-case aging

Slowest circuit Worst-case aging

Too pessimistic

With BTI aging and AVS the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

10000 12000 14000 16000 18000 20000 2200020

22

24

26

28

30

32

44

4

888

7776

66

555

3

33

2

22

11

1

Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)

Area (μm2)

Pow

er (m

W)

AES

Optimistic about AVS

Pessimistic about AVS

>

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

1 2 3 4 5 6 7 8 90

2

4

6

8

10

12

08

09

1

11

12Lifetime (year)

Implementation

Life

time

(yea

r)

Vfina

l (V)

Vfinal (V)

119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )

2

bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as

30 MTTF penalty

200mV voltage compensation

>

70ISVLSI-2014 invited talk 140710

0 2 4 6 8 10 12090092094096098100102104

S1 S2 S3 S4 S5

Year

VDD

DMA 3S1 S2 S3 S4 S5

78

79

80

81

MTT

F (Y

ear)

EM Impact on AVS Scheduling

12 years MTTF penalty

>

71ISVLSI-2014 invited talk 140710

What is ldquoSignoffrdquo

bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip

Nominal VddStatic IR drop

Power grid IR gradientDynamic IR

HCINBTI

Signoff Vdd

Voltage

Problem Margins = pessimism

overdesign schedule delay

ldquomargin stackrdquo for voltage signoff

Operating voltage

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

bull Delay sensitivity of path pj to variation source zv

bull Assumptions bull Δdjv is linear with respect to variation sources

bull Variation sources are normal distributions

bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

28 itf files (27 variation

sources + Ytyp)

Routed Netlist

RC extraction

STA

Δdjv

Δdjv = [ - ] 3dj(Yv) dj(Ytyp)

Note Path delay includes gate and wire delays

Page 42: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

42ISVLSI-2014 invited talk 140710

Power Penalty to Fix EM with AVS

[Figure: core power (mW) and PG power (mW) vs. implementation (1 to 9); the implementations with the least and the highest invested guardband are marked; annotation: 14% power penalty]

• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff and the resulting power penalty

43ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner. Capacitance distributions of layer M1 and layer M2, each marked at C-3σ with its 3σ pessimism, for an interconnect stack with M1 and M2; together they form the homogeneous Cw corner]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners

• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner. Capacitance distributions of layer M1 and layer M2, each marked at C-3σ, for an interconnect stack with M1 and M2; the homogeneous Cw corner and its pessimism are marked]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers

45ISVLSI-2014 invited talk 140710

Correlation Matrix

• Let Σ be the correlation matrix for variation sources

Σ =
           M1          M2          M3          M4
           ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
  M1 ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
     ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
     ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
  M2 ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
     ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
     ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
  M3 ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
     ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
     ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
  M4 ΔW    0  0  0     0  0  0     0  0  0     1  0  0
     ΔT    0  0  0     0  0  0     0  0  0     0  1  0
     ΔH    0  0  0     0  0  0     0  0  0     0  0  1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
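The matrix on this slide can be generated from two rules: same variation type in the same process module gives γ, everything else off the diagonal gives 0. The sketch below is only an illustration. The function name and the `MODULE` grouping (M1 to M3 in one module, M4 in another) are assumptions read off the table, not something the talk states as code.

```python
LAYERS = ["M1", "M2", "M3", "M4"]
TYPES = ["dW", "dT", "dH"]  # the three variation types in the table
# Assumed grouping, inferred from the table: M1-M3 share a module, M4 is separate.
MODULE = {"M1": 0, "M2": 0, "M3": 0, "M4": 1}


def build_correlation_matrix(gamma=0.5):
    """Return (sources, matrix) in the same row/column order as the table."""
    sources = [(layer, vtype) for layer in LAYERS for vtype in TYPES]
    matrix = []
    for layer_a, type_a in sources:
        row = []
        for layer_b, type_b in sources:
            if (layer_a, type_a) == (layer_b, type_b):
                row.append(1.0)    # a source is fully correlated with itself
            elif type_a == type_b and MODULE[layer_a] == MODULE[layer_b]:
                row.append(gamma)  # same type, same process module
            else:
                row.append(0.0)    # different type or different module
        matrix.append(row)
    return sources, matrix


sources, sigma = build_correlation_matrix(0.5)
```

For the 27 variation sources used in the talk, only `LAYERS`, `TYPES` and `MODULE` would change; the two rules stay the same.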

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the point (60, 0.92) is marked]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Figure annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

48ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Figure annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2, with capacitance distributions of M1 and M2. Non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure axes: Δdj(Yrcw) / dj(Ytyp) × 100 and 3σj / d(Ytyp) × 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield?

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure axis: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC. GTBC: timing analysis using TBC; if violation = 0, done, otherwise ECO using TBC. GCBC: timing analysis using CBC; if violation = 0, done, otherwise ECO using CBC]
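The grouping step in this flow can be sketched in a few lines. This is a hedged illustration, not the talk's implementation: the function name, the input layout, and the use of a single relative threshold at the dominant corner are assumptions (the experiment setup lists separate Acw and Arcw thresholds).

```python
def group_paths(paths, threshold):
    """Split paths into (g_tbc, g_cbc).

    paths: dict mapping path name -> (d_typ, d_cw, d_rcw), the path delay at
           the typical, C-worst and RC-worst BEOL corners.
    threshold: relative delay change above which a path goes to G_tbc
               (hypothetical single threshold, as a fraction of d_typ).
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        # Relative delta-delay at the dominant conventional corner.
        delta = max(d_cw - d_typ, d_rcw - d_typ) / d_typ
        if delta > threshold:
            g_tbc.append(name)   # signed off with tightened BEOL corners
        else:
            g_cbc.append(name)   # signed off with conventional BEOL corners
    return g_tbc, g_cbc


# Made-up delays in ns, for illustration only.
example_paths = {
    "p1": (1.00, 1.08, 1.03),
    "p2": (1.00, 1.01, 1.02),
}
g_tbc, g_cbc = group_paths(example_paths, threshold=0.05)
```

Each group would then go through its own timing-analysis and ECO loop, as in the flowchart.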

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity):

                        LEON3MP   NETCARD   SUPERBLUE12
  Clock period (ns)     1.8       2.0       3.1
  Gate count            232K      575K      1031K
  Utilization (%)       84        79        82
  Core area (mm2)       0.45      1.04      1.91
  Max transition (ps)   330       330       330

Tightened corners (correlation factor = 0.5):

            α     Acw (%)   Arcw (%)
  TBC-0.5   0.5   43        73
  TBC-0.6   0.6   33        50
  TBC-0.7   0.7   30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays → the path is equally sensitive to R and C
• Example: $d_j = d_j(Y_{typ}) + 0.5\,\Delta d_{R\text{-}M1} + 0.5\,\Delta d_{C\text{-}M1}$
  • dj(Ytyp): nominal delay
  • First 0.5: delay sensitivity to unit change in M1 resistance
  • Second 0.5: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large → the delay variations of ΔdR-M1 and ΔdC-M1 cancel out → Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor α for LEON3MP, SUPERBLUE12 and NETCARD; the region α > 0.5 is marked in each panel]

• Similar trends in different designs
• Large α when Δd(Yrcw) / d(Ytyp) and Δd(Ycw) / d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: three charts comparing CBC, TBC-1 and TBC-2 on LEON, SUPERBLUE and NETCARD: WNS (ns), TNS (ns), and number of timing violations]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

  Circuit   Frequency (GHz)
  C5315     1.38
  c7552     1.25
  AES       0.89
  MPEG2     1.05

Supply voltages:

  Vmax          1.05V
  Vinit         0.90V
  Vheur1 (DC)   0.97V
  Vheur1 (AC)   0.95V
  Vheur2 (DC)   0.95V
  Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step
Vfinal: converged voltage
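The loop structure of this reference flow can be sketched as follows. Everything numeric here is a toy assumption for illustration: the aging model `delta_vth`, the delay model `delay`, their constants, and the function name are hypothetical stand-ins for library characterization and STA, not models from the talk.

```python
def reference_signoff(v_init, v_max, v_step, t_final_years, steps, target_delay):
    """Return the converged Vfinal, or None if the circuit must be modified."""

    def delta_vth(vdd, years):
        # Hypothetical power-law aging model (toy constants).
        return 0.03 * vdd * years ** (1.0 / 6.0)

    def delay(vdd, dvth):
        # Hypothetical alpha-power-law style delay model, arbitrary units.
        return vdd / (vdd - 0.3 - dvth) ** 1.3

    vdd = v_init
    for step in range(1, steps + 1):
        years = t_final_years * step / steps
        dvth = delta_vth(vdd, years)            # estimate aging at this time step
        while delay(vdd, dvth) > target_delay:  # timing fails: AVS raises Vdd
            vdd += v_step
            if vdd > v_max:
                return None                     # modify circuit and start over
            dvth = delta_vth(vdd, years)        # aging depends on the new Vdd
    return vdd                                  # Vfinal at t = tfinal


v_final = reference_signoff(0.90, 1.05, 0.01, 10.0, 10, 1.75)
v_fail = reference_signoff(0.90, 0.92, 0.01, 10.0, 10, 1.75)
```

In a real flow each pass through the inner loop would be an STA run against a re-derated library, which is why the slide calls the reference flow very slow.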

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
  • Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

  Case       1      2      3     4       5    6        7
  Vlib (V)   Vinit  Vinit  Vmax  Vinit   NA   Vheur1   Vheur2
  VBTI (V)   Vinit  Vmax   Vmax  Vfinal  NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop between circuit, Vfinal, Vlib / VBTI, and derated libraries]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time with alternating ON / OFF phases; device aging (|ΔVth|) accumulates over time. Sources: [VattikondaWC06], [TCAS'14]]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure annotation: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years); Vfinal is marked]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

[Figure annotation: Good corners]

Signoff cases:
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                                  Vlib for circuit performance estimation
                                  Min Vdd                            Max Vdd
  VBTI for aging     Min Vdd      Slowest circuit, less aging        Not applicable (optimistic)
  estimation         Max Vdd      Slowest circuit, worst-case        Faster circuit, worst-case aging
                                  aging: too pessimistic

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES; data points are labeled 1 to 8; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); annotations: Optimistic about AVS, Pessimistic about AVS]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) vs. implementation (1 to 9)]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^2$
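The MTTF update on this slide is a simple per-step scaling by the squared voltage ratio. The sketch below applies it along a Vdd schedule; the function names and the example schedule and initial MTTF are made-up illustration values, not data from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))^2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_over_schedule(mttf_initial, vdd_schedule):
    """Apply the update along an AVS voltage schedule; return MTTF per step."""
    history = [mttf_initial]
    for vdd_prev, vdd_now in zip(vdd_schedule, vdd_schedule[1:]):
        history.append(update_mttf(history[-1], vdd_prev, vdd_now))
    return history


# Hypothetical schedule: Vdd raised from 0.90 V to 1.10 V over the lifetime.
history = mttf_over_schedule(10.0, [0.90, 0.95, 1.00, 1.10])
```

Because the ratios telescope, the final MTTF depends only on the first and last voltage: with these made-up numbers, a 200mV increase from 0.90 V to 1.10 V scales MTTF by (0.90 / 1.10)^2 ≈ 0.67.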

[Figure annotations: 30% MTTF penalty; 200mV voltage compensation]

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year (0 to 12) for scenarios S1 to S5 on design DMA, and MTTF (year) for S1 to S5; annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
  • "Chip should work": stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff, between the operating voltage and the signoff Vdd: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI / NBTI]

Problem: margins = pessimism → overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

$\Delta d_{jv} = [d_j(Y_v) - d_j(Y_{typ})] / 3$

Note: Path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^T$ (Note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$[\Delta d'_{j1} \ldots \Delta d'_{j27}] = [\Delta d_{j1} \ldots \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:

$\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
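The two formulas above can be checked on a small example. The sketch below uses a hand-written Cholesky factorization so that it needs only the standard library; the function names and the 3-source toy numbers are assumptions for illustration (the talk uses 27 sources).

```python
import math


def cholesky(sigma):
    """Return lower-triangular lam such that sigma = lam * lam^T."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            partial = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - partial)
            else:
                lam[i][k] = (sigma[i][k] - partial) / lam[k][k]
    return lam


def path_sigma(delta_d, sigma):
    """Standard deviation of one path's delay.

    delta_d: per-source delay sensitivities [dd_j1, ..., dd_jn].
    sigma:   n x n correlation matrix of the variation sources.
    """
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times lam: dd'_jv = sum_u dd_ju * lam[u][v]
    correlated = [sum(delta_d[u] * lam[u][v] for u in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in correlated))


# Toy example: sources 0 and 1 correlated with factor 0.5, source 2 independent.
toy_sigma = [[1.0, 0.5, 0.0],
             [0.5, 1.0, 0.0],
             [0.0, 0.0, 1.0]]
toy_delta_d = [2.0, 1.0, 3.0]  # made-up sensitivities, e.g. in ps
sigma_j = path_sigma(toy_delta_d, toy_sigma)
```

The result equals the square root of Δd Σ Δdᵀ, which for the toy numbers is sqrt(4 + 1 + 9 + 2 · 0.5 · 2 · 1) = 4.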

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]

Conventional design: worst-case signoff → no Vdd downscaling
Resilient design: typical-case signoff → Vdd downscaling → reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$Energy = Power / Throughput$

Throughput is a function of the error rate ER [Kahng10], the number of recovery cycles r, and the clock period T

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

[Figure: AVS approaches and AVS classes, arranged along a power axis]

• Open-Loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-Loop AVS: process and temperature-aware AVS
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error Tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels, each comparing ideal scaling with the observed non-ideality]
• DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: Extended Planar Bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: supply voltage (Volt) vs. MPU release date. Source: [CPUDB]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, one panel each for MUL and EXU; the minimum achievable energy is marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design / overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: frequency vs. voltage plane with modes A, B and C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone → reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough → scalable

[Flowchart: global optimization flow. Sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) → refine power models]

[Figure: power estimation of adaptive search. Power (mW) vs. signoff voltage (V); series: 1st, 2nd, real. Design: AES, f = 700MHz]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
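The sample-then-refine idea can be illustrated with a one-dimensional search over signoff voltage. This is a sketch under stated assumptions: a parabola through the three lowest-power samples stands in for the talk's power models (which are not specified here), and `toy_power` is a made-up function standing in for a full SP&R run.

```python
def fit_parabola(p0, p1, p2):
    """Return (a, b) of y = a*x^2 + b*x + c through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    denom = (x0 - x1) * (x0 - x2) * (x1 - x2)
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / denom
    b = (x2 * x2 * (y0 - y1) + x1 * x1 * (y2 - y0) + x0 * x0 * (y1 - y2)) / denom
    return a, b


def adaptive_search(measure_power, voltages, iterations=2, tol=1e-3):
    """Sample, fit a local power model, sample again at the predicted optimum."""
    samples = {v: measure_power(v) for v in voltages}  # initial samples
    lo, hi = min(voltages), max(voltages)
    for _ in range(iterations):
        best3 = sorted(samples.items(), key=lambda kv: kv[1])[:3]
        a, b = fit_parabola(*best3)
        if a <= 0:
            break                                      # model has no minimum
        estimate = min(max(-b / (2 * a), lo), hi)      # clamp to sampled range
        if any(abs(estimate - v) < tol for v in samples):
            break                                      # already sampled there
        samples[estimate] = measure_power(estimate)    # refine with a new sample
    best_v = min(samples, key=samples.get)
    return best_v, samples


def toy_power(v):
    # Made-up power vs. signoff voltage curve with an interior minimum.
    return 12.0 * v * v + 2.0 / (v - 0.7)


best_v, samples = adaptive_search(toy_power, [0.9, 1.0, 1.1, 1.2])
```

Each call to `measure_power` corresponds to one implementation run, so the number of runs is the number of initial samples plus at most one per iteration.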

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

[Figure: classes of closed-loop AVS: generic monitor, design-dependent replica, in-situ monitor]

Limitations of existing classes:
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages, each with a 1-bit control pin selecting high resistance or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF and CO at small, medium and large margin, one panel each for MUL and EXU; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, one panel each for MUL and EXU; the minimum achievable energy is marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)
Page 43: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

43ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2. Axes: M1 C, M2 C. Marked: C-3σ of layer M1, C-3σ of layer M2, 3σ, homogeneous Cw corner, pessimism.]

44ISVLSI-2014 invited talk 140710

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2. Axes: M1 C, M2 C. Marked: C-3σ of layer M1, C-3σ of layer M2, homogeneous Cw corner, pessimism.]

When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
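The growth of pessimism with layer count can be illustrated with a small calculation. This is a minimal sketch under simplifying assumptions that are not from the talk: every layer contributes equally with unit sigma, and a single pairwise correlation `gamma` applies between all layers. The function name is hypothetical.

```python
import math

def homogeneous_pessimism(n_layers: int, gamma: float) -> float:
    """Ratio of the homogeneous 3-sigma corner to the statistical 3-sigma of the stack.

    Assumes (for illustration only) n equally weighted layers, each with unit
    sigma, and one pairwise correlation gamma between any two layers.
    """
    # Homogeneous corner: every layer sits at its own 3-sigma point at once.
    corner = 3.0 * n_layers
    # Statistical sigma of the sum: Var = n + n*(n-1)*gamma.
    sigma_sum = math.sqrt(n_layers + n_layers * (n_layers - 1) * gamma)
    return corner / (3.0 * sigma_sum)

for n in (1, 2, 4, 9):
    print(n, round(homogeneous_pessimism(n, 0.5), 3))
```

With gamma = 1 (fully correlated layers) the ratio stays at 1; for any smaller gamma it grows as layers are added, which is the effect stated above.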

45ISVLSI-2014 invited talk 140710

Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
        |    M1    |    M2    |    M3    |    M4
        | ΔW ΔT ΔH | ΔW ΔT ΔH | ΔW ΔT ΔH | ΔW ΔT ΔH
M1  ΔW  |  1  0  0 |  γ  0  0 |  γ  0  0 |  0  0  0
    ΔT  |  0  1  0 |  0  γ  0 |  0  γ  0 |  0  0  0
    ΔH  |  0  0  1 |  0  0  γ |  0  0  γ |  0  0  0
M2  ΔW  |  γ  0  0 |  1  0  0 |  γ  0  0 |  0  0  0
    ΔT  |  0  γ  0 |  0  1  0 |  0  γ  0 |  0  0  0
    ΔH  |  0  0  γ |  0  0  1 |  0  0  γ |  0  0  0
M3  ΔW  |  γ  0  0 |  γ  0  0 |  1  0  0 |  0  0  0
    ΔT  |  0  γ  0 |  0  γ  0 |  0  1  0 |  0  0  0
    ΔH  |  0  0  γ |  0  0  γ |  0  0  1 |  0  0  0
M4  ΔW  |  0  0  0 |  0  0  0 |  0  0  0 |  1  0  0
    ΔT  |  0  0  0 |  0  0  0 |  0  0  0 |  0  1  0
    ΔH  |  0  0  0 |  0  0  0 |  0  0  0 |  0  0  1

• Correlation between variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
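The structure of Σ can be generated from the two rules stated above. The following is a minimal sketch; the function name and the `modules` encoding (one process-module id per layer) are hypothetical conveniences, not part of the talk.

```python
def build_correlation_matrix(modules, n_types=3, gamma=0.5):
    """Build the (layers x variation types) correlation matrix.

    modules[i] is the process-module id of layer i (hypothetical encoding).
    Two distinct sources are correlated with gamma only if they share both
    the variation type and the process module; otherwise they are independent.
    """
    n = len(modules) * n_types
    sigma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            layer_i, type_i = divmod(i, n_types)
            layer_j, type_j = divmod(j, n_types)
            if i == j:
                sigma[i][j] = 1.0
            elif type_i == type_j and modules[layer_i] == modules[layer_j]:
                sigma[i][j] = gamma
    return sigma

# M1, M2, M3 share one process module; M4 belongs to another.
SIGMA = build_correlation_matrix(modules=[0, 0, 0, 1])
for row in SIGMA:
    print(" ".join(f"{x:.1f}" for x in row))
```

Index 0..2 corresponds to ΔW, ΔT, ΔH of M1, index 3..5 to M2, and so on, matching the row and column order of the matrix.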

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of their wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the point (60, 0.92) is marked.]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

[Figure: two panels plotting α against Δdelay at C-worst and against Δdelay at RC-worst.
Δdelay at C-worst = [d(Ycw) - d(Ytyp)] / d(Ytyp); Δdelay at RC-worst = [d(Yrcw) - d(Ytyp)] / d(Ytyp).]

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Figure annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

48ISVLSI-2014 invited talk 140710

Delay Variation


• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
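Picking the dominant corner per path and evaluating α there can be sketched as below. The form of α used here, 3σj divided by the delay increase at the corner, is an assumption made for illustration (the talk defines α on an earlier slide); function names and the numbers are hypothetical.

```python
def dominant_corner(d_typ, d_cw, d_rcw):
    """Return (corner name, relative delta-delay) for the dominant BEOL corner."""
    delta_cw = (d_cw - d_typ) / d_typ
    delta_rcw = (d_rcw - d_typ) / d_typ
    if delta_cw > delta_rcw:
        return "C-worst", delta_cw
    return "RC-worst", delta_rcw

def alpha_at_dominant_corner(d_typ, d_cw, d_rcw, sigma):
    """Assumed definition: alpha = (3*sigma / d_typ) / delta-delay at the dominant corner."""
    _, delta = dominant_corner(d_typ, d_cw, d_rcw)
    return (3.0 * sigma / d_typ) / delta

# Hypothetical path: 1.00 ns typical, 1.04 ns at C-worst, 1.10 ns at RC-worst.
corner, delta = dominant_corner(1.00, 1.04, 1.10)
alpha = alpha_at_dominant_corner(1.00, 1.04, 1.10, sigma=0.015)
print(corner, round(delta, 3), round(alpha, 3))
```

Under this assumed definition, α below 1.0 means the dominant corner already covers the path's 3σ delay variation.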

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2; axes M1 C and M2 C; non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in the itf with α = 0.5

[Figure: 3σj / d(Ytyp) × 100 plotted against Δdj(Yrcw) / dj(Ytyp) × 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield?

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%) of critical paths]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: Routed design → timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) → paths split into groups GTBC and GCBC → timing analysis using TBC for GTBC and using CBC for GCBC → ECO using TBC or CBC, respectively, until violation = 0 → done]
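The path-grouping step of the flow can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, the thresholds are passed as fractions, and treating a path as eligible when it exceeds the threshold at either corner is an assumption, since the slide only says "larger than a threshold".

```python
def split_paths(paths, a_cw, a_rcw):
    """Split paths into G_tbc and G_cbc by relative delta-delay thresholds.

    paths: dict mapping path name -> (d_typ, d_cw, d_rcw).
    a_cw, a_rcw: fractional thresholds for the C-worst and RC-worst corners
    (hypothetical values; they play the role of Acw and Arcw).
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        delta_cw = (d_cw - d_typ) / d_typ
        delta_rcw = (d_rcw - d_typ) / d_typ
        # Assumption: exceeding either threshold makes the path eligible for TBC.
        if delta_cw > a_cw or delta_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc

example_paths = {
    "p1": (1.0, 1.05, 1.02),
    "p2": (1.0, 1.01, 1.02),
    "p3": (2.0, 2.02, 2.30),
}
g_tbc, g_cbc = split_paths(example_paths, a_cw=0.04, a_rcw=0.07)
print(g_tbc, g_cbc)
```

Paths in `g_tbc` would then be analyzed and fixed with tightened corners, while the rest stay on conventional corners.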

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw (correlation factor = 0.5)

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

Statistical models: (1) no correlation, and (2) variation sources of the same kind in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • Terms, in order: nominal delay; delay sensitivity to unit change in M1 resistance; delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations due to ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: panels for LEON3MP, NETCARD and SUPERBLUE12, each marking the region α > 0.5]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis but very slow: many STA runs and library

Vstep: AVS voltage step
Vfinal: converged voltage
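The loop structure of the reference signoff flow can be sketched as below. This is a toy illustration, not the flow's implementation: `delay_fn` and `aging_fn` are hypothetical stand-ins for STA and a BTI model, and the toy models and constants are arbitrary, not calibrated to any technology.

```python
def reference_signoff(t_final, dt, v_init, v_step, v_max, clock_period,
                      delay_fn, aging_fn):
    """Step through lifetime; raise Vdd by v_step whenever timing fails.

    delay_fn(vdd, dvth) and aging_fn(t, vdd) are caller-supplied models.
    Returns the Vdd schedule as (time, vdd) pairs, or None if the required
    Vdd would exceed v_max (circuit must be modified and the flow restarted).
    """
    vdd, t, schedule = v_init, 0.0, []
    while t <= t_final:
        dvth = aging_fn(t, vdd)              # estimate aging at this time step
        while delay_fn(vdd, dvth) > clock_period:
            vdd = round(vdd + v_step, 6)     # AVS raises the supply one step
            if vdd > v_max:
                return None
            dvth = aging_fn(t, vdd)          # aging depends on the new Vdd
        schedule.append((t, vdd))
        t += dt
    return schedule

# Toy models, for illustration only.
def toy_aging(t_years, vdd):
    return 0.03 * vdd * (t_years ** 0.2)     # threshold-voltage shift (V)

def toy_delay(vdd, dvth):
    return 0.5 / (vdd - 0.30 - dvth)         # arbitrary delay units

sched = reference_signoff(10.0, 1.0, 0.90, 0.01, 1.05, 0.88, toy_delay, toy_aging)
print(sched)
```

The last entry of the schedule plays the role of Vfinal; every time step triggers a fresh delay evaluation, which is why the real flow needs many STA runs.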

62ISVLSI-2014 invited talk 140710

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3       4        5     6        7
Vlib (V)   Vinit   Vinit   Vmax    Vinit    N/A   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax    Vfinal   N/A   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop among Circuit, Derated Libraries (Vlib, VBTI) and Vfinal]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time over alternating ON/OFF phases [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time

[TCAS’14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd over lifetime approaching Vfinal; annotation: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]
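The "front-loaded" behavior can be illustrated with a power-law aging model, ΔVth ∝ t^n. The power-law form is an assumption for illustration, not something stated on the slide; the function names are hypothetical.

```python
import math

def aging_fraction(t, t_life, n):
    """Fraction of the end-of-life BTI shift reached at time t, assuming dVth ~ t**n."""
    return (t / t_life) ** n

def implied_exponent(fraction, t, t_life):
    """Exponent n for which the given fraction of aging is reached at time t."""
    return math.log(fraction) / math.log(t / t_life)

# Exponent implied by "50% of aging within year 1 of a 10-year lifetime".
n = implied_exponent(0.5, 1.0, 10.0)
print(round(n, 3))
# For comparison, the first-year fraction for an exponent of 1/6.
print(round(aging_fraction(1.0, 10.0, 1.0 / 6.0), 2))
```

Any exponent well below 1 gives this shape: a large share of the total shift accumulates early, so most of the Vdd increment also happens early.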

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

Signoff setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

VBTI \ Vlib | Min Vdd                                             | Max Vdd
Min Vdd     | Slowest circuit, less aging                         | Not applicable (optimistic)
Max Vdd     | Slowest circuit, worst-case aging (too pessimistic) | Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for AES, with points labeled by signoff setup (1 to 8); legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks); annotated regions: optimistic about AVS, pessimistic about AVS.]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$

[Figure: lifetime (year) and Vfinal (V) across implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation.]
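The stepwise MTTF update can be applied along a Vdd schedule as in this minimal sketch. The schedule values and the 10-year starting MTTF are hypothetical, and the function names are chosen for illustration.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of the update MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2

def mttf_after_schedule(mttf_0, vdd_schedule):
    """Apply the update over consecutive entries of a Vdd schedule."""
    mttf = mttf_0
    for v_prev, v_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_new)
    return mttf

# Hypothetical AVS schedule compensating 200mV in 50mV steps.
final_mttf = mttf_after_schedule(10.0, [0.90, 0.95, 1.00, 1.05, 1.10])
print(round(final_mttf, 2))
```

Because the per-step ratios telescope, the result depends only on the first and last voltages: MTTF scales by (VDD(0) / VDD(final))².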

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD (V) vs. year for AVS schedules S1 to S5, and MTTF (year) of S1 to S5, for DMA; annotation: 12 years MTTF penalty.]

71ISVLSI-2014 invited talk 140710

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff. Components: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI. Labels: signoff Vdd, operating voltage.]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$\Delta d_{jv} = \left[ d_j(Y_v) - d_j(Y_{typ}) \right] / 3$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\,\lambda$

Standard deviation of path delay:

$\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
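The two formulas above can be exercised on a small example. This is a minimal pure-Python sketch with a 3 x 3 correlation matrix instead of 27 x 27; the sensitivity values are hypothetical, and the hand-written Cholesky routine is only for illustration.

```python
import math

def cholesky(a):
    """Lower-triangular lam with lam * lam^T == a (a symmetric positive definite)."""
    n = len(a)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(a[i][i] - s)
            else:
                lam[i][j] = (a[i][j] - s) / lam[j][j]
    return lam

def path_sigma(sens, corr):
    """sigma_j = Euclidean norm of (sens * lam), where corr = lam * lam^T."""
    lam = cholesky(corr)
    n = len(sens)
    # Row vector times matrix: correlated sensitivities d'_jk.
    rotated = [sum(sens[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in rotated))

corr = [[1.0, 0.5, 0.0],
        [0.5, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
sens = [2.0, 1.0, 3.0]   # hypothetical per-source delay sensitivities (ps)
sigma_j = path_sigma(sens, corr)
print(sigma_j)
```

The result equals the square root of the quadratic form sens · Σ · sensᵀ, which is why the Cholesky factor can be used to fold correlation into the sensitivities.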

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction.]

Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling → reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$Energy = Power / Throughput$

where Throughput is a function of the error rate ER, the number of recovery cycles r and the clock period T [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to optimize?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

[Figure: AVS classes and approaches, compared by power]

• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process- and temperature-aware AVS, generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels contrasting ideal scaling with non-ideality.
 DENSITY: transistor count [M] vs. MPU release date (source: [CPUDB]).
 POWER: dynamic power (W), 2009 to 2024 (source: [JeongK08]).
 DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016 (source: [ITRS]).
 SUPPLY VOLTAGE: volts vs. MPU release date (source: [CPUDB]).]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of an AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; the minimum achievable energy is marked.]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design / overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: design cone of mode A in the voltage vs. frequency plane, with modes A, B and C; negative slacks = failed design, positive slacks = overdesign.]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone → reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough → scalable

[Flowchart (global optimization flow): Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models; adaptive search]

[Figure: power estimation of adaptive search. Power (mW) vs. signoff voltage (V); series: 1st, 2nd, real. Design: AES, f = 700MHz.]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time

[Figure: closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor]

• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring-oscillator stages with 1-bit control pins selecting high or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at small, medium and large margins. Legend: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation.]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; the minimum achievable energy is marked.]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: Initial placement (all FFs = error-tolerant FFs) → SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs) → SkewOpt (activity-aware clock skew optimization) → if energy < min energy, save current solution → iterate]
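The control structure of the iterative flow can be sketched as a loop that alternates the two optimizations and keeps the best solution seen. This is only a schematic: `seopt`, `skewopt` and `energy` are hypothetical callables standing in for the real steps, the fixed iteration count is an assumption, and the toy model below is not from the talk.

```python
def optimize(initial, seopt, skewopt, energy, max_iters=20):
    """Alternate SEOpt and SkewOpt, keeping the lowest-energy solution seen."""
    best, best_energy = initial, energy(initial)
    current = initial
    for _ in range(max_iters):
        current = skewopt(seopt(current))
        e = energy(current)
        if e < best_energy:              # "Energy < min energy": save current solution
            best, best_energy = current, e
    return best, best_energy

# Toy stand-ins: the design state is just the number of error-tolerant FFs.
def toy_seopt(n_et_ffs, k=10):
    return max(0, n_et_ffs - k)          # replace up to K error-tolerant FFs

def toy_skewopt(n_et_ffs):
    return n_et_ffs                      # no-op in this toy model

def toy_energy(n_et_ffs):
    # Too many error-tolerant FFs cost area/power; too few cost datapath margin.
    return (n_et_ffs - 40) ** 2 + 100

best, best_energy = optimize(100, toy_seopt, toy_skewopt, toy_energy)
print(best, best_energy)
```

In the toy model the energy first falls and then rises as error-tolerant FFs are replaced, so saving the best intermediate solution matters more than the final one.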

Page 44: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

44ISVLSI-2014 invited talk 140710

Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a

homogeneous corner for an interconnect stack

Interconnect stack with M1 and M2

M1 C

M2 C

Homogeneous Cw corner

C-3σ

Layer M2

C-3σ

Layer M1

Pessimism

Example worst-case capacitance corner

When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers

45ISVLSI-2014 invited talk 140710

Correlation Matrixbull Let Σ be the correlation matrix for variation sources

M1 M2 M3 M4

ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH

M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0

ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0

ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0

M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0

ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0

ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0

M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0

ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0

ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0

M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0

ΔT 0 0 0 0 0 0 0 0 0 0 1 0

ΔH 0 0 0 0 0 0 0 0 0 0 0 1

= Σ

Correlation for variation sources with the same variation type and in the process module γ 05

Variation sources in different process modules are independent

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

bull 92 of paths have lt 60 of wirelength on any single layer

Max wirelength ratio across all layers ()

Cum

ulati

ve p

roba

bilit

y

092

60

bull Variations in different layers are not fully correlated

bull Averaging uncorrelated variation smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

α α

Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)

bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner

C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner

α lt 10 delay variations are covered by the RC-worst corner

Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst

Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst

Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)

48ISVLSI-2014 invited talk 140710

Delay Variation

α α

Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)

bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner

C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner

α lt 10 delay variations are covered by the RC-worst corner

Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst

Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst

Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)

bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

bull Each layer can have different skewed variationsInterconnect stack with M1 and M2

M1 C

M2 C

Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp

bull Less pessimism with non-homogeneous cornersbull Challenge

bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in

itf with α = 05

Δdj(Yrcw)dj(Ytyp) x 100

3σjd(Ytyp) x 100

Challenge how to avoid underestimating delay variation to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

bull Critical paths are structurally similar

bull Wires on critical paths are routed on many layers

bull Structure is an outcome of the design flow

Testcasebull 45nm foundry library (wire

resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K

standard cell instancesbull 9 metal layersbull Extract critical paths from

different PVT and BEOL corners

Wirelength ratio ()

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

bull Extract RC at RC-worst C-worst and the typical corners

bull Calculate Δdelay of critical paths

bull Put path j in the group Gtbc if Δdelay is larger than a threshold

bull Fix only the paths in Gtbc using tightened BEOL corners

bull Since tightened corners have smaller delay variations timing closure is easier

Routed design

Timing analysis at BEOL corners Ytyp Ycw Yrcw

GTBC GCBC

ECOusing CBC

Timing analysis

using TBC

violation = 0

Timing analysis

using CBC

violation = 0

ECOusing TBC

done

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity):

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Tightened BEOL corners (correlation factor = 0.5):

           α      Acw (%)   Arcw (%)
TBC-0.5    0.5    43        73
TBC-0.6    0.6    33        50
TBC-0.7    0.7    30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation, and (2) variation sources of the same kind in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path with small Δdelays is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations due to ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

Figure panels: LEON3MP, SUPERBLUE12, NETCARD (each with the region α > 0.5 marked)

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

Figure: three bar charts (WNS (ns), TNS (ns), number of timing violations) for LEON, SUPERBLUE and NETCARD, comparing CBC, TBC-1 and TBC-2

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

Figure: Vdd versus time, with NBTI and PBTI aging; VBTI = Vlib ≈ Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

Figure annotation: 10mV

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages:
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep VBTI, VLIB and Vdd consistent throughout circuit lifetime
• Signoff flow:
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library

Legend: Vstep = AVS voltage step; Vfinal = converged voltage
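The loop structure of this reference flow can be illustrated with a short sketch. Everything model-specific here is an assumption: `age_step` and `slack` are hypothetical callbacks standing in for aging estimation and STA with a derated library, and the toy models below them are invented purely so the loop can run. They are not the BTI or timing models used in the talk.

```python
def reference_signoff(v_init, v_max, v_step, t_final, dt, age_step, slack):
    """Step through the lifetime, raising Vdd in AVS steps whenever timing fails.

    age_step(dvth, vdd, t, dt) -> new accumulated |dVth|   (hypothetical callback)
    slack(dvth, vdd)           -> timing slack, >= 0 is OK (hypothetical callback)
    Returns (v_final, schedule); v_final is None if Vdd would exceed v_max,
    in which case the circuit must be modified and the flow restarted.
    """
    vdd, dvth, t = v_init, 0.0, 0.0
    schedule = [(t, vdd)]
    while t < t_final:
        dvth = age_step(dvth, vdd, t, dt)   # estimate aging over this time step
        t += dt
        while slack(dvth, vdd) < 0.0:       # update timing; bump Vdd if it fails
            vdd += v_step
            if vdd > v_max + 1e-9:
                return None, schedule
        schedule.append((t, vdd))
    return vdd, schedule


# Toy models, for illustration only (not from the source).
def toy_age_step(dvth, vdd, t, dt):
    # Front-loaded aging: the increment shrinks as time advances.
    return dvth + 0.02 * vdd * dt / (1.0 + t)


def toy_slack(dvth, vdd):
    # Slack improves with Vdd above 0.90 V and degrades with aging.
    return 0.01 + 0.5 * (vdd - 0.90) - dvth
```

Calling `reference_signoff(0.90, 1.05, 0.01, 10.0, 1.0, toy_age_step, toy_slack)` returns a converged voltage and the Vdd schedule over time. Every outer iteration corresponds to one aging estimate plus at least one timing run, which is why the slide calls the flow very slow.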

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups:
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

Figure: loop among Circuit, Derated Libraries (Vlib, VBTI) and Vfinal

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

Figure: |Vgs| versus time over ON/OFF/ON/OFF phases [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS’14]

65ISVLSI-2014 invited talk 140710

Observation 1 [Chan11]

• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

Figure annotation: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years); Vfinal

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner:
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Figure annotation: good corners

Setups:
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach:
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                                 Vlib for circuit performance estimation
VBTI for aging estimation        Min Vdd                               Max Vdd
Min Vdd                          Slowest circuit, less aging           Not applicable (optimistic)
Max Vdd                          Slowest circuit, worst-case aging     Faster circuit, worst-case aging
                                 (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

Figure (AES): power (mW) versus area (μm2), with data points labeled 1 to 8; annotated regions: optimistic about AVS, pessimistic about AVS. Legend text: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step, and MTTF is updated as

  $\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$

Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation
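The MTTF update on this slide is a simple running product, which the following sketch makes explicit. The function names and the sample voltage schedule in the usage note are illustrative assumptions, not values from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One update step: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i)) ** 2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_after_schedule(mttf_initial, vdd_steps):
    """Apply the update across consecutive Vdd values of an AVS schedule."""
    mttf = mttf_initial
    for v_prev, v_now in zip(vdd_steps, vdd_steps[1:]):
        mttf = update_mttf(mttf, v_prev, v_now)
    return mttf
```

Because the per-step ratios telescope, the result depends only on the first and last voltage in the schedule: `mttf_after_schedule(10.0, [0.90, 0.95, 1.00])` equals `10.0 * (0.90 / 1.00) ** 2` up to rounding.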

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

Figure (DMA): VDD versus year for schedules S1 to S5, and MTTF (year) for S1 to S5; annotation: 12 years MTTF penalty

71ISVLSI-2014 invited talk 140710

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

Figure: “margin stack” for voltage signoff on a voltage axis, from nominal Vdd through static IR drop, power grid IR gradient, dynamic IR and HCI/NBTI down to the signoff Vdd; operating voltage marked

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

  $\Delta d_{jv} = \left[ d_j(Y_v) - d_j(Y_{typ}) \right] / 3$

• Assumptions:
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλᵀ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

  $[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:

  $\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
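The two statistical-timing slides can be tied together in a small, dependency-free sketch: per-source sensitivities are obtained from corner delays, rotated by the Cholesky factor of the correlation matrix, and combined into a path-delay standard deviation. The function names are illustrative, and the textbook Cholesky routine here stands in for whatever numerical library the actual flow used.

```python
import math


def sensitivity(d_corner, d_typ):
    """Per-sigma delay sensitivity: [d_j(Y_v) - d_j(Y_typ)] / 3."""
    return (d_corner - d_typ) / 3.0


def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma must be positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(delta_d, sigma):
    """Standard deviation of path delay from sensitivities and correlation matrix."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times lam: correlated sensitivities delta_d'.
    d_corr = [sum(delta_d[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in d_corr))
```

With independent sources (identity matrix) the result reduces to a root-sum-square, so sensitivities of 3 and 4 give 5. With two unit sensitivities correlated at 0.5 the variance is 1 + 1 + 2 × 0.5 = 3.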

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

Figure: energy (mJ) versus supply voltage (V) for conventional design and resilient design; annotation: 15% reduction

• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

  $\mathrm{Energy} = \mathrm{Power} / \mathrm{Throughput}$

  Throughput is expressed in terms of the error rate ER, the number of recovery cycles r and the clock period T [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches (figure axis: power):

• Open-Loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-Loop AVS
  • Generic monitor and design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error Tolerance
  • Error Detection System: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

Figure panels (each contrasting ideal scaling with non-ideality):
• DENSITY: transistor count [M] versus MPU release date, 1998 to 2011. Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) versus ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt versus MPU release date, 1995 to 2016. Source: [CPUDB]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs; resilience benefits increase in the context of an AVS strategy

Figure: energy (mJ) versus supply voltage (V) for MUL and EXU, comparing brute-force, pure-margin and CombOpt; minimum achievable energy marked

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

Figure: voltage versus frequency plane showing the design cone of mode A and modes A, B and C; negative slacks = failed design, positive slacks = overdesign

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable

Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) → refine power models

Figure (power estimation of adaptive search): power (mW) versus signoff voltage (V), curves 1st, 2nd and real. Design: AES, f = 700MHz
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-Loop AVS:
• Design-dependent replica and in-situ monitor
  • Critical path may be difficult to identify (IP from 3rd party)
  • Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor
  • Does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin:
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit:
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low resistance

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows:
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

Figure: energy (mJ) of PM, BF and CO at small, medium and large margin, for MUL and EXU; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation

Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs, hence reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs; resilience benefits increase in AVS context

Figure: energy (mJ) versus supply voltage (V) for MUL and EXU, comparing brute-force, pure-margin and CombOpt; minimum achievable energy marked

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Flowchart: initial placement (all FFs = error-tolerant FFs) → SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs) → SkewOpt (activity-aware clock skew optimization) → if energy < min energy, save current solution → iterate

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

45ISVLSI-2014 invited talk 140710

Correlation Matrix

• Let Σ be the correlation matrix for variation sources

             M1            M2            M3            M4
             ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH
    M1  ΔW   1   0   0     γ   0   0     γ   0   0     0   0   0
        ΔT   0   1   0     0   γ   0     0   γ   0     0   0   0
        ΔH   0   0   1     0   0   γ     0   0   γ     0   0   0
    M2  ΔW   γ   0   0     1   0   0     γ   0   0     0   0   0
Σ =     ΔT   0   γ   0     0   1   0     0   γ   0     0   0   0
        ΔH   0   0   γ     0   0   1     0   0   γ     0   0   0
    M3  ΔW   γ   0   0     γ   0   0     1   0   0     0   0   0
        ΔT   0   γ   0     0   γ   0     0   1   0     0   0   0
        ΔH   0   0   γ     0   0   γ     0   0   1     0   0   0
    M4  ΔW   0   0   0     0   0   0     0   0   0     1   0   0
        ΔT   0   0   0     0   0   0     0   0   0     0   1   0
        ΔH   0   0   0     0   0   0     0   0   0     0   0   1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
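A matrix with this block structure can be generated from the layer-to-module assignment. The sketch below is illustrative only: the function name and the module labels "A" and "B" are hypothetical, and the grouping of M1 to M3 into one process module with M4 in another is inferred from the zero pattern of the matrix shown on the slide.

```python
def build_correlation(layers, kinds, module_of, gamma):
    """Return (names, sigma) for all (layer, variation kind) sources.

    Two distinct sources are correlated with factor gamma when they share the
    same variation kind and their layers belong to the same process module;
    otherwise they are treated as independent.
    """
    names = [(layer, kind) for layer in layers for kind in kinds]
    n = len(names)
    sigma = [[0.0] * n for _ in range(n)]
    for a, (layer_a, kind_a) in enumerate(names):
        for b, (layer_b, kind_b) in enumerate(names):
            if a == b:
                sigma[a][b] = 1.0
            elif kind_a == kind_b and module_of[layer_a] == module_of[layer_b]:
                sigma[a][b] = gamma
    return names, sigma


layers = ["M1", "M2", "M3", "M4"]
kinds = ["dW", "dT", "dH"]  # width, thickness, height variation
module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}  # hypothetical labels
names, sigma = build_correlation(layers, kinds, module_of, gamma=0.5)
```

Here `sigma[0][3]` (M1 ΔW versus M2 ΔW) is 0.5, while any entry involving M4 and another layer is 0, matching the 12 × 12 example.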

46ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of wirelength on any single layer

Figure: cumulative probability versus max wirelength ratio across all layers (%); marked point: 0.92 at 60%

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation gives smaller RC variation

47ISVLSI-2014 invited talk 140710

Delay Variation

Figure: two scatter plots of α, one versus Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and one versus Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

Figure annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

48ISVLSI-2014 invited talk 140710

Delay Variation


• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

Figure: interconnect stack with M1 and M2 (M1 C, M2 C)

Non-homogeneous corner: M1 == Cw (3σ), M2 == Ctyp

• Less pessimism with non-homogeneous corners
• Challenge:
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

Open-Loop AVS

Closed-Loop AVSP

ow

er

Freq amp Vdd LUT

Post-silicon characterization

AVS Pre-characterize LUT [Martin02]

Process-aware AVSPost-silicon characterization [Tschanz03]

Generic monitor

Design dependent replica

In-situmonitor

Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]

In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]

Error Detection System

Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]

Error Tolerance

AVS

approachesAVS classes

77

78ISVLSI-2014 invited talk 140710

Challenge Variability

1998 2000 2003 2006 2008 20111

10

MPU Release Date

Tran

sisto

r Cou

nt [M

]

Source [CPUDB]

DENSITY

IdealNon-ideality

2009

2010

2011

2012

2013

2014

2015

2016

2017

2018

2019

2020

2021

2022

2023

2024

0

100

200

300

400

500

600

700

0

02

04

06

08

1

12Dynamic Power (W)

POWER

Source [JeongK08]

IdealNon-ideality

2006 2008 2010 2012 2014 20161000

10000

100000

Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling

DRIVE CURRENT

Ideal

Source [ITRS]

Non-ideality

1995 2000 2005 2011 20160

05

1

15

2

25

3

MPU Release Date

Volt

SUPPLY VOLTAGE

Source [CPUDB]

Ideal

Non-ideality

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient

designs thus reduced powerbull Proposed method trades off between timing-error penalty vs

reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction

compared to pure-margin designs Resilience benefits increase in the context of AVS strategy

084 088 092 096 10030

36

42

48

54

60brute-forcepure-marginCombOpt

Supply voltage (V)

En

erg

y (

mJ

)

084 086 088 09 092 094 096 098 1 10225

29

33

37

41

45brute-force

pure-margin

CombOpt

Supply voltage (V)

En

erg

y (

mJ

)

MUL EXU

Minimum achievable energy

80ISVLSI-2014 invited talk 140710

Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for

circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly

threshold voltages)bull One mode is outside of the design cone of the other

failed design overdesignbull Mode A has positive timing slacks with respect to mode B

mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other

bull Modes are in each othersrsquo design cone

Voltage

Frequency

A

Negative Slacks = failed design

Positive Slacks = overdesign

B

C

Design Cone of mode A

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline search for signoff modes within design cone reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method Global Optimizationbull Iteratively sample and refine power models

bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable

Sample (SPampR)

Construct power models

Estimate optimal signoff modes

Sample (SPampR)

Refine power models

Adaptive search

Global optimization flow

09 10 11 1214

15

16

17

18

19

201st 2nd real

Signoff Voltage (v)

Po

wer

(m

W)

Power estimation of adaptive search

bull Ovals indicate sample pointsbull 1st 2nd power from power models at first

second iterationbull real power from real implemented circuits

Design AESf 700MHz

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

bull Critical path may be difficult to identify (IP from 3rd party)

bull Calibrating monitors at multiple modesvoltages requires long test time

Closed-Loop AVS

Design-dependent replica

In-situmonitor

Generic monitor

bull Does not capture design-specific performance variation

82

This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

bull Identified two circuit knobs to tune Vmin

bull Series resistancebull Cell types (INV NAND NOR)

bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low

1 bit 1 bit 1 bit Control pins

High resistance

Low resistance

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reductionbull Reference flows

bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints

bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods

bull Resilience benefits increase with safety margin

PM BF CO PM BF CO PM BF CO25

30

35

40

45

50

55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience

En

erg

y (

mJ

)

Large marginMedium marginSmall margin

MUL

PM BF CO PM BF CO PM BF CO25

27

29

31

33

35

En

erg

y (

mJ

)

EXU

Large marginMedium marginSmall margin

Smallmediumlarge margin safety margin = 51015 of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for

resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a

lower supply voltagebull Average 18 energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

084 088 092 096 10030

36

42

48

54

60brute-forcepure-marginCombOpt

Supply voltage (V)

En

erg

y (

mJ

)

084 086 088 09 092 094 096 098 1 10225

29

33

37

41

45brute-force

pure-margin

CombOpt

Supply voltage (V)

En

erg

y (

mJ

)

MUL EXU

Minimum achievable energy

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

bull Iteratively optimize with SEOpt and SkewOpt

Initial placement (all FFs = error-tolerant FFs)

Energy lt min energy

Save current solution

Margin insertion on K paths based on sensitivity function

Replace error-tolerant FFs w normal FFs

SEOpt

Activity-aware clock skew optimization

SkewOpt


Wiring Structure in Timing-Critical Paths (2)

• 92% of paths have < 60% of their wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the curve reaches 0.92 at a ratio of 60%]

• Variations in different layers are not fully correlated

• Averaging uncorrelated variation → smaller RC variation
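The averaging argument can be illustrated numerically. The sketch below is not from the talk: it assumes every layer has the same relative RC sigma and that any two different layers share one correlation value `rho`. The function name and all numbers are hypothetical.

```python
import math

def path_rc_sigma(layer_fractions, sigma_layer, rho):
    """Relative sigma of a path's wire RC from per-layer wirelength shares.

    layer_fractions: share of the path's wirelength on each layer (sums to 1).
    sigma_layer:     relative RC sigma of one layer (assumed equal for all layers).
    rho:             assumed correlation between any two different layers.
    """
    variance = 0.0
    for i, f_i in enumerate(layer_fractions):
        for j, f_j in enumerate(layer_fractions):
            corr = 1.0 if i == j else rho
            variance += f_i * f_j * corr * sigma_layer ** 2
    return math.sqrt(variance)

# All wire on one layer vs. wire spread evenly over four uncorrelated layers.
single = path_rc_sigma([1.0], 0.10, rho=0.0)
spread = path_rc_sigma([0.25, 0.25, 0.25, 0.25], 0.10, rho=0.0)
```

With fully correlated layers (`rho = 1.0`) the spread case gives the same sigma as the single-layer case, so the benefit comes only from the lack of correlation.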


Delay Variation

[Figure: two scatter plots of α, one per conventional corner]
  • x-axis, C-worst plot: Δdelay at C-worst = [d(Ycw) - d(Ytyp)] / d(Ytyp)
  • x-axis, RC-worst plot: Δdelay at RC-worst = [d(Yrcw) - d(Ytyp)] / d(Ytyp)
  • Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
  • Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
  • Annotation: the C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • Annotation: α < 1.0, delay variations are covered by the RC-worst corner

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
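A minimal sketch of evaluating α at the dominant corner follows. This part of the deck does not define α, so the code assumes α is the ratio of the statistical 3σ delay variation to the Δdelay at the dominant conventional corner. The function name and the example numbers are hypothetical.

```python
def scaling_factor(d_typ, d_cw, d_rcw, sigma_j):
    """Return (alpha, dominant_corner) for one path.

    Assumption: alpha = 3 * sigma_j / delta_delay at the dominant corner,
    where the dominant corner is the one with the larger delta delay.
    """
    delta_cw = d_cw - d_typ
    delta_rcw = d_rcw - d_typ
    if delta_cw >= delta_rcw:
        corner, delta = "C-worst", delta_cw
    else:
        corner, delta = "RC-worst", delta_rcw
    if delta <= 0:
        raise ValueError("neither corner increases the path delay")
    return 3.0 * sigma_j / delta, corner

# Hypothetical path: 100 ps typical, 104 ps at C-worst, 110 ps at RC-worst.
alpha, corner = scaling_factor(100.0, 104.0, 110.0, sigma_j=2.0)
```

Under this assumed definition, α > 1.0 means the corner covers less than the statistical 3σ variation, and α < 1.0 means the corner is pessimistic for that path.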

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 and the capacitance distribution of each layer (M1 C, M2 C). Non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: scatter plot of Δdj(Yrcw)/dj(Ytyp) x 100 versus 3σj/d(Ytyp) x 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield
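Scaling BEOL variation by α can be sketched as a linear interpolation between the typical corner and the conventional corner. This is an illustrative assumption about what "scale BEOL variation in itf" means. The parameter names below are hypothetical and do not correspond to real itf keywords.

```python
def tightened_corner(typical, conventional, alpha):
    """Scale each parameter's deviation from typical by alpha.

    typical, conventional: dicts mapping a parameter name to its value
    at the typical corner and at the conventional BEOL corner (CBC).
    """
    return {
        name: typical[name] + alpha * (conventional[name] - typical[name])
        for name in typical
    }

# Hypothetical per-layer parameters (arbitrary units).
y_typ = {"m1_thickness": 0.10, "m1_width": 0.050}
y_cw = {"m1_thickness": 0.12, "m1_width": 0.060}
y_tbc = tightened_corner(y_typ, y_cw, alpha=0.5)
```

With `alpha = 1.0` the result is the conventional corner, and with `alpha = 0.0` it is the typical corner.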

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure axis: wirelength ratio (%)]

Proposed Timing Signoff Flow

• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → partition paths into GTBC and GCBC → for GTBC: timing analysis using TBC, with ECO using TBC until violation = 0; for GCBC: timing analysis using CBC, with ECO using CBC until violation = 0 → done]
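The path-grouping step of the flow can be sketched as follows. The sketch assumes thresholds are given as a percentage of the typical-corner delay, one per corner, and that exceeding either one is enough to place a path in Gtbc. The combining rule, the field names and the threshold values are illustrative assumptions.

```python
def partition_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc) by relative delta delay.

    paths: list of dicts with keys "name", "d_typ", "d_cw", "d_rcw".
    a_cw, a_rcw: thresholds in percent of the typical-corner delay.
    """
    g_tbc, g_cbc = [], []
    for p in paths:
        rel_cw = 100.0 * (p["d_cw"] - p["d_typ"]) / p["d_typ"]
        rel_rcw = 100.0 * (p["d_rcw"] - p["d_typ"]) / p["d_typ"]
        if rel_cw > a_cw or rel_rcw > a_rcw:
            g_tbc.append(p["name"])  # large delta delay: tightened corner is used
        else:
            g_cbc.append(p["name"])  # small delta delay: keep conventional corner
    return g_tbc, g_cbc

# Hypothetical paths (delays in ps) and hypothetical thresholds (percent).
paths = [
    {"name": "p1", "d_typ": 100.0, "d_cw": 105.0, "d_rcw": 103.0},
    {"name": "p2", "d_typ": 100.0, "d_cw": 101.0, "d_rcw": 102.0},
]
g_tbc, g_cbc = partition_paths(paths, a_cw=4.0, a_rcw=7.0)
```

Paths left in the second group are signed off with the conventional corners, which matches the flow's intent of never applying a tightened corner where it could underestimate variation.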

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Tightened corners (correlation factor = 0.5)

           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   43        73
TBC-0.6    0.6   33        50
TBC-0.7    0.7   30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path with small Δdelays is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
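The cancellation in the example can be checked with a small calculation. The sketch assumes that at the C-worst corner the M1 capacitance sits at +3σ while the M1 resistance sits at -3σ, and that the two sources are independent when computing σj. Both are illustrative assumptions, and the function name is hypothetical.

```python
import math

def corner_delta_and_sigma(sens_r, sens_c, r_shift, c_shift):
    """Delta delay at a corner and statistical sigma for a two-source path.

    sens_r, sens_c:   delay change per one-sigma change in R and in C.
    r_shift, c_shift: corner position of R and C, in sigmas.
    Assumes R and C vary independently when computing sigma.
    """
    delta = sens_r * r_shift + sens_c * c_shift
    sigma = math.sqrt(sens_r ** 2 + sens_c ** 2)
    return delta, sigma

# Equal sensitivities of 0.5; assumed C-worst corner: R at -3 sigma, C at +3 sigma.
delta_cw, sigma_j = corner_delta_and_sigma(0.5, 0.5, r_shift=-3.0, c_shift=3.0)
```

Here the corner reports no delay increase at all while the statistical sigma is about 0.71, which is why α becomes very large for such paths.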

Scaling Factor Results

[Figure: α vs. relative delay variation for LEON3MP, NETCARD and SUPERBLUE12; the region with α > 0.5 is marked in each plot]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns, respectively
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: three bar charts comparing CBC, TBC-1 and TBC-2 on LEON, SUPERBLUE and NETCARD: WNS (ns), TNS (ns) and number of timing violations]

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step
Vfinal: converged voltage
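The loop structure of the reference flow can be sketched as below. The delay and aging models are toy placeholders chosen only so the example runs; they are not the models used in the talk, and the function names and all constants are hypothetical. In a real flow each `delay(...)` call would stand for a library characterization plus an STA run.

```python
def reference_signoff(delay, aging, target, v_init, v_max, v_step, t_final, n_steps):
    """Step through lifetime, raising Vdd by v_step whenever timing fails.

    delay(vdd, dvth): circuit delay at supply vdd with threshold shift dvth.
    aging(t, vdd):    threshold shift after time t under stress voltage vdd.
    Returns the converged Vfinal, or None if Vdd would exceed v_max
    (the circuit must then be modified and the flow restarted).
    """
    vdd = v_init
    for k in range(1, n_steps + 1):
        t = t_final * k / n_steps
        # Aging is re-estimated at the current Vdd so VBTI and Vdd stay consistent.
        while delay(vdd, aging(t, vdd)) > target:
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None
    return vdd

# Toy models (assumptions for illustration only).
toy_aging = lambda t, v: 0.05 * v * (t / 10.0) ** 0.3
toy_delay = lambda v, dvth: 1.0 / (v - 0.3 - dvth)

v_final = reference_signoff(toy_delay, toy_aging, target=1.7,
                            v_init=0.90, v_max=1.05, v_step=0.01,
                            t_final=10.0, n_steps=10)
```

The nested loop is what makes the flow slow: every voltage step at every time step needs a fresh timing evaluation.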

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop between circuit, Vfinal, Vlib / VBTI and derated libraries]

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time over alternating ON and OFF phases [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14]]

Observation 1

• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure [Chan11]: Vdd over lifetime approaching Vfinal; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]
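The front-loaded behavior is what a power-law aging model would produce. The sketch below assumes ΔVth grows as t^n, which is a common simplification and not something stated on the slide, and solves for the exponent that gives 50% of the 10-year aging in the first year. The function names are hypothetical.

```python
import math

def aging_fraction(t, lifetime, n):
    """Fraction of end-of-life aging reached at time t, assuming dVth ~ t**n."""
    return (t / lifetime) ** n

def exponent_for_fraction(fraction, t, lifetime):
    """Exponent n for which aging_fraction(t, lifetime, n) equals fraction."""
    return math.log(fraction) / math.log(t / lifetime)

# Exponent implied by "50% of aging in year 1 of 10" under the assumed power law.
n = exponent_for_fraction(0.5, t=1.0, lifetime=10.0)
year_one = aging_fraction(1.0, 10.0, n)
year_five = aging_fraction(5.0, 10.0, n)
```

Under this assumption the exponent is about 0.3, and roughly four fifths of the total aging has occurred by mid-life.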

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

[Figure annotation: good corners]

Setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                              Vlib for circuit performance estimation
VBTI for aging estimation     Min Vdd                                                Max Vdd
Min Vdd                       Slowest circuit, less aging                            Not applicable (optimistic)
Max Vdd                       Slowest circuit, worst-case aging (too pessimistic)    Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for AES, with points labeled by setup number; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks); regions annotated "Optimistic about AVS" and "Pessimistic about AVS"]

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
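The MTTF update rule can be applied step by step over an AVS voltage schedule. The sketch below implements only the update stated on the slide. The function name, the starting MTTF and the voltage schedule are hypothetical.

```python
def mttf_after_schedule(mttf_initial, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule.

    vdd_schedule: supply voltage at each AVS step, starting with the initial Vdd.
    """
    mttf = mttf_initial
    for v_prev, v_next in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_next) ** 2
    return mttf

# Hypothetical schedule: 200mV of compensation in two steps, from 1.0V to 1.2V.
mttf_end = mttf_after_schedule(10.0, [1.0, 1.1, 1.2])
penalty = 1.0 - mttf_end / 10.0
```

Because the per-step ratios telescope, only the first and last voltages matter: raising Vdd from 1.0V to 1.2V costs about 31% of the MTTF in this example.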

EM Impact on AVS Scheduling

[Figure: VDD (0.90V to 1.04V) vs. year (0 to 12) for AVS schedules S1 to S5, and MTTF (year) for S1 to S5; design: DMA; annotation: 12 years MTTF penalty]

What is “Signoff”

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff on a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI, signoff Vdd; operating voltage marked]

Problem: margins = pessimism → overdesign, schedule delay

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1} \;\ldots\; \Delta d'_{j27}] = [\Delta d_{j1} \;\ldots\; \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
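The two statistical timing steps can be combined in a short, dependency-free sketch: per-source sensitivities from the 3σ corner runs, then the correlated path sigma through a Cholesky factor. The hand-written Cholesky routine and the two-source example are illustrative; a production flow would more likely use a linear algebra library.

```python
import math

def sensitivity(d_corner, d_typ):
    """Per-sigma delay sensitivity from a 3-sigma corner run."""
    return (d_corner - d_typ) / 3.0

def cholesky(sigma):
    """Lower-triangular L with sigma = L * L^T (sigma symmetric positive definite)."""
    n = len(sigma)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(low[i][m] * low[k][m] for m in range(k))
            if i == k:
                low[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                low[i][k] = (sigma[i][k] - s) / low[k][k]
    return low

def path_sigma(delta_d, sigma):
    """sigma_j = norm of (delta_d * L), which equals sqrt(delta_d * Sigma * delta_d^T)."""
    low = cholesky(sigma)
    n = len(delta_d)
    d_corr = [sum(delta_d[u] * low[u][v] for u in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_corr))

# Two hypothetical variation sources with correlation factor 0.5.
delta = [sensitivity(109.0, 100.0), sensitivity(112.0, 100.0)]  # 3.0 and 4.0
sigma_indep = path_sigma(delta, [[1.0, 0.0], [0.0, 1.0]])
sigma_corr = path_sigma(delta, [[1.0, 0.5], [0.5, 1.0]])
```

Positive correlation between sources increases the path sigma relative to the independent case, which is why the correlation model matters when judging how pessimistic a corner is.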

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V), 0.84V to 1.00V, for conventional design and resilient design; annotation: 15% reduction]

Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling → reduced energy

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$

Throughput is a function of the error rate ER [Kahng10], the clock period T and the number of recovery cycles r.

Selective-Endpoint Optimization

• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches (arranged along a power axis in the original figure):

• Open-loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor, design-dependent replica: process- and temperature-aware AVS, with a generic on-chip monitor [Burd00] or a design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

Challenge Variability

[Figure: four panels, each contrasting ideal scaling with observed non-ideality]
  • DENSITY: transistor count [M] vs. MPU release date, 1998 to 2011. Source: [CPUDB]
  • POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
  • DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
  • SUPPLY VOLTAGE: supply voltage (V) vs. MPU release date, 1995 to 2016. Source: [CPUDB]

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, for designs MUL and EXU; minimum achievable energy marked]

Our Concept Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage vs. frequency plane with modes A, B and C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone → reduce overdesign

Our Method Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

[Flowchart, global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) → refine power models]

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V), 0.9V to 1.2V, with curves "1st", "2nd" and "real". Design: AES, f = 700MHz]
  • Ovals indicate sample points
  • 1st, 2nd: power from power models at first, second iteration
  • real: power from real implemented circuits
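The sample-and-refine loop can be sketched with a one-dimensional stand-in. The sketch assumes a quadratic power model over a single signoff voltage, refit through the three lowest-power samples after each new sample. The talk does not state the form of its power models, so this is only an illustration. `implement` is a hypothetical callback standing in for an SP&R run that returns power.

```python
def fit_parabola(p0, p1, p2):
    """Coefficients (a, b, c) of a*x**2 + b*x + c through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    d0 = y0 / ((x0 - x1) * (x0 - x2))
    d1 = y1 / ((x1 - x0) * (x1 - x2))
    d2 = y2 / ((x2 - x0) * (x2 - x1))
    a = d0 + d1 + d2
    b = -(d0 * (x1 + x2) + d1 * (x0 + x2) + d2 * (x0 + x1))
    c = d0 * x1 * x2 + d1 * x0 * x2 + d2 * x0 * x1
    return a, b, c

def adaptive_search(implement, v_lo, v_hi, rounds=2, tol=1e-3):
    """Return (voltage, power) of the best sample found."""
    samples = {v: implement(v) for v in (v_lo, (v_lo + v_hi) / 2.0, v_hi)}
    for _ in range(rounds):
        best3 = sorted(samples.items(), key=lambda item: item[1])[:3]
        a, b, _c = fit_parabola(*best3)
        if a <= 0:
            break  # model has no interior minimum; keep the best sample
        candidate = min(max(-b / (2.0 * a), v_lo), v_hi)
        if any(abs(candidate - v) < tol for v in samples):
            break  # the model optimum has already been sampled
        samples[candidate] = implement(candidate)
    return min(samples.items(), key=lambda item: item[1])

# Toy stand-in for SP&R: power (mW) is minimal at a signoff voltage of 1.0V.
toy_power = lambda v: 15.0 + 20.0 * (v - 1.0) ** 2
best_v, best_p = adaptive_search(toy_power, 0.9, 1.2)
```

In this toy case four implementation runs are enough, which mirrors the point that a small constant number of samples can replace implementation at every candidate mode.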

Classes of Closed-Loop AVS

Closed-loop AVS: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low series resistance]

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: stacked bars of energy (mJ) for PM, BF and CO at small, medium and large margin, for designs MUL and EXU; components: energy penalty of throughput degradation, energy penalty of additional circuits, energy w/o resilience]

Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, for designs MUL and EXU; minimum achievable energy marked]

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs) → SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs) → SkewOpt (activity-aware clock skew optimization) → if energy < min energy, save current solution → iterate]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • "Chicken and Egg" Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is "Signoff"
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)
Page 47: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

47ISVLSI-2014 invited talk 140710

Delay Variation

Figure (two panels): scaling factor α versus Δdelay at the C-worst corner and at the RC-worst corner

Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp)

Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner

α < 1.0: delay variations are covered by the RC-worst corner

Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst

Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

48ISVLSI-2014 invited talk 140710

Delay Variation

Figure (two panels): scaling factor α versus Δdelay at the C-worst corner and at the RC-worst corner

Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp)

Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)

• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner

C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner

α < 1.0: delay variations are covered by the RC-worst corner

Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst

Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst

• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
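
The dominance rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the sample delays are hypothetical, and the definition α = (3σ / d(Ytyp)) / Δdelay at the dominant corner is an assumption read off the axis labels in this deck, not a formula stated on this slide.

```python
def delta_delay(d_corner, d_typ):
    """Relative delay change of a path at a corner: [d(Y) - d(Ytyp)] / d(Ytyp)."""
    return (d_corner - d_typ) / d_typ


def dominant_corner(d_typ, d_cw, d_rcw):
    """Return (corner name, relative delta delay) for the corner with the larger delta delay."""
    dd_cw = delta_delay(d_cw, d_typ)
    dd_rcw = delta_delay(d_rcw, d_typ)
    if dd_cw > dd_rcw:
        return "C-worst", dd_cw
    return "RC-worst", dd_rcw


def scaling_factor(d_typ, d_cw, d_rcw, sigma):
    """Assumed definition: alpha = (3 * sigma / d_typ) / (delta delay at the dominant corner)."""
    _, dd = dominant_corner(d_typ, d_cw, d_rcw)
    return (3.0 * sigma / d_typ) / dd


# Hypothetical path delays in ns; sigma is the statistical standard deviation of the path delay.
corner, dd = dominant_corner(d_typ=1.00, d_cw=1.05, d_rcw=1.10)
alpha = scaling_factor(d_typ=1.00, d_cw=1.05, d_rcw=1.10, sigma=0.01)
print(corner, round(dd, 3), round(alpha, 3))
```

With these made-up numbers the path is RC-worst dominated and α is well below 1.0, which is the case the slide describes as covered by the conventional corner.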

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

Figure: interconnect stack with M1 and M2 (axes: M1 C, M2 C). Non-homogeneous corner: M1 == Cw (3σ), M2 == Ctyp

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

Figure: Δdj(Yrcw) / dj(Ytyp) × 100 compared with 3σj / d(Ytyp) × 100

Challenge: how to avoid underestimating delay variation, to preserve parametric yield
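
As a rough sketch of what "scale BEOL variation with α" could mean, the Python below shrinks each parameter's deviation from its typical value by α. The linear interpolation, the parameter names and the values are all assumptions for illustration; the slide does not spell out how the itf entries are scaled.

```python
def tighten_corner(typ, cbc, alpha):
    """Scale each parameter's deviation from typical by alpha (0 < alpha <= 1)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return {name: typ[name] + alpha * (cbc[name] - typ[name]) for name in typ}


# Hypothetical per-layer parameters at the typical corner and at a conventional worst corner.
typ = {"M1_thickness_um": 0.100, "M1_width_um": 0.050}
cw = {"M1_thickness_um": 0.112, "M1_width_um": 0.056}

tbc = tighten_corner(typ, cw, alpha=0.5)
print(tbc)
```

With α = 1.0 the tightened corner coincides with the conventional one, so α directly expresses how much pessimism is removed.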

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

Figure axis: wirelength ratio (%)

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

Flowchart: routed design → timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) → paths split into GTBC and GCBC → timing analysis and ECO using TBC, and timing analysis and ECO using CBC, each repeated until violation = 0 → done
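
The grouping step of this flow can be sketched as below. The sketch assumes separate thresholds for the C-worst and RC-worst corners (named `a_cw` and `a_rcw` after the Acw and Arcw columns elsewhere in the deck) and assumes a path joins Gtbc if either threshold is exceeded; the path data are made up.

```python
def group_paths(paths, a_cw, a_rcw):
    """Split paths into (G_tbc, G_cbc) by comparing relative delta delay with thresholds.

    paths: iterable of (name, d_typ, d_cw, d_rcw) with delays in the same unit.
    a_cw, a_rcw: thresholds on relative delta delay at C-worst and RC-worst (fractions).
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        dd_cw = (d_cw - d_typ) / d_typ
        dd_rcw = (d_rcw - d_typ) / d_typ
        # Assumption: exceeding either threshold is enough to use tightened corners.
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Hypothetical critical paths: (name, d(Ytyp), d(Ycw), d(Yrcw)) in ns.
paths = [
    ("p0", 1.00, 1.08, 1.12),
    ("p1", 1.00, 1.01, 1.02),
    ("p2", 2.00, 2.02, 2.30),
]
g_tbc, g_cbc = group_paths(paths, a_cw=0.05, a_rcw=0.10)
print(g_tbc, g_cbc)
```

Paths left in the second group would still be signed off at the conventional corners, which is what keeps the tightened corners from underestimating their variation.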

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Correlation factor = 0.5
          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path with small Δdelays is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  (terms, in order: nominal delay; delay sensitivity to unit change in M1 resistance; delay sensitivity to unit change in M1 capacitance)
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large → the delay variations of ΔdR-M1 and ΔdC-M1 cancel out → Δd(Ycw) ≈ 0 < σj
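
A small numeric illustration of the cancellation, under assumptions that are not in the slides: the two sources are independent with unit standard deviation, and the C-worst corner moves M1 resistance by −3σ and M1 capacitance by +3σ.

```python
import math


def corner_delta(sensitivities, shifts):
    """Delay change at a corner: sum of sensitivity times parameter shift."""
    return sum(s * x for s, x in zip(sensitivities, shifts))


def path_sigma(sensitivities):
    """Standard deviation of path delay for independent, unit-sigma sources."""
    return math.sqrt(sum(s * s for s in sensitivities))


sens = (0.5, 0.5)                                # sensitivities to M1 R and M1 C, as in the example
delta_cw = corner_delta(sens, (-3.0, +3.0))      # assumed C-worst shifts, in units of sigma
sigma_j = path_sigma(sens)
print(delta_cw, round(sigma_j, 4))
```

Because the corner delay change is near zero while σj is not, the ratio that defines α becomes large for such paths.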

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

Figure: three panels (LEON3MP, NETCARD, SUPERBLUE12), each with the region α > 0.5 marked

• Similar trends in different designs
• Large α when Δd(Yrcw) / d(Ytyp) and Δd(Ycw) / d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

Figure: bar charts of WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

Figure: Vdd versus time, with labels NBTI and PBTI

VBTI = Vlib ≈ Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
• Assume timing slack = 0

Figure annotation: 10mV

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library

Vstep: AVS voltage step
Vfinal: converged voltage
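
The loop described in this flow can be sketched as follows. The delay and aging models here are toy placeholders (an alpha-power-law delay and a power-law BTI shift with invented constants), standing in for the STA runs and aging estimation of the real flow; every name and number is hypothetical.

```python
def reference_signoff(delay, aging_step, v_init, v_step, v_max, t_final, dt, clock_period):
    """Step through the lifetime and return [(time, Vdd), ...].

    Returns None when timing cannot be met without exceeding v_max,
    which corresponds to "modify circuit and start over".
    """
    steps_up = 0   # number of AVS voltage steps applied so far
    dvth = 0.0     # accumulated threshold-voltage shift
    schedule = []
    for k in range(int(round(t_final / dt))):
        t = k * dt
        vdd = v_init + steps_up * v_step
        dvth += aging_step(vdd, t, dt)            # estimate aging over this time step
        while delay(vdd, dvth) > clock_period:    # update circuit timing and Vdd
            steps_up += 1
            vdd = v_init + steps_up * v_step
            if vdd > v_max + 1e-9:
                return None
        schedule.append((t + dt, vdd))
    return schedule


# Toy models (assumptions, not from the talk).
def toy_delay(vdd, dvth, vth0=0.3):
    return vdd / (vdd - vth0 - dvth) ** 1.3


def toy_aging_step(vdd, t, dt, a=0.02, n=1.0 / 6.0):
    return a * vdd * ((t + dt) ** n - t ** n)


schedule = reference_signoff(toy_delay, toy_aging_step, v_init=0.90, v_step=0.01,
                             v_max=1.05, t_final=10.0, dt=1.0, clock_period=1.75)
v_final = schedule[-1][1]
print(schedule)
```

In the real flow each evaluation of `delay` is a timing analysis with a library consistent with the current Vdd and aging, which is why the slide calls the approach very slow.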

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Case       1      2      3      4       5    6        7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1   Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

Figure: loop among Circuit, Vfinal, Vlib / VBTI and Derated Libraries

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

Figure: |Vgs| versus time over alternating ON / OFF phases [VattikondaWC06]

Device aging (|ΔVth|) accumulates over time [TCAS'14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly

Figure annotation: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years); Vfinal marked
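
To see why a front-loaded curve is plausible, the sketch below assumes a generic power-law model, ΔVth proportional to t^n. The slides do not state the model or the exponent; n = 0.3 is chosen here only because it happens to reproduce the quoted 50% figure for a 10-year lifetime.

```python
def aging_fraction(t, lifetime, n):
    """Fraction of end-of-life aging reached at time t, assuming aging grows as t**n."""
    return (t / lifetime) ** n


# Assumed exponent; the deck does not give one.
frac_year_1 = aging_fraction(1.0, 10.0, n=0.3)
print(round(frac_year_1, 3))
```

Any sublinear exponent gives the same qualitative picture: most of the shift, and therefore most of the Vdd compensation, arrives early in the lifetime.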

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Vlib: for circuit performance estimation. VBTI: for aging estimation.

                 Vlib = Min Vdd                        Vlib = Max Vdd
VBTI = Min Vdd   Slowest circuit, less aging           Not applicable (optimistic)
VBTI = Max Vdd   Slowest circuit, worst-case aging     Faster circuit, worst-case aging
                 (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

Figure (AES): power (mW) versus area (μm2), with data points labeled 1 to 8 and regions annotated "Optimistic about AVS" and "Pessimistic about AVS". Legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's)

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation
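
The MTTF update rule on this slide, MTTF(i) = MTTF(i−1) × (VDD(i−1) / VDD(i))², can be applied step by step along a Vdd schedule. The sketch below does exactly that; the function name, the initial MTTF and the schedule are made-up inputs.

```python
def update_mttf(mttf_initial, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (Vdd(i-1) / Vdd(i))**2 along a Vdd schedule."""
    mttf = mttf_initial
    history = [mttf]
    for v_prev, v_next in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_next) ** 2
        history.append(mttf)
    return history


# Hypothetical AVS schedule (V) and initial MTTF (years).
history = update_mttf(10.0, [0.90, 0.94, 0.95, 0.96])
print([round(m, 3) for m in history])
```

Because the ratios telescope, the final value depends only on the first and last voltage: MTTF_final = MTTF_initial × (V_first / V_last)².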

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

Figure (DMA): VDD (0.90 to 1.04) versus year (0 to 12) for S1 to S5, and MTTF (year) for S1 to S5; annotation: 12 years MTTF penalty

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

Figure: "margin stack" for voltage signoff, between nominal Vdd and signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage marked

Problem: margins = pessimism; overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

Flow: routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δdjv

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda\lambda^{T}$ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

$$[\Delta d'_{j1} \;\ldots\; \Delta d'_{j27}] = [\Delta d_{j1} \;\ldots\; \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
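
The two statistical timing slides can be combined into a short, dependency-free Python sketch: per-source sensitivities from corner delays, a Cholesky factor of the correlation matrix, and the resulting σj. The function names and the two-source example are illustrative assumptions; the talk's setup uses 27 sources.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L * L^T, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(low[i][m] * low[k][m] for m in range(k))
            if i == k:
                low[i][k] = math.sqrt(a[i][i] - s)
            else:
                low[i][k] = (a[i][k] - s) / low[k][k]
    return low


def sensitivities(d_var, d_typ):
    """Delta d_jv = [d_j(Y_v) - d_j(Y_typ)] / 3 for each variation source v."""
    return [(d - d_typ) / 3.0 for d in d_var]


def path_sigma(delta, corr):
    """sigma_j = norm of (delta * lambda), where corr = lambda * lambda^T."""
    lam = cholesky(corr)
    n = len(delta)
    correlated = [sum(delta[r] * lam[r][c] for r in range(n)) for c in range(n)]
    return math.sqrt(sum(x * x for x in correlated))


# Two hypothetical variation sources with correlation factor 0.5.
corr = [[1.0, 0.5], [0.5, 1.0]]
sigma_j = path_sigma([1.0, 1.0], corr)
print(round(sigma_j, 4))
```

The squared result equals Δd Σ Δdᵀ, so with an identity correlation matrix this reduces to the plain root-sum-square of the sensitivities.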

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

Figure: energy (mJ) versus supply voltage (V) for a conventional design and a resilient design; 15% reduction annotated

Conventional design: worst-case signoff, no Vdd downscaling

Resilient design: typical-case signoff, Vdd downscaling → reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy

$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$

Throughput = 1 − ER T + 1 − ER r × T

r: recovery cycles

T: clock period

ER: error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches (figure, arranged along a power axis)

Open-Loop AVS
• Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]

Closed-Loop AVS
• Generic monitor / design-dependent replica: process and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]

Error Tolerance
• Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

Figure (four panels, each contrasting ideal scaling with non-ideality):
• DENSITY: transistor count [M] versus MPU release date. Source: [CPUDB]
• POWER: dynamic power (W) by year, 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) versus ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt versus MPU release date. Source: [CPUDB]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

Figure (panels MUL and EXU): energy (mJ) versus supply voltage (V) for brute-force, pure-margin and CombOpt; the minimum achievable energy is marked

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

Figure: voltage versus frequency plane showing the design cone of mode A with modes A, B and C; negative slacks = failed design, positive slacks = overdesign

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models (adaptive search)

Figure: power estimation of adaptive search, power (mW) versus signoff voltage (V). Design: AES, f = 700MHz
• Ovals indicate sample points
• 1st, 2nd: power from power models at first and second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-Loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

Figure: ring oscillator stages with 1-bit control pins selecting high or low resistance

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

Figure (panels MUL and EXU): stacked energy (mJ) bars for PM, BF and CO at small, medium and large margin; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation

Small/medium/large margin: safety margin = 5/10/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

Figure (panels MUL and EXU): energy (mJ) versus supply voltage (V) for brute-force, pure-margin and CombOpt; the minimum achievable energy is marked

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs); SkewOpt (activity-aware clock skew optimization); check "Energy < min energy" and save current solution

Page 48: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

48ISVLSI-2014 invited talk 140710

Delay Variation

α α

Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)

bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner

C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner

α lt 10 delay variations are covered by the RC-worst corner

Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst

Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst

Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)

bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

bull Each layer can have different skewed variationsInterconnect stack with M1 and M2

M1 C

M2 C

Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp

bull Less pessimism with non-homogeneous cornersbull Challenge

bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in

itf with α = 05

Δdj(Yrcw)dj(Ytyp) x 100

3σjd(Ytyp) x 100

Challenge how to avoid underestimating delay variation to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

bull Critical paths are structurally similar

bull Wires on critical paths are routed on many layers

bull Structure is an outcome of the design flow

Testcasebull 45nm foundry library (wire

resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K

standard cell instancesbull 9 metal layersbull Extract critical paths from

different PVT and BEOL corners

Wirelength ratio ()

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

bull Extract RC at RC-worst C-worst and the typical corners

bull Calculate Δdelay of critical paths

bull Put path j in the group Gtbc if Δdelay is larger than a threshold

bull Fix only the paths in Gtbc using tightened BEOL corners

bull Since tightened corners have smaller delay variations timing closure is easier

Routed design

Timing analysis at BEOL corners Ytyp Ycw Yrcw

GTBC GCBC

ECOusing CBC

Timing analysis

using TBC

violation = 0

Timing analysis

using CBC

violation = 0

ECOusing TBC

done

53ISVLSI-2014 invited talk 140710

Experiment Setup

LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31

Gate count 232K 575K 1031KUtilization () 84 79 82

Core area (mm2) 045 104 191Max transition (ps) 330 330 330

Testcases for validation (45nm library with 8X wire resistivity)

αCorrelation factor = 05

Acw () Arcw ()

TBC-05 05 43 73

TBC-06 06 33 50

TBC-07 07 30 34

Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw

Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05

54ISVLSI-2014 invited talk 140710

Further Analysis

bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α

bull A path has small Δdelays the path is equally sensitive to R and C

bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1

bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj

Nominal delay

Delay sensitivity to unit change in M1 resistance

Delay sensitivity to unit change in M1 capacitance

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

LEON3MP

SUPERBLUE12NETCARD

α gt 05α gt 05

α gt 05

bull Similar trends in different designs

bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

bull WNS and TNS are reduced by up to 120ps and 61ns

bull Timing violations reduces by 31 to 100

Correlation factor γ = 0 (variation sources are independent)

LEON SUPERBLUE NETCARD

-0180-0160-0140-0120-0100-0080-0060-0040-002000000020

CBC TBC-1 TBC-2

WN

S (n

s)

LEON SUPERBLUE NETCARD

-90-80-70-60-50-40-30-20-10

0CBC TBC-1 TBC-2

TNS

(ns)

LEON SUPERBLUE NETCARD0

200400600800

1000120014001600

CBC TBC-1 TBC-2

Tim

ing

viol

ation

s

57ISVLSI-2014 invited talk 140710

Heuristics 1

bull Model BTI degradation with Vfinal throughout lifetime

bull Aging of a flat Vfinal asymp aging of an adaptive Vdd

bull But slightly pessimistic

Vdd

time

NBTI

PBTI

VBTI = Vlib asymp Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

bull Problem Vfinal is not available at early design stage (design has not been implemented)

bull Vfinal = Vdd end of life (to compensate BTI aging)

bull Gates along critical pathbull Timing slack at t = 0

bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

bull Observation 2 Vfinal is not sensitive to gate types

bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack

bull Assume timing slack = 0

10mV

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC

Supply voltages

Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105

Vmax105V

Vinit090V

Vheur1 (DC) 097V

Vheur1 (AC) 095V

Vheur2 (DC) 095V

Vheur2 (AC) 093V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime

bull Signoff flowbull Estimate aging at each time step

bull Update circuit timing and Vdd

bull Repeat until t = tfinal

bull Modify circuit and start over if Vfinal gt maximum allowed voltage

bull No overhead in timing analysis but very slow Many STA runs

and library

Vstep AVS voltage stepVfinal converged voltage

62ISVLSI-2014 invited talk 140710

Experiment Setupbull Characterize different derated libraries

bull Evaluate impact of library characterizationbull Seven setups

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2

VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2

63ISVLSI-2014 invited talk 140710

ldquoChicken and Eggrdquo Loop

bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation

bull Timing constraints critical paths etc

bull Circuit is affected by library characterization

Circuit

Derated Libraries

Vfinal

Vlib VBTI

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time with alternating ON / OFF phases [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time [TCAS’14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd over lifetime approaching Vfinal; ≈70% of the Vdd increment occurs in the first year (remaining 30% over 9 years)]
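To see why a front-loaded curve puts so much aging in the first year, one can assume a power-law time dependence, ΔVth ∝ t^n. That model and the exponents below are assumptions for illustration only; the slide does not state which aging model produced its numbers.

```python
def aging_fraction(t_years, lifetime_years, n):
    """Fraction of end-of-life aging reached at t_years, assuming dVth ~ t**n."""
    return (t_years / lifetime_years) ** n


# Hypothetical exponents: n = 0.3 gives about half of the 10-year aging in year 1,
# while n = 1/6 gives about two thirds.
half = aging_fraction(1.0, 10.0, 0.3)
two_thirds = aging_fraction(1.0, 10.0, 1.0 / 6.0)
```

Under this assumed model, any exponent well below 1 concentrates the shift early, which is the qualitative behavior the observation relies on.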

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                             Vlib for circuit performance estimation
VBTI for aging estimation    Min Vdd                             Max Vdd
Min Vdd                      Slowest circuit, less aging         Not applicable (optimistic)
Max Vdd                      Slowest circuit, worst-case aging   Faster circuit, worst-case aging
                             (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES, with points labeled by signoff case; regions annotated “Optimistic about AVS” and “Pessimistic about AVS”]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
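The per-step MTTF update is easy to check numerically. The sketch below applies the update rule from this slide along a Vdd schedule; the schedule values and the starting MTTF are illustrative assumptions, not data from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One AVS step: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_over_schedule(mttf0, vdd_schedule):
    """Apply the update along a list of supply voltages; returns the MTTF history."""
    history = [mttf0]
    for vdd_prev, vdd_now in zip(vdd_schedule, vdd_schedule[1:]):
        history.append(update_mttf(history[-1], vdd_prev, vdd_now))
    return history


# Hypothetical schedule: Vdd raised from 0.90 V to 1.10 V in two steps.
history = mttf_over_schedule(10.0, [0.90, 1.00, 1.10])
```

Because the ratios telescope, the final value depends only on the first and last voltage: MTTF0 × (0.90 / 1.10)², a factor of about 0.67. A 200mV rise from 0.90 V therefore costs roughly a third of the lifetime under this rule, the same order as the 30% penalty annotated on the slide.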

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year (0 to 12) and MTTF (year) for AVS schedules S1 to S5, design DMA; annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff, from nominal Vdd to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI / NBTI; the operating voltage is marked on the voltage axis]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1} \;\dots\; \Delta d'_{j27}] = [\Delta d_{j1} \;\dots\; \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
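The two statistical timing slides reduce to a short computation: per-source sensitivities from 3σ corner runs, a Cholesky factor of the correlation matrix, and a root-sum-square. The sketch below uses plain Python lists and a hand-written Cholesky routine so it has no dependencies; the two-source example values are made up for illustration.

```python
import math


def cholesky(sigma):
    """Return lower-triangular lam with sigma = lam * lam^T."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            partial = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - partial)
            else:
                lam[i][k] = (sigma[i][k] - partial) / lam[k][k]
    return lam


def sensitivity(d_corner, d_typ):
    """Per-sigma sensitivity from a 3-sigma corner run: [d(Yv) - d(Ytyp)] / 3."""
    return (d_corner - d_typ) / 3.0


def path_sigma(delta_d, sigma):
    """Standard deviation of path delay for sensitivities delta_d and correlation sigma."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times lam: delta_d'[v] = sum over u of delta_d[u] * lam[u][v].
    correlated = [sum(delta_d[u] * lam[u][v] for u in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in correlated))
```

With independent sources the result is the plain root-sum-square, so `path_sigma([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])` is 5. With a correlation factor of 0.5 the same sensitivities give √37, because σj² = Δd Σ Δd^T picks up the cross term 2 × 0.5 × 3 × 4.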

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors. Ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V), 0.84 V to 1.00 V, for conventional design and resilient design; annotation: 15% reduction]

Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$$\mathit{Energy} = \frac{\mathit{Power}}{\mathit{Throughput}}$$

where Throughput is a function of the error rate ER [Kahng10], the number of recovery cycles r and the clock period T

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints. Allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

[Figure: AVS classes and AVS approaches, arranged along a power axis]

• Open-Loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-Loop AVS
  • Generic monitor, design-dependent replica: process and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error Tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels, each contrasting ideal scaling with non-ideality]
• DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
• POWER: dynamic power (W) vs. year, 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date. Source: [CPUDB]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs. Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable

Global optimization flow
1. Sample (SP&R)
2. Construct power models
3. Estimate optimal signoff modes
4. Adaptive search: sample (SP&R), refine power models

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V), 0.9 V to 1.2 V. Design: AES, f: 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
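The sample / model / estimate / refine loop can be sketched with a deliberately simple power model. Everything specific here is an assumption: the talk does not say what form its power models take, so the sketch fits a parabola through three samples, and `implement` is a hypothetical stand-in for a full SP&R run that returns power at a given signoff voltage.

```python
def parabola_vertex(p1, p2, p3):
    """Voltage at the extremum of the parabola through three (voltage, power) points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    if den == 0:
        raise ValueError("samples are collinear; no unique vertex")
    return x2 - 0.5 * num / den


def adaptive_search(implement, voltages, v_min, v_max, tol=1e-3, max_iter=5):
    """Refine a 3-point quadratic power model; returns the best (voltage, power) seen."""
    samples = [(v, implement(v)) for v in voltages]  # initial samples (SP&R runs)
    for _ in range(max_iter):
        # Estimate the optimal signoff voltage from the current model.
        v_new = min(max(parabola_vertex(*sorted(samples)), v_min), v_max)
        if any(abs(v_new - v) < tol for v, _ in samples):
            break  # estimate coincides with an existing sample: converged
        # Refine: drop the worst sample and implement at the estimated optimum.
        samples.remove(max(samples, key=lambda s: s[1]))
        samples.append((v_new, implement(v_new)))
    return min(samples, key=lambda s: s[1])
```

For a synthetic, exactly quadratic power curve such as `lambda v: 1.5 + 4.0 * (v - 1.05) ** 2`, starting from samples at 0.9, 1.0 and 1.2 V, the first fit already lands on 1.05 V, so only four implementation runs are needed in total. The sketch assumes power is roughly convex in signoff voltage; a concave fit would need an extra guard.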

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-loop AVS classes: generic monitor, design-dependent replica, in-situ monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low series resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF and CO at small, medium and large margin; panels MUL and EXU; bars split into energy w/o resilience, energy penalty of additional circuits and energy penalty of throughput degradation]

Small / medium / large margin: safety margin = 5%, 10%, 15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs, hence reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs. Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt (margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs); SkewOpt (activity-aware clock skew optimization); check energy < min energy and save current solution]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

49ISVLSI-2014 invited talk 140710

Non-Homogeneous Corner

• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2 and their capacitance (C) distributions. Non-homogeneous corner: M1 == Cw (3σ), M2 == Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

50ISVLSI-2014 invited talk 140710

Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: axes Δdj(Yrcw)/dj(Ytyp) x 100 and 3σj/d(Ytyp) x 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC. GTBC: timing analysis using TBC and ECO using TBC until violation = 0. GCBC: timing analysis using CBC and ECO using CBC until violation = 0. Then done]
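The grouping step of the flow can be sketched in a few lines. The slide only says that a path goes to Gtbc when its Δdelay exceeds a threshold; the choice to test the C-worst and RC-worst corners separately and to accept the path if either relative Δdelay exceeds its threshold is an assumption, as are the function name, the tuple layout and the threshold values.

```python
def group_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc) by relative delay change at the BEOL corners.

    paths: iterable of (name, d_typ, d_cw, d_rcw) with delays in the same unit.
    a_cw, a_rcw: thresholds in percent of the typical-corner delay (hypothetical).
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        rel_cw = 100.0 * (d_cw - d_typ) / d_typ
        rel_rcw = 100.0 * (d_rcw - d_typ) / d_typ
        # Assumed rule: a large delta-delay at either corner qualifies the path.
        if rel_cw > a_cw or rel_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Illustrative delays in ns; p1 moves by 8% and 10%, p2 by 1% and 2%.
example = [("p1", 1.00, 1.08, 1.10), ("p2", 1.00, 1.01, 1.02)]
g_tbc, g_cbc = group_paths(example, a_cw=5.0, a_rcw=5.0)
```

Paths left in the second group keep the conventional corners, so the tightened corners are applied only where the flow has decided they are safe to use.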

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5

           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   43        73
TBC-0.6    0.6   33        50
TBC-0.7    0.7   30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

Annotated terms in the example: nominal delay; delay sensitivity to unit change in M1 resistance; delay sensitivity to unit change in M1 capacitance

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor plots for LEON3MP, NETCARD and SUPERBLUE12, each marking the region α > 0.5]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic

[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC

Supply voltages

Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105

Vmax105V

Vinit090V

Vheur1 (DC) 097V

Vheur1 (AC) 095V

Vheur2 (DC) 095V

Vheur2 (AC) 093V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime

bull Signoff flowbull Estimate aging at each time step

bull Update circuit timing and Vdd

bull Repeat until t = tfinal

bull Modify circuit and start over if Vfinal gt maximum allowed voltage

bull No overhead in timing analysis but very slow Many STA runs

and library

Vstep AVS voltage stepVfinal converged voltage

62ISVLSI-2014 invited talk 140710

Experiment Setupbull Characterize different derated libraries

bull Evaluate impact of library characterizationbull Seven setups

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2

VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2

63ISVLSI-2014 invited talk 140710

ldquoChicken and Eggrdquo Loop

bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation

bull Timing constraints critical paths etc

bull Circuit is affected by library characterization

Circuit

Derated Libraries

Vfinal

Vlib VBTI

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS

|Vgs|

time

ON OFF ON OFF

[VattikondaWC06]

Device aging (|ΔVth|) accumulates over time

[TCASrsquo14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

bull BTI is a ldquofront-loadedrdquo phenomenon

bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)

bull Most Vdd increment happens in early lifetime

bull Gap between Vdd and Vfinal reduces rapidly

asymp70 Vdd increment in 1 year(remaining 30 over 9 years)

Vfinal

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner bull AVS increases supply voltage

aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired

supply voltage gt Vmax

Pessimistic signoff corner bull Ovestimate aging andor

underestimate circuit performance

bull Large area overhead

Good corners

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner

for AVS4 Vbti = Vfinal Do not overestimate

aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

bull Timing signoff ensure circuit meets performance target under PVT variations amp aging

bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis

bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis

Vlib for circuit performance estimation

Min Vdd Max Vdd

VBTI for aging

estimation

MinVdd

Not applicable (Optimistic)

Max Vdd

Slowest circuitLess aging

Faster circuitWorst-case aging

Slowest circuit Worst-case aging

Too pessimistic

With BTI aging and AVS the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

10000 12000 14000 16000 18000 20000 2200020

22

24

26

28

30

32

44

4

888

7776

66

555

3

33

2

22

11

1

Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)

Area (μm2)

Pow

er (m

W)

AES

Optimistic about AVS

Pessimistic about AVS

>

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

1 2 3 4 5 6 7 8 90

2

4

6

8

10

12

08

09

1

11

12Lifetime (year)

Implementation

Life

time

(yea

r)

Vfina

l (V)

Vfinal (V)

119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )

2

bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as

30 MTTF penalty

200mV voltage compensation

>

70ISVLSI-2014 invited talk 140710

0 2 4 6 8 10 12090092094096098100102104

S1 S2 S3 S4 S5

Year

VDD

DMA 3S1 S2 S3 S4 S5

78

79

80

81

MTT

F (Y

ear)

EM Impact on AVS Scheduling

12 years MTTF penalty

>

71ISVLSI-2014 invited talk 140710

What is ldquoSignoffrdquo

bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip

Nominal VddStatic IR drop

Power grid IR gradientDynamic IR

HCINBTI

Signoff Vdd

Voltage

Problem Margins = pessimism

overdesign schedule delay

ldquomargin stackrdquo for voltage signoff

Operating voltage

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

bull Delay sensitivity of path pj to variation source zv

bull Assumptions bull Δdjv is linear with respect to variation sources

bull Variation sources are normal distributions

bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

28 itf files (27 variation

sources + Ytyp)

Routed Netlist

RC extraction

STA

Δdjv

Δdjv = [ - ] 3dj(Yv) dj(Ytyp)

Note Path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)

bull Σ = λλT (Note λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ

Standard deviation of path delay

σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05

Note we use the delay variation from the statistical analysis as a reference

74ISVLSI-2014 invited talk 140710

Resilient Designs

bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)

bull Trade off design robustness vs design quality Eg enable margin reduction

bull Improve performance (ie timing speculation)

084 088 092 096 10030

34

38

42

46

50

54

58

62conventional design

reilient Design

Supply voltage (V)

En

erg

y (

mJ

)

Conventional design Worst-case signoff No Vdd downscaling

Resilient design Typical-case signoff Vdd downscaling reduced energy

15 reduction

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

bull Given RTL design throughput requirement and error-tolerant registers

bull Objective implement design to minimize energy bull Estimation of design energy

119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905

h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879

+1minus119864119877119903times119879

recovery cycles

Clock period

Error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization

bull Question Which endpoint to be optimized

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

Open-Loop AVS

Closed-Loop AVSP

ow

er

Freq amp Vdd LUT

Post-silicon characterization

AVS Pre-characterize LUT [Martin02]

Process-aware AVSPost-silicon characterization [Tschanz03]

Generic monitor

Design dependent replica

In-situmonitor

Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]

In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]

Error Detection System

Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]

Error Tolerance

AVS

approachesAVS classes

77

78ISVLSI-2014 invited talk 140710

Challenge Variability

1998 2000 2003 2006 2008 20111

10

MPU Release Date

Tran

sisto

r Cou

nt [M

]

Source [CPUDB]

DENSITY

IdealNon-ideality

2009

2010

2011

2012

2013

2014

2015

2016

2017

2018

2019

2020

2021

2022

2023

2024

0

100

200

300

400

500

600

700

0

02

04

06

08

1

12Dynamic Power (W)

POWER

Source [JeongK08]

IdealNon-ideality

2006 2008 2010 2012 2014 20161000

10000

100000

Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling

DRIVE CURRENT

Ideal

Source [ITRS]

Non-ideality

1995 2000 2005 2011 20160

05

1

15

2

25

3

MPU Release Date

Volt

SUPPLY VOLTAGE

Source [CPUDB]

Ideal

Non-ideality

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient

designs thus reduced powerbull Proposed method trades off between timing-error penalty vs

reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction

compared to pure-margin designs Resilience benefits increase in the context of AVS strategy

084 088 092 096 10030

36

42

48

54

60brute-forcepure-marginCombOpt

Supply voltage (V)

En

erg

y (

mJ

)

084 086 088 09 092 094 096 098 1 10225

29

33

37

41

45brute-force

pure-margin

CombOpt

Supply voltage (V)

En

erg

y (

mJ

)

MUL EXU

Minimum achievable energy

80ISVLSI-2014 invited talk 140710

Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for

circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly

threshold voltages)bull One mode is outside of the design cone of the other

failed design overdesignbull Mode A has positive timing slacks with respect to mode B

mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other

bull Modes are in each othersrsquo design cone

Voltage

Frequency

A

Negative Slacks = failed design

Positive Slacks = overdesign

B

C

Design Cone of mode A

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline search for signoff modes within design cone reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method Global Optimizationbull Iteratively sample and refine power models

bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable

Sample (SPampR)

Construct power models

Estimate optimal signoff modes

Sample (SPampR)

Refine power models

Adaptive search

Global optimization flow

09 10 11 1214

15

16

17

18

19

201st 2nd real

Signoff Voltage (v)

Po

wer

(m

W)

Power estimation of adaptive search

bull Ovals indicate sample pointsbull 1st 2nd power from power models at first

second iterationbull real power from real implemented circuits

Design AESf 700MHz

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

bull Critical path may be difficult to identify (IP from 3rd party)

bull Calibrating monitors at multiple modesvoltages requires long test time

Closed-Loop AVS

Design-dependent replica

In-situmonitor

Generic monitor

bull Does not capture design-specific performance variation

82

This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

bull Identified two circuit knobs to tune Vmin

bull Series resistancebull Cell types (INV NAND NOR)

bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low

1 bit 1 bit 1 bit Control pins

High resistance

Low resistance

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: Energy (mJ) of PM, BF and CO at small, medium and large margins, for MUL and EXU. Bar components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation.]

Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs, thus reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: Energy (mJ) vs. supply voltage (V) for MUL and EXU. Curves: brute-force, pure-margin, CombOpt. Minimum achievable energy is marked.]

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flow diagram boxes:]
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
• Energy < min energy? Save current solution


Opportunities for Tightened BEOL Corners

• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure axes: Δdj(Yrcw)/dj(Ytyp) × 100% and 3σj/d(Ytyp) × 100%]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield

Wiring Structure in Timing-Critical Paths

• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flow diagram: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; paths split into GTBC and GCBC; timing analysis using TBC and ECO using TBC until violation = 0; timing analysis using CBC and ECO using CBC until violation = 0; done]
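The grouping step of this flow can be sketched as below. The function name `split_paths`, the path names, the delays and the threshold values are all hypothetical. Combining the C-worst and RC-worst checks with a logical "or" is also an assumption, since the slide only says that Δdelay must exceed a threshold.

```python
def split_paths(paths, a_cw, a_rcw):
    """Partition critical paths into G_tbc and G_cbc.

    paths -- dict: path name -> (d_typ, d_cw, d_rcw), delays in ns at the
             typical, C-worst and RC-worst BEOL corners
    a_cw, a_rcw -- thresholds on relative delta-delay, in percent
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        rel_cw = 100.0 * (d_cw - d_typ) / d_typ      # delta-d(Ycw) / d(Ytyp)
        rel_rcw = 100.0 * (d_rcw - d_typ) / d_typ    # delta-d(Yrcw) / d(Ytyp)
        # Assumption: exceeding either threshold qualifies the path for TBC
        if rel_cw > a_cw or rel_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Illustrative numbers only, not taken from the experiments
paths = {
    "p1": (1.00, 1.06, 1.03),   # 6% at C-worst
    "p2": (1.00, 1.01, 1.02),   # small delta-delay at both corners
    "p3": (2.00, 2.02, 2.20),   # 10% at RC-worst
}
g_tbc, g_cbc = split_paths(paths, a_cw=4.0, a_rcw=7.0)
```

Paths in `g_tbc` would then be timed and fixed at the tightened corners, and the rest at the conventional corners.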

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Correlation factor = 0.5
           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   43        73
TBC-0.6    0.6   33        50
TBC-0.7    0.7   30        34

Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

Scaling Factor Results

[Figure: scaling factor plots for LEON3MP, SUPERBLUE12 and NETCARD, each with the region α > 0.5 marked]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs

[Flow diagram legend: Vstep = AVS voltage step; Vfinal = converged voltage]
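The loop structure of the reference flow can be sketched as follows. This is only an outline under stated assumptions: the aging and timing models are passed in as callables, and the toy models used in the example (a power-law threshold shift and a linear slack expression) are hypothetical placeholders for library characterization and STA. Aging at each step is evaluated at the current Vdd only, which is a simplification.

```python
def reference_signoff(aging, slack, v_init, v_max, v_step, t_final, dt):
    """Step through the lifetime, raising Vdd by v_step whenever timing fails.

    aging(t, vdd)    -- hypothetical model: threshold shift (V) at time t
    slack(vdd, dvth) -- hypothetical timing check: slack >= 0 means timing met
    Returns the converged Vfinal, or None if Vdd would exceed v_max
    (the circuit must then be modified and the flow restarted).
    """
    vdd = v_init
    steps = int(round(t_final / dt))
    for k in range(1, steps + 1):
        t = k * dt
        dvth = aging(t, vdd)                   # estimate aging at this time step
        while slack(vdd, dvth) < 0.0:          # update timing, then Vdd
            vdd = round(vdd + v_step, 6)       # one AVS voltage step
            if vdd > v_max:
                return None
            dvth = aging(t, vdd)               # keep aging consistent with Vdd
    return vdd


def toy_aging(t, vdd):
    """Illustrative front-loaded aging: grows with Vdd and as t**(1/6)."""
    return 0.05 * vdd * t ** (1.0 / 6.0)


def toy_slack(vdd, dvth):
    """Illustrative timing check: effective overdrive must stay above 0.85."""
    return (vdd - dvth) - 0.85


v_final = reference_signoff(toy_aging, toy_slack,
                            v_init=0.90, v_max=1.05, v_step=0.01,
                            t_final=10, dt=1)
v_fail = reference_signoff(toy_aging, toy_slack,
                           v_init=0.90, v_max=0.91, v_step=0.01,
                           t_final=10, dt=1)
```

Every pass through the inner loop corresponds to one timing analysis, which illustrates why the slide calls this flow very slow.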

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Diagram: loop among Circuit, Vfinal, Vlib / VBTI and Derated Libraries]

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time over alternating ON / OFF phases] [VattikondaWC06]

Device aging (|ΔVth|) accumulates over time

[TCAS’14]

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd over lifetime approaching Vfinal; ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years)]

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                                 Vlib for circuit performance estimation
VBTI for aging estimation        Min Vdd                              Max Vdd
Min Vdd                          Slowest circuit, less aging          Not applicable (optimistic)
Max Vdd                          Slowest circuit, worst-case aging    Faster circuit, worst-case aging
                                 (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

AVS Signoff Corner Selection

[Figure: Power (mW) vs. area (μm²) for AES. Points are labeled by case number (1 to 8) and range from “optimistic about AVS” to “pessimistic about AVS”. Legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks).]

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left(\frac{V_{DD}(i-1)}{V_{DD}(i)}\right)^{2}$$

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9. Annotations: 200mV voltage compensation, 30% MTTF penalty.]
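The MTTF update rule can be applied step by step along an AVS voltage schedule, as in the short sketch below. The function name `update_mttf`, the 10-year starting MTTF and the 1.0 V starting voltage are assumptions chosen for illustration.

```python
def update_mttf(mttf_initial, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a schedule."""
    mttf = mttf_initial
    for v_prev, v_next in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_next) ** 2
    return mttf


# Assumed example: Vdd raised from 1.0 V to 1.2 V in two AVS steps
mttf_start = 10.0                                   # years, illustrative
mttf_end = update_mttf(mttf_start, [1.0, 1.1, 1.2])
penalty = 1.0 - mttf_end / mttf_start               # fractional MTTF loss
```

Because the per-step ratios telescope, only the first and last voltages matter: a 200mV increase from an assumed 1.0 V gives a penalty of about 30%, consistent with the annotation on the slide.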

EM Impact on AVS Scheduling

[Figure: VDD vs. year (0 to 12) for schedules S1 to S5, and MTTF (year) for S1 to S5 (DMA). Annotation: 12 years MTTF penalty.]

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff. Voltage axis with nominal Vdd, operating voltage and signoff Vdd; margin components: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI.]

Problem: margins = pessimism, overdesign, schedule delay

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist; RC extraction; STA; Δdjv]

Note: path delay includes gate and wire delays

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλᵀ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay:

$$\sigma_j = \left((\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2\right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
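The two statistical-timing slides can be combined into a small calculation, sketched here in plain Python with two variation sources instead of 27. The helper names and the sensitivity values are hypothetical; the Cholesky factor is taken as lower triangular, so that Σ = λλᵀ.

```python
import math


def sensitivity(d_corner, d_typ):
    """Delta-d_jv = [d_j(Y_v) - d_j(Y_typ)] / 3, i.e. delay change per sigma."""
    return (d_corner - d_typ) / 3.0


def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(sensitivities, sigma):
    """Standard deviation of path delay from per-source sensitivities."""
    lam = cholesky(sigma)
    n = len(sensitivities)
    # Row vector times lam: sensitivities with correlation folded in
    correlated = [sum(sensitivities[i] * lam[i][v] for i in range(n))
                  for v in range(n)]
    return math.sqrt(sum(x * x for x in correlated))


# Illustrative sensitivities (ps) for two variation sources
dd = [sensitivity(109.0, 100.0), sensitivity(112.0, 100.0)]   # [3.0, 4.0]
sigma_independent = path_sigma(dd, [[1.0, 0.0], [0.0, 1.0]])
sigma_correlated = path_sigma(dd, [[1.0, 0.5], [0.5, 1.0]])
```

With independent sources the result is the root-sum-square of the sensitivities (5 ps here). A correlation factor of 0.5 raises it to √37 ≈ 6.08 ps, since the squared norm of the transformed vector equals Δd Σ Δdᵀ.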

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: Energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction marked]

Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy

$$\mathit{Energy} = \frac{\mathit{Power}}{\mathit{Throughput}}$$

Throughput is expressed in terms of the error rate (ER), the clock period (T) and the number of recovery cycles (r) [Kahng10]

Selective-Endpoint Optimization

• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint should be optimized?

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS: process and temperature-aware AVS
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

[Figure axis: Power]

Challenge: Variability

[Figure: four panels contrasting ideal scaling with non-ideality]
• DENSITY: transistor count [M] vs. MPU release date (1998 to 2011). Source: [CPUDB]
• POWER: dynamic power (W) by year (2009 to 2024). Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling (2006 to 2016). Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date (1995 to 2016). Source: [CPUDB]

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of AVS strategy

[Figure: Energy (mJ) vs. supply voltage (V) for MUL and EXU. Curves: brute-force, pure-margin, CombOpt. Minimum achievable energy is marked.]

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: frequency vs. voltage with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

Page 51: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

51ISVLSI-2014 invited talk 140710

Wiring Structure in Timing-Critical Paths

bull Critical paths are structurally similar

bull Wires on critical paths are routed on many layers

bull Structure is an outcome of the design flow

Testcasebull 45nm foundry library (wire

resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K

standard cell instancesbull 9 metal layersbull Extract critical paths from

different PVT and BEOL corners

Wirelength ratio ()

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

bull Extract RC at RC-worst C-worst and the typical corners

bull Calculate Δdelay of critical paths

bull Put path j in the group Gtbc if Δdelay is larger than a threshold

bull Fix only the paths in Gtbc using tightened BEOL corners

bull Since tightened corners have smaller delay variations timing closure is easier

Routed design

Timing analysis at BEOL corners Ytyp Ycw Yrcw

GTBC GCBC

ECOusing CBC

Timing analysis

using TBC

violation = 0

Timing analysis

using CBC

violation = 0

ECOusing TBC

done

53ISVLSI-2014 invited talk 140710

Experiment Setup

LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31

Gate count 232K 575K 1031KUtilization () 84 79 82

Core area (mm2) 045 104 191Max transition (ps) 330 330 330

Testcases for validation (45nm library with 8X wire resistivity)

αCorrelation factor = 05

Acw () Arcw ()

TBC-05 05 43 73

TBC-06 06 33 50

TBC-07 07 30 34

Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw

Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05

54ISVLSI-2014 invited talk 140710

Further Analysis

bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α

bull A path has small Δdelays the path is equally sensitive to R and C

bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1

bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj

Nominal delay

Delay sensitivity to unit change in M1 resistance

Delay sensitivity to unit change in M1 capacitance

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

LEON3MP

SUPERBLUE12NETCARD

α gt 05α gt 05

α gt 05

bull Similar trends in different designs

bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

bull WNS and TNS are reduced by up to 120ps and 61ns

bull Timing violations reduces by 31 to 100

Correlation factor γ = 0 (variation sources are independent)

LEON SUPERBLUE NETCARD

-0180-0160-0140-0120-0100-0080-0060-0040-002000000020

CBC TBC-1 TBC-2

WN

S (n

s)

LEON SUPERBLUE NETCARD

-90-80-70-60-50-40-30-20-10

0CBC TBC-1 TBC-2

TNS

(ns)

LEON SUPERBLUE NETCARD0

200400600800

1000120014001600

CBC TBC-1 TBC-2

Tim

ing

viol

ation

s

57ISVLSI-2014 invited talk 140710

Heuristics 1

bull Model BTI degradation with Vfinal throughout lifetime

bull Aging of a flat Vfinal asymp aging of an adaptive Vdd

bull But slightly pessimistic

Vdd

time

NBTI

PBTI

VBTI = Vlib asymp Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

bull Problem Vfinal is not available at early design stage (design has not been implemented)

bull Vfinal = Vdd end of life (to compensate BTI aging)

bull Gates along critical pathbull Timing slack at t = 0

bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

bull Observation 2 Vfinal is not sensitive to gate types

bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack

bull Assume timing slack = 0

10mV

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC

Supply voltages

Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105

Vmax105V

Vinit090V

Vheur1 (DC) 097V

Vheur1 (AC) 095V

Vheur2 (DC) 095V

Vheur2 (AC) 093V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime

bull Signoff flowbull Estimate aging at each time step

bull Update circuit timing and Vdd

bull Repeat until t = tfinal

bull Modify circuit and start over if Vfinal gt maximum allowed voltage

bull No overhead in timing analysis but very slow Many STA runs

and library

Vstep AVS voltage stepVfinal converged voltage

62ISVLSI-2014 invited talk 140710

Experiment Setupbull Characterize different derated libraries

bull Evaluate impact of library characterizationbull Seven setups

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2

VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2

63ISVLSI-2014 invited talk 140710

ldquoChicken and Eggrdquo Loop

bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation

bull Timing constraints critical paths etc

bull Circuit is affected by library characterization

Circuit

Derated Libraries

Vfinal

Vlib VBTI

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS

|Vgs|

time

ON OFF ON OFF

[VattikondaWC06]

Device aging (|ΔVth|) accumulates over time

[TCASrsquo14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

bull BTI is a ldquofront-loadedrdquo phenomenon

bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)

bull Most Vdd increment happens in early lifetime

bull Gap between Vdd and Vfinal reduces rapidly

asymp70 Vdd increment in 1 year(remaining 30 over 9 years)

Vfinal

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner bull AVS increases supply voltage

aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired

supply voltage gt Vmax

Pessimistic signoff corner bull Ovestimate aging andor

underestimate circuit performance

bull Large area overhead

Good corners

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner

for AVS4 Vbti = Vfinal Do not overestimate

aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

67ISVLSI-2014 invited talk 140710

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                             Vlib for circuit performance estimation
VBTI for aging estimation    Min Vdd                              Max Vdd
Min Vdd                      Slowest circuit, less aging          Not applicable (optimistic)
Max Vdd                      Slowest circuit, worst-case aging    Faster circuit, worst-case aging
                             (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES, data points labeled by signoff case 1 to 8; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks); regions marked "Optimistic about AVS" and "Pessimistic about AVS"]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

  $MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$

[Figure: lifetime (year) and Vfinal (V) per implementation; annotations: 30% MTTF penalty, 200mV voltage compensation]
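The MTTF update on this slide, MTTF(i) = MTTF(i-1) × (VDD(i-1) / VDD(i))^2, can be applied step by step along an AVS voltage schedule. The following is a minimal sketch; the initial MTTF of 10 years and the voltage schedule are hypothetical values chosen for illustration, not data from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of the slide's rule: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2

def mttf_after_schedule(mttf_initial, vdd_schedule):
    """Apply the update across consecutive pairs of supply voltages."""
    mttf = mttf_initial
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_new)
    return mttf

# Hypothetical schedule: AVS raises Vdd by 200 mV in 50 mV steps.
schedule = [1.00, 1.05, 1.10, 1.15, 1.20]
mttf_initial = 10.0  # years, hypothetical
mttf_final = mttf_after_schedule(mttf_initial, schedule)
penalty = 1.0 - mttf_final / mttf_initial
print(round(mttf_final, 2), round(penalty, 3))
```

Because the per-step ratios telescope, only the first and last voltages matter: with these assumed values the penalty is 1 - (1.00 / 1.20)^2, roughly 31%.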

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year, and MTTF (year), for AVS schedules S1 to S5; design: DMA; annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, ...

[Figure: "margin stack" for voltage signoff, from nominal Vdd down to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage marked on the voltage axis]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path p_j to variation source z_v:

  $\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$

• Assumptions
  • Δd_jv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δd_jv using 28 runs of RC extraction and static timing analysis (STA)
  • Flow: routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δd_jv

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

  $[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:

  $\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
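The computation on this slide (Σ = λλ^T by Cholesky decomposition, Δd' = Δd λ, σj = norm of Δd') can be sketched in a few lines. This is an illustrative pure-Python sketch with two variation sources instead of 27; the sensitivity values and the 2 x 2 correlation matrices are made up for the example.

```python
import math

def cholesky(matrix):
    """Lower-triangular lam with lam * lam^T == matrix (symmetric positive definite)."""
    n = len(matrix)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(matrix[i][i] - s)
            else:
                lam[i][k] = (matrix[i][k] - s) / lam[k][k]
    return lam

def path_sigma(sensitivities, correlation):
    """sigma_j = || [dd_j1 ... dd_jn] * lam ||, with correlation = lam * lam^T."""
    lam = cholesky(correlation)
    n = len(sensitivities)
    # Row vector times lam: dd'_v = sum_u dd_u * lam[u][v]
    transformed = [sum(sensitivities[u] * lam[u][v] for u in range(n))
                   for v in range(n)]
    return math.sqrt(sum(x * x for x in transformed))

sens = [3.0, 4.0]                                   # hypothetical sensitivities (ps)
print(path_sigma(sens, [[1.0, 0.0], [0.0, 1.0]]))   # independent sources
print(path_sigma(sens, [[1.0, 0.5], [0.5, 1.0]]))   # correlation factor 0.5
```

The result equals sqrt(Δd Σ Δd^T), so positive correlation between sources with same-sign sensitivities increases σj relative to the independent case.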

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors
  • Ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality
  • E.g., enable margin reduction
  • Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]

• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

  $Energy = \frac{Power}{Throughput}$

  where Throughput is computed from the error rate (ER), the clock period (T) and the number of recovery cycles (r) [Kahng10]
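To make the energy estimate concrete, the sketch below evaluates Energy = Power / Throughput under one simple throughput model. The model is an assumption made here for illustration, not the expression from the slide: every operation takes one clock period T, and an error (probability ER) costs r additional recovery cycles.

```python
def throughput(clock_period, error_rate, recovery_cycles):
    """Operations per unit time, ASSUMING each error adds `recovery_cycles` cycles."""
    avg_cycles_per_op = 1.0 + error_rate * recovery_cycles
    return 1.0 / (clock_period * avg_cycles_per_op)

def energy_per_op(power, clock_period, error_rate, recovery_cycles):
    """Energy = Power / Throughput."""
    return power / throughput(clock_period, error_rate, recovery_cycles)

# Hypothetical numbers: 10 mW, 1 ns clock, 10 recovery cycles (mW * ns = pJ).
print(energy_per_op(10.0, 1.0, 0.00, 10))   # no timing errors
print(energy_per_op(10.0, 1.0, 0.01, 10))   # 1% error rate
```

In this assumed model a 1% error rate with 10 recovery cycles costs 10% extra energy per operation, which is the kind of penalty that has to be weighed against the power saved by reducing margin.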

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints
  • Allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches:

• Open-loop AVS
  • Freq & Vdd LUT: pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS: process- and temperature-aware AVS
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

[Figure: AVS classes vs. AVS approaches, with a power axis]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure, four panels, each contrasting ideal scaling with non-ideality:
 DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
 POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
 DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
 SUPPLY VOLTAGE: volt vs. MPU release date. Source: [CPUDB]]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

Global optimization flow (adaptive search):
  Sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
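The sample / model / estimate / re-sample loop can be sketched as follows. This is only an illustration of the idea: the power model here is a parabola fitted through three samples, and `toy_power` is a hypothetical stand-in for a full SP&R run. The talk does not specify the form of its power models, so both are assumptions.

```python
def fit_parabola(points):
    """Coefficients (a, b, c) of p = a*v**2 + b*v + c through three (v, p) points."""
    (x0, y0), (x1, y1), (x2, y2) = points
    d0 = (x0 - x1) * (x0 - x2)
    d1 = (x1 - x0) * (x1 - x2)
    d2 = (x2 - x0) * (x2 - x1)
    a = y0 / d0 + y1 / d1 + y2 / d2
    b = -(y0 * (x1 + x2) / d0 + y1 * (x0 + x2) / d1 + y2 * (x0 + x1) / d2)
    c = y0 * x1 * x2 / d0 + y1 * x0 * x2 / d1 + y2 * x0 * x1 / d2
    return a, b, c

def adaptive_search(measure_power, v_lo, v_hi, max_iterations=2):
    """Sample, fit a power model, sample at the model's minimum, refine."""
    samples = [(v, measure_power(v)) for v in (v_lo, (v_lo + v_hi) / 2.0, v_hi)]
    for _ in range(max_iterations):
        three_best = sorted(samples, key=lambda s: s[1])[:3]
        a, b, _c = fit_parabola(three_best)
        if a <= 0:                      # model has no interior minimum
            break
        candidate = min(max(-b / (2.0 * a), v_lo), v_hi)
        if any(abs(candidate - v) < 1e-9 for v, _ in samples):
            break                       # estimate has converged onto a sample
        samples.append((candidate, measure_power(candidate)))
    return min(samples, key=lambda s: s[1])

def toy_power(v):
    """Hypothetical stand-in for implementing the design at signoff voltage v."""
    return 15.0 + 40.0 * (v - 1.0) ** 2

best_v, best_p = adaptive_search(toy_power, 0.9, 1.2)
print(best_v, best_p)
```

The point of the structure is that `measure_power` is called only a handful of times, which mirrors the slide's claim that a small number of implementation runs suffices.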

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-loop AVS: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: RO stages with 1-bit control pins selecting high or low series resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) for PM, BF and CO at small, medium and large margin; panels MUL and EXU; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Flow boxes (figure):
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• Energy < min energy? Save current solution

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • "Chicken and Egg" Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is "Signoff"
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

52ISVLSI-2014 invited talk 140710

Proposed Timing Signoff Flow

• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

Flow (figure):
• Routed design → timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) → split paths into GTBC and GCBC
• GTBC: timing analysis using TBC; ECO using TBC until violation = 0
• GCBC: timing analysis using CBC; ECO using CBC until violation = 0
• Done
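The path-grouping step of this flow can be sketched as below. Several details are assumptions made for the example: Δdelay is taken relative to the typical-corner delay, a path joins Gtbc if either its C-worst or its RC-worst Δdelay exceeds the corresponding threshold, and the path delays and threshold values are invented.

```python
def group_paths(path_delays, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc).

    path_delays maps path name -> (d_typ, d_cw, d_rcw), all in the same unit.
    ASSUMPTION: a path goes to g_tbc if its relative delta delay at either
    corner exceeds that corner's threshold.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in path_delays.items():
        rel_cw = (d_cw - d_typ) / d_typ
        rel_rcw = (d_rcw - d_typ) / d_typ
        if rel_cw > a_cw or rel_rcw > a_rcw:
            g_tbc.append(name)   # sign off with tightened BEOL corners
        else:
            g_cbc.append(name)   # keep conventional BEOL corners
    return g_tbc, g_cbc

# Hypothetical path delays in ps at the Ytyp, Ycw and Yrcw corners.
paths = {
    "p1": (1000.0, 1060.0, 1020.0),
    "p2": (1000.0, 1010.0, 1015.0),
}
g_tbc, g_cbc = group_paths(paths, a_cw=0.04, a_rcw=0.07)
print(g_tbc, g_cbc)
```

Each group is then closed separately, so the ECO effort under the conventional corners is limited to the paths that were not moved to Gtbc.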

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     18        20        31
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Implement another NETCARD (clock period = 23ns) to obtain α, Acw and Arcw (correlation factor = 0.5)

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
  • Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
    • dj(Ytyp): nominal delay
    • ΔdR-M1: delay sensitivity to unit change in M1 resistance
    • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
  • For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations due to ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
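The cancellation in this example can be checked with a short calculation. The sketch assumes that one "unit change" is one sigma, that the C-worst corner places M1 capacitance at +3 sigma and M1 resistance at -3 sigma, and that the two sources are independent; none of these numbers are stated on the slide.

```python
import math

def corner_delay_shift(sensitivities, corner):
    """Deterministic delay shift at a corner: sum of sensitivity * corner offset."""
    return sum(sensitivities[k] * corner[k] for k in sensitivities)

def statistical_sigma(sensitivities):
    """Sigma of path delay for independent, unit-variance variation sources."""
    return math.sqrt(sum(s * s for s in sensitivities.values()))

# Equal sensitivity (0.5 per sigma) to M1 resistance and M1 capacitance.
sens = {"R_M1": 0.5, "C_M1": 0.5}

# ASSUMPTION: C-worst corner = capacitance at +3 sigma, resistance at -3 sigma.
y_cw = {"R_M1": -3.0, "C_M1": +3.0}

delta_d_cw = corner_delay_shift(sens, y_cw)
sigma_j = statistical_sigma(sens)
print(delta_d_cw, round(sigma_j, 4))
```

With these assumptions the corner shift is exactly zero while σj is about 0.71, so the conventional corner understates the statistical variation of this path.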

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor α for LEON3MP, NETCARD and SUPERBLUE12, with regions α > 0.5 marked in each panel]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
• VBTI = Vlib ≈ Vfinal

[Figure: Vdd vs. time, with NBTI and PBTI]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging); it depends on
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step; Vfinal: converged voltage
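The reference flow above is essentially a time-stepped simulation of AVS. The sketch below shows that loop structure only; `toy_delay` and `toy_aging` are made-up analytic stand-ins for the STA runs and derated-library characterization, and all constants are hypothetical.

```python
def reference_signoff(t_final, n_steps, v_init, v_max, v_step, target_delay,
                      delay_model, aging_increment):
    """Step through the lifetime; return Vfinal, or None if Vmax is exceeded."""
    raises = 0      # number of AVS voltage steps applied so far
    dvth = 0.0      # accumulated threshold-voltage shift
    dt = t_final / n_steps
    for i in range(1, n_steps + 1):
        vdd = v_init + raises * v_step
        # Estimate aging over this time step at the current Vdd.
        dvth += aging_increment(vdd, (i - 1) * dt, i * dt)
        # Update timing; raise Vdd by Vstep until the target delay is met.
        while delay_model(v_init + raises * v_step, dvth) > target_delay:
            raises += 1
            if v_init + raises * v_step > v_max + 1e-12:
                return None     # caller must modify the circuit and start over
    return v_init + raises * v_step

# Toy models, purely illustrative (not from the talk).
def toy_delay(vdd, dvth, vth0=0.3, alpha=1.3):
    return vdd / (vdd - vth0 - dvth) ** alpha

def toy_aging(vdd, t_prev, t_now, a=0.05, n=0.3, lifetime=10.0):
    return a * vdd * (t_now ** n - t_prev ** n) / lifetime ** n

v_final = reference_signoff(10.0, 10, 0.90, 1.05, 0.01, 1.75, toy_delay, toy_aging)
print(v_final)
```

In a real flow each call to the delay model would be an STA run against a library characterized for the current aging state, which is why the slide describes this reference flow as very slow.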

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

Case       1      2      3      4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2

63ISVLSI-2014 invited talk 140710

ldquoChicken and Eggrdquo Loop

bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation

bull Timing constraints critical paths etc

bull Circuit is affected by library characterization

Circuit

Derated Libraries

Vfinal

Vlib VBTI

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS

|Vgs|

time

ON OFF ON OFF

[VattikondaWC06]

Device aging (|ΔVth|) accumulates over time

[TCASrsquo14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

bull BTI is a ldquofront-loadedrdquo phenomenon

bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)

bull Most Vdd increment happens in early lifetime

bull Gap between Vdd and Vfinal reduces rapidly

asymp70 Vdd increment in 1 year(remaining 30 over 9 years)

Vfinal

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner bull AVS increases supply voltage

aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired

supply voltage gt Vmax

Pessimistic signoff corner bull Ovestimate aging andor

underestimate circuit performance

bull Large area overhead

Good corners

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner

for AVS4 Vbti = Vfinal Do not overestimate

aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

bull Timing signoff ensure circuit meets performance target under PVT variations amp aging

bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis

bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis

Vlib for circuit performance estimation

Min Vdd Max Vdd

VBTI for aging

estimation

MinVdd

Not applicable (Optimistic)

Max Vdd

Slowest circuitLess aging

Faster circuitWorst-case aging

Slowest circuit Worst-case aging

Too pessimistic

With BTI aging and AVS the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

10000 12000 14000 16000 18000 20000 2200020

22

24

26

28

30

32

44

4

888

7776

66

555

3

33

2

22

11

1

Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)

Area (μm2)

Pow

er (m

W)

AES

Optimistic about AVS

Pessimistic about AVS

>

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

1 2 3 4 5 6 7 8 90

2

4

6

8

10

12

08

09

1

11

12Lifetime (year)

Implementation

Life

time

(yea

r)

Vfina

l (V)

Vfinal (V)

119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )

2

bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as

30 MTTF penalty

200mV voltage compensation

>

70ISVLSI-2014 invited talk 140710

0 2 4 6 8 10 12090092094096098100102104

S1 S2 S3 S4 S5

Year

VDD

DMA 3S1 S2 S3 S4 S5

78

79

80

81

MTT

F (Y

ear)

EM Impact on AVS Scheduling

12 years MTTF penalty

>

71ISVLSI-2014 invited talk 140710

What is ldquoSignoffrdquo

bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip

Nominal VddStatic IR drop

Power grid IR gradientDynamic IR

HCINBTI

Signoff Vdd

Voltage

Problem Margins = pessimism

overdesign schedule delay

ldquomargin stackrdquo for voltage signoff

Operating voltage

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

bull Delay sensitivity of path pj to variation source zv

bull Assumptions bull Δdjv is linear with respect to variation sources

bull Variation sources are normal distributions

bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

28 itf files (27 variation

sources + Ytyp)

Routed Netlist

RC extraction

STA

Δdjv

Δdjv = [ - ] 3dj(Yv) dj(Ytyp)

Note Path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)

bull Σ = λλT (Note λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ

Standard deviation of path delay

σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05

Note we use the delay variation from the statistical analysis as a reference

74ISVLSI-2014 invited talk 140710

Resilient Designs

bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)

bull Trade off design robustness vs design quality Eg enable margin reduction

bull Improve performance (ie timing speculation)

084 088 092 096 10030

34

38

42

46

50

54

58

62conventional design

reilient Design

Supply voltage (V)

En

erg

y (

mJ

)

Conventional design Worst-case signoff No Vdd downscaling

Resilient design Typical-case signoff Vdd downscaling reduced energy

15 reduction

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

bull Given RTL design throughput requirement and error-tolerant registers

bull Objective implement design to minimize energy bull Estimation of design energy

119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905

h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879

+1minus119864119877119903times119879

recovery cycles

Clock period

Error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization

bull Question Which endpoint to be optimized

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

Open-Loop AVS

Closed-Loop AVSP

ow

er

Freq amp Vdd LUT

Post-silicon characterization

AVS Pre-characterize LUT [Martin02]

Process-aware AVSPost-silicon characterization [Tschanz03]

Generic monitor

Design dependent replica

In-situmonitor

Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]

In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]

Error Detection System

Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]

Error Tolerance

AVS

approachesAVS classes

77

78ISVLSI-2014 invited talk 140710

Challenge Variability

1998 2000 2003 2006 2008 20111

10

MPU Release Date

Tran

sisto

r Cou

nt [M

]

Source [CPUDB]

DENSITY

IdealNon-ideality

2009

2010

2011

2012

2013

2014

2015

2016

2017

2018

2019

2020

2021

2022

2023

2024

0

100

200

300

400

500

600

700

0

02

04

06

08

1

12Dynamic Power (W)

POWER

Source [JeongK08]

IdealNon-ideality

2006 2008 2010 2012 2014 20161000

10000

100000

Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling

DRIVE CURRENT

Ideal

Source [ITRS]

Non-ideality

1995 2000 2005 2011 20160

05

1

15

2

25

3

MPU Release Date

Volt

SUPPLY VOLTAGE

Source [CPUDB]

Ideal

Non-ideality

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient

designs thus reduced powerbull Proposed method trades off between timing-error penalty vs

reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction

compared to pure-margin designs Resilience benefits increase in the context of AVS strategy

084 088 092 096 10030

36

42

48

54

60brute-forcepure-marginCombOpt

Supply voltage (V)

En

erg

y (

mJ

)

084 086 088 09 092 094 096 098 1 10225

29

33

37

41

45brute-force

pure-margin

CombOpt

Supply voltage (V)

En

erg

y (

mJ

)

MUL EXU

Minimum achievable energy

80ISVLSI-2014 invited talk 140710

Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for

circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly

threshold voltages)bull One mode is outside of the design cone of the other

failed design overdesignbull Mode A has positive timing slacks with respect to mode B

mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other

bull Modes are in each othersrsquo design cone

Voltage

Frequency

A

Negative Slacks = failed design

Positive Slacks = overdesign

B

C

Design Cone of mode A

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline search for signoff modes within design cone reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method Global Optimizationbull Iteratively sample and refine power models

bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable

Sample (SPampR)

Construct power models

Estimate optimal signoff modes

Sample (SPampR)

Refine power models

Adaptive search

Global optimization flow

09 10 11 1214

15

16

17

18

19

201st 2nd real

Signoff Voltage (v)

Po

wer

(m

W)

Power estimation of adaptive search

bull Ovals indicate sample pointsbull 1st 2nd power from power models at first

second iterationbull real power from real implemented circuits

Design AESf 700MHz

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

bull Critical path may be difficult to identify (IP from 3rd party)

bull Calibrating monitors at multiple modesvoltages requires long test time

Closed-Loop AVS

Design-dependent replica

In-situmonitor

Generic monitor

bull Does not capture design-specific performance variation

82

This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

bull Identified two circuit knobs to tune Vmin

bull Series resistancebull Cell types (INV NAND NOR)

bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low

1 bit 1 bit 1 bit Control pins

High resistance

Low resistance

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reductionbull Reference flows

bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints

bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods

bull Resilience benefits increase with safety margin

PM BF CO PM BF CO PM BF CO25

30

35

40

45

50

55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience

En

erg

y (

mJ

)

Large marginMedium marginSmall margin

MUL

PM BF CO PM BF CO PM BF CO25

27

29

31

33

35

En

erg

y (

mJ

)

EXU

Large marginMedium marginSmall margin

Smallmediumlarge margin safety margin = 51015 of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for

resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a

lower supply voltagebull Average 18 energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

084 088 092 096 10030

36

42

48

54

60brute-forcepure-marginCombOpt

Supply voltage (V)

En

erg

y (

mJ

)

084 086 088 09 092 094 096 098 1 10225

29

33

37

41

45brute-force

pure-margin

CombOpt

Supply voltage (V)

En

erg

y (

mJ

)

MUL EXU

Minimum achievable energy

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

bull Iteratively optimize with SEOpt and SkewOpt

Initial placement (all FFs = error-tolerant FFs)

Energy lt min energy

Save current solution

Margin insertion on K paths based on sensitivity function

Replace error-tolerant FFs w normal FFs

SEOpt

Activity-aware clock skew optimization

SkewOpt

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

53ISVLSI-2014 invited talk 140710

Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)

                       LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)      1.8       2.0       3.1
Gate count             232K      575K      1031K
Utilization (%)        84        79        82
Core area (mm2)        0.45      1.04      1.91
Max transition (ps)    330       330       330

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw

Correlation factor = 0.5

           α      Acw (%)   Arcw (%)
TBC-0.5    0.5    43        73
TBC-0.6    0.6    33        50
TBC-0.7    0.7    30        34

Statistical models: (1) no correlation, and (2) variation sources of the same kind in the same process module have correlation factor = 0.5

54ISVLSI-2014 invited talk 140710

Further Analysis

• Paths with small Δd(Yrcw) and Δd(Ycw) have large α

• A path has small Δdelays: the path is equally sensitive to R and C

• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  (terms, in order: nominal delay; delay sensitivity to unit change in M1 resistance; delay sensitivity to unit change in M1 capacitance)

• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
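
The cancellation in the example can be checked with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: variation sources are normalized to unit σ, the Ycw corner is taken to put M1 capacitance at +3σ and M1 resistance at −3σ, the two sources are treated as independent, and the 0.5 coefficients are read as the two delay sensitivities. The names `corner_delta` and `statistical_sigma` are hypothetical and do not come from the talk.

```python
import math

# Delay sensitivities from the slide example (read here as per-sigma coefficients).
SENS_R_M1 = 0.5  # sensitivity to M1 resistance
SENS_C_M1 = 0.5  # sensitivity to M1 capacitance


def corner_delta(z_r: float, z_c: float) -> float:
    """Delay shift at a corner where R and C sit at z_r and z_c sigmas."""
    return SENS_R_M1 * z_r + SENS_C_M1 * z_c


def statistical_sigma() -> float:
    """Standard deviation of delay for independent, unit-sigma R and C."""
    return math.sqrt(SENS_R_M1 ** 2 + SENS_C_M1 ** 2)


# Assumed Ycw corner: capacitance at +3 sigma, resistance at -3 sigma.
delta_ycw = corner_delta(z_r=-3.0, z_c=+3.0)
sigma_j = statistical_sigma()

print(f"delta at Ycw corner = {delta_ycw:.3f}, sigma_j = {sigma_j:.3f}")
```

With equal sensitivities the two contributions cancel exactly at this assumed corner, while the statistical spread stays nonzero, which is the situation the slide describes as Δd(Ycw) being below σj.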

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

[Figure: scaling factor α for LEON3MP, NETCARD and SUPERBLUE12; the region with α > 0.5 is marked in each panel]

• Similar trends in different designs

• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120ps and 61ns

• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime

• Aging of a flat Vfinal ≈ aging of an adaptive Vdd

• But slightly pessimistic

VBTI = Vlib ≈ Vfinal

[Figure: Vdd vs. time, with NBTI and PBTI]

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)

• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0

• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types

• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages

Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime

• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage

• No overhead in timing analysis, but very slow: many STA runs and library

Vstep: AVS voltage step
Vfinal: converged voltage
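
The loop structure of this reference flow can be sketched in a few lines. This is a minimal illustration, not the flow used in the talk: `aging_rate` and `path_delay` are hypothetical callbacks standing in for the BTI model and for an STA run with a derated library, the toy models and the clock period are invented for the example, and only the Vinit and Vmax values echo the supply voltages listed in the experiment setup.

```python
from typing import Callable, Optional, Tuple


def reference_signoff(
    v_init: float,
    v_max: float,
    v_step: float,
    t_final: float,
    dt: float,
    clock_period: float,
    aging_rate: Callable[[float], float],
    path_delay: Callable[[float, float], float],
) -> Optional[Tuple[float, float]]:
    """Step through the lifetime, raising Vdd by v_step whenever timing fails.

    Returns (Vfinal, accumulated aging), or None when Vdd would exceed v_max,
    i.e. the circuit has to be modified and the flow restarted.
    """
    vdd, aging, t = v_init, 0.0, 0.0
    while t < t_final:
        # Estimate aging over this time step at the current supply voltage.
        aging += aging_rate(vdd) * dt
        # Update circuit timing and Vdd (stand-in for an STA run per step).
        while path_delay(vdd, aging) > clock_period:
            vdd += v_step
            if vdd > v_max:
                return None
        t += dt
    return vdd, aging


# Toy models, purely illustrative (not calibrated to any technology).
def toy_aging_rate(vdd: float) -> float:
    return 0.01 * vdd


def toy_path_delay(vdd: float, aging: float) -> float:
    return (1.0 + aging) / vdd


result = reference_signoff(
    v_init=0.90, v_max=1.05, v_step=0.01, t_final=10.0, dt=1.0,
    clock_period=1.2, aging_rate=toy_aging_rate, path_delay=toy_path_delay,
)
print(result)
```

Each pass through the inner loop corresponds to one timing re-evaluation, which is why a flow of this shape needs many STA runs.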

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03

Case       1       2       3      4        5     6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    N/A   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   N/A   Vheur1   Vheur2

63ISVLSI-2014 invited talk 140710

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop among circuit, Vfinal, Vlib / VBTI and derated libraries]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

|ΔVth| increases when device is on (stressed)
|ΔVth| is partially recovered when device is off (relaxed)
NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time over alternating ON / OFF phases] [VattikondaWC06]

Device aging (|ΔVth|) accumulates over time

[TCAS’14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon

• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)

• Most Vdd increment happens in early lifetime

• Gap between Vdd and Vfinal reduces rapidly

≈70% of Vdd increment in 1 year (remaining 30% over 9 years)

[Figure annotation: Vfinal]

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging

• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis

• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

                  Vlib = Min Vdd                       Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging          Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging    Faster circuit, worst-case aging
                  (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES; data points labeled 1 to 8; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); annotations: "Optimistic about AVS", "Pessimistic about AVS"]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

[Figure: lifetime (year) and Vfinal (V) across implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
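
As a quick check of how the per-step update behaves, the sketch below applies MTTF(i) = MTTF(i−1) × (VDD(i−1) / VDD(i))² along a voltage schedule. The schedule, the starting MTTF and the function names are hypothetical and chosen only for illustration; the point is that the per-step factors telescope, so the final MTTF depends only on the first and last voltages.

```python
from typing import Sequence


def update_mttf(mttf_prev: float, vdd_prev: float, vdd_now: float) -> float:
    """One update step: scale MTTF by the squared ratio of old to new Vdd."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_over_schedule(mttf0: float, vdd_schedule: Sequence[float]) -> float:
    """Apply the update along consecutive pairs of a Vdd schedule."""
    mttf = mttf0
    for vdd_prev, vdd_now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_now)
    return mttf


# Hypothetical AVS schedule: 1.00 V raised to 1.20 V in 50 mV steps.
schedule = [1.00, 1.05, 1.10, 1.15, 1.20]
final_mttf = mttf_over_schedule(10.0, schedule)
penalty = 1.0 - final_mttf / 10.0

print(f"final MTTF = {final_mttf:.2f} years, penalty = {penalty:.1%}")
```

Under this assumed 1.00 V starting point, a 200 mV increase costs roughly 31% of the MTTF; a different starting voltage gives a different figure.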

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year and MTTF (year) for AVS schedules S1 to S5; design: DMA; annotation: "12 years MTTF penalty"]

71ISVLSI-2014 invited talk 140710

What is “Signoff”

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff, from nominal Vdd to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI; the operating voltage is marked on the voltage axis]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions

• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 x 27)

• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

$$[\Delta d'_{j1} \;\dots\; \Delta d'_{j27}] = [\Delta d_{j1} \;\dots\; \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
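
The two steps above (per-source sensitivities, then correlation through a Cholesky factor) can be sketched in plain Python. This is an illustration under assumptions: sensitivities are taken as the corner-minus-typical delay difference divided by 3, the correlation matrix is assumed positive definite, and the two-source example values and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def sensitivities(d_corner: Sequence[float], d_typ: float) -> List[float]:
    """Per-source delay sensitivity, assuming each corner sits at 3 sigma."""
    return [(d - d_typ) / 3.0 for d in d_corner]


def cholesky(sigma: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular factor lam with sigma = lam * lam^T."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(sens: Sequence[float], sigma: Sequence[Sequence[float]]) -> float:
    """Standard deviation of path delay from sensitivities and correlations."""
    lam = cholesky(sigma)
    n = len(sens)
    # Row vector of sensitivities times lam gives the correlated sensitivities.
    corr = [sum(sens[i] * lam[i][v] for i in range(n)) for v in range(n)]
    return math.sqrt(sum(c * c for c in corr))


# Two hypothetical variation sources with correlation factor 0.5.
sens = sensitivities(d_corner=[101.5, 101.5], d_typ=100.0)  # -> [0.5, 0.5]
sigma_corr = path_sigma(sens, [[1.0, 0.5], [0.5, 1.0]])
sigma_indep = path_sigma(sens, [[1.0, 0.0], [0.0, 1.0]])

print(f"correlated: {sigma_corr:.4f}, independent: {sigma_indep:.4f}")
```

Because the squared norm of the transformed row vector equals the quadratic form of the sensitivities with Σ, positive correlation between sources of the same sign increases the path sigma relative to the independent case.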

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)

• Trade off design robustness vs. design quality, e.g., enable margin reduction

• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]

Conventional design: worst-case signoff, no Vdd downscaling

Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers

• Objective: implement design to minimize energy
  • Estimation of design energy

$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$

[Equation: Throughput expressed in terms of the clock period T, the error rate ER [Kahng10] and the number of recovery cycles r]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
  • Trade off cost of resilience vs. data path optimization

• Question: which endpoint to be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and AVS approaches

Open-Loop AVS
• Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]

Closed-Loop AVS
• Generic monitor, design-dependent replica: process and temperature-aware AVS
  • Generic on-chip monitor [Burd00]
  • Design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor; measure actual critical paths [Hartman06, Fick10]

Error Tolerance
• Error detection system: error detection and correction system; Vdd scaling until error occurs [Das06, Tschanz10]

[Figure axis: power]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels contrasting ideal scaling with non-ideality. DENSITY: transistor count [M] vs. MPU release date, source [CPUDB]. POWER: dynamic power (W), 2009 to 2024, source [JeongK08]. DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016, source [ITRS]. SUPPLY VOLTAGE: volt vs. MPU release date, source [CPUDB].]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU, comparing brute-force, pure-margin and CombOpt; the minimum achievable energy is marked]

80ISVLSI-2014 invited talk 140710

Our Concept Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design / overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: voltage vs. frequency plane with modes A, B and C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

[Flow (global optimization flow): Sample (SP&R) → Construct power models → Estimate optimal signoff modes → adaptive search: Sample (SP&R) → Refine power models]

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); series: 1st, 2nd, real; design: AES, f = 700MHz]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-Loop AVS classes: generic monitor, design-dependent replica, in-situ monitor

• Critical path may be difficult to identify (IP from 3rd party)

• Calibrating monitors at multiple modes/voltages requires long test time

• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)

• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: RO stages, each with a 1-bit control pin selecting high or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints

• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods

• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at small, medium and large margin; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs, thus reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU, comparing brute-force, pure-margin and CombOpt; the minimum achievable energy is marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Flow steps:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• Check: energy < min energy? If so, save current solution
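
The iteration between SEOpt and SkewOpt can be outlined as a small driver loop. This is a sketch under assumptions rather than the flow from the talk: the order of the two passes, the stopping rule (stop at the first iteration that does not lower energy) and the iteration cap are choices made for the example, and `seopt`, `skewopt` and `energy` are hypothetical callbacks supplied by the caller.

```python
from typing import Callable, Optional, Tuple, TypeVar

Design = TypeVar("Design")


def optimize(
    initial: Design,
    seopt: Callable[[Design], Design],
    skewopt: Callable[[Design], Design],
    energy: Callable[[Design], float],
    max_iters: int = 20,
) -> Tuple[Optional[Design], float]:
    """Alternate SEOpt and SkewOpt, saving the lowest-energy solution seen."""
    current = initial
    best: Optional[Design] = None
    min_energy = float("inf")
    for _ in range(max_iters):
        current = skewopt(seopt(current))
        e = energy(current)
        if e < min_energy:
            # Energy < min energy: save current solution.
            best, min_energy = current, e
        else:
            # Assumed stopping rule: quit once energy stops improving.
            break
    return best, min_energy


# Toy stand-ins: a "design" is just its count of error-tolerant FFs.
def toy_seopt(n_tolerant_ffs: int) -> int:
    return max(n_tolerant_ffs - 1, 0)  # replace one error-tolerant FF


def toy_skewopt(n_tolerant_ffs: int) -> int:
    return n_tolerant_ffs  # skew optimization leaves the FF count unchanged


def toy_energy(n_tolerant_ffs: int) -> float:
    return (n_tolerant_ffs - 3) ** 2 + 10.0  # invented cost with a minimum at 3


best_design, best_energy = optimize(8, toy_seopt, toy_skewopt, toy_energy)
print(best_design, best_energy)
```

The toy energy function only mimics the tradeoff between the cost of error-tolerant FFs and the cost of added margin; a real flow would evaluate energy from the implemented design.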

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • ldquoChicken and Eggrdquo Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is ldquoSignoffrdquo
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)
Page 54: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

54ISVLSI-2014 invited talk 140710

Further Analysis

bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α

bull A path has small Δdelays the path is equally sensitive to R and C

bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1

bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj

Nominal delay

Delay sensitivity to unit change in M1 resistance

Delay sensitivity to unit change in M1 capacitance

55ISVLSI-2014 invited talk 140710

Scaling Factor Results

LEON3MP

SUPERBLUE12NETCARD

α gt 05α gt 05

α gt 05

bull Similar trends in different designs

bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

bull WNS and TNS are reduced by up to 120ps and 61ns

bull Timing violations reduces by 31 to 100

Correlation factor γ = 0 (variation sources are independent)

LEON SUPERBLUE NETCARD

-0180-0160-0140-0120-0100-0080-0060-0040-002000000020

CBC TBC-1 TBC-2

WN

S (n

s)

LEON SUPERBLUE NETCARD

-90-80-70-60-50-40-30-20-10

0CBC TBC-1 TBC-2

TNS

(ns)

LEON SUPERBLUE NETCARD0

200400600800

1000120014001600

CBC TBC-1 TBC-2

Tim

ing

viol

ation

s

57ISVLSI-2014 invited talk 140710

Heuristics 1

bull Model BTI degradation with Vfinal throughout lifetime

bull Aging of a flat Vfinal asymp aging of an adaptive Vdd

bull But slightly pessimistic

Vdd

time

NBTI

PBTI

VBTI = Vlib asymp Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

bull Problem Vfinal is not available at early design stage (design has not been implemented)

bull Vfinal = Vdd end of life (to compensate BTI aging)

bull Gates along critical pathbull Timing slack at t = 0

bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

bull Observation 2 Vfinal is not sensitive to gate types

bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack

bull Assume timing slack = 0

10mV

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC

Supply voltages

Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105

Vmax105V

Vinit090V

Vheur1 (DC) 097V

Vheur1 (AC) 095V

Vheur2 (DC) 095V

Vheur2 (AC) 093V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime

bull Signoff flowbull Estimate aging at each time step

bull Update circuit timing and Vdd

bull Repeat until t = tfinal

bull Modify circuit and start over if Vfinal gt maximum allowed voltage

bull No overhead in timing analysis but very slow Many STA runs

and library

Vstep AVS voltage stepVfinal converged voltage

62ISVLSI-2014 invited talk 140710

Experiment Setupbull Characterize different derated libraries

bull Evaluate impact of library characterizationbull Seven setups

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2

VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2

63ISVLSI-2014 invited talk 140710

ldquoChicken and Eggrdquo Loop

bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation

bull Timing constraints critical paths etc

bull Circuit is affected by library characterization

Circuit

Derated Libraries

Vfinal

Vlib VBTI

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS

|Vgs|

time

ON OFF ON OFF

[VattikondaWC06]

Device aging (|ΔVth|) accumulates over time

[TCASrsquo14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

bull BTI is a ldquofront-loadedrdquo phenomenon

bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)

bull Most Vdd increment happens in early lifetime

bull Gap between Vdd and Vfinal reduces rapidly

asymp70 Vdd increment in 1 year(remaining 30 over 9 years)

Vfinal

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner bull AVS increases supply voltage

aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired

supply voltage gt Vmax

Pessimistic signoff corner bull Ovestimate aging andor

underestimate circuit performance

bull Large area overhead

Good corners

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner

for AVS4 Vbti = Vfinal Do not overestimate

aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

bull Timing signoff ensure circuit meets performance target under PVT variations amp aging

bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis

bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis

Vlib for circuit performance estimation

Min Vdd Max Vdd

VBTI for aging

estimation

MinVdd

Not applicable (Optimistic)

Max Vdd

Slowest circuitLess aging

Faster circuitWorst-case aging

Slowest circuit Worst-case aging

Too pessimistic

With BTI aging and AVS the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

10000 12000 14000 16000 18000 20000 2200020

22

24

26

28

30

32

44

4

888

7776

66

555

3

33

2

22

11

1

Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)

Area (μm2)

Pow

er (m

W)

AES

Optimistic about AVS

Pessimistic about AVS

>

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

1 2 3 4 5 6 7 8 90

2

4

6

8

10

12

08

09

1

11

12Lifetime (year)

Implementation

Life

time

(yea

r)

Vfina

l (V)

Vfinal (V)

119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )

2

bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as

30 MTTF penalty

200mV voltage compensation

>

70ISVLSI-2014 invited talk 140710

0 2 4 6 8 10 12090092094096098100102104

S1 S2 S3 S4 S5

Year

VDD

DMA 3S1 S2 S3 S4 S5

78

79

80

81

MTT

F (Y

ear)

EM Impact on AVS Scheduling

12 years MTTF penalty

>

71ISVLSI-2014 invited talk 140710

What is ldquoSignoffrdquo

bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip

Nominal VddStatic IR drop

Power grid IR gradientDynamic IR

HCINBTI

Signoff Vdd

Voltage

Problem Margins = pessimism

overdesign schedule delay

ldquomargin stackrdquo for voltage signoff

Operating voltage

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

bull Delay sensitivity of path pj to variation source zv

bull Assumptions bull Δdjv is linear with respect to variation sources

bull Variation sources are normal distributions

bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

28 itf files (27 variation

sources + Ytyp)

Routed Netlist

RC extraction

STA

Δdjv

Δdjv = [ - ] 3dj(Yv) dj(Ytyp)

Note Path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)

bull Σ = λλT (Note λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ

Standard deviation of path delay

σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05

Note we use the delay variation from the statistical analysis as a reference

74ISVLSI-2014 invited talk 140710

Resilient Designs

bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)

bull Trade off design robustness vs design quality Eg enable margin reduction

bull Improve performance (ie timing speculation)

084 088 092 096 10030

34

38

42

46

50

54

58

62conventional design

reilient Design

Supply voltage (V)

En

erg

y (

mJ

)

Conventional design Worst-case signoff No Vdd downscaling

Resilient design Typical-case signoff Vdd downscaling reduced energy

15 reduction

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

bull Given RTL design throughput requirement and error-tolerant registers

bull Objective implement design to minimize energy bull Estimation of design energy

119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905

h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879

+1minus119864119877119903times119879

recovery cycles

Clock period

Error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization

bull Question Which endpoint to be optimized

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

Open-Loop AVS

Closed-Loop AVSP

ow

er

Freq amp Vdd LUT

Post-silicon characterization

AVS Pre-characterize LUT [Martin02]

Process-aware AVSPost-silicon characterization [Tschanz03]

Generic monitor

Design dependent replica

In-situmonitor

Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]

In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]

Error Detection System

Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]

Error Tolerance

AVS

approachesAVS classes

77

Challenge: Variability

[Figure: four panels comparing ideal scaling with observed non-ideality]
• DENSITY: transistor count [M] vs. MPU release date, 1998 to 2011. Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: supply voltage (V) vs. MPU release date, 1995 to 2016. Source: [CPUDB]

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of an AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU, with the minimum achievable energy marked]

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage vs. frequency plane showing the design cone of mode A and modes A, B, C; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
• A small constant number of runs is enough: scalable

Global optimization flow: sample (SP&R), construct power models, estimate optimal signoff modes; then adaptive search: sample (SP&R), refine power models

[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V), curves "1st", "2nd", "real". Design: AES, f = 700 MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

Classes of Closed-Loop AVS

Closed-loop AVS classes: generic monitor, design-dependent replica, in-situ monitor
• Generic monitor: does not capture design-specific performance variation
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low resistance]

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF, and CO at small, medium, and large margin for MUL and EXU; bars split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU, with the minimum achievable energy marked]

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, then replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)
Scaling Factor Results

[Figure: scaling factor α for LEON3MP, SUPERBLUE12, and NETCARD, with the regions where α > 0.5 marked]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

Benefits of Tightened BEOL Corners (1)

• WNS and TNS are reduced by up to 120 ps and 61 ns, respectively
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic

[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10 mV]

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05 V
Vinit         0.90 V
Vheur1 (DC)   0.97 V
Vheur1 (AC)   0.95 V
Vheur2 (DC)   0.95 V
Vheur2 (AC)   0.93 V

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library

[Figure: signoff loop; Vstep = AVS voltage step, Vfinal = converged voltage]
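The loop structure of this flow can be sketched with a toy stand-in. Everything numeric here is an assumption for illustration: an alpha-power-law delay proxy replaces STA, a power-law threshold shift replaces the BTI model, and the function names and default coefficients are hypothetical. Only the control flow (age, raise Vdd in Vstep increments until timing is met, give up if Vmax is exceeded) follows the steps listed above.

```python
def gate_delay(vdd, vth, alpha=1.3):
    """Toy delay proxy (alpha-power law); stands in for a full STA run."""
    return vdd / (vdd - vth) ** alpha


def simulate_avs_lifetime(v_init, v_max, v_step, vth0, lifetime_years,
                          aging_coeff=0.05, exponent=1.0 / 6.0):
    """Return the Vdd chosen at each year, or None if Vmax is exceeded.

    aging_coeff and exponent define an ASSUMED threshold shift
    delta_vth = aging_coeff * t ** exponent (t in years).
    """
    target = gate_delay(v_init, vth0)      # timing met at t = 0
    vdd = v_init
    schedule = []
    for year in range(lifetime_years + 1):
        vth = vth0 + aging_coeff * year ** exponent   # estimate aging
        while gate_delay(vdd, vth) > target:          # update timing and Vdd
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None    # circuit must be modified; start over
        schedule.append(vdd)
    return schedule
```

The last entry of the returned schedule plays the role of Vfinal. In a real flow each pass through the inner loop would be a timing analysis with a matching derated library, which is why the reference flow is slow.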

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3     4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax  Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax  Vfinal  NA   Vheur1  Vheur2

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop among circuit, derated libraries (Vlib, VBTI), and Vfinal]

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14]]

Observation 1 [Chan11]

• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure annotation: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years); Vfinal marked]
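The front-loaded behavior can be illustrated with a short sketch. It assumes a power-law aging model, ΔVth ∝ t^n, which is a common compact form for BTI but is not stated on the slide; the function names and the example exponent are hypothetical.

```python
import math


def aging_fraction(elapsed_years, lifetime_years, exponent):
    """Fraction of end-of-life aging reached after elapsed_years,
    assuming delta_vth grows as t ** exponent (an assumed model)."""
    return (elapsed_years / lifetime_years) ** exponent


def exponent_for_fraction(fraction, elapsed_years, lifetime_years):
    """Exponent n for which the assumed power law reaches `fraction`
    of end-of-life aging after elapsed_years."""
    return math.log(fraction) / math.log(elapsed_years / lifetime_years)
```

For example, `aging_fraction(1, 10, 1/6)` evaluates to roughly 0.68, so under an assumed exponent of 1/6 about two thirds of the 10-year aging is reached in the first year. Any exponent well below 1 gives the same qualitative picture: most of the degradation, and hence most of the Vdd compensation, arrives early.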

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

[Figure: signoff results per case, with the good corners marked]

Cases:
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Vlib is used for circuit performance estimation; VBTI is used for aging estimation.

                  Vlib = Min Vdd                                        Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging                           Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging (too pessimistic)   Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for AES, points labeled by case 1 to 8; series: non-EM-aware, after fixing (Mishra), after fixing (Black's); regions annotated "optimistic about AVS" and "pessimistic about AVS"]

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200 mV voltage compensation]
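The MTTF update rule can be applied step by step over a Vdd schedule. The sketch below is a minimal illustration; the function name and the example voltages are hypothetical, and only the update formula itself comes from the slide.

```python
def update_mttf(initial_mttf, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i)) ** 2 along a schedule.

    vdd_schedule lists the supply voltage at each AVS step, starting with
    the voltage that corresponds to initial_mttf. Returns the MTTF after
    every step, including the initial value.
    """
    mttf = initial_mttf
    history = [mttf]
    for previous, current in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (previous / current) ** 2
        history.append(mttf)
    return history
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last voltages: `update_mttf(m0, [v0, ..., vn])[-1]` equals `m0 * (v0 / vn) ** 2`, so every increase in Vdd made to compensate aging shortens the estimated EM lifetime.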

EM Impact on AVS Scheduling

[Figure: VDD vs. year (0 to 12) for schedules S1 to S5, and MTTF (year) for S1 to S5; design: DMA; annotation: 12 years MTTF penalty]

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff on a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd; operating voltage marked]

Problem: margins = pessimism, overdesign, schedule delay

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist, then RC extraction, then STA, yielding Δdjv]

Note: path delay includes gate and wire delays
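The sensitivity definition above can be sketched as follows. The function name and the sample delays are hypothetical; the division by 3 follows the formula on the slide.

```python
def delay_sensitivities(corner_delays, typical_delay):
    """Per-source delay sensitivity of one path.

    corner_delays[v] is the path delay d_j(Y_v) obtained with variation
    source v at its corner; typical_delay is d_j(Y_typ). Each sensitivity
    is (d_j(Y_v) - d_j(Y_typ)) / 3.
    """
    return [(delay - typical_delay) / 3.0 for delay in corner_delays]
```

With 27 variation sources, `corner_delays` would hold 27 values from 27 extraction and STA runs, and the 28th run supplies `typical_delay`.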

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda \lambda^{T}$ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
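The correlated standard deviation can be computed directly from these formulas. The sketch below uses a small pure-Python Cholesky factorization so that it has no dependencies; the function names are hypothetical, and the 2 × 2 example matrix is illustrative rather than taken from the talk.

```python
import math


def cholesky_lower(matrix):
    """Lower-triangular factor L of a symmetric positive-definite matrix,
    so that matrix = L * L^T."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for k in range(i + 1):
            partial = sum(lower[i][m] * lower[k][m] for m in range(k))
            if i == k:
                lower[i][k] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][k] = (matrix[i][k] - partial) / lower[k][k]
    return lower


def path_sigma(sensitivities, correlation):
    """sigma_j = norm of (row vector of sensitivities) times lambda,
    where correlation = lambda * lambda^T."""
    lam = cholesky_lower(correlation)
    size = len(sensitivities)
    correlated = [
        sum(sensitivities[row] * lam[row][col] for row in range(size))
        for col in range(size)
    ]
    return math.sqrt(sum(value * value for value in correlated))
```

With an identity correlation matrix this reduces to the root-sum-square of the sensitivities, which matches the independent-sources case (γ = 0). Positive correlation between sources with same-sign sensitivities increases σj, because the result equals the square root of Δd Σ Δdᵀ.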

Page 56: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

56ISVLSI-2014 invited talk 140710

Benefits of Tightened BEOL Corners (1)

bull WNS and TNS are reduced by up to 120ps and 61ns

bull Timing violations reduces by 31 to 100

Correlation factor γ = 0 (variation sources are independent)

LEON SUPERBLUE NETCARD

-0180-0160-0140-0120-0100-0080-0060-0040-002000000020

CBC TBC-1 TBC-2

WN

S (n

s)

LEON SUPERBLUE NETCARD

-90-80-70-60-50-40-30-20-10

0CBC TBC-1 TBC-2

TNS

(ns)

LEON SUPERBLUE NETCARD0

200400600800

1000120014001600

CBC TBC-1 TBC-2

Tim

ing

viol

ation

s

57ISVLSI-2014 invited talk 140710

Heuristics 1

bull Model BTI degradation with Vfinal throughout lifetime

bull Aging of a flat Vfinal asymp aging of an adaptive Vdd

bull But slightly pessimistic

Vdd

time

NBTI

PBTI

VBTI = Vlib asymp Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

bull Problem Vfinal is not available at early design stage (design has not been implemented)

bull Vfinal = Vdd end of life (to compensate BTI aging)

bull Gates along critical pathbull Timing slack at t = 0

bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

bull Observation 2 Vfinal is not sensitive to gate types

bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack

bull Assume timing slack = 0

10mV

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC

Supply voltages

Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105

Vmax105V

Vinit090V

Vheur1 (DC) 097V

Vheur1 (AC) 095V

Vheur2 (DC) 095V

Vheur2 (AC) 093V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime

bull Signoff flowbull Estimate aging at each time step

bull Update circuit timing and Vdd

bull Repeat until t = tfinal

bull Modify circuit and start over if Vfinal gt maximum allowed voltage

bull No overhead in timing analysis but very slow Many STA runs

and library

Vstep AVS voltage stepVfinal converged voltage

62ISVLSI-2014 invited talk 140710

Experiment Setupbull Characterize different derated libraries

bull Evaluate impact of library characterizationbull Seven setups

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2

VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2

63ISVLSI-2014 invited talk 140710

ldquoChicken and Eggrdquo Loop

bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation

bull Timing constraints critical paths etc

bull Circuit is affected by library characterization

Circuit

Derated Libraries

Vfinal

Vlib VBTI

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS

|Vgs|

time

ON OFF ON OFF

[VattikondaWC06]

Device aging (|ΔVth|) accumulates over time

[TCASrsquo14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

bull BTI is a ldquofront-loadedrdquo phenomenon

bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)

bull Most Vdd increment happens in early lifetime

bull Gap between Vdd and Vfinal reduces rapidly

asymp70 Vdd increment in 1 year(remaining 30 over 9 years)

Vfinal

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner bull AVS increases supply voltage

aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired

supply voltage gt Vmax

Pessimistic signoff corner bull Ovestimate aging andor

underestimate circuit performance

bull Large area overhead

Good corners

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner

for AVS4 Vbti = Vfinal Do not overestimate

aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

67ISVLSI-2014 invited talk 140710

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Candidate corners (Vlib for circuit performance estimation, VBTI for aging estimation):
• Vlib = Min Vdd, VBTI = Min Vdd: slowest circuit, less aging
• Vlib = Max Vdd, VBTI = Min Vdd: not applicable (optimistic)
• Vlib = Max Vdd, VBTI = Max Vdd: faster circuit, worst-case aging
• Vlib = Min Vdd, VBTI = Max Vdd: slowest circuit, worst-case aging (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm2) for AES, with data points labeled 1 to 8 by signoff case, ranging from optimistic about AVS to pessimistic about AVS; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9]

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

• 30% MTTF penalty
• 200mV voltage compensation
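The MTTF update rule on this slide, MTTF(i) = MTTF(i−1) × (VDD(i−1) / VDD(i))², can be sketched as follows. The function names, the starting MTTF, and the voltage schedule are hypothetical values chosen only to exercise the formula; they are not data from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One AVS step: scale MTTF by the squared ratio of old to new supply voltage."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf_initial, vdd_schedule):
    """Apply the update across consecutive voltages in an AVS schedule."""
    mttf = mttf_initial
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_new)
    return mttf


# Hypothetical schedule: AVS raises Vdd in 50mV steps over the lifetime.
schedule = [0.90, 0.95, 1.00, 1.05, 1.10]
mttf_initial = 10.0  # years, illustrative only
mttf_final = mttf_after_schedule(mttf_initial, schedule)
penalty = 1.0 - mttf_final / mttf_initial
print(f"MTTF {mttf_initial:.1f} -> {mttf_final:.2f} years ({penalty:.0%} penalty)")
```

Because the per-step ratios telescope, the final MTTF under this rule depends only on the first and last voltage in the schedule, not on the intermediate steps.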

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year for AVS schedules S1 to S5, and MTTF (year) for S1 to S5; design: DMA]

12 years MTTF penalty

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff on a voltage axis, with levels labeled Nominal Vdd, Static IR drop, Power grid IR gradient, Dynamic IR, HCI/NBTI, Signoff Vdd, and Operating voltage]

Problem: Margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv: Δdjv
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

Note: Path delay includes gate and wire delays
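The per-source sensitivity computation can be sketched as below. It assumes each non-typical run is extracted at a 3-sigma excursion of one variation source, which is the reading of the "/ 3" in the formula above; the function name and the example delays are hypothetical.

```python
def delay_sensitivities(d_typ, d_corners, sigma_multiple=3.0):
    """Per-sigma delay sensitivity of one path to each variation source.

    d_typ:      path delay from the typical (Ytyp) extraction
    d_corners:  path delays, one per variation source, each from a run in which
                only that source is moved to its sigma_multiple-sigma value
    """
    return [(d_v - d_typ) / sigma_multiple for d_v in d_corners]


# Hypothetical path delays in ps: typical run plus three single-source runs.
d_typ = 100.0
d_corners = [103.0, 100.0, 97.0]
print(delay_sensitivities(d_typ, d_corners))
```

With 27 variation sources, `d_corners` would hold 27 delays per path, matching the 28 extraction and STA runs (27 sources plus the typical case).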

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^{T}$ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^{2} + \dots + (\Delta d'_{j27})^{2} \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
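A small pure-Python sketch of the correlated path-sigma computation follows: factor Σ = λλᵀ by Cholesky decomposition, form Δd′ = Δd·λ, and take the root of the sum of squares. The 3 × 3 correlation matrix and sensitivity vector are made-up illustration values (the talk uses 27 sources), and the function names are hypothetical.

```python
import math


def cholesky(sigma):
    """Lower-triangular lam with sigma == lam * lam^T (sigma symmetric positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(delta_d, sigma):
    """Standard deviation of path delay from per-source sensitivities and correlations."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times lambda: d'_k = sum_i delta_d[i] * lam[i][k]
    d_prime = [sum(delta_d[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Hypothetical example with three correlated variation sources.
corr = [[1.0, 0.5, 0.0],
        [0.5, 1.0, 0.2],
        [0.0, 0.2, 1.0]]
delta_d = [1.0, 2.0, 0.5]
print(f"sigma_j = {path_sigma(delta_d, corr):.3f}")
```

Since Δd′·Δd′ᵀ = Δd·λλᵀ·Δdᵀ = Δd·Σ·Δdᵀ, the result equals the square root of the quadratic form in Σ; the Cholesky route simply mirrors the steps listed on the slide.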

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design]

• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy
• 15% reduction

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy [Kahng10]

$$Energy = \frac{Power}{Throughput}$$

$$Throughput = \frac{1 - ER}{T} + \frac{1 - ER}{r \times T}$$

where r = recovery cycles, T = clock period, ER = error rate

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches (figure axis: power):

• Open-loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor and design-dependent replica: process- and temperature-aware AVS, using a generic on-chip monitor [Burd00] or a design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, with Vdd scaling until an error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four trend panels, each contrasting ideal scaling with observed non-ideality: DENSITY (transistor count [M] vs. MPU release date; source [CPUDB]); POWER (dynamic power (W), 2009 to 2024; source [JeongK08]); DRIVE CURRENT (μA/μm for extended planar bulk, UTB FD, and DG vs. ideal scaling, 2006 to 2016; source [ITRS]); SUPPLY VOLTAGE (volt vs. MPU release date; source [CPUDB])]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs: resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on designs MUL and EXU, with the minimum achievable energy marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cones

[Figure: voltage vs. frequency plane showing the design cone of mode A with modes A, B, and C; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable

Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models (adaptive search)

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V), with curves 1st, 2nd, and real; design: AES, f = 700MHz]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

• Does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring-oscillator stages with 1-bit control pins selecting high or low series resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF, and CO under small, medium, and large margins for designs MUL and EXU; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs: resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on designs MUL and EXU, with the minimum achievable energy marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Flowchart elements:
• Initial placement (all FFs = error-tolerant FFs)
• Check: energy < min energy; if so, save current solution
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • "Chicken and Egg" Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is "Signoff"
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

57ISVLSI-2014 invited talk 140710

Heuristics 1

• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI]

VBTI = Vlib ≈ Vfinal

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step
Vfinal: converged voltage
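The loop structure of the reference signoff flow can be sketched as follows. The aging and delay models here are toy stand-ins (a power-law threshold shift and an alpha-power-law delay) with made-up coefficients; the talk's flow uses derated library characterization and STA instead. All function names and parameter values are hypothetical, and aging is evaluated at the current Vdd rather than from the full voltage history.

```python
def aged_vth(vth0, vdd, t_years, a=0.05, n=0.3):
    """Toy BTI model (assumption): threshold shift grows with stress voltage and t**n."""
    return vth0 + a * vdd * t_years ** n


def path_delay(vdd, vth, k=1.0, alpha=1.3):
    """Toy alpha-power-law delay model (assumption), in arbitrary units."""
    return k * vdd / (vdd - vth) ** alpha


def reference_signoff(v_init, v_max, v_step, target_delay,
                      vth0=0.3, t_final=10, dt=1):
    """Step through the lifetime; raise Vdd by v_step whenever timing fails.

    Returns the converged Vfinal, or None if Vdd would exceed v_max
    (the case where the circuit must be modified and the flow restarted).
    """
    steps = 0  # number of AVS voltage steps applied so far
    t = 0
    while t <= t_final:
        vdd = v_init + steps * v_step
        # Estimate aging, update timing, and bump Vdd until the target is met.
        while path_delay(vdd, aged_vth(vth0, vdd, t)) > target_delay:
            steps += 1
            vdd = v_init + steps * v_step
            if vdd > v_max + 1e-12:
                return None
        t += dt
    return v_init + steps * v_step


v_final = reference_signoff(v_init=0.90, v_max=1.05, v_step=0.01, target_delay=1.9)
print("Vfinal:", v_final)
```

Each inner iteration stands in for one library re-characterization plus STA run, which is why the reference flow is slow and why the heuristics for estimating Vfinal up front are attractive.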

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case   Vlib (V)   VBTI (V)
1      Vinit      Vinit
2      Vinit      Vmax
3      Vmax       Vmax
4      Vinit      Vfinal
5      N/A        N/A
6      Vheur1     Vheur1
7      Vheur2     Vheur2

Page 58: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

58ISVLSI-2014 invited talk 140710

Vfinal Estimation

bull Problem Vfinal is not available at early design stage (design has not been implemented)

bull Vfinal = Vdd end of life (to compensate BTI aging)

bull Gates along critical pathbull Timing slack at t = 0

bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient

59ISVLSI-2014 invited talk 140710

Observation and Heuristic 2

bull Observation 2 Vfinal is not sensitive to gate types

bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack

bull Assume timing slack = 0

10mV

60ISVLSI-2014 invited talk 140710

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125 °C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05 V
Vinit         0.90 V
Vheur1 (DC)   0.97 V
Vheur1 (AC)   0.95 V
Vheur2 (DC)   0.95 V
Vheur2 (AC)   0.93 V

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

• Basic idea: keep VBTI, VLIB and Vdd consistent throughout the circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations

Vstep: AVS voltage step
Vfinal: converged voltage
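The reference flow above can be sketched as a simple loop. This is a minimal illustration only: the delay model (an alpha-power-law form), the linear aging step, and every numeric constant are placeholder assumptions, not the models or values used in the talk. In a real flow each timing update would be an STA run against a re-characterized library, which is why the slide calls the approach slow.

```python
def alpha_power_delay(vdd, delta_vth, vth0=0.30, alpha=1.3):
    """Toy critical-path delay (arbitrary units); it grows as aging raises Vth."""
    return vdd / (vdd - vth0 - delta_vth) ** alpha


def linear_aging(delta_vth, vdd, dt_years, rate=0.004):
    """Toy aging step: |dVth| grows with stress voltage and elapsed time."""
    return delta_vth + rate * vdd * dt_years


def reference_signoff(v_init, v_max, v_step, t_final_years, n_steps, clock_period):
    """Walk through the lifetime, keeping aging, timing and Vdd consistent.

    Returns (vdd_trace, ok). ok is False when the required Vdd exceeds v_max,
    the point at which the circuit would be modified and the flow restarted.
    """
    vdd, delta_vth, trace = v_init, 0.0, []
    dt = t_final_years / n_steps
    for _ in range(n_steps):
        # Estimate aging over this time step at the present supply voltage.
        delta_vth = linear_aging(delta_vth, vdd, dt)
        # Update timing; raise Vdd in AVS steps until the clock period is met.
        while alpha_power_delay(vdd, delta_vth) > clock_period:
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return trace, False
        trace.append(vdd)
    return trace, True


# Hypothetical setup: zero timing slack at t = 0, 10-year lifetime, 10 mV AVS step.
period = alpha_power_delay(0.90, 0.0)
trace, ok = reference_signoff(0.90, 1.05, 0.01, 10.0, 10, period)
print("converged:", ok, "Vfinal:", trace[-1] if trace else None)
```

The last entry of the trace plays the role of Vfinal, the converged voltage at end of life.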

62ISVLSI-2014 invited talk 140710

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A  Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A  Vheur1  Vheur2

63ISVLSI-2014 invited talk 140710

"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop linking Circuit, Vfinal, Vlib/VBTI and Derated Libraries]

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
• Device aging (|ΔVth|) accumulates over time [TCAS'14]

[Figure: |Vgs| versus time over alternating ON/OFF phases [VattikondaWC06]]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure annotation: ≈70% of the Vdd increment in 1 year (remaining 30% over 9 years); Vfinal marked]
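To see why aging of this kind is front-loaded, here is a small worked sketch. It assumes |ΔVth| grows as a power law in time, |ΔVth| ∝ t^n, which is a common simplification in BTI models. The exponent values below are illustrative assumptions, not taken from the talk.

```python
def aging_fraction(t_years, lifetime_years, n):
    """Fraction of end-of-life |dVth| reached at time t, assuming |dVth| ~ t**n."""
    if not 0 <= t_years <= lifetime_years:
        raise ValueError("t_years must lie within the lifetime")
    return (t_years / lifetime_years) ** n


# Share of 10-year aging already accumulated after the first year,
# for two assumed time exponents.
for n in (1 / 6, 0.3):
    share = aging_fraction(1.0, 10.0, n)
    print(f"n = {n:.3f}: {share:.1%} of end-of-life aging after year 1")
```

With any exponent well below 1, a large share of the total shift accumulates in the first tenth of the lifetime, which is the qualitative behavior described above.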

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

Signoff cases:
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem: Signoff Corner Definition

• Timing signoff: ensure the circuit meets its performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

VBTI \ Vlib   Min Vdd                                                Max Vdd
Min Vdd       Slowest circuit, less aging                            Not applicable (optimistic)
Max Vdd       Slowest circuit, worst-case aging (too pessimistic)    Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) versus area (μm²) for AES; points labeled by signoff case number; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); annotated regions: "Optimistic about AVS" and "Pessimistic about AVS"]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200 mV voltage compensation]
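The MTTF update on this slide, MTTF(i) = MTTF(i-1) × (VDD(i-1)/VDD(i))², can be applied step by step along an AVS voltage schedule. The sketch below does exactly that. The starting MTTF and the voltage schedule are illustrative assumptions, not data from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of the update: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_along_schedule(mttf_initial, vdd_schedule):
    """Apply the update at every AVS step; returns the MTTF after each step."""
    history = [mttf_initial]
    for v_prev, v_new in zip(vdd_schedule, vdd_schedule[1:]):
        history.append(update_mttf(history[-1], v_prev, v_new))
    return history


# Hypothetical schedule: AVS raises Vdd by 200 mV in total over the lifetime.
schedule = [0.90, 0.95, 1.00, 1.05, 1.10]
history = mttf_along_schedule(10.0, schedule)
penalty = 1.0 - history[-1] / history[0]
print(f"final MTTF = {history[-1]:.2f} years, penalty = {penalty:.1%}")
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last voltages: MTTF(n) = MTTF(0) × (VDD(0)/VDD(n))². The intermediate steps matter only if the lifetime consumed at each voltage is tracked as well, which this sketch does not do.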

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: left panel, VDD versus year for schedules S1 to S5; right panel, MTTF (year) for S1 to S5 (DMA); annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff between nominal Vdd and signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; axis: voltage; operating voltage marked]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path p_j to variation source z_v:

$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$

• Assumptions
  • Δd_jv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δd_jv using 28 runs of RC extraction and static timing analysis (STA)

Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δd_jv

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda \lambda^{T}$ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$[\Delta d'_{j1} \ldots \Delta d'_{j27}] = [\Delta d_{j1} \ldots \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:

$\sigma_j = \left( (\Delta d'_{j1})^{2} + \ldots + (\Delta d'_{j27})^{2} \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
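The computation on these two slides can be sketched in a few lines of plain Python. It takes per-source sensitivities Δd_jv = (d_j(Y_v) − d_j(Y_typ)) / 3, factors the correlation matrix as Σ = λλᵀ, and combines the correlated sensitivities into σ_j. The function names and the two-source example data are illustrative assumptions. A real run would use 27 sources and delays from STA.

```python
import math


def cholesky(sigma):
    """Lower-triangular lam with lam * lam^T == sigma (symmetric positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(sigma[i][i] - partial)
            else:
                lam[i][j] = (sigma[i][j] - partial) / lam[j][j]
    return lam


def sensitivities(delays_at_corners, delay_typ):
    """dd_v = (d(Y_v) - d(Y_typ)) / 3 for each variation source v."""
    return [(d - delay_typ) / 3.0 for d in delays_at_corners]


def path_sigma(delta_d, sigma):
    """sigma_j = sqrt(sum_v dd'_v**2), where dd' = dd * lam (row vector times matrix)."""
    lam = cholesky(sigma)
    n = len(delta_d)
    correlated = [sum(delta_d[u] * lam[u][v] for u in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in correlated))


# Hypothetical two-source example: path delays in ps at each corner and at typical.
dd = sensitivities([109.0, 112.0], 100.0)        # -> [3.0, 4.0]
print(path_sigma(dd, [[1.0, 0.0], [0.0, 1.0]]))  # uncorrelated sources
print(path_sigma(dd, [[1.0, 0.5], [0.5, 1.0]]))  # correlation 0.5
```

Since Δd' · Δd'ᵀ = Δd Σ Δdᵀ, the result equals the usual quadratic form for the variance of a linear combination of correlated normal variables. With the identity matrix it reduces to a root-sum-square of the sensitivities.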

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation under dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) versus supply voltage (V) for a conventional design and a resilient design; annotation: 15% reduction]

• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement the design to minimize energy
  • Estimation of design energy:

$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$

  • Throughput is a function of the error rate ER [Kahng10], the number of recovery cycles r, and the clock period T

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize the fanin cone with tighter constraints: allows replacement of a Razor FF with a normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches:
• Open-loop AVS
  • Freq and Vdd LUT: AVS with a pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process- and temperature-aware AVS with a generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction, with Vdd scaled until an error occurs [Das06, Tschanz10]

[Figure: AVS classes arranged along a power axis]

78ISVLSI-2014 invited talk 140710

Challenge: Variability

[Figure: four panels, each contrasting ideal scaling with non-ideality.
 DENSITY: transistor count [M] versus MPU release date. Source: [CPUDB]
 POWER: dynamic power (W) versus year, 2009 to 2024. Source: [JeongK08]
 DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) versus ideal scaling. Source: [ITRS]
 SUPPLY VOLTAGE: supply voltage (V) versus MPU release date. Source: [CPUDB]]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• The proposed method achieves an average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of an AVS strategy

[Figure: energy (mJ) versus supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; minimum achievable energy marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: frequency versus voltage with modes A, B and C and the design cone of mode A; negative slacks = failed design; positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models (adaptive search)

[Figure: power estimation of adaptive search; power (mW) versus signoff voltage (V); series: 1st, 2nd, real. Design: AES, f = 700 MHz]

• Ovals indicate sample points
• 1st, 2nd: power from the power models at the first and second iterations
• real: power from real implemented circuits
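The sample-and-refine idea can be illustrated with a deliberately simple stand-in. The sketch below uses successive parabolic interpolation over signoff voltage as the "power model": fit a parabola through the three best samples, implement the circuit only at the predicted minimum, and repeat. This is not the talk's power model, which the slides do not specify. `implement`, `toy_power` and all numbers are hypothetical placeholders for an SP&R run that returns power.

```python
def parabola_vertex(p0, p1, p2):
    """x-coordinate of the vertex of the parabola through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    if den == 0:
        raise ValueError("samples are collinear; no unique parabola vertex")
    return x1 - 0.5 * num / den


def adaptive_search(implement, initial_voltages, v_min, v_max, max_refinements=3):
    """Sample, fit a model, implement only at the model's predicted optimum.

    Returns ((best_voltage, best_power), number_of_implementation_runs).
    """
    samples = [(v, implement(v)) for v in initial_voltages]
    for _ in range(max_refinements):
        # Model: parabola through the three lowest-power samples so far.
        best_three = sorted(sorted(samples, key=lambda s: s[1])[:3])
        v_next = min(max(parabola_vertex(*best_three), v_min), v_max)
        # Stop when the model points at a voltage that was already implemented.
        if any(abs(v_next - v) < 1e-6 for v, _ in samples):
            break
        samples.append((v_next, implement(v_next)))
    return min(samples, key=lambda s: s[1]), len(samples)


def toy_power(v):
    """Placeholder for an expensive implementation run; returns power in mW."""
    return 1.5 + 2.0 * (v - 1.05) ** 2


(best_v, best_p), runs = adaptive_search(toy_power, [0.9, 1.0, 1.2], 0.9, 1.2)
print(f"best signoff voltage ~ {best_v:.3f} V after {runs} implementation runs")
```

The point of the sketch is the run count: the expensive step is called a handful of times rather than once per candidate mode. The parabola fit assumes a convex power curve near the optimum. A production flow would need a richer model and a guard against concave fits.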

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-loop AVS monitor classes: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low

[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF and CO at small, medium and large margins for MUL and EXU; stacked components: energy without resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large safety margin = 5%, 10%, 15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs, and thus reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) versus supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; minimum achievable energy marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Flowchart steps:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
• If energy < min energy, save the current solution



  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

Technology and Benchmark Circuits

• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages

Vmax          1.05 V
Vinit         0.90 V
Vheur1 (DC)   0.97 V
Vheur1 (AC)   0.95 V
Vheur2 (DC)   0.95 V
Vheur2 (AC)   0.93 V

A Reference Signoff Flow

• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library

Vstep: AVS voltage step
Vfinal: converged voltage
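The loop described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the alpha-power-law delay model, the power-law BTI model and every constant below (`k`, `alpha`, `a`, `n`, `gamma`, `vth0`, the time step) are hypothetical stand-ins for the STA runs and aging models that the talk relies on, and the function names are not from the source.

```python
def path_delay(vdd, vth, k=1.0, alpha=1.3):
    """Toy alpha-power-law delay: k * vdd / (vdd - vth)**alpha (assumed model)."""
    return k * vdd / (vdd - vth) ** alpha


def bti_shift(t_years, v_stress, a=0.02, n=0.3, gamma=3.0):
    """Toy BTI model: |dVth| = a * v_stress**gamma * t**n (all constants assumed)."""
    return a * (v_stress ** gamma) * (t_years ** n)


def reference_signoff(v_init=0.90, v_max=1.05, v_step=0.01,
                      vth0=0.30, t_final=10.0, dt=0.5):
    """Step through the lifetime, raising Vdd by v_step whenever timing fails."""
    target = path_delay(v_init, vth0)  # fresh circuit just meets timing at v_init
    vdd, t, schedule = v_init, 0.0, []
    while t < t_final:
        t += dt
        # Estimate aging at this time step, using the current Vdd as stress voltage.
        vth = vth0 + bti_shift(t, vdd)
        # Update timing and Vdd until the aged circuit meets the target again.
        while path_delay(vdd, vth) > target:
            vdd += v_step
            if vdd > v_max + 1e-9:
                raise RuntimeError("Vfinal exceeds Vmax: modify circuit and start over")
            vth = vth0 + bti_shift(t, vdd)
        schedule.append((t, vdd))
    return schedule


if __name__ == "__main__":
    for t, v in reference_signoff():
        print(f"t = {t:4.1f} yr  Vdd = {v:.2f} V")
```

Every pass through the inner loop corresponds to one more timing analysis, which is why the slide calls this flow very slow.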

Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5     6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A   Vheur1  Vheur2

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop linking Circuit, Vfinal, Vlib / VBTI and Derated Libraries]

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time with alternating ON / OFF phases [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time [TCAS’14]

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon
  • 50% BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly

[Figure: Vdd over circuit lifetime approaching Vfinal; ≈70% Vdd increment in 1 year (remaining 30% over 9 years)]
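A short worked equation may help make the "front-loaded" claim concrete. It assumes a power-law time dependence for the threshold shift, which is a common modeling choice but is not stated on the slide. The exponent $n$ is an assumed fitting parameter.

$$|\Delta V_{th}(t)| \propto t^{n} \quad\Rightarrow\quad \frac{|\Delta V_{th}(1\ \text{yr})|}{|\Delta V_{th}(10\ \text{yr})|} = 10^{-n}$$

Under this assumption, the 50% figure quoted above corresponds to $10^{-n} = 0.5$, i.e. $n = \log_{10} 2 \approx 0.30$. A smaller exponent such as $n = 1/6$ would give $10^{-1/6} \approx 0.68$, i.e. an even more front-loaded curve.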

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

                  Vlib = Min Vdd                                          Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging                             Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging (too pessimistic)     Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for AES; data points labeled by case number 1 to 8; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); annotations: "Optimistic about AVS", "Pessimistic about AVS"]

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left(\frac{V_{DD}(i-1)}{V_{DD}(i)}\right)^{2}$$

[Figure: lifetime (year) and Vfinal (V) per implementation 1 to 9; annotations: 30% MTTF penalty, 200 mV voltage compensation]
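The MTTF update rule on this slide is simple enough to evaluate directly. The sketch below applies it along a voltage schedule. The starting MTTF of 10 years and the 0.90 V to 1.10 V schedule are hypothetical inputs chosen only to illustrate a 200 mV compensation, and the function names are not from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One step of the rule MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_over_schedule(mttf0, vdd_schedule):
    """Apply the update at every AVS step and return the MTTF history."""
    history = [mttf0]
    for vdd_prev, vdd_now in zip(vdd_schedule, vdd_schedule[1:]):
        history.append(update_mttf(history[-1], vdd_prev, vdd_now))
    return history


if __name__ == "__main__":
    # Hypothetical schedule: 0.90 V at time zero, raised in 50 mV steps to 1.10 V.
    schedule = [0.90, 0.95, 1.00, 1.05, 1.10]
    history = mttf_over_schedule(10.0, schedule)
    penalty = 1.0 - history[-1] / history[0]
    print(f"final MTTF = {history[-1]:.2f} years, penalty = {penalty:.0%}")
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last voltages: the total penalty is 1 − (0.90/1.10)², about one third, whatever the intermediate steps are.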

EM Impact on AVS Scheduling

[Figure: VDD vs. year (0 to 12) for AVS schedules S1 to S5; MTTF (year) for S1 to S5; design DMA; annotation: "12 years MTTF penalty"]

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff, on a voltage axis from nominal Vdd to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI / NBTI; operating voltage marked]

Problem: margins = pessimism, overdesign, schedule delay

Statistical Timing Analysis (1)

• Delay sensitivity of path pj to variation source zv:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

[Figure: flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]

Note: path delay includes gate and wire delays

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλᵀ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay:

$$\sigma_j = \left((\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2\right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
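The two formulas above can be checked on a small case. The sketch below uses a hand-written Cholesky factorization so that it needs only the standard library. The 2 × 2 correlation matrix and the sensitivity values are made-up illustration data, not numbers from the talk, and the function names are hypothetical.

```python
import math


def cholesky(sigma):
    """Return lower-triangular lam with sigma = lam * lam^T (sigma must be SPD)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(delta_d, sigma):
    """sigma_j = || delta_d * lam ||_2, with delta_d treated as a row vector."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Correlated sensitivities: d'_c = sum_r delta_d[r] * lam[r][c]
    d_corr = [sum(delta_d[r] * lam[r][c] for r in range(n)) for c in range(n)]
    return math.sqrt(sum(x * x for x in d_corr))


if __name__ == "__main__":
    corr = [[1.0, 0.5],
            [0.5, 1.0]]          # illustrative correlation matrix
    sens = [3.0, 4.0]            # illustrative per-source delay sensitivities
    print(path_sigma(sens, corr))
```

Since (Δd λ)(Δd λ)ᵀ = Δd Σ Δdᵀ, the result for this toy input equals √(9 + 16 + 2·0.5·12) = √37, and with an identity correlation matrix it reduces to the plain root-sum-square of the sensitivities.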

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]

Conventional design: worst-case signoff, no Vdd downscaling

Resilient design: typical-case signoff, Vdd downscaling, reduced energy

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy

Energy = Power / Throughput

Throughput = 1 − ER  T

+ 1 − ER  r × T

r: recovery cycles; T: clock period; ER: error rate [Kahng10]

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches (arranged along a power axis in the original figure):

• Open-loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor, design-dependent replica: process and temperature-aware AVS
    • Generic on-chip monitor [Burd00]
    • Design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measure actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

Challenge: Variability

[Figure, four panels, each comparing ideal scaling with non-ideality:
 DENSITY: transistor count [M] vs. MPU release date (source [CPUDB]);
 POWER: dynamic power (W), 2009 to 2024 (source [JeongK08]);
 DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016 (source [ITRS]);
 SUPPLY VOLTAGE: volt vs. MPU release date (source [CPUDB])]

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone

[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models (adaptive search)

[Figure: power estimation of adaptive search: power (mW) vs. signoff voltage (V); curves: 1st, 2nd, real]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

Design: AES; f: 700 MHz
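The sample-and-refine idea can be illustrated with a deliberately simple stand-in. The slides do not say what form the power models take, so the sketch below assumes a quadratic model fitted through the three lowest-power samples, and replaces each SP&R run with a hypothetical `toy_power` function. None of these names or numbers come from the talk.

```python
def toy_power(v):
    """Hypothetical stand-in for one SP&R run: power (mW) at signoff voltage v."""
    return 15.0 + 20.0 * (v - 1.05) ** 2


def parabola_vertex(p0, p1, p2):
    """Abscissa of the vertex of the parabola through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    return x1 - 0.5 * num / den


def adaptive_search(implement, initial_voltages, tol=1e-3, max_runs=8):
    """Fit a model to the samples, sample at its estimated optimum, and repeat."""
    samples = [(v, implement(v)) for v in initial_voltages]
    while len(samples) < max_runs:
        best3 = sorted(samples, key=lambda s: s[1])[:3]   # refine around the best points
        v_new = parabola_vertex(*best3)                   # model's estimated optimum
        if any(abs(v_new - v) < tol for v, _ in samples):
            break                                         # estimate stopped moving
        samples.append((v_new, implement(v_new)))         # one more "implementation"
    return min(samples, key=lambda s: s[1]), samples


if __name__ == "__main__":
    (v_best, p_best), samples = adaptive_search(toy_power, [0.9, 1.0, 1.2])
    print(f"best signoff voltage = {v_best:.3f} V after {len(samples)} runs")
```

The point of the sketch is the run count: the search stops after a handful of implementations instead of sweeping every candidate mode, which is the scalability argument made on the slide. The quadratic fit assumes the sampled power curve is convex near its minimum.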

Classes of Closed-Loop AVS

Closed-loop AVS
• Design-dependent replica, in-situ monitor
  • Critical path may be difficult to identify (IP from 3rd party)
  • Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor
  • Does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages with 1-bit control pins selecting high resistance or low resistance]

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) for PM, BF and CO at small, medium and large margin; panels MUL and EXU; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5/10/15% of clock period

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Flow elements:
• Initial placement (all FFs = error-tolerant FFs)
• Energy < min energy? Save current solution
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • ldquoChicken and Eggrdquo Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is ldquoSignoffrdquo
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)
Page 61: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

61ISVLSI-2014 invited talk 140710

A Reference Signoff Flow

bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime

bull Signoff flowbull Estimate aging at each time step

bull Update circuit timing and Vdd

bull Repeat until t = tfinal

bull Modify circuit and start over if Vfinal gt maximum allowed voltage

bull No overhead in timing analysis but very slow Many STA runs

and library

Vstep AVS voltage stepVfinal converged voltage

62ISVLSI-2014 invited talk 140710

Experiment Setupbull Characterize different derated libraries

bull Evaluate impact of library characterizationbull Seven setups

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2

VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2

63ISVLSI-2014 invited talk 140710

ldquoChicken and Eggrdquo Loop

bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation

bull Timing constraints critical paths etc

bull Circuit is affected by library characterization

Circuit

Derated Libraries

Vfinal

Vlib VBTI

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS

|Vgs|

time

ON OFF ON OFF

[VattikondaWC06]

Device aging (|ΔVth|) accumulates over time

[TCASrsquo14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

bull BTI is a ldquofront-loadedrdquo phenomenon

bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)

bull Most Vdd increment happens in early lifetime

bull Gap between Vdd and Vfinal reduces rapidly

asymp70 Vdd increment in 1 year(remaining 30 over 9 years)

Vfinal

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner bull AVS increases supply voltage

aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired

supply voltage gt Vmax

Pessimistic signoff corner bull Ovestimate aging andor

underestimate circuit performance

bull Large area overhead

Good corners

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner

for AVS4 Vbti = Vfinal Do not overestimate

aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

bull Timing signoff ensure circuit meets performance target under PVT variations amp aging

bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis

bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis

Vlib for circuit performance estimation

Min Vdd Max Vdd

VBTI for aging

estimation

MinVdd

Not applicable (Optimistic)

Max Vdd

Slowest circuitLess aging

Faster circuitWorst-case aging

Slowest circuit Worst-case aging

Too pessimistic

With BTI aging and AVS the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

10000 12000 14000 16000 18000 20000 2200020

22

24

26

28

30

32

44

4

888

7776

66

555

3

33

2

22

11

1

Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)

Area (μm2)

Pow

er (m

W)

AES

Optimistic about AVS

Pessimistic about AVS

>

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

1 2 3 4 5 6 7 8 90

2

4

6

8

10

12

08

09

1

11

12Lifetime (year)

Implementation

Life

time

(yea

r)

Vfina

l (V)

Vfinal (V)

119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )

2

bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as

30 MTTF penalty

200mV voltage compensation

>

70ISVLSI-2014 invited talk 140710

0 2 4 6 8 10 12090092094096098100102104

S1 S2 S3 S4 S5

Year

VDD

DMA 3S1 S2 S3 S4 S5

78

79

80

81

MTT

F (Y

ear)

EM Impact on AVS Scheduling

12 years MTTF penalty

>

71ISVLSI-2014 invited talk 140710

What is ldquoSignoffrdquo

bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip

Nominal VddStatic IR drop

Power grid IR gradientDynamic IR

HCINBTI

Signoff Vdd

Voltage

Problem Margins = pessimism

overdesign schedule delay

ldquomargin stackrdquo for voltage signoff

Operating voltage

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

bull Delay sensitivity of path pj to variation source zv

bull Assumptions bull Δdjv is linear with respect to variation sources

bull Variation sources are normal distributions

bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

28 itf files (27 variation

sources + Ytyp)

Routed Netlist

RC extraction

STA

Δdjv

Δdjv = [ - ] 3dj(Yv) dj(Ytyp)

Note Path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)

bull Σ = λλT (Note λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ

Standard deviation of path delay

σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05

Note we use the delay variation from the statistical analysis as a reference

74ISVLSI-2014 invited talk 140710

Resilient Designs

bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)

bull Trade off design robustness vs design quality Eg enable margin reduction

bull Improve performance (ie timing speculation)

084 088 092 096 10030

34

38

42

46

50

54

58

62conventional design

reilient Design

Supply voltage (V)

En

erg

y (

mJ

)

Conventional design Worst-case signoff No Vdd downscaling

Resilient design Typical-case signoff Vdd downscaling reduced energy

15 reduction

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

bull Given RTL design throughput requirement and error-tolerant registers

bull Objective implement design to minimize energy bull Estimation of design energy

119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905

h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879

+1minus119864119877119903times119879

recovery cycles

Clock period

Error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization

bull Question Which endpoint to be optimized

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

Open-Loop AVS

Closed-Loop AVSP

ow

er

Freq amp Vdd LUT

Post-silicon characterization

AVS Pre-characterize LUT [Martin02]

Process-aware AVSPost-silicon characterization [Tschanz03]

Generic monitor

Design dependent replica

In-situmonitor

Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]

In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]

Error Detection System

Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]

Error Tolerance

AVS

approachesAVS classes

77

78ISVLSI-2014 invited talk 140710

Challenge Variability

1998 2000 2003 2006 2008 20111

10

MPU Release Date

Tran

sisto

r Cou

nt [M

]

Source [CPUDB]

DENSITY

IdealNon-ideality

2009

2010

2011

2012

2013

2014

2015

2016

2017

2018

2019

2020

2021

2022

2023

2024

0

100

200

300

400

500

600

700

0

02

04

06

08

1

12Dynamic Power (W)

POWER

Source [JeongK08]

IdealNon-ideality

2006 2008 2010 2012 2014 20161000

10000

100000

Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling

DRIVE CURRENT

Ideal

Source [ITRS]

Non-ideality

1995 2000 2005 2011 20160

05

1

15

2

25

3

MPU Release Date

Volt

SUPPLY VOLTAGE

Source [CPUDB]

Ideal

Non-ideality

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient

designs thus reduced powerbull Proposed method trades off between timing-error penalty vs

reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction

compared to pure-margin designs Resilience benefits increase in the context of AVS strategy

084 088 092 096 10030

36

42

48

54

60brute-forcepure-marginCombOpt

Supply voltage (V)

En

erg

y (

mJ

)

084 086 088 09 092 094 096 098 1 10225

29

33

37

41

45brute-force

pure-margin

CombOpt

Supply voltage (V)

En

erg

y (

mJ

)

MUL EXU

Minimum achievable energy

80ISVLSI-2014 invited talk 140710

Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for

circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly

threshold voltages)bull One mode is outside of the design cone of the other

failed design overdesignbull Mode A has positive timing slacks with respect to mode B

mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other

bull Modes are in each othersrsquo design cone

Voltage

Frequency

A

Negative Slacks = failed design

Positive Slacks = overdesign

B

C

Design Cone of mode A

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline search for signoff modes within design cone reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method Global Optimizationbull Iteratively sample and refine power models

bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable

Sample (SPampR)

Construct power models

Estimate optimal signoff modes

Sample (SPampR)

Refine power models

Adaptive search

Global optimization flow

09 10 11 1214

15

16

17

18

19

201st 2nd real

Signoff Voltage (v)

Po

wer

(m

W)

Power estimation of adaptive search

bull Ovals indicate sample pointsbull 1st 2nd power from power models at first

second iterationbull real power from real implemented circuits

Design AESf 700MHz

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

bull Critical path may be difficult to identify (IP from 3rd party)

bull Calibrating monitors at multiple modesvoltages requires long test time

Closed-Loop AVS

Design-dependent replica

In-situmonitor

Generic monitor

bull Does not capture design-specific performance variation

82

This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

bull Identified two circuit knobs to tune Vmin

bull Series resistancebull Cell types (INV NAND NOR)

bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low

1 bit 1 bit 1 bit Control pins

High resistance

Low resistance

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reductionbull Reference flows

bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints

bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods

bull Resilience benefits increase with safety margin

PM BF CO PM BF CO PM BF CO25

30

35

40

45

50

55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience

En

erg

y (

mJ

)

Large marginMedium marginSmall margin

MUL

PM BF CO PM BF CO PM BF CO25

27

29

31

33

35

En

erg

y (

mJ

)

EXU

Large marginMedium marginSmall margin

Smallmediumlarge margin safety margin = 51015 of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for

resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a

lower supply voltagebull Average 18 energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

084 088 092 096 10030

36

42

48

54

60brute-forcepure-marginCombOpt

Supply voltage (V)

En

erg

y (

mJ

)

084 086 088 09 092 094 096 098 1 10225

29

33

37

41

45brute-force

pure-margin

CombOpt

Supply voltage (V)

En

erg

y (

mJ

)

MUL EXU

Minimum achievable energy

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

bull Iteratively optimize with SEOpt and SkewOpt

Initial placement (all FFs = error-tolerant FFs)

Energy lt min energy

Save current solution

Margin insertion on K paths based on sensitivity function

Replace error-tolerant FFs w normal FFs

SEOpt

Activity-aware clock skew optimization

SkewOpt


Experiment Setup

• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   N/A  Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  N/A  Vheur1  Vheur2

“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Diagram: loop among Circuit, Derated Libraries, Vfinal, and Vlib / VBTI]

Bias Temperature Instability (BTI)

• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time, alternating ON / OFF phases] [VattikondaWC06]

Device aging (|ΔVth|) accumulates over time [TCAS'14]

Observation 1

• BTI is a “front-loaded” phenomenon [Chan11]
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
  • ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)

[Figure: Vdd over circuit lifetime relative to Vfinal]

Results for DC Scenario

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Signoff cases (figure annotation: “Good corners”)
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                          Vlib (circuit performance estimation)
VBTI (aging estimation)   Min Vdd                                               Max Vdd
Min Vdd                   Slowest circuit, less aging                           Not applicable (optimistic)
Max Vdd                   Slowest circuit, worst-case aging (too pessimistic)   Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for design AES, points labeled by case number; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); annotations: “Optimistic about AVS”, “Pessimistic about AVS”]

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

[Figure: lifetime (year) and Vfinal (V) per implementation; annotations: 30% MTTF penalty, 200 mV voltage compensation]
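The MTTF update rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: the function names, the 10-year starting MTTF, and the 50 mV step schedule are all assumptions made for the example.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """Scale MTTF by (vdd_prev / vdd_new)^2 for one AVS voltage step."""
    if vdd_prev <= 0 or vdd_new <= 0:
        raise ValueError("supply voltages must be positive")
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf_initial, vdd_schedule):
    """Apply the per-step update across a sequence of supply voltages."""
    mttf = mttf_initial
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_new)
    return mttf


if __name__ == "__main__":
    # Hypothetical schedule: 1.00 V raised in 50 mV steps to 1.20 V.
    schedule = [1.00, 1.05, 1.10, 1.15, 1.20]
    print(mttf_after_schedule(10.0, schedule))
```

Because the per-step ratios telescope, only the first and last voltages matter: with the assumed 1.00 V start, a 200 mV increase gives a factor of (1.00 / 1.20)^2 ≈ 0.69, which is in line with the roughly 30% MTTF penalty annotated on the slide.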

EM Impact on AVS Scheduling

[Figure: VDD vs. year for AVS schedules S1 to S5 (design: DMA), and MTTF (year) for S1 to S5; annotation: “12 years MTTF penalty”]

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
• Problem: margins = pessimism, overdesign, schedule delay

[Figure: “margin stack” for voltage signoff on a voltage axis, from nominal Vdd to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage marked]

Statistical Timing Analysis (1)

• Delay sensitivity of path p_j to variation source z_v:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δd_jv is linear with respect to variation sources
  • Variation sources follow normal distributions
• Obtain Δd_jv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 itf files (27 variation sources + Y_typ) and routed netlist → RC extraction → STA → Δd_jv]

Note: path delay includes gate and wire delays

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
• Delay sensitivities with correlation:

$$[\Delta d'_{j1} \;\dots\; \Delta d'_{j27}] = [\Delta d_{j1} \;\dots\; \Delta d_{j27}]\,\lambda$$

• Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
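The two statistical-timing slides can be read as a short computation: derive per-source sensitivities from corner delays, factor the correlation matrix, and take the norm of the correlated sensitivities. The sketch below is illustrative only; it uses a hand-written Cholesky routine in pure Python, a two-source example instead of 27, and made-up delay values in picoseconds, all of which are assumptions.

```python
import math


def cholesky(sigma):
    """Return lower-triangular L with sigma = L * L^T (sigma symmetric positive definite)."""
    n = len(sigma)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lower[i][m] * lower[k][m] for m in range(k))
            if i == k:
                lower[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lower[i][k] = (sigma[i][k] - s) / lower[k][k]
    return lower


def sensitivities(corner_delays, typ_delay):
    """Delta d_jv = (d_j(Y_v) - d_j(Y_typ)) / 3 for each variation source v."""
    return [(d - typ_delay) / 3.0 for d in corner_delays]


def path_sigma(delta_d, sigma):
    """Standard deviation of path delay: 2-norm of the row vector delta_d * L."""
    lower = cholesky(sigma)
    n = len(delta_d)
    correlated = [sum(delta_d[r] * lower[r][c] for r in range(n)) for c in range(n)]
    return math.sqrt(sum(x * x for x in correlated))


if __name__ == "__main__":
    # Hypothetical path: typical delay 100 ps, two corner delays of 109 ps and 112 ps.
    delta_d = sensitivities([109.0, 112.0], 100.0)
    corr = [[1.0, 0.5], [0.5, 1.0]]
    print(path_sigma(delta_d, corr))
```

Since Σ = λλ^T, the squared result equals Δd Σ Δd^T, so the Cholesky step is one way of folding the correlations into an ordinary sum of squares.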

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality: e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V); curves: conventional design, resilient design; annotation: 15% reduction]

• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy [Kahng10]:

$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$

  where Throughput is expressed in terms of the error rate (ER), the clock period (T), and the number of recovery cycles (r)
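As a rough illustration of how error rate and recovery cycles enter the energy estimate, the sketch below evaluates Energy = Power / Throughput under one simple throughput model. The model is an assumption for this example, not the slide's expression: each operation takes one clock period, and each timing error adds r recovery cycles on average. Function and parameter names are hypothetical.

```python
def throughput(clock_period, error_rate, recovery_cycles):
    """Operations per unit time, assuming each error costs `recovery_cycles` extra cycles."""
    if clock_period <= 0:
        raise ValueError("clock period must be positive")
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error rate must lie in [0, 1]")
    avg_cycles_per_op = 1.0 + error_rate * recovery_cycles
    return 1.0 / (clock_period * avg_cycles_per_op)


def energy_per_op(power, clock_period, error_rate, recovery_cycles):
    """Energy = Power / Throughput."""
    return power / throughput(clock_period, error_rate, recovery_cycles)


if __name__ == "__main__":
    # Hypothetical numbers: lowering power by 25% while accepting a 2% error rate
    # with 5 recovery cycles per error.
    print(energy_per_op(1.00, 1.0, 0.00, 5))
    print(energy_per_op(0.75, 1.0, 0.02, 5))
```

Under this assumed model, a lower-power operating point pays off only while the throughput loss from recovery stays smaller than the power saving, which is the tradeoff the slide's energy estimate is meant to capture.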

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches:

• Open-Loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-Loop AVS: process and temperature-aware AVS
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error Tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

[Figure: AVS classes arranged along a power axis]

Challenge: Variability

[Figure: four panels, each comparing ideal scaling with observed non-ideality
  DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
  POWER: dynamic power (W) vs. year. Source: [JeongK08]
  DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. year, with ideal scaling. Source: [ITRS]
  SUPPLY VOLTAGE: volt vs. MPU release date. Source: [CPUDB]]

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; annotation: minimum achievable energy]

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: frequency vs. voltage with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone to reduce overdesign

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

Global optimization flow: Sample (SP&R); Construct power models; Estimate optimal signoff modes; then adaptive search: Sample (SP&R); Refine power models

[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V); series: 1st, 2nd, real. Design: AES, f = 700 MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

Classes of Closed-Loop AVS

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low resistance]

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF, and CO at small, medium, and large margin, for MUL and EXU; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; annotation: minimum achievable energy]

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart elements]
• Initial placement (all FFs = error-tolerant FFs)
• Decision: energy < min energy? If so, save current solution
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
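The iterate-and-keep-the-best structure of this flow can be sketched as a small driver loop. Everything concrete here is an assumption made for illustration: the order of the two steps within an iteration, the fixed iteration budget as a stopping rule, and the toy stand-ins for SEOpt, SkewOpt, and the energy estimate (the design state is reduced to a count of error-tolerant FFs).

```python
def optimize(initial, se_opt, skew_opt, energy, max_iters):
    """Alternate SEOpt and SkewOpt, saving the lowest-energy solution seen."""
    best = initial
    best_energy = energy(initial)
    current = initial
    for _ in range(max_iters):
        current = skew_opt(se_opt(current))
        e = energy(current)
        if e < best_energy:  # "energy < min energy": save current solution
            best, best_energy = current, e
    return best, best_energy


K = 10  # hypothetical number of endpoints converted per SEOpt pass


def toy_se_opt(n_et_ffs):
    """Toy SEOpt: replace up to K error-tolerant FFs with normal FFs."""
    return max(0, n_et_ffs - K)


def toy_skew_opt(n_et_ffs):
    """Toy SkewOpt: leaves the FF count unchanged in this sketch."""
    return n_et_ffs


def toy_energy(n_et_ffs):
    """Toy energy: resilience cost grows with ET-FF count, margin cost with replaced FFs."""
    return 0.5 * n_et_ffs + 0.02 * (100 - n_et_ffs) ** 2


if __name__ == "__main__":
    # Start with all 100 FFs error-tolerant, as in the initial placement.
    print(optimize(100, toy_se_opt, toy_skew_opt, toy_energy, 10))
```

Keeping the best solution separately from the current one lets the loop continue past a local increase in energy without losing the minimum found so far.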

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • ldquoChicken and Eggrdquo Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is ldquoSignoffrdquo
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)
Page 63: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

63ISVLSI-2014 invited talk 140710

ldquoChicken and Eggrdquo Loop

bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation

bull Timing constraints critical paths etc

bull Circuit is affected by library characterization

Circuit

Derated Libraries

Vfinal

Vlib VBTI

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS

|Vgs|

time

ON OFF ON OFF

[VattikondaWC06]

Device aging (|ΔVth|) accumulates over time

[TCASrsquo14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

bull BTI is a ldquofront-loadedrdquo phenomenon

bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)

bull Most Vdd increment happens in early lifetime

bull Gap between Vdd and Vfinal reduces rapidly

asymp70 Vdd increment in 1 year(remaining 30 over 9 years)

Vfinal

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner bull AVS increases supply voltage

aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired

supply voltage gt Vmax

Pessimistic signoff corner bull Ovestimate aging andor

underestimate circuit performance

bull Large area overhead

Good corners

1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner

for AVS4 Vbti = Vfinal Do not overestimate

aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003

67ISVLSI-2014 invited talk 140710

Problem Signoff Corner Definition

bull Timing signoff ensure circuit meets performance target under PVT variations amp aging

bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis

bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis

Vlib for circuit performance estimation

Min Vdd Max Vdd

VBTI for aging

estimation

MinVdd

Not applicable (Optimistic)

Max Vdd

Slowest circuitLess aging

Faster circuitWorst-case aging

Slowest circuit Worst-case aging

Too pessimistic

With BTI aging and AVS the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

10000 12000 14000 16000 18000 20000 2200020

22

24

26

28

30

32

44

4

888

7776

66

555

3

33

2

22

11

1

Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)

Area (μm2)

Pow

er (m

W)

AES

Optimistic about AVS

Pessimistic about AVS

>

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

1 2 3 4 5 6 7 8 90

2

4

6

8

10

12

08

09

1

11

12Lifetime (year)

Implementation

Life

time

(yea

r)

Vfina

l (V)

Vfinal (V)

119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )

2

bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as

30 MTTF penalty

200mV voltage compensation

>

70ISVLSI-2014 invited talk 140710

0 2 4 6 8 10 12090092094096098100102104

S1 S2 S3 S4 S5

Year

VDD

DMA 3S1 S2 S3 S4 S5

78

79

80

81

MTT

F (Y

ear)

EM Impact on AVS Scheduling

12 years MTTF penalty

>

71ISVLSI-2014 invited talk 140710

What is ldquoSignoffrdquo

bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip

Nominal VddStatic IR drop

Power grid IR gradientDynamic IR

HCINBTI

Signoff Vdd

Voltage

Problem Margins = pessimism

overdesign schedule delay

ldquomargin stackrdquo for voltage signoff

Operating voltage

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

bull Delay sensitivity of path pj to variation source zv

bull Assumptions bull Δdjv is linear with respect to variation sources

bull Variation sources are normal distributions

bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

28 itf files (27 variation

sources + Ytyp)

Routed Netlist

RC extraction

STA

Δdjv

Δdjv = [ - ] 3dj(Yv) dj(Ytyp)

Note Path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)

bull Σ = λλT (Note λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ

Standard deviation of path delay

σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05

Note we use the delay variation from the statistical analysis as a reference

74ISVLSI-2014 invited talk 140710

Resilient Designs

bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)

bull Trade off design robustness vs design quality Eg enable margin reduction

bull Improve performance (ie timing speculation)

084 088 092 096 10030

34

38

42

46

50

54

58

62conventional design

reilient Design

Supply voltage (V)

En

erg

y (

mJ

)

Conventional design Worst-case signoff No Vdd downscaling

Resilient design Typical-case signoff Vdd downscaling reduced energy

15 reduction

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

bull Given RTL design throughput requirement and error-tolerant registers

bull Objective implement design to minimize energy bull Estimation of design energy

119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905

h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879

+1minus119864119877119903times119879

recovery cycles

Clock period

Error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization

bull Question Which endpoint to be optimized

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

Open-Loop AVS

Closed-Loop AVSP

ow

er

Freq amp Vdd LUT

Post-silicon characterization

AVS Pre-characterize LUT [Martin02]

Process-aware AVSPost-silicon characterization [Tschanz03]

Generic monitor

Design dependent replica

In-situmonitor

Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]

In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]

Error Detection System

Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]

Error Tolerance

AVS

approachesAVS classes

77

78ISVLSI-2014 invited talk 140710

Challenge Variability

1998 2000 2003 2006 2008 20111

10

MPU Release Date

Tran

sisto

r Cou

nt [M

]

Source [CPUDB]

DENSITY

IdealNon-ideality

2009

2010

2011

2012

2013

2014

2015

2016

2017

2018

2019

2020

2021

2022

2023

2024

0

100

200

300

400

500

600

700

0

02

04

06

08

1

12Dynamic Power (W)

POWER

Source [JeongK08]

IdealNon-ideality

2006 2008 2010 2012 2014 20161000

10000

100000

Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling

DRIVE CURRENT

Ideal

Source [ITRS]

Non-ideality

1995 2000 2005 2011 20160

05

1

15

2

25

3

MPU Release Date

Volt

SUPPLY VOLTAGE

Source [CPUDB]

Ideal

Non-ideality

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient

designs thus reduced powerbull Proposed method trades off between timing-error penalty vs

reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction

compared to pure-margin designs Resilience benefits increase in the context of AVS strategy

084 088 092 096 10030

36

42

48

54

60brute-forcepure-marginCombOpt

Supply voltage (V)

En

erg

y (

mJ

)

084 086 088 09 092 094 096 098 1 10225

29

33

37

41

45brute-force

pure-margin

CombOpt

Supply voltage (V)

En

erg

y (

mJ

)

MUL EXU

Minimum achievable energy

80ISVLSI-2014 invited talk 140710

Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for

circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly

threshold voltages)bull One mode is outside of the design cone of the other

failed design overdesignbull Mode A has positive timing slacks with respect to mode B

mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other

bull Modes are in each othersrsquo design cone

Voltage

Frequency

A

Negative Slacks = failed design

Positive Slacks = overdesign

B

C

Design Cone of mode A

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline search for signoff modes within design cone reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method Global Optimizationbull Iteratively sample and refine power models

bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable

Sample (SPampR)

Construct power models

Estimate optimal signoff modes

Sample (SPampR)

Refine power models

Adaptive search

Global optimization flow

09 10 11 1214

15

16

17

18

19

201st 2nd real

Signoff Voltage (v)

Po

wer

(m

W)

Power estimation of adaptive search

bull Ovals indicate sample pointsbull 1st 2nd power from power models at first

second iterationbull real power from real implemented circuits

Design AESf 700MHz

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

bull Critical path may be difficult to identify (IP from 3rd party)

bull Calibrating monitors at multiple modesvoltages requires long test time

Closed-Loop AVS

Design-dependent replica

In-situmonitor

Generic monitor

bull Does not capture design-specific performance variation

82

This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

bull Identified two circuit knobs to tune Vmin

bull Series resistancebull Cell types (INV NAND NOR)

bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low

1 bit 1 bit 1 bit Control pins

High resistance

Low resistance

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reductionbull Reference flows

bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints

bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods

bull Resilience benefits increase with safety margin

PM BF CO PM BF CO PM BF CO25

30

35

40

45

50

55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience

En

erg

y (

mJ

)

Large marginMedium marginSmall margin

MUL

PM BF CO PM BF CO PM BF CO25

27

29

31

33

35

En

erg

y (

mJ

)

EXU

Large marginMedium marginSmall margin

Smallmediumlarge margin safety margin = 51015 of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for

resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a

lower supply voltagebull Average 18 energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

084 088 092 096 10030

36

42

48

54

60brute-forcepure-marginCombOpt

Supply voltage (V)

En

erg

y (

mJ

)

084 086 088 09 092 094 096 098 1 10225

29

33

37

41

45brute-force

pure-margin

CombOpt

Supply voltage (V)

En

erg

y (

mJ

)

MUL EXU

Minimum achievable energy

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

bull Iteratively optimize with SEOpt and SkewOpt

Initial placement (all FFs = error-tolerant FFs)

Energy lt min energy

Save current solution

Margin insertion on K paths based on sensitivity function

Replace error-tolerant FFs w normal FFs

SEOpt

Activity-aware clock skew optimization

SkewOpt

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)
Page 64: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

64ISVLSI-2014 invited talk 140710

Bias Temperature Instability (BTI)

• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| versus time, with alternating ON / OFF / ON / OFF phases] [VattikondaWC06]

Device aging (|ΔVth|) accumulates over time [TCAS'14]

65ISVLSI-2014 invited talk 140710

Observation 1

[Chan11]

• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens in early lifetime
• The gap between Vdd and Vfinal reduces rapidly

[Figure annotation: ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years); Vfinal marked]
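The "front-loaded" behavior can be illustrated with a small sketch. It assumes, purely for illustration, that |ΔVth| grows as a power law in time, t^n. This model form and the exponent values are assumptions and are not taken from the slide or from [Chan11]; the function name `aging_fraction` is hypothetical.

```python
# Sketch only: assumes |dVth|(t) is proportional to t**n (power-law aging).
# The model form and the exponents used below are illustrative assumptions,
# not values stated in the talk.

def aging_fraction(t_years: float, lifetime_years: float, n: float) -> float:
    """Fraction of end-of-life |dVth| accumulated by time t, for dVth ~ t**n."""
    if not 0.0 <= t_years <= lifetime_years:
        raise ValueError("t_years must lie within [0, lifetime_years]")
    return (t_years / lifetime_years) ** n


if __name__ == "__main__":
    # Compare two assumed exponents over a 10-year lifetime.
    for n in (1.0 / 6.0, 0.3):
        frac = aging_fraction(1.0, 10.0, n)
        print(f"n = {n:.3f}: {100.0 * frac:.0f}% of 10-year aging within year 1")
```

Under this assumed model, any exponent well below 1 places a large share of the total shift in the first year, which is the qualitative point of the observation. With n = 0.3, for example, about half of the 10-year shift accumulates in year 1.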

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax

Pessimistic signoff corner:
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignores AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach:
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Vlib (for circuit performance estimation) versus VBTI (for aging estimation):
• Vlib = Min Vdd, VBTI = Min Vdd: slowest circuit, less aging
• Vlib = Max Vdd, VBTI = Min Vdd: not applicable (optimistic)
• Vlib = Max Vdd, VBTI = Max Vdd: faster circuit, worst-case aging
• Vlib = Min Vdd, VBTI = Max Vdd: slowest circuit, worst-case aging (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) versus area (μm2) for AES; data points labeled 1 through 8; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks); annotations: "Optimistic about AVS", "Pessimistic about AVS"]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step, and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

[Figure: lifetime (year) and Vfinal (V) for implementations 1 through 9; annotations: 30% MTTF penalty, 200 mV voltage compensation]
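A minimal sketch of the stepwise MTTF update on this slide, in which each step scales the previous MTTF by the square of the ratio of the previous to the current Vdd. The function names, the starting MTTF, and the voltage schedule are hypothetical values chosen only for illustration.

```python
# Sketch of the per-step update MTTF(i) = MTTF(i-1) * (Vdd(i-1) / Vdd(i))**2.
# The starting MTTF and the Vdd schedule below are hypothetical.

def update_mttf(mttf_prev: float, vdd_prev: float, vdd_now: float) -> float:
    """Apply one update step after AVS moves the supply from vdd_prev to vdd_now."""
    if vdd_prev <= 0.0 or vdd_now <= 0.0:
        raise ValueError("supply voltages must be positive")
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_schedule(mttf0: float, vdd_steps: list) -> list:
    """Return the MTTF after each step of a Vdd schedule (first entry is mttf0)."""
    out = [mttf0]
    for prev, now in zip(vdd_steps, vdd_steps[1:]):
        out.append(update_mttf(out[-1], prev, now))
    return out


if __name__ == "__main__":
    # Hypothetical schedule: AVS raises Vdd from 1.00 V to 1.20 V in two steps.
    for vdd, mttf in zip([1.00, 1.10, 1.20], mttf_schedule(10.0, [1.00, 1.10, 1.20])):
        print(f"Vdd = {vdd:.2f} V -> MTTF = {mttf:.2f} years")
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last Vdd values. For a hypothetical 1.0 V starting point, a 200 mV increase gives (1.0 / 1.2)^2 ≈ 0.69, which is a reduction of roughly 30%.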

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD versus year for AVS schedules S1 through S5, and MTTF (year) for schedules S1 through S5; design label: DMA; annotation: 12 years MTTF penalty]

71ISVLSI-2014 invited talk 140710

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff on a voltage axis, from nominal Vdd to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage marked]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path p_j to variation source z_v:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions:
  • Δd_jv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δd_jv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 ITF files (27 variation sources + Ytyp) and routed netlist, then RC extraction, then STA, yielding Δd_jv]

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
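The two statistical-timing slides can be sketched in a few lines. The example below computes per-source sensitivities as (corner delay − typical delay) / 3, factors the correlation matrix as Σ = λλ^T with a lower-triangular Cholesky factor, and takes the path-delay standard deviation as the Euclidean norm of the sensitivity row vector times λ. It uses two variation sources instead of 27, and all delay values and function names are hypothetical.

```python
import math

# Sketch only: two variation sources instead of 27; delays are hypothetical.

def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma symmetric positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                d = sigma[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lam[i][k] = math.sqrt(d)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def sensitivities(corner_delays, typ_delay):
    """Per-source sensitivity: (d_j(Y_v) - d_j(Y_typ)) / 3."""
    return [(d - typ_delay) / 3.0 for d in corner_delays]


def path_sigma(delta_d, sigma):
    """Norm of the row vector delta_d times lam, i.e. sqrt(delta_d * sigma * delta_d^T)."""
    lam = cholesky(sigma)
    n = len(delta_d)
    correlated = [sum(delta_d[v] * lam[v][c] for v in range(n)) for c in range(n)]
    return math.sqrt(sum(x * x for x in correlated))


if __name__ == "__main__":
    dd = sensitivities([106.0, 97.0], 100.0)  # hypothetical path delays in ps
    print(dd, path_sigma(dd, [[1.0, 0.5], [0.5, 1.0]]))
```

With an identity correlation matrix the result reduces to the root-sum-square of the individual sensitivities. Off-diagonal terms raise or lower the path sigma depending on their sign and on the signs of the sensitivities.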

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) versus supply voltage (V); curves: conventional design, resilient design; annotation: 15% reduction]

• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$

[Equation: Throughput expressed in terms of the error rate ER, the clock period T, and the number of recovery cycles r] [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches:
• Open-Loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-Loop AVS
  • Generic monitor, design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error-Tolerance AVS
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

[Figure axis: power]

78ISVLSI-2014 invited talk 140710

Challenge: Variability

[Figure, four panels, each contrasting ideal scaling with observed non-ideality:
• DENSITY: transistor count [M] versus MPU release date. Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: Extended Planar Bulk, UTB FD, and DG (μA/μm) versus ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: volts versus MPU release date. Source: [CPUDB]]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) versus supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; minimum achievable energy marked]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage and frequency axes with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

[Flow: global optimization flow: sample (SP&R), construct power models, estimate optimal signoff modes, sample (SP&R), refine power models (adaptive search)]

[Figure: power estimation of adaptive search; power (mW) versus signoff voltage (V); series: 1st, 2nd, real; design: AES, f = 700 MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-Loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin:
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit:
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low resistance]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows:
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU under small, medium, and large margins; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) versus supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; minimum achievable energy marked]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs); SkewOpt (activity-aware clock skew optimization); check: energy < min energy; save current solution]

Page 66: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

66ISVLSI-2014 invited talk 140710

Results for DC Scenario

• Optimistic signoff corner: AVS increases supply voltage aggressively to compensate for aging
  • Consumes more power
  • May fail to meet timing if the desired supply voltage > Vmax
• Pessimistic signoff corner: overestimates aging and/or underestimates circuit performance
  • Large area overhead
• Good corners

Signoff corners compared:
1. VBTI = Vlib = Vinit: ignores AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

67ISVLSI-2014 invited talk 140710

Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Vlib (for circuit performance estimation) vs. VBTI (for aging estimation):
• Vlib = Min Vdd, VBTI = Min Vdd: slowest circuit, less aging
• Vlib = Max Vdd, VBTI = Min Vdd: not applicable (optimistic)
• Vlib = Max Vdd, VBTI = Max Vdd: faster circuit, worst-case aging
• Vlib = Min Vdd, VBTI = Max Vdd: slowest circuit, worst-case aging (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

[Figure: power (mW) vs. area (μm²) for design AES, with data points labeled 1 to 8. Legend: Non-EM Aware; After Fixing (Mishra); After Fixing (Black's). Annotations: "Optimistic about AVS", "Pessimistic about AVS".]

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step, and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9. Annotations: 30% MTTF penalty; 200 mV voltage compensation.]
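The MTTF update rule on this slide can be sketched in a few lines. This is only an illustration: the function names, the starting MTTF, and the voltage schedule are assumptions, not values from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One AVS step: scale MTTF by the squared ratio of previous to current Vdd."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_trajectory(mttf_initial, vdd_schedule):
    """Apply the update at every step of a Vdd schedule (hypothetical values)."""
    mttf = [mttf_initial]
    for v_prev, v_now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf.append(update_mttf(mttf[-1], v_prev, v_now))
    return mttf


# Illustrative schedule only: Vdd raised in steps to compensate for aging.
trajectory = mttf_trajectory(10.0, [0.90, 0.95, 1.00, 1.10])
```

Because the per-step ratios telescope, the MTTF after any step depends only on the initial and current supply voltage, so under this rule the penalty is set by the total voltage compensation rather than by the number of steps.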

>

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: VDD vs. year for AVS schedules S1 to S5, and MTTF (year) for S1 to S5; design DMA. Annotation: 12 years MTTF penalty.]

71ISVLSI-2014 invited talk 140710

What is “Signoff”?

• Foundation of contract between design house and foundry
• “chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff. Axis: voltage. Labels: operating voltage, nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd.]

Problem: margins = pessimism, overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path $p_j$ to variation source $z_v$:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • $\Delta d_{jv}$ is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain $\Delta d_{jv}$ using 28 runs of RC extraction and static timing analysis (STA)

Flow: 28 itf files (27 variation sources + $Y_{typ}$) and routed netlist, then RC extraction, then STA, then $\Delta d_{jv}$

Note: path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• $\Sigma$ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^{T}$ (Note: $\lambda$ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1} \;\ldots\; \Delta d'_{j27}] = [\Delta d_{j1} \;\ldots\; \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^{2} + \ldots + (\Delta d'_{j27})^{2} \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
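The two statistical-timing slides can be tied together with a small sketch. It assumes each corner extraction is a 3-sigma excursion of one variation source (which the division by 3 in the sensitivity definition suggests), and it uses a hand-written Cholesky factorization so the example has no dependencies. All function names and the two-source numbers are illustrative, not from the talk.

```python
import math


def sensitivity(d_corner, d_typ):
    """Per-sigma delay sensitivity, assuming the corner is a 3-sigma excursion."""
    return (d_corner - d_typ) / 3.0


def cholesky_lower(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma must be positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            partial = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - partial)
            else:
                lam[i][k] = (sigma[i][k] - partial) / lam[k][k]
    return lam


def path_sigma(delta_d, sigma):
    """Standard deviation of path delay from sensitivities and a correlation matrix."""
    lam = cholesky_lower(sigma)
    n = len(delta_d)
    # Row vector times lam: correlated[c] = sum over r of delta_d[r] * lam[r][c]
    correlated = [sum(delta_d[r] * lam[r][c] for r in range(n)) for c in range(n)]
    return math.sqrt(sum(x * x for x in correlated))


# Two hypothetical variation sources: typical delay 100 ps, corner delays 109 and 112 ps.
deltas = [sensitivity(109.0, 100.0), sensitivity(112.0, 100.0)]
sigma_j = path_sigma(deltas, [[1.0, 0.5], [0.5, 1.0]])
```

With an identity correlation matrix the result reduces to the root-sum-square of the sensitivities; positive correlation between sources increases the path sigma.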

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design. Annotation: 15% reduction.]

• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$

$$\mathrm{Throughput} = \frac{1 - ER}{T} + \frac{1 - ER}{r \times T}$$

where $r$ is the number of recovery cycles, $T$ is the clock period, and $ER$ is the error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches:
• Open-loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor, design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00]; design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

[Figure axis label: Power]

78ISVLSI-2014 invited talk 140710

Challenge Variability

[Figure: four panels, each contrasting ideal scaling with non-ideality.
DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
POWER: dynamic power (W) vs. year, 2009 to 2024. Source: [JeongK08]
DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling. Source: [ITRS]
SUPPLY VOLTAGE: volt vs. MPU release date. Source: [CPUDB]]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for designs MUL and EXU; curves: brute-force, pure-margin, CombOpt. Annotation: minimum achievable energy.]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A. Negative slacks = failed design; positive slacks = overdesign.]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within design cone to reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

Global optimization flow (adaptive search): sample (SP&R), construct power models, estimate optimal signoff modes, sample (SP&R), refine power models

[Figure: power estimation of adaptive search. Power (mW) vs. signoff voltage (V); series: 1st, 2nd, real. Design: AES, f = 700 MHz.]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
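The sample-and-refine loop can be illustrated with a toy one-dimensional search over signoff voltage. The slide does not state the form of the power model, so this sketch substitutes a parabola through the three lowest-power samples (successive parabolic interpolation). `implement` is a hypothetical stand-in for one SP&R run that returns power at a given signoff voltage.

```python
def fit_vertex(points):
    """Voltage at the vertex of the parabola through three (voltage, power) points."""
    (x1, y1), (x2, y2), (x3, y3) = points
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    if den == 0:
        return None  # points are collinear, no vertex to estimate
    return x2 - 0.5 * num / den


def adaptive_search(implement, voltages, iterations=2):
    """Sample a few voltages, fit a model, then implement only at the estimated optimum."""
    samples = {v: implement(v) for v in voltages}
    lo, hi = min(voltages), max(voltages)
    for _ in range(iterations):
        # Refit the model on the three lowest-power samples, ordered by voltage.
        lowest = sorted(sorted(samples.items(), key=lambda s: s[1])[:3])
        estimate = fit_vertex(lowest)
        if estimate is None:
            break
        estimate = min(max(estimate, lo), hi)  # stay inside the sampled range
        if estimate in samples:
            break
        samples[estimate] = implement(estimate)
    best_v = min(samples, key=samples.get)
    return best_v, samples[best_v], len(samples)
```

The point of the design is the run count: each iteration costs one implementation run instead of a sweep over every candidate mode. Real power curves are not parabolas, so an actual flow would need a better-founded model and a convergence check.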

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-loop AVS classes: generic monitor, design-dependent replica, in-situ monitor
• Generic monitor: does not capture design-specific performance variation
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high resistance or low resistance.]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF, and CO at small, medium, and large margin, for designs MUL and EXU. Stacked components: energy w/o resilience; energy penalty of additional circuits; energy penalty of throughput degradation.]

Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for designs MUL and EXU; curves: brute-force, pure-margin, CombOpt. Annotation: minimum achievable energy.]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Flow steps:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• If energy < min energy, save current solution

Page 68: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

68ISVLSI-2014 invited talk 140710

AVS Signoff Corner Selection

10000 12000 14000 16000 18000 20000 2200020

22

24

26

28

30

32

44

4

888

7776

66

555

3

33

2

22

11

1

Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)

Area (μm2)

Pow

er (m

W)

AES

Optimistic about AVS

Pessimistic about AVS

>

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

1 2 3 4 5 6 7 8 90

2

4

6

8

10

12

08

09

1

11

12Lifetime (year)

Implementation

Life

time

(yea

r)

Vfina

l (V)

Vfinal (V)

119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )

2

bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as

30 MTTF penalty

200mV voltage compensation

>

70ISVLSI-2014 invited talk 140710

0 2 4 6 8 10 12090092094096098100102104

S1 S2 S3 S4 S5

Year

VDD

DMA 3S1 S2 S3 S4 S5

78

79

80

81

MTT

F (Y

ear)

EM Impact on AVS Scheduling

12 years MTTF penalty

>

71ISVLSI-2014 invited talk 140710

What is ldquoSignoffrdquo

bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip

Nominal VddStatic IR drop

Power grid IR gradientDynamic IR

HCINBTI

Signoff Vdd

Voltage

Problem Margins = pessimism

overdesign schedule delay

ldquomargin stackrdquo for voltage signoff

Operating voltage

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

bull Delay sensitivity of path pj to variation source zv

bull Assumptions bull Δdjv is linear with respect to variation sources

bull Variation sources are normal distributions

bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

28 itf files (27 variation

sources + Ytyp)

Routed Netlist

RC extraction

STA

Δdjv

Δdjv = [ - ] 3dj(Yv) dj(Ytyp)

Note Path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)

bull Σ = λλT (Note λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ

Standard deviation of path delay

σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05

Note we use the delay variation from the statistical analysis as a reference

74ISVLSI-2014 invited talk 140710

Resilient Designs

bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)

bull Trade off design robustness vs design quality Eg enable margin reduction

bull Improve performance (ie timing speculation)

084 088 092 096 10030

34

38

42

46

50

54

58

62conventional design

reilient Design

Supply voltage (V)

En

erg

y (

mJ

)

Conventional design Worst-case signoff No Vdd downscaling

Resilient design Typical-case signoff Vdd downscaling reduced energy

15 reduction

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

bull Given RTL design throughput requirement and error-tolerant registers

bull Objective implement design to minimize energy bull Estimation of design energy

119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905

h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879

+1minus119864119877119903times119879

recovery cycles

Clock period

Error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization

bull Question Which endpoint to be optimized

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

[Figure: taxonomy of AVS approaches and AVS classes, arranged along a power axis.]

Open-Loop AVS
• Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
• Post-silicon characterization: process-aware AVS with post-silicon characterization [Tschanz03]

Closed-Loop AVS
• Generic monitor, design-dependent replica: process- and temperature-aware AVS
  • Generic on-chip monitor [Burd00]
  • Design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]

Error Tolerance
• Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge: Variability

[Figure: four panels, each contrasting ideal scaling with the observed non-ideality.
DENSITY: transistor count [M] vs. MPU release date, 1998 to 2011. Source: [CPUDB]
POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
SUPPLY VOLTAGE: volt vs. MPU release date, 1995 to 2016. Source: [CPUDB]]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU, with the minimum achievable energy marked.]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design / overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: design cone of mode A in the voltage-frequency plane, with modes A, B, and C. Negative slacks = failed design; positive slacks = overdesign.]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign.

Guideline: search for signoff modes within the design cone to reduce overdesign.

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable

Global optimization flow (adaptive search): Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V), with curves "1st", "2nd", and "real". Design: AES, f = 700 MHz.]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
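The sample/model/refine loop can be sketched in a few lines. This is only an illustration under stated assumptions: the power model is a quadratic fitted through three samples (the talk does not specify the model form here), `measure_power` is a hypothetical stand-in for one SP&R run at a given signoff voltage, and a single signoff voltage is searched rather than a set of signoff modes.

```python
def quadratic_vertex(points):
    """Fit p(v) = a*v^2 + b*v + c through three (v, p) points.

    Returns the minimizing v if the fit is convex, else None.
    """
    (x1, y1), (x2, y2), (x3, y3) = points
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3 * x3 * (y1 - y2) + x2 * x2 * (y3 - y1)
         + x1 * x1 * (y2 - y3)) / denom
    if a <= 0:
        return None
    return -b / (2.0 * a)

def adaptive_search(measure_power, v_lo, v_hi, iterations=2, tol=1e-6):
    """Sample, build a power model, estimate the optimum, re-sample.

    measure_power(v) is a hypothetical stand-in for implementing the
    design at signoff voltage v and reporting its power.
    Returns (best voltage, its power, number of samples taken).
    """
    samples = {v: measure_power(v)
               for v in (v_lo, (v_lo + v_hi) / 2.0, v_hi)}
    best_v = min(samples, key=samples.get)
    for _ in range(iterations):
        # Refine the model using the three samples nearest the best one.
        nearest = sorted(samples, key=lambda v: abs(v - best_v))[:3]
        estimate = quadratic_vertex([(v, samples[v]) for v in sorted(nearest)])
        if estimate is None:
            break
        estimate = min(max(estimate, v_lo), v_hi)
        if any(abs(estimate - v) < tol for v in samples):
            break  # model optimum already sampled; nothing to refine
        samples[estimate] = measure_power(estimate)
        best_v = min(samples, key=samples.get)
    return best_v, samples[best_v], len(samples)

# Toy power curve (hypothetical) with its minimum at 1.0 V.
print(adaptive_search(lambda v: (v - 1.0) ** 2 + 15.0, 0.9, 1.2))
```

On the toy curve the search settles after four samples, which mirrors the slide's point that a small constant number of implementation runs can suffice.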

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low

[Figure: proposed ring oscillator; each stage has a 1-bit control pin that selects high or low series resistance.]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF, and CO at large, medium, and small margin; panels MUL and EXU. Each bar is split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation.]

Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
• Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU, with the minimum achievable energy marked.]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart boxes:
Initial placement (all FFs = error-tolerant FFs)
SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
SkewOpt: activity-aware clock skew optimization
Decision: energy < min energy? If so, save current solution]

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

69ISVLSI-2014 invited talk 140710

AVS Impact on EM Lifetime

• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$MTTF(i) = MTTF(i-1) \times \left(\dfrac{V_{DD}(i-1)}{V_{DD}(i)}\right)^{2}$

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9, annotated with a 30% MTTF penalty and a 200 mV voltage compensation.]
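A minimal sketch of the MTTF update rule above, applied step by step along a supply-voltage schedule. The function names and the example schedule are hypothetical and chosen only to show how the recurrence accumulates; they are not data from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))^2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2

def mttf_schedule(mttf_initial, vdd_steps):
    """MTTF after each step of a supply-voltage schedule.

    vdd_steps[0] is the voltage at which mttf_initial holds.
    """
    mttf = [mttf_initial]
    for prev, new in zip(vdd_steps, vdd_steps[1:]):
        mttf.append(update_mttf(mttf[-1], prev, new))
    return mttf

# Hypothetical schedule: AVS raises VDD over time to compensate aging.
for vdd, mttf in zip([0.90, 0.95, 1.00, 1.10],
                     mttf_schedule(10.0, [0.90, 0.95, 1.00, 1.10])):
    print(f"VDD = {vdd:.2f} V  MTTF = {mttf:.2f} years")
```

Because the ratios telescope, the MTTF after the last step depends only on the first and last voltages: it equals the initial MTTF times the square of their ratio.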

70ISVLSI-2014 invited talk 140710

EM Impact on AVS Scheduling

[Figure: two panels for design DMA and scenarios S1 to S5. Left: VDD (0.90 V to 1.04 V) vs. year (0 to 12). Right: MTTF (year) per scenario. Annotation: "12 years MTTF penalty".]

71ISVLSI-2014 invited talk 140710

What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

[Figure: “margin stack” for voltage signoff on a voltage axis, from nominal Vdd to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; the operating voltage is marked.]

Problem: Margins = pessimism → overdesign, schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path p_j to variation source z_v
• Assumptions
  • Δd_jv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δd_jv using 28 runs of RC extraction and static timing analysis (STA)

[Figure: flow for obtaining Δd_jv. 28 itf files (27 variation sources + Y_typ) and the routed netlist feed RC extraction, followed by STA, which yields Δd_jv.]

$\Delta d_{jv} = \dfrac{d_j(Y_v) - d_j(Y_{typ})}{3}$

Note: Path delay includes gate and wire delays.


70ISVLSI-2014 invited talk 140710

0 2 4 6 8 10 12090092094096098100102104

S1 S2 S3 S4 S5

Year

VDD

DMA 3S1 S2 S3 S4 S5

78

79

80

81

MTT

F (Y

ear)

EM Impact on AVS Scheduling

12 years MTTF penalty

>

71ISVLSI-2014 invited talk 140710

What is ldquoSignoffrdquo

bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip

Nominal VddStatic IR drop

Power grid IR gradientDynamic IR

HCINBTI

Signoff Vdd

Voltage

Problem Margins = pessimism

overdesign schedule delay

ldquomargin stackrdquo for voltage signoff

Operating voltage

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

bull Delay sensitivity of path pj to variation source zv

bull Assumptions bull Δdjv is linear with respect to variation sources

bull Variation sources are normal distributions

bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)

28 itf files (27 variation

sources + Ytyp)

Routed Netlist

RC extraction

STA

Δdjv

Δdjv = [ - ] 3dj(Yv) dj(Ytyp)

Note Path delay includes gate and wire delays

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)

bull Σ = λλT (Note λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ

Standard deviation of path delay

σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05

Note we use the delay variation from the statistical analysis as a reference

74ISVLSI-2014 invited talk 140710

Resilient Designs

bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)

bull Trade off design robustness vs design quality Eg enable margin reduction

bull Improve performance (ie timing speculation)

084 088 092 096 10030

34

38

42

46

50

54

58

62conventional design

reilient Design

Supply voltage (V)

En

erg

y (

mJ

)

Conventional design Worst-case signoff No Vdd downscaling

Resilient design Typical-case signoff Vdd downscaling reduced energy

15 reduction

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

bull Given RTL design throughput requirement and error-tolerant registers

bull Objective implement design to minimize energy bull Estimation of design energy

119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905

h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879

+1minus119864119877119903times119879

recovery cycles

Clock period

Error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization

bull Question Which endpoint to be optimized

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

Open-Loop AVS

Closed-Loop AVSP

ow

er

Freq amp Vdd LUT

Post-silicon characterization

AVS Pre-characterize LUT [Martin02]

Process-aware AVSPost-silicon characterization [Tschanz03]

Generic monitor

Design dependent replica

In-situmonitor

Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]

In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]

Error Detection System

Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]

Error Tolerance

AVS

approachesAVS classes

77

78ISVLSI-2014 invited talk 140710

Challenge Variability

1998 2000 2003 2006 2008 20111

10

MPU Release Date

Tran

sisto

r Cou

nt [M

]

Source [CPUDB]

DENSITY

IdealNon-ideality

2009

2010

2011

2012

2013

2014

2015

2016

2017

2018

2019

2020

2021

2022

2023

2024

0

100

200

300

400

500

600

700

0

02

04

06

08

1

12Dynamic Power (W)

POWER

Source [JeongK08]

IdealNon-ideality

2006 2008 2010 2012 2014 20161000

10000

100000

Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling

DRIVE CURRENT

Ideal

Source [ITRS]

Non-ideality

1995 2000 2005 2011 20160

05

1

15

2

25

3

MPU Release Date

Volt

SUPPLY VOLTAGE

Source [CPUDB]

Ideal

Non-ideality

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient

designs thus reduced powerbull Proposed method trades off between timing-error penalty vs

reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction

compared to pure-margin designs Resilience benefits increase in the context of AVS strategy

084 088 092 096 10030

36

42

48

54

60brute-forcepure-marginCombOpt

Supply voltage (V)

En

erg

y (

mJ

)

084 086 088 09 092 094 096 098 1 10225

29

33

37

41

45brute-force

pure-margin

CombOpt

Supply voltage (V)

En

erg

y (

mJ

)

MUL EXU

Minimum achievable energy

80ISVLSI-2014 invited talk 140710

Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for

circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly

threshold voltages)bull One mode is outside of the design cone of the other

failed design overdesignbull Mode A has positive timing slacks with respect to mode B

mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other

bull Modes are in each othersrsquo design cone

Voltage

Frequency

A

Negative Slacks = failed design

Positive Slacks = overdesign

B

C

Design Cone of mode A

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline search for signoff modes within design cone reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method Global Optimizationbull Iteratively sample and refine power models

bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable

Sample (SPampR)

Construct power models

Estimate optimal signoff modes

Sample (SPampR)

Refine power models

Adaptive search

Global optimization flow

09 10 11 1214

15

16

17

18

19

201st 2nd real

Signoff Voltage (v)

Po

wer

(m

W)

Power estimation of adaptive search

bull Ovals indicate sample pointsbull 1st 2nd power from power models at first

second iterationbull real power from real implemented circuits

Design AESf 700MHz

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

bull Critical path may be difficult to identify (IP from 3rd party)

bull Calibrating monitors at multiple modesvoltages requires long test time

Closed-Loop AVS

Design-dependent replica

In-situmonitor

Generic monitor

bull Does not capture design-specific performance variation

82

This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

bull Identified two circuit knobs to tune Vmin

bull Series resistancebull Cell types (INV NAND NOR)

bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low

1 bit 1 bit 1 bit Control pins

High resistance

Low resistance

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reductionbull Reference flows

bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints

bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods

bull Resilience benefits increase with safety margin

PM BF CO PM BF CO PM BF CO25

30

35

40

45

50

55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience

En

erg

y (

mJ

)

Large marginMedium marginSmall margin

MUL

PM BF CO PM BF CO PM BF CO25

27

29

31

33

35

En

erg

y (

mJ

)

EXU

Large marginMedium marginSmall margin

Smallmediumlarge margin safety margin = 51015 of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for

resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a

lower supply voltagebull Average 18 energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

084 088 092 096 10030

36

42

48

54

60brute-forcepure-marginCombOpt

Supply voltage (V)

En

erg

y (

mJ

)

084 086 088 09 092 094 096 098 1 10225

29

33

37

41

45brute-force

pure-margin

CombOpt

Supply voltage (V)

En

erg

y (

mJ

)

MUL EXU

Minimum achievable energy

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

bull Iteratively optimize with SEOpt and SkewOpt

Initial placement (all FFs = error-tolerant FFs)

Energy lt min energy

Save current solution

Margin insertion on K paths based on sensitivity function

Replace error-tolerant FFs w normal FFs

SEOpt

Activity-aware clock skew optimization

SkewOpt

Page 71: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

71ISVLSI-2014 invited talk 140710

What is "Signoff"?

• Foundation of the contract between design house and foundry
• "Chip should work": a stack of models, margins, and analyses
• Function, timing, signal integrity, power integrity, …

[Figure: "margin stack" for voltage signoff. Voltage axis with labels: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd, operating voltage.]

Problem: margins = pessimism, leading to overdesign and schedule delay

72ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (1)

• Delay sensitivity of path p_j to variation source z_v:

$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$

• Assumptions
  • Δd_jv is linear with respect to the variation sources
  • Variation sources are normally distributed
• Obtain Δd_jv using 28 runs of RC extraction and static timing analysis (STA)

[Flow: 28 ITF files (27 variation sources + Y_typ) and the routed netlist feed RC extraction, then STA, which yields Δd_jv.]

Note: path delay includes gate and wire delays.

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)

• Σ is the correlation matrix for the variation sources (e.g., 27 × 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\,\lambda$

Standard deviation of path delay:

$\sigma_j = \left((\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2\right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference.
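The two statistical timing analysis steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the flow used in the talk: the function names (`sensitivities`, `cholesky`, `path_sigma`) and all numeric inputs are hypothetical, and the correlation matrix is assumed to be symmetric positive definite so that the Cholesky factor exists.

```python
import math


def sensitivities(corner_delays, typical_delay):
    """Per-source delay sensitivities: (d_j(Y_v) - d_j(Y_typ)) / 3."""
    return [(d - typical_delay) / 3.0 for d in corner_delays]


def cholesky(sigma):
    """Lower-triangular factor lam with sigma = lam * lam^T.

    Assumes sigma is symmetric positive definite.
    """
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            partial = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - partial)
            else:
                lam[i][k] = (sigma[i][k] - partial) / lam[k][k]
    return lam


def path_sigma(delta_d, sigma):
    """Standard deviation of path delay from sensitivities and correlations."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times lam gives the correlated sensitivities delta_d'.
    correlated = [sum(delta_d[i] * lam[i][k] for i in range(n)) for k in range(n)]
    # Root-sum-square of the correlated sensitivities.
    return math.sqrt(sum(x * x for x in correlated))
```

With an identity correlation matrix the result reduces to a plain root-sum-square of the sensitivities; positive correlation between sources increases σ_j. In the talk's setting `delta_d` would hold 27 entries, one per variation source.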

74ISVLSI-2014 invited talk 140710

Resilient Designs

• Detect and recover from timing errors: ensure correct operation under dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: energy (mJ) vs. supply voltage (V), from 0.84 V to 1.00 V, for a conventional design and a resilient design. The resilient design shows a 15% energy reduction.]

• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement the design to minimize energy
• Estimation of design energy:

$\text{Energy} = \frac{\text{Power}}{\text{Throughput}}$

Throughput is estimated from the error rate (ER), the clock period (T), and the number of recovery cycles (r) [Kahng10].
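A minimal sketch of the energy estimate, under a throughput model that is an assumption made here for illustration and is not taken from the slide or from [Kahng10]: an error-free operation takes one clock period T, and an operation that hits a timing error takes r clock periods to recover. The function names are hypothetical.

```python
def throughput(error_rate, clock_period, recovery_cycles):
    """Operations per unit time under the assumed model.

    Average time per operation = (1 - ER) * T + ER * r * T.
    """
    avg_time = ((1.0 - error_rate) * clock_period
                + error_rate * recovery_cycles * clock_period)
    return 1.0 / avg_time


def energy_per_operation(power, error_rate, clock_period, recovery_cycles):
    """Energy = Power / Throughput."""
    return power / throughput(error_rate, clock_period, recovery_cycles)
```

Under this model a higher error rate lowers throughput and therefore raises energy per operation at fixed power, which is the penalty that has to be traded off against the power saved by reducing margin.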

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimization

• Optimize the fanin cone with tighter constraints: allows replacement of a Razor FF with a normal FF
• Trade off cost of resilience vs. datapath optimization
• Question: which endpoints should be optimized?

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

[Diagram: AVS classes and AVS approaches, with a power axis.]

• Open-loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor, design-dependent replica: process- and temperature-aware AVS, using a generic on-chip monitor [Burd00] or a design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until an error occurs [Das06, Tschanz10]

78ISVLSI-2014 invited talk 140710

Challenge: Variability

[Figure: four panels, each contrasting ideal scaling with observed non-ideality.
• DENSITY: transistor count [M] vs. MPU release date, 1998 to 2011. Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: volts vs. MPU release date, 1995 to 2016. Source: [CPUDB]]

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Context

• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of an AVS strategy.

[Figure: energy (mJ) vs. supply voltage (V), from about 0.84 V to 1.02 V, for brute-force, pure-margin, and CombOpt designs, with the minimum achievable energy marked. Panels: MUL and EXU.]

80ISVLSI-2014 invited talk 140710

Our Concept: Mode Dominance

• The design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage vs. frequency plane showing the design cone of mode A and modes A, B, and C. Negative slacks = failed design; positive slacks = overdesign.]

Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign.

Guideline: search for signoff modes within the design cone to reduce overdesign.

81ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

[Flow: global optimization. Sample (SP&R), construct power models, estimate optimal signoff modes; then adaptive search: sample (SP&R) and refine power models.]

[Figure: power estimation of adaptive search. Power (mW) vs. signoff voltage (V), 0.9 V to 1.2 V, with curves "1st", "2nd", and "real". Design: AES; f: 700 MHz.]

• Ovals indicate sample points
• 1st, 2nd: power from the power models at the first and second iterations
• real: power from real implemented circuits
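The sample-and-refine loop can be illustrated with a toy one-dimensional search. This is only a sketch under stated assumptions: the power model here is a parabola through the three lowest-power samples (successive parabolic interpolation), which stands in for whatever power models the actual method builds, and `implement` is a hypothetical callable that stands in for one SP&R run returning power at a given signoff voltage.

```python
def parabola_vertex(p0, p1, p2):
    """Voltage at the vertex of the parabola through three (voltage, power) points.

    Returns None when the points are collinear (no unique vertex).
    """
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    if den == 0:
        return None
    return x1 - 0.5 * num / den


def adaptive_search(implement, initial_voltages, iterations=2):
    """Sample, fit a model, sample at the model's optimum, and refine."""
    samples = [(v, implement(v)) for v in initial_voltages]
    for _ in range(iterations):
        # Model: parabola through the three lowest-power samples so far.
        best3 = sorted(sorted(samples, key=lambda s: s[1])[:3])
        v_est = parabola_vertex(*best3)
        if v_est is None or any(abs(v_est - v) < 1e-9 for v, _ in samples):
            break  # model optimum already sampled, nothing to refine
        samples.append((v_est, implement(v_est)))  # one more "real" run
    return min(samples, key=lambda s: s[1])
```

For example, `adaptive_search(lambda v: (v - 1.05) ** 2 + 1.5, [0.9, 1.0, 1.2])` needs only one extra sample beyond the initial three, which mirrors the slide's point that a small number of implementation runs can suffice.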

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

Limitations:
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low series resistance.]

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reduction

• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF, and CO at small, medium, and large margin, for MUL and EXU. Bars are broken down into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation.]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period.

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs, and thus reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the AVS context.

[Figure: energy (mJ) vs. supply voltage (V), from about 0.84 V to 1.02 V, for brute-force, pure-margin, and CombOpt designs, with the minimum achievable energy marked. Panels: MUL and EXU.]

86ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt (margin insertion on K paths based on sensitivity function, replace error-tolerant FFs with normal FFs); SkewOpt (activity-aware clock skew optimization); check "energy < min energy" and save the current solution.]
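The iterate-and-keep-best structure of this flow can be sketched as a small driver loop. Everything here is an assumption for illustration: `seopt`, `skewopt`, and `energy` are hypothetical callables supplied by the caller, the order of the two steps within an iteration is assumed, and stopping at the first non-improving iteration is a choice made for this sketch rather than something stated in the flowchart.

```python
def optimize_flow(initial_design, seopt, skewopt, energy, max_iters=10):
    """Alternate SEOpt and SkewOpt, saving the lowest-energy solution seen."""
    best_design = initial_design
    min_energy = energy(initial_design)
    current = initial_design
    for _ in range(max_iters):
        current = seopt(current)    # margin insertion, swap to normal FFs
        current = skewopt(current)  # activity-aware clock skew optimization
        e = energy(current)
        if e < min_energy:
            # Energy < min energy: save current solution.
            best_design, min_energy = current, e
        else:
            break  # assumed stopping rule: no further improvement
    return best_design, min_energy
```

As a toy usage, let a design be just the number of error-tolerant FFs remaining: `optimize_flow(10, lambda d: max(d - 2, 0), lambda d: d, lambda d: (d - 4) ** 2 + 1)` returns `(4, 1)`, the point where removing more error-tolerant FFs starts to cost more energy than it saves.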

Page 73: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

73ISVLSI-2014 invited talk 140710

Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)

bull Σ = λλT (Note λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ

Standard deviation of path delay

σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05

Note we use the delay variation from the statistical analysis as a reference

74ISVLSI-2014 invited talk 140710

Resilient Designs

bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)

bull Trade off design robustness vs design quality Eg enable margin reduction

bull Improve performance (ie timing speculation)

084 088 092 096 10030

34

38

42

46

50

54

58

62conventional design

reilient Design

Supply voltage (V)

En

erg

y (

mJ

)

Conventional design Worst-case signoff No Vdd downscaling

Resilient design Typical-case signoff Vdd downscaling reduced energy

15 reduction

75ISVLSI-2014 invited talk 140710

Resilience Cost Reduction Problem

bull Given RTL design throughput requirement and error-tolerant registers

bull Objective implement design to minimize energy bull Estimation of design energy

119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905

h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879

+1minus119864119877119903times119879

recovery cycles

Clock period

Error rate [Kahng10]

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization

bull Question Which endpoint to be optimized

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

Open-Loop AVS

Closed-Loop AVSP

ow

er

Freq amp Vdd LUT

Post-silicon characterization

AVS Pre-characterize LUT [Martin02]

Process-aware AVSPost-silicon characterization [Tschanz03]

Generic monitor

Design dependent replica

In-situmonitor

Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]

In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]

Error Detection System

Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]

Error Tolerance

AVS

approachesAVS classes

77

78ISVLSI-2014 invited talk 140710

Challenge Variability

1998 2000 2003 2006 2008 20111

10

MPU Release Date

Tran

sisto

r Cou

nt [M

]

Source [CPUDB]

DENSITY

IdealNon-ideality

2009

2010

2011

2012

2013

2014

2015

2016

2017

2018

2019

2020

2021

2022

2023

2024

0

100

200

300

400

500

600

700

0

02

04

06

08

1

12Dynamic Power (W)

POWER

Source [JeongK08]

IdealNon-ideality

2006 2008 2010 2012 2014 20161000

10000

100000

Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling

DRIVE CURRENT

Ideal

Source [ITRS]

Non-ideality

1995 2000 2005 2011 20160

05

1

15

2

25

3

MPU Release Date

Volt

SUPPLY VOLTAGE

Source [CPUDB]

Ideal

Non-ideality

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient

designs thus reduced powerbull Proposed method trades off between timing-error penalty vs

reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction

compared to pure-margin designs Resilience benefits increase in the context of AVS strategy

084 088 092 096 10030

36

42

48

54

60brute-forcepure-marginCombOpt

Supply voltage (V)

En

erg

y (

mJ

)

084 086 088 09 092 094 096 098 1 10225

29

33

37

41

45brute-force

pure-margin

CombOpt

Supply voltage (V)

En

erg

y (

mJ

)

MUL EXU

Minimum achievable energy

80ISVLSI-2014 invited talk 140710

Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for

circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly

threshold voltages)bull One mode is outside of the design cone of the other

failed design overdesignbull Mode A has positive timing slacks with respect to mode B

mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other

bull Modes are in each othersrsquo design cone

Voltage

Frequency

A

Negative Slacks = failed design

Positive Slacks = overdesign

B

C

Design Cone of mode A

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline search for signoff modes within design cone reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method Global Optimizationbull Iteratively sample and refine power models

bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable

Sample (SPampR)

Construct power models

Estimate optimal signoff modes

Sample (SPampR)

Refine power models

Adaptive search

Global optimization flow

09 10 11 1214

15

16

17

18

19

201st 2nd real

Signoff Voltage (v)

Po

wer

(m

W)

Power estimation of adaptive search

bull Ovals indicate sample pointsbull 1st 2nd power from power models at first

second iterationbull real power from real implemented circuits

Design AESf 700MHz

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

bull Critical path may be difficult to identify (IP from 3rd party)

bull Calibrating monitors at multiple modesvoltages requires long test time

Closed-Loop AVS

Design-dependent replica

In-situmonitor

Generic monitor

bull Does not capture design-specific performance variation

82

This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance

Design of RO with Tunable Vmin

• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low series resistance]

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: Energy (mJ) of PM, BF and CO at small, medium and large safety margins, for designs MUL and EXU. Each bar is split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation.]

Small/medium/large safety margin = 5%, 10%, 15% of clock period

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: Energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, on designs MUL and EXU, with the minimum achievable energy marked]

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

Flowchart elements:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt
  • Margin insertion on K paths based on sensitivity function
  • Replace error-tolerant FFs with normal FFs
• SkewOpt
  • Activity-aware clock skew optimization
• Energy < min energy? If so, save current solution
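The iterate-and-keep-best structure of this flow can be sketched as follows. This is a skeleton under assumptions, not the authors' implementation: the callbacks `se_opt`, `skew_opt` and `energy_of`, the order in which the two steps are applied, and the fixed iteration cap `max_iters` are all hypothetical, since the slide does not state a stopping rule.

```python
def optimize_flow(design, se_opt, skew_opt, energy_of, max_iters=10):
    """Alternate SEOpt and SkewOpt, keeping the lowest-energy solution seen.

    design:    starting point (all FFs error-tolerant, per the slide).
    se_opt:    hypothetical callback for one selective-endpoint step.
    skew_opt:  hypothetical callback for one clock-skew optimization step.
    energy_of: hypothetical callback returning the energy of a design.
    """
    best, best_energy = design, energy_of(design)
    current = design
    for _ in range(max_iters):
        current = skew_opt(se_opt(current))
        energy = energy_of(current)
        if energy < best_energy:  # "Energy < min energy" -> save solution
            best, best_energy = current, energy
    return best, best_energy


if __name__ == "__main__":
    # Toy model: a design is just its count of error-tolerant FFs, and the
    # energy curve is made up so that the minimum sits at 3 FFs.
    best, energy = optimize_flow(
        8,
        se_opt=lambda n: max(n - 1, 0),    # replace one error-tolerant FF
        skew_opt=lambda n: n,              # no-op in this toy model
        energy_of=lambda n: 30 + (n - 3) ** 2,
    )
    print(best, energy)
```

Keeping the best solution separately from the current one matters because later iterations can raise energy again, as the toy curve does once fewer than three error-tolerant FFs remain.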

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)

Resilient Designs

• Detect and recover from timing errors; ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)

[Figure: Energy (mJ) vs. supply voltage (V) for a conventional design and a resilient design, with a 15% reduction marked]
• Conventional design: worst-case signoff; no Vdd downscaling
• Resilient design: typical-case signoff; Vdd downscaling → reduced energy

Resilience Cost Reduction Problem

• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$

Throughput is expressed in terms of the error rate $ER$ [Kahng10], the clock period $T$ and the number of recovery cycles $r$.

Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints → allows replacement of Razor FF with normal FF
  • Trade off cost of resilience vs. data path optimization
• Question: which endpoint should be optimized?

Process-Aware Vdd Scaling (PVS)

AVS classes and approaches:
• Open-loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process- and temperature-aware AVS with a generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system; Vdd scaling until an error occurs [Das06, Tschanz10]

Challenge: Variability

[Figure: four trend plots, each contrasting the ideal trend with the observed non-ideality]
• DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: volts vs. MPU release date. Source: [CPUDB]

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs → resilience benefits increase in the context of an AVS strategy

[Figure: Energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, on designs MUL and EXU, with the minimum achievable energy marked]

Page 76: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

76ISVLSI-2014 invited talk 140710

Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization

bull Question Which endpoint to be optimized

77ISVLSI-2014 invited talk 140710

Process-Aware Vdd Scaling (PVS)

Open-Loop AVS

Closed-Loop AVSP

ow

er

Freq amp Vdd LUT

Post-silicon characterization

AVS Pre-characterize LUT [Martin02]

Process-aware AVSPost-silicon characterization [Tschanz03]

Generic monitor

Design dependent replica

In-situmonitor

Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]

In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]

Error Detection System

Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]

Error Tolerance

AVS

approachesAVS classes

77

78ISVLSI-2014 invited talk 140710

Challenge Variability

1998 2000 2003 2006 2008 20111

10

MPU Release Date

Tran

sisto

r Cou

nt [M

]

Source [CPUDB]

DENSITY

IdealNon-ideality

2009

2010

2011

2012

2013

2014

2015

2016

2017

2018

2019

2020

2021

2022

2023

2024

0

100

200

300

400

500

600

700

0

02

04

06

08

1

12Dynamic Power (W)

POWER

Source [JeongK08]

IdealNon-ideality

2006 2008 2010 2012 2014 20161000

10000

100000

Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling

DRIVE CURRENT

Ideal

Source [ITRS]

Non-ideality

1995 2000 2005 2011 20160

05

1

15

2

25

3

MPU Release Date

Volt

SUPPLY VOLTAGE

Source [CPUDB]

Ideal

Non-ideality

79ISVLSI-2014 invited talk 140710

Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient

designs thus reduced powerbull Proposed method trades off between timing-error penalty vs

reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction

compared to pure-margin designs Resilience benefits increase in the context of AVS strategy

084 088 092 096 10030

36

42

48

54

60brute-forcepure-marginCombOpt

Supply voltage (V)

En

erg

y (

mJ

)

084 086 088 09 092 094 096 098 1 10225

29

33

37

41

45brute-force

pure-margin

CombOpt

Supply voltage (V)

En

erg

y (

mJ

)

MUL EXU

Minimum achievable energy

80ISVLSI-2014 invited talk 140710

Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for

circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly

threshold voltages)bull One mode is outside of the design cone of the other

failed design overdesignbull Mode A has positive timing slacks with respect to mode B

mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other

bull Modes are in each othersrsquo design cone

Voltage

Frequency

A

Negative Slacks = failed design

Positive Slacks = overdesign

B

C

Design Cone of mode A

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline search for signoff modes within design cone reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method Global Optimizationbull Iteratively sample and refine power models

bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable

Sample (SPampR)

Construct power models

Estimate optimal signoff modes

Sample (SPampR)

Refine power models

Adaptive search

Global optimization flow

09 10 11 1214

15

16

17

18

19

201st 2nd real

Signoff Voltage (v)

Po

wer

(m

W)

Power estimation of adaptive search

bull Ovals indicate sample pointsbull 1st 2nd power from power models at first

second iterationbull real power from real implemented circuits

Design AESf 700MHz

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

bull Critical path may be difficult to identify (IP from 3rd party)

bull Calibrating monitors at multiple modesvoltages requires long test time

Closed-Loop AVS

Design-dependent replica

In-situmonitor

Generic monitor

bull Does not capture design-specific performance variation

82

This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

bull Identified two circuit knobs to tune Vmin

bull Series resistancebull Cell types (INV NAND NOR)

bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low

1 bit 1 bit 1 bit Control pins

High resistance

Low resistance

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reductionbull Reference flows

bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints

bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods

bull Resilience benefits increase with safety margin

PM BF CO PM BF CO PM BF CO25

30

35

40

45

50

55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience

En

erg

y (

mJ

)

Large marginMedium marginSmall margin

MUL

PM BF CO PM BF CO PM BF CO25

27

29

31

33

35

En

erg

y (

mJ

)

EXU

Large marginMedium marginSmall margin

Smallmediumlarge margin safety margin = 51015 of clock period

85ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for

resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a

lower supply voltagebull Average 18 energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

084 088 092 096 10030

36

42

48

54

60brute-forcepure-marginCombOpt

Supply voltage (V)

En

erg

y (

mJ

)

084 086 088 09 092 094 096 098 1 10225

29

33

37

41

45brute-force

pure-margin

CombOpt

Supply voltage (V)

En

erg

y (

mJ

)

MUL EXU

Minimum achievable energy

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart, boxes in slide order:]
• Initial placement (all FFs = error-tolerant FFs)
• Energy < min energy? Save current solution
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
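
The iterative flow above can be sketched as a simple loop. This is only a minimal illustration under stated assumptions, not the authors' implementation: `optimize_flow`, the toy `se_opt`/`skew_opt`/`energy` stand-ins, the fixed iteration budget, and all numbers are hypothetical. The slide gives neither the termination rule nor the sensitivity function.

```python
def optimize_flow(initial, se_opt, skew_opt, energy, max_iters):
    """Alternate SEOpt and SkewOpt, keeping the lowest-energy solution seen."""
    best, min_energy = initial, energy(initial)
    current = initial
    for _ in range(max_iters):
        current = skew_opt(se_opt(current))   # one SEOpt pass, then one SkewOpt pass
        e = energy(current)
        if e < min_energy:                    # "Energy < min energy": save current solution
            best, min_energy = current, e
    return best, min_energy


# Toy stand-ins (all hypothetical): a "design" is just its count of error-tolerant FFs.
def toy_se_opt(n_tolerant):
    return max(0, n_tolerant - 2)             # replace K = 2 error-tolerant FFs per pass


def toy_skew_opt(n_tolerant):
    return n_tolerant                         # placeholder: skew tuning leaves the count unchanged


def toy_energy(n_tolerant):
    return 30 + (n_tolerant - 4) ** 2         # made-up convex energy curve, arbitrary units


best, min_energy = optimize_flow(10, toy_se_opt, toy_skew_opt, toy_energy, max_iters=5)
print(best, min_energy)                       # prints: 4 30
```

Starting from all error-tolerant FFs and saving only improving solutions means the loop can keep converting endpoints past the optimum without losing the best design found so far.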


Process-Aware Vdd Scaling (PVS)

[Figure: AVS classes and approaches, with a power axis]

• Open-loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor, design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until an error occurs [Das06, Tschanz10]

Challenge: Variability

[Figure: four trend plots, each comparing ideal scaling with the observed non-ideality]
• DENSITY: transistor count [M] vs. MPU release date, 1998 to 2011. Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: supply voltage (V) vs. MPU release date, 1995 to 2016. Source: [CPUDB]

Energy Reduction in AVS Context

• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs

Resilience benefits increase in the context of an AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt on the MUL and EXU designs, with the minimum achievable energy marked]
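
The tradeoff between timing-error penalty and reduced power can be illustrated with a toy energy model. Everything in this sketch is an assumption made for illustration: the quadratic energy scaling, the cubic error-rate onset below a hypothetical error-free voltage of 0.92 V, the replay cost, and the function name `energy_mj`. None of the numbers come from the talk.

```python
def energy_mj(vdd, e_nominal=50.0, v_error_free=0.92, replay_cost=5.0):
    """Toy energy of a resilient design at supply voltage vdd (all parameters made up)."""
    dynamic = e_nominal * vdd ** 2                 # energy scales with Vdd^2, nominal at 1.0 V
    droop = max(0.0, v_error_free - vdd)           # how far below the error-free voltage we run
    error_rate = min(1.0, (droop / 0.1) ** 3)      # assumed steep onset of timing errors
    return dynamic * (1.0 + replay_cost * error_rate)  # each error costs extra recovery energy


voltages = [(84 + 2 * i) / 100 for i in range(9)]  # 0.84 V to 1.00 V in 20 mV steps

# A resilient design may run below the error-free voltage if recovery is cheap enough.
best_v = min(voltages, key=energy_mj)

# A pure-margin design cannot tolerate errors, so it stops at the error-free voltage.
pure_margin_v = min(v for v in voltages if v >= 0.92)

print(best_v, pure_margin_v)                       # prints: 0.9 0.92
```

In this toy model the minimum-energy point sits slightly below the error-free voltage, which is the qualitative behaviour the slide describes. The size of the saving depends entirely on the assumed parameters.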

Our Concept: Mode Dominance

• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: voltage and frequency axes with modes A, B and C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline: search for signoff modes within the design cone to reduce overdesign

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable

Global optimization flow (boxes in slide order): Sample (SP&R); Construct power models; Estimate optimal signoff modes; Sample (SP&R); Refine power models; Adaptive search

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V), with series "1st", "2nd" and "real"]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iteration
• real: power from real implemented circuits

Design: AES, f = 700 MHz
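
The sample-and-refine idea can be sketched in one dimension. This is an illustrative stand-in, not the method from the talk: it assumes a single signoff voltage, uses a parabola through three samples as the "power model", and replaces the expensive SP&R run with a made-up function `fake_spr`. The names `parabola_vertex` and `adaptive_search` are hypothetical.

```python
def parabola_vertex(p0, p1, p2):
    """Abscissa of the vertex of the parabola through three (x, y) points."""
    (a, fa), (b, fb), (c, fc) = p0, p1, p2
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * num / den


def adaptive_search(run_spr, voltages, iterations=2):
    """Fit a power model to a few samples, sample at its minimum, then refit."""
    samples = [(v, run_spr(v)) for v in voltages]            # initial expensive runs
    estimate = None
    for _ in range(iterations):
        best3 = sorted(sorted(samples, key=lambda s: s[1])[:3])  # three lowest-power samples, by voltage
        estimate = parabola_vertex(*best3)                   # model's estimated optimal signoff voltage
        samples.append((estimate, run_spr(estimate)))        # one more run to refine the model
    return estimate, samples


def fake_spr(v):
    """Made-up stand-in for an SP&R run: power in mW at signoff voltage v."""
    return 15.0 + 40.0 * (v - 1.05) ** 2


estimate, samples = adaptive_search(fake_spr, [0.9, 1.0, 1.2])
print(round(estimate, 3), len(samples))                      # prints: 1.05 5
```

Only five runs of the expensive function are needed here, which mirrors the slide's point that a small constant number of implementation runs is enough. A real flow would need a richer model and a guard against repeated or degenerate sample points.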

Classes of Closed-Loop AVS

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor

• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Page 80: 1 ISVLSI-2014 invited talk, 140710 Toward Holistic Modeling, Margining and Tolerance of IC Variability Andrew B. Kahng UCSD CSE and ECE Departments abk@ucsd.edu

80ISVLSI-2014 invited talk 140710

Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for

circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly

threshold voltages)bull One mode is outside of the design cone of the other

failed design overdesignbull Mode A has positive timing slacks with respect to mode B

mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other

bull Modes are in each othersrsquo design cone

Voltage

Frequency

A

Negative Slacks = failed design

Positive Slacks = overdesign

B

C

Design Cone of mode A

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign

Guideline search for signoff modes within design cone reduce overdesign

81ISVLSI-2014 invited talk 140710

Our Method Global Optimizationbull Iteratively sample and refine power models

bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable

Sample (SPampR)

Construct power models

Estimate optimal signoff modes

Sample (SPampR)

Refine power models

Adaptive search

Global optimization flow

09 10 11 1214

15

16

17

18

19

201st 2nd real

Signoff Voltage (v)

Po

wer

(m

W)

Power estimation of adaptive search

bull Ovals indicate sample pointsbull 1st 2nd power from power models at first

second iterationbull real power from real implemented circuits

Design AESf 700MHz

82ISVLSI-2014 invited talk 140710

Classes of Closed-Loop AVS

bull Critical path may be difficult to identify (IP from 3rd party)

bull Calibrating monitors at multiple modesvoltages requires long test time

Closed-Loop AVS

Design-dependent replica

In-situmonitor

Generic monitor

bull Does not capture design-specific performance variation

82

This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance

83ISVLSI-2014 invited talk 140710

Design of RO with Tunable Vmin

bull Identified two circuit knobs to tune Vmin

bull Series resistancebull Cell types (INV NAND NOR)

bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low

1 bit 1 bit 1 bit Control pins

High resistance

Low resistance

84ISVLSI-2014 invited talk 140710

Benefit of Resilience Cost Reductionbull Reference flows

bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints

bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods

bull Resilience benefits increase with safety margin

PM BF CO PM BF CO PM BF CO25

30

35

40

45

50

55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience

En

erg

y (

mJ

)

Large marginMedium marginSmall margin

MUL

PM BF CO PM BF CO PM BF CO25

27

29

31

33

35

En

erg

y (

mJ

)

EXU

Large marginMedium marginSmall margin

Smallmediumlarge margin safety margin = 51015 of clock period

85 ISVLSI-2014 invited talk 140710

Increased Benefit of Resilience With AVS

• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs

Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels: MUL, EXU; minimum achievable energy marked]
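The tradeoff between timing-error penalty and reduced power can be illustrated with a toy sweep over supply voltage. Everything in this sketch is an assumption made for illustration: the quadratic energy scaling with voltage, the `error_rate` model, the recovery cost and all constants. It shows only why a minimum-energy point can sit below the lowest error-free voltage, not the talk's actual models or results.

```python
import math

V_NOM = 1.00   # nominal supply (V), assumed
V_SAFE = 0.92  # lowest error-free supply (V), assumed


def error_rate(v):
    """Hypothetical timing-error rate per cycle: zero at or above V_SAFE, growing below."""
    if v >= V_SAFE:
        return 0.0
    return min(1.0, 0.0005 * math.exp((V_SAFE - v) / 0.02))


def energy_mj(v, base_mj=40.0, recovery_cycles=10):
    """Energy for a fixed workload: V^2 scaling times the replay overhead of errors."""
    return base_mj * (v / V_NOM) ** 2 * (1.0 + recovery_cycles * error_rate(v))


def min_energy_point(lo_mv=840, hi_mv=1000, step_mv=10):
    """Sweep the supply in integer millivolts; return (voltage, energy) at the minimum."""
    points = ((mv / 1000.0, energy_mj(mv / 1000.0))
              for mv in range(lo_mv, hi_mv + 1, step_mv))
    return min(points, key=lambda p: p[1])


if __name__ == "__main__":
    print(min_energy_point())
```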

86 ISVLSI-2014 invited talk 140710

Overall Optimization Flow

• Iteratively optimize with SEOpt and SkewOpt

[Flowchart boxes]
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt
  • Margin insertion on K paths based on sensitivity function
  • Replace error-tolerant FFs w/ normal FFs
• SkewOpt
  • Activity-aware clock skew optimization
• Energy < min energy?
  • Save current solution
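One possible reading of the flowchart labels is the loop sketched below: start with every flip-flop error-tolerant, convert K endpoints per iteration, and keep the best solution seen. This is a toy under stated assumptions. The ordering of the steps is inferred from the labels, the per-endpoint costs and the sensitivity function are hypothetical, and `skew_opt` is a placeholder that leaves the energy unchanged.

```python
def skew_opt(energy):
    """Placeholder for activity-aware clock skew optimization (assumption: no change)."""
    return energy


def optimize(endpoints, k=1):
    """Toy iterative flow.

    endpoints maps a name to (et_cost, margin_cost): the hypothetical energy cost
    of keeping the endpoint error-tolerant vs. covering it with margin instead.
    Returns (set of endpoints left error-tolerant, best energy).
    """
    tolerant = set(endpoints)  # initial placement: all FFs are error-tolerant

    def energy(tol):
        return sum(et if name in tol else mg
                   for name, (et, mg) in endpoints.items())

    best_energy, best = energy(tolerant), frozenset(tolerant)
    while tolerant:
        # SEOpt: margin insertion on the K endpoints with the highest sensitivity,
        # replacing their error-tolerant FFs with normal FFs.
        ranked = sorted(tolerant,
                        key=lambda n: endpoints[n][0] - endpoints[n][1],
                        reverse=True)
        for name in ranked[:k]:
            tolerant.remove(name)
        current = skew_opt(energy(tolerant))  # SkewOpt
        if current < best_energy:             # Energy < min energy?
            best_energy, best = current, frozenset(tolerant)  # save current solution
    return best, best_energy


if __name__ == "__main__":
    print(optimize({"a": (5, 2), "b": (3, 4), "c": (6, 1)}))
```

In this toy, endpoint "b" stays error-tolerant because covering it with margin would cost more than the resilience overhead.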

  • Toward Holistic Modeling Margining and Tolerance of IC Variabi
  • IC Variability
  • Challenge Value of Technology
  • Solutions Modeling Margining Tolerance
  • Outline
  • BEOL Corner Optimization
  • Proposed Timing Signoff Flow
  • Conventional BEOL Corners
  • Statistical RC Model
  • Pessimism of Conventional BEOL Corners (CBC)
  • Intuition on Delay Variability Across Cw RCw
  • Intuition on Delay Variability Across Cw RCw (2)
  • Scaling Factor α and Delay Variation
  • Find Paths for Which TBCs Can Be Used
  • Determining α Arcw and Acw
  • Benefits of Tightened BEOL Corners
  • Outline (2)
  • How to Minimize Cost of Resilience
  • Tradeoff Resilience Cost vs Datapath Cost
  • Selective-Endpoint Optimization (SEOpt)
  • Clock Skew Optimization (SkewOpt)
  • Overall Optimization Flow
  • Benefit of Low-Cost Resilience
  • Increased Benefit of Resilience with AVS
  • Outline (3)
  • Breaking Chicken-Egg Loops Less Margin
  • Derated Library Characterization and AVS
  • Library Characterization for AVS
  • Power vs Area Across Different Signoffs
  • Heuristics 1
  • Vfinal Estimation
  • Observation and Heuristic 2
  • Proposed Library Characterization Flow
  • Power vs Area for All Designs
  • Also Multi-Mode Signoff Choices Matter
  • Also Tunable Monitors Less Margin
  • Also Tunable Monitors Less Margin (2)
  • Outline (4)
  • Conclusions
  • Thank You
  • Backup
  • Power Penalty to Fix EM with AVS
  • Homogeneous Corners
  • Homogeneous Corners (2)
  • Correlation Matrix
  • Wiring Structure in Timing-Critical Paths (2)
  • Delay Variation
  • Delay Variation (2)
  • Non-Homogeneous Corner
  • Opportunities for Tightened BEOL Corners
  • Wiring Structure in Timing-Critical Paths (2)
  • Proposed Timing Signoff Flow (2)
  • Experiment Setup
  • Further Analysis
  • Scaling Factor Results
  • Benefits of Tightened BEOL Corners (1)
  • Heuristics 1 (2)
  • Vfinal Estimation (2)
  • Observation and Heuristic 2 (2)
  • Technology and Benchmark Circuits
  • A Reference Signoff Flow
  • Experiment Setup (2)
  • “Chicken and Egg” Loop
  • Bias Temperature Instability (BTI)
  • Observation 1
  • Results for DC Scenario
  • Problem Signoff Corner Definition
  • AVS Signoff Corner Selection
  • AVS Impact on EM Lifetime
  • EM Impact on AVS Scheduling
  • What is “Signoff”
  • Statistical Timing Analysis (1)
  • Statistical Timing Analysis (2)
  • Resilient Designs
  • Resilience Cost Reduction Problem
  • Selective-Endpoint Optimization
  • Process-Aware Vdd Scaling (PVS)
  • Challenge Variability
  • Energy Reduction in AVS Context
  • Our Concept Mode Dominance
  • Our Method Global Optimization
  • Classes of Closed-Loop AVS
  • Design of RO with Tunable Vmin
  • Benefit of Resilience Cost Reduction
  • Increased Benefit of Resilience With AVS
  • Overall Optimization Flow (2)
81 ISVLSI-2014 invited talk 140710

Our Method: Global Optimization

• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough; scalable

[Flowchart, "Global optimization flow": Sample (SP&R); Construct power models; Estimate optimal signoff modes; Sample (SP&R); Refine power models; Adaptive search]

[Figure: Power estimation of adaptive search. x-axis: Signoff Voltage (V); y-axis: Power (mW); series: 1st, 2nd, real]

• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

Design: AES, f: 700MHz
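The sample-and-refine loop on this slide can be sketched as follows. This is a minimal illustration, not the talk's implementation: `implement_and_measure` is a hypothetical stand-in for an SP&R run, the power model is assumed to be quadratic in signoff voltage, and the power curve is synthetic.

```python
def implement_and_measure(v):
    """Hypothetical stand-in for one SP&R run: synthetic power (mW) at signoff voltage v."""
    return 15.0 + 40.0 * (v - 1.0) ** 2


def fit_quadratic(points):
    """Fit p(v) = a*v^2 + b*v + c exactly through three (v, p) samples."""
    (x0, y0), (x1, y1), (x2, y2) = points
    d01 = (y1 - y0) / (x1 - x0)          # first divided differences
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)          # second divided difference
    b = d01 - a * (x0 + x1)
    c = y0 - a * x0 * x0 - b * x0
    return a, b, c


def adaptive_search(lo, hi, iterations=2):
    """Sample, build a power model, estimate the optimum, sample there, refine."""
    samples = [(v, implement_and_measure(v)) for v in (lo, (lo + hi) / 2.0, hi)]
    for _ in range(iterations):
        samples.sort(key=lambda s: s[1])         # model the three lowest-power samples
        a, b, _c = fit_quadratic(samples[:3])
        if a <= 0:                               # model has no interior minimum
            break
        v_est = min(max(-b / (2.0 * a), lo), hi)  # estimated optimal signoff voltage
        if any(abs(v_est - v) < 1e-9 for v, _p in samples):
            break                                # already sampled: nothing to refine
        samples.append((v_est, implement_and_measure(v_est)))
    return min(samples, key=lambda s: s[1])


if __name__ == "__main__":
    print(adaptive_search(0.9, 1.2))
```

The point of the structure is that the expensive call, `implement_and_measure`, runs only a small fixed number of times rather than once per candidate mode.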
