Joint Design-Time and Post-Silicon Minimization ... · sizing, adaptive body biasing, and adaptive supply voltage biasing. Because joint timing-leakage optimization is ofprimary concern,

Joint Design-Time and Post-Silicon Minimization of ParametricYield Loss using Adjustable Robust Optimization

Murari Mani, Ashish K. Singh, and Michael OrshanskyDepartment of Electrical and Computer Engineering, University of Texas at Austin

[email protected]

ABSTRACTParametric yield loss due to variability can be effectively reducedby both design-time optimization strategies and by adjustingcircuit parameters to the realizations of variable parameters. Thetwo levels of tuning operate within a single variability budget,and because their effectiveness depends on the magnitude and thespatial structure of variability their joint co-optimization isrequired. In this paper we develop a formal optimizationalgorithm for such co-optimization and link it to the control andmeasurement overhead via the formal notions of measurementand control complexity.

We describe an optimization strategy that unifies design-timegate-level sizing and post-silicon adaptation using adaptive bodybias at the chip level. The statistical formulation utilizesadjustable robust linear programming to derive the optimal policyfor assigning body bias once the uncertain variables, such as gatelength and threshold voltage, are known. Computationaltractability is achieved by restricting optimal body bias selectionpolicy to be an affine function of uncertain variables. Wedemonstrate good run-time and show that 5-35% savings inleakage power across the benchmark circuits are possible.Dependence of results on measurement and control complexity isstudied and points of diminishing returns for both metrics areidentified.

Categories and Subject DescriptorsB.7.2 [Integrated Circuits]: Logic Design- optimization

General TermsAlgorithms, Design, Reliability

1. INTRODUCTIONIncreased variability of device parameters necessitates thedevelopment of a new generation of circuit synthesis CAD tools.In addition, the increases in variability and power consumptionare closely related because of the exponential dependence ofleakage on some process and environment parameters. Twofundamental paradigms are available for dealing with variability:statistical design (optimization at design time) and post-siliconadaptivity (on-line tuning). To guarantee reliable circuit operationwith minimal power consumption, next-generation circuitsynthesis techniques for robustness must explicitly account for the

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copiesare not made or distributed for profit or commercial advantage and thatcopies bear this notice and the full citation on the first page. To copyotherwise, or republish, to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.ICCAD'06, November 5-9, 2006, San Jose, CACopyright 2006 ACM 1-59593-389-1/06/0011...$5.00

availability of post-silicon adaptivity in synthesizing the circuit.

Two powerful and complementary strategies for reducing leakageconsidering variability are pre-silicon statistical designoptimization and post-silicon adaptivity. There is a growing bodyof work on statistical circuit analysis methods [1-3] (i.e., SSTA)and statistical post-synthesis optimization [4-6], including sizingand dual-threshold voltage assignment algorithms. These toolsshow promise in reducing parametric yield loss, or alternatively,reducing power consumption while maintaining high yield: insome cases, a 25% reduction in power is gained at the cost of 5%timing yield loss. The growing magnitude and complexity ofuncertainty is bound to make post-synthesis optimizationtechniques insufficient in guaranteeing reliable circuit operationwith reasonable parametric yield.

Post-silicon design adaptivity, or tuning, currently includesseveral techniques; the primary ones being adaptive body biasing(ABB) and adaptive supply voltage (ASV). ABB uses the bodyeffect to modulate the threshold voltages of transistors, therebycontrolling leakage and performance [7-10]. ASV raises thepower supply (Vdd) for slow (low-leakage) dies, and lowers it forfast (high-leakage) dies, ensuring better overall yield [11]. It relieson the roughly cubic dependence of leakage power on Vdd inCMOS circuits (also impacting dynamic power quadratically). Inthe future, a larger palette of tuning tools is likely to emerge:recently, an adaptive-size tapered Pareto buffer was designed withcontrol facilitated via a tri-state buffer [12].

A widespread industrial adoption of adaptive techniques is not yetpossible for two reasons. One is that designers do not have thetools to help them decide whether, and how much, adaptivecircuitry is needed, or what type of post-silicon tuning techniquewill be most appropriate. The availability of both design-time(pre-silicon) optimization and post-silicon adaptivity leads to arich optimization space in which coordination between the twolevels is required. Sizing can be used to upsize the gates beyondthe need of a nominal design to achieve higher timing yield, butwith increased power. Alternatively, the adaptivity of thresholdvoltage can be used to tighten the speed distribution to improveyield. Depending on the magnitude and the spatial structure ofvariability, the two approaches will have different cost-effectiveness, i.e., they will be characterized by different Paretocurves in the space of design objectives.Algorithmically, future robust circuit synthesis can beconceptualized as a two-stage optimization problem, withadditional second-stage tuning available upon the realization ofuncertain variables. In this paper an efficient formulation isproposed using the theory of adjustable optimization. Thisoptimization paradigm presumes that the decision-maker has achance to update his optimization strategy upon learningadditional information. If the objective function is linear in thedecision variables, then, under the conditions that the uncertainty

19

sets are affine functions of some parameters, the optimal policyfor the second-stage decisions can be computed efficiently. Twostage stochastic optimization problems are difficult to solve [13].However, recent developments in the theory of convexprogramming have enabled the solution of robust versions oflinear and quadratic programs, which can be expressed as second-order conic programs or semi-definite programs [14, 15].Extending these solution methods to adjustable robustprogramming has been demonstrated in [16]. In this paper, webuild upon this work to develop an efficient solution to the postsilicon optimization problem under variability.The problem is formulated in the following way. The first-stage(design-time) power-delay optimization is done via sizing, andsecond-stage (post-silicon) optimization is achieved by body biastuning. The second stage decision variables are represented asaffine function of parameter uncertainty. The solution to thisoptimization problem is a design time decision (size of gates inthe circuit) and an optimal policy that prescribes the amount ofbias depending on the realizations of uncertain variables (e.g. gatelength, Vth on a specific chip). Initial experiments prove that theoptimal synergy between design-time and post-siliconoptimization depends on the amount and structure of variability.If variability is highly spatially correlated within the chip, design-time optimization will be ineffective and may even lead to largeyield losses. On the other hand, with the increase of intra-chipvariability, the effectiveness of post-silicon adaptivity decreases.Three measures of complexity that parameterize the solution andthe optimality of this problem are introduced by us: the controlcomplexity (the granularity of control), the measurementcomplexity (the granularity of the monitoring and sensingcircuitry), and the parameter complexity (a measure of howspatially uncorrelated the process variable is). Using thesemetrics, formal quantitative trade-offs between design-time andpost-silicon adaptivity can be identified. Such capability will alsobe useful for the analysis and development of the fine-granularcontrol structures, e.g. for determining the spatial granularity.

The rest of the paper is organized as follows. Section 2 motivatesthe need for joint co-optimization between design-time and post-silicon optimization. Section 3 presents the leakage and delaymodels used. The details of the algorithm are presented in Section4 followed by the results and analyses in Section 5.

2. DESIGN-TIME / POST-SILICON CO-OPTIMIZATION: MOTIVATION ANDCHALLENGESThe central problem of statistical optimization methods isreducing the dual parametric yield loss due to power and timingconstraints. This is because power consumption has become theyield-limiting factor, indirectly affecting the achievable maximumclock frequency [17]. In the absence of substantial leakagepower, parametric yield is determined by the maximum possibleclock frequency. When realistic leakage power numbers forcurrent CMOS technologies are added, the total power startsapproaching the power limit determined by the cooling andpackaging considerations. Crucially, the exponential dependenceof leakage on process spread will mean that the total power willcross the cooling (power) limit well below the maximum possiblechip frequency, since chips operating at higher frequencies haveexponentially higher leakage power consumption. Due to theinverse correlation between speed and leakage, yield is limited

both by slower chips and chips that are too fast, because they aretoo leaky.

The fundamental limitation of design-time methods is that theyimpose an overhead on each instance of the fabricated chip sincethey intrinsically lack the ability to "react" to the actualconditions on the chip. For example, when using sizing for timingoptimization they impose a fixed area overhead that may bewasteful on some instances of the ICs that would meet timingeven with smaller driver sizes. Having an adjustable-width driverwould be ideal, since it could ensure meeting constraints with theminimum overhead for each chip.

The problem that we address in this paper is how to performdesign-time circuit optimization and post-silicon tuning jointly.Why should these two steps be coordinated, i.e., why do we needjoint co-optimization? The two methods operate from differentviewpoints: in design-time optimization a decision (e.g., sizing)must be made before the realization of uncertainty (gate length),while in post-silicon tuning of the decision (the value of bias toapply) is made after the realization of uncertainty, i.e., when thechip's physical properties have been determined duringmanufacturing.

However, the two paradigms operate within a single budget ofuncertainty, and thus meeting constraints can be achieved by bothmethods. But their cost-effectiveness depends on specificconditions, such as the spatial correlation of process variability,the granularity of adaptivity that can be implemented, and themagnitude of leakage power in comparison with the switchingpower. The objective of this paper is to develop formal means andoptimization methods that will allow joint optimization. Thespecific optimization strategy will jointly consider the amount ofvariability and cost-effectiveness of power reduction strategies, toderive a policy that will guide post-silicon tuning, as well as makethe first-phase design decisions. This will allow to optimallypartition the design space between these levels of hierarchy.Formally, the objective of the algorithm we develop is tominimize the expected value of leakage power under a givendelay constraint T at a given yield amin El,ak s.t.P(D < T) > a

This formulation is generic and, different specific optimizationmechanisms can be studied. In this paper we focus on sizing andadaptive body bias for threshold control at the chip level, withonly a small number of partitions of the chip into individuallytunable clusters. The widely different spatial scales involved inthis problem are of some interest and are actively explored. In theabove formulation, the objective function and the constraintsdepend on both the design time optimization variables (sizes) andthe post silicon decision variables (body biases). The problem canbe formally viewed as a two-phase optimization under uncertaintywith recourse. The key contribution of our approach is thederivation of the optimal policy for body biasing as an affinefunction of the realizations of the uncertain parameters (gatelength, L , and threshold voltage, Vth ). The solution to the aboveoptimization problem therefore yields the sizes for the gates in thecircuit and an optimal body bias policy.

3. GATE AND CIRCUIT MODELINGAdjusting the circuit properties to manufacturing conditions canbe achieved by several techniques, including adaptive buffer

20

sizing, adaptive body biasing, and adaptive supply voltagebiasing. Because joint timing-leakage optimization is of primaryconcern, adaptive body bias may be the most useful tool. It hasbeen demonstrated [7, 11] that body biasing can be employed asan extremely effective knob to perform post silicon optimizationand performance tuning by reducing the leakage for those diesthat violate power constraints and increasing the frequency ofthose dies that do not meet delay specs.

The adaptive body bias technique exploits the dependency of thethreshold voltage of a MOSFET device on its source-to-bodyvoltage to achieve dynamic tuning of its delay and leakage power.For an NMOS device, the threshold voltage can be expressed as[18]:

Vth = VthO +±Y(VSB ±+2Xf bf)

where VthO is the threshold voltage of the device with zero bodybias, -y is the body bias coefficient, and Of is the Fermipotential. Decreasing the source potential relative to the body ofan N-channel device, translates to a negative VSB, and decreasesthe threshold voltage. This technique, known as forward bodybiasing (FBB) reduces the delay of the gate at the expense ofleakage power. On the other hand, application of reverse bodybias (RBB) by applying a positive VSB causes the thresholdvoltage of the device to increase. RBB is thus very effective inreducing the leakage power consumption [1].

The need to setup a rigorous statistical optimization problemunder uncertainty requires us to use approximate, linearized delaymodels, such as a piecewise delay of [19]. Let the gate delay berepresented as d, = d, + Ad1, where d, is the nominal gatedelay and Adi is the term representing the variability in delay.The dependence of nominal gate delay on gate sizes can bedescribed by the piecewise linear equations:

di =a',- ai2wU + a3EwkVlI C m] (1)

where m is the number of fitting regions I and ais are the fittingcoefficients. This model captures the dependence of delay on thesize of the gate width wi, and its loadE wk The accuracy of the

approximation is reasonable: the average error is less than 5% form = 3. The size range considered is 1-8x of the minimum sizegate.

Analytical models are used to relate the impact of variabilitysources on power and delay. The variability is assumed to comefrom two major sources. Transistor gate length (L ) exhibitsstrong lithography induced variability. Threshold voltage (Vth )variation due to oxide thickness and dose variation is also takeninto account. The impact of L on Vth due to drain-inducedbarrier lowering is predicted by the device model directly, whichpermits modeling L and Vth as independent random variables.Both L andVth are assumed to follow the normal distribution.An additive statistical model that decomposes the variability, ofboth L andVth, into the global (chip-to-chip) and local (intra-chip) uncorrelated variability components is used. For gate length:L = Lo + ALg +ALi (2)

The impact of process parameter variability on gate delay is

captured using a first-order parametric delay model:

Ad _ SIAL + S2A Vth + S3A VSB (3)where AL and A Vth are the parameter deviations and A VSB isthe applied body bias. The sensitivities are the first-orderderivatives of delay with respect to the specific variable(L, Vth, VSB )Using a modeling approach similar to [20], the subthresholdleakage current of a gate is expressed as an exponential functionof the random parameters as:

I = 10 exp(aAL + bA Vth + cA VSB) (4)where 10 is the nominal value of leakage per unit width. Weobtain a good fit using this model (Figure 1), the rms error being88%. For a circuit block the expression for leakage can be

expressed as:

Itot = Ep w1i .exp(aiALi + biA Vth,i + CiAVSB)

where the nominal gate leakage is 1o,i = pi wi.

Following [21], we assume that the impact of random componentof variation on chip-level leakage value can be captured by aconstant multiplier that we take to modify the value of ,i, in theabove expression.

The essence of adjustable optimization framework is that thevariable that is allowed to be tuned is not determined arbitrarilybut is dependent in some way on the realizations of uncertainvariables. As was mentioned before and will be justified in thenext section, a computationally tractable solution to a statisticaladjustable problem requires that A VSB be an affine function ofuncertain parameters, L and Vth

A VSB = 7TO + 7FIALg + 7W2A Vth,g (5)

Here, the coefficients 7w0, 7rI and 7w2 are to be determined in theprocess of optimization. Such a parameterization is physicallyequivalent to compensating for the variation in leakage due to Land Vth, by applying body bias [22]. Though, the value of bodybias is not a random variable, based on (5), it can be treatedmathematically as one. With that observation, let us define:

6-

5-

.0 4-0EELO 3 -

02(U-J

1 -

do

JR

0 1 2 3 4 5 6 7Leakage from SPICE

Figure 1. Comparison of the normalized leakage ofinverter predicted by SPICE and the analytical leakagemodel.

21

u

Xi = N(ti, oi2) = aiALg + biA Vth,g + CiA VSB

The mean and variance of a lognormal Y ex in terms of themean and variance of the normal random variableX = N(,u 2 ) are [23]:

E(Y) = exp(u + 2) (6)2

Var(Y) = exp[2(u + u2)] - exp(2,u + u2) (7)

Observing that:E(ItOt) ZE(i .wiw exp(aiALg + biA Vth,g + CiA VSB))

and , E( a-ALg + b,A Vth,g + C-A VSB) = wTOwe can write the expected value of total block leakage as:

E(Itot) = iwi *exp(wo + /2) (8)i~~~~~~ 8

4. DESIGN TIME / POST SILICON CO-OPTIMIZATION ADAPTABLE ROBUSTOPTIMIZATIONIn the optimization strategy we develop, the optimal body bias isdetermined after the realization of uncertainty of the processparameters. On-chip measurements are used to measure the actualparameter values and their deviations from nominal values. Then,the policy derived during optimization can be used to choose anoptimal forward or reverse body bias. RBB can be applied toreduce yield loss in the high frequency (high leakage bins), andcan be used with FBB to tighten the distribution at the lowfrequency bins.

4.1 Adaptable Robust OptimizationFirst we introduce the theoretical foundation for robust adjustableoptimization. We use robust optimization as the bedrock of ourstrategy. A robust LP can be defined as the problem ofminimizing the worst-case realization of a linear objective over aset of linear constraints with uncertain coefficients [24]:

min{ sup (cTix) Acx < b V( _ [A,b,c] C Z} (9)

Here the uncertainty in the matrix coefficients is represented as( _ [A, b, c] varying in the nonempty compact convex uncertaintyset Z .

The above problem requires all decisions to be made prior to theactual realization of the uncertain parameters. However, in manyreal-life cases not all the decisions can be made simultaneously:only some variables may become known earlier. In this case, theremaining decision variables can be adjusted to the realizations ofuncertain data. It is obvious that if the opportunity to adjustingsome variables is given, the optimal solution will be better (or atleast, no worse) than for the problem above. Problems withsimilar structure have been known as multi-stage stochasticproblems with recourse. However, robust problems are notstochastic problems, and when certain conditions are imposed onthe uncertainty set, demonstrate superior computationalproperties.

We can re-write the problem of (9) in terms of the non-adjustablevariables u and the adjustable variables v. This leads to theadjustable robust problem:

min{c (v :V((=-[U,V,b,c] C Z) v: Uu + Vv < b} (10)

In this formulation, the adjustable variables v are allowed todepend on the realization of ( .

Still, it is shown in [16] that the general robust problem withadjustable parameters is NP-complete, unless restrictions areapplied on how exactly the adjustable variables tune themselvesto uncertain data. It is shown that a computationally feasibleadjustable robust linear problem can be achieved if the adjustablevariables are constrained to be affine functions of the uncertainvariables. This is equivalent to:

v = w + W(From this we see that the adjustable variables can be tuned oncethe realization of uncertain data is known. However, if we are tobe able to identify an optimal policy and do that computationallyefficiently, the dependency cannot have general form, but must beconstrained. This ultimately leads to the affinely adjustable robustlinear program

min{c(T ) :Uu + V(w + W()v < b V_ [U,V,b,c] E Z} (11)

In particular, for uncertainty sets specified using linear or second-order cone constraints, the above problem can be reformulated asan LP or a second-order conic program respectively [16].

4.2 Co-Optimization: Problem FormulationWe now map our design-time and post-silicon tuning probleminto a robust adjustable linear program. Our objective informulating the problem is to set up a robust linear program withadjustable parameters. Robust programs have been recently usedfor several CAD problems [5], and are very efficient.

The task of co-optimization is effectively finding the solution to atwo-stage optimization problem with recourse. Denoting columnvectors by boldface letters, we formulate the problem as that ofminimizing the overall expected leakage power (or current) withexpectation being taken over the population of manufacturedchips while satisfying timing constraints under a statistical timingmodel:

(12)

In this formulation the objective and constraint functions aredependent both on design-time variables (gate sizes) and post-silicon optimization variables (A VSB ).

We begin by writing the expression for mean leakage as:

E(It,t) = gTw

where g is an N x 1 vector with entries

gi = ,i exp(-r0 + o-2 /2) The objective function is thus linear in

the gate sizes and non-linear (exponential) in A VSB We will dealwith this by adopting a linearization approach in which we locallylinearize the objective's dependence on A VSB at the fixed valueof vector of gate sizes w

Let us, for convenience form a single vector of decisionvariablesx= [w AVSB]T The gate delay model introduced in

the previous section can allow us to express path timingconstraints in the form of:

22

min E(It,,t) s.t. P(D(w, A VSB) < T) > a

D(w,A VSB) = aTx .

Now consider the probabilistic chance constraintP(D(w, A VSB) < T) > ag specified for the entire circuit. Wecan heuristically re-write the circuit-level probabilistic timingconstraints in terms of path-based constraints. We assume that acorresponding confidence level rj can be selected using thestrategy outlined in [25]. Then, we require:

P(Di(wA VsB) < T) > r/j Vj c II

where II is the relevant path-set. Relying on the linear vectorrepresentation introduced above we can write:

P(Di(w,AVsB) < T) = P(aTX < T) > r/jVj C H

If a is distributed normally, N(a, E), the coefficients of x

belong to an ellipsoidal uncertainty set [24]. Then, it can beshown that the above constraint is equivalent [14] to:

aTx + kj(xTX)l 2 < T Vj C fl (13)

where k, = z-1(rj) and X is the cumulative distributionfunction (cdj) of the standard normal distribution. The path delayconstraints of Eq. 13 represent a set of second-order conic pathtiming constraints [14]. Second-order conic programs are convex,and there exist extremely efficient techniques to solve SOCPs thatexploit their special structure [26]. While the worst-casecomplexity of interior-point methods for SOCP is polynomial, formost practical instances the run-time behavior is much better. Inour case, the empirically observed complexity is close to O(N)

It has been shown that adjustable robust linear programs can bemade computationally tractable only if the adjustable (second-stage) decision variables are affiOne functions of uncertainvariables [16]. Without loss of generality, consider only the globalsources of variation ALg and A Vth,g, and a single value of bodybias A VSB for all the gates on the chip. Then, the affine policy isgiven by:

A VSB = 7rO + 7FiALg + WF2A Vth,g (14)

This dependence can be used to express the expected value ofleakage current as:

gi = 13i exp (fo2i (fro) + fi2 ()Lg + f2i(2) ) (15)

where ho,i fii, and f2,i are linear functions of -ro, -r1, and if2respectively.

The final robust adjustable optimization problem ABB(w,7r)can now be expressed as:min g w

aj x-1q 1(aj)(xT x)1/ T Vj C 11

where

gi = 3exp(fO2i(wO) + fli (wI)oL + f2i(72)o+T22 ) Vi C [ N]

Note that the original problem has now been cast as anoptimization problem in -fo, -/1, and if2 and gate widths, w- .The solution to this problem is an optimal policyP = (T0, 7, 2) and the vector of gate width w such that the

timing constraints are satisfied.

4.3 Problem SolutionTo enable a computationally efficient solution, we solve theproblem in (16) as a two phase optimization program. The firstphase consists of solving a weighted sizing problem assumingfixed body bias and the second phase consists of solving for thebody bias value assuming fixed gate size. This is performed in aniterative manner using successive approximations until thesolution converges. We transform the path based formulation intoa node based formulation [25] to solve the problem efficiently.

This problem is solved iteratively by computing optimal ws in thefirst stage and optimal -is in the second stage until the solutionconverges. At an iteration I the w -phase consists of solvingABB(w, 7r(-1)) to obtain w(l) and the if -phase solves the

problem ABB(w(), 7r) to obtain 7r(1 . Initially, for

I =O O Vj [1, k] corresponding to zero body bias.

Solving w -phase does not pose a problem as the objectivefunction is linear in gate widths, w and the delay constraints aresecond order cones. It can therefore be solved readily as anSOCP. However, the if -phase objective is non-linear in thedecision variables. To address this issue we propose to expand theobjective function using a first order Taylor series. The if - phaseoptimization problem solved at iteration I is approximated as:

minF

s.t.F > 7FOVTo(gTw) + 7WlV7 (gTw) + w2V72 (gTw)

-fx + jX1(c)(xTYj)1 2 < T Vj C Paths

(17)

where VTO , V,1 and VT2 are the gradients computed w.r.t TFO,7Fl, and w2 respectively. The complete algorithm optim_abb ispresented in Figure 2.

4.4 Handling of intra-chip variationThe policy described above cannot account for random parametervariation. Since the structure of the policy needs to be specified atoptimization time, we need to know the number of measurementswe can make on chip to account for the intra-chip randomvariation. Assume that we can make k1 measurements and k,measurements of Vth. Assuming that we are allowed a singlechoice of A VSB:

k k,AVSB = wro+ 7Zi AL, +Z7k+iAVth,i

i=l i=l(18)

The notion of measurement complexity k = k1 + k, is used hereto represent the amount of information we are able to obtain aboutthe structure of variability. As we demonstrate in the resultssection, a higher value of k implies a lower leakage value.However, it is achieved at the cost of increased run-time anddiagnostic overhead.

Similarly, we can introduce the notion of control complexity nwhich refers to the number of body bias values that are allowed.Control complexity reflects the degree of controllability over thebody bias assignment and also the circuit overhead. It is currentlyassumed that the granularity of body bias assignment is at the

23

Figure 2. The two phase algorithm optim_abb for post siliconoptimization using ABB.block level. This is because tuning individual gates is clearly tooexpensive from the physical design perspective (extra routingoverhead, voltage conversion). Spatial clustering may also beused as the gates that are spatially proximate are more likely thebenefit from an equal body bias assignment.

5. EXPERIMENTS AND RESULTSWe are now in a position to put together the complete design-timeand post silicon co-optimization flow. We start by choosing thelevel of measurement complexity and control complexity. Thesealong with the distributional information about the uncertain dataare the inputs to the above algorithm. The algorithm optim_abbproduces a set of gate sizes and an optimal policy for selectingA VSB for the given structure of variation and the control andmeasurement complexity. When the chip is fabricated, the actualrealizations of the uncertainty are known hence the value of bodybias is determined from the policy from (18). The optimizationproblem was solved using the conic optimization packageMOSEK [27]. The experiments were run on a 32-bit, 3.7 GHz.Intel Xeon processor with 4GB of memory. The benchmarkcircuits were synthesized to a cell library that was characterizedfor a 70 nm process using Berkeley Predictive Technology Model[28]. For NMOS (PMOS) transistors, the threshold voltage is0. 10V (-0. OV). The assumed magnitude of Vth and L variabilityis ,u t= 8% and 5% respectively. The optimal solution (sizesand policy) produced by the algorithm were evaluated usingMonte Carlo analysis to estimate the expected value of leakagepower by sampling from the distribution of the uncertainparameters Vth and L.

Three measures of complexity are used to characterize theoptimality of the solution: the control complexity n whichrepresents the granularity of control, the measurementcomplexity k which refers to the granularity of the monitoring

0.30 -

0.25 -

0.20 -

0.15-

0.10-

0.05 -

-*-Design time only optimizationJoint optimization

Joint optimization

reduces delay spread

Joint optimization reduces

power limited yield lbss

0.00 -*-*----95 190

Minimimum allowed delaydetermined by power limit

105Delay (ns)

110 115 120

Delay target

Figure 3. PDFs of delay distributions produced by design-time only optimization and joint optimization. Jointoptimization is successful in tightening the distribution.

and sensing circuitry, and the parameter complexity p, defined as

the ratio- /2 tt. Thus, p is a measure of how spatiallyuncorrelated the process variable is.

Figure 3 illustrates the effectiveness of our algorithm in reducingthe spread of the circuit delay and ameliorating the problem of thedual ended squeeze on parametric yield. This is achieved byincreasing the delay of faster chips by applying RBB. Since thesechips have high leakage power consumption, our algorithmreduces power limited yield loss. From the Figure it can be seen

that the yield is improved by about 5%. Application of FBB toslow chips serves to tighten the delay distribution further. Sincethe circuit is guaranteed to meet the timing yield target even forzero FBB, applying forward body bias does not improve timingyield but increases the number of chips in the higher frequencybins.

Figure 4 compares the leakage power of the circuits obtained byemploying only design time optimization and the joint designtime and post silicon algorithm outlined in the paper. Asexpected, using post silicon optimization enables a more optimalsolution compared to design time only optimization. However as

the complexity of variability increases, the benefit of using postsilicon optimization decreases. This can be attributed to the factthat as the amount of uncorrelated variability increases, designtime optimization performs better, but to utilize the adaptabilityprovided by post silicon optimization, more measurements needto be made and more complex control system used (larger numberof individually tuned clusters of logic on a chip). Therefore,increasing measurement complexity k improves the quality ofthe solution (reduces expected value of leakage). This is alsodepicted in Figure 4. However, this comes at the cost of increasedrun-time and diagnostic overhead. This is shown in Figure 5,which indicates that the run-time of the algorithm increases as

k is increased.

Table 1 documents the results obtained across the benchmarks.All solutions were evaluated using Monte Carlo analysis. 1000samples were generated for each random parameter. The circuitswere optimized for the same delay target, which is evaluatedusing Monte Carlo. We observe that for a reasonable choice ofmeasurement complexity, using our algorithm, an average saving

24

1. set Fj =0 Vi [1,k]2. get Timing Target T

3. set D < T such that ABB(w, 7r(l-1)) is feasible.4. chose delay increment 5D5. set I = 1

6. if D < T

solve w - phase ABB(w, ir(l-1)) setting delay constraintto D.

else

print W(1 1) and 17r) as the optimal solution and stop

7. setD = D +6D

8. if D < T

solve 7 -phase A BB(w(1), 7r) setting delay constraint

to D.

else

print w(l) and 7r ) as the optimal solution and stop

9. setD = D +5D10. set I = I + I and goto step 6

Table 1: Leakage power savings obtained by the joint post-silicon and design-time optimization.

Circuit No. of Design time optimization Joint design-time and post-silicon optimizationgates E(IIeak) ( W) (k = 8 )

l l=05 |P=081E(I) (W)Leakage power Runtimep =0.5 p 0.8 E(la)(i)savings (Oo) (s)

p =0.5 p =0.8 p =0.5 p =0.8

C432 261 328 301 246 291 25.00 3.32 8C499 641 908 845 568 622 37.44 26.39 15.2C880 615 560 470 388 405 30.71 13.83 12.5C1355 685 684 603 557 595 18.57 1.33 21.1C1908 1238 1203 1167 926 1040 23.03 10.88 31C2670 2041 1706 1669 1405 1530 17.64 8.33 55C3540 2582 2718 2584 2142 2473 21.19 4.30 63C5315 3753 3801 3700 3544 3598 6.76 2.76 108C6288 2704 2918 2902 2454 2685 15.90 7.48 132

Average savings 21.8 8.73

of 20% savings in leakage power consumption can be obtainedcompared to design time only optimization. Table 1 also cites theruntimes of the algorithm. It can be seen that the run timebehavior is extremely good (about 2 minutes) even for the largestbenchmark circuit.

Finally, we explore the dependence of the quality of the solutionobtained from post silicon optimization on the measurementcomplexity and control complexity. Increasing k improves theleakage power but there is a point of diminishing returns beyondwhich the improvement is insignificant. This is depicted in Figure6.

Increasing the number of circuit clusters with individuallyadjustable threshold voltages (i.e., increasing the controlcomplexity) improves the results of optimization, Figure 7. Aswith measurement complexity, this improvement in leakagepower is achieved at a cost. A larger value for control complexityimplies greater overhead, such as in biasing circuitry and routing.

380 -

360 -

340 -

s 320-

ZZE 300-LU

280 -

260 -

240 -

Design time only optimization-i- Design time + post silicon: k=4

Design time + post silicon: k=8

180 -

160 -

140 -

120 -

U) 100-

80-

W 60-

40 -

20 -

0-

0 10 20 30 40Measurement Complexity (k)

50

Figure 5. The runtime increases as the measurementcomplexity is increased, as optimal policy depends on moremeasurements.

420 -

400 -

380 -

360 -

340 -

3 320-- 300-

280-U 260-

240 -

220 -

200 -

_mE...hhhhhE.mhhhhhhhh..-E...EEEEEEEEE.E.E.E.EEEE.E.E.E.M.E.E

0.2 0.4 0.6Parameter complexity (p)

0.8 18U '.

* p=0.50 ~~~~~~~-m-p = 0.8

l',

S g-g~~~---

0 10 20 30Measurement Complexity (k)

4 540 50

Figure 4. Comparison of design time only optimization andjoint design time and post silicon optimization. Jointoptimization always does better than design time onlyoptimization.

Figure 6. The expected value of leakage power decreases aswe increase measurement complexity but the benefits leveloff for high values of k.

25

I

380 -

360 -

- 340-

- 320-wZ-

300 -

280 -

260 - U

1 2 3 4 5 6Control Complexity (n)

Figure 7. The expected value of leakage power decreases aswe increase control complexity n. A larger n corresponds tomore allowable values of body bias.

6. CONCLUSIONIn this paper we have developed a theoretical foundation for jointdesign-time and post-silicon optimization. The problem is cast asan adjustable robust linear program and solved in acomputationally efficient way. Results indicate that the designercan greatly benefit from synergistic application of design timeand post silicon optimization techniques due to the ability of postsilicon optimization solution to tune itself to the realization ofuncertain data.

7. ACKNOWLEDGMENTSThis research was supported in part by SRC, GSRC, NSF, SUN,and Intel.

8.[1]REFERENCESVisweswariah C. et. al., "First-order incremental block-basedstatistical timing analysis," in Proc. DAC, 2004, pp. 331-336.

[2] Hongliang C. and Sapatnekar S., "Statistical timing analysisconsidering spatial correlations using a single PERT-liketraversal," in Proc. ofICCAD, 2003, pp. 621 - 625.

[3] Devgan A. and Kashyap C., "Block-based static timinganalysis with uncertainty," in Proc. of ICCAD, 2003, 9-13,pp. 604-614.

[4] Patil D. et al., "A new method for design of robust digitalcircuits," in Proc. ofISQED, 2005, pp. 676-681.

[5] Singh J. et al., "Robust gate sizing by geometricprogramming," in Proc. ofDAC, 2005, pp. 315-320.

[6] Srivastava A. et al., "Statistical optimization of leakagepower considering process variations using dual-Vth andsizing," in Proc. ofDAC, 2004, pp. 773 - 778.

[7] Tschanz J. et al., "Adaptive body bias for reducing impactsof die-to-die and within-die parameter variations onmicroprocessor frequency and leakage," ISSCC Tech. Dig.,pp. 422-423, 2002.

[8] Narendra S. et al., "Impact of using adaptive body bias tocompensate die-to-die Vt variation on within-die Vtvariation", in Proc. ofISLPED, 1999, pp. 229-232.

[9] Keshavarzi A. et al., "Effectiveness of reverse body bias forleakage control in scaled dual Vt CMOS ICs," in Proc. ofISLPED, 2001, pp. 207-212.

[10] Martin S. et al., "Combined dynamic voltage scaling andadaptive body biasing for lower power microprocessorsunder dynamic workloads," in Proc. of ICCAD, 2002, pp.721-725.

[11] Chen T. et al., "Comparison of Adaptive Body Bias (ABB)and Adaptive Supply Voltage (ASV) for improving delayand leakage under the presence of process variation," IEEETrans. VLSI on Systems, v. 11, no. 5, 2003, pp. 888-899.

[12] Wang H. et al., "Variable tapered pareto buffer design andimplementation allowing run-time configuration for low-power embedded SRAMs," IEEE Trans. on VLSI, Vol. 13,No. 10, Oct. 2005.

[13] Prekopa A., Stochastic Programming, Kluwer Academic,1995.

[14] Boyd S., Vandenberghe L., Convex Optimization,Cambridge, 2004.

[15] Ben-Tal, A., Nemirovski, A. "Robust solutions to uncertainlinear programs," OR Letters, 25, 1-13, 1999.

[16] Ben-Tal A. et al., Adjustable robust solutions of uncertainlinear programs, Mathematical Programming, Volume 99,Issue 2, Mar 2004, pp. 351 - 376.

[17] Borkar S. et al., "Parameter variation and impact on circuitsand microarchitecture," in Proc. ofDAC, 2003, pp. 338-342.

[18] Taur Y. and Ning H. T., Fundamentals of Modern VLSIDevices, Cambridge University Press, 1998.

[19] Berkelaar M. and Jess J., "Gate sizing in MOS digitalcircuits with linear programming," in Proc. ofDATE, 1990,pp. 217-221.

[20] Rao R. et al., "Statistical analysis of subthreshold leakagecurrent for VLSI circuits," IEEE Trans. on VLSI Systems,12(2), pp. 13 1-139, February 2004.

[21] Srivastava A. et al., "Accurate and efficient parametric yieldestimation considering correlated variations in leakage powerand performance," in Proc. ofDAC, 2005, pp.535-540.

[22] Azizi N. and Najm F. N. "Compensation for within-dievariations in dynamic logic by using body-bias," in Proc. ofNEWCAS, 2005, pp. 167-170.

[23] Papoulis A., Probability Random Variables and StochasticProcesses, New York: McGraw-Hill, 1984.

[24] Ben-Tal, A. Nemirovski, A., "Robust Convex Optimization,"Math. Oper. Res., 23, 1998.

[25] Mani M. and Orshansky M., "Application of fast SOCPbased statistical sizing in the microprocessor design flow," inProc. ofGLS-VLSI, 2006, pp. 372-375.

[26] Alizadeh F. and Goldfarb D., "Second-order coneprogramming", Technical Report RRR, Report number 51-2001, RUTCOR, Rutgers University.

[27][28] Cao Y. et al., "New paradigm of predictive MOSFET and

interconnect modeling for early circuit design," Proc. ofIEEE CICC, 2000, pp. 201-204.

26

Documents

Joint Design-Time and Post-Silicon Minimization ... · sizing, adaptive body biasing, and adaptive supply voltage biasing. Because joint timing-leakage optimization is ofprimary concern,