Techniques for VLSI Circuit Optimization Considering Process Variations
Mahalingam Venkataraman
Department of Computer Science and Engineering, University of South Florida, Tampa, FL, 33620
Chair: Prof. Babu Joseph
Major Professor: Prof. Nagarajan Ranganathan
Committee Members: Prof. Srinivas Katkoori, Prof. Hao Zheng, Prof. Justin E. Harlow, Prof. Kandethody Ramachandran, Prof. Sanjuktha Bhanja
Mahalingam Venkataraman, PhD Defense, Date: 3/23/2009

Slide 3 Outline of Presentation
Introduction, Motivation and Contributions
Variation Aware Gate Sizing
Variation Aware Timing based Placement
Variation Aware Buffer Insertion and Driver Sizing
Dynamic Clock Stretching
Conclusions
Publications

Slide 4 VLSI Circuit Complexity
Technology scaling has brought improved multi-functional features and enhanced performance, along with an increased failure probability.
Transistor count: Northwood 55 million, Prescott 125 million, Yonah 151 million, Wolfdale 410 million. Source: Intel

Slide 5 Nanometer Dimensions
[Figure: length scale from 1 m down to 100 nm, ending at a 65 nm transistor. Source: Intel; Spektrum der Wissenschaften. Courtesy: Sill, PGPEE 2008]

Slide 6 Process Variations
Process variations, in general, refer to the difference between the intended and obtained values of voltage and process parameters before and after fabrication of the circuit.
The variations are more pronounced in the nanometer era due to the limitations of the fabrication equipment and the lithography process.
Process variations in the nanometer era have an impact on the failure probability and hence the timing yield of integrated circuits.

Slide 7 VLSI Circuit Optimization
Circuit optimization in the nanometer era is formally defined as the process of designing circuits with the best possible power, delay and noise parameters.
Common methods: transistor/gate sizing, wire sizing, incremental placement, multiple supply and threshold voltages, buffer insertion.
The relationships among the parameters are conflicting: circuits with optimal power can have poor performance and/or noise values.
Process variations have made the relationships among the conflicting optimization objectives more complex and hence more difficult to optimize.

Slide 8 Motivation: Dissertation Research
Corner based circuit optimization that ignores variation effects can negatively impact timing yield.
Worst case consideration of variations guarantees good yield, but can lead to severe over design.
In this context, there is a strong need to re-invent circuit optimization techniques from a statistical perspective.
The methodology has to consider multiple conflicting objectives, model variation effects without assumptions regarding distributions, and be efficient enough to handle large circuits.
Hence, in this dissertation, we model and develop novel statistical and runtime variation aware solutions for circuit optimization considering process variations.

Slide 9 Statistical Timing Analysis
[Figure: static timing analysis takes element delay as min/max values and produces a circuit delay; statistical timing analysis takes element delay as a PDF/CDF and produces circuit delay as a PDF/CDF]
Variation awareness in VLSI started with PDF/CDF propagation in timing analysis.
Circuit optimization frameworks were then built on top of the SSTA engine to optimize performance considering variations.
Slide 10 Mathematical Programming based Circuit Optimization
SSTA based iterative circuit optimization requires a number of complicated operations at each node and hence incurs a prohibitive runtime [Schmidt, EJOR 2000, Karkowski, ICSS 1995].
Hence, the authors in [Mani, DAC 2005, Mani, ICCD 2004] proposed stochastic mathematical programming based circuit optimization.
Mathematical programs are fast and have the capability to handle large circuits.
Several circuit optimization problems like gate sizing, buffer insertion and placement have well defined mathematical programming formulations.
The stochastic programming technique is reasonably fast, but can be conservative in terms of yield and hence gives smaller savings in area or power [Buckley, IJFSS 1990].

Slide 11 Fuzzy Mathematical Programming (FMP)
FMP is a special case of mathematical programming with fuzzy variables in the constraints or objective functions. Variations are modeled as fuzzy numbers.
Similar to stochastic programming, fuzzy programming involves a relaxation step.
FMP has been used to model uncertainty in scheduling, binding, testing, robotics, pattern matching and artificial intelligence.
A fuzzy number (linear, trapezoidal or nonlinear) is defined as a number whose precise value is somewhat uncertain.

Slide 12 Motivation: Fuzzy Programming
The author of [Buckley, IJFSS 1990] highlighted that fuzzy programming guarantees solutions better than or at least as good as stochastic programming, and showed the same using Monte-Carlo simulations.
The bound constraints in fuzzy programming allow the FMP to search for the optimal value instead of averaging a list of close to optimal values as in stochastic programming.
Fuzzy programming also handles variation parameters in the objective function, as opposed to only in the constraints as in stochastic programming.
Hence, we planned to use fuzzy programming based modeling and solution for uncertainty aware VLSI circuit optimization.
Slide 13 Motivation: Dynamic Clock Stretching
The proposed statistical design methods (fuzzy or stochastic) are quite effective in the presence of variations, incurring reasonable overheads.
However, when no variations occur in the critical paths, the overheads still remain.
To avoid this, we investigate a completely different approach to handle process variations.
A dynamic delay detection and clock stretching technique is proposed to combat the effects of process variations.

Slide 14 Contributions of This Dissertation
Major contributions:
Modeling variations in process parameters using fuzzy membership functions
Fuzzy linear programming formulation for variation aware gate sizing
Fuzzy and stochastic nonlinear programming formulation for variation aware timing based placement
Fuzzy piece-wise linear programming formulation for variation aware buffer insertion and driver sizing at the logic and layout level
A process variation tolerant circuit design using dynamic clock stretching

Slide 15 Outline of Presentation
Introduction, Motivation and Contributions; Variation Aware Gate Sizing (Gate Sizing, Previous Works, Fuzzy LP Formulation, Fuzzy Gate Sizing Algorithm, Experimental Results); Variation Aware Timing based Placement; Variation Aware Buffer Insertion and Driver Sizing; Dynamic Clock Stretching; Conclusions; Publications

Slide 16 Variation Aware Gate Sizing (VA-GS)
Gate sizing is one of the simplest, yet most effective techniques for improving the power/performance trade-off in VLSI circuits.
Increasing the size of a gate increases both performance and power consumption.
The problem of gate sizing is well suited to be formulated as a mathematical programming problem.
In this work, we formulate variation aware gate sizing as a fuzzy linear programming problem, maximizing timing yield with power and delay as constraints.

Slide 17 Previous Work: (VA-GS)
Statistical Timing Analysis (SSTA) methods [Hasimoto, ISPD 2000], [Devgan, ICCAD 03], [Blaaw, DAC 04]: continuous functions are propagated instead of discrete values, using max and add operations on continuous functions.
Penalty based circuit optimization [Visweswariah et al., DAC 02]: penalty functions are used in the constraints to avoid building a wall of timing critical paths.
Stochastic optimization using chance constrained programming [Mani, ICCD 04, Mani, DAC 05]: uncertainty in delay is modeled using probabilistic constraints.

Slide 18 Previous Works

Slide 19 Variation Aware Gate Sizing Outline
Step 1: Formulation of linear models for gate delay and dynamic power as functions of gate sizes.
Step 2: Modeling process variation in the gate delay coefficients by treating them as triangular fuzzy numbers.
Step 3: Formulating and solving the LP for deterministic gate sizing with the variation parameters set to the worst case and the typical case, which gives the bounds for the fuzzy formulation.
Step 4: The bound values generated above are used to convert the fuzzy formulation into a corresponding crisp formulation using symmetric relaxation.
Step 5: The crisp optimization problem is then solved through a commercial nonlinear optimization solver.

Slide 20 Step 1: Power and Timing Models
The power consumption of a gate is fitted as a linear function of the gate size (s_i) only.
Linear approximation for gate delay is adopted from [Berkelaar, EDAC 90], where a, b and c are constant coefficients from SPICE simulations, fo(i) is the fan-out of gate i, and s_i is the size of gate i.
The delay equation describes the gate delay (d_i) as a function of the gate size (s_i) and the sizes of its fan-out gates.

Slide 21 Step 2: Modeling Variations
The variations in gate length and oxide thickness are translated to the coefficients b and c in the delay equation.
The actual physical variability of these coefficients is unknown, but they closely approximate gate length and oxide thickness [Mani, ICCD 04].
The fuzzy coefficients are modeled as triangular fuzzy numbers of the form (b_i, b_i - g_i, b_i + g_i) and (c_i, c_i - h_i, c_i + h_i), where the coefficients g_i and h_i represent the maximum variations.

Slide 22 Step 3: Deterministic Gate Sizing: LP Formulation
In this work, we use a delay constrained power minimization formulation for gate sizing.
The deterministic version of the gate sizing optimization problem minimizes the total power (the sum of P_i over all gates) subject to D_p <= Tspec for every path p, where P_i is the power consumption of gate i, D_p is the delay of path p and Tspec is the required timing specification of the circuit.
The variations in delay are transferred to the coefficients b and c in the delay equation.

Slide 23 Step 3: Pre-Processing for Creating Crisp Problem
The deterministic LP problem is solved with the gate delay set to the worst case (wc_sizing).
Next, the deterministic LP problem is also solved with the gate delay set to the nominal case (nc_sizing).
The solutions to these optimizations represent the lower and upper bound values for the variation aware fuzzy gate sizing problem.

Slide 24
Using these bound values from the pre-processing step and a variation parameter (lambda), the fuzzy linear programming problem is converted to a crisp programming problem.
The solution to the crisp problem lies between the bound values and represents an overall degree of satisfaction of the variation parameters and the objectives of the optimization problem.
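The linear delay model of Step 1 and the triangular fuzzy coefficients of Step 2 can be illustrated with a small sketch. The delay equation itself is not reproduced in this transcript, so the form used here, d_i = a - b*s_i + c*(sum of fan-out gate sizes), is an assumption chosen to match the description (delay as a function of the gate's own size and the sizes of its fan-out gates). The class name, function name and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriangularFuzzy:
    """Triangular fuzzy number (nominal, nominal - spread, nominal + spread)."""
    nominal: float
    spread: float  # maximum variation (g_i or h_i in the slides)

    @property
    def low(self) -> float:
        return self.nominal - self.spread

    @property
    def high(self) -> float:
        return self.nominal + self.spread

    def membership(self, x: float) -> float:
        """Degree (0..1) to which x belongs to this fuzzy number."""
        if self.spread == 0:
            return 1.0 if x == self.nominal else 0.0
        return max(0.0, 1.0 - abs(x - self.nominal) / self.spread)


def gate_delay(a, b, c, size, fanout_sizes):
    """ASSUMED linear form: d_i = a - b*s_i + c*sum(s_j for j in fo(i))."""
    return a - b * size + c * sum(fanout_sizes)


# Hypothetical coefficients with a 25% maximum variation.
b = TriangularFuzzy(nominal=2.0, spread=0.5)
c = TriangularFuzzy(nominal=1.0, spread=0.25)

nominal_delay = gate_delay(10.0, b.nominal, c.nominal, 3.0, [1.0, 2.0])
# Under the assumed form, the slowest corner has a small b and a large c.
worst_delay = gate_delay(10.0, b.low, c.high, 3.0, [1.0, 2.0])
```

Under this assumed form, the nominal and worst-case corners differ only in the values taken for b and c, which is what lets the same LP be solved at both corners in the pre-processing step.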
Step 4: Variation Aware Fuzzy Gate Sizing

Slide 25 Step 4: Crisp Nonlinear VA-GS Problem
In the crisp problem for VA-GS, lambda is the variation parameter and varies from 0 to 1, and nc_sizing and wc_sizing represent the values of the objective functions from the deterministic pre-processing optimizations.
The crisp problem maximizes the variation resistance (robustness), bounds the power value and satisfies the delay constraints in an optimal fashion.

Slide 26 Reducing Computations in VA-GS
The path based delay constraints in the gate sizing problem are converted to node based constraints.
This translation introduces a sub-optimality of only 1-2%, but increases the feasibility of optimizing large circuits, since the number of paths is exponential.
The spatial correlation of the process variations can be handled by processing the circuit as n smaller regions.
Gates in a specific region are assumed to have the same range of variation values [Singh, DAC 05].

Slide 27 Step 5: VA-GS Simulation
VA-GS was tested on ITC99 circuits.
The models were written in the AMPL mathematical programming language format.
KNITRO, a commercial nonlinear optimization solver, was used.
A variation of 25% in gate delay was assumed in accordance with [Nassif, ISSCC 2000].

Slide 28 Experimental Results
The variation aware fuzzy gate sizing approach provides an average improvement of 18% compared to DWC and 9% compared to stochastic gate sizing, without compromising on timing yield.
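The trade-off behind the crisp VA-GS problem of Step 4 can be seen on a toy single-gate case. This is only a sketch under stated assumptions: the crisp equations are not reproduced in this transcript, so the relaxed power bound wc - lambda*(wc - nc), the delay model a - (b - lambda*g)*s + (c + lambda*h)*load, and every name and number below are hypothetical. The deck solves the real problem in AMPL with KNITRO, and plain bisection is swapped in here only because the toy has a single variable.

```python
def min_size(lam, a, b, g, c, h, load, t_spec):
    """Smallest gate size meeting the delay bound at variation level lam.

    ASSUMED delay model: a - (b - lam*g)*s + (c + lam*h)*load <= t_spec.
    """
    return (a + (c + lam * h) * load - t_spec) / (b - lam * g)


def max_variation_resistance(power_per_size, tol=1e-9, **model):
    """Bisection for the largest lam in [0, 1] that stays feasible."""
    nc = power_per_size * min_size(0.0, **model)  # nominal-case power bound
    wc = power_per_size * min_size(1.0, **model)  # worst-case power bound

    def feasible(lam):
        needed = power_per_size * min_size(lam, **model)
        allowed = wc - lam * (wc - nc)  # assumed symmetric relaxation
        return needed <= allowed

    lo, hi = 0.0, 1.0  # lam = 0 is feasible, lam = 1 is not
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo, nc, wc


model = dict(a=10.0, b=2.0, g=0.5, c=1.0, h=0.25, load=4.0, t_spec=8.0)
lam, nc_power, wc_power = max_variation_resistance(1.0, **model)
```

Raising lambda tightens the power budget toward the nominal-case value while asking the delay constraint to absorb more variation, so the largest feasible lambda falls strictly between 0 and 1 and the resulting power lies between the two bounds.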
Slide 29 Spatial Correlations
We use a grid based correlation model.
The approach divides the design into a number of regions; gates in the same or nearby regions are highly correlated compared to gates in different regions.
A pre-processing step incorporates the correlation effects into the coefficients b_i and c_i in accordance with the locations of the gate and its fan-out gates in the chip.

Slide 30 VA-GS with Spatial Correlations
The spatial correlation of the process variations was handled by clustering the circuit into n smaller regions, similar to [Singh, DAC 05].
The variation aware gate sizing with spatial correlations had an extra 3% improvement due to reduced pessimism in modeling variations.

Slide 31 Monte-Carlo Simulation
The solution of the fuzzy technique is verified for timing yield using Monte-Carlo simulation.
We generated 10000 copies of each benchmark circuit with random gate delay coefficients and fixed gate sizes taken from the solution of the fuzzy approach.
The delay coefficients corresponding to gate length and oxide thickness were treated as random numbers within the nominal case to worst case range.
The timing yield is defined as the fraction of the random circuits whose delay is less than the Tspec value.
The proposed fuzzy approach indicates a timing yield of 99% for the ITC benchmark circuits.
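The Monte-Carlo yield check described above can be sketched as follows. This is a minimal illustration, not the flow used in the dissertation: the per-gate delay model a - b*s + c*load, the uniform sampling between the nominal and worst case, the single-path circuit and all names and numbers are assumptions.

```python
import random


def path_delay(sizes, loads, a, b_vals, c_vals):
    """Sum of per-gate delays under the ASSUMED model a - b*s + c*load."""
    return sum(a - b * s + c * l
               for s, l, b, c in zip(sizes, loads, b_vals, c_vals))


def timing_yield(sizes, loads, t_spec, a, b, g, c, h, samples=10000, seed=0):
    """Fraction of random circuit copies whose path delay meets t_spec."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(samples):
        # Gate sizes stay fixed; only the delay coefficients are random,
        # drawn between the nominal and the worst-case value for every gate.
        b_vals = [rng.uniform(b - g, b) for _ in sizes]
        c_vals = [rng.uniform(c, c + h) for _ in sizes]
        if path_delay(sizes, loads, a, b_vals, c_vals) <= t_spec:
            passed += 1
    return passed / samples


# Hypothetical two-gate path with fixed sizes from an earlier optimization.
example_yield = timing_yield([3.0, 3.0], [4.0, 4.0], t_spec=18.5,
                             a=10.0, b=2.0, g=0.5, c=1.0, h=0.25)
```

A sizing that meets Tspec at the worst-case corner always reports a yield of 1.0 in this sketch, while a sizing that only meets a Tspec below the nominal delay reports 0.0; the fuzzy solution is expected to land close to the former at a lower power cost.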
Slide 32 Outline of Presentation
Introduction, Motivation and Contributions; Variation Aware Gate Sizing; Variation Aware Timing based Placement (Timing Based Placement (TBP), Previous Works, Problem Formulation, Variation Aware Fuzzy TBP, Stochastic TBP, Experimental Results); Variation Aware Buffer Insertion and Driver Sizing; Dynamic Clock Stretching; Conclusions; Publications

Slide 33 Timing Based Placement (TBP)
Incremental placement for delay improvement is a crucial step in the post layout timing convergence flow.
TBP performs small changes to the cell locations, after wire length driven standard cell placement, with the objective of improving the worst negative slack.
Previous works on timing driven placement [Choi, ICCAD 03] have shown significant improvements (up to 20%) in worst negative slack.

Slide 34 Variation Aware TBP
The objective of timing based placement is to find optimal locations of cells in a critical sub-circuit such that the critical delay of the circuit is minimized.
The timing based placement technique requires a nonlinear programming approach, as net delay has a quadratic dependence on net length.
We propose two new solutions for variation aware timing based placement: (i) a fuzzy nonlinear programming based solution and (ii) a stochastic chance constrained programming based solution.

Slide 35 Previous Works on TBP
Timing driven placement can be categorized into net-based [Ren, ISPD 04] and path based [Chowdhary, DAC 05] approaches.
The net-based approach translates the timing requirements into sensitivity coefficients of timing critical nets and performs a weighted wire length minimization.
Hence, modeling the effects of process variations in these net-based approaches is not straightforward.
The path based approaches hold an accurate timing view and minimize critical path delay more directly by involving path delay constraints in the optimization problem.

Slide 36 Previous Works on TBP
A problem with the path based approaches is their high computational complexity due to the exponential number of paths in the circuit.
However, path based delay constraints can be transformed into node-based arrival time constraints [Chowdhary, DAC 05] to improve the feasibility of optimizing large circuits.
The transformation introduces a sub-optimality of only 1-2% [Mani, ICCD 04].
Hence, we model process variations in a node based timing based placement formulation to maximize yield, with delay and placement location constraints.

Slide 37 Taxonomy VA-TBP

Slide 38 Location Constraints and HPWL
The variables leftx, rightx, lowery and uppery are defined for every net.
For every cell at location (x, y) connected to the net, the following constraints are required: leftx <= x <= rightx and lowery <= y <= uppery.
The half perimeter wire length (HPWL) of this net is then given by (rightx - leftx) + (uppery - lowery).
[Figure: bounding box of a net with edges leftx, rightx, lowery and uppery]

Slide 39 Variation Aware Fuzzy TBP Outline
Step 1: Formulation of a linear model for gate delay and a nonlinear model for interconnect delay.
Step 2: Modeling process variation in the delay coefficients by treating them as triangular fuzzy numbers.
Step 3: Estimate the critical cells and calculate the move distance.
Step 4: Formulating and solving the NLP for TBP with the variation parameters set to the worst case and the typical case, which gives the bounds for the fuzzy formulation.
Step 5: The bound values generated above are used to convert the fuzzy formulation into a corresponding crisp formulation using symmetric relaxation.
Step 6: The crisp optimization problem is then solved through a commercial nonlinear optimization solver.

Slide 40 Step 1: Gate and Interconnect Delay Models
We model gate delay as a linear function of gate size (s_i) and capacitance (C_pi).
In timing based placement, the gate size (s_i) does not change and only the load seen by the gate changes, due to the change in interconnect length.
The interconnect delay is modeled as a quadratic function of the net length.
Hence, in this work, we model timing based placement as a nonlinear programming problem to maximize timing yield with delay and location constraints.

Slide 41 Step 2: Modeling Variations
Similar to variation aware gate sizing, we model the uncertainty in delay using the coefficients of the gate and interconnect delay equations.
The coefficients A_1, A_2 and K_D are assumed to vary and are modeled as triangular fuzzy numbers of the form (A_1, A_1 - VA_1, A_1 + VA_1), (A_2, A_2 - VA_2, A_2 + VA_2) and (K_D, K_D - VK_D, K_D + VK_D) respectively.

Slide 42 Step 3: Pre-Processing
The gates and interconnects in the most critical path and in the adjacency graph within three levels are considered for incremental timing based placement.
The allowed move distances of the critical cells are set in proportion to the criticality and the available free space of the cluster.

Slide 43 Step 4: Deterministic TBP Formulation
In the deterministic version of the incremental timing based placement problem, arr is the arrival time variable of gates and nets and Tspec is the required timing specification of the circuit.
The HPWL and location constraints are not shown here as they are not affected by process variations.
The problem is formulated to maximize the timing specification (a proxy for worst negative slack) with node based required arrival time constraints.
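The bounding-box variables behind HPWL and the quadratic dependence of net delay on net length can be sketched in a few lines. In the optimization itself leftx, rightx, lowery and uppery are decision variables tied to the cell locations by inequalities; here they are simply evaluated for fixed locations. The form k_d * length**2 for the interconnect delay is an assumption (the slide equation is not reproduced in this transcript), and the function names and values are hypothetical.

```python
def hpwl(cells):
    """Half perimeter wire length of one net from its cell (x, y) locations."""
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    leftx, rightx = min(xs), max(xs)
    lowery, uppery = min(ys), max(ys)
    return (rightx - leftx) + (uppery - lowery)


def net_delay(length, k_d):
    """ASSUMED quadratic form: delay = k_d * length**2."""
    return k_d * length ** 2


# Hypothetical three-pin net before and after a small incremental move.
before = [(0.0, 0.0), (3.0, 4.0), (1.0, 2.0)]
after = [(0.0, 0.0), (2.0, 3.0), (1.0, 2.0)]

delay_before = net_delay(hpwl(before), k_d=0.5)
delay_after = net_delay(hpwl(after), k_d=0.5)
```

Because the delay grows with the square of the length, even a small reduction of the bounding box of a critical net yields a disproportionate delay gain, which is why the placement moves are restricted to critical cells.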
Slide 44 Step 4: TBP at Nominal and Worst Case Corner
The deterministic TBP problem is solved with the gate and net delays set to worst case values (wc_tbp).
Next, the deterministic TBP problem is also solved with the delays set to the nominal case (nc_tbp).
The solutions to these optimizations represent the lower and upper bound values for the variation aware incremental timing based placement problem.

Slide 45 Step 5: Crisp TBP Formulation
Using these bound values from the pre-processing step and a variation parameter (lambda), the uncertain nonlinear programming problem is converted to a crisp nonlinear problem.
The problem aims to maximize the variation resistance (lambda) and maintains the timing specification between the bound values (wc_tbp and nc_tbp).

Slide 46 Stochastic Timing Based Placement
The stochastic formulation is cast as a robust mathematical program, which captures variation effects on the constraints using the mean and variance of the uncertain parameters.
The stochastic chance constrained programming technique models uncertainty in delay using probabilistic constraints.

Slide 47 Probabilistic Constraints
The uncertain arrival time constraints are modeled as probabilistic constraints, where the probability with which each constraint has to be met corresponds to the timing yield of the circuit.
The probabilistic constraints are relaxed to an equivalent formulation in terms of the mean, the cumulative distribution and the standard deviation.

Slide 48 Stochastic TBP
In the resultant stochastic TBP problem, the standard deviation of the uncertain delay is scaled by the inverse cdf value of the distribution.
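The relaxation of a probabilistic constraint into a deterministic one can be illustrated with a small sketch. It assumes, for illustration only, that the uncertain delay is normally distributed, in which case requiring the constraint to hold with probability yield_target is equivalent to bounding the mean plus a multiple of the standard deviation. The function name and numbers are hypothetical; `statistics.NormalDist` is part of the Python standard library.

```python
from statistics import NormalDist


def chance_constrained_bound(mean_delay, std_delay, yield_target):
    """Deterministic equivalent of P(delay <= bound) >= yield_target.

    ASSUMES a normally distributed delay; the multiplier on the standard
    deviation is the inverse cdf of the standard normal distribution.
    """
    k = NormalDist().inv_cdf(yield_target)
    return mean_delay + k * std_delay


# Hypothetical delay with mean 10 and standard deviation 2.
bound_90 = chance_constrained_bound(10.0, 2.0, 0.90)
bound_99 = chance_constrained_bound(10.0, 2.0, 0.99)
```

A higher yield target gives a larger multiplier and therefore a more conservative arrival time constraint, which is the source of the pessimism that the fuzzy formulation tries to reduce.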
In accordance with previous works [Prekopa, Kluwer 95], an inverse cdf value of 3 is used for a timing yield of 99.7%.

Slide 49 Step 6: VA-TBP Simulation
VA-TBP was tested on ITC99 benchmark circuits.
The KNITRO solver available through NEOS is used for both formulations, which are described in AMPL format.

Slide 50 Experimental Results
The variation aware fuzzy placement approach provides an average improvement of 12% compared to DWC, and the stochastic placement methodology provides an improvement of 10% compared to DWC.

Slide 51 Outline of Presentation
Introduction, Motivation and Contributions; Variation Aware Gate Sizing; Variation Aware Timing based Placement; Variation Aware Buffer Insertion and Driver Sizing (Buffer Insertion and Driver Sizing (BIDS), Deterministic BIDS, Logic Level BIDS, Logic versus Layout Level BIDS, Experimental Results); Dynamic Clock Stretching; Conclusions; Publications

Slide 52 Buffer Insertion and Driver Sizing (BIDS)
The impact of interconnect driven performance optimization is increasing in the nanometer era.
In prior buffer insertion techniques, wires are divided into smaller segments, which brings the wire delay to almost linear in terms of its length.
It has also been pointed out in [Saxena, TCAD 04] that 35% of the total standard logic cells in a circuit will be buffers at the 65nm technology level.
Further, several works have pointed out that buffer insertion coupled with driver sizing, in the optimization phase, can reduce the number of buffers inserted.

Slide 53 Previous Works: Buffer Insertion
Buffer insertion techniques:
Net-based: net-based ordering mechanisms may lead to sub-optimal over-buffering due to a lack of global view.
Path based: the path based buffer insertion algorithms abstract a whole path as a net and often result in over-buffering in the paths that are considered first.
Network-based: network-based buffer insertion techniques consider a whole circuit as input and insert buffers with a global view.

Slide 54 Logic Level Variation Aware BIDS
We formulate the buffer insertion and driver sizing problem at the logic level as a piece-wise linear program with variations modeled as fuzzy numbers.
Piece-wise linear constraints are used for modeling buffer insertion when multiple buffers are to be inserted in a net segment.
A look-up table based approximation is used for net length modeling at the logic level.
The number of buffers and the gate sizes are used as a proxy for dynamic power consumption during BIDS.

Slide 55 Gate and Interconnect Delay Models
Similar to several LP based sizing works, we model gate delay as a function of gate size (s_i) and downstream capacitance (C_load).
The interconnect delay, on the other hand, can only be modeled as a quadratic function of the net length.
To reduce complexity, we model the BIDS problem with piece-wise linear delay constraints.

Slide 56 Logic Level Net Length Estimation
Accurate modeling of the interconnect length at the logic level is crucial to optimization at this level.
In this work, we estimate wire length using a fast and accurate look-up table based estimation.
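A look-up table of this kind, addressed by circuit size and fan-out count and filled with average layout-level net lengths, might be built along the following lines. This is a minimal sketch and not the dissertation's implementation: the gate-count bins, the data layout and every function name are hypothetical, and only the clamping of fan-out counts at 20 and the two-level averaging follow the description in this section.

```python
from collections import defaultdict

MAX_FANOUT = 20  # nets with a larger fan-out are treated as fan-out 20


def bin_index(gate_count, gate_bins):
    """Index of the first bin upper bound that covers gate_count."""
    for i, upper in enumerate(gate_bins):
        if gate_count <= upper:
            return i
    return len(gate_bins) - 1


def build_table(samples, gate_bins):
    """samples: (gate_count, fanout, net_length) from placed-and-routed designs."""
    sums = defaultdict(lambda: [0.0, 0])
    for gate_count, fanout, length in samples:
        key = (bin_index(gate_count, gate_bins), min(fanout, MAX_FANOUT))
        sums[key][0] += length
        sums[key][1] += 1
    # Average net length per (gate-count bin, fan-out count) address.
    return {key: total / n for key, (total, n) in sums.items()}


def estimate_length(table, gate_count, fanout, gate_bins):
    """Look up the average net length; raises KeyError for unseen addresses."""
    return table[(bin_index(gate_count, gate_bins), min(fanout, MAX_FANOUT))]


# Hypothetical layout-level samples and bins (upper bounds on gate count).
bins = [1000, 10000]
samples = [(600, 2, 10.0), (800, 2, 14.0), (5000, 25, 100.0), (6000, 20, 120.0)]
table = build_table(samples, bins)
```

At the logic level the optimizer would then call `estimate_length` for each net instead of running placement and routing, trading some accuracy for speed.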
Previous works have used Rent's rule to derive upper bounds for interconnection lengths.
Rent's rule, however, does not hold true at all levels of the partition hierarchy in the nanometer era.
Hence, we use a table based methodology with the number of cells/interconnects and the fan-out count of each cell as the address for look-up.

Slide 57 Logic Level Net Length Estimation
The look-up table is created with layout-level wire length results of sample benchmark circuits.
MCNC benchmark circuits with gate complexity ranging from 500 to 10000 gates were used for estimation.
Interconnects with the same fan-out count are grouped and the average net length for each fan-out count is calculated.
For each fan-out count, nets are averaged again based on gate count in the second dimension.
A maximum fan-out of 20 is assumed, and all nets with a fan-out count of more than 20 are treated as having a fan-out of 20.

Slide 58 Deterministic-BIDS
The BIDS problem is formulated to minimize buffer and gate cost with piece-wise required time constraints.

Slide 59 Modeling Variations
The variations in delay are modeled as a function of the coefficients in the required time constraints.
The coefficients cb1, cb2 and cg1 are modeled as fuzzy numbers with worst case values (cb1-vb1), (cb2-vb2) and (cg1-vg1), where vb1, vb2 and vg1 are the maximum variation values.

Slide 60 Pre-Processing (Deterministic Optimizations)
The fuzzy mechanism starts with a set of pre-processing optimization steps with the varying parameters set to worst case and nominal case values.
Here, we model uncertainty due to process variations as an imprecision in the delay improvement due to BIDS.
The coefficients cg1, cb1 and cb2 are modeled as triangular fuzzy numbers.
The deterministic problem is solved with the variation coefficients set to cg1-vg1, cb1-vb1 and cb2-vb2 (Obj_wc).
The deterministic problem is also solved with cg1, cb1 and cb2 set to their mean values (Obj_nc).

Slide 61 Conversion to Crisp Formulation
Obj_wc and Obj_nc from the deterministic-BIDS are the worst case and nominal case objective values.
With these pre-processed objective (Obj) values and a variation resistance parameter (lambda), the fuzzy problem is converted to a crisp problem.

Slide 62 Experimental Setup
The simulation flow for the fuzzy-BIDS is shown in the figure.
Fuzzy-BIDS was tested on ITC 99 benchmark circuits mapped to a user defined technology library.
The models were written in the AMPL mathematical programming language format and solved with the KNITRO interior point nonlinear optimization solver.

Slide 63 Experimental Results
The variation aware logic level fuzzy-BIDS approach provides an average improvement of 35% in the number of buffers and the gate cost required to meet performance and yield targets.

Slide 64 Layout Level Variation Aware BIDS
The variation aware buffer insertion at the layout level is formulated to optimize variation resistance with delay and cost (number of buffers and gate sizes) as constraints.
The layout level buffer insertion, however, restricts the candidate buffer locations to avoid repeating the place and route step.
The generation of candidate buffer locations is performed by dividing the routed wires into channels.
Sparse channels were preferred as candidates over denser ones.
An incremental legalization step is performed after the layout level buffer insertion to remove overlaps.

Slide 65 Layout Level BIDS Simulation
The benchmark circuits for layout level BIDS were placed and routed using the Cadence Encounter design tool to estimate actual wire lengths.
Similar to the logic level simulations, the layout level AMPL models were solved with the KNITRO nonlinear programming (NLP) solver.
The AMPL models were rebuilt for the layout level with worst-case, nominal-case and fuzzy modeling.

Slide 66 Logic Level versus Layout Level BIDS
The cost function (number of buffers plus gate size increments) comparing logic and layout level BIDS for various benchmarks is shown in the figure.
The average difference (among all benchmarks) in buffer plus gate cost between logic and layout level simulations is within 10%.

Slide 67 Outline of Presentation
Introduction, Motivation and Contributions; Variation Aware Gate Sizing; Variation Aware Timing based Placement; Variation Aware Buffer Insertion and Driver Sizing; Dynamic Clock Stretching (Motivation, Related Work, Proposed Methodology, Example Evaluation, Experimental Results); Conclusions; Publications

Slide 68 Dynamic Clock Stretching: Motivation
Statistical optimization methods (fuzzy, stochastic) have been effective in improving the yield/cost tradeoffs for circuits in the nanometer era.
However, statistical design methods over-consume power/delay even in the absence of variations.
Hence, solutions which can dynamically detect delay due to variations and perform corrective/preventive action are becoming necessary.
Here, we propose a dynamic delay detection and clock stretching technique to prevent timing violations.

Slide 69
The methodology uses a shadow latch to capture delayed transitions and generates an error signal,
which is sent to the voltage controller.
Based on current timing failures, the technique prevents them from happening in later cycles.
Related Work: Dynamic Error Correction using Adaptive Voltage Scaling (RAZOR: Dan Ernst et al., MICRO 2003)

Slide 70 Related Work: Critical Path Isolation Based Variation Tolerance [Ghosh, ICCAD 2003]
The methodology isolates critical paths and evaluates the data in two cycles whenever critical paths are activated.
It works well on special designs with few critical paths, but incurs delay overhead on random designs.

Slide 71 Related Work: Other Methods
Adaptive voltage scaling based on critical path duplication [Burd, ISSCC 2000].
Clock phase adjustment based on a dynamic delay buffer cell [Semiao, DDECS 2008].
The dynamic delay buffer and critical path duplication methods do not consider spatial correlation.
The dynamic delay buffer design considers variations in process parameters and ignores temperature and voltage variations.

Slide 72 Basic Idea
Irrespective of the type of variation occurring (P, V or T), we would like to investigate solutions at the circuit level to combat variations with significantly lower overheads.
Identify and capture the delay due to process variations early in the clock period.
Employ a delay detection circuit to identify if a transition is delayed in the critical paths.
Delay the clock (or select a delayed clock) in the event that the arrival of a signal is delayed due to process variations.

Slide 73 Proposed Work: Dynamic Clock Stretching for Variation Tolerance

Slide 74 Proposed Work: Discussion
An important pre-processing step is the identification of critical locations (interconnects) halfway along the critical path.
In the presence (absence) of variations, the transitions at these locations occur after (before) the negative edge of the clock.
The positive level triggered latch shown in the figure captures the value on the critical interconnect during the positive level of the clock.
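The detection rule described above (a transition on the halfway interconnect that arrives after the negative clock edge signals a delayed path) can be expressed as a small behavioral model. This is a sketch only: it ignores setup and hold times and latch delays, treats the amount of stretching as a free parameter, and all function and parameter names are hypothetical.

```python
def select_clock(midpoint_arrival, clock_period, stretch_fraction):
    """Capture time chosen for the destination flip-flop.

    midpoint_arrival: time, measured from the launching clock edge, at which
    the critical interconnect halfway along the critical path switches.
    """
    negative_edge = clock_period / 2
    # The positive level latch only sees transitions before the negative edge.
    delayed = midpoint_arrival > negative_edge
    if delayed:
        return clock_period * (1 + stretch_fraction)  # delayed clock selected
    return clock_period  # normal clock selected


def captures_correctly(path_delay, midpoint_arrival, clock_period, stretch_fraction):
    """True if the full path delay fits within the selected capture time."""
    return path_delay <= select_clock(midpoint_arrival, clock_period, stretch_fraction)


# Hypothetical path that is 5% too slow for a unit clock period.
ok_with_stretch = captures_correctly(1.05, 0.55, 1.0, stretch_fraction=0.10)
ok_without_detection = captures_correctly(1.05, 0.40, 1.0, stretch_fraction=0.10)
```

The second call shows the limitation of sampling halfway: a path whose slowdown only appears in its second half is not detected by this simple rule, which is one reason the choice of critical interconnects matters.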
Slide 75 Proposed Work: Discussion
If the transition is delayed due to process variations, then the inputs to the XOR gate will be different.
The multiplexer selects the normal (undelayed) or delayed clock for the destination flip-flop based on the value of the XOR gate output.
In the proposed approach, the delayed clock can be dynamically selected in case the signal propagation is delayed in the data path due to process variations.

Slide 76 Proposed Work: Discussion
The delay detection and clock stretching logic (CSL) is added to the critical and near critical paths that can potentially have timing failures due to process variations.
Unlike voltage or frequency scaling, the proposed methodology can provide immediate activation and enable prevention of timing failures.
Since the detection circuit monitors data transitions on critical interconnects, the methodology is independent of the type of variation (process, voltage or temperature).

Slide 77 Example Circuit Evaluation
A chain of inverters between two flip-flop stages is chosen as the example circuit.
In this circuit, all interconnects in the path switch, making the net halfway along the path the necessary critical interconnect.
Next, we show the simulation snapshots for the example circuit.

Slide 78 Simulation Snapshot of Example Circuit

Slide 79 Simulation Snapshot of Example Circuit

Slide 80 Simulation Snapshot of Example Circuit

Slide 81 Short Paths and Pipelined Critical Paths
In the context of clock stretching, the issue of short paths and consecutively pipelined critical paths has to be addressed.
In nanometer designs, short paths are usually rare because of the multiple objectives of power, performance and yield. In addition, this work uses only a small margin for clock stretching (approximately 10%), which minimizes the possibility of short path failures. Second, in pipelined circuits, if a critical path is followed by another critical path in the next pipeline stage, the CSL methodology can cause timing failures, because the delayed clock reduces the data capture time available in the subsequent pipeline stage.

Slide 82 Monte-Carlo Flow for Timing Yield Estimation The simulation flow for timing yield estimation is shown in the figure. A simple C program was developed to estimate timing yield from the place-and-route and timing analysis reports.

Slide 83 Timing Yield Results on Benchmarks The number of critical paths can be reduced by incremental sizing/placement to reduce the CSL overhead.

Slide 84 Clock Stretch Range Versus Timing Yield The graph shows the impact of clock stretching on timing yield. A 10% stretch is a good choice when balancing high timing yield against short path failures.

Slide 85 Conclusions In this research, we have proposed solutions for improving timing yield considering variations without significant over-design. Fuzzy modeling is shown to effectively model variations in linear, nonlinear and piecewise linear circuit optimization problems. Hence, the algorithms and circuit optimization methods proposed in this dissertation represent significant additions to VLSI CAD tools in the context of variation-aware design.

Slide 86 Conclusions The proposed circuit-level technique can be used to dynamically detect signal delays caused by variations and stretch the clock to add the required extra slack. This method takes a different approach from previous works and is expected to have a significant impact in industry.
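The Monte-Carlo timing yield estimation and the stretch-range trade-off discussed on the earlier slides can be illustrated with a minimal sketch. This is not the C program mentioned in the presentation. The function name `timing_yield`, the Gaussian delay model, the `sigma_frac` value and the example path delays are all assumptions chosen for illustration, and the sketch ignores short-path failures and assumes CSL is available on every listed path.

```python
import random


def timing_yield(path_delays, period, stretch=0.0,
                 sigma_frac=0.05, trials=10000, seed=0):
    """Estimate timing yield as the fraction of sampled chips that meet timing.

    path_delays: nominal delays of the critical paths (hypothetical values).
    stretch:     clock stretch margin as a fraction of the period.
    sigma_frac:  assumed relative standard deviation of each path delay.
    """
    rng = random.Random(seed)
    limit = period * (1.0 + stretch)  # latest allowed arrival with stretching
    passed = 0
    for _ in range(trials):
        # Draw every path delay first so the random sequence does not depend
        # on the limit; this keeps runs with different stretch values comparable.
        samples = [d * (1.0 + rng.gauss(0.0, sigma_frac)) for d in path_delays]
        if max(samples) <= limit:
            passed += 1
    return passed / trials


if __name__ == "__main__":
    paths = [9.0, 9.5, 9.8]  # hypothetical nominal delays, period = 10.0
    for stretch in (0.0, 0.05, 0.10):
        print(stretch, timing_yield(paths, 10.0, stretch))
```

Sweeping `stretch` in this way gives the kind of stretch-range versus yield curve the presentation refers to, although the actual numbers depend entirely on the assumed delay model.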
Slide 87 Acknowledgements Semiconductor Research Corporation contract 2007-HJ-1596. NSF Computing Research Infrastructure grant CNS-0551621.

Slide 88 Publications
1. V. Mahalingam, N. Ranganathan and J.E. Harlow, Fuzzy Optimization Approach for Gate Sizing in the Presence of Process Variations, IEEE Transactions on VLSI Systems, 16(8), Pages 975-984, Aug 2008
2. V. Mahalingam and N. Ranganathan, Timing Based Placement Considering Uncertainty due to Process Variations, Accepted for Publication (Feb 2009) in IEEE Transactions on VLSI Systems
3. V. Mahalingam and N. Ranganathan, Improving Accuracy in Mitchell's Logarithmic Multiplication using Operand Decomposition, IEEE Transactions on Computers, 55(12), Pages 1523-1535, Dec 2006
4. V. Mahalingam, K. Bhattacharya, N. Ranganathan, H. Chakravarthula, R. Murphy and K. Pratt, An Efficient VLSI Architecture for Accurate Computation of Lucas-Kanade based Optical Flow, Accepted for Publication (Sep 2008) in IEEE Transactions on VLSI Systems
5. N. Ranganathan, U. Gupta and V. Mahalingam, Simultaneous Optimization of Total Power, Crosstalk Noise, and Delay Under Uncertainty, Great Lakes Symposium on VLSI (GLSVLSI), Pages 171-176, May 2008
6. V. Mahalingam and N. Ranganathan, A Fuzzy Optimization Approach for Process Variation Aware Buffer Insertion and Driver Sizing, IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Pages 329-334, Apr 2008
7. V. Mahalingam and N. Ranganathan, Variation Aware Timing based Placement using Fuzzy Programming, IEEE International Symposium on Quality Electronic Design (ISQED), Pages 327-332, Mar 2007
8. V. Mahalingam, N. Ranganathan and J.E. Harlow, A Novel Approach for Variation Aware Power Minimization during Gate Sizing, IEEE International Symposium on Low Power Electronics and Design (ISLPED), Pages 174-179, Oct 2006
9. V. Mahalingam and N. Ranganathan, Variation Aware Circuit-Wise Buffer Insertion and Driver Sizing at the Logic Level, Submitted to Design Automation Conference (DAC), 2009
10. V. Mahalingam, N. Ranganathan, N. Ahmed and H. Towfique, A Variation Aware Circuit Design using Dynamic Clock Stretching, Submitted to IEEE International Symposium on Low Power Electronics and Design (ISLPED), 2009
