Lecture 10: Circuit Families. CMOS VLSI Design 4th Ed.
Outline...
I = C dV/dt -> tpd ∝ (C/I) ΔV
• low capacitance
• high current
• small swing
pMOS are the enemy!
Can we take the pMOS capacitance off the input?
Various circuit families try to do this…

Pseudo-nMOS
• Instead, use a pull-up transistor that is always ON
• In CMOS, use a pMOS that is always ON
• Ratio issue

Pseudo-NMOS VTC

Pseudo-nMOS Design
• Gates are sized to compare with a unit inverter.
• pMOS fights nMOS
• Iout = 4I/3 - I/3 = I

Pseudo-nMOS Design
Ex: Design a k-input AND gate using pseudo-nMOS. Estimate the delay driving a fanout of H.
• G = 1 × 8/9 = 8/9
• N = 2

• The power dissipated while the pMOS fights the nMOS is called static power: P = IDD VDD
• A few mA / gate × 1M gates would be a problem
• Explains why nMOS went extinct
• Use pseudo-nMOS sparingly for wide NORs
• Turn off pMOS when not in use

The chip contains a 32 word x 48 bit ROM.
• Uses pseudo-nMOS decoder and bitline pullups
• On average, one wordline and 24 bitlines are high
• Find static power drawn by the ROM
• Ion-p = 36 µA, VDD = 1.0 V
Solution: count the pullups whose outputs are low (on average 31 wordlines and 24 bitlines) and multiply by Ion-p VDD.

Pseudo-NMOS Design
• Pseudo-nMOS gates will not operate correctly if VOL > VIL of the driven gate.
• This is most likely in the SF corner.
• Conservative design requires an extra weak pMOS.
• Another choice is to use replica biasing. The idea comes from analog design.
• Replica biasing allows a 1/3 current ratio rather than the conservative 1/4 ratio used earlier.
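The ROM static power example above can be worked as a short calculation. This is a minimal sketch, assuming that each pseudo-nMOS pullup conducts Ion-p only while its output is low and that the on-current is 36 µA per pullup; the function name `rom_static_power` and its parameters are hypothetical, not part of the lecture.

```python
def rom_static_power(words, bits, wordlines_high, bitlines_high, i_on_p, vdd):
    """Estimate static power of a ROM with pseudo-nMOS decoder and bitline pullups.

    Assumption: a pullup draws i_on_p only when its output is low,
    because that is when the always-on pMOS fights the nMOS pulldown.
    """
    low_wordlines = words - wordlines_high
    low_bitlines = bits - bitlines_high
    conducting_pullups = low_wordlines + low_bitlines
    return conducting_pullups * i_on_p * vdd


# 32 words x 48 bits, one wordline and 24 bitlines high on average.
# i_on_p = 36e-6 A is an assumed value (36 microamps), vdd = 1.0 V.
power_w = rom_static_power(32, 48, 1, 24, 36e-6, 1.0)
print(f"{power_w * 1e3:.2f} mW")  # prints "1.98 mW"
```

With these numbers, 31 + 24 = 55 pullups conduct, giving about 1.98 mW.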
Replica Biasing

Ganged CMOS

Improved Loads
EE141

DCVSL Transient Response

Example: AND Gate

NMOS-Only Logic
• Restorer adds capacitance, takes away pull down current at X
• Ratio problem

Restorer Sizing

• Use weak pMOS feedback to pull fully high
• Ratio constraint

• Avoids need for ratioed feedback
• Optional cross-coupling for rail-to-rail swing

Alternative CPL

Transmission Gate

CMOS + Transmission Gates:

Transmission Gate XOR

• EEPL (Energy Economized Pass Transistor Logic)
• PPL (Push-Pull Pass Transistor Logic)
• SRPL (Swing Restored Pass Transistor Logic)
• DCVSPG (Differential Cascode Voltage Switch with Pass Gate Logic)

• Benefits over static CMOS were small or negative
• No longer generally used
• However, pass transistors still have a niche in special circuits such as memories, where they offer small size and the threshold drops can be managed

Single Clock 2-Phase System

Shift Register
When φ = 1, data move through the first transmission gate to the inverter.

Charge Leakage

Charge Sharing

Dynamic CMOS
• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low resistance path.
   • fan-in of n requires 2n (n N-type + n P-type) devices
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes.
   • requires only n + 2 (n+1 N-type + 1 P-type) transistors

Conditions on Output
• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
• Inputs to the gate can make at most one transition during evaluation.
• Output can be in the high impedance state during and after evaluation (PDN off); state is stored on CL.
This behavior is fundamentally different from the static counterpart, which always has a low resistance path between the output and one of the power rails.

• Logic function is implemented by the PDN only
   • number of transistors is N + 2 (versus 2N for static complementary CMOS)
• Full swing outputs (VOL = GND and VOH = VDD)
• Non-ratioed: sizing of the devices does not affect the logic levels
• Faster switching speeds
   • reduced load capacitance due to lower input capacitance (Cin)
   • reduced load capacitance due to smaller output loading (Cout)
   • no Isc, so all the current provided by PDN goes into discharging CL

• Overall power dissipation usually higher than static CMOS
   • no static current path ever exists between VDD and GND (including Psc)
   • no glitching
   • extra load on Clk
• PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are equal to VTn
   • low noise margin (NML)
• Needs a precharge/evaluate clock

Two modes: precharge and evaluate

Use series evaluation transistor to prevent fight.

[Figure: dynamic gate schematic; labels Clk, A, Out, Mp, Me, CL]
• Leakage sources are the reverse-biased diode and the subthreshold leakage of the NMOS pulldown device.
• Charge stored on CL will leak away with time (input in low state during evaluation).
• Requires a minimum clock rate, so not good for low performance products such as watches (or when clocks are conditional).
• PMOS precharge device also contributes some leakage due to the reverse-biased diode and subthreshold conduction, which to some extent offsets the leakage due to the pull down paths.
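The two modes of a dynamic gate described above (precharge and evaluate) can be illustrated with a small behavioral model. This is only a sketch under simplifying assumptions: it is purely logical, ignores leakage, charge sharing and timing, and models a footed gate so that nothing discharges during precharge. The class `DynamicGate` and its method names are hypothetical.

```python
class DynamicGate:
    """Logic-level model of a footed dynamic gate (illustrative only)."""

    def __init__(self, pulldown):
        # pulldown(*inputs) returns a truthy value when the PDN conducts.
        self.pulldown = pulldown
        self.out = 1

    def precharge(self):
        # Clk = 0: Mp is on and Me is off, so Out is charged to VDD.
        self.out = 1
        return self.out

    def evaluate(self, *inputs):
        # Clk = 1: Mp is off and Me is on. Out can only fall, never rise;
        # if the PDN is off, the value is simply held on CL.
        if self.pulldown(*inputs):
            self.out = 0
        return self.out


# A 2-input dynamic NAND: the PDN conducts when both inputs are high.
nand2 = DynamicGate(lambda a, b: a and b)
nand2.precharge()
print(nand2.evaluate(0, 1))  # 1: PDN off, output held high
print(nand2.evaluate(1, 1))  # 0: PDN on, output discharged
print(nand2.evaluate(0, 0))  # 0: cannot recharge until the next precharge
```

The last call shows the condition on the output: once discharged, the node stays low for the rest of the evaluation phase, which is why inputs may make at most one transition during evaluation.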
Keeper
• During precharge, Out is VDD and inverter out is GND, so keeper is on.
• During evaluation, if PDN is off, the keeper compensates for charge drained by leakage. If PDN is on, there is a fight between the PDN and the PUN; the circuit is ratioed so that the PDN wins, eventually.
• Note Psc during the switching period when PDN and keeper are both on simultaneously.

[Figure: dynamic gate schematic; labels Clk, A, Out, Mp, Me, CL, CA, CB]
Charge stored originally on CL is redistributed (shared) over CL and CA, leading to reduced robustness.

Out = A xor B xor C
What is the worst case change in voltage on node Out? Assume all inputs are low during precharge and all internal capacitances are initially at 0 V.
Worst case is obtained by exposing the maximum amount of internal capacitance to the output node during evaluation. This happens when !A B C or A !B C.
ΔVout = 30/(30+50) × 2.5 V = 0.94 V, so the output drops to 2.5 - 0.94 = 1.56 V.

[Figure: dynamic gate schematic; labels Clk, Me, Mp, A, B, Out, Mkp]
Precharge internal nodes using a clock-driven transistor (at the cost of increased area and power).

Due to capacitive backgate coupling between the internal and output node of the static gate and the output of the dynamic gate, Out1 voltage reduces.

[Figure: dynamic gate schematic; labels Clk, A, B, Out, Mp, Me, CL]
• Coupling between Out and the Clk input of the precharge device is due to the gate to drain capacitance, so the voltage of Out can rise above VDD. The fast rising (and falling) edges of the clock couple to Out.
• Danger is that signal levels can rise enough above VDD that the normally reverse-biased junction diodes become forward-biased, causing electrons to be injected into the substrate.

[Figure: waveforms of Clk, In, Out1, Out2 versus V; VTn marked]
Out2 should remain at VDD since Out1 transitions to 0 during evaluation. However, since there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.
The second dynamic inverter turns off (PDN) when Out1 reaches VTn. Setting all inputs of the second gate to 0 during precharge will fix it. Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period.

Input transitions during evaluation: 0 -> 0, 0 -> 1, 1 -> 1

Illegal for one dynamic gate to drive another!

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period. Hence, the only possible transition during evaluation is 0 -> 1.

Produces monotonic outputs

Domino Optimizations
• Each domino gate triggers the next one, like a string of dominos toppling over
• Gates evaluate sequentially but precharge in parallel
• Thus evaluation is more critical than precharge
• HI-skewed static stages can perform logic

• Domino computes AND, OR but not NAND, NOR, or XOR
• Dual-rail domino solves this problem
   • Takes true and complementary inputs
   • Produces true and complementary outputs

sig_h   sig_l   Meaning
0       0       Precharged
0       1       '0'
1       0       '1'
1       1       invalid

np-CMOS

NORA Logic

NP Domino

Zipper CMOS
• The NP-Domino or NORA logic is very susceptible to noise and leakage.
• Zipper Domino has the same structure, but the precharge transistors are left slightly ON during evaluation.
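The dual-rail encoding table above can be sketched in code, together with an XOR built only from AND/OR operations on the rails. This is an illustrative logic-level model, not a circuit description; the function names `decode_dual_rail` and `dual_rail_xor` are hypothetical.

```python
def decode_dual_rail(sig_h, sig_l):
    """Map a (sig_h, sig_l) pair to its meaning in the dual-rail encoding."""
    meanings = {(0, 0): "precharged", (0, 1): "0", (1, 0): "1"}
    try:
        return meanings[(sig_h, sig_l)]
    except KeyError:
        raise ValueError(f"invalid dual-rail pair: {(sig_h, sig_l)}") from None


def dual_rail_xor(a, b):
    """XOR on dual-rail values, each given as a (sig_h, sig_l) tuple.

    Both output rails use only AND/OR of input rails, so each rail can
    only rise during evaluation and monotonicity is preserved.
    """
    a_h, a_l = a
    b_h, b_l = b
    y_h = (a_h & b_l) | (a_l & b_h)  # true rail: inputs differ
    y_l = (a_h & b_h) | (a_l & b_l)  # complement rail: inputs equal
    return y_h, y_l


one, zero, precharged = (1, 0), (0, 1), (0, 0)
print(decode_dual_rail(*dual_rail_xor(one, zero)))              # 1
print(decode_dual_rail(*dual_rail_xor(one, one)))               # 0
print(decode_dual_rail(*dual_rail_xor(precharged, precharged))) # precharged
```

When both inputs are still precharged, both output rails stay low, so the output also reads as precharged until the inputs evaluate.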
• Transistors are leaky (IOFF ≠ 0)
• Dynamic value will leak away over time
• Formerly milliseconds, now nanoseconds
• The keeper must be weak enough not to fight evaluation

• Typically need to precharge every other node
• Big load capacitance CY helps as well

• Inputs: VIH ≈ Vtn
• Noise sources
   • Capacitive crosstalk
   • Charge sharing

• Output evaluates and precharges
• Output rises and falls on half the cycles
• Clocked transistors have α = 1
• For a 2-input NAND with equally likely inputs, αCMOS = 3/16 and αDynamic = 1/4
• Leads to very high power consumption
• However, glitching does not occur in dynamic logic, and the load capacitances are lower.

Completion Detection

Keepers

Conventional Keeper

Weak Keepers

Differential Keeper

Burn-in Conditional Keeper

Adaptive Keeper

Leakage Current Replica Keeper

Footed and Footless Domino

8-input Domino AND

MODL
It is often necessary to compute multiple functions where one is a subfunction of the other or shares a subfunction. One very typical example is the carry in addition:

MODL Carry Chains

• Domino logic is 1.3 to 2x faster than static CMOS
• But many challenges: monotonicity, leakage, charge sharing, noise
• Widely used in high-performance microprocessors in the 1990s, when speed was king
• Largely displaced by static CMOS now that power is the limiter
• Still used in memories for area efficiency

2-input MUX

Which Logic Style?
• Ease of design
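The activity factors quoted above for a NAND gate can be checked by enumerating input combinations. This is a minimal sketch, assuming independent, equally likely inputs, taking the static activity factor as the probability of a 0 -> 1 output transition between two independent cycles, and taking the dynamic activity factor as the probability that the precharged output was discharged and must be recharged. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def nand(bits):
    """Logical NAND of a tuple of 0/1 inputs."""
    return 0 if all(bits) else 1


def activity_factors(n_inputs):
    """Return (alpha_static, alpha_dynamic) for an n-input NAND gate."""
    outputs = [nand(bits) for bits in product((0, 1), repeat=n_inputs)]
    p_one = Fraction(sum(outputs), len(outputs))
    p_zero = 1 - p_one
    alpha_static = p_zero * p_one  # output was 0, then becomes 1
    alpha_dynamic = p_zero         # output discharged, so it is recharged
    return alpha_static, alpha_dynamic


print(activity_factors(2))  # (Fraction(3, 16), Fraction(1, 4))
print(activity_factors(4))  # (Fraction(15, 256), Fraction(1, 16))
```

Under these assumptions the values 3/16 and 1/4 correspond to a 2-input NAND; a 4-input NAND gives 15/256 and 1/16. In both cases the dynamic gate switches more often than its static counterpart.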
Circuit Pitfalls
• Threshold drops
• Ratio failures
• Charge sharing

Threshold Drops

Ratio Failures

Power Supply Noise

Hot Spots
• Caused by nonuniform power dissipation, even when the overall power consumption is within budget.
• Cause variation in delay between gates.
• Full-chip temperature simulation is required.

Minority Carrier Injection
• Sometimes, a node voltage can momentarily exceed the power supply voltages.
• Then, the drain-body junction becomes forward biased.
• Noise tools can identify potential problems.

Diffusion Input Noise Sensitivity
• Standard cell latches should be built with buffered inputs.
• In data paths, one can still use exposed diffusion inputs, since the surrounding structure is known.

Domino Noise Budgets

Silicon-on-Insulator Circuit Design
• Potential for higher performance and lower power consumption.
• Higher manufacturing cost and more complicated circuit design due to unusual transistor behavior.
• There is no bulk; the transistors sit on an insulator.
• The body is floating, so Vt changes.

SOI Inverter Cross Section

SOI Process Electron Micrograph

SOI Circuit Design
• Partially Depleted (PD)
• Fully Depleted (FD)
• In FD SOI, the body is thinner than the channel depletion width, so the body charge is fixed. Thus, the body voltage does not change.
• In PD SOI, the body is thicker and its voltage can vary depending on how much charge is present. This varying body voltage changes Vt.
• FD SOI is difficult to manufacture.

Charge Paths
There are two paths through which charge can build up in the body:
• Leakage through the reverse-biased drain-to-body (Ddb) and possibly source-to-body (Dsb) junctions.
• High-energy carriers causing impact ionization, creating electron-hole pairs.
   Some electrons are injected into the gate or gate oxide, leaving holes behind.
The charge can exit the body through two paths:
• As body voltage increases, Dsb becomes slightly forward biased. Eventually, this cancels the first mechanism above.
• A rising gate or drain voltage capacitively couples the body voltage upward, too. This strongly forward biases the Dsb junction and charge spills out.

SOI Advantages
• Potential for lower threshold voltages. Vt depends on channel length in bulk CMOS, so worst case conditions are selected in determining Vt. In SOI, variations are smaller, so a smaller Vt can be chosen.
• Lower n, hence better subthreshold slope. n decreases from 1.5 to about 1.2.
• SOI is immune to latchup.

SOI Disadvantages
• 8% variation in gate delay. Can be a problem for sensitive analog circuits.
• Presence of a parasitic bipolar transistor. If the source and drain are held high for an extended period of time while the gate is low, the base will float high due to leakage. If the source is then pulled low, the npn turns ON, creating a pulse of current. This is sometimes called pass-gate leakage.
• Self-heating: the oxide is an insulator for heat as well.

Parasitic BJT in SOI

Implications for Circuit Styles
• Lower delay, lower power consumption.
• Standard CMOS design suffers slightly from the history effect.
• Dynamic circuits suffer from pass-gate leakage. Many precautions must be taken.
• Analog circuits suffer from threshold mismatches.

Subthreshold Circuit Design
• As discussed earlier, the minimum energy point is in a region where VDD < Vt, typically around 300 mV.
• Frequency is in the high kHz or low MHz region.
• Vt variations are very important; use large transistors where possible.
• Use standard CMOS, but avoid complex gates: nothing more complex than a NAND3.
• Due to variations, ON current in one branch may be smaller than OFF current in the series stack.
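The effect of the lower n quoted above for SOI can be made concrete with the standard subthreshold slope expression S = n (kT/q) ln 10. This is a small sketch assuming a temperature of 300 K, which the lecture text does not state; the function name and the constants' variable names are hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19    # C


def subthreshold_slope_mv_per_decade(n, temp_k=300.0):
    """Subthreshold slope S = n * (kT/q) * ln(10), in mV per decade.

    temp_k = 300 K is an assumed operating temperature.
    """
    thermal_voltage = K_BOLTZMANN * temp_k / ELECTRON_CHARGE
    return n * thermal_voltage * math.log(10) * 1e3


for n in (1.5, 1.2):
    s = subthreshold_slope_mv_per_decade(n)
    print(f"n = {n}: S is about {s:.0f} mV/decade")
```

At 300 K this gives roughly 89 mV/decade for n = 1.5 and roughly 71 mV/decade for n = 1.2, so a given reduction in gate voltage cuts the OFF current more sharply, which matters when ON and OFF currents are as close as they are in subthreshold operation.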
Pitfalls and Fallacies
• Failing to plan for advances in technology
• Comparing a well-tuned new circuit to a poor example of engineering practice
• Ignoring driver resistance when characterizing pass-transistor circuits
• Reporting only part of the delay of a circuit
• Making outrageous claims about performance
• Building circuits without adequate verification tools
• Sizing subthreshold circuits for speed

Historical Perspective
• Dynamic gates were proposed in the early 1970s.
• Even with CMOS, domino gates were still used for area and power advantages, for example in the BELLMAC-32A from Bell Labs.
   • The world's first 32-bit microprocessor

Historical Perspective
• By the time of the Alpha 21264, leakage had become so important that keepers had to be used.
   • 1996, superscalar, out-of-order execution
• 180 nm Pentium 4 used self-resetting domino.
• 90 nm Pentium 4 used extraordinarily complex LVS logic.
   • Custom design of 6.8M transistors.
• Japanese engineers favored pass transistor logic all through the 1990s.
• IBM has always relied on static CMOS.
• Hundreds of logic families appear in the academic literature, but very few have found application in industry.

[Figure: cascaded dynamic gates with signals A, X, Y and clock φ (Precharge, Evaluate, Precharge); X violates monotonicity during evaluation]

[Figure: ratioed logic with a resistive load RL between VDD and output F, and a PDN with inputs In1, In2, In3 connected to VSS. Goal: to reduce the number of devices over complementary CMOS]