Sizing of Dual-VT Gates for Sub-VT Circuits

Babak Mohammadi, S.M. Yasser Sherazi, and Joachim Neves Rodrigues

Electrical and Information Technology, Lund University, 22100 Lund, Sweden
{babak.mohammadi, yasser.sherazi, joachim.rodrigues}@eit.lth.se

Abstract: This paper presents a novel method to improve the performance of sub-threshold (sub-VT) gates in 65-nm CMOS technology. Faster transistors with a lower threshold voltage are introduced in the weaker network of a gate. It is shown that the employed method significantly enhances the reliability and performance of the gate, with the added advantage of a lower area cost compared to traditional transistor sizing. Extensive Monte-Carlo simulations are carried out to verify the proposed optimization technique. The simulation results predict that the NAND3 and NOR3 testbench shows a 98% higher noise margin. Furthermore, the inverter and NAND3 gates show a speed improvement of 48% and 97%, respectively.

I. INTRODUCTION

Scaling down the supply voltage (VDD) to the sub-threshold (sub-VT) regime is well known as an effective method for energy reduction [1]. Unfortunately, the exponential dependence of the sub-VT currents on process parameters such as the threshold voltage (VT) makes the transistor performance and functionality extremely vulnerable to process variations [2]. Thus, the transistor's performance in terms of delay and reliability is considerably degraded compared to super-VT operation [3]. This reduces the maximum attainable throughput and adds extra energy overhead to the design. Sub-VT optimized designs are often realized by full-custom cells (FCL) [4], [5]. In this case, the impact of process variations is combated by transistor up-sizing. Transistor sizing improves the timing, i.e., equalizes the rise and fall times, and increases the noise margins at the cost of higher area and energy [5].

Modern sub-micron CMOS technologies are offered with different threshold options, which give designers the opportunity to address the leakage energy by employing high-VT gates, whereas performance is improved by using low-VT devices [6]. However, this method is mainly employed at gate level. The advantages of using different threshold options at schematic level, i.e., inside gates, are not well explored in the literature.

Contribution: In this work, the performance and reliability degradation of gates operated in the sub-VT regime is addressed. To speed up the performance bottlenecks in gates and balance the driving strength of the pull-up and pull-down networks (PUN and PDN), selected transistors are replaced by their lower-VT equivalents. This method is referred to as dual-VT (DVT) in this study. The performance gain of the proposed optimization technique is analyzed by means of extensive Monte-Carlo (MC) simulations of an inverter, a NAND3 gate, and a NOR3 gate.

[Figure 1: ratio ION-NMOS/ION-PMOS (0 to 4) versus VDD [V] (0.2 to 0.6), with one curve each for PMOS widths of 1 Wp, 2 Wp, 5 Wp, and 10 Wp.]

Fig. 1. The ratio of active currents of HVT-NMOS and HVT-PMOS in sub-VT. VT of transistors ∼700 mV. WP is the minimum size allowed in the technology. The NMOS transistor has the minimum width.

The remainder of the paper is structured as follows: Sec. II describes the theory behind the method employed. In Sec. III, the dual-VT approach is presented for basic combinational gates and elaborated by means of MC simulations. Finally, conclusions are drawn in Sec. IV.

II. THEORY

CMOS processes are designed and optimized for super-VT operation. Consequently, all the optimization techniques used in super-VT need to be carefully analyzed for their efficiency in the sub-VT domain. Transistor strength balancing is one of these techniques, and it has an important effect on a design's performance and reliability. The driving balance of a circuit depends on different process parameters, i.e., the primary process parameter VT and the secondary parameters drain-induced barrier lowering (DIBL) and subthreshold slope. The traditional method to equalize the imbalance is transistor sizing. In the super-VT regime, this is done with relatively low size ratios between PMOS and NMOS. However, the transistor size ratios become very large in the sub-VT domain, see Fig. 1. The peak current ratio between PMOS and NMOS is found in the sub-VT regime. Furthermore, it is observed that even by upsizing the PMOS transistor by 10×, strength balancing is still not achieved. The imbalance between the PUN and PDN worsens even more when transistor stacking is required to form a complex gate (architectural dependency), as this reduces the current driving capability. In this case, the transistor size ratios reach impractically large values.
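As a point of reference, the first-order textbook expression for the sub-VT drain current makes the roles of these parameters explicit. It is not given in this paper; the symbols I_0 (technology-dependent current factor), n (slope factor), \eta (DIBL coefficient), and U_T (thermal voltage) are introduced here only for illustration, and the second line assumes equal n, \eta, and channel length for both device types.

```latex
% First-order sub-VT drain current (textbook model, assumed here)
I_D = I_0 \,\frac{W}{L}\,
      \exp\!\left(\frac{V_{GS} - V_T + \eta V_{DS}}{n\,U_T}\right)
      \left(1 - e^{-V_{DS}/U_T}\right),
\qquad U_T = \frac{kT}{q}

% Resulting NMOS-to-PMOS on-current ratio at full gate drive
\frac{I_{ON,N}}{I_{ON,P}} \approx
      \frac{I_{0,N}\,W_N}{I_{0,P}\,W_P}\,
      \exp\!\left(\frac{|V_{T,P}| - V_{T,N}}{n\,U_T}\right)
```

Under these assumptions, the width enters the ratio only linearly, whereas a difference in threshold voltage enters exponentially, which is consistent with the large size ratios seen in Fig. 1.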

Balanced strength improves the gate's stability and robustness, as the switching threshold voltage (Vm) moves to its ideal value (VDD/2) and the noise margins (NM) increase. An unbalanced switching threshold and low NMs are among the main sources of functionality and stability failures in the sub-VT regime. Therefore, designing the gates with the maximum possible NM (NML = NMH) is of vital importance.
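The dependence of Vm on the strength balance can be sketched with a first-order model that is not part of this paper: both transistors are assumed to follow a simple exponential sub-VT current law with the same slope factor n, and DIBL and the drain-voltage roll-off are ignored. Equating the NMOS and PMOS currents at VIN = VOUT = Vm then gives a closed form. The function name, the default n = 1.4, and the example threshold and current-factor values are hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant [J/K]
Q_E = 1.602176634e-19   # elementary charge [C]


def thermal_voltage(temp_c=27.0):
    """Thermal voltage kT/q in volts."""
    return K_B * (temp_c + 273.15) / Q_E


def switching_threshold(vdd, vt_n, vt_p, wp_over_wn=1.0,
                        i0p_over_i0n=1.0, n=1.4, temp_c=27.0):
    """First-order Vm of a sub-VT inverter (assumed model, see lead-in).

    vt_n and vt_p are threshold magnitudes in volts; i0p_over_i0n lumps
    the per-width current-factor ratio of PMOS to NMOS.
    Solves W_N*I0N*exp((Vm - vt_n)/(n*UT)) = W_P*I0P*exp((vdd - Vm - vt_p)/(n*UT)).
    """
    ut = thermal_voltage(temp_c)
    strength = wp_over_wn * i0p_over_i0n
    return 0.5 * (vdd + vt_n - vt_p) + 0.5 * n * ut * math.log(strength)


# Hypothetical example: a PMOS that is 5x weaker per unit width.
vm_single = switching_threshold(0.3, 0.56, 0.56, i0p_over_i0n=0.2)
# Same gate with a lower-VT PMOS (hypothetical 60 mV reduction).
vm_dual = switching_threshold(0.3, 0.56, 0.50, i0p_over_i0n=0.2)
print(f"single-VT Vm = {vm_single * 1e3:.1f} mV")
print(f"dual-VT   Vm = {vm_dual * 1e3:.1f} mV")
```

In this simplified picture, a threshold reduction in the weaker network shifts Vm linearly, whereas widening the weaker transistor shifts it only logarithmically. Results outside the range 0 to VDD indicate that the model no longer applies.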

In this study, the use of different threshold options in the PUN and PDN as an alternative to traditional transistor sizing is investigated. The lower-VT transistors are introduced in the slowest network, either PUN or PDN, to improve its driving capability. The best VT option for PUN and PDN transistors depends on VDD and the architecture of the gate. However, finding the best VT option for each VDD and architecture is out of the scope of this study. The effectiveness of the proposed method is demonstrated on three logic gates: an inverter, a NAND, and a NOR gate at 300 mV. To consider more extreme cases with stacked transistors in the PUN or PDN, 3-input NAND and NOR gates are used in the simulations, since they contain three stacked transistors in the PDN and PUN, respectively.

III. RESULTS

The driving imbalance has a strong exponential dependency on the mismatch of process parameters (VT, slope factor, and DIBL coefficient) between PMOS and NMOS transistors, whereas the transistor width has only a linear influence in the sub-VT regime [7]. Consequently, small differences in the exponential process parameters require large changes in the linear current coefficient, i.e., the device dimensions, to compensate for the mismatch. Therefore, traditional transistor sizing has a large area penalty.
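The size of this penalty can be illustrated with a short numerical sketch. It assumes a simple exponential sub-VT current law with slope factor n = 1.4, which is a hypothetical value and not taken from the process used in this study; the function names and the listed mismatch values are likewise chosen for illustration only.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant [J/K]
Q_E = 1.602176634e-19   # elementary charge [C]


def thermal_voltage(temp_c=27.0):
    """Thermal voltage kT/q in volts."""
    return K_B * (temp_c + 273.15) / Q_E


def required_width_ratio(delta_vt, n=1.4, temp_c=27.0):
    """Width multiplier that offsets an effective VT disadvantage.

    delta_vt is the effective threshold disadvantage of the weaker
    device in volts. Because current scales linearly with width but
    exponentially with VT, the multiplier is exp(delta_vt / (n * UT)).
    """
    return math.exp(delta_vt / (n * thermal_voltage(temp_c)))


# Hypothetical mismatch values, for illustration only.
for mismatch_mv in (0, 25, 50, 75, 100):
    ratio = required_width_ratio(mismatch_mv * 1e-3)
    print(f"{mismatch_mv:3d} mV mismatch -> {ratio:5.1f}x wider device")
```

With these assumed parameters, every additional n·UT of mismatch multiplies the required width by e, so the compensation by sizing grows exponentially with the mismatch.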

Figure 2(a) shows the NMOS and PMOS sizes for an ideal Vm in inverters implemented with different threshold options at 300 mV. The process used in this study has three different threshold options: 1) high-VT (HVT) with a VT of ∼700 mV, 2) standard-VT (SVT) with a VT of ∼560 mV, and 3) low-VT (LVT) with a VT of ∼450 mV. It is observed that in the LVT and HVT inverters, the required minimum transistor width is ∼8× Wmin (the minimum transistor width allowed in the technology), whereas for the SVT inverter the ratio is ∼13×.

The voltage transfer characteristic (VTC) of a balanced inverter based on the DVT method is shown in Fig. 2(b). It is observed that the Vm of the pure LVT and SVT inverters is 127 mV. However, by replacing the PMOS transistor in the SVT inverter with an LVT transistor, an ideal Vm (VDD/2) is achieved. To obtain this VTC with pure HVT, SVT, and LVT transistors, the PMOS transistors need to be upsized by 6.8×, 12.4×, and 7.7×, respectively, while keeping the NMOS at minimum size. Thus, the area cost of the proposed technique is much lower compared to traditional transistor sizing.

A commonly practised functionality metric in static logic is the static noise margin (SNM). This metric is mainly used in SRAM stability analysis. However, it is shown in [8] that the SNM of two back-to-back gates is equal to the maximum noise that can be applied to a long chain of the same gates. Fig. 2(c) shows the benchmark used for the SNM analysis. NAND and NOR gates are selected because they give the worst input low voltage (VIL) and input high voltage (VIH). The number of inputs of the NAND and NOR gates is selected to be 3 to consider the worst-case output swing, as each gate has three stacked transistors, which reduces the driving capability. The DVT approach for NAND3 and NOR3 at 300 mV in the process used in this study is shown in Fig. 2(c). The best balanced Vm match for the NAND3 gate is SVT transistors for the PUN and LVT transistors for the PDN. The best balanced Vm match for the NOR3 gate is LVT for the PUN and HVT for the PDN. The transistor sizes of DVT-NAND3 and DVT-NOR3 are the same in the PUN and PDN. However, the NOR3 gate in a standard-cell library (SCL) employs 85% and 25% wider transistors in the PUN and PDN, respectively.

To consider local variations, 1000-point Monte-Carlo simulations at 300 mV and 27 °C are performed. The same settings and simulations are applied to the NAND3 and NOR3 gates in the SCL. The simulation results are shown in Fig. 3. By comparing the butterfly curves, it is observed that the DVT approach, despite having narrower transistors in the NOR3 gate, has more symmetrical curves and larger SNM windows. By comparing the SNM distributions in Fig. 3, it is concluded that the DVT approach, with a mean SNM of 105 mV, is ∼47% better than the single-threshold gates. Furthermore, the SNM variation is ∼82% lower than in the other cases, and the worst-case SNM is found at 90 mV, which on average is 96% higher than in the single-VT approaches.
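The quoted improvements can be cross-checked against the statistics annotated in Fig. 3. The sketch below assumes that the comparison is made against the average of the three single-VT cases; this is one reading that is consistent with the ∼47% and ∼82% figures, not a procedure stated in the text.

```python
from statistics import mean

# (mean SNM [mV], sigma/mu [%]) as annotated in Fig. 3(a)-(c)
single_vt = {
    "LVT": (66.3, 11.9),
    "SVT": (69.7, 11.6),
    "HVT": (79.0, 10.7),
}
# DVT statistics as annotated in Fig. 3(d)
dvt_mu_mv, dvt_cv_pct = 105.0, 2.08

# Assumption: the reference is the average over the single-VT cases.
avg_mu_mv = mean(mu for mu, _ in single_vt.values())
avg_cv_pct = mean(cv for _, cv in single_vt.values())

snm_gain = (dvt_mu_mv / avg_mu_mv - 1.0) * 100.0
cv_reduction = (1.0 - dvt_cv_pct / avg_cv_pct) * 100.0

print(f"mean SNM gain over single-VT average: {snm_gain:.1f} %")
print(f"reduction of sigma/mu:                {cv_reduction:.1f} %")
```

Under this assumption, the relative variation σ/µ is the quantity behind the ∼82% figure, rather than the absolute standard deviation.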

The same gates as in the SNM testbench, i.e., NAND3 and NOR3, are used for the timing analyses. Fig. 4 presents the delay distributions of a NAND3 gate with different threshold options. All the inputs in the benchmark are toggled simultaneously. As expected, the mean fall delay of the HVT-NAND3 gate is 35× higher than its rise delay, and the worst-case rise delay is 93× longer. However, as shown in Fig. 4(d), the mean rise and fall delays of the DVT gate are almost equal. The rise delays of the SVT and LVT gates are shorter than that of the DVT gate, but since their fall delay is equal to or higher than that of the DVT gate, there is no overall performance gain. It is observed that by employing DVT, the performance is boosted to LVT gate levels, while the static energy dissipation remains low, at HVT levels. Similar behaviour is observed for NOR3 gates.
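The balance between the two edges can be read directly from the mean delays annotated in Fig. 4. The following sketch tabulates the fall-to-rise ratio and the slower edge for each configuration; treating the slower mean edge as the limiting delay is an assumption made here for illustration, and the helper name is hypothetical.

```python
# (mean rise delay [ns], mean fall delay [ns]) as annotated in Fig. 4
nand3_delays_ns = {
    "LVT": (6.57, 62.3),
    "SVT": (11.2, 201.0),    # 0.201 us
    "HVT": (66.2, 2380.0),   # 2.38 us
    "DVT": (66.4, 66.0),
}


def fall_to_rise_ratio(rise_ns, fall_ns):
    """Ratio of mean fall delay to mean rise delay (1.0 means balanced)."""
    return fall_ns / rise_ns


for name, (rise_ns, fall_ns) in nand3_delays_ns.items():
    ratio = fall_to_rise_ratio(rise_ns, fall_ns)
    slower_edge_ns = max(rise_ns, fall_ns)
    print(f"{name}: fall/rise = {ratio:5.1f}, slower edge = {slower_edge_ns:7.1f} ns")
```

For the HVT gate, the ratio of the annotated means is close to the 35× quoted above, while the DVT gate is nearly balanced and its slower edge is comparable to that of the LVT gate.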

IV. CONCLUSION

In this study, it is shown that a dual-VT approach at schematic level results in both higher performance and higher reliability. The SNM of the NAND3 and NOR3 gates shows an improvement of 47% over the same setting with SCL gates. The overall performance gain of the DVT-inverter and DVT-NAND3 gates is 45% and 67%, respectively, compared to the gates in the SCL. Furthermore, the MC simulations confirm a lower worst-case delay and improved worst-case noise margins. Additionally, the proposed technique is highly area efficient.

ACKNOWLEDGMENT

This work was kindly supported by the Swedish Vetenskapsradet (621-2011-4540) and the Swedish VINNOVA Industrial Excellence Centre (SOS).

REFERENCES

[1] A. Wang et al., “Optimal supply and threshold scaling for subthreshold CMOS circuits,” in IEEE ISVLSI, 2002, pp. 5–9.

[2] A. Tajalli et al., “Design trade-offs in ultra-low-power digital nanoscale CMOS,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 58, no. 9, pp. 2189–2200, Sept. 2011.

[3] P. Friedberg et al., “Modeling within-die spatial correlation effects for process-design co-optimization,” in ISQED, Mar. 2005, pp. 516–521.

[4] P. Meinerzhagen et al.

[5] S. Luetkemeier et al., “A 200 mV 32b subthreshold processor with adaptive supply voltage control,” in IEEE ISSCC Papers, Feb. 2012, pp. 484–486.

[6] D. Bol et al., “A 25 MHz 7 uW/MHz ultra-low-voltage microcontroller SoC in 65nm LP/GP CMOS for low-carbon wireless sensor nodes,” in IEEE ISSCC, Feb. 2012, pp. 490–492.

[7] J. Kwong et al., “Variation-driven device sizing for minimum energy sub-threshold circuits,” in Proceedings of ISLPED, Oct. 2006, pp. 8–13.

[8] J. Lohstroh et al., “Worst-case static noise margin criteria for logic circuits and their mathematical equivalence,” IEEE JSCC, vol. 18, no. 6, pp. 803–807, Dec. 1983.

[Figure 2: (a) normalized PMOS width (2 to 16) versus normalized NMOS width (1 to 4) for the LVT, HVT, and SVT inverters; (b) VOUT [V] versus VIN [V] (0 to 0.3 V) for the SVT, LVT, and dual-VT inverters, with Vm marked at 127 mV and 150 mV; (c) schematic of the 3-input DVT-NAND and DVT-NOR gates with noise sources VN and transistor labels LVT, HVT ×3, and SVT ×3.]

Fig. 2. a) Required NMOS and PMOS widths for an ideal Vm in the HVT, SVT, and LVT inverters at 300 mV, b) voltage transfer curves (VTC) of SVT, LVT, and DVT inverters, c) 3-input DVT-NAND and DVT-NOR gates in the benchmark used for static noise margin (SNM) extraction.

[Figure 3: four panels, each with butterfly curves (VOUT versus VIN, 0 to 0.3 V) and an SNM histogram (occurrences versus SNM [V], 0 to 0.1 V). Annotated SNM statistics:
(a) µ = 66.3 mV, σ = 7.87 mV, σ/µ = 11.9 %
(b) µ = 69.7 mV, σ = 8.08 mV, σ/µ = 11.6 %
(c) µ = 79 mV, σ = 8.49 mV, σ/µ = 10.7 %
(d) µ = 105 mV, σ = 2.19 mV, σ/µ = 2.08 %]

Fig. 3. Butterfly curves and SNM distribution of a) LVT, b) SVT, c) HVT, d) DVT inverters from 1000-point Monte-Carlo simulation for local variations in the typical-typical (TT) corner at 300 mV and 27 °C.

[Figure 4: four histograms of occurrences versus normalized delay, each showing the rise-delay and fall-delay distributions with the worst case (W.C.) marked. Annotated delay statistics:
(a) rise: µ = 6.57 ns, σ = 1.11 ns; fall: µ = 62.3 ns, σ = 16.8 ns
(b) rise: µ = 11.2 ns, σ = 1.9 ns; fall: µ = 0.201 µs, σ = 54.3 ns
(c) rise: µ = 66.2 ns, σ = 37 ns; fall: µ = 2.38 µs, σ = 0.744 µs
(d) rise: µ = 66.4 ns, σ = 35.9 ns; fall: µ = 66 ns, σ = 16.9 ns]

Fig. 4. Delay variations of NAND3 with different configurations at 300 mV and 27 °C for 1000-point MC simulations in the TT process corner. All inputs toggle simultaneously. The fan-out at the output is 4 in all cases. The down-pointing arrows show the worst-case delays. a) LVT, b) SVT, c) HVT, d) DVT.
