x High Speed


IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 24, NO. 1, FEBRUARY 1989

High-Speed CMOS Circuit Technique

JIREN YUAN AND CHRISTER SVENSSON

Abstract: We have demonstrated that clock frequencies in excess of 200 MHz are feasible in a 3-µm CMOS process. This is obtained by means of clocking strategy, device sizing, and logic style selection. We use a precharge technique with a true single-phase clock, which remarkably increases the clock frequency and reduces the skew problems. Device sizing with the help of an optimizing program improves circuit speed by a factor of 1.5-1.8. We minimize the logic depth to one instead of two or more and use pipeline structures wherever possible. The presentation includes experimental demonstrations of several circuits which work at clock frequencies of 200-230 MHz. SPICE simulation shows that some circuits possibly work up to 400-500 MHz.

    I. INTRODUCTION

MANY factors control the possible speed of CMOS integrated circuits. There are, for example, device dimensions, logic circuit style, clocking strategy, architecture, clock distribution, etc. To pursue high speed and integration density the dimensions of MOS transistors are scaled down continuously. The delay in a CMOS circuit will be inversely proportional to the scaling factor α if all dimensions are reduced without changing physics [1]. However, there are physical, geometrical [2], [3], and also cost limits on scaling down transistors. Therefore, we should tap the potential of the most popular technique. In fact, we have been investigating the possibilities of increasing speed by combining different circuit techniques in an available and relatively low-cost process, for example, the 3-µm CMOS process. In the present work we will limit ourselves to a discussion of a high-clock-frequency synchronous CMOS circuit technique in a given process (i.e., with a given smallest dimension device). We will assume that the circuit technique rather than the architecture limits the clock frequency. Our results are therefore applicable mainly to simple, pipelineable architectures. In this article, we will present our results by means of both analysis and experiments. Section II describes a new clocking strategy with its accompanying circuit technique and its importance for high clock frequency. We further investigate the robustness and flexibility of this technique and propose some further improvements. Section III describes the effect of logical optimization of the circuits and Section IV describes the effect of device sizing. In Section V we present some experiments illustrating the previous results. Finally, we present our conclusions in Section VI.

Manuscript received May 25, 1988; revised August 10, 1988. The authors are with the Department of Physics and Measurement Technology, LSI Design Center, Linköping University, 581 83 Linköping, Sweden. IEEE Log Number 8824917.

Fig. 1. C²MOS logic.

II. CLOCKING STRATEGY AND CIRCUIT EXAMPLES

A. True Single-Phase-Clock Circuit Techniques

In conventional CMOS circuits both static and dynamic CMOS logic is used. For the purpose of system timing a clocking strategy is always involved except for a self-timed system [4]. The most popular clocking strategy is clocked CMOS logic (C²MOS) [5], [6], which uses a nonoverlapping pseudo two-phase clock as shown in Fig. 1. Four clock signals have to be distributed in such a system, and between two pairs of clock signals there should be no overlap. Clock skews in the system will cause serious problems and result in difficulties in increasing circuit speed [7].

The NORA dynamic CMOS technique [8] uses a true two-phase-clock signal φ and φ̄ instead of a four-phase-clock signal and can avoid race problems caused by clock skews with some constraints on logic composition. In a NORA pipelined system, φ-C²MOS latches and φ̄-C²MOS latches are used alternately. The most important constraint is that between two C²MOS latches there must be an even number of inversion blocks, and if there are static blocks between a precharge block and a C²MOS latch they must also be of an even number. We choose two typical NORA constructions, which are called the φ section and the φ̄ section, and use an N-precharge block in the φ section and a P-precharge block in the φ̄ section for our discussion. These are shown in Fig. 2. Since there is no dead time and no skew problem, it is expected that the NORA dynamic CMOS technique can reach higher clock rates than the C²MOS technique.

0018-9200/89/0200-0062$01.00 © 1989 IEEE

Fig. 2. NORA dynamic CMOS technique: φ section and φ̄ section.

A further development in clock strategy should use even fewer clock signals. The true single-phase-clock dynamic CMOS circuit technique [9] uses only one clock signal, which is never inverted. Therefore, no clock skew exists except for clock delay problems, and an even higher clock frequency can be reached. As will be explained below, the true single-phase-clock CMOS technique fits not only dynamic but also static CMOS circuits and in most cases can replace the NORA CMOS technique.

Let us discuss only the latch stages first. In Fig. 2, we have seen two C²MOS latch stages controlled by two clock signals φ and φ̄. The necessity of φ̄ lies in controlling transistors M2 and M7. However, this can be done by clock signal φ at inverters connected to the two stages as shown in Fig. 3. We call the two different units the N-C²MOS stage and the P-C²MOS stage, respectively, and if we use doubled N-C²MOS or P-C²MOS stages in series they become true single-phase-clock latch stages, i.e., the N-latch and the P-latch. In this system, an N-section (N-latch plus logic blocks) and a P-section (P-latch plus logic blocks) are used alternately with the same clock signal. Both static and dynamic blocks are accepted, and an N- or P-precharge block is used in the N- or P-section, respectively. As long as the clock delay is less than the gate delay the system is reliable. Instead of distributing the φ̄ clock signal there are two more transistors in each latch stage, but the most important thing is that there is no even-inversion constraint either between two latches or between the latch and the dynamic block. Clearly, this is better than the nonoverlapping pseudo two-phase-clock strategy and, from the point of view of logic constraints, it can also compete with the NORA two-phase-clock strategy. The logic function blocks can be included in the N-C²MOS or P-C²MOS latch stages or placed between them as shown in Fig. 4, depending on the logic type and the requirement of inversion.

Fig. 3. True single-phase-clock latch stages: doubled N-C²MOS and doubled P-C²MOS.

Fig. 4. Logic arrangements using true single-phase-clock latch stages.

Fig. 5. Split-output latch stages: P-latch and N-latch.

Fig. 6. TSPC-1 circuit: N-block and P-block.

Fig. 7. TSPC-2 circuit: N-block and P-block.

Furthermore, the true single-phase-clock latch stages shown in Fig. 3 can evolve into a simpler version, called the split-output latch, as shown in Fig. 5, where only one of the transistors is controlled by the clock. This halves the clock load. A possible drawback of the split-output technique is that not all node voltages have a full voltage swing, as some single transistors are used for the transmission of both high and low signals.

In the case of using precharge dynamic logic, let us go back to the NORA circuits in Fig. 2. The P-transistor M2 in the φ section and the N-transistor M7 in the φ̄ section are unnecessary because the precharge signals will play the same role as φ̄ in both sections, so they can be omitted. The result, shown in Fig. 6, is the true single-phase-clock dynamic CMOS technique described in [9]; we call it true single-phase-clock 1 (TSPC-1). We introduce a further modified circuit as shown in Fig. 7. We call it TSPC-2, and it performs better than TSPC-1, as described below. In the following description, we choose the φ section of NORA and the N-blocks of the TSPC-1 and TSPC-2 dynamic circuits as examples.

Compared with the nonoverlapping pseudo-two-phase clock (NPTC) and the NORA two-phase clock, TSPC has a compact, simple clock distribution which will naturally lead to a higher clocking speed. Its other features can be summarized as follows.

Fig. 8. Step responses of NORA, TSPC-1, and TSPC-2.

B. Step Response

Because the transistor number is reduced from four in a C²MOS stage to three in either an N-C²MOS or a P-C²MOS stage, the delay of the latch stage is reduced. Fig. 8 shows the step responses for the three circuits simulated by SPICE with unit transistor sizes. For the sake of realism, we have put a unit inverter load into all three circuits. In this paper, all SPICE simulations are done by using typical level 2 SPICE parameters for a standard 3-µm double-metal CMOS process [10] and using a power supply voltage of 5 V.

First, we find that omitting the P-transistor in the latch stage is significant because the most critical slope in the circuit is the rise slope. This gives TSPC-1 a speed improvement of a factor of 1.8. Second, by omitting the P-transistor the output of the latch stage may continue to rise during the initial part of the succeeding precharge phase, thus allowing a shorter evaluation time. Third, we should explain why TSPC-2 has even better performance. In precharge logic circuits, at the start of evaluation the latch stage tends to output the precharge state first, i.e., to make the output low because of the high precharge node, as can be seen in Fig. 8 at the beginning of the rise slope. This is a common problem for precharge circuits. It is found that the circuit TSPC-2 can solve this problem partly. In Fig. 7, if the logic part is conducting, node A will also be charged high during the precharge phase, which prevents the output from going low. This makes the rise time shorter. On the other hand, the fall time will increase, but this only makes both slopes more balanced in time. Another advantage is that when the output should be kept high, the circuit has much smaller dips in the output, as shown in Fig. 9, which is obtained from SPICE simulations. For the P-block of the TSPC-2 circuit the top P-transistor should be widened by a factor of 1.5-2 in order to obtain all the advantages mentioned.

C. Sensitivity to the Clock Slope

No maximum slope limit exists in NPTC circuits as long as the clocks are nonoverlapping. This is not true in NORA and TSPC circuits. SPICE simulation shows that NORA and TSPC-1 can accept slopes up to 20 ns for rise and fall edges without disorder. This figure agrees well with experimental results on TSPC-1 in [11]. This is about 20 times larger than the normal gate delay. Because we are interested in the possibility of working at high frequencies, a slope of 20 ns is quite acceptable. For TSPC-2, the acceptable clock slope reaches 2 ms, which is of the same order of magnitude as the clock period to be accepted by dynamic CMOS circuits. It means that TSPC-2 is more reliable in latching a signal.

Fig. 9. Output dips of different circuits.

Fig. 10. Problem caused by clock delay. Dg = gate delay; Dc = clock delay; E = evaluation; L = latch.

D. Clock-Skew Problems

In general terms, no problem caused by clock skew exists in NORA and TSPC circuits, so they are better than the NPTC strategy. However, clock skews will compress synchronous margins in NORA circuits, and for both NORA and TSPC circuits the clock delay could be a problem if it is larger than the gate delay. The problem is caused by data transparency from block 1 to block 2 when both are in the evaluation phase, as shown in Fig. 10.

Locally, this is not a problem because the clock delay is usually less than the gate delay. In a pure pipeline structure the condition between the clock delay and the data delay is still satisfactory. In the case of a large system, we propose two ways to solve the problem.

1. Reverse Clock Distribution: It is found that if the clock is distributed in the opposite direction to the data stream the system will be safe. In this case, the evaluation phase of the next block will be completely included in the data stable zone of the last block because the reverse clock distribution creates a safety margin, as shown in Fig. 11. This is true for the data stream both from P-block to N-block and from N-block to P-block.

Fig. 11. Reverse clock distribution.

Fig. 12. D-latch type structure for nonlocal data.

2. D-Latch Type Structure: This can be used in the case of data feedback, e.g., the communication between two blocks in a large system, where the reverse clock distribution is no solution as there are both forward and backward data streams. Therefore, we propose using a D-latch structure instead of the normal alternating P-block and N-block structure for nonlocal data, as shown in Fig. 12 (see also [12]). If we look at Fig. 10, the problem is caused by the overlap between different evaluation phases of two communicating blocks. If the data for the next block are only latched by the start transition of the evaluation phase of this block at points t1 and t2 in Fig. 12, i.e., the communication between two blocks exists only during each start transition, the system will be much more reliable. This can be done simply by placing a P-C²MOS stage before an N-block or an N-C²MOS stage before a P-block for each nonlocal data signal. The D-latch structure allows communication between two blocks which have a clock skew caused by clock delay of up to almost half a clock cycle. This has been proved by SPICE simulation.
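The timing conditions stated in this subsection can be collected into a small check. The following is only a sketch under simplifying assumptions: the function names are hypothetical, delays are treated as single numbers in nanoseconds, and the thresholds are the ones given in the text (clock delay below gate delay for forward distribution, skew below half a clock period for the D-latch structure). It is not a substitute for the SPICE simulations referred to above.

```python
def forward_distribution_is_safe(clock_delay_ns, gate_delay_ns):
    """Clock runs along the data stream: safe while clock delay < gate delay."""
    return clock_delay_ns < gate_delay_ns


def d_latch_structure_is_safe(clock_skew_ns, clock_period_ns):
    """D-latch structure: tolerates skew up to almost half a clock cycle."""
    return abs(clock_skew_ns) < clock_period_ns / 2.0


if __name__ == "__main__":
    period_ns = 5.0  # illustrative value: a 200-MHz clock
    print(forward_distribution_is_safe(0.5, 1.0))    # local wiring, short clock delay
    print(forward_distribution_is_safe(1.5, 1.0))    # long clock line, possible race
    print(d_latch_structure_is_safe(2.0, period_ns))  # skew below half a period
```

The example values (1-ns gate delay, 5-ns period) are illustrative choices, not measurements from the paper.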

E. Circuit Examples

Because both static and dynamic circuits can be used, including the domino technique, the TSPC strategy has a logic flexibility as high as that of the NORA technique. In most cases, NORA circuits can be replaced by TSPC circuits with little modification. One exception is that an N-precharge stage cannot be directly connected to a P-precharge stage without latch stages in the TSPC circuit, which is possible in the NORA technique. However, the even-inversion requirement of NORA circuits does not exist in TSPC circuits. Because of the compact clock distribution and the same circuit complexity, TSPC is preferred. We present several circuits below as examples of the TSPC strategy.

Fig. 13. Circuits constructed by P, N-blocks and P, N-C²MOS stages: (a) positive transition latch; (b) negative transition latch; and (c) divide-by-two circuit.

Fig. 14. Circuits constructed by split-output latch stages.

If we cascade the P-type and the N-type latches shown in Figs. 3, 5, 6, or 7, they become full dynamic transition D latches, each of which includes 12 transistors. However, the D latch shown in Fig. 13(a), which consists of nine transistors, is more effective. This is constructed by a P-C²MOS stage, an N-precharge stage, and an N-C²MOS stage, and the input data will be latched by the positive transition of the clock signal. If we want a noninverted output, an extra inverter can be placed at the output, which gives the circuit driving ability. When a negative transition latch is needed the circuit can be changed to Fig. 13(b). As expected, if the inverted output is connected to the input of the circuit, a divide-by-two circuit is formed as shown in Fig. 13(c).

If we use the split-output latch stages shown in Fig. 5 we can construct a D latch and a divide-by-two circuit in an even more efficient way. The resulting circuits have almost the same speed but only half the clock load and are shown in Fig. 14. The latch stages can also be used for building quite effective shift-register chains with both fewer transistors and less clock load than other techniques, and without output glitches.

Note that in the D latches of Figs. 13(a) and 14(a) the P-latches are replaced by P-C²MOS stages with only three transistors. When we need fast P-passing stages in a pipelined structure the three-transistor latch is quite effective. However, it requires an input transition from low to high with a delay of more than the evaluation delay of the next N-block; otherwise the active transistor in the N-block may be cut off too early. This is shown in Fig. 15, where the delay of the input signal is changed from 0.8 to 0.5 ns and the output swing is reduced to half of Vdd. Nevertheless, as long as these D latches are cascaded the delay of the last latch satisfies the requirement of the next latch, and the same holds for the divide-by-two circuits. Simulation shows that a register chain starting with an N-block (the N-block reduces the time constraints) can work at a clock rate of more than 350 MHz using a unit inverter load without a critical input time requirement.
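The divide-by-two behavior described above (a positive-transition D latch whose inverted output is fed back to its input) can be illustrated with a purely behavioral model. This is a sketch at the logic level only: the class and function names are hypothetical, and it models neither transistor-level timing nor the precharge phases of the actual circuits in Figs. 13(c) and 14.

```python
class DivideByTwo:
    """Positive-edge D latch with its inverted output fed back to D."""

    def __init__(self):
        self.q = 0
        self._previous_clock = 0

    def step(self, clock):
        # Latch D = NOT Q on each rising edge of the clock input.
        if clock == 1 and self._previous_clock == 0:
            self.q = 1 - self.q
        self._previous_clock = clock
        return self.q


def ripple_chain_output(clock_samples, stages):
    """Cascade several dividers; each stage is clocked by the previous output."""
    chain = [DivideByTwo() for _ in range(stages)]
    output = []
    for clock in clock_samples:
        signal = clock
        for stage in chain:
            signal = stage.step(signal)
        output.append(signal)
    return output


def count_rising_edges(samples):
    """Count 0-to-1 transitions, assuming the signal starts low."""
    edges, previous = 0, 0
    for sample in samples:
        if sample == 1 and previous == 0:
            edges += 1
        previous = sample
    return edges


if __name__ == "__main__":
    clock = [1, 0] * 64  # 64 input clock cycles
    out = ripple_chain_output(clock, stages=4)
    print(count_rising_edges(clock), count_rising_edges(out))
```

With four cascaded stages the output completes one cycle for every 16 input cycles, which is the property that makes it possible to observe a high input frequency at a much lower output frequency.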

Fig. 16. Replaceability between NORA and TSPC: (a) NORA serial full adder, and (b) TSPC serial full adder.

Another circuit is a serial full adder, which is an example showing the logic replaceability between NORA and TSPC. Fig. 16(a) is the adder using the NORA technique presented in [8]. We can use an N-C²MOS stage instead of a C²MOS stage after an N-precharge stage, and a negative transition latch instead of the two C²MOS stages in the NORA circuit for the delay of the carry signal. The P-precharge stage is replaced by a static block as the φ̄ clock is lacking in TSPC, and the static block will evaluate inputs earlier in the previous phase. The resulting circuit is shown in Fig. 16(b), which has the same transistor number but higher speed than NORA and only a single clock. A corresponding circuit with an N-precharge stage instead of the static block, needing more transistors as it will use inverted signals, is presented in [9].
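For readers unfamiliar with serial addition, the role of the delayed carry signal can be shown with a short behavioral sketch. It assumes the operands arrive least significant bit first, one bit per clock cycle, with the carry held over to the next cycle; the helper names are hypothetical and the model says nothing about the NORA or TSPC circuit implementation.

```python
def serial_add(a_bits, b_bits):
    """Add two equal-length bit streams, least significant bit first."""
    carry = 0  # the stored carry plays the role of the delayed carry signal
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        total = a + b + carry
        sum_bits.append(total & 1)
        carry = total >> 1
    return sum_bits, carry


def to_bits(value, width):
    """Unsigned integer to a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (least significant first) back to an unsigned integer."""
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    sum_bits, carry_out = serial_add(to_bits(11, 4), to_bits(6, 4))
    print(from_bits(sum_bits) + (carry_out << 4))
```

Because each carry depends on the previous one, the carry path forms a feedback loop that must complete within one clock cycle.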

III. LOGIC STYLE

The maximum clock frequency in a clocked system is limited by the delay from one latching instant to the next. This delay depends on the complexity and function of each logic block between these two instants. One way to minimize this delay is to minimize the complexity of each logic block, for example, by decomposing complex blocks into pipelined parts. Such decomposition can also be done between half clock cycles when using the circuit style described above. In such a way it may be possible to reduce the logic depth to one in each block. Although this will cause an initial delay, it is acceptable in a pipelined structure. Note that in CMOS technology inverter delay is very small, so extra inverters can be accepted in the blocks.

The CMOS circuit technique is particularly sensitive to the number of transistors in series [6]. It is quite obvious that we should try to minimize the number of transistors in series in the logic blocks. This normally means that we also need to minimize the number of inputs to each block. We will not attempt to develop a systematic optimization procedure using the above principles. Instead we will demonstrate them by optimizing the serial full adder of Fig. 16 and make a speed comparison by using SPICE simulation. The serial full adder with a feedback path is considered a good example for studying logic style. For possible systematic approaches, see [13].

Fig. 17. Serial full adder with logic separated in different half clock cycles (TSPC adder 1).

In Fig. 16(a) and (b), the Boolean functions of CARRY and SUM are

\[
\begin{aligned}
\mathrm{CARRY} &= AB + C(A+B) \\
\mathrm{SUM} &= \overline{\mathrm{CARRY}}\,(A+B+C) + ABC \\
&= \overline{\bigl(AB + C(A+B)\bigr)}\,(A+B+C) + ABC.
\end{aligned}
\]

This means that the evaluation of CARRY must be done before the evaluation of SUM and both have to be finished in half a clock cycle. We can, of course, divide the SUM evaluation into two parts and put them into different half clock cycles. Even if we do so, too many steps are still involved in the SUM evaluation and at least three transistors are in series. Let us instead change the above Boolean functions to

\[
\begin{aligned}
E &= A\bar{C} + \bar{A}C \\
\mathrm{SUM} &= E\bar{B} + \bar{E}B \\
\mathrm{CARRY} &= EB + \bar{E}C.
\end{aligned}
\]

Clearly, these Boolean functions are much easier to realize, and the intermediate result E can be evaluated during the previous half clock cycle. There will be only two transistors in series in the logic part. The circuit according to the above Boolean functions and using the TSPC strategy has been presented in [9]. For convenience, we give it in Fig. 17. The small figures next to the transistors in Fig. 17, as well as in Figs. 18 and 19, are the scaled sizes in micrometers, which will be discussed in the next section. The transistors without figures have the minimum width, 7 µm.
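The equivalence of the two formulations can be checked exhaustively over the eight input combinations. The sketch below assumes the complement bars lost in extraction are read as SUM = NOT(CARRY)·(A+B+C) + ABC for the original form, and E = A·NOT(C) + NOT(A)·C, SUM = E·NOT(B) + NOT(E)·B, CARRY = E·B + NOT(E)·C for the restructured form; the function names are hypothetical.

```python
from itertools import product


def adder_original(a, b, c):
    """Original form: SUM depends on the already evaluated CARRY."""
    carry = (a & b) | (c & (a | b))
    total = ((1 - carry) & (a | b | c)) | (a & b & c)
    return total, carry


def adder_restructured(a, b, c):
    """Restructured form: E = A xor C is evaluated one half cycle earlier."""
    e = (a & (1 - c)) | ((1 - a) & c)
    total = (e & (1 - b)) | ((1 - e) & b)
    carry = (e & b) | ((1 - e) & c)
    return total, carry


def forms_agree():
    """Compare both forms with ordinary binary addition for all inputs."""
    for bits in product((0, 1), repeat=3):
        reference = (sum(bits) & 1, sum(bits) >> 1)
        if adder_original(*bits) != reference:
            return False
        if adder_restructured(*bits) != reference:
            return False
    return True


if __name__ == "__main__":
    print(forms_agree())
```

In the restructured form each output is a two-level function of E and one other input, which is why only two logic transistors need to be placed in series.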


Fig. 20. Scaled eight-transistor divide-by-two circuit: (a) scaled circuit, and (b) input and output at 400 MHz.

Fig. 21. Scaled nine-transistor divide-by-two circuit: (a) scaled circuit, and (b) input and output at 500 MHz.

speed. Often, the speed versus device size function does not show an optimum but is monotonic. In such cases cost (as used area) must also be taken into account. For simple cases it is possible to obtain analytical results. It is thus well known that an inverter chain driving a large load can be optimized by device sizing [15], so that each inverter has devices which are about three times larger than those of the previous inverter. It is also known that an optimal transistor chain connected to ground or supply should be tapered [17]. For more complex circuits it is not possible to obtain analytical results. The effect of the width of a certain transistor depends on the position of the transistor in the network and on the sizes of all other transistors in the network. Its width may affect different delays both through its effect on driving force and self-loading and through its effect on loading of the previous stages. Thus, in some cases we may have critical loops (as in the case of the carry in our serial adder), which means that the loop delay must be analyzed rather than a simple logic delay. In these cases improvements in speed can be obtained by optimization by hand, by trying different device sizes in a circuit or timing simulator (like SPICE or TMODS [21]) [20]. A scaled version of the adder in Fig. 17 is obtained in this way, which is described in [20] and indicated in Fig. 17.

Recently, several computer tools have been developed for automatic optimization through device sizing [16], [18]-[20]. We have used SLOP [20] for this purpose. SLOP is based on a switch-level simulator, TMODS [21], and uses normal switch-level simulation for delay calculation. It can therefore easily handle any kind of CMOS circuit, including circuits with critical loops. The delay calculation algorithm in TMODS is based on the Elmore delay without side branches [22] and is similar to the algorithm in CRYSTAL [23].

With the help of the tool SLOP and the confirmation by SPICE simulation, the speed of the circuits described in earlier sections has been increased by device sizing. Figs. 20 and 21 show the scaled versions of the divide-by-two circuits in Figs. 14(b) and 13(c) and the output waveforms with input signals with frequencies of 400 and 500

Fig. 22. Inputs and outputs of the scaled fast adder (Fig. 19) at a clock rate of 400 MHz.
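The Elmore delay without side branches, mentioned above as the basis of the delay calculation in TMODS, can be illustrated for a simple series chain. The sketch below is the textbook formula for an RC ladder (each node capacitance multiplied by the total resistance between it and the source) and is not the TMODS or SLOP implementation; the function name and the normalized unit values are assumptions made for illustration.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder without side branches.

    resistances[i] is the series resistance leading into node i and
    capacitances[i] is the capacitance at node i, ordered from the
    driving end toward the far end of the chain.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per series resistance")
    delay = 0.0
    upstream_resistance = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_resistance += r
        delay += upstream_resistance * c
    return delay


if __name__ == "__main__":
    # Normalized units: each transistor contributes R = 1 and C = 1.
    print(elmore_delay([1.0, 1.0], [1.0, 1.0]))            # two in series
    print(elmore_delay([1.0, 1.0, 1.0], [1.0, 1.0, 1.0]))  # three in series
```

In this normalized example, going from two to three identical series elements doubles the delay (3 to 6 units), which is consistent with the earlier remark that CMOS circuits are particularly sensitive to the number of transistors in series.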

TABLE II. COMPARISON OF SCALING

Fig. 23. Layout photograph of two ripple counters.

Fig. 24. Photograph of input and output waveforms.

first circuit is a 4-bit register which consists of four D latches (Fig. 13(a) plus an inverter), and the input data are taken from a divider. The chip area is 0.07 × 0.4 mm and the measured maximum clock rate is more than 230 MHz (limited by the instrumentation). The second circuit is a 4-bit ripple counter which is constructed by successively connecting four divide-by-two circuits of the kind shown in Fig. 13(c). The input frequency is divided by 16, so we can measure the output at a lower frequency; this is necessary because the ordinary output stage and bonding pad are not suited to such high frequencies. Fig. 23 shows a photograph of the layouts of two such ripple counters: one is the unscaled version, chip area 0.28 × 0.14 mm, and the other is the scaled version, chip area 0.36 × 0.18 mm.

Since they are directly cascaded, the load for each circuit is more than a unit inverter, and the resulting maximum working frequencies are 166 and 250 MHz for these two versions according to SPICE simulations. The measured results are 160 MHz for the unscaled version and more than 230 MHz (limited by the instrumentation) for the scaled version, which are the average results of ten samples from each. Fig. 24, a photograph taken from a sampling oscilloscope, shows the scaled divider successfully working at an input frequency of 230 MHz.

The third circuit is the serial full adder shown in Fig. 17. The chip area of the unscaled one is 0.22 × 0.16 mm and that of the scaled one is 0.22 × 0.24 mm. The measured maximum clock rates are 140 and 225 MHz, respectively, which again are the averages from ten samples each.

VI. CONCLUSIONS

1) The true single-phase-clocking strategy has the advantages of simple and compact clock distribution, high speed, and logic design flexibility. In general terms, no clock-skew problem exists in this clocking strategy. The skew caused by clock delay between different logic blocks in a large system can be minimized by reverse clock distribution and the D-latch structure.

2) Since the maximum clock frequency in a clocked system is limited by the delay from one latching instant to the next, logic depth minimization in the system will lead to high speed with only a limited increase of the number of latch stages and initial delay.

3) Device sizing will give a significant increase of speed at the cost of increased area. The experimental results from two examples show that the practical ratios between speed and area increases are 1.43/1.65 for a 4-bit ripple counter and 1.60/1.50 for a serial full adder.

4) Both analysis and experiment have demonstrated that clock rates in excess of 200 MHz are feasible in a 3-µm CMOS process by the combination of the above techniques. Some of the circuits possibly reach even the 400-500-MHz range as shown by SPICE simulations.
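The speed and area ratios quoted in conclusion 3) can be reproduced from the measured clock rates and chip dimensions reported in the experimental section. The short calculation below is only an arithmetic check; the function names are hypothetical, and the scaled counter's speed is taken as 230 MHz although the measurement was limited by the instrumentation.

```python
def speed_ratio(scaled_mhz, unscaled_mhz):
    """Speed gain of the scaled circuit over the unscaled one."""
    return scaled_mhz / unscaled_mhz


def area_ratio(scaled_dims_mm, unscaled_dims_mm):
    """Area increase, from (width, height) chip dimensions in millimeters."""
    scaled_area = scaled_dims_mm[0] * scaled_dims_mm[1]
    unscaled_area = unscaled_dims_mm[0] * unscaled_dims_mm[1]
    return scaled_area / unscaled_area


if __name__ == "__main__":
    # 4-bit ripple counter: 160 MHz unscaled, more than 230 MHz scaled.
    print(speed_ratio(230.0, 160.0), area_ratio((0.36, 0.18), (0.28, 0.14)))
    # Serial full adder: 140 MHz unscaled, 225 MHz scaled.
    print(speed_ratio(225.0, 140.0), area_ratio((0.22, 0.24), (0.22, 0.16)))
```

The computed values are close to 1.44 and 1.65 for the counter and 1.61 and 1.50 for the adder, matching the quoted 1.43/1.65 and 1.60/1.50 to within rounding.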

REFERENCES

[1] C. Svensson, "VLSI physics," Integration, vol. 1, pp. 3-19, 1983.
[2] R. W. Keyes, "Physical limits in digital electronics," Proc. IEEE, vol. 63, p. 740, 1975.
[3] H. Masuda, M. Nakai, and M. Kubo, "Characteristics and limitation of scaled-down MOSFETs due to two-dimensional field effect," IEEE Trans. Electron Devices, vol. ED-26, pp. 980-986, 1979.
[4] C. L. Seitz, "System timing," in Introduction to VLSI Systems, C. Mead and L. Conway, Eds. Reading, MA: Addison-Wesley, 1980, ch. 7.
[5] Y. Suzuki, K. Odagawa, and T. Abe, "Clocked CMOS calculator circuitry," IEEE J. Solid-State Circuits, vol. SC-8, pp. 462-469, 1973.
[6] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design. Reading, MA: Addison-Wesley, 1985, ch. 5.
[7] M. Shoji, "Electrical design of BELLMAC-32A microprocessor," in Proc. IEEE Int. Conf. Circuits Comput., 1982, pp. 112-115.
[8] N. Goncalves and H. J. De Man, "NORA: A racefree dynamic CMOS technique for pipelined logic structures," IEEE J. Solid-State Circuits, vol. SC-18, pp. 261-266, 1983.
[9] Y. Ji-ren, I. Karlsson, and C. Svensson, "A true single phase clock dynamic CMOS circuit technique," IEEE J. Solid-State Circuits, vol. SC-22, pp. 899-901, 1987.
[10] "Layout design rules for 3.0 µm P-well CMOS," VTI Technology Inc., San Jose, CA.
[11] I. Karlsson.
[17] M. Shoji, "FET scaling in domino CMOS gates," IEEE J. Solid-State Circuits, vol. SC-20, pp. 1067-1071, 1985.
[18] K. S. Hedlund, "Aesop: A tool for automated transistor sizing," in Proc. 24th ACM/IEEE Design Automation Conf., 1987, paper 7.1, pp. 114-120.
[19] M. A. Cirit, "Transistor sizing in CMOS circuits," in Proc. 24th ACM/IEEE Design Automation Conf., 1987, paper 7.2, pp. 121-124.
[20] J. Yuan and C. Svensson,