    European Journal of Scientific Research

    ISSN 1450-216X Vol.42 No.1 (2010), pp.53-58 EuroJournals Publishing, Inc. 2010

    http://www.eurojournals.com/ejsr.htm

    ASIC Implementation of Modified Faster Carry Save Adder

    B. Ramkumar

    VLSI division, VIT university, Vellore, Tamilnadu, India

    E-mail: [email protected]: +91-9843700849

    Harish M Kittur

    VLSI division, VIT university, Vellore, Tamilnadu, India

    P.Mahesh Kannan

    VLSI division, VIT university, Vellore, Tamilnadu, India

    Abstract

    Digital adders are the core block of DSP processors. The final carry propagation adder (CPA) stage of many adders has a high carry propagation delay, and this delay reduces the overall performance of the DSP processor. This paper proposes a simple and efficient approach to reduce the maximum carry propagation delay in the final stage. Based on this approach, 16, 32 and 64-bit adder architectures have been developed and compared with conventional fast adder architectures. This work evaluates the proposed designs in terms of delay, area and power through custom design and layout in a 0.18um CMOS process technology. The results show that the proposed architectures achieve a lower carry propagation delay than contemporary architectures.

    Keywords: ASIC, faster adder, carry propagation delay, power-delay product.

    1. Introduction

    The design of high-speed data path logic is one of the most substantial research areas in VLSI system design. High-speed addition and multiplication have always been fundamental requirements of high-performance processors and systems. The major speed limitation in any adder is the production of carries, and many authors have considered the addition problem [1]-[4].

    The basic idea of the proposed work is to use n-bit binary to excess-1 code converters (BEC) to improve the speed of addition. The detailed structure and function of the BEC are discussed in Section 2. This logic can be combined with any type of adder to further improve its speed. The proposed 16, 32 and 64-bit adders are compared in this paper with conventional fast adders such as the carry save adder (CSA) and the carry look-ahead adder (CLA). The improved performance of the CSA with BEC logic has been realized through custom design and layout [5]-[6].

    The final-stage CPA constitutes a dominant component of the delay in a parallel multiplier [7]-[8]. Signals from the multiplier's partial-product summation tree do not arrive at the final CPA at the same time, because the number of partial-product bits is larger in the middle of the multiplier tree. Because of this uneven arrival time of the input signals, the selection of the final adder is an important task in parallel multipliers [9]. A decrease in carry propagation delay therefore results in a major enhancement of the speed of both the adder and the multiplier [10].

    This paper is structured as follows. Section 2 gives an overview of the 4-bit binary to excess-1 logic. Section 3 deals with the proposed modified carry save adder (MCSA) architecture. The ASIC implementation details of the various adders are presented in Section 4.

    2. BEC

    The structure of the 4-bit BEC and its truth table are shown in Figure 1 and Table 1 respectively. Figure 2 shows how fast addition is achieved using the BEC together with a multiplexer (mux): one input of the 8:4 mux receives the direct input (B3, B2, B1, B0) and the other input receives the BEC output. This produces the two possible partial results in parallel, and the mux selects either the BEC output or the direct input according to the control signal Cin. The Boolean expressions of the 4-bit BEC are listed below (functional symbols: ~ NOT, & AND, ^ XOR).

    X0 = ~B0                   (1)
    X1 = B0 ^ B1               (2)
    X2 = B2 ^ (B0 & B1)        (3)
    X3 = B3 ^ (B0 & B1 & B2)   (4)

    Table 1: Truth table of 4-bit binary to excess-1

    Binary   Excess-1
    0000     0001
    0001     0010
    0010     0011
    0011     0100
    0100     0101
    0101     0110
    0110     0111
    0111     1000
    1000     1001
    1001     1010
    1010     1011
    1011     1100
    1100     1101
    1101     1110
    1110     1111
    1111     0000
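The Boolean expressions (1) to (4) can be checked against Table 1 with a short script. This is only an illustrative sketch in Python, not the authors' Verilog; the helper names `bec4`, `to_bits` and `from_bits` are assumptions introduced here.

```python
def bec4(b3, b2, b1, b0):
    """4-bit binary to excess-1 converter, written from expressions (1)-(4)."""
    x0 = b0 ^ 1                 # X0 = ~B0
    x1 = b0 ^ b1                # X1 = B0 ^ B1
    x2 = b2 ^ (b0 & b1)         # X2 = B2 ^ (B0 & B1)
    x3 = b3 ^ (b0 & b1 & b2)    # X3 = B3 ^ (B0 & B1 & B2)
    return x3, x2, x1, x0


def to_bits(value):
    """Split a 4-bit integer into the tuple (b3, b2, b1, b0)."""
    return tuple((value >> i) & 1 for i in (3, 2, 1, 0))


def from_bits(bits):
    """Pack (b3, b2, b1, b0) back into an integer."""
    b3, b2, b1, b0 = bits
    return (b3 << 3) | (b2 << 2) | (b1 << 1) | b0


# Regenerate Table 1: every input maps to (input + 1) modulo 16.
for value in range(16):
    excess1 = from_bits(bec4(*to_bits(value)))
    assert excess1 == (value + 1) % 16
    print(f"{value:04b} {excess1:04b}")
```

The loop prints the same sixteen rows as Table 1, including the wrap-around from 1111 to 0000.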

    Figure 1: 4-bit Binary to Excess-1 code converter

    [Figure: logic diagram with inputs B3, B2, B1, B0 and outputs X3, X2, X1, X0]

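    Figure 2: 4-bit Binary to Excess-1 logic with 8:4 multiplexer

    [Figure: block diagram; the 4-bit input B3 B2 B1 B0 feeds both the 4-bit binary to excess-1 block and the 8:4 mux, Cin is the mux select, and the 4-bit output is S3 S2 S1 S0]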
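The BEC-plus-mux arrangement of Figure 2 can be modelled at the integer level as follows. This is a behavioural sketch under stated assumptions: the function names `bec4_int` and `bec_mux` are hypothetical, and the mapping "Cin = 1 selects the BEC output" is inferred from the purpose of the circuit, since the text only says the selection follows Cin.

```python
def bec4_int(b):
    """Integer-level view of the 4-bit BEC: b + 1, truncated to 4 bits."""
    return (b + 1) & 0xF


def bec_mux(b, cin):
    """Model of Figure 2: pick the direct input or the BEC output using Cin."""
    if not 0 <= b <= 0xF:
        raise ValueError("b must be a 4-bit value")
    # Both candidates are available before Cin is known; Cin only steers the mux.
    candidate_if_cin0 = b             # direct input (B3, B2, B1, B0)
    candidate_if_cin1 = bec4_int(b)   # BEC output (assumed to be the Cin = 1 input)
    return candidate_if_cin1 if cin else candidate_if_cin0


print(f"{bec_mux(0b0101, 0):04b}")  # direct input passes through
print(f"{bec_mux(0b0101, 1):04b}")  # BEC output is selected
```

Because both mux inputs are ready before Cin arrives, the only delay added after Cin becomes valid is that of the mux.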

    3. MCSA

    The 16-bit conventional CSA is shown in Figure 3. It has 17 half adders and 15 full adders. Since a ripple carry adder (RCA) is used in the final stage, this structure has a large carry propagation delay. To reduce this delay, the final stage of the CSA is divided into 5 groups as shown in Figure 4. The first group covers a $(1 + \log_2 n)$-bit value and the other groups each cover a $\log_2 n$-bit value, where n is the bit size of the adder (5 bits and 4 bits respectively for n = 16). The groups are listed below:

    i). {c4,s[4:0]}
    ii). {c7,x[7:5]}
    iii). {c10,x[10:8]}
    iv). {c13,x[13:11]}
    v). x[17:14]

    The first group output s[4:0] is assigned directly to the final output. The second group {c7,x[7:5]} computes its partial result assuming c4 = 0; the third group {c10,x[10:8]} computes its partial result assuming c7 = 0; the fourth group {c13,x[13:11]} computes its partial result assuming c10 = 0; and the fifth group x[17:14] computes its partial result assuming c13 = 0.

    Depending on c4 from the first group, the second group mux delivers its final result without the carry propagation delay from c4 to c7. Likewise, the third group mux is controlled by c7 of the second group's final result and avoids the delay from c7 to c10; the fourth group mux is controlled by c10 of the third group's final result and avoids the delay from c10 to c13; and the fifth group mux is controlled by c13 of the fourth group's final result and avoids the delay from c13 to s17.

    The main advantage of this logic is that each group computes its partial results in parallel, so the muxes can deliver the final result with only the delay of the mux itself: as soon as the Cin of a group arrives, its final result is determined. The maximum delay in the carry propagation path is thus reduced. The same logic has been used for the 32 and 64-bit adder structures to achieve higher speeds.
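The grouping described above can be checked with a behavioural model. The sketch below is an assumption-laden Python illustration, not the authors' Verilog: the names `bec4` and `mcsa16` are hypothetical, the first row is modelled as one full adder at bit 0 (absorbing Cin) plus half adders at the other bits, and each group's internal ripple addition is abbreviated to an integer `+`.

```python
def bec4(value):
    """4-bit binary to excess-1 conversion using expressions (1)-(4)."""
    b0, b1, b2, b3 = ((value >> i) & 1 for i in range(4))
    x0 = b0 ^ 1
    x1 = b0 ^ b1
    x2 = b2 ^ (b0 & b1)
    x3 = b3 ^ (b0 & b1 & b2)
    return (x3 << 3) | (x2 << 2) | (x1 << 1) | x0


def mcsa16(a, b, cin):
    """Behavioural model of the 16-bit MCSA; returns the 18-bit value s[17:0]."""
    # First row: per-bit sums and carries (cin is 0 or 1, so it only touches bit 0).
    sums = (a ^ b) ^ cin
    carry0 = (a & b & 1) | (cin & (a ^ b) & 1)
    carries = (((a & b) & ~1) | carry0) << 1   # a carry from bit i has weight i + 1

    # Group 1, bits [4:0]: ordinary addition gives s[4:0] and the carry c4.
    total = (sums & 0x1F) + (carries & 0x1F)
    result = total & 0x1F
    carry = total >> 5

    # Groups 2 to 5: add three bits assuming carry-in 0, then let the
    # previous group's carry choose between x and BEC(x) through the mux.
    for low in (5, 8, 11, 14):
        x = ((sums >> low) & 0x7) + ((carries >> low) & 0x7)  # 4-bit {carry, x}
        selected = bec4(x) if carry else x
        result |= (selected & 0x7) << low
        carry = selected >> 3                  # c7, c10, c13, then s17
    return result | (carry << 17)


assert mcsa16(0xFFFF, 0xFFFF, 1) == 0xFFFF + 0xFFFF + 1
assert mcsa16(0x00FF, 0x0001, 0) == 0x0100
print(hex(mcsa16(0x1234, 0x0FCD, 1)))
```

Each 4-bit partial value x is at most 14 (7 + 7), so the BEC increment never overflows the group, which is why a 4-bit BEC suffices for groups 2 to 5.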

    Table 2 shows the post-layout simulation results of the adder structures in terms of delay, area and power. The area is the total cell area of the design, and the total power is the sum of leakage power, internal power, net power and switching (dynamic) power. The results show that the CLA and CSA occupy less area and consume less power than the MCSA. However, the MCSA is significantly faster (for example, 2.3 ns against 3.6 ns for the CSA and 3.5 ns for the CLA at 16 bits) and has the lowest power-delay product of the three architectures at every word size.

    Figure 3: 16-bit CSA

    [Figure: inputs a and b for bit positions 15 to 0 plus Cin; one row of 15 half adders and one full adder, and one row of 2 half adders and 14 full adders linked by the carries c1 to c15; outputs s17 to s0. H - Half Adder, F - Full Adder]

    Figure 4: 16-bit MCSA

    [Figure: the adder rows of Figure 3 with the final stage split into groups; intermediate outputs x17 to x5 and direct outputs s4 to s0; four 4-bit binary to excess-1 blocks, each followed by an 8:4 mux, take {c7, x7, x6, x5}, {c10, x10, x9, x8}, {c13, x13, x12, x11} and {x17, x16, x15, x14} and produce {c7, s7, s6, s5}, {c10, s10, s9, s8}, {c13, s13, s12, s11} and {s17, s16, s15, s14}; the first mux is selected by c4. H - Half Adder, F - Full Adder]

    Table 2: Delay-Area-Power of Adders in TSMC 0.18 micron technology

    Word size  Adder  Delay (ns)  Area (um2)  Leakage power (uW)  Switching power (uW)  Total power* (uW)  Power-delay product (10^-12)
    16-bit     CSA    3.6         1660        0.011               207.8                 415.7              1.49
    16-bit     CLA    3.5         1118        0.008               175.5                 351.0              1.22
    16-bit     MCSA   2.3         2165        0.015               254.8                 529.7              1.21
    32-bit     CSA    6.6         3363        0.023               415.4                 830.9              5.48
    32-bit     CLA    6.5         2235        0.016               347.4                 702.9              4.56
    32-bit     MCSA   3.8         4737        0.032               532.9                 1045.9             3.97
    64-bit     CSA    12.7        6769        0.047               821.3                 1642.6             20.86
    64-bit     CLA    12.6        4471        0.032               691.1                 1382.0             17.41
    64-bit     MCSA   6.9         9883        0.067               1049.6                2099.2             14.48

    *Total power = Leakage power + Internal power + Net power + Switching power
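The power-delay product column of Table 2 can be reproduced from the delay and total power columns. The sketch below assumes the product is simply total power times delay, so that uW times ns gives units of 10^-15 and a division by 1000 expresses it in units of 10^-12; the names `TABLE2` and `power_delay_product` are introduced here for illustration only.

```python
# (word size, adder, delay in ns, total power in uW, reported PDP in units of 1e-12)
TABLE2 = [
    ("16-bit", "CSA", 3.6, 415.7, 1.49),
    ("16-bit", "CLA", 3.5, 351.0, 1.22),
    ("16-bit", "MCSA", 2.3, 529.7, 1.21),
    ("32-bit", "CSA", 6.6, 830.9, 5.48),
    ("32-bit", "CLA", 6.5, 702.9, 4.56),
    ("32-bit", "MCSA", 3.8, 1045.9, 3.97),
    ("64-bit", "CSA", 12.7, 1642.6, 20.86),
    ("64-bit", "CLA", 12.6, 1382.0, 17.41),
    ("64-bit", "MCSA", 6.9, 2099.2, 14.48),
]


def power_delay_product(delay_ns, total_power_uw):
    """Power-delay product in units of 1e-12 (uW * ns = 1e-15, hence / 1000)."""
    return delay_ns * total_power_uw / 1000.0


best = {}
for word, adder, delay, power, reported in TABLE2:
    pdp = power_delay_product(delay, power)
    # The reported values agree with the recomputed ones to two decimal places.
    assert abs(pdp - reported) < 0.01
    if word not in best or pdp < best[word][1]:
        best[word] = (adder, pdp)

for word, (adder, pdp) in best.items():
    print(f"{word}: lowest power-delay product is {adder} ({pdp:.3f})")
```

Under this assumption the recomputed products match the table, and the MCSA has the lowest product at each word size even though its area and total power are the highest.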

    4. ASIC Implementation

    The proposed designs have been developed in Verilog-HDL and synthesized in Cadence RTL Compiler using typical libraries of the TSMC 0.18um technology. The synthesized Verilog netlists and their respective design constraints files (SDC) are imported into Cadence SoC Encounter and used to generate an automated layout from standard cells through placement and routing [11]. Parasitic extraction is performed using Encounter's native RC extraction tool. The extracted parasitic RC (SPEF format) is back-annotated to the Common Timing Engine in the Encounter platform and analyzed for static timing delay. The power analysis is done using Virtuoso Ultrasim [12].

    5. Conclusion

    A very simple approach is proposed in this paper to improve the speed of addition. The CLA arranged in the form of a carry select adder (CSLA) is used to speed up the final addition in many parallel multipliers [9]. However, the CSLA occupies more chip area, because it uses multiple pairs of RCAs (CLAs) to generate the partial sum and carry for Cin=0 and Cin=1, so the complexity of the final adder structure is high. Replacing the RCA (CLA) for Cin=1 with the BEC logic, as demonstrated in this paper, can reduce the maximum area and delay of the final adder structure.

    The BEC logic can therefore be used with any type of adder to enhance the speed of addition. Figure 5 shows the power-delay comparison graphs for the data in Table 2. The comparison shows that the MCSA is faster than the other architectures considered and is well suited to VLSI hardware implementation.

    Figure 5: Power-Delay Product comparison of adders. (a) 16-bit. (b) 32-bit. (c) 64-bit

    [Figure: three charts, one per word size (16-bit, 32-bit, 64-bit), comparing CSA, CLA and MCSA; axes labelled WORD SIZE and DELAY[ns]]

    References

    [1] V. G. Oklobdzija, "High-Speed VLSI Arithmetic Units: Adders and Multipliers", in Design of High-Performance Microprocessor Circuits, edited by A. Chandrakasan, IEEE Press, 2000.

    [2] M. J. Flynn and S. F. Oberman, "Advanced Computer Arithmetic Design", John Wiley & Sons, 2001.

    [3] J. Sklansky, "Conditional-Sum Addition Logic", IRE Transactions on Electronic Computers, EC-9, pp. 226-231, 1960.

    [4] O. J. Bedrij, "Carry-Select Adder", IRE Transactions on Electronic Computers, pp. 340-344, 1962.

    [5] R. P. Brent and H. T. Kung, "A Regular Layout for Parallel Adders", IEEE Transactions on Computers, Vol. C-31, No. 3, pp. 260-264, March 1982.

    [6] T. D. Han and D. A. Carlson, "Fast Area-Efficient VLSI Adders", 8th Symposium on Computer Arithmetic, May 1987.

    [7] C. S. Wallace, "A suggestion for a fast multiplier", IEEE Transactions on Computers, vol. 13, pp. 14-17, 1964.

    [8] L. Dadda, "Some schemes for parallel multipliers", Alta Frequenza, vol. 34, pp. 349-356, 1965.

    [9] V. G. Oklobdzija, "Improving Multiplier Design by Using Improved Column Compression Tree and Optimized Final Adder in CMOS Technology", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 3, no. 2, June 1995.

    [10] P. F. Stelling, "Design strategies for optimal hybrid final adders in parallel multiplier", Journal of VLSI Signal Processing, vol. 14, pp. 321-331, 1996.

    [11] Encounter User Guide, pp. 582-592, February 2006.

    [12] Virtuoso Ultrasim User Guide, p. 17, June 2004.