A Comparative Study of Adders - Repository Home

The Report Committee for Poulami Das

Certifies that this is the approved version of the following report:

A Comparative Study of Adders

APPROVED BY

SUPERVISING COMMITTEE:

Supervisor: Earl E. Swartzlander

Lizy K. John


by

Poulami Das, B.Tech

Report

Presented to the Faculty of the Graduate School of

The University of Texas at Austin

in Partial Fulfillment

of the Requirements

for the Degree of

Master of Science in Engineering

The University of Texas at Austin

May 2016


ABSTRACT


Poulami Das, M.S.E.

The University of Texas at Austin, 2016

Supervisor: Earl Swartzlander

This report compares the area, delay, and gate count complexity of 8, 16, 32, and 64 bit versions of several types of adders: ripple carry adders, carry look-ahead adders, carry select adders, carry skip adders and Kogge Stone adders. The adders were implemented in structural Verilog using only 2-input NAND and NOR gates and inverters, and were synthesized using Design Vision (by Synopsys). Auto Place and Route was performed using Cadence Encounter to obtain the layout of each adder, and parasitic extraction was then performed to obtain the actual routing delay. PrimeTime was used to calculate the post synthesis and post place and route delays. The post place and route netlist was used to compare the area and delay of the different adders.


TABLE OF CONTENTS

I. Introduction

II. Methodology and Description

III. Functional Verification

IV. Timing Analysis

V. Results

VI. Conclusion

VII. References


I. INTRODUCTION

The simplest form of combinational adder is the ripple carry adder. An n-bit ripple carry adder consists of n 1-bit full adders connected in succession, with the carry “rippling” from the least significant bit to the most significant bit. The input carry to any stage depends on the carry out of the previous stage, which makes the delay of a ripple carry adder O(N). The delay thus increases linearly with the size of the operands. This becomes a performance bottleneck as the size of the operands grows to 64 bits and 128 bits.

A carry select adder block consists of two ripple carry adders and a multiplexer. The addition of two n-bit numbers is done speculatively with the two adders: once assuming that the carry in is zero, and once assuming that it is one. After the two results are calculated, the correct sum and the correct carry are selected with the multiplexer once the carry out of the previous stage is known. The number of bits in each carry select block can be uniform or variable and may be optimized for performance. In the uniform case, the optimal delay occurs for a block size of √N, where N is the size of the operands. Carry select adders are simple yet considerably faster than ripple carry adders, with a delay of O(√N). Carry skip adders are built on determining whether a block of some size will propagate a carry to the next block.
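The √N block size quoted for the uniform carry select adder can be motivated with a simple delay model. The sketch below is an assumption for illustration, not a derivation taken from this report: t_FA denotes the delay of one full adder stage, t_MUX the delay of one block multiplexer, and k the uniform block size.

```latex
% Assumed model: one k-bit ripple block followed by (N/k - 1) multiplexer stages
T(k) \approx k\,t_{FA} + \left(\frac{N}{k} - 1\right) t_{MUX}

% Minimizing over k
\frac{dT}{dk} = t_{FA} - \frac{N}{k^{2}}\,t_{MUX} = 0
\quad\Longrightarrow\quad
k_{opt} = \sqrt{\frac{N\,t_{MUX}}{t_{FA}}}

% With t_{MUX} \approx t_{FA} = t
k_{opt} \approx \sqrt{N}, \qquad T(k_{opt}) \approx \left(2\sqrt{N} - 1\right) t = O(\sqrt{N})
```

Under this model, N = 16 gives a block size of 4 and N = 64 gives a block size of 8.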

The higher delay of ripple carry adders is due to the carry chain. Thus, in the development of different types of adders, efforts have always been made to break the dependency on the carry chain. In carry look-ahead adders, the carry signals are calculated in advance, based on the input signals. For any bit position, a carry out is produced if both corresponding input bits are ‘1’, or if the carry in to that bit is ‘1’ and at least one of the input bits is ‘1’. Based on this, a recurrence relation is derived which expresses the input carry to any bit position in terms of the relevant addend and augend bits and a lower-order carry. This can result in a considerable gain in speed.
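Written out, the recurrence described above takes the following standard form. The symbols a_i, b_i, g_i, p_i and c_i are introduced here for illustration and are not notation used in the report up to this point.

```latex
g_i = a_i \cdot b_i, \qquad p_i = a_i + b_i        % generate, propagate
c_{i+1} = g_i + p_i \cdot c_i                       % carry recurrence

% Unrolling one step removes the dependence on the intermediate carry c_{i+1}:
c_{i+2} = g_{i+1} + p_{i+1} \cdot g_i + p_{i+1} \cdot p_i \cdot c_i
```

Repeating the unrolling expresses every carry directly in terms of the input bits and a single lower-order carry, which is what allows the carries to be computed in advance.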


For wide adders, the delay of carry look-ahead adders is dominated by the delay of propagating the carry through the different “lookahead” stages. This delay can be reduced by looking ahead across the look-ahead blocks. In general, a multi-level tree of lookahead structures can be built to achieve a delay that grows as O(log N).

Adders can be broadly classified into serial and parallel prefix adders. The Kogge-Stone adder is a parallel prefix adder. In simple words, it uses parallel carry-lookahead addition to achieve an even lower delay, on the order of O(log N). It uses a technique called recursive doubling, introduced in an algorithm for solving a large class of recurrence problems on parallel computers. Recursive doubling involves splitting the computation of a function into two equally complex sub-functions whose evaluation can be performed simultaneously. The Kogge-Stone adder has log2(N) stages and a fanout of 2 at each stage. This comes at the cost of many long wires that must be routed between stages.

In this report, the delay and complexity of 8 bit, 16 bit, 32 bit and 64 bit versions of the following types of adders were compared:

Ripple Carry

Carry Select

Carry Skip

Carry Lookahead

Kogge-Stone


II. METHODOLOGY AND DESCRIPTION

Structural Verilog code was written to implement all the different adders. The designs were verified for functional correctness using randomized inputs. Synthesis was completed using Design Vision by Synopsys, and static timing analysis was performed using PrimeTime. The gate level netlist obtained after synthesis was used for Auto Place and Route (APR) using Cadence Encounter. Multiple iterations were performed, varying the tool parameters, so that the layout obtained had minimum area as well as minimum delay.

Gate Level Design

Ripple Carry Adder

Ripple carry adders are made of full-adders with the carry bit propagating

from the LSB to the MSB. A simple block diagram of a 4-bit ripple carry adder is

shown in Figure 1. A[3:0] and B[3:0] are the two operands and c0 is the input

carry to the overall adder. c1, c2, and c3 are the intermediate carry bits that are

propagated from LSB towards MSB and c4 is the final output carry of the 4 bit

adder. S[3:0] represents the sum of A and B.

Figure 1: Block Diagram of a 4-bit Ripple Carry Adder
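As an illustration only, the ripple carry structure of Figure 1 can be modeled bit by bit in Python. This is a behavioral sketch, not the structural Verilog used in the report; the function names `full_adder` and `ripple_carry_add` are hypothetical.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a, b, c0=0, n=4):
    """Add two n-bit integers by chaining n full adders, LSB first.

    Returns (S, c_n): the n-bit sum and the final carry out.
    """
    carry = c0
    total = 0
    for i in range(n):
        # The carry out of stage i becomes the carry in of stage i + 1.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


# 4-bit example: 1011 + 0110 = 1 0001 (sum 0001, carry out 1)
print(ripple_carry_add(0b1011, 0b0110))
```

The loop makes the serial dependency explicit: stage i cannot finish before stage i - 1 has produced its carry, which is the source of the O(N) delay.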


A full adder module was created and instantiated n times to build n bit ripple carry adders, where n = 8, 16, 32 and 64 for the scope of this report. The carry out of the ith stage was connected as the input carry of the (i+1)th stage. A snapshot of the 8 bit ripple carry adder Verilog code is given in Figure 2 as a reference. Since the implementations of the 16 bit, 32 bit, and 64 bit adders differ very little, their Verilog implementations are omitted from this report to avoid redundancy. The layouts of all four ripple carry adders are, however, presented in Figures 3 to 6, in the order of 8 bit, 16 bit, 32 bit and 64 bit respectively.

Figure 2: Gate level Verilog implementation of 8-bit Ripple Carry Adder


Figure 3: Layout of the 8-bit Ripple Carry Adder




Figure 6: Enlarged View of the layout of the 64-bit Ripple Carry Adder

Carry Select Adder

Carry select adder is made up of blocks of ripple carry adders. Two Ripple

carry adders are used, one with input carry of 0 and other with input carry of 1,

to generate the sum and the carry out bits. Depending on the carry out from the

previous stage, the correct sum and carry out bits are chosen using a multiplexer.

The basic structure of a 16 bit carry select adder is shown in Figure 7.

Figure 7: Block Diagram of a 16-bit Carry Select Adder


For the 16 bit carry select adder, 4 bit ripple carry adders were used as the basic blocks. When uniform sized ripple carry adders are used as basic blocks, the delay of the carry select adder is minimized for a block size of √N. Hence, a 4 bit ripple carry adder module was created and multiple copies of it were used along with multiplexers to build the 16 bit carry select adder. Similarly, 8 bit ripple carry adder blocks were used for the 64 bit carry select adder. For the 32 bit carry select adder, blocks of 5, 5, 5, 5, 5 and 7 bits were used. Figure 8 shows a snapshot of the 16 bit carry select adder implementation as a representative example. The others are not presented in this report to avoid redundancy. Figures 9 to 12 show the post auto place and route layouts of the 8 bit, 16 bit, 32 bit, and 64 bit carry select adders respectively.
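The block organization described above can be sketched behaviorally in Python. This is a simplified model under stated assumptions, not the Verilog of Figure 8: the names `ripple_block` and `carry_select_add` are hypothetical, and integer addition stands in for the chain of full adders inside each block.

```python
def ripple_block(a, b, cin, width):
    """Stand-in for a width-bit ripple carry adder: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, blocks=(4, 4, 4, 4)):
    """Carry select addition; `blocks` lists block widths from LSB to MSB."""
    result, shift, carry = 0, 0, cin
    for width in blocks:
        mask = (1 << width) - 1
        a_blk, b_blk = (a >> shift) & mask, (b >> shift) & mask
        # Both candidate results are computed speculatively.
        sum0, cout0 = ripple_block(a_blk, b_blk, 0, width)
        sum1, cout1 = ripple_block(a_blk, b_blk, 1, width)
        # Multiplexer: the incoming carry selects one of the two results.
        s, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= s << shift
        shift += width
    return result, carry


# 32-bit adder with the 5, 5, 5, 5, 5, 7 partition mentioned in the text
print(carry_select_add(0xFFFFFFFF, 1, 0, blocks=(5, 5, 5, 5, 5, 7)))
```

Passing `blocks=(4, 4, 4, 4)` models the 16 bit case and `blocks=(8,) * 8` the 64 bit case.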


Figure 8: Structural Verilog implementation of a 16-bit Carry Select Adder (Block size: 4)


Figure 9: Layout of the 8-bit Carry Select Adder




Figure 12: Enlarged view layout of the 64-bit Carry Select Adder

Carry Skip Adder

Carry skip adders break the dependency on the carry chain by quickly deciding whether adding k bits (block size k) in ripple carry fashion will propagate a carry to the next block. This is determined by computing the overall group generate and group propagate for the block; together with the input carry to the block, these give the input carry to the next block. A carry signal entering a block can thus be propagated past the block without waiting for the signal to ripple through the individual stages of the block (4 stages for a block size of 4). An example 16 bit carry skip adder with block size 4 is given in Figure 13. A snapshot of the Verilog code for the 16 bit carry skip adder is shown in Figure 14. Figures 15 to 18 show the layouts of the 8 bit, 16 bit, 32 bit and 64 bit carry skip adders respectively.
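The skip mechanism can be sketched in Python as follows. This is a behavioral illustration under assumptions, not the Verilog of Figure 14; the function name `carry_skip_add` and its variable names are hypothetical.

```python
def carry_skip_add(a, b, cin=0, n=16, k=4):
    """n-bit carry skip addition with uniform block size k.

    Returns (sum, carry_out).
    """
    result, carry = 0, cin
    for base in range(0, n, k):
        block_cin = carry
        group_g, group_p = 0, 1
        for i in range(base, min(base + k, n)):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            g, p = ai & bi, ai ^ bi
            result |= (p ^ carry) << i      # sum bit uses the rippled carry
            carry = g | (p & carry)         # ripple inside the block
            group_g = g | (p & group_g)     # carry generated within the block
            group_p &= p                    # block propagates only if every bit does
        # Skip logic: the next block's carry in comes from the group signals
        # and the block's carry in, not from the end of the ripple chain.
        carry = group_g | (group_p & block_cin)
    return result, carry


print(carry_skip_add(0xFFFF, 0x0001))  # every block propagates or generates
```

In software both paths give the same carry; in hardware the last line is the fast path that lets a carry bypass a block whose bits all propagate.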


Figure 13: Block Diagram of a 4-bit Carry Skip Adder


Figure 14: Verilog implementation of a 16 bit Carry Skip adder


Figure 15: Layout of the 8-bit Carry Skip Adder




Figure 18: Enlarged view of the layout of the 64-bit Carry Skip Adder

Carry Lookahead Adder

Figure 19 below shows the structure of a 4-bit carry lookahead adder. It consists of 4 modified full adders and a carry lookahead logic block. The carry lookahead block generates the carry signals c1, c2 and c3 for the 4 bit adder, as well as the group generate and group propagate signals, as given by the equations in Figure 20. The carry out of the 4 bit adder is given by the following equation:

$C_{out} = C_{in} \cdot P_{0\text{-}3} + G_{0\text{-}3}$

The group generate and propagate signals can be used to cascade the 4 bit adders in order to build a 16 bit adder. This requires four 4-bit carry lookahead adders and an additional carry lookahead block, as shown in Figure 21.
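The 4 bit lookahead logic and its cascading can be sketched in Python. The equations of Figure 20 are not reproduced in the text, so the sketch assumes the standard carry lookahead equations; the names `cla4` and `cla16` are hypothetical, and the propagate signal is taken as XOR so that it can be reused for the sum bits.

```python
def cla4(a, b, cin=0):
    """4-bit carry lookahead adder: returns (sum, cout, group_g, group_p)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]   # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]   # bit propagate (XOR form)
    # Each carry is a two-level function of g, p and cin (no ripple).
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin)
    group_p = p[0] & p[1] & p[2] & p[3]
    group_g = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    cout = group_g | (group_p & cin)                    # Cout = Cin.P0-3 + G0-3
    carries = (cin, c1, c2, c3)
    s = sum((p[i] ^ carries[i]) << i for i in range(4))
    return s, cout, group_g, group_p


def cla16(a, b, cin=0):
    """16-bit adder from four cla4 slices and second-level group signals."""
    total, carry = 0, cin
    for j in range(4):
        a4, b4 = (a >> 4 * j) & 0xF, (b >> 4 * j) & 0xF
        s, _, group_g, group_p = cla4(a4, b4, carry)
        total |= s << 4 * j
        # Second-level lookahead equation, evaluated slice by slice here;
        # in hardware the three block carries are produced in parallel.
        carry = group_g | (group_p & carry)
    return total, carry


print(cla16(0x1234, 0x0FCD))
```

The group signals depend only on the operand bits, which is what lets the additional lookahead block of Figure 21 compute the block carries without waiting for the slices.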

Figure 19: Block Diagram of a 4-bit Carry Lookahead Adder

Figure 20: Carry Lookahead Logic Equations


Figure 21: Block Diagram of a 16-bit Carry Lookahead Adder

32 bit and 64 bit carry look-ahead adders can be built similarly by cascading the lookahead blocks, as done for the 16 bit adder. Figure 22 shows the structural Verilog implementation of a 32 bit carry lookahead adder. Figures 23 to 26 show the layouts of the 8 bit, 16 bit, 32 bit, and 64 bit carry lookahead adders respectively.


Figure 22: Verilog implementation of a 32-bit Carry Lookahead Adder


Figure 23: Layout of the 8-bit Carry Lookahead Adder

Figure 24: Layout of the 16-bit Carry Lookahead Adder

Figure 25: Layout of a 32-bit Carry Lookahead Adder


Figure 26: Enlarged view of the layout of the 64-bit Carry Lookahead Adder

Kogge Stone Adder

The generation of the generate and propagate signals was described for carry lookahead adders in the previous section. The idea remains the same and hence is not repeated here. The Kogge Stone adder is a parallel prefix adder and uses a prefix operator to implement the carry generation operation. The prefix operator used to build the tree structure implements the following equation:

$(g, p) \bullet (g', p') = (g + p \cdot g',\; p \cdot p')$

As an example, the tree structure used for a 16 bit Kogge Stone adder is given below in Figure 27. It was similarly extended to obtain the structures for the 32 bit and 64 bit adders.

At the end of the tree, the group generate and propagate from bit 0 to bit i, i.e. $(G_i, P_i)$, are obtained.


Figure 27. Tree Structure for 16 bit Kogge Stone Adders

The output sum and carry bits are generated using $C_{in}$ and $(G_i, P_i)$ with the following equations:

$C_{-1} = C_{in}$

$C_i = G_i + P_i \cdot C_{in}$

$S_i = p_i \oplus C_{i-1}$

This is referred to as post processing. The Verilog implementation of the 64 bit Kogge Stone adder is given in Figure 28. Figures 29 to 32 show the layouts of the 8 bit, 16 bit, 32 bit and 64 bit Kogge Stone adders respectively.
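The prefix tree and the post processing step can be sketched in Python as follows. This is a behavioral illustration, not the Verilog of Figure 28; the names `prefix_op` and `kogge_stone_add` are hypothetical, and each pass of the `while` loop stands for one stage of the tree in Figure 27.

```python
def prefix_op(left, right):
    """(g, p) . (g', p') = (g + p.g', p.p'); `left` is the more significant group."""
    g, p = left
    g2, p2 = right
    return g | (p & g2), p & p2


def kogge_stone_add(a, b, cin=0, n=16):
    """n-bit Kogge-Stone addition; returns (sum, carry_out)."""
    # Pre-processing: per-bit generate and propagate.
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1) for i in range(n)]
    p_bit = [p for _, p in gp]
    # Prefix tree: log2(n) stages, the combining distance doubles each stage.
    dist = 1
    while dist < n:
        gp = [prefix_op(gp[i], gp[i - dist]) if i >= dist else gp[i]
              for i in range(n)]
        dist *= 2
    # gp[i] now holds (G_i, P_i), the group signals for bits 0..i.
    # Post processing: C_i = G_i + P_i.Cin and S_i = p_i xor C_(i-1).
    carry = [g | (p & cin) for g, p in gp]
    total = 0
    for i in range(n):
        c_prev = cin if i == 0 else carry[i - 1]
        total |= (p_bit[i] ^ c_prev) << i
    return total, carry[n - 1]


print(kogge_stone_add(0x1234, 0x0FCD))
```

Every position is updated in every stage, which reflects the large number of prefix operators and long wires noted for this adder.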


Figure 28: Verilog implementation of the 16 bit Kogge-Stone Adder


Figure 29: Layout of the 8 bit Kogge-Stone Adder

Figure 30: Layout of the 16 bit Kogge-Stone Adder

Figure 31: Layout of a 32 bit Kogge-Stone Adder


Figure 32: Enlarged view of the layout of the 64 bit Kogge-Stone Adder


III. FUNCTIONAL VERIFICATION

Verilog simulations were performed to verify the functional correctness of the different types of adders. The waveforms for all the 8 bit adders are given as a representative example in Figure 33.

Figure 33: Waveform for 8 bit adders


IV. TIMING ANALYSIS

After functional verification, the Synopsys PrimeTime tool was used to measure the Pre APR as well as the Post APR delay of the adders. All delays are in nanoseconds. As an example, the Pre APR timing report for the 32 bit carry lookahead adder is given in Figure 34. Figure 35 shows an example Post APR timing report, for the 64 bit Kogge Stone adder. To generate the Post APR timing, an input delay of 1 ns was set. This was subtracted from the final value to get the actual delay; for example, the data arrival time of 2.05 ns in Figure 35 corresponds to an actual delay of 1.05 ns.

Information: Updating design information... (UID-85)
****************************************
Report : timing
        -path full
        -delay max
        -max_paths 1
Design : carry_lookahead_32bit
Version: G-2012.06-SP4
Date   : Mon Apr 11 00:09:51 2016
****************************************
Operating Conditions: typical   Library: gscl45nm
Wire Load Model Mode: top

Startpoint: a[0] (input port)
Endpoint: sum[31] (output port)
Path Group: (none)
Path Type: max

Point                             Incr   Path
-----------------------------------------------------------
input external delay              0.00   0.00 r
a[0] (in)                         0.00   0.00 r
GP0/a (gen_prop_0)                0.00   0.00 r
GP0/U2/a (xor_gate2_0)            0.00   0.00 r
GP0/U2/U10/Y (NAND2X1)            0.01   0.01 f
GP0/U2/U3/Y (INVX1)               0.00   0.01 r
GP0/U2/U4/Y (INVX1)               0.01   0.02 f
GP0/U2/U7/Y (NAND2X1)             0.02   0.05 r
GP0/U2/U5/Y (INVX1)               0.02   0.07 f
GP0/U2/U6/Y (INVX1)               0.01   0.08 r
GP0/U2/c (xor_gate2_0)            0.00   0.08 r
GP0/p (gen_prop_0)                0.00   0.08 r
LA0/p[0] (lookahead_logic_0)      0.00   0.08 r
LA0/U41/Y (NAND2X1)               0.01   0.09 f
LA0/U11/Y (INVX1)                 0.00   0.10 r
LA0/U12/Y (INVX1)                 0.01   0.11 f
LA0/U39/Y (NOR2X1)                0.03   0.14 r
LA0/U23/Y (INVX1)                 0.02   0.16 f
LA0/U24/Y (INVX1)                 0.01   0.17 r
LA0/P (lookahead_logic_0)         0.00   0.17 r
LA4/p[0] (lookahead_logic_6)      0.00   0.17 r
LA4/U41/Y (NAND2X1)               0.01   0.18 f
LA4/U13/Y (INVX1)                 0.00   0.18 r
LA4/U14/Y (INVX1)                 0.01   0.20 f
LA4/U39/Y (NOR2X1)                0.03   0.23 r
LA4/U11/Y (INVX1)                 0.02   0.24 f
LA4/U12/Y (INVX1)                 0.00   0.24 r
LA4/P (lookahead_logic_6)         0.00   0.24 r
CG0/p (carry_gen_0)               0.00   0.24 r
CG0/U7/Y (NAND2X1)                0.01   0.25 f
CG0/U4/Y (INVX1)                  0.00   0.26 r
CG0/U5/Y (INVX1)                  0.01   0.27 f
CG0/U6/Y (NAND2X1)                0.01   0.28 r
CG0/U2/Y (INVX1)                  0.02   0.30 f
CG0/U3/Y (INVX1)                  0.04   0.34 r
CG0/cout (carry_gen_0)            0.00   0.34 r
LA9/cin (lookahead_logic_1)       0.00   0.34 r
LA9/U31/Y (NAND2X1)               0.02   0.36 f
LA9/U30/Y (INVX1)                 0.00   0.36 r
LA9/U29/Y (NAND2X1)               0.01   0.37 f
LA9/U7/Y (INVX1)                  0.00   0.37 r
LA9/U8/Y (INVX1)                  0.01   0.39 f
LA9/U28/Y (NAND2X1)               0.01   0.40 r
LA9/U5/Y (INVX1)                  0.02   0.42 f
LA9/U6/Y (INVX1)                  0.00   0.42 r
LA9/U27/Y (NAND2X1)               0.01   0.43 f
LA9/U3/Y (INVX1)                  0.00   0.43 r
LA9/U4/Y (INVX1)                  0.01   0.44 f
LA9/U26/Y (NAND2X1)               0.01   0.45 r
LA9/U1/Y (INVX1)                  0.02   0.48 f
LA9/U2/Y (INVX1)                  0.00   0.47 r
LA9/U25/Y (NAND2X1)               0.01   0.49 f
LA9/U19/Y (INVX1)                 0.00   0.49 r
LA9/U20/Y (INVX1)                 0.02   0.51 f
LA9/cout[2] (lookahead_logic_1)   0.00   0.51 f
LA8/cin (lookahead_logic_2)       0.00   0.51 f
LA8/U31/Y (NAND2X1)               0.02   0.53 r
LA8/U30/Y (INVX1)                 0.02   0.55 f
LA8/U29/Y (NAND2X1)               0.02   0.58 r
LA8/U7/Y (INVX1)                  0.02   0.60 f
LA8/U8/Y (INVX1)                  0.00   0.60 r
LA8/U28/Y (NAND2X1)               0.01   0.61 f
LA8/U5/Y (INVX1)                  0.00   0.61 r
LA8/U6/Y (INVX1)                  0.01   0.62 f
LA8/U27/Y (NAND2X1)               0.01   0.63 r
LA8/U3/Y (INVX1)                  0.02   0.65 f
LA8/U4/Y (INVX1)                  0.00   0.65 r
LA8/U26/Y (NAND2X1)               0.01   0.67 f
LA8/U1/Y (INVX1)                  0.00   0.67 r
LA8/U2/Y (INVX1)                  0.01   0.68 f
LA8/U25/Y (NAND2X1)               0.01   0.69 r
LA8/U17/Y (INVX1)                 0.02   0.71 f
LA8/U18/Y (INVX1)                 0.00   0.72 r
LA8/cout[2] (lookahead_logic_2)   0.00   0.72 r
FA32/cin (full_adder_mod_1)       0.00   0.72 r
FA32/U2/b (xor_gate2_1)           0.00   0.72 r
FA32/U2/U9/Y (INVX1)              0.01   0.73 f
FA32/U2/U8/Y (NAND2X1)            0.01   0.74 r
FA32/U2/U3/Y (INVX1)              0.02   0.76 f
FA32/U2/U4/Y (INVX1)              0.00   0.76 r
FA32/U2/U5/Y (NAND2X1)            0.01   0.77 f
FA32/U2/c (xor_gate2_1)           0.00   0.77 f
FA32/sum (full_adder_mod_1)       0.00   0.77 f
sum[31] (out)                     0.00   0.77 f
data arrival time                        0.77
-----------------------------------------------------------
(Path is unconstrained)

Figure 34: Pre APR Timing report for 32 bit Carry Lookahead Adder


****************************************
Report : timing
        -path_type full
        -delay_type max
        -max_paths 1
Design : kogge_stone_adder_64bit
Version: H-2012.12
Date   : Tue Apr 12 14:00:51 2016
****************************************

Startpoint: b[0] (input port clocked by vclk)
Endpoint: sum[63] (output port)
Path Group: (none)
Path Type: max

Point                              Incr     Path
---------------------------------------------------------------
input external delay               1.00     1.00 f
b[0] (in)                          0.00 &   1.00 f
GP0/b (gen_prop_0)                 0.00 &   1.00 f
GP0/U2/b (xor_gate2_64)            0.00 &   1.00 f
GP0/U2/U11/Y (INVX1)              -0.00 &   1.00 r
GP0/U2/U10/Y (NAND2X1)             0.01 &   1.01 f
GP0/U2/U3/Y (INVX1)                0.01 &   1.02 r
GP0/U2/U4/Y (INVX1)                0.01 &   1.03 f
GP0/U2/U7/Y (NAND2X1)              0.03 &   1.06 r
GP0/U2/U5/Y (INVX1)                0.02 &   1.08 f
GP0/U2/U6/Y (INVX1)                0.03 &   1.11 r
GP0/U2/c (xor_gate2_64)            0.00 &   1.11 r
GP0/p (gen_prop_0)                 0.00 &   1.11 r
CG0/p (carry_gen_0)                0.00 &   1.11 r
CG0/U7/Y (NAND2X1)                 0.01 &   1.12 f
CG0/U4/Y (INVX1)                   0.00 &   1.13 r
CG0/U5/Y (INVX1)                   0.01 &   1.14 f
CG0/U6/Y (NAND2X1)                 0.01 &   1.15 r
CG0/U2/Y (INVX1)                   0.02 &   1.17 f
CG0/U3/Y (INVX1)                   0.06 &   1.24 r
CG0/cout (carry_gen_0)             0.00 &   1.24 r
CG3/c (carry_gen_62)               0.00 &   1.24 r
CG3/U7/Y (NAND2X1)                 0.03 &   1.27 f
CG3/U4/Y (INVX1)                   0.00 &   1.27 r
CG3/U5/Y (INVX1)                   0.01 &   1.28 f
CG3/U6/Y (NAND2X1)                 0.01 &   1.30 r
CG3/U2/Y (INVX1)                   0.02 &   1.32 f
CG3/U3/Y (INVX1)                   0.06 &   1.38 r
CG3/cout (carry_gen_62)            0.00 &   1.38 r
CG7/c (carry_gen_58)               0.00 &   1.38 r
CG7/U7/Y (NAND2X1)                 0.03 &   1.41 f
CG7/U2/Y (INVX1)                   0.00 &   1.42 r
CG7/U3/Y (INVX1)                   0.02 &   1.43 f
CG7/U6/Y (NAND2X1)                 0.02 &   1.45 r
CG7/U4/Y (INVX1)                   0.02 &   1.47 f
CG7/U5/Y (INVX1)                   0.09 &   1.56 r
CG7/cout (carry_gen_58)            0.00 &   1.56 r
CG15/c (carry_gen_50)              0.00 &   1.56 r
CG15/U7/Y (NAND2X1)                0.04 &   1.61 f
CG15/U2/Y (INVX1)                 -0.00 &   1.60 r
CG15/U3/Y (INVX1)                  0.01 &   1.62 f
CG15/U6/Y (NAND2X1)                0.01 &   1.63 r
CG15/U4/Y (INVX1)                  0.02 &   1.65 f
CG15/U5/Y (INVX1)                  0.08 &   1.73 r
CG15/cout (carry_gen_50)           0.00 &   1.73 r
CG31/c (carry_gen_34)              0.00 &   1.73 r
CG31/U7/Y (NAND2X1)                0.04 &   1.77 f
CG31/U2/Y (INVX1)                 -0.00 &   1.77 r
CG31/U3/Y (INVX1)                  0.02 &   1.79 f
CG31/U6/Y (NAND2X1)                0.01 &   1.80 r
CG31/U4/Y (INVX1)                  0.02 &   1.82 f
CG31/U5/Y (INVX1)                  0.02 &   1.84 r
CG31/cout (carry_gen_34)           0.00 &   1.84 r
CG63/c (carry_gen_2)               0.00 &   1.84 r
CG63/U7/Y (NAND2X1)                0.01 &   1.86 f
CG63/U2/Y (INVX1)                  0.01 &   1.86 r
CG63/U3/Y (INVX1)                  0.02 &   1.88 f
CG63/U6/Y (NAND2X1)                0.04 &   1.92 r
CG63/U4/Y (INVX1)                  0.04 &   1.96 f
CG63/U5/Y (INVX1)                  0.02 &   1.98 r
CG63/cout (carry_gen_2)            0.00 &   1.98 r
XOR63/b (xor_gate2_65)             0.00 &   1.98 r
XOR63/U9/Y (INVX1)                 0.01 &   1.99 f
XOR63/U8/Y (NAND2X1)               0.02 &   2.01 r
XOR63/U3/Y (INVX1)                 0.02 &   2.03 f
XOR63/U4/Y (INVX1)                 0.01 &   2.04 r
XOR63/U5/Y (NAND2X1)               0.01 &   2.05 f
XOR63/c (xor_gate2_65)             0.00 &   2.05 f
sum[63] (out)                      0.00 &   2.05 f
data arrival time                           2.05
---------------------------------------------------------------
(Path is unconstrained)

Figure 35: Post APR Timing report for 64 bit Kogge Stone Adder


V. RESULTS

The tables and graphs comparing the Pre APR and Post APR delays are given below. For comparison, the areas and delays of behavioral adders are also included in the tables. Among the structural adders, ripple carry adders have the worst delay at 32 and 64 bits. Carry select and carry lookahead adders offer a significant improvement over the ripple carry adder. At 16 bits and above, the delay of the carry select adder is higher than that of the carry lookahead adder, since it is made up of blocks of ripple carry adders and the carry has to propagate sequentially within a block. The carry skip adder is slower than the ripple carry adder for 8 and 16 bits but becomes faster for 32 and 64 bits. This is because the carry skip adder consists of 4 bit ripple carry adders with carry skip logic (which produces the group generate and propagate) in between them. For 8 and 16 bits, the delay of the carry skip logic is larger than that of the 4 bit ripple carry adders, and so these adders are slower than the 8 and 16 bit ripple carry adders. The Kogge Stone adder offers the minimum delay, as its delay is proportional to log N, where N is the number of bits in the operand. Figures 36 and 37 show the comparison of the Pre and Post APR delays.

Table 1: Pre APR Delay (in ns)

Adder              8 Bit    16 Bit    32 Bit    64 Bit
Behavioral         0.86     1.73      3.41      6.84
Ripple Carry       0.67     1.34      2.67      5.35
Carry Select       0.43     0.65      1.04      1.8
Carry Skip         0.79     1.59      2.51      4.39
Carry Lookahead    0.49     0.58      0.77      0.89
Kogge-Stone        0.36     0.45      0.53      0.63


Figure 36: Pre APR Delay (in ns)

Table 2: Post APR Delay (in ns)

Adder              8 Bit    16 Bit    32 Bit    64 Bit
Behavioral         0.94     1.88      3.82      7.54
Ripple Carry       0.8      1.68      3.33      6.59
Carry Select       0.57     1.05      1.68      3.53
Carry Skip         1.01     2.04      3.28      5.57
Carry Lookahead    0.64     0.72      1.09      1.18
Kogge-Stone        0.45     0.64      0.88      1.05

Figure 37: Post APR Delay (in ns)
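To make the comparison concrete, the Post APR delays of Table 2 can be expressed as speedups over the ripple carry adder. The Python sketch below only rearranges the tabulated values; the helper name `speedup_over_ripple` is hypothetical.

```python
WIDTHS = (8, 16, 32, 64)

# Post APR delays in ns, copied from Table 2 (structural adders only).
POST_APR_DELAY_NS = {
    "Ripple Carry":    (0.80, 1.68, 3.33, 6.59),
    "Carry Select":    (0.57, 1.05, 1.68, 3.53),
    "Carry Skip":      (1.01, 2.04, 3.28, 5.57),
    "Carry Lookahead": (0.64, 0.72, 1.09, 1.18),
    "Kogge-Stone":     (0.45, 0.64, 0.88, 1.05),
}


def speedup_over_ripple(table):
    """Ratio of the ripple carry delay to each adder's delay, per width."""
    base = table["Ripple Carry"]
    return {
        name: tuple(round(r / d, 2) for r, d in zip(base, delays))
        for name, delays in table.items()
    }


speedup = speedup_over_ripple(POST_APR_DELAY_NS)
for name, ratios in speedup.items():
    print(f"{name:16s}", dict(zip(WIDTHS, ratios)))
```

A ratio below 1 marks an adder that is slower than the ripple carry adder at that width, which is the case for the carry skip adder at 8 and 16 bits.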


The tables and graphs comparing the Pre APR areas (proportional to gate count) and Post APR areas are given below. It is evident from these that ripple carry adders are the simplest ones to implement. They do not have any fanout issues and are regular in structure. In contrast, carry select adders need a large number of gates, since each block has two ripple carry adders performing the addition and a multiplexer to choose the correct output. The gate count of the carry skip adder is a little higher than that of the ripple carry adder, since it has carry skip logic (producing the group generate and propagate) for every 4 bits of ripple carry adder. The carry lookahead adder has even more gates at 16 bits and above, because it has multiple levels of lookahead logic (producing the group generate and propagate). The Kogge-Stone adder has high complexity, since it has a large number of prefix operations. Figures 38 and 39 show the comparison of the Pre and Post APR areas.

Table 3: Pre APR Area (in um2)

Adder              8 Bit    16 Bit    32 Bit    64 Bit
Behavioral         370      742       1485      2972
Ripple Carry       387.6    778.1     1559      3120.8
Carry Select       739.1    1607.4    3410.9    6934.8
Carry Skip         573.5    1149.8    2263      4409.5
Carry Lookahead    566.9    1173.7    2350.3    4740.4
Kogge-Stone        520.9    1210.8    2801.7    6420


Figure 38: Pre APR Area (in um2)

Table 4: Post APR area (in um2)

Adder              8 Bit     16 Bit    32 Bit    64 Bit
Behavioral         3906      5184      7395      10920
Ripple Carry       8621.1    13104     21158     35800
Carry Select       15912     22032     43460     78960
Carry Skip         10948     17160     28236     47306
Carry Lookahead    10856     17420     29064     57500
Kogge-Stone        10304     17940     33300     75051

Figure 39: Post APR area (in um2)


As a figure of merit, the post place and route area-delay product is computed, and the results are presented in Figure 40. The value is highest for the 64 bit carry select adder, owing to its area overhead. For the lower-width adders it is lowest for the behavioral adder, due to the extra effort the tool puts into optimizing the area. In the 64 bit case, however, the Kogge Stone adder beats the behavioral adder despite its larger area, because of the cost the behavioral adder pays in terms of delay to get a more optimal area.

Figure 40: Post APR area-delay product (in um2ns)
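The figure of merit can be recomputed from the tabulated values. The Python sketch below multiplies the Post APR areas of Table 4 by the Post APR delays of Table 2; it assumes the plotted product uses exactly these rounded values, so small differences from Figure 40 are possible. The function name `area_delay_product` is hypothetical.

```python
WIDTHS = (8, 16, 32, 64)

# Post APR delay in ns (Table 2) and Post APR area in um2 (Table 4).
POST_APR_DELAY_NS = {
    "Behavioral":      (0.94, 1.88, 3.82, 7.54),
    "Ripple Carry":    (0.80, 1.68, 3.33, 6.59),
    "Carry Select":    (0.57, 1.05, 1.68, 3.53),
    "Carry Skip":      (1.01, 2.04, 3.28, 5.57),
    "Carry Lookahead": (0.64, 0.72, 1.09, 1.18),
    "Kogge-Stone":     (0.45, 0.64, 0.88, 1.05),
}
POST_APR_AREA_UM2 = {
    "Behavioral":      (3906, 5184, 7395, 10920),
    "Ripple Carry":    (8621.1, 13104, 21158, 35800),
    "Carry Select":    (15912, 22032, 43460, 78960),
    "Carry Skip":      (10948, 17160, 28236, 47306),
    "Carry Lookahead": (10856, 17420, 29064, 57500),
    "Kogge-Stone":     (10304, 17940, 33300, 75051),
}


def area_delay_product(area, delay):
    """Per-adder tuple of area * delay (um2 * ns), one entry per width."""
    return {name: tuple(a * d for a, d in zip(area[name], delay[name]))
            for name in area}


adp = area_delay_product(POST_APR_AREA_UM2, POST_APR_DELAY_NS)
for name, values in adp.items():
    print(f"{name:16s}", [round(v) for v in values])
```

With these inputs, the 64 bit Kogge-Stone product comes out below the 64 bit behavioral product, while the behavioral adder has the smallest product at 8, 16 and 32 bits.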


VI. CONCLUSIONS

Among the structural adders studied, the ripple carry adder was found to have the minimum area and, for 32 and 64 bits, the maximum delay. This met our expectation, since the ripple carry adder is the simplest form of adder, without any optimizations for performance or hardware complexity. The delay of the carry select adder and the carry lookahead adder is considerably less than that of the ripple carry adder. However, the hardware complexity of the carry select adder is quite high, since it uses two blocks of ripple carry adders to perform the addition and a multiplexer to choose the correct sum and carry. The carry skip adder is slower than the ripple carry adder for 8 and 16 bits but becomes faster for 32 and 64 bits. The Kogge-Stone adder was found to have the least delay. This came at the cost of hardware complexity, which is consistent with the Kogge-Stone adder having the largest number of parallel prefix operations and thereby consuming a large area.


VII. REFERENCES

• P. M. Kogge and H. S. Stone, “A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations,” IEEE Transactions on Computers, vol. C-22, no. 8, pp. 783-791, August 1973.

• O. J. Bedrij, “Carry-Select Adder,” IRE Transactions on Electronic Computers, vol. EC-11, pp. 340-346, 1962.

• B. Gilchrist, J. H. Pomerene and S. Y. Wong, “Fast-carry logic for Digital Computers,” IRE Transactions on Electronic Computers, vol. EC-4, pp. 133-136, December 1955.

• O. L. MacSorley, “High Speed Arithmetic in Binary Computers,” Proc. IRE, vol. 49, pp. 67-91, 1961.

• A. Weinberger and J. L. Smith, “A logic for high-speed addition,” National Bureau of Standards Circular 591, pp. 3-12, 1958.

• N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 4th Edition, 2011.
