The Report Committee for Poulami Das
Certifies that this is the approved version of the following report:
A Comparative Study of Adders
APPROVED BY
SUPERVISING COMMITTEE:
Supervisor: Earl E. Swartzlander
Lizy K. John
A Comparative Study of Adders
by
Poulami Das, B.Tech
Report
Presented to the Faculty of the Graduate School of
The University of Texas at Austin
in Partial Fulfillment
of the Requirements
for the Degree of
Master of Science in Engineering
The University of Texas at Austin
May 2016
ABSTRACT
A Comparative Study of Adders
Poulami Das, M.S.E.
The University of Texas at Austin, 2016
Supervisor: Earl Swartzlander
This report compares the area, delay, and gate count complexity of 8-, 16-, 32-, and
64-bit versions of several types of adders. The study covers ripple carry adders,
carry lookahead adders, carry select adders, carry skip adders, and Kogge-Stone adders.
The adders were implemented in structural Verilog using only 2-input NAND and NOR
gates and inverters, and were synthesized using Design Vision (by Synopsys). Auto
place and route was performed using Cadence Encounter to obtain the layout of each
adder, and parasitic extraction was then performed to obtain the actual routing delay.
PrimeTime was used to calculate the post-synthesis and post-place-and-route delays.
The post-place-and-route netlist was used to compare the area and delay of the
different adders.
TABLE OF CONTENTS
I. Introduction
II. Methodology and Description
III. Functional Verification
IV. Timing Analysis
V. Results
VI. Conclusion
VII. References
I. INTRODUCTION
The simplest form of combinational adder is the ripple carry adder. An n-bit
ripple carry adder consists of n 1-bit full adders connected in succession, with the
carry “rippling” from the least significant bit to the most significant bit. The
input carry to any stage depends on the carry out from the previous stage,
so the delay of a ripple carry adder is O(N), where N is the operand width. The
delay thus increases linearly with the size of the operands, which becomes a
performance bottleneck as the operands grow to 64 and 128 bits.
A carry select adder block consists of two ripple carry adders and a multiplexer.
The addition of two n-bit numbers is done speculatively with the two adders: once
assuming that the carry in is zero, and once assuming that it is one. After the
two results are calculated, the correct sum and the correct carry are selected
with the multiplexer once the carry out of the previous stage is
known. The number of bits in each carry select block can be uniform or variable
and may be optimized for performance. In the uniform case, the optimal delay
occurs for a block size of √N, where N is the size of the operands. Carry select
adders are simple yet considerably faster than ripple carry adders, with a
delay of O(√N). Carry skip adders are built on determining whether a block of a
given size will propagate a carry to the next block.
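The √N block size quoted above can be illustrated with a small unit-delay model. This is only a sketch under stated assumptions: the function names, the delay model (all blocks ripple in parallel, then the carry passes through one multiplexer per remaining block), and the equal unit delays `t_fa` and `t_mux` are illustrative choices, not values taken from this report.

```python
def carry_select_delay(n_bits, block, t_fa=1.0, t_mux=1.0):
    """Unit-delay estimate for a carry select adder with uniform blocks.

    Assumed model (not from the report): every block ripples in parallel,
    costing block * t_fa, and the selected carry then passes through one
    multiplexer for each block after the first.
    """
    n_blocks = n_bits // block
    return block * t_fa + (n_blocks - 1) * t_mux


def best_uniform_block(n_bits, t_fa=1.0, t_mux=1.0):
    """Return the divisor of n_bits with the smallest estimated delay."""
    divisors = [k for k in range(1, n_bits + 1) if n_bits % k == 0]
    return min(divisors, key=lambda k: carry_select_delay(n_bits, k, t_fa, t_mux))


for n in (16, 64):
    print(n, best_uniform_block(n))
```

Under this model the minimum falls at a block size of 4 for N = 16 and 8 for N = 64, i.e. at √N. With unequal full adder and multiplexer delays the optimum shifts, but it still grows in proportion to √N.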
The higher delay of ripple carry adders is due to the carry chain.
Thus, in the development of different types of adders, efforts have always been
made to break the dependency on the carry chain. In carry lookahead adders,
the carry signals are calculated in advance, based on the input signals. For any bit
position, a carry out is produced if both input bits are ‘1’, or if the
carry-in to that bit is ‘1’ and at least one of the input bits is ‘1’. Based on
this, a recurrence relation is derived which expresses the input carry to any bit
position in terms of the relevant addend and augend bits and a lower-order
carry. This can result in considerable gain in speed.
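Written out, the rule above gives the recurrence below. The symbols are notation assumed here for illustration: a_i and b_i are the operand bits at position i, c_i is the carry into position i, g_i is the generate term, and p_i is the propagate term (taken in its OR form, matching "at least one of the input bits").

```latex
\begin{align*}
g_i &= a_i \, b_i, \qquad p_i = a_i + b_i, \\
c_{i+1} &= g_i + p_i \, c_i, \\
c_{i+2} &= g_{i+1} + p_{i+1} \, g_i + p_{i+1} \, p_i \, c_i .
\end{align*}
```

The last line is obtained by substituting the second into itself once. It shows that c_{i+2} can be formed directly from operand bits and c_i, without waiting for c_{i+1}.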
For wide adders, the delay of carry lookahead adders is dominated by the
delay of propagating the carry through the successive lookahead stages. This
delay can be reduced by looking ahead across the lookahead blocks. In general,
a multi-level tree of lookahead structures can be built to achieve a delay that
grows as O(log N).
Broadly, adders can be classified into serial and parallel prefix adders.
The Kogge-Stone adder is a parallel prefix adder. In simple terms, it
uses parallel carry lookahead addition to achieve a low delay, on the order
of O(log N). It is based on recursive doubling, a technique for
solving a large class of recurrence problems on parallel computers. Recursive
doubling splits the computation of a function into two equally
complex subfunctions whose evaluation can be performed simultaneously. The
Kogge-Stone adder has log2(N) stages and a fanout of 2 at each stage. This comes
at the cost of many long wires that must be routed between stages.
In this report, the delay and complexity of the 8-bit, 16-bit, 32-bit, and 64-bit
versions of the following types of adders were compared:
Ripple Carry
Carry Select
Carry Skip
Carry Lookahead
Kogge-Stone
II. METHODOLOGY AND DESCRIPTION
Structural Verilog code was written to implement all of the adders.
The designs were verified for functional correctness using randomized
inputs. Synthesis was completed using Design Vision by Synopsys, and static
timing analysis was performed using PrimeTime. The gate-level netlist obtained
after synthesis was used for auto place and route (APR) in Cadence
Encounter. Multiple iterations were performed, varying the tool
parameters, so that the resulting layout had minimum area as well as delay.
Gate Level Design
Ripple Carry Adder
Ripple carry adders are made of full-adders with the carry bit propagating
from the LSB to the MSB. A simple block diagram of a 4-bit ripple carry adder is
shown in Figure 1. A[3:0] and B[3:0] are the two operands and c0 is the input
carry to the overall adder. c1, c2, and c3 are the intermediate carry bits that are
propagated from LSB towards MSB and c4 is the final output carry of the 4 bit
adder. S[3:0] represents the sum of A and B.
Figure 1: Block Diagram of a 4-bit Ripple Carry Adder
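The structure described above can be modeled at the bit level as follows. This is a minimal sketch, not the Verilog of this report: the function names are hypothetical, and the full adder uses a common 9-gate NAND-only decomposition chosen here for illustration (the report's netlists also use NOR gates and inverters).

```python
def nand(x, y):
    """2-input NAND on single bits (0 or 1)."""
    return 1 - (x & y)


def full_adder(a, b, cin):
    """1-bit full adder built only from 2-input NAND gates (9 gates)."""
    n1 = nand(a, b)
    x = nand(nand(a, n1), nand(b, n1))        # x = a XOR b
    n2 = nand(x, cin)
    s = nand(nand(x, n2), nand(cin, n2))      # s = x XOR cin
    cout = nand(n1, n2)                       # cout = a.b + (a XOR b).cin
    return s, cout


def ripple_carry_add(a, b, cin=0, n=4):
    """Chain n full adders; the carry ripples from the LSB to the MSB."""
    carry = cin
    total = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


print(ripple_carry_add(0b1011, 0b0110))  # (1, 1): 11 + 6 = 17 = 0b1_0001
```

The loop makes the O(N) delay visible: stage i cannot finish until the carry from stage i-1 is available.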
A full adder module was created and instantiated n times to make n-bit
ripple carry adders, where n = 8, 16, 32, and 64 for the scope of this report. The
carry out from the i-th stage was connected as the input carry to the (i+1)-th stage. A
snapshot of the 8-bit ripple carry adder Verilog code is given in Figure 2 as a
reference. Since the 16-bit, 32-bit, and 64-bit adders are implemented in the
same way, their Verilog implementations are omitted from this
report to avoid redundancy. The layouts of all four ripple carry adders
are, however, presented in Figures 3 to 6 in the order 8 bit, 16 bit, 32 bit,
and 64 bit.
Figure 2: Gate level Verilog implementation of 8-bit Ripple Carry Adder
Figure 3: Layout of the 8-bit Ripple Carry Adder
Figure 4: Layout of the 16-bit Ripple Carry Adder
Figure 5: Layout of the 32-bit Ripple Carry Adder
Figure 6: Enlarged View of the layout of the 64-bit Ripple Carry Adder
Carry Select Adder
A carry select adder is made up of blocks of ripple carry adders. Each block uses
two ripple carry adders, one with an input carry of 0 and the other with an input
carry of 1, to generate the sum and carry out bits. Depending on the carry out from the
previous stage, the correct sum and carry out bits are chosen using a multiplexer.
The basic structure of a 16-bit carry select adder is shown in Figure 7.
Figure 7: Block Diagram of a 16-bit Carry Select Adder
For the 16-bit carry select adder, 4-bit ripple carry adders were used as basic
blocks. When uniformly sized ripple carry adders are used as basic blocks, the
delay of the carry select adder is minimized for a block size of
√N. Hence, a 4-bit ripple carry adder module was created, and multiple copies of
it were used along with multiplexers to build the 16-bit carry select adder.
Similarly, 8-bit ripple carry adder blocks were used for the 64-bit carry select adder.
For the 32-bit carry select adder, blocks of sizes 5, 5, 5, 5, 5, and 7 were used. Figure 8
shows a snapshot of the 16-bit carry select adder implementation as a
representative example. The others are not presented in this report to avoid
redundancy. Figures 9 to 12 show the post auto place and route layouts of the 8-bit,
16-bit, 32-bit, and 64-bit carry select adders respectively.
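The block organization described above can be sketched as a behavioral model. The function names are hypothetical, and each ripple carry block is abstracted as an integer addition; the point is only to show the two speculative results per block and the multiplexer selection, with the block sizes 4, 4, 4, 4 and 5, 5, 5, 5, 5, 7 taken from the text.

```python
def ripple_block(a, b, cin, width):
    """Stand-in for a width-bit ripple carry block: returns (sum, carry out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, block_sizes, cin=0):
    """Carry select addition over the given block sizes, LSB block first."""
    result, carry, shift = 0, cin, 0
    for width in block_sizes:
        mask = (1 << width) - 1
        a_blk, b_blk = (a >> shift) & mask, (b >> shift) & mask
        # Both candidates are computed without knowing the incoming carry.
        sum0, cout0 = ripple_block(a_blk, b_blk, 0, width)
        sum1, cout1 = ripple_block(a_blk, b_blk, 1, width)
        # Multiplexer: the carry from the previous block picks one candidate.
        blk_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= blk_sum << shift
        shift += width
    return result, carry


print(carry_select_add(0xFFFF, 0x0001, [4, 4, 4, 4]))          # (0, 1)
print(carry_select_add(123456789, 987654321, [5, 5, 5, 5, 5, 7]))
```

In hardware the two candidates of every block are evaluated at the same time, so only the multiplexer chain lies on the path between blocks.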
Figure 8: Structural Verilog implementation of a 16-bit Carry Select Adder (Block size: 4)
Figure 9: Layout of the 8-bit Carry Select Adder
Figure 10: Layout of the 16-bit Carry Select Adder
Figure 11: Layout of the 32-bit Carry Select Adder
Figure 12: Enlarged view layout of the 64-bit Carry Select Adder
Carry Skip Adder
Carry skip adders break the dependency of the carry chain by quickly
deciding if adding k bits (block size k) in ripple carry fashion will propagate a
carry or not to the next block. Whether a block will propagate a carry or not is
determined by figuring out the overall group generate and group propagate for
that block and using the input carry to the block, the input carry to the next block
can be generated. A carry signal entering a certain block can be propagated past
the block without waiting for the signal to propagate through the 4 individual
stages of the block. An example 16 bit carry skip adder with block size 4 is given
in Figure 13. A snapshot of the Verilog code for the 16 bit carry skip adder is
shown in Figure 14. Figures 15 to 18 shows the layouts of 8 bit, 16 bit, 32 bit and
64 bit carry skip adders respectively.
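The skip decision described above can be sketched as a bit-level model. The function name and structure are hypothetical and are not the Verilog of Figure 14. Note that the propagate term is taken as XOR here, which is an assumption of this sketch: with that choice, an all-propagate block contains no generate, so forwarding the block's input carry is always correct.

```python
def carry_skip_add(a, b, n_bits=16, block=4, cin=0):
    """Carry skip addition with uniform blocks (n_bits divisible by block)."""
    result, carry = 0, cin
    for start in range(0, n_bits, block):
        block_cin = carry
        all_propagate = 1
        # Ripple through the block to form the sum bits and the rippled carry.
        for i in range(start, start + block):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p = ai ^ bi                       # propagate (XOR form, assumed)
            result |= (p ^ carry) << i
            carry = (ai & bi) | (p & carry)
            all_propagate &= p
        # Skip multiplexer: if every bit propagates, the block's carry out is
        # simply its carry in, available without waiting for the ripple.
        if all_propagate:
            carry = block_cin
    return result, carry


print(carry_skip_add(0x0FFF, 0x0001))  # (4096, 0): the carry crosses three blocks
```

The model returns the same values as a ripple carry adder, as it must. The benefit of the skip path is in timing only, which a functional model like this one does not capture.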
Figure 13: Block Diagram of a 4-bit Carry Skip Adder
Figure 14: Verilog implementation of a 16 bit Carry Skip adder
Figure 15: Layout of the 8-bit Carry Skip Adder
Figure 16: Layout of the 16-bit Carry Skip Adder
Figure 17: Layout of the 32-bit Carry Skip Adder
Figure 18: Enlarged view of the layout of the 64-bit Carry Skip Adder
Carry Lookahead Adder
Figure 19 below shows the structure of a 4-bit carry lookahead adder. It
consists of 4 modified full adders and a carry lookahead logic block. The carry
lookahead block generates the carry signals c1, c2, and c3 for the 4-bit adder,
as well as the group generate and group propagate signals, as
given by the equations in Figure 20. The carry out of the 4-bit adder is given by the
following equation:
C_out = C_in · P_{0-3} + G_{0-3}
The group generate and propagate signals can be used to cascade the 4-bit
adders in order to get a 16-bit adder. This requires four 4-bit carry lookahead
adders and an additional carry lookahead block, as shown in Figure 21.
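The 4-bit lookahead logic and its cascading into a 16-bit adder can be sketched as follows. This is an illustrative model, not the equations of Figure 20 or the Verilog of Figure 22: the function names are hypothetical, the propagate term is taken as XOR so that it can also form the sum bit (the OR form gives the same carries), and the second-level lookahead is written as a loop although the hardware block would evaluate the same expressions in flattened form.

```python
def cla4(a, b, cin):
    """4-bit carry lookahead adder: returns (sum, cout, group_g, group_p)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]
    # Each carry is a flat function of g, p, and cin (no rippling).
    c = [cin,
         g[0] | (p[0] & cin),
         g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin),
         g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin)]
    group_p = p[0] & p[1] & p[2] & p[3]
    group_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
               | (p[3] & p[2] & p[1] & g[0]))
    s = sum((p[i] ^ c[i]) << i for i in range(4))
    cout = group_g | (group_p & cin)          # C_out = C_in . P_{0-3} + G_{0-3}
    return s, cout, group_g, group_p


def cla16(a, b, cin=0):
    """Four cla4 blocks plus a second-level lookahead over the group signals."""
    nibbles = [((a >> 4 * k) & 0xF, (b >> 4 * k) & 0xF) for k in range(4)]
    # Group generate/propagate do not depend on the carry in.
    group = [cla4(x, y, 0)[2:] for x, y in nibbles]
    carries = [cin]
    for gg, gp in group:
        carries.append(gg | (gp & carries[-1]))
    total = 0
    for k, (x, y) in enumerate(nibbles):
        total |= cla4(x, y, carries[k])[0] << (4 * k)
    return total, carries[4]


print(cla16(0x1234, 0x0FFF))
```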
Figure 19: Block Diagram of a 4-bit Carry Lookahead Adder
Figure 20: Carry Lookahead Logic Equations
Figure 21: Block Diagram of a 16-bit Carry Lookahead Adder
32-bit and 64-bit carry lookahead adders can be made similarly by cascading
lookahead blocks, as done for the 16-bit adder. Figure 22 shows the
structural Verilog implementation of a 32-bit carry lookahead adder. Figures 23
to 26 show the layouts of the 8-bit, 16-bit, 32-bit, and 64-bit carry lookahead adders
respectively.
Figure 22: Verilog implementation of a 32-bit Carry Lookahead Adder
Figure 23: Layout of the 8-bit Carry Lookahead Adder
Figure 24: Layout of the 16-bit Carry Lookahead Adder
Figure 25: Layout of a 32-bit Carry Lookahead Adder
Figure 26: Enlarged view of the layout of the 64-bit Carry Lookahead Adder
Kogge Stone Adder
The generation of the generate and propagate signals was described for carry lookahead adders in the previous section. The idea remains the same and is not repeated here. The Kogge-Stone adder is a parallel prefix adder and uses a prefix operator to implement the carry generation operation. The prefix operator used to build the tree structure implements the following equation:
(g, p) • (g’, p’) = (g + p·g’, p·p’)
As an example, the tree structure used for a 16-bit Kogge-Stone adder is given below in Figure 27. It was extended in the same way to obtain the structures for the 32-bit and 64-bit adders.
At the end of the tree, the group generate and propagate from bit 0 to bit i, i.e. (G_i, P_i), are obtained.
Figure 27. Tree Structure for 16 bit Kogge Stone Adders
The output sum and carry bits are generated using Cin and (G_i, P_i) with the following equations:
C_{-1} = Cin
C_i = G_i + P_i · Cin
S_i = p_i xor C_{i-1}
This is referred to as post processing. Verilog implementation of the 64 bit
Kogge Stone adder is given in Figure 28. Figures 29 to 32 show the layouts of the
8 bit, 16 bit, 32 bit and 64 bit Kogge Stone adders respectively.
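The prefix tree and the post processing step can be sketched as a bit-level model. This is an illustration of the equations above under stated assumptions, not the Verilog of Figure 28: the function name is hypothetical, and n_bits is assumed to be a power of two.

```python
def kogge_stone_add(a, b, n_bits=16, cin=0):
    """Kogge-Stone addition: prefix tree over (g, p), then post processing."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n_bits)]   # bit propagate
    g = [(a >> i) & (b >> i) & 1 for i in range(n_bits)]     # bit generate
    G, P = g[:], p[:]
    dist = 1
    while dist < n_bits:                       # log2(n_bits) stages
        new_G, new_P = G[:], P[:]
        for i in range(dist, n_bits):
            # Prefix operator: (g, p) . (g', p') = (g + p.g', p.p')
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2                              # recursive doubling
    # Post processing: C_i = G_i + P_i.Cin and S_i = p_i xor C_{i-1}
    carry = [G[i] | (P[i] & cin) for i in range(n_bits)]
    result = 0
    for i in range(n_bits):
        c_prev = cin if i == 0 else carry[i - 1]
        result |= (p[i] ^ c_prev) << i
    return result, carry[n_bits - 1]


print(kogge_stone_add(0xFFFF, 0x0001))  # (0, 1)
```

After the stage with distance d, position i holds the group signals for bits i-2d+1 to i (clipped at bit 0), so after the last stage every position spans bits 0 to i. The inner loop over i corresponds to prefix cells that operate in parallel within one stage.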
Figure 28: Verilog implementation of the 16 bit Kogge-Stone Adder
Figure 29: Layout of the 8 bit Kogge-Stone Adder
Figure 30: Layout of the 16 bit Kogge-Stone Adder
Figure 31: Layout of a 32 bit Kogge-Stone Adder
Figure 32: Enlarged view of the layout of the 64 bit Kogge-Stone Adder
III. FUNCTIONAL VERIFICATION
Verilog simulations were performed to verify the functional correctness of the different types of adders. The waveforms for all the 8-bit adders are given in Figure 33 as a representative example.
Figure 33: Waveform for 8 bit adders
IV. TIMING ANALYSIS
After functional verification, the Synopsys PrimeTime tool was used to
measure the pre-APR as well as the post-APR delay of the adders. All delays are
reported in nanoseconds. As an example, the pre-APR timing report for the 32-bit carry
lookahead adder is given in Figure 34. Figure 35 shows an example post-APR
timing report, for the 64-bit Kogge-Stone adder. To generate the
post-APR timing, an input delay of 1 ns was set; this was subtracted from the
reported arrival time to get the actual delay. For example, the 2.05 ns arrival
time in Figure 35 corresponds to an actual delay of 1.05 ns.
Information: Updating design information... (UID-85)
****************************************
Report : timing
        -path full
        -delay max
        -max_paths 1
Design : carry_lookahead_32bit
Version: G-2012.06-SP4
Date   : Mon Apr 11 00:09:51 2016
****************************************
Operating Conditions: typical   Library: gscl45nm
Wire Load Model Mode: top
  Startpoint: a[0] (input port)
  Endpoint: sum[31] (output port)
  Path Group: (none)
  Path Type: max
  Point                               Incr       Path
  -----------------------------------------------------------
  input external delay                0.00       0.00 r
  a[0] (in)                           0.00       0.00 r
  GP0/a (gen_prop_0)                  0.00       0.00 r
  GP0/U2/a (xor_gate2_0)              0.00       0.00 r
  GP0/U2/U10/Y (NAND2X1)              0.01       0.01 f
  GP0/U2/U3/Y (INVX1)                 0.00       0.01 r
  GP0/U2/U4/Y (INVX1)                 0.01       0.02 f
  GP0/U2/U7/Y (NAND2X1)               0.02       0.05 r
  GP0/U2/U5/Y (INVX1)                 0.02       0.07 f
  GP0/U2/U6/Y (INVX1)                 0.01       0.08 r
  GP0/U2/c (xor_gate2_0)              0.00       0.08 r
  GP0/p (gen_prop_0)                  0.00       0.08 r
  LA0/p[0] (lookahead_logic_0)        0.00       0.08 r
  LA0/U41/Y (NAND2X1)                 0.01       0.09 f
  LA0/U11/Y (INVX1)                   0.00       0.10 r
  LA0/U12/Y (INVX1)                   0.01       0.11 f
  LA0/U39/Y (NOR2X1)                  0.03       0.14 r
  LA0/U23/Y (INVX1)                   0.02       0.16 f
  LA0/U24/Y (INVX1)                   0.01       0.17 r
  LA0/P (lookahead_logic_0)           0.00       0.17 r
  LA4/p[0] (lookahead_logic_6)        0.00       0.17 r
  LA4/U41/Y (NAND2X1)                 0.01       0.18 f
  LA4/U13/Y (INVX1)                   0.00       0.18 r
  LA4/U14/Y (INVX1)                   0.01       0.20 f
  LA4/U39/Y (NOR2X1)                  0.03       0.23 r
  LA4/U11/Y (INVX1)                   0.02       0.24 f
  LA4/U12/Y (INVX1)                   0.00       0.24 r
  LA4/P (lookahead_logic_6)           0.00       0.24 r
  CG0/p (carry_gen_0)                 0.00       0.24 r
  CG0/U7/Y (NAND2X1)                  0.01       0.25 f
  CG0/U4/Y (INVX1)                    0.00       0.26 r
  CG0/U5/Y (INVX1)                    0.01       0.27 f
  CG0/U6/Y (NAND2X1)                  0.01       0.28 r
  CG0/U2/Y (INVX1)                    0.02       0.30 f
  CG0/U3/Y (INVX1)                    0.04       0.34 r
  CG0/cout (carry_gen_0)              0.00       0.34 r
  LA9/cin (lookahead_logic_1)         0.00       0.34 r
  LA9/U31/Y (NAND2X1)                 0.02       0.36 f
  LA9/U30/Y (INVX1)                   0.00       0.36 r
  LA9/U29/Y (NAND2X1)                 0.01       0.37 f
  LA9/U7/Y (INVX1)                    0.00       0.37 r
  LA9/U8/Y (INVX1)                    0.01       0.39 f
  LA9/U28/Y (NAND2X1)                 0.01       0.40 r
  LA9/U5/Y (INVX1)                    0.02       0.42 f
  LA9/U6/Y (INVX1)                    0.00       0.42 r
  LA9/U27/Y (NAND2X1)                 0.01       0.43 f
  LA9/U3/Y (INVX1)                    0.00       0.43 r
  LA9/U4/Y (INVX1)                    0.01       0.44 f
  LA9/U26/Y (NAND2X1)                 0.01       0.45 r
  LA9/U1/Y (INVX1)                    0.02       0.48 f
  LA9/U2/Y (INVX1)                    0.00       0.47 r
  LA9/U25/Y (NAND2X1)                 0.01       0.49 f
  LA9/U19/Y (INVX1)                   0.00       0.49 r
  LA9/U20/Y (INVX1)                   0.02       0.51 f
  LA9/cout[2] (lookahead_logic_1)     0.00       0.51 f
  LA8/cin (lookahead_logic_2)         0.00       0.51 f
  LA8/U31/Y (NAND2X1)                 0.02       0.53 r
  LA8/U30/Y (INVX1)                   0.02       0.55 f
  LA8/U29/Y (NAND2X1)                 0.02       0.58 r
  LA8/U7/Y (INVX1)                    0.02       0.60 f
  LA8/U8/Y (INVX1)                    0.00       0.60 r
  LA8/U28/Y (NAND2X1)                 0.01       0.61 f
  LA8/U5/Y (INVX1)                    0.00       0.61 r
  LA8/U6/Y (INVX1)                    0.01       0.62 f
  LA8/U27/Y (NAND2X1)                 0.01       0.63 r
  LA8/U3/Y (INVX1)                    0.02       0.65 f
  LA8/U4/Y (INVX1)                    0.00       0.65 r
  LA8/U26/Y (NAND2X1)                 0.01       0.67 f
  LA8/U1/Y (INVX1)                    0.00       0.67 r
  LA8/U2/Y (INVX1)                    0.01       0.68 f
  LA8/U25/Y (NAND2X1)                 0.01       0.69 r
  LA8/U17/Y (INVX1)                   0.02       0.71 f
  LA8/U18/Y (INVX1)                   0.00       0.72 r
  LA8/cout[2] (lookahead_logic_2)     0.00       0.72 r
  FA32/cin (full_adder_mod_1)         0.00       0.72 r
  FA32/U2/b (xor_gate2_1)             0.00       0.72 r
  FA32/U2/U9/Y (INVX1)                0.01       0.73 f
  FA32/U2/U8/Y (NAND2X1)              0.01       0.74 r
  FA32/U2/U3/Y (INVX1)                0.02       0.76 f
  FA32/U2/U4/Y (INVX1)                0.00       0.76 r
  FA32/U2/U5/Y (NAND2X1)              0.01       0.77 f
  FA32/U2/c (xor_gate2_1)             0.00       0.77 f
  FA32/sum (full_adder_mod_1)         0.00       0.77 f
  sum[31] (out)                       0.00       0.77 f
  data arrival time                              0.77
  -----------------------------------------------------------
  (Path is unconstrained)
Figure 34: Pre APR Timing report for 32 bit Carry Lookahead Adder
****************************************
Report : timing
        -path_type full
        -delay_type max
        -max_paths 1
Design : kogge_stone_adder_64bit
Version: H-2012.12
Date   : Tue Apr 12 14:00:51 2016
****************************************
  Startpoint: b[0] (input port clocked by vclk)
  Endpoint: sum[63] (output port)
  Path Group: (none)
  Path Type: max
  Point                               Incr         Path
  ---------------------------------------------------------------
  input external delay                1.00         1.00 f
  b[0] (in)                           0.00 &       1.00 f
  GP0/b (gen_prop_0)                  0.00 &       1.00 f
  GP0/U2/b (xor_gate2_64)             0.00 &       1.00 f
  GP0/U2/U11/Y (INVX1)               -0.00 &       1.00 r
  GP0/U2/U10/Y (NAND2X1)              0.01 &       1.01 f
  GP0/U2/U3/Y (INVX1)                 0.01 &       1.02 r
  GP0/U2/U4/Y (INVX1)                 0.01 &       1.03 f
  GP0/U2/U7/Y (NAND2X1)               0.03 &       1.06 r
  GP0/U2/U5/Y (INVX1)                 0.02 &       1.08 f
  GP0/U2/U6/Y (INVX1)                 0.03 &       1.11 r
  GP0/U2/c (xor_gate2_64)             0.00 &       1.11 r
  GP0/p (gen_prop_0)                  0.00 &       1.11 r
  CG0/p (carry_gen_0)                 0.00 &       1.11 r
  CG0/U7/Y (NAND2X1)                  0.01 &       1.12 f
  CG0/U4/Y (INVX1)                    0.00 &       1.13 r
  CG0/U5/Y (INVX1)                    0.01 &       1.14 f
  CG0/U6/Y (NAND2X1)                  0.01 &       1.15 r
  CG0/U2/Y (INVX1)                    0.02 &       1.17 f
  CG0/U3/Y (INVX1)                    0.06 &       1.24 r
  CG0/cout (carry_gen_0)              0.00 &       1.24 r
  CG3/c (carry_gen_62)                0.00 &       1.24 r
  CG3/U7/Y (NAND2X1)                  0.03 &       1.27 f
  CG3/U4/Y (INVX1)                    0.00 &       1.27 r
  CG3/U5/Y (INVX1)                    0.01 &       1.28 f
  CG3/U6/Y (NAND2X1)                  0.01 &       1.30 r
  CG3/U2/Y (INVX1)                    0.02 &       1.32 f
  CG3/U3/Y (INVX1)                    0.06 &       1.38 r
  CG3/cout (carry_gen_62)             0.00 &       1.38 r
  CG7/c (carry_gen_58)                0.00 &       1.38 r
  CG7/U7/Y (NAND2X1)                  0.03 &       1.41 f
  CG7/U2/Y (INVX1)                    0.00 &       1.42 r
  CG7/U3/Y (INVX1)                    0.02 &       1.43 f
  CG7/U6/Y (NAND2X1)                  0.02 &       1.45 r
  CG7/U4/Y (INVX1)                    0.02 &       1.47 f
  CG7/U5/Y (INVX1)                    0.09 &       1.56 r
  CG7/cout (carry_gen_58)             0.00 &       1.56 r
  CG15/c (carry_gen_50)               0.00 &       1.56 r
  CG15/U7/Y (NAND2X1)                 0.04 &       1.61 f
  CG15/U2/Y (INVX1)                  -0.00 &       1.60 r
  CG15/U3/Y (INVX1)                   0.01 &       1.62 f
  CG15/U6/Y (NAND2X1)                 0.01 &       1.63 r
  CG15/U4/Y (INVX1)                   0.02 &       1.65 f
  CG15/U5/Y (INVX1)                   0.08 &       1.73 r
  CG15/cout (carry_gen_50)            0.00 &       1.73 r
  CG31/c (carry_gen_34)               0.00 &       1.73 r
  CG31/U7/Y (NAND2X1)                 0.04 &       1.77 f
  CG31/U2/Y (INVX1)                  -0.00 &       1.77 r
  CG31/U3/Y (INVX1)                   0.02 &       1.79 f
  CG31/U6/Y (NAND2X1)                 0.01 &       1.80 r
  CG31/U4/Y (INVX1)                   0.02 &       1.82 f
  CG31/U5/Y (INVX1)                   0.02 &       1.84 r
  CG31/cout (carry_gen_34)            0.00 &       1.84 r
  CG63/c (carry_gen_2)                0.00 &       1.84 r
  CG63/U7/Y (NAND2X1)                 0.01 &       1.86 f
  CG63/U2/Y (INVX1)                   0.01 &       1.86 r
  CG63/U3/Y (INVX1)                   0.02 &       1.88 f
  CG63/U6/Y (NAND2X1)                 0.04 &       1.92 r
  CG63/U4/Y (INVX1)                   0.04 &       1.96 f
  CG63/U5/Y (INVX1)                   0.02 &       1.98 r
  CG63/cout (carry_gen_2)             0.00 &       1.98 r
  XOR63/b (xor_gate2_65)              0.00 &       1.98 r
  XOR63/U9/Y (INVX1)                  0.01 &       1.99 f
  XOR63/U8/Y (NAND2X1)                0.02 &       2.01 r
  XOR63/U3/Y (INVX1)                  0.02 &       2.03 f
  XOR63/U4/Y (INVX1)                  0.01 &       2.04 r
  XOR63/U5/Y (NAND2X1)                0.01 &       2.05 f
  XOR63/c (xor_gate2_65)              0.00 &       2.05 f
  sum[63] (out)                       0.00 &       2.05 f
  data arrival time                                2.05
  ---------------------------------------------------------------
  (Path is unconstrained)
Figure 35: Post APR Timing report for 64 bit Kogge Stone Adder
V. RESULTS
The tables and graphs comparing the pre-APR and post-APR delays are given
below. For comparison, the areas and delays of behavioral adders are also
included in the tables. Among the structural adders, ripple carry adders have
the worst delay at 32 and 64 bits. Carry select and carry lookahead adders offer
significant improvement over the ripple carry adder. Of the two, the carry select
adder is slower at 16 bits and above, since it is made up of blocks of ripple
carry adders and the carry has to propagate sequentially within a block. The
carry skip adder is slower than the ripple carry adder for 8 and 16 bits but
faster for 32 and 64 bits. This is because the carry skip adder consists of 4-bit
ripple carry adders with carry skip logic (which generates the group generate
and propagate) in between them. For 8 and 16 bits, the delay of the carry skip
logic is larger than that of the 4-bit ripple carry adders, so these adders are
slower than the 8-bit and 16-bit ripple carry adders. The Kogge-Stone adder
offers the minimum delay, as its delay is proportional to log N, where N is the
number of bits in the operand. Figures 36 and 37 show the comparison of pre-APR
and post-APR delays.
Table 1: Pre APR Delay (in ns)
Adder              8 Bit    16 Bit   32 Bit   64 Bit
Behavioral         0.86     1.73     3.41     6.84
Ripple Carry       0.67     1.34     2.67     5.35
Carry Select       0.43     0.65     1.04     1.8
Carry Skip         0.79     1.59     2.51     4.39
Carry Lookahead    0.49     0.58     0.77     0.89
Kogge-Stone        0.36     0.45     0.53     0.63
Figure 36: Pre APR Delay (in ns)
Table 2: Post APR Delay (in ns)
Adder              8 Bit    16 Bit   32 Bit   64 Bit
Behavioral         0.94     1.88     3.82     7.54
Ripple Carry       0.8      1.68     3.33     6.59
Carry Select       0.57     1.05     1.68     3.53
Carry Skip         1.01     2.04     3.28     5.57
Carry Lookahead    0.64     0.72     1.09     1.18
Kogge-Stone        0.45     0.64     0.88     1.05
Figure 37: Post APR Delay (in ns)
[Figures 36 and 37: bar charts titled “Pre APR Delay” and “Post APR Delay”; vertical axis: Delay (ns); horizontal axis: Behav, Ripple Carry, Carry Select, Carry Skip, Carry Lookahead, Kogge Stone; series: 8 Bit, 16 Bit, 32 Bit, 64 Bit]
The tables and graphs comparing the pre-APR areas (proportional to gate count)
and post-APR areas are given below. It is evident from these that ripple carry
adders are the simplest to implement: they do not have any fanout issues and
are regular in structure. In contrast, carry select adders need a large number of
gates, since each block has two ripple carry adders performing the addition and
a multiplexer to choose the correct output. The gate count of the carry skip adder
is a little higher than that of the ripple carry adder, since it has carry skip
logic (giving the group generate and propagate) for every 4 bits of ripple carry
adder. The carry lookahead adder has even more gates at 16 bits and above,
because it has multiple levels of lookahead logic (generating the group generate
and propagate). The Kogge-Stone adder has high complexity since it has a large
number of prefix operations. Figures 38 and 39 show the comparison of pre-APR
and post-APR areas.
Table 3: Pre APR Area (in um^2)
Adder              8 Bit    16 Bit   32 Bit   64 Bit
Behavioral         370      742      1485     2972
Ripple Carry       387.6    778.1    1559     3120.8
Carry Select       739.1    1607.4   3410.9   6934.8
Carry Skip         573.5    1149.8   2263     4409.5
Carry Lookahead    566.9    1173.7   2350.3   4740.4
Kogge-Stone        520.9    1210.8   2801.7   6420
Figure 38: Pre APR Area (in um2)
Table 4: Post APR Area (in um^2)
Adder              8 Bit    16 Bit   32 Bit   64 Bit
Behavioral         3906     5184     7395     10920
Ripple Carry       8621.1   13104    21158    35800
Carry Select       15912    22032    43460    78960
Carry Skip         10948    17160    28236    47306
Carry Lookahead    10856    17420    29064    57500
Kogge-Stone        10304    17940    33300    75051
Figure 39: Post APR area (in um2)
[Figures 38 and 39: bar charts titled “Pre APR Area” and “Post APR Area”; vertical axis: Area (um^2); series: 8 Bit, 16 Bit, 32 Bit, 64 Bit]
As a figure of merit, the post place and route area-delay product is
computed, and the results are presented in Figure 40. At 64 bits the value is
highest for the carry select adder, owing to its area overhead. For the
lower-order adders it is lowest for the behavioral adder, due to the extra effort
the tool puts into optimizing the area. However, in the 64-bit case the
Kogge-Stone adder beats the behavioral adder, because of the cost the behavioral
adder pays in delay to get a more optimal area.
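The figure of merit can be recomputed from the post-APR values in Tables 2 and 4. The sketch below assumes the product is simply area times delay for each adder and width; the variable and function names are illustrative, and the data are copied from the two tables.

```python
WIDTHS = (8, 16, 32, 64)

# Post-APR delay in ns (Table 2), one value per width in WIDTHS.
post_apr_delay_ns = {
    "Behavioral":      (0.94, 1.88, 3.82, 7.54),
    "Ripple Carry":    (0.8, 1.68, 3.33, 6.59),
    "Carry Select":    (0.57, 1.05, 1.68, 3.53),
    "Carry Skip":      (1.01, 2.04, 3.28, 5.57),
    "Carry Lookahead": (0.64, 0.72, 1.09, 1.18),
    "Kogge-Stone":     (0.45, 0.64, 0.88, 1.05),
}

# Post-APR area in um^2 (Table 4), one value per width in WIDTHS.
post_apr_area_um2 = {
    "Behavioral":      (3906, 5184, 7395, 10920),
    "Ripple Carry":    (8621.1, 13104, 21158, 35800),
    "Carry Select":    (15912, 22032, 43460, 78960),
    "Carry Skip":      (10948, 17160, 28236, 47306),
    "Carry Lookahead": (10856, 17420, 29064, 57500),
    "Kogge-Stone":     (10304, 17940, 33300, 75051),
}


def area_delay_product(adder):
    """Area-delay product in um^2.ns for each width, rounded to one decimal."""
    areas = post_apr_area_um2[adder]
    delays = post_apr_delay_ns[adder]
    return [round(a * d, 1) for a, d in zip(areas, delays)]


adp = {name: area_delay_product(name) for name in post_apr_delay_ns}
for name, values in adp.items():
    print(f"{name:16s}", values)
```

With these table values, the 64-bit Kogge-Stone product comes out below the 64-bit behavioral product, and the behavioral adder has the lowest product at 8 bits, in line with the discussion above.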
Figure 40: Post APR area-delay product (in um2ns)
[Figure 40: bar chart titled “Area Delay Product”; horizontal axis: Behav, Ripple Carry, Carry Select, Carry Skip, Carry Lookahead, Kogge Stone; series: 8 Bit, 16 Bit, 32 Bit, 64 Bit]
VI. CONCLUSION
Among all the adders studied, the ripple carry adder was found to have the
minimum area and, at 32 and 64 bits, the maximum delay. This met our expectation,
since the ripple carry adder is the simplest form of adder, without any
optimizations for performance or hardware complexity. The delay of the carry
select adder and the carry lookahead adder is considerably less than that of the
ripple carry adder. However, the hardware complexity of the carry select adder is
quite high, since it uses two blocks of ripple carry adders to perform the
addition and a multiplexer to choose the correct sum and carry. The carry skip
adder is slower than the ripple carry adder for 8 and 16 bits but faster for 32
and 64 bits. The Kogge-Stone adder was found to have the least delay. This came
at the cost of hardware complexity, which is consistent with the Kogge-Stone
adder having the largest number of parallel prefix operations and thereby
consuming a large area.
VII. REFERENCES
• P. M. Kogge and H. S. Stone, “A Parallel Algorithm for the Efficient Solution
of a General Class of Recurrence Equations,” IEEE Transactions on
Computers, vol. C-22, no. 8, pp. 783-791, August 1973.
• O. J. Bedrij, “Carry-Select Adder,” IRE Transactions on Electronic
Computers, vol. EC-11, pp. 340-346, 1962.
• B. Gilchrist, J. H. Pomerene, and S. Y. Wong, “Fast Carry Logic for Digital
Computers,” IRE Transactions on Electronic Computers, vol. EC-4, pp. 133-136,
December 1955.
• O. L. MacSorley, “High-Speed Arithmetic in Binary Computers,” Proc. IRE,
vol. 49, pp. 67-91, 1961.
• A. Weinberger and J. L. Smith, “A Logic for High-Speed Addition,” National
Bureau of Standards Circular 591, pp. 3-12, 1958.
• N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems
Perspective, 4th ed., 2011.