Docu

High speed Modified Booth Encoder multiplier for signed and unsigned numbers

1. INTRODUCTION

In digital computing systems multiplication is an arithmetic operation. The multiplication

operation consists of producing partial products and then adding these partial products the

final product is obtained. Thus the speed of the multiplier depends on the number of partial

product and the speed of the adder. Since the multipliers have a significant impact on the

performance of the entire system, many high performance algorithms and architectures have

been proposed. The very high speed and dedicated multipliers are used in pipeline and vector

computers. The high speed Booth multipliers and pipelined Booth multipliers are used for

digital signal processing (DSP) applications such as for multimedia and communication

systems . High speed DSP computation applications such as Fast Fourier transform (FFT)

require additions and multiplications.

Multiplication is a basic arithmetic operation that is important in microprocessors,

digital signal processing, and other modern electronic machines. VLSI Designers have

recognized this importance by dedicating significant resources and area to integer and

floating point multipliers. As a result, it is desirable to reduce the cost of these multipliers by

using efficient algorithms that do not compromise performance

Dept of ECE,M.Tech(VLSI),SITAMS Page 1


2. BRIEF LITERATURE REVIEW

. This report describes an extension to signed and unsigned modified booth’s

encoder(SUMBE) for binary multiplication that reduces the cost of such high performance

multipliers.

To explain this scheme, background material on existing algorithms is presented.

These algorithms are then extended to produce the new method. Since the method is

somewhat complex, this report will attempt to stay away from implementation details, but

instead concentrate on the algorithms in a hardware independent manner.

Multiplication is more complicated than addition, being implemented by shifting as well as

addition.

Multiplication: Partial products generation + accumulation

Because of the partial products involved in most multiplication algorithms, more

time and more circuit area is required to compute, allocate, and sum the partial products to

obtain the multiplication result.

In order that the basic algorithms are not obscured with small details, unsigned

multiplication only will be considered here, but the algorithms presented are easily

generalized to deal with signed numbers.

Multiplication consists of three major steps:

1) recoding and generating partial products;

2) reducing the partial products by partial product reduction schemes (e.g., Wallace

tree to two rows;

3) adding the remaining two rows of partial products by using a carry-propagate

adder (e.g., carry look ahead adder) to obtain the final product. There are already

many techniques developed in the past years for these three steps to improve the

performance of multipliers.



Fig 1:Types of digital multiplier

Types of high-speed multipliers:

Sequential multiplier - generates partial products sequentially and adds each

newly generated product to previously accumulated partial product

Parallel multiplier - generates partial products in parallel, accumulates

using a fast multi-operand adder

2.1TREE MULTIPLIERS

2.1.1 WALLACE TREE- Reduce the number of operands at the earliest opportunity.

If there are m dots in a column, apply m/3 full adders to that column. Tends to

minimize overall delay by making the final CPA as short as possible.

2.1.2 DADDA TREE -Reduce the number of operands in the tree to the next lower

n(h) number in the table using the fewest FA’s and HA’s possible. Reduces the

hardware cost without increasing the number of levels in the tree.



2.2UNSIGNED MULTIPLIERS

2.2.1 ARRAY MULTIPLIER

Array multiplier is well known due to its regular structure. Multiplier circuit is based

on add and shift algorithm. Each partial product is generated by the multiplication of the

multiplicand with one multiplier bit. The partial product are shifted according to their bit

orders and then added. The addition can be performed with normal carry propagate adder. N-1

adders are required where N is the multiplier length.

o No separate circuits for generation and accumulation

o Reduced execution time but increased hardware complexity

Fig 2: 4x4 Array Multiplier

2.2.2 BRAN MULTIPLIER

It is a simple parallel multiplier generally called as carry save array multiplier. It

has been restricted to perform signed bits. The structure consists of array of AND gates and

adders arranged in the iterative manner and no need of logic registers. This can be called as

non –addictive multipliers. Architecture: An n*n bit Braun multiplier is constructed with n

(n-1) adders and n^2 AND gates as shown in the fig., where,

The internal structure of the full adder can be realized using FPGA. Each products

can be generated in parallel with the AND gates. Each partial product can be added with

the sum of partial product which has previously produced by using the row of adders. The



carry out will be shifted one bit to the left or right and then it will be added to the sum

which is generated by the first adder and the newly generated partial product.

Fig 3:Braun Array

The shifting would carry out with the help of Carry Save Adder (CSA) and the Ripple

carry adder should be used for the final stage of the output. Braun multiplier performs well

for the unsigned operands that are less than 16 bits in terms of speed, power and area. But

it is simple structure when compared to the other multipliers. The main drawback of this

multiplier is that the potential susceptibility of Glitching problem due to the Ripple Carry

Adder in the last stage. The delay depends on the delay of the Full Adder and also a final

adder in the last. The power and area can also be reduced by using two bypassing

techniques called Row bypassing technique and Column bypassing technique

2.3 SIGNED MULTIPLIERS

2.3.1 BAUGH WOOLEY ALGORITHM

The Baugh-Wooley algorithm is a different scheme for signed

multiplication, but is not so widely adopted because it may be complicated to deploy on

irregular reduction trees. We use the Baugh-Wooley algorithm in our High Performance

Multiplier (HPM) tree, which combines a regular layout with a logarithmic logic depth.

The Baugh-Wooley (BW) algorithm [8] is a relatively straightforward way of doing signed

multiplications; Fig. 5 illustrates the algorithm for an 8-bit case, where the partial-product



bits have been reorganized according to Hatamian’s scheme [9]. The creation of the

reorganized partial-product array comprises three steps:

iii) The most significant bit (MSB) of the first N-1 partial-product rows and all bits of

the last partial-product row, except its MSB, are inverted.

ii) A ’1’ is added to the Nth column.

iii) The MSB of the final result is inverted.

Fig 4: Illustration of an 8-bit Baugh-Wooley multiplication

2.3.2 BOOTHS ALGORITHM

The Booth recoding algorithm is the most frequently used method to generate partial

products . This algorithm allows for the reduction of the number of partial products to be

compressed in a carry-save adder tree. Thus the compression speed can be enhanced. This

Booth–Mac Sorley algorithm is simply called the Booth algorithm, and the two-bit

recoding using this algorithm scans a triplet of bits to reduce the number of partial products

by roughly one half.

The 2-bit recoding means that the multiplier B is divided into groups of two bits, and

the algorithm is applied to this group of divided bits. The Booth algorithm is implemented

into two steps: Booth encoding and Booth selecting. The both encoding step is to generate

one of the five values from the adjacent three bits. The booth selector generates a partial

product bit by utilizing the output signals.



Table 1:Booth’s encoding.

Disadvantages:

1. Sign-extension for negative multiplicands not applicable for negative multipliers

2. long sequences of 1’s in the multiplier large number of summands

Solution for problem 1.: in case of a negative multiplier, negate both operands by applying

the 2’s complement

Solution for problem 2.:analyze groups of 1’s in the multiplier and replace them by a

shorter and more efficient representation (→Booth algorithm)

2.3.3 MODIFIED BOOTH ALGORITHEM

The original Booth algorithm for the multiplication s = x*y recodes partial

products by considering two bits at a time of one of the operands (x) and encoding them

into (-2,-1,0,1,2). Each such encoded higher-radix number is subsequently multiplied with

the second operand (y), yielding one row of recoded partial-product bits. The advantage of

Booth recoding is that the number of recoded partial products is fewer than the number of

un-recoded partial products, and this can be translated into higher performance in the

circuitry, which reduces all partial-product bits into the final product.

The drawback of the original Booth algorithm is that the number of recoded partial

products depends on input operand x, which makes this algorithm unsuitable for

implementation in hardware. The modified-Booth (MB) algorithm by MacSorley remedies

this by looking at three bits at a time of operand x.



Table 2:Modified booth’s encoding

Then we are guaranteed that only half the number of partial products will be

generated, compared to a conventional partial-product generation using 2-input AND

gates. Since it has a fixed number of partial products, the MB algorithm is suitable for

hardware implementation. An MB multiplier works internally with a two’s complement

representation of the partial products, in order to be able to multiply the encoded (-1,-2)

with the y operand. To avoid having to sign extend the partial products, we are using the

scheme presented by Fadavi-Ardekani In the two’s complement representation, a change of

sign includes the insertion of a ’1’ at the least significant bit (LSB) position (henceforth

called LSB insertion). To avoid an irregular implementation of the partial-product

reduction circuitry, we draw on the idea called modified partial-product array .

Here, the impact of LSB insertion on the two least significant bits positions of the

partial product is pre-computed. The pre-computation redefines the LSB of the partial

product (Eq. 1) and moves the potential ’1’, which results from the LSB insertion, to the

second least significant position (Eq. 2). Note that our Eq. 2 is different from the

corresponding equation used in illustrates an 8-bit MB multiplication using sign extension

prevention and the modified partial-product array scheme.

Fig 5: Illustration of an 8-bit modified-Booth multiplication.



3. PROBLEM DEFINITION

The papers presents a design methodology for high speed Booth encoded parallel

multiplier. For partial product generation, a new Modified Booth encoding (MBE) scheme is

used to improve the performance of traditional MBE schemes. But this multiplier is only for

signed number multiplication operation

The conventional modified Booth encoding (MBE) generates an irregular partial

product array because of the extra partial product bit at the least significant bit position of

each partial product row. Therefore papers presents a simple approach to generate a regular

partial product array with fewer partial product rows and negligible overhead, thereby

lowering the complexity of partial product reduction and reducing the area, delay, and power

of MBE multipliers. But the drawback of this multiplier is that it functions only for signed

number operands.

The modified-Booth algorithm is extensively used for high-speed multiplier

circuits. Once, when array multipliers were used, the reduced number of generated partial

products significantly improved multiplier performance. In designs based on reduction trees

with logarithmic logic depth, however, the reduced number of partial products has a limited

impact on overall performance. The Baugh-Wooley algorithm is a different scheme for

signed multiplication, but is not so widely adopted because it may be complicated to deploy

on irregular reduction trees. Again the Baugh-Wooley algorithm is for only signed number

multiplication.

The array multipliers and Braun array multipliers operates only on the unsigned

numbers. Thus, the requirement of the modern computer system is a dedicated and very high

speed multiplier unit that can perform multiplication operation on signed as well as unsigned

numbers. In this paper we designed and implemented a dedicated multiplier unit that can

perform multiplication operation on both signed and unsigned numbers, and this multiplier is

called as SUMBE multiplier.



4. METHODOLOGY

The new MBE recoder was designed according to the following analysis. Table 1 presents

the truth table of the new encoding scheme. The Z signal makes the output zero to

compensate the incorrect X2_b and Neg signals presents the circuit diagram of the encoder

and decoder. The encoder generates X1_b, X2_b, and Z signals by encoding the three x-

signals. The yLSB signal is the LSB of the y signal and is combined with x-signals to

determine the Row_LSB and the Neg_cin signals. Similarly, yMSB is combined with x-

signals to determine the sign extension signals. shows an overview of the partial product

array for an 8 × 8 multiplier. The sign extension circuitry developed . The conventional MBE

partial product array has two drawbacks:

1) an additional partial product term at the (n-2)th bit position;

2) poor performance at the LSB-part.

To remedy the two drawbacks, the LSB part of the partial product array is

modified. Referring to Fig. 2a, the Row_LSB (gray circle) and the Neg_cin terms are

combined and further simplified using Boolean minimization. The new equations for the

Row_LSB and Neg_cin can be written as , respectively.

Table 3: Truth Table of MBE Scheme.



Fig 6: MBE encoder Fig 7:MBE decoder

Fig 8: 8× 8 MBE partial product array. (a) Traditional MBE partialproductarray. (b) New MBE partial product array.



The Fig. 8(a) has widely been adopted in parallel multipliers since it can reduce the

number of partial product rows to be added by half, thus reducing the size and

enhancing the speed of the reduction tree. However, as shown in Fig.7,8, the conventional

MBE algorithm generates n/2 + 1 partial product rows rather than n/2 due to

the extra partial product bit (neg bit) at the least significant bit position of each partial

product row for negative encoding, leading to an irregular partial product array and a

complex reduction tree. Therefore, the Modified Booth multipliers with a regular partial

product array produce a very regular partial product array. No only each negi is shifted left

and replaced by ci but also the last neg bit is removed. This approach reduces the partial

product rows from n/2 + 1 to n/2 by incorporating the last neg bit into the sign extension

bits of the first partial product row, and almost no overhead is introduced to the partial

product generator. More regular partial product array and fewer partial product rows result

in a small and fast reduction tree, so that the area, delay, and power of MBE multipliers

can further be reduced.



5. SYSTEM REQUIREMENTS

i) Hardware requirements : Spartan FPGA Embedded kit

ii) Software requirements : Xililnx



6. REFERENCES

[1] W. –C. Yeh and C. –W. Jen, “High Speed Booth encoded Parallel Multiplier

Design,” IEEE transactions on computers, vol. 49, no. 7, pp. 692-701, July 2000.

[2] Shiann-Rong Kuang, Jiun-Ping Wang, and Cang-Yuan Guo, “Modified Booth

multipliers with a Regular Partial Product Array,” IEEE Transactions on circuits and

systems-II, vol 56, No 5, May 2009.

[3] Li-Rong Wang, Shyh-Jye Jou and Chung-Len Lee, “A well-tructured Modified

Booth Multiplier Design” 978-1-4244-1617-2/08/$25.00 ©2008 IEEE.

[4] Soojin Kim and Kyeongsoon Cho “Design of High-speed Modified Booth

Multipliers Operating at GHz Ranges” World Academy of Science, Engineering and

Technology2010.

[5] Magnus Sjalander and Per Larson-Edefors. “The Case for HPM-Based Baugh-

Wooley Multipliers,” Chalmers University of Technology, Sweden, March 2008.

[6] Z Haung and M D Ercegovac, “High performance Low Power left to right array

multiplier design” IEEE trans.Computer, vol 54 no3, page 272-283 Mar 2005.

[7] Hsing-Chung Liang and Pao-Hsin Huang, “Testing Transition Delay Faults in

Modified BoothMultipliers by Using C-testable and SIC Patterns”IEEE2007, 1-4244-

1272-2/07.

[8] Aswathy Sudhakar, and D. Gokila, “Run-Time Reconfigurable Pipelined

Modified Baugh-Wooley Multipliers,” Advances in Computational Sciences and

Technology ISSN 0973-6107 Volume 3 Number 2 (2010) pp. 223–235.

[9] Myoung-Cheol Shin, Se-Hyeon Kang, and In-Cheol Park, “An Area-Efficient

Iterative Modified-Booth Multiplier Based on Self-Timed Clocking,” Industry, and Energy

through the project System IC 2010,and by IC Design Education Center (IDEC).

[10] Leandro Z. Pieper, Eduardo A. C. da Costa, Sérgio J. M. de Almeida,

“Efficient Dedicated Multiplication Blocks for2´s Complement Radix-2m Array

Multipliers,” JOURNAL OF COMPUTERS, VOL. 5, NO. 10, OCTOBER 2010.


Documents

Docu