International Journal of Communications and Engineering, Volume 01, No. 1, Issue 01, March 2012

Page 102

DESIGN AND IMPLEMENTATION OF LOW

POWER AND AREA EFFICIENT ADDER AND

VEDIC MULTIPLIER FOR FFT

SILAMBARASAN. C.A.

PG Scholar

Sree Saatha Institute of Engineering and

technology, Chennai

L.VANITHA

Assistant Professor,

Sree Saatha Institute of Engineering and

technology, Chennai

ABSTRACT

The ever-increasing demand for processors that can handle complex and challenging workloads has led to the integration of several processor cores on one chip. The load on the main processor can be reduced by supplementing it with a co-processor. The Fast Fourier Transform (FFT) is a computationally intensive digital signal processing (DSP) function that is widely used in applications, and its speed depends greatly on the multiplier and the adder. Vedic Mathematics is an ancient system of mathematics with a unique technique of calculation based on 16 sutras; it is used here to design a multiplier. The carry select adder (CSLA) is one of the fastest adders used to perform arithmetic functions. The proposed design has reduced area and power compared with regular adders and multipliers. This work evaluates the performance of the proposed designs in terms of delay, area and power.

Keywords: FFT, CSLA, Vedic multiplier, low power, area efficient.

INTRODUCTION

The increase in the popularity of portable systems, together with the rapid growth of power density in integrated circuits, has made power dissipation one of the most important design objectives, followed by area and performance. In this work the carry select adder and the Vedic multiplier are therefore modified to reduce these factors. The Fast Fourier Transform (FFT) is a computationally intensive digital signal processing (DSP) function widely used in applications such as imaging, software-defined radio, wireless communication, instrumentation and machine inspection. The choice of FFT size is dictated by different operating standards, so it is desirable to make the FFT size changeable according to the operating environment. A successful design must support the different operating modes required by diverse applications while keeping power consumption low.

Based on the idea of sharing the two adders used in the Carry Select Adder (CSA), a new design of a low-power, high-performance adder is presented. The new adder is faster than a Ripple Carry Adder (RCA) but slower than a CSA. On the other hand, its area and power dissipation are smaller than those of a CSA.

In a typical processor, multiplication is one of the basic arithmetic operations, and it requires substantially more hardware resources and processing time than addition and subtraction. In fact, 8.72% of all instructions in a typical processing unit are multiplications. A typical central processing unit devotes a considerable amount of processing time to arithmetic operations, particularly multiplication. In this paper, a comparative study of different multipliers is carried out with respect to low power and high speed. The paper describes the Urdhva Tiryakbhyam algorithm of ancient Indian Vedic Mathematics, which is used for multiplication to improve the speed and area of multipliers. Vedic Mathematics also offers further formulae for multiplication, such as the Nikhilam Sutra, which can increase the speed of a multiplier by reducing the number of iterations.

Data sets keep growing, and with them the need for low-power adders. Traditional serial adders are no longer suitable for large word lengths because of their large area and high power. Every system trades off speed against power. The computation time of the array multiplier is comparatively low because the partial products are calculated independently in parallel. The delay of the array multiplier is the time taken by the signals to propagate through the gates that form the multiplication array. Large Booth arrays are required for high-speed multiplication and exponential operations, which in turn require large partial-sum and partial-carry registers. In this paper the carry select adder is designed for 128 bits and the Vedic multiplier for 32 bits.

I. CARRY SELECT ADDER

The design of area- and power-efficient high-speed data path logic is one of the most substantial areas of research in VLSI system design. In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder. The sum for each bit position in an elementary adder is generated sequentially, only after the previous bit position has been summed and a carry propagated into the next position. The CSLA is used in many computational systems to alleviate the problem of carry propagation delay by independently generating multiple carries and then selecting a carry to generate the sum. However, the CSLA is not area efficient because it uses multiple pairs of ripple carry adders (RCA) to generate the partial sum and carry for carry inputs Cin = 0 and Cin = 1; the final sum and carry are then selected by multiplexers (mux). The basic idea of this work is to use a binary to excess-1 converter (BEC) instead of the RCA with Cin = 1 in the regular CSLA to achieve lower area and power consumption. The main advantage of the BEC logic is that it needs fewer logic gates than the n-bit full adder (FA) structure.
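The regular carry-select group described above can be sketched as a small bit-level model. This is only an illustration in Python, not the hardware description used in the paper; the function names (`ripple_carry_add`, `carry_select_group`, `to_bits`, `from_bits`) and the little-endian bit-list representation are assumptions made for the example.

```python
def to_bits(value, width):
    """Little-endian list of bits (index 0 is the LSB)."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, cin):
    """Chain of full adders: each stage waits for the previous carry."""
    sum_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry


def carry_select_group(a_bits, b_bits, cin):
    """Regular CSLA group: one RCA for Cin = 0, one for Cin = 1, then a mux."""
    sum0, cout0 = ripple_carry_add(a_bits, b_bits, 0)
    sum1, cout1 = ripple_carry_add(a_bits, b_bits, 1)
    # The multiplexer picks the precomputed result once the real carry arrives.
    return (sum1, cout1) if cin else (sum0, cout0)
```

Both candidate results are available before the incoming carry is known, which is where the speed comes from; the duplicated RCA is the area cost that the BEC is meant to remove.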

II. MULTIPLIER USING VEDIC MATHEMATICS

Complex multiplication is of immense importance in digital signal processing (DSP) and image processing (IP). Hardware modules for the discrete Fourier transform (DFT), discrete cosine transform (DCT), discrete sine transform (DST) and modern broadband communications require large numbers of complex multipliers. A complex number multiplication is performed using four real number multiplications and two additions/subtractions. In real number processing, the carry needs to be propagated from the least significant bit (LSB) to the most significant bit (MSB) when binary partial products are added.

Figure 1. 4-bit BEC

X0 = ~B0; X1 = B0 ^ B1; X2 = B2 ^ (B0 & B1); X3 = B3 ^ (B0 & B1 & B2)
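The four Boolean expressions of the 4-bit BEC can be checked with a short Python sketch. The function name `bec4` and the little-endian bit order `[B0, B1, B2, B3]` are assumptions chosen for the illustration; the logic itself is copied from the expressions above.

```python
def bec4(bits):
    """4-bit binary to excess-1 converter on bits [B0, B1, B2, B3]."""
    b0, b1, b2, b3 = bits
    x0 = b0 ^ 1                # X0 = ~B0
    x1 = b0 ^ b1               # X1 = B0 ^ B1
    x2 = b2 ^ (b0 & b1)        # X2 = B2 ^ (B0 & B1)
    x3 = b3 ^ (b0 & b1 & b2)   # X3 = B3 ^ (B0 & B1 & B2)
    return [x0, x1, x2, x3]


def bec4_value(value):
    """Apply bec4 to an integer in 0..15 and return the result as an integer."""
    bits = [(value >> i) & 1 for i in range(4)]
    return sum(bit << i for i, bit in enumerate(bec4(bits)))
```

For every 4-bit input the output equals the input plus one, modulo 16, so the BEC can stand in for an adder whose carry input is fixed at 1.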



Figure 2. 4-bit BEC with 8:4 mux

Therefore, the addition and subtraction after binary multiplication limit the overall speed. Many alternative methods have been proposed for complex number multiplication, such as algebraic transformation based implementation, bit-serial multiplication using offset binary and distributed arithmetic, the CORDIC (coordinate rotation digital computer) algorithm, the quadratic residue number system (QRNS) and, more recently, the redundant complex number system (RCNS).

Blahut et al. proposed a technique for complex number multiplication that uses an algebraic transformation. This transformation saves one real multiplication at the expense of three additions compared with the direct method. A left-to-right array for fast multiplication was reported in 2005, but the method has not been extended to complex multiplication. All of the above techniques require either a large overhead for pre/post-processing or a long latency. Furthermore, many design issues such as speed, accuracy, design overhead and power consumption still have to be addressed for fast multiplication. At the algorithmic and structural levels, many multiplication techniques have been developed to improve multiplier efficiency; they reduce the number of partial products and/or change the way the partial products are added, but the principle behind multiplication is the same in all cases.
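The trade-off between the direct method and an algebraic transformation can be made concrete with a short Python sketch. The factoring used in `complex_mul_three` is one common three-multiplication form and is an assumption here; the source does not spell out which transformation the cited work uses. Both function names are hypothetical.

```python
def complex_mul_direct(a, b, c, d):
    """(a + jb)(c + jd): four real multiplications, two additions/subtractions."""
    real = a * c - b * d
    imag = a * d + b * c
    return real, imag


def complex_mul_three(a, b, c, d):
    """Same product with three real multiplications and five additions/subtractions."""
    k1 = c * (a + b)   # ac + bc
    k2 = a * (d - c)   # ad - ac
    k3 = b * (c + d)   # bc + bd
    real = k1 - k3     # ac - bd
    imag = k1 + k2     # ad + bc
    return real, imag
```

The second form trades one multiplication for three extra additions, which matches the cost described above for the algebraic transformation.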

Vedic mathematics is the ancient system of Indian mathematics, which has a unique technique of calculation based on 16 sutras (formulae). "Urdhva Tiryakbhyam" is a Sanskrit term meaning "vertically and crosswise"; this formula is used for multiplying smaller numbers. "Nikhilam Navatascaramam Dasatah" is a Sanskrit term meaning "all from 9 and last from 10"; this formula is used for multiplying and subtracting large numbers. Both formulae are adopted from ancient Indian Vedic mathematics. In this work we use them to design a complex multiplier architecture at the transistor level with two clear goals in mind: (i) simplicity and modularity of the multiplications for VLSI implementation, and (ii) the elimination of carry propagation for rapid additions and subtractions.

Mehta et al. proposed a multiplier design using the "Urdhva Tiryakbhyam" sutra, adopted from the Vedas. The formulation using this sutra is similar to modern array multiplication, and it therefore shares the carry propagation issues. A multiplier design using the "Nikhilam Navatascaramam Dasatah" sutra was reported by Tiwari et al. in 2009, but no hardware module for multiplication was implemented. Multiplier implementations at the gate level (FPGA) using Vedic mathematics have already been reported, but to the best of our knowledge there is no report to date on a transistor-level (ASIC) implementation of such a complex multiplier.

By employing Vedic mathematics, an n-bit complex number multiplication is transformed into four multiplications for the real and imaginary terms of the final product. The "Nikhilam Navatascaramam Dasatah" sutra is used for the multiplication, and it generates fewer partial products than array-based multiplication. Compared with existing methods such as the direct method or the strength reduction technique, this approach results not only in simplified arithmetic operations but also in a regular array-like structure. The multiplier is fully parameterized, so any configuration of input and output word lengths can be elaborated. Transistor-level performance parameters such as propagation delay, dynamic leakage power and dynamic switching power consumption of the proposed method were calculated with SPICE Spectre using a 90 nm standard CMOS technology and compared with other designs such as distributed arithmetic, parallel adder based implementation and algebraic transformation based implementation. The calculated results show that a (16,16)×(16,16) complex multiplier has a propagation delay of only 4 ns with 6.5 mW dynamic switching power. This paper concentrates on the "Urdhva Tiryakbhyam" and "Nikhilam Navatascaramam Dasatah" formulae; the other formulae are beyond the scope of this paper.

A. "URDHVA TIRYAKBHYAM" SUTRA

The meaning of this sutra is "vertically and crosswise", and it is applicable to all multiplication operations. Figure 3 shows the general procedure for a 4×4 multiplication. This procedure is simply known as the array multiplication technique. It is efficient when the multiplier and multiplicand lengths are small, but it is not suitable for longer operands because a large amount of carry propagation delay is involved. To overcome this problem, we describe the Nikhilam sutra for multiplying two larger numbers.

Figure 3. Multiplication procedure using the "Urdhva Tiryakbhyam" sutra

B. NIKHILAM SUTRA

Nikhilam Sutra means "all from 9 and last from 10". It is also applicable to all cases of multiplication, and it is more efficient when the numbers involved are large. We illustrate this sutra with the multiplication of two decimal numbers (96 × 93), where the chosen base is 100, which is the nearest base greater than both numbers. As shown in Fig. 4, we write the multiplier and the multiplicand in two rows, each followed by its difference from the chosen base, i.e., its complement. This gives two columns of numbers: one with the numbers to be multiplied (Column 1) and the other with their complements (Column 2). The product also consists of two parts, separated by a vertical line. The right-hand side (RHS) of the product is obtained by simply multiplying the numbers of Column 2 (7 × 4 = 28). The left-hand side (LHS) is found by cross-subtracting the second number of Column 2 from the first number of Column 1 or vice versa, i.e., 96 - 7 = 89 or 93 - 4 = 89. The final result is obtained by combining the LHS and RHS (Answer = 8928).

Figure 4. Nikhilam Sutra Multiplication
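The 96 × 93 example can be reproduced with a few lines of Python. This is a minimal sketch of the decimal procedure only, not of the hardware multiplier; the function name `nikhilam_multiply` and the `base` parameter are assumptions introduced for the illustration.

```python
def nikhilam_multiply(x, y, base=100):
    """Multiply x and y using their complements from a chosen base."""
    comp_x = base - x          # Column 2 entry for x (4 when x = 96)
    comp_y = base - y          # Column 2 entry for y (7 when y = 93)
    rhs = comp_x * comp_y      # product of the complements
    lhs = x - comp_y           # cross-subtraction; equals y - comp_x
    # Combining LHS and RHS: the LHS is weighted by the base, so an RHS
    # that exceeds the base simply carries into the LHS.
    return lhs * base + rhs


print(nikhilam_multiply(96, 93))  # 8928
```

The identity behind the combination step is x·y = (x - (base - y))·base + (base - x)(base - y), so the result is exact for any base; the method is attractive when the operands are close to the base because the complements, and hence the remaining multiplication, are small.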

III. PROPOSED FFT DESIGN

CARRY SELECT ADDER

The increase in the popularity of portable systems, together with the rapid growth of power density in integrated circuits, has made power dissipation one of the most important design objectives, followed by area and performance. Because adders are among the most widely used components in integrated circuits, designing efficient adders has been the goal of much research in VLSI design. The saying goes that if you can count, you can control. Addition is a fundamental operation for any digital system, digital signal processing or control system. The fast and accurate operation of a digital system is greatly influenced by the performance of its adders. Adders are also a very important component in digital systems because of their extensive use in other basic digital operations such as subtraction, multiplication and division. Hence, improving the performance of the digital adder would greatly advance the execution of binary operations inside a circuit composed of such blocks. The performance of a digital circuit block is gauged by analyzing its power dissipation, layout area and operating speed.

Based on the idea of sharing the two adders used in the Carry Select Adder (CSA), a new design of a low-power, high-performance adder is presented. The new adder is faster than a Ripple Carry Adder (RCA) but slower than a CSA. On the other hand, its area and power dissipation are smaller than those of a CSA. While RCAs have the most compact design (O(n) area) among all types of adders, they are the slowest (O(n) time). Carry Look-ahead Adders (CLAs), on the other hand, are the fastest adders (O(log n) time), but they are the worst from the area point of view (O(n log n) area). CSAs have been considered a compromise between RCAs and CLAs (O(√n) time and O(2n) area) because they offer a good tradeoff between the compact area of RCAs and the short delay of CLAs. As a result, some effort has gone into improving the efficiency of this kind of adder. For example, an area-efficient adder has been proposed that uses an increment circuit instead of one of the two adder blocks that add the high bits. In this research, based on the idea of sharing the two adders that are typically used in the CSA, a new architecture is proposed that is more compact and power efficient than the CSA.

The carry select adder (CSLA) is one of the fastest adders used in many data-processing processors to perform fast arithmetic functions. From the structure of the CSLA, it is clear that there is scope for reducing its area and power consumption. The proposed CSLA replaces the existing RCA block with a binary to excess-1 converter and handles operands of up to 128 bits. The proposed design has reduced area and power compared with the regular SQRT CSLA.
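A behavioural Python model can illustrate how the BEC replaces the second RCA in each group. It is a sketch under stated assumptions: the 16-bit width and the group sizes 2, 2, 3, 4 and 5 (bits 1:0, 3:2, 6:4, 10:7 and 15:11) follow the 16-b figures, integers stand in for gate-level logic, and all names (`GROUP_WIDTHS`, `rca`, `bec`, `modified_sqrt_csla16`) are hypothetical.

```python
GROUP_WIDTHS = [2, 2, 3, 4, 5]  # bits 1:0, 3:2, 6:4, 10:7, 15:11


def rca(a, b, cin, width):
    """Behavioural ripple carry adder: returns (sum, carry_out) for width bits."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def bec(value, width):
    """Binary to excess-1 converter: value + 1, truncated to width bits."""
    return (value + 1) & ((1 << width) - 1)


def modified_sqrt_csla16(a, b, cin=0):
    """16-bit SQRT CSLA in which each upper group uses one RCA plus a BEC."""
    result, carry, shift = 0, cin, 0
    for index, width in enumerate(GROUP_WIDTHS):
        mask = (1 << width) - 1
        a_grp, b_grp = (a >> shift) & mask, (b >> shift) & mask
        if index == 0:
            # The lowest group is a plain RCA driven by the real carry input.
            s, carry = rca(a_grp, b_grp, carry, width)
        else:
            s0, c0 = rca(a_grp, b_grp, 0, width)     # result for Cin = 0
            word0 = (c0 << width) | s0               # carry and sum together
            word1 = bec(word0, width + 1)            # result for Cin = 1
            word = word1 if carry else word0         # multiplexer
            s, carry = word & mask, word >> width
        result |= s << shift
        shift += width
    return result, carry
```

Each BEC is one bit wider than its group because it increments the sum together with the carry-out, which gives 3-, 4-, 5- and 6-bit converters for the four upper groups.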

[Figure: Regular 16-b SQRT CSLA. Inputs A[15:0] and B[15:0] are split into the bit groups 15:11, 10:7, 6:4, 3:2 and 1:0, with a multiplexer selecting the sum and carry of each upper group.]

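16-BIT MODIFIED CSLA

[Figure: 16-b modified SQRT CSLA. Inputs A[15:0] and B[15:0] are split into the bit groups 15:11, 10:7, 6:4, 3:2 and 1:0; the four upper groups use 6-bit, 5-bit, 4-bit and 3-bit BECs with multiplexers.]

Example of Urdhva Tiryakbhyam Sutra (12 × 13):

Step 1: Result = 6, Prev. Carry = 0, answer so far: 6
Step 2: Result = 5, Prev. Carry = 0, answer so far: 56
Step 3: Result = 1, Prev. Carry = 0, answer so far: 156

$$
\begin{aligned}
R_0 &= A_0B_0 \\
C_1R_1 &= A_0B_1 + A_1B_0 \\
C_2R_2 &= C_1 + A_0B_2 + A_2B_0 + A_1B_1 \\
C_3R_3 &= C_2 + A_3B_0 + A_0B_3 + A_1B_2 + A_2B_1 \\
C_4R_4 &= C_3 + A_4B_0 + A_0B_4 + A_3B_1 + A_1B_3 + A_2B_2 \\
C_5R_5 &= C_4 + A_5B_0 + A_0B_5 + A_4B_1 + A_1B_4 + A_3B_2 + A_2B_3 \\
C_6R_6 &= C_5 + A_6B_0 + A_0B_6 + A_5B_1 + A_1B_5 + A_4B_2 + A_2B_4 + A_3B_3 \\
C_7R_7 &= C_6 + A_7B_0 + A_0B_7 + A_6B_1 + A_1B_6 + A_5B_2 + A_2B_5 + A_4B_3 + A_3B_4 \\
C_8R_8 &= C_7 + A_7B_1 + A_1B_7 + A_6B_2 + A_2B_6 + A_5B_3 + A_3B_5 + A_4B_4 \\
C_9R_9 &= C_8 + A_7B_2 + A_2B_7 + A_6B_3 + A_3B_6 + A_5B_4 + A_4B_5 \\
C_{10}R_{10} &= C_9 + A_7B_3 + A_3B_7 + A_6B_4 + A_4B_6 + A_5B_5 \\
C_{11}R_{11} &= C_{10} + A_7B_4 + A_4B_7 + A_6B_5 + A_5B_6 \\
C_{12}R_{12} &= C_{11} + A_7B_5 + A_5B_7 + A_6B_6 \\
C_{13}R_{13} &= C_{12} + A_7B_6 + A_6B_7 \\
C_{14}R_{14} &= C_{13} + A_7B_7
\end{aligned}
$$

with $C_{14}R_{14}R_{13}R_{12}R_{11}R_{10}R_9R_8R_7R_6R_5R_4R_3R_2R_1R_0$ being the final product.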

General Mathematical Formula
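The column-by-column pattern of the general formula can be sketched in Python for operands of any length. This is only an illustrative software model: the function names `urdhva_multiply`, `to_digits` and `from_digits`, the little-endian digit lists and the `base` parameter are assumptions, and each column is reduced with ordinary integer division rather than with hardware adders.

```python
def to_digits(value, width, base=2):
    """Little-endian digit list of fixed width (index 0 is the least significant)."""
    digits = []
    for _ in range(width):
        digits.append(value % base)
        value //= base
    return digits


def from_digits(digits, base=2):
    """Inverse of to_digits."""
    return sum(d * base**i for i, d in enumerate(digits))


def urdhva_multiply(a_digits, b_digits, base=2):
    """Vertically-and-crosswise product of two equal-length digit lists."""
    n = len(a_digits)
    result, carry = [], 0
    for k in range(2 * n - 1):
        # Column k collects every cross product A_i * B_(k-i), plus the carry
        # from the previous column (C_k in the formula).
        column = carry
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            column += a_digits[i] * b_digits[k - i]
        result.append(column % base)   # R_k
        carry = column // base         # carry into column k + 1
    while carry:                       # leading carry digits, if any
        result.append(carry % base)
        carry //= base
    return result


# Decimal example from the text: 12 x 13, digits given least significant first.
print(urdhva_multiply([2, 1], [3, 1], base=10))  # [6, 5, 1], i.e. 156
```

With `base=2` and eight digits per operand, the fifteen columns correspond to R0 through R14 of the general formula, and the final carry supplies the leading bit of the product.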

Steps involved in Urdhva Tiryakbayam Sutra

VEDIC MULTIPLIER

CONCLUSION

A simple approach is proposed in this paper to reduce the area and power of the SQRT CSLA architecture. The reduced number of gates offers a great advantage in the reduction of area and also of total power. A modified 128-b SQRT CSLA design is proposed. The modified CSLA architecture is therefore low area, low power, simple and efficient for VLSI hardware implementation. Similarly, the multiplier is redesigned for 32 bits as a Vedic multiplier.

The proposed Vedic multiplier is based on the Vedic multiplication formulae (sutras). These sutras have traditionally been used for the multiplication of two numbers in the decimal number system. In this work, the same idea is applied to the binary number system, using two sutras: (1) the Urdhva Tiryakbhyam sutra and (2) the Nikhilam sutra. The proposed multiplier is used to build a complex multiplier, for instance for (a + ib) × (c + id). This paper works with a 128-bit carry select adder and a 32-bit multiplier.

Suppose we have to multiply 12 by 13.

(i) We multiply the most significant digit 1 of the multiplicand vertically by the most significant digit 1 of the multiplier, get their product 1, and set it down as the most significant part of the answer.

(ii) We then multiply 1 and 3, and 1 and 2, crosswise, add the two, get 5 as the sum, and set it down as the middle part of the answer.

(iii) We multiply 2 and 3 vertically, get 6 as their product, and put it down as the last, right-hand-most part of the answer.

Thus 12 × 13 = 156. The procedure extends in a similar way to multi-digit multiplication.

REFERENCES

J. A. C. Bingham, "Multicarrier modulation for data transmission: an idea whose time has come," IEEE Communications Magazine, vol. 28, no. 5, pp. 5-14, May 1990.

J. Palicot and C. Roland, "FFT: a basic function for a reconfigurable receiver," 10th International Conference on Telecommunications, vol. 1, pp. 898-902, March 2003.

[1]. Min Jiang, Bing Yang, Yiling Fu, et al., "Design of FFT processor with low power complex multiplier for OFDM-based high-speed wireless applications," International Symposium on Communications and Information Technology, vol. 2, pp. 639-641, Oct. 2004.

[2]. Kai Zhong, Hui He, and Guangxi Zhu, "An ultra-high speed FFT processor," International Symposium on Signals, Circuits and Systems, vol. 1, pp. 37-40, July 2003.

[3]. Hongjiang He and Hui Guo, "The realization of FFT algorithm based on FPGA co-processor," Second International Symposium on Intelligent Information Technology Application, vol. 3, pp. 239-243, Dec. 2008.

G. Proakis and D. G. Manolakis, Introduction to Digital