8/13/2019 Fft Processor Using Vedic Algo
International Journal of Communications and Engineering, Volume 01, No. 1, Issue: 01, March 2012
Page 102
DESIGN AND IMPLEMENTATION OF LOW
POWER AND AREA EFFICIENT ADDER AND
VEDIC MULTIPLIER FOR FFT
SILAMBARASAN. C.A.
PG Scholar
Sree Saatha Institute of Engineering and
Technology, Chennai
L. VANITHA
Assistant Professor,
Sree Saatha Institute of Engineering and
Technology, Chennai
ABSTRACT
The ever-increasing demand for processors that can handle complex and challenging workloads has resulted in the integration of a number of processor cores into one chip. The load on the main processor can be reduced by supplementing it with a co-processor. The Fast Fourier Transform (FFT) is a computationally intensive digital signal processing (DSP) function used in a wide range of applications, and the speed of an FFT depends greatly on its multiplier and adder. Vedic Mathematics is an ancient system of mathematics with a unique technique of calculation based on 16 Sutras; it is used here to design a multiplier. The carry select adder (CSLA) is one of the fastest adders used to perform arithmetic functions. The proposed design has reduced area and power compared with regular adders and multipliers. This work evaluates the performance of the proposed designs in terms of delay, area and power.
Keywords: FFT, CSLA, Vedic multiplier, low power, area efficient.
INTRODUCTION
The increasing popularity of portable systems and the rapid growth of power density in integrated circuits have made power dissipation one of the most important design objectives, followed by area and performance. Here, the carry select adder and the Vedic multiplier are modified to reduce these factors. The Fast Fourier Transform (FFT) is a computationally intensive digital signal processing (DSP) function widely used in applications such as imaging, software-defined radio, wireless communication, instrumentation and machine inspection. The choice of FFT size is dictated by different operating standards, so it is desirable to make the FFT size changeable according to the operating environment. A successful design should support the different operating modes required by diverse applications while meeting a low power consumption requirement.
Based on the idea of sharing the two adders used in the Carry Select Adder (CSA), a new design of a low-power, high-performance adder is presented. The new adder is faster than a Ripple Carry Adder (RCA) but slower than a CSA. On the other hand, its area and power dissipation are smaller than those of a CSA. In a typical processor, multiplication is one of the basic arithmetic operations, and it requires substantially more hardware resources and processing time than addition and subtraction. In fact, 8.72% of all instructions in a typical processing unit are multiplications. In computers, a typical central processing unit devotes a considerable amount of processing time to implementing arithmetic
operations, particularly multiplication operations. In this paper, a comparative study of different multipliers is done for low power requirement and high speed. The paper describes the Urdhva Tiryakbhyam algorithm of ancient Indian Vedic Mathematics, which is used for multiplication to improve the speed and area parameters of multipliers. Vedic Mathematics also suggests another formula for multiplication, the Nikhilam Sutra, which can increase the speed of a multiplier by reducing the number of iterations.
Data sets are becoming increasingly large, and the need for low power in adders tends to increase with them. Traditional serial adders are no longer suitable for large word lengths because of their large area and high power. All systems tend to trade off speed against power. The computation time taken by the array multiplier is comparatively small because the partial products are calculated independently in parallel. The delay associated with the array multiplier is the time taken by the signals to propagate through the gates that form the multiplication array. Large Booth arrays are required for high-speed multiplication and exponential operations, which in turn require large partial-sum and partial-carry registers. In this paper, the carry select adder is designed for 128 bits and the Vedic multiplier for 32 bits.
I. CARRY SELECT ADDER
The design of area- and power-efficient high-speed data path logic systems is one of the most substantial areas of research in VLSI system design. In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder. The sum for each bit position in an elementary adder is generated sequentially, only after the previous bit position has been summed and a carry propagated into the next position. The CSLA is used in many computational systems to alleviate the problem of carry propagation delay by independently generating multiple carries and then selecting a carry to generate the sum. However, the CSLA is not area efficient because it uses multiple pairs of ripple carry adders (RCA) to generate the partial sum and carry for carry inputs Cin = 0 and Cin = 1; the final sum and carry are then selected by multiplexers (mux). The basic idea of this work is to use a binary to excess-1 converter (BEC) instead of the RCA with Cin = 1 in the regular CSLA to achieve lower area and power consumption. The main advantage of the BEC logic comes from its smaller number of logic gates compared with the n-bit full adder (FA) structure.
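As an illustration of the BEC idea, the following minimal Python sketch evaluates the 4-bit BEC gate equations given with Figure 1 and checks that they add one to the input (modulo 16). The function name `bec4` and the software bit-slicing are assumptions made for this illustration; they are not part of the proposed hardware design.

```python
def bec4(b):
    """4-bit binary to excess-1 converter, written as the gate equations
    X0 = ~B0, X1 = B0^B1, X2 = B2^(B0&B1), X3 = B3^(B0&B1&B2)."""
    b0, b1, b2, b3 = [(b >> i) & 1 for i in range(4)]
    x0 = b0 ^ 1                  # NOT B0
    x1 = b0 ^ b1
    x2 = b2 ^ (b0 & b1)
    x3 = b3 ^ (b0 & b1 & b2)
    return x0 | (x1 << 1) | (x2 << 2) | (x3 << 3)


# The converter should behave as "input + 1" on 4 bits for every input.
all_match = all(bec4(b) == (b + 1) % 16 for b in range(16))
```

The BEC needs only one inverter, XOR gates and AND gates per output bit, which is the source of the gate-count saving over a full-adder chain with Cin = 1.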
II. MULTIPLIER USING VEDIC MATHEMATICS
Complex multiplication is of immense importance in digital signal processing (DSP) and image processing (IP). To implement hardware modules for the discrete Fourier transform (DFT), discrete cosine transform (DCT), discrete sine transform (DST) and modern broadband communications, large numbers of complex multipliers are required. Complex number multiplication is performed using four real number multiplications and two additions/subtractions. In real number processing, the carry needs to be propagated from the least significant bit (LSB) to the most significant bit (MSB) when binary partial products are added.
Figure 1. 4-bit BEC
X0 = ~B0
X1 = B0 ^ B1
X2 = B2 ^ (B0 & B1)
X3 = B3 ^ (B0 & B1 & B2)
Figure 2. 4-bit BEC with 8:4 mux
Therefore, the addition and subtraction after binary multiplications limit the overall speed. Many alternative methods have so far been proposed for complex number multiplication, such as algebraic transformation based implementation, bit-serial multiplication using offset binary and distributed arithmetic, the CORDIC (coordinate rotation digital computer) algorithm, the quadratic residue number system (QRNS) and, recently, the redundant complex number system (RCNS).
Blahut et al. proposed a technique for complex number multiplication in which an algebraic transformation is used. This algebraic transformation saves one real multiplication at the expense of three additions compared with the direct method. A left-to-right array for fast multiplication was reported in 2005, but the method was not extended to complex multiplication. All the above techniques require either a large overhead for pre/post-processing or long latency. Furthermore, many design issues such as speed, accuracy, design overhead and power consumption remain to be addressed for fast multiplication. At the algorithmic and structural levels, many multiplication techniques have been developed to enhance the efficiency of the multiplier; these concentrate on reducing the partial products and/or on the methods for adding them, but the principle behind multiplication is the same in all cases.
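To make the operation counts concrete, the sketch below compares the direct method (four real multiplications, two additions/subtractions) with one common three-multiplication algebraic rearrangement (three multiplications, five additions/subtractions). Whether this particular rearrangement is exactly the one used in the cited technique is an assumption; the function names are hypothetical and chosen only for this illustration.

```python
def complex_mul_direct(a, b, c, d):
    """(a + ib)(c + id) with 4 real multiplications and 2 add/subtracts."""
    real = a * c - b * d
    imag = a * d + b * c
    return real, imag


def complex_mul_three(a, b, c, d):
    """Same product with 3 real multiplications and 5 add/subtracts."""
    k1 = c * (a + b)      # multiplication 1, addition 1
    k2 = a * (d - c)      # multiplication 2, subtraction 2
    k3 = b * (c + d)      # multiplication 3, addition 3
    real = k1 - k3        # = ac - bd
    imag = k1 + k2        # = ad + bc
    return real, imag


direct = complex_mul_direct(3, 4, 5, 6)
reduced = complex_mul_three(3, 4, 5, 6)
```

Both functions return the pair (real part, imaginary part); for (3 + 4i)(5 + 6i) each gives (-9, 38).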
Vedic mathematics is the ancient system of Indian mathematics which has a unique technique of calculation based on 16 sutras (formulae). "Urdhva Tiryakbhyam" is a Sanskrit term meaning "vertically and crosswise"; this formula is used for multiplication of smaller numbers. "Nikhilam Navatascaramam Dasatah" is also a Sanskrit term, meaning "all from 9 and last from 10"; this formula is used for multiplication and subtraction of large numbers. All these formulas are adopted from ancient Indian Vedic mathematics. In this work we formulate this mathematics for designing the complex multiplier architecture at transistor level with two clear goals in mind: i) simplicity and modularity of multiplications for VLSI implementations, and ii) the elimination of carry propagation for rapid additions and subtractions. Mehta et al. have proposed a multiplier design using the "Urdhva Tiryakbhyam" sutra, which was adopted from the Vedas. The formulation using this sutra is similar to modern array multiplication, which also exhibits the carry propagation issues. A multiplier design using the "Nikhilam Navatascaramam Dasatah" sutra was reported by Tiwari et al. in 2009, but they did not implement the hardware module for multiplication. Multiplier implementation at the gate level (FPGA) using Vedic mathematics has already been reported, but to the best of our knowledge there is to date no report on a transistor-level (ASIC) implementation of such a complex multiplier.
By employing Vedic mathematics, an n-bit complex number multiplication was transformed into four multiplications for the real and imaginary terms of the final product. The "Nikhilam Navatascaramam Dasatah" sutra is used for the multiplication, generating fewer partial products than array-based multiplication. Compared with existing methods such as the direct method or the strength reduction technique, our approach resulted not only in simplified arithmetic operations but also in a regular array-like structure. The multiplier is fully parameterized, so any configuration of input and output word lengths can be elaborated. Performance parameters of the transistor-level implementation, such as propagation delay, dynamic leakage power and dynamic switching power consumption, were calculated with Spice Spectre using 90 nm standard CMOS technology and compared with other designs such as distributed arithmetic, parallel adder based implementation and algebraic transformation based implementation. The calculated results revealed that the (16,16)x(16,16) complex multiplier has a propagation delay of only 4 ns with 6.5 mW dynamic switching power. In this paper we concentrate on the "Urdhva Tiryakbhyam" and "Nikhilam
Navatascaramam Dasatah" formulas; other formulas are beyond the scope of this paper.
A. "URDHVA TIRYAKBHYAM" SUTRA
The meaning of this sutra is "vertically and crosswise", and it is applicable to all multiplication operations. Figure 3 represents the general procedure for 4x4 multiplication. This procedure is simply known as the array multiplication technique. It is an efficient multiplication technique when the multiplier and multiplicand lengths are small, but it is not suitable for larger lengths because a large amount of carry propagation delay is involved in those cases. To overcome this problem, we describe the Nikhilam sutra for calculating the multiplication of two larger numbers.
Figure 3. Multiplication procedure using the "Urdhva Tiryakbhyam" sutra
B. NIKHILAM SUTRA
Nikhilam Sutra means "all from 9 and last from 10". It is also applicable to all cases of multiplication, and it is more efficient when the numbers involved are large. We illustrate this Sutra by considering the multiplication of two decimal numbers (96 × 93), where the chosen base is 100, which is nearest to and greater than both numbers. As shown in Fig. 4, we write the multiplier and the multiplicand in two rows, followed by the differences of each of them from the chosen base, i.e., their complements. This gives two columns of numbers, one consisting of the numbers to be multiplied (Column 1) and the other consisting of their complements (Column 2). The product also consists of two parts, which are separated by a vertical line. The right-hand side of the product is obtained by simply multiplying the numbers of Column 2 (7 × 4 = 28). The left-hand side of the product is found by cross subtracting the second number of Column 2 from the first number of Column 1 or vice versa, i.e., 96 - 7 = 89 or 93 - 4 = 89. The final result is obtained by combining the LHS and RHS (Answer = 8928).
Figure 4. Nikhilam Sutra Multiplication
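The Nikhilam procedure described above can be sketched in a few lines of Python. This is an arithmetic illustration only, not the hardware architecture; the function name `nikhilam_multiply` and its `base` parameter are assumptions made for this example. The weighted sum `left * base + right` also absorbs any carry from the right-hand part when the product of the complements exceeds the base.

```python
def nikhilam_multiply(x, y, base=100):
    """Multiply x and y using complements from a chosen base."""
    dx = base - x             # complement of x (Column 2)
    dy = base - y             # complement of y (Column 2)
    left = x - dy             # cross subtraction; equals y - dx
    right = dx * dy           # product of the complements
    return left * base + right


product = nikhilam_multiply(96, 93)   # complements 4 and 7, LHS 89, RHS 28
```

For 96 × 93 the left part is 96 - 7 = 89 and the right part is 4 × 7 = 28, giving 8928 as in the worked example.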
III. PROPOSED FFT DESIGN
CARRY SELECT ADDER
The increasing popularity of portable systems and the rapid growth of power density in integrated circuits have made power dissipation one of the most important design objectives, followed by area and performance. Because adders are among the most widely used components in integrated circuits, designing efficient adders has been the goal of much research in VLSI design. The saying goes that if you can count, you can control. Addition is a fundamental operation for any digital system, digital signal processing or control system. The fast and accurate operation of a digital system is greatly influenced by the performance of its resident adders. Adders are also very important components in digital systems because of their extensive use in other basic digital operations such as subtraction, multiplication and division.
Hence, improving the performance of the digital adder would greatly advance the execution of binary operations inside a circuit composed of such blocks. The performance of a digital circuit block is gauged by analyzing its power dissipation, layout area and operating speed.
Based on the idea of sharing the two adders used in the Carry Select Adder (CSA), a new design of a low-power, high-performance adder is presented. The new adder is faster than a Ripple Carry Adder (RCA) but slower than a CSA. On the other hand, its area and power dissipation are smaller than those of a CSA. While Ripple Carry Adders (RCAs) have the most compact design (O(n) area) among all types of adders, they are the slowest (O(n) time). Carry Look-ahead Adders (CLAs), on the other hand, are the fastest adders (O(log n) time), but they are the worst from the area point of view (O(n log n) area). Carry Select Adders (CSAs) have been considered a compromise between RCAs and CLAs (O(√n) time and O(2n) area) because they offer a good tradeoff between the compact area of RCAs and the short delay of CLAs. As a result, some effort has been made to improve the efficiency of this kind of adder. For example, an area-efficient adder has been proposed which uses an increment circuit instead of one of the two adder blocks that add the high bits. In this research, based on the idea of sharing the two adders that are typically used in the CSA, a new architecture is proposed which is more compact and power efficient than the CSA.
The carry select adder (CSLA) is one of the fastest adders used in many data-processing processors to perform fast arithmetic functions. From the structure of the CSLA, it is clear that there is scope for reducing its area and power consumption. The proposed CSLA uses a binary to excess-1 converter in place of the existing RCA block and can compute up to 128 bits. The proposed design has reduced area and power compared with the regular SQRT CSLA.
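A behavioural sketch of a 16-bit BEC-based SQRT CSLA is given below to show how the carry-select principle works with a single RCA per group. It is a software model under stated assumptions, not the gate-level design: the group widths 2, 2, 3, 4 and 5 are read from the bit ranges 1:0, 3:2, 6:4, 10:7 and 15:11 labelled in the block diagrams, and the names `ripple_carry_add`, `bec` and `modified_csla_add` are hypothetical.

```python
GROUP_WIDTHS = [2, 2, 3, 4, 5]   # bit groups 1:0, 3:2, 6:4, 10:7, 15:11


def ripple_carry_add(a, b, width, cin=0):
    """Bit-by-bit ripple carry addition; returns (sum, carry_out)."""
    total, carry = 0, cin
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def bec(value, width):
    """Binary to excess-1 conversion on a width-bit word."""
    return (value + 1) & ((1 << width) - 1)


def modified_csla_add(a, b, widths=GROUP_WIDTHS):
    """Add a and b group by group; returns (sum, carry_out)."""
    result, carry, pos = 0, 0, 0
    for index, w in enumerate(widths):
        mask = (1 << w) - 1
        ga, gb = (a >> pos) & mask, (b >> pos) & mask
        s0, c0 = ripple_carry_add(ga, gb, w, cin=0)    # RCA with Cin = 0
        if index == 0:
            s, c = s0, c0                              # first group: plain RCA
        else:
            # A (w+1)-bit BEC over {c0, s0} yields the Cin = 1 result.
            word1 = bec((c0 << w) | s0, w + 1)
            s1, c1 = word1 & mask, word1 >> w
            s, c = (s1, c1) if carry else (s0, c0)     # mux on incoming carry
        result |= s << pos
        carry = c
        pos += w
    return result, carry


total, carry_out = modified_csla_add(12345, 54321)
```

With these group widths the BECs are 3, 4, 5 and 6 bits wide. In hardware all groups compute in parallel and only the mux selection waits for the incoming carry; the sequential loop here models the result, not the timing.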
[Figure: 16-bit SQRT CSLA block diagram with bit groups 15:11, 10:7, 6:4, 3:2 and 1:0, group multiplexers, and outputs Sum[15:0] and carry-out.]
Regular 16-b SQRT CSLA.
16-BIT MODIFIED CSLA
Operands: 1 3 and 1 2
Step 1: Result = 6, Prev. Carry = 0, answer so far: 6
Step 2: Result = 5, Prev. Carry = 0, answer so far: 5 6
Step 3: Result = 1, Prev. Carry = 0, answer so far: 1 5 6
Example of Urdhva Tiryakbhyam Sutra
R0 = A0B0
C1R1 = A0B1 + A1B0
C2R2 = C1 + A0B2 + A2B0 + A1B1
C3R3 = C2 + A3B0 + A0B3 + A1B2 + A2B1
C4R4 = C3 + A4B0 + A0B4 + A3B1 + A1B3 + A2B2
C5R5 = C4 + A5B0 + A0B5 + A4B1 + A1B4 + A3B2 + A2B3
C6R6 = C5 + A6B0 + A0B6 + A5B1 + A1B5 + A4B2 + A2B4 + A3B3
C7R7 = C6 + A7B0 + A0B7 + A6B1 + A1B6 + A5B2 + A2B5 + A4B3 + A3B4
[Figure: 16-bit SQRT CSLA block diagram with bit groups 15:11, 10:7, 6:4, 3:2 and 1:0, with 6-bit, 5-bit, 4-bit and 3-bit blocks feeding the group multiplexers.]
C8R8 = C7 + A7B1 + A1B7 + A6B2 + A2B6 + A5B3 + A3B5 + A4B4
C9R9 = C8 + A7B2 + A2B7 + A6B3 + A3B6 + A5B4 + A4B5
C10R10 = C9 + A7B3 + A3B7 + A6B4 + A4B6 + A5B5
C11R11 = C10 + A7B4 + A4B7 + A6B5 + A5B6
C12R12 = C11 + A7B5 + A5B7 + A6B6
C13R13 = C12 + A7B6 + A6B7
C14R14 = C13 + A7B7
with C14R14R13R12R11R10R9R8R7R6R5R4R3R2R1R0 being the final product.
General Mathematical Formula
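The column formulas R0 to C14R14 can be checked with a short Python sketch. It is a software illustration under stated assumptions, not the proposed circuit: the function name `urdhva_multiply` and the parameter `n` (operand width in bits) are hypothetical, and the intermediate carries are kept as ordinary integers, so a carry Ck may be wider than one bit.

```python
def urdhva_multiply(a, b, n=8):
    """Multiply two n-bit numbers column by column (vertically and crosswise)."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    product, carry = 0, 0
    for k in range(2 * n - 1):                 # columns R0 .. R(2n-2)
        lo, hi = max(0, k - n + 1), min(k, n - 1)
        # Ck Rk = C(k-1) + sum of Ai * Bj over all pairs with i + j = k
        column = carry + sum(a_bits[i] & b_bits[k - i] for i in range(lo, hi + 1))
        product |= (column & 1) << k           # Rk is the low bit of the column
        carry = column >> 1                    # Ck carries into the next column
    return product | (carry << (2 * n - 1))    # final carry is the top bit


result = urdhva_multiply(255, 255)
```

For n = 8 the loop visits columns 0 to 14, matching the fifteen formulas, and the final carry C14 becomes bit 15 of the 16-bit product.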
Steps involved in Urdhva Tiryakbhyam Sutra
VEDIC MULTIPLIER
CONCLUSION
A simple approach is proposed in this paper to reduce the area and power of the SQRT CSLA architecture. The reduced number of gates in this work offers a great advantage in reducing area and total power. A modified 128-b SQRT CSLA design is proposed. The modified CSLA architecture is therefore low area, low power, simple and efficient for VLSI hardware implementation. Similarly, the multiplier is modified for 32 bits by using the Vedic multiplier.
The proposed Vedic multiplier is based on the Vedic multiplication formulae (sutras). These sutras have traditionally been used for the multiplication of two numbers in the decimal number system. In this work, the same idea is applied to the binary number system, using two sutras:
1. Urdhva Tiryakbhyam Sutra
2. Nikhilam Sutra
The proposed multiplier is used to build a complex multiplier for products of the form (a + ib) × (c + id). This paper works with a 128-bit carry select adder and a 32-bit multiplier.
Suppose we have to multiply 12 by 13.
(i) We multiply the most significant digit 1 of the multiplicand vertically by the most significant digit 1 of the multiplier, get their product 1, and set it down as the most significant part of the answer.
(ii) We then multiply 1 and 3, and 1 and 2 crosswise, add the two, get 5 as the sum, and set it down as the middle part of the answer.
(iii) We multiply 2 and 3 vertically, get 6 as their product, and
put it down as the last, rightmost part of the answer. Thus 12 × 13 = 156. The method extends in a similar way to multi-digit multiplication.
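The three steps of this example correspond to the three terms of the expanded two-digit product. The derivation below is a sketch in which the digit symbols a_1, a_0, b_1 and b_0 are introduced for illustration and do not appear in the original text.

```latex
\begin{align*}
(10a_1 + a_0)(10b_1 + b_0)
  &= 100\,a_1 b_1 + 10\,(a_1 b_0 + a_0 b_1) + a_0 b_0 \\
12 \times 13
  &= 100\,(1 \cdot 1) + 10\,(1 \cdot 3 + 2 \cdot 1) + (2 \cdot 3) \\
  &= 100 + 50 + 6 = 156
\end{align*}
```

The vertical products give the outer terms and the crosswise products give the middle term. In this example no term exceeds 9, so no carry is passed between digit positions.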