ABSTRACT Convolution is a mathematical operation used in signal processing to predict the response of a system to an input from the system's impulse response. This area receives attention because convolution has applications in fields such as Digital Signal Processing (DSP), Digital Image Processing, Linear Acoustics, and Statistics. A high-speed DSP system is therefore required to perform the convolution computation efficiently. This paper presents a detailed analysis and implementation of linear convolution in which a Vedic multiplication architecture is used to increase the computational speed of the convolution operation. The architecture was simulated using ISim and synthesized using Xilinx synthesis technology. The functional block has been successfully implemented in hardware on a Xilinx Spartan 6 XC6SLX45-2CSG324 Field-Programmable Gate Array (FPGA). Finally, the output waveforms from the FPGA were displayed on the ChipScope VIO console logic analyzer for real-time verification.
Keywords: Linear Convolution, DSP, Vedic Multiplication, VHDL, Signal Processing, FFT.
INVESTIGATION AND VLSI IMPLEMENTATION OF LINEAR
CONVOLUTION ARCHITECTURE FOR FPGA BASED SIGNAL
PROCESSING APPLICATIONS
S. Elango1, P. Sampath2, K. Shoukath Ali3, Sajan P Philip4, A. Daniel Raj5
1Assistant Professor, Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode-638-401, Tamilnadu, India.
2Professor, Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode-638-401, Tamilnadu, India.
3Assistant Professor, Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode-638-401, Tamilnadu, India.
4Assistant Professor, Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode-638-401, Tamilnadu, India.
5Assistant Professor, Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode-638-401, Tamilnadu, India.
International Journal of Pure and Applied Mathematics, Special Issue, Volume 119 No. 16 2018, 4607-4624. ISSN: 1314-3395 (on-line version). url: http://www.acadpubl.eu/hub/
I. INTRODUCTION
Convolution is a mathematical way of combining two signals to obtain a third signal. It is the single most important technique in DSP. Three signals are involved in convolution: the input signal (x[n]), the output signal (y[n]), and the impulse response (h[n]) [1]. Convolution is significant in Fourier theory and in the analysis of linear systems, and it is fundamental to many common image processing operators. In the discrete case, convolution is defined in terms of the impulse function. In linear time-invariant systems, the output is the convolution of the input signal with the impulse response. Convolution obeys the commutative property of algebra [2],[3]. It provides a way of "multiplying together" two arrays of numbers, generally of different sizes but of the same dimensionality, to produce a third array of numbers of the same dimensionality.
Consider two sequences m(x) and h(x), where m(x) is the input sequence and h(x) is the impulse response. The output response of the system, g(x), is computed as follows:
g(x) = m(x) * h(x) (1)
where * represents convolution.
The most widely used types of convolution are linear and circular. Linear convolution applies to finite-length sequences of any length (for implementation, only positive-length sequences are considered). An FPGA design is preferred because it offers very fast inputs/outputs (I/Os) and bidirectional data buses, which are used to verify the timing of valid data. Linear convolution depends primarily on multiplication, which is performed using a multiplier.
The rest of this paper is structured as follows: Section II provides a literature survey of multipliers and linear convolution. Section III describes the architecture of the Vedic multiplier. Section IV describes the high-speed architecture of linear convolution. Simulation, synthesis, and implementation results are discussed in Section V. Conclusions are drawn in Section VI.
II. LITERATURE SURVEY
The basic form of binary multiplier is the array multiplier [18], which computes a set of partial products and then sums them to obtain the final output. Baugh and Wooley reported an array-architecture-based signed multiplier [19] using the concept of 2's complement. The Partial Product (PP) generation of the Wallace tree multiplier [15] is similar to that of the array multiplier; the difference lies in the PP addition. An n-bit multiplier has n rows of PPs; 3*floor(n/3) of the rows are arranged into groups of three, and the remaining n mod 3 rows are passed on to the next stage. A further improvement in PP addition was made by Dadda [16], but it is less regular than the Wallace multiplier, as mentioned in [17]. Vedic Mathematics is an ancient technique that allows efficient implementation of arithmetic rules for high speed
computations, which can be applied to various branches of engineering [8]. There are 16 Vedic sutras available for mathematical calculation; of these, only two support multiplication, namely the Urdhva Tiryakbhyam (UT) Sutra and the Nikhilam Sutra, for unsigned and signed input formats respectively [4]. The following survey covers linear convolution based on Vedic sutras. The reason for choosing the Vedic multiplier is illustrated in Table 1. Jubin Hazra et al. [3] discussed the calculation of the convolution of two finite-length sequences. In [13], Nikhilam Sutra based convolution was used, and the work concentrated on reducing leakage current by using a multiple-channel CMOS technique. Asmita Haveliya et al. [12] implemented convolution based on Vedic Mathematics. In [14], Rashmi Rahul Kulkarni provided a parallel architecture for the Overlap-Add (OLA) and Overlap-Save (OLS) methods in order to compute the convolution of long sequences based on the Vedic multiplier. These authors concentrated on hardware implementation but did not carry out a detailed analysis for increasing numbers of bits. This paper therefore focuses on a detailed analysis of four- and eight-point convolution with different bit widths, specifically for FPGA-based DSP applications. Since only a few studies have been reported in the area of linear convolution, it is implemented in this paper using Vedic multiplication.
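The Wallace-tree row grouping mentioned earlier in this survey can be made concrete with a short Python sketch. It assumes the usual reduction rule, in which every group of three partial-product rows is compressed into two rows and the leftover n mod 3 rows pass through unchanged. The function name wallace_row_counts is illustrative and is not taken from the cited works.

```python
def wallace_row_counts(n):
    """Return the number of partial-product rows after each reduction stage.

    Assumption: each full group of three rows is reduced to two rows,
    and the remaining (rows mod 3) rows are passed to the next stage.
    """
    if n < 1:
        raise ValueError("n must be a positive number of rows")
    rows = [n]
    while rows[-1] > 2:
        groups, leftover = divmod(rows[-1], 3)
        rows.append(2 * groups + leftover)
    return rows


if __name__ == "__main__":
    for bits in (4, 8, 16, 32):
        counts = wallace_row_counts(bits)
        print(bits, "rows:", counts, "stages:", len(counts) - 1)
```

Under this rule an 8-row array shrinks as 8, 6, 4, 3, 2, so four reduction stages are needed before the final two-row addition.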
III. VEDIC MULTIPLICATION
A. Urdhva Tiryakbhyam (UT) Sutra
The UT sutra is a multiplication method that is applicable to all cases of multiplication. It is also known as “Vertically and Crosswise” [7]. A multiplier based on this sutra has the advantage that, as the number of bits increases, gate delay and area increase slowly compared to conventional multipliers.
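As an illustration of the "Vertically and Crosswise" idea, the following Python sketch multiplies two unsigned numbers column by column: column k adds every bit product a_i AND b_j with i + j = k, keeps the low bit as product bit k, and carries the rest into the next column. This is a software model written for this explanation, not the VHDL used in the paper, and the name ut_multiply and its width parameter are assumptions.

```python
def ut_multiply(a, b, width):
    """Multiply two unsigned `width`-bit integers column by column."""
    limit = 1 << width
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must fit in `width` unsigned bits")

    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]

    result = 0
    carry = 0
    for k in range(2 * width - 1):
        # Crosswise step: all bit pairs (i, k - i) that land in column k.
        column = carry
        for i in range(max(0, k - width + 1), min(k, width - 1) + 1):
            column += a_bits[i] & b_bits[k - i]
        result |= (column & 1) << k  # low bit becomes product bit k
        carry = column >> 1          # remainder moves to column k + 1
    # Whatever is left after the last column is the most significant bit.
    result |= carry << (2 * width - 1)
    return result


if __name__ == "__main__":
    print(ut_multiply(13, 11, 4))  # 143
```

All column sums depend only on the input bits, which is why a hardware version can generate them in parallel and leave only the carry handling on the critical path.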
B. 8-BIT VEDIC MULTIPLIER
The 4x4 Vedic multiplier is the basic building block for the design of an 8x8 Vedic multiplier, as shown in Figure 1 [8]. Consider a pair of 8-bit numbers a = a7a6a5a4a3a2a1a0 and b = b7b6b5b4b3b2b1b0. The resultant 16-bit product, S(15 down to 0), is obtained after performing partial product addition using ripple carry adders [10].
Figure 1. Block Diagram of 8-bit Vedic Multiplier
C. 16-BIT VEDIC MULTIPLIER
The 8x8 Vedic multiplier is the basic building block for the design of a 16x16 Vedic multiplier, as shown in Figure 2. Consider a pair of 16-bit numbers a = a15a14a13a12a11a10a9a8a7a6a5a4a3a2a1a0 and b = b15b14b13b12b11b10b9b8b7b6b5b4b3b2b1b0. The resultant 32-bit product, S(31 down to 0), is obtained after performing partial product addition using ripple carry adders [10].
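The way a wider Vedic multiplier is assembled from narrower ones can be sketched in Python. The sketch assumes the common decomposition in which each operand is split into a high and a low half, four half-width multipliers form the partial products, and adders combine them after shifting. The exact adder arrangement of Figures 1 and 2 is not reproduced here, and the function names are hypothetical.

```python
def vedic_2x2(a, b):
    """2x2 base block: vertical, crosswise, vertical."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    s0 = a0 & b0                   # vertical product of the low bits
    cross = (a1 & b0) + (a0 & b1)  # crosswise products
    top = (a1 & b1) + (cross >> 1) # vertical product of high bits plus carry
    return s0 | ((cross & 1) << 1) | (top << 2)


def vedic_multiply(a, b, width):
    """Multiply unsigned `width`-bit values; width must be a power of two >= 2."""
    if width < 2 or width & (width - 1):
        raise ValueError("width must be a power of two and at least 2")
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in `width` unsigned bits")
    if width == 2:
        return vedic_2x2(a, b)

    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half

    # Four half-width multipliers, independent of each other.
    p_ll = vedic_multiply(a_lo, b_lo, half)
    p_hl = vedic_multiply(a_hi, b_lo, half)
    p_lh = vedic_multiply(a_lo, b_hi, half)
    p_hh = vedic_multiply(a_hi, b_hi, half)

    # Partial product addition (ripple carry adders in the paper's hardware).
    return p_ll + ((p_hl + p_lh) << half) + (p_hh << width)


if __name__ == "__main__":
    print(vedic_multiply(200, 123, 8))       # 24600
    print(vedic_multiply(40000, 50000, 16))  # 2000000000
```

Because the four half-width products do not depend on one another, they can be computed concurrently, so the depth of the design grows with the adder stages rather than with the number of multipliers.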
Figure 2. Block Diagram of 16-bit Vedic Multiplier
IV. HIGH SPEED LINEAR CONVOLUTION ARCHITECTURE
The Vedic multiplier (Urdhva Tiryakbhyam Sutra) [4],[5] meets the requirement of high-speed multiplication when compared with other conventional multipliers [6], as depicted in Table 1. Figure 3 shows the linear convolution of two sequences of lengths n1 and n2, namely x(n) = {a0, a1, ..., an} and h(n) = {b0, b1, ..., bn}. The output sequence is y(n) = {y0, y1, ..., yn}. The length of the output sequence is l = n1 + n2 - 1.
TABLE 1. COMPARISON OF ARRAY MULTIPLIER WITH VEDIC MULTIPLIER
(Xilinx: Spartan 6 XC6SLX45-2CSG324; Altera: Cyclone II EP2C70F896I8)

Multiplier | Bits | Xilinx slice LUTs | Xilinx required delay (ns) | Xilinx logic delay (ns) | Xilinx route delay (ns) | Altera logic elements | Altera required delay (ns)
Array | 8 | 112 | 9.673 | 0.688 (7.1%) | 8.985 (92.9%) | 156 | 37.698
Array | 32 | 1555 | 38.223 | 2.666 (7.0%) | 35.557 (93.0%) | 2957 | 146.943
Vedic | 8 | 120 | 9.112 | 0.645 (7.1%) | 8.467 (92.9%) | 176 | 35.236
Vedic | 32 | 2241 | 35.350 | 2.537 (7.2%) | 32.813 (92.8%) | 3218 | 130.425

TABLE 2. VEDIC MULTIPLIER
(Xilinx: Spartan 6 XC6SLX45-2CSG324; Altera: Cyclone II EP2C70F896I8)

No. of bits | Xilinx slice LUTs | Xilinx required delay (ns) | Xilinx logic delay (ns) | Xilinx route delay (ns) | Altera logic elements | Altera required delay (ns)
4 | 24 | 4.342 | 0.301 (6.9%) | 4.041 (93.1%) | 32 | 19.338
8 | 120 | 9.112 | 0.645 (7.1%) | 8.467 (92.9%) | 176 | 35.236
16 | 533 | 18.034 | 1.290 (7.2%) | 16.744 (92.8%) | 758 | 69.328
32 | 2241 | 35.350 | 2.537 (7.2%) | 32.813 (92.8%) | 3218 | 130.425
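The delay figures of Table 1 can be turned into relative speed-ups with a few lines of Python. The numbers below are copied from the table; the dictionary layout and the helper name delay_reduction_percent are only illustrative.

```python
# Required delay in ns from Table 1, stored as (array, vedic) per bit width.
XILINX_DELAY_NS = {8: (9.673, 9.112), 32: (38.223, 35.350)}
ALTERA_DELAY_NS = {8: (37.698, 35.236), 32: (146.943, 130.425)}


def delay_reduction_percent(array_ns, vedic_ns):
    """Percentage by which the Vedic delay is lower than the array delay."""
    return 100.0 * (array_ns - vedic_ns) / array_ns


if __name__ == "__main__":
    for device, table in (("Xilinx", XILINX_DELAY_NS), ("Altera", ALTERA_DELAY_NS)):
        for bits, (array_ns, vedic_ns) in sorted(table.items()):
            gain = delay_reduction_percent(array_ns, vedic_ns)
            print(f"{device} {bits}-bit: Vedic delay is {gain:.2f}% lower")
```

On these figures the Vedic multiplier is faster on both devices, at the cost of more area at each width (for example, 120 against 112 slice LUTs at 8 bits).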
The basic steps involved in performing linear convolution are as follows [9]:
Step 1: Consider two sequences x(n) and h(n) of lengths l and m respectively.
Step 2: The total length of the output sequence is computed as n = l + m - 1.
Step 3: The output sequence y(n) is given by
y(n) = \sum_{k=-\infty}^{\infty} x(k) h(n-k) (2)
Step 4: Multiply the two sequences x(k) and h(n-k) element by element and sum the products to get y(n).
Step 5: Increment the index n, shift the sequence h(n-k) to the right by one sample, and perform Step 4.
Step 6: Repeat Step 5 until the sum of products is zero for all the remaining values of n.
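The steps above can be sketched directly in Python. This is a minimal software model of the convolution sum in equation (2) for finite sequences, where terms outside the sequences are treated as zero; it is not the hardware description used in the paper, and the function name linear_convolution is an assumption.

```python
def linear_convolution(x, h):
    """Direct-sum linear convolution of two finite sequences."""
    l, m = len(x), len(h)
    if l == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    n_out = l + m - 1  # Step 2: output length
    y = []
    for n in range(n_out):  # Steps 5 and 6: advance the output index
        acc = 0
        for k in range(l):  # Step 4: multiply element by element and sum
            if 0 <= n - k < m:  # h(n - k) is zero outside its support
                acc += x[k] * h[n - k]
        y.append(acc)
    return y


if __name__ == "__main__":
    print(linear_convolution([1, 2, 3, 4], [1, 2, 3, 4]))
    # [1, 4, 10, 20, 25, 24, 16]
```

The inner loop needs at most min(l, m) multiplications per output sample, which is why the speed of the multiplier dominates the cost of the operation.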
Figure 3. General Diagram for n-point Linear Convolution
y(n) = {a0b0, a1b0 + a0b1, ..., anbn} (3)
Consider an example of four-point linear convolution in which each point is 4 bits wide, as shown in Figure 4, where x(n) = {1,2,3,4} and h(n) = {1,2,3,4}.
Length of x(n) = 4
Length of h(n) = 4
Length of y(n) = n = 4 + 4 - 1 = 7
Figure 4. Linear Convolution of Four-point Sequences
y(n) = {1,4,10,20,25,24,16} (4)
The efficient Vedic multiplier (VM) is used as one of the modules to calculate the linear convolution of the two given sequences. The architecture of the four-point 4-bit linear convolution is shown in Figure 5.
Figure 5. Block Diagram for Four-Point 4-bit Linear Convolution
We have also calculated linear convolution for 4-, 8-, and 16-bit inputs with four and eight points and obtained successful results.
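The role of the multiplier as a reusable module in the convolution architecture can be modelled in Python as follows. The sketch assumes one multiplier instance per input pair followed by adders along each anti-diagonal, which matches the form of equation (3) but is not guaranteed to match the exact wiring of Figure 5. The multiplier is passed in as a parameter, and shift_add_multiply is a hypothetical stand-in for a hardware multiplier block, not the Vedic design itself.

```python
import operator


def shift_add_multiply(a, b):
    """Stand-in unsigned multiplier module (plain shift-and-add)."""
    product = 0
    while b:
        if b & 1:
            product += a
        a <<= 1
        b >>= 1
    return product


def convolve_with_multiplier(x, h, multiply=operator.mul):
    """Linear convolution built from a pluggable two-input multiplier."""
    # Stage 1: every pairwise product x[i] * h[j]; these are independent,
    # so hardware can evaluate them with parallel multiplier instances.
    products = [[multiply(xi, hj) for hj in h] for xi in x]

    # Stage 2: adders accumulate the products with i + j = n into y[n].
    y = [0] * (len(x) + len(h) - 1)
    for i, row in enumerate(products):
        for j, p in enumerate(row):
            y[i + j] += p
    return y


if __name__ == "__main__":
    print(convolve_with_multiplier([1, 2, 3, 4], [1, 2, 3, 4], shift_add_multiply))
    # [1, 4, 10, 20, 25, 24, 16]
```

Swapping the multiply argument leaves the result unchanged, which mirrors the design choice in the paper: the convolution structure stays fixed and only the multiplier block determines the speed.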
V. RESULTS AND DISCUSSION
A. SIMULATION RESULTS
The simulation of linear convolution for various numbers of points (four, eight, and ten) is performed in the ISim simulator tool in the Xilinx ISE 14.2 design environment. The inputs for a four-point 16-bit data sequence are a = {20,20,20,20} and b = {20,20,20,20}, and the convolved output is y = {400,800,1200,1600,1200,800,400}, as depicted in Figure 6.
The sequences a = {15,15,15,15,15,15,15,15} and b = {15,15,15,15,15,15,15,15} are given as inputs for an eight-point linear convolution, and the convolved output y = {225,450,675,900,1125,1350,1575,1800,1575,1350,1125,900,675,450,225} is shown in Figure 7.
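Both test vectors use constant sequences, so the expected outputs can be cross-checked with a closed form: when two N-point sequences of the same constant c are convolved, output sample n equals c squared times the number of overlapping samples, min(n + 1, 2N - 1 - n). The Python sketch below encodes that observation; the function name is illustrative.

```python
from typing import List


def constant_sequence_convolution(value: int, points: int) -> List[int]:
    """Convolution of two identical constant sequences of `points` samples."""
    if points < 1:
        raise ValueError("points must be at least 1")
    length = 2 * points - 1
    # Overlap grows by one sample per step, peaks at n = points - 1,
    # then shrinks symmetrically.
    return [value * value * min(n + 1, length - n) for n in range(length)]


if __name__ == "__main__":
    print(constant_sequence_convolution(20, 4))
    # [400, 800, 1200, 1600, 1200, 800, 400]
    print(constant_sequence_convolution(15, 8))
```

The triangular shape also gives the largest output value, c squared times N (1600 and 1800 here), which is useful when checking that the chosen output word length cannot overflow.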
(i) 4-point 16-bit sequence
Figure 6. Simulation Results for 4-Point 16-bit Sequence
(ii) 8-point 16-bit sequence
Figure 7. Simulation Results for 8-Point 16-bit Sequence
B. SYNTHESIZED RESULTS
The HDL code for four- and eight-point linear convolution is synthesized using Xilinx synthesis technology targeting the Spartan 6 XC6SLX45-2CSG324 device [11], and also using Quartus II 12.0 SP2 synthesis technology targeting the Altera Cyclone II EP2C70F896I8 device.
The parameters observed from the synthesized results are shown in the following tables.
TABLE 3. ANALYSIS OF FOUR POINT SEQUENCE
(Xilinx: Spartan 6 XC6SLX45-2CSG324; Altera: Cyclone II EP2C70F896I8)

No. of bits | Xilinx slice LUTs | Xilinx required delay (ns) | Xilinx logic delay (ns) | Xilinx route delay (ns) | Altera logic elements | Altera required delay (ns)
4 | 476 | 6.156 | 0.430 (7.0%) | 5.726 (93.0%) | 683 | 26.982
8 | 2106 | 12.174 | 0.860 (7.1%) | 11.314 (92.9%) | 3061 | 48.106
16 | 8853 | 23.081 | 1.634 (7.1%) | 21.447 (92.9%) | 12829 | 87.601

Table 3 gives the detailed analysis of four-point linear convolution for 4-bit, 8-bit, and 16-bit input sequences. When synthesized for the Xilinx Spartan 6 XC6SLX45-2CSG324 device, the notable trend in four-point linear convolution is that the delay approximately doubles when the input bit width is doubled. When synthesized for the Altera Cyclone II EP2C70F896I8 device, the number of logic elements approximately quadruples when the input bit width is doubled.

TABLE 4. ANALYSIS OF EIGHT POINT SEQUENCE
(Xilinx: Spartan 6 XC6SLX45-2CSG324; Altera: Cyclone II EP2C70F896I8)

No. of bits | Xilinx slice LUTs | Xilinx required delay (ns) | Xilinx logic delay (ns) | Xilinx route delay (ns) | Altera logic elements | Altera required delay (ns)
4 | 2031 | 7.145 | 0.516 (7.2%) | 6.629 (92.8%) | 2924 | 32.273
8 | 8706 | 13.096 | 0.946 (7.2%) | 12.150 (92.8%) | 12740 | 57.891
16 | 36128 | 26.245 | 1.978 (7.5%) | 24.267 (92.5%) | 52454 | 99.212

Table 4 gives the analysis of eight-point linear convolution for 4-bit, 8-bit, and 16-bit inputs. It shows a small increase in delay compared to the four-point sequence because of the increase in the number of input points. When synthesized for the Altera Cyclone II EP2C70F896I8 device, the number of logic elements increases approximately four times compared to the four-point sequence.
Figures 8 and 9 give area and delay comparisons for 4-point and 8-point linear convolution on Xilinx and Altera FPGAs, and Figures 10 and 11 give the required numbers of half adders and full adders.
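The scaling trends read from Tables 3 and 4 can be checked numerically. The Python sketch below uses the Xilinx slice LUT counts and required delays copied from the two tables and computes the ratio between each bit width and the next; the data layout and the helper name doubling_ratios are assumptions made for this illustration.

```python
# Xilinx Spartan 6 results: bits -> (slice LUTs, required delay in ns).
FOUR_POINT = {4: (476, 6.156), 8: (2106, 12.174), 16: (8853, 23.081)}
EIGHT_POINT = {4: (2031, 7.145), 8: (8706, 13.096), 16: (36128, 26.245)}


def doubling_ratios(table):
    """Return (narrow, wide, area ratio, delay ratio) for each width doubling."""
    widths = sorted(table)
    ratios = []
    for narrow, wide in zip(widths, widths[1:]):
        luts_narrow, delay_narrow = table[narrow]
        luts_wide, delay_wide = table[wide]
        ratios.append((narrow, wide, luts_wide / luts_narrow, delay_wide / delay_narrow))
    return ratios


if __name__ == "__main__":
    for name, table in (("4-point", FOUR_POINT), ("8-point", EIGHT_POINT)):
        for narrow, wide, area, delay in doubling_ratios(table):
            print(f"{name} {narrow}->{wide} bits: area x{area:.2f}, delay x{delay:.2f}")
```

On these figures every doubling of the bit width multiplies the LUT count by a little over four and the delay by roughly two, which is consistent with a multiplier-dominated design whose area grows with the square of the operand width.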
Figure 8. Graphical Representation of Area Utilization of Various Input Points for Linear Convolution
[Plot: Area Analysis in Xilinx FPGA; x-axis: Number of Bits; y-axis: Number of LUTs; series: 4 Point, 8 Point]
[Plot: Area Analysis in Altera FPGA; x-axis: Number of Bits; y-axis: Number of Logic Elements; series: 4 Point, 8 Point]
Figure 9. Graphical Representation of Delay for Various Input Points for Linear Convolution
[Plot: Delay Analysis in Xilinx FPGA; x-axis: Number of Bits; y-axis: Delay (ns); series: 4 Point, 8 Point]
[Plot: Delay Analysis in Altera FPGA; x-axis: Number of Bits; y-axis: Delay (ns); series: 4 Point, 8 Point]
Figure 10. Graphical Representation of Number of Half Adders
Figure 11. Graphical Representation of Number of Full Adders
C. IMPLEMENTATION RESULTS
The Xilinx tool facilitates the verification of a logic circuit implemented on the Spartan 6 board using the ChipScope Pro Analyzer instead of an external logic analyzer [20]. It is capable of capturing signals at a clock frequency of 100 MHz. The ILA core is inserted by taking advantage of the integration flow between the Project Navigator and ChipScope Pro Core Inserter tools. The Virtual Input/Output (VIO) core is a customizable core that can both monitor and drive internal FPGA signals in real time. The obtained waveform and a snapshot of the eight-point linear convolution are shown in Figure 12.
Figure 12. FPGA Implementation and Verification Platform
VI. CONCLUSION
A comparison has been made between the array and Vedic multipliers. The Vedic multiplier is superior in terms of speed when compared to the array structure. Hence, a linear convolution architecture based on the Vedic multiplier, used to enhance the speed of the system, was analyzed for various numbers of points and different input bit widths. Our analysis shows that the hardware requirement for linear convolution increases approximately four times when the input bit width is doubled, while the propagation delay approximately doubles. The system has been successfully implemented in hardware using a Xilinx Spartan 6 XC6SLX45-2CSG324 Field-Programmable Gate Array (FPGA). Finally, the output waveforms from the FPGA were displayed on the ChipScope Pro logic analyzer for real-time verification.
REFERENCES
[1] Steven W. Smith, “The Scientist and Engineer's Guide to Digital Signal Processing”, California Technical Publishing, 1999.
[2] Rashmi Lomte and Bhaskar P.C., “High Speed Convolution and Deconvolution using Urdhva Triyagbhyam”, 2011 IEEE Computer Society Annual Symposium on VLSI, p. 323, July 2011.
[3] John W. Pierre, “A Novel Method for Calculating the Convolution Sum of Two Finite Length Sequences”, IEEE Transactions on Education, Vol. 39, No. 1, 1996.
[4] Jagadguru Swami Sri Bharati Krsna Tirthji Maharaja, “Vedic Mathematics”, Motilal Banarsidas, Varanasi, India, 1986.
[5] C. Ganesh Kumar, V. Chanishma, “Design of High speed Vedic multiplication using Vedic Mathematics Techniques”, International Journal of Scientific and Research Publications, Volume 2, Issue 3, 2012, pp. 1-5.
[6] Ronak Bajaj, Saransh Chhabra and M. B. Srinivas, “A Novel, Low-Power Array Multiplier Architecture”, ISCIT, 2009, pp. 119-123.
[7] S. Shamim Akhter, “VHDL implementation of fast NXN multiplier based on Vedic mathematics”, ECCTD, 2007, pp. 472-475.
[8] Thapliyal H. and Srinivas M.B., “High Speed Efficient N x N Bit Parallel Hierarchical Overlay Multiplier architecture Based on Ancient Indian Vedic Mathematics”, Transactions on Engineering, Computing and Technology, 2004, Vol. 2.
[9] J. G. Proakis and D. G. Manolakis, “Digital Signal Processing”, Macmillan, 1988.
[10] B. Parhami, “Computer Arithmetic Algorithms and Hardware Designs”, Oxford University Press, 2000.
[11] https://www.xilinx.com/support.html#documentation
[12] Asmita Haveliya, “FPGA Implementation of a Vedic Convolution Algorithm”, International Journal of Engineering Research and Applications (IJERA), Volume 2, Issue 1, 2012, pp. 678-684.
[13] Jubin Hazra et al., “An Efficient Hardware Implementation of convolution architecture using Vedic Mathematics”, International Journal of computer engineering and computer applications, Volume 09, Issue 1, 2011, pp. 14-25.
[14] Rashmi Rahul Kulkarni, “Parallel Hardware Implementation of Convolution Using Vedic Mathematics”, IOSR Journal of VLSI and Signal Processing (IOSR JVSP), Volume 1, Issue 4, 2012, pp. 21-26.
[15] C. S. Wallace, “A Suggestion for a Fast Multiplier”, IEEE Trans. Electronic Computers, 1964, Vol. 13, No. 1, pp. 14-17.
[16] L. Dadda, “Some Schemes for Parallel Multipliers”, Alta Frequenza, 1965, Vol. 34, pp. 349-356.
[17] Keshab K. Parhi, “VLSI Digital Signal Processing Systems-Design and Implementation”, Wiley-India, 2007.
[18] H. Guild, “Fully Iterative Fast Array for Binary Multiplication”, Electronics Letters, 1969, Vol. 5, p. 263.
[19] R. Baugh and B. A. Wooley, “A Two's Complement Parallel Array Multiplication Algorithm”, IEEE Trans. Computers, 1973, Vol. 22, No. 12, pp. 1045-1059.
[20] https://www.xilinx.com/products/design-tools/chipscopepro.html