INVESTIGATION AND VLSI IMPLEMENTATION OF LINEAR CONVOLUTION … · 2018-07-15 · The most widely used types of convolution are linear and circular. Linear convolution is preferred


ABSTRACT Convolution is a mathematical operation in signal processing that is used to predict the response of a system from a given impulse response. The topic receives attention because it has applications in fields such as Digital Signal Processing (DSP), Digital Image Processing, Linear Acoustics, and Statistics. A high speed DSP system is therefore required to perform the convolution computation effectively. In this paper, a detailed analysis and implementation of linear convolution is carried out, in which a Vedic multiplication architecture is used to enhance the computational speed of the convolution operation. The architecture was simulated using ISim and synthesized using Xilinx synthesis technology. The functional block has been successfully implemented in hardware using a Xilinx Spartan 6 XC6SLX45-2CSG324 Field-Programmable Gate Array (FPGA). Finally, the output waveforms from the FPGA were displayed on the ChipScope VIO console for logic analysis and real-time verification.

Keywords: Linear Convolution, DSP, Vedic Multiplication, VHDL, Signal Processing, FFT.

INVESTIGATION AND VLSI IMPLEMENTATION OF LINEAR

CONVOLUTION ARCHITECTURE FOR FPGA BASED SIGNAL

PROCESSING APPLICATIONS

S. Elango1, P. Sampath2, K. Shoukath Ali3, Sajan P Philip4, A. Daniel Raj5

1Assistant Professor, Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode-638-401, Tamilnadu, India.

2Professor, Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode-638-401, Tamilnadu, India.

Institute of Technology, Sathyamangalam, Erode-638-401, Tamilnadu, India.






International Journal of Pure and Applied Mathematics, Volume 119 No. 16 2018, 4607-4624. ISSN: 1314-3395 (on-line version). url: http://www.acadpubl.eu/hub/. Special Issue.


I. INTRODUCTION

Convolution is a mathematical way of combining two signals to obtain a new signal. It is the single most important technique in DSP. Three signals are involved in a convolution: the input signal (x[n]), the output signal (y[n]), and the impulse response (h[n]) [1]. Convolution is significant in Fourier theory and in the analysis of linear systems, and it is fundamental to many common image processing operators. In the discrete case, convolution is defined in terms of the impulse function. In a linear time-invariant system, the output is the convolution of the input signal with the impulse response, and the operation obeys the commutative property of algebra [2],[3]. Convolution provides a way of "multiplying together" two arrays of numbers of different sizes but the same dimensionality to produce a third array of the same dimensionality.

Consider two sequences m(x) and h(x), where m(x) is the input sequence and h(x) is the impulse response. The output response of the system, g(x), is computed as follows:

g(x) = m(x) * h(x) (1)

where * represents convolution.

The most widely used types of convolution are linear and circular. Linear convolution is preferred for finite-length sequences of any length (for implementation, only positive-length sequences are considered). An FPGA design is preferred because it offers very fast inputs and outputs (I/Os) and bidirectional data buses, which are used to verify the timing of valid data. Linear convolution depends chiefly on multiplication, which is performed using a multiplier.
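The distinction between the two types can be illustrated with a short sketch. The function name `circular_convolution` and the choice of Python are assumptions made for illustration only; the paper itself targets VHDL. The sketch shows that an N-point circular convolution reproduces the linear result only when N is at least l + m - 1; a smaller N wraps the tail of the output back onto its start.

```python
def circular_convolution(x, h, n_points):
    """N-point circular convolution: output indices wrap modulo n_points."""
    y = [0] * n_points
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[(i + j) % n_points] += xi * hj
    return y


x = [1, 2, 3, 4]
h = [1, 2, 3, 4]

# With n_points = len(x) + len(h) - 1 nothing wraps, so this is the linear result.
print(circular_convolution(x, h, 7))   # [1, 4, 10, 20, 25, 24, 16]

# With n_points = 4 the last three linear outputs fold onto the first three.
print(circular_convolution(x, h, 4))   # [26, 28, 26, 20]
```

The first printed sequence matches the four-point example given later in equation (4).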

The rest of this paper is structured as follows: Section II provides a literature survey of multipliers and linear convolution. Section III describes the architecture of the Vedic multiplier. Section IV describes the high speed architecture of linear convolution. Simulation, synthesis, and implementation results are discussed in Section V. Conclusions are drawn in Section VI.

II. LITERATURE SURVEY

The basic form of binary multiplier is the array multiplier [18], which computes a set of partial products and then sums them to obtain the final output. Baugh and Wooley reported an array-architecture-based signed multiplier [19] using the concept of 2's complement. The Partial Product (PP) generation of the Wallace tree multiplier [15] is similar to that of the array multiplier; the difference lies in the PP addition. An n-bit multiplier has n rows of PPs; the rows are combined in floor(n/3) groups of three (3*floor(n/3) rows in total), and the remaining n mod 3 rows are passed to the next stage. Dadda [16] made a further improvement in PP addition, but the result is less regular than the Wallace multiplier, as mentioned in [17]. Vedic Mathematics is an ancient technique that allows efficient implementation of arithmetic rules for high speed computations, which can be applied to various branches of engineering [8]. Of the 16 Vedic sutras available for mathematical calculation, only two support multiplication, namely the Urdhva Tiryakbhyam (UT) Sutra and the Nikhilam Sutra, for unsigned and signed input formats respectively [4].

The remainder of this survey covers linear convolution based on Vedic sutras. The reason for choosing the Vedic multiplier is illustrated in Table 1. Pierre [3] discussed the calculation of the convolution of two finite-length sequences. Jubin Hazra et al. [13] used Nikhilam Sutra-based convolution; their work concentrated on reducing leakage current by using a multiple-channel CMOS technique. Asmita Haveliya [12] implemented convolution based on Vedic Mathematics. In [14], Rashmi Rahul Cull Carni provided a parallel architecture for the Overlap-Add (OLA) and Overlap-Save (OLS) methods in order to compute the convolution of long sequences based on a Vedic multiplier. These authors concentrated on hardware implementation but did not carry out a detailed analysis for an increased number of bits. This paper therefore focuses on a detailed analysis of four and eight points with different bit widths, specifically for FPGA-based DSP applications. Since only a few studies have been reported in the area of linear convolution, it is implemented in this paper using Vedic multiplication.
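The Wallace row grouping described at the start of this section can be traced numerically. The following is a minimal sketch, assuming each group of three partial-product rows is compressed into two rows (a sum row and a carry row) per stage; the function name `wallace_stages` is hypothetical.

```python
def wallace_stages(n_rows):
    """Return the partial-product row count before and after each reduction stage."""
    history = [n_rows]
    while n_rows > 2:
        groups, leftover = divmod(n_rows, 3)
        # Each group of three rows becomes two rows; leftover rows pass through.
        n_rows = 2 * groups + leftover
        history.append(n_rows)
    return history


for bits in (8, 16, 32):
    rows = wallace_stages(bits)
    print(bits, rows, "stages:", len(rows) - 1)
```

Under this assumption an 8-bit multiplier needs four reduction stages (8, 6, 4, 3, 2 rows), a 16-bit multiplier six, and a 32-bit multiplier eight, before the final two rows go to a carry-propagate adder.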

III. VEDIC MULTIPLICATION

A. Urdhva Tiryakbhyam (UT) Sutra

The UT sutra is a multiplication method that is applicable to all cases of multiplication. It is also known as "Vertically and Crosswise" [7]. A multiplier based on this sutra has the advantage that, as the number of bits increases, gate delay and area increase slowly compared with other conventional multipliers.
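As an illustration of "Vertically and Crosswise", the sketch below forms the product column by column: every bit pair whose positions add up to the same column index is ANDed and summed together with the carry from the previous column. This is a behavioural Python model under the usual textbook reading of the sutra, not the authors' VHDL; the name `ut_multiply` is hypothetical.

```python
def ut_multiply(a, b, n_bits):
    """Unsigned n_bits x n_bits multiplication, one product column at a time."""
    a_bits = [(a >> i) & 1 for i in range(n_bits)]
    b_bits = [(b >> i) & 1 for i in range(n_bits)]
    product, carry = 0, 0
    for col in range(2 * n_bits - 1):
        col_sum = carry
        # Crosswise step: all bit pairs (i, j) with i + j == col.
        for i in range(n_bits):
            j = col - i
            if 0 <= j < n_bits:
                col_sum += a_bits[i] & b_bits[j]
        product |= (col_sum & 1) << col   # low bit stays in this column
        carry = col_sum >> 1              # the rest moves to the next column
    product |= carry << (2 * n_bits - 1)  # final carry forms the top bit
    return product


print(ut_multiply(13, 11, 4))  # 143
```

All column sums depend only on the input bits, so in hardware they can be generated in parallel; only the carry chain is sequential.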

B. 8-BIT VEDIC MULTIPLIER

The 4x4 Vedic multiplier is the basic building block for the design of an 8x8 Vedic multiplier, as shown in Figure 1 [8]. Consider a pair of 8-bit numbers a = a7a6a5a4a3a2a1a0 and b = b7b6b5b4b3b2b1b0. The resulting 16-bit product, S(15 down to 0), is obtained after partial product addition using ripple carry adders [10].



Figure 1. Block Diagram of 8-bit Vedic Multiplier

C. 16-BIT VEDIC MULTIPLIER

The 8x8 Vedic multiplier is the basic building block for the design of a 16x16 Vedic multiplier, as shown in Figure 2. Consider a pair of 16-bit numbers a = a15a14a13a12a11a10a9a8a7a6a5a4a3a2a1a0 and b = b15b14b13b12b11b10b9b8b7b6b5b4b3b2b1b0. The resulting 32-bit product, S(31 down to 0), is obtained after partial product addition using ripple carry adders [10].



Figure 2. Block Diagram of 16-bit Vedic Multiplier
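The hierarchical construction in subsections B and C can be sketched as a recursion in which each N x N multiplier is assembled from four N/2 x N/2 multipliers whose outputs are shifted and added. The sketch is behavioural only: the name `vedic_multiply` is hypothetical, Python's `+` stands in for the ripple carry adders, and continuing the recursion down to a 1-bit AND gate is an assumption made to keep the example self-contained (the paper states only that the 4x4 block builds the 8x8 and the 8x8 builds the 16x16).

```python
def vedic_multiply(a, b, n_bits):
    """Unsigned n_bits x n_bits product built from four half-width products."""
    if n_bits == 1:
        return a & b                      # assumed base case: a single AND gate
    half = n_bits // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half

    ll = vedic_multiply(a_lo, b_lo, half)  # low  x low
    lh = vedic_multiply(a_lo, b_hi, half)  # low  x high
    hl = vedic_multiply(a_hi, b_lo, half)  # high x low
    hh = vedic_multiply(a_hi, b_hi, half)  # high x high

    # Partial product addition: the cross terms carry weight 2**half,
    # the high term carries weight 2**n_bits.
    return ll + ((lh + hl) << half) + (hh << n_bits)


print(vedic_multiply(255, 255, 8))       # 65025, fits in S(15 down to 0)
print(vedic_multiply(65535, 65535, 16))  # 4294836225, fits in S(31 down to 0)
```

The four half-width products are independent of one another, which is what allows them to be computed in parallel in hardware.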

IV. HIGH SPEED LINEAR CONVOLUTION ARCHITECTURE

The Vedic multiplier (Urdhva Tiryakbhyam Sutra) [4],[5] meets the requirement of high speed multiplication when compared with other conventional multipliers [6], as depicted in Table 1. Figure 3 shows the linear convolution of two sequences of lengths n1 and n2, namely x(n) = {a0, a1, ..., an} and h(n) = {b0, b1, ..., bn}. The output sequence is y(n) = {y0, y1, ..., yn}. The length of the output sequence is l = n1 + n2 - 1.



TABLE 1. COMPARISON OF ARRAY MULTIPLIER WITH VEDIC MULTIPLIER

                   Xilinx Spartan 6 XC6SLX45-2CSG324                                                       Altera Cyclone II EP2C70F896I8
Multiplier  Bits   No. of Slice LUTs   Required Delay (ns)                                                 No. of Logic Elements   Required Delay (ns)
Array       8      112                 9.673 (0.688 logic, 8.985 route) (7.1% logic, 92.9% route)          156                     37.698
Array       32     1555                38.223 (2.666 logic, 35.557 route) (7.0% logic, 93.0% route)        2957                    146.943
Vedic       8      120                 (7.1% logic, 92.9% route)                                           176                     35.236
Vedic       32     2241                35.350 (2.537 logic, 32.813 route) (7.2% logic, 92.8% route)        3218                    130.425

TABLE 2. VEDIC MULTIPLIER

No. of Bits   No. of Slice LUTs   Required Delay (ns)                                                 No. of Logic Elements   Required Delay (ns)
4             24                  (6.9% logic, 93.1% route)                                           32                      19.338
8             120                 (7.1% logic, 92.9% route)                                           176                     35.236
16            533                 18.034 (1.290 logic, 16.744 route) (7.2% logic, 92.8% route)        758                     69.328
32            2241                35.350 (2.537 logic, 32.813 route) (7.2% logic, 92.8% route)        3218                    130.425



The basic steps involved in performing linear convolution are as follows [9]:

Step 1: Consider two sequences x(n) and h(n) of lengths l and m respectively.

Step 2: The total length of the output sequence is computed as n = l + m - 1.

Step 3: The output sequence y(n) is given by

$$y(n) = \sum_{k=-\infty}^{\infty} x(k)\, h(n-k) \qquad (2)$$

Step 4: Multiply the two sequences x(k) and h(n-k) element by element and sum the products to get y(n).

Step 5: Increment the index n, shift the sequence h(n-k) to the right by one sample, and perform Step 4.

Step 6: Repeat Step 5 until the sum of products is zero for all the remaining values of n.

Figure 3. General Diagram for n-point Linear Convolution

y(n) = {a0b0, a1b0 + a0b1, ..., anbn} (3)
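Steps 1 to 6 and equation (2) can be written out directly. The following is a minimal software sketch, not the hardware architecture: the function name `linear_convolution` is hypothetical, and the infinite sum is restricted to the indices where both finite sequences are defined.

```python
def linear_convolution(x, h):
    """Evaluate y(n) = sum over k of x(k) * h(n - k) for finite sequences."""
    l, m = len(x), len(h)
    n_out = l + m - 1                  # Step 2: output length
    y = []
    for n in range(n_out):             # Steps 5 and 6: advance n over the output
        acc = 0
        for k in range(l):             # Step 4: multiply and accumulate
            if 0 <= n - k < m:         # terms outside h contribute zero
                acc += x[k] * h[n - k]
        y.append(acc)
    return y


print(linear_convolution([1, 2], [3, 4]))  # [3, 10, 8]
```

For x = {a0, a1} and h = {b0, b1} this yields {a0b0, a1b0 + a0b1, a1b1}, which is the pattern of equation (3).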

Consider an example of four-point linear convolution in which each point is 4 bits wide, as shown in Figure 4, where x(n) = {1,2,3,4} and h(n) = {1,2,3,4}.

Length of x(n) = 4

Length of h(n) = 4

Length of y(n) = n = 4 + 4 - 1 = 7



Figure 4. Linear Convolution of Four-point Sequences

y(n) = {1,4,10,20,25,24,16} (4)

The efficient Vedic multiplier (VM) is used as one of the modules to calculate the linear convolution of the two given sequences. The architecture of the four-point 4-bit linear convolution is shown in Figure 5.



Figure 5. Block Diagram for Four Point 4-bit Linear Convolution

Linear convolution was also computed for 4-bit, 8-bit, and 16-bit inputs with four and eight points, and correct results were obtained.
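A behavioural view of such an architecture is a grid of multipliers followed by adders. The sketch below assumes one multiplier per pair of input points (Python's `*` standing in for the Vedic multiplier module) with the products summed along each anti-diagonal of the grid; the helper names and the worst-case width estimate are illustrative assumptions, not figures taken from the paper.

```python
def convolve_with_product_grid(x, h):
    """Return the grid of pairwise products and the convolved output."""
    grid = [[xi * hj for hj in h] for xi in x]   # len(x) * len(h) multipliers
    y = [0] * (len(x) + len(h) - 1)
    for i, row in enumerate(grid):
        for j, product in enumerate(row):
            y[i + j] += product                  # adders along each anti-diagonal
    return grid, y


def output_width(points, bits):
    """Bits needed for the largest possible output sample (unsigned inputs)."""
    worst_case = points * ((1 << bits) - 1) ** 2
    return worst_case.bit_length()


grid, y = convolve_with_product_grid([1, 2, 3, 4], [1, 2, 3, 4])
print(sum(len(row) for row in grid))  # 16 multipliers for four points
print(y)                              # [1, 4, 10, 20, 25, 24, 16], as in equation (4)
print(output_width(4, 4))             # 10
```

Under these assumptions the multiplier count grows with the square of the number of points (16 for four points, 64 for eight), and the centre output sample sets the adder width.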

V. RESULTS AND DISCUSSION

A. SIMULATION RESULTS

The simulation of linear convolution for various points (four, eight, and ten) is performed in the ISim simulator of the Xilinx ISE 14.2 design environment. The input for a four-point 16-bit data sequence is a = {20,20,20,20} and b = {20,20,20,20}, and the convolved output is y = {400,800,1200,1600,1200,800,400}, as depicted in Figure 6.

For the eight-point linear convolution, the input sequences are a = {15,15,15,15,15,15,15,15} and b = {15,15,15,15,15,15,15,15}, and the convolved output y = {225,450,675,900,1125,1350,1575,1800,1575,1350,1125,900,675,450,225} is shown in Figure 7.
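Both test vectors are constant sequences, so the expected output is a triangular ramp that can be checked in closed form: output sample n is the squared constant multiplied by the number of overlapping points, min(n + 1, 2N - 1 - n). The following sketch reproduces the two reported outputs; the function name `constant_input_response` is hypothetical.

```python
def constant_input_response(value, points):
    """Expected linear convolution of two constant sequences of equal length."""
    length = 2 * points - 1
    # Sample n overlaps min(n + 1, length - n) pairs, each contributing value**2.
    return [value * value * min(n + 1, length - n) for n in range(length)]


print(constant_input_response(20, 4))
# [400, 800, 1200, 1600, 1200, 800, 400]

print(constant_input_response(15, 8))
# [225, 450, 675, 900, 1125, 1350, 1575, 1800, 1575, 1350, 1125, 900, 675, 450, 225]
```

A symmetric ramp of this kind is a convenient check on a waveform viewer, since any misplaced product shows up as a break in the pattern.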



(i) 4-point 16-bit sequence

Figure 6. Simulation Results for 4 Point 16-bit Sequence

(ii) 8-point 16-bit sequence

Figure 7. Simulation Results for 8 Point 16-bit Sequence

B. SYNTHESIZED RESULTS

The HDL code for four- and eight-point linear convolution is synthesized using Xilinx synthesis technology targeting the Spartan 6 XC6SLX45-2CSG324 device [11], and is also synthesized using Quartus II 12.0 SP2 synthesis technology targeting the Altera Cyclone II EP2C70F896I8 device.

The parameters observed from the synthesized results are shown in the following tables.



TABLE 3 ANALYSIS OF FOUR POINT SEQUENCE

Table 3 gives the detailed analysis of four-point linear convolution for input bit widths of 4, 8, and 16 bits. When synthesized for the Xilinx Spartan 6 XC6SLX45-2CSG324 device, the notable trend is that the delay approximately doubles when the input bit width is doubled. When synthesized for the Altera Cyclone II EP2C70F896I8 device, the number of logic elements approximately quadruples when the input bit width is doubled.
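These scaling trends can be checked against the four-point figures reported in this paper. The sketch below is an illustrative calculation only: the tuple layout and names are assumptions, and the 16-bit Xilinx delay is left as `None` because its total is not legible in the source table.

```python
# Four-point rows as reported: bits -> (LUTs, Xilinx delay ns, LEs, Altera delay ns).
FOUR_POINT = {
    4: (476, 6.156, 683, 26.982),
    8: (2106, 12.174, 3061, 48.106),
    16: (8853, None, 12829, 87.601),   # None: value not legible in the source
}
COLUMNS = ("Xilinx LUTs", "Xilinx delay (ns)", "Altera LEs", "Altera delay (ns)")


def doubling_ratios(table):
    """Ratio of each metric when the bit width doubles; unknown entries are skipped."""
    ratios = {}
    for bits in sorted(table):
        if 2 * bits not in table:
            continue
        for name, low, high in zip(COLUMNS, table[bits], table[2 * bits]):
            if low is not None and high is not None:
                ratios[(name, bits, 2 * bits)] = high / low
    return ratios


for key, ratio in doubling_ratios(FOUR_POINT).items():
    print(key, round(ratio, 2))
```

On these figures, the LUT and logic element counts grow by a factor of roughly 4.2 to 4.5 per doubling of the bit width and the Xilinx delay grows by about 1.98 from 4 to 8 bits, while the Altera delay grows by about 1.8 per doubling, somewhat less than a full doubling.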

TABLE 4 ANALYSIS OF EIGHT POINT SEQUENCE

              Xilinx Spartan 6 XC6SLX45-2CSG324                                                       Altera Cyclone II EP2C70F896I8
No. of Bits   No. of Slice LUTs   Required Delay (ns)                                                 No. of Logic Elements   Required Delay (ns)
4             2031                7.145 (0.516 logic, 6.629 route) (7.2% logic, 92.8% route)          2924                    32.273
8             8706                13.096 (0.946 logic, 12.150 route) (7.2% logic, 92.8% route)        12740                   57.891
16            36128               26.245 (1.978 logic, 24.267 route) (7.5% logic, 92.5% route)        52454                   99.212

Table 4 gives the analysis of eight-point linear convolution for input bit widths of 4, 8, and 16 bits. It shows a small increase in delay compared with the four-point sequence because of the larger number of input points. When synthesized for the Altera Cyclone II EP2C70F896I8 device, the number of logic elements increases approximately four times compared with the four-point sequence.
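The comparison between the two point sizes can be made explicit by dividing each eight-point figure by its four-point counterpart. The sketch below uses the values reported in this paper; the data layout and names are assumptions, the missing 16-bit four-point Xilinx delay is represented by `None`, and the multiplier counts assume one multiplier per pair of input points.

```python
# bits -> (LUTs, Xilinx delay ns, LEs, Altera delay ns), as reported in this paper.
FOUR_POINT = {4: (476, 6.156, 683, 26.982),
              8: (2106, 12.174, 3061, 48.106),
              16: (8853, None, 12829, 87.601)}   # None: not legible in the source
EIGHT_POINT = {4: (2031, 7.145, 2924, 32.273),
               8: (8706, 13.096, 12740, 57.891),
               16: (36128, 26.245, 52454, 99.212)}
COLUMNS = ("Xilinx LUTs", "Xilinx delay", "Altera LEs", "Altera delay")


def point_ratios(bits):
    """Eight-point value divided by four-point value for each metric."""
    ratios = {}
    for name, four, eight in zip(COLUMNS, FOUR_POINT[bits], EIGHT_POINT[bits]):
        ratios[name] = None if four is None or eight is None else eight / four
    return ratios


for bits in (4, 8, 16):
    print(bits, point_ratios(bits))

# Assumed multiplier counts: points squared.
print(4 ** 2, 8 ** 2)  # 16 64
```

The area ratios fall between about 4.1 and 4.3, in line with the fourfold growth in the assumed number of multipliers, while the delay ratios stay between about 1.08 and 1.20.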

Figures 8 and 9 give area and delay comparisons for 4-point and 8-point linear convolution sequences on Xilinx and Altera FPGAs, and Figures 10 and 11 give the required numbers of half adders and full adders, respectively.

TABLE 3 data (four point sequence)

              Xilinx Spartan 6 XC6SLX45-2CSG324                                                       Altera Cyclone II EP2C70F896I8
No. of Bits   No. of Slice LUTs   Required Delay (ns)                                                 No. of Logic Elements   Required Delay (ns)
4             476                 6.156 (0.430 logic, 5.726 route) (7.0% logic, 93.0% route)          683                     26.982
8             2106                12.174 (0.860 logic, 11.314 route) (7.1% logic, 92.9% route)        3061                    48.106
16            8853                (7.1% logic, 92.9% route)                                           12829                   87.601



Figure 8. Graphical Representation of Area Utilization of Various Input Points for Linear Convolution

[Plots: "Area Analysis in Xilinx FPGA" (number of LUTs versus number of bits) and "Area Analysis in Altera FPGA" (number of logic elements versus number of bits); series: 4 Point and 8 Point.]



Figure 9. Graphical Representation of Delay for Various Input Points for Linear Convolution

[Plots: "Delay Analysis in Xilinx FPGA" and "Delay Analysis in Altera FPGA" (delay in ns versus number of bits); series: 4 Point and 8 Point.]



Figure 10. Graphical Representation of Number of Half Adders

Figure 11. Graphical Representation of Number of Full Adders



C. IMPLEMENTATION RESULTS

The Xilinx tool facilitates the implementation of a logic circuit using the ChipScope Pro Analyzer instead of an external logic analyzer on the Spartan 6 board [20]. It is capable of capturing signals at a clock frequency of 100 MHz. The ILA core is used by taking advantage of the integration flows between the Project Navigator and ChipScope Pro Core Inserter tools. The Virtual Input/Output (VIO) core is a customizable core that can both monitor and drive internal FPGA signals in real time. The obtained waveform and a snapshot of the eight-point linear convolution are shown in Figure 12.

Figure 12. FPGA Implementation and Verification Platform



VI. CONCLUSION

An analysis has been made between the array and Vedic multipliers. The Vedic multiplier is superior in terms of speed when compared with the array structure. Hence, a linear convolution architecture based on the Vedic multiplier was used to enhance the speed of the system and was analyzed for various numbers of points and different input bit widths. The analysis shows that the hardware requirement for linear convolution increases approximately four times when the input bit width is doubled, while the propagation delay approximately doubles. The system has been successfully implemented in hardware using a Xilinx Spartan 6 XC6SLX45-2CSG324 Field-Programmable Gate Array (FPGA). Finally, the output waveforms from the FPGA were displayed on the ChipScope Pro logic analyzer for real-time verification.

REFERENCES

[1] Steven W. Smith, "The Scientist and Engineer's Guide to Digital Signal Processing", California Technical Publishing, 1999.

[2] Rashmi Lomte and Bhaskar P.C., "High Speed Convolution and Deconvolution using Urdhva Triyagbhyam", 2011 IEEE Computer Society Annual Symposium on VLSI, p. 323, July 2011.

[3] John W. Pierre, "A Novel Method for Calculating the Convolution Sum of Two Finite Length Sequences", IEEE Transactions on Education, Vol. 39, No. 1, 1996.

[4] Jagadguru Swami Sri Bharati Krsna Tirthji Maharaja, "Vedic Mathematics", Motilal Banarsidas, Varanasi, India, 1986.

[5] C. Ganesh Kumar, V. Chanishma, "Design of High speed Vedic multiplication using Vedic Mathematics Techniques", International Journal of Scientific and Research Publications, Volume 2, Issue 3, 2012, pp. 1-5.

[6] Ronak Bajaj, Saransh Chhabra and M B Srinivas, "A Novel, Low-Power Array Multiplier Architecture", ISCIT, 2009, pp. 119-123.

[7] S. Shamim Akhter, "VHDL implementation of fast NXN multiplier based on Vedic mathematics", ECCTD, 2007, pp. 472-475.

[8] Thapliyal H. and Srinivas M.B., "High Speed Efficient N x N Bit Parallel Hierarchical Overlay Multiplier Architecture Based on Ancient Indian Vedic Mathematics", Transactions on Engineering, Computing and Technology, 2004, Vol. 2.

[9] J. G. Proakis and D. G. Manolakis, "Digital Signal Processing", Macmillan, 1988.

[10] B. Parhami, "Computer Arithmetic: Algorithms and Hardware Designs", Oxford University Press, 2000.

[11] https://www.xilinx.com/support.html#documentation

[12] Asmita Haveliya, "FPGA Implementation of a Vedic Convolution Algorithm", International Journal of Engineering Research and Applications (IJERA), Volume 2, Issue 1, 2012, pp. 678-684.

[13] Jubin Hazra et al., "An Efficient Hardware Implementation of convolution architecture using Vedic Mathematics", International Journal of computer engineering and computer applications, Volume 09, Issue 1, 2011, pp. 14-25.

[14] Rashmi Rahul Cull Carni, "Parallel Hardware Implementation of Convolution Using Vedic Mathematics", IOSR Journal of VLSI and Signal Processing (IOSR JVSP), Volume 1, Issue 4, 2012, pp. 21-26.

[15] C.S. Wallace, "A Suggestion for a Fast Multiplier", IEEE Trans. Electronic Computers, 1964, Vol. 13, No. 1, pp. 14-17.

[16] L. Dadda, "Some Schemes for Parallel Multipliers", Alta Frequenza, 1965, Vol. 34, pp. 349-356.

[17] Keshab K. Parhi, "VLSI Digital Signal Processing Systems: Design and Implementation", Wiley-India, 2007.

[18] H. Guild, "Fully Iterative Fast Array for Binary Multiplication", Electronics Letters, 1969, Vol. 5, p. 263.

[19] C. R. Baugh and B. A. Wooley, "A Two's Complement Parallel Array Multiplication Algorithm", IEEE Trans. Computers, 1973, Vol. 22, No. 12, pp. 1045-1059.

[20] https://www.xilinx.com/products/design-tools/chipscopepro.html

