AN EFFICIENT ARCHITECTURE FOR 3-D LIFTING-BASED DISCRETE WAVELET TRANSFORM

ABSTRACT

This paper proposes an improved lifting-based 3-D Discrete Wavelet Transform (DWT) VLSI architecture that uses the bi-orthogonal 9/7 filter. The whole architecture is optimized with an efficient pipelined and parallel design to increase speed and achieve higher hardware utilization. The DWT is based on a time-scale representation, which provides efficient multi-resolution analysis. The lifting-based DWT architecture has the advantages of lower computational complexity, transformation of signals with extension, and regular data flow, which make it suitable for VLSI implementation. It uses a cascade of three 1-D wavelet transforms along with a set of on-chip memory buffers between the stages. The DWT is increasingly used for image coding because it supports features such as progressive image transmission (by quality, by resolution), ease of compressed image manipulation, and region-of-interest coding. The DWT has traditionally been implemented by convolution. Such an implementation demands both a large number of computations and a large amount of storage, features that are not desirable for either high-speed or low-power applications. Recently, a lifting-based scheme that often requires far fewer computations has been proposed for the DWT. The main feature of the lifting-based DWT scheme is to break up the high-pass and low-pass filters into a sequence of upper and lower triangular matrices and convert the filter implementation into banded matrix multiplications. Such a scheme has several advantages, including “in-place” computation of the DWT, integer-to-integer wavelet transform (IWT), and symmetric forward and inverse transforms. Therefore, it comes as no surprise that lifting has been chosen in the upcoming JPEG2000 standard.

Keywords— Discrete wavelet transform, image compression, lifting, video, VLSI architecture.

I. INTRODUCTION

The fundamental idea behind wavelets is to analyze according to scale. Indeed, some researchers in the wavelet field feel that, by using wavelets, one is adopting a new perspective in processing data. Wavelets are functions that satisfy certain mathematical requirements and are used in representing data or other functions. Wavelet algorithms process data at different scales or resolutions. The Fourier Transform (FT), with its fast algorithms (FFT), is an important tool for the analysis and processing of many natural signals. However, the FT has certain limitations in characterizing natural signals that are non-stationary (e.g. speech). Although a time-varying, overlapping-window-based FT, namely the Short Time Fourier Transform (STFT), is well known for speech processing applications, the time-scale-based Wavelet Transform is a more powerful mathematical tool for non-stationary signals.

1.1 Introduction to Wavelet Transform:

The wavelet transform is computed separately for different segments of the time-domain signal at different frequencies. Multi-resolution analysis examines the signal at different frequencies with different resolutions: it is designed to give good time resolution and poor frequency resolution at high frequencies, and good frequency resolution and poor time resolution at low frequencies. This suits signals that have high-frequency components for short durations and low-frequency components for long durations, e.g. images and video frames.

1.1.1 Wavelet Definition A ‘wavelet’ is a small wave which has its energy concentrated in time. It has an oscillating wavelike characteristic but also has the ability to allow simultaneous time and frequency analysis and it is a suitable tool for transient, non-stationary or time-varying phenomena.

AN EFFICIENT ARCHITECTURE FOR 3-D

LIFTING-BASED DISCRETE WAVELET TRANSFORM

M. JANARDAN, M.TECH (DECS)

Dr. K ASHOK BABU Professor & HOD of (ECE) SRI INDU COLLEGE OF ENGG&TECHNOLOGY

(Affiliated to JNTU, Hyderabad) Ibrahimpatnam Hyderabad, Andhra Pradesh, India -501510

M Janardan et al, Int. J. Comp. Tech. Appl., Vol 2 (5), 1439-1458

IJCTA | SEPT-OCT 2011 Available [email protected]


ISSN:2229-6093

Figure 1.1 Representation of (a) a wave and (b) a wavelet

1.1.2 Wavelet Characteristics The difference between wave (sinusoids) and wavelet is shown in figure 1.1. Waves are smooth, predictable and everlasting, whereas wavelets are of limited duration, irregular and may be asymmetric. Waves are used as deterministic basis functions in Fourier analysis for the expansion of functions (signals), which are time-invariant, or stationary. The important characteristic of wavelets is that they can serve as deterministic or non-deterministic basis for generation and analysis of the most natural signals to provide better time-frequency representation, which is not possible with waves using conventional Fourier analysis.

1.1.3 Wavelet Analysis The wavelet analysis procedure is to adopt a wavelet prototype function, called an ‘analyzing wavelet’ or ‘mother wavelet’. Temporal analysis is performed with a contracted, high-frequency version of the prototype wavelet, while frequency analysis is performed with a dilated, low-frequency version of the same wavelet. The mathematical formulation of signal expansion using wavelets gives the Wavelet Transform (WT) pair, which is analogous to the Fourier Transform (FT) pair. The discrete-time, discrete-parameter version of the WT is termed the Discrete Wavelet Transform (DWT).

1.2 Types of Transforms :

1.2.1 Fourier Transform (FT) The Fourier transform is a well-known mathematical tool for transforming a time-domain signal to the frequency domain for efficient extraction of information, and it is reversible. For a signal x(t), the FT is given by

$$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt$$

The FT has a great ability to capture a signal’s frequency content as long as x(t) is composed of a few stationary components (e.g. sine waves). However, any abrupt change in time in a non-stationary signal x(t) is spread out over the whole frequency axis in X(f). Hence a time-domain signal sampled with a Dirac delta function is highly localized in time but spills over the entire frequency band, and vice versa. The limitation of the FT is that it cannot offer both time and frequency localization of a signal at the same time.

1.2.2 Short Time Fourier Transform (STFT) To overcome the limitations of the standard FT, Gabor introduced the initial concept of the Short Time Fourier Transform (STFT). The advantage of the STFT is that it uses an arbitrary but fixed-length window g(t) for analysis, over which the actual non-stationary signal is assumed to be approximately stationary. The STFT decomposes such a pseudo-stationary signal x(t) into a two-dimensional time-frequency representation S(τ, f) using the sliding window g(t) at different times τ. Thus the FT of the windowed signal x(t) g*(t-τ) yields the STFT as

$$S(\tau, f) = \int_{-\infty}^{\infty} x(t)\, g^{*}(t - \tau)\, e^{-j 2\pi f t}\, dt$$

1.2.3 Wavelet Transform (WT) The fixed-resolution limitation of the STFT can be resolved by letting the resolution vary in the time-frequency plane in order to obtain multi-resolution analysis. The Wavelet Transform (WT) in its continuous form (CWT) provides a flexible time-frequency window, which narrows when observing high-frequency phenomena and widens when analyzing low-frequency behavior. Thus time resolution becomes arbitrarily good at high frequencies, while frequency resolution becomes arbitrarily good at low frequencies. This kind of analysis is suitable for signals composed of high-frequency components with short duration and low-frequency components with long duration, which is often the case in practical situations.

1.3 Comparative Visualisation :

The time-frequency representation problem is illustrated in Figure 1.2, which gives a comprehensive visualization of the time-frequency resolution obtained for a given signal in various transform domains, together with the corresponding basis functions.




Figure 1.2 Comparative visualizations of time-frequency representation of an arbitrary non-stationary signal in various transform domains

1.4 Difference between Continuous Wavelet Transform and Discrete Wavelet Transform :

Wavelet transforms are classified into discrete wavelet transforms (DWTs) and continuous wavelet transforms (CWTs). Note that both the DWT and the CWT are continuous-time (analog) transforms: they can be used to represent continuous-time (analog) signals. CWTs operate over every possible scale and translation, whereas DWTs use a specific subset of scale and translation values, or representation grid. The wavelet transform is in fact an infinite set of transforms, depending on the merit function used for its computation. This is the main reason why the term "wavelet transform" is heard in very different situations and applications.

• Orthogonal wavelets are used to develop the discrete wavelet transform

• Non-orthogonal wavelets are used to develop the continuous wavelet transform

1.5 Applications of Discrete Wavelet Transform :

Generally, an approximation to the DWT is used for data compression if the signal is already sampled, and the CWT is used for signal analysis. Thus, the DWT approximation is commonly used in engineering and computer science, and the CWT in scientific research.

One use of wavelet approximation is in data compression. Like some other transforms, wavelet transforms can be used to transform data and then encode the transformed data, resulting in effective compression. For example, JPEG 2000 is an image compression standard that uses biorthogonal wavelets. A related use is smoothing or denoising data by thresholding the wavelet coefficients, also called wavelet shrinkage. By adaptively thresholding the wavelet coefficients that correspond to undesired frequency components, smoothing and/or denoising operations can be performed. Other applied fields that make use of wavelets include astronomy, acoustics, nuclear engineering, sub-band coding, signal and image processing, neurophysiology, music, magnetic resonance imaging, speech discrimination, optics, fractals, turbulence, earthquake prediction, radar, human vision, and pure mathematics applications such as solving partial differential equations.
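The thresholding idea behind wavelet shrinkage can be illustrated with a minimal Python sketch. It assumes soft thresholding with a single fixed threshold (an adaptive rule would choose the threshold per sub-band), and the function name and sample coefficients are hypothetical.

```python
import math

def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`; small ones become 0."""
    shrunk = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        # Keep the sign of the original coefficient, reduce its magnitude.
        shrunk.append(math.copysign(magnitude, c) if magnitude > 0 else 0.0)
    return shrunk

# Hypothetical detail coefficients: two large (signal) and two small (noise).
detail = [5.0, -0.3, 0.2, -4.0]
print(soft_threshold(detail, 1.0))  # [4.0, 0.0, 0.0, -3.0]
```

The denoised signal would then be obtained by applying the inverse transform to the thresholded coefficients.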

II. THEORETICAL FRAMEWORK

A VLSI architecture implementing the Discrete Wavelet Transform for the storage and retrieval of medical images is designed. Lossless compression is usually required in the medical imaging field. The word length required for lossless operation makes the implementation too expensive. Thus, there is a clear need for an architecture that implements the lossless DWT for medical images. The data path word length has been selected to satisfy the lossless accuracy criteria, leading to a high-speed implementation with small chip area. The DWT represents the signal in a dyadic sub-band decomposition. Generalization of the DWT to a wavelet packet allows sub-band analysis without the constraint of dyadic decomposition. The discrete wavelet packet transform (DWPT) performs an adaptive decomposition of the frequency axis. The specific decomposition is selected according to an optimization criterion.




The Discrete Wavelet Transform (DWT), based on time-scale representation, provides efficient multi-resolution sub-band decomposition of signals. It has become a powerful tool for signal processing and finds numerous applications in fields such as audio compression, pattern recognition, texture discrimination, and computer graphics. Specifically, the 2-D DWT plays a significant role in many image/video coding applications.

2.1 Types of compressions:

There are two types of compression:

1. Lossless compression: the decompressed data is an exact replica of the original data, so the result is digitally identical to the original image. Only a modest amount of compression is achieved. This is the case when binary data such as executables are compressed.

2. Lossy compression: components of the signal that are known to be redundant are discarded, so the signal is changed from the input.

Figure 2.1 Different Types of Lossy Compression Techniques

2.2 Reviewed Architectures of Discrete wavelet transforms and inverse discrete wavelet transforms :

1) Discrete Wavelet Transform

The Discrete Wavelet Transform (DWT) is a popular signal processing technique best known for its results in data compression. As hardware designers, we are concerned more with the algorithmic details of the DWT than with the mathematical details discussed in the many papers that provide the foundations for wavelets. Algorithmically, the DWT is a recursive filtering process. At each “level”, the input data is filtered by two related filters to produce two result data-streams. These data-streams are then sub-sampled by two (or “decimated”) to reduce the output to the same number of data-words as the original signal. The low-pass filter output is then further processed by the same two filters, and this continues recursively for the desired depth or until no further filtering can occur. This recursive filtering process of the one-dimensional DWT is shown in Figure 2.2, where z is the input data-stream, and a and d are the approximation (low-pass filter output) and difference (high-pass filter output) data-streams respectively. The subscript values show the “level” of the output.
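The recursive filter-and-decimate process described above can be sketched in Python as follows. This is an illustration only: the orthonormal Haar filter pair is an assumed choice (the text does not fix a filter here), and the function names are hypothetical.

```python
import math

# Assumed two-tap Haar filter pair; any related low/high pair could be used.
LOW = [1 / math.sqrt(2), 1 / math.sqrt(2)]
HIGH = [1 / math.sqrt(2), -1 / math.sqrt(2)]

def filter_and_decimate(signal, taps):
    """Multiply-accumulate over a sliding window, keeping every second output."""
    out = []
    for start in range(0, len(signal) - len(taps) + 1, 2):
        acc = 0.0
        for k, tap in enumerate(taps):
            acc += tap * signal[start + k]
        out.append(acc)
    return out

def dwt_levels(signal, levels):
    """Return the final approximation and the detail stream of every level.

    The signal length is assumed to be divisible by 2**levels.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        details.append(filter_and_decimate(approx, HIGH))  # d at this level
        approx = filter_and_decimate(approx, LOW)          # a, fed to the next level
    return approx, details

approx, details = dwt_levels([4, 6, 10, 12, 8, 6, 5, 5], levels=2)
print(approx, details)
```

With eight input words and two levels, the outputs hold 2 + 2 + 4 = 8 data-words in total, matching the statement that decimation keeps the output the same size as the input.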

[Figure 2.1 diagram labels: LOSSY; Hybrid, Predictive, Frequency oriented, Importance oriented; Transform, Fractal; DCT, DWT; Mallat, Transversal filter, Coedic, Lifting Scheme]




Figure 2.2 The DWT filtering process.

The filtering steps are multiply-and-accumulate operations. A filter, in the algorithmic, discrete sense, is a set of “coefficient” values. The number of these values is referred to as the “filter width”, and the coefficients are also referred to as “taps”. At each data-word of the input, the filter spans that data-word and its neighboring data-words as a “window”. The values within this window are multiplied by their corresponding filter coefficients, and all the results are added together to give the filtered result for this data-word. The filtering operation extracts certain frequency information from the data, depending on the characteristics of the filter. This filtering operation can be done with a systolic array. It is simple to implement a systolic array for each level of the DWT, but the arrays are poorly utilized due to the decreasing data-rates of the levels. It is possible, through some complex timing, to use a single array to perform all levels of the DWT.

2) Inverse discrete wavelet transform

The inverse DWT (IDWT) is the computational reverse. The lowest low-pass and high-pass data-streams are up-sampled (i.e. a zero is placed between each data-word) and then filtered using filters related to the decomposition filters. The two resulting streams are simply added together to form the low-pass result of the previous level of processing. This can be combined with the high-pass result of that level in a similar fashion to produce further levels, the process continuing until the original data-stream is reconstructed. This process is shown in Figure 2.3.
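One level of the up-sample, filter, and add procedure can be sketched in Python as below. The Haar synthesis filters and all function names are assumptions made for illustration; a small Haar analysis step is included only so that the round trip can be checked.

```python
import math

S = 1 / math.sqrt(2)
SYN_LOW = [S, S]     # assumed Haar synthesis low-pass taps
SYN_HIGH = [S, -S]   # assumed Haar synthesis high-pass taps

def upsample(stream):
    """Place a zero after each data-word."""
    out = []
    for value in stream:
        out.extend([value, 0.0])
    return out

def fir(signal, taps):
    """Causal FIR filter (multiply-accumulate), output as long as the input."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out

def idwt_level(approx, detail):
    """Up-sample both streams, filter them, and add the results."""
    low = fir(upsample(approx), SYN_LOW)
    high = fir(upsample(detail), SYN_HIGH)
    return [l + h for l, h in zip(low, high)]

def haar_analysis(signal):
    """Matching one-level Haar analysis, used here only to test the inverse."""
    approx = [(signal[i] + signal[i + 1]) * S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * S for i in range(0, len(signal), 2)]
    return approx, detail

x = [4, 6, 10, 12, 8, 6, 5, 5]
a, d = haar_analysis(x)
print(idwt_level(a, d))  # reproduces x up to floating-point rounding
```

For a multi-level transform, the reconstructed stream of one level would be passed as `approx` to the next call together with that level's detail stream.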

Figure 2.3 The Inverse DWT filtering process.

We have previously developed an array for the DWT and are now designing an array for the IDWT. We present a simple discussion of the array with no detailed implementation specifics to allow the reader to understand the issues we are dealing with by the input buffering approach.

2.3 Discrete Wavelet Transform Architecture :

The discrete wavelet transform (DWT) is increasingly used for image coding because it supports features such as progressive image transmission (by quality, by resolution), ease of compressed image manipulation, and region-of-interest coding. The DWT has traditionally been implemented by convolution. Such an implementation demands both a large number of computations and a large amount of storage, features that are not desirable for either high-speed or low-power applications. Recently, a lifting-based scheme that often requires far fewer computations has been proposed for the DWT. The main feature of the lifting-based DWT scheme is to break up the high-pass and low-pass filters into a sequence of upper and lower triangular matrices and convert the filter implementation into banded matrix multiplications. Such a scheme has several advantages, including “in-place” computation of the DWT, integer-to-integer wavelet transform (IWT), and symmetric forward and inverse transforms. Therefore, it comes as no surprise that lifting has been chosen in the upcoming JPEG2000 standard.

The proposed architecture computes the multilevel DWT for both the forward and the inverse transforms one level at a time, in a row-column fashion. There are two row processors to compute along the rows and two column processors to compute along the columns. While this arrangement is suitable for filters that require two banded-matrix multiplications, filters that require four banded-matrix multiplications need all four processors to compute along the rows or along the columns. The outputs generated by the row and column processors (that are used for further computations) are stored in memory modules.

The memory modules are divided into multiple banks to accommodate high computational bandwidth requirements. The proposed architecture is an extension of the architecture for the forward transform that was presented. A number of architectures have been proposed for calculation of the convolution-based DWT. The architectures are mostly folded and can be broadly classified into serial architectures (where the inputs are supplied to the filters in a serial manner) and parallel architectures (where the inputs are supplied to the filters in a parallel manner).

Figure 2.4 Lifting Schemes. (a) Scheme 1. (b) Scheme 2.

The basic principle of the lifting scheme is to factorize the polyphase matrix of a wavelet filter into a sequence of alternating upper and lower triangular matrices and a diagonal matrix. This leads to the wavelet implementation by means of banded-matrix multiplications.

Let $\tilde{h}(z)$ and $\tilde{g}(z)$ be the low-pass and high-pass analysis filters, and let $h(z)$ and $g(z)$ be the low-pass and high-pass synthesis filters, with corresponding polyphase matrices $\tilde{P}(z)$ and $P(z)$.

If $(\tilde{h}, \tilde{g})$ is a complementary filter pair, then $\tilde{P}(z)$ can always be factored into lifting steps in one of two ways, each a product of alternating upper and lower triangular matrices and a diagonal matrix with entries K and 1/K, where K is a constant. The two types of lifting schemes are shown in Figure 2.4.

Scheme 1, which corresponds to the first factorization, consists of three steps:

• Predict step, where the even samples are multiplied by the time domain equivalent of $\tilde{t}(z)$ and are added to the odd samples.

• Update step, where the updated odd samples are multiplied by the time domain equivalent of $\tilde{s}(z)$ and are added to the even samples.

• Scaling step, where the even samples are multiplied by 1/K and the odd samples by K.
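The three steps of Scheme 1 can be sketched in Python as follows. The predict and update weights (those of the linear 5/3 filter), the value of K, the boundary handling, and the function names are all assumptions chosen for illustration, not the filters of the proposed architecture. The inverse simply runs the steps backwards with the scale factors swapped and the signs reversed.

```python
import math

PREDICT = -0.5       # assumed predict weight (linear 5/3 filter)
UPDATE = 0.25        # assumed update weight
K = math.sqrt(2.0)   # assumed scaling constant

def lifting_forward(x):
    """One level of predict, update, scale on an even-length signal."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    even, odd = list(x[0::2]), list(x[1::2])
    last = len(even) - 1
    # Predict: weighted even neighbours are added to each odd sample.
    for i in range(len(odd)):
        odd[i] += PREDICT * (even[i] + even[min(i + 1, last)])
    # Update: weighted, already updated odd neighbours are added to each even sample.
    for i in range(len(even)):
        even[i] += UPDATE * (odd[max(i - 1, 0)] + odd[i])
    # Scale: even samples by 1/K, odd samples by K.
    return [e / K for e in even], [o * K for o in odd]

def lifting_inverse(low, high):
    """Undo the scaling, then the update, then the predict step."""
    even = [v * K for v in low]
    odd = [v / K for v in high]
    last = len(even) - 1
    for i in range(len(even)):
        even[i] -= UPDATE * (odd[max(i - 1, 0)] + odd[i])
    for i in range(len(odd)):
        odd[i] -= PREDICT * (even[i] + even[min(i + 1, last)])
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out

low, high = lifting_forward([1.0, 4.0, 2.0, 8.0, 5.0, 7.0])
print(lifting_inverse(low, high))  # the input, up to floating-point rounding
```

Because each step only adds a function of one half of the samples to the other half, the computation can be done in place, which is the property the hardware exploits.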




The inverse DWT is obtained by traversing in the reverse direction, changing the factor K to 1/K and the factor 1/K to K, and reversing the signs of the coefficients in $\tilde{t}(z)$ and $\tilde{s}(z)$. In Scheme 2, which corresponds to the second factorization, the odd samples are calculated in the first step, and the even samples are calculated in the second step. The inverse is obtained by traversing in the reverse direction.

The lifting scheme is a technique for both designing wavelets and performing the discrete wavelet transform. Actually it is worthwhile to merge these steps and design the wavelet filters while performing the wavelet transform. This is then called the second generation wavelet transform. The technique was introduced by Wim Sweldens.

The discrete wavelet transform applies several filters separately to the same signal. In contrast to that, for the lifting scheme the signal is divided like a zipper. Then a series of convolution-accumulate operations across the divided signals is applied.

The basic idea of lifting is the following: if a pair of filters (h, g) is complementary, that is, it allows for perfect reconstruction, then for every filter s the pair (h', g) with $h'(z) = h(z) + s(z^2)\, g(z)$ allows for perfect reconstruction, too. Of course, this is also true for every pair (h, g') of the form $g'(z) = g(z) + t(z^2)\, h(z)$. The converse is also true: if the filter banks (h, g) and (h', g) allow for perfect reconstruction, then there is a unique filter s with $h'(z) = h(z) + s(z^2)\, g(z)$. Each such transform of the filter bank (or the respective operation in a wavelet transform) is called a lifting step. A sequence of lifting steps consists of alternating lifts, that is, in one step the low pass is fixed and the high pass is changed, and in the next step the high pass is fixed and the low pass is changed. Successive steps of the same direction can be merged. [6]

2.4 Lifting Implementation of the Discrete Wavelet Transform :

The DWT has been traditionally implemented by convolution or FIR filter bank structures. The DWT implementation is basically frame-based, as opposed to the block-based implementation of the discrete cosine transform (DCT) or similar transformations. Such an implementation requires both a large number of arithmetic computations and a large memory for storage, features that are not desirable for either high-speed or low-power image and video processing applications. Recently, a new mathematical formulation for wavelet transformation has been proposed by Sweldens based on spatial construction of the wavelets, and a very versatile scheme for its factorization has been suggested. This new approach is called the lifting-based wavelet transform, or simply lifting. The main feature of the lifting-based DWT scheme is to break up the high-pass and low-pass wavelet filters into a sequence of smaller filters that in turn can be converted into a sequence of upper and lower triangular matrices, which will be discussed in the subsequent section.

This scheme often requires far fewer computations compared to the convolution-based DWT, and its computational complexity can be reduced by up to 50%. It has several other advantages, including “in-place” computation of the DWT, integer-to-integer wavelet transform (IWT), symmetric forward and inverse transforms, and no need for signal boundary extension. As a result, lifting-based hardware implementations provide an efficient way to compute wavelet transforms compared to traditional approaches. So it comes as no surprise that lifting has been suggested for implementation of the DWT in the upcoming JPEG2000 standard [8].

In a traditional forward DWT using a filter bank, the input signal (x) is filtered separately by a low-pass filter ($\tilde{h}$) and a high-pass filter ($\tilde{g}$) at each transform level. The two output streams are then sub-sampled by simply dropping the alternate output samples in each stream to produce the low-pass ($y_L$) and high-pass ($y_H$) sub-bands, as shown in Figure 2.5. These two filters ($\tilde{h}$, $\tilde{g}$) form the analysis filter bank. The original signal can be reconstructed by a synthesis filter bank (h, g) starting from $y_L$ and $y_H$, as shown in Figure 2.5. We have adopted the discussion on lifting from the celebrated paper by Daubechies and Sweldens [14]. It should also be noted that we adopt the notation ($\tilde{h}$, $\tilde{g}$) for the analysis filters and (h, g) for the synthesis filters in this section and onward. Given a discrete signal x(n), the arithmetic computation of the above can be expressed as follows:

$$y_L(n) = \sum_{i=0}^{\tau_L - 1} \tilde{h}(i)\, x(2n - i), \qquad y_H(n) = \sum_{i=0}^{\tau_H - 1} \tilde{g}(i)\, x(2n - i)$$

where $\tau_L$ and $\tau_H$ are the lengths of the low-pass ($\tilde{h}$) and high-pass ($\tilde{g}$) filters respectively. During the inverse transform to reconstruct the signal, both $y_L$ and $y_H$ are first up-sampled by inserting zeros between two samples, and then they are filtered by the low-pass (h) and high-pass (g) filters respectively. These two filtered output streams are added together to obtain the reconstructed signal (x') as shown in Figure 2.5.




Figure 2.5 Signal Analysis and Reconstruction in DWT

There are two types of lifting. One is called primal lifting and the other is called dual lifting. We define these two types of lifting based on the mathematical formulations shown in the previous section.

III. PROPOSED ARCHITECTURE

3.1 Discrete Wavelet Transforms :

The discrete wavelet transform (DWT) became a very versatile signal processing tool after Mallat proposed the multi-resolution representation of signals based on wavelet decomposition. The method of multi-resolution is to represent a function (signal) with a collection of coefficients, each of which provides information about the position as well as the frequency of the signal (function). The advantage of the DWT over the Fourier transform is that it performs multi-resolution analysis of signals with localization in both time and frequency, popularly known as time-frequency localization. As a result, the DWT decomposes a digital signal into different sub-bands so that the lower-frequency sub-bands have finer frequency resolution and coarser time resolution compared to the higher-frequency sub-bands. The DWT is increasingly used for image compression because it supports features such as progressive image transmission (by quality, by resolution), ease of compressed image manipulation, and region-of-interest coding. Because of these characteristics, the DWT is the basis of the new JPEG2000 image compression standard.

3.2 One dimensional DWT :

Any signal is first applied to a pair of low-pass and high-pass filters. Then down-sampling (i.e., discarding alternate coefficients) is applied to these filtered coefficients. The filter pair (h, g) used for decomposition is called the analysis filter bank, and the filter pair (h`, g`) used for reconstruction of the signal is called the synthesis filter bank. The output of the low-pass filter after down-sampling contains the low-frequency components of the signal, which form the approximate part of the original signal, and the output of the high-pass filter after down-sampling contains the high-frequency components, which are called the details (i.e., highly textured parts like edges) of the original signal.

This approximate part can still be further decomposed into low frequency and high frequency components. This process can be continued successively to the required number of levels. This process is called multi level decomposition, shown in Figure 3.1

Figure 3.1 One dimensional two level wavelet decomposition

In the reconstruction process, these approximate and detail coefficients are first up-sampled and then applied to the low-pass and high-pass reconstruction filters. The filtered coefficients are then added to obtain the reconstructed version of the original signal. This process can be extended to multi-level reconstruction, i.e., the approximate coefficients fed to this block may themselves have been formed from pairs of approximate and detail coefficients, as shown in Figure 3.2.

Figure 3.2 One dimensional inverse wavelet transforms

3.3 Two-Dimensional DWT :

The one-dimensional DWT can be easily extended to two dimensions, which can be used for the transformation of two-dimensional images. A two-dimensional digital image can be represented by a 2-D array X[m, n] with m rows and n columns, where m and n are positive integers. First, a one-dimensional DWT is performed on the rows to get the low-frequency (L) and high-frequency (H) components of the image. Then a one-dimensional DWT is performed column-wise on this intermediate result to form the final DWT coefficients LL, HL, LH, and HH. These are called sub-bands.

The LL sub-band can be further decomposed into four sub-bands by following the above procedure. This process can continue to the required number of levels and is called multi-level decomposition. A three-level decomposition of the given digital image is as shown. High-pass and low-pass filters are used to decompose the image first row-wise and then column-wise. Similarly, the inverse DWT, which is the reverse of the forward DWT, is applied to get back the reconstructed image, as shown in Figure 3.3.
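The row-column procedure can be sketched in Python for one decomposition level. The averaging/differencing Haar step is an assumed stand-in for whichever 1-D filter pair is used, the function names are hypothetical, and the sub-band naming convention (first letter for the row filter, second for the column filter) is also an assumption.

```python
def haar_1d(samples):
    """One-level 1-D step: pairwise averages (low) and half-differences (high)."""
    low = [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    high = [(samples[i] - samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    return low, high

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def filter_columns(block):
    """Apply the 1-D step down every column of `block`."""
    low_cols, high_cols = [], []
    for col in transpose(block):
        low, high = haar_1d(col)
        low_cols.append(low)
        high_cols.append(high)
    return transpose(low_cols), transpose(high_cols)

def dwt2_level(image):
    """Rows first (giving L and H), then columns (giving LL, HL, LH, HH)."""
    l_half, h_half = [], []
    for row in image:
        low, high = haar_1d(row)
        l_half.append(low)
        h_half.append(high)
    ll, lh = filter_columns(l_half)
    hl, hh = filter_columns(h_half)
    return ll, hl, lh, hh

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
ll, hl, lh, hh = dwt2_level(image)
print(ll)  # [[3.5, 5.5], [11.5, 13.5]]
```

A multi-level decomposition would call `dwt2_level` again on the returned `ll` block, leaving the other three sub-bands untouched.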

Figure 3.3 Row-column computation of 2-D DWT

Figure 3.4 Two channel filter bank at level 3

Various architectures have been proposed for computation of the DWT. These can be mainly classified as either convolutional architectures or lifting-based architectures. The number of computations required to find the DWT coefficients by the filter method is large for higher levels of decomposition. This motivates the lifting scheme for computing DWT coefficients, which reduces the number of computations and also provides in-place computation of the DWT coefficients.

3.4 GENERAL IMPLEMENTATION FLOW :

The generalized implementation flow diagram of the project is represented as follows.




Figure 3.5 General Implementation Flow Diagram

Initially, market research should be carried out, covering the previous version of the design and the current requirements on the design. Based on this survey, the specification and the architecture must be identified. Then the RTL modeling should be carried out in Verilog HDL with respect to the identified architecture. Once the RTL modeling is done, it should be simulated and verified for all cases. The functional verification should confirm the intended architecture and pass all the test cases. Once the functional verification is complete, the RTL model is taken to the synthesis process. Three operations are carried out in the synthesis process:

Translate

Map

Place and Route

The developed RTL model is translated into a mathematical equation format that the tool can understand. These translated equations are then mapped to the library, that is, mapped to the hardware. Once the mapping is done, the gates are placed and routed. Before these processes, constraints can be given in order to optimize the design. Finally, the bitstream (BIT) file is generated, which holds the design information in binary format and is downloaded to the FPGA board.

3.5 Implementation

The 2-D (5, 3) wavelet transform block and, for the recovery stage, the 2-D (5, 3) inverse wavelet transform block were designed.

3.5.1 Integer Wavelet Transform

In conventional DWT realizations, partial transform results need to be represented with a high precision. This raises storage and complexity problems. On the other hand, the Integer Wavelet Transform (IWT) produces integer intermediate results. Thus, it is possible to use integer arithmetic without encountering rounding error problems. There are different types of integer transforms like S(sequential) transform which is popularly known as Haar wavelet transform, S(sequential)+P(prediction) transform, CDF(4,4), CDF(2,2) also known as (5,3) transform etc.
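As a small illustration of an integer-to-integer transform, the S transform can be sketched in Python using one common formulation (floor of the pairwise average, plus the pairwise difference). The exact rounding convention and the function names are assumptions; other variants exist.

```python
def s_forward(a, b):
    """Integer Haar (S transform) of one sample pair."""
    low = (a + b) // 2   # floor of the average, stays an integer
    high = a - b         # difference
    return low, high

def s_inverse(low, high):
    """Exact integer reconstruction of the original pair."""
    a = low + (high + 1) // 2
    b = a - high
    return a, b

print(s_forward(5, 2))   # (3, 3)
print(s_inverse(3, 3))   # (5, 2)
```

Although the floor operation discards the fractional part of the average, the lost bit is recoverable from the parity of the difference, which is why the pair is reconstructed exactly.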

The two filter banks supported by the JPEG2000 standard are the Daubechies (9, 7) and Daubechies (5, 3) filter banks. Since the integer-to-integer wavelet transform coefficients are integers, the transform can be used in lossless compression. Since the aim of the thesis is to suggest a reversible (lossless) watermarking method, we will consider only the (5, 3) Integer Wavelet Transform.

2-D (5, 3) DWT – Lossless Transformation

The analysis and the synthesis filter coefficients ( both low pass and high pass) for Le Gall 5/3 Integer Wavelet Transform are as shown in the table 3.1.




Table 3.1 Le Gall 5/3 Analysis and Synthesis Filter coefficients [17].

Analysis Filter Coefficients
n      Low-pass filter h(n)    High-pass filter g(n)
0      6/8                     1
±1     2/8                     -1/2
±2     -1/8

Equations (3.1) and (3.2) show the lifting steps for the Le Gall (5, 3) Integer Wavelet Transform. The rational coefficients allow the transform to be inverted with finite-precision arithmetic, which makes lossless compression possible. The odd and even output coefficients are computed as follows; the inverse transform applies the same two steps in reverse order with the signs inverted.

$$y(2n+1) = x(2n+1) - \left\lfloor \frac{x(2n) + x(2n+2)}{2} \right\rfloor \tag{3.1}$$

$$y(2n) = x(2n) + \left\lfloor \frac{y(2n-1) + y(2n+1) + 2}{4} \right\rfloor \tag{3.2}$$
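The two lifting steps (3.1) and (3.2) can be sketched in Python as follows. This is a minimal software model, not the hardware design: it assumes the JPEG2000-style rounding (floor division by 2 in the first step, an offset of 2 and floor division by 4 in the second), an even-length input, and symmetric extension at the boundaries. The names `forward_53`, `inverse_53` and `_mirror` are hypothetical.

```python
def _mirror(i, n):
    """Symmetric (mirror) extension of index i into the range [0, n)."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def forward_53(x):
    """Forward reversible 5/3 lifting on an even-length list of integers."""
    n = len(x)
    assert n >= 2 and n % 2 == 0
    y = list(x)
    # Step (3.1): odd samples become high-pass coefficients.
    for i in range(1, n, 2):
        y[i] = x[i] - ((x[i - 1] + x[_mirror(i + 1, n)]) >> 1)
    # Step (3.2): even samples become low-pass coefficients.
    for i in range(0, n, 2):
        y[i] = x[i] + ((y[_mirror(i - 1, n)] + y[i + 1] + 2) >> 2)
    return y[0::2], y[1::2]  # (low-pass, high-pass)


def inverse_53(low, high):
    """Inverse transform: undo (3.2) first, then (3.1)."""
    n = 2 * len(low)
    y = [0] * n
    y[0::2], y[1::2] = low, high
    x = list(y)
    for i in range(0, n, 2):
        x[i] = y[i] - ((y[_mirror(i - 1, n)] + y[i + 1] + 2) >> 2)
    for i in range(1, n, 2):
        x[i] = y[i] + ((x[i - 1] + x[_mirror(i + 1, n)]) >> 1)
    return x


pixels = [10, 12, 14, 200, 7, 7, 0, 255]
low, high = forward_53(pixels)
assert inverse_53(low, high) == pixels  # lossless round trip
```

Because each step only adds or subtracts an integer that the inverse can recompute from untouched samples, the round trip is exact regardless of the rounding used.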

3.5.2 The 2-D (5, 3) Discrete Wavelet Transform

Initially, the pixel values of the image are extracted with the help of MATLAB and used as the primary inputs to the DWT block. The 1-D (5, 3) DWT block diagram is developed from equations (2) and (3). The registers in the top half operate on even clock cycles, whereas those in the bottom half operate on odd clock cycles. The input pixels arrive serially, row-wise, at one pixel per clock cycle and are split into even and odd samples. After the samples are combined with the lifting coefficients ‘a’ and ‘b’, the low-pass and high-pass coefficients are output. Hence, for every pair of pixel values, one high-pass and one low-pass coefficient is produced.

Figure 3.6 Computation of Basic (5, 3) DWT Block in which ‘a’ and ‘b’ are lifting coefficients (a = -1/2 and b = 1)

The internal operation of the DWT block has been explained above; the high-pass and low-pass coefficients of the input image are thus identified and separated. The generated low-pass and high-pass coefficients are stored in buffers for further calculations.

IV. IMPLEMENTATION RESULTS AND DISCUSSIONS

The DWT process and the architecture developed for the required functionality were discussed in the previous chapters. This chapter deals with the simulation and synthesis results of the DWT process. The Modelsim tool is used to simulate the design and check its functionality. Once functional verification is complete, the design is taken to the Xilinx tool for synthesis and netlist generation.




Appropriate test cases were identified in order to test the modelled DWT architecture. Using these test values, simulation results describing the operation of the process were obtained, which show that the modelled design functions as intended.

4.1 Simulation Results

A test bench was developed to test the modelled design. The test bench automatically drives the inputs and exercises the operations of the algorithm.

Behavioral Simulation (RTL Simulation):

This is the first of the simulation steps encountered in the hierarchy of the design flow. It is performed before synthesis to verify the RTL (behavioral) code and to confirm that the design functions as intended. Behavioral simulation can be performed on either VHDL or Verilog designs. In this process, signals and variables are observed, procedures and functions are traced, and breakpoints are set. The simulation is very fast, so the designer can quickly change the HDL code if the required functionality is not met. Since the design has not yet been synthesized to gate level, its timing and resource usage are still unknown.

4.1.1 DWT Block

The first block of the design is the Discrete Wavelet Transform (DWT) block, which transforms the image and generates the high-pass and low-pass coefficients. Since the operation of this block was discussed in the previous chapter, the snapshots of the simulation results are considered and discussed directly here.

The block takes 16 input values, each 20 bits wide. The DWT block consists of registers and adders. Whenever input data is sent, it is divided into even and odd data, which are stored in temporary registers. When reset is high, the temporary registers hold zero; when reset is low, the input data is split into even and odd data. The input data is read for sixteen clock cycles, after which the data is processed according to the lifting scheme. The output consists of low-pass and high-pass elements; this is the 1-D discrete wavelet transform. In the 2-D discrete wavelet transform, the low-pass and high-pass outputs are each divided again, giving the LL, LH, HL and HH sub-bands. The output is verified in Modelsim.

For this DWT block, clock and reset are the primary inputs. The pixel values of the image, that is, the input data, are given to the block and split into even and odd pixel values. In the design, the even and odd values are stored in two arrays; once all input pixel values have been received, the load signal is driven high to indicate that the system is ready for further processing. Once load is high, values are taken from the even and odd arrays. Following equation (3.1), the neighbouring even values are added, the sum is multiplied by the filter coefficient, and the product is added to the odd pixel value to give the high-pass coefficient.

Following equation (3.2), the high-pass coefficients are then multiplied by the second filter coefficient and the result is added to the even pixel value, which gives the low-pass coefficient. This process is carried out for all values in the even and odd arrays in order to obtain the high-pass and low-pass coefficients of the image.

These low-pass and high-pass coefficients are taken as the inputs to the next stage. For the DWT-2 process, the low-pass coefficients from DWT-1 are taken as inputs, and low-pass and high-pass coefficients are again calculated from them. DWT-2 carries out the same process as DWT-1. The simulated waveform is shown in Figure 4.2.




Figure 4.1 Simulation Result of DWT-1 Block with Both High and Low Pass Coefficients



Similarly, the high-pass coefficients from the DWT-1 block are taken as input to the DWT-3 block, which produces further transformed low-pass and high-pass coefficients.
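The cascade of DWT-1, DWT-2 and DWT-3 can be modelled in a few lines of Python. The sketch below is only illustrative and rests on several assumptions that the text does not state explicitly: DWT-1 is taken to run along the rows, DWT-2 and DWT-3 along the columns, the rounding follows the JPEG2000 reversible form, boundaries are mirrored, and the image has even width and height. The names `lift_53`, `columns` and `dwt2_53` are hypothetical.

```python
def lift_53(x):
    """1-D reversible 5/3 lifting; returns (low, high) for an even-length list."""
    n = len(x)
    # High-pass: odd sample minus the floored mean of its even neighbours
    # (the right neighbour is mirrored at the end of the line).
    high = [x[i] - ((x[i - 1] + x[i + 1 if i + 1 < n else i - 1]) >> 1)
            for i in range(1, n, 2)]
    # Low-pass: even sample plus a rounded quarter of the neighbouring
    # high-pass values (the left neighbour is mirrored at the start).
    low = [x[2 * k] + ((high[k - 1 if k > 0 else 0] + high[k] + 2) >> 2)
           for k in range(n // 2)]
    return low, high


def columns(rows):
    """Transpose a list of rows into a list of columns."""
    return [list(c) for c in zip(*rows)]


def dwt2_53(image):
    """One 2-D level: DWT-1 on rows, then DWT-2 / DWT-3 on columns."""
    row_pairs = [lift_53(row) for row in image]        # DWT-1
    low_half = [p[0] for p in row_pairs]
    high_half = [p[1] for p in row_pairs]

    def along_columns(block):
        col_pairs = [lift_53(col) for col in columns(block)]
        return (columns([p[0] for p in col_pairs]),
                columns([p[1] for p in col_pairs]))

    ll, lh = along_columns(low_half)                   # DWT-2
    hl, hh = along_columns(high_half)                  # DWT-3
    return ll, lh, hl, hh


flat = [[9] * 4 for _ in range(4)]
ll, lh, hl, hh = dwt2_53(flat)
assert ll == [[9, 9], [9, 9]]          # a flat image keeps all energy in LL
assert lh == hl == hh == [[0, 0], [0, 0]]
```

In this sketch the first letter of each sub-band name refers to the row filter and the second to the column filter; other naming conventions exist.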




4.3 Introduction to FPGA

An FPGA contains a two-dimensional array of logic blocks and interconnections between the logic blocks. Both the logic blocks and the interconnects are programmable. Logic blocks are programmed to implement a desired function, and the interconnects are programmed using the switch boxes to connect the logic blocks.

To be more precise, if we want to implement a complex design (a CPU, for instance), the design is divided into small sub-functions and each sub-function is implemented using one logic block. To obtain the desired design (the CPU), all the sub-functions implemented in logic blocks must then be connected, and this is done by programming the interconnects. The internal structure of an FPGA is depicted in the following figure.

FPGAs, an alternative to custom ICs, can be used to implement an entire System on Chip (SoC). The main advantage of an FPGA is that it can be reprogrammed. The user can program an FPGA to implement a design after the FPGA has been manufactured, hence the name “Field Programmable.”

Custom ICs are expensive and take a long time to design, so they are useful only when produced in bulk. FPGAs, by contrast, can be implemented in a short time with the help of Computer Aided Design (CAD) tools, because there is no physical layout process, no mask making, and no IC manufacturing.

Some disadvantages of FPGAs are that they are slower than custom ICs, cannot handle very complex designs, and draw more power.

A Xilinx logic block consists of one Look-Up Table (LUT) and one flip-flop. A LUT can implement a number of different functions. The input lines to the logic block go into the LUT and enable it. The output of the LUT gives the result of the logic function it implements, and the output of the logic block is either the registered or the unregistered output of the LUT.

SRAM is used to implement a LUT. A k-input logic function is implemented using a 2^k × 1 SRAM. The number of different possible functions for a k-input LUT is 2^(2^k). The advantage of such an architecture is that it supports the implementation of a very large number of logic functions; the disadvantage is the unusually large number of memory cells required to implement such a logic block when the number of inputs is large.

The figure below shows a 4-input LUT based implementation of a logic block.

LUT-based design provides better logic block utilization. A k-input LUT-based logic block can be implemented in a number of different ways, with a trade-off between performance and logic density. An n-LUT can be viewed as a direct implementation of a function truth table: each latch holds the value of the function for one input combination. For example, a 2-LUT can implement any of 16 functions, such as AND, OR, A + NOT B, etc.
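The truth-table view of a LUT can be made concrete with a short Python model. This is a behavioural sketch only; the class name `LUT`, the helper `all_functions` and the choice of the first input as the most significant address bit are assumptions made for illustration, not details of any particular device.

```python
class LUT:
    """A k-input look-up table modelled as 2**k one-bit memory cells."""

    def __init__(self, k, truth_table):
        assert len(truth_table) == 2 ** k
        self.k = k
        self.cells = list(truth_table)

    def __call__(self, *inputs):
        assert len(inputs) == self.k
        address = 0
        for bit in inputs:            # first input is the address MSB
            address = (address << 1) | (bit & 1)
        return self.cells[address]


def all_functions(k):
    """Enumerate every function a k-input LUT can hold: 2**(2**k) of them."""
    n_cells = 2 ** k
    return [LUT(k, [(code >> i) & 1 for i in range(n_cells)])
            for code in range(2 ** n_cells)]


and2 = LUT(2, [0, 0, 0, 1])           # cells for inputs 00, 01, 10, 11
a_or_not_b = LUT(2, [1, 0, 1, 1])     # A + NOT B
assert and2(1, 1) == 1 and and2(1, 0) == 0
assert a_or_not_b(0, 1) == 0 and a_or_not_b(0, 0) == 1
assert len(all_functions(2)) == 16    # the 16 functions of a 2-LUT
```

Reprogramming the logic block amounts to rewriting `cells`; the addressing logic never changes, which is why the memory cost grows as 2^k with the number of inputs.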




Interconnects

A wire segment can be described as two end points of an interconnect with no programmable switch between them. A sequence of one or more wire segments in an FPGA can be termed as a track.

Typically an FPGA has logic blocks, interconnects and switch blocks (Input/Output blocks). Switch blocks lie on the periphery of the logic blocks and interconnects. Wire segments are connected to logic blocks through switch blocks. Depending on the required design, one logic block is connected to another, and so on.

FPGA DESIGN FLOW

This part of the tutorial gives a short introduction to the FPGA design flow. A simplified version of the design flow is given in the following diagram.

FPGA Design Flow

Design Entry

There are different techniques for design entry: schematic based, Hardware Description Language (HDL) based, a combination of both, etc. The choice of method depends on the design and the designer. If the designer wants to deal more with the hardware, schematic entry is the better choice. When the design is complex, or the designer thinks of the design in an algorithmic way, HDL is the better choice. Language-based entry is faster but lags in performance and density.

HDLs represent a level of abstraction that can isolate designers from the details of the hardware implementation. Schematic-based entry gives designers much more visibility into the hardware and is the better choice for those who are hardware oriented. Another, rarely used, method is state-machine entry. It is the better choice for designers who think of the design as a series of states, but the tools for state-machine entry are limited. This document deals with HDL-based design entry.

Synthesis

Synthesis is the process that translates VHDL or Verilog code into a device netlist format, i.e., a complete circuit with logical elements (gates, flip-flops, etc.) for the design. If the design contains more than one sub-design (for example, to implement a processor we need a CPU as one design element, RAM as another, and so on), the synthesis process generates a netlist for each design element. The synthesis process checks the code syntax and analyzes the hierarchy of the design, which ensures that the design is optimized for the design architecture the designer has selected. The resulting netlist(s) are saved to an NGC (Native Generic Circuit) file (for Xilinx® Synthesis Technology (XST)).

FPGA Synthesis




Implementation

This process consists of a sequence of three steps:

Translate

Map

Place and Route

Translate

The Translate process combines all the input netlists and constraints into a logic design file. This information is saved as an NGD (Native Generic Database) file. This can be done using the NGDBuild program. Here, defining constraints means assigning the ports in the design to the physical elements (e.g., pins, switches, buttons) of the targeted device and specifying the timing requirements of the design. This information is stored in a file called the UCF (User Constraints File). Tools used to create or modify the UCF include PACE, Constraint Editor, etc.

FPGA Translate

Map

The Map process divides the whole circuit of logical elements into sub-blocks so that they fit into the FPGA logic blocks. That is, it fits the logic defined by the NGD file into the targeted FPGA elements (Configurable Logic Blocks (CLB), Input Output Blocks (IOB)) and generates an NCD (Native Circuit Description) file, which physically represents the design mapped to the components of the FPGA. The MAP program is used for this purpose.

FPGA map

Place and Route:

The PAR program is used for this process. The place and route process places the sub-blocks from the map process into logic blocks according to the constraints and connects the logic blocks. For example, if a sub-block is placed in a logic block very near an I/O pin, it may save time but may affect some other constraint. The place and route process therefore takes the trade-off between all the constraints into account.

The PAR tool takes the mapped NCD file as input and produces a completely routed NCD file as output. The output NCD file contains the routing information.

FPGA Place and route




Device Programming:

Now the design must be loaded onto the FPGA, but it must first be converted to a format the FPGA can accept. The BITGEN program performs this conversion. The routed NCD file is given to the BITGEN program to generate a bitstream (a .BIT file), which can be used to configure the target FPGA device. This can be done using a cable; the choice of cable depends on the design.

4.4 Synthesis Result

The developed DWT was simulated and its functionality verified. Once functional verification is complete, the RTL model is taken through synthesis using the Xilinx ISE tool. In the synthesis process, the RTL model is converted to a gate-level netlist mapped to a specific technology library. Many different devices of the Spartan 3E family are available in the Xilinx ISE tool. To synthesize the DWT and IDWT design, the device “XC3S500E” was chosen, with package “FG320” and speed grade “-4”.

The DWT design was synthesized and its results are analyzed as follows.

4.4.1 DWT Synthesis Result

The device utilization report includes the following.

• Logic Utilization

• Logic Distribution

• Total Gate count for the Design

Device utilization summary:

The device utilization summary shown above gives the number of resources used out of those available on the device, also expressed as a percentage. It is the result of the synthesis process for the chosen device and package.

Timing Summary:

Speed Grade: -4

Minimum period: 10.081 ns (Maximum Frequency: 99.197 MHz)
Minimum input arrival time before clock: 4.252 ns
Maximum output required time after clock: 14.421 ns
Maximum combinational path delay: No path found




The timing figures reported in the timing summary are approximate at the synthesis stage; the exact timing summary is obtained after place and route. The maximum operating frequency of the synthesized design is 99.197 MHz, corresponding to a minimum period of 10.081 ns. Here, OFFSET IN is the minimum input arrival time before the clock and OFFSET OUT is the maximum output required time after the clock.
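The relation between the reported minimum period and the maximum frequency is simply f = 1/T. A short Python check of the reported figures, with a hypothetical helper name, is:

```python
def max_frequency_mhz(min_period_ns):
    """Maximum clock frequency in MHz for a minimum clock period in ns."""
    return 1000.0 / min_period_ns


f_max = max_frequency_mhz(10.081)
# Agrees with the reported 99.197 MHz to within rounding.
assert abs(f_max - 99.197) < 1e-3
```

The same conversion can be applied to the post-place-and-route period once it is available, since that figure replaces the synthesis estimate.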

After synthesis, the RTL (Register Transfer Level) schematic can be viewed as a black box that shows the inputs and outputs of the system. By double-clicking on the diagram we can see the gates, flip-flops and multiplexers inside.

RTL Schematic

Figure 4.9 DWT Schematic with Basic Inputs and Output

The top-level schematic above shows all the inputs and the final output of the DWT design.

Figure 4.10 Blocks inside the Developed Top Level DWT Design

The internal blocks of the design, DWT-1, DWT-2 and DWT-3, are clearly shown in the schematic above. Inside each block, the gate-level circuit is generated from the modelled HDL code.

4.5 Summary

• The developed DWT is modelled and simulated using the Modelsim tool.

• The simulation results are discussed by considering different cases.

• The RTL model is synthesized using the Xilinx tool for Spartan 3E, and the synthesis results are discussed with the help of the generated reports.

V. CONCLUSION AND FUTURE WORK

5.1 Conclusion:

Medical images require high accuracy without loss of information. The Discrete Wavelet Transform (DWT) is based on a time-scale representation, which provides efficient multi-resolution analysis. The lifting-based (5, 3) filter (the low-pass filter has five taps and the high-pass filter has three taps) provides a lossless mode of operation: because its coefficients are exactly represented by finite-precision numbers, it allows truly lossless encoding. In this work, the image pixel values are given to the DWT process, which produces the high-pass and low-pass coefficients of the input image. The simulation results of the DWT were verified with appropriate test cases. Once functional verification was complete, the discrete wavelet transform was synthesized using the Xilinx tool for the Spartan 3E FPGA family. The analysis shows that the DWT operates at a maximum clock frequency of 99.197 MHz.

5.2 Future Scope of the Work:

As future work,

• This work can be extended to increase the accuracy by increasing the number of transformation levels.

• The design can be used as a block in a full-fledged application; that is, applications such as compression, watermarking, etc. can be developed using this DWT.

REFERENCES

[1] A. Skodras, C. Christopoulos, and T. Ebrahimi, “The JPEG 2000 still image compression standard,” IEEE Signal Process. Mag., vol. 18, no. 5, pp. 36–58, Sep. 2001.
[2] J.-R. Ohm, M. van der Schaar, and J. W. Woods, “Interframe wavelet coding: Motion picture representation for universal scalability,” J. Signal Process. Image Commun., vol. 19, no. 9, pp. 877–908, Oct. 2004.
[3] G. Menegaz and J.-P. Thiran, “Lossy to lossless object-based coding of 3-D MRI data,” IEEE Trans. Image Process., vol. 11, no. 9, pp. 1053–1061, Sep. 2002.
[4] J. E. Fowler and J. T. Rucker, “3-D wavelet-based compression of hyperspectral imagery,” in Hyperspectral Data Exploitation: Theory and Applications, C.-I. Chang, Ed. Hoboken, NJ: Wiley, 2007, ch. 14, pp. 379–407.
[5] L. R. C. Suzuki, J. R. Reid, T. J. Burns, G. B. Lamont, and S. K. Rogers, “Parallel computation of 3-D wavelets,” in Proc. Scalable High-Performance Computing Conf., May 1994, pp. 454–461.
[6] E. Moyano, P. Gonzalez, L. Orozco-Barbosa, F. J. Quiles, P. J. Garcia, and A. Garrido, “3-D wavelet compression by message passing on a Myrinet cluster,” in Proc. Can. Conf. Electr. Comput. Eng., vol. 2. 2001, pp. 1005–1010.
[7] W. Badawy, G. Zhang, M. Talley, M. Weeks, and M. Bayoumi, “Low power architecture of running 3-D wavelet transform for medical imaging application,” in Proc. IEEE Workshop Signal Process. Syst., Taiwan, 1999, pp. 65–74.
[8] G. Bernabé, J. González, J. M. García, and J. Duato, “Memory conscious 3-D wavelet transform,” in Proc. 28th Euromicro Conf. Multimedia Telecommun., Dortmund, Germany, Sep. 2002, pp. 108–113.
[9] M. Weeks and M. A. Bayoumi, “Three-dimensional discrete wavelet transform architectures,” IEEE Trans. Signal Process., vol. 50, no. 8, pp. 2050–2063, Aug. 2002.
[10] Q. Dai, X. Chen, and C. Lin, “Novel VLSI architecture for multidimensional discrete wavelet transform,” IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 8, pp. 1105–1110, Aug. 2004.
[11] W. Badawy, M. Weeks, G. Zhang, M. Talley, and M. A. Bayoumi, “MRI data compression using a 3-D discrete wavelet transform,” IEEE Eng. Med. Biol. Mag., vol. 21, no. 4, pp. 95–103, Jul.–Aug. 2002.
[12] J. Xu, Z. Xiong, S. Li, and Y.-Q. Zhang, “Memory-constrained 3-D wavelet transform for video coding without boundary effects,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 9, pp. 812–818, Sep. 2002.
[13] B. Das and S. Banerjee, “Low power architecture of running 3-D wavelet transform for medical imaging application,” in Proc. Eng. Med. Biol. Soc./Biomed. Eng. Soc. Conf., vol. 2. 2002, pp. 1062–1063.
[14] B. Das and S. Banerjee, “Data-folded architecture for running 3-D DWT using 4-tap Daubechies filters,” IEE Proc. Circuits Devices Syst., vol. 152, no. 1, pp. 17–24, Feb. 2005.
[15] W. Sweldens, “The lifting scheme: A custom-design construction of biorthogonal wavelets,” Appl. Comput. Harmon. Anal., vol. 3, no. 15, pp. 186–200, 1996.
[16] I. Daubechies and W. Sweldens, “Factoring wavelet transforms into lifting steps,” J. Fourier Anal. Appl., vol. 4, no. 3, pp. 247–269, 1998.
[17] Z. Taghavi and S. Kasaei, “A memory efficient algorithm for multidimensional wavelet transform based on lifting,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), vol. 6. 2003, pp. 401–404.
[18] C.-T. Huang, P.-C. Tseng, and L.-G. Chen, “Flipping structure: An efficient VLSI architecture for lifting-based discrete wavelet transform,” IEEE Trans. Signal Process., vol. 52, no. 4, pp. 1080–1090, Apr. 2004.
[19] C. Parisot, M. Antonini, and M. Barlaud, “3-D scan based wavelet transform and quality control for video coding,” Eur. Assoc. Signal Process. J. Appl. Signal Process., vol. 2003, pp. 56–65, Jan. 2003.
[20] S. Barua, J. E. Carletta, K. A. Kotteri, and A. E. Bell, “An efficient architecture for lifting-based two-dimensional discrete wavelet transforms,” VLSI J. Integration, vol. 38, no. 3, pp. 341–352, Jan. 2005.
[21] G. Kuzmanov, B. Zafarifar, P. Shrestha, and S. Vassiliadis, “Reconfigurable DWT unit based on lifting,” in Proc. Program Res. Integr. Syst. Circuits, Veldhoven, The Netherlands, Nov. 2002, pp. 325–333.
[22] I. S. Uzun and A. Amira, “Design and FPGA implementation of nonseparable 2-D biorthogonal wavelet transforms for image/video coding,” in Proc. Int. Conf. Image Process. (ICIP), vol. 4. Belfast, U.K., Oct. 2004, pp. 2825–2828.
[23] B. Girod and S. Han, “Optimum update for motion-compensated lifting,” IEEE Signal Process. Lett., vol. 12, no. 2, pp. 150–153, Feb. 2005.
[24] A. Secker and D. Taubman, “Motion-compensated highly scalable video compression using an adaptive 3-D wavelet transform based on lifting,” in Proc. IEEE Int. Conf. Image Process., Thessaloniki, Greece, Oct. 2001, pp. 1029–1032.
[25] B. Pesquet-Popescu and V. Bottreau, “Three-dimensional lifting schemes for motion compensated video compression,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Salt Lake City, UT, May 2001, pp. 1793–1796.
[26] N. Bozinović, J. Konrad, T. André, M. Antonini, and M. Barlaud, “Motion-compensated lifted wavelet video coding: Toward optimal motion/transform configuration,” in Proc. 12th Eur. Signal Process. Conf., Vienna, Austria, Sep. 2004, pp. 1975–1978.
[27] L. Luo, S. Li, Z. Zhuang, and Y.-Q. Zhang, “Motion compensated lifting wavelet and its application in video coding,” in Proc. IEEE Int. Conf. Multimedia Expo, Tokyo, Japan, 2001, pp. 365–368.
[28] Architecture and Features of a Fully Scalable Motion-Compensated 3-D Subband Codec, document M7977.doc, ISO/IEC/JTC1/SC29/WG11, Mar. 2002.
[29] Improved MC-EZBC with Quarter-Pixel Motion Vectors, document 813 MPEG2002/M8366, ISO/IEC JTC1/SC29/WG11, May 2002.

About Authors:

M. JANARDAN graduated from Swami Vivekananda Institute of Technology in the Electronics & Communications stream. He is now pursuing a Masters in Digital Electronics and Communication Systems (DECS) at Sri Indu College of Engineering & Technology and is interested in Wireless Technology & Information Security.

I express my gratitude to Dr. K ASHOK BABU, Professor & Head of the Department (ECE), for his constant co-operation and support and for providing necessary facilities throughout the M.Tech program. He has 15 years of experience at B.Tech and M.Tech level and is working as a Professor in Sri Indu College of Engg. & Technology.


