VLSI architectures for lattice structure based orthonormal discrete wavelet transforms

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 44, NO. 2, FEBRUARY 1997 129

Transactions Briefs

VLSI Architectures for Lattice Structure Based Orthonormal Discrete Wavelet Transforms

Tracy C. Denk and Keshab K. Parhi

Abstract: This brief presents efficient single-rate architectures for the one-dimensional orthonormal discrete wavelet transform (DWT). It makes two contributions. First, we show that architectures based on the quadrature mirror filter (QMF) lattice structure require approximately half as many multipliers and adders as the corresponding direct-form structures. Second, we present techniques for mapping the 1-D orthonormal DWT to folded and digit-serial architectures which are based on the QMF lattice structure. For folded architectures, we discuss two techniques for mapping the QMF lattice structure to hardware. For digit-serial architectures, we show that any two-channel subband system can be implemented using digit-serial processing techniques by utilizing the polyphase decomposition. Using this result, we describe an orthonormal DWT architecture which uses the QMF lattice structure and digit-serial processing techniques. The proposed folded and digit-serial QMF lattice structures are attractive choices for implementations of the orthonormal DWT which require low area and low power dissipation.

I. INTRODUCTION

The discrete wavelet transform (DWT) has generated a great deal of interest recently due to its many applications across several disciplines, including signal processing [1]–[4]. Wavelets provide a time-scale representation of signals as an alternative to traditional time-frequency representations. This brief is concerned with the design of VLSI architectures for the one-dimensional orthonormal DWT, which projects a signal onto compactly supported orthonormal wavelet bases [2].

Several architectures for single-chip implementations of the 1-D DWT have been described in the literature [5]–[8]. Each of these architectures is designed to compute the DWT in real time by interleaving computations for octaves 2, 3, 4, ... between the computations for the first octave. These architectures differ mainly in the way that the intermediate results are stored and routed. These implementations include systolic routing networks [7], distributed memory [8], RAM-based architectures [7], and implementations which use the minimum number of registers [6]. While each of these architectures may offer an advantage at the architectural level, none of them utilizes improvements available at the algorithmic level. Similarly, the digit-serial [9] DWT architectures in [6] do not utilize any improvements available at the algorithmic level. In this paper we specifically consider the implementation of the orthonormal DWT, and we show that improved architectures can be designed by taking advantage of an efficient algorithmic description of paraunitary filter banks known as the QMF lattice structure. The basics of paraunitary filter banks and the QMF lattice are reviewed in Section II. Folded lattice-based orthogonal DWT architectures are presented in Section III, and digit-serial lattice-based orthogonal DWT architectures are presented in Section IV.

Manuscript received May 2, 1995; revised November 16, 1995. This work was supported in part by the Advanced Research Projects Agency and the Solid State Electronics Directorate, Wright-Patterson AFB under Contract AF/F33615-93-C-1309, and an Air Force Laboratory Graduate Fellowship. This paper was recommended by Associate Editor W. Burleson.

The authors are with the Department of Electrical Engineering, University of Minnesota, Minneapolis, MN 55455 USA.

Publisher Item Identifier S 1057-7130(97)01689-3.

Fig. 1. The QMF lattice. (a) Analysis filter bank. (b) Synthesis filter bank.

II. THE QMF LATTICE

The orthonormal DWT is computed using paraunitary quadrature mirror filter (QMF) banks. Any two-channel FIR paraunitary QMF analysis/synthesis system can be implemented using the QMF lattice structures shown in Fig. 1(a) and (b) [10]. The QMF lattice structurally imposes the perfect reconstruction (PR) property and therefore results in PR in the presence of coefficient quantization.

For a paraunitary two-channel system with FIR analysis and synthesis filters of order $M$, each of the lattice structures in Fig. 1 consists of $J + 1$ lattice stages, where $M = 2J + 1$. The $J + 1$ coefficients $\alpha_i$, which are used in both the analysis and synthesis lattice structures, can be found from the impulse responses of the paraunitary filter pair [11]. An important advantage of the QMF lattice is that it reduces the number of multiplication and addition operations per input sample compared to direct-form implementations.
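The structural PR property can be illustrated with a minimal Python sketch. The details are assumptions made for illustration and are not taken from Fig. 1: each stage is modeled as the matrix [[1, -a], [a, 1]], a unit delay sits on the lower branch between analysis stages and on the upper branch between synthesis stages, and the polyphase split simply takes even and odd samples. The function names `lattice_analysis` and `lattice_synthesis` are hypothetical. The coefficients are deliberately coarse (-1.75 and 0.25), yet the output equals the input delayed by $2J$ samples.

```python
import numpy as np

def delay(x):
    """Delay a sequence by one sample (zero initial state, same length)."""
    return np.concatenate(([0.0], x[:-1]))

def lattice_analysis(x, alphas):
    """Assumed lattice form: stage m applies [[1, -a_m], [a_m, 1]]."""
    u = np.asarray(x[0::2], dtype=float)  # even-indexed samples
    v = np.asarray(x[1::2], dtype=float)  # odd-indexed samples
    for m, a in enumerate(alphas):
        if m > 0:
            v = delay(v)                  # delay on the lower branch
        u, v = u - a * v, a * u + v       # two multiplies and two adds
    scale = np.prod([1.0 / np.sqrt(1.0 + a * a) for a in alphas])
    return scale * u, scale * v

def lattice_synthesis(lo, hi, alphas):
    """Transposed stages in reverse order with complementary delays."""
    u, v = np.array(lo, dtype=float), np.array(hi, dtype=float)
    for m in range(len(alphas) - 1, -1, -1):
        a = alphas[m]
        u, v = u + a * v, -a * u + v      # transpose of the analysis stage
        if m > 0:
            u = delay(u)                  # delay on the upper branch
    scale = np.prod([1.0 / np.sqrt(1.0 + a * a) for a in alphas])
    x_hat = np.empty(2 * len(u))
    x_hat[0::2], x_hat[1::2] = scale * u, scale * v
    return x_hat

rng = np.random.default_rng(0)
x = rng.standard_normal(32)               # even-length test signal
alphas = [-1.75, 0.25]                    # coarsely quantized coefficients
lo, hi = lattice_analysis(x, alphas)
x_hat = lattice_synthesis(lo, hi, alphas)
J = len(alphas) - 1
print(np.allclose(x_hat[2 * J:], x[:-2 * J]))
```

In this sketch the scale factor is computed in floating point from the quantized coefficients. If the scale factor were quantized as well, reconstruction would hold up to a constant gain.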

III. FOLDED ARCHITECTURES

In this section, we present folded lattice-based architectures for the orthonormal DWT. The polyphase representations of the analysis and synthesis filter banks for a four-octave DWT are shown in Fig. 2. For the orthonormal DWT, $\mathbf{E}(z)$ and $\mathbf{R}(z)$ in Fig. 2 can be implemented using the lattice structures in Fig. 1(a) and (b), respectively.

To demonstrate lattice-based folded architectures for the orthonormal DWT, two architectures are presented for a four-octave orthonormal DWT analysis filter bank using third-order paraunitary FIR filters, two-stage pipelined adders, and four-stage pipelined multipliers.

1057–7130/97$10.00 © 1997 IEEE

Fig. 2. Block diagram of four-octave DWT. (a) Polyphase implementation of the analysis filter bank. (b) Polyphase implementation of the synthesis filter bank.

Fig. 3. Single-rate implementation of a four-octave wavelet decomposition using third-order FIR filters, two-stage pipelined adders, and four-stage pipelined multipliers. The first octave is executed by the top section of the architecture, and the second, third, and fourth octaves are time-multiplexed to the bottom section of the architecture.

Fig. 3 shows the folded four-octave analysis architecture that is derived using the methodology described in [12]. The five output signals $y_i(n)$, $i = 1, 2, 3, 4, 5$, labeled in Fig. 2 are time-multiplexed to signal $y(n)$ in Fig. 3. An alternative folded four-octave analysis architecture, derived using the methodology described in [13], is shown in Fig. 4. As in Fig. 3, the five output signals are time-multiplexed to signal $y(n)$ in Fig. 4. Each of these architectures uses six multipliers and four adders. The architectures in Figs. 3 and 4 differ mainly in the way the octave computations are scheduled to hardware, as indicated in the figure captions. Each of these architectures has an input sampling rate of one sample every clock cycle. For each of these two analysis architectures, similar synthesis architectures can be derived.
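To give a feel for how several octaves can share hardware, the following Python sketch generates one simple interleaving in which octave $j$ is served once every $2^j$ time slots. It is only an illustration under that assumption. It does not reproduce the schedules of Figs. 3 and 4, which come from the methodologies in [12] and [13], and the names `octave_for_slot` and `schedule` are hypothetical.

```python
def octave_for_slot(t, num_octaves):
    """Return the octave served in slot t >= 1, or None if the slot is idle.

    Octave j is assigned the slots with t % 2**j == 2**(j - 1), so each
    octave recurs with period 2**j and no two octaves ever collide.
    """
    trailing_zeros = (t & -t).bit_length() - 1
    octave = trailing_zeros + 1
    return octave if octave <= num_octaves else None

def schedule(num_slots, num_octaves):
    """List the octave served in each of the slots 1..num_slots."""
    return [octave_for_slot(t, num_octaves) for t in range(1, num_slots + 1)]

# Four octaves over sixteen slots:
# [1, 2, 1, 3, 1, 2, 1, 4, 1, 2, 1, 3, 1, 2, 1, None]
print(schedule(16, 4))
```

Under this assignment the fraction of busy slots for $K$ octaves is $\sum_{i=1}^{K} 2^{-i}$, which is why one slot in every $2^K$ stays idle.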

In general, the number of multipliers and adders required for a folded $K$-octave lattice-based wavelet analysis/synthesis architecture with filters of order $M$ is given by $N^{(f)}_{LM} = 2(M + 3)$ and $N^{(f)}_{LA} = 2(M + 1)$, respectively. For direct-form based architectures in [5]–[8] which have the same input sampling rate as the folded lattice-based architectures (i.e., an input sampling rate of 1 sample every clock cycle), the number of multipliers and adders required is $N^{(f)}_{DM} = 4(M + 1)$ and $N^{(f)}_{DA} = 4M$, respectively. As $M$ becomes large, the number of multipliers and adders used in the folded lattice architecture approaches one-half the number used in the direct-form architectures. This is a significant advantage for applications of the DWT where minimization of area and power is important.
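The operator counts above are easy to tabulate. The short Python sketch below evaluates the four formulas for a few odd filter orders and prints the lattice-to-direct-form ratios, which tend toward one-half as $M$ grows. The function names are hypothetical and the chosen orders are arbitrary examples.

```python
def folded_lattice_ops(M):
    """Multipliers and adders for the folded lattice analysis/synthesis system."""
    return 2 * (M + 3), 2 * (M + 1)

def folded_direct_ops(M):
    """Multipliers and adders for the folded direct-form analysis/synthesis system."""
    return 4 * (M + 1), 4 * M

for M in (3, 7, 15, 63, 255):             # filter order M = 2J + 1 is odd
    lat_mul, lat_add = folded_lattice_ops(M)
    dir_mul, dir_add = folded_direct_ops(M)
    print(f"M={M:3d}  multipliers {lat_mul}/{dir_mul} = {lat_mul / dir_mul:.3f}"
          f"  adders {lat_add}/{dir_add} = {lat_add / dir_add:.3f}")
```

For $M = 3$ the formulas give 12 multipliers and 8 adders for the complete analysis/synthesis system, which is consistent with the six multipliers and four adders quoted for each analysis architecture.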

Table I compares the performance of two three-octave DWT analysis/synthesis systems (found in [6] and [12]) which use third-order FIR paraunitary filters and multipliers which are pipelined by four stages. This table shows that the increase in latency is the main drawback of using the lattice-based folded architectures compared to the direct-form folded architectures in [5]–[8]. The latency is defined as the difference between the time when a sample is input to the analysis filter bank and the time when the corresponding reconstructed sample is output from the synthesis filter bank.

Fig. 4. Single-rate implementation of a four-octave wavelet decomposition using third-order FIR filters, two-stage pipelined adders, and four-stage pipelined multipliers. All four octaves are time-multiplexed to the entire architecture.

TABLE I
PERFORMANCE COMPARISON OF FOLDED DWT ARCHITECTURES FOR THREE-OCTAVE DWT ANALYSIS/SYNTHESIS SYSTEM USING THIRD-ORDER FIR FILTERS AND MULTIPLIERS PIPELINED BY FOUR STAGES

The methodologies in [12] and [13] for designing folded lattice-based DWT architectures allow for pipelining of the hardware functional units at any desired level. This can be used to increase the sample rate or reduce power consumption [14]. The architectures presented in [7] and [8] are not easy to pipeline, while the folded architectures in [6] and [15] can be easily pipelined. However, the architectures in [7] and [8] have regular structures which may lead to more area-efficient layouts. In addition to the advantages outlined in this section, the folded lattice structure also exhibits advantages inherent to the two-channel paraunitary QMF lattice, such as robustness to coefficient quantization.

IV. DIGIT-SERIAL ORTHONORMAL DWT ARCHITECTURES

Digit-serial architectures process more than one but not all of the bits of a sample during each clock cycle [9]. The number of bits processed in a clock cycle is the digit-size. Two special cases of digit-serial processing are word-serial processing, where the digit-size is equal to the wordlength, and bit-serial processing, where the digit-size is unity. In this brief we are interested in digit-serial architectures where the digit-size is a divisor of the wordlength.
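As a software analogy for digit-serial processing, the Python sketch below adds two words one digit per "clock cycle", carrying between cycles. It assumes least-significant-digit-first order and unsigned wraparound arithmetic, and the helper names `to_digits` and `digit_serial_add` are hypothetical. It models the data flow only, not the hardware of [9].

```python
def to_digits(value, wordlength, digit_size):
    """Split a word into digits of digit_size bits, least significant first."""
    if wordlength % digit_size != 0:
        raise ValueError("digit_size must divide wordlength")
    mask = (1 << digit_size) - 1
    return [(value >> shift) & mask for shift in range(0, wordlength, digit_size)]

def digit_serial_add(a, b, wordlength, digit_size):
    """Add two words digit by digit, one digit per simulated clock cycle."""
    mask = (1 << digit_size) - 1
    carry = 0                              # carry saved between cycles
    total = 0
    digits = zip(to_digits(a, wordlength, digit_size),
                 to_digits(b, wordlength, digit_size))
    for cycle, (da, db) in enumerate(digits):
        s = da + db + carry
        total |= (s & mask) << (cycle * digit_size)
        carry = s >> digit_size
    return total                           # final carry dropped (modulo 2**wordlength)

a, b, w = 0b10110101, 0b01101110, 8
for d in (1, 2, 4, 8):                     # bit-serial up to word-serial
    assert digit_serial_add(a, b, w, d) == (a + b) % (1 << w)
print(to_digits(a, w, 2))
```

With `digit_size` equal to 1 the loop behaves bit-serially and takes `wordlength` cycles. With `digit_size` equal to `wordlength` it finishes in a single cycle, matching the word-serial special case.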

The DWT analysis/synthesis system can be implemented digit-serially as in Fig. 5. In this case, the wordlength of $x(n)$ must be a multiple of 8 since digit-size $w/8$ is used and the digit-size is restricted to be a divisor of the wordlength $w$. For the orthonormal DWT, matrices $\mathbf{E}_{w/j}(z^j)$ and $\mathbf{R}_{w/j}(z^j)$ can be implemented using digit-serial implementations of the analysis and synthesis QMF lattice structures shown in Fig. 1(a) and (b), respectively, with digit-size $w/j$.

Fig. 5. Digit-serial implementation of the three-level discrete wavelet transform. (a) Analysis system. (b) Synthesis system. The wordlength is assumed to be a multiple of eight.

As an example of the digit-serial implementation of the QMF lattice structures, consider the implementation of matrix $\mathbf{E}_{w/4}(z^4)$ in Fig. 5(a) where $w = 8$. This is simply the digit-serial implementation of the paraunitary QMF lattice structure in Fig. 1(a) with digit-size $w/4 = 2$. Let $H_0(z)$ be the third-order FIR filter with impulse response $h_0(0) = (1 + \sqrt{3})/(4\sqrt{2})$, $h_0(1) = (3 + \sqrt{3})/(4\sqrt{2})$, $h_0(2) = (3 - \sqrt{3})/(4\sqrt{2})$, and $h_0(3) = (1 - \sqrt{3})/(4\sqrt{2})$, which is the maximally-regular four-tap wavelet filter derived in [2]. With $h_1(n) = (-1)^n h_0(M - n)$, which results from the paraunitary property, we find $\alpha_0 = -1.7320508$ and $\alpha_1 = 0.26794919$ [11]. For certain applications it may be sufficient to implement the DWT analysis/synthesis system using $\hat{\alpha}_0 = -1.75$ and $\hat{\alpha}_1 = 0.25$. Since the QMF lattice is used, the design techniques presented in this brief for folded and digit-serial architectures can be used to synthesize architectures with the PR property in spite of coefficient quantization. Stage 1 is shown in Fig. 6(a). A digit-serial architecture for stage 1 of the lattice is shown in Fig. 6(b) assuming a two's-complement implementation with wordlength $w = 8$ and digit-size $w/4 = 2$. Systematic techniques can also be used to design the digit-serial architectures [9].
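The quoted lattice coefficients can be checked numerically. The Python sketch below assumes a two-stage lattice whose stage $m$ is the matrix [[1, -a_m], [a_m, 1]] with a unit delay on the lower branch and an overall normalizing scale factor. This convention is an assumption and may differ in sign or ordering from Fig. 1. Under it, the lowpass impulse response is proportional to $[1, -a_0, -a_0 a_1, -a_1]$. The closed forms $-\sqrt{3}$ and $2 - \sqrt{3}$ used here are simply values that agree with the quoted decimals.

```python
import numpy as np

def lowpass_from_two_stage_lattice(a0, a1):
    """Lowpass taps of a two-stage lattice under the assumed stage convention."""
    scale = 1.0 / np.sqrt((1.0 + a0 * a0) * (1.0 + a1 * a1))
    return scale * np.array([1.0, -a0, -a0 * a1, -a1])

s3, s2 = np.sqrt(3.0), np.sqrt(2.0)
d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * s2)   # filter from the text

h_exact = lowpass_from_two_stage_lattice(-s3, 2 - s3)        # about -1.7320508, 0.26794919
h_quant = lowpass_from_two_stage_lattice(-1.75, 0.25)        # quantized coefficients

print(np.allclose(h_exact, d4))                # lattice reproduces the four-tap filter
print(np.max(np.abs(h_quant - d4)))            # deviation caused by quantization
# The quantized filter stays unit-norm and orthogonal to its shift by two:
print(np.isclose(np.sum(h_quant ** 2), 1.0),
      np.isclose(h_quant[0] * h_quant[2] + h_quant[1] * h_quant[3], 0.0))
```

The last line shows why quantization changes the frequency response but not the paraunitary property: the orthogonality conditions hold for any pair of coefficient values in this structure.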

If we let $\gamma = \sum_{i=1}^{K} 2^{-i}$, the total number of equivalent bit-parallel multipliers required for the digit-serial lattice implementation of the $K$-octave analysis/synthesis DWT architecture is $N^{(ds)}_{LM} = 2\gamma(M + 3) = \gamma N^{(f)}_{LM}$, and the total number of equivalent bit-parallel adders required is $N^{(ds)}_{LA} = 2\gamma(M + 1) = \gamma N^{(f)}_{LA}$. The direct-form-based digit-serial architecture requires $N^{(ds)}_{DM} = 4\gamma(M + 1) = \gamma N^{(f)}_{DM}$ multipliers and $N^{(ds)}_{DA} = 4\gamma M = \gamma N^{(f)}_{DA}$ adders. Table II summarizes the number of hardware multipliers and adders used by various designs.
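The digit-serial counts can be evaluated exactly with rational arithmetic. In the Python sketch below, the geometric sum $\sum_{i=1}^{K} 2^{-i}$ is computed by a helper called `octave_sum`. That name, the dictionary keys, and the example values $K = 3$ and $M = 3$ are choices made for this illustration only.

```python
from fractions import Fraction

def octave_sum(K):
    """Geometric sum 2**-1 + 2**-2 + ... + 2**-K as an exact fraction."""
    return sum(Fraction(1, 2 ** i) for i in range(1, K + 1))

def digit_serial_ops(K, M):
    """Equivalent bit-parallel operator counts for a K-octave system of order M."""
    g = octave_sum(K)
    return {
        "lattice_multipliers": 2 * g * (M + 3),
        "lattice_adders": 2 * g * (M + 1),
        "direct_multipliers": 4 * g * (M + 1),
        "direct_adders": 4 * g * M,
    }

for name, count in digit_serial_ops(K=3, M=3).items():
    print(f"{name}: {count} (about {float(count):.2f})")
```

Because the sum is always below one, each digit-serial count is strictly smaller than the corresponding folded count, and the lattice-to-direct-form ratio is the same as in the folded case.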

Fig. 6. (a) A single analysis lattice stage using coefficient $\hat{\alpha}_1 = 0.25$. (b) The digit-serial implementation where digit-size is one-fourth the wordlength, which is assumed to be $w = 8$ in this case. The connections shown are single-bit wires. Subscript $i$ denotes bit $i$ of a sample, where bit 0 is the least significant bit.

TABLE II
OPERATORS REQUIRED FOR K-OCTAVE ANALYSIS/SYNTHESIS DWT USING PARAUNITARY FIR FILTERS OF ORDER M FOR VARIOUS IMPLEMENTATION STYLES, WHERE $\gamma = \sum_{i=1}^{K} 2^{-i}$

TABLE III
PERFORMANCE COMPARISON OF DIGIT-SERIAL DWT ARCHITECTURES FOR THREE-OCTAVE ANALYSIS/SYNTHESIS DWT USING THIRD-ORDER FIR FILTERS AND WORDLENGTH 16

As with folded architectures, properties such as system latency and the required number of registers must be taken into account when comparing digit-serial architectures based on the direct-form and lattice implementations. Table III compares these properties for a three-octave wavelet analysis/synthesis system using third-order paraunitary FIR filters and wordlength 16. When comparing the lattice-based folded DWT architectures to the lattice-based digit-serial DWT architectures, we find that the folded architectures offer lower latency and no constraints on wordlength, while the digit-serial architectures offer 100% hardware utilization, lower power consumption, and simpler connectivity.

V. CONCLUSIONS

In this brief we have developed single-rate lattice-based architectures for the orthonormal DWT. By taking advantage of the properties of the QMF lattice developed in [10], these architectures require considerably fewer multipliers and adders than the direct-form architectures presented in [5]–[8] at the expense of increased latency. The lattice-based architectures are ideal for applications of the orthonormal DWT where silicon area and power dissipation are critical.

REFERENCES

[1] S. G. Mallat, “Multifrequency channel decompositions of images and wavelet models,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 2091–2110, Dec. 1989.

[2] I. Daubechies, “Orthonormal bases of compactly supported wavelets,” Comm. Pure Applied Math., vol. 41, pp. 909–996, Nov. 1988.

[3] G. Strang, “Wavelets and signal processing,” IEEE Signal Processing Mag., pp. 14–38, Oct. 1991.

[4] O. Rioul and M. Vetterli, “Wavelets and signal processing,” IEEE Signal Processing Mag., pp. 14–38, Oct. 1991.

[5] G. Knowles, “VLSI architecture for the discrete wavelet transform,” Electron. Lett., vol. 26, pp. 1184–1185, July 1990.

[6] K. K. Parhi and T. Nishitani, “VLSI architectures for discrete wavelet transforms,” IEEE Trans. VLSI Syst., vol. 1, pp. 191–202, June 1993.

[7] M. Vishwanath, R. M. Owens, and M. J. Irwin, “VLSI architectures for the discrete wavelet transform,” IEEE Trans. Circuits Syst. II, vol. 42, pp. 305–316, May 1995.

[8] J. Fridman and E. Manolakos, “Distributed memory and control VLSI architectures for the 1-D discrete wavelet transform,” in VLSI Signal Processing, VII, pp. 388–397, 1994.

[9] K. K. Parhi, “A systematic approach for design of digit-serial signal processing architectures,” IEEE Trans. Circuits Syst., vol. 38, pp. 358–375, Apr. 1991.

[10] P. P. Vaidyanathan and P. Hoang, “Lattice structures for optimal design and robust implementation of two-channel perfect reconstruction QMF banks,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, pp. 81–94, Jan. 1988.

[11] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice Hall, 1993.

[12] T. C. Denk and K. K. Parhi, “Architectures for lattice structure based orthonormal discrete wavelet transforms,” in Proc. 1994 IEEE Int. Conf. Application-Specific Array Processing. San Francisco, CA: IEEE Computer Society Press, Aug. 1994, pp. 259–270.

[13] T. C. Denk and K. K. Parhi, “Systematic design of architectures for M-ary tree-structured filter banks,” in VLSI Signal Processing, VIII, T. Nishitani and K. Parhi, Eds. Piscataway, NJ: IEEE Press, Oct. 1995, pp. 157–166.

[14] A. Chandrakasan, S. Sheng, and R. Brodersen, “Low-power CMOS digital design,” IEEE J. Solid-State Circuits, vol. 27, pp. 473–484, Apr. 1992.

[15] C. Chakrabarti and M. Vishwanath, “Efficient realizations of the discrete and continuous wavelet transforms: From single chip implementations to mappings on SIMD array computers,” IEEE Trans. Signal Processing, vol. 43, pp. 759–771, Mar. 1995.
