
Signal Processing 83 (2003) 1093–1103

www.elsevier.com/locate/sigpro

Generalized subband decomposition adaptive "lters forsparse systemsKutluy&l Do(gan)cay

School of Electrical and Information Engineering, University of South Australia, Mawson Lakes SA 5095, Australia

Received 15 July 2002; received in revised form 6 January 2003

Abstract

Transform domain adaptive filtering algorithms can provide significant improvement in the convergence rate of time domain adaptive filters such as the least-mean-square (LMS) algorithm for coloured input signals. For sparse systems, the convergence rate can be further increased if the active region of the system response is identified. A number of fast-converging time domain adaptive filtering algorithms have been developed in the past for sparse systems. Because sparse systems in the time domain do not necessarily translate into sparse systems in the transform domain, fast-converging time domain algorithms cannot be applied to transform domain algorithms directly. In this paper we show that if a generalized subband decomposition structure is employed, the sparsity of the system response can be preserved in the transform domain. A fast-converging algorithm based on the proportionate normalized LMS algorithm is developed for generalized subband decomposition and its effectiveness is demonstrated in computer simulations.
© 2003 Elsevier Science B.V. All rights reserved.

Keywords: Transform domain adaptive filters; Sparse adaptive filters; Generalized subband decomposition; Coefficient drift

1. Introduction

Sparse system identification is an important application of adaptive filters in communication systems, echo cancellation and geophysical exploration. Sparse systems are characterized by long impulse responses with a small number of non-zero samples. The non-zero samples constitute the active region of the system response. For simple stochastic gradient-based algorithms such as the least-mean-square (LMS) algorithm, the convergence rate of the adaptive filter decreases with increasing filter length [17]. If zero regions of a sparse system can be located, the identification of its impulse response can be performed by using a shorter adaptive filter. The advantage of using a shorter filter is two-fold, viz., reduced complexity and faster convergence. In time domain adaptive filtering, various approaches exist for sparse system identification. In the transform domain, the same approaches cannot be used directly unless the transform is known to preserve the sparsity of the system. Transform domain adaptive filtering has the advantage of improved convergence rate with respect to time domain adaptive filters for correlated (coloured) input signals with non-flat power spectra [10]. For sparse systems, the convergence rate of transform domain adaptive filters can be further increased if the sparsity is exploited.

E-mail address: [email protected] (K. Doğançay).

The previous work in the area of adaptive sparse system identification includes: (i) an adaptive delay

0165-1684/03/$ - see front matter © 2003 Elsevier Science B.V. All rights reserved. doi:10.1016/S0165-1684(03)00015-X


"lter approach [2] that locates signi"cant samples ofa sparse "nite impulse response (FIR) "lter by meansof adapting the delay, as well as the amplitude of in-dividual "lter coe>cients; (ii) a least-squares-baseddetection algorithm [8,9] to identify active taps of asparse FIR system; (iii) a transform domain adaptive"lter using the Haar transform [1] to recursively iden-tify non-zero coe>cients of a sparse system; (iv) aproportionate NLMS (PNLMS) algorithm [4,5] thatchanges the stepsize for individual adaptive "lter coef-"cients proportionately with the estimated magnitudeof the system response given by the current "lter coef-"cients; and (v) a proportional weight LMS algorithm[13] which is similar to the PNLMS algorithm. Theadvantage of the last two algorithms over the others isthat the active regions of a sparse system do not haveto be identi"ed explicitly, thereby eliminating someof the problems associated with misses in active re-gion detection that increase the mean-square error ofthe adaptation algorithm.The main di>culty with exploiting sparsity in trans-

form domain adaptive "lters is the translation of timedomain sparsity to transform domain sparsity. Forexample, the transform domain LMS (TD-LMS) algo-rithm [14] cannot take advantage of sparse systems be-cause it is not capable of preserving the sparsity in thetransform domain. However, as we will see later in thepaper, the generalized subband decomposition (GSD)structure [15], which is a generalization of TD-LMS, iscapable of preserving the sparsity provided the trans-form size and the sub"lter sparsity factor are identi-cal, which corresponds to the case of non-overlappingtransforms for the "lter input vector. In this case, GSDwill have a unique coe>cient vector corresponding toa given time domain response. If the sub"lter spar-sity factor is chosen less than the transform size, GSDwill no longer have a unique coe>cient vector for agiven desired "lter response although some improve-ment can be expected in the convergence rate [15].This latter case is not desirable in practical "nite pre-cision implementations because of the likelihood ofarithmetic overEow due to coe>cient drifts [11]. An-other disadvantage of this case is that sparsity can-not be preserved in the transform domain, ruling outthe possibility of convergence rate improvement forsparse systems. In this paper we show that the GSDstructure with unique solution lends itself readily toadaptive sparse system identi"cation in the transform

domain. We demonstrate this by developing a propor-tionate LMS algorithm for the GSD structure.The paper starts with an overview of the GSD-LMS

algorithm. In Section 3, the relationship between theGSD coe>cients and the impulse response of the cor-responding time domain FIR "lter is explored. Theuniqueness of the GSD solution and its implicationon time domain and transform domain sparsity is ex-plained in Section 4. The convergence speed for GSDadaptive "lters is studied in Section 5 by way of eigen-spread analysis. In Section 6, a new GSD-LMS algo-rithm is developed, based on the PNLMS algorithm,to take advantage of sparse systems. Section 7 in-cludes simulation studies to show the non-uniquenessof the GSD-LMS solution and to demonstrate theconvergence improvement achieved by the newGSD-LMS algorithm for speech-like signals andsparse network echo channels recommended by ITU-TG.168 [3].

2. Overview of the GSD-LMS algorithm

The GSD structure for adaptive system identification is shown in Fig. 1 [15]. GSD uses an M × M fixed unitary transform matrix T (i.e., T^H T = I where I is the identity matrix) which acts as an analysis filterbank. The transform T can be obtained from a discrete-time transform such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT) or the discrete Hartley transform (DHT), to name but a few. Indeed, any unitary matrix is a candidate for T. The optimal transform is obtained from the autocorrelation matrix of the input signal, and is known as the Karhunen–Loève transform (KLT). Because the KLT requires prior knowledge of the second-order statistics of the input signal and is difficult to compute on-the-fly, it is not used in practical implementations. The output of the transform is given by

v(k) = T x(k),   (1)

where x(k) = [x(k), x(k − 1), …, x(k − M + 1)]^T and v(k) = [v_0(k), v_1(k), …, v_{M−1}(k)]^T. The transform outputs v_i(k) are applied to sparse adaptive FIR subfilters with transfer functions W_i(z^L) (see Fig. 1). Each subfilter has K non-zero coefficients w_{i,0}(k), w_{i,1}(k), …, w_{i,K−1}(k) spaced by the sparsity factor L.


[Figure: the input x(k) drives an M × M unitary transform T; the transform outputs v_0(k), v_1(k), …, v_{M−1}(k) feed sparse subfilters W_0(z^L), W_1(z^L), …, W_{M−1}(z^L), whose outputs are summed to form y(k).]

Fig. 1. Generalized subband decomposition structure for adaptive filters.

At time k, the transfer functions of the adaptive subfilters are given by

W_i(z^L) = \sum_{j=0}^{K-1} w_{i,j}(k) z^{-jL},   i = 0, 1, …, M − 1.

For L ≤ M, the relationship between the filter parameters is given by

N = (K − 1)L + M,   (2)

where N is the effective length of the adaptive filter in the time domain. The total number of adaptive filter coefficients is KM. If L < M, the equivalent length N of the GSD structure is smaller than the number of coefficients KM. The output of the adaptive filter is

y(k) = ω^T(k) φ(k),   (3)

where

ω(k) = [w_0^T(k), w_1^T(k), …, w_{K−1}^T(k)]^T,
φ(k) = [v^T(k), v^T(k − L), …, v^T(k − (K − 1)L)]^T,   (4)

are both KM × 1 vectors, and w_i(k) is the M × 1 vector

w_i(k) = [w_{0,i}(k), w_{1,i}(k), …, w_{M−1,i}(k)]^T.

The error signal of the adaptive filter is e(k) = d(k) − y(k), where d(k) is the desired filter response. The GSD-LMS algorithm is given by [15]

ω(k + 1) = ω(k) + μ e(k) Γ^{−2} φ*(k),   (5)

where Γ² is the KM × KM augmented diagonal power matrix of transform outputs:

Γ² = diag{Σ², Σ², …, Σ²}   (KM × KM, K diagonal blocks).   (6)

Here the M × M power matrix Σ² is defined by Σ² = diag{σ_0², σ_1², …, σ_{M−1}²}, where σ_i² = E{|v_i(k)|²}, assuming a stationary input signal. The power matrix Σ² can be estimated online using a sliding exponential window:

σ_i²(k) = λ σ_i²(k − 1) + |v_i(k)|²,   i = 0, …, M − 1,   (7)

where 0 < λ < 1 is the forgetting factor. The effective length of the window is proportional to 1/(1 − λ) [7]. For non-stationary input signals, (7) enables tracking of changes in the signal power levels. Using (7),


(6) can be replaced by

Γ²(k) = diag{Σ²(k), Σ²(k − L), …, Σ²(k − (K − 1)L)}   (KM × KM),

where Σ²(k) = diag{σ_0²(k), σ_1²(k), …, σ_{M−1}²(k)}. Note that the diagonal partitions of Γ²(k) use power estimates at different time instants spaced by L. This is consistent with the way φ(k) is defined in (4) since each transform output vector in φ(k) has to be power normalized by its respective power estimate.

The TD-LMS algorithm is a special case of the

GSD-LMS algorithm. Indeed, setting M = N and K = 1 reduces GSD-LMS to TD-LMS. The GSD-LMS algorithm not only has a lower computational complexity than TD-LMS, but also has a convergence rate very close to that of the TD-LMS algorithm [15]. The main disadvantages of TD-LMS that make it impractical are the large computational complexity associated with the divisions required for power normalization and the inability to preserve the sparsity of a time domain FIR system in the transform domain.
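To make the recursion concrete, the following NumPy sketch implements GSD-LMS under a few assumptions of ours: the orthonormal DCT serves as T, the input signal is real-valued (so φ*(k) = φ(k)), and all function and variable names (gsd_lms, v_hist, and so on) are our own rather than the paper's.

```python
import numpy as np
from scipy.fft import dct

def gsd_lms(x, d, M, L, K, mu=0.01, lam=0.99, eps=1e-6, w0=None):
    """Minimal GSD-LMS sketch: adapt to input x and desired response d."""
    T = dct(np.eye(M), axis=0, norm='ortho')    # M x M unitary transform
    w = np.zeros(K * M) if w0 is None else np.asarray(w0, float).copy()
    sigma2 = np.full(M, eps)                    # running power estimates
    # delay lines holding v(k), v(k-1), ..., v(k-(K-1)L) and their powers
    v_hist = np.zeros(((K - 1) * L + 1, M))
    p_hist = np.full_like(v_hist, eps)
    e = np.zeros(len(x))
    for k in range(M - 1, len(x)):
        xk = x[k - M + 1:k + 1][::-1]           # [x(k), ..., x(k-M+1)]
        v = T @ xk                              # eq. (1)
        sigma2 = lam * sigma2 + v ** 2          # eq. (7), real input
        v_hist = np.roll(v_hist, 1, axis=0); v_hist[0] = v
        p_hist = np.roll(p_hist, 1, axis=0); p_hist[0] = sigma2
        phi = v_hist[::L].ravel()               # phi(k) = [v(k); v(k-L); ...]
        g2 = p_hist[::L].ravel()                # diagonal of Gamma^2(k)
        e[k] = d[k] - w @ phi                   # eq. (3) and error signal
        w += mu * e[k] * phi / g2               # eq. (5)
    return w, e
```

Calling gsd_lms with M = N and K = 1 recovers TD-LMS, per the special case noted above.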

3. Relationship between GSD coefficients and time domain response

In this section we derive the relation of the GSD coefficients to the impulse response of an equivalent time domain FIR filter. Substituting (1) into (3), the output of the GSD structure can be written as

y(k) = ω^T(k) F χ(k),   (8)

where F = diag{T, T, …, T} is the KM × KM block diagonal matrix with K diagonal blocks of T, and

χ(k) = [x^T(k), x^T(k − L), …, x^T(k − (K − 1)L)]^T

is the KM × 1 stacked input vector, so that φ(k) = F χ(k).

In the time domain, this corresponds to the following input–output relationship:

y(k) = h^T(k) [x(k), x(k − 1), …, x(k − N + 1)]^T,   (9)

where h(k) is the N × 1 impulse response vector of the equivalent FIR filter that represents the GSD structure:

h(k) = [h_0(k), h_1(k), …, h_{N−1}(k)]^T.

In order to "nd the link between the GSD coe>-cients !(k) and the corresponding time domain FIR"lter coe>cients h(k), let us rewrite (9) as

y(k) = sT(k) (k); (10)

where the KM × 1 vector s(k) can be partitioned intoM × 1 vectors si(k):

s(k) =

s0(k)

s1(k)

...

sK−1(k)

KM×1

:

A comparison of (9) and (10) yields the followingmatrix equation whose solution gives the si(k):

s0(k) 0L×1 0L×1 · · · 0L×1

s1(k) 0L×1 · · · 0L×1

s2(k). . .

...

. . . 0L×1

0 sK−1(k)

N×K

1

1

1

...

1

K×1

= h(k): (11)


For L < M, the solution for s(k) is not unique because there will be more unknowns than equations in (11). If L = M, we get s(k) = h(k), i.e., the solution for s(k) is uniquely given by the time domain impulse response of the equivalent FIR filter. Consider the N × K matrix in (11) that contains the column partitions s_i(k) of s(k). If this matrix has only one s(k) entry in a given row j (i.e., L = M and none of the column partitions s_i(k) overlap), then that entry must be equal to h_{j−1}(k). If, in a given row j, there is more than one s(k) entry (i.e., L < M and the column partitions s_i(k) overlap), then their sum must be equal to h_{j−1}(k). In this case, it is possible to find infinitely many combinations for these entries of s(k) that will satisfy (11), implying a non-unique solution for s(k).

From (8) and (10), we see that the GSD coefficients ω(k) are related to s(k) through

ω(k) = F* s(k).   (12)

If L = M, s(k) is uniquely specified, i.e., the GSD solution will be uniquely given by ω(k) = F* h(k). If L < M, however, s(k) is non-unique for a given h(k), rendering the GSD solution ω(k) non-unique.

The following example illustrates the link between ω(k) and h(k).

3.1. Example 1

Suppose that the GSD parameters are M = 3, L = 2 and K = 3 with N = 7. Eq. (11) then becomes

\begin{bmatrix}
s_{00}(k) & 0 & 0\\
s_{01}(k) & 0 & 0\\
s_{02}(k) & s_{10}(k) & 0\\
0 & s_{11}(k) & 0\\
0 & s_{12}(k) & s_{20}(k)\\
0 & 0 & s_{21}(k)\\
0 & 0 & s_{22}(k)
\end{bmatrix}_{7\times 3}
\begin{bmatrix}1\\1\\1\end{bmatrix}
=
\begin{bmatrix}h_0(k)\\h_1(k)\\h_2(k)\\h_3(k)\\h_4(k)\\h_5(k)\\h_6(k)\end{bmatrix},

where s_i(k) = [s_{i0}(k), s_{i1}(k), s_{i2}(k)]^T, i = 0, 1, 2. The solution for s(k) is given by

s(k) = [h_0(k), h_1(k), α, h_2(k) − α, h_3(k), β, h_4(k) − β, h_5(k), h_6(k)]^T,

where α and β can take on any value, making s(k) non-unique. The GSD coefficients ω(k) are given by (12). Because s(k) is non-unique, so is ω(k).

Suppose now that the GSD parameters are M = L = 2 and K = 3 with N = 6. In this case, (11) takes the form

\begin{bmatrix}
s_{00}(k) & 0 & 0\\
s_{01}(k) & 0 & 0\\
0 & s_{10}(k) & 0\\
0 & s_{11}(k) & 0\\
0 & 0 & s_{20}(k)\\
0 & 0 & s_{21}(k)
\end{bmatrix}_{6\times 3}
\begin{bmatrix}1\\1\\1\end{bmatrix}
=
\begin{bmatrix}h_0(k)\\h_1(k)\\h_2(k)\\h_3(k)\\h_4(k)\\h_5(k)\end{bmatrix},

which results in the unique solution s(k) = h(k). The GSD coefficients ω(k) are uniquely given by (12).
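Example 1 is easy to check numerically. The sketch below, our own construction rather than anything from the paper, multiplies out the left-hand side of (11) and confirms that for M = 3, L = 2, K = 3 two different choices of s(k) reproduce the same h(k):

```python
import numpy as np

def s_to_h(s, M, L, K):
    # Superpose the K length-M partitions of s at offsets 0, L, 2L, ...
    # (the left-hand side of eq. (11) multiplied out).
    h = np.zeros((K - 1) * L + M)
    for i in range(K):
        h[i * L:i * L + M] += s[i * M:(i + 1) * M]
    return h

M, L, K = 3, 2, 3                     # overlapping partitions (L < M)
rng = np.random.default_rng(0)
h = rng.standard_normal((K - 1) * L + M)
alpha, beta = 0.7, -1.3               # arbitrary values, as in Example 1
s1 = np.array([h[0], h[1], 0.0, h[2], h[3], 0.0, h[4], h[5], h[6]])
s2 = np.array([h[0], h[1], alpha, h[2] - alpha, h[3], beta,
               h[4] - beta, h[5], h[6]])
assert np.allclose(s_to_h(s1, M, L, K), h)
assert np.allclose(s_to_h(s2, M, L, K), h)   # s(k) is non-unique for L < M
```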

4. Sparse systems and GSD

If GSD with L < M is applied to a sparse system, there is no guarantee that the GSD solution will also be sparse, because of the non-uniqueness of the solution. The non-uniqueness of the GSD solution can also lead to potentially catastrophic coefficient drifts in finite precision implementations of the adaptive GSD structure. The sparsity of a time domain system can be maintained in the transform domain only if the transform size is equal to the subfilter sparsity factor (i.e., L = M), which ensures that the GSD solution is unique and that successive transforms do not overlap in the time domain. The following example illustrates these observations.

4.1. Example 2

Consider a GSD structure with M = 3, L = 2 and K = 3. Assume that we have h(k) = [0, 0, 0.5, 1, 0, 0, 0]^T, which is a sparse system. This yields s(k) = [0, 0, α, 0.5 − α, 1, β, −β, 0, 0]^T, where α and β are arbitrary real numbers. The 3 × 3 transform T is applied to filter input blocks of length 3 shifted by 2, which


implies overlapping input blocks for the unitary transform. The GSD coefficients for this sparse system are given by

ω(k) = F* s(k),   F = diag{T, T, T} (9 × 9),   s(k) = [0, 0, α, 0.5 − α, 1, β, −β, 0, 0]^T,   α, β ∈ ℝ.

In general, the entries of ω(k) will be non-zero, i.e., ω(k) will not be sparse, even though h(k) is. What is more, because α and β are arbitrary, some entries of ω(k) can be extremely large, leading to potential arithmetic overflow problems in finite precision implementations.

The only condition that guarantees sparsity for ω(k) is M = L. Assuming that M = L = 2 in the above, we get s(k) = [0, 0, 0.5, 1, 0, 0]^T and

ω(k) = [0, 0, ∗, ∗, 0, 0]^T,

where ∗ represents non-zero entries of ω(k). The 2 × 2 transform T operates on non-overlapping filter input blocks of length 2. In this case, the zero partitions s_0(k) and s_2(k) map into zero partitions in the GSD solution, i.e., the sparsity of the time domain response is maintained in the transform domain.
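A direct check of Example 2, assuming the orthonormal DCT for T (any unitary T would do):

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import block_diag

M = L = 2; K = 3
T = dct(np.eye(M), axis=0, norm='ortho')       # 2 x 2 orthonormal DCT
F = block_diag(T, T, T)                        # block diagonal F in (12)
h = np.array([0.0, 0.0, 0.5, 1.0, 0.0, 0.0])   # sparse h(k) with M = L
omega = F @ h                  # omega = F* h; T is real, so F* = F
print(omega.round(6))          # only the middle M entries are non-zero
```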

4.2. Sparsity in the transform domain

Based on the results of Example 2 in relation to the GSD structure's ability to preserve the time domain sparsity in the transform domain, we formally have the following result:

Observation 1. Supposing M = L, if

[h_{iM}(k), h_{iM+1}(k), …, h_{(i+1)M−1}(k)]^T = 0   (13a)

for any i ∈ {0, 1, …, K − 1}, then we have

w_i(k) = 0,   (13b)

which is the ith M × 1 column vector partition of the GSD coefficient vector ω(k).

Note that not every zero partition of length M in h(k) is guaranteed to yield a zero partition in ω(k) unless it satisfies the definition in (13a). For example, if there are some zero regions of length M that fall between successive blocks defined in (13a), then no corresponding zero blocks are guaranteed to occur in the transform domain. To alleviate the problem that may be caused by this blockwise mapping of zero regions, it is desirable to choose a small M. However, for grossly sparse systems such as network echo paths, the choice of M is not so critical.

5. Convergence of GSD adaptive filters

The convergence speed of the GSD-LMS algorithm is dependent on the whitening (decorrelation) ability of the transform matrix T, which is reflected by the eigenspread (condition number) of the transform output autocorrelation matrix after power normalization. For the GSD structure, the autocorrelation matrix of the transform output φ(k) is

R_φ = F_x R F_x^H,   (14)

where

F_x = \begin{bmatrix}
T & & & & 0\\
0_{M\times L} & T & & & \\
0_{M\times L} & 0_{M\times L} & T & & \\
\vdots & & \ddots & \ddots & \\
0_{M\times L} & 0_{M\times L} & \cdots & 0_{M\times L} & T
\end{bmatrix}_{KM\times N}

is the KM × N matrix whose ith M-row block contains T in columns iL + 1 through iL + M, and

R = E{[x(k), x(k − 1), …, x(k − N + 1)]^T [x*(k), x*(k − 1), …, x*(k − N + 1)]}

is the N × N input autocorrelation matrix. Note that the transform matrices T in F_x are not diagonally aligned unless L = M. The MK × MK matrix R_φ is rank-deficient by MK − N, assuming that R is full-rank (i.e., the filter input is persistently exciting).


Table 1
Normalized autocorrelation values for typical speech signal

i   r_i            i    r_i
0   1.00000000     8   −0.09794104
1   0.91189350     9   −0.21197350
2   0.75982820    10   −0.30446960
3   0.59792770    11   −0.34471370
4   0.41953610    12   −0.34736840
5   0.27267350    13   −0.32881280
6   0.13446390    14   −0.29269750
7   0.00821722    15   −0.24512650

The autocorrelation matrix of the power normalized transform output, Γ^{−2} R_φ, will also be rank-deficient by MK − N (i.e., it will have MK − N zero eigenvalues). For L = M, F_x is a square block diagonal matrix, implying a full-rank R_φ. This is in agreement with the previous observation that the GSD coefficient vector is not unique if L < M (i.e., F_x has more rows than columns), and a unique GSD solution exists if L = M. The condition number (eigenspread) of Γ^{−2} R_φ, ignoring any zero eigenvalues, determines the convergence speed of GSD-LMS. Zero eigenvalues do not affect the convergence speed apart from making the solution non-unique. The number of zero eigenvalues gives the dimension of the null space of the solution. In general, the closer the condition number is to unity, the faster the convergence.

5.1. Example 3

Assume that the input signal x(k) to a GSD-LMS adaptive filter is a speech signal with the normalized autocorrelation values listed in Table 1 [14].

Table 2
Eigenspread for N = 15

GSD parameters        Eigenspread of
M    L    K       R      Γ^{−2}R_φ (DCT)   Γ^{−2}R_φ (DFT)
12   3    2     302.2        187.4             58.8
10   5    2     308.5         38.2             49.1
 8   7    2     241.7         28.3             59.6
 9   3    3     340.2         28.8             60.5
 5   5    3     221.2         37.1             70.5
12   1    4     323.0         59.7             65.0
 3   3    5     221.2         64.5             78.5

The eigenspreads of R and of Γ^{−2} R_φ for the DCT and DFT are listed in Table 2 for various GSD parameter combinations producing N = 15. The DCT appears to yield better whitening (lower eigenspread) than the DFT. This is not surprising because of the better spectral compression capability of the DCT for speech signals [14].

There is no consistent relationship between the GSD parameters and the resulting eigenspread. This emphasizes the need for careful selection of the GSD parameters based on any prior information available about the input signal. Because transform matrices derived from discrete transforms such as the DFT and DCT are fixed and signal-independent, the availability of prior information becomes more important [12].

In Table 2, only the GSD parameter combinations for which L = M are capable of maintaining sparsity of the system response in the transform domain. The combinations M = L = 5 and M = L = 3 are the two possibilities for N = 15. For these combinations, the achieved reduction in eigenspread with respect to the input signal is comparable to the other combinations for which L < M. However, on the whole, choosing L < M appears to achieve a better eigenspread reduction.

6. GSD-LMS for sparse systems

In this section we modify the GSD-LMS algorithm in order to exploit the system sparsity in the transform domain. As we have already seen, the sparsity and uniqueness of the GSD solution are guaranteed if the


two GSD parameters L and M are identical. Therefore, throughout this section we will assume L = M. The transform domain adaptation algorithm developed in this section is based on the PNLMS algorithm [4,5], which exploits the sparsity of the system response to speed up the convergence of the NLMS algorithm by weighting the regressor data proportionately with the estimated magnitude of the system impulse response.

We will derive the GSD proportionate LMS (GSD-PLMS) algorithm, the transform domain counterpart of PNLMS, in its own right from the solution of a weighted constrained minimization problem. To this end, we rewrite (5) as

Γ ω(k + 1) = Γ ω(k) + μ e(k) Γ^{−1} φ*(k),   (15)

where the power normalization of the transform output is shown explicitly. Defining

z(k) ≜ Γ ω(k)   and   u(k) ≜ Γ^{−1} φ(k),   (16)

(15) becomes

z(k + 1) = z(k) + μ e(k) u*(k),

which is in the form of a stochastic descent-based adaptive filtering algorithm.

Reminiscent of the NLMS algorithm [6], we can

derive the GSD-PLMS algorithm from the solution of a constrained minimization problem. However, to favour a sparse solution, we introduce a weighting into the cost function to be minimized. The weighting is given by the inverse of the diagonal matrix P(k) that is obtained from the adaptive filter coefficients and has diagonal entries proportional to the magnitudes of the filter coefficients. The weighted constrained minimization problem is defined by

min_{z(k+1)}  ε^H(k) P^{−1}(k) ε(k)   (17a)

s.t.  z^T(k + 1) u(k) = d(k),   (17b)

where ε(k) ≜ z(k + 1) − z(k) is the change in the coefficient vectors after the update at time k. Eq. (17) requires the solution vector z(k + 1) to deviate from z(k) in a minimal fashion in the weighted ℓ_2 norm sense while yielding zero a posteriori error at the filter output. This property of (17) can be regarded as a manifestation of the principle of minimal disturbance [7,16].

The constrained minimization problem in (17) can

be solved by using the method of Lagrange multipliers.

The cost function to be minimized is

J(k) = ε^H(k) P^{−1}(k) ε(k) + η (z^T(k + 1) u(k) − d(k)),

where η is a Lagrange multiplier. Solving ∂J(k)/∂z(k + 1) = 0 and ∂J(k)/∂η = 0 for z(k + 1) and η, and introducing a small stepsize μ to control the speed of convergence, we get

z(k + 1) = z(k) + μ e(k) P(k) u*(k) / (u^H(k) P(k) u(k)).   (18)

Using (16) in the above equation and ignoring the stepsize normalization in the denominator of the update term, we obtain the GSD-PLMS algorithm:

ω(k + 1) = ω(k) + μ e(k) P(k) Γ^{−2} φ*(k).   (19)

We remark that the stepsize normalization in (18) is redundant because the entries of the input vector u(k) are approximately white thanks to the decorrelation property of the transform matrix T. For white input signals, the stepsize normalization provides no improvement in the convergence speed.

While P(k) can be chosen as any reasonable diagonal matrix with non-negative entries, in the case of GSD-PLMS the diagonal entries of P(k) are chosen to approximate the magnitude of the estimated system response. This results in an uneven distribution of the available energy in the regressor vector over the adaptive filter coefficients. If we set P(k) = I, GSD-PLMS reduces to GSD-LMS.

Noting the partitioning of ω(k) into K column vectors w_0(k), …, w_{K−1}(k) in (4) and the sparsity relationship between h(k) and ω(k) given by (13), the proportionate weights are also partitioned into K blocks. Accordingly, the entries of P(k) will be defined on the basis of K blocks. This is slightly different from the weighting matrix P(k) of PNLMS, where each diagonal entry is proportional to the corresponding filter coefficient magnitude. To facilitate partitioning of P(k) into K blocks, let us define

c_i(k) = \sum_{j=0}^{M-1} |w_{j,i}(k)|,   i = 0, …, K − 1,

where the c_i(k) represent the sum of coefficient magnitudes for each partition of ω(k).

Following the same line of development as in PNLMS [5], the entries of the diagonal weighting matrix P(k) are calculated at every iteration


according to

l_∞(k) = max{δ_p, c_0(k), …, c_{K−1}(k)},   (20a)

γ_i(k) = max{ρ l_∞(k), c_i(k)},   0 ≤ i ≤ K − 1,   (20b)

p_i(k) = K γ_i(k) / \sum_{j=0}^{K-1} γ_j(k),   0 ≤ i ≤ K − 1,   (20c)

where "p and $ are the GSD-PLMS parameters thate:ect small-signal regularization, and the pi(k) arethe diagonal block entries of P(k) de"ned by

P(k) =

p0IM×M 0

p1IM×M

. . .

0 pK−1IM×M

KM×KM

:

The parameter "p prevents the algorithm from mis-behaving when the GSD coe>cients !(k) are verysmall as at initialization, and $ prevents individual "l-ter coe>cients from freezing (i.e., being never adaptedagain) when their magnitude is much smaller thanthe largest coe>cient l∞ [5]. Typical values for thePNLMS parameters are "p = 0:01 and $ = 5=K . If$¿ 1, then we get pi(k) = 1, i.e., P(k) = I , andGSD-PLMS behaves like GSD-LMS.

7. Simulations

7.1. Non-uniqueness of GSD-LMS solution

We have seen in Section 5 that if L < M, the GSD coefficients are not unique because of the presence of zero eigenvalues in the autocorrelation matrix of the transformed signal, which is also referred to as the lack of persistent excitation. The lack of persistent excitation is known to cause the coefficient drift phenomenon in practical implementations of adaptive filters [11]. The reason for coefficient drifts is the existence of a continuum of stable points on the adaptive filter cost function topology, caused by the zero eigenvalues, that accommodates all possible solutions of the adaptive filter with the same mean squared error performance. Therefore, the filter coefficients are free to move on this flat hyperdimensional cost function surface without affecting the mean-squared error for the filter output.

[Figure: converged GSD coefficient values plotted against sample index.]

Fig. 2. GSD-LMS solution for all-zeros initialization.

"lter output. The movement of the coe>cients can betriggered by small perturbations due to, for example,quantization errors in "nite precision implementations.In this section, we demonstrate the non-uniqueness

of the GSD-LMS solution in an adaptive systemidenti"cation application. The non-uniqueness of theGSD-LMS solution is the reason for potential coef-"cient drifts in the adaptive "lter. Coe>cient driftstypically occur over long time periods, so the bestapproach to their simulation is through the con"rma-tion of the existence of in"nitely many solutions. Thesystem to be identi"ed is the active part of a networkecho channel with 64 samples. The GSD parame-ters were set to M = 8, L = 7, K = 9 with N = 64.To see the non-uniqueness of the GSD solution, weinitialized the GSD-LMS algorithm to two di:erentsettings, viz., all zeros and random values drawn froma zero-mean Gaussian random process. The solutionsto which GSD-LMS converged are shown in Figs. 2and 3. For both solutions, the mean-squared error is−66 dB. Note the signi"cant di:erence between thetwo GSD solutions, implying the non-uniqueness ofthe solution.

7.2. Sparse echo channel estimation

In this section we use input signals with speech-like spectra recommended by USASI (USA Standards Institute) and sparse network echo channels specified in ITU-T G.168 [3]. The convergence curves for the


[Figure: converged GSD coefficient values plotted against sample index.]

Fig. 3. GSD-LMS solution for random initialization.

[Figure: network echo path impulse response plotted against sample index.]

Fig. 4. Sparse network echo channel.

GSD-LMS and GSD-PLMS algorithms are obtained for two network echo channels. The DCT is used as the unitary transform in view of its good performance for speech signals.

Figs. 4 and 5 show the network echo channels used. In Fig. 4, the flat delay is 128 samples and in Fig. 5 it is 83 samples. The active regions are 64 and 96 samples long in Figs. 4 and 5, respectively. Fig. 4 represents a typical sparse echo channel for testing purposes, and Fig. 5 is a measured echo channel in the US. For both GSD-LMS and GSD-PLMS, the GSD parameters were set to M = L = 8 and K = 32, which corresponds to N = 256.

[Figure: network echo path impulse response plotted against sample index.]

Fig. 5. Measured network echo channel.

[Figure: mean squared error learning curves of GSD-LMS and GSD-PLMS plotted against iteration number.]

Fig. 6. Learning curves for GSD-LMS and GSD-PLMS for echo channel in Fig. 4.

The signal-to-noise ratio at the echo channel output is 30 dB. The exponential window forgetting factor for power estimation is λ = 0.99. The stepsize was set to μ = 0.0020 for GSD-LMS and to μ = 0.0025 for GSD-PLMS, which ensures the same mean squared error for both algorithms on convergence. The small-signal regularization parameters of GSD-PLMS were ρ = 5/K and δ_p = 0.01, as prescribed in Section 6. The learning curves for GSD-LMS and GSD-PLMS are shown in Figs. 6 and 7 for the network echo channels in Figs. 4 and 5, respectively.
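A sketch of this setup, again with stand-ins of our own: an AR(1) process approximates the USASI speech-like spectrum, a synthetic channel with a 128-sample flat delay and a 64-sample active region approximates Fig. 4, and the gsd_lms and gsd_plms_weights sketches given earlier are assumed to be in scope:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(2)
x = lfilter([1.0], [1.0, -0.9], rng.standard_normal(35000))  # coloured input
h = np.zeros(256)
h[128:192] = 0.3 * rng.standard_normal(64)   # flat delay + active region
d = np.convolve(x, h)[:len(x)]
d += np.std(d) * 10 ** (-30 / 20) * rng.standard_normal(len(d))  # 30 dB SNR
M = L = 8; K = 32                            # N = 256, sparsity preserved
w, e = gsd_lms(x, d, M, L, K, mu=0.0020, lam=0.99)
# GSD-PLMS runs the same loop with the coefficient update replaced by
#   p = gsd_plms_weights(w, M, K)
#   w += mu * e_k * np.repeat(p, M) * phi / g2     # eq. (19)
```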


[Figure: mean squared error learning curves of GSD-LMS and GSD-PLMS plotted against iteration number.]

Fig. 7. Learning curves for GSD-LMS and GSD-PLMS for echo channel in Fig. 5.

The faster convergence of the GSD-PLMS algorithm is clearly visible from Figs. 6 and 7.

8. Conclusion

The non-uniqueness of the GSD solution has been shown to be undesirable in terms of keeping the sparsity of a system in the transform domain. A unique solution is guaranteed if the transform size is set equal to the sparsity factor of the subfilters. The uniqueness of the GSD solution ensures that no coefficient drifts occur during adaptation and that the sparsity is maintained in the transform domain if the transform size is chosen sufficiently small.

The GSD-PLMS algorithm was derived from the

solution of a constrained optimization problem. It was shown through simulations that GSD-PLMS is capable of improving the convergence speed of GSD-LMS for sparse systems. It should be noted that GSD-PLMS is not the only adaptive sparse system identification algorithm that can be developed in the transform domain based on Observation 1. Other types of adaptation algorithms that attempt to identify the active GSD coefficients can also be developed, with the possibility of further improvement in the convergence speed for sparse systems.

References

[1] S.D. Blunt, K.C. Ho, Novel sparse adaptive algorithm in the Haar transform domain, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2000, Vol. 1, Istanbul, Turkey, June 2000, pp. 452–455.

[2] Y.-F. Cheng, D.M. Etter, Analysis of an adaptive technique for modelling sparse systems, IEEE Trans. Acoust. Speech Signal Process. 37 (2) (February 1989) 254–264.

[3] Digital Network Echo Cancellers, Recommendation ITU-T G.168, International Telecommunication Union, April 2000.

[4] D.L. Duttweiler, Proportionate normalized least-mean-squares adaptation in echo cancelers, IEEE Trans. Speech Audio Process. 8 (5) (September 2000) 508–518.

[5] S.L. Gay, An efficient, fast converging adaptive filter for network echo cancellation, in: Conference Record of Asilomar Conference on Signals, Systems and Computers, 1998, Vol. 1, Pacific Grove, CA, November 1998, pp. 394–398.

[6] G.C. Goodwin, K.S. Sin, Adaptive Filtering, Prediction, and Control, Prentice-Hall, Englewood Cliffs, NJ, 1984.

[7] S. Haykin, Adaptive Filter Theory, 3rd Edition, Prentice-Hall, Englewood Cliffs, NJ, 1996.

[8] J. Homer, Detection guided NLMS estimation of sparsely parametrized channels, IEEE Trans. Circuits Systems II 47 (12) (December 2000) 1437–1442.

[9] J. Homer, I.M.Y. Mareels, R.R. Bitmead, B. Wahlberg, F. Gustafsson, LMS estimation via structural detection, IEEE Trans. Signal Process. 46 (10) (October 1998) 2651–2663.

[10] W.K. Jenkins, A.W. Hull, J.C. Strait, B.A. Schnaufer, X. Li, Advanced Concepts in Adaptive Signal Processing, Kluwer, Boston, 1996.

[11] E.A. Lee, D.G. Messerschmitt, Digital Communication, 2nd Edition, Kluwer Academic Publishers, Boston, MA, 1994.

[12] D.F. Marshall, W.K. Jenkins, J.J. Murphy, The use of orthogonal transforms for improving performance of adaptive filters, IEEE Trans. Circuits Systems 36 (4) (April 1989) 474–484.

[13] R.K. Martin, C.R. Johnson Jr., NSLMS: a proportional weight algorithm for sparse adaptive filters, in: Conference Record of Asilomar Conference on Signals, Systems and Computers, 2001, Vol. 2, November 2001, pp. 1530–1534.

[14] S.S. Narayan, A.M. Peterson, M.J. Narasimha, Transform domain LMS algorithm, IEEE Trans. Acoust. Speech Signal Process. 31 (3) (June 1983) 609–615.

[15] M.R. Petraglia, S.K. Mitra, Adaptive FIR filter structure based on the generalized subband decomposition of FIR filters, IEEE Trans. Circuits Systems II 40 (6) (June 1993) 354–362.

[16] B. Widrow, M. Lehr, 30 years of adaptive neural networks: perceptron, madaline, and backpropagation, Proc. IEEE 78 (9) (September 1990) 1415–1442.

[17] B. Widrow, S.D. Stearns, Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1985.