Introduction of Wiener Filter

R01943128 Darcy Tsai

[email protected]
Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan, ROC

Abstract

Wiener theory, formulated by Norbert Wiener in 1940, forms the foundation of data-dependent linear least square error filters. Wiener filters play a central role in a wide range of applications such as linear prediction, echo cancellation, signal restoration, channel equalization and system identification. The coefficients of a Wiener filter are calculated to minimize the average squared distance between the filter output and a desired signal. In its basic form, the Wiener theory assumes that the signals are stationary processes. However, if the filter coefficients are periodically recalculated for every block of N signal samples then the filter adapts itself to the average characteristics of the signals within the blocks and becomes block-adaptive. A block-adaptive (or segment adaptive) filter can be used for signals such as speech and image that may be considered almost stationary over a relatively small block of samples. In this chapter, we study Wiener filter theory, and consider alternative methods of formulation of the Wiener filter problem. We consider the application of Wiener filters in channel equalization, time-delay estimation and additive noise reduction. A case study of the frequency response of a Wiener filter, for additive noise reduction, provides useful insight into the operation of the filter. We also deal with some implementation issues of Wiener filters.

Keywords: Wiener filter, optimum linear filters, minimum mean squared error (MMSE)

Contents

1 Introduction
2 Linear Optimum Filtering
 2.1 Problem Statement
 2.2 Principle of Orthogonality
 2.3 Minimum Mean Squared Error
3 Wiener-Hopf Filters
 3.1 Wiener-Hopf Equations
 3.2 Matrix Formulation of the Wiener-Hopf Equations
 3.3 Error Performance Surface
 3.4 Numerical Example
4 Some Applications of Wiener Filters
 4.1 Wiener Filter for Additive Noise Reduction
 4.2 Wiener Channel Equalizer
 4.3 Time-Alignment of Signals in Multichannel Systems
5 Implementation of Wiener Filters
6 Further Comments

1 Introduction

Wiener filters are a class of optimum linear filters which involve linear estimation of a desired signal sequence from another related sequence. In the statistical approach to the solution of the linear filtering problem, we assume the availability of certain statistical parameters (e.g. mean and correlation functions) of the useful signal and the unwanted additive noise. The problem is to design a linear filter with the noisy data as input and with the requirement of minimizing the effect of the noise at the filter output according to some statistical criterion. A useful approach to this filter-optimization problem is to minimize the mean-square value of the error signal, defined as the difference between some desired response and the actual filter output. For stationary inputs, the resulting solution is commonly known as the Wiener filter. Its main purpose is to reduce the amount of noise present in a signal by comparison with an estimate of the desired noiseless signal.

2 Linear Optimum Filtering

2.1 Problem Statement

Consider the block diagram of Fig. 1, built around a linear discrete-time filter. The filter input consists of a time series x(0), x(1), x(2), …, and the filter is itself characterized by the impulse response w0, w1, w2, …. At some discrete time n, the filter produces an output denoted by y(n). This output is used to provide an estimate of a desired response denoted by d(n). With the filter input and the desired response representing single realizations of respective stochastic processes, the estimation is accompanied by an error with statistical characteristics of its own. In particular, the requirement is to make the estimation error e(n) as small as possible in some statistical sense. Two restrictions have so far been placed on the filter:

1. The filter is linear, which makes the mathematical analysis easy to handle.
2. The filter operates in discrete time, which makes it possible for the filter to be implemented using digital hardware/software.

Fig. 1 Block diagram representation of the statistical filtering problem

The final details of the filter specification, however, depend on two other choices that have to be made:

1. Whether the impulse response of the filter has finite or infinite duration.
2. The type of statistical criterion used for the optimization.

The choice of a finite-duration impulse response (FIR) or an infinite-duration impulse response (IIR) for the filter is dictated by practical considerations. The choice of a statistical criterion for optimizing the filter design is influenced by mathematical tractability. These two issues are considered in turn.

The theory developed for IIR filters includes that for FIR filters as a special case. However, for much of the material presented in this tutorial, we will confine our attention to the use of FIR filters. We do so for the following reason. An FIR filter is inherently stable, because its structure involves the use of forward paths only. In other words, the only mechanism for input-output interaction in the filter is via forward paths from the filter input to its output. Indeed, it is this form of signal transmission through the filter that limits its impulse response to a finite duration. On the other hand, an IIR filter involves both feedforward and feedback. The presence of feedback means that portions of the filter output and possibly other internal variables in the filter are fed back to the input. Consequently, unless it is properly designed, feedback in the filter can indeed make it unstable, with the result that the filter oscillates; this kind of operation is clearly unacceptable when the requirement is that of filtering, for which stability is a must. By itself, the stability problem in IIR filters is manageable in both theoretical and practical terms. However, when the filter is required to be adaptive, bringing with it stability problems of its own, the inclusion of adaptivity combined with the feedback that is inherently present in an IIR filter makes a difficult problem that much more difficult to handle. It is for this reason that, in the majority of applications requiring the use of adaptivity, the use of an FIR filter is preferred over an IIR filter, even though the latter is less demanding in computational requirements.

Turning next to the issue of what criterion to choose for statistical optimization, there are indeed several criteria that suggest themselves. Specifically, we may consider optimizing the filter design by minimizing a cost function, or index of performance, selected from the following short list of possibilities:

1. Mean-square value of the estimation error
2. Expectation of the absolute value of the estimation error
3. Expectation of third or higher powers of the absolute value of the estimation error

Option 1 has a clear advantage over the other two, because it leads to tractable mathematics. In particular, the choice of the mean-square error criterion results in a second order dependence for the cost function on the unknown coefficients in the impulse response of the filter. Moreover, the cost function has a distinct minimum that uniquely defines the optimum statistical design of the filter.

We may now summarize the essence of the filtering problem by making the following statement:

Design a linear discrete-time filter whose output y(n) provides an estimate of a desired response d(n), given a set of input samples x(0), x(1), x(2), …, such that the mean-square value of the estimation error e(n), defined as the difference between the desired response d(n) and the actual response y(n), is minimized.

We may develop the mathematical solution to this statistical optimization problem by following two entirely different approaches that are complementary. One approach leads to the development of an important theorem commonly known as the principle of orthogonality. The other approach highlights the error-performance surface that describes the second-order dependence of the cost function on the filter coefficients. We will proceed by deriving the principle of orthogonality first, because the derivation is relatively simple and because the principle of orthogonality is highly insightful.

2.2 Principle of Orthogonality

Consider again the statistical filtering problem described in Fig. 1. We have a set of input samples {x(n)} and desired samples {d(n)} coming from a jointly wide-sense stationary (WSS) process with zero mean. Suppose now we want to find a linear estimate of d(n) based on the L most recent samples of x(n), i.e.,

y(n) = Σ_{k=0}^{L-1} w_k x(n-k)    (1)

The introduction of a particular criterion to quantify how well d(n) is estimated by y(n) would influence how the coefficients wk will be computed. We propose to use the mean squared error (MSE), which is defined by

J_MSE(w) = E[e^2(n)] = E[(d(n) - y(n))^2]    (2)

where E[·] is the expectation operator and e(n) is the estimation error. Then, the estimation problem can be seen as finding the vector w = [w_0, w_1, …, w_{L-1}]^T that minimizes the cost function J_MSE(w). The solution to this problem is sometimes called the stochastic least squares solution. If we choose the MSE cost function (2), the optimal solution to the linear estimation problem can be presented as:

w_opt = arg min_w J_MSE(w)    (3)

Replacing (1) in (2), the latter can be expanded as

J_MSE(w) = E[d^2(n)] - 2 Σ_{k=0}^{L-1} w_k E[d(n) x(n-k)] + Σ_{i=0}^{L-1} Σ_{k=0}^{L-1} w_i w_k E[x(n-i) x(n-k)]    (4)

As this is a quadratic form, the optimal solution will be at the point where the cost function has zero gradient, i.e.,

∇_w J_MSE(w_opt) = 0    (5)

or in other words, the partial derivative of JMSE with respect to each coefficient wk should be zero. Under this set of conditions the filter is said to be optimum in the mean-squared-error sense. Using (1) in (2), we can compute the gradient as:

∂J_MSE/∂w_k = -2 E[e(n) x(n-k)],   k = 0, 1, …, L-1    (6)

Then, at the minimum, the condition that should hold is:

E[x(n-k) (d(n) - Σ_{i=0}^{L-1} w_opt,i x(n-i))] = 0,   k = 0, 1, …, L-1    (7)

or equivalently

E[e_min(n) x(n-k)] = 0,   k = 0, 1, …, L-1    (8)

This is called the principle of orthogonality, and it implies that the optimal condition is achieved if and only if the error e(n) is decorrelated from the samples x(n-k), k = 0, 1, …, L-1. Actually, the error will also be decorrelated from the estimate y(n), since:

E[e_min(n) y_opt(n)] = Σ_{k=0}^{L-1} w_opt,k E[e_min(n) x(n-k)] = 0    (9)

Fig. 2 Geometric interpretation of the principle of orthogonality for the case L = 2

We may thus state the corollary to the principle of orthogonality as follows:

When the filter operates in its optimum condition, the estimate of the desired response defined by the filter output yopt(n), and the corresponding estimation error, emin(n), are orthogonal to each other.

Equation (9) offers an interesting geometric interpretation of the conditions that exist at the output of the optimum filter, as illustrated in Fig. 2 for the case L = 2. In this figure, the desired response, the filter output, and the corresponding estimation error are represented by vectors labeled d, y_opt, and e_min, respectively. We see that for the optimum filter the vector representing the estimation error is normal (i.e. perpendicular) to the vector representing the filter output. It should, however, be emphasized that the situation depicted in Fig. 2 is merely an analogy, where random variables and expectations are replaced with vectors and vector inner products, respectively. Also, for obvious reasons, the geometry depicted in this figure may be viewed as a statistician's Pythagorean theorem.
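The principle of orthogonality can also be checked numerically. The following is a minimal sketch, assuming a synthetic jointly WSS pair in which d(n) is generated from x(n) by an arbitrary two-tap system plus independent noise; the signal model and tap values are illustrative, not taken from the text.

```python
# Minimal numerical check of the principle of orthogonality (Eqs. (7)-(9)).
# The synthetic signals and tap values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, L = 100_000, 2
x = rng.standard_normal(N)                              # zero-mean WSS input
x1 = np.concatenate(([0.0], x[:-1]))                    # x(n-1)
d = 0.9 * x - 0.4 * x1 + 0.1 * rng.standard_normal(N)   # desired response (assumed model)

X = np.column_stack([x, x1])                            # tap-input vectors [x(n), x(n-1)]
R = X.T @ X / N                                         # sample estimate of R_x
p = X.T @ d / N                                         # sample estimate of r_xd
w_opt = np.linalg.solve(R, p)                           # Wiener solution, cf. Eq. (27)

e = d - X @ w_opt                                       # minimum estimation error
print("E[e x(n-k)] ≈", X.T @ e / N)                     # ≈ [0, 0]: orthogonal to the inputs
print("E[e y_opt]  ≈", (X @ w_opt) @ e / N)             # ≈ 0: orthogonal to the estimate
```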

2.3 Minimum Mean-Squared Error

When the linear discrete-time filter in Fig. 1 operates in its optimum condition, it takes on the following special form:

e_min(n) = d(n) - y_opt(n)    (10)

Rearranging the terms, we have:

d(n) = y_opt(n) + e_min(n)    (11)

Let J_MMSE denote the minimum mean-squared error (MMSE), defined by:

J_MMSE = E[e_min^2(n)]    (12)

Hence, evaluating the mean-square values of both sides of (11), and applying to it the corollary to the principle of orthogonality described by (9), we get:

σ_d^2 = σ_yopt^2 + J_MMSE    (13)

where σ_d^2 is the variance of the desired response and σ_yopt^2 is the variance of the estimate y_opt; both of these random variables are assumed to be of zero mean. Solving (13) for the MMSE, we get:

J_MMSE = σ_d^2 - σ_yopt^2    (14)

This relation shows that for the optimum filter, the MMSE equals the difference between the variance of the desired response and the variance of the estimate that the filter produces at its output.

It is convenient to normalize the expression in (14) in such a way that the minimum value of the mean-squared error always lies between zero and one. We may do this by dividing both sides of (14) by σ_d^2, obtaining

J_MMSE / σ_d^2 = 1 - σ_yopt^2 / σ_d^2    (15)

Clearly, this is possible because σ_d^2 is never zero, except in the trivial case of a desired response d(n) that is zero for all n. Let

ε = J_MMSE / σ_d^2

The quantity ε is called the normalized mean-squared error, in terms of which we may rewrite (15) in the form:

ε = 1 - σ_yopt^2 / σ_d^2    (16)

We note that the ratio J_MMSE/σ_d^2 can never be negative, and the ratio σ_yopt^2/σ_d^2 is always positive. We therefore have

0 ≤ ε ≤ 1    (17)

If ε is zero, the optimum filter operates perfectly, in the sense that there is complete agreement between the estimate y_opt(n) at the filter output and the desired response d(n). On the other hand, if ε is unity, there is no agreement whatsoever between these two quantities; this corresponds to the worst possible situation.

3 Wiener-Hopf Filters

3.1 Wiener-Hopf Equations

Consider a signal x(n) as the input to a finite impulse response (FIR) filter of length L, as shown in Fig. 3, with coefficient vector w_T = [w_T,0, w_T,1, …, w_T,L-1]^T. This filtering operation generates an output:

y(n) = w_T^T X(n)    (18)

with X(n) = [x(n), x(n-1), …, x(n-L+1)]^T. As the output of the filter is observed, it can be corrupted by an additive measurement noise v(n), leading to a linear regression model for the observed output

d(n) = w_T^T X(n) + v(n)    (19)

Fig. 3 FIR filter

It should be noticed that this linear regression model can be used even if the input-output relation of the given data pairs [x(n), d(n)] is nonlinear, with w_T being a linear approximation to the actual relation between them. In that case, v(n) would contain a component associated with the additive noise perturbations, but also another one representing, for example, modeling errors.

In the context of (19), we can look at w_T as the quantity to be estimated by a linear filter w ∈ R^L, with (1) giving the output of this filter. This output can still be seen as an estimate of the reference signal d(n) or of the system's output y(n). Therefore, the problem of optimal filtering is analogous to the one of linear estimation. When J_MSE is the cost function to be optimized, the orthogonality principle (7) holds, which can be put as:

E[X(n) e_min(n)] = E[X(n) (d(n) - X^T(n) w_opt)] = 0    (20)

From (20) we can conclude that, given the signals x(n) and d(n), we can always assume that d(n) was generated by the linear regression model (19). To do this, the system w_T would be equal to the optimal filter w_opt, while v(n) would be associated with the residual error e_min(n), which is uncorrelated with the input x(n).

It should be noticed that (8) is not just a condition for the cost function to reach its minimum, but also a means of testing whether a linear filter is operating in the optimal condition. Here, the principle of orthogonality illustrated in Fig. 2 can be interpreted as follows: at time n the input vector X(n) = [x(n), x(n-1)]^T will pass through the optimal filter w_opt = [w_opt,0, w_opt,1]^T to generate the output y_opt(n). Given d(n), y_opt(n) is the only element in the space spanned by x(n) and x(n-1) that leads to an error e(n) that is orthogonal to x(n), x(n-1), and y_opt(n).

Now we focus on the computation of the optimal solution. From (20), we have:

Σ_{i=0}^{L-1} w_opt,i E[x(n-k) x(n-i)] = E[x(n-k) d(n)],   k = 0, 1, …, L-1    (21)

We introduce the following definitions

R_x = E[X(n) X^T(n)],   r_xd = E[X(n) d(n)]    (22)

for the input autocorrelation matrix and the cross correlation vector, respectively.

The two expectations in (21) may be interpreted as follows:

1. The expectation E[x(n-k) x(n-i)]

is equal to the autocorrelation function of the filter input for a lag of i-k. We may thus express this expectation as

E[x(n-k) x(n-i)] = r_x(i-k)    (23)

2. The expectation E[x(n-k) d(n)]

is equal to the cross-correlation between the filter input x(n-k) and the desired response d(n) for a lag of k. We may thus express this second expectation as

E[x(n-k) d(n)] = r_xd(k)    (24)

Accordingly, using the definitions of (23)-(24) in (21), we get a system of L equations as the necessary and sufficient condition for the optimality of the filter:

Σ_{i=0}^{L-1} w_opt,i r_x(i-k) = r_xd(k),   k = 0, 1, …, L-1    (25)

The system of equations (25) defines the optimum filter coefficients, in the most general setting, in terms of two correlation functions: the autocorrelation function of the filter input, and the cross-correlation between the filter input and the desired response. These equations are called the Wiener-Hopf equations.

3.2 Matrix Formulation of the Wiener-Hopf Equations

Let R_x denote the L-by-L correlation matrix of the tap inputs x(n), x(n-1), …, x(n-L+1) in the FIR filter of Fig. 3. In expanded form, we have

R_x = [ r_x(0)     r_x(1)     …   r_x(L-1)
        r_x(1)     r_x(0)     …   r_x(L-2)
        …          …          …   …
        r_x(L-1)   r_x(L-2)   …   r_x(0) ]

Correspondingly, let r_xd denote the L-by-1 cross-correlation vector between the tap inputs of the filter and the desired response d(n). In expanded form, we have

r_xd = [r_xd(0), r_xd(1), …, r_xd(L-1)]^T

Note that, as the joint process is WSS, the matrix R_x is symmetric, positive semidefinite and Toeplitz. Using these definitions, equation (25) can be put as:

R_x w_opt = r_xd    (26)

This is the compact matrix form of the set of L equations known as the Wiener-Hopf equations, and it provides a way of computing the optimal filter (in the MSE sense) from some statistical properties of the input and reference processes. Under the assumption of positive definiteness of R_x (so that it is nonsingular), the solution to (26) is:

w_opt = R_x^{-1} r_xd    (27)

which is known as the Wiener filter. A small numerical sketch of computing this solution from sample estimates is given below; an alternative way of arriving at it is presented in the next subsection.
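Since R_x is Toeplitz, the system (26) can be solved efficiently with a Levinson-type routine. The sketch below assumes biased sample estimates of r_x(k) and r_xd(k) computed from a synthetic pair of records x(n), d(n); the test system and noise level are illustrative choices only.

```python
# Solving the Wiener-Hopf equations (26) by exploiting the Toeplitz structure of R_x.
# The synthetic data below are placeholders; only the solve step reflects Eq. (27).
import numpy as np
from scipy.linalg import solve_toeplitz

def wiener_fir(x, d, L):
    """Estimate the length-L Wiener filter from sample records x(n) and d(n)."""
    N = len(x)
    # biased sample autocorrelation r_x(k), k = 0..L-1 (first column of the Toeplitz R_x)
    r = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(L)])
    # sample cross-correlation r_xd(k) = E[x(n-k) d(n)], k = 0..L-1
    p = np.array([np.dot(x[:N - k], d[k:]) / N for k in range(L)])
    return solve_toeplitz(r, p)                      # Levinson-type solve of R_x w = r_xd

rng = np.random.default_rng(1)
x = rng.standard_normal(50_000)
d = np.convolve(x, [0.5, -0.3, 0.1])[:len(x)] + 0.05 * rng.standard_normal(len(x))
print(wiener_fir(x, d, L=3))                         # ≈ [0.5, -0.3, 0.1]
```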

3.3 Error Performance Surface

Using the definitions (22) in (4) results in

J_MSE(w) = σ_d^2 - 2 w^T r_xd + w^T R_x w    (28)

In addition, it can be easily shown that the following factorization holds:

J_MSE(w) = σ_d^2 - r_xd^T R_x^{-1} r_xd + (w - R_x^{-1} r_xd)^T R_x (w - R_x^{-1} r_xd)    (29)

Replacing (27) in (29) leads to

J_MSE(w) = J_MMSE + (w - w_opt)^T R_x (w - w_opt)    (30)

Using the fact that R_x is positive definite (and therefore, so is its inverse), it turns out that the cost function reaches its minimum when the filter takes the form of (27), i.e., the Wiener filter. The minimum MSE value (MMSE) on the surface (30) is:

J_MMSE = σ_d^2 - r_xd^T R_x^{-1} r_xd = σ_d^2 - r_xd^T w_opt    (31)

We could also have arrived at this result by noticing that σ_yopt^2 = w_opt^T R_x w_opt = r_xd^T R_x^{-1} r_xd and using the orthogonality principle as in (9). Therefore, the MMSE is given by the difference between the variance of the reference signal d(n) and the variance of its optimal estimate y_opt(n).

It should be noticed that if the signals x(n) and d(n) are orthogonal (r_xd = 0), the optimal filter will be the null vector and J_MMSE = σ_d^2. This is reasonable, since nothing can be done with the filter w if the input signal carries no information about the reference signal (as they are orthogonal). Actually, (28) shows that in this case, if any of the filter coefficients were nonzero, the MSE would be increased by the term w^T R_x w, so it would not be optimal. On the other hand, if the reference signal is generated by passing the input signal through a system w_T as in (19), with the noise v(n) being uncorrelated with the input x(n), the optimal filter will be:

w_opt = R_x^{-1} E[X(n) (X^T(n) w_T + v(n))] = w_T    (32)

This means that the Wiener solution will be able to identify the system w_T, with a resulting error given by v(n). Therefore, in this case J_MMSE = σ_v^2.

Finally, it should be noticed that the autocorrelation matrix admits the eigen-decomposition:

R_x = Q Λ Q^T    (33)

with Λ = diag(λ_0, λ_1, …, λ_{L-1}) being a diagonal matrix determined by the eigenvalues of R_x, and Q a (unitary) matrix that has the associated eigenvectors q_0, q_1, …, q_{L-1} as its columns. Let us define the misalignment vector (or weight error vector)

Δw = w - w_opt    (34)

and its transformed version

u = Q^T Δw    (35)

Using (27), (30), (31), (33), (34) and (35),

J_MSE(w) = J_MMSE + u^T Λ u = J_MMSE + Σ_{k=0}^{L-1} λ_k u_k^2    (36)

This is called the canonical form of the quadratic form JMSE(w) and it contains no cross-product terms. Since the eigenvalues are non-negative, it is clear that the surface describes an elliptic hyperparaboloid, with the eigenvectors being the principal axes of the hyperellipses of constant MSE value.
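The canonical form can be verified numerically. The following is a small sketch using an arbitrary 2-by-2 correlation matrix, cross-correlation vector, variance σ_d^2 and test filter; all values are illustrative.

```python
# Checking the canonical form (36): J_MSE(w) = J_MMSE + sum_k lambda_k u_k^2.
# R_x, r_xd, sigma_d^2 and the test filter w are arbitrary illustrative values.
import numpy as np

R = np.array([[1.1, 0.5], [0.5, 1.1]])            # R_x
p = np.array([0.5, -0.4])                         # r_xd
sigma_d2 = 1.0

w_opt = np.linalg.solve(R, p)                     # Eq. (27)
J_mmse = sigma_d2 - p @ w_opt                     # Eq. (31)

lam, Q = np.linalg.eigh(R)                        # R_x = Q diag(lam) Q^T, Eq. (33)
w = np.array([0.3, -0.1])                         # an arbitrary (suboptimal) filter
u = Q.T @ (w - w_opt)                             # transformed misalignment, Eqs. (34)-(35)

J_direct    = sigma_d2 - 2 * w @ p + w @ R @ w    # Eq. (28)
J_canonical = J_mmse + np.sum(lam * u**2)         # Eq. (36)
print(np.isclose(J_direct, J_canonical))          # True
```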

3.4 Numerical Example

To illustrate the filtering theory developed above, we consider the example depicted in Fig. 4. The desired response d(n) is modeled as an AR (autoregressive) process of order 1; that is, it may be produced by applying a white-noise process v1(n) of zero mean and variance σ_v1^2 to the input of an all-pole filter of order 1, whose transfer function equals:

(37)

The process d(n) is applied to a communication channel modeled by the all-pole transfer function

(38)

The channel output x(n) is corrupted by an additive white-noise process v2(n) of zero mean and variance σ_v2^2, so a sample of the received signal u(n) equals

(39)

The white-noise processes v1(n) and v2(n) are uncorrelated. It is also assumed that d(n) and u(n), and therefore v1(n) and v2(n), are all real valued.

Fig. 4 (a) Autoregressive model of desired response d(n) (b) model of noisy communication channel

The requirement is to specify a Wiener filter consisting of a transversal filter with two taps, which operates on the received signal u(n) so as to produce an estimate of the desired response that is optimum in the mean-squared sense.

Statistical Characterization of the Desired Response and Received Signal

We begin the analysis by considering the difference equations that characterize the various processes described by the models of Fig. 4. First, the generation of the desired response d(n) is governed by the first-order difference equation

(40)

where a1 = 0.8458. The variance of the process d(n) equals

(41)

The process d(n) acts as input to the channel. Hence, from Fig. 4, we find that the channel output x(n) is related to the channel input d(n) by the first-order difference equation

(42)

where b1 = -0.9458. We also observe from the two parts of Fig. 4 that the channel output x(n) may be generated by applying the white-noise process v1(n) to a second-order all-pole filter whose transfer function equals

(43)

Accordingly, x(n) is a second-order AR process described by the difference equation

(44)

where a1 = -0.1 and a2 = -0.8. Note that both AR processes d(n) and x(n) are WSS.

To characterize the Wiener filter, we need to solve the Wiener-Hopf equations (26). This set of equations requires knowledge of two quantities: (1) the correlation matrix R pertaining to the received signal u(n), and (2) the cross-correlation vector p between u(n) and the desired response d(n). In our example, R is a 2-by-2 matrix and p is a 2-by-1 vector, since the FIR filter used to implement the Wiener filter is assumed to have two taps.

The received signal u(n) consists of the channel output x(n) plus the additive white noise v2(n). Since the processes x(n) and v2(n) are uncorrelated, it follows that the correlation matrix R equals the correlation matrix of x(n) plus the correlation matrix of v2(n). That is,

(45)

For the correlation matrix RX, we write

where rx(0) and rx(1) are the autocorrelation functions of the channel output x(n) for lags of 0 and 1, respectively. We have:

and

Hence,

(46)

Next we observe that, since v2(n) is a white-noise process of zero mean and variance σ_v2^2, the 2-by-2 correlation matrix R2 of this process is

(47)

Thus, substituting (46) and (47) into (45), we find that the 2-by-2 correlation matrix of the received signal u(n) equals

(48)

For the 2-by-1 cross-correlation vector p, we write

where p(0) and p(-1) are the cross-correlation functions between d(n) and u(n) for lags of 0 and -1, respectively. Since these two processes are real valued, we have

(49)

Substituting (42) in (49), and recognizing that the channel output x(n) is uncorrelated with the white-noise process v2(n), we get

(50)

Putting b1 = -0.9458 and using the element values of the correlation matrix Rx given in (46), we obtain

Hence,

(51)

Error-Performance Surface

The dependence of the mean-squared error on the 2-by-1 tap-weight vector w is defined by (28). Hence, substituting (41), (48) and (51) into (28), we get

Using a three-dimensional computer plot, the mean-squared error J is plotted versus the tap weights w0 and w1. The result is shown in Fig. 5.

Fig. 6 shows contour plots of the tap weight w1 versus w0 for varying values of the mean-squared error J. We see that the locus of w1 versus w0 for a fixed J is in the form of an ellipse. The elliptical locus shrinks in size as the mean-squared error J approaches the minimum value Jmin. For J = Jmin, the locus reduces to a point with coordinates wo0 and wo1.

Wiener Filter

The 2-by-1 optimum tap-weight vector wo of the Wiener filter is defined by (27). In particular, it consists of the inverse matrix R^{-1} multiplied by the cross-correlation vector p. Inverting the correlation matrix R of (48), we get

Fig. 5 Error performance surface of the two-tap FIR filter described in the numerical example

Fig. 6 Contour plots of the error-performance surface depicted in Fig.5

(52)

Hence, substituting (51) and (52) into (27), we get the desired result:

(53)

Minimum Mean-Squared Error

To evaluate the minimum value of the mean-squared error, Jmin, which results from the use of the optimum tap-weight vector wo, we use (31). Hence, substituting (41), (51) and (53) into (31), we get

(54)

The point represented jointly by the optimum tap-weight vector wo of (53) and the minimum mean-squared error of (54) defines the bottom of the error-performance surface in Fig.5, or the center of the contour plots in Fig.6.
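Not all of the numerical values of this example survived extraction, so the following Monte-Carlo sketch only reproduces the structure of Fig. 4: a1 = 0.8458 and b1 = -0.9458 are taken from the text above, while the noise variances σ_v1^2 and σ_v2^2 are assumed values chosen purely for illustration. The two-tap Wiener filter and its MMSE are then estimated from simulated data.

```python
# Monte-Carlo sketch of the two-tap Wiener filter for the model of Fig. 4.
# Convention assumed: d(n) + a1 d(n-1) = v1(n) and x(n) + b1 x(n-1) = d(n),
# which reproduces the AR(2) coefficients (-0.1, -0.8) quoted for x(n).
# The noise variances below are assumptions, not values from the text.
import numpy as np

rng = np.random.default_rng(2)
N = 500_000
a1, b1 = 0.8458, -0.9458
sigma_v1, sigma_v2 = np.sqrt(0.27), np.sqrt(0.1)      # assumed, for illustration only

v1 = sigma_v1 * rng.standard_normal(N)
v2 = sigma_v2 * rng.standard_normal(N)
d = np.zeros(N); x = np.zeros(N)
for n in range(1, N):
    d[n] = -a1 * d[n - 1] + v1[n]                     # desired response, cf. Eq. (40)
    x[n] = -b1 * x[n - 1] + d[n]                      # channel output, cf. Eq. (42)
u = x + v2                                            # received signal, cf. Eq. (39)

U = np.column_stack([u[1:], u[:-1]])                  # tap inputs [u(n), u(n-1)]
R = U.T @ U / (N - 1)                                 # sample estimate of R, Eq. (45)
p = U.T @ d[1:] / (N - 1)                             # sample estimate of p
w_o = np.linalg.solve(R, p)                           # Wiener filter, Eq. (27)
J_min = np.var(d) - p @ w_o                           # MMSE, Eq. (31)
print("w_o ≈", w_o, "  J_min ≈", J_min)
```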

Canonical Error-Performance Surface

The characteristic equation of the 2-by-2 correlation matrix R of (48) is

The two eigenvalues of the correlation matrix R are therefore

The canonical error-performance surface is therefore defined by

(55)

The locus of u2 versus u1, as defined in (55), traces an ellipse for a fixed value of J_MSE - J_MMSE. In particular, the ellipse has a minor axis of sqrt((J_MSE - J_MMSE)/λ1) along the u1 coordinate and a major axis of sqrt((J_MSE - J_MMSE)/λ2) along the u2 coordinate; this assumes that λ1 > λ2.

4 Some Applications of Wiener Filters

In this section, we consider some applications of the Wiener filter in reducing broadband additive noise, in time-alignment of signals in multichannel or multi-sensor systems, and in channel equalization.

4.1 Wiener Filter for Additive Noise Reduction

Consider a signal x(m) observed in broadband additive noise n(m), and modelled as:

y(m) = x(m) + n(m)    (56)

Assuming that the signal and the noise are uncorrelated, it follows that the autocorrelation matrix of the noisy signal is the sum of the autocorrelation matrix of the signal x(m) and the noise n(m):

R_yy = R_xx + R_nn    (57)

and we can also write

r_xy = r_xx    (58)

where R_yy, R_xx and R_nn are the autocorrelation matrices of the noisy signal, the noise-free signal and the noise respectively, and r_xy is the cross-correlation vector of the noisy signal and the noise-free signal. Substitution of (57) and (58) in the Wiener filter yields

w = (R_xx + R_nn)^{-1} r_xx    (59)

Equation (59) is the optimal linear filter for the removal of additive noise. In the following, a study of the frequency response of the Wiener filter provides useful insight into its operation. In the frequency domain, the noisy signal Y(f) is given by

Y(f) = X(f) + N(f)    (60)

where X(f) and N(f) are the signal and noise spectra. For a signal observed in additive random noise, the frequency-domain Wiener filter is obtained as

W(f) = P_XX(f) / (P_XX(f) + P_NN(f))    (61)

where P_XX(f) and P_NN(f) are the signal and noise power spectra. Dividing the numerator and the denominator of Equation (61) by the noise power spectrum P_NN(f) and substituting the variable SNR(f) = P_XX(f)/P_NN(f) yields

W(f) = SNR(f) / (SNR(f) + 1)    (62)

where SNR(f) is a signal-to-noise ratio measure. Note that the variable SNR(f) is expressed in terms of the power-spectral ratio, and not in the more usual terms of log power ratio. Therefore SNR(f) = 0 corresponds to -∞ dB.

From Fig. 7, the following interpretation of the Wiener filter frequency response W(f) in terms of the signal-to-noise ratio can be deduced. For additive noise, the Wiener filter frequency response is a real positive number in the range 0 ≤ W(f) ≤ 1. Now consider the two limiting cases of (a) a noise-free signal, SNR(f) = ∞, and (b) an extremely noisy signal, SNR(f) = 0. At very high SNR, W(f) ≈ 1, and the filter applies little or no attenuation to the noise-free frequency component. At the other extreme, when SNR(f) = 0, W(f) = 0. Therefore, for additive noise, the Wiener filter attenuates each frequency component in proportion to an estimate of the signal-to-noise ratio. Fig. 7 shows the variation of the Wiener filter response W(f) with the signal-to-noise ratio SNR(f).

Fig. 7 Variation of the gain of Wiener filter frequency response with SNR

An alternative illustration of the variation of the Wiener filter frequency response with SNR(f) is shown in Fig. 8. It illustrates the similarity between the Wiener filter frequency response and the signal spectrum for the case of an additive white noise disturbance. Note that at a spectral peak of the signal spectrum, where the SNR(f) is relatively high, the Wiener filter frequency response is also high, and the filter applies little attenuation. At a signal trough, the signal-to-noise ratio is low, and so is the Wiener filter response. Hence, for additive white noise, the Wiener filter response broadly follows the signal spectrum.

Fig. 8 Illustration of the variation of the Wiener frequency response with the signal spectrum for additive white noise. The Wiener filter response broadly follows the signal spectrum.
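A minimal frequency-domain sketch of Eqs. (61)-(62) is given below. It assumes a synthetic sinusoidal signal in white noise, a crude averaged-periodogram PSD estimate and frame-by-frame application of the gain; in practice P_XX(f) would itself have to be estimated, e.g. as P_YY(f) - P_NN(f) with P_NN(f) taken from noise-only periods, and the frame length and signal parameters here are illustrative.

```python
# Sketch of the frequency-domain Wiener filter of Eqs. (61)-(62).
# Signal model, frame length and PSD estimator are illustrative choices.
import numpy as np

def avg_periodogram(sig, nfft=512):
    """Crude PSD estimate: average |FFT|^2 over non-overlapping frames."""
    frames = sig[: len(sig) // nfft * nfft].reshape(-1, nfft)
    return np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0) / nfft

rng = np.random.default_rng(3)
n = np.arange(4096)
x = np.sin(2 * np.pi * 0.05 * n) + 0.5 * np.sin(2 * np.pi * 0.12 * n)   # clean signal
noise = 0.8 * rng.standard_normal(x.size)
y = x + noise                                                           # noisy observation

Pxx = avg_periodogram(x)       # taken from the clean signal here for illustration only
Pnn = avg_periodogram(noise)   # in practice: estimated from signal-inactive periods
W = Pxx / (Pxx + Pnn)          # Eq. (61); equivalently SNR(f) / (1 + SNR(f)), Eq. (62)

frames = y[: y.size // 512 * 512].reshape(-1, 512)
x_hat = np.fft.irfft(np.fft.rfft(frames, axis=1) * W, n=512, axis=1).ravel()
```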

4.2 Wiener Channel Equalizer

Communication channel distortions may be modelled by a combination of a linear filter and an additive random noise source, as shown in Fig. 9. The input/output signals of a linear time-invariant channel can be modelled as

y(m) = Σ_k h_k x(m-k) + n(m)    (63)

where x(m) and y(m) are the transmitted and received signals, [h_k] is the impulse response of a linear filter model of the channel, and n(m) models the channel noise. In the frequency domain (63) becomes

Y(f) = H(f) X(f) + N(f)    (64)

where X(f), Y(f), H(f) and N(f) are the signal, noisy signal, channel and noise spectra respectively. To remove the channel distortions, the receiver is followed by an equalizer. The equalizer input is the distorted channel output, and the desired signal is the channel input. It is easy to show that the Wiener equalizer in the frequency domain is given by

W(f) = H*(f) P_XX(f) / (|H(f)|^2 P_XX(f) + P_NN(f))    (65)

where it is assumed that the channel noise and the signal are uncorrelated. In the absence of channel noise, P_NN(f) = 0, and the Wiener filter is simply the inverse of the channel filter model, W(f) = H^{-1}(f).

Fig. 9 Illustration of a channel model followed by an equalizer
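A small sketch of the frequency-domain equalizer of Eq. (65) is given below, assuming the channel response H(f) is known; the channel taps, the ±1 symbol model and the noise level are illustrative assumptions.

```python
# Sketch of the frequency-domain Wiener equalizer of Eq. (65) for a known channel H(f).
# Channel taps, signal model and noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
nfft = 1024
x = np.sign(rng.standard_normal(nfft))                 # transmitted +/-1 symbols
h = np.array([1.0, 0.6, -0.3])                         # assumed channel impulse response
y = np.convolve(x, h)[:nfft] + 0.1 * rng.standard_normal(nfft)

H = np.fft.rfft(h, nfft)
Pxx = np.full(H.shape, 1.0)                            # white, unit-power input spectrum
Pnn = np.full(H.shape, 0.01)                           # noise power spectrum (0.1**2)

W = np.conj(H) * Pxx / (np.abs(H) ** 2 * Pxx + Pnn)    # Wiener equalizer, Eq. (65)
x_hat = np.fft.irfft(np.fft.rfft(y) * W, nfft)         # equalized output
print(np.mean(np.sign(x_hat[:nfft - len(h)]) == x[:nfft - len(h)]))   # symbol agreement
```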

4.3 Time-Alignment of Signals in Multichannel Systems

In multichannel/multisensor signal processing there are a number of noisy and distorted versions of a signal x(m), and the objective is to use all the observations in estimating x(m), as illustrated in Fig. 10, where the phase and frequency characteristics of each channel are modelled by a linear filter. As a simple example, consider the problem of time-alignment of two noisy records of a signal given as

y1(m) = x(m) + n1(m)    (66)

y2(m) = A x(m-D) + n2(m)    (67)

where y1(m) and y2(m) are the noisy observations from channels 1 and 2, n1(m) and n2(m) are uncorrelated noise processes in each channel, D is the time delay of arrival between the two signals, and A is an amplitude scaling factor. Now assume that y1(m) is used as the input to a Wiener filter and that, in the absence of the noise-free signal x(m), y2(m) is used as the desired signal. The error signal is given by

e(m) = y2(m) - Σ_k w_k y1(m-k)
     = [A x(m-D) - Σ_k w_k x(m-k)] + [n2(m) - Σ_k w_k n1(m-k)]    (68)

The Wiener filter strives to minimize the terms shown inside the square brackets in (68). Using the Wiener-Hopf equations, we have

w = R_{y1y1}^{-1} r_{y1y2},   with r_{y1y2}(k) = A r_xx(k-D)    (69)

where r_xx(D) = E[x(m-D) x(m)]. The frequency-domain equivalent of (69) can be derived as

W(f) = A P_XX(f) e^{-j2πfD} / (P_XX(f) + P_{N1N1}(f))    (70)

Note that in the absence of noise, the Wiener filter becomes a pure phase (or a pure delay) filter with a flat magnitude response.
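In that noise-free limit, the filter from y1(m) to y2(m) concentrates its energy at lag D, so the delay can be read off the peak of the estimated cross-correlation (or of the estimated filter). A minimal sketch follows; the white signal model, delay, gain and noise levels are illustrative assumptions.

```python
# Sketch of time-delay estimation between two noisy channels, cf. Eqs. (66)-(70):
# for a (nearly) white x(m), r_y1y2(k) = A r_xx(k-D) peaks at lag k = D.
# Signal model, delay and noise levels are illustrative.
import numpy as np

rng = np.random.default_rng(5)
N, D, A = 20_000, 7, 0.8
x = rng.standard_normal(N)
y1 = x + 0.3 * rng.standard_normal(N)
y2 = A * np.roll(x, D) + 0.3 * rng.standard_normal(N)   # delayed, scaled copy

lags = np.arange(32)
r12 = np.array([np.dot(y1[: N - k], y2[k:]) for k in lags]) / N
print("estimated delay:", lags[np.argmax(r12)])          # -> 7
```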

Fig. 10 Configuration of a system for estimation of frequency Wiener filter

5 Implementation of Wiener Filters

Fig. 11 A filter-bank implementation of a Wiener filter

The implementation of a Wiener filter for additive noise reduction, using Equations (59)-(61), requires the autocorrelation functions, or equivalently the power spectra, of the signal and noise. The noise power spectrum can be obtained from the signal-inactive, noise-only, periods. The assumption is that the noise is quasi-stationary, and that its power spectrum remains relatively stationary between the update periods. This is a reasonable assumption for many noisy environments, such as the noise inside a car emanating from the engine, aircraft noise, office noise from computer machines, etc. The main practical problem in the implementation of a Wiener filter is that the desired signal is often observed in noise, and that the autocorrelation or power spectra of the desired signal are not readily available. Fig. 10 illustrates the block-diagram configuration of a system for implementation of a Wiener filter for additive noise reduction. An estimate of the desired signal power spectrum is obtained by subtracting an estimate of the noise spectrum from that of the noisy signal. A filter-bank implementation of the Wiener filter is shown in Fig. 11, where the incoming signal is divided into N bands of frequencies. A first-order integrator, placed at the output of each band-pass filter, gives an estimate of the power spectrum of the noisy signal. The power spectrum of the original signal is obtained by subtracting an estimate of the noise power spectrum from that of the noisy signal; a sketch of this per-band scheme is given after this paragraph. In a Bayesian implementation of the Wiener filter, prior models of speech and noise, such as hidden Markov models, are used to obtain the power spectra of speech and noise required for calculation of the filter coefficients.
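The following is a minimal per-band sketch of that scheme: a first-order integrator tracks the smoothed noisy-signal power in each band, the noise power is updated only during signal-inactive frames, and the band gain is formed as (P_yy - P_nn)/P_yy, i.e. Eq. (61) with P_XX approximated by spectral subtraction. The frame length, smoothing constant and the simple activity flag are assumptions made for illustration.

```python
# Per-band Wiener gains via first-order (recursive) smoothing and spectral subtraction.
# Frame size, smoothing constant alpha and the activity flag are illustrative choices.
import numpy as np

def wiener_gains(band_power, P_yy, P_nn, noise_only, alpha=0.9):
    """Update the smoothed band powers and return the per-band Wiener gains."""
    P_yy = alpha * P_yy + (1 - alpha) * band_power          # first-order integrator
    if noise_only:                                          # noise estimate is updated
        P_nn = alpha * P_nn + (1 - alpha) * band_power      # only in signal-inactive frames
    P_xx = np.maximum(P_yy - P_nn, 0.0)                     # spectral-subtraction estimate
    W = P_xx / np.maximum(P_yy, 1e-12)                      # Eq. (61) with P_xx = P_yy - P_nn
    return W, P_yy, P_nn

# usage per frame of the noisy signal y (hypothetical is_pause() flags noise-only frames):
#   band_power = np.abs(np.fft.rfft(frame, 512)) ** 2
#   W, P_yy, P_nn = wiener_gains(band_power, P_yy, P_nn, noise_only=is_pause(frame))
```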

The Choice of Wiener Filter Order

The choice of Wiener filter order affects:
(a) the ability of the filter to remove distortions and reduce the noise;
(b) the computational complexity of the filter; and
(c) the numerical stability of the Wiener solution.

The choice of the filter length also depends on the application and the method of implementation of the Wiener filter. For example, in a filter-bank implementation of the Wiener filter for additive noise reduction, the number of filter coefficients is equal to the number of filter banks, and typically the number of filter banks is between 16 and 64. On the other hand, for many applications, a direct implementation of the time-domain Wiener filter requires a larger filter length, say between 64 and 256 taps.

A reduction in the required length of a time-domain Wiener filter can be achieved by dividing the time domain signal into N sub-band signals. Each sub-band signal can then be decimated by a factor of N. The decimation results in a reduction, by a factor of N, in the required length of each sub-band Wiener filter.

6 Further Comments

The MSE defined in (2) uses the linear estimator y(n) defined in (1). If we relax the linear constraint on the estimator and look for a function of the input, i.e., y(n) = g(x(n)), the optimal estimator in the mean square sense is given by the conditional expectation E[d(n)|x(n)]. Its calculation requires knowledge of the joint distribution between d(n) and x(n), and in general it is a nonlinear function of x(n) (unless certain symmetry conditions on the joint distribution are fulfilled, as is the case for Gaussian distributions). Moreover, once calculated, it might be very hard to implement. For all these reasons, linear estimators are usually preferred (and, as we have seen, they depend only on second-order statistics).

On a historical note, Norbert Wiener solved a continuous-time prediction problem under causality constraints by means of an elegant technique now known as the Wiener-Hopf factorization technique. This is a much more complicated problem than the one presented in (3). Later, Norman Levinson formulated the Wiener filter in discrete time.

It should be noticed that the orthogonality principle used to derive the Wiener filter does not apply to FIR filters only; it can be applied to IIR (infinite impulse response) filtering, and even noncausal filtering. For the general case, the output of the noncausal filter can be put as:

y(n) = Σ_{i=-∞}^{∞} w_i x(n-i)    (71)

Then, minimizing the mean square error leads to the Wiener-Hopf equations

Σ_{i=-∞}^{∞} w_opt,i r_x(i-k) = r_xd(k),   -∞ < k < ∞    (72)

which can be solved using Z-transform methods. In addition, a general expression for the minimum mean square error is

J_MMSE = σ_d^2 - Σ_{i=-∞}^{∞} w_opt,i r_xd(i)    (73)

From this general case, we can derive the FIR filter studied before (index i in the summation and lag k in (72) go from 0 to L-1) and the causal IIR filter (index i in the summation and lag k in (72) go from 0 to ∞). Finally, we would like to comment on the stationarity of the processes. We assumed the input and reference processes were WSS. If this were not the case, the statistics would be time-dependent. However, we could still find the Wiener filter at each time n as the one that makes the estimation error orthogonal to the input, i.e., the principle of orthogonality still holds. A less costly alternative would be to recalculate the filter for every block of N signal samples, as sketched below. However, nearly two decades after Wiener's work, Rudolf Kalman developed the Kalman filter, which is the optimum mean square linear filter for non-stationary processes (evolving under a certain state-space model) and for stationary ones (converging in steady state to the Wiener solution).
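A minimal sketch of that block-adaptive alternative is shown below: the Wiener filter is recomputed from sample correlation estimates for every block of N samples, so it follows the average statistics of each block. The block length, filter order, regularization and any test signals are illustrative assumptions.

```python
# Block-adaptive Wiener filtering: recompute the filter for every block of N samples.
# Block length, filter order and regularization are illustrative choices.
import numpy as np

def block_wiener(x, d, L=8, N=2048):
    """Recompute a length-L Wiener filter for each block of N samples and filter x."""
    out = np.zeros_like(d, dtype=float)
    for start in range(0, len(x) - N + 1, N):
        xb, db = x[start:start + N], d[start:start + N]
        # delayed copies of the block input: columns are x(n), x(n-1), ..., x(n-L+1)
        X = np.column_stack([np.concatenate((np.zeros(k), xb[:N - k])) for k in range(L)])
        R = X.T @ X / N                                   # block estimate of R_x
        p = X.T @ db / N                                  # block estimate of r_xd
        w = np.linalg.solve(R + 1e-8 * np.eye(L), p)      # regularized Wiener solution
        out[start:start + N] = X @ w
    return out
```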

