




Bayesian Estimation

• Bayesian estimators differ from all classical estimators studied so far in that they consider the parameters as random variables instead of unknown constants.

• As such, the parameters also have a PDF, which needs to be taken into account when deriving an estimator.

• The PDF of the parameters can be used for incorporating any prior knowledge we may have about their values.


Bayesian Estimation

• For example, we might know that the normalized frequency f0 of an observed sinusoid cannot be greater than 0.1. This is ensured by choosing

$$p(f_0) = \begin{cases} 10, & 0 \le f_0 \le 0.1 \\ 0, & \text{otherwise} \end{cases}$$

as the prior PDF in the Bayesian framework.

• Usually differentiable PDF’s are easier, and we could approximate the uniform PDF with, e.g., the Rayleigh PDF.

[Figure: two prior densities over the normalized frequency f0 ∈ [0, 1] — top: the uniform density; bottom: a Rayleigh density with σ = 0.035.]


Prior and Posterior estimates

• One of the key properties of the Bayesian approach is that it can also be used for small data records, and the estimate can be improved sequentially as new data arrives.

• For example, consider tossing a coin and estimating the probability of a head, µ.

• As we saw earlier, the ML estimate is the number of observed heads divided by the total number of tosses: µ̂_ML = #heads / #tosses.

• However, if we cannot afford to make more than, say, three experiments, we may end up seeing three heads and no tails. Thus, we are forced to infer that µ̂_ML = 1, i.e., the coin always lands heads.


Prior and Posterior estimates

• The Bayesian approach can circumvent this problem, because the prior regularizes the likelihood and avoids overfitting to the small amount of data.

• The pictures below illustrate this. The one on the top is the likelihood function

$$p(x \mid \mu) = \mu^{\#\text{heads}} (1 - \mu)^{\#\text{tails}}$$

with #heads = 3 and #tails = 0. The maximum of the function is at unity.

• The second curve is the prior density p(µ) of our choice. It was selected to reflect the fact that we assume that the coin is probably quite fair.


Prior and Posterior estimates

• The third curve is the posterior density p(µ | x) after observing the samples, which can be evaluated using the Bayes formula

$$p(\mu \mid x) = \frac{p(x \mid \mu)\, p(\mu)}{p(x)} = \frac{\text{likelihood} \cdot \text{prior}}{p(x)}$$

• Thus, the third curve is the product of the first two (with normalization), and one Bayesian alternative is to use the maximum as the estimate.
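As a concrete illustration of this product rule, here is a minimal numerical sketch (Python with NumPy is assumed; the grid resolution and the Gaussian prior with mean 0.5 and σ = 0.1 are illustrative choices matching the coin example):

import numpy as np

# grid over the parameter mu (probability of a head)
mu = np.linspace(1e-6, 1 - 1e-6, 10001)

# likelihood after observing 3 heads and 0 tails: p(x | mu) = mu^3 (1 - mu)^0
heads, tails = 3, 0
likelihood = mu**heads * (1 - mu)**tails

# Gaussian prior centred at 0.5 with sigma = 0.1 ("the coin is probably quite fair")
sigma = 0.1
prior = np.exp(-0.5 * ((mu - 0.5) / sigma) ** 2)

# posterior is proportional to likelihood * prior; normalize over the grid
posterior = likelihood * prior
posterior /= posterior.sum()

print("posterior mode (MAP):", mu[np.argmax(posterior)])   # about 0.554, cf. the analytical solution later
print("posterior mean (MMSE):", np.sum(mu * posterior))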


Prior and Posterior estimates

[Figure: three curves over µ ∈ [0, 1] — top: the likelihood function p(x | µ) after three tosses, all heads; middle: the prior density p(µ) before observing any data; bottom: the posterior density p(µ | x) after observing three heads.]


Cost Functions

• Bayesian estimators are defined by a minimization problem

$$\hat\theta = \arg\min_{\hat\theta} \int\!\!\int C(\theta - \hat\theta)\, p(x, \theta)\, dx\, d\theta$$

which seeks the value of θ̂ that minimizes the average cost.


Cost Functions

• The cost function C(x) is typically one of the following:

1. Quadratic: C(x) = x²

2. Absolute: C(x) = |x|

3. Hit-or-miss:
$$C(x) = \begin{cases} 0, & |x| < \delta \\ 1, & |x| > \delta \end{cases}$$

• Additional cost functions include Huber’s robust loss and ε-insensitive loss.


Cost Functions

• These three cost functions are favoured, because we can find the minimum cost solution in closed form. We will introduce the solutions next.

• Functions 1 and 3 are slightly easier to use than 2. Thus, we’ll concentrate on those.

• Regardless of the cost function, the above double integral can be evaluated and minimized using the rule for joint probabilities:

$$p(x, \theta) = p(\theta \mid x)\, p(x).$$


Cost Functions

• This results in

$$\int\!\!\int C(\theta - \hat\theta)\, p(\theta \mid x)\, p(x)\, dx\, d\theta = \int \underbrace{\left( \int C(\theta - \hat\theta)\, p(\theta \mid x)\, d\theta \right)}_{(*)} p(x)\, dx$$

• Because p(x) is always nonnegative, it suffices to minimize the multiplier inside the brackets, (∗):¹

$$\hat\theta = \arg\min_{\hat\theta} \int C(\theta - \hat\theta)\, p(\theta \mid x)\, d\theta$$

¹ Note that there is a slight shift in the paradigm. The double integral results in the theoretical estimator that requires knowledge of p(x). When minimizing only the inner integral, we get the optimum for a particular realization, not all possible realizations.


1. Quadratic Cost Solution (or the MMSE estimator)

• If we select the quadratic cost, then the Bayesian estimator is defined by

$$\arg\min_{\hat\theta} \int (\theta - \hat\theta)^2\, p(\theta \mid x)\, d\theta$$

• Simple differentiation gives:

$$\frac{\partial}{\partial \hat\theta} \int (\theta - \hat\theta)^2\, p(\theta \mid x)\, d\theta = \int \frac{\partial}{\partial \hat\theta}\left[ (\theta - \hat\theta)^2\, p(\theta \mid x) \right] d\theta = \int -2(\theta - \hat\theta)\, p(\theta \mid x)\, d\theta$$


1. Quadratic Cost Solution (or the MMSE estimator)

• Setting this equal to zero gives

$$\int -2(\theta - \hat\theta)\, p(\theta \mid x)\, d\theta = 0$$

$$\Leftrightarrow \int 2\hat\theta\, p(\theta \mid x)\, d\theta = \int 2\theta\, p(\theta \mid x)\, d\theta$$

$$\Leftrightarrow \hat\theta \underbrace{\int p(\theta \mid x)\, d\theta}_{=1} = \int \theta\, p(\theta \mid x)\, d\theta$$

$$\Leftrightarrow \hat\theta = \int \theta\, p(\theta \mid x)\, d\theta$$


1. Quadratic Cost Solution (or the MMSE estimator)

• Thus, we have the minimum:

$$\hat\theta_{MMSE} = \int \theta\, p(\theta \mid x)\, d\theta = E(\theta \mid x),$$

i.e., the mean of the posterior PDF, p(θ | x).²

• This is called the minimum mean square error estimator (MMSE estimator), because it minimizes the average squared error.

² The prior PDF, p(θ), refers to the parameter distribution before any observations are made. The posterior PDF, p(θ | x), refers to the parameter distribution after observing the data.


2. Absolute Cost Solution

• If we choose the absolute value as the cost function, we have to minimize

$$\arg\min_{\hat\theta} \int \left| \theta - \hat\theta \right| p(\theta \mid x)\, d\theta$$

• This can be shown to be equivalent to the following condition

$$\int_{-\infty}^{\hat\theta} p(\theta \mid x)\, d\theta = \int_{\hat\theta}^{\infty} p(\theta \mid x)\, d\theta$$


2. Absolute Cost Solution

• In other words, the estimate is the value which divides the probability mass into equal proportions:

$$\int_{-\infty}^{\hat\theta} p(\theta \mid x)\, d\theta = \frac{1}{2}$$

• Thus, we have arrived at the definition of the median of the posterior PDF.


3. Hit-or-miss Cost Solution (or the MAP estimator)

• For the hit-or-miss case, we also need to minimize the inner integral:

$$\hat\theta = \arg\min_{\hat\theta} \int C(\theta - \hat\theta)\, p(\theta \mid x)\, d\theta$$

with

$$C(x) = \begin{cases} 0, & |x| < \delta \\ 1, & |x| > \delta \end{cases}$$


3. Hit-or-miss Cost Solution (or the MAP estimator)

• The integral becomes

$$\int C(\theta - \hat\theta)\, p(\theta \mid x)\, d\theta = \int_{-\infty}^{\hat\theta - \delta} 1 \cdot p(\theta \mid x)\, d\theta + \int_{\hat\theta + \delta}^{\infty} 1 \cdot p(\theta \mid x)\, d\theta$$

or in a simplified form

$$\int C(\theta - \hat\theta)\, p(\theta \mid x)\, d\theta = 1 - \int_{\hat\theta - \delta}^{\hat\theta + \delta} 1 \cdot p(\theta \mid x)\, d\theta$$


3. Hit-or-miss Cost Solution (or the MAP estimator)

• This is minimized by maximizing

$$\int_{\hat\theta - \delta}^{\hat\theta + \delta} p(\theta \mid x)\, d\theta$$

• For small δ and smooth p(θ | x) the maximum of the integral occurs at the maximum of p(θ | x).

• Therefore, the estimator is the mode (the highest value) of the posterior PDF. Thus the name Maximum a Posteriori (MAP) estimator.


3. Hit-or-miss Cost Solution (or the MAP estimator)

• Note that the MAP estimator

$$\hat\theta_{MAP} = \arg\max_{\theta} p(\theta \mid x)$$

is calculated as (using Bayes’ rule):

$$\hat\theta_{MAP} = \arg\max_{\theta} \frac{p(x \mid \theta)\, p(\theta)}{p(x)}$$


3. Hit-or-miss Cost Solution (or the MAP estimator)

• Since p(x) does not depend on θ, it is equivalent to maximize only the numerator:

$$\hat\theta_{MAP} = \arg\max_{\theta} p(x \mid \theta)\, p(\theta)$$

• Incidentally, this is close to the ML estimator:

$$\hat\theta_{ML} = \arg\max_{\theta} p(x \mid \theta)$$

The only difference is the inclusion of the prior PDF.


Summary

• To summarize, the three most widely used Bayesian estimators are

1. The MMSE: $\hat\theta_{MMSE} = E(\theta \mid x)$

2. The median: the $\hat\theta$ for which $\int_{-\infty}^{\hat\theta} p(\theta \mid x)\, d\theta = \frac{1}{2}$

3. The MAP: $\hat\theta_{MAP} = \arg\max_{\theta} p(x \mid \theta)\, p(\theta)$
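A minimal sketch of how all three estimators could be read off a numerically gridded posterior (Python with NumPy is assumed; the Beta-shaped posterior is only an illustrative stand-in for p(θ | x)):

import numpy as np

# any gridded posterior p(theta | x) will do; here a Beta(4, 2)-shaped stand-in
theta = np.linspace(1e-6, 1 - 1e-6, 100001)
posterior = theta**3 * (1 - theta)
posterior /= posterior.sum()                      # normalize the grid weights

theta_mmse = np.sum(theta * posterior)            # 1. posterior mean (MMSE)
theta_med = theta[np.searchsorted(np.cumsum(posterior), 0.5)]   # 2. posterior median
theta_map = theta[np.argmax(posterior)]           # 3. posterior mode (MAP)

print(theta_mmse, theta_med, theta_map)           # about 0.667, 0.686, 0.750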


Example

• Consider the case of tossing a coin three times, resulting in three heads.

• In the example, we used the Gaussian prior

$$p(\mu) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2\sigma^2} (\mu - 0.5)^2 \right).$$

• Now µ̂_MAP becomes

$$\hat\mu_{MAP} = \arg\max_{\mu} p(x \mid \mu)\, p(\mu) = \arg\max_{\mu} \left[ \mu^{\#\text{heads}} (1 - \mu)^{\#\text{tails}} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2\sigma^2} (\mu - 0.5)^2 \right) \right]$$


Example

• Let’s simplify the arithmetic by setting #heads = 3 and #tails = 0:

$$\hat\mu_{MAP} = \arg\max_{\mu} \left[ \mu^3 \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2\sigma^2} (\mu - 0.5)^2 \right) \right]$$

• Equivalently, we can maximize its logarithm:

$$\arg\max_{\mu} \left[ 3\ln\mu - \ln\sqrt{2\pi\sigma^2} - \frac{1}{2\sigma^2} (\mu - 0.5)^2 \right]$$


Example

• Now,

$$\frac{\partial}{\partial\mu} \ln\left[ p(x \mid \mu)\, p(\mu) \right] = \frac{3}{\mu} - \frac{\mu - 0.5}{\sigma^2} = 0,$$

when

$$\mu^2 - 0.5\mu - 3\sigma^2 = 0.$$

This happens when

$$\mu = \frac{0.5 \pm \sqrt{0.25 - 4 \cdot 1 \cdot (-3\sigma^2)}}{2} = 0.25 \pm \frac{\sqrt{0.25 + 12\sigma^2}}{2}.$$


Example

• If we substitute the value used in the example, σ = 0.1,

$$\hat\mu_{MAP} = 0.25 + \frac{\sqrt{0.37}}{2} \approx 0.554.$$

• Thus, we have found the analytical solution for the maximum of the posterior curve shown earlier.
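This result can also be checked numerically, e.g. with the following sketch (Python with NumPy assumed; a simple grid search replaces the closed-form root):

import numpy as np

sigma = 0.1
mu = np.linspace(1e-6, 1 - 1e-6, 1000001)

# unnormalized log-posterior: 3 ln(mu) - (mu - 0.5)^2 / (2 sigma^2)
log_post = 3 * np.log(mu) - (mu - 0.5)**2 / (2 * sigma**2)

mu_map_numeric = mu[np.argmax(log_post)]
mu_map_formula = 0.25 + np.sqrt(0.25 + 12 * sigma**2) / 2

print(mu_map_numeric, mu_map_formula)   # both approximately 0.554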


Vector Parameter Case for MMSE

• In the vector parameter case, the MMSE estimator is

$$\hat\theta_{MMSE} = E(\theta \mid x)$$

or more explicitly

$$\hat\theta_{MMSE} = \begin{pmatrix} \int \theta_1\, p(\theta \mid x)\, d\theta \\ \int \theta_2\, p(\theta \mid x)\, d\theta \\ \vdots \\ \int \theta_p\, p(\theta \mid x)\, d\theta \end{pmatrix}$$


Vector Parameter Case for MMSE

• In the linear model case, there exists a straightforward solution. If the observed data can be modeled as

$$x = H\theta + w,$$

where θ ∼ N(µ_θ, C_θ) and w ∼ N(0, C_w), then

$$E(\theta \mid x) = \mu_\theta + C_\theta H^T (H C_\theta H^T + C_w)^{-1} (x - H\mu_\theta)$$


Vector Parameter Case for MMSE

• It is possible to derive an alternative form resembling the LS estimator (exercise):

$$E(\theta \mid x) = \mu_\theta + (C_\theta^{-1} + H^T C_w^{-1} H)^{-1} H^T C_w^{-1} (x - H\mu_\theta).$$

• Note that with µ_θ = 0, C_θ = I and C_w = σ_w² I this becomes (σ_w² I + Hᵀ H)⁻¹ Hᵀ x, a regularized form of the LS estimator; the plain LS estimator is recovered when the prior becomes uninformative (C_θ⁻¹ → 0).
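A small numerical sketch (Python with NumPy assumed; the dimensions and covariances are arbitrary test values) confirming that the two forms of the linear-model MMSE estimator coincide:

import numpy as np

rng = np.random.default_rng(0)
N, p = 20, 3
H = rng.standard_normal((N, p))
mu_theta = rng.standard_normal(p)
C_theta = 0.5 * np.eye(p)                  # prior covariance (test value)
C_w = 0.1 * np.eye(N)                      # noise covariance (test value)

theta = rng.multivariate_normal(mu_theta, C_theta)
x = H @ theta + rng.multivariate_normal(np.zeros(N), C_w)

# form 1: mu + C_theta H^T (H C_theta H^T + C_w)^{-1} (x - H mu)
form1 = mu_theta + C_theta @ H.T @ np.linalg.solve(H @ C_theta @ H.T + C_w, x - H @ mu_theta)

# form 2: mu + (C_theta^{-1} + H^T C_w^{-1} H)^{-1} H^T C_w^{-1} (x - H mu)
A = np.linalg.inv(C_theta) + H.T @ np.linalg.inv(C_w) @ H
form2 = mu_theta + np.linalg.solve(A, H.T @ np.linalg.inv(C_w) @ (x - H @ mu_theta))

print(np.allclose(form1, form2))           # True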


Vector Parameter Case for the MAP

• The MAP estimator can also be extended to vector parameters:

$$\hat\theta_{MAP} = \arg\max_{\theta} p(\theta \mid x)$$

or, using Bayes’ rule,

$$\hat\theta_{MAP} = \arg\max_{\theta} p(x \mid \theta)\, p(\theta)$$

• Note that in general this is different from p scalar MAPs. A scalar MAP would maximize for each parameter θ_i individually, whereas the vector MAP seeks the global maximum over the whole parameter vector.


Example: MMSE Estimation of Sinusoidal Parameters

• Consider the data model

$$x[n] = a \cos 2\pi f_0 n + b \sin 2\pi f_0 n + w[n], \quad n = 0, 1, \ldots, N-1$$

or in vector form

$$x = H\theta + w,$$

where

$$H = \begin{pmatrix} 1 & 0 \\ \cos 2\pi f_0 & \sin 2\pi f_0 \\ \cos 4\pi f_0 & \sin 4\pi f_0 \\ \vdots & \vdots \\ \cos(2(N-1)\pi f_0) & \sin(2(N-1)\pi f_0) \end{pmatrix} \quad \text{and} \quad \theta = \begin{pmatrix} a \\ b \end{pmatrix}$$


Example: MMSE Estimation of Sinusoidal Parameters

• We depart from the classical model by assuming that a and b are random variables with prior PDF θ ∼ N(0, σ_θ² I). Also w is assumed Gaussian (N(0, σ² I)) and independent of θ.

• Using the second version of the formula for the linear model (given above), we get the MMSE estimator:

$$E(\theta \mid x) = \mu_\theta + (C_\theta^{-1} + H^T C_w^{-1} H)^{-1} H^T C_w^{-1} (x - H\mu_\theta)$$


Example: MMSE Estimation of Sinusoidal Parameters

or, in our case,³

$$E(\theta \mid x) = \left( \frac{1}{\sigma_\theta^2} I + H^T \frac{1}{\sigma^2} I H \right)^{-1} H^T \frac{1}{\sigma^2} I\, x = \left( \frac{1}{\sigma_\theta^2} I + \frac{1}{\sigma^2} H^T H \right)^{-1} H^T \frac{1}{\sigma^2}\, x$$

³ Note the correspondence with ridge regression: ridge regression is equivalent to the Bayesian estimator with a Gaussian prior on the coefficients, and the LASSO is equivalent to the Bayesian estimator with a Laplacian prior.
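The footnote's correspondence with ridge regression can be sketched numerically as follows (Python with NumPy assumed; zero prior mean, C_θ = σ_θ² I and C_w = σ² I, so the regularization parameter becomes σ²/σ_θ²):

import numpy as np

rng = np.random.default_rng(1)
N, p = 50, 4
H = rng.standard_normal((N, p))
x = rng.standard_normal(N)
sigma2, sigma2_theta = 2.0, 0.5            # noise and prior variances (test values)

# Bayesian MMSE estimate with a zero-mean Gaussian prior (the formula above)
bayes = np.linalg.solve(np.eye(p) / sigma2_theta + H.T @ H / sigma2, H.T @ x / sigma2)

# ridge regression with regularization parameter alpha = sigma^2 / sigma_theta^2
alpha = sigma2 / sigma2_theta
ridge = np.linalg.solve(H.T @ H + alpha * np.eye(p), H.T @ x)

print(np.allclose(bayes, ridge))           # True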


Example: MMSE Estimation of Sinusoidal Parameters

• In earlier examples we have seen that the columns of H are nearly orthogonal (exactly orthogonal if f0 = k/N):

$$H^T H \approx \frac{N}{2} I$$

• Thus,

$$E(\theta \mid x) \approx \left( \frac{1}{\sigma_\theta^2} I + \frac{N}{2\sigma^2} I \right)^{-1} H^T \frac{1}{\sigma^2}\, x = \frac{\frac{1}{\sigma^2}}{\frac{1}{\sigma_\theta^2} + \frac{N}{2\sigma^2}}\, H^T x.$$


Example: MMSE Estimation of Sinusoidal Parameters

• In all, the MMSE estimates become

$$\hat a_{MMSE} = \frac{1}{1 + \frac{2\sigma^2/N}{\sigma_\theta^2}} \left[ \frac{2}{N} \sum_{n=0}^{N-1} x[n] \cos 2\pi f_0 n \right]$$

$$\hat b_{MMSE} = \frac{1}{1 + \frac{2\sigma^2/N}{\sigma_\theta^2}} \left[ \frac{2}{N} \sum_{n=0}^{N-1} x[n] \sin 2\pi f_0 n \right]$$


Example: MMSE Estimation of Sinusoidal Parameters

• For comparison, recall that the classical MVU estimator is

$$\hat a_{MVU} = \frac{2}{N} \sum_{n=0}^{N-1} x[n] \cos 2\pi f_0 n$$

$$\hat b_{MVU} = \frac{2}{N} \sum_{n=0}^{N-1} x[n] \sin 2\pi f_0 n$$
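To make the comparison concrete, here is a minimal Monte Carlo sketch (Python with NumPy assumed; this is not the course's BayesSinusoid.m script, and the parameter values are illustrative):

import numpy as np

rng = np.random.default_rng(2)
N, f0, M = 100, 0.15, 500                   # samples, normalized frequency, number of runs
sigma2_theta, sigma2 = 0.5, 4.0             # prior and noise variances (illustrative)
n = np.arange(N)
c, s = np.cos(2 * np.pi * f0 * n), np.sin(2 * np.pi * f0 * n)

err_mvu, err_mmse = [], []
for _ in range(M):
    a, b = rng.normal(0.0, np.sqrt(sigma2_theta), size=2)        # true parameters drawn from the prior
    x = a * c + b * s + rng.normal(0.0, np.sqrt(sigma2), size=N)

    a_mvu, b_mvu = 2 / N * np.sum(x * c), 2 / N * np.sum(x * s)  # classical (MVU) estimates

    shrink = 1 / (1 + (2 * sigma2 / N) / sigma2_theta)           # Bayesian shrinkage factor
    a_mmse, b_mmse = shrink * a_mvu, shrink * b_mvu              # MMSE estimates (orthogonal approximation)

    err_mvu.append((a_mvu - a) ** 2 + (b_mvu - b) ** 2)
    err_mmse.append((a_mmse - a) ** 2 + (b_mmse - b) ** 2)

print("classical MSE:", np.mean(err_mvu), "  Bayesian MSE:", np.mean(err_mmse))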


Example: MMSE Estimation of Sinusoidal Parameters

• The difference can be interpreted as a weighting between the prior knowledge and the data.

• If the prior knowledge is unreliable (σ_θ² large), then 1/(1 + (2σ²/N)/σ_θ²) ≈ 1 and the two estimators are almost equal.

• If the data is unreliable (σ² large), then the coefficient 1/(1 + (2σ²/N)/σ_θ²) is small, making the estimate close to the mean of the prior PDF.


Example: MMSE Estimation of Sinusoidal Parameters

• An example run is illustrated below. In this case, N = 100, f0 = 15/N, σ_θ² = 0.48566 and σ² = 4.1173. Altogether M = 500 tests were performed.

• Since the prior PDF has a small variance, the estimator gains a lot from using it. This is seen as a significant difference between the MSE’s of the two estimators.


Example: MMSE Estimation of Sinusoidal Parameters

[Figure: histograms of the estimates over the M = 500 runs — classical estimator of a, MSE = 0.072474; classical estimator of b, MSE = 0.092735; Bayesian estimator of a, MSE = 0.061919; Bayesian estimator of b, MSE = 0.076355.]


Example: MMSE Estimation of Sinusoidal Parameters

• If the prior has a higher variance, the Bayesian approach does not perform that much better. In the pictures below, σ_θ² = 2.1937 and σ² = 1.9078. The difference in performance between the two approaches is negligible.


Example: MMSE Estimation of Sinusoidal Parameters

[Figure: histograms of the estimates — classical estimator of a, MSE = 0.040066; classical estimator of b, MSE = 0.034727; Bayesian estimator of a, MSE = 0.03951; Bayesian estimator of b, MSE = 0.034477.]


Example: MMSE Estimation of Sinusoidal Parameters

• The program code is available at http://www.cs.tut.fi/courses/SGN-2606/BayesSinusoid.m


Example: MAP Estimator

• Assume that

$$p(x[n] \mid \theta) = \begin{cases} \theta \exp(-\theta x[n]), & \text{if } x[n] > 0 \\ 0, & \text{if } x[n] < 0 \end{cases}$$

with x[n] conditionally IID, and the prior of θ:

$$p(\theta) = \begin{cases} \lambda \exp(-\lambda\theta), & \text{if } \theta > 0 \\ 0, & \text{if } \theta < 0 \end{cases}$$

• Now, θ is the unknown RV and λ is known.


Example: MAP Estimator

• Then the MAP estimator is found by maximizing p(θ | x) or, equivalently, p(x | θ)p(θ).

• Because both PDF’s have an exponential form, it’s easier to maximize the logarithm instead:

$$\hat\theta = \arg\max_{\theta} \left( \ln p(x \mid \theta) + \ln p(\theta) \right).$$


Example: MAP Estimator

• Now,

$$\ln p(x \mid \theta) + \ln p(\theta) = \ln\left[ \prod_{n=0}^{N-1} \theta \exp(-\theta x[n]) \right] + \ln\left[ \lambda \exp(-\lambda\theta) \right]$$

$$= \ln\left[ \theta^N \exp\left( -\theta \sum_{n=0}^{N-1} x[n] \right) \right] + \ln\left[ \lambda \exp(-\lambda\theta) \right]$$

$$= N \ln\theta - N\theta\bar{x} + \ln\lambda - \lambda\theta$$

• Differentiation produces

$$\frac{d}{d\theta}\left[ \ln p(x \mid \theta) + \ln p(\theta) \right] = \frac{N}{\theta} - N\bar{x} - \lambda$$


Example: MAP Estimator

• Setting it equal to zero produces the MAP estimator:

$$\hat\theta = \frac{1}{\bar{x} + \frac{\lambda}{N}}$$
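A small sketch comparing this with the ML estimate 1/x̄ on simulated data (Python with NumPy assumed; the values of θ, λ and N are illustrative):

import numpy as np

rng = np.random.default_rng(3)
theta_true, lam, N = 2.0, 1.0, 25                    # illustrative values
x = rng.exponential(scale=1 / theta_true, size=N)    # p(x[n] | theta) = theta exp(-theta x[n])

xbar = x.mean()
theta_ml = 1 / xbar                    # ML estimate
theta_map = 1 / (xbar + lam / N)       # MAP estimate; the prior pulls the estimate towards smaller values

print(theta_ml, theta_map)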


Example: Deconvolution

• Consider the situation where a signal s[n] passes through a channel with impulse response h[n] and is further corrupted by noise w[n]:

$$x[n] = h[n] * s[n] + w[n] = \sum_{k=0}^{K} h[k]\, s[n-k] + w[n], \quad n = 0, 1, \ldots, N-1$$


Example: Deconvolution

• Since convolution commutes, we can write this as

$$x[n] = \sum_{k=0}^{n_s - 1} h[n-k]\, s[k] + w[n]$$

• In matrix form this is expressed by

$$\begin{pmatrix} x[0] \\ x[1] \\ \vdots \\ x[N-1] \end{pmatrix} = \begin{pmatrix} h[0] & 0 & \cdots & 0 \\ h[1] & h[0] & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ h[N-1] & h[N-2] & \cdots & h[N-n_s] \end{pmatrix} \begin{pmatrix} s[0] \\ s[1] \\ \vdots \\ s[n_s - 1] \end{pmatrix} + \begin{pmatrix} w[0] \\ w[1] \\ \vdots \\ w[N-1] \end{pmatrix}$$


Example: Deconvolution

• Thus, we have again the linear model

$$x = Hs + w$$

where the unknown parameter θ is the original signal s.

• The noise is assumed Gaussian: w[n] ∼ N(0, σ²).

• A reasonable assumption for the signal is that s ∼ N(0, C_s) with [C_s]_ij = r_ss[i − j], where r_ss is the autocorrelation function of s.

• According to the linear-model formula given earlier, the MMSE estimator is

$$E(s \mid x) = \mu_s + C_s H^T (H C_s H^T + C_w)^{-1} (x - H\mu_s) = C_s H^T (H C_s H^T + \sigma^2 I)^{-1} x$$
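A minimal sketch of this estimator (Python with NumPy assumed; the channel h, the exponential autocorrelation model and the noise level are illustrative choices):

import numpy as np

rng = np.random.default_rng(4)
ns, sigma2 = 50, 0.1
h = np.array([1.0, 0.7, 0.3])                  # illustrative channel impulse response
N = ns + len(h) - 1

# convolution matrix H such that H @ s computes h * s
H = np.zeros((N, ns))
for k, hk in enumerate(h):
    H += hk * np.eye(N, ns, -k)

# signal prior: zero mean, autocorrelation r_ss[k] = 0.9^|k|
idx = np.arange(ns)
Cs = 0.9 ** np.abs(idx[:, None] - idx[None, :])

s = rng.multivariate_normal(np.zeros(ns), Cs)              # draw a signal from the prior
x = H @ s + rng.normal(0.0, np.sqrt(sigma2), size=N)       # pass it through the channel, add noise

# MMSE deconvolution: s_hat = Cs H^T (H Cs H^T + sigma^2 I)^{-1} x
s_hat = Cs @ H.T @ np.linalg.solve(H @ Cs @ H.T + sigma2 * np.eye(N), x)

print("MSE of the MMSE estimate:", np.mean((s_hat - s) ** 2))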


Example: Deconvolution

• In general, the form of the estimator varies a lot between different cases. However, as a special case:

• When H = I, the channel is the identity and only noise is present. In this case

$$\hat s = C_s (C_s + \sigma^2 I)^{-1} x$$

This case is called the Wiener filter. For example, in the single data point case,

$$\hat s[0] = \frac{r_{ss}[0]}{r_{ss}[0] + \sigma^2}\, x[0]$$

Thus, the variance of the noise acts as a parameter that expresses the reliability of the data relative to the prior.