Elements of Estimation Theory

Jan Sykora

Czech Technical University in Prague

Czech Republic

Synchronization and Equalization in Digital Communications course A-SEK

Jan Sykora (CTU in Prague), ElemEstTh (EET), A-SEK, A3.0.0 (10.10.2010)

© Jan Sykora, 2010

E-mail: [email protected], http://radio.feld.cvut.cz/~sykora/

This text can be exclusively used by students of Synchronization and Equalization in Digital Communications course at Czech Technical University in Prague, faculty of electrical engineering as a support material for their preparation to this course. These students are allowed to make an electronic or printed copy solely for their personal use. Redistribution of the document in any electronic or printed form is prohibited. The only distribution source is the one maintained by the author. Any other utilization of the document requires permission from the author.


Outline

1 Parameter model

2 Estimator criterion

Goal & metric

ML (Maximum Likelihood)

Bayesian estimators

LS (Least Squares)

Method of Moments

3 Performance parameters of the estimator

4 Performance limits

5 Sufficient statistics

6 Estimator equation and solver


Parameter model

Vector parameter representation

All parameters can be represented as vector parameters — θ

constant parameter (scalar, vector)

continuous time-variant parameters

signal expansion (Signal Space, Sampling)

discrete time parameters


Parameter value domain model: a priori known properties

stochastic parameter

Random Parameter with Known PDF

Random Parameter with Unknown PDF

parameterized PDF from given class

parameter range

deterministic parameter

parameter range


Dynamic model of the parameter: a priori known properties

Linear models

Auto-Regressive Moving Average (ARMA(p, q))

$$\underbrace{\sum_{i=0}^{p} a_i\, \theta[k-i]}_{\text{AR}} = \underbrace{\sum_{i=0}^{q} b_i\, u[k-i]}_{\text{MA}}$$

excitation $u[k]$ is a zero-mean uncorrelated sequence, usually $a_0 = 1$

State Space Description (excitation $u$)

continuous time: $\frac{\partial \theta}{\partial t} = X(\theta(t), u(t), t)$

discrete time: $\theta[k+1] = X(\theta[k], u[k], k)$

Power Spectrum Density (PSD)

Correlation properties

Parameter subspace specification

Nonlinear dynamic models
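
The ARMA(p, q) parameter model can be illustrated by a minimal simulation sketch. The function name `simulate_arma`, the white Gaussian excitation, the coefficient values, and the zero initial conditions are all assumptions chosen for illustration; they are not part of the handout.

```python
import random

def simulate_arma(a, b, n, seed=0):
    """Generate n samples theta[k] obeying sum_i a[i]*theta[k-i] = sum_i b[i]*u[k-i]."""
    rng = random.Random(seed)
    # Excitation u[k]: zero-mean, uncorrelated (white Gaussian here, an assumption).
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    theta = []
    for k in range(n):
        # MA part: weighted sum of current and past excitation samples.
        ma = sum(b[i] * u[k - i] for i in range(len(b)) if k - i >= 0)
        # AR part without the i = 0 term, which is solved for below.
        ar = sum(a[i] * theta[k - i] for i in range(1, len(a)) if k - i >= 0)
        theta.append((ma - ar) / a[0])
    return theta, u

# Hypothetical ARMA(1, 1) example; samples before k = 0 are taken as zero.
theta, u = simulate_arma(a=[1.0, -0.9], b=[1.0, 0.5], n=1000)
```

With $a_0 = 1$ the recursion reduces to the familiar form $\theta[k] = -\sum_{i \ge 1} a_i \theta[k-i] + \sum_i b_i u[k-i]$.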


Estimator criterion: Goal & metric

Estimator criterion

associates estimator with given performance goal/metric

e.g. Maximum A posteriori Probability

each performance goal requires specific parameter model

e.g. criteria for random/deterministic parameters

Estimator equation

typically maximizes (minimizes) given performance goal/metric

$$\hat{\theta} = \arg\max_{\theta} \rho(\theta) \quad \text{or} \quad \hat{\theta} = \arg\min_{\theta} \rho(\theta)$$

Estimator solver

solver finds the solution of the criterion

typically as a solution of $\mu(\hat{\theta}) = 0$

solver’s own approximations and imperfections create an additional “layer” of performance degradation


Criterion goal/metric domains

stochastic domain

goal metric = mean value of some performance metric

achieves the goal when repeated many times

e.g. Mean Square Error (Bayesian estimator class)

goal metric = other stochastic based value

e.g. Likelihood (conditional probabilistic outcome)

time (series) domain

goal metric = parametric approximation of the function

achieves the goal on single realization

e.g. Least Squares

ad-hoc

e.g. Method of Moments


Estimator criterion: ML (Maximum Likelihood)

ML goal metric

conditional probabilistic outcome

ML estimator

$$\hat{\theta} = \arg\max_{\theta}\, p(x|\theta)$$

it can always be constructed

usually good performance

asymptotically unbiased and efficient (attains CRLB)

$$\hat{\theta} \overset{\text{asympt}}{\sim} \mathcal{N}\left(\theta, J^{-1}(\theta)\right)$$

very popular choice
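
As a minimal sketch of the ML criterion, the following assumes a hypothetical model not taken from the handout: a constant level `a` observed in white Gaussian noise of known variance, $x[k] = a + w[k]$. The likelihood is maximized by brute-force search over a grid; for this particular model the maximizer is known to coincide with the sample mean, which gives a simple check.

```python
import math
import random

def log_likelihood(x, a, sigma2):
    """ln p(x|a) for x[k] = a + w[k], w[k] real white Gaussian with variance sigma2."""
    n = len(x)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((xk - a) ** 2 for xk in x) / (2 * sigma2))

rng = random.Random(1)
a_true, sigma2 = 0.7, 0.25                      # illustrative values
x = [a_true + rng.gauss(0.0, math.sqrt(sigma2)) for _ in range(100)]

# ML estimate: argmax of the likelihood over a grid of candidate values.
grid = [i / 1000 for i in range(-2000, 2001)]
a_ml = max(grid, key=lambda a: log_likelihood(x, a, sigma2))

# For this model the ML estimate equals the sample mean (up to the grid step).
a_mean = sum(x) / len(x)
```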


Estimator criterion: Bayesian estimators

Bayesian goal metric

Goal metric

mean value of some performance metric (Loss function)

Loss function $L(\theta, \hat{\theta})$

performance metric

Bayesian risk $R(\hat{\theta}) = E_{x,\theta}\left[L(\theta, \hat{\theta})\right]$

mean value of the loss function

mean over all random influences affecting observation

including parameter itself (requires knowledge of PDF)



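MAP (Maximum A posteriori Probability): Derivation

Loss function

$$L(\theta, \hat{\theta}) = \begin{cases} 0, & \|\theta - \hat{\theta}\| < \Delta\theta \\ 1, & \|\theta - \hat{\theta}\| \ge \Delta\theta \end{cases}, \quad \Delta\theta \to 0^+$$

Bayes risk

$$R(\hat{\theta}) = E\left[L(\theta, \hat{\theta})\right] = \int_{\{x\}} \int_{\{\theta\}} L(\theta, \hat{\theta})\, p(\theta|x)\, p(x)\, d\theta\, dx = \int_{\{x\}} \left(1 - \int_{\{\Delta\theta(\hat{\theta})\}} p(\theta|x)\, d\theta\right) p(x)\, dx$$

$$\lim_{\Delta\theta \to 0^+} \int_{\{\Delta\theta(\hat{\theta})\}} p(\theta|x)\, d\theta = \lim_{\Delta\theta \to 0^+} \underbrace{\int_{\{\Delta\theta(\hat{\theta})\}} d\theta}_{a_1 > 0}\; p(\hat{\theta}|x) = a_1\, p(\hat{\theta}|x)$$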



MAP (Maximum A posteriori Probability)

MAP estimator

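Since $p(x) \ge 0$, the risk is minimized by maximizing the a posteriori PDF for each observation $x$

$$\hat{\theta} = \arg\min_{\hat{\theta}} R(\hat{\theta}) = \arg\max_{\theta}\, p(\theta|x)$$

Special case of uniform $p_\theta(\theta)$ PDF

MAP ≡ ML (constrained to the permissible range of parameter)

$$\hat{\theta} = \arg\max_{\theta}\, p(\theta|x) = \arg\max_{\theta} \frac{p(x|\theta)\, p_\theta(\theta)}{p_x(x)} = \arg\max_{\theta}\, p(x|\theta)$$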
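
The equivalence of MAP and range-constrained ML under a uniform prior can be checked numerically. This is a sketch under assumptions that are not from the handout: a scalar level in unit-variance Gaussian noise, a prior uniform on $[-1, 1]$, and a grid search in place of an analytical maximization. All function names are hypothetical.

```python
import math

def log_lik(x, a, sigma2=1.0):
    """ln p(x|a) up to a constant, for x[k] = a + w[k] with Gaussian w."""
    return -sum((xk - a) ** 2 for xk in x) / (2 * sigma2)

def log_prior_uniform(a, lo=-1.0, hi=1.0):
    """ln p(a) for a prior uniform on [lo, hi]; minus infinity outside the range."""
    return -math.log(hi - lo) if lo <= a <= hi else -math.inf

grid = [i / 100 for i in range(-300, 301)]

def ml(x):
    return max(grid, key=lambda a: log_lik(x, a))

def map_uniform(x):
    # p(a|x) is proportional to p(x|a) p(a); p(x) does not depend on a.
    return max(grid, key=lambda a: log_lik(x, a) + log_prior_uniform(a))

x_inside = [0.2, 0.5, 0.2]     # sample mean 0.3 lies inside the prior range
x_outside = [1.4, 0.9, 1.3]    # sample mean 1.2 lies outside the prior range
```

For `x_inside` both estimators agree; for `x_outside` the unconstrained ML estimate leaves the permissible range while the MAP estimate stops at its boundary.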



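MSE (Mean Square Error): Derivation

Loss function

$$L(\theta, \hat{\theta}) = \|\theta - \hat{\theta}\|^2$$

Bayes risk (special case: real scalar parameter)

$$R(\hat{\theta}) = E\left[L(\theta, \hat{\theta})\right] = \int_{\{x\}} \int_{\{\theta\}} (\theta - \hat{\theta})^2\, p(\theta|x)\, d\theta\; p(x)\, dx$$

$$\arg\min_{\hat{\theta}} R(\hat{\theta}) = \arg\min_{\hat{\theta}} \underbrace{\int_{\{\theta\}} (\theta - \hat{\theta})^2\, p(\theta|x)\, d\theta}_{R(\hat{\theta}|x)}$$

$$\frac{\partial}{\partial \hat{\theta}} R(\hat{\theta}|x) = \frac{\partial}{\partial \hat{\theta}} \int_{\{\theta\}} (\theta - \hat{\theta})^2\, p(\theta|x)\, d\theta = -\int_{\{\theta\}} 2(\theta - \hat{\theta})\, p(\theta|x)\, d\theta$$

Setting the derivative to zero at the minimizer,

$$\frac{\partial}{\partial \hat{\theta}} R(\hat{\theta}|x) = 0$$

and using $\int_{\{\theta\}} p(\theta|x)\, d\theta = 1$ gives

$$\hat{\theta} = \int_{\{\theta\}} \theta\, p(\theta|x)\, d\theta$$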



MSE (Mean Square Error)

MSE estimator

general case: complex vector parameter

$$\hat{\theta} = E[\theta|x]$$

Note

do not confuse MSE estimator with MSE as performance metric for arbitrary other estimator
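
The MSE estimator $E[\theta|x]$ can be evaluated numerically as a posterior mean. The sketch below assumes a hypothetical model not given in the handout: $x[k] = \theta + w[k]$ with a zero-mean Gaussian prior on $\theta$ and Gaussian noise. For this model the posterior mean has a well-known closed form, which serves as a cross-check on the numerical integration.

```python
import math

def posterior_mean(x, sigma2_w, sigma2_theta, lo=-10.0, hi=10.0, steps=20001):
    """Approximate E[theta|x] = integral of theta p(theta|x) by a Riemann sum."""
    d = (hi - lo) / (steps - 1)
    num = den = 0.0
    for i in range(steps):
        th = lo + i * d
        # Unnormalized log posterior: ln p(x|theta) + ln p(theta) up to constants.
        log_post = (-sum((xk - th) ** 2 for xk in x) / (2 * sigma2_w)
                    - th ** 2 / (2 * sigma2_theta))
        w = math.exp(log_post)
        num += th * w
        den += w            # normalization, plays the role of p(x)
    return num / den

x = [0.9, 1.1, 1.3, 0.7]                 # illustrative observations
sigma2_w, sigma2_theta = 1.0, 0.5        # illustrative variances
theta_mmse = posterior_mean(x, sigma2_w, sigma2_theta)

# Closed form for the Gaussian/Gaussian case: shrunken sample mean.
x_bar = sum(x) / len(x)
theta_closed = x_bar * sigma2_theta / (sigma2_theta + sigma2_w / len(x))
```

The prior pulls the estimate from the sample mean towards the prior mean (zero here); the pull weakens as the number of observations grows.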


Estimator criterion: LS (Least Squares)

LS goal metric

minimizes difference between measurement and assumed signal model

no statistical description of measurement needed

observation with symmetric fluctuations E[x|θ] = s(θ)

Model

Signal model $s = s(\theta)$

Observation $x = x(s(\theta))$

Least Squares estimator

$$\hat{\theta} = \arg\min_{\theta} \|x - s(\theta)\|^2$$

Special case: Linear LS

Linear signal model $s = H\theta$, $H$ observation matrix

Solution

$$\hat{\theta} = (H^H H)^{-1} H^H x$$
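
A minimal sketch of the linear LS solution, assuming a real-valued observation matrix with two columns (so that $H^H = H^T$ and the $2 \times 2$ normal equations can be inverted by hand). The line-fit data and the function name `linear_ls_2col` are hypothetical.

```python
def linear_ls_2col(H, x):
    """Return theta = (H^T H)^{-1} H^T x for a real H with exactly two columns."""
    # Entries of the symmetric Gram matrix H^T H.
    g00 = sum(h[0] * h[0] for h in H)
    g01 = sum(h[0] * h[1] for h in H)
    g11 = sum(h[1] * h[1] for h in H)
    # Entries of the vector H^T x.
    r0 = sum(h[0] * xk for h, xk in zip(H, x))
    r1 = sum(h[1] * xk for h, xk in zip(H, x))
    # Explicit inverse of the 2x2 Gram matrix applied to H^T x.
    det = g00 * g11 - g01 * g01
    return [(g11 * r0 - g01 * r1) / det, (g00 * r1 - g01 * r0) / det]

# Signal model s[k] = theta_0 + theta_1 * k, i.e. rows of H are [1, k].
H = [[1.0, float(k)] for k in range(6)]
x_clean = [2.0 + 3.0 * k for k in range(6)]
x_noisy = [xk + e for xk, e in zip(x_clean, [0.1, -0.2, 0.05, 0.0, -0.1, 0.15])]

theta_clean = linear_ls_2col(H, x_clean)   # recovers [2, 3] up to rounding
theta_noisy = linear_ls_2col(H, x_noisy)   # close to [2, 3]
```

No statistical description of the perturbation is used anywhere in the computation, in line with the LS goal metric.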


Estimator criterion: Method of Moments (MM)

MM goal metric

None (ad-hoc)

No general optimality

Simple to derive and to implement

Assume that some moment of the measurement depends on the parameter in a known way

$$\mu = h(\theta)$$

$$\theta = h^{-1}(\mu)$$

Estimator is obtained by replacement of the moment by its estimate, $\mu \to \hat{\mu}$

MM estimator

$$\hat{\theta} = h^{-1}(\hat{\mu})$$
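
As a sketch of the method of moments, consider a hypothetical example that is not in the handout: samples from an exponential distribution with unknown rate $\lambda$. The first moment is $\mu = h(\lambda) = 1/\lambda$, so $\hat{\lambda} = h^{-1}(\hat{\mu}) = 1/\hat{\mu}$ with $\hat{\mu}$ the sample mean.

```python
import random

def mm_rate_estimate(samples):
    """Method-of-moments estimate of an exponential rate: lambda = 1 / mean."""
    mu_hat = sum(samples) / len(samples)   # estimate of the first moment
    return 1.0 / mu_hat                    # theta_hat = h^{-1}(mu_hat)

rng = random.Random(7)
lam_true = 2.0                             # illustrative value
samples = [rng.expovariate(lam_true) for _ in range(20000)]
lam_hat = mm_rate_estimate(samples)
```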


Performance parameters of the estimator

Deterministic Parameter — Bias

Bias

$$b = E[\hat{\theta} - \theta] = E[\hat{\theta}] - \theta$$

$\hat{\theta}$ is unbiased if $b = 0$



Deterministic Parameter — Estimator variance

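Parameter vector $\theta = [\ldots, \theta_i, \ldots]^T$, $b = [\ldots, b_i, \ldots]^T$

Theorem (MSE as a performance criterion is related to variance)

If the estimator of a deterministic parameter (DP) is unbiased then

$$\underbrace{E\left[|\hat{\theta}_i - \theta_i|^2\right]}_{\text{MSE}} = \mathrm{var}[\hat{\theta}_i]$$

Proof

$$E\left[|\hat{\theta}_i - \theta_i|^2\right] = E\left[|\hat{\theta}_i - E[\hat{\theta}_i] + E[\hat{\theta}_i] - \theta_i|^2\right]$$

$$= E\left[|\hat{\theta}_i - E[\hat{\theta}_i]|^2\right] + E\left[|E[\hat{\theta}_i] - \theta_i|^2\right] + 2\Re\left[E\left[(\hat{\theta}_i - E[\hat{\theta}_i])(E[\hat{\theta}_i] - \theta_i)^*\right]\right]$$

$$= \mathrm{var}[\hat{\theta}_i] + |b_i|^2 + 2\Re\left[|E[\hat{\theta}_i]|^2 - E[\hat{\theta}_i]\theta_i^* - |E[\hat{\theta}_i]|^2 + E[\hat{\theta}_i]\theta_i^*\right]$$

$$= \mathrm{var}[\hat{\theta}_i] + |b_i|^2$$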



Minimum Variance Unbiased Estimator (MVU)

Variance as the estimator design criterion

MSE describes quality of estimator

b depends on θ

if $b \neq 0$ then the estimator $\hat{\theta}$ based on MSE depends on $\theta$ and is generally unrealizable

useful only for $b = 0$, in which case MSE = variance



Random Parameter — Mean Square Error

MSE

Estimator design and performance metric criterion

$$E\left[|\hat{\theta}_i - \theta_i|^2\right]$$

Estimate is close to the real value on average for any parameter realization.

Well defined interpretation as a quality of performance

No problems with evaluation (known a priori PDF)

No simple relation between MSE and $\mathrm{var}[\hat{\theta}_i]$ for RP



Random Parameter: Mean Estimator Bias and Variance

Mean Estimator Bias and Variance for RP

$$E[\hat{\theta} - \theta], \quad \mathrm{var}[\hat{\theta}]$$

mean estimator bias and variance are meaningless as performance criteria for RP

Example

Let $x[k] = a + w[k]$ be the measurement, where $w$ is WGN and $a \in (-1, 1)$ is a uniformly distributed RP

Let the estimator be $\hat{a} = 0$

Then $E[\hat{a}] = E[a] = 0$ and the variance $\mathrm{var}[\hat{a}] = 0$, which would indicate a “perfect” estimator, but in fact it is useless.
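
The example can be reproduced by a short Monte Carlo sketch. The trial count and the seed are arbitrary choices; the measurement noise is omitted because the estimator $\hat{a} = 0$ ignores the measurement entirely. The MSE is included to show what the mean bias and variance fail to reveal: it stays at $E[a^2] = 1/3$.

```python
import random

rng = random.Random(11)
trials = 50000

a_values = [rng.uniform(-1.0, 1.0) for _ in range(trials)]  # realizations of the RP
a_hat = [0.0 for _ in a_values]                             # estimator ignores the data

# Mean estimator bias E[a_hat - a] and estimator variance var[a_hat].
mean_bias = sum(e - a for e, a in zip(a_hat, a_values)) / trials
mean_est = sum(a_hat) / trials
variance = sum((e - mean_est) ** 2 for e in a_hat) / trials

# MSE, in contrast, exposes the poor quality of the estimator.
mse = sum((e - a) ** 2 for e, a in zip(a_hat, a_values)) / trials
```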



Conditional MSE, Bias, Variance for RP

Conditional characteristics

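properties for any particular parameter realization

better performance criterion for RP

Conditional Bias

$$b(\theta) = E[\hat{\theta} - \theta \mid \theta] = E[\hat{\theta} \mid \theta] - \theta$$

Conditional MSE

$$\mathrm{MSE}(\theta_i) = E\left[|\hat{\theta}_i - \theta_i|^2 \mid \theta_i\right]$$

Similarly as for DP it can be shown that

$$\mathrm{MSE}(\theta_i) = E\left[|\hat{\theta}_i - \theta_i|^2 \mid \theta_i\right] = \mathrm{var}[\hat{\theta}_i \mid \theta_i] + |b_i(\theta_i)|^2$$

If the estimator is conditionally unbiased, $b(\theta) = 0$, then

$$\mathrm{MSE}(\theta_i) = \mathrm{var}[\hat{\theta}_i \mid \theta_i]$$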



Acquisition time

time in the continuous domain or number of iterations necessary to reach the “locked” state of the synchronizer/equalizer with a given probability

$$T_a(P_a) : \Pr\{(\hat{\theta}(t) - \theta(t)) \in A,\ \forall t > T_a\} = P_a$$

Used for iterative (feed-back, sequential) estimators

Locked state region $A$ defined in various ways

it should suitably reflect “close vicinity” of the estimate to the real value

usually such that, when reached, the synchronizer can switch from acquisition mode to tracking mode

Initial conditions at $t = 0$ ($k = 0$) are random



Synchronization failure rate

Mean time between synchronizer locked state exits

Tracking mode usually exited — “locked” state lost

New acquisition necessary

Detector operation interrupted



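PDF of estimation error

$p(\theta_\epsilon)$, where $\theta_\epsilon = \hat{\theta}(x) - \theta$

Influence on BER

Let the conditional probability of data detection error (message, symbol, bit) be

$$P_e(\theta_\epsilon) = \Pr\{\hat{d}(x) \neq d \mid \theta_\epsilon\}$$

Average probability of data detection error

$$P_e = \int_{\{\theta_\epsilon\}} P_e(\theta_\epsilon)\, p(\theta_\epsilon)\, d\theta_\epsilon$$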

Pe does not have to depend only on θǫ

often depends also on θ . . . average Pe will be parameterized



Other performance criteria

Amount and form of a priori known information needed in the signal

Preamble/Training sequence length

Robustness

Level of performance degradation when the channel model and other assumptions deviate from the assumed ones

Range of nuisance parameter values

Range of nuisance parameter values the synchronizer can cope with


Performance limits

Cramer-Rao Lower Bound (CRLB)

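Applicable to DP estimation, $\theta \in \mathbb{R}$

Applicable to any unbiased estimator

Theorem (Cramer-Rao Lower Bound (CRLB))

If the regularity condition holds

$$E\left[\frac{\partial \ln p(x|\theta)}{\partial \theta}\right] = 0$$

then the variance of any unbiased estimator satisfies

$$\mathrm{var}[\hat{\theta}_i] \ge \left[J^{-1}(\theta)\right]_{i,i}$$

where $J$ is the Fisher information matrix

$$J_{k,i}(\theta) = -E\left[\frac{\partial^2 \ln p(x|\theta)}{\partial \theta_k\, \partial \theta_i}\right]$$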


Conditional Cramer-Rao Lower Bound

Applicable to RP estimation, $\theta \in \mathbb{R}$

Applicable to any conditionally unbiased estimator

All expectations replaced by conditional expectations

performance bound for any particular parameter realization

Theorem (Conditional Cramer-Rao Lower Bound (Cond-CRLB))

If the regularity condition holds

$$E\left[\left.\frac{\partial \ln p(x|\theta)}{\partial \theta}\,\right|\, \theta\right] = 0$$

then the variance of any conditionally unbiased estimator satisfies

$$\mathrm{var}[\hat{\theta}_i \mid \theta_i] \ge \left[J^{-1}(\theta)\right]_{i,i}$$

where $J$ is the conditional Fisher information matrix

$$J_{k,i}(\theta) = -E\left[\left.\frac{\partial^2 \ln p(x|\theta)}{\partial \theta_k\, \partial \theta_i}\,\right|\, \theta\right]$$


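Example: Phase estimation in discrete AWGN channel

System model

$$x[k] = e^{j\varphi} + w[k], \quad k = 0, \ldots, (N-1)$$

where $w[k]$ is an RPKP (random parameter with known PDF): discrete CWGN with variance $\sigma_w^2$

Likelihood function

RPKP $w$ eliminated

$$p(\mathbf{x}|\varphi) = \frac{1}{\pi^N \sigma_w^{2N}} \exp\left\{-\frac{1}{\sigma_w^2} \|\mathbf{x} - \mathbf{i}\, e^{j\varphi}\|^2\right\} = \frac{1}{\pi^N \sigma_w^{2N}} \exp\left\{-\frac{1}{\sigma_w^2} \left(\|\mathbf{x}\|^2 + \|\mathbf{i}\|^2 - 2\Re\left[(\mathbf{x} \cdot \mathbf{i})\, e^{-j\varphi}\right]\right)\right\}$$

$$\mathbf{i} = [1, 1, \ldots, 1], \quad \|\mathbf{i}\|^2 = N$$

$$\Lambda(d, \varphi) = a_1\, p(\mathbf{x}|d, \varphi) = \exp\left\{\frac{2}{\sigma_w^2} \Re\left[(\mathbf{x} \cdot \mathbf{i})\, e^{-j\varphi}\right]\right\}$$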


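Fisher Information matrix

dimension $1 \times 1$, $w$ eliminated

$$\frac{\partial \ln p(\mathbf{x}|\varphi)}{\partial \varphi} = \frac{\partial}{\partial \varphi} \frac{1}{\sigma_w^2} \left((\mathbf{x} \cdot \mathbf{i})\, e^{-j\varphi} + (\mathbf{x} \cdot \mathbf{i})^* e^{j\varphi}\right) = \frac{1}{\sigma_w^2} \left(-j(\mathbf{x} \cdot \mathbf{i})\, e^{-j\varphi} + j(\mathbf{x} \cdot \mathbf{i})^* e^{j\varphi}\right)$$

$$\frac{\partial^2 \ln p(\mathbf{x}|\varphi)}{\partial \varphi^2} = \frac{\partial}{\partial \varphi} \frac{1}{\sigma_w^2} \left(-j(\mathbf{x} \cdot \mathbf{i})\, e^{-j\varphi} + j(\mathbf{x} \cdot \mathbf{i})^* e^{j\varphi}\right) = \frac{1}{\sigma_w^2} \left(-(\mathbf{x} \cdot \mathbf{i})\, e^{-j\varphi} - (\mathbf{x} \cdot \mathbf{i})^* e^{j\varphi}\right)$$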


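Expectation

$$\mathbf{x} = e^{j\varphi}\, \mathbf{i} + \mathbf{w}$$

$$\left.\frac{\partial^2 \ln p(\mathbf{x}|\varphi)}{\partial \varphi^2}\right|_{\mathbf{x} = e^{j\varphi} \mathbf{i} + \mathbf{w}} = \frac{1}{\sigma_w^2} \left(-\left((e^{j\varphi}\, \mathbf{i} + \mathbf{w}) \cdot \mathbf{i}\right) e^{-j\varphi} - \left((e^{j\varphi}\, \mathbf{i} + \mathbf{w}) \cdot \mathbf{i}\right)^* e^{j\varphi}\right)$$

$$-E_{\mathbf{w}}\left[\left.\frac{\partial^2 \ln p(\mathbf{x}|\varphi)}{\partial \varphi^2}\right|_{\mathbf{x} = e^{j\varphi} \mathbf{i} + \mathbf{w}}\right] = \frac{2}{\sigma_w^2} \|\mathbf{i}\|^2$$

$$J_{\varphi,\varphi}(\varphi) = -E\left[\frac{\partial^2 \ln p(\mathbf{x}|\varphi)}{\partial \varphi^2}\right] = \frac{2N}{\sigma_w^2}$$

$$\left[J^{-1}(\varphi)\right]_{\varphi,\varphi} = \frac{\sigma_w^2}{2N}$$

$$\mathrm{var}[\hat{\varphi}] \ge \frac{\sigma_w^2}{2N}$$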
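
The bound $\sigma_w^2 / (2N)$ can be compared with a simulated estimator. The sketch below assumes the estimator $\hat{\varphi} = \arg(\mathbf{x} \cdot \mathbf{i})$, which maximizes $\Re[(\mathbf{x} \cdot \mathbf{i}) e^{-j\varphi}]$ in the likelihood above. The values of $N$, $\sigma_w^2$, the true phase, the trial count, and the seed are arbitrary illustrative choices.

```python
import cmath
import math
import random

def ml_phase(x):
    """Phase maximizing Re[(x . i) exp(-j phi)], i.e. the angle of the sample sum."""
    return cmath.phase(sum(x))

def phase_mse(phi, n, sigma2, trials, seed=0):
    """Monte Carlo mean square phase error for x[k] = exp(j phi) + w[k]."""
    rng = random.Random(seed)
    s = math.sqrt(sigma2 / 2)     # CWGN: variance sigma2 / 2 per real component
    acc = 0.0
    for _ in range(trials):
        x = [cmath.exp(1j * phi) + complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
             for _ in range(n)]
        # Wrap the estimation error to (-pi, pi] before squaring.
        err = cmath.phase(cmath.exp(1j * (ml_phase(x) - phi)))
        acc += err ** 2
    return acc / trials

n, sigma2 = 16, 0.1
crlb = sigma2 / (2 * n)
mse = phase_mse(phi=0.5, n=n, sigma2=sigma2, trials=4000)
```

At this fairly high signal-to-noise ratio the simulated mean square error is expected to lie close to the bound; at low signal-to-noise ratio the two would diverge.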



Sufficient statistics



A sufficient statistic is a function $T(x)$ of the measurement that contains all the information about $\theta$ available in the original measurement $x$ and needed by the parameter estimator $\hat{\theta}$, i.e.

$$\hat{\theta}(x) = \hat{\theta}(T(x))$$

$p(x|T(x), \theta)$ must not depend on $\theta$

$$p(x|T(x), \theta) = p(x|T(x))$$

Theorem (Neyman-Fisher Factorization theorem)

$T(x)$ is a sufficient statistic $\Leftrightarrow \exists g, h : p(x|\theta) = g(T(x), \theta)\, h(x)$
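
As a worked illustration, the factorization can be applied to the likelihood of the phase estimation example from the Performance limits section. The split into $g$ and $h$ below is one possible choice, written out here as a sketch rather than taken from the handout.

```latex
% Likelihood of the phase example, with \|\mathbf{i}\|^2 = N:
p(\mathbf{x}|\varphi)
  = \underbrace{\exp\left\{\frac{2}{\sigma_w^2}
      \Re\left[T(\mathbf{x})\, e^{-j\varphi}\right]\right\}}_{g(T(\mathbf{x}),\, \varphi)}
    \;\underbrace{\frac{1}{\pi^N \sigma_w^{2N}}
      \exp\left\{-\frac{1}{\sigma_w^2}\left(\|\mathbf{x}\|^2 + N\right)\right\}}_{h(\mathbf{x})},
\qquad
T(\mathbf{x}) = \mathbf{x} \cdot \mathbf{i} = \sum_{k=0}^{N-1} x[k]
```

Only $g$ depends on $\varphi$, and it does so through the single complex number $T(\mathbf{x})$, so the sum of the samples is a sufficient statistic for the phase.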


Estimator equation and solver

Estimator equation

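The criterion usually has the form

$$\hat{\theta} = \arg\min_{\theta} \rho(\theta)$$

or

$$\hat{\theta} = \arg\max_{\theta} \rho(\theta)$$

Estimator equation

Objective function extremum search task (scalar case), where $\dot{\rho}$ denotes the first derivative of $\rho$

$$\mu(\hat{\theta}) = \dot{\rho}(\hat{\theta}) = \left.\frac{\partial \rho(\theta)}{\partial \theta}\right|_{\theta = \hat{\theta}} = 0$$

and (min, max)

$$\left.\frac{\partial^2 \rho(\theta)}{\partial \theta^2}\right|_{\theta = \hat{\theta}} > 0 \quad (\text{or} < 0)$$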



Direct solution (Feed-Forward (FF), single-shot)

Direct solver

Closed-form expression $\hat{\theta} = \hat{\theta}(x)$

theoretically straightforward

it can be found only in special cases

usually complicated expression



Iterative solution (Feed-Back (FB), recursive)

Iterative solver

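First derivative $\dot{\rho}(\theta')$ of the objective function

used as an error (update, correction) signal for feed-back solver

“Hopefully” $\theta' \overset{\text{asympt}}{\to} \hat{\theta}$

[Figure: objective function $\rho(\theta')$ versus $\theta'$; the loop error signal $\nabla_{\theta'} \rho(\theta')$ corrects the next iteration towards higher or lower values of $\theta'$, approaching $\hat{\theta}$.]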



Discrete time ($G[\cdot]$ is an operator, $\dot{\rho}$ is the first derivative of the objective function)

$$\theta'[k+1] = \theta'[k] + G\left[\dot{\rho}(\theta'[k])\right]$$

Continuous time ($G[\cdot]$ is an operator)

$$\frac{\partial \theta'(t)}{\partial t} = G\left[\dot{\rho}(\theta'(t))\right]$$

Properties

it does not have to converge

local/global extremum

initial guess problem

it can always be constructed

usually relatively simple implementation

performance: Theory of Feed-Back systems

performance is additionally influenced by convergence/tracking properties of the sequential solver
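
A minimal sketch of the discrete-time feed-back solver, applied to the phase estimation example from the Performance limits section. The objective is taken as $\rho(\varphi') = \Re[(\mathbf{x} \cdot \mathbf{i}) e^{-j\varphi'}]$, whose derivative $\Im[(\mathbf{x} \cdot \mathbf{i}) e^{-j\varphi'}]$ serves as the loop error signal. The operator $G$ is assumed to be a plain constant gain, normalized by the magnitude of the sample sum; the data, gain, and iteration count are illustrative choices.

```python
import cmath
import random

rng = random.Random(3)
phi_true, n, sigma2 = 0.8, 32, 0.05
s = (sigma2 / 2) ** 0.5
x = [cmath.exp(1j * phi_true) + complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
     for _ in range(n)]
z = sum(x)                                  # x . i, the sum of all samples

def error_signal(phi):
    """First derivative of rho(phi) = Re[z exp(-j phi)], equal to Im[z exp(-j phi)]."""
    return (z * cmath.exp(-1j * phi)).imag

gain = 0.5 / abs(z)                         # G[.] assumed to be a constant gain
phi = 0.0                                   # initial guess
for k in range(60):
    phi = phi + gain * error_signal(phi)    # theta'[k+1] = theta'[k] + G[...]

phi_direct = cmath.phase(z)                 # direct (feed-forward) solution
```

With this gain the error roughly halves per iteration once the loop is close to lock. A much larger gain, or an initial guess near the opposite phase where the error signal vanishes, would illustrate the convergence and initial guess issues listed above.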



Expectation-Maximization ML iterative solver

Application of the EM algorithm to the ML channel parameter estimation

unavailable observation . . . data d

estimator based on complete observation x,d

data aided (DA) estimator

EM iterative ML solver

Approximation (replacement) marginalizing the unavailable observation d

$$\ln p(x, d|\theta) \approx E_{d|x,\theta_k}\left[\ln p(x, d|\theta)\right]$$

Expectation + Maximization step

EM iterator (arbitrary encoding stage)

$$\theta_{k+1} = \arg\max_{\theta} \sum_{q:\, d \mapsto q} \ln p(x|q, \theta)\, p(q|x, \theta_k)$$

averaging uses the a posteriori PDF
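
The EM iterator can be sketched for a simple hypothetical case that is not worked out in the handout: phase estimation with unknown, equiprobable BPSK symbols, $x[k] = q[k] e^{j\varphi} + w[k]$, $q[k] \in \{-1, +1\}$, with CWGN of variance $\sigma_w^2$. Under these assumptions the a posteriori averaging reduces to soft symbols $\tanh\left(\frac{2}{\sigma_w^2} \Re[x[k] e^{-j\varphi_k}]\right)$, and the maximization step has the closed form $\varphi_{k+1} = \arg \sum_k x[k]\, \bar{q}[k]$.

```python
import cmath
import math
import random

def em_phase(x, sigma2, phi0, iterations=20):
    """EM phase estimation for unknown equiprobable BPSK symbols (illustrative)."""
    phi = phi0
    for _ in range(iterations):
        # E-step: a posteriori mean of each symbol given x and the current phase.
        soft = [math.tanh(2.0 / sigma2 * (xk * cmath.exp(-1j * phi)).real)
                for xk in x]
        # M-step: maximize sum_k Re[x[k] * soft[k] * exp(-j phi)] over phi.
        phi = cmath.phase(sum(xk * m for xk, m in zip(x, soft)))
    return phi

rng = random.Random(5)
phi_true, n, sigma2 = 0.4, 200, 0.2
s = math.sqrt(sigma2 / 2)
data = [rng.choice([-1.0, 1.0]) for _ in range(n)]       # unavailable observation d
x = [d * cmath.exp(1j * phi_true) + complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
     for d in data]

phi_em = em_phase(x, sigma2, phi0=0.0)
```

BPSK leaves a phase ambiguity of $\pi$, so the result depends on the initial guess; starting near the opposite phase would converge to $\varphi + \pi$ instead, which is one instance of the initial guess problem of iterative solvers.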



The End

