ESE 524 Detection and Estimation Theory · 2009-03-24

Joseph A. O’Sullivan
Samuel C. Sachs Professor
Electronic Systems and Signals Research Laboratory
Electrical and Systems Engineering
Washington University
211 Urbauer Hall
314-935-4173 (Lynda answers)

J. A. O'S. ESE 524, Lecture 17, 03/19/09

Outline: Expectation-Maximization (EM) Algorithm

Review of EM algorithm
Alternate derivation using the convex decomposition lemma
Example(s)

Maximum Likelihood Estimation

Maximum likelihood estimation often involves maximizing a complicated nonlinear function:

\[ \hat{\theta}_{ML} = \arg\max_{\theta} p(\mathbf{r} \mid \theta) = \arg\max_{\theta} \ln p(\mathbf{r} \mid \theta) \]

View as a function of $\theta$:

\[ \hat{\theta}_{ML} = \arg\max_{\theta} l(\theta), \qquad l(\theta) = \ln p(\mathbf{r} \mid \theta) \]

Use standard optimization algorithms or the EM algorithm.

The EM algorithm is matched to problems that can be modeled with hidden data.

[Block diagram: $\theta$ -> Source $p(\mathbf{s} \mid \theta)$ -> $\mathbf{s}$ -> $p(\mathbf{r} \mid \mathbf{s})$ -> $\mathbf{r}$]

References:
Dempster, Laird, Rubin, J. Royal Statistical Society, 1977.
M. I. Miller and D. L. Snyder, Proc. IEEE, 1987. DLS Tutorial.

EM Algorithm

Start with a reasonable guess as the initial estimate.
Compute the expected value of the complete data loglikelihood function given the incomplete (or observed) data and the current estimate.
Maximize this function over the parameters.
Iterate.

1. Initialization step: Select $\hat{\theta}^{(0)}$, let $k = 0$.
2. E-step: Compute the expected value

\[ Q(\theta \mid \hat{\theta}^{(k)}) = E\left[ \ln p(\mathbf{s} \mid \theta) \,\middle|\, \mathbf{r}, \hat{\theta}^{(k)} \right] = \int \ln p(\mathbf{s} \mid \theta) \, p(\mathbf{s} \mid \mathbf{r}, \hat{\theta}^{(k)}) \, d\mathbf{s} \]

3. M-step: Maximize this function

\[ \hat{\theta}^{(k+1)} = \arg\max_{\theta} Q(\theta \mid \hat{\theta}^{(k)}) \]

4. Iteration step: stop if converged; else $k = k + 1$, go to step 2.

[Block diagram: $\theta$ -> Source $p(\mathbf{s} \mid \theta)$ -> $\mathbf{s}$ -> $p(\mathbf{r} \mid \mathbf{s})$ -> $\mathbf{r}$]
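The four steps above can be sketched as a small generic loop. This is only an illustration under stated assumptions: the function names `em`, `e_step`, and `m_step` are hypothetical, and the toy model (hidden $s_m \sim N(\theta, 1)$ observed as $r_m = s_m + w_m$ with $w_m \sim N(0, 1)$) is not from the lecture. For that toy model the posterior mean is $E[s_m \mid r_m, \theta] = (\theta + r_m)/2$, and the M-step averages the posterior means.

```python
def em(theta0, e_step, m_step, tol=1e-10, max_iter=1000):
    """Generic EM loop: alternate E-step and M-step until the estimate stops moving."""
    theta = theta0
    for k in range(max_iter):
        stats = e_step(theta)        # E-step: expectations given the data and current estimate
        new_theta = m_step(stats)    # M-step: maximizer of Q(. | theta)
        if abs(new_theta - theta) < tol:
            return new_theta, k + 1
        theta = new_theta
    return theta, max_iter


# Toy model (an assumption for illustration, not from the lecture):
# s_m ~ N(theta, 1), r_m = s_m + w_m, w_m ~ N(0, 1).
r = [1.2, 2.9, 2.1, 1.8]


def e_step(theta):
    # Posterior mean of each hidden s_m: E[s_m | r_m, theta] = (theta + r_m) / 2
    return [(theta + rm) / 2.0 for rm in r]


def m_step(posterior_means):
    # Q(theta | theta_k) is maximized by the average posterior mean
    return sum(posterior_means) / len(posterior_means)


theta_hat, iterations = em(0.0, e_step, m_step)
sample_mean = sum(r) / len(r)   # the ML estimate for this toy model
```

In this toy case the error halves at every iteration, so `theta_hat` approaches the sample mean, which is the ML estimate because $r_m \sim N(\theta, 2)$.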

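Properties of the EM Algorithm

The EM algorithm monotonically increases the loglikelihood function at every iteration.
Equality holds if and only if the current estimate is a maximum and the posterior density remains unchanged.
Applicable to MAP problems by modifying the M-step.

\[ \ln p(\mathbf{r} \mid \hat{\theta}^{(k+1)}) - \ln p(\mathbf{r} \mid \hat{\theta}^{(k)}) = \left[ Q(\hat{\theta}^{(k+1)} \mid \hat{\theta}^{(k)}) - H(\hat{\theta}^{(k+1)} \mid \hat{\theta}^{(k)}) + C \right] - \left[ Q(\hat{\theta}^{(k)} \mid \hat{\theta}^{(k)}) - H(\hat{\theta}^{(k)} \mid \hat{\theta}^{(k)}) + C \right] \]

\[ = \left[ Q(\hat{\theta}^{(k+1)} \mid \hat{\theta}^{(k)}) - Q(\hat{\theta}^{(k)} \mid \hat{\theta}^{(k)}) \right] + \left[ H(\hat{\theta}^{(k)} \mid \hat{\theta}^{(k)}) - H(\hat{\theta}^{(k+1)} \mid \hat{\theta}^{(k)}) \right] \]

\[ Q(\hat{\theta}^{(k+1)} \mid \hat{\theta}^{(k)}) \ge Q(\hat{\theta}^{(k)} \mid \hat{\theta}^{(k)}) \quad \text{by M-step} \]

\[ H(\hat{\theta}^{(k)} \mid \hat{\theta}^{(k)}) - H(\hat{\theta}^{(k+1)} \mid \hat{\theta}^{(k)}) = \int p(\mathbf{s} \mid \mathbf{r}, \hat{\theta}^{(k)}) \ln \frac{p(\mathbf{s} \mid \mathbf{r}, \hat{\theta}^{(k)})}{p(\mathbf{s} \mid \mathbf{r}, \hat{\theta}^{(k+1)})} \, d\mathbf{s} \ge 0 \]

\[ \ln p(\mathbf{r} \mid \hat{\theta}^{(k+1)}) \ge \ln p(\mathbf{r} \mid \hat{\theta}^{(k)}) \]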

Maximum A Posteriori (MAP) Estimation

\[ \hat{\theta}_{MAP} = \arg\max_{\theta} \left[ \ln p(\mathbf{r} \mid \theta) + \ln p(\theta) \right] \]

New M-step:

\[ \hat{\theta}^{(k+1)} = \arg\max_{\theta} \left[ Q(\theta \mid \hat{\theta}^{(k)}) + \ln p(\theta) \right] \]

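Alternate Derivation of EM Algorithm Via Convex Optimization

ML estimation problem:

\[ \hat{\theta}_{ML} = \arg\max_{\theta} \ln p(\mathbf{r} \mid \theta) \]

Hidden variables model:

\[ p(\mathbf{r} \mid \theta) = \int p(\mathbf{r} \mid \mathbf{s}) \, p(\mathbf{s} \mid \theta) \, d\mathbf{s}, \qquad \ln p(\mathbf{r} \mid \theta) = \ln \int p(\mathbf{r} \mid \mathbf{s}) \, p(\mathbf{s} \mid \theta) \, d\mathbf{s} \]

Define the set of probability density functions

\[ \mathcal{P} = \left\{ \Phi(\mathbf{s}) : \Phi(\mathbf{s}) \ge 0, \ \int \Phi(\mathbf{s}) \, d\mathbf{s} = 1 \right\} \]

Fix $\theta = \hat{\theta}^{(k)}$; then

\[ \ln \int p(\mathbf{r} \mid \mathbf{s}) \, p(\mathbf{s} \mid \hat{\theta}^{(k)}) \, d\mathbf{s} = - \min_{\Phi \in \mathcal{P}} \int \Phi(\mathbf{s}) \ln \frac{\Phi(\mathbf{s})}{p(\mathbf{r} \mid \mathbf{s}) \, p(\mathbf{s} \mid \hat{\theta}^{(k)})} \, d\mathbf{s} \]

Double maximization for ML estimation:

\[ \max_{\theta} \ln p(\mathbf{r} \mid \theta) = \max_{\theta} \max_{\Phi \in \mathcal{P}} \int \Phi(\mathbf{s}) \ln \frac{p(\mathbf{r} \mid \mathbf{s}) \, p(\mathbf{s} \mid \theta)}{\Phi(\mathbf{s})} \, d\mathbf{s} \]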

EM Algorithm as an Alternating Maximization Algorithm

Alternately maximize over the posterior density and the parameter vector.
Based on a variational representation of the loglikelihood function from the convex decomposition lemma.
We lift the original problem to a higher dimensional problem where the optimization is “easier.”

EM Algorithm

1. Select initial guess $\hat{\theta}^{(0)}$. Set $k = 0$.
2. E-step: Maximize over $\Phi \in \mathcal{P}$ with $\hat{\theta}^{(k)}$ fixed.

\[ \hat{\Phi}^{(k)}(\mathbf{s}) = \frac{p(\mathbf{r} \mid \mathbf{s}) \, p(\mathbf{s} \mid \hat{\theta}^{(k)})}{\int p(\mathbf{r} \mid \mathbf{s}') \, p(\mathbf{s}' \mid \hat{\theta}^{(k)}) \, d\mathbf{s}'} \]

3. M-step: Maximize over $\theta$ with $\hat{\Phi}^{(k)}(\mathbf{s})$ fixed.

\[ \hat{\theta}^{(k+1)} = \arg\max_{\theta} \int \hat{\Phi}^{(k)}(\mathbf{s}) \ln p(\mathbf{s} \mid \theta) \, d\mathbf{s} = \arg\max_{\theta} Q(\theta \mid \hat{\theta}^{(k)}) \]

4. Check for convergence; else $k = k + 1$, go to step 2.

Double maximization for ML estimation:

\[ \max_{\theta} \ln p(\mathbf{r} \mid \theta) = \max_{\theta} \max_{\Phi \in \mathcal{P}} \int \Phi(\mathbf{s}) \ln \frac{p(\mathbf{r} \mid \mathbf{s}) \, p(\mathbf{s} \mid \theta)}{\Phi(\mathbf{s})} \, d\mathbf{s} \]

\[ \mathcal{P} = \left\{ \Phi(\mathbf{s}) : \Phi(\mathbf{s}) \ge 0, \ \int \Phi(\mathbf{s}) \, d\mathbf{s} = 1 \right\} \]

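Convex Decomposition Lemma

Lemma: Suppose $q_i < \infty$, $q_i \ge 0$, at least one $q_i > 0$. Then

\[ \ln \sum_i q_i = - \min_{\mathbf{p} \in \mathcal{P}} \sum_i p_i \ln \frac{p_i}{q_i}, \quad \text{where} \quad \mathcal{P} = \left\{ \mathbf{p} : p_i \ge 0, \ \sum_i p_i = 1 \right\} \]

Proof:

\[ L(\mathbf{p}, \nu) = \sum_i p_i \ln \frac{p_i}{q_i} - \nu \left( \sum_i p_i - 1 \right) \]

\[ \frac{\partial L}{\partial p_l} = \ln \frac{p_l}{q_l} + 1 - \nu = 0 \]

\[ p_l^* = q_l e^{\nu - 1} = \frac{q_l}{\sum_i q_i} \]

\[ \sum_i p_i^* \ln \frac{p_i^*}{q_i} = - \ln \sum_i q_i \]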
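A quick numerical check of the lemma can make the statement concrete. The sketch below is illustrative only: the vector `q` and the helper name `objective` are arbitrary choices, and all $q_i$ are taken strictly positive to keep the logarithms finite. It evaluates the objective at the claimed minimizer $p_i^* = q_i / \sum_i q_i$ and compares it with randomly drawn probability vectors.

```python
import math
import random


def objective(p, q):
    # sum_i p_i ln(p_i / q_i), using the convention 0 ln 0 = 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


q = [0.5, 2.0, 1.25, 0.25]            # arbitrary positive values (assumption)
total = sum(q)
p_star = [qi / total for qi in q]     # claimed minimizer from the proof

lhs = math.log(total)                 # ln sum_i q_i
rhs = -objective(p_star, q)           # -min_p sum_i p_i ln(p_i / q_i)

# Any other probability vector should give an objective at least as large.
rng = random.Random(0)
gaps = []
for _ in range(1000):
    w = [rng.random() for _ in q]
    p = [wi / sum(w) for wi in w]
    gaps.append(objective(p, q) - objective(p_star, q))
min_gap = min(gaps)
```

The gap between a trial `p` and `p_star` is the relative entropy between them, which is why it cannot be negative.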

Example of Convex Decomposition Lemma: Poisson Data Model

Change to a double maximization.

\[ \sum_{i,j} \left[ y(i,j) \ln \lambda(i,j) - \lambda(i,j) \right] = \sum_{i,j} \left[ y(i,j) \ln \sum_{k=-K/2}^{K/2} \sum_{l=-L/2}^{L/2} h(k,l) \, c(i-k, j-l) - \sum_{k=-K/2}^{K/2} \sum_{l=-L/2}^{L/2} h(k,l) \, c(i-k, j-l) \right] \]

\[ = \max_{\Phi} \sum_{i,j} \left[ y(i,j) \sum_{k,l} \Phi(k,l \mid i,j) \ln \frac{h(k,l) \, c(i-k, j-l)}{\Phi(k,l \mid i,j)} - \sum_{k,l} h(k,l) \, c(i-k, j-l) \right] \]

EM Algorithm

E-step is a weighted Poisson mean.
M-step sets next value to the mean.

Up to an additive term that does not depend on $\mathbf{c}$,

\[ E\left[ \ln P(\mathbf{s} \mid \mathbf{c}) \,\middle|\, \mathbf{y}, \mathbf{c}^{(m)} \right] = \sum_{i,j} \left( E\left[ s(i,j) \mid \mathbf{y}, \mathbf{c}^{(m)} \right] \ln c(i,j) - c(i,j) \right) \]

\[ c^{(m+1)}(k,l) = E\left[ s(k,l) \mid \mathbf{y}, \mathbf{c}^{(m)} \right] = \sum_{i,j} \frac{c^{(m)}(k,l) \, h(i-k, j-l)}{\sum_{k',l'} h(i-k', j-l') \, c^{(m)}(k',l')} \, y(i,j) \]

This algorithm is widely used in astronomical imaging (called the Lucy-Richardson algorithm; see also D. L. Snyder and T. Schulz), positron emission tomography (PET), and many other situations (EMML: expectation maximization maximum likelihood).
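The multiplicative update above can be sketched in one dimension. This is a hedged illustration, not the lecture's implementation: it assumes a 1-D signal, circular (wrap-around) indexing, and a point-spread function `h` of odd length that sums to one; the names `predicted_mean`, `rl_step`, and `loglik` and the data `y` are made up for the example.

```python
import math


def predicted_mean(c, h):
    # lam(i) = sum_d h(d) c(i - d), with d = t - half and circular indexing (assumption)
    n, half = len(c), len(h) // 2
    return [sum(h[t] * c[(i - (t - half)) % n] for t in range(len(h)))
            for i in range(n)]


def rl_step(c, y, h):
    # c_new(k) = c(k) * sum_i h(i - k) * y(i) / lam(i)
    n, half = len(c), len(h) // 2
    lam = predicted_mean(c, h)
    ratio = [y[i] / lam[i] if lam[i] > 0 else 0.0 for i in range(n)]
    return [c[k] * sum(h[t] * ratio[(k + (t - half)) % n] for t in range(len(h)))
            for k in range(n)]


def loglik(c, y, h):
    # Poisson loglikelihood sum_i [y(i) ln lam(i) - lam(i)], dropping the ln y! term
    lam = predicted_mean(c, h)
    return sum((yi * math.log(li) if yi > 0 else 0.0) - li
               for yi, li in zip(y, lam))


y = [0, 1, 4, 9, 4, 1, 0, 0]          # made-up counts
h = [0.25, 0.5, 0.25]                 # made-up blur kernel, sums to 1
c = [sum(y) / len(y)] * len(y)        # flat positive starting guess

history = [loglik(c, y, h)]
for _ in range(50):
    c = rl_step(c, y, h)
    history.append(loglik(c, y, h))
```

Because `h` sums to one, each update keeps the estimate nonnegative and makes its total equal to the total number of counts, and `history` should be non-decreasing, in line with the monotonicity property of EM.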

Example: Poisson Plus Gaussian

Signal model is a Poisson random variable plus additive (discrete-time) white Gaussian noise.
The mean of the Poisson represents the activity of interest.
This is a model for the data available at the readout of many charge-coupled devices. The charges are shifted from one well to another, then read out serially at one location using an amplifier. The amplifier noise is well modeled as white Gaussian. The charge is modeled as resulting from counting photons and is modeled as Poisson.

\[ y = n + w \]

\[ P(n = k) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k \ge 0, \qquad w \sim N(0, \sigma^2) \]

\[ p_y(Y) = \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} e^{-\lambda} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(Y-k)^2}{2\sigma^2}} \]

\[ \hat{\lambda}_{ML} = \arg\max_{\lambda} \ln p_y(Y) \]

Example: Poisson Plus Gaussian

Hidden data: Poisson random variable.
Complete data loglikelihood is Poisson.
Expected value of the complete data loglikelihood given the measured data and the current estimate depends only on the posterior mean of the data.
The next estimate of the parameter equals the posterior mean.

\[ p_y(Y) = \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} e^{-\lambda} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(Y-k)^2}{2\sigma^2}} \]

\[ \hat{\lambda}_{ML} = \arg\max_{\lambda} \ln p_y(Y) \]

\[ Q(\lambda \mid \hat{\lambda}^{(k)}) = E\left[ n \mid Y, \hat{\lambda}^{(k)} \right] \ln \lambda - \lambda \]

\[ \hat{\lambda}^{(k+1)} = \arg\max_{\lambda} Q(\lambda \mid \hat{\lambda}^{(k)}) \]

\[ \hat{\lambda}^{(k+1)} = E\left[ n \mid Y, \hat{\lambda}^{(k)} \right] \]

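Example: Poisson Plus Gaussian

Iterations involve a nonlinear function in this case.
Numerical, analytical, or lookup approximation may be required.

\[ p_y(Y) = \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} e^{-\lambda} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(Y-k)^2}{2\sigma^2}} \]

\[ \hat{\lambda}^{(k+1)} = E\left[ n \mid Y, \hat{\lambda}^{(k)} \right] \]

\[ E\left[ n \mid Y, \hat{\lambda}^{(k)} \right] = \frac{\sum_{m=0}^{\infty} m \, \frac{(\hat{\lambda}^{(k)})^m}{m!} e^{-\hat{\lambda}^{(k)}} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(Y-m)^2}{2\sigma^2}}}{\sum_{m=0}^{\infty} \frac{(\hat{\lambda}^{(k)})^m}{m!} e^{-\hat{\lambda}^{(k)}} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(Y-m)^2}{2\sigma^2}}} \]

\[ E\left[ n \mid Y, \hat{\lambda}^{(k)} \right] = \frac{\sum_{m=1}^{\infty} \frac{(\hat{\lambda}^{(k)})^m}{(m-1)!} e^{-\frac{(Y-m)^2}{2\sigma^2}}}{\sum_{m=0}^{\infty} \frac{(\hat{\lambda}^{(k)})^m}{m!} e^{-\frac{(Y-m)^2}{2\sigma^2}}} \]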
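One way to carry out the "numerical" option is to truncate the infinite sums and evaluate the weights in the log domain. The sketch below is an illustration under assumptions: the truncation point `m_max`, the values of `Y`, `sigma2`, and the starting guess, and the function names are all choices made here, not part of the lecture.

```python
import math


def log_terms(Y, lam, sigma2, m_max):
    # ln of (lam^m / m!) e^{-lam} N(Y; m, sigma2) for m = 0..m_max
    return [m * math.log(lam) - lam - math.lgamma(m + 1)
            - (Y - m) ** 2 / (2 * sigma2) - 0.5 * math.log(2 * math.pi * sigma2)
            for m in range(m_max + 1)]


def log_likelihood(Y, lam, sigma2, m_max=200):
    # ln p_y(Y) with the sum truncated at m_max (assumption), via log-sum-exp
    t = log_terms(Y, lam, sigma2, m_max)
    top = max(t)
    return top + math.log(sum(math.exp(v - top) for v in t))


def posterior_mean(Y, lam, sigma2, m_max=200):
    # E[n | Y, lam]: ratio of the two truncated sums, with a common scale removed
    t = log_terms(Y, lam, sigma2, m_max)
    top = max(t)
    w = [math.exp(v - top) for v in t]
    return sum(m * wm for m, wm in enumerate(w)) / sum(w)


def em_poisson_gauss(Y, sigma2, lam0=1.0, tol=1e-10, max_iter=10000):
    lam, trace = lam0, [log_likelihood(Y, lam0, sigma2)]
    for _ in range(max_iter):
        new_lam = posterior_mean(Y, lam, sigma2)   # E-step and M-step combined
        trace.append(log_likelihood(Y, new_lam, sigma2))
        if abs(new_lam - lam) < tol:
            return new_lam, trace
        lam = new_lam
    return lam, trace


Y, sigma2 = 5.3, 0.5                  # made-up measurement and noise variance
lam_hat, trace = em_poisson_gauss(Y, sigma2)
```

At convergence `lam_hat` is a fixed point of the posterior-mean map, and `trace` records the loglikelihood after each iteration, which should not decrease.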

Estimate Variance of Gaussian in White Gaussian Noise

Suppose that $M$ i.i.d. measurements of zero-mean Gaussian random variables $s_m$ are made in additive white Gaussian noise $w_m$ of known variance $N$.
Find the maximum likelihood estimate of the unknown variance $P$.
Two approaches: analytical, and the EM algorithm.

\[ r_m = s_m + w_m, \quad m = 1, 2, \ldots, M \]

\[ s_m \ \text{i.i.d.} \ N(0, P), \quad w_m \ \text{i.i.d.} \ N(0, N), \quad s_m \perp w_k \ \ \forall m, k \]

\[ r_m \ \text{i.i.d.} \ N(0, P + N) \]

\[ l(P) = \sum_{m=1}^{M} \left[ -\frac{1}{2} \ln(P + N) - \frac{1}{2} \frac{r_m^2}{P + N} \right] \]

First order necessary condition:

\[ \frac{\partial l}{\partial P} = -\frac{M}{2} \frac{1}{P + N} + \frac{1}{2} \frac{\sum_{m=1}^{M} r_m^2}{(P + N)^2} = 0 \]

\[ \hat{P}_{ML} = \max\left( \frac{1}{M} \sum_{m=1}^{M} r_m^2 - N, \ 0 \right) \]
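The closed-form estimate is short enough to check directly. In this sketch the function names `ml_signal_variance` and `score` and the sample values in `r` are hypothetical; `score` evaluates the derivative in the first order necessary condition so that it can be confirmed to vanish at the estimate when the estimate is positive.

```python
def ml_signal_variance(r, N):
    # P_ML = max( (1/M) sum_m r_m^2 - N, 0 )
    M = len(r)
    return max(sum(x * x for x in r) / M - N, 0.0)


def score(P, r, N):
    # dl/dP = -(M/2) / (P + N) + (1/2) sum_m r_m^2 / (P + N)^2
    M = len(r)
    energy = sum(x * x for x in r)
    return -0.5 * M / (P + N) + 0.5 * energy / (P + N) ** 2


r = [1.0, -2.0, 0.5, 3.0]     # made-up measurements
N = 1.0                       # known noise variance (made-up value)
P_hat = ml_signal_variance(r, N)

# When the average energy is below the noise variance the estimate clips to zero.
P_small = ml_signal_variance([0.1, -0.2], N)
```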

Gaussian Variance in AWGN

\[ r_m = s_m + w_m, \quad m = 1, 2, \ldots, M \]

\[ s_m \ \text{i.i.d.} \ N(0, P), \quad w_m \ \text{i.i.d.} \ N(0, N), \quad s_m \perp w_k \ \ \forall m, k \]

\[ r_m \ \text{i.i.d.} \ N(0, P + N) \]

Analytical: can find the solution directly.
EM algorithm: the complete data comprise the pairs of random variables $(s_m, w_m)$.

\[ l(P) = \sum_{m=1}^{M} \left[ -\frac{1}{2} \ln(P + N) - \frac{1}{2} \frac{r_m^2}{P + N} \right] \]

Complete data loglikelihood function:

\[ l_{cd}(P) = -\frac{M}{2} \ln P - \frac{1}{2P} \sum_{m=1}^{M} s_m^2 \]

\[ Q(P \mid \hat{P}^{(k)}) = E\left[ l_{cd}(P) \mid \mathbf{r}, \hat{P}^{(k)} \right] = -\frac{M}{2} \ln P - \frac{1}{2P} \sum_{m=1}^{M} E\left[ s_m^2 \mid \mathbf{r}, \hat{P}^{(k)} \right] \]

\[ Q(P \mid \hat{P}^{(k)}) = -\frac{M}{2} \ln P - \frac{1}{2P} \sum_{m=1}^{M} E\left[ s_m^2 \mid r_m, \hat{P}^{(k)} \right] \]

\[ \hat{P}^{(k+1)} = \frac{1}{M} \sum_{m=1}^{M} E\left[ s_m^2 \mid r_m, \hat{P}^{(k)} \right] \]

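Estimate Variance of Gaussian in White Gaussian Noise

\[ r_m = s_m + w_m, \quad m = 1, 2, \ldots, M \]

Complete data loglikelihood function:

\[ Q(P \mid \hat{P}^{(k)}) = -\frac{M}{2} \ln P - \frac{1}{2P} \sum_{m=1}^{M} E\left[ s_m^2 \mid r_m, \hat{P}^{(k)} \right] \]

\[ \hat{P}^{(k+1)} = \frac{1}{M} \sum_{m=1}^{M} E\left[ s_m^2 \mid r_m, \hat{P}^{(k)} \right] \]

\[ E\left[ s_m^2 \mid r_m, \hat{P}^{(k)} \right] = \left( E\left[ s_m \mid r_m, \hat{P}^{(k)} \right] \right)^2 + \mathrm{var}\left( s_m \mid r_m, \hat{P}^{(k)} \right) \]

\[ E\left[ s_m \mid r_m, \hat{P}^{(k)} \right] = \frac{\hat{P}^{(k)}}{\hat{P}^{(k)} + N} \, r_m, \qquad \mathrm{var}\left( s_m \mid r_m, \hat{P}^{(k)} \right) = \frac{\hat{P}^{(k)} N}{\hat{P}^{(k)} + N} \]

\[ \hat{P}^{(k+1)} = \left( \frac{\hat{P}^{(k)}}{\hat{P}^{(k)} + N} \right)^2 \frac{1}{M} \sum_{m=1}^{M} r_m^2 + \frac{\hat{P}^{(k)} N}{\hat{P}^{(k)} + N} \]

\[ \hat{P}^{(k+1)} = \hat{P}^{(k)} + \left( \frac{\hat{P}^{(k)}}{\hat{P}^{(k)} + N} \right)^2 \left[ \frac{1}{M} \sum_{m=1}^{M} r_m^2 - \hat{P}^{(k)} - N \right] \]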

Estimate Variance of Gaussian in White Gaussian Noise

Fixed point at maximum likelihood solution.
Region of convergence? All positive starting points.
Rate of convergence? Linear. See below, where the equality holds for positive estimate. If the estimate is 0, then the convergence is sublinear.

\[ \hat{P}^{(k+1)} = \hat{P}^{(k)} + \left( \frac{\hat{P}^{(k)}}{\hat{P}^{(k)} + N} \right)^2 \left[ \frac{1}{M} \sum_{m=1}^{M} r_m^2 - \hat{P}^{(k)} - N \right] \]

\[ P^* - \hat{P}^{(k+1)} = P^* - \hat{P}^{(k)} - \left( \frac{\hat{P}^{(k)}}{\hat{P}^{(k)} + N} \right)^2 \left[ \frac{1}{M} \sum_{m=1}^{M} r_m^2 - P^* - N + P^* - \hat{P}^{(k)} \right] \]

\[ = \left( P^* - \hat{P}^{(k)} \right) \left[ 1 - \left( \frac{\hat{P}^{(k)}}{\hat{P}^{(k)} + N} \right)^2 \right] \]

\[ \approx \left( P^* - \hat{P}^{(k)} \right) \left[ 1 - \left( \frac{P^*}{P^* + N} \right)^2 \right] \]

\[ = \left( P^* - \hat{P}^{(k)} \right) \left[ 1 - \left( \frac{SNR}{1 + SNR} \right)^2 \right], \qquad SNR = \frac{P^*}{N} \]
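The EM recursion and its linear convergence rate can be checked numerically. This is a sketch with made-up data; the name `em_variance`, the starting value, and the number of iterations are assumptions for illustration. It iterates the update for $\hat{P}^{(k)}$ and compares the ratio of successive errors with $1 - (P^*/(P^* + N))^2$.

```python
def em_variance(r, N, P0=1.0, iters=60):
    # P_{k+1} = P_k + (P_k / (P_k + N))^2 * ( (1/M) sum_m r_m^2 - P_k - N )
    mean_square = sum(x * x for x in r) / len(r)
    P, history = P0, [P0]
    for _ in range(iters):
        gain = P / (P + N)
        P = P + gain * gain * (mean_square - P - N)
        history.append(P)
    return history


r = [1.0, -2.0, 0.5, 3.0]     # made-up measurements
N = 1.0                       # known noise variance (made-up value)

history = em_variance(r, N)
P_star = max(sum(x * x for x in r) / len(r) - N, 0.0)    # closed-form ML estimate

# Ratio of successive errors, compared with the predicted linear rate
rates = [(P_star - history[k + 1]) / (P_star - history[k]) for k in range(10, 20)]
predicted_rate = 1.0 - (P_star / (P_star + N)) ** 2
```

With these numbers the predicted rate is about 0.48, so each iteration roughly halves the error; a higher signal-to-noise ratio gives a smaller rate and faster convergence.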

Estimation of a Source Distribution from Sensor Array Data

Suppose that an array of sensors is distributed over some area on a two-dimensional plane. The location of each array element is denoted $(x_k, y_k)$. A complex-valued signal is incident upon the array from angle $(\theta, \phi)$ relative to the x-y axes ($\theta$ is pitch, $\phi$ is yaw). For radar, sonar, or radio communications, the complex values can represent the in-phase and quadrature components of the signal. A narrowband approximation is usually made. The signal is assumed to come from the far field. The sensors sample the incoming waves at their respective locations. There is assumed to be additive white Gaussian noise at the sensors.

Far field assumption:
The curvature of the wavefront relative to the extent of the array is negligible. Thus the phase shift relative to the center of the array at the center frequency is determined only by the direction cosines.

Narrowband assumption:
The bandwidth of the signal relative to the center frequency is small (often taken at less than 10%).
The sampling rate of the sensors satisfies the Nyquist criterion (at least twice the bandwidth). Sampling is often done at an intermediate frequency or at baseband.
The change in the signal across the sensor array is negligible. That is, relative to the maximum delay across the array, the signal does not change substantially.

Estimation of a Source Distribution from Sensor Array Data

The signal can come from a source or from reflections of a transmitted wave. For reflections, we assume diffuse and incoherent scatterers, so the samples of the reflectivity are independent and identically distributed random variables. If they come from a collection of smaller scatterers, then a complex Gaussian model is appropriate. For a source, we assume a distributed incoherent source, whose samples are well modeled by samples of a complex Gaussian distribution.

For complex Gaussian distributions, the real and imaginary parts are independent Gaussian random variables with zero mean and equal variance. This is also referred to as “Goodman” class.

The AWGN is complex Gaussian, independent of the signal.

Estimation of a Source Distribution from Sensor Array Data

Under the assumptions, a signal vector received by the array equals a linear combination of direction vectors times complex Gaussian random variables. The data vector that is available equals this signal vector plus a noise vector. If all sensors are identical, and the noise is independent from sensor to sensor, the covariance matrix for the noise is a constant times an identity matrix.

Direction vector is determined by the time delay:

\[ a_x(\theta, \phi) = \cos\theta \cos\phi, \qquad a_y(\theta, \phi) = \sin\phi \]

\[ d_k(\theta, \phi) = x_k \, a_x(\theta, \phi) + y_k \, a_y(\theta, \phi) \]

\[ \tau_k(\theta, \phi) = \frac{d_k(\theta, \phi)}{c} \]

Phase shift is determined by the time delay and the center frequency:

\[ e^{-j 2\pi f_0 \tau_k(\theta, \phi)} = \exp\left( -j 2\pi \frac{d_k(\theta, \phi)}{\lambda_0} \right) \]

Signal equals a linear combination:

\[ s_{kn} = \sum_{m=1}^{M} c_{mn} \exp\left( -j 2\pi \frac{d_k(\theta_m, \phi_m)}{\lambda_0} \right) \]

Data equals signal plus noise:

\[ r_{kn} = s_{kn} + w_{kn}, \quad k = 1, 2, \ldots, K, \quad n = 1, 2, \ldots, N \]

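Estimation of a Source Distribution from Sensor Array Data

Signal equals a linear combination:

\[ s_{kn} = \sum_{m=1}^{M} c_{mn} \exp\left( -j 2\pi \frac{d_k(\theta_m, \phi_m)}{\lambda_0} \right) \]

Data equals signal plus noise:

\[ r_{kn} = s_{kn} + w_{kn}, \quad k = 1, 2, \ldots, K, \quad n = 1, 2, \ldots, N \]

\[ \mathbf{r}_n = \mathbf{s}_n + \mathbf{w}_n, \qquad \mathbf{w}_n \sim CN(0, N_0 \mathbf{I}), \qquad \mathbf{s}_n = \mathbf{A} \mathbf{c}_n \]

\[ \mathbf{A} = \begin{bmatrix}
\exp\left( -j 2\pi \frac{d_1(\theta_1, \phi_1)}{\lambda_0} \right) & \exp\left( -j 2\pi \frac{d_1(\theta_2, \phi_2)}{\lambda_0} \right) & \cdots & \exp\left( -j 2\pi \frac{d_1(\theta_M, \phi_M)}{\lambda_0} \right) \\
\exp\left( -j 2\pi \frac{d_2(\theta_1, \phi_1)}{\lambda_0} \right) & \exp\left( -j 2\pi \frac{d_2(\theta_2, \phi_2)}{\lambda_0} \right) & \cdots & \exp\left( -j 2\pi \frac{d_2(\theta_M, \phi_M)}{\lambda_0} \right) \\
\vdots & \vdots & & \vdots \\
\exp\left( -j 2\pi \frac{d_K(\theta_1, \phi_1)}{\lambda_0} \right) & \exp\left( -j 2\pi \frac{d_K(\theta_2, \phi_2)}{\lambda_0} \right) & \cdots & \exp\left( -j 2\pi \frac{d_K(\theta_M, \phi_M)}{\lambda_0} \right)
\end{bmatrix} \]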

Estimation of a Source Distribution from Sensor Array Data

Data equals signal plus noise:

\[ \mathbf{r}_n = \mathbf{s}_n + \mathbf{w}_n, \qquad \mathbf{w}_n \sim CN(0, N_0 \mathbf{I}), \ \text{i.i.d.}, \quad n = 1, 2, \ldots, N \]

\[ \mathbf{s}_n = \mathbf{A} \mathbf{c}_n, \qquad \mathbf{c}_n \sim CN(0, \mathbf{\Sigma}), \ \text{i.i.d.}, \quad n = 1, 2, \ldots, N \]

\[ \mathbf{r}_n \sim CN(0, \mathbf{A} \mathbf{\Sigma} \mathbf{A}^{\dagger} + N_0 \mathbf{I}), \ \text{i.i.d.}, \quad n = 1, 2, \ldots, N \]

$\mathbf{A}^{\dagger}$ is the complex conjugate transpose.

More on complex Gaussian: Let $x$ and $y$ be independent Gaussian random variables with zero mean and variances equal to $\sigma^2$. Then the joint probability density function is

\[ \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{x^2}{2\sigma^2}} \cdot \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{y^2}{2\sigma^2}} = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} = \frac{1}{\pi N_0} e^{-\frac{x^2 + y^2}{N_0}}, \qquad N_0 = 2\sigma^2 \]

That is, the joint pdf is parameterized by the total variance $N_0$.
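The data covariance $\mathbf{A} \mathbf{\Sigma} \mathbf{A}^{\dagger} + N_0 \mathbf{I}$ can be assembled directly from the sensor positions and source directions. The sketch below is illustrative only: it assumes a diagonal $\mathbf{\Sigma}$ (uncorrelated sources with powers `powers`), uses the direction-cosine expressions as written in these notes, and the sensor layout, angles, wavelength, and function names are all made up for the example.

```python
import cmath
import math


def path_difference(x, y, theta, phi):
    # d_k = x_k a_x + y_k a_y, with a_x = cos(theta) cos(phi), a_y = sin(phi) as in the notes
    return x * math.cos(theta) * math.cos(phi) + y * math.sin(phi)


def steering_matrix(sensors, angles, wavelength):
    # A[k][m] = exp(-j 2 pi d_k(theta_m, phi_m) / lambda_0)
    return [[cmath.exp(-2j * math.pi * path_difference(x, y, th, ph) / wavelength)
             for (th, ph) in angles]
            for (x, y) in sensors]


def data_covariance(A, powers, N0):
    # R = A diag(powers) A^dagger + N0 I  (diagonal Sigma is an assumption)
    K, M = len(A), len(powers)
    return [[sum(A[k][m] * powers[m] * A[l][m].conjugate() for m in range(M))
             + (N0 if k == l else 0.0)
             for l in range(K)]
            for k in range(K)]


sensors = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]   # made-up layout, in wavelengths
angles = [(0.2, 0.3), (-0.4, 1.0)]                          # made-up (theta, phi) in radians
powers = [2.0, 1.0]                                         # made-up source powers
N0 = 0.1                                                    # made-up noise level

A = steering_matrix(sensors, angles, wavelength=1.0)
R = data_covariance(A, powers, N0)
```

Because every entry of `A` has unit magnitude, each diagonal entry of `R` equals the total source power plus `N0`, and `R` is Hermitian by construction.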
