ESE 524: Detection and Estimation Theory

Joseph A. O'Sullivan
Samuel C. Sachs Professor
Electronic Systems and Signals Research Laboratory
Electrical and Systems Engineering
Washington University
211 Urbauer Hall
314-935-4173 (Lynda answers)
[email protected]

J. A. O'S. ESE 524, Lecture 17, 03/19/09

Outline: Expectation-Maximization (EM) Algorithm

Review of EM algorithm
Alternate Derivation Using the Convex Decomposition Lemma
Example(s)

Maximum Likelihood Estimation

Maximum likelihood estimation often involves maximizing a complicated nonlinear function.

$$\hat{\theta}_{ML} = \arg\max_{\theta} p(\mathbf{r}|\theta) = \arg\max_{\theta} \ln p(\mathbf{r}|\theta)$$

View as a function of $\theta$:

$$\hat{\theta}_{ML} = \arg\max_{\theta} l(\theta), \quad l(\theta) = \ln p(\mathbf{r}|\theta)$$

Use standard optimization algorithms or the EM algorithm.
The EM algorithm is matched to problems that can be modeled with hidden data.

Model: $\theta$ → Source $p(\mathbf{s}|\theta)$ → $\mathbf{s}$ → $p(\mathbf{r}|\mathbf{s})$ → $\mathbf{r}$

References:
Dempster, Laird, Rubin, J. Royal Statistical Society, 1977.
M. I. Miller and D. L. Snyder, Proc. IEEE, 1987.
DLS Tutorial

EM Algorithm

Start with a reasonable guess as the initial estimate.
Compute the expected value of the complete data loglikelihood function given the incomplete (or observed) data and the current estimate.
Maximize this function over the parameters.
Iterate.

Model: $\theta$ → Source $p(\mathbf{s}|\theta)$ → $\mathbf{s}$ → $p(\mathbf{r}|\mathbf{s})$ → $\mathbf{r}$

EM Algorithm
1. Initialization step: Select $\hat{\theta}^{(0)}$, let $k = 0$.
2. E-step: Compute the expected value

$$Q(\theta|\hat{\theta}^{(k)}) = E\left[\ln p(\mathbf{s}|\theta) \,\middle|\, \mathbf{r}, \hat{\theta}^{(k)}\right] = \int \ln p(\mathbf{s}|\theta)\, p(\mathbf{s}|\mathbf{r}, \hat{\theta}^{(k)})\, d\mathbf{s}$$

3. M-step: Maximize this function

$$\hat{\theta}^{(k+1)} = \arg\max_{\theta} Q(\theta|\hat{\theta}^{(k)})$$

4. Iteration step: stop if converged; else $k = k + 1$, go to step 2.
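
The four steps above can be sketched as a generic fixed-point loop. This is a minimal sketch, not part of the lecture: the names `run_em` and `em_update`, the scalar parameter, and the stopping rule on successive estimates are all assumptions made for illustration. The caller supplies `em_update`, which performs one E-step followed by one M-step for the model at hand.

```python
from typing import Callable, Tuple

def run_em(theta0: float,
           em_update: Callable[[float], float],
           tol: float = 1e-10,
           max_iter: int = 1000) -> Tuple[float, int]:
    """Iterate theta <- em_update(theta) until successive estimates agree within tol.

    em_update is a hypothetical user-supplied function that maps the current
    estimate theta^(k) to theta^(k+1) = argmax_theta Q(theta | theta^(k)).
    Returns the final estimate and the number of updates performed.
    """
    theta = theta0                      # step 1: initialization, k = 0
    for k in range(max_iter):
        theta_next = em_update(theta)   # steps 2 and 3: E-step then M-step
        if abs(theta_next - theta) <= tol:
            return theta_next, k + 1    # step 4: converged
        theta = theta_next              # step 4: k = k + 1, go to step 2
    return theta, max_iter
```

Stopping on the change in the estimate is only one possible convergence test; the change in the loglikelihood is another common choice.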

Properties of the EM Algorithm

The EM algorithm monotonically increases the loglikelihood function at every iteration.
Equality holds if and only if the current estimate is a maximum and the posterior density remains unchanged.
Applicable to MAP problems by modifying the M-step.

$$\ln p(\mathbf{r}|\hat{\theta}^{(k+1)}) - \ln p(\mathbf{r}|\hat{\theta}^{(k)}) = \left[Q(\hat{\theta}^{(k+1)}|\hat{\theta}^{(k)}) - H(\hat{\theta}^{(k+1)}|\hat{\theta}^{(k)}) + C\right] - \left[Q(\hat{\theta}^{(k)}|\hat{\theta}^{(k)}) - H(\hat{\theta}^{(k)}|\hat{\theta}^{(k)}) + C\right]$$

$$= Q(\hat{\theta}^{(k+1)}|\hat{\theta}^{(k)}) - Q(\hat{\theta}^{(k)}|\hat{\theta}^{(k)}) + H(\hat{\theta}^{(k)}|\hat{\theta}^{(k)}) - H(\hat{\theta}^{(k+1)}|\hat{\theta}^{(k)})$$

By the M-step:

$$Q(\hat{\theta}^{(k+1)}|\hat{\theta}^{(k)}) \geq Q(\hat{\theta}^{(k)}|\hat{\theta}^{(k)})$$

The remaining difference is a relative entropy:

$$H(\hat{\theta}^{(k)}|\hat{\theta}^{(k)}) - H(\hat{\theta}^{(k+1)}|\hat{\theta}^{(k)}) = \int p(\mathbf{s}|\mathbf{r}, \hat{\theta}^{(k)}) \ln \frac{p(\mathbf{s}|\mathbf{r}, \hat{\theta}^{(k)})}{p(\mathbf{s}|\mathbf{r}, \hat{\theta}^{(k+1)})}\, d\mathbf{s} \geq 0$$

Therefore

$$\ln p(\mathbf{r}|\hat{\theta}^{(k+1)}) \geq \ln p(\mathbf{r}|\hat{\theta}^{(k)})$$

Maximum A Posteriori (MAP) Estimation

$$\hat{\theta}_{MAP} = \arg\max_{\theta} \left[\ln p(\mathbf{r}|\theta) + \ln p(\theta)\right]$$

New M-step:

$$\hat{\theta}^{(k+1)} = \arg\max_{\theta} \left[Q(\theta|\hat{\theta}^{(k)}) + \ln p(\theta)\right]$$
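
The terms $H$ and $C$ in the monotonicity argument are not defined on the slide. One reading that is consistent with the signs used there, offered as an assumption rather than as the lecture's own definition, starts from Bayes' rule for the hidden-data model:

$$p(\mathbf{s}|\mathbf{r},\theta) = \frac{p(\mathbf{r}|\mathbf{s})\, p(\mathbf{s}|\theta)}{p(\mathbf{r}|\theta)} \quad\Longrightarrow\quad \ln p(\mathbf{r}|\theta) = \ln p(\mathbf{s}|\theta) + \ln p(\mathbf{r}|\mathbf{s}) - \ln p(\mathbf{s}|\mathbf{r},\theta)$$

Taking the expectation of both sides with respect to $p(\mathbf{s}|\mathbf{r},\hat{\theta}^{(k)})$ leaves the left side unchanged and gives

$$\ln p(\mathbf{r}|\theta) = Q(\theta|\hat{\theta}^{(k)}) - H(\theta|\hat{\theta}^{(k)}) + C$$

$$H(\theta|\hat{\theta}^{(k)}) = \int p(\mathbf{s}|\mathbf{r},\hat{\theta}^{(k)}) \ln p(\mathbf{s}|\mathbf{r},\theta)\, d\mathbf{s}, \qquad C = \int p(\mathbf{s}|\mathbf{r},\hat{\theta}^{(k)}) \ln p(\mathbf{r}|\mathbf{s})\, d\mathbf{s}$$

Under this reading $C$ does not depend on $\theta$, which is why it cancels in the difference of loglikelihoods.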
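
Alternate Derivation of EM Algorithm Via Convex Optimization

ML Estimation Problem

$$\hat{\theta}_{ML} = \arg\max_{\theta} \ln p(\mathbf{r}|\theta)$$

Hidden variables model

$$p(\mathbf{r}|\theta) = \int p(\mathbf{r}|\mathbf{s})\, p(\mathbf{s}|\theta)\, d\mathbf{s}$$

$$\ln p(\mathbf{r}|\theta) = \ln \int p(\mathbf{r}|\mathbf{s})\, p(\mathbf{s}|\theta)\, d\mathbf{s}$$

Define the set of probability density functions

$$\mathcal{P} = \left\{\Phi(\mathbf{s}) : \Phi(\mathbf{s}) \geq 0, \int \Phi(\mathbf{s})\, d\mathbf{s} = 1\right\}$$

Fix $\theta = \hat{\theta}^{(k)}$, then

$$\ln \int p(\mathbf{r}|\mathbf{s})\, p(\mathbf{s}|\hat{\theta}^{(k)})\, d\mathbf{s} = -\min_{\Phi \in \mathcal{P}} \int \Phi(\mathbf{s}) \ln \frac{\Phi(\mathbf{s})}{p(\mathbf{r}|\mathbf{s})\, p(\mathbf{s}|\hat{\theta}^{(k)})}\, d\mathbf{s}$$

Double Maximization for ML Estimation

$$\max_{\theta} \ln p(\mathbf{r}|\theta) = \max_{\theta} \max_{\Phi \in \mathcal{P}} \int \Phi(\mathbf{s}) \ln \frac{p(\mathbf{r}|\mathbf{s})\, p(\mathbf{s}|\theta)}{\Phi(\mathbf{s})}\, d\mathbf{s}$$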

EM Algorithm as an Alternating Maximization Algorithm

Alternately maximize over the posterior density and the parameter vector.
Based on a variational representation of the loglikelihood function from the convex decomposition lemma.
We lift the original problem to a higher dimensional problem where the optimization is "easier."

EM Algorithm
1. Select initial guess $\hat{\theta}^{(0)}$. Set $k = 0$.
2. E-step: Maximize over $\Phi \in \mathcal{P}$ with $\hat{\theta}^{(k)}$ fixed.

$$\hat{\Phi}^{(k)}(\mathbf{s}) = \frac{p(\mathbf{r}|\mathbf{s})\, p(\mathbf{s}|\hat{\theta}^{(k)})}{\int p(\mathbf{r}|\mathbf{s}')\, p(\mathbf{s}'|\hat{\theta}^{(k)})\, d\mathbf{s}'}$$

3. M-step: Maximize over $\theta$ with $\hat{\Phi}^{(k)}(\mathbf{s})$ fixed.

$$\hat{\theta}^{(k+1)} = \arg\max_{\theta} \int \hat{\Phi}^{(k)}(\mathbf{s}) \ln p(\mathbf{s}|\theta)\, d\mathbf{s} = \arg\max_{\theta} Q(\theta|\hat{\theta}^{(k)})$$

4. Check for convergence; else $k = k + 1$, go to step 2.

Double Maximization for ML Estimation

$$\max_{\theta} \ln p(\mathbf{r}|\theta) = \max_{\theta} \max_{\Phi \in \mathcal{P}} \int \Phi(\mathbf{s}) \ln \frac{p(\mathbf{r}|\mathbf{s})\, p(\mathbf{s}|\theta)}{\Phi(\mathbf{s})}\, d\mathbf{s}$$

$$\mathcal{P} = \left\{\Phi(\mathbf{s}) : \Phi(\mathbf{s}) \geq 0, \int \Phi(\mathbf{s})\, d\mathbf{s} = 1\right\}$$
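
The alternating maximization can be tried on a small discrete problem. The sketch below is an illustration under stated assumptions, not an example from the lecture: the hidden variable takes values 0 or 1 with P(s = 1) = theta, the observation given s is a unit-variance Gaussian with hypothetical known means `MEANS`, and the data values are made up. Each observation has its own hidden variable, so Phi factors across observations.

```python
import math

MEANS = (0.0, 3.0)  # hypothetical known means of p(r|s) for s = 0 and s = 1

def lik(r, s):
    """p(r|s): unit-variance Gaussian centered at MEANS[s]."""
    return math.exp(-0.5 * (r - MEANS[s]) ** 2) / math.sqrt(2.0 * math.pi)

def prior(s, theta):
    """p(s|theta): Bernoulli with P(s = 1) = theta."""
    return theta if s == 1 else 1.0 - theta

def loglik(data, theta):
    """ln p(r|theta) = sum_i ln sum_s p(r_i|s) p(s|theta)."""
    return sum(math.log(sum(lik(r, s) * prior(s, theta) for s in (0, 1)))
               for r in data)

def e_step(data, theta):
    """Maximize over Phi with theta fixed: Phi_i(s) = p(s | r_i, theta)."""
    phi = []
    for r in data:
        joint = [lik(r, s) * prior(s, theta) for s in (0, 1)]
        total = sum(joint)
        phi.append([j / total for j in joint])
    return phi

def m_step(phi):
    """Maximize over theta with Phi fixed: theta = average of Phi_i(1)."""
    return sum(p[1] for p in phi) / len(phi)

def lower_bound(data, phi, theta):
    """F(Phi, theta) = sum_i sum_s Phi_i(s) ln[p(r_i|s) p(s|theta) / Phi_i(s)]."""
    total = 0.0
    for r, p in zip(data, phi):
        for s in (0, 1):
            if p[s] > 0.0:
                total += p[s] * math.log(lik(r, s) * prior(s, theta) / p[s])
    return total

data = [-0.4, 0.1, 0.7, 2.6, 3.2, 3.5, 0.3]   # made-up observations
theta = 0.5
history = [loglik(data, theta)]
for _ in range(20):
    phi = e_step(data, theta)      # Phi-update (E-step)
    theta = m_step(phi)            # theta-update (M-step)
    history.append(loglik(data, theta))
```

With Phi set to the posterior, `lower_bound` equals `loglik` at the same theta, which is the variational representation in action; the values in `history` should never decrease.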
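
Convex Decomposition Lemma

Lemma: Suppose $q_i < \infty$, $q_i \geq 0$, at least one $q_i > 0$. Then

$$\ln \sum_i q_i = -\min_{\mathbf{p} \in \mathcal{P}} \sum_i p_i \ln \frac{p_i}{q_i}, \quad \text{where } \mathcal{P} = \left\{\mathbf{p} : p_i \geq 0, \sum_i p_i = 1\right\}$$

Proof:

$$L(\mathbf{p}) = \sum_i p_i \ln \frac{p_i}{q_i} - \nu \left(\sum_i p_i - 1\right)$$

$$\frac{\partial L(\mathbf{p})}{\partial p_l} = \ln \frac{p_l}{q_l} + 1 - \nu = 0$$

$$p_l^* = q_l e^{\nu - 1} = \frac{q_l}{\sum_i q_i}$$

$$\sum_i p_i^* \ln \frac{p_i^*}{q_i} = -\ln \sum_i q_i$$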
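
The lemma is easy to check numerically. The following sketch uses made-up positive values for q and compares the objective at the minimizer from the proof against randomly drawn points of the simplex; the helper names are hypothetical.

```python
import math
import random

def kl_objective(p, q):
    """sum_i p_i ln(p_i / q_i), using the convention 0 ln 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

q = [0.5, 2.0, 0.25, 1.25]            # made-up values: finite, positive
total = sum(q)
p_star = [qi / total for qi in q]     # minimizer p_l* = q_l / sum_i q_i

rng = random.Random(0)

def random_simplex_point(n):
    """Draw a random probability vector of length n."""
    w = [rng.random() + 1e-12 for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]

# Smallest objective value found among 1000 random probability vectors.
best_random = min(kl_objective(random_simplex_point(len(q)), q)
                  for _ in range(1000))
minimum = kl_objective(p_star, q)     # should equal -ln(sum_i q_i)
```

No random draw should beat `minimum`, because the objective differs from $-\ln \sum_i q_i$ by the relative entropy between $\mathbf{p}$ and $\mathbf{p}^*$, which is nonnegative.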

Example of Convex Decomposition Lemma: Poisson Data Model

Change to a double maximization.

$$\sum_{i,j} \left[y(i,j) \ln \lambda(i,j) - \lambda(i,j)\right] = \sum_{i,j} \left[y(i,j) \ln \sum_{k=-K/2}^{K/2} \sum_{l=-L/2}^{L/2} h(k,l)\, c(i-k, j-l) - \sum_{k=-K/2}^{K/2} \sum_{l=-L/2}^{L/2} h(k,l)\, c(i-k, j-l)\right]$$

$$= \max_{\Phi} \sum_{i,j} \left[y(i,j) \sum_{k,l} \Phi(k,l|i,j) \ln \frac{h(k,l)\, c(i-k, j-l)}{\Phi(k,l|i,j)} - \sum_{k,l} h(k,l)\, c(i-k, j-l)\right]$$

Here, for each $(i,j)$, $\Phi(k,l|i,j) \geq 0$ and $\sum_{k,l} \Phi(k,l|i,j) = 1$.

EM Algorithm

E-step is a weighted Poisson mean.
M-step sets next value to the mean.

$$E\left[\ln P(\mathbf{s}|\mathbf{c}) \,\middle|\, \mathbf{y}, \mathbf{c}^{(m)}\right] = \sum_{i,j} \left(E\left[s(i,j) \,\middle|\, \mathbf{y}, \mathbf{c}^{(m)}\right] \ln c(i,j) - c(i,j)\right)$$

$$c^{(m+1)}(k,l) = E\left[s(k,l) \,\middle|\, \mathbf{y}, \mathbf{c}^{(m)}\right] = c^{(m)}(k,l) \sum_{i,j} \frac{h(i-k, j-l)\, y(i,j)}{\sum_{k',l'} h(i-k', j-l')\, c^{(m)}(k',l')}$$

This algorithm is widely used in astronomical imaging (where it is called the Lucy-Richardson algorithm; see also D. L. Snyder and T. Schulz), in positron emission tomography (PET), and in many other situations (EMML: expectation-maximization maximum likelihood).
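
A one-dimensional version of the multiplicative update can be sketched in a few lines. This is an illustration under assumptions that are not in the lecture: the convolution is circular, the kernel `h` is nonnegative and sums to 1 (so the update needs no extra normalization), and the kernel and photon counts are made up.

```python
import math

def blur(h, c):
    """Circular convolution: lam(i) = sum_k h((i - k) mod n) c(k)."""
    n = len(c)
    return [sum(h[(i - k) % n] * c[k] for k in range(n)) for i in range(n)]

def richardson_lucy_step(h, y, c):
    """One EM update: c(k) <- c(k) * sum_i h(i - k) y(i) / lam(i)."""
    n = len(c)
    lam = blur(h, c)
    ratio = [y[i] / lam[i] for i in range(n)]
    return [c[k] * sum(h[(i - k) % n] * ratio[i] for i in range(n))
            for k in range(n)]

def poisson_loglik(h, y, c):
    """sum_i [y(i) ln lam(i) - lam(i)], dropping the ln y(i)! term."""
    return sum(yi * math.log(li) - li for yi, li in zip(y, blur(h, c)))

h = [0.6, 0.2, 0.0, 0.0, 0.0, 0.2]   # made-up symmetric kernel, sums to 1
y = [2, 9, 3, 0, 1, 4]               # made-up photon counts
c = [sum(y) / len(y)] * len(y)       # flat, strictly positive starting image
logliks = [poisson_loglik(h, y, c)]
for _ in range(50):
    c = richardson_lucy_step(h, y, c)
    logliks.append(poisson_loglik(h, y, c))
```

Because the update is multiplicative, a strictly positive starting image stays nonnegative, and with a unit-sum kernel the total of the estimate matches the total count after every step.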

Example: Poisson Plus Gaussian

Signal model is a Poisson random variable plus additive (discrete-time) white Gaussian noise.
The mean of the Poisson represents the activity of interest.
This is a model for the data available at the readout of many charge-coupled devices. The charges are shifted from one well to another, then read out serially at one location using an amplifier. The amplifier noise is well modeled as white Gaussian. The charge is modeled as resulting from counting photons and is modeled as Poisson.

$$y = n + w$$

$$P(n = k) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k \geq 0$$

$$w \sim N(0, \sigma^2)$$

$$p_y(Y) = \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} e^{-\lambda} \times \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(Y-k)^2}{2\sigma^2}}$$

$$\hat{\lambda}_{ML} = \arg\max_{\lambda} \ln p_y(Y)$$

Example: Poisson Plus Gaussian

Hidden data: Poisson random variable.
Complete data loglikelihood is Poisson.
Expected value of the complete data loglikelihood given the measured data and the current estimate depends only on the posterior mean of the data.
The next estimate of the parameter equals the posterior mean.

$$p_y(Y) = \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} e^{-\lambda} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(Y-k)^2}{2\sigma^2}}$$

$$\hat{\lambda}_{ML} = \arg\max_{\lambda} \ln p_y(Y)$$

$$Q(\lambda|\hat{\lambda}^{(k)}) = E\left[n \,\middle|\, Y, \hat{\lambda}^{(k)}\right] \ln \lambda - \lambda$$

$$\hat{\lambda}^{(k+1)} = \arg\max_{\lambda} Q(\lambda|\hat{\lambda}^{(k)})$$

$$\hat{\lambda}^{(k+1)} = E\left[n \,\middle|\, Y, \hat{\lambda}^{(k)}\right]$$

Example: Poisson Plus Gaussian

Iterations involve a nonlinear function in this case.
Numerical, analytical, or lookup approximation may be required.

$$p_y(Y) = \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} e^{-\lambda} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(Y-k)^2}{2\sigma^2}}$$

$$\hat{\lambda}^{(k+1)} = E\left[n \,\middle|\, Y, \hat{\lambda}^{(k)}\right] = \frac{\sum_{m=0}^{\infty} m \frac{(\hat{\lambda}^{(k)})^m}{m!} e^{-\hat{\lambda}^{(k)}} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(Y-m)^2}{2\sigma^2}}}{\sum_{m=0}^{\infty} \frac{(\hat{\lambda}^{(k)})^m}{m!} e^{-\hat{\lambda}^{(k)}} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(Y-m)^2}{2\sigma^2}}}$$

$$E\left[n \,\middle|\, Y, \hat{\lambda}^{(k)}\right] = \hat{\lambda}^{(k)} \frac{\sum_{m=1}^{\infty} \frac{(\hat{\lambda}^{(k)})^{m-1}}{(m-1)!} e^{-\frac{(Y-m)^2}{2\sigma^2}}}{\sum_{m=0}^{\infty} \frac{(\hat{\lambda}^{(k)})^m}{m!} e^{-\frac{(Y-m)^2}{2\sigma^2}}}$$
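
The posterior-mean update can be evaluated numerically by truncating the infinite sums. The sketch below is one possible numerical approach, not the lecture's: the truncation point `m_max`, the single made-up observation `Y`, the noise variance, and the starting value are all assumptions. Weights are formed in the log domain so that large factorials do not overflow.

```python
import math

def posterior_mean(Y, lam, sigma2, m_max=200):
    """E[n | Y, lam], with the sums over m truncated at m_max (an assumption)."""
    num = 0.0
    den = 0.0
    for m in range(m_max + 1):
        # log of lam^m / m! * exp(-(Y - m)^2 / (2 sigma^2)); factors common
        # to numerator and denominator cancel and are omitted.
        log_w = (m * math.log(lam) - math.lgamma(m + 1)
                 - (Y - m) ** 2 / (2.0 * sigma2))
        w = math.exp(log_w)
        num += m * w
        den += w
    return num / den

def loglik(Y, lam, sigma2, m_max=200):
    """ln p_y(Y) for the Poisson-plus-Gaussian model, same truncation."""
    total = 0.0
    for m in range(m_max + 1):
        total += math.exp(m * math.log(lam) - lam - math.lgamma(m + 1)
                          - (Y - m) ** 2 / (2.0 * sigma2))
    return math.log(total / math.sqrt(2.0 * math.pi * sigma2))

Y, sigma2 = 4.3, 1.0          # made-up observation and noise variance
lam = 1.0                     # positive starting estimate
trace = [loglik(Y, lam, sigma2)]
for _ in range(100):
    lam = posterior_mean(Y, lam, sigma2)   # next estimate = posterior mean
    trace.append(loglik(Y, lam, sigma2))
```

At convergence `lam` is a fixed point of `posterior_mean`, and the values in `trace` illustrate the monotone increase of the loglikelihood.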

Estimate Variance of Gaussian in White Gaussian Noise

Suppose that $M$ i.i.d. measurements of zero mean Gaussian random variables $s_m$ are made in additive white Gaussian noise $w_m$ of known variance $N$.
Find the maximum likelihood estimate of the unknown variance $P$.
Two approaches: analytical, and the EM algorithm.

$$r_m = s_m + w_m, \quad m = 1, 2, \ldots, M$$

$$s_m \text{ i.i.d. } N(0, P), \quad w_m \text{ i.i.d. } N(0, N), \quad s_m \perp w_k \ \forall m, k$$

$$r_m \text{ i.i.d. } N(0, P + N)$$

$$l(P) = \sum_{m=1}^{M} \left[-\frac{1}{2} \ln(P + N) - \frac{1}{2} \frac{r_m^2}{P + N}\right]$$

First order necessary condition

$$\frac{\partial l}{\partial P} = -\frac{M}{2} \frac{1}{P + N} + \frac{1}{2} \frac{\sum_{m=1}^{M} r_m^2}{(P + N)^2} = 0$$

$$\hat{P}_{ML} = \max\left(\frac{1}{M} \sum_{m=1}^{M} r_m^2 - N,\ 0\right)$$
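
The closed-form estimate is a one-liner in code. The function name and argument names below are hypothetical; the formula is the one on the slide.

```python
def ml_signal_variance(r, noise_var):
    """P_ML = max((1/M) sum_m r_m^2 - N, 0) for r_m i.i.d. N(0, P + N).

    r is the list of M measurements and noise_var is the known variance N.
    """
    mean_square = sum(x * x for x in r) / len(r)
    # The variance cannot be negative, so clip at zero.
    return max(mean_square - noise_var, 0.0)
```

The clipping matters when the sample mean square falls below the noise variance, in which case the constrained maximum sits on the boundary at zero.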

Gaussian Variance in AWGN

Analytical: can find the solution directly.
EM algorithm: the complete data comprise the pairs of random variables $(s_m, w_m)$.

$$r_m = s_m + w_m, \quad m = 1, 2, \ldots, M$$

$$s_m \text{ i.i.d. } N(0, P), \quad w_m \text{ i.i.d. } N(0, N), \quad s_m \perp w_k \ \forall m, k$$

$$r_m \text{ i.i.d. } N(0, P + N)$$

$$l(P) = \sum_{m=1}^{M} \left[-\frac{1}{2} \ln(P + N) - \frac{1}{2} \frac{r_m^2}{P + N}\right]$$

Complete data loglikelihood function

$$l_{cd}(P) = -\frac{M}{2} \ln P - \frac{1}{2P} \sum_{m=1}^{M} s_m^2$$

$$Q(P|\hat{P}^{(k)}) = E\left[l_{cd}(P) \,\middle|\, \mathbf{r}, \hat{P}^{(k)}\right]$$

$$Q(P|\hat{P}^{(k)}) = -\frac{M}{2} \ln P - \frac{1}{2P} \sum_{m=1}^{M} E\left[s_m^2 \,\middle|\, r_m, \hat{P}^{(k)}\right]$$

$$\hat{P}^{(k+1)} = \frac{1}{M} \sum_{m=1}^{M} E\left[s_m^2 \,\middle|\, r_m, \hat{P}^{(k)}\right]$$

Estimate Variance of Gaussian in White Gaussian Noise

$$r_m = s_m + w_m, \quad m = 1, 2, \ldots, M$$

Complete data loglikelihood function

$$Q(P|\hat{P}^{(k)}) = -\frac{M}{2} \ln P - \frac{1}{2P} \sum_{m=1}^{M} E\left[s_m^2 \,\middle|\, r_m, \hat{P}^{(k)}\right]$$

$$\hat{P}^{(k+1)} = \frac{1}{M} \sum_{m=1}^{M} E\left[s_m^2 \,\middle|\, r_m, \hat{P}^{(k)}\right]$$

$$E\left[s_m^2 \,\middle|\, r_m, \hat{P}^{(k)}\right] = \left(E\left[s_m \,\middle|\, r_m, \hat{P}^{(k)}\right]\right)^2 + E\left[\left(s_m - E\left[s_m \,\middle|\, r_m, \hat{P}^{(k)}\right]\right)^2 \,\middle|\, r_m, \hat{P}^{(k)}\right]$$

$$E\left[s_m \,\middle|\, r_m, \hat{P}^{(k)}\right] = \frac{\hat{P}^{(k)}}{\hat{P}^{(k)} + N} r_m, \qquad \mathrm{var}\left(s_m \,\middle|\, r_m, \hat{P}^{(k)}\right) = \frac{\hat{P}^{(k)} N}{\hat{P}^{(k)} + N}$$

$$\hat{P}^{(k+1)} = \hat{P}^{(k)} + \left(\frac{\hat{P}^{(k)}}{\hat{P}^{(k)} + N}\right)^2 \left[\frac{1}{M} \sum_{m=1}^{M} r_m^2 - \hat{P}^{(k)} - N\right]$$
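
The EM recursion for the variance can be run directly and compared with the closed-form ML estimate. In this sketch the measurements, the noise variance, the starting point, and the iteration count are made up, and the function name is hypothetical.

```python
def em_variance_step(P, mean_square, N):
    """P_next = P + (P / (P + N))^2 * (mean_square - P - N).

    mean_square is (1/M) sum_m r_m^2 and N is the known noise variance.
    """
    gain = P / (P + N)          # coefficient of r_m in E[s_m | r_m, P]
    return P + gain * gain * (mean_square - P - N)

r = [1.5, -2.0, 0.5, 3.0, -1.0, 2.5]   # made-up measurements
N = 1.0                                 # known noise variance
mean_square = sum(x * x for x in r) / len(r)

P = 1.0                                 # any positive starting point
for _ in range(200):
    P = em_variance_step(P, mean_square, N)

# Closed-form ML estimate for comparison.
P_ml = max(mean_square - N, 0.0)
```

The same update can be written as the squared posterior mean plus the posterior variance, `gain**2 * mean_square + P * N / (P + N)`, which is algebraically identical.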

Estimate Variance of Gaussian in White Gaussian Noise

Fixed point at maximum likelihood solution.
Region of convergence? All positive starting points.
Rate of convergence? Linear. See below, where the equality holds for a positive estimate. If the estimate is 0, then the convergence is sublinear.

$$\hat{P}^{(k+1)} = \hat{P}^{(k)} + \left(\frac{\hat{P}^{(k)}}{\hat{P}^{(k)} + N}\right)^2 \left[\frac{1}{M} \sum_{m=1}^{M} r_m^2 - \hat{P}^{(k)} - N\right]$$

$$P^* - \hat{P}^{(k+1)} = P^* - \hat{P}^{(k)} - \left(\frac{\hat{P}^{(k)}}{\hat{P}^{(k)} + N}\right)^2 \left[\frac{1}{M} \sum_{m=1}^{M} r_m^2 - N - P^* + P^* - \hat{P}^{(k)}\right]$$

$$= \left(P^* - \hat{P}^{(k)}\right) \left[1 - \left(\frac{\hat{P}^{(k)}}{\hat{P}^{(k)} + N}\right)^2\right]$$

$$\approx \left(P^* - \hat{P}^{(k)}\right) \left[1 - \left(\frac{P^*}{P^* + N}\right)^2\right]$$

$$= \left(P^* - \hat{P}^{(k)}\right) \left[1 - \left(\frac{SNR}{1 + SNR}\right)^2\right], \quad SNR = \frac{P^*}{N}$$
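
The linear rate can be observed by tracking the ratio of successive errors. The numbers below are made up for illustration: a mean square of 4 and noise variance of 1 give P* = 3 and SNR = 3, so the predicted asymptotic ratio is 1 - (3/4)^2 = 0.4375.

```python
def em_variance_step(P, mean_square, N):
    """P_next = P + (P / (P + N))^2 * (mean_square - P - N)."""
    gain = P / (P + N)
    return P + gain * gain * (mean_square - P - N)

N = 1.0                      # known noise variance (made up)
mean_square = 4.0            # (1/M) sum_m r_m^2 (made up)
P_star = mean_square - N     # positive ML estimate, here 3.0
snr = P_star / N
predicted = 1.0 - (snr / (1.0 + snr)) ** 2

P = 0.5                      # positive starting point
ratios = []
for _ in range(25):
    P_next = em_variance_step(P, mean_square, N)
    # Ratio of successive errors; equals 1 - (P / (P + N))^2 exactly.
    ratios.append((P_star - P_next) / (P_star - P))
    P = P_next
```

Early ratios are larger than the asymptotic value because the starting point is far from P*; they settle toward `predicted` as the estimate approaches the fixed point.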

Estimation of a Source Distribution from Sensor Array Data

Suppose that an array of sensors is distributed over some area on a two-dimensional plane. The location of each array element is denoted $(x_k, y_k)$. A complex-valued signal is incident upon the array from angle $(\theta, \phi)$ relative to the x-y axes ($\theta$ is pitch, $\phi$ is yaw). For radar, sonar, or radio communications, the complex values can represent the in-phase and quadrature components of the signal. A narrowband approximation is usually made. The signal is assumed to come from the far field. The sensors sample the incoming waves at their respective locations. There is assumed to be additive white Gaussian noise at the sensors.

Far field assumption:
The curvature of the wavefront relative to the extent of the array is negligible. Thus the phase shift relative to the center of the array at the center frequency is determined only by the direction cosines.

Narrowband assumption:
The bandwidth of the signal relative to the center frequency is small (often taken at less than 10%).
The sampling rate of the sensors satisfies the Nyquist criterion (at least twice the bandwidth). Sampling is often done at an intermediate frequency or at baseband.
The change in the signal across the sensor array is negligible. That is, relative to the maximum delay across the array, the signal does not change substantially.

Estimation of a Source Distribution from Sensor Array Data

The signal can come from a source or from reflections of a transmitted wave. For reflections, we assume diffuse and incoherent scatterers, so the samples of the reflectivity are independent and identically distributed random variables. If they come from a collection of smaller scatterers, then a complex Gaussian model is appropriate. For a source, we assume a distributed incoherent source, whose samples are well modeled by samples of a complex Gaussian distribution.
For complex Gaussian distributions, the real and imaginary parts are independent Gaussian random variables with zero mean and equal variance. This is also referred to as "Goodman" class.
The AWGN is complex Gaussian, independent of the signal.

Estimation of a Source Distribution from Sensor Array Data

Under the assumptions, a signal vector received by the array equals a linear combination of direction vectors times complex Gaussian random variables. The data vector that is available equals this signal vector plus a noise vector. If all sensors are identical, and the noise is independent from sensor to sensor, the covariance matrix for the noise is a constant times an identity matrix.

Direction vector is determined by the time delay

$$a_x(\theta, \phi) = \cos\theta \cos\phi, \quad a_y(\theta, \phi) = \sin\phi$$

$$d_k(\theta, \phi) = x_k a_x(\theta, \phi) + y_k a_y(\theta, \phi)$$

$$\tau_k(\theta, \phi) = \frac{d_k(\theta, \phi)}{c}$$

Phase shift is determined by the time delay and the center frequency

$$e^{-j 2\pi f_0 \tau_k(\theta, \phi)} = \exp\left(-j 2\pi \frac{d_k(\theta, \phi)}{\lambda_0}\right)$$

Signal equals a linear combination

$$s_{kn} = \sum_{m=1}^{M} c_{mn} \exp\left(-j 2\pi \frac{d_k(\theta_m, \phi_m)}{\lambda_0}\right)$$

Data equals signal plus noise

$$r_{kn} = s_{kn} + w_{kn}, \quad k = 1, 2, \ldots, K, \quad n = 1, 2, \ldots, N$$
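
Estimation of a Source Distribution from Sensor Array Data

Signal equals a linear combination

$$s_{kn} = \sum_{m=1}^{M} c_{mn} \exp\left(-j 2\pi \frac{d_k(\theta_m, \phi_m)}{\lambda_0}\right)$$

Data equals signal plus noise

$$r_{kn} = s_{kn} + w_{kn}, \quad k = 1, 2, \ldots, K, \quad n = 1, 2, \ldots, N$$

$$\mathbf{r}_n = \mathbf{s}_n + \mathbf{w}_n, \quad \mathbf{w}_n \sim CN(\mathbf{0}, N_0 \mathbf{I})$$

$$\mathbf{s}_n = \mathbf{A} \mathbf{c}_n$$

$$\mathbf{A} = \begin{bmatrix}
\exp\left(-j 2\pi \frac{d_1(\theta_1, \phi_1)}{\lambda_0}\right) & \exp\left(-j 2\pi \frac{d_1(\theta_2, \phi_2)}{\lambda_0}\right) & \cdots & \exp\left(-j 2\pi \frac{d_1(\theta_M, \phi_M)}{\lambda_0}\right) \\
\exp\left(-j 2\pi \frac{d_2(\theta_1, \phi_1)}{\lambda_0}\right) & \exp\left(-j 2\pi \frac{d_2(\theta_2, \phi_2)}{\lambda_0}\right) & \cdots & \exp\left(-j 2\pi \frac{d_2(\theta_M, \phi_M)}{\lambda_0}\right) \\
\vdots & \vdots & \ddots & \vdots \\
\exp\left(-j 2\pi \frac{d_K(\theta_1, \phi_1)}{\lambda_0}\right) & \exp\left(-j 2\pi \frac{d_K(\theta_2, \phi_2)}{\lambda_0}\right) & \cdots & \exp\left(-j 2\pi \frac{d_K(\theta_M, \phi_M)}{\lambda_0}\right)
\end{bmatrix}$$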
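
The K-by-M matrix A can be assembled directly from the sensor positions and the direction cosines. This is a minimal sketch with hypothetical names: `steering_matrix` takes the direction cosines $(a_x, a_y)$ of each source as inputs rather than the angles themselves, so it does not commit to a particular angle convention, and the positions and wavelength in the usage lines are made up.

```python
import cmath
import math

def steering_matrix(sensors, direction_cosines, wavelength):
    """Return A as a list of rows, A[k][m] = exp(-j 2 pi d_k(m) / lambda_0).

    sensors: list of (x_k, y_k) positions.
    direction_cosines: list of (a_x, a_y) pairs, one per source direction.
    wavelength: lambda_0, in the same length units as the positions.
    """
    A = []
    for (x, y) in sensors:
        row = []
        for (a_x, a_y) in direction_cosines:
            d = x * a_x + y * a_y                       # d_k = x_k a_x + y_k a_y
            row.append(cmath.exp(-2j * math.pi * d / wavelength))
        A.append(row)
    return A

# Two sensors half a wavelength apart on the x axis, two source directions.
A = steering_matrix([(0.0, 0.0), (0.5, 0.0)], [(1.0, 0.0), (0.0, 1.0)], 1.0)
```

Every entry has unit magnitude, and a sensor at the origin sees zero phase shift for every direction, which matches the phase being measured relative to the center of the array.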
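
Estimation of a Source Distribution from Sensor Array Data

Data equals signal plus noise

$$\mathbf{r}_n = \mathbf{s}_n + \mathbf{w}_n, \quad \mathbf{w}_n \sim CN(\mathbf{0}, N_0 \mathbf{I}), \text{ i.i.d.}, \quad n = 1, 2, \ldots, N$$

$$\mathbf{s}_n = \mathbf{A} \mathbf{c}_n, \quad \mathbf{c}_n \sim CN(\mathbf{0}, \mathbf{\Sigma}), \text{ i.i.d.}, \quad n = 1, 2, \ldots, N$$

$$\mathbf{r}_n \sim CN(\mathbf{0}, \mathbf{A} \mathbf{\Sigma} \mathbf{A}^{\dagger} + N_0 \mathbf{I}), \text{ i.i.d.}, \quad n = 1, 2, \ldots, N$$

$\mathbf{A}^{\dagger}$ is the complex conjugate transpose of $\mathbf{A}$.

More on complex Gaussian:
Let $x$ and $y$ be independent Gaussian random variables with zero mean and variances equal to $\sigma^2$. Then the joint probability density function is

$$\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{x^2}{2\sigma^2}} \cdot \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{y^2}{2\sigma^2}} = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} = \frac{1}{\pi N_0} e^{-\frac{x^2 + y^2}{N_0}}, \quad N_0 = 2\sigma^2$$

That is, the joint pdf is parameterized by the total variance.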
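
The parameterization by total variance can be checked with a short simulation. The function names, the seed, the sample count, and the value of N0 below are assumptions made for illustration.

```python
import math
import random

def sample_complex_gaussian(n0, count, seed=0):
    """Draw samples of CN(0, n0): real and imaginary parts i.i.d. N(0, n0 / 2)."""
    rng = random.Random(seed)
    sigma = math.sqrt(n0 / 2.0)
    return [complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for _ in range(count)]

def complex_gaussian_pdf(z, n0):
    """Joint pdf of (x, y) for z = x + jy: exp(-|z|^2 / n0) / (pi n0)."""
    return math.exp(-abs(z) ** 2 / n0) / (math.pi * n0)

def real_gaussian_pdf(x, sigma2):
    """Zero-mean real Gaussian pdf with variance sigma2."""
    return math.exp(-x * x / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

n0 = 2.0                                   # total variance, so sigma^2 = 1
samples = sample_complex_gaussian(n0, 20000)
# Sample estimate of E|z|^2, which should be close to n0.
average_power = sum(abs(z) ** 2 for z in samples) / len(samples)
```

The product of two real Gaussian densities with variance N0/2 agrees with `complex_gaussian_pdf`, and the average power of the samples estimates N0.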