ESTIMATION THEORY
Outline
1. Random Variables
2. Introduction
3. Estimation techniques
4. Extensions to Complex Vector Parameters
5. Application to communication systems
[Kay'93] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice-Hall, New Jersey, 1993.
[Cover-Thomas'91] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
1
Random Variables
Definitions
A random variable $X$ is a function that assigns a number to every outcome of an experiment.
A random variable $X$ is completely characterized by:
Its cumulative distribution function (cdf): $F_X(x) = P(X \le x)$
Its probability density function (pdf): $p_X(x) = \dfrac{dF_X(x)}{dx}$
Properties
The probability that $X$ lies between $x_1$ and $x_2$ is then
$$P(x_1 < X \le x_2) = \int_{x_1}^{x_2} p_X(x)\,dx$$
The mean of $X$ is given by
$$m_X = E[X] = \int_{-\infty}^{\infty} x\, p_X(x)\,dx$$
The variance of $X$ is given by
$$\mathrm{var}_X = E[(X - m_X)^2] = \int_{-\infty}^{\infty} (x - m_X)^2\, p_X(x)\,dx$$
2
Random Variables
Examples
Uniform random variable:
pdf: $p_X(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$
mean and variance: $m_X = \dfrac{a+b}{2}$, $\mathrm{var}_X = \dfrac{(b-a)^2}{12}$
Gaussian random variable:
pdf: $p_X(x) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\dfrac{(x-m)^2}{2\sigma^2}\right)$
mean and variance: $m_X = m$, $\mathrm{var}_X = \sigma^2$
3
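The closed-form mean and variance of the uniform example above can be sanity-checked by Monte Carlo; a minimal sketch (the endpoints, sample size, and tolerances are arbitrary choices, not from the slides):

```python
import numpy as np

# Draw samples from a uniform distribution on [a, b] and compare the
# empirical mean and variance with the closed forms (a+b)/2 and (b-a)^2/12.
rng = np.random.default_rng(0)
a, b = 2.0, 6.0
x = rng.uniform(a, b, size=1_000_000)

mean_closed = (a + b) / 2        # closed-form mean
var_closed = (b - a) ** 2 / 12   # closed-form variance

assert abs(x.mean() - mean_closed) < 0.01
assert abs(x.var() - var_closed) < 0.01
```

The same check works for the Gaussian example with `rng.normal(m, sigma, size=...)`.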
Random Variables
Two random variables
For two random variables $X$ and $Y$, we can define
The joint cdf: $F_{X,Y}(x,y) = P(X \le x, Y \le y)$
The joint pdf: $p_{X,Y}(x,y) = \dfrac{\partial^2 F_{X,Y}(x,y)}{\partial x\, \partial y}$
The marginal pdfs $p_X(x)$ and $p_Y(y)$ can then be determined by
$$p_X(x) = \int_{-\infty}^{\infty} p_{X,Y}(x,y)\,dy, \qquad p_Y(y) = \int_{-\infty}^{\infty} p_{X,Y}(x,y)\,dx$$
The conditional pdfs $p_{X|Y}(x|y)$ and $p_{Y|X}(y|x)$ are given by
$$p_{X|Y}(x|y) = \frac{p_{X,Y}(x,y)}{p_Y(y)}, \qquad p_{Y|X}(y|x) = \frac{p_{X,Y}(x,y)}{p_X(x)}$$
From this follows the popular Bayes' rule
$$p_{X|Y}(x|y) = \frac{p_{Y|X}(y|x)\, p_X(x)}{p_Y(y)} = \frac{p_{Y|X}(y|x)\, p_X(x)}{\int_{-\infty}^{\infty} p_{Y|X}(y|x)\, p_X(x)\,dx}$$
For independent random variables $X$ and $Y$ we have
$$p_{X,Y}(x,y) = p_X(x)\, p_Y(y)$$
4
Random Variables
Function of random variables
Suppose $Z$ is a function of the random variables $X$ and $Y$, e.g., $Z = g(X, Y)$.
Corresponding increments in the cdf of $Z$ and the joint cdf of $X$ and $Y$ are the same:
$$dF_Z(z) = dF_{X,Y}(x,y)$$
Hence, the expectation over $Z$ equals the joint expectation over $X$ and $Y$.
The mean of $Z$ is given by
$$m_Z = E[g(X,Y)] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g(x,y)\, p_{X,Y}(x,y)\,dx\,dy$$
The variance of $Z$ is given by
$$\mathrm{var}_Z = E[(g(X,Y) - m_Z)^2] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (g(x,y) - m_Z)^2\, p_{X,Y}(x,y)\,dx\,dy$$
5
Random Variables
Vector random variables
A vector random variable $\mathbf{x}$ is a vector of random variables $X_n$:
$$\mathbf{x} = [X_1, X_2, \ldots, X_N]^T$$
Its cdf/pdf is the joint cdf/pdf of all these random variables.
The mean of $\mathbf{x}$ is given by
$$\mathbf{m}_{\mathbf{x}} = E[\mathbf{x}]$$
The covariance matrix of $\mathbf{x}$ is given by
$$\mathrm{cov}_{\mathbf{x}} = E[(\mathbf{x} - \mathbf{m}_{\mathbf{x}})(\mathbf{x} - \mathbf{m}_{\mathbf{x}})^T]$$
6
Introduction
Problem Statement
Suppose we have an unknown scalar parameter $\theta$ that we want to estimate from an observed vector $\mathbf{x}$, which is related to $\theta$ through the following relationship
$$\mathbf{x} = \mathbf{g}(\theta) + \mathbf{n}$$
where $\mathbf{n}$ is a random noise vector with probability density function (pdf) $p_{\mathbf{n}}(\mathbf{n})$.
The estimator is of the form $\hat{\theta} = f(\mathbf{x})$. Note that $\hat{\theta}$ itself is a random variable. Hence, the performance of the estimator $\hat{\theta}$ should be described statistically.
7
Introduction
Special Models
To solve any estimation problem, we need a model. Here, we will look deeper into two specific models:
The linear model: The relationship between $\mathbf{x}$ and $\theta$ is then given by
$$\mathbf{x} = \mathbf{h}\theta + \mathbf{n}$$
where $\mathbf{h}$ is the model vector and $\mathbf{n}$ is the noise vector, which is assumed to have mean $\mathbf{0}$, $\mathbf{m}_{\mathbf{n}} = \mathbf{0}$, and covariance matrix $\mathbf{C}$, $\mathrm{cov}_{\mathbf{n}} = \mathbf{C}$.
The linear Gaussian model: This model is a special case of the linear model, where the noise vector $\mathbf{n}$ is assumed to be Gaussian (or normal) distributed, $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \mathbf{C})$:
$$p_{\mathbf{n}}(\mathbf{n}) = \frac{1}{(2\pi)^{N/2} \det^{1/2}(\mathbf{C})} \exp\left(-\tfrac{1}{2}\, \mathbf{n}^T \mathbf{C}^{-1} \mathbf{n}\right)$$
8
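The linear Gaussian model can be simulated in a few lines; a sketch with assumed dimensions and an arbitrary positive-definite covariance (all variable names are hypothetical):

```python
import numpy as np

# Generate one observation from the linear Gaussian model x = h*theta + n,
# with n ~ N(0, C). The dimension N, theta, h, and C are arbitrary choices.
rng = np.random.default_rng(1)
N = 8
theta = 1.5                                  # true scalar parameter
h = rng.standard_normal(N)                   # known model vector
A = rng.standard_normal((N, N))
C = A @ A.T + N * np.eye(N)                  # positive-definite noise covariance
n = rng.multivariate_normal(np.zeros(N), C)  # Gaussian noise vector
x = h * theta + n                            # observed vector
```

This generated data is the natural test input for the estimators derived on the following slides.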
Estimation Techniques
We can view the unknown parameter $\theta$ as a deterministic variable:
- Minimum Variance Unbiased (MVU) Estimator
- Best Linear Unbiased Estimator (BLUE)
- Maximum Likelihood Estimator (MLE)
- Least Squares Estimator (LSE)
The Bayesian philosophy: $\theta$ is viewed as a random variable:
- Minimum Mean Square Error (MMSE) Estimator
- Linear Minimum Mean Square Error (LMMSE) Estimator
9
Minimum Variance Unbiased Estimation
A natural criterion that comes to mind is the Mean Square Error (MSE):
$$\mathrm{mse}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = E[((\hat{\theta} - m_{\hat{\theta}}) + (m_{\hat{\theta}} - \theta))^2] = \mathrm{var}_{\hat{\theta}} + (m_{\hat{\theta}} - \theta)^2$$
The MSE does not only depend on the variance but also on the bias.
This means that an estimator that tries to minimize the MSE will often depend on the parameter $\theta$, and is therefore unrealizable.
Solution: constrain the bias to zero and minimize the variance, which leads to the so-called Minimum Variance Unbiased (MVU) estimator:
unbiased: $m_{\hat{\theta}} = \theta$ for all $\theta$
minimum variance: $\mathrm{var}_{\hat{\theta}}$ is minimal for all $\theta$
Remark: The MVU does not always exist and is generally difficult to find.
10
Minimum Variance Unbiased Estimation (Linear Gaussian Model)
For the linear Gaussian model the MVU exists and its solution can be found by means of the Cramér-Rao lower bound (see notes, [Kay'93], [Cover-Thomas'91]):
$$\hat{\theta} = (\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h})^{-1} \mathbf{h}^T \mathbf{C}^{-1} \mathbf{x}$$
Properties:
$m_{\hat{\theta}} = \theta$
$\mathrm{var}_{\hat{\theta}} = (\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h})^{-1}$
$\hat{\theta}$ is Gaussian distributed, i.e., $\hat{\theta} \sim \mathcal{N}\!\left(\theta,\ (\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h})^{-1}\right)$
11
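The stated properties of the MVU estimator for the linear Gaussian model can be verified empirically; an illustrative Monte Carlo sketch (dimensions, covariance, and tolerances are assumptions):

```python
import numpy as np

# For the linear Gaussian model, theta_hat = (h^T C^-1 h)^-1 h^T C^-1 x
# should be unbiased with variance (h^T C^-1 h)^-1.
rng = np.random.default_rng(2)
N, trials = 6, 100_000
theta = 0.7
h = rng.standard_normal(N)
C = np.diag(rng.uniform(0.5, 2.0, N))   # diagonal noise covariance
Ci = np.linalg.inv(C)
gain = (Ci @ h) / (h @ Ci @ h)          # estimator weight vector

noise = rng.multivariate_normal(np.zeros(N), C, size=trials)
x = theta * h + noise                   # (trials, N) observations
est = x @ gain                          # one MVU estimate per trial

var_closed = 1.0 / (h @ Ci @ h)
assert abs(est.mean() - theta) < 0.02               # unbiased
assert abs(est.var() - var_closed) / var_closed < 0.05  # claimed variance
```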
Best Linear Unbiased Estimation
In this case we constrain the estimator to have the form
$$\hat{\theta} = \mathbf{a}^T \mathbf{x}$$
Unbiased:
$$m_{\hat{\theta}} = E[\hat{\theta}] = E[\mathbf{a}^T \mathbf{x}] = \mathbf{a}^T E[\mathbf{x}] = \mathbf{a}^T \mathbf{m}_{\mathbf{x}} = \theta \quad \text{for all } \theta$$
Minimum variance:
$$\mathrm{var}_{\hat{\theta}} = E[(\hat{\theta} - m_{\hat{\theta}})^2] = E[(\mathbf{a}^T(\mathbf{x} - \mathbf{m}_{\mathbf{x}}))^2] = \mathbf{a}^T E[(\mathbf{x} - \mathbf{m}_{\mathbf{x}})(\mathbf{x} - \mathbf{m}_{\mathbf{x}})^T]\, \mathbf{a} = \mathbf{a}^T \mathrm{cov}_{\mathbf{x}}\, \mathbf{a} \ \text{ is minimal for all } \theta$$
The first condition can only be satisfied if we assume a linear model for $\mathbf{m}_{\mathbf{x}}$:
$$\mathbf{m}_{\mathbf{x}} = \mathbf{h}\theta$$
Hence, we have to solve
$$\min_{\mathbf{a}}\ \mathbf{a}^T \mathrm{cov}_{\mathbf{x}}\, \mathbf{a} \quad \text{subject to} \quad \mathbf{a}^T \mathbf{h} = 1$$
12
Best Linear Unbiased Estimation
Problem: $\min_{\mathbf{a}}\ \mathbf{a}^T \mathrm{cov}_{\mathbf{x}}\, \mathbf{a}$ subject to $\mathbf{a}^T \mathbf{h} = 1$
Solution:
$$\mathbf{a} = (\mathbf{h}^T \mathrm{cov}_{\mathbf{x}}^{-1} \mathbf{h})^{-1}\, \mathrm{cov}_{\mathbf{x}}^{-1} \mathbf{h} \quad \Longrightarrow \quad \hat{\theta} = (\mathbf{h}^T \mathrm{cov}_{\mathbf{x}}^{-1} \mathbf{h})^{-1}\, \mathbf{h}^T \mathrm{cov}_{\mathbf{x}}^{-1} \mathbf{x}$$
Proof:
Using the method of the Lagrange multipliers, we obtain
$$J = \mathbf{a}^T \mathrm{cov}_{\mathbf{x}}\, \mathbf{a} + \lambda(\mathbf{a}^T \mathbf{h} - 1)$$
Setting the gradient with respect to $\mathbf{a}$ to zero we get
$$\frac{\partial J}{\partial \mathbf{a}} = 2\, \mathrm{cov}_{\mathbf{x}}\, \mathbf{a} + \lambda \mathbf{h} = \mathbf{0} \quad \Longrightarrow \quad \mathbf{a} = -\frac{\lambda}{2}\, \mathrm{cov}_{\mathbf{x}}^{-1} \mathbf{h}$$
The Lagrange multiplier $\lambda$ is obtained by the constraint
$$\mathbf{a}^T \mathbf{h} = -\frac{\lambda}{2}\, \mathbf{h}^T \mathrm{cov}_{\mathbf{x}}^{-1} \mathbf{h} = 1 \quad \Longrightarrow \quad -\frac{\lambda}{2} = (\mathbf{h}^T \mathrm{cov}_{\mathbf{x}}^{-1} \mathbf{h})^{-1}$$
Properties: $m_{\hat{\theta}} = \theta$, $\mathrm{var}_{\hat{\theta}} = (\mathbf{h}^T \mathrm{cov}_{\mathbf{x}}^{-1} \mathbf{h})^{-1}$
13
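The Lagrange-multiplier solution can be checked numerically: the optimal weight vector satisfies the constraint, attains the claimed variance, and beats random feasible competitors. A sketch (all names and sizes assumed):

```python
import numpy as np

# The BLUE minimizer a = cov_x^-1 h / (h^T cov_x^-1 h) must satisfy
# a^T h = 1, and every other vector with a^T h = 1 must give a larger
# variance a^T cov_x a.
rng = np.random.default_rng(3)
N = 5
h = rng.standard_normal(N)
B = rng.standard_normal((N, N))
cov_x = B @ B.T + np.eye(N)              # positive-definite covariance
Ci = np.linalg.inv(cov_x)

a_opt = Ci @ h / (h @ Ci @ h)
assert abs(a_opt @ h - 1.0) < 1e-9       # constraint satisfied
v_opt = a_opt @ cov_x @ a_opt            # should equal (h^T cov_x^-1 h)^-1
assert abs(v_opt - 1.0 / (h @ Ci @ h)) < 1e-9

for _ in range(100):                     # random feasible competitors
    a = rng.standard_normal(N)
    a = a / (a @ h)                      # rescale so that a^T h = 1
    assert a @ cov_x @ a >= v_opt - 1e-9
```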
Best Linear Unbiased Estimation (Linear Model)
For the linear model the BLUE is given by
$$\hat{\theta} = (\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h})^{-1} \mathbf{h}^T \mathbf{C}^{-1} \mathbf{x}$$
Remark: For the linear model the BLUE equals the MVU only when the noise is Gaussian.
14
Maximum Likelihood Estimation
Since the pdf of $\mathbf{x}$ depends on $\theta$, we often write it as a function that is parametrized on $\theta$: $p(\mathbf{x}; \theta)$. This function can also be interpreted as the likelihood function, since it tells us how likely it is to observe a certain $\mathbf{x}$. The Maximum Likelihood Estimator (MLE) finds the $\theta$ that maximizes $p(\mathbf{x}; \theta)$ for a certain $\mathbf{x}$.
The MLE is generally easy to derive.
Asymptotically, the MLE has the same mean and variance as the MVU (but it is not asymptotically equivalent to the MVU).
15
Maximum Likelihood Estimation (Linear Gaussian Model)
For the linear Gaussian model, the likelihood function is given by
$$p(\mathbf{x}; \theta) = \frac{1}{(2\pi)^{N/2} \det^{1/2}(\mathbf{C})} \exp\left(-\tfrac{1}{2}\, (\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C}^{-1} (\mathbf{x} - \mathbf{h}\theta)\right)$$
It is clear that this function is maximized by solving
$$\min_{\theta}\ (\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C}^{-1} (\mathbf{x} - \mathbf{h}\theta)$$
16
Maximum Likelihood Estimation (Linear Gaussian Model)
Problem: $\min_{\theta}\ (\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C}^{-1} (\mathbf{x} - \mathbf{h}\theta)$
Solution:
$$\hat{\theta} = (\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h})^{-1} \mathbf{h}^T \mathbf{C}^{-1} \mathbf{x}$$
Proof:
Rewriting the cost function that we have to minimize, we get
$$(\mathbf{x} - \mathbf{h}\theta)^T \mathbf{C}^{-1} (\mathbf{x} - \mathbf{h}\theta) = \mathbf{x}^T \mathbf{C}^{-1} \mathbf{x} - 2\theta\, \mathbf{h}^T \mathbf{C}^{-1} \mathbf{x} + \theta^2\, \mathbf{h}^T \mathbf{C}^{-1} \mathbf{h}$$
Setting the gradient with respect to $\theta$ to zero we get
$$-2\, \mathbf{h}^T \mathbf{C}^{-1} \mathbf{x} + 2\theta\, \mathbf{h}^T \mathbf{C}^{-1} \mathbf{h} = 0 \quad \Longrightarrow \quad \hat{\theta} = (\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h})^{-1} \mathbf{h}^T \mathbf{C}^{-1} \mathbf{x}$$
Remark: For the linear Gaussian model, the MLE is equivalent to the MVU estimator.
17
Least Squares Estimation
The Least Squares Estimator (LSE) finds the $\theta$ for which
$$\|\mathbf{x} - \mathbf{g}(\theta)\|^2 \ \text{ is minimal}$$
Properties:
No probabilistic assumptions are required
The performance highly depends on the noise
18
Least Squares Estimation (Linear Model)
For the linear model, the LSE solves
$$\min_{\theta}\ \|\mathbf{x} - \mathbf{h}\theta\|^2$$
Solution:
$$\hat{\theta} = (\mathbf{h}^T \mathbf{h})^{-1} \mathbf{h}^T \mathbf{x}$$
Proof: As before
Remark: For the linear model the LSE corresponds to the BLUE when the noise is white, and to the MVU when the noise is Gaussian and white.
19
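The closed-form LSE for the linear model can be compared against a generic least-squares solver; a minimal sketch (the data setup is assumed), which also checks that the residual is orthogonal to the model vector:

```python
import numpy as np

# Closed form theta_hat = (h^T h)^-1 h^T x versus numpy's lstsq, plus the
# orthogonality of the residual x - h*theta_hat to h.
rng = np.random.default_rng(4)
N = 10
h = rng.standard_normal(N)
x = 2.0 * h + 0.1 * rng.standard_normal(N)   # noisy observations

theta_closed = (h @ x) / (h @ h)
theta_lstsq = np.linalg.lstsq(h[:, None], x, rcond=None)[0][0]
assert abs(theta_closed - theta_lstsq) < 1e-10
assert abs(h @ (x - h * theta_closed)) < 1e-10   # residual orthogonal to h
```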
Least Squares Estimation (Linear Model)
Orthogonality Condition
Let us compute $\mathbf{h}^T(\mathbf{x} - \mathbf{h}\hat{\theta})$:
$$\mathbf{h}^T(\mathbf{x} - \mathbf{h}\hat{\theta}) = \mathbf{h}^T \mathbf{x} - \mathbf{h}^T \mathbf{h}\, (\mathbf{h}^T \mathbf{h})^{-1} \mathbf{h}^T \mathbf{x} = 0$$
For the linear model the LSE leads to the following orthogonality condition: the estimation error $\mathbf{x} - \mathbf{h}\hat{\theta}$ is orthogonal to the model vector $\mathbf{h}$.
[Figure: geometric sketch of $\mathbf{x}$ projected onto $\mathbf{h}$, with the error $\mathbf{x} - \mathbf{h}\hat{\theta}$ perpendicular to $\mathbf{h}$]
20
The Bayesian Philosophy
$\theta$ is viewed as a random variable and we must estimate its particular realization.
This allows us to use prior knowledge about $\theta$, i.e., its prior pdf $p_\theta(\theta)$.
Again, we would like to minimize the MSE
$$\mathrm{Bmse}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$$
but this time both $\mathbf{x}$ and $\theta$ are random, hence the notation Bmse for Bayesian MSE.
Note the difference between these two MSEs:
$$\mathrm{mse}(\hat{\theta}) = E_{\mathbf{x}}[(\hat{\theta} - \theta)^2] = \int (\hat{\theta} - \theta)^2\, p(\mathbf{x}; \theta)\,d\mathbf{x}$$
$$\mathrm{Bmse}(\hat{\theta}) = E_{\mathbf{x},\theta}[(\hat{\theta} - \theta)^2] = \int\!\!\int (\hat{\theta} - \theta)^2\, p(\mathbf{x}, \theta)\,d\mathbf{x}\,d\theta$$
Whereas the first MSE depends on $\theta$, the second MSE does not depend on $\theta$.
21
Minimum Mean Square Error Estimator
We know that $p(\mathbf{x}, \theta) = p(\theta|\mathbf{x})\, p(\mathbf{x})$, so that
$$\mathrm{Bmse}(\hat{\theta}) = \int \left[ \int (\hat{\theta} - \theta)^2\, p(\theta|\mathbf{x})\,d\theta \right] p(\mathbf{x})\,d\mathbf{x}$$
Since $p(\mathbf{x}) \ge 0$ for all $\mathbf{x}$, we have to minimize the inner integral for each $\mathbf{x}$.
Problem: $\min_{\hat{\theta}}\ \int (\hat{\theta} - \theta)^2\, p(\theta|\mathbf{x})\,d\theta$
Solution: mean of the posterior pdf of $\theta$:
$$\hat{\theta} = E[\theta|\mathbf{x}] = \int \theta\, p(\theta|\mathbf{x})\,d\theta$$
Proof: Setting the derivative with respect to $\hat{\theta}$ to zero we obtain:
$$\frac{\partial}{\partial \hat{\theta}} \int (\hat{\theta} - \theta)^2\, p(\theta|\mathbf{x})\,d\theta = \int 2(\hat{\theta} - \theta)\, p(\theta|\mathbf{x})\,d\theta = 0 \quad \Longrightarrow \quad \hat{\theta} = \int \theta\, p(\theta|\mathbf{x})\,d\theta$$
Remarks:
In contrast to the MVU estimator, the MMSE estimator always exists.
The MMSE estimator has a smaller average MSE (Bayesian MSE) than the MVU, but the MMSE estimator is biased whereas the MVU estimator is unbiased.
22
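The posterior-mean solution can be illustrated for a scalar Gaussian case, where grid integration must reproduce the known closed form; a sketch (the scalar observation model $x = \theta + n$ is an assumed simplification, not from the slides):

```python
import numpy as np

# For x = theta + n with theta ~ N(0, s_t2) and n ~ N(0, s_n2), the
# posterior mean computed by numerical integration must match the known
# closed form s_t2 / (s_t2 + s_n2) * x.
s_t2, s_n2, x = 2.0, 0.5, 1.3
grid = np.linspace(-20.0, 20.0, 200_001)
post = np.exp(-grid**2 / (2 * s_t2)) * np.exp(-(x - grid)**2 / (2 * s_n2))
post /= post.sum()                   # normalized posterior weights on the grid
theta_mmse = (grid * post).sum()     # posterior mean E[theta | x]
closed = s_t2 / (s_t2 + s_n2) * x
assert abs(theta_mmse - closed) < 1e-4
```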
Minimum Mean Square Error Estimator (Linear Gaussian Model)
For the linear Gaussian model where $\theta$ is assumed to be Gaussian with mean 0 and variance $\sigma_\theta^2$, $\theta \sim \mathcal{N}(0, \sigma_\theta^2)$, the MMSE estimator can be found by means of the conditional pdf of a Gaussian vector random variable [Kay'93]:
$$\hat{\theta} = \sigma_\theta^2\, \mathbf{h}^T (\sigma_\theta^2\, \mathbf{h}\mathbf{h}^T + \mathbf{C})^{-1} \mathbf{x} = (\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h} + \sigma_\theta^{-2})^{-1} \mathbf{h}^T \mathbf{C}^{-1} \mathbf{x}$$
where the last equality is due to the matrix inversion lemma (see notes):
$$(\mathbf{A} + \mathbf{U}\mathbf{B}\mathbf{V})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{U}(\mathbf{B}^{-1} + \mathbf{V}\mathbf{A}^{-1}\mathbf{U})^{-1}\mathbf{V}\mathbf{A}^{-1}$$
Remark: Compare this with the MVU for the linear Gaussian model.
23
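The equality of the two MMSE expressions, which rests on the matrix inversion lemma, can be confirmed numerically; a sketch with assumed dimensions:

```python
import numpy as np

# Check that s2 * h^T (s2 h h^T + C)^-1 equals
# (h^T C^-1 h + 1/s2)^-1 h^T C^-1 for a random positive-definite C.
rng = np.random.default_rng(5)
N, s2 = 6, 1.7
h = rng.standard_normal(N)
B = rng.standard_normal((N, N))
C = B @ B.T + np.eye(N)

lhs = s2 * h @ np.linalg.inv(s2 * np.outer(h, h) + C)
hCi = h @ np.linalg.inv(C)
rhs = hCi / (hCi @ h + 1.0 / s2)
assert np.allclose(lhs, rhs)
```

The second form is usually preferred when $\mathbf{C}$ is easy to invert (e.g. diagonal), since it avoids inverting the full rank-one update.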
Linear Minimum Mean Square Error Estimator
As for the BLUE, we now constrain the estimator to have the form $\hat{\theta} = \mathbf{a}^T \mathbf{x}$.
The Bayesian MSE can then be written as
$$\mathrm{Bmse}(\hat{\theta}) = E[(\theta - \mathbf{a}^T \mathbf{x})^2] = E[\theta^2] - 2\, \mathbf{a}^T E[\mathbf{x}\theta] + \mathbf{a}^T E[\mathbf{x}\mathbf{x}^T]\, \mathbf{a}$$
Setting the derivative with respect to $\mathbf{a}$ to zero, we obtain $\mathbf{a} = E[\mathbf{x}\mathbf{x}^T]^{-1} E[\mathbf{x}\theta]$.
The LMMSE estimator is therefore given by
$$\hat{\theta} = \mathbf{a}^T \mathbf{x} = E[\theta\mathbf{x}^T]\, E[\mathbf{x}\mathbf{x}^T]^{-1}\, \mathbf{x}$$
24
Linear Minimum Mean Square Error Estimator
Orthogonality Condition
Let us compute $E[(\theta - \hat{\theta})\, \mathbf{x}^T]$:
$$E[(\theta - \hat{\theta})\, \mathbf{x}^T] = E[\theta\mathbf{x}^T] - E[\theta\mathbf{x}^T]\, E[\mathbf{x}\mathbf{x}^T]^{-1}\, E[\mathbf{x}\mathbf{x}^T] = \mathbf{0}$$
The LMMSE leads to the following orthogonality condition: the estimation error $\theta - \hat{\theta}$ is orthogonal (uncorrelated) to the data $\mathbf{x}$.
[Figure: geometric sketch of $\theta$ projected onto the space spanned by $\mathbf{x}$, with the error $\theta - \hat{\theta}$ perpendicular to it]
25
Linear Minimum Mean Square Error Estimator (Linear Model)
For the linear model where $\theta$ is assumed to have mean 0 and variance $\sigma_\theta^2$, the LMMSE estimator is given by
$$\hat{\theta} = \sigma_\theta^2\, \mathbf{h}^T (\sigma_\theta^2\, \mathbf{h}\mathbf{h}^T + \mathbf{C})^{-1} \mathbf{x} = (\mathbf{h}^T \mathbf{C}^{-1} \mathbf{h} + \sigma_\theta^{-2})^{-1} \mathbf{h}^T \mathbf{C}^{-1} \mathbf{x}$$
where the last equality is again due to the matrix inversion lemma.
Remark: The LMMSE estimator is equivalent to the MMSE estimator when the noise and the unknown parameter are Gaussian.
26
Summary
$\theta$ deterministic:
MVU: linear model: ? | linear Gaussian model: $\hat{\theta} = (\mathbf{h}^T\mathbf{C}^{-1}\mathbf{h})^{-1}\mathbf{h}^T\mathbf{C}^{-1}\mathbf{x}$
BLUE: linear model: $\hat{\theta} = (\mathbf{h}^T\mathbf{C}^{-1}\mathbf{h})^{-1}\mathbf{h}^T\mathbf{C}^{-1}\mathbf{x}$ | linear Gaussian model: same as linear model
MLE: linear model: ? | linear Gaussian model: $\hat{\theta} = (\mathbf{h}^T\mathbf{C}^{-1}\mathbf{h})^{-1}\mathbf{h}^T\mathbf{C}^{-1}\mathbf{x}$
LSE: linear model: $\hat{\theta} = (\mathbf{h}^T\mathbf{h})^{-1}\mathbf{h}^T\mathbf{x}$ | linear Gaussian model: same as linear model
$\theta$ stochastic with mean 0 and variance $\sigma_\theta^2$ (Gaussian in the linear Gaussian model):
MMSE: linear model: ? | linear Gaussian model: $\hat{\theta} = (\mathbf{h}^T\mathbf{C}^{-1}\mathbf{h} + \sigma_\theta^{-2})^{-1}\mathbf{h}^T\mathbf{C}^{-1}\mathbf{x}$
LMMSE: linear model: $\hat{\theta} = (\mathbf{h}^T\mathbf{C}^{-1}\mathbf{h} + \sigma_\theta^{-2})^{-1}\mathbf{h}^T\mathbf{C}^{-1}\mathbf{x}$ | linear Gaussian model: same as linear model
27
Extensions to Complex Vector Parameters
For a complex vector parameter $\boldsymbol{\theta}$ and model matrix $\mathbf{H}$ (with $^H$ the Hermitian transpose), the table becomes:
$\boldsymbol{\theta}$ deterministic:
MVU: linear model: ? | linear Gaussian model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^H\mathbf{C}^{-1}\mathbf{x}$
BLUE: linear model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^H\mathbf{C}^{-1}\mathbf{x}$ | linear Gaussian model: same as linear model
MLE: linear model: ? | linear Gaussian model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^H\mathbf{C}^{-1}\mathbf{x}$
LSE: linear model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H\mathbf{H})^{-1}\mathbf{H}^H\mathbf{x}$ | linear Gaussian model: same as linear model
$\boldsymbol{\theta}$ stochastic with mean $\mathbf{0}$ and covariance $\mathbf{R}_\theta$ (Gaussian in the linear Gaussian model):
MMSE: linear model: ? | linear Gaussian model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H\mathbf{C}^{-1}\mathbf{H} + \mathbf{R}_\theta^{-1})^{-1}\mathbf{H}^H\mathbf{C}^{-1}\mathbf{x}$
LMMSE: linear model: $\hat{\boldsymbol{\theta}} = (\mathbf{H}^H\mathbf{C}^{-1}\mathbf{H} + \mathbf{R}_\theta^{-1})^{-1}\mathbf{H}^H\mathbf{C}^{-1}\mathbf{x}$ | linear Gaussian model: same as linear model
28
Application to Communications
[Figure: block diagram of the transmitted sequence $x(n)$ passing through the channel $h(n)$, with additive noise $w(n)$ producing the received sequence $y(n)$]
$$y(n) = \sum_{l=0}^{L-1} h(l)\, x(n-l) + w(n)$$
The channel $\mathbf{h} = [h(0), h(1), \ldots, h(L-1)]^T$ has length $L$.
The symbol sequence $\mathbf{x} = [x(0), x(1), \ldots, x(K-1)]^T$ has length $K$.
29
Application to Communications
Defining $\mathbf{y} = [y(0), y(1), \ldots, y(K+L-2)]^T$ and $\mathbf{w} = [w(0), w(1), \ldots, w(K+L-2)]^T$, we obtain
Symbol estimation model: $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}$, where $\mathbf{H}$ is the $(K+L-1) \times K$ Toeplitz convolution matrix built from the channel:
$$\mathbf{H} = \begin{bmatrix} h(0) & & \\ \vdots & \ddots & \\ h(L-1) & & h(0) \\ & \ddots & \vdots \\ & & h(L-1) \end{bmatrix}$$
Channel estimation model: $\mathbf{y} = \mathbf{X}\mathbf{h} + \mathbf{w}$, where $\mathbf{X}$ is the $(K+L-1) \times L$ Toeplitz convolution matrix built from the symbols:
$$\mathbf{X} = \begin{bmatrix} x(0) & & \\ \vdots & \ddots & \\ x(K-1) & & x(0) \\ & \ddots & \vdots \\ & & x(K-1) \end{bmatrix}$$
30
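The two convolution matrices can be built with a small helper; a sketch (the function name and example sequences are assumptions), which also confirms that the symbol and channel models describe the same received vector:

```python
import numpy as np

def conv_matrix(v, n_cols):
    """Tall Toeplitz matrix whose product with a length-n_cols vector
    equals the full linear convolution with v."""
    L = len(v)
    M = np.zeros((L + n_cols - 1, n_cols))
    for k in range(n_cols):
        M[k:k + L, k] = v          # column k holds v shifted down by k
    return M

h = np.array([1.0, 0.5, 0.25])       # length-L channel (L = 3)
x = np.array([1.0, -1.0, 2.0, 0.5])  # length-K symbols (K = 4)

H = conv_matrix(h, len(x))           # symbol estimation model: y = H x + w
X = conv_matrix(x, len(h))           # channel estimation model: y = X h + w
assert np.allclose(H @ x, np.convolve(h, x))
assert np.allclose(H @ x, X @ h)     # convolution is commutative
```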
Application to Communications
Most communications systems (GSM, UMTS, WLAN, ...) consist of two periods:
Training period: During this period we try to estimate the channel by transmitting some
known symbols, also known as training symbols or pilots.
Data period: During this period we use the estimated channel to recover the unknown
data symbols that convey useful information.
What kind of processing do we use in each of these periods?
During the training period we use one of the previously developed estimation techniques on the channel estimation model, $\mathbf{y} = \mathbf{X}\mathbf{h} + \mathbf{w}$, assuming that $\mathbf{X}$ is known.
During the data period we use one of the previously developed estimation techniques on the symbol estimation model, $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}$, assuming that $\mathbf{H}$ is known.
31
Application to Communications
Channel estimation
Let us assume that $\mathrm{cov}_{\mathbf{w}} = \sigma_w^2 \mathbf{I}$.
BLUE, LSE (or when the noise is Gaussian also the MVU and MLE):
$$\hat{\mathbf{h}} = (\mathbf{X}^H \mathbf{X})^{-1} \mathbf{X}^H \mathbf{y}$$
LMMSE (or when the noise and channel are Gaussian also the MMSE):
$$\hat{\mathbf{h}} = (\mathbf{X}^H \mathbf{X} + \sigma_w^2\, \mathbf{R}_h^{-1})^{-1} \mathbf{X}^H \mathbf{y}$$
Remark: Note that the LMMSE estimator requires the knowledge of $\mathbf{R}_h = E[\mathbf{h}\mathbf{h}^H]$, which is generally not available.
32
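An end-to-end sketch of training-based channel estimation with the LSE/BLUE formula (pilot length, channel length, noise level, and tolerance are all assumed values):

```python
import numpy as np

# Transmit known pilots x, observe y = X h + w with white noise, and apply
# the LSE/BLUE channel estimate h_hat = (X^T X)^-1 X^T y (real-valued case).
rng = np.random.default_rng(6)
L, K = 4, 64
h = rng.standard_normal(L)                 # unknown channel, length L
x = rng.choice([-1.0, 1.0], size=K)        # known BPSK training symbols

X = np.zeros((K + L - 1, L))               # convolution matrix of the pilots
for k in range(L):
    X[k:k + K, k] = x

y = X @ h + 0.05 * rng.standard_normal(K + L - 1)
h_hat = np.linalg.solve(X.T @ X, X.T @ y)  # LSE / BLUE channel estimate
assert np.linalg.norm(h_hat - h) < 0.1     # close to the true channel
```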
Application to Communications
Symbol estimation
Let us assume that $\mathrm{cov}_{\mathbf{w}} = \sigma_w^2 \mathbf{I}$.
BLUE, LSE (or when the noise is Gaussian also the MVU and MLE):
$$\hat{\mathbf{x}} = (\mathbf{H}^H \mathbf{H})^{-1} \mathbf{H}^H \mathbf{y}$$
LMMSE (or when the noise and symbols are Gaussian also the MMSE):
$$\hat{\mathbf{x}} = (\mathbf{H}^H \mathbf{H} + \sigma_w^2\, \mathbf{R}_x^{-1})^{-1} \mathbf{H}^H \mathbf{y}$$
Remark: Note that the LMMSE estimator requires the knowledge of $\mathbf{R}_x = E[\mathbf{x}\mathbf{x}^H]$, which can be set to $\sigma_x^2 \mathbf{I}$ if the data symbols have energy $\sigma_x^2$ and are uncorrelated.
33
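A matching sketch for the data period, applying the LMMSE equalizer with $\sigma_x^2 = 1$ for unit-energy uncorrelated symbols (channel taps, block size, and noise level are assumed values):

```python
import numpy as np

# With the channel known, recover the symbols with the LMMSE equalizer
# x_hat = (H^T H + (s_w^2 / s_x^2) I)^-1 H^T y, using s_x^2 = 1.
rng = np.random.default_rng(7)
L, K, s_w = 3, 32, 0.1
h = np.array([1.0, 0.4, 0.2])                 # known channel
x = rng.choice([-1.0, 1.0], size=K)           # unknown BPSK data symbols

H = np.zeros((K + L - 1, K))                  # convolution matrix of the channel
for k in range(K):
    H[k:k + L, k] = h

y = H @ x + s_w * rng.standard_normal(K + L - 1)
x_hat = np.linalg.solve(H.T @ H + s_w**2 * np.eye(K), H.T @ y)
assert np.all(np.sign(x_hat) == x)            # all symbols detected correctly
```

The regularization term $\sigma_w^2 \mathbf{I}$ keeps the solve well conditioned even when the channel matrix is nearly rank deficient, which is the practical advantage of the LMMSE equalizer over plain LS.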