Yali Wang
August 2011
Abstract
Stochastic filtering is the problem of estimating latent variables from observations using a stochastic model. Since its development in the early 1940s, filtering has been one of the most practical tools in science and engineering. One of the best-known classes of filtering algorithms is the class of Gaussian filters. However, their Gaussian assumption on the posterior distribution severely limits their estimation performance. The Particle Filter is an alternative that approximates an arbitrary posterior distribution within the Sequential Monte Carlo sampling framework. Current studies have shown that the Particle Filter suffers from the weight degeneracy problem caused by sequential importance sampling. Hence I propose new methods based on nonparametric techniques to overcome the drawbacks of the existing filters. More specifically, I will combine different sampling methods with an adaptive mechanism to improve the accuracy. Then I will design a better nonparametric proposal distribution to address the underlying weight degeneracy problem. I will also consider Bayesian nonparametric filtering and the flexibility that it offers; in this context, I will design a general filtering and smoothing framework for Bayesian nonparametric models. Finally, since techniques for state estimation are the foundation of nonlinear control problems, I will incorporate the nonparametric filtering methods into nonlinear controller design and apply them to a real robot system.

Keywords: state estimation, Gaussian filter, Particle Filter, nonparametric methods, nonlinear control.
Table of Contents

1 Introduction
2 Probabilistic Inference in Temporal Dynamic System
  2.1 Modeling for Temporal Dynamic System
  2.2 Inference in Temporal Dynamic System
  2.3 Recursive Bayesian Filtering
3 Background and State of the Art
  3.1 Gaussian Filter
    3.1.1 Kalman Filter
    3.1.2 Extended Kalman Filter
    3.1.3 Gaussian Sum Filter
    3.1.4 Unscented Kalman Filter
    3.1.5 Summary
  3.2 Monte Carlo Sampling Methods
    3.2.1 Monte Carlo Principle
    3.2.2 Importance Sampling
    3.2.3 Markov Chain Monte Carlo
  3.3 Sequential Monte Carlo Estimation: Particle Filter
    3.3.1 Sequential Importance Sampling (SIS)
    3.3.2 Resampling Step
    3.3.3 Sampling Importance Resampling (SIR) Particle Filter
    3.3.4 Discussion on SIR Particle Filters
  3.4 Particle Markov Chain Monte Carlo
    3.4.1 Particle Metropolis-Hastings Sampler
    3.4.2 Discussion on Particle MCMC
4 Research Objectives and Approach
  4.1 Research Objectives
  4.2 Methodologies
    4.2.1 Methodologies for objective 1
      4.2.1.1 Methodologies for subobjective 1a)
      4.2.1.2 Methodologies for subobjective 1b)
    4.2.2 Methodologies for objective 2
    4.2.3 Methodologies for objective 3
      4.2.3.1 Methodologies for subobjective 3a)
      4.2.3.2 Methodologies for subobjective 3b)
    4.2.4 Methodologies for objective 4
    4.2.5 Methodologies for objective 5
    4.2.6 Methodologies for objective 6
    4.2.7 Work Package
5 Current Work and Preliminary Results
  5.1 Adaptive Particle Markov Chain Monte Carlo
    5.1.1 KLD-Sampling Particle Filter
    5.1.2 KLD-Sampling Particle Markov Chain Monte Carlo
    5.1.3 Numerical Illustration
  5.2 Gaussian Process Based Particle Filter
    5.2.1 Gaussian Process Regression
    5.2.2 Particle Filter Based on Gaussian Process
    5.2.3 Numerical Illustration
6 Work Plan and Implementation
  6.1 Work Plan
  6.2 Potential Obstacles
7 Conclusion
1 Introduction
The foundation of understanding a dynamic phenomenon is estimating its true state. The state estimation problem is essentially a filtering problem: the history of the observation sequence is used to estimate the underlying hidden state sequence. It is widely applicable to a large number of real problems in science and engineering, such as economics, biostatistics, stochastic signal processing, and robot localization and navigation. Hence, accurate and efficient state estimation is crucial for analyzing the properties of complicated dynamic systems.
During the past decades, a great number of filtering algorithms have been developed for the state estimation problem. The most classic family is the class of Gaussian filters, which approximate the posterior distribution by a Gaussian distribution. The Kalman Filter [Kalman, 1960] is the best-known optimal filter for linear Gaussian systems. But the Kalman Filter is of mainly theoretical interest, since most real systems are nonlinear and/or non-Gaussian. Hence, to efficiently solve state estimation problems in practice, the Extended Kalman Filter [Jazwinski, 1970] was developed, which linearizes the nonlinear system so that it fits the Kalman Filter framework. But because the Extended Kalman Filter still approximates the posterior distribution by a Gaussian distribution, it performs poorly when the true posterior distribution is highly non-Gaussian. The Gaussian Sum Filter [Daniel et al., 1972] applies a mixture of Gaussian distributions to approximate the posterior distribution, and works like a number of Extended Kalman Filters running in parallel. However, both the Extended Kalman Filter and the Gaussian Sum Filter have to calculate Jacobian matrices at high computational cost. Therefore, the derivative-free Unscented Kalman Filter [Simon et al., 1995; Eric et al., 2000] was designed, which uses deterministic sigma points directly to approximate the posterior density within the Kalman Filter framework. Even though this class of Gaussian filters has seen many applications, the Gaussian assumption deteriorates performance when the true distribution is strongly non-Gaussian.
Monte Carlo sampling methods are alternative algorithms that approximate the non-Gaussian posterior distribution by stochastic sampling. The Sequential Monte Carlo sampling method [Jun et al., 1998; Arnaud et al., 2000; Olivier et al., 2007] is one of the most efficient algorithms for the filtering problem, and the resulting filter is the class of Particle Filters. The Sequential Importance Sampling Particle Filter [Zhe, 2003] is a popular framework for recursively estimating the important region of the posterior distribution. However, the weight degeneracy problem in this algorithm heavily affects the approximation performance, since the variance of the particle weights becomes very large after a few iterations [Zhe, 2003; Caglar et al., 2011; Fred et al., 2011; Samarjit, 2010]. The Sampling Importance Resampling Particle Filter [Arnaud et al., 2000; Deok-Jin, 2005] alleviates the weight degeneracy problem with a resampling step after importance sampling at each time step. But this Particle Filter only reduces the degeneracy problem rather than solving it: by replicating the particles with high importance weights, the resampling step actually introduces high correlation between particles after a few iterations, which is the root cause of why this Particle Filter framework works poorly in high-dimensional systems [Fred et al., 2011].
As mentioned before, several difficulties remain in the filtering problem, especially when the dynamic system is high-dimensional and non-Gaussian. Bayesian nonparametric techniques are a promising strategy for tackling the estimation problem in more general dynamic systems. Therefore, the aim of this proposal is to apply nonparametric techniques to the estimation problem with high accuracy as well as acceptable computational cost. Firstly, we attempt to take advantage of sequential Monte Carlo and MCMC to improve the estimation accuracy by combining them with an adaptive scheme. Secondly, to overcome the underlying weight degeneracy problem in the Particle Filter, we design a good proposal distribution using nonparametric methods. Thirdly, because the dynamic state space model is unknown in the real world, we try to develop a filtering framework for nonparametric dynamic systems so that the proposed algorithms can be applied to a more general state estimation problem. Then, we propose to estimate the states of the nonparametric dynamic model with smoothing rather than filtering, since the performance of smoothing is better [Stuart et al., 2010; Carlos et al., 2010]. Finally, the control problem for nonlinear systems is the ultimate goal [Nadine et al., 2007]. In this context, we plan to incorporate the nonparametric particle filter into a nonlinear controller in order to increase the control robustness [Villiers et al., 2011; Stahl et al., 2011].
Here is the outline of this research proposal. In Chapter 2, probabilistic inference tasks in temporal dynamic systems are briefly introduced with a general state space model; we then derive how to infer the target density using recursive Bayesian estimation for filtering. Chapter 3 covers the foundations of Bayesian filtering theory, starting from the Kalman Filter and moving to the Particle Filter. After illustrating the shortcomings of the existing filters, Chapter 4 proposes my objectives and goals. The current research work following these goals is then shown in Chapter 5. In Chapter 6, I present my work plan and time schedule. Finally, Chapter 7 concludes this proposal.
2 Probabilistic Inference in Temporal Dynamic System
Probabilistic inference involves the following class of problems: given the history of observable information, we attempt to find an efficient method to estimate the present, predict the future, and assess the past. It has a great number of applications in industry, transportation, biomedicine, communication, and so on. For example, suppose the task is to evaluate the blood sugar level of a patient. Using the past blood sugar information, the nurse tries to interpret the present level and predict its change in order to set the future medical plan. However, the blood sugar changes rapidly over time. Hence, before estimating the current and future values, we have to model the dynamic phenomenon first.
2.1 Modeling for Temporal Dynamic System
Without loss of generality, the following dynamic state space model of a nonlinear system is considered for the inference problem:

xt = a(xt−1, wt−1) (2.1)
yt = b(xt, vt) (2.2)

where, at time t, xt is the hidden system state vector, wt is the system noise vector, yt is the observation vector, and vt is the observation noise vector.
In the state equation (2.1), the current state vector xt is generated by a nonlinear function a of the previous state xt−1 and the system noise vector wt−1. In the observation equation (2.2), the current observation vector yt is obtained by a nonlinear function b of the current state xt and the observation noise vector vt. Hence, by assuming probability distributions for the system noise wt and the observation noise vt, the state and observation equations actually represent the transition density p(xt|xt−1) and the likelihood density p(yt|xt) respectively.
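To make the model concrete, the sketch below simulates one trajectory from (2.1)-(2.2) for an assumed scalar choice of a(·) and b(·) with additive Gaussian noise; the functions, horizon, and noise levels are illustrative assumptions, not part of the proposal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar instances of a(.) and b(.) in (2.1)-(2.2),
# with additive Gaussian noise passed in as the second argument.
def a(x_prev, w):
    return 0.5 * x_prev + np.sin(x_prev) + w   # assumed transition function

def b(x, v):
    return 0.25 * x ** 2 + v                   # assumed observation function

T, q, r = 50, 0.1, 0.2                         # horizon and noise std devs (assumed)
x = np.zeros(T)
y = np.zeros(T)
y[0] = b(x[0], rng.normal(0.0, r))
for t in range(1, T):
    x[t] = a(x[t - 1], rng.normal(0.0, q))     # sample from p(x_t | x_{t-1})
    y[t] = b(x[t], rng.normal(0.0, r))         # sample from p(y_t | x_t)
```

Each loop iteration draws from the transition density and then from the likelihood, which is exactly the generative reading of the two equations above.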
Additionally, we assume that the state transition and observation processes are first-order Markovian: the current state vector xt depends only on the previous state vector xt−1, which means that the conditional probability of the current state xt given all the past states x0:t−1 = {x0, x1, · · · , xt−1} is equal to the transition density p(xt|xt−1); and the current observation vector yt is determined only by the current state vector xt, which means that the conditional probability of the current observation yt given all the states x0:t and all the past observations y0:t−1 is equal to the likelihood density p(yt|xt).
According to the Markovian assumption, the state space model is actually the functional form of the hidden Markov model, which consists of the initial density p(x0), the transition density p(xt|xt−1), and the likelihood density p(yt|xt).
[Diagram: the Markov chain · · · → xt−1 → xt → xt+1 → · · · with transitions p(xt|xt−1), where each state xt emits an observation yt with likelihood p(yt|xt).]
Figure 2.1: Hidden Markov Model.
2.2 Inference in Temporal Dynamic System
After setting up the general temporal model, we can classify the inference tasks into four categories [Stuart et al., 2010; Heijden et al., 2004]:

Filtering: evaluate the posterior distribution p(xt|y0:t). Filtering is also called state estimation.

Prediction: evaluate the posterior distribution p(xt+k|y0:t) for k > 0 over a future state.

Smoothing: evaluate the posterior distribution p(xk|y0:t) for 0 ≤ k < t over a past state. Smoothing offers a better estimate of the state than filtering since it incorporates more observation information, but it is an off-line method.

Most likely explanation: find the most likely state sequence given the observations to date. This problem is tackled by the so-called Viterbi algorithm, which is also an off-line technique.
Additionally, there is a modeling task: learning. The goal is to learn the state space model or hidden Markov model from the observation sequence.
Here we mainly focus on the filtering task, for at least two main reasons. Firstly, the states themselves are very important for evaluating real-world dynamic systems. For example, a mobile robot infers where it is by using filtering techniques in order to make further path plans. Secondly, the states are estimated online and treated as feedback to a nonlinear controller in order to improve the transient and steady-state performance of the closed-loop control system. For example, the position and orientation states of a mobile robot are estimated to control the translational and rotational velocity so that the tracking error can be reduced. Therefore, it is important to explore different kinds of filtering algorithms for real engineering applications.
2.3 Recursive Bayesian Filtering
To efficiently calculate the posterior distribution of the hidden state sequence, the first idea is to use Bayes' rule and the Markov assumption to transform p(xt|y0:t) into the following form, which can be computed recursively:

p(xt|y0:t) = p(yt|xt) p(xt|y0:t−1) / p(yt|y0:t−1) (2.3)

where the one-step-ahead predictive density and the normalizing constant are

p(xt|y0:t−1) = ∫ p(xt|xt−1) p(xt−1|y0:t−1) dxt−1 (2.4)

p(yt|y0:t−1) = ∫ p(yt|xt) p(xt|y0:t−1) dxt (2.5)
Equation (2.4) represents a one-step prediction of the next state, and equations (2.3) and (2.5) update that prediction with the new observation.
Generally, the integrations in equations (2.4) and (2.5) determine whether the filtering problem can be solved or not, and these integrations largely depend on the characteristics of the dynamic system, which classify the existing filtering methods. For linear Gaussian systems, all the probability distributions mentioned are multivariate Gaussian distributions obtained by linear transformations in the state-space model. Hence, there exists a closed-form optimal expression for p(xt|y0:t), which is the well-known Kalman Filter. However, for most applications in the real world, the dynamic systems are nonlinear and non-Gaussian, so the computation of equations (2.4) and (2.5) is not feasible. Hence, numerical approximation methods have been widely developed over the past decades. The following chapter introduces several typical filtering algorithms for the state estimation problem in the Bayesian framework.
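For a low-dimensional state, the recursion (2.3)-(2.5) can be approximated directly by discretizing the state on a grid. The sketch below is a minimal grid-based implementation, assuming a scalar Gaussian random-walk transition and a Gaussian likelihood purely for illustration; neither is a model from this proposal.

```python
import numpy as np

# Minimal grid-based evaluation of the recursion (2.3)-(2.5).
grid = np.linspace(-5.0, 5.0, 401)
dx = grid[1] - grid[0]

def gauss(z, mu, sigma):
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# transition kernel p(x_t | x_{t-1}); rows index x_t, columns x_{t-1}
trans = gauss(grid[:, None], grid[None, :], 0.3)

posterior = gauss(grid, 0.0, 1.0)              # initial density p(x_0)
for y in [0.4, 0.7, 1.1]:                      # a few synthetic observations
    prior = trans @ posterior * dx             # eq. (2.4): one-step prediction
    lik = gauss(y, grid, 0.5)                  # likelihood p(y_t | x_t)
    evidence = np.sum(lik * prior) * dx        # eq. (2.5): normalizing constant
    posterior = lik * prior / evidence         # eq. (2.3): Bayes update

post_mean = np.sum(grid * posterior) * dx      # posterior mean estimate
```

The grid approach is exact up to discretization error, but its cost grows exponentially with the state dimension, which is precisely why the Gaussian and particle approximations of the next chapter are needed.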
3 Background and State of the Art
3.1 Gaussian Filter
Stochastic filtering is the problem of estimating latent variables from observable phenomena using a stochastic model. Since its development in the early 1940s, filtering has been one of the most practical tools in science and engineering.
One of the best-known filters is the Kalman Filter [Kalman, 1960], which is an optimal minimum mean squared error estimator for linear Gaussian systems. However, because most dynamic systems in the real world are nonlinear and non-Gaussian, estimation analysis by the Kalman Filter is intractable. In order to solve estimation problems in practice, a great number of extensions have been investigated since the mid-1960s to approximately estimate the hidden states by modifying the Kalman Filter framework. The classic method is the Extended Kalman Filter [Jazwinski, 1970], which linearizes the nonlinear system using a Taylor expansion so that it fits the Kalman Filter framework. Even though it has been successfully used in many nonlinear problems, the Extended Kalman Filter is limited by its assumption that the true posterior density is Gaussian; especially when the true posterior distribution is very different from a Gaussian, the filtering performance of the Extended Kalman Filter is heavily distorted. The Gaussian Sum Filter [Daniel et al., 1972] attempts to use a Gaussian mixture instead of a single Gaussian distribution to approximate an arbitrary posterior distribution, and is effectively designed as a number of Extended Kalman Filters running in parallel. However, because it applies the EKF framework, the Jacobian matrices have to be calculated at each iteration, which greatly increases the computational cost. Therefore, a lot of attention has recently been paid to derivative-free sampling methods, which draw samples from the target distribution to avoid the Jacobian matrix computation. One of the representative deterministic approximation algorithms is the Unscented Kalman Filter [Simon et al., 1995; Eric et al., 2000]. Applying the so-called Unscented Transformation, the Unscented Kalman Filter directly uses sigma points to estimate the latent states within the general Kalman Filter framework.
All the methods mentioned above estimate the posterior distribution as a Gaussian distribution or a Gaussian mixture; hence they form the class of so-called Gaussian filters. In this section, we introduce these Gaussian filters, starting from the basic Kalman Filter.
3.1.1 Kalman Filter
As a simplified version of the general nonlinear system, the linear Gaussian system is represented as follows:
xt = Fxt−1 + wt−1 (3.1)
yt = Hxt + vt (3.2)

where the nonlinear functions a(xt−1, wt−1) and b(xt, vt) of the general model in equations (2.1) and (2.2) are replaced by Fxt−1 + wt−1 and Hxt + vt respectively. F is the constant linear transition matrix and H is the constant observation matrix. Both the system noise wt and the observation noise vt are white noise processes with known covariance, i.e., wt ∼ N(0, Q) and vt ∼ N(0, R), where Q = E[wt wt^T] and R = E[vt vt^T].
Because the Gaussian distribution is closed under linear transformations, the transition probability is p(xt|xt−1) ∼ N(Fxt−1, Q) and the likelihood density is p(yt|xt) ∼ N(Hxt, R). This ensures that the integrations in equations (2.4) and (2.5) produce Gaussian distributions, and therefore, according to equation (2.3), the posterior density in the recursive Bayesian filtering is also Gaussian. This closed-form filter is the well-known Kalman Filter.
The Kalman Filter is composed of an iterative predict-update process [Greg et al., 2001; Thacker et al., 2006; Tristan, 2010; Nathan, 2003]. Here is the general framework.
In the predict step, the estimated state from the previous time step is projected forward to produce a current state estimate, which is also called the a priori estimate because the current observation is not yet taken into account:

– A priori state estimate: x′t = Fxt−1 (3.3)
– A priori covariance estimate: P′t = FPt−1F^T + Q (3.4)

Given the previous estimate xt−1, the Kalman Filter makes a one-step prediction of the current state and propagates the estimated covariance Pt−1 using the transition distribution p(xt|xt−1) ∼ N(Fxt−1, Q) of the state equation.
In the update step, the observation information is combined with the prediction to correct the a priori estimate and obtain the a posteriori estimate:

– Kalman gain: Kt = P′t H^T (HP′t H^T + R)−1 (3.5)
– A posteriori state estimate: xt = x′t + Kt(yt − Hx′t) (3.6)
– A posteriori covariance estimate: Pt = (I − KtH)P′t (3.7)
The a posteriori state estimate xt is expressed as the a priori estimate x′t plus a correction term based on the difference between the predicted observation Hx′t and the actual observation yt. The a posteriori covariance estimate can be written as the a priori covariance P′t minus the covariance correction KtHP′t once the current observation is available. Additionally, the Kalman gain is obtained under the mean squared error measure in order to minimize the covariance Pt. The whole update step illustrates that the estimated state becomes more accurate, with smaller covariance, after incorporating the observation.
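The predict-update cycle (3.3)-(3.7) can be written in a few lines of code. The sketch below assumes a simple two-dimensional constant-velocity model with position-only observations; the model matrices and noise levels are illustrative assumptions, chosen only to make the equations concrete.

```python
import numpy as np

# One predict-update cycle of equations (3.3)-(3.7).
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # transition matrix (constant velocity)
H = np.array([[1.0, 0.0]])          # observation matrix (position only)
Q = 0.01 * np.eye(2)                # system noise covariance
R = np.array([[0.25]])              # observation noise covariance

def kalman_step(x, P, y):
    # Predict: a priori estimates, equations (3.3)-(3.4)
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: Kalman gain and a posteriori estimates, equations (3.5)-(3.7)
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    x_post = x_pred + K @ (y - H @ x_pred)
    P_post = (np.eye(2) - K @ H) @ P_pred
    return x_post, P_post

x, P = np.zeros(2), np.eye(2)
for y in ([0.9], [2.1], [3.0]):     # noisy position measurements (assumed)
    x, P = kalman_step(x, P, np.array(y))
```

Note that the covariance recursion does not depend on the observations, so P can even be precomputed offline for a time-invariant model.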
Even though the Kalman Filter is an optimal algorithm with an elegant recursive form for linear Gaussian systems, it is still limited, since most real systems are nonlinear and/or non-Gaussian. In order to apply the Kalman Filter framework in practice, the so-called Extended Kalman Filter was designed by relaxing the linearity assumption of the Kalman Filter.
3.1.2 Extended Kalman Filter
From the perspective of Bayesian filtering, the integrations in equations (2.4) and (2.5) for real systems cannot be calculated directly as in the Kalman Filter, because the relevant probability distributions of the states and observations are no longer Gaussian. Hence, approximation methods are the essence of the filtering problem for nonlinear and/or non-Gaussian systems. The Extended Kalman Filter is one of the best-known approximate estimators.
Without loss of generality, we consider here a simple class of the general nonlinear system in equations (2.1) and (2.2), with additive Gaussian noise:

xt = f(xt−1) + wt−1 (3.8)
yt = h(xt) + vt (3.9)

where the nonlinear functions a(xt−1, wt−1) and b(xt, vt) of the general model in equations (2.1) and (2.2) are replaced by f(xt−1) + wt−1 and h(xt) + vt respectively. f and h are nonlinear differentiable functions. Both the system noise wt and the observation noise vt are white noise processes with known covariance, i.e., wt ∼ N(0, Q) and vt ∼ N(0, R). Then the transition probability is p(xt|xt−1) ∼ N(f(xt−1), Q) and the likelihood density is p(yt|xt) ∼ N(h(xt), R).
The general idea of the Extended Kalman Filter is to linearize the nonlinear system using a first-order Taylor expansion so that it satisfies the linearity assumption of the Kalman Filter; the state estimation problem can then be solved within the Kalman Filter framework [Shoudong, 2010; Maria, 2004].
Here the state equation is linearized around the previous state estimate x̂t−1, where x̂t−1 denotes the a posteriori estimate at time t − 1:

xt ≈ f(x̂t−1) + ∇f|x̂t−1 (xt−1 − x̂t−1) + wt−1
= ∇f|x̂t−1 xt−1 + f(x̂t−1) − ∇f|x̂t−1 x̂t−1 + wt−1 (3.10)

where ∇f|x̂t−1 is the Jacobian of f evaluated at x̂t−1. We then obtain the linearized state equation

xt ≈ Fxt−1 + Ut−1 + wt−1 (3.11)

where

F = ∇f|x̂t−1 (3.12)
Ut−1 = f(x̂t−1) − ∇f|x̂t−1 x̂t−1 = f(x̂t−1) − Fx̂t−1 (3.13)
Similarly, the observation equation is linearized around the a priori estimate x′t:

yt ≈ h(x′t) + ∇h|x′t (xt − x′t) + vt

where ∇h|x′t is the Jacobian of h evaluated at x′t. The linearized observation equation can then be rearranged into the following form:

yt − h(x′t) + ∇h|x′t x′t ≈ ∇h|x′t xt + vt

Let

H = ∇h|x′t (3.14)
Yt = yt − h(x′t) + Hx′t (3.15)

Then the final linearized observation equation is

Yt ≈ Hxt + vt (3.16)
Now, applying the linearized state space model within the Kalman Filter framework, the Extended Kalman Filter is as follows.

In the predict step, the a priori estimate at time t is

x′t = f(xt−1) (3.17)
P′t = ∇f|xt−1 Pt−1 ∇f^T|xt−1 + Q (3.18)

In the update step, the a posteriori estimate at time t is

xt = x′t + Kt(yt − h(x′t)) (3.19)
Pt = (I − Kt ∇h|x′t) P′t (3.20)

where the Kalman gain Kt is defined as

Kt = P′t ∇h^T|x′t (∇h|x′t P′t ∇h^T|x′t + R)−1 (3.21)
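The recursion (3.17)-(3.21) can be sketched for an assumed scalar model xt = sin(xt−1) + wt−1, yt = x²t + vt, whose Jacobians f′(x) = cos(x) and h′(x) = 2x are available in closed form; the model and noise variances are illustrative assumptions, not taken from this proposal.

```python
import numpy as np

# Sketch of the EKF recursion (3.17)-(3.21) for an assumed scalar model.
Q, R = 0.01, 0.1                     # assumed noise variances

f, fprime = np.sin, np.cos           # transition function and its Jacobian
h = lambda x: x ** 2                 # observation function
hprime = lambda x: 2.0 * x           # its Jacobian

def ekf_step(x, P, y):
    # Predict: equations (3.17)-(3.18)
    x_pred = f(x)
    P_pred = fprime(x) * P * fprime(x) + Q
    # Update: equations (3.19)-(3.21); all quantities are scalars here
    Hj = hprime(x_pred)
    K = P_pred * Hj / (Hj * P_pred * Hj + R)
    x_post = x_pred + K * (y - h(x_pred))
    P_post = (1.0 - K * Hj) * P_pred
    return x_post, P_post

x, P = 0.5, 1.0
for y in [0.3, 0.2, 0.25]:           # assumed observations
    x, P = ekf_step(x, P, y)
```

The only difference from the linear Kalman step is that F and H are re-evaluated at the current estimate on every iteration, which is exactly the Jacobian cost discussed below.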
Since the Extended Kalman Filter relaxes the linearity assumption of the Kalman Filter, the state estimation problem for a class of nonlinear differentiable systems can be solved in the real world. One of the most important applications is robot localization, because the kinematic model and observation model of a mobile robot are known and differentiable, and the elegant update form is suitable for real-time implementation. For example, [Evgeni et al., 2002] applies the Extended Kalman Filter to a lawn mower to localize its position. Additionally, [Yibing et al., 2005] applies the Extended Kalman Filter to real-time freeway traffic state estimation, and speed and rotor position estimation of a brushless DC motor is designed using the Extended Kalman Filter in [Bozo et al., 2001].
However, there are two main drawbacks of the Extended Kalman Filter. Firstly, it keeps the underlying Gaussian assumption; hence, when the true posterior distribution is not close to a Gaussian, its filtering performance is heavily distorted. The other shortcoming is the computation of the Jacobian matrices in the Extended Kalman Filter framework: all the matrices have to be calculated at each iteration, which greatly increases the computational cost.
Therefore, a lot of attention has recently been paid to alleviating these disadvantages in order to improve estimation performance. The first drawback can be addressed with the Gaussian Sum Filter [Daniel et al., 1972], which is designed using a Gaussian mixture rather than the single Gaussian distribution of the Extended Kalman Filter, and the second can be solved by deterministic sampling approximation methods that avoid the Jacobian matrix computation; one of the most representative algorithms is the Unscented Kalman Filter [Simon et al., 1996]. I will now introduce them one by one.
3.1.3 Gaussian Sum Filter
As mentioned above, the Extended Kalman Filter works well when the posterior distribution of the nonlinear system is approximately Gaussian. However, when the true probability distribution is non-Gaussian, for example multimodal, the approximation by the Extended Kalman Filter is poor due to the loss of important information in the true density. For a multimodal distribution, the Extended Kalman Filter works more like a maximum likelihood estimator than a minimum variance estimator [Daniel et al., 1972], which causes the approximated distribution to mistakenly follow just one of the peaks of the true density.
The motivation of the Gaussian Sum Filter is to apply a weighted sum of Gaussian densities to approximate the unknown posterior distribution, due to the fact that any non-Gaussian distribution can be approximated to some degree of precision by a sufficiently large number of Gaussian mixture components [Zhe, 2003].
Considering the same state space model as in the Extended Kalman Filter, the Gaussian Sum Filter approximates the posterior distribution by a Gaussian mixture. The Gaussian sum representation can therefore be viewed as a convex combination of the outputs of Extended Kalman Filters running in parallel, so each Gaussian component in the Gaussian Sum Filter follows the Extended Kalman Filter framework. Assuming that the initial distribution is a Gaussian mixture, we can obtain the approximated posterior distribution recursively by the following Gaussian sum filter [Daniel et al., 1972; Anderson et al., 1979; Jayesh et al., 2003].
In the predict step, suppose p(xt−1|y0:t−1) is expressed as a Gaussian mixture

p(xt−1|y0:t−1) ∼ Σ_{i=1}^{G} w(t−1)i N(x(t−1)i, P(t−1)i) (3.22)

According to equation (2.4), p(xt|y0:t−1) approaches the Gaussian sum

p(xt|y0:t−1) ∼ Σ_{i=1}^{G} wti N(x′ti, P′ti) (3.23)

as P(t−1)i → 0 for i = 1, ..., G, where wti, x′ti and P′ti are calculated using the Extended Kalman Filter prediction framework:

wti = w(t−1)i (3.24)
x′ti = f(x(t−1)i) (3.25)
P′ti = ∇f|x(t−1)i P(t−1)i ∇f^T|x(t−1)i + Q (3.26)
In the update step, with p(xt|y0:t−1) given as the Gaussian sum

p(xt|y0:t−1) ∼ Σ_{i=1}^{G} wti N(x′ti, P′ti) (3.27)

the posterior distribution is approximated as a Gaussian mixture using equation (2.3) after receiving the observation:

p(xt|y0:t) ∼ Σ_{i=1}^{G} wti N(xti, Pti) (3.28)

where xti and Pti are calculated using the Extended Kalman Filter update framework:

xti = x′ti + Kti(yt − h(x′ti)) (3.29)
Pti = (I − Kti ∇h|x′ti) P′ti (3.30)
Kti = P′ti ∇h^T|x′ti (∇h|x′ti P′ti ∇h^T|x′ti + R)−1 (3.31)

and the weights are updated as follows [Anderson et al., 1979]:

wti = wti αti / Σ_{j=1}^{G} wtj αtj (3.32)
αti = N(yt; h(x′ti), ∇h|x′ti P′ti ∇h^T|x′ti + R) (3.33)
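The weight update (3.32)-(3.33) on its own can be sketched as follows: each component weight is multiplied by the predictive likelihood αti of the new observation and the weights are then renormalized. The scalar predictive means and variances below are placeholder assumptions standing in for h(x′ti) and the innovation covariances ∇h|x′ti P′ti ∇h^T|x′ti + R.

```python
import numpy as np

# Component weight update of equations (3.32)-(3.33).
def normal_pdf(y, mean, var):
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def update_weights(w, pred_means, pred_vars, y):
    alpha = normal_pdf(y, pred_means, pred_vars)   # eq. (3.33)
    w_new = w * alpha                              # scale by predictive likelihood
    return w_new / np.sum(w_new)                   # eq. (3.32): renormalize

w = np.array([0.5, 0.5])                 # equal initial weights
pred_means = np.array([-1.0, 1.0])       # assumed predicted observations h(x'_ti)
pred_vars = np.array([0.3, 0.3])         # assumed innovation variances
w = update_weights(w, pred_means, pred_vars, y=0.9)
# the component whose prediction is close to y now carries most of the weight
```

This is the mechanism behind figure 3.3: once the observations consistently favor one mode, that component's weight approaches 1.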
Here, the multi-equilibrium signal process of [Kazufumi et al., 2000] is considered in order to illustrate the efficiency of the Gaussian Sum Filter on a multimodal problem compared with the Extended Kalman Filter. The state space model of this one-dimensional signal process is:

xt = xt−1 + 5Txt−1(1 − x²t−1) + wt−1 (3.34)
yt = T(xt − 0.05)² + vt (3.35)

where the system and observation noise are wt ∼ N(0, b²T) and vt ∼ N(0, d²T) respectively. In this model there are three equilibria, −1, 0 and 1, of which −1 and 1 are stable. Therefore, depending on the initial state and the stochastic noise, the system state eventually settles around −1 or 1.
Here we choose the time interval T = 0.005, the ending time 5, and the initial state x0 = 0, with b = 0.5 and d = 0.1. Additionally, we use two Gaussian components to construct the Gaussian sum filter, with initial distributions N(−1, 2) and N(1, 2). The initial weights of these two components are both 0.5.
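The signal process (3.34)-(3.35) with the stated parameters can be simulated directly; the sketch below does so, with the random seed being an arbitrary assumption, so different noise realizations settle around either −1 or 1.

```python
import numpy as np

# Direct simulation of the signal process (3.34)-(3.35) with the stated
# parameters T = 0.005, b = 0.5, d = 0.1 and x0 = 0. Note that
# w_t ~ N(0, b^2 T), so its standard deviation is b * sqrt(T).
rng = np.random.default_rng(1)           # arbitrary seed (assumption)
T, b, d = 0.005, 0.5, 0.1
steps = int(5 / T)                       # ending time 5

x = np.zeros(steps + 1)
y = np.zeros(steps + 1)
for t in range(1, steps + 1):
    w = rng.normal(0.0, b * np.sqrt(T))
    v = rng.normal(0.0, d * np.sqrt(T))
    x[t] = x[t - 1] + 5 * T * x[t - 1] * (1 - x[t - 1] ** 2) + w
    y[t] = T * (x[t] - 0.05) ** 2 + v
```

Because the drift 5Tx(1 − x²) pushes the state away from the unstable equilibrium 0 and toward ±1, the trajectory ends near one of the two stable equilibria, which is what makes the posterior bimodal and challenging for a single-Gaussian filter.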
In the simulation, the hidden state finally settles around the stable equilibrium 1, and two time points, t = 100T and t = 1000T, are chosen in figure 3.1 to illustrate how the Gaussian sum filter works. The approximated distributions at these time points are shown in figure 3.2, and figure 3.3 presents the evolution of the weights in the Gaussian sum filter.
At t = 100T, the state is around 0 in figure 3.1. At that time point, the two Gaussian components overlap to a large degree in figure 3.2, so their weights are almost equal in figure 3.3. Moreover, due to the fact that the state is clearly around
Figure 3.1: The system state over time
Figure 3.2: Approximated distributions at t = 100T and t = 1000T by the Gaussian Sum Filter
1 at t = 1000T in figure 3.1, the Gaussian component originating from the initial distribution N(1, 2) plays the leading role in figure 3.2, and its weight is almost 1 in figure 3.3.
Finally, in order to illustrate that the Gaussian Sum Filter performs better than the EKF for multimodal distributions, we compare the approximated posterior densities at t = 100T and t = 1000T, shown in figures 3.4 and 3.5. In both figures, the Extended Kalman Filter mistakenly estimates the state as -1. On the contrary, the Gaussian Sum Filter obtains the correct state estimate at both
Figure 3.3: The evolution of the two Gaussian components' weights over time
time points, because it runs multiple Extended Kalman Filters in parallel and combines them through a weighted sum, thereby retaining more information about the distribution.
Figure 3.4: Approximated distributions at t = 100T by the Gaussian Sum Filter and the EKF
The Gaussian Sum Filter successfully applies a Gaussian mixture to approximate the complicated posterior distribution. However, its mechanism is to run a bank of Extended Kalman Filters; consequently, the cost of computing the Jacobian matrices grows very fast, especially when the state space is high-dimensional.
17
0.5
1
1.5
2
2.5
3
x
p(x(1000)|y(0:1000)) by Gaussian Sum Filter p(x(1000)|y(0:1000)) by
EKF
Figure 3.5: Approximated distribution at t = 1000T
As mentioned before, the so-called Unscented Kalman Filter provides a derivative-free algorithm in which the posterior distribution is estimated from a small set of deterministic sigma points.
3.1.4 Unscented Kalman Filter
Even though the Extended Kalman Filter keeps the computationally efficient online form of the Kalman Filter, it suffers several limitations [Simon et al., 1995; Xionga et al., 2006; Banani et al., 2007]. The predicted mean in the Extended Kalman Filter is simply the prior mean mapped through the nonlinear function, which means the Extended Kalman Filter ignores the effect of the noise distribution; as a result, the filtering performance can be highly unstable. What is more, the computation of Jacobian matrices is not trivial in most real applications, especially in high-dimensional systems. Hence, in order to improve the filtering accuracy efficiently, the Unscented Kalman Filter was developed by applying the unscented transformation within the general Kalman Filter framework [Simon et al., 1995, 1996, 2004; Xionga et al., 2006; Banani et al., 2007].
The Unscented Transformation is a deterministic sampling method which uses sigma points with fixed parameters to propagate the statistics of a random variable [Zhe, 2003; Eric et al., 2000]. Consider the following general nonlinear equation:

y = g(x) \quad (3.36)

The goal is to predict the mean \bar{y} and covariance P_y of the random variable y from the mean \bar{x} and covariance P_x of the random variable x. The specific procedure is as follows [Simon et al., 2004; Eric et al., 2000]:
– Step One: form a matrix χ of 2L+1 sigma vectors χ_i:

\chi_0 = \bar{x} \quad (3.37)

\chi_i = \bar{x} + (\sqrt{(L+\lambda)P_x})_i, \quad i = 1, \cdots, L \quad (3.38)

\chi_i = \bar{x} - (\sqrt{(L+\lambda)P_x})_{i-L}, \quad i = L+1, \cdots, 2L \quad (3.39)

where L is the dimension of x, λ = α²(L + κ) − L is a scaling parameter, α is set to a small positive constant, and κ is generally set to 0 or 3 − L.
– Step Two: propagate the 2L+1 sigma vectors χ_i through the nonlinear function:

Y_i = g(\chi_i) \quad (3.40)

and then use the new 2L+1 vectors Y_i to calculate the mean \bar{y} and covariance P_y:

\bar{y} \approx \sum_{i=0}^{2L} W_i^{(m)} Y_i \quad (3.41)

P_y \approx \sum_{i=0}^{2L} W_i^{(c)} (Y_i - \bar{y})(Y_i - \bar{y})^T \quad (3.42)

where the weights are given by:

W_0^{(m)} = \lambda/(L+\lambda) \quad (3.43)

W_0^{(c)} = \lambda/(L+\lambda) + (1 - \alpha^2 + \beta) \quad (3.44)

W_i^{(m)} = W_i^{(c)} = 1/(2(L+\lambda)), \quad i = 1, \cdots, 2L \quad (3.45)

β = 2 is optimal for Gaussian distributions.
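The two steps above can be sketched directly in code. The following is a minimal NumPy illustration (the function name and the sanity check on a linear map are my own choices, not from the text); for a linear g the transform recovers the mean and covariance exactly, which is a convenient way to check the weights:

```python
import numpy as np

def unscented_transform(g, x_mean, P_x, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate the mean/covariance of x through y = g(x) via 2L+1 sigma points."""
    L = x_mean.size
    lam = alpha**2 * (L + kappa) - L
    S = np.linalg.cholesky((L + lam) * P_x)      # matrix square root; columns are (sqrt((L+lam)Px))_i

    # Step one: form the 2L+1 sigma vectors (equations 3.37-3.39)
    sigma = np.vstack([x_mean, x_mean + S.T, x_mean - S.T])

    # weights (equations 3.43-3.45)
    Wm = np.full(2*L + 1, 1.0 / (2*(L + lam)))
    Wc = Wm.copy()
    Wm[0] = lam / (L + lam)
    Wc[0] = lam / (L + lam) + (1 - alpha**2 + beta)

    # Step two: propagate and recombine (equations 3.40-3.42)
    Y = np.array([g(s) for s in sigma])
    y_mean = Wm @ Y
    diff = Y - y_mean
    P_y = (Wc[:, None] * diff).T @ diff
    return y_mean, P_y

# sanity check: for a linear map y = Ax the transform is exact
A = np.array([[1.0, 2.0], [0.0, 1.0]])
m, P = np.array([1.0, -1.0]), np.array([[2.0, 0.3], [0.3, 1.0]])
ym, Py = unscented_transform(lambda x: A @ x, m, P)
```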
Incorporating the Unscented Transformation into the general Kalman Filter framework, a derivative-free Unscented Kalman Filter is obtained, with deterministic sigma points in place of the Jacobian computation. The intuition behind the Unscented Kalman Filter is the following [Simon et al., 1995]: with a fixed number of parameters, it should be easier to approximate a Gaussian distribution than to approximate an arbitrary nonlinear function.
Let's consider the same nonlinear system as for the Extended Kalman Filter, but without any differentiability assumption on the nonlinear functions. The Unscented Kalman Filter is as follows [Eric et al., 2000; Rambabu et al., 2008; Jouni et al., 2007]:
– Initialization:

x_0 \sim N(\bar{x}_0, P_0) \quad (3.46)

– Prediction: form the sigma-point matrix from the previous estimate,

\chi_{t-1} = [\bar{x}_{t-1}, \ \bar{x}_{t-1} + \sqrt{(L+\lambda)P_{t-1}}, \ \bar{x}_{t-1} - \sqrt{(L+\lambda)P_{t-1}}] \quad (3.50)

propagate the sigma points through the system function to obtain \chi^x_{i,t|t-1}, and compute the predicted mean and covariance:

x'_t = \sum_{i=0}^{2L} W_i^{(m)} \chi^x_{i,t|t-1} \quad (3.53)

P'_t = \sum_{i=0}^{2L} W_i^{(c)} (\chi^x_{i,t|t-1} - x'_t)(\chi^x_{i,t|t-1} - x'_t)^T \quad (3.54)

then map the sigma points through the observation function to obtain Y_{i,t|t-1} and the predicted observation:

y'_t = \sum_{i=0}^{2L} W_i^{(m)} Y_{i,t|t-1} \quad (3.56)

– Update:

P_{yy} = \sum_{i=0}^{2L} W_i^{(c)} (Y_{i,t|t-1} - y'_t)(Y_{i,t|t-1} - y'_t)^T \quad (3.57)

P_{xy} = \sum_{i=0}^{2L} W_i^{(c)} (\chi^x_{i,t|t-1} - x'_t)(Y_{i,t|t-1} - y'_t)^T \quad (3.58)

K_t = P_{xy} P_{yy}^{-1} \quad (3.59)

x_t = x'_t + K_t(y_t - y'_t) \quad (3.60)

P_t = P'_t - K_t P_{yy} K_t^T \quad (3.61)
Here we consider the following Univariate Nonstationary Growth Model from [Jouni et al., 2007] as an example to compare the Unscented Kalman Filter and the Extended Kalman Filter:

x_t = 0.5 x_{t-1} + 25 x_{t-1}(1 + x_{t-1}^2)^{-1} + 8\cos(1.2t) + w_{t-1} \quad (3.62)

y_t = 0.05 x_t^2 + v_t \quad (3.63)

where the system and observation noise are w_t ∼ N(0, 1) and v_t ∼ N(0, 1) respectively, and the initial state distribution for both the Extended Kalman Filter and the Unscented Kalman Filter is x_0 ∼ N(0, 1). Additionally, the time interval is 0.01, the terminal time is 0.5, and α = 1, β = 2, κ = 2.
The simulation results in figure 3.6 and table 3.1 show that the state estimated by the Unscented Kalman Filter is closer to the real state than that of the Extended Kalman Filter, because the Unscented Kalman Filter estimates the state from sigma points that account for the noise. However, we should note that the Unscented Kalman Filter is far from perfect, because it belongs to the general Kalman framework and therefore keeps the Gaussian assumption on the posterior distribution.
Figure 3.6: The state estimation by UKF and EKF over time
statistics    Error by UKF    Error by EKF
mean          -1.1881         -6.2913
variance      47.5356         108.4723

Table 3.1: Estimated error statistics
The Unscented Kalman Filter effectively evaluates the posterior distribution through sigma-point propagation, without any of the analytic differentiation required by the Extended Kalman Filter. It captures the posterior mean and covariance to third order in the Taylor expansion, instead of first order for the Extended Kalman Filter [Eric et al., 2000], with the same asymptotic complexity. However, the Unscented Kalman Filter is still a Gaussian filter, which means it approximates the posterior distribution by a Gaussian density. When the true distribution is highly non-Gaussian, the Unscented Kalman Filter can perform extremely poorly.
3.1.5 Summary
Since R. E. Kalman published his distinguished paper providing a recursive solution to the discrete linear filtering problem [Kalman, 1960], the Kalman Filter has been widely used in many kinds of tracking and control applications. By minimizing the mean squared error (MSE), the Kalman Filter efficiently estimates the hidden states of linear Gaussian systems. However, the Kalman Filter is largely a theoretical result, since the linear and Gaussian assumptions are too strong for most dynamic systems in the real world.
Many extensions have been designed for nonlinear and/or non-Gaussian systems during the past decades. The most famous is the Extended Kalman Filter, which linearizes the nonlinear system in order to use the classic Kalman framework. However, the Extended Kalman Filter approximates the true probability density by a single Gaussian distribution, which dramatically deteriorates the
estimation performance. Hence the Gaussian Sum Filter works as a bank of Extended Kalman Filters, applying a weighted sum of Gaussian mixture densities as the posterior approximation for non-Gaussian systems. But the computation of the Jacobian matrices for both the Extended Kalman Filter and the Gaussian Sum Filter is not trivial in practice. Hence, the Unscented Kalman Filter was developed, using the unscented transformation to improve the estimation accuracy efficiently.
One of the primary advantages of this class of Gaussian filters is computational: the update requires time polynomial in the dimensionality of the state space [Thrun et al., 2005]. However, this kind of filter can give rise to arbitrarily poor performance when the target distribution is non-Gaussian, since it keeps the underlying Gaussian assumption on the true posterior distribution.
Hence more and more researchers have turned to strategies that can deal with general nonlinear and non-Gaussian systems. The Monte Carlo methods in the next section have played a leading role, estimating the true posterior distribution with a collection of random samples.
3.2 Monte Carlo Sampling Methods
All the filters mentioned previously are based on the Kalman Filter framework to estimate the hidden state of a nonlinear system. However, the underlying Gaussian assumption limits their application in the real world. In order to deal with more general nonlinear systems that do not put strong constraints on the posterior, Monte Carlo methods have been developed to approximate the probability distribution by producing a set of stochastic samples. In this section, we introduce two popular Monte Carlo methods: Importance Sampling and Markov Chain Monte Carlo. Both of them will be used as the foundation for the class of particle filters and for future work.
3.2.1 Monte Carlo Principle
The core of Monte Carlo methods is to draw a set of samples {x^{(i)}}_{i=1}^N from a target density p(x) defined on a state space. Using these samples, the target density can be approximated by the empirical point-mass function [Christophe et al., 2003]:

\hat{p}(x) = \frac{1}{N} \sum_{i=1}^{N} \delta_{x^{(i)}}(x) \quad (3.64)

where \delta_{x^{(i)}}(x) is the Dirac delta mass located at x^{(i)}. Moreover, a numerical integration problem can be transformed into an expectation, which can also be estimated with these samples:

\hat{E}(f) = \frac{1}{N} \sum_{i=1}^{N} f(x^{(i)}) \longrightarrow E(f) = \int f(x) p(x) dx \quad (3.65)

As N → ∞, the estimate converges to E(f) by the strong law of large numbers.
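The principle in equation (3.65) can be sketched in a few lines. As an illustration (my own choice of target and test function, not from the text), take p(x) = N(0, 1) and f(x) = x², so the true expectation is E[x²] = 1:

```python
import numpy as np

rng = np.random.default_rng(42)

# draw N samples from a target density we can sample directly: p(x) = N(0, 1)
N = 200_000
x = rng.normal(0.0, 1.0, size=N)

# Monte Carlo estimate of E[f] = \int f(x) p(x) dx for f(x) = x^2 (equation 3.65)
f_hat = np.mean(x**2)
print(f_hat)   # converges to E[x^2] = 1 as N grows
```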
However, in general the target distribution p(x) is not a simple combination of standard distributions, which means that it is almost impossible to sample from it directly. For example, if the target distribution p(x) is Gaussian, samples can easily be drawn from it; but if it is a highly non-Gaussian, unknown distribution, we cannot sample from it directly. Importance Sampling techniques are designed to deal with this difficulty: by drawing samples from a proposal distribution, the target distribution can be approximated by a weighted sum.
3.2.2 Importance Sampling
In order to improve computational efficiency, importance sampling aims to draw samples from the important region of the target distribution [Zhe, 2003; Christopher, 2006]. Assuming the support of the proposal distribution q(x) covers the support of the target distribution p(x), let's reconsider the integration problem:

E(f) = \int f(x) p(x) dx = \int f(x) \frac{p(x)}{q(x)} q(x) dx \quad (3.66)

where w(x) = p(x)/q(x) is the importance weight.
Using N i.i.d. samples drawn from q(x), importance sampling provides a weighted-sum estimate of the integral:

\hat{E}(f) = \frac{1}{N} \sum_{i=1}^{N} w(x^{(i)}) f(x^{(i)}) \quad (3.67)

In practice, the normalizing constant of the target distribution is unknown, so the importance weight is only known up to proportionality, w(x) ∝ p(x)/q(x). Generally, we normalize the importance weights to obtain a more practical importance sampling method:

E(f) = \int f(x) p(x) dx = \frac{\int f(x) \frac{p(x)}{q(x)} q(x) dx}{\int \frac{p(x)}{q(x)} q(x) dx} \quad (3.68)

\hat{E}(f) = \sum_{i=1}^{N} \tilde{w}(x^{(i)}) f(x^{(i)}), \quad \tilde{w}(x^{(i)}) = \frac{w(x^{(i)})}{\sum_{j=1}^{N} w(x^{(j)})} \quad (3.69)

where \tilde{w}(x^{(i)}) is the normalized importance weight.
Similarly, the approximated target distribution is obtained by the Monte Carlo principle:

\hat{p}(x) = \sum_{i=1}^{N} \tilde{w}(x^{(i)}) \delta_{x^{(i)}}(x) \quad (3.70)
Below, a simple example illustrates how importance sampling works. The target distribution is the Gaussian mixture p(x) = 0.5 N(-3, 1) + 0.5 N(1, 2^2), and we choose an easy-to-sample proposal distribution q(x) = N(1, 4^2). We then draw 30 samples from the proposal distribution and calculate their corresponding weights. The sampling results are shown in figure 3.7. The circles are the samples drawn from the proposal distribution, which merely provide possible values of the target distribution; we then calculate the weights of these samples (stars). Each weight is proportional to the probability of that value under the target distribution. Hence Importance Sampling uses only the samples from the proposal distribution, together with their weight distribution, to successfully approximate the target distribution.
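The same example can be sketched numerically. The following assumes NumPy and uses many more samples than the 30 shown in the figure, so that the normalized-weight estimate (equation 3.69, with f(x) = x) is reliable; the true mean of the mixture is 0.5(-3) + 0.5(1) = -1:

```python
import numpy as np

rng = np.random.default_rng(7)

def normpdf(x, m, s):
    return np.exp(-(x - m)**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

def p(x):   # target: the Gaussian mixture 0.5 N(-3, 1) + 0.5 N(1, 2^2)
    return 0.5 * normpdf(x, -3.0, 1.0) + 0.5 * normpdf(x, 1.0, 2.0)

def q(x):   # easy-to-sample proposal N(1, 4^2) covering the target's support
    return normpdf(x, 1.0, 4.0)

N = 50_000
x = rng.normal(1.0, 4.0, size=N)   # samples from the proposal q
w = p(x) / q(x)                    # importance weights (equation 3.66)
w_norm = w / w.sum()               # normalized weights (equation 3.69)

mean_est = np.sum(w_norm * x)      # estimate of E[x] under p
print(mean_est)                    # true mean is 0.5*(-3) + 0.5*1 = -1
```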
Figure 3.7: Estimated distribution by Importance Sampling (weight distribution of the samples from q(x))
However, there are two important issues with importance sampling [MacKay, 1998]. Firstly, the proposal distribution is crucial: if it is not a good approximation of the target distribution, the sampling performance will be unacceptable. We would like to obtain samples in the high-probability region of the target distribution, but importance sampling only lets us sample from the high-probability region of the proposal distribution. Hence, if the high-probability region of the proposal coincides with a low-probability region of the target, the sampling process will need a long time and a large sample set to produce samples from the high-probability region of the target. Secondly, in high-dimensional systems, even if we obtain samples in the high-probability region of the target distribution, their importance weights can still vary by large factors, which means the approximation of the target distribution effectively depends on only a few samples. These two issues are the root of the weight degeneracy problem in the typical Particle Filter [Fred et al., 2011].
3.2.3 Markov Chain Monte Carlo
Another popular Monte Carlo method is Markov Chain Monte Carlo (MCMC). It is a sampling strategy in which the samples generated from the proposal distribution form a Markov chain that explores the target state space. Provided this Markov chain is irreducible and aperiodic, its stationary distribution exists and is equal to the target distribution. Here, we briefly introduce two typical MCMC methods: the Metropolis-Hastings algorithm and the Gibbs Sampler [Walsh, 2004].
The Metropolis-Hastings (MH) algorithm is one of the most practical MCMC methods. It draws a candidate sample x* from a proposal distribution q(x*|x) and accepts it with probability α(x*, x) = min{1, p(x*)q(x|x*)/(p(x)q(x*|x))}:
1. Initialization: set i = 0 and draw a starting point x(0)
2. For i = 0, 1, ..., N-1:
– sample x* ∼ q(x*|x(i))
– sample u ∼ U(0, 1)
– if u < α(x*, x(i)) = min{1, p(x*)q(x(i)|x*)/(p(x(i))q(x*|x(i)))}, set x(i+1) = x*
– otherwise, set x(i+1) = x(i)
After setting the starting point x(0), we sample a candidate point x* from q(x*|x(i)) given the current point x(i); the Markov chain then moves to x* with acceptance rate α(x*, x(i)). This step is implemented by the test u < α(x*, x(i)) in the algorithm, where u is a random sample from U(0, 1): if u < α(x*, x(i)), then α(x*, x(i)) is large enough to accept the candidate.
Generally, an independent proposal distribution q(x*|x(i)) = q(x*) is chosen to simplify the Metropolis-Hastings algorithm. However, the choice of proposal distribution is crucial for the sampling performance.
For example, let the target distribution be the Student's t distribution with one degree of freedom (the Cauchy distribution), p(x) = 1/(π(1 + x^2)), and the proposal distribution N(0, σ^2). Using the Metropolis-Hastings algorithm with 10000 samples, we obtain different approximation results for different proposal distributions. In figures 3.8 and 3.9, σ = 1 makes the samples well mixed, so that the approximated distribution is close to the true distribution. On the contrary, σ = 5 in figures 3.10 and 3.11 leads to poorly mixed samples, and the approximated distribution is totally different from the target distribution.
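The independent-proposal MH sampler for this example can be sketched as follows (a minimal illustration assuming NumPy; the function name is mine). The chain should concentrate around the Cauchy median 0:

```python
import numpy as np

rng = np.random.default_rng(1)

def p(x):  # target: Student's t with 1 degree of freedom (Cauchy)
    return 1.0 / (np.pi * (1.0 + x * x))

def mh_independent(sigma, n_samples, x0=0.0):
    """Metropolis-Hastings with independent proposal q(x*) = N(0, sigma^2)."""
    def q(x):
        return np.exp(-x * x / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    x = x0
    chain = np.empty(n_samples)
    for i in range(n_samples):
        x_star = rng.normal(0.0, sigma)                          # candidate
        alpha = min(1.0, p(x_star) * q(x) / (p(x) * q(x_star)))  # acceptance rate
        if rng.uniform() < alpha:
            x = x_star
        chain[i] = x
    return chain

chain = mh_independent(sigma=1.0, n_samples=10_000)
print(np.median(chain))   # the Cauchy median is 0
```

Rerunning with sigma=5.0 reproduces the poorly mixed behaviour described above.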
Additionally, in order to obtain well-mixed samples, the samples used to estimate the distribution are generally drawn only after a burn-in period. This helps ensure that the chain has reached stationarity when using MCMC.
The Gibbs Sampler is another MCMC algorithm. In fact, it is a special case of the Metropolis-Hastings algorithm with acceptance rate α = 1 [Walsh, 2004]. The idea is to sequentially sample n variables from their n univariate conditional distributions; the joint distribution of the n variables' samples then simulates the full joint distribution. The specific algorithm is:
– Initialization: x^{(0)}_{1:n}
– For i = 0, 1, ...:
– sample x^{(i+1)}_1 ∼ p(x_1 | x^{(i)}_2, x^{(i)}_3, \cdots, x^{(i)}_n)
– sample x^{(i+1)}_2 ∼ p(x_2 | x^{(i+1)}_1, x^{(i)}_3, \cdots, x^{(i)}_n)
...
Figure 3.8: MH Sampling with q(x∗) ∼ N(0, 1)
Figure 3.9: Estimated distribution by MH Sampling with q(x*) ∼ N(0, 1)
– sample x^{(i+1)}_n ∼ p(x_n | x^{(i+1)}_1, x^{(i+1)}_2, \cdots, x^{(i+1)}_{n-1})
Consider the following example, in which we draw samples from a bivariate Gaussian distribution N(µ, P) using the Gibbs Sampler with 2000 samples, assuming that

\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad (3.71)
Figure 3.10: MH Sampling with q(x∗) ∼ N(0, 52)
Figure 3.11: Estimated distribution by MH Sampling with q(x*) ∼ N(0, 5^2)
P = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix} \quad (3.72)
Then the conditional distribution of x_1 given x_2 is N(0.5 x_2, 1 - 0.5^2) = N(0.5 x_2, 0.75), and similarly the conditional distribution of x_2 given x_1 is N(0.5 x_1, 0.75). Figure 3.12 shows the path of the Gibbs Sampler over the first 5 iterations, and the 2000 generated samples are shown in figure 3.13. The sampling results in figures 3.14 and 3.15 show that the estimated distributions approximate the true distributions well.
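Since both full conditionals are N(0.5 · other, 0.75), the sampler itself is only a few lines; the following minimal NumPy sketch alternates the two conditional draws and checks the sample correlation against the true value 0.5:

```python
import numpy as np

rng = np.random.default_rng(3)

# Gibbs sampling for a bivariate Gaussian with zero mean, unit variances and
# correlation 0.5: each full conditional is N(0.5 * other, 0.75)
n_samples = 2000
samples = np.empty((n_samples, 2))
x1, x2 = 0.0, 0.0
for i in range(n_samples):
    x1 = rng.normal(0.5 * x2, np.sqrt(0.75))   # sample x1 | x2
    x2 = rng.normal(0.5 * x1, np.sqrt(0.75))   # sample x2 | x1
    samples[i] = x1, x2

corr = np.corrcoef(samples.T)[0, 1]
print(corr)   # approaches the true correlation 0.5
```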
Figure 3.12: Path traversed by the Gibbs Sampler in the first 5 iterations (contour of the joint distribution)
Figure 3.13: The 2000 samples generated by the Gibbs Sampler
MCMC is a general framework for drawing samples from complicated high-dimensional distributions. The Metropolis-Hastings algorithm is simple and universal, but the choice of proposal distribution is crucial because the statistical properties of the Markov chain depend heavily on it [Christophe et al., 2008]. A popular variant of the Metropolis-Hastings algorithm is the Gibbs Sampler, which sequentially samples n variables from n univariate conditional distributions.
Figure 3.14: Estimated distribution of x1 by the Gibbs Sampler
Figure 3.15: Estimated distribution of x2 by the Gibbs Sampler
However, the Gibbs Sampler is still inefficient when highly dependent variables are not generated simultaneously [Andrieu et al., 2008].
Having introduced these Monte Carlo methods, the question is how to apply them to the filtering problem. I will explain this with the Particle Filter and Particle MCMC in the following sections.
3.3 Sequential Monte Carlo Estimation : Particle Filter
Over the past decades, Monte Carlo methods have been considered more robust algorithms for handling the non-Gaussian case within a stochastic sampling framework. For a general filtering problem, p(x_{0:t}|y_{0:t}) is an arbitrary posterior distribution of the state sequence given the observation sequence. If samples x^{(i)}_{0:t} (i = 1, \cdots, N) can be generated from p(x_{0:t}|y_{0:t}), the Monte Carlo approximations are as follows:

\hat{p}(x_{0:t}|y_{0:t}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{x^{(i)}_{0:t}}(x_{0:t}) \quad (3.73)

\hat{E}[g(x_{0:t})] = \frac{1}{N} \sum_{i=1}^{N} g(x^{(i)}_{0:t}) \quad (3.74)
However, it is almost impossible to apply Monte Carlo sampling methods directly for sequential state approximation. Hence, the Monte Carlo approximation is applied iteratively to estimate a subtask at each time step, which is known as the Sequential Monte Carlo sampling procedure [Jun et al., 1998; Arnaud et al., 2000; Olivier et al., 2007]. Using Sequential Monte Carlo techniques, the true non-Gaussian posterior distribution can be approximated by a class of Particle Filters.
One of the most widely used Particle Filters is the Sequential Importance Sampling Particle Filter [Zhe, 2003], which recursively approximates the important region of the posterior distribution. However, this Particle Filter has one serious drawback, the weight degeneracy problem: few weights remain nonzero after several iterations, especially in high-dimensional systems. This means a great number of unimportant particles must be maintained all the time, which leads to inefficient performance [Zhe, 2003; Caglar et al., 2011; Fred et al., 2011; Samarjit, 2010]. In order to reduce the weight degeneracy problem and improve the approximation, a resampling step has been developed and applied after Importance Sampling at each time step. This produces the well-known Sampling Importance Resampling Particle Filter [Arnaud et al., 2000; Deok-Jin, 2005]. But this kind of Particle Filter only alleviates the degeneracy problem rather than solving it, because the resampling step introduces high correlation between particles after a few iterations by replicating the particles with high importance weight at each time step. The weight degeneracy is effectively transformed into a particle diversity problem, which is one of the underlying reasons why this Particle Filter framework gives unsatisfactory estimates in high-dimensional systems [Fred et al., 2011].
In order to compensate for the disadvantages of the Sampling Importance Resampling Particle Filter, a number of variants have been proposed recently. The first is to choose a better proposal distribution that approximates the true posterior distribution as closely as possible. The Extended Kalman Particle Filter and the Unscented Particle Filter use a Gaussian distribution obtained by local linearization as the proposal distribution [Rudolph et al., 2000]; this exploits the current observation to better match the posterior density. Secondly, by inserting Markov Chain Monte Carlo into the resampling step, the MCMC Particle Filter can restore particle diversity [Walter et al., 2001; Francois et al., 2008]. The third improvement is related to dynamic model analysis: by dividing the system into linear and nonlinear parts, the Rao-Blackwellized Particle Filter applies a Kalman Filter to the linear part and a Particle Filter to the nonlinear part in order to improve the computational
efficiency [Thomas et al., 2005; Robert et al., 2007]. Additionally, the KLD-Sampling Particle Filter [Dieter, 2001] adjusts the number of particles efficiently according to the Kullback-Leibler divergence to ensure the approximation accuracy. Last but not least, another direction is to embed the Particle Filter in the MCMC framework to construct the so-called Particle MCMC [Andrieu et al., 2010], which uses the distribution approximated by the Particle Filter as the proposal distribution of MCMC. Even though its computational complexity is relatively high, Particle MCMC efficiently estimates complicated posterior distributions in high-dimensional spaces using only the trivial prior proposal distribution of the Particle Filter. Additionally, a novel particle learning method [Carlos et al., 2010] has been developed for filtering, sequential parameter learning and smoothing in a class of general dynamic systems, with performance better than the Particle Filter and MCMC.
The following subsections introduce the general Particle Filter framework, which typically consists of a Sequential Importance Sampling step and a Resampling step, and illustrate how it works.
3.3.1 Sequential Importance Sampling (SIS)
The idea of SIS is to apply importance sampling to draw N samples with N weights from a factorized form of q(x_{0:t}|y_{0:t}) at each time step [Niclas, 1999; Jun et al., 1998].
Let's reconsider the expectation problem from the importance sampling point of view:

E[g(x_{0:t})] = \int g(x_{0:t}) p(x_{0:t}|y_{0:t}) dx_{0:t} \quad (3.75)

and define the weight:

w_t(x_{0:t}) = \frac{p(x_{0:t}|y_{0:t})}{q(x_{0:t}|y_{0:t})} \quad (3.76)

Then the expectation can be written as a fraction that involves only the weight and the proposal distribution:

E[g(x_{0:t})] = \frac{\int g(x_{0:t}) w_t(x_{0:t}) q(x_{0:t}|y_{0:t}) dx_{0:t}}{\int w_t(x_{0:t}) q(x_{0:t}|y_{0:t}) dx_{0:t}} \quad (3.77)

Suppose now that the proposal distribution can be expressed in the following factorized form:

q(x_{0:t}|y_{0:t}) = q(x_t|x_{0:t-1}, y_{0:t}) q(x_{0:t-1}|y_{0:t-1}) \quad (3.78)

Then N samples x^{(i)}_t at time t can be drawn from q(x_t|x_{0:t-1}, y_{0:t}), and the weights can be calculated recursively:

w_t(x_{0:t}) = \frac{p(y_t|x_t) p(x_t|x_{t-1})}{q(x_t|x_{0:t-1}, y_{0:t})} w_{t-1}(x_{0:t-1}) \quad (3.79)
According to the analysis above, the Monte Carlo approximation of the state-sequence expectation is computed with the normalized weights:

\hat{E}[g(x_{0:t})] = \sum_{i=1}^{N} \tilde{w}_t(x^{(i)}_{0:t}) g(x^{(i)}_{0:t}) \quad (3.80)
The SIS framework is as follows:
– For t = 0, 1, 2, ..., T:
– For i = 1, 2, ..., N: sample x^{(i)}_t ∼ q(x_t|x^{(i)}_{0:t-1}, y_{0:t}) and set x^{(i)}_{0:t} = {x^{(i)}_{0:t-1}, x^{(i)}_t}
– For i = 1, 2, ..., N: calculate the weights by equation (3.79) and then normalize them to \tilde{w}_t(x^{(i)}_{0:t})
Here we use the Univariate Nonstationary Growth Model from [Jouni et al., 2007; Olivier et al., 2007] to illustrate the weight degeneracy problem:

x_t = 0.5 x_{t-1} + 25 x_{t-1}(1 + x_{t-1}^2)^{-1} + 8\cos(1.2t) + w_{t-1} \quad (3.82)

y_t = 0.05 x_t^2 + v_t \quad (3.83)

where the system and observation noise are w_t ∼ N(0, 10) and v_t ∼ N(0, 1) respectively, and the initial state distribution is x_0 ∼ N(0, 10). Additionally, we use N = 50 particles at each time step, the time interval is 0.1, and the terminal time is 5.
With the SIS method, the state estimation result is shown in figure 3.16. The hidden state estimate by SIS is better than those of the Extended Kalman Filter and Unscented Kalman Filter in figure 3.6 when the observation is relatively inaccurate. But SIS is still not good enough, because of the particle degeneracy visible in figure 3.17: only around 10 particles out of 50 have nonzero weights after 50 iterations, which leads to useless computation.
Even though SIS simplifies the estimation process through sequential sampling, it suffers from a very serious problem known as weight degeneracy. The variance of the importance weights increases over time, so that only a few weights remain nonzero after several iterations [Arnaud et al., 2000]; much of the computation is therefore wasted on updating non-contributing weights.
3.3.2 Resampling Step
In order to address the weight degeneracy problem, the intuitive idea is to discard particles with small weights and concentrate on particles with large weights. This can be done in a resampling step. There are many popular resampling methods [Zhe, 2003; Arnaud et al., 2008]; here we briefly introduce the Multinomial Resampling and Systematic Resampling algorithms.
The basic idea of Multinomial Resampling is to draw N samples from the current particles according to their normalized weights, so that the important particles with large weights are propagated, improving performance.
Figure 3.16: State Estimation by SIS over time
Figure 3.17: SIS operation from t = 49T to t = 50T (importance weight update of 50 weighted particles)
Another popular resampling method is Systematic Resampling, which treats the weights as continuous, randomly ordered variables in the interval (0, 1); it is a minimum-variance resampling algorithm [Zhe, 2003].
1. Set i = 1 and c_1 = w^{(1)}_t
2. For i = 2, ..., N: c_i = c_{i-1} + w^{(i)}_t
3. Set k = 1
4. Sample u_1 ∼ U[0, 1/N]
5. For j = 1, 2, ..., N:
– u_j = u_1 + (j - 1)/N
– while u_j > c_k: k = k + 1
– set x^{(j)}_t = x^{(k)}_t and w^{(j)}_t = 1/N
Additionally, from the particle-efficiency perspective, a measure of particle degeneracy is the effective sample size [Zhe, 2003; Deok-Jin, 2005; Haug, 2005]:

N_{eff} \approx \frac{1}{\sum_{i=1}^{N_p} (w^{(i)}_t)^2} \quad (3.84)

If N_{eff} falls below a predefined threshold, such as N_T = 0.5 N_p or 2N_p/3, the resampling step is performed; otherwise, we skip it and move to the next time instant.
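The systematic resampling steps and the effective-sample-size check above can be sketched together as follows (a minimal NumPy illustration; `np.searchsorted` replaces the explicit while-loop over the cumulative weights c_k, and the degenerate weight set is my own toy example):

```python
import numpy as np

rng = np.random.default_rng(5)

def effective_sample_size(w):
    """N_eff ~= 1 / sum_i (w_i)^2 for normalized weights (equation 3.84)."""
    return 1.0 / np.sum(w**2)

def systematic_resample(particles, w):
    """Systematic resampling: one uniform draw, then N evenly spaced thresholds."""
    N = len(w)
    c = np.cumsum(w)                                   # the c_i of steps 1-2
    u = rng.uniform(0.0, 1.0 / N) + np.arange(N) / N   # u_j = u_1 + (j-1)/N
    idx = np.searchsorted(c, u)                        # replaces the while-loop over c_k
    return particles[idx]

# a degenerate weight set: one particle carries most of the mass
particles = np.arange(10, dtype=float)
w = np.full(10, 0.02)
w[3] = 0.82                                            # weights sum to 1

print(effective_sample_size(w))                        # far below N = 10
new_particles = systematic_resample(particles, w)      # particle 3 is replicated many times
```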
3.3.3 Sampling Importance Resampling (SIR) Particle Filter
The Sampling Importance Resampling Particle Filter (SIRPF) is a typical stochastic sampling method for the recursive state estimation problem, combining Importance Sampling and Resampling at each time step to deal with the weight degeneracy problem of SIS.
Generally, a suitable proposal distribution is crucial for approximation performance. The optimal proposal distribution, which minimizes the variance of the importance weights, is given by [Arnaud et al., 2000; Deok-Jin, 2005; Arnaud et al., 2008]:

q(x_t|x_{0:t-1}, y_{0:t}) = p(x_t|x_{0:t-1}, y_{0:t}) \quad (3.85)

However, it is generally infeasible to sample from this proposal distribution. Hence, in order to simplify the SIRPF framework, the prior distribution of the state sequence is chosen as the proposal distribution:

q(x_{0:t}|y_{0:t}) = p(x_{0:t}) = p(x_0) \prod_{k=1}^{t} p(x_k|x_{k-1}) \quad (3.86)

which means

q(x_t|x_{0:t-1}, y_{0:t}) = p(x_t|x_{t-1}) \quad (3.87)

Therefore, N samples x^{(i)}_t at time t can be drawn from p(x_t|x_{t-1}), and the weights are updated recursively:

w_t(x_{0:t}) = \frac{p(y_t|x_t) p(x_t|x_{t-1})}{p(x_t|x_{t-1})} w_{t-1}(x_{0:t-1}) = p(y_t|x_t) w_{t-1}(x_{0:t-1}) \quad (3.88)
The generic SIR Particle Filter framework is as follows [Pierre et al., 2006; Gordon et al., 1993; Michael et al., 2009; Caglar et al., 2011]:
– For t = 0, 1, 2, ..., T:
1. For i = 1, 2, ..., N: sample x^{(i)}_t ∼ p(x_t|x^{(i)}_{t-1}) and set x^{(i)}_{0:t} = {x^{(i)}_{0:t-1}, x^{(i)}_t}
2. For i = 1, 2, ..., N: calculate the weights according to equation (3.88) and then normalize them to \tilde{w}_t(x^{(i)}_{0:t})
3. Resample the current N particles according to their weights to obtain N new particles with equal weights 1/N
Considering the same Univariate Nonstationary Growth Model as in [Jouni et al., 2007; Olivier et al., 2007], all conditions are the same except that a resampling step is added after the importance sampling step at each time. In figure 3.18, the estimate is closer to the true hidden state than with the SIS method, and the degeneracy problem is handled by the resampling step, as shown in figure 3.19.
Figure 3.18: State Estimation by SIR over time
3.3.4 Discussion on SIR Particle Filters
The SIR Particle Filter has been widely used in real nonlinear and non-Gaussian systems; for example, it is used for target tracking in wireless sensor networks [Huiying et al., 2010], mobile robot navigation and localization [Niclas, 1999] and computer vision [Samarjit, 2010].
However, there are several drawbacks [Caglar et al., 2011; Fred et al., 2011; Samarjit, 2010; Jinxia et al., 2010; Peter et al., 2008]. Firstly, the resampling step effectively alleviates the particle degeneracy problem, but at the same time it causes sample diversity impoverishment by copying the important particles with large weights. Secondly, the appeal of the SIR Particle Filter lies in its simplicity, since it uses the transition probability as the proposal distribution at each time step; however, the new information in the observation y_t is not contained in this proposal, which makes the filter sensitive to outliers and prone to blind prediction, especially when the likelihood is relatively narrow. Thirdly, the Particle Filter requires relatively heavy computation, which can grow exponentially with n, the dimension of the state vector. Hence the effective number
Figure 3.19: SIR operation from t = 49T to t = 50T, where IS represents Importance Sampling and RS represents Resampling.
of particles is a crucial factor for computational efficiency, especially in high-dimensional systems.
3.4 Particle Markov Chain Monte Carlo
Sequential Monte Carlo and Markov Chain Monte Carlo are the two
most popular sampling methods for state estimation; however, both
of them perform unsatisfactorily in complicated high-dimensional
dynamic systems. The SIR Particle Filter reduces the weight
degeneracy problem via resampling, but this operation introduces
sample impoverishment. The Metropolis-Hastings algorithm is one of
the best known MCMC algorithms for drawing samples from
high-dimensional distributions, but the quality of the resulting
Markov chain depends heavily on the choice of proposal distribution
[Christophe et al., 2008].
In this context, the recent Particle Markov Chain Monte Carlo
method [Andrieu et al., 2010, 2008] provides a general framework
that incorporates Sequential Monte Carlo into MCMC, combining the
merits of both methods in order to further improve estimation
performance in high-dimensional cases.
3.4.1 Particle Metropolis-Hastings Sampler
Particle MCMC is built on the idea of using the approximate
distribution produced by a Particle Filter as the proposal
distribution in MCMC. It aims to approximate a complicated
high-dimensional posterior distribution via the comparatively
simple low-dimensional proposal distributions used inside the
Particle Filter.
A standard Metropolis-Hastings algorithm for filtering samples a
candidate x*_{0:t} from a proposal distribution q(x_{0:t}|y_{0:t})
and accepts it with rate

α = min{ 1, [ p(x*_{0:t}|y_{0:t}) q(x_{0:t}|y_{0:t}) ] / [ p(x_{0:t}|y_{0:t}) q(x*_{0:t}|y_{0:t}) ] }    (3.89)
Generally, the optimal proposal q_opt(x_{0:t}|y_{0:t}) =
p(x_{0:t}|y_{0:t}) is not available. Hence, it is natural to use a
good approximation of the true posterior as the proposal
distribution in MCMC. The Particle Filter provides a convenient way
to do so, since its approximate distribution is represented by
discrete weighted particles from which samples are easy to draw.
Following the derivation in [Andrieu et al., 2010], the general
Particle Metropolis-Hastings algorithm is as follows:
1. For i = 0, run a Particle Filter, compute the approximate
marginal likelihood p(y_{0:t})(0), and draw a sample
x_{0:t}(0) ∼ p(x_{0:t}|y_{0:t}).
2. For each iteration i ≥ 1:
– run a Particle Filter, compute the approximate marginal
likelihood p(y_{0:t})*, and draw a sample x*_{0:t} ∼ p(x_{0:t}|y_{0:t});
– with rate

α = min{ 1, p(y_{0:t})* / p(y_{0:t})(i−1) }    (3.90)

accept the candidate, setting x_{0:t}(i) = x*_{0:t} and
p(y_{0:t})(i) = p(y_{0:t})*;
– otherwise set

x_{0:t}(i) = x_{0:t}(i−1)    (3.91)
p(y_{0:t})(i) = p(y_{0:t})(i−1)    (3.92)
This algorithm can approximate complicated distributions with high
precision while only requiring the low-dimensional proposal design
inside the Particle Filter.
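The algorithm above relies on a particle filter that returns both an estimate of the marginal likelihood p(y_{0:t}) and one trajectory drawn from the particle approximation. A minimal bootstrap sketch follows; the function names and the linear-Gaussian toy model at the end are my illustrative assumptions, not part of the proposal:

```python
import numpy as np

def bootstrap_filter(y, n_particles, init_sample, transition_sample,
                     log_likelihood, rng):
    """Bootstrap particle filter.

    Returns an estimate of log p(y_{0:T}) and one trajectory drawn
    from the weighted particle approximation of p(x_{0:T} | y_{0:T}).
    """
    x = init_sample(n_particles, rng)        # particles at t = 0
    paths = x[:, None].copy()                # stored trajectories
    log_ml = 0.0                             # running log marginal likelihood
    for t in range(len(y)):
        if t > 0:
            x = transition_sample(x, t, rng)         # propagate dynamics
            paths = np.hstack([paths, x[:, None]])
        logw = log_likelihood(y[t], x, t)            # weight by observation
        m = logw.max()
        w = np.exp(logw - m)
        log_ml += m + np.log(w.mean())       # log p(y_t | y_{0:t-1}) estimate
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)  # resampling
        x, paths = x[idx], paths[idx]
    # After the final resampling the weights are equal, so a uniform
    # draw gives one trajectory from the particle approximation.
    return log_ml, paths[rng.integers(n_particles)]

# Toy linear-Gaussian model, purely for illustration.
rng = np.random.default_rng(1)
y = np.array([0.1, -0.2, 0.3])
logZ, traj = bootstrap_filter(
    y, 500,
    init_sample=lambda n, r: r.normal(0.0, 1.0, n),
    transition_sample=lambda x, t, r: 0.9 * x + r.normal(0.0, 0.5, x.shape),
    log_likelihood=lambda yt, x, t: -0.5 * (yt - x) ** 2,
    rng=rng)
```

Inside Particle Metropolis-Hastings, each MCMC iteration would call this routine once and compare the returned marginal-likelihood estimates in the acceptance rate.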
Consider the nonlinear system from [Andrieu et al., 2010] as an
example:

x_t = x_{t−1}/2 + 25 x_{t−1}/(1 + x_{t−1}²) + 8 cos(1.2t) + w_t    (3.93)
y_t = x_t²/20 + v_t    (3.94)

where the system and observation noise are assumed Gaussian,
w_t ∼ N(0, 20) and v_t ∼ N(0, 20) respectively, and the initial
state distribution is x_0 ∼ N(0, 5). The time interval is set to
0.1 and the terminal time to 5, so the objective is to estimate
p(x_{0:50}|y_{0:50}). The number of MCMC iterations is 1000.
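A short simulation of this benchmark makes the setup concrete; the drift and observation functions below follow the standard form of this example in [Andrieu et al., 2010], and the function name is mine:

```python
import numpy as np

def simulate_benchmark(T=50, seed=0):
    """Simulate the nonlinear benchmark model:
        x_t = x_{t-1}/2 + 25*x_{t-1}/(1 + x_{t-1}^2) + 8*cos(1.2*t) + w_t
        y_t = x_t^2/20 + v_t
    with w_t ~ N(0, 20), v_t ~ N(0, 20) and x_0 ~ N(0, 5).
    """
    rng = np.random.default_rng(seed)
    x = np.empty(T + 1)
    x[0] = rng.normal(0.0, np.sqrt(5.0))                      # x_0 ~ N(0, 5)
    for t in range(1, T + 1):
        drift = (x[t - 1] / 2
                 + 25 * x[t - 1] / (1 + x[t - 1] ** 2)
                 + 8 * np.cos(1.2 * t))
        x[t] = drift + rng.normal(0.0, np.sqrt(20.0))         # process noise w_t
    y = x ** 2 / 20 + rng.normal(0.0, np.sqrt(20.0), T + 1)   # observation noise v_t
    return x, y

states, observations = simulate_benchmark()
```

Note that the squared observation y_t = x_t²/20 loses the sign of x_t, which is what makes the posterior multimodal and hard for Gaussian filters.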
In figure 3.20, the state sequence estimated with Particle MCMC
closely follows the true latent state sequence with relatively
small variance, even though the observations are, on average, quite
noisy. Figure 3.21 shows that the approximate posterior
distribution fits the true hidden state distribution well at
t = 50.
Figure 3.20: State estimation over time given the observations.

Figure 3.21: Approximated posterior distribution of the hidden
state at t = 50.
3.4.2 Discussion on Particle MCMC
Particle MCMC uses the posterior distribution estimated by a
Particle Filter as the proposal distribution in MCMC. It can be
more robust because it is less likely to suffer from the weight
degeneracy problem, one of the best known drawbacks of the Particle
Filter [Andrieu et al., 2010]. In fact, Particle MCMC does not need
the Particle Filter to provide an accurate posterior approximation,
since the density estimated by the particle filter is only used to
return a sample within MCMC.
Lately, some extensions of Particle MCMC have been developed to
further improve its performance for general dynamic systems.
[Gareth et al., 2010] incorporates adaptive MCMC into the Particle
MCMC framework, so that the method can adaptively learn the
important region of the marginal distribution of unknown model
parameters in a general nonlinear dynamic system. [Chopin et al.,
2011] applies Particle MCMC as the MCMC rejuvenation step in
iterated batch importance sampling (IBIS), automatically tuning the
parameters at the same complexity as Particle MCMC under certain
conditions. [Whiteley et al., 2009] designs a new Particle MCMC for
a class of multiple change-point problems. However, there is still
room for improvement in Particle MCMC [Andrieu et al., 2010].
Firstly, other, more efficient Particle Filters or Smoothers would
make better proposals in MCMC. Secondly, parallel-chain or
Population-Based MCMC algorithms could be used to increase sample
diversity.
As this chapter has shown, the current methods for the estimation
problem have several drawbacks. Hence, I propose to apply
nonparametric techniques to overcome them, and then to apply the
proposed methods in real dynamic systems.
4 Research Objectives and Approach
Having pointed out the drawbacks of the existing Bayesian filtering
algorithms, I now state my ultimate objective: to apply
nonparametric techniques to construct a general online estimation
framework that overcomes the shortcomings of current methods, so
that it can be applied in real constrained environments with both
high accuracy and computational efficiency.
4.1 Research Objectives
1. Improving the estimation performance of Particle MCMC
a) Applying an efficient Particle Filter or Smoother to MCMC
b) Implementing MCMCs in parallel to increase the sample
diversity
2. Designing an efficient proposal for Particle Filter using
nonparametric methods
a) Using nonparametric methods to learn the optimal sequential
proposal distribution
b) Solving weight degeneracy problem by nonparametric methods
3. Designing a filtering framework for nonparametric dynamic
system
a) Designing a filtering framework for a dynamic model based on
Gaussian Process
b) Solving the estimation problem for a dynamic model based on
Dirichlet Process
4. Applying the proposed nonparametric filtering methods into
smoothing algorithms
5. Incorporating the designed nonparametric filter framework into
nonlinear control
6. Implementing/validating our algorithms on toy examples and then
on realistic applications
4.2.1 Methodologies for objective 1
Particle MCMC uses the approximate distribution produced by a
Particle Filter as the proposal distribution in MCMC, forming a
general framework for estimation in high-dimensional systems.
Hence, there are two sub-perspectives for improving the performance
of Particle MCMC: one is to design an efficient Particle Filter or
Smoother for MCMC, and the other is to implement MCMC chains in
parallel to increase sample diversity.
4.2.1.1 Methodologies for subobjective 1a)
One of the main reasons why the recent Particle MCMC is
computationally heavy is that the SIR particle filter fixes the
number of particles. This leads to computational waste, especially
when the true posterior distribution changes greatly over time. For
example, at the beginning of global localization, a mobile robot
has high positional uncertainty, and we need a large number of
particles to approximate the posterior distribution accurately
enough to recover its position. However, once the robot knows where
it is, a very small number of particles suffices to track its
position. Hence, if the number of particles is fixed, the
computation is inefficient. The KLD-Sampling Particle Filter
adjusts the particle number according to the Kullback-Leibler
divergence in order to control the approximation quality [Dieter,
2001]. Hence, I plan to incorporate the KLD-Sampling Particle
Filter into the Particle MCMC method to improve its accuracy and
efficiency.
The proposed methodology for the first sub-objective is therefore
to use the KLD-Sampling Particle Filter instead of the SIR Particle
Filter as the proposal distribution in MCMC, improving
computational efficiency.
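The bound that KLD-Sampling uses to pick the particle count can be sketched as follows; k is the number of histogram bins with support, epsilon the allowed KL error, and the formula is the Wilson-Hilferty chi-square approximation from [Dieter, 2001] (the function name and default values are my choices):

```python
import math

def kld_sample_size(k, epsilon=0.05, z_quantile=2.326):
    """Number of particles needed so that, with probability 1 - delta,
    the KL divergence between the particle-based and true
    (discretized) posterior stays below epsilon.

    k is the number of occupied histogram bins; z_quantile is the
    (1 - delta) standard-normal quantile (2.326 for delta = 0.01).
    """
    if k <= 1:
        return 1
    a = 2.0 / (9.0 * (k - 1))
    n = (k - 1) / (2.0 * epsilon) * (1.0 - a + math.sqrt(a) * z_quantile) ** 3
    return int(math.ceil(n))

# Many occupied bins (global localization, high uncertainty) -> many
# particles; few occupied bins (localized robot) -> few particles.
print(kld_sample_size(100), kld_sample_size(5))
```

This is exactly the adaptivity argued for above: the required sample size grows roughly linearly with the support of the posterior, so a localized robot is tracked with a fraction of the particles needed during global localization.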
4.2.1.2 Methodologies for subobjective 1b)
In the basic Particle MCMC, a Particle Filter is run at each MCMC
iteration. This is computationally wasteful, since all but one
particle from the Particle Filter are discarded at each MCMC
iteration [Andrieu et al., 2010].
The possible methodology is to bring parallel MCMC or
Population-Based MCMC into the Particle MCMC framework to take
advantage of all the particles, so that estimation performance
improves through increased particle diversity.
4.2.2 Methodologies for objective 2
The proposal distribution of the Particle Filter is one of the most
crucial factors determining the quality of its estimates. The
following methodologies address the two sub-objectives: one is to
use nonparametric methods to learn the optimal sequential proposal
distribution, and the other is to solve the weight degeneracy
problem with nonparametric methods.
4.2.2.1 Methodologies for subobjective 2a)
The optimal proposal in the weight-update scheme is
p(x_t|x_{t−1}, y_t). However, this distribution is generally
intractable. For simplicity, the typical SIR Particle Filter uses
the transition probability p(x_t|x_{t−1}) as the proposal
distribution. But this ignores the current observation y_t: we
blindly draw the samples from p(x_t|x_{t−1}) without using the
observation, which can severely degrade the estimation result.
For example, the weight update of the SIR Particle Filter in
practice is

w_t(x_{0:t}) = [ p(y_t|x_t) p(x_t|x_{t−1}) / q(x_t|x_{t−1}, y_t) ] w_{t−1}(x_{0:t−1})
             = p(y_t|x_t) w_{t−1}(x_{0:t−1})    (4.1)

Since we resample at each time step, the weights at t−1 are all
equal, so the factor w_{t−1}(x_{0:t−1}) can be dropped. Then

w_t(x_{0:t}) = p(y_t|x_t)    (4.2)
In figure 4.1, the high-probability region of the proposal
distribution q(x_t|x_{t−1}, y_t) = p(x_t|x_{t−1}) coincides with a
low-probability region of the target distribution
p(y_t|x_t) p(x_t|x_{t−1}). Hence the samples we blindly draw from
the proposal distribution do not land in the important region of
the target distribution, and their weight distribution, determined
by the likelihood p(y_t|x_t), cannot approximate the target
distribution well.

Figure 4.1: Importance Sampling at one time step of the Particle
Filter, showing the likelihood distribution p(y_t|x_t), the
transition distribution p(x_t|x_{t−1}), the proposal distribution
q(x_t|x_{t−1}, y_t) = p(x_t|x_{t−1}), the target distribution
p(y_t|x_t) p(x_t|x_{t−1}), the samples from the proposal
distribution, and the resulting weight distribution.

From the likelihood distribution, we can see that its variance is
relatively small, meaning that the current observation describes
the current state accurately. But the proposal distribution does
not take it into account. This is the underlying reason for the
poor estimation performance.
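The mismatch depicted in figure 4.1 is easy to reproduce numerically: draw from a transition prior centered away from a narrow likelihood, weight, and watch the effective sample size collapse. All numbers below are illustrative, and for simplicity only the likelihood factor of the weight is evaluated, as in equation (4.2):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Narrow likelihood p(y_t | x_t), centered at the observation y_t = 1.5.
def log_lik(x, y=1.5, sigma=0.1):
    return -0.5 * ((y - x) / sigma) ** 2

# Blind proposal: transition prior p(x_t | x_{t-1}) centered far from y_t.
blind = rng.normal(-1.0, 0.5, n)
w = np.exp(log_lik(blind) - log_lik(blind).max())
w /= w.sum()
ess_blind = 1.0 / np.sum(w ** 2)        # ESS collapses toward 1

# Informed proposal: samples drawn near the observation keep the
# likelihood-based weights nearly uniform.
informed = rng.normal(1.5, 0.1, n)
w2 = np.exp(log_lik(informed) - log_lik(informed).max())
w2 /= w2.sum()
ess_informed = 1.0 / np.sum(w2 ** 2)

print(ess_blind, ess_informed)          # ess_blind << ess_informed
```

The blind proposal wastes almost all of its particles in a region the likelihood rules out, which is precisely the failure mode a learned, observation-aware proposal is meant to fix.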
Hence, in order to incorporate the current observation and improve
the approximation, the methodology for the first sub-goal is to use
nonparametric methods to learn the optimal proposal distribution.
In [Jonathan et al., 2009; Jonathan, 2011; Marc et al., 2009], a
Gaussian process is used to learn the state space model for the
filtering problem; in effect, they apply Gaussian processes to
learn the transition and likelihood distributions. Following this
line of work, the proposed methodology is to learn the proposal
distribution with a Gaussian process. Since the optimal proposal
distribution at one time step reflects the nonlinear relationship
between [x_{t−1}, y_t] and x_t, learning this distribution is a
standard Gaussian process regression problem. Furthermore, once it
is treated as a regression problem, other regression methods whose
predictions take a probabilistic form could also be considered.
4.2.2.2 Methodologies for subobjective 2b)
Even though sub-objective 2a) would improve the estimation
performance by incorporating the current observation, the modified
Particle Filter still has the weight degeneracy problem. The
resampling step hides this problem by replicating the particles
with large weights, but it causes a loss of particle diversity. In
addition, the resampling step is time-consuming.
Hence, the motivation is to solve the weight degeneracy problem
without a resampling step. Looking back at the original importance
sampling procedure, we find

w_t(x_t) ∝ p(y_t|x_t) p(x_t|y_{1:t−1}) / q(x_t|y_{1:t})    (4.3)

If the weight is not updated sequentially but computed directly in
this form, the weight degeneracy problem can be avoided [Jan,
2011]. Hence the methodology for the second sub-objective is to
approximate the full optimal proposal q(x_t|y_{1:t}) and the
prediction distribution p(x_t|y_{1:t−1}) by Gaussian process
regression or other nonparametric methods.
4.2.3 Methodologies for objective 3
A filtering problem generally assumes that the dynamic state space
model is already known; however, this is rarely the case in the
real world. Hence the goal is to develop a general filtering
framework for unknown dynamic models. The first task is to learn
the unknown dynamic model from a training data set. Since
traditional parametric models use a finite number of parameters,
they suffer from over- or under-fitting when the model complexity
and the amount of data do not match. Hence, more and more attention
has turned to nonparametric models, the two most representative
being the Gaussian process and the Dirichlet process. Both can be
used to build Hidden Markov Models, which allows us to apply
filtering techniques for general estimation.
4.2.3.1 Methodologies for subobjective 3a)
[Jonathan et al., 2009; Jonathan, 2011] use Gaussian processes to
learn the state space model and then combine it with different
kinds of filtering algorithms. The limitation is that they use the
ground-truth states in the training set, which may not be available
in real applications. Therefore, they extended their Gaussian
process Bayesian filter with a modified Gaussian Process Latent
Variable Model [Jonathan et al., 2009, 2011; Lawrence, 2005], which
originates from the Gaussian Process Dynamical Model [Jack et al.,
2008]. Hence the methodology is to learn the optimal proposal for
the Particle Filter using the Gaussian Process Dynamical Model.
4.2.3.2 Methodologies for subobjective 3b)
The Dirichlet process is a stochastic process that can serve as a
Bayesian nonparametric model for training data, especially in
Dirichlet process mixture models. [Caron et al., 2008] uses a
Dirichlet process mixture to model the unknown noise distribution
in a linear Gaussian dynamic system, and then develops an off-line
MCMC method and an online particle filter for the estimation
problem. Additionally, the hierarchical Dirichlet process hidden
Markov model (HDP-HMM) and its variants [Beal et al., 2002; Yee et
al., 2010; Emily, 2009; Emily et al., 2007] have been established
over the past decade, providing a good basis for designing a
general filtering framework with Bayesian nonparametric models.
[Emily et al., 2007] uses the HDP-HMM in a linear Gaussian dynamic
model with unknown control input, with a Kalman filter applied to
the estimation problem. However, filtering for both models
mentioned above is restricted to linear Gaussian systems. Hence the
methodology is to apply the Dirichlet process mixture or HDP-HMM to
general nonlinear dynamic models, and then combine different Monte
Carlo methods (MCMC, Sequential Monte Carlo) to solve the
estimation problem.
4.2.4 Methodologies for objective 4
As mentioned before, if the state space model is unknown, we must
learn it from a training set. In fact, learning relies more on
smoothing than on filtering, because smoothing provides better
estimates of the states in the process [Stuart et al., 2010; Carlos
et al., 2010]. Hence the idea is to apply the previously proposed
nonparametric filtering framework within smoothing algorithms for a
general dynamic system, in order to improve approximation
efficiency.
4.2.5 Methodologies for objective 5
Control is the ultimate task for a general dynamic system after
learning the model and estimating the state [Nadine et al., 2007].
Recent articles [Villiers et al., 2011; Stahl et al., 2011]
successfully incorporate a particle filter into nonlinear model
predictive controller design in order to increase control
robustness.
Hence the methodology is to use the designed nonparametric filter
framework within nonlinear model predictive control to improve
control performance. Additionally, since filtering is a state
estimation method, if the estimated state is used as feedback to
the nonlinear controller, the closed-loop control system can
achieve good transient and steady-state performance.
4.2.6 Methodologies for objective 6
Filtering techniques can be applied in a great number of real
applications, such as vision-based human motion tracking, financial
analysis, weather forecasting, and so on. Hence, the methodology is
to implement the algorithms on toy examples and then on realistic
applications. One of the most important real applications is
autonomous robot localization and navigation, which will be my main
implementation focus.
4.2.7 Work Package
There are three main work packages: formal theory studies, method
evaluation and complexity analysis, and practical application
implementation.
Formal theory studies: the existing filtering techniques perform
unsatisfactorily when the posterior distribution of a dynamic
system is complicated and non-Gaussian. The underlying reason lies
in drawbacks of their theoretical design frameworks. Hence, it is
necessary to study the formal theory in depth, which is the
foundation of my research work.
1. Further study of filtering and smoothing theory to broaden the
design framework for a general dynamic system:
• Further study of Particle MCMC and the KLD-Sampling Particle
Filter to find how to combine them: [Stuart et al., 2010; Zhe,
2003; Arnaud et al., 2000; Olivier et al., 2007; Deok-Jin, 2005;
Arnaud et al., 2008; Haug, 2005; Andrieu et al., 2010, 2008;
Dieter, 2001; Thrun et al., 2005; Carlos et al., 2010; Gareth et
al., 2010; Chopin et al., 2011; Whiteley et al., 2009]
• Review of the mathematical derivation of existing filters and
their respective drawbacks: [Kalman, 1960; Maria, 2004; Daniel et
al., 1972; Zhe, 2003; Simon et al., 2004; Jun et al., 1998; Arnaud
et al., 2000; Olivier et al., 2007; Fred et al., 2011; Jinxia et
al., 2010; Peter et al., 2008; Carlos et al., 2010]
• Study of other novel filtering methods [Choo et al., 2001; Guo et
al., 2006; Andreasen, 2008; Friston, 2008]; these provide further
creative frameworks that could be used in the proposed methods,
since the design generally requires combining different filtering
algorithms.
• Kernel Adaptive Filtering: [Weifeng et al., 2010], to get a
general idea of how nonparametric kernel methods can be applied to
the filtering problem.
• Smoothing techniques: [Stuart et al., 2010; Heijden et al., 2004;
Carlos et al., 2010].
2. Further study on the Nonpara