MCMC: Particle Theory
By Marc Sobel


Page 1:

MCMC: Particle Theory
By Marc Sobel

Page 2:

Particle Theory: Can we understand it?

Page 3:

The Dynamic Model Setup: Heavy Theory

Y_t = h(t, X_t, W_t),  t = 1, 2, …

The W_t are independent errors. The parameters θ are unobserved and independent of the X's. The X_t are unobserved states, generated independently or by a Markov process; {X_t : t = 1, …} is called the signal process. The Y_t are the observations. The function h is typically unknown.

Page 4:

Particle Filters: Lighter Theory

A more familiar dynamic model setting:

Y_t ~ q(· | X_t, θ);
X_t ~ f(· | X_{t−1}, θ).

Even more familiar (for unknown g, h):

Y_t = g(X_t) + ε_t;
X_t = h(X_{t−1}) + e_t.
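As a concrete instance of the second form, here is a minimal MATLAB sketch that simulates a signal path and its observations, using the linear g, h and noise scales of the model introduced on Page 26 (the length T and the initial draw are illustrative choices):

% Simulate X[t] = h(X[t-1]) + e[t], Y[t] = g(X[t]) + eps[t]
% with h(x) = .5*x + 5, g(x) = 2*x, sd(e) = 3, sd(eps) = 2.
T = 100;
X = zeros(T,1); Y = zeros(T,1);
X(1) = 5 + 3*randn;                  % initial state drawn from N(5,9)
Y(1) = 2*X(1) + 2*randn;
for t = 2:T
    X(t) = .5*X(t-1) + 5 + 3*randn;  % signal (state) transition
    Y(t) = 2*X(t) + 2*randn;         % observation equation
end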

Page 5:

Particle Filter: Goal

The goal in particle filter theory is to simulate (all or part of) the (unobserved) posterior distribution of the signal process {X_t : t = 1, …} = X_{1:t}, i.e., π(X_{1:t} | Y_{1:t}) (as well as additional parameters, if necessary). Specifically, we would like to simulate the signal process at time t (i.e., the posterior distribution of X_{1:t}). The particles referred to above take the form

P = { X_{1:t}^{(1)}, X_{1:t}^{(2)}, …, X_{1:t}^{(k)} }.

These particles are designed to approximate the posterior distribution π(X_{1:t} | Y_{1:t}) through the introduction of appropriate weights (see next).

Page 6:

Particle Filter: Weights

The (normalized) weights W^{(1)}, …, W^{(k)} attached to particles are designed so that the posterior probability π(X_{1:t}^{(i)} | Y_{1:t}) at a particle X^{(i)} is given by the weight W^{(i)} (i = 1, …, k). Thus, for example,

P(X_j = a_j, j = 1, …, t)  =  Σ_{l = 1, …, k : X_j^{(l)} = a_j, j = 1, …, t}  W^{(l)}.

Since weights are always assumed to be normalized, we need only specify them up to a constant of proportionality.
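A small MATLAB illustration of this weighted approximation (the particle values and weights are invented for the example): posterior probabilities and expectations are just weight sums over the relevant particles.

% Weighted particles approximate posterior quantities:
% P(X_t in A | Y_{1:t}) is the total normalized weight of particles in A.
Xp = [0.8 1.9 2.3 3.1 4.0];    % hypothetical particle values at time t
w  = [0.1 0.4 0.2 0.2 0.1];    % normalized weights (sum to 1)
probA    = sum(w(Xp > 2));     % estimate of P(X_t > 2 | Y_{1:t})
postMean = sum(w .* Xp);       % weighted posterior mean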

Page 7:

Convergence

The goal of convergence is that the estimated posterior distribution of the x_n's given observations Y_{1:n},

π̂_t(A) = P̂(X_{0:t} ∈ A | y_{1:t}),

corresponds to what it should be. Typically the posterior distribution is defined via the weights w_t[k]. This induces, using 'K' as a kernel with bandwidth h, the density

π̂_t(x) = [ Σ_k w_t[k] K_h(X_t[k] − x) ] / [ Σ_{k'} K_h(X_t[k'] − x) ].
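A minimal MATLAB sketch of this weighted kernel density estimate (Gaussian kernel; the particle values, weights, and bandwidth are illustrative). Note that, as written on the slide, the estimate is normalized by the kernel sum rather than forced to integrate to one.

% Weighted kernel density estimate of the time-t posterior:
% piHat(x) = sum_k w(k)*K_h(Xp(k)-x) / sum_k K_h(Xp(k)-x)
Xp = [0.8 1.9 2.3 3.1 4.0];            % hypothetical particles
w  = [0.1 0.4 0.2 0.2 0.1];            % normalized weights
h  = 0.5;                               % kernel bandwidth
Kh = @(u) normpdf(u/h)/h;               % scaled Gaussian kernel
x  = linspace(-1, 6, 200);
piHat = zeros(size(x));
for i = 1:numel(x)
    Ki = Kh(Xp - x(i));
    piHat(i) = sum(w .* Ki) / sum(Ki);  % kernel-weighted estimate at x(i)
end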

Page 8:

Convergence (continued)

We'd like the resulting measures to converge:

‖π̂_t − π_t‖ → 0,  or

∫ π̂_t (log π̂_t − log π_t) → 0.

Page 9:

Particle Filters: Bootstrap

Assume that the (Y, X)'s are independent of each other. Then we can build particle filters sequentially by defining the weights by

W_{1:t}^{(k)}  ∝  W_{1:t−1}^{(k)} π(X_t | Y_t)  ∝  W_{1:t−1}^{(k)} π(Y_t | X_t) π(X_t)

for particles formed by X_{1:t}^{(k)} = (X_{1:t−1}^{(k)}, X_t^{(k)}). But by normalization the term W_{1:t−1}^{(k)} drops out. So we can define the weights on the new X_t's by

W_t^{(k)}  ∝  π(X_t | Y_t)  ∝  π(Y_t | X_t) π(X_t).
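A one-step MATLAB sketch of this bootstrap update, using the linear model of Page 26 (the starting particles and the observation value are illustrative):

% One bootstrap step: extend particles by the prior transition,
% then weight by the observation likelihood pi(Y_t | X_t) and resample.
k     = 1000;
Xprev = 5 + 3*randn(1,k);              % particles at time t-1 (illustrative)
Xnew  = .5*Xprev + 5 + 3*randn(1,k);   % prior transition: X_t ~ N(.5x+5, 9)
yt    = 12.0;                          % hypothetical observation at time t
W     = normpdf(yt, 2*Xnew, 2);        % likelihood weights: Y_t ~ N(2x, 4)
W     = W/sum(W);                      % normalize
Xboot = randsample(Xnew, k, true, W);  % resample: equally weighted particles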

Page 10:

Particles PSLAM (continued)

To get the bootstrap particle filter, we assume that the prior distribution of X_t can be based on simulating X's from the prior distribution. This leaves us to define the weights by

W_t^{(l)}  ∝  π(X_t^{(l)} | Y_t)  ∝  π(Y_t | X_t^{(l)})

for X_t's selected from the prior. We might then cull the particles, choosing only those with reasonably large weights.

Page 11:

Particle Selection: Elementary

Shephard and Pitt (in 1994) proposed dividing the particles into homogeneous groups (i.e., based on the values of functions of particles) and resampling from each group separately. This has the advantage that we aren't working with collections of particles which combine apples and oranges.

W^{(l)}  =  π(Y_t | X_{t,k}^{(l)}) / Σ_{l'} π(Y_t | X_{t,k}^{(l')})

Page 12:

A Probabilistic Approach

The following algorithms take a probabilistic approach:

x_t : state of the robot at time t
m : map of the environment
z_{1:t} : sensor inputs from times 1 to t
u_{1:t} : control inputs from times 1 to t

We want p(x_{1:t}, m | z_{1:t}, u_{1:t}).

Page 13:

Full vs. Online SLAM: Full SLAM is an example of a particle filter which is not a bootstrap filter.

Full SLAM calculates the robot state over all time up to time t:

p(x_{1:t}, m | z_{1:t}, u_{1:t}).

Online SLAM calculates the robot state for the current time t:

p(x_t, m | z_{1:t}, u_{1:t}) = ∫ ∫ … ∫ p(x_{1:t}, m | z_{1:t}, u_{1:t}) dx_1 dx_2 … dx_{t−1}.

Page 14:

Weights

We've already said that weights reflect the posterior probability of a particle. In sequential MCMC we construct these weights sequentially. In bootstrap particle analysis we simulate using the prior, but sometimes this doesn't make sense (see examples later). One mechanism is to use another distribution (i.e., one less prone to outliers) to generate particle extensions.

Page 15:

Weights (continued)

Liu (2001) recommends using a proposal distribution q to reconstruct the current posterior: use q to simulate the X's. The weights then take on the form

W_t^{(n)}  =  W_{t−1}^{(n)} · π(X_t | X_{1:t−1}) f(Y_t | X_t) / q(X_t | X_{1:t−1}).
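A MATLAB sketch of this proposal-based weight update, with a heavier-tailed Student-t proposal standing in for q (an illustrative choice; the transition and observation densities follow the linear model of Page 26):

% Extend particles with a proposal q, then correct the weights by
% prior-transition * likelihood / proposal.
k     = 1000;
Wprev = ones(1,k)/k;                        % previous normalized weights
Xprev = 5 + 3*randn(1,k);                   % particles at time t-1 (illustrative)
yt    = 12.0;                               % hypothetical observation
mu    = .5*Xprev + 5;                       % prior transition mean
Xnew  = mu + 3*trnd(4,1,k);                 % q: scaled t(4), less prone to outliers
Wnew  = Wprev .* normpdf(Xnew, mu, 3) ...   % prior transition f(X_t | X_{t-1})
              .* normpdf(yt, 2*Xnew, 2) ... % likelihood f(Y_t | X_t)
              ./ (tpdf((Xnew - mu)/3, 4)/3);% proposal density q(X_t | X_{t-1})
Wnew  = Wnew/sum(Wnew);                     % renormalize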

Page 16:

Kernel Density Estimate-based Particle Filters

Use a kernel (Gaussian or otherwise) as the proposal density q. This puts likelihood weights on the points selected by the prior:

f̂(x_{1:t}) = [ Σ_{l=1}^{k} L(y_{1:t} | x_{1:t}^{(l)}) K_h(x − x_{1:t}^{(l)}) ] / [ Σ_{l=1}^{k} L(y_{1:t} | x_{1:t}^{(l)}) ].

Page 17:

Doucet Particle Filters

Doucet recommends using the kernel K as a proposal density. His weighting becomes:

W_t^{(n)}  =  W_{t−1}^{(n)} · π(X_t | X_{1:t−1}) f(Y_t | X_t) / K(X_t | X_{1:t−1}).

Page 18:

Effective Sample Size

The effective sample size of a weighted distribution is the effective number of unique particles. When it gets too small, we devise a threshold; a particle survives if:

A) its weight is above the threshold, or
B) it survives with probability (w/thresh).

All rejected particles are restarted from time t = 0.

ESS = 1 / Σ_{i=1}^{k} (W_t^{(i)})²   (for normalized weights).
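A MATLAB sketch of the ESS computation and the survival rule above (the weights and the threshold value are illustrative):

% Effective sample size of normalized weights, plus threshold survival.
W   = rand(1,1000);  W = W/sum(W);   % hypothetical normalized weights
ESS = 1/sum(W.^2);                   % effective number of unique particles
thresh = 0.002;                      % illustrative weight threshold
% A particle survives if its weight exceeds the threshold,
% or otherwise with probability w/thresh.
survives = (W > thresh) | (rand(1,1000) < W/thresh);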

Page 19:

Culling or Sampling Particles

We can see that for non-bootstrap particle filters, particles tend to multiply beyond all restriction if no culling is used. Culling (or resampling) removes some (unlikely) particles so they don't multiply too rapidly.

A) Residual resampling: At step t−1, extend particles via x_{1:t}^{(j)} = (x_{1:t−1}^{(j)}, x_t*), with x_t* ~ π(x_t | x_{1:t−1}^{(j)}, y_{1:t}). Reconstruct weights for the new particles, and retain

m^{(j)} = ⌊k w_{1:t}^{(j)}⌋   (j = 1, …, k)

copies of the particle x_{1:t}^{(j)}. Draw the remaining m_0 = k − m^{(1)} − … − m^{(k)} from the particle stream. Residual resampling has the effect of killing particles with small weights and emphasizing particles with large weights.
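A MATLAB sketch of residual resampling over normalized weights (the m_0 residual particles are drawn multinomially from the residual weights):

% Residual resampling (save as residual_resample.m).
% w: row vector of normalized weights. Keeps floor(k*w(j)) copies of
% particle j, then draws the remaining m0 indices with probabilities
% proportional to the residuals k*w - floor(k*w).
function idx = residual_resample(w)
    k   = numel(w);
    m   = floor(k*w);                 % deterministic copies per particle
    m0  = k - sum(m);                 % number of residual draws
    res = k*w - m;                    % residual weights
    idx = repelem(1:k, m);            % deterministic part
    if m0 > 0
        idx = [idx, randsample(1:k, m0, true, res/sum(res))];
    end
end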

Page 20:

Cull Particles (continued)

Thus, a particle with weight .01 in a stream of size 50 is killed.

Project: Should you first 'extend' particles and then resample, or vice versa?

B) Simple resampling: Extend particles as above. Sample particles from the full stream according to the new weights. This has the effect of 'keeping' particles with low weights. Thus, a particle with weight .01 in a stream of size 50 has a .01 chance of being selected, whereas under residual resampling it has a chance of 0.

Page 21:

Resampling (continued)

C) General resampling: Define 'rescaled' probability weights (a^{(1)}, …, a^{(k)}) (usually related to the w_t weights). Choose particles based on these weights and assign the 'new weights' w_t^{(*,j)} = (w_t^{(j)} / a^{(j)}) to the corresponding particles. Rescale the new weights.

D) Effective sample size together with general sampling: Sample until the effective sample size is below a threshold. Accept particles whose weights are bigger than an appropriate weight threshold c (e.g., the weight median). Accept particles whose weights w are below the threshold c with probability (w/c), and otherwise reject them.
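A MATLAB sketch of general resampling (C); the rescaling a = sqrt(w) is an illustrative choice of selection weights:

% General resampling: select particles by rescaled weights a,
% then assign importance-corrected new weights w/a and rescale.
k   = 1000;
w   = rand(1,k);  w = w/sum(w);       % hypothetical normalized weights
a   = sqrt(w);    a = a/sum(a);       % rescaled selection probabilities
idx = randsample(1:k, k, true, a);    % choose particles by the a-weights
wNew = w(idx)./a(idx);                % new weights w/a
wNew = wNew/sum(wNew);                % rescale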

Page 22:

General resampling: We can generalize residual sampling by choosing only large weights to resample from, and then resampling with scaled weights. Suppose some particles have different numbers of continuations than others. In this case we might want to select them with weighted probabilities that take this into account, i.e., make the a's larger for more continuations and smaller for fewer continuations. If there are, e.g., more continuations, once these have been implemented, we reweight by w* = (w/a).

Page 23:

The Central Problem with Particle Filters

At each time stage t we are trying to reconstruct the posterior distribution π(x_{1:t−1}^{(j)} | y_{1:t−1}), and we build future particles on this estimate. Mistakes at each stage amplify mistakes later on. Reconstructing the posterior distribution is typically done using weights. For this reason, there are many algorithms supporting ways to improve the posterior distribution reconstruction:

A) Use kernel density estimators to reconstruct the posterior.
B) Use Metropolis-Hastings to reconstruct the posterior.
C) Divide particles into separate streams, reconstructing the posterior for each one separately.

Page 24:

Pitt-Shephard Particle Theory

Suppose the new X's which are sampled a posteriori are heterogeneous, i.e., they can be divided into homogeneous streams. Call the streams s = 1, …, k. Define, for each stream s, with 'Z_i = s' denoting that X_i is in the s'th stream:

f_s(Y) = (1/n_s) Σ_{i : Z_i = s} L(Y | X_{i,s}),   i = 1, …, n; s = 1, …, k.

We then divide sampling into two parts. First we sample a stream using f_s(Y), s = 1, …, k.

Page 25:

Pitt-Shephard (continued)

Then we sample a particle within the chosen stream using the weights

w_i^{(s)} = f(Y | X_{i,s}) P_s / Σ_{i'} f(Y | X_{i',s}).

Suppose we have very heterogeneous particles, and there is a way to divide them into homogeneous streams. Then this makes sense (e.g., mixture Kalman filter type examples).
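A MATLAB sketch of the two-stage sampling scheme (the stream labels, likelihood values, and number of streams are invented for the illustration):

% Two-stage Pitt-Shephard style sampling: pick a stream by its average
% likelihood, then pick a particle within the stream by its own likelihood.
k  = 3;                                 % number of streams
Z  = randi(k, 1, 300);                  % hypothetical stream labels Z_i
L  = rand(1, 300);                      % hypothetical likelihoods L(Y | X_i)
fs = accumarray(Z', L', [k 1], @mean);  % per-stream average likelihood f_s(Y)
s  = randsample(1:k, 1, true, fs/sum(fs));    % stage 1: sample a stream
in = find(Z == s);                      % particles in the chosen stream
i  = in(randsample(numel(in), 1, true, L(in)/sum(L(in)))); % stage 2: particle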

Page 26:

Example of a Linear Dynamic Model

In what follows we consider the following linear dynamic model:

X_t = .5 X_{t−1} + 5 + V_t,   σ_v² = 9;
Y_t = 2 X_t + W_t,   σ_w² = 4.

We compute the posterior mean E(X_t | Y_{1:t}). The Kalman filter gives us a solution to the posterior distribution. We can check this against the bootstrap and other filters.
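A MATLAB sketch of the exact Kalman recursion for this model, usable as the 'real thing' in the comparisons that follow (the initialization at the prior N(5, 9) is an assumption):

% Kalman filter for X_t = .5*X_{t-1} + 5 + V_t (var 9),
%                   Y_t = 2*X_t + W_t (var 4).
T = 500;  m = 5;  P = 9;                  % assumed prior mean/variance at t = 1
X = zeros(T,1); Y = zeros(T,1); mKF = zeros(T,1);
X(1) = 5 + 3*randn;  Y(1) = 2*X(1) + 2*randn;
for t = 1:T
    if t > 1
        X(t) = .5*X(t-1) + 5 + 3*randn;   % simulate the true signal
        Y(t) = 2*X(t) + 2*randn;          % and its observation
        m = .5*m + 5;  P = .25*P + 9;     % predict step
    end
    K = 2*P/(4*P + 4);                    % Kalman gain for Y = 2X + W
    m = m + K*(Y(t) - 2*m);               % update: E(X_t | Y_{1:t})
    P = (1 - 2*K)*P;                      % posterior variance
    mKF(t) = m;
end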

Page 27:

Comparing bootstrap particles with the real thing: computing Λ_t over 500 time periods.

Page 28:

Histogram of the difference between bootstrap particle and real parameters.

Page 29:

A More Complicated Example of a Dynamic Model

In what follows we consider the following quadratic dynamic models:

X_t = .5 X_{t−1} + 5 + κ_i X_{t−1}² + V_t,   σ_v² = 9;
Y_t = 2 X_t + W_t,   σ_w² = 4;
κ_1 = .005;  κ_2 = .01.

The Kalman filter again gives us stepwise solutions to the posterior distribution. We can check this against the bootstrap and other filters.

Page 30:

Quadratic model for κ_1 = .005: bootstrap versus real parameters.

Page 31:

Absolute difference between the bootstrap particle filter and the real parameter for κ_1 = .005.

Page 32:

Switch to Harder Model

For the second quadratic model, when κ_2 = .01, the bootstrap particle filter breaks down entirely before time t = 50. We switch to residual resampling.

Page 33:

Residual resampling versus real parameters when κ_2 = .01.

Page 34:

Histogram: note how much larger the differences are in this case.

Page 35:

Switch to General Sampling

Now we use residual sampling as long as the effective sample size stays above 500. When it falls below 500, we keep only those particles:

A) with weights above .002, or
B) accepted with probability (w/.002).

Page 36:

General sampling versus real parameters for the quadratic model: coefficient = .01.

Page 37:

Histogram of the differences between general sampling and real parameters: note how small the differences are.

Page 38:

Mixture Kalman Filters: Tracking Models

Define models which have a switching mechanism P(KF_i | KF_j) for going from Kalman filter KF_j to KF_i (i, j = 1, …, d). Updates use the standard Kalman filter for a given model KF_j, and then with probability P(KF_i | KF_j) switch to filter KF_i:

KF_i :  Y_t = a_i X_t + b_i Z_t ;   X_t = c_i X_{t−1} + d_i W_t   (i = 1, …, k).

Page 39:

Kalman Filters

We have the updating formula

P(Y_t | I_t = r) = ∫ P(I_t = r) P(X_t | X_{t−1}, I_t = r) P(Y_t | X_t, I_t = r) dX_t,

which can generate particles consisting of Kalman filters. We can use Pitt-Shephard to sample these models. In effect, we divide the Kalman filters into homogeneous groups, and then sample them as separate streams.

Page 40:

MKF's

Mixture Kalman filters are useful for tracking. The Kalman filters represent different objects at a particular location. The probabilities represent the chance that one object, versus the others, is actually visible, subject to the usual occlusion. The real posterior distribution for MKF's is easy to calculate.

Page 41:

Matlab Code: Part I

% Particle model:
% X[t] = .5*X[t-1] + 5 + 3*V[t]   (state noise sd 3)
% Y[t] = 2*X[t] + 2*W[t]          (observation noise sd 2)
XP = zeros(100,1000);
% t = 1: particles drawn from the prior N(5,9)
X(1,1:1000) = 5 + 3*normrnd(0,1,1,1000);
Y(1,1:1000) = 2*X(1,1:1000) + 2*normrnd(0,1,1,1000);
% likelihood weights (observation sd is 2)
W  = normpdf(Y(1,:), 2*X(1,:), 2*ones(1,1000));
WW = W/sum(W);
XBP(1,:) = randsample(X(1,:), 1000, true, WW);   % bootstrap resample
% exact posterior at t = 1: prior N(5,9), likelihood information 2^2/4 = 1
mm = ((2*Y(1,:)/4) + (5/9))/(1 + (1/9));
sg = sqrt(1/(1 + (1/9)));
XP(1,:) = normrnd(mm, sg*ones(1,1000));          % exact posterior draws

Page 42:

Matlab Code: Part II

for tim=2:100
    for jj=1:1000
        % extend each particle path by the prior transition
        % and simulate its observation
        X(tim,jj) = .5*X(tim-1,jj) + 5 + 3*randn;
        Y(tim,jj) = 2*X(tim,jj) + 2*randn;
    end
    % likelihood weights (observation sd is 2)
    W  = normpdf(Y(tim,:), 2*X(tim,:), 2*ones(1,1000));
    WW = W/sum(W);
    XBP(tim,:) = randsample(X(tim,:), 1000, true, WW);  % bootstrap resample
    % exact one-step posterior of X_t given the previous state and Y_t
    mm = ((2*Y(tim,:)/4) + (5/9) + .5*X(tim-1,:)/9)/(1 + (1/9));
    sg = sqrt(1/(1 + (1/9)));
    XP(tim,:) = normrnd(mm, sg*ones(1,1000));
end