Queensland University of Technology
CRICOS No. 000213J
Towards Likelihood Free Inference
Tony Pettitt, QUT, Brisbane
Joint work with Rob Reeves
a university for the real world
Outline
1. Some problems with intractable likelihoods.
2. Monte Carlo methods and Inference.
3. Normalizing constant/partition function.
4. Likelihood free Markov chain Monte Carlo.
5. Approximating Hierarchical model
6. Indirect Inference and likelihood free MCMC
7. Conclusions.
Stochastic models (Riley et al, 2003)
Macroparasite within a host.
Juvenile worm grows to adulthood in a cat.
Host fights back with immunity.
Number of Juveniles, Adults and amount of Immunity (all integer), $J(t), A(t), I(t)$, evolve through time according to a Markov process with unknown parameters, eg
Juvenile → Adult: rate of maturation
Immunity changes with time
Juveniles die due to Immunity
Moment closure approximations for the distribution of $(J(t), A(t), I(t))$ are limited to restricted parameter values.
Numerical computation of $\mathrm{pr}(J, A, I)$ limited by small maximum values of J, A, I.
Can simulate process $(J(t), A(t), I(t))$ easily.
Data: J at t=0 and A at t (sacrifice of cat), replicated with several cats
Source: Riley et al, 2003.
Other stochastic process models include
spatial stochastic expansion of species
(Hamilton et al, 2005; Estoup et al, 2004)
birth-death-mutation process for estimating transmission rate from TB genotyping
(Tanaka et al, 2006)
population genetic models, eg coalescent models
(Marjoram et al 2003)
Likelihood free Bayesian MCMC methods are often employed with quite precise priors.
Normalizing constant/partition function problem.
The algebraic form of the distribution for y is known but it is not normalized, eg Ising model
$$p(y \mid \theta) \propto \exp\Big(\theta_0 \sum_i y_i + \theta_1 \sum_{i \sim j} \mathrm{Ind}(y_i = y_j)\Big)$$
For $y_i \in \{-1, 1\}$, $i = 1, \ldots, n$, and $i \sim j$ means neighbours (on a lattice, say). The normalizing constant involves in general a sum over $2^n$ terms.
Write
$$p(y \mid \theta) = \frac{f(y; \theta)}{z(\theta)}, \quad f(y; \theta) \text{ known}, \quad z(\theta) \text{ unknown}$$
N-S and E-W neighbourhood
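To make the cost of the normalizing constant concrete, here is a minimal sketch that evaluates the unnormalized Ising density and sums it over every configuration of a tiny lattice. The function names, the row-major layout and the free (non-wrapping) boundary are assumptions made for illustration, not details taken from the talk.

```python
import itertools
import math


def ising_log_f(y, theta0, theta1, rows, cols):
    """log f(y; theta) = theta0 * sum_i y_i + theta1 * sum_{i~j} Ind(y_i == y_j).

    y is a flat sequence of -1/+1 values in row-major order (an assumed layout).
    Neighbours are N-S and E-W pairs with a free boundary (an assumption).
    """
    field = sum(y)
    agree = 0
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols and y[i] == y[i + 1]:      # E-W neighbour pair
                agree += 1
            if r + 1 < rows and y[i] == y[i + cols]:   # N-S neighbour pair
                agree += 1
    return theta0 * field + theta1 * agree


def brute_force_z(theta0, theta1, rows, cols):
    """z(theta): sum of f(y; theta) over all 2**n configurations (tiny lattices only)."""
    n = rows * cols
    return sum(
        math.exp(ising_log_f(y, theta0, theta1, rows, cols))
        for y in itertools.product((-1, 1), repeat=n)
    )


if __name__ == "__main__":
    # A 3 by 3 lattice already needs 2**9 = 512 terms.
    print(brute_force_z(0.2, 0.1, 3, 3))
```

The same enumeration on a 60 by 60 array would need $2^{3600}$ terms, which is why the sum cannot be computed directly.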
Monte Carlo methods and Inference.
Intractable likelihood, instead use easily simulated values of y.
Simulated method of moments (McFadden, 1989).
Method of estimation: comparing theoretical moments or frequencies with observed moments or frequencies.
Can be implemented using a chi-squared goodness-of-fit statistic, eg Riley et al, 2003. Data: number of adult worms in cat at sacrifice.
Source: Riley et al 2003.
Plot of goodness-of-fit statistic versus parameter. Greedy Monte Carlo. Precision of estimate?
3. Normalizing constant/partition function and MCMC
(half-way to likelihood free inference)
Here we assume (Møller, Pettitt, Reeves and Berthelsen, 2006)
$$p(y \mid \theta) = \frac{f(y; \theta)}{z(\theta)}, \quad f(y; \theta) \text{ known}, \quad z(\theta) = \int f(y; \theta)\, dy \text{ unknown and difficult to find.}$$
Key idea: Importance sample estimate of $z(\theta)/z(\theta')$ given by
Sample $y \sim p(y \mid \theta')$, then $f(y; \theta)/f(y; \theta')$ is an unbiased estimate of $z(\theta)/z(\theta')$.
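A short check of the unbiasedness claim, as a sketch. It writes $\theta'$ for the parameter value at which $y$ is sampled (which of the two values carries the prime is an assumption of this sketch) and assumes $f(y; \theta') > 0$ wherever $f(y; \theta) > 0$:

$$E_{y \sim p(\cdot \mid \theta')}\left[\frac{f(y; \theta)}{f(y; \theta')}\right] = \int \frac{f(y; \theta)}{f(y; \theta')}\, \frac{f(y; \theta')}{z(\theta')}\, dy = \frac{1}{z(\theta')} \int f(y; \theta)\, dy = \frac{z(\theta)}{z(\theta')}$$

Averaging the ratio over many draws of $y$ therefore estimates $z(\theta)/z(\theta')$ without evaluating either normalizing constant.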
Used off-line to estimate $z(\theta)/z(\theta')$ then carry out standard Metropolis-Hastings with interpolation over a grid of $\theta$ values (eg Green and Richardson, 2002, in a Potts model).
Standard Metropolis Hastings: Simulating from target distribution $p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$.
Acceptance ratio for changing $\theta \to \theta'$, proposal $q(\theta' \mid \theta)$:
$$A(\theta' \mid \theta) = \frac{p(y \mid \theta')\, p(\theta')\, q(\theta \mid \theta')}{p(y \mid \theta)\, p(\theta)\, q(\theta' \mid \theta)} = \frac{f(y; \theta')\, p(\theta')\, q(\theta \mid \theta')}{f(y; \theta)\, p(\theta)\, q(\theta' \mid \theta)} \cdot \frac{z(\theta)}{z(\theta')}$$
$\theta'$ accepted with probability $\min\{1, A(\theta' \mid \theta)\}$.
Key Question: Can $z(\theta)/z(\theta')$ be calculated on-line or avoided?
On-line algorithm – single auxiliary variable method.
Introduce auxiliary variable x on same space as y and extend target distribution for the MCMC
$$p(\theta, x \mid y) \propto p(x \mid \theta, y)\, p(y \mid \theta)\, p(\theta).$$
Key Question: How to choose distribution of x so that $z(\theta)$ is removed from $A(\theta' \mid \theta)$?
Now acceptance ratio is $A(\theta', x' \mid \theta, x)$ as a new pair $(\theta', x')$ is proposed.
Proposal $q(\theta' \mid \theta)$ becomes $q(\theta', x' \mid \theta, x)$.
Assume the factorisation
$$q(\theta', x' \mid \theta, x) = q(x' \mid \theta')\, q(\theta' \mid \theta, x)$$
Choose the proposal so that
$$q(x' \mid \theta') = \frac{f(x'; \theta')}{z(\theta')}$$
Then algebra → cancellation of the $z(\theta)$s, and $A(\theta', x' \mid \theta, x)$ does not depend on $z(\theta)$.
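The cancellation can be written out explicitly. This is a sketch of the algebra under the factorisation stated on the slide, with the target $p(\theta, x \mid y) \propto p(x \mid \theta, y)\, f(y; \theta)\, p(\theta) / z(\theta)$ and the proposal $q(\theta', x' \mid \theta, x) = q(\theta' \mid \theta, x)\, f(x'; \theta') / z(\theta')$:

$$A(\theta', x' \mid \theta, x) = \frac{p(x' \mid \theta', y)\, \dfrac{f(y; \theta')}{z(\theta')}\, p(\theta')\; q(\theta \mid \theta', x')\, \dfrac{f(x; \theta)}{z(\theta)}}{p(x \mid \theta, y)\, \dfrac{f(y; \theta)}{z(\theta)}\, p(\theta)\; q(\theta' \mid \theta, x)\, \dfrac{f(x'; \theta')}{z(\theta')}} = \frac{p(x' \mid \theta', y)\, f(y; \theta')\, p(\theta')\, q(\theta \mid \theta', x')\, f(x; \theta)}{p(x \mid \theta, y)\, f(y; \theta)\, p(\theta)\, q(\theta' \mid \theta, x)\, f(x'; \theta')}$$

Each of $z(\theta)$ and $z(\theta')$ appears once in the numerator and once in the denominator, so both drop out, provided $p(x \mid \theta, y)$ itself can be evaluated without them.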
Note: Need perfect or exact simulation from $f(y; \theta)/z(\theta)$ for the proposal.
Key Question: How to choose $p(x \mid \theta, y)$, the auxiliary variable distribution?
The best choice
$$p(x \mid \theta, y) = \frac{f(x; \theta)}{z(\theta)}$$
matches the proposal distribution for x, but then $z(\theta)$ is needed in M-H!!
Choice (i)
Fix $\theta$, say at $\hat\theta(y)$, a good estimate of $\theta$. Then
$$p(x \mid \theta, y) = \frac{f(x; \hat\theta(y))}{z(\hat\theta(y))}$$
so does not depend on $\theta$, only y, and $z(\hat\theta)$ cancels in $A(\theta' \mid \theta)$.
Choice (ii)
$p(x \mid \theta, y)$ = approximation to $f(x; \theta)/z(\theta)$.
Eg Partially ordered Markov mesh model for Ising data
Comment
Both choices can suffer from getting stuck because $p(x \mid \theta, y)$ can be very different from the ideal $f(x; \theta)/z(\theta)$.
Example: Ising Model
60 by 60 array with $\theta_0 = 0.2$, $\theta_1 = 0.1$
Single auxiliary variable method (Møller et al, 2006)
Auxiliary variable is Choice (ii).
Approximation to Ising model: Partially ordered Markov mesh model with same neighbourhood as Ising, DAG with N, W as parents, S, E as children
Run chain 500,000 iterations and thin 1 in 100
Source: Møller et al, 2006
Single auxiliary method tends to get stuck.
Murray et al (2006) offer suggestions involving multiple auxiliary variables
4. Likelihood free MCMC
Single Auxiliary Variable Method as almost Approximate Bayesian Computation (ABC)
We wish to eliminate $f(y; \theta)/z(\theta)$ or equivalently $p(y \mid \theta)$, the likelihood, from the M-H algorithm.
Solution: The distribution of x given y and $\theta$ puts all probability on y, the observed data,
$$p(x \mid \theta, y) = \mathrm{Ind}(x = y)$$
then
$$A(\theta', x' \mid \theta, x) = \frac{p(\theta')\, q(\theta \mid \theta')}{p(\theta)\, q(\theta' \mid \theta)}\, \mathrm{Ind}(x' = y)$$
with $x' \sim f(x'; \theta')/z(\theta')$, the likelihood.
This might work for discrete data, sample size small, and if the proposal $q(\theta' \mid \theta) = q(\theta' \mid \theta, y)$ were a very good approximation to $p(\theta' \mid y)$. If sufficient statistics s(y) exist then $\mathrm{Ind}(s(x') = s(y))$ replaces $\mathrm{Ind}(x' = y)$.
Likelihood free methods, ABC-MCMC
Change of notation: $y_{obs}$ is the observed data (fixed), y is pseudo data or auxiliary data generated from the likelihood $p(y \mid \theta)$.
Instead of $y = y_{obs}$, now have y close to $y_{obs}$ in the sense of statistics s( ), distance
$$d(y, y_{obs}) = \| s(y) - s(y_{obs}) \|.$$
ABC allows $d(y, y_{obs}) \le \varepsilon$ rather than equal to 0
Target distribution for variables $(\theta, y)$
$$p(\theta, y \mid y_{obs}, \varepsilon) \propto p(y \mid \theta)\, p(\theta)\, \mathrm{Ind}(d(y, y_{obs}) \le \varepsilon).$$
Standard M-H with proposals $\theta' \sim q(\theta' \mid \theta, y_{obs})$, $y' \sim p(y' \mid \theta')$
(Marjoram et al 2003; ABC MCMC)
for acceptance of $(\theta, y) \to (\theta', y')$.
Ideally $\varepsilon$ should be small but this leads to very small acceptance probabilities.
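The ABC-MCMC step can be sketched on a toy problem. Everything specific here is an assumption chosen for illustration: a normal model with known standard deviation `sigma`, a normal prior on the mean, the sample mean as summary statistic, and a symmetric random-walk proposal (so the proposal ratio is 1). Because the pseudo data are proposed from the likelihood, the acceptance ratio reduces to the prior ratio times the indicator that the distance is within `eps`.

```python
import math
import random


def abc_mcmc(y_obs, n_iter, eps, prior_sd=10.0, step=0.5, sigma=1.0, seed=0):
    """Toy ABC-MCMC for a normal mean; returns (chain, acceptance rate).

    All modelling choices (normal model, N(0, prior_sd^2) prior, sample-mean
    summary, random-walk proposal) are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(y_obs)
    s_obs = sum(y_obs) / n                      # summary statistic s(y_obs)

    def log_prior(theta):
        return -0.5 * (theta / prior_sd) ** 2

    # Assumed starting value: the observed summary, so the chain starts in a
    # region where the distance constraint can be met.
    theta = s_obs
    chain = []
    accepted = 0
    for _ in range(n_iter):
        theta_new = theta + rng.gauss(0.0, step)                  # symmetric q
        y_new = [rng.gauss(theta_new, sigma) for _ in range(n)]   # pseudo data
        d = abs(sum(y_new) / n - s_obs)                           # d(y', y_obs)
        if d <= eps:
            # The intractable likelihood never appears: only the prior ratio.
            log_a = log_prior(theta_new) - log_prior(theta)
            if rng.random() < math.exp(min(0.0, log_a)):
                theta = theta_new
                accepted += 1
        chain.append(theta)
    return chain, accepted / n_iter


if __name__ == "__main__":
    data_rng = random.Random(42)
    y_obs = [data_rng.gauss(1.0, 1.0) for _ in range(20)]
    for eps in (1.0, 0.3, 0.05):
        chain, rate = abc_mcmc(y_obs, n_iter=5000, eps=eps)
        print(eps, round(rate, 3), round(sum(chain) / len(chain), 3))
```

Running the loop over a few values of `eps` illustrates the trade-off noted on the slide: a smaller tolerance should lower the acceptance rate.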
Issues of implementing Metropolis-Hastings ABC
(a) Tune for $\varepsilon$ to get reasonable acceptance probabilities;
(b) All $(\theta, y)$ satisfying $d(y, y_{obs}) \le \varepsilon$ (hard) accepted with equal probability rather than smoothly weighted by $d(y, y_{obs})$ (soft).
(c) Choose summary statistics carefully if no sufficient statistics
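Tune for $\varepsilon$
A solution is to allow $\varepsilon$ to vary as a parameter (Bortot et al, 2004). The target distribution is
$$p(\theta, y, \varepsilon \mid y_{obs}) \propto p(y \mid \theta)\, p(\theta)\, \mathrm{Ind}(d(y, y_{obs}) \le \varepsilon)\, p(\varepsilon).$$
Run chain and post filter output for small values of $\varepsilon$.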
Beaumont, Zhang and Balding (2002) use kernel smoothing in ABC-MC
Soft Constraint for $d(y, y_{obs})$
(Reeves and Pettitt, 2005)
Key Idea. Replace $\mathrm{Ind}(d(y, y_{obs}) \le \varepsilon)$ by $\exp\{-\tfrac{1}{2\varphi}\, d(y, y_{obs})\}$, a soft constraint, with $\varphi$ replacing $\varepsilon$.
Interpret as an approximate likelihood for $y$, $p(y_{obs} \mid y)$.
Simple case: $y_{obs} \mid y \sim N(y, \varphi)$ with φ known
Approximating Hierarchical model with joint probability
$$p(\theta)\, p(y \mid \theta)\, p(y_{obs} \mid y)$$
Approximating Hierarchical Model
Simple case. Normal model with mean $\theta$, variance $\sigma^2$, sample size $n$.
Sufficient statistic $\bar y_{obs}$,
$$d(y, y_{obs}) = (\bar y - \bar y_{obs})^2 \quad \text{and "likelihood"} \quad \bar y_{obs} \mid \bar y \sim N(\bar y, \varphi)$$
Integrate out pseudo data $y$ from $p(\theta, y, y_{obs})$ to get marginal $p(\theta, y_{obs})$ to obtain
$$p(\theta, \bar y_{obs}) = N(\bar y_{obs};\ \theta,\ \sigma^2/n + \varphi)\, p(\theta)$$
Key idea: Approximation using pseudo data and match to observed data introduces variance inflation in likelihood approximation. Will affect posterior mean (if prior not improper vague) and variance.
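The variance inflation can be illustrated numerically with a conjugate normal prior. This is a sketch under stated assumptions: the symbol `phi` stands for the variance of the soft constraint (the symbol itself is an assumption, since it is not legible in the slides), the prior $N(\text{prior\_mean}, \text{prior\_var})$ is hypothetical, and the likelihood for the observed mean is taken to be $N(\theta, \sigma^2/n + \varphi)$ as in the simple normal case.

```python
def soft_abc_normal_posterior(ybar_obs, n, sigma2, phi, prior_mean, prior_var):
    """Posterior mean and variance of theta when the likelihood for the
    observed mean is N(theta, sigma2 / n + phi) and the prior is normal.

    phi = 0 gives the exact posterior; phi > 0 inflates the likelihood variance.
    """
    like_var = sigma2 / n + phi                      # inflated variance
    post_var = 1.0 / (1.0 / prior_var + 1.0 / like_var)
    post_mean = post_var * (prior_mean / prior_var + ybar_obs / like_var)
    return post_mean, post_var


if __name__ == "__main__":
    for phi in (0.0, 0.1, 0.9):
        print(phi, soft_abc_normal_posterior(2.2, 10, 1.0, phi, 0.0, 1.0))
```

With a proper prior, increasing `phi` pulls the posterior mean toward the prior mean and widens the posterior, which is the effect the slide describes.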
General scheme to overcome variance inflation and bias of posterior
Implement a tempering scheme with $\varphi_1, \ldots, \varphi_k > 0$, and $\hat\theta_j$ is the posterior mean estimated from chain $j$.
Combine estimates using weighted least squares and bias estimated using a quadratic in $\varphi$, say
$$\hat\theta_j = \beta_0 + \beta_1 \varphi_j + \beta_2 \varphi_j^2 + \mathrm{error}_j, \quad j = 1, \ldots, k$$
$\hat\beta_0$ gives combined estimate of posterior mean for $\varphi = 0$
Similarly for posterior variance and quantile estimates using chain $j$
(compare Liu, 2001, MCMC for "indirect models")
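The extrapolation step can be sketched as a weighted least squares fit of a quadratic whose intercept is the estimate at zero tempering. The helper below is a hypothetical, dependency-free illustration; the weights are assumed to be inverse variances of each chain's estimate, which the slides do not specify, and `phis` stands for the tempering values of the $k$ chains.

```python
def wls_quadratic(phis, estimates, weights):
    """Weighted least squares fit of estimate = b0 + b1*phi + b2*phi**2.

    Returns (b0, b1, b2); b0 is the value extrapolated to phi = 0.
    Needs at least three distinct phi values.
    """
    rows = [[1.0, p, p * p] for p in phis]
    # Weighted normal equations (X' W X) b = X' W y, stored as an augmented matrix.
    m = []
    for i in range(3):
        lhs = [sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(3)]
        rhs = sum(w * r[i] * t for r, w, t in zip(rows, weights, estimates))
        m.append(lhs + [rhs])
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (m[i][3] - tail) / m[i][i]
    return tuple(beta)


if __name__ == "__main__":
    phis = [4.0, 3.0, 2.0, 1.0]
    # Made-up posterior means from four tempered chains (illustrative numbers only).
    theta_hats = [1.0 + 0.5 * p + 0.1 * p * p for p in phis]
    weights = [1.0, 2.0, 3.0, 4.0]
    print(wls_quadratic(phis, theta_hats, weights)[0])
```

The same fit could be repeated for posterior variances or quantiles, one regression per summary.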
Parallel Tempering to improve mixing
For chains $j-1$ and $j$, propose swaps of $(\theta, y)$ values to improve mixing.
M-H ratio does not involve intractable likelihood.
Soft constraint improves mixing
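Why the swap ratio is free of the intractable likelihood can be seen in a short sketch. It assumes that chain $j$ targets $p(\theta)\, p(y \mid \theta) \exp\{-d(y, y_{obs})/(2\varphi_j)\}$, where the symbol $\varphi_j$ for the tempering value of chain $j$ and the exact form of the soft constraint are assumptions of this sketch. Writing $d_j = d(y_j, y_{obs})$, a proposed exchange of $(\theta_{j-1}, y_{j-1})$ and $(\theta_j, y_j)$ has Metropolis-Hastings ratio

$$\frac{\exp\{-d_j/(2\varphi_{j-1})\}\, \exp\{-d_{j-1}/(2\varphi_j)\}}{\exp\{-d_{j-1}/(2\varphi_{j-1})\}\, \exp\{-d_j/(2\varphi_j)\}} = \exp\left\{\frac{1}{2}\,(d_{j-1} - d_j)\left(\frac{1}{\varphi_{j-1}} - \frac{1}{\varphi_j}\right)\right\}$$

The factors $p(\theta)\, p(y \mid \theta)$ for both pairs appear in numerator and denominator and cancel, so only the distances and the tempering values are needed.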
Example: Ising Model (continued)
60 by 60 array with $\theta_0 = 0.2$, $\theta_1 = 0.1$
Compare: Exact method, auxiliary variable method, ABC with $\varphi = 50^2, 30^2, 15^2$
with correct sufficient statistics $\sum_i y_i$ and $\sum_{i \sim j} \mathrm{Ind}(y_i = y_j)$
Run chains 500,000 and thin 1 in 100
Indirect Inference
(Gouriéroux et al, 1993)
(also Heggland and Frigessi, 2004)
Observe data $y_{obs}$.
Suppose True model $p_T(y \mid \theta)$ is intractable but Approximating model $p_A(x \mid \psi)$ is tractable, and with the same support.
Can find $\hat\psi(x)$ easily.
Put $y$ for $x$ into model and obtain $\hat\psi(y \mid \theta)$.
Repeat many times for $y$ simulated from $p_T(y \mid \theta)$ to find an accurate value $\hat\psi(\theta)$.
Find $\theta$ so that $\hat\psi(\theta)$ is close to $\hat\psi(y_{obs})$, giving $\hat\theta(y_{obs})$.
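The indirect inference recipe can be sketched end to end on a toy problem. Everything specific below is hypothetical and chosen only so the example runs: the "true" model is a simulator $y_i = \theta |z_i|$ treated as if its likelihood were unavailable, the approximating model is a normal distribution whose fitted parameter is the sample mean, and the match is found by a grid search. A fixed seed is reused across grid points so that the simulated auxiliary estimate varies smoothly with $\theta$.

```python
import random


def simulate_true(theta, n, rng):
    """Hypothetical 'true' simulator: y_i = theta * |z_i| with z_i standard normal."""
    return [theta * abs(rng.gauss(0.0, 1.0)) for _ in range(n)]


def fit_auxiliary(x):
    """Auxiliary estimate for the approximating model: here the sample mean."""
    return sum(x) / len(x)


def binding_function(theta, n_sim, seed):
    """Monte Carlo estimate of the auxiliary parameter implied by theta.

    Reusing the same seed for every theta gives common random numbers.
    """
    rng = random.Random(seed)
    return fit_auxiliary(simulate_true(theta, n_sim, rng))


def indirect_estimate(y_obs, grid, n_sim=5000, seed=123):
    """Grid value of theta whose simulated auxiliary estimate is closest to
    the auxiliary estimate computed from the observed data."""
    aux_obs = fit_auxiliary(y_obs)
    return min(grid, key=lambda t: abs(binding_function(t, n_sim, seed) - aux_obs))


if __name__ == "__main__":
    data_rng = random.Random(7)
    y_obs = simulate_true(2.0, 2000, data_rng)       # data generated at theta = 2
    grid = [0.5 + 0.05 * k for k in range(71)]       # candidate values 0.5 .. 4.0
    print(indirect_estimate(y_obs, grid))
```

A grid search is used only for simplicity; any optimiser over $\theta$ could replace it.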
Hierarchical Model using ideas of Indirect Inference
(Reeves and Pettitt, 2005; P & R, 2006)
Consider the True hierarchical model
$$p(\theta)\, p_T(y_{obs} \mid \theta)$$
and the Approximating hierarchical model
$$p(\theta)\, p_T(y \mid \theta)\, p_A(\psi \mid y)\, p_A(y_{obs} \mid \psi)$$
with
$y$ is pseudo data from $p_T(y \mid \theta)$
$p_T(y \mid \theta)$ being the True intractable likelihood
$p_A(\psi \mid y)$ is an Approximating model posterior distribution
$p_A(y_{obs} \mid \psi)$ is an Approximating likelihood evaluated at the observed data $y_{obs}$
Key point: Marginalising the Approximating HM over random $y$ and $\psi$ should be close to the True HM
A Metropolis Hastings algorithm for Indirect Inference
Can be implemented with the proposal
$$q(\theta', y', \psi' \mid \theta, y, \psi) = q(\theta' \mid \theta)\, p_T(y' \mid \theta')\, p_A(\psi' \mid y')$$
using the idea from Møller et al (2006), and then the MH acceptance probability is given with
$$A(\theta', y', \psi' \mid \theta, y, \psi) = \frac{p_A(y_{obs} \mid \psi')\, q(\theta \mid \theta')\, p(\theta')}{p_A(y_{obs} \mid \psi)\, q(\theta' \mid \theta)\, p(\theta)}$$
which removes the likelihood ratio for the intractable $p_T$ and replaces it by a ratio involving the tractable $p_A$.
Example: Ising Model (continued)
60 by 60 array with $\theta_0 = 0.2$, $\theta_1 = 0.1$
MH Indirect Inference implemented with Approximate Likelihood taken as the POMM with 2 parameters equivalent to Ising model $\theta_0, \theta_1$
20,000 iterations with Approximating posterior found from "side MH chain" with 400 iterations.
No summary statistics required, implied by Approximating Likelihood
Some points
• How could approximate posterior be made more precise?
– Use more parameters in approximating likelihood, the POMM? (Gouriéroux et al (1993), Heggland and Frigessi (2004) discuss this in the frequentist setting)
– More iterations for side chain “exact” calculation of approximate posterior?
• How to choose a good approximating likelihood?
• Relationship to summary statistics approach?
Conclusions
1. For the normalizing constant problem we presented a single on-line M-H algorithm.
2. We linked these ideas to ABC-MCMC and developed a hierarchical model (HM) to approximate the true posterior – showed variance inflation.
3. We showed that the approximating HM could be tempered, swaps made to improve mixing using parallel chains, and the variance inflation effect corrected by smoothing posterior summaries from the tempered chains.
4. We extended indirect inference to an HM to find a way of implementing the Metropolis Hastings algorithm which is likelihood free.
5. We demonstrated the ideas with the Ising/autologistic model.
6. Application to specific examples is on-going and requires refinement of general approaches.
Acknowledgements
Support of the Australian Research Council
Co-authors Rob Reeves, Jesper Møller, Kasper Berthelsen
Discussions with Malcolm Faddy, Gareth Ridall, Chris Glasbey, Grant Hamilton …