
Queensland University of Technology

CRICOS No. 000213J

Towards Likelihood Free Inference

Tony Pettitt, QUT, Brisbane

[email protected]

Joint work with Rob Reeves


Outline

1. Some problems with intractable likelihoods.

2. Monte Carlo methods and Inference.

3. Normalizing constant/partition function.

4. Likelihood free Markov chain Monte Carlo.

5. Approximating Hierarchical model

6. Indirect Inference and likelihood free MCMC

7. Conclusions.


Stochastic models (Riley et al, 2003)

Macroparasite within a host.

Juvenile worm grows to adulthood in a cat.

Host fights back with immunity.

Numbers of Juveniles, Adults and amount of Immunity (all integer), $J(t), A(t), I(t)$, evolve through time according to a Markov process with unknown parameters, eg

Juvenile → Adult: rate of maturation

Immunity changes with time

Juveniles die due to Immunity

Moment closure approximations for the distribution of $(J(t), A(t), I(t))$ are limited to restricted parameter values.


Numerical computation of $pr(J, A, I)$ is limited by small maximum values of J, A, I.

Can simulate the process $(J(t), A(t), I(t))$ easily.

Data: J at t=0 and A at t (sacrifice of cat), replicated with several cats.

Source: Riley et al, 2003.
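The remark that the process can be simulated easily can be illustrated with a Gillespie-style sketch. This is not the Riley et al (2003) model: the four event types, the rate forms and all rate values (`mature`, `immune_gain`, `immune_loss`, `kill`) are hypothetical choices made only to show how a forward simulation of an integer-valued Markov process of this kind might look.

```python
import random

def simulate_worms(j0, t_end, mature=0.1, immune_gain=0.05,
                   immune_loss=0.02, kill=0.01, seed=0):
    """Simulate a toy (J, A, I) jump process up to time t_end.

    All rates are illustrative assumptions, not fitted values.
    """
    rng = random.Random(seed)
    j, a, i, t = j0, 0, 0, 0.0
    while True:
        rates = [
            mature * j,        # juvenile matures into an adult
            immune_gain * j,   # immunity increases with juvenile load
            immune_loss * i,   # immunity decays
            kill * j * i,      # juvenile dies due to immunity
        ]
        total = sum(rates)
        if total == 0.0:
            break              # no event can occur any more
        t += rng.expovariate(total)   # waiting time to the next event
        if t > t_end:
            break
        event = rng.choices(range(4), weights=rates)[0]
        if event == 0:
            j -= 1
            a += 1
        elif event == 1:
            i += 1
        elif event == 2:
            i -= 1
        else:
            j -= 1
    return j, a, i

# One simulated "cat": 50 juveniles at t=0, state at sacrifice time t=20.
print(simulate_worms(50, 20.0))
```

Repeating the call with different seeds gives replicated pseudo data of the kind (J at t=0, A at sacrifice) described on the slide.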


Other stochastic process models include

spatial stochastic expansion of species

(Hamilton et al, 2005; Estoup et al, 2004)

birth-death-mutation process for estimating transmission rate from TB genotyping

(Tanaka et al, 2006)

population genetic models, eg coalescent models

(Marjoram et al 2003)

Likelihood free Bayesian MCMC methods are often employed with quite precise priors.


Normalizing constant/partition function problem.

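The algebraic form of the distribution for y is known but it is not normalized, eg Ising model

$p(y | \theta) \propto \exp\left\{ \theta_0 \sum_i y_i + \theta_1 \sum_{i \sim j} \mathrm{Ind}(y_i = y_j) \right\}$

for $y_i \in \{-1, 1\}$, $i = 1, \dots, n$, where $i \sim j$ means neighbours (on a lattice, say). The normalizing constant involves in general a sum over $2^n$ terms.

Write

$p(y | \theta) = \frac{f(y; \theta)}{z(\theta)}$, with $f(y; \theta)$ known and $z(\theta)$ unknown.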


N-S and E-W neighbourhood
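To make the size of the normalizing constant problem concrete, the sketch below evaluates the unnormalized Ising term and sums it by brute force over all $2^n$ configurations of a very small lattice. It assumes the form $\exp\{\theta_0 \sum_i y_i + \theta_1 \sum_{i \sim j} \mathrm{Ind}(y_i = y_j)\}$ with N-S and E-W neighbours and free boundaries; the function names are illustrative.

```python
import itertools
import math

def neighbour_pairs(rows, cols):
    """N-S and E-W neighbour pairs on a rows x cols lattice (free boundary)."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:
                pairs.append(((r, c), (r + 1, c)))
            if c + 1 < cols:
                pairs.append(((r, c), (r, c + 1)))
    return pairs

def log_f(y, theta0, theta1, pairs):
    """Log of the unnormalized density f(y; theta) for spins y in {-1, 1}."""
    v0 = sum(y.values())                          # sum of spins
    v1 = sum(1 for a, b in pairs if y[a] == y[b]) # number of agreeing pairs
    return theta0 * v0 + theta1 * v1

def z_brute_force(rows, cols, theta0, theta1):
    """z(theta) by direct summation over all 2^n configurations."""
    sites = [(r, c) for r in range(rows) for c in range(cols)]
    pairs = neighbour_pairs(rows, cols)
    total = 0.0
    for spins in itertools.product((-1, 1), repeat=len(sites)):
        y = dict(zip(sites, spins))
        total += math.exp(log_f(y, theta0, theta1, pairs))
    return total

print(z_brute_force(3, 3, 0.2, 0.1))  # 512 terms; a 60 by 60 array has 2**3600
```

Direct summation is only feasible for a handful of sites, which is why the methods that follow avoid computing $z(\theta)$.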




Monte Carlo methods and Inference.

Intractable likelihood, instead use easily simulated values of y.

Simulated method of moments (McFadden, 1989).

Method of estimation: comparing theoretical moments or frequencies with observed moments or frequencies.

Can be implemented using a chi-squared goodness-of-fit statistic, eg Riley et al, 2003. Data: number of adult worms in cat at sacrifice.


Source: Riley et al 2003.

Plot of goodness-of-fit statistic versus parameter. Greedy Monte Carlo. Precision of estimate?




3. Normalizing constant/partition function and MCMC

(half-way to likelihood free inference)

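Here we assume (Møller, Pettitt, Reeves and Berthelsen, 2006)

$p(y | \theta) = \frac{f(y; \theta)}{z(\theta)}$, with $f(y; \theta)$ known and $z(\theta)$ unknown,

$z(\theta) = \int f(y; \theta)\, dy$ and difficult to find.

Key idea: Importance sample estimate of $\frac{z(\theta)}{z(\theta')}$ given by

Sample $y \sim p(y | \theta')$, then $\frac{f(y; \theta)}{f(y; \theta')}$ is an unbiased estimate of $\frac{z(\theta)}{z(\theta')}$.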
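The unbiasedness of the importance sample estimate follows in one line from the definition of $z(\theta)$. The derivation below assumes that $f(y; \theta') > 0$ wherever $f(y; \theta) > 0$, so that the ratio is well defined on the support.

```latex
\begin{align*}
E_{y \sim p(\cdot \mid \theta')}\!\left[ \frac{f(y;\theta)}{f(y;\theta')} \right]
  &= \int \frac{f(y;\theta)}{f(y;\theta')} \, \frac{f(y;\theta')}{z(\theta')} \, dy \\
  &= \frac{1}{z(\theta')} \int f(y;\theta)\, dy
   = \frac{z(\theta)}{z(\theta')} .
\end{align*}
```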


Used off-line to estimate $z(\theta)/z(\theta')$, then carry out standard Metropolis-Hastings with interpolation over a grid of $\theta$ values (eg Green and Richardson, 2002, in a Potts model).

Standard Metropolis-Hastings: Simulating from target distribution

$p(\theta | y) \propto p(y | \theta)\, p(\theta).$

Acceptance ratio for changing $\theta \to \theta'$, proposal $q(\theta' | \theta)$:

$A(\theta' | \theta) = \frac{p(y | \theta')\, p(\theta')\, q(\theta | \theta')}{p(y | \theta)\, p(\theta)\, q(\theta' | \theta)} = \frac{f(y; \theta')\, p(\theta')\, q(\theta | \theta')\, z(\theta)}{f(y; \theta)\, p(\theta)\, q(\theta' | \theta)\, z(\theta')}$

$\theta'$ accepted with probability $\min\{1, A(\theta' | \theta)\}$.

Key Question: Can $\frac{z(\theta)}{z(\theta')}$ be calculated on-line or avoided?


On-line algorithm – single auxiliary variable method.

Introduce auxiliary variable x on same space as y and extend target distribution for the MCMC

$p(\theta, x | y) \propto p(x | \theta, y)\, p(y | \theta)\, p(\theta).$

Key Question: How to choose distribution of x so that $z(\theta)$ is removed from $A(\theta' | \theta)$?

Now acceptance ratio is $A(\theta', x' | \theta, x)$ as a new pair $(\theta', x')$ is proposed.

Proposal $q(\theta' | \theta)$ becomes $q(\theta', x' | \theta, x)$.

Assume the factorisation

$q(\theta', x' | \theta, x) = q(x' | \theta')\, q(\theta' | \theta, x)$

Choose the proposal so that

$q(x' | \theta') = \frac{f(x'; \theta')}{z(\theta')}$

Then algebra → cancellation of the $z(\theta)$s, and $A(\theta', x' | \theta, x)$ does not depend on $z(\theta)$.
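The cancellation can be written out explicitly. The derivation below assumes the extended target $p(\theta, x | y) \propto p(x | \theta, y)\, f(y;\theta)\, p(\theta) / z(\theta)$ and the proposal $q(\theta', x' | \theta, x) = q(\theta' | \theta, x)\, f(x';\theta')/z(\theta')$, following the construction in Møller et al (2006); each normalizing constant appears once in the numerator and once in the denominator.

```latex
\begin{align*}
A(\theta', x' \mid \theta, x)
 &= \frac{p(x' \mid \theta', y)\, \dfrac{f(y;\theta')}{z(\theta')}\, p(\theta')}
         {p(x \mid \theta, y)\, \dfrac{f(y;\theta)}{z(\theta)}\, p(\theta)}
    \times
    \frac{q(\theta \mid \theta', x')\, \dfrac{f(x;\theta)}{z(\theta)}}
         {q(\theta' \mid \theta, x)\, \dfrac{f(x';\theta')}{z(\theta')}} \\[4pt]
 &= \frac{p(x' \mid \theta', y)\, f(y;\theta')\, p(\theta')\, q(\theta \mid \theta', x')\, f(x;\theta)}
         {p(x \mid \theta, y)\, f(y;\theta)\, p(\theta)\, q(\theta' \mid \theta, x)\, f(x';\theta')} .
\end{align*}
```

Only the unnormalized $f$ is evaluated, at the price of needing an exact draw $x' \sim f(x';\theta')/z(\theta')$ at every iteration.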


Note: Need perfect or exact simulation from $f(y; \theta)/z(\theta)$ for the proposal.

Key Question: How to choose $p(x | \theta, y)$, the auxiliary variable distribution?

The best choice

$p(x | \theta, y) = \frac{f(x; \theta)}{z(\theta)}$, matching the auxiliary part of the proposal $q(\theta', x' | \theta, x)$, but $z(\theta)$ needed in M-H!!




Choice (i)

Fix $\theta$, say at $\hat\theta(y)$, a good estimate of $\theta$. Then

$p(x | \theta, y) = \frac{f(x; \hat\theta(y))}{z(\hat\theta(y))}$

so does not depend on $\theta$, only y, and $z(\hat\theta)$ cancels in $A(\theta' | \theta)$.

Choice (ii)

$p(x | \theta, y)$ an approximation to $\frac{f(x; \theta)}{z(\theta)}$

Eg Partially ordered Markov mesh model for Ising data

Comment

Both choices can suffer from getting stuck because $p(x | \theta, y)$ can be very different from the ideal $\frac{f(x; \theta)}{z(\theta)}$.


Example: Ising Model

60 by 60 array with $\theta_0 = .2$, $\theta_1 = .1$

Single auxiliary variable method (Møller et al, 2006)

Auxiliary variable is Choice (ii).

Approximation to Ising model: Partially ordered Markov mesh model with same neighbourhood as Ising

DAG with N, W as parents, S, E as children

Run chain 500,000 iterations and thin 1 in 100


Source: Møller et al, 2006

Single auxiliary method tends to get stuck.

Murray et al (2006) offer suggestions involving multiple auxiliary variables.




4. Likelihood free MCMC

Single Auxiliary Variable Method as almost Approximate Bayesian Computation (ABC)

We wish to eliminate $f(y; \theta)/z(\theta)$, or equivalently $p(y | \theta)$, the likelihood, from the M-H algorithm.

Solution: The distribution of x given y and $\theta$ puts all probability on y, the observed data,

$p(x | \theta, y) = \mathrm{Ind}(x = y)$

then

$A(\theta', x' | \theta, x) = \frac{p(\theta')\, q(\theta | \theta')}{p(\theta)\, q(\theta' | \theta)}\, \mathrm{Ind}(x' = y)$

with $x' \sim \frac{f(x'; \theta')}{z(\theta')}$, the likelihood.

This might work for discrete data, sample size small, and if the proposal $q(\theta' | \theta) = q(\theta' | \theta, y)$ were a very good approximation to $p(\theta | y)$. If sufficient statistics s(y) exist then $\mathrm{Ind}(s(x') = s(y))$ replaces $\mathrm{Ind}(x' = y)$.



Likelihood free methods, ABC-MCMC

Change of notation: $y_{obs}$ is the observed data (fixed), y is pseudo data or auxiliary data generated from the likelihood $p(y | \theta)$.

Instead of $y = y_{obs}$, now have y close to $y_{obs}$ in the sense of statistics s( ),

distance $d(y, y_{obs}) = \| s(y) - s(y_{obs}) \|.$

ABC allows $d(y, y_{obs}) \le \varepsilon$ rather than equal to 0

Target distribution for variables $(\theta, y)$

$p(\theta, y | y_{obs}, \varepsilon) \propto p(y | \theta)\, p(\theta)\, \mathrm{Ind}(d(y, y_{obs}) \le \varepsilon).$

Standard M-H with proposals

$\theta' \sim q(\theta' | \theta, y_{obs})$, $y' \sim p(y' | \theta')$

(Marjoram et al 2003; ABC MCMC)

for acceptance of $(\theta, y) \to (\theta', y')$.

Ideally $\varepsilon$ should be small but this leads to very small acceptance probabilities.
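A minimal sketch of ABC-MCMC in the style of Marjoram et al (2003) is given below for a toy problem, not for any model in the talk. It assumes data from a normal with unknown mean and unit variance, the sample mean as summary statistic, a normal prior and a symmetric random-walk proposal (so the proposal ratio cancels); `eps`, `step` and `prior_sd` are illustrative tuning values.

```python
import math
import random
import statistics

def abc_mcmc(y_obs, n_iter=5000, eps=0.2, step=0.5, prior_sd=10.0, seed=1):
    """ABC-MCMC for the mean of a N(theta, 1) model (toy example)."""
    rng = random.Random(seed)
    n = len(y_obs)
    s_obs = statistics.fmean(y_obs)      # summary statistic of observed data

    def log_prior(theta):
        return -0.5 * (theta / prior_sd) ** 2

    theta = s_obs                        # start in a region of high support
    chain, accepted = [], 0
    for _ in range(n_iter):
        theta_new = theta + rng.gauss(0.0, step)              # theta' ~ q
        y_new = [rng.gauss(theta_new, 1.0) for _ in range(n)]  # y' ~ p(y|theta')
        d = abs(statistics.fmean(y_new) - s_obs)               # d(y', y_obs)
        if d <= eps:
            # Likelihood never evaluated: only the prior ratio remains.
            log_a = log_prior(theta_new) - log_prior(theta)
            if rng.random() < math.exp(min(0.0, log_a)):
                theta = theta_new
                accepted += 1
        chain.append(theta)
    return chain, accepted / n_iter

rng = random.Random(0)
y_obs = [rng.gauss(1.5, 1.0) for _ in range(20)]
chain, rate = abc_mcmc(y_obs)
print(statistics.fmean(chain), rate)
```

Reducing `eps` in this sketch lowers the acceptance rate, which is the trade-off noted on the slide.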


Issues of implementing Metropolis-Hastings ABC

(a) Tune for $\varepsilon$ to get reasonable acceptance probabilities;

(b) All $(\theta, y)$ satisfying $d(y, y_{obs}) \le \varepsilon$ (hard) accepted with equal probability rather than smoothly weighted by $d(y, y_{obs})$ (soft).

(c) Choose summary statistics carefully if no sufficient statistics


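Tune for $\varepsilon$

A solution is to allow $\varepsilon$ to vary as a parameter (Bortot et al, 2004). The target distribution is

$p(\theta, y, \varepsilon | y_{obs}) \propto p(y | \theta)\, p(\theta)\, \mathrm{Ind}(d(y, y_{obs}) \le \varepsilon)\, p(\varepsilon).$

Run chain and post filter output for small values of $\varepsilon$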




Soft Constraint for $d(y, y_{obs})$

Beaumont, Zhang and Balding (2002) use kernel smoothing in ABC-MC

Key Idea. (Reeves and Pettitt, 2005)

Replace $\mathrm{Ind}(d(y, y_{obs}) \le \varepsilon)$ by $\exp\left(-\frac{1}{2\phi}\, d(y, y_{obs})\right)$, a soft constraint, with $\phi$ replacing $\varepsilon$.

Interpret as an approximate likelihood $p(y_{obs} | y)$

Simple case $p(y_{obs} | y) = N(y, \phi)$ with φ known

Approximating Hierarchical model with joint probability

$p(\theta)\, p(y | \theta)\, p(y_{obs} | y)$


Approximating Hierarchical Model

Simple case. Normal model with mean $\theta$, variance $\sigma^2$, sample size $n$.

Sufficient statistic $\bar{y}$

$d(y_{obs}, y) = (\bar{y}_{obs} - \bar{y})^2$ and "likelihood" $p(\bar{y}_{obs} | \bar{y}) = N(\bar{y}, \phi)$

Integrate out pseudo data $y$ from $p(\theta, y, \bar{y}_{obs})$ to get marginal $p(\theta, \bar{y}_{obs})$ to obtain

$p(\theta, \bar{y}_{obs}) = N\left(\bar{y}_{obs};\, \theta,\, \frac{\sigma^2}{n} + \phi\right) p(\theta)$

Key idea: Approximation using pseudo data and match to observed data introduces variance inflation in likelihood approximation. Will affect posterior mean (if prior not improper vague) and variance.
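The effect of variance inflation on the posterior can be checked numerically in the conjugate normal case. The sketch below assumes a normal prior on the mean and treats the soft-constraint parameter `phi` as a variance added to $\sigma^2/n$; the prior values and the numbers in the example call are illustrative only.

```python
def normal_posterior(ybar_obs, lik_var, prior_mean, prior_var):
    """Posterior mean and variance for a normal mean with known variance."""
    precision = 1.0 / prior_var + 1.0 / lik_var
    mean = (prior_mean / prior_var + ybar_obs / lik_var) / precision
    return mean, 1.0 / precision

def exact_and_inflated(ybar_obs, sigma2, n, phi, prior_mean=0.0, prior_var=1.0):
    """Compare the exact posterior with the one from the inflated likelihood."""
    exact = normal_posterior(ybar_obs, sigma2 / n, prior_mean, prior_var)
    inflated = normal_posterior(ybar_obs, sigma2 / n + phi, prior_mean, prior_var)
    return exact, inflated

exact, inflated = exact_and_inflated(ybar_obs=2.0, sigma2=1.0, n=10, phi=0.1)
print("exact    mean, var:", exact)
print("inflated mean, var:", inflated)
```

With an informative prior the inflated likelihood pulls the posterior mean towards the prior mean and widens the posterior; as `prior_var` grows the mean shift disappears but the extra variance remains.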


General scheme to overcome variance inflation and bias of posterior

Implement a tempering scheme with $\phi_1, \dots, \phi_k > 0$, and $\hat\theta_j$ is the posterior mean estimated from chain $j$.

Combine estimates using weighted least squares and bias estimated using a quadratic in $\phi$, say

$\hat\theta_j = \beta_0 + \beta_1 \phi_j + \beta_2 \phi_j^2 + \text{error}_j, \quad j = 1, \dots, k$

gives $\hat\beta_0$, the combined estimate of posterior mean for $\phi = 0$.

Similarly for posterior variance and quantile estimates using chain $j$.

(compare Liu, 2001, MCMC for "indirect models")
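The extrapolation step can be sketched as a small weighted least squares fit. The code below assumes posterior summaries from $k$ tempered chains, regresses them on a quadratic in the tempering parameter, and reads off the intercept as the estimate at zero. The coefficient names, the weights and the synthetic inputs in the example are assumptions for illustration.

```python
def weighted_quadratic_fit(phi, est, weights):
    """Return (b0, b1, b2) minimising sum_j w_j (est_j - b0 - b1*phi_j - b2*phi_j**2)**2."""
    # Normal equations (X^T W X) b = X^T W est for the design (1, phi, phi^2).
    m = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for p, e, w in zip(phi, est, weights):
        row = (1.0, p, p * p)
        for a in range(3):
            rhs[a] += w * row[a] * e
            for b in range(3):
                m[a][b] += w * row[a] * row[b]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            rhs[r] -= factor * rhs[col]
            for c in range(col, 3):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coef = [0.0] * 3
    for r in (2, 1, 0):
        s = rhs[r] - sum(m[r][c] * coef[c] for c in range(r + 1, 3))
        coef[r] = s / m[r][r]
    return tuple(coef)

# Synthetic posterior means from four tempered chains (illustrative numbers).
phi = [1.0, 2.0, 3.0, 4.0]
est = [1.0 + 0.5 * p + 0.2 * p * p for p in phi]
b0, b1, b2 = weighted_quadratic_fit(phi, est, weights=[1.0, 2.0, 1.0, 2.0])
print(b0)  # intercept: extrapolated estimate at phi = 0
```

In practice the weights would reflect the Monte Carlo precision of each chain's estimate; at least three distinct tempering values are needed for the quadratic to be identified.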



Parallel Tempering to improve mixing

For chains $j-1$ and $j$, propose swaps of $(\theta, y)$ values to improve mixing.

M-H ratio does not involve intractable likelihood.

Soft constraint improves mixing
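The reason the swap ratio is free of the intractable likelihood can be seen by writing it out. The derivation below assumes chain $j$ targets $\pi_j(\theta, y) \propto p(y | \theta)\, p(\theta)\, \exp\{-d(y, y_{obs}) / (2\phi_j)\}$, and writes $d_j = d(y_j, y_{obs})$ for the current state of chain $j$; this form of the soft constraint is an assumption about the notation.

```latex
\begin{align*}
A_{\text{swap}}
 &= \frac{\pi_{j-1}(\theta_j, y_j)\, \pi_j(\theta_{j-1}, y_{j-1})}
         {\pi_{j-1}(\theta_{j-1}, y_{j-1})\, \pi_j(\theta_j, y_j)} \\[4pt]
 &= \frac{\exp\!\left\{ -\dfrac{d_j}{2\phi_{j-1}} - \dfrac{d_{j-1}}{2\phi_j} \right\}}
         {\exp\!\left\{ -\dfrac{d_{j-1}}{2\phi_{j-1}} - \dfrac{d_j}{2\phi_j} \right\}}
  = \exp\!\left\{ -\tfrac{1}{2}\,(d_j - d_{j-1}) \left( \frac{1}{\phi_{j-1}} - \frac{1}{\phi_j} \right) \right\} .
\end{align*}
```

The factors $p(y | \theta)\, p(\theta)$ and the normalizing constants of $\pi_{j-1}$ and $\pi_j$ each appear once in the numerator and once in the denominator, so only the distances and tempering parameters remain.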



Example: Ising Model (continued)

60 by 60 array with $\theta_0 = .2$, $\theta_1 = .1$

Compare

Exact method, auxiliary variable method

ABC with $\phi = 50^2, 30^2, 15^2$

with correct sufficient statistics $\sum_i y_i$ and $\sum_{i \sim j} \mathrm{Ind}(y_i = y_j)$

Run chains 500,000 and thin 1 in 100






Indirect Inference

(Gourieroux et al, 1993)

(also Heggland and Frigessi, 2004)

Observe data $y_{obs}$.

Suppose True model $p_T(y | \theta)$ is intractable but Approximating model $p_A(x | \psi)$ is tractable, and with the same support.

Can find $\hat\psi(x)$ easily.

Put $x = y$ into model A and obtain $\hat\psi(y | \theta)$.

Repeat many times for $y$ simulated from $p_T(y | \theta)$ to find an accurate value $\hat\psi(\theta)$.

Find $\theta$ so that $\hat\psi(\theta)$ is close to $\hat\psi(y_{obs})$ giving $\hat\theta(y_{obs})$.
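The indirect inference recipe can be sketched on a toy problem. Here the "true" model is log-normal with log-mean `theta` (treated as if only simulation were available), the approximating model is exponential with mean parameter estimated by the sample mean, and the matching step is a grid search using common random numbers. All of these choices, and the function names, are illustrative assumptions rather than anything used in the talk.

```python
import math
import random
import statistics

def simulate_true(theta, n, rng):
    """Draw pseudo data from the 'true' model (log-normal, unit log-sd)."""
    return [math.exp(theta + rng.gauss(0.0, 1.0)) for _ in range(n)]

def psi_hat(x):
    """Estimate for the approximating exponential model: the sample mean."""
    return statistics.fmean(x)

def binding(theta, n, reps, seed):
    """Average psi_hat over simulated data sets: an estimate of psi_hat(theta)."""
    rng = random.Random(seed)   # same seed for every theta: common random numbers
    values = [psi_hat(simulate_true(theta, n, rng)) for _ in range(reps)]
    return statistics.fmean(values)

def indirect_estimate(y_obs, grid, reps=20, seed=0):
    """Pick the grid value of theta whose psi_hat(theta) is closest to psi_hat(y_obs)."""
    target = psi_hat(y_obs)
    n = len(y_obs)
    return min(grid, key=lambda th: abs(binding(th, n, reps, seed) - target))

rng = random.Random(42)
y_obs = simulate_true(0.5, 200, rng)
grid = [k / 20 for k in range(-20, 41)]   # theta from -1.0 to 2.0
print(indirect_estimate(y_obs, grid))
```

Common random numbers make the simulated map from `theta` to the auxiliary estimate smooth and monotone here, which keeps the grid search stable.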


Hierarchical Model using ideas of Indirect Inference

(Reeves and Pettitt, 2005; P & R, 2006)

Consider the True hierarchical model

$p(\theta)\, p_T(y_{obs} | \theta)$

and the Approximating hierarchical model

$p(\theta)\, p_T(y | \theta)\, p_A(\psi | y)\, p_A(y_{obs} | \psi)$

with

$y$ is pseudo data from $p_T(y | \theta)$

$p_T(y | \theta)$ being the True intractable likelihood

$p_A(\psi | y)$ is an Approximating model posterior distribution for its parameter $\psi$

$p_A(y_{obs} | \psi)$ is an Approximating likelihood evaluated at the observed data $y_{obs}$

Key point: Marginalising the Approximating HM over random $y$ and $\psi$ should be close to the True HM



A Metropolis Hastings algorithm for Indirect Inference

Can be implemented with the proposal, using the idea from Møller et al (2006),

$q(\theta', y', \psi' | \theta, y, \psi) = q(\theta' | \theta)\, p_T(y' | \theta')\, p_A(\psi' | y')$

where $\psi$ is the parameter of the approximating model, and then the MH acceptance probability is given with

$A(\theta', y', \psi' | \theta, y, \psi) = \frac{p_A(y_{obs} | \psi')\, q(\theta | \theta')\, p(\theta')}{p_A(y_{obs} | \psi)\, q(\theta' | \theta)\, p(\theta)}$

which replaces the likelihood ratio for intractable $p_T$ by a ratio involving tractable $p_A$.
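A toy version of this Metropolis-Hastings scheme is sketched below. It assumes a "true" model N(theta, 1) that is only simulated from, an approximating model N(psi, 1) with a flat prior on psi (so the approximating posterior given pseudo data is N(mean(y), 1/n) and can be drawn exactly, standing in for the side chain), a normal prior on theta and a symmetric random-walk proposal. The names and tuning values are illustrative.

```python
import math
import random
import statistics

def mh_indirect(y_obs, n_iter=5000, step=0.5, prior_sd=10.0, seed=2):
    """Likelihood-free M-H using an approximating model (toy normal example)."""
    rng = random.Random(seed)
    n = len(y_obs)

    def log_prior(theta):
        return -0.5 * (theta / prior_sd) ** 2

    def log_pa(psi):
        # log p_A(y_obs | psi) up to a constant, for the N(psi, 1) model.
        return -0.5 * sum((v - psi) ** 2 for v in y_obs)

    def draw_psi(theta):
        y = [rng.gauss(theta, 1.0) for _ in range(n)]   # y ~ p_T(y | theta)
        # psi ~ p_A(psi | y): exact under the flat-prior assumption.
        return rng.gauss(statistics.fmean(y), 1.0 / math.sqrt(n))

    theta = statistics.fmean(y_obs)
    psi = draw_psi(theta)
    chain = []
    for _ in range(n_iter):
        theta_new = theta + rng.gauss(0.0, step)   # symmetric proposal, q cancels
        psi_new = draw_psi(theta_new)
        log_a = (log_pa(psi_new) + log_prior(theta_new)
                 - log_pa(psi) - log_prior(theta))
        if rng.random() < math.exp(min(0.0, log_a)):
            theta, psi = theta_new, psi_new
        chain.append(theta)
    return chain

rng = random.Random(5)
y_obs = [rng.gauss(-1.0, 1.0) for _ in range(20)]
chain = mh_indirect(y_obs)
print(statistics.fmean(chain), statistics.pvariance(chain))
```

In this toy setting the marginal for theta has likelihood variance 3/n rather than 1/n, so the sketch also displays the loss of precision that comes from passing through pseudo data and the approximating posterior.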


Example: Ising Model (continued)

60 by 60 array with $\theta_0 = .2$, $\theta_1 = .1$

MH Indirect Inference implemented with Approximate Likelihood taken as the POMM with 2 parameters equivalent to Ising model $\theta_0, \theta_1$

20,000 iterations with Approximating posterior found from "side MH chain" with 400 iterations.

No summary statistics required, implied by Approximating Likelihood




Some points

• How could approximate posterior be made more precise?

– Use more parameters in approximating likelihood, the POMM? (Gouriéroux et al (1993), Heggland and Frigessi (2004) discuss this in the frequentist setting)

– More iterations for side chain “exact” calculation of approximate posterior?

• How to choose a good approximating likelihood?

• Relationship to summary statistics approach?




Conclusions

1. For the normalizing constant problem we presented a single on-line M-H algorithm.

2. We linked these ideas to ABC-MCMC and developed a hierarchical model (HM) to approximate the true posterior, and showed variance inflation.

3. We showed that the approximating HM could be tempered, with swaps made to improve mixing using parallel chains, and the variance inflation effect corrected by smoothing posterior summaries from the tempered chains.

4. We extended indirect inference to an HM to find a way of implementing the Metropolis Hastings algorithm which is likelihood free.

5. We demonstrated the ideas with the Ising/autologistic model.

6. Application to specific examples is on-going and requires refinement of general approaches.


Acknowledgements

Support of the Australian Research Council

Co-authors Rob Reeves, Jesper Møller, Kasper Berthelsen

Discussions with Malcolm Faddy, Gareth Ridall, Chris Glasbey, Grant Hamilton …
