"reflections on the probability space induced by moment conditions with implications for...

Preview:

DESCRIPTION

A discussion I will present at the 6th French Econometrics Conference in Dauphine, Friday Dec. 5


“Reflections on the probability space induced by moment conditions with implications for Bayesian inference”: a discussion

Christian P. Robert, Université Paris-Dauphine, Paris & University of Warwick, Coventry

bayesianstatistics@gmail.com

Outline

what is the question?

what could the question be?

what is the answer?

what could the answer be?

what is the question?

“If one specifies a set of moment functions collected together into a vector m(x, θ) of dimension M, regards θ as random and asserts that some transformation Z(x, θ) has distribution ψ, then what is required to use this information and then possibly a prior to make valid inference?” R. Gallant, p.4

Priors without effort

- quest for model-induced priors dating back to the early 1900s [Lhoste, 1923]

- reference priors, such as Jeffreys’ prior, induced by the sampling distribution [Jeffreys, 1939]

- fiducial distributions as Fisher’s attempted answer [Fisher, 1956]

Fisher’s t fiducial distribution

When considering

t = (x̄ − θ) / (s/√n)

the ratio has a frequentist t distribution with n − 1 degrees of freedom
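The frequentist half of this claim is easy to check by simulation. A minimal sketch (assuming NumPy and SciPy are available): draw many normal samples, form the pivot, and compare it with the t reference distribution through a Kolmogorov–Smirnov test. All names below are illustrative, not from the paper.

```python
import numpy as np
from scipy import stats

# Monte Carlo check: for X_1, ..., X_n iid N(theta, sigma^2), the pivot
#   t = (xbar - theta) / (s / sqrt(n))
# follows a t distribution with n - 1 degrees of freedom,
# whatever the values of (theta, sigma).
rng = np.random.default_rng(42)
n, theta, sigma = 10, 3.0, 2.0
reps = 20_000

x = rng.normal(theta, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)                  # sample standard deviation
t = (xbar - theta) / (s / np.sqrt(n))

# KS distance between the simulated pivots and the t_{n-1} cdf
ks = stats.kstest(t, "t", args=(n - 1,))   # small statistic = good fit
```

With 20 000 replications the KS statistic stays well below conventional rejection thresholds, consistent with the exact t_{n−1} distribution of the pivot.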

Fisher’s t fiducial distribution

However, there is no equivalent justification for asserting that

t = (x̄ − θ) / (s/√n)

has a t posterior distribution with n − 1 degrees of freedom on θ, given (x̄, s), except when using the non-informative and improper prior π(θ, σ²) ∝ 1/σ², since then

θ ∼ T_{n−1}(x̄, s/√n)

Fisher’s t fiducial distribution

Furthermore, neither the Bayesian nor the frequentist interpretation implies that

t = (x̄ − θ) / (s/√n)

has a t posterior distribution with n − 1 degrees of freedom jointly

what could the question be?

Given a set of moment equations

E[m(X1, . . . , Xn, θ)] = 0

(where both the Xᵢ’s and θ are random), can one derive a likelihood function and a prior distribution compatible with those constraints?

coherence across sample sizes n

Highly complex question, since it implies the integral equation

∫_{Θ×Xⁿ} m(x₁, …, xₙ, θ) π(θ) f(x₁|θ) ⋯ f(xₙ|θ) dθ dx₁ ⋯ dxₙ = 0

must or should have a solution in (π, f) for all n’s. Possible outside of a likelihood × prior modelling?


Zellner’s Bayesian method of moments

Given moment conditions on the parameters θ and σ²

E[θ | x₁, …, xₙ] = x̄ₙ   E[σ² | x₁, …, xₙ] = s²ₙ   var(θ | σ², x₁, …, xₙ) = σ²/n

derivation of a maximum entropy posterior

θ | σ², x₁, …, xₙ ∼ N(x̄ₙ, σ²/n)   σ⁻² | x₁, …, xₙ ∼ Exp(s²ₙ)

[Zellner, 1996]

but incompatible with the corresponding predictive distribution [Geisser & Seidenfeld, 1999]
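Sampling from this maximum-entropy posterior is direct; a minimal sketch follows. One assumption is made explicit in the code: the slide does not fix the parameterisation of Exp(s²ₙ), and the sketch reads it as an exponential with rate s²ₙ (mean 1/s²ₙ) on σ⁻².

```python
import numpy as np

# Sketch of sampling from Zellner's BMOM maximum-entropy posterior:
#   theta | sigma^2, x  ~  N(xbar_n, sigma^2 / n)
#   sigma^{-2} | x      ~  Exp(s^2_n)
# ASSUMPTION: Exp(s^2_n) is taken as exponential with rate s^2_n,
# i.e. mean 1/s^2_n; the slide leaves the parameterisation implicit.
rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.5, size=30)
n, xbar, s2 = len(x), x.mean(), x.var(ddof=1)

draws = 50_000
inv_sigma2 = rng.exponential(scale=1.0 / s2, size=draws)  # rate s2 -> scale 1/s2
sigma2 = 1.0 / inv_sigma2
theta = rng.normal(xbar, np.sqrt(sigma2 / n))             # one theta per sigma^2
```

The posterior draws of θ are centred at x̄ₙ by construction, matching the first moment condition; note that σ² = 1/Exp has no finite mean, one symptom of the incompatibility noted above.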


what is the answer?

Under the condition that Z(·, θ) is surjective,

p⋆(x | θ) = ψ(Z(x, θ))

and an arbitrary choice of prior π(θ)

- lhs and rhs operate on different spaces

- no reason why the density ψ should integrate against the Lebesgue measure on the n-dimensional Euclidean space

- no direct connection with a genuine likelihood function, i.e., a product of the densities of the Xᵢ’s (conditional on θ)


what could the answer be?

“A common situation that requires consideration of the notions that follow is that deriving the likelihood from a structural model is analytically intractable and one cannot verify that the numerical approximations one would have to make to circumvent the intractability are sufficiently accurate.” R. Gallant, p.7

Approximate Bayesian answers

Defining a joint distribution on (θ, x₁, …, xₙ) through moment equations prevents regular Bayesian inference, as the likelihood is unavailable; there may be alternatives available:

- approximate Bayesian computation (ABC) and empirical-likelihood-based Bayesian inference [Tavaré et al., 1999; Owen, 2001; Mengersen et al., 2013]

- INLA (Laplace approximation), EP (expectation propagation) [Martino et al., 2008; Barthelmé & Chopin, 2014]

- variational Bayes [Jaakkola & Jordan, 2000]


Bayesian approximate answers

- Using a fake likelihood does not prohibit Bayesian analysis, as shown in the paper with the model in eqn. (45)

- However, this requires a case-by-case consistency analysis, since pseudo-likelihoods do not offer the same guarantees

- Example of ABC model choice based on insufficient statistics [Marin et al., 2014]

Empirical likelihood (EL)

Dataset x made of n independent replicates x = (x₁, …, xₙ) of a rv X ∼ F

Generalized moment condition pseudo-model

E_F[ h(X, φ) ] = 0,

where h is a known function and φ an unknown parameter

Induced empirical likelihood

L_el(φ|x) = max_p ∏_{i=1}^n pᵢ

for all p such that 0 ≤ pᵢ ≤ 1, ∑ᵢ pᵢ = 1, ∑ᵢ pᵢ h(xᵢ, φ) = 0

[Owen, 1988, B’ka, & Empirical Likelihood, 2001]
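For the simplest moment condition, h(x, φ) = x − φ (the mean), the inner maximisation has the classical Lagrange-multiplier solution pᵢ = 1/{n(1 + λhᵢ)}, with λ chosen so that the moment constraint holds. A minimal NumPy sketch (the function name is ours, not Owen's), solving for λ by bisection:

```python
import numpy as np

def empirical_likelihood_mean(x, phi, tol=1e-10):
    """Profile empirical likelihood L_el(phi|x) for the mean condition
    h(x, phi) = x - phi: maximise prod_i p_i subject to p_i >= 0,
    sum_i p_i = 1 and sum_i p_i (x_i - phi) = 0.  The solution is
    p_i = 1 / (n (1 + lam * (x_i - phi))), lam found by bisection."""
    h = np.asarray(x, float) - phi
    n = len(h)
    if h.min() >= 0 or h.max() <= 0:
        return 0.0                      # phi outside convex hull: EL is zero
    lo = -1.0 / h.max() + tol           # keep 1 + lam * h_i > 0 for all i
    hi = -1.0 / h.min() - tol
    g = lambda lam: np.sum(h / (1.0 + lam * h))   # strictly decreasing in lam
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    p = 1.0 / (n * (1.0 + lam * h))
    return float(np.prod(p))
```

At φ = x̄ the multiplier is λ = 0, the weights are uniform pᵢ = 1/n, and L_el attains its maximum value n⁻ⁿ; L_el decreases away from x̄ and vanishes outside the convex hull of the data.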


Raw ABCel sampler

Naïve implementation: act as if EL was an exact likelihood [Lazar, 2003, B’ka]

for i = 1 → N do
    generate φᵢ from the prior distribution π(·)
    set the weight ωᵢ = L_el(φᵢ | x_obs)
end for
return (φᵢ, ωᵢ), i = 1, …, N

- Output: weighted sample of size N

[Mengersen et al., 2013, PNAS]

Raw ABCel sampler

Naïve implementation: act as if EL was an exact likelihood [Lazar, 2003, B’ka]

for i = 1 → N do
    generate φᵢ from the prior distribution π(·)
    set the weight ωᵢ = L_el(φᵢ | x_obs)
end for
return (φᵢ, ωᵢ), i = 1, …, N

- Performance evaluated through the effective sample size

ESS = 1 / ∑_{i=1}^N ( ωᵢ / ∑_{j=1}^N ωⱼ )²

[Mengersen et al., 2013, PNAS]
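The loop above and the ESS diagnostic fit in a few lines of Python. In the sketch below, `toy_lel` is a purely illustrative stand-in for L_el (a Gaussian pseudo-likelihood in the sample mean), not the empirical likelihood itself, and all other names are ours.

```python
import numpy as np

def abcel_sampler(prior_sample, el_likelihood, x_obs, N, rng):
    """Raw ABCel sampler: draw phi_i from the prior, weight it by the
    (empirical) likelihood, and return the weighted sample of size N."""
    phis = np.array([prior_sample(rng) for _ in range(N)])
    omegas = np.array([el_likelihood(phi, x_obs) for phi in phis])
    return phis, omegas

def ess(omegas):
    """Effective sample size: ESS = 1 / sum_i (w_i / sum_j w_j)^2."""
    w = np.asarray(omegas, float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

# toy stand-in for L_el: Gaussian pseudo-likelihood in the sample mean
# (hypothetical illustration only)
rng = np.random.default_rng(0)
x_obs = rng.normal(1.0, 1.0, size=50)
toy_lel = lambda phi, x: np.exp(-0.5 * len(x) * (x.mean() - phi) ** 2)

phis, omegas = abcel_sampler(lambda r: r.normal(0, 2), toy_lel, x_obs, 500, rng)
```

ESS lies between 1 (one weight dominates, as when the prior rarely hits the high-likelihood region) and N (uniform weights), which is why it is the natural diagnostic for this importance-sampling view of the raw sampler.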

Recommended