Discussion of Persi Diaconis' lecture at ISBA 2016


Shifting our views on Monte Carlo

Christian P. Robert

Université Paris-Dauphine, Paris & University of Warwick, Coventry

June 6, 2016 (adapted from NIPS 2015 slides)

Larry’s constant

1 Larry’s constant

2 Charlie’s logistic regression

3 Xiao-Li’s MLE

4 Persi’s Bayesian numerics

Example 11.10

“Suppose that f is a probability density function and that

f(x) = c g(x),

where g is a known function and c > 0 is unknown (...) Despite the fact that c is unknown, it is often possible to draw X_1, . . . , X_n from f. Can we use the sample to estimate the normalizing constant c?

Here is a frequentist solution: Let f̂ be a consistent estimate of the density f. Choose any point x and note that c = f(x)/g(x). Hence ĉ = f̂(x)/g(x) is a consistent estimate of c.”
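A toy numerical check of this frequentist solution (a minimal sketch of mine, not from the slides): take g(x) = exp(-x²/2) with c = 1/√(2π) treated as unknown, so that f = c g is the standard normal density, estimate f by a kernel density estimate, and read off ĉ at an arbitrary point.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Toy instance: g(x) = exp(-x^2/2) is known, c = 1/sqrt(2*pi) is "unknown",
# so that f = c*g is the standard normal density.
x = rng.normal(size=10_000)              # draws X_1, ..., X_n from f
f_hat = gaussian_kde(x)                  # consistent estimate of the density f
x0 = 0.0                                 # any evaluation point works
g = lambda u: np.exp(-u**2 / 2)
c_hat = f_hat(x0)[0] / g(x0)             # c_hat = f_hat(x0) / g(x0)
print(c_hat, 1 / np.sqrt(2 * np.pi))     # both close to 0.3989
```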

Example 11.10: “the” argument

“Now let us try to solve the problem from a Bayesian approach. Let π(c) be a prior such that π(c) > 0 for all c > 0. The likelihood function is

L_n(c) = \prod_{i=1}^n f(X_i) = \prod_{i=1}^n c\, g(X_i) = c^n \prod_{i=1}^n g(X_i) \propto c^n .

Hence the posterior is proportional to c^n π(c). The posterior does not depend on X_1, . . . , X_n, so we come to the startling conclusion that from the Bayesian point of view, there is no information in the data about c.”

© “Bayesians are slaves to the likelihood function”

• there is no likelihood function such as L_n(c) [since moving c does not modify the sample]

• “there is no information in the data about c” but there is about f

• not a statistical problem [at face value], rather a numerical problem with many Monte Carlo solutions (one is sketched after this list)

• Monte Carlo methods are frequentist (LLN) and asymptotic (as in large numbers)

• but use of probabilistic numerics brings uncertainty evaluation
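One classical Monte Carlo solution, sketched here under my own toy setup (not from the slides): for any probability density q that can be evaluated, E_f[q(X)/g(X)] = c, so a plain average over the available draws estimates c; taking q equal to f itself (were it available) would even produce a zero-variance estimator.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Same toy instance as above: f = c*g is N(0,1), with g(x) = exp(-x^2/2).
x = rng.normal(size=10_000)              # draws from f
g = lambda u: np.exp(-u**2 / 2)          # known unnormalised part
q = lambda u: norm.pdf(u, 0.5, 1.0)      # any density we can evaluate
# E_f[q(X)/g(X)] = int q(x)/g(x) * c*g(x) dx = c * int q(x) dx = c
c_hat = np.mean(q(x) / g(x))
print(c_hat, 1 / np.sqrt(2 * np.pi))
```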

a broader picture

• Larry’s problem somehow relates to the infamous harmonic mean estimator issue

• highlight paradoxical differences between statistics and Monte Carlo methods:

• statistics constrained by the sample and its distribution

• Monte Carlo free to generate samples

• no best unbiased estimator or optimal solution in Monte Carlo integration

• paradox of the fascinating “Bernoulli factory” problem, which requires an infinite sequence of Bernoullis

[Flegal & Herbei, 2012; Jacob & Thiery, 2015]

• highly limited range of parameters allowing for unbiased estimation, versus universal debiasing of converging sequences (sketched below)

[McLeish, 2011; Rhee & Glynn, 2012, 2013]
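A minimal sketch of the debiasing device (my toy example, not from the cited papers): randomise the truncation point N of a converging sequence, here the deterministic partial sums Y_n = Σ_{k≤n} 1/k! → e, and reweight the increments by 1/P(N ≥ n) to obtain an unbiased estimator of the limit.

```python
import numpy as np

rng = np.random.default_rng(2)

def unbiased_of_limit(p=0.5):
    # Y_n = sum_{k<=n} 1/k! converges to e; increments Delta_n = 1/n!.
    # Z = sum_{n<=N} Delta_n / P(N >= n) is unbiased for the limit.
    N = rng.geometric(p) - 1             # support {0,1,...}, P(N >= n) = (1-p)^n
    z, fact = 0.0, 1.0
    for n in range(N + 1):
        if n > 0:
            fact *= n                    # fact = n!
        z += (1.0 / fact) / (1 - p)**n   # inverse-probability reweighting
    return z

draws = [unbiased_of_limit() for _ in range(100_000)]
print(np.mean(draws), np.exp(1))         # the average is unbiased for e
```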

reverse logistic regression

1 Larry’s constant

2 Charlie’s logistic regression

3 Xiao-Li’s MLE

4 Persi’s Bayesian numerics

regression estimator

Given

X_{ij} \sim f_i(x) = c_i h_i(x)

with h_i known and c_i unknown (i = 1, \ldots, k, j = 1, \ldots, n_i), the constants c_i are estimated by a “reverse logistic regression” based on the quasi-likelihood

L(\eta) = \sum_{i=1}^k \sum_{j=1}^{n_i} \log p_i(x_{ij}, \eta)

with

p_i(x, \eta) = \exp\{\eta_i\} h_i(x) \Big/ \sum_{l=1}^k \exp\{\eta_l\} h_l(x)

[Anderson, 1972; Geyer, 1992]

Approximation:

\log c_i = \log(n_i/n) - \eta_i
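A runnable sketch of this estimator on a toy example of mine (two Gaussian-shaped h_i; note that I adopt the convention exp(η_i) ∝ (n_i/n) c_i, so the constants are recovered from differences of the fitted η's, with c_1 assumed known for identification):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(3)

# Two unnormalised densities: h1(x) = exp(-x^2/2) with c1 = 1/sqrt(2*pi),
# and h2(x) = exp(-(x-1)^2/8) with c2 = 1/sqrt(8*pi) to be estimated.
log_h = [lambda x: -x**2 / 2, lambda x: -(x - 1)**2 / 8]
n1 = n2 = 5_000
x = np.concatenate([rng.normal(0, 1, n1), rng.normal(1, 2, n2)])
labels = np.repeat([0, 1], [n1, n2])           # which f_i each draw came from
H = np.column_stack([lh(x) for lh in log_h])   # log h_l(x_j)

def neg_quasi_loglik(eta):
    # p_i(x, eta) = exp(eta_i) h_i(x) / sum_l exp(eta_l) h_l(x)
    logits = H + eta
    log_p = logits - logsumexp(logits, axis=1, keepdims=True)
    return -log_p[np.arange(len(x)), labels].sum()

eta = minimize(neg_quasi_loglik, np.zeros(2), method="BFGS").x
# eta only identified up to an additive constant: with exp(eta_i) prop. to
# (n_i/n) c_i, log c2 = log c1 + (eta_2 - eta_1) - log(n2/n1); c1 known here
log_c2 = np.log(1 / np.sqrt(2 * np.pi)) + (eta[1] - eta[0]) - np.log(n2 / n1)
print(np.exp(log_c2), 1 / np.sqrt(8 * np.pi))
```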

regression estimator

GivenXij ∼ fi(x) = cihi(x)

with hi known and ci unknown (i = 1, . . . , k, j = 1, . . . , ni),constants ci estimated by a “reverse logistic regression” based onthe quasi-likelihood

L(η) =∑k

i=1

∑ni

j=1log pi(xij, η)

with

pi(x, η) = exp{ηi}hi(x)/∑k

i=1exp{ηi}hi(x)

[Anderson, 1972; Geyer, 1992]Approximation

log ci = logni/n− ηi

statistical framework?

Existence of a central limit theorem:

\sqrt{n}\,(\hat\eta_n - \eta) \xrightarrow{\ \mathcal{L}\ } \mathcal{N}_k(0, B^+ A B^+)

[Geyer, 1992; Doss & Tan, 2015]

• strong convergence properties

• asymptotic approximation of the precision

• connection with bridge sampling and auxiliary model [mixture]

• ...but nothing statistical there [no estimation]

• which optimality? [weights unidentifiable]

[Kong et al., 2003; Chopin & Robert, 2011]


partition function and maximum likelihood

For parametric family

f(x; θ) = p(x; θ)/Z(θ)

• normalising constant Z(θ) also called partition function

• ...if normalisation possible

• essential part of inference

• estimation by score matching [matching scores of model and data], sketched below

• ...and by noise-contrastive estimation [generalised Charlie’s regression]

[Gutmann & Hyvarinen, 2012, 2015]
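As a sketch of the score-matching route (my toy example, with an unnormalised Gaussian model): the objective J(θ) = E[ψ(X; θ)²/2 + ψ′(X; θ)], where ψ = ∂ log p/∂x is the model score, never involves Z(θ), yet recovers the parameters.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
x = rng.normal(2.0, 1.5, 10_000)         # data with unknown mean and variance

def J(theta):
    # unnormalised model p(x; mu, s2) prop. to exp(-(x - mu)^2 / (2*s2)):
    # score psi = -(x - mu)/s2 and derivative psi' = -1/s2
    mu, log_s2 = theta
    s2 = np.exp(log_s2)
    psi = -(x - mu) / s2
    return np.mean(0.5 * psi**2 - 1.0 / s2)

mu_hat, log_s2_hat = minimize(J, np.zeros(2), method="BFGS").x
print(mu_hat, np.exp(log_s2_hat))        # close to 2.0 and 2.25 = 1.5^2
```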

partition function and maximum likelihood

For parametric family

f(x; θ) = p(x; θ)/Z(θ)

Generic representation with auxiliary data y from a known distribution f_y and regression function

h(u; \theta) = \left\{ 1 + \frac{n_x}{n_y} \exp(-G(u; \theta)) \right\}^{-1}

Objective function

J(\theta) = \sum_{i=1}^{n_x} \log h(x_i; \theta) + \sum_{i=1}^{n_y} \log\{1 - h(y_i; \theta)\}

that can be maximised with no normalising constant

[Gutmann & Hyvarinen, 2012, 2015]
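A minimal sketch of this objective on a toy example of mine (with n_x = n_y, so the sample-size ratio in h drops out): the model's log normalising constant is treated as a free parameter λ and fitted along with the rest of θ against a known auxiliary density f_y.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(5)
nx = ny = 10_000
x = rng.normal(1.0, 1.0, nx)             # data sample
y = rng.normal(0.0, 2.0, ny)             # auxiliary sample with known density
log_fy = lambda u: norm.logpdf(u, 0.0, 2.0)

def log_p(u, theta):
    mu, log_sigma, lam = theta           # lam stands in for log Z(theta)
    return -(u - mu)**2 / (2 * np.exp(2 * log_sigma)) - lam

def neg_J(theta):
    # h(u; theta) = sigmoid(G(u; theta)) with G = log p - log f_y, hence
    # -log h = log(1 + exp(-G)) and -log(1 - h) = log(1 + exp(G))
    G_x = log_p(x, theta) - log_fy(x)
    G_y = log_p(y, theta) - log_fy(y)
    return np.logaddexp(0.0, -G_x).sum() + np.logaddexp(0.0, G_y).sum()

mu, log_sigma, lam = minimize(neg_J, np.zeros(3), method="BFGS").x
print(lam, 0.5 * np.log(2 * np.pi))      # log Z of the fitted Gaussian, ~0.919
```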

Xiao-Li’s MLE

1 Larry’s constant

2 Charlie’s logistic regression

3 Xiao-Li’s MLE

4 Persi’s Bayesian numerics

[Meng, 2011, IRCEM]

Xiao-Li’s MLE

“The task of estimating an integral by Monte Carlo methods is formulated as a statistical model using simulated observations as data. The difficulty in this exercise is that we ordinarily have at our disposal all of the information required (..) but we choose to ignore some (...) for simplicity or computational feasibility.”

[Kong, McCullagh, Meng, Nicolae & Tan, 2003]

Xiao-Li’s MLE

“Our proposal is to use a semiparametric statistical model that makes explicit what information is ignored (...) The parameter space in this model is a set of measures on the sample space (...) None-the-less, from simulated data the base-line measure can be estimated by maximum likelihood, and the required integrals computed by a simple formula previously derived by Geyer.”

[Kong, McCullagh, Meng, Nicolae & Tan, 2003]

Xiao-Li’s MLE

“By contrast with Geyer’s retrospective likelihood, a correct estimate of simulation error is available directly from the Fisher information. The principal advantage of the semiparametric model is that variance reduction techniques are associated with submodels in which the maximum likelihood estimator in the submodel may have substantially smaller variance.”

[Kong, McCullagh, Meng, Nicolae & Tan, 2003]


Xiao-Li’s MLE

“At first glance, the problem appears to be an exercise in calculus or numerical analysis, and not amenable to statistical formulation”

• use of Fisher information

• non-parametric MLE based on simulations

• comparison of sampling schemes through variances

• Rao–Blackwellised improvements by invariance constraints [Meng, 2011, IRCEM]

Persi’s Bayesian numerics

1 Larry’s constant

2 Charlie’s logistic regression

3 Xiao-Li’s MLE

4 Persi’s Bayesian numerics [P. Diaconis, 1988]

Persi’s probabilistic numerics

“We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks (...) that return uncertainties in their calculations”

[P. Hennig, M. Osborne & M. Girolami, 2015]

• use of probabilistic or stochastic arguments in the numerical resolution of (possibly) deterministic problems

• Bayesian formulation via prior information or belief about the quantity of interest

• recourse to a (usually Gaussian) process prior

• derivation of both an estimator that is identical to a numerical method (e.g., Runge–Kutta) and an uncertainty or variability around this estimator


“probabilistic numerical methods provide a coherent framework for identifying the uncertainty in calculations performed with a combination of numerical algorithms (...) potentially allowing the diagnosis (...) of error sources in computations”
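A toy Bayesian-quadrature sketch, under assumptions of my own (a squared-exponential GP prior on the integrand, integration over [0, 1], length-scale 0.2): the posterior mean of the integral is a weighted sum of a few function evaluations, and the posterior variance is precisely the advertised uncertainty evaluation.

```python
import numpy as np
from scipy.special import erf
from scipy.integrate import quad

ell = 0.2                                 # assumed GP length-scale
k = lambda a, b: np.exp(-(a[:, None] - b[None, :])**2 / (2 * ell**2))
# closed form for the kernel mean int_0^1 k(x, t) dt
kmean = lambda x: ell * np.sqrt(np.pi / 2) * (
    erf((1 - x) / (np.sqrt(2) * ell)) + erf(x / (np.sqrt(2) * ell)))

f = lambda x: np.sin(3 * x) + x           # integrand with known integral
X = np.linspace(0, 1, 8)                  # only eight evaluations
K = k(X, X) + 1e-10 * np.eye(len(X))      # jitter for numerical stability
w = np.linalg.solve(K, kmean(X))          # Bayesian-quadrature weights
I_hat = w @ f(X)                          # posterior mean of int_0^1 f

Z, _ = quad(kmean, 0, 1)                  # prior variance term int int k
post_var = Z - kmean(X) @ w               # posterior variance of the integral
truth = (1 - np.cos(3)) / 3 + 0.5
print(I_hat, np.sqrt(post_var), truth)
```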

My uncertainties

• answer to the zero variance estimator

• significance of a probability statement about a mathematicalconstant [other than epistemic]

• posteriors in functional spaces mostly reflect the choice of prior rather than information

• big world versus small worlds debate [Robins & Wasserman, 2000]

• possible incoherence of Bayesian inference in functional spaces

• unavoidable recourse to (and impact of) Bayesian prior modelling

a shift on Monte Carlo approximations

When considering integral approximation

I = \int_{\mathcal{X}} h(x) f(x) \,\text{d}x

consider first approximating the distribution of h(X) before deriving an expectation (a toy sketch follows the list below)

• non-parametric component

• fight the curse of dimensionality

• potential connections with nested sampling

• coherent incorporation of multiple estimators
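A toy illustration of the shift (a sketch of mine, leaning on the identity E[h(X)] = ∫₀^∞ P(h(X) > t) dt for h ≥ 0, which underlies nested sampling): estimate the distribution of h(X) first, then read the expectation off its survival function.

```python
import numpy as np

rng = np.random.default_rng(6)

x = rng.normal(size=100_000)
h = x**2                                  # h(X) >= 0, with E[h(X)] = 1
hs = np.sort(h)
surv = 1.0 - np.arange(1, hs.size + 1) / hs.size   # P(h(X) > t) at t = hs
# integrate the empirical (step) survival curve: it equals 1 below the
# smallest draw, then drops by 1/n at each order statistic
I_hat = hs[0] + np.sum(surv[:-1] * np.diff(hs))
print(I_hat, np.mean(h))                  # both close to 1
```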
