
Truncation error of a superposed gamma process in a decreasing order representation

Julyan Arbel, [email protected], www.julyanarbel.com

Inria, Mistis, Grenoble, France

Joint work with Igor Prünster (Bocconi University, Milan, Italy)

Advances in Approximate Bayesian Inference NIPS Workshop, December 9, 2016


Table of Contents

Completely random measures

Ferguson and Klass moment-based algorithm

Simulation study

Theoretical results

Application to density estimation



Bayesian nonparametric priors

Two main categories of priors depending on parameter spaces

Spaces of functions: random functions

• Stochastic processes, such as Gaussian processes

• Random basis expansions

• Random densities

• Mixtures

Spaces of probability measures: discrete random measures

• Dirichlet process ⊂ Pitman–Yor ⊂ Gibbs-type ⊂ Species sampling processes

• Completely random measures

[Image credits: Wikipedia; Brix, 1999]


Completely random measures

$\mu = \sum_{i \ge 1} J_i \delta_{Z_i}$

where the jumps $(J_i)_{i \ge 1}$ and the jump points $(Z_i)_{i \ge 1}$ are independent

Definition (Kingman, 1967)

Random measure $\mu$ such that, for all disjoint sets $A_1, \dots, A_d$, the masses $\mu(A_1), \dots, \mu(A_d)$ are mutually independent

• Independent increment processes, Lévy processes

• Popular models with applications in biology, sparse random graphs, survival analysis, machine learning, etc. Pivotal role in BNP (Lijoi and Prünster, 2010; Jordan, 2010)
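As a minimal illustration of the discrete form $\mu = \sum_i J_i \delta_{Z_i}$, the Python sketch below evaluates a finite collection of jumps on sets. The class name `DiscreteMeasure` and the toy jumps and locations are assumptions made for illustration and do not come from the slides; an actual CRM has infinitely many jumps.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DiscreteMeasure:
    """Finite discrete measure sum_i J_i * delta_{Z_i} (hypothetical helper)."""
    jumps: List[float]       # J_i > 0
    locations: List[float]   # Z_i, here points of the real line

    def mass(self, indicator: Callable[[float], bool]) -> float:
        """mu(A): sum of the jumps whose location falls in the set A."""
        return sum(j for j, z in zip(self.jumps, self.locations) if indicator(z))


# Toy values chosen for illustration only.
mu = DiscreteMeasure(jumps=[0.5, 0.3, 0.2], locations=[0.1, 0.6, 0.9])
mass_left = mu.mass(lambda z: z < 0.5)     # A1 = (-inf, 0.5)
mass_right = mu.mass(lambda z: z >= 0.5)   # A2 = [0.5, inf), disjoint from A1
total_mass = mu.mass(lambda z: True)       # mu(X)
```

The masses of disjoint sets add up to the total mass. The complete randomness in Kingman's definition is a distributional property of the random jumps and locations, which a single fixed realization like this one cannot show.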



Ferguson and Klass algorithm and goal

• Jumps in decreasing order in $\mu = \sum_{j=1}^{\infty} J_j \delta_{Z_j}$

• → Minimal error at threshold $M$: $\mu(X) - \mu_M(X) = \sum_{j=M+1}^{\infty} J_j$

• BNPdensity R package on CRAN, for F & K mixtures of normalized CRMs

Algorithm 1 Ferguson and Klass algorithm

1: sample $\xi_j \sim \mathrm{PP}$ for $j = 1, \dots, M$
2: define $J_j = N^{-1}(\xi_j)$ for $j = 1, \dots, M$
3: sample $Z_j \sim P_0$ for $j = 1, \dots, M$
4: approximate $\mu$ by $\mu_M = \sum_{j=1}^{M} J_j \delta_{Z_j}$
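The four steps of Algorithm 1 can be sketched in Python. This sketch rests on several assumptions. It takes PP to be a unit-rate Poisson process, as in the standard Ferguson and Klass construction. It uses the stable case (generalized gamma with $\theta = 0$), where the tail intensity $N(v) = a v^{-\gamma} / (\gamma \Gamma(1 - \gamma))$ can be inverted in closed form. Other processes generally need numerical inversion. The function name `ferguson_klass_stable` and the uniform base measure $P_0$ on $[0, 1]$ are illustrative choices, not taken from the slides or from BNPdensity.

```python
import math
import random


def ferguson_klass_stable(M, gamma=0.5, a=1.0, rng=None):
    """Return the M largest jumps (in decreasing order) and their locations.

    Assumes the stable tail N(v) = a * v**(-gamma) / (gamma * Gamma(1 - gamma))
    and a uniform base measure P0 on [0, 1]; both are illustrative choices.
    """
    rng = rng or random.Random()
    c = a / (gamma * math.gamma(1.0 - gamma))      # N(v) = c * v**(-gamma)
    xi = 0.0
    jumps, locations = [], []
    for _ in range(M):
        xi += rng.expovariate(1.0)                 # step 1: next unit-rate Poisson point
        jumps.append((xi / c) ** (-1.0 / gamma))   # step 2: J_j = N^{-1}(xi_j)
        locations.append(rng.random())             # step 3: Z_j ~ P0
    return jumps, locations                        # step 4: mu_M = sum_j J_j delta_{Z_j}


jumps, locations = ferguson_klass_stable(M=20, rng=random.Random(0))
```

Because the points $\xi_j$ increase and $N^{-1}$ is decreasing, the jumps come out sorted from largest to smallest, which is what makes the truncation error minimal for a given $M$.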

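[Figure: the jumps $J_5 < J_4 < \dots < J_1$ on the horizontal axis, plotted against the Poisson process points $\xi_1, \dots, \xi_5$ on the vertical axis.]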


Moment matching

Assessing the error of truncation at threshold M

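$T_M = \mu(X) - \mu_M(X) = \sum_{j=M+1}^{\infty} J_j$

Relative error index:

$e_M = E_{FK}\left[ \frac{J_M}{\sum_{j=1}^{M} J_j} \right]$

Moment-based index:

$\ell_M = \left( \frac{1}{K} \sum_{n=1}^{K} \left( m_n^{1/n} - \hat{m}_n^{1/n} \right)^2 \right)^{1/2}$

where $m_n$ is the $n$-th moment of the total mass $\mu(X)$ and $\hat{m}_n$ its empirical counterpart computed from the sampled $\mu_M(X)$.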
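Both indices are simple to compute once truncated samples are available. The Python sketch below is an illustration under assumptions, not the BNPdensity implementation. The function names are hypothetical. `relative_error` handles a single draw, so $e_M$ would be its average over Ferguson and Klass draws. `gamma_total_mass_moments` assumes a gamma process whose total mass is Gamma$(a, 1)$ distributed, so that $m_n = a(a+1)\cdots(a+n-1)$.

```python
import math


def relative_error(jumps):
    """J_M / sum_{j<=M} J_j for one draw; e_M is the average over draws."""
    return jumps[-1] / sum(jumps)


def moment_index(total_masses, theoretical_moments):
    """ell_M from samples of mu_M(X) and the first K moments of mu(X)."""
    K = len(theoretical_moments)
    squared = 0.0
    for n, m_n in enumerate(theoretical_moments, start=1):
        # Empirical n-th moment of the truncated total mass.
        m_hat = sum(t ** n for t in total_masses) / len(total_masses)
        squared += (m_n ** (1.0 / n) - m_hat ** (1.0 / n)) ** 2
    return math.sqrt(squared / K)


def gamma_total_mass_moments(a, K):
    """Moments of a Gamma(a, 1) total mass: m_n = a (a + 1) ... (a + n - 1)."""
    moments, m = [], 1.0
    for n in range(K):
        m *= a + n
        moments.append(m)
    return moments


# Toy inputs chosen so the values can be checked by hand.
ell_exact = moment_index([0.5, 1.5], [1.0, 1.25])   # empirical moments match
ell_off = moment_index([2.0], [1.0])                # first moment off by 1
```

Taking the $n$-th root of each moment puts all $K$ terms on the scale of the total mass itself, so that no single high-order moment dominates the index.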

Examples of completely random measures

• Generalized gamma process by Brix (1999), γ ∈ [0, 1), θ ≥ 0

• Superposed gamma process by Regazzini et al. (2003), η ∈ ℕ

• Stable-beta process by Teh and Görür (2009), σ ∈ [0, 1), c > −σ


Moment matching

[Figure, two panels: relative error index $e_M$ (left) and moment-based index $\ell_M$ (right) against $M$ from 0 to 25, for the generalized gamma process (GG) with $a = 1$ and $\gamma = 0, 0.25, 0.5, 0.75$.]


Evaluation of error on functionals

Functional of interest: the total mass, with criterion $\Delta_1 = |\mu(X) - \mu_M(X)|$. Define $\Delta_k$ similarly for higher-order moments of the total mass.

[Figure: axes $e_M$ and $\ell_M$, with curves for $\Delta_1 = 0.1$, $\Delta_1 = 0.05$, $\Delta_4 = 0.1$ and $\Delta_4 = 0.05$, and $\gamma = 0, 0.25, 0.5, 0.75$.]


Moment matching

Reverse moment index: $M(\ell) = M \Leftrightarrow \ell_M = \ell$

Number of jumps $M$ needed to achieve a given precision, here $\ell = 10\%$
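Since $M$ is an integer, one way to read the reverse index in practice is as the smallest $M$ whose index falls at or below the target. The Python sketch below follows that reading, which is an assumption rather than the slides' definition. The function name `reverse_index`, the cap `max_jumps` and the toy curve $1/M$ are all hypothetical.

```python
def reverse_index(index_at, target, max_jumps=10_000):
    """Smallest M in 1..max_jumps with index_at(M) <= target, else None."""
    for M in range(1, max_jumps + 1):
        if index_at(M) <= target:
            return M
    return None


# Toy index curve, decreasing in M (illustration only, not a real ell_M).
M_needed = reverse_index(lambda M: 1.0 / M, target=0.1)
```

In an actual study `index_at` would wrap a Monte Carlo estimate of $\ell_M$, so the returned $M$ inherits its sampling noise.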

[Figure: $M(\ell)$, on a vertical axis from 0 to 150, as a function of $a$ and $\gamma$.]


Posterior moment match

Theorem (James et al., 2009, Teh and Görür, 2009)

In mixture models with normalized generalized gamma (left) and the Indian buffet process based on the stable beta process (right), the posterior distribution of $\mu$ is essentially (conditional on some latent variables)

$\mu^* + \sum_{j=1}^{k} J_j^* \delta_{Y_j^*}$

[Figure, two panels: moment-based index $\ell_M$ against $M$ from 0 to 25. Left: IG with $\gamma = 0.5$, $a = 1$, for the prior and for $k = 1, 3, 10$. Right: SBP with $\sigma = 0.5$, $a = 1$, $c = 1$, for the prior and for $n = 5, 10, 20$.]


Posterior moment match

Posterior of inverse-Gaussian process: $\mu^* + \sum_{j=1}^{k} J_j^* \delta_{Y_j^*}$

$E\left( \sum_{j=1}^{k} J_j^* \right) / E\left( \mu^*(X) \right)$

k \ n    10     30     100
1        3.34   7.30   13.50
n^γ      2.65   4.68   6.05
n        0.89   0.98   0.99

[Figure: surface plot over $n$ and $k$.]

Theorem (Arbel, De Blasi, Prünster, 2015)

Denote by $P_0$ the true data distribution. In the NRMI model with prior guess $P^*$, the posterior of $P$ converges weakly to $P_\infty$:

• if $P_0$ is discrete, then $P_\infty = P_0$

• if $P_0$ is diffuse, then $P_\infty = \sigma P^* + (1 - \sigma) P_0$


Bounding TM in probability

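Proposition (Arbel and Prünster, 2016, Brix, 1999)

Let $T_M$ be the truncation error for the Generalized Gamma or the Stable Beta Process.

Then for any $\varepsilon \in (0, 1)$,

$P\left( T_M \le t_M^{\varepsilon} \right) \ge 1 - \varepsilon$

for

$t_M^{\varepsilon} \lesssim \begin{cases} e^{-CM} & \text{if } \sigma = 0, \\ \dfrac{1}{M^{1/\sigma - 1}} & \text{if } \sigma \ne 0, \end{cases}$

with ugly explicit constants depending on $\varepsilon$, $\gamma$, $\theta$, $\sigma$ and $c$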
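To get a feel for the two regimes, the Python sketch below evaluates only the order of the bound, ignoring the explicit constants. The function name `bound_rate` and the value $C = 1$ are placeholders assumed for illustration, since the slides do not give the constants.

```python
import math


def bound_rate(M, sigma, C=1.0):
    """Order of t_M^eps up to constants: exp(-C M) if sigma == 0, else M**-(1/sigma - 1)."""
    if sigma == 0:
        return math.exp(-C * M)             # exponential decay
    return M ** (-(1.0 / sigma - 1.0))      # polynomial decay


# Rates at a few truncation levels for the two regimes.
rates = {sigma: [bound_rate(M, sigma) for M in (5, 10, 25)] for sigma in (0.0, 0.5)}
```

With $\sigma = 0.5$ the rate is $1/M$, so reaching a given precision takes far more jumps than in the exponential case $\sigma = 0$.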

[Figure: bound $\tilde{t}_M^{\varepsilon}$ against $M$ from 0 to 25 for the SBP with $a = 1$, $c = 1$, for $\sigma = 0$ and $\sigma = 0.5$.]


Density estimation

Mixtures of normalized random measures with independent increments

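$Y_i \mid \mu_i, \sigma_i \overset{\text{ind}}{\sim} k(\cdot \mid \mu_i, \sigma_i), \quad i = 1, \dots, n,$

$(\mu_i, \sigma_i) \mid P \overset{\text{iid}}{\sim} P, \quad i = 1, \dots, n,$

$P \sim \text{NRMI}.$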
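The hierarchy above can be simulated from the prior by normalizing truncated Ferguson and Klass jumps. The Python sketch below is a rough illustration under assumptions, not the BNPdensity sampler. It assumes a normal kernel $k$, a stable CRM with closed-form jump inversion, and a made-up base measure for the atoms $(\mu_j, \sigma_j)$. The function name `sample_nrmi_mixture` is hypothetical.

```python
import math
import random


def sample_nrmi_mixture(n, M=50, gamma=0.5, a=1.0, rng=None):
    """Draw Y_1..Y_n from a normal mixture over a truncated, normalized CRM."""
    rng = rng or random.Random()
    c = a / (gamma * math.gamma(1.0 - gamma))      # stable tail: N(v) = c * v**(-gamma)
    xi, jumps, atoms = 0.0, [], []
    for _ in range(M):
        xi += rng.expovariate(1.0)
        jumps.append((xi / c) ** (-1.0 / gamma))   # Ferguson and Klass jump
        # Hypothetical base measure P0 for the atom (mu_j, sigma_j).
        atoms.append((rng.gauss(0.0, 3.0), rng.uniform(0.5, 1.5)))
    total = sum(jumps)
    weights = [j / total for j in jumps]           # P = mu_M / mu_M(X)
    data = []
    for _ in range(n):
        # (mu_i, sigma_i) | P ~ P, then Y_i ~ k(. | mu_i, sigma_i)
        mu_i, sigma_i = rng.choices(atoms, weights=weights, k=1)[0]
        data.append(rng.gauss(mu_i, sigma_i))
    return weights, atoms, data


weights, atoms, data = sample_nrmi_mixture(n=100, rng=random.Random(1))
```

The truncation level `M` enters only through the weights, which is why the quality of the retained jumps carries over directly to the estimated density.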

Galaxy dataset. Kolmogorov–Smirnov distance $d_{KS}(F_{\ell_M}, F_{e_M})$ between estimated cdfs $F_{\ell_M}$ and $F_{e_M}$ under, respectively, the moment-match (with $\ell_M = 0.01$) and the relative error (with $e_M = 0.1, 0.05, 0.01$) criteria.

γ       e_M = 0.1   e_M = 0.05   e_M = 0.01
0       19.4        15.5         9.2
0.25    31.3        23.7         15.1
0.5     42.4        28.9         18.3
0.75    64.8        41.0         23.2
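The Kolmogorov–Smirnov distance used in this comparison is the largest vertical gap between two cdfs. The Python sketch below approximates it on a grid. The empirical cdfs of two toy samples stand in for the estimated mixture cdfs, which is an assumption made to keep the example self-contained. The helper names `ecdf` and `ks_distance` are hypothetical.

```python
import bisect


def ecdf(sample):
    """Empirical cdf of a sample, returned as a function of x."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n


def ks_distance(cdf_a, cdf_b, grid):
    """Largest |cdf_a(x) - cdf_b(x)| over the grid (approximates the supremum)."""
    return max(abs(cdf_a(x) - cdf_b(x)) for x in grid)


# Toy samples: the second is the first shifted by one unit.
F_a = ecdf([1.0, 2.0, 3.0, 4.0])
F_b = ecdf([2.0, 3.0, 4.0, 5.0])
grid = [0.5 * i for i in range(13)]   # 0.0, 0.5, ..., 6.0
d_ks = ks_distance(F_a, F_b, grid)
```

A finer grid tightens the approximation when the cdfs are smooth, as they would be for estimated mixture densities.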


Discussion

• Methodology based on moments for assessing quality of approximation in the Ferguson and Klass algorithm, a conditional algorithm

• Should be preferred to relative error

• All-purpose criterion: validates the samples of a CRM rather than a transformation of it

• Going to be included in a new release of BNPdensity R package

• Future work: compare L1 type bounds (Ishwaran and James, 2001) in the Ferguson & Klass context and in size biased settings (see the review by Campbell et al., 2016)

For more details and for extensive numerical illustrations:

Arbel and Prünster (2016). A moment-matching Ferguson and Klass algorithm. Statistics and Computing. arXiv:1606.02566

References


Julyan Arbel and Igor Prünster (2016). A moment-matching Ferguson and Klass algorithm. Statistics and Computing.

Anders Brix (1999). Generalized gamma measures and shot-noise Cox processes. Advances in Applied Probability, pages 929–953.

Trevor Campbell, Jonathan H. Huggins, Jonathan How, and Tamara Broderick (2016). Truncated random measures. arXiv preprint arXiv:1603.00861.

Thomas S. Ferguson and Michael J. Klass (1972). A representation of independent increment processes without Gaussian components. The Annals of Mathematical Statistics, 43(5):1634–1643.

H. Ishwaran and L. F. James (2001). Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96(453):161–173.

Michael I. Jordan (2010). Hierarchical models, nested models and completely random measures. Frontiers of Statistical Decision Making and Bayesian Analysis: in Honor of James O. Berger. New York: Springer.

John Kingman (1967). Completely random measures. Pacific Journal of Mathematics, 21(1):59–78.

Antonio Lijoi and Igor Prünster (2010). Models beyond the Dirichlet process. Bayesian nonparametrics, 28:80.

Eugenio Regazzini, Antonio Lijoi, and Igor Prünster (2003). Distributional results for means of normalized random measures with independent increments. Ann. Statist., 31(2):560–585.

Yee W. Teh and Dilan Görür (2009). Indian buffet processes with power-law behavior. In Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1838–1846. Curran Associates, Inc.