Truncation error of a superposed gamma process in a decreasing order representation
[email protected], www.julyanarbel.com
Inria, Mistis, Grenoble, France
Joint work with Igor Prunster (Bocconi University, Milan, Italy)
Advances in Approximate Bayesian Inference NIPS Workshop, December 9, 2016
Completely random measures Ferguson & Klass moments algo Simulation study Theoretical results Application to density estimation
Table of Contents
Completely random measures
Ferguson and Klass moment-based algorithm
Simulation study
Theoretical results
Application to density estimation
Bayesian nonparametric priors
Two main categories of priors depending on parameter spaces
Spaces of functions: random functions
• Stochastic processes, such as Gaussian processes
• Random basis expansions
• Random densities
• Mixtures
Spaces of probability measures: discrete random measures
• Dirichlet process ⊂ Pitman–Yor ⊂ Gibbs-type ⊂ Species sampling processes
• Completely random measures
[Wikipedia] [Brix, 1999]
Completely random measures
$\mu = \sum_{i \ge 1} J_i \delta_{Z_i}$
where the jumps $(J_i)_{i \ge 1}$ and the jump points $(Z_i)_{i \ge 1}$ are independent
Definition (Kingman, 1967)
A random measure $\mu$ such that, for all disjoint sets $A_1, \dots, A_d$, the random variables $\mu(A_1), \dots, \mu(A_d)$ are mutually independent
• Independent Increment Processes, Levy processes
• Popular models with applications in biology, sparse random graphs, survival analysis, machine learning, etc. Pivotal role in BNP (Lijoi and Prunster, 2010; Jordan, 2010)
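Ferguson and Klass algorithm and goal
• Jumps in decreasing order in $\mu = \sum_{j=1}^{\infty} J_j \delta_{Z_j}$
• → Minimal error at threshold $M$: $\mu(\mathbb{X}) - \mu_M(\mathbb{X}) = \sum_{j=M+1}^{\infty} J_j$
• BNPdensity R package on CRAN, for F & K mixtures of normalized CRMs
Algorithm 1 Ferguson and Klass algorithm
1: sample $\xi_j \sim \mathrm{PP}$ for $j = 1, \dots, M$
2: define $J_j = N^{-1}(\xi_j)$ for $j = 1, \dots, M$
3: sample $Z_j \sim P_0$ for $j = 1, \dots, M$
4: approximate $\mu$ by $\mu_M = \sum_{j=1}^{M} J_j \delta_{Z_j}$
Here PP denotes a unit-rate Poisson process on $(0, \infty)$ with points $\xi_1 < \xi_2 < \cdots$, $N$ is the tail of the Levy intensity of $\mu$, and $P_0$ is the base distribution of the jump points.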
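The four steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the BNPdensity implementation. It assumes a σ-stable Levy intensity, chosen only because its tail $N(v) = v^{-\sigma} / \Gamma(1 - \sigma)$ inverts in closed form, and a Uniform(0, 1) base distribution $P_0$. The function names are hypothetical. For the gamma or generalized gamma process, the inversion in step 2 would have to be done numerically.

```python
import math
import random


def stable_tail_inverse(xi, sigma):
    """Inverse of N(v) = v**(-sigma) / Gamma(1 - sigma), the tail of a
    sigma-stable Levy intensity (illustrative assumption)."""
    return (xi * math.gamma(1.0 - sigma)) ** (-1.0 / sigma)


def ferguson_klass(M, sigma=0.5, seed=None):
    """Return the M largest jumps (in decreasing order) and their locations."""
    rng = random.Random(seed)
    xi = 0.0
    jumps, locations = [], []
    for _ in range(M):
        # Step 1: next point of a unit-rate Poisson process on (0, inf),
        # built by adding independent Exp(1) waiting times.
        xi += rng.expovariate(1.0)
        # Step 2: J_j = N^{-1}(xi_j); increasing xi gives decreasing jumps.
        jumps.append(stable_tail_inverse(xi, sigma))
        # Step 3: Z_j ~ P0, here Uniform(0, 1) as a placeholder.
        locations.append(rng.random())
    # Step 4: mu_M is the measure sum_j jumps[j] * delta_{locations[j]}.
    return jumps, locations


jumps, locations = ferguson_klass(M=10, sigma=0.5, seed=1)
```

Because the Poisson points are generated in increasing order, the jumps come out sorted without any extra sorting step, which is what makes the truncation at $M$ keep the $M$ largest jumps.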
[Figure: Ferguson and Klass construction, mapping the Poisson process points $\xi_1, \dots, \xi_5$ to the jumps $J_1, \dots, J_5$.]
Moment matching
Assessing the error of truncation at threshold M
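$T_M = \mu(\mathbb{X}) - \mu_M(\mathbb{X}) = \sum_{j=M+1}^{\infty} J_j$
Relative error index:
$e_M = E_{FK}\left[ \frac{J_M}{\sum_{j=1}^{M} J_j} \right]$
Moment-based index:
$\ell_M = \left( \frac{1}{K} \sum_{n=1}^{K} \left( m_n^{1/n} - \hat{m}_n^{1/n} \right)^2 \right)^{1/2}$
where $m_n$ is the $n$-th moment of the total mass $\mu(\mathbb{X})$ and $\hat{m}_n$ is its empirical counterpart computed from the truncated samples.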
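Both indices are simple to compute once samples are available. The sketch below assumes that the theoretical moments $m_1, \dots, m_K$ are supplied by the user and that each sample is a list of jumps in decreasing order. The function names and the toy inputs are hypothetical and are not simulation output.

```python
import math


def moment_index(theoretical_moments, total_mass_samples):
    """ell_M: root mean squared gap between the n-th roots of the
    theoretical and empirical moments of the total mass, n = 1..K."""
    K = len(theoretical_moments)
    n_samples = len(total_mass_samples)
    squared_gaps = 0.0
    for n, m_n in enumerate(theoretical_moments, start=1):
        # Empirical n-th moment of the truncated total mass.
        m_hat_n = sum(t ** n for t in total_mass_samples) / n_samples
        squared_gaps += (m_n ** (1.0 / n) - m_hat_n ** (1.0 / n)) ** 2
    return math.sqrt(squared_gaps / K)


def relative_error_index(jump_samples):
    """e_M: Monte Carlo average of J_M / (J_1 + ... + J_M), where each
    sample lists its jumps in decreasing order (smallest jump last)."""
    ratios = [jumps[-1] / sum(jumps) for jumps in jump_samples]
    return sum(ratios) / len(ratios)


# Toy inputs, made up for illustration only.
ell = moment_index([2.0, 4.0], [1.0, 1.0])
e = relative_error_index([[3.0, 2.0, 1.0], [2.0, 1.0, 1.0]])
```

Taking the $n$-th root puts all $K$ moments on the scale of the total mass itself, so that high-order moments do not dominate the sum.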
Examples of completely random measures
• Generalized gamma process by Brix (1999), γ ∈ [0, 1), θ ≥ 0
• Superposed gamma process by Regazzini et al. (2003), η ∈ N
• Stable-beta process by Teh and Gorur (2009), σ ∈ [0, 1), c > −σ
Moment matching
[Figure, two panels: relative error index $e_M$ (left) and moment-based index $\ell_M$ (right) against the truncation level $M$ from 0 to 25, for the generalized gamma process (GG) with $a = 1$ and $\gamma = 0, 0.25, 0.5, 0.75$.]
Evaluation of error on functionals
Functional of interest: the total mass, with criterion $\Delta_1 = |\mu(\mathbb{X}) - \mu_M(\mathbb{X})|$
Define $\Delta_k$ similarly for higher-order moments of the total mass
[Figure: $e_M$ and $\ell_M$ plotted against each other for the criteria $\Delta_1 = 0.1$, $\Delta_1 = 0.05$, $\Delta_4 = 0.1$ and $\Delta_4 = 0.05$, with $\gamma = 0, 0.25, 0.5, 0.75$.]
Moment matching
Reverse moment index: $M(\ell) = M \leftrightarrow \ell_M = \ell$
Number of jumps $M$ needed to achieve a given precision, here $\ell = 10\%$
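In practice the reverse index can be found by scanning the truncation level until the moment-based index falls below the target. The sketch below assumes a user-supplied function returning $\ell_M$ for a given $M$. The names and the toy decay curve are hypothetical stand-ins, not values from the simulation study.

```python
def reverse_moment_index(ell_of_M, target, M_max=10_000):
    """Smallest M <= M_max with ell_of_M(M) <= target, or None if the
    target precision is not reached."""
    for M in range(1, M_max + 1):
        if ell_of_M(M) <= target:
            return M
    return None


def toy_ell(M):
    # Hypothetical decay curve standing in for an estimated ell_M.
    return 0.6 / M


M_needed = reverse_moment_index(toy_ell, target=0.10)
```

A linear scan is used rather than bisection because a Monte Carlo estimate of $\ell_M$ need not be exactly monotone in $M$.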
[Figure: surface plot of $M(\ell)$ as a function of $a$ and $\gamma$.]
Posterior moment match
Theorem (James et al., 2009; Teh and Gorur, 2009)
In mixture models with normalized generalized gamma (left) and the Indian buffet process based on the stable beta process (right), the posterior distribution of $\mu$ is essentially (conditional on some latent variables)
$\mu^* + \sum_{j=1}^{k} J_j^* \delta_{Y_j^*}$
[Figure, two panels: moment-based index $\ell_M$ against $M$ from 0 to 25. Left: IG with $\gamma = 0.5$, $a = 1$, for the prior and for $k = 1, 3, 10$. Right: SBP with $\sigma = 0.5$, $a = 1$, $c = 1$, for the prior and for $n = 5, 10, 20$.]
Posterior moment match
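Posterior of inverse-Gaussian process: $\mu^* + \sum_{j=1}^{k} J_j^* \delta_{Y_j^*}$

Ratio $E\left( \sum_{j=1}^{k} J_j^* \right) / E\left( \mu^*(\mathbb{X}) \right)$:

k \ n     10      30      100
1         3.34    7.30    13.50
n^γ       2.65    4.68    6.05
n         0.89    0.98    0.99

[Figure: surface plot with axes $n$ and $k$.]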
Theorem (Arbel, De Blasi, Prunster, 2015)
Denote by $P_0$ the true data distribution. In the NRMI model with prior guess $P^*$, the posterior of $P$ converges weakly to $P_\infty$:
• if $P_0$ is discrete, then $P_\infty = P_0$
• if $P_0$ is diffuse, then $P_\infty = \sigma P^* + (1 - \sigma) P_0$
Bounding TM in probability
Proposition (Arbel and Prunster, 2016, Brix, 1999)
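Let $T_M$ be the truncation error for the generalized gamma or the stable-beta process.
Then for any $\epsilon \in (0, 1)$,
$P\left( T_M \le t_M^{\epsilon} \right) \ge 1 - \epsilon$
for
$t_M^{\epsilon} \lesssim \begin{cases} e^{-CM} & \text{if } \sigma = 0, \\ \dfrac{1}{M^{1/\sigma - 1}} & \text{if } \sigma \neq 0, \end{cases}$
with ugly explicit constants depending on $\epsilon$, $\gamma$, $\theta$, $\sigma$ and $c$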
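A back-of-the-envelope reading of the two rates: the calculation below treats the bound as exact up to a multiplicative constant (an assumption made only for illustration, since the constants are not given here) and asks how much $M$ must grow to shrink the bound by a factor of 10.

```latex
% Polynomial regime, sigma != 0: t_M^eps is proportional to M^{-(1-sigma)/sigma}
\frac{t_{M'}^{\epsilon}}{t_{M}^{\epsilon}} = \frac{1}{10}
\iff \left( \frac{M'}{M} \right)^{(1-\sigma)/\sigma} = 10
\iff M' = 10^{\sigma/(1-\sigma)} \, M
% sigma = 1/4: M' is about 2.15 M;  sigma = 1/2: M' = 10 M;  sigma = 3/4: M' = 1000 M

% Exponential regime, sigma = 0: t_M^eps is proportional to e^{-CM}
e^{-C (M' - M)} = \frac{1}{10}
\iff M' = M + \frac{\ln 10}{C}
```

So for $\sigma = 0$ a fixed number of extra jumps buys each additional digit of precision, whereas for $\sigma \neq 0$ the number of jumps must be multiplied by a factor that grows quickly as $\sigma$ approaches 1.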
[Figure: bound $\tilde{t}_M^{\epsilon}$ against $M$ from 0 to 25, for the SBP with $a = 1$, $c = 1$ and $\sigma = 0$, $\sigma = 0.5$.]
Density estimation
Mixtures of normalized random measures with independent increments
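$Y_i \mid \mu_i, \sigma_i \overset{\text{ind}}{\sim} k(\cdot \mid \mu_i, \sigma_i), \quad i = 1, \dots, n,$
$(\mu_i, \sigma_i) \mid P \overset{\text{iid}}{\sim} P, \quad i = 1, \dots, n,$
$P \sim \text{NRMI}$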
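To make the hierarchy concrete, here is a minimal sketch that draws data from such a mixture, with $P$ approximated by a normalized, truncated CRM. Everything specific is an assumption made for illustration: the jumps are σ-stable (so that the Ferguson and Klass inversion is closed form), the kernel $k$ is Gaussian, and the base distribution of the atoms $(\mu, \sigma)$ is a hypothetical Normal(0, 3) times Uniform(0.5, 1.5). It is not the BNPdensity implementation.

```python
import math
import random


def sample_mixture(n, M=50, sigma_stable=0.5, seed=None):
    """Draw n observations from a Gaussian mixture whose mixing measure is
    a normalized CRM truncated to its M largest jumps."""
    rng = random.Random(seed)
    xi, jumps, atoms = 0.0, [], []
    for _ in range(M):
        # Ferguson and Klass jumps for a sigma-stable intensity (assumption).
        xi += rng.expovariate(1.0)
        jumps.append((xi * math.gamma(1.0 - sigma_stable)) ** (-1.0 / sigma_stable))
        # Atom (mu, sigma) drawn from a hypothetical base distribution.
        atoms.append((rng.gauss(0.0, 3.0), rng.uniform(0.5, 1.5)))
    total = sum(jumps)
    # Normalization step: P puts weight J_j / sum_j J_j on atom j.
    weights = [j / total for j in jumps]
    # (mu_i, sigma_i) | P are iid from P, then Y_i ~ Normal(mu_i, sigma_i).
    params = rng.choices(atoms, weights=weights, k=n)
    data = [rng.gauss(mu, sd) for mu, sd in params]
    return data, params, weights


data, params, weights = sample_mixture(n=100, M=50, seed=0)
```

Since $P$ is discrete, several observations share the same atom $(\mu_i, \sigma_i)$, which is what induces clustering in the mixture.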
Galaxy dataset. Kolmogorov–Smirnov distance $d_{KS}(F_{\ell_M}, F_{e_M})$ between estimated cdfs $F_{\ell_M}$ and $F_{e_M}$ under, respectively, the moment-match (with $\ell_M = 0.01$) and the relative error (with $e_M = 0.1, 0.05, 0.01$) criteria.

γ       e_M = 0.1    e_M = 0.05    e_M = 0.01
0       19.4         15.5          9.2
0.25    31.3         23.7          15.1
0.5     42.4         28.9          18.3
0.75    64.8         41.0          23.2
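The Kolmogorov–Smirnov distance between two estimated cdfs can be approximated by taking the largest absolute gap on a fine grid. The sketch below assumes that both cdfs are available as Python callables. The two Gaussian cdfs are stand-ins chosen for illustration and are unrelated to the galaxy data or to the values in the table.

```python
import math


def ks_distance_on_grid(F, G, grid):
    """Grid approximation of sup_x |F(x) - G(x)| for two cdfs F and G."""
    return max(abs(F(x) - G(x)) for x in grid)


def normal_cdf(mean, sd):
    """Return the cdf of a Normal(mean, sd) distribution as a callable."""
    return lambda x: 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# Evenly spaced grid on [-5, 5] with step 0.01.
grid = [-5.0 + 0.01 * i for i in range(1001)]
d = ks_distance_on_grid(normal_cdf(0.0, 1.0), normal_cdf(1.0, 1.0), grid)
```

The grid version only gives a lower bound on the true supremum, so the grid should be fine enough relative to the smoothness of the two cdfs.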
Discussion
• Methodology based on moments for assessing the quality of approximation in the Ferguson and Klass algorithm, a conditional algorithm
• Should be preferred to the relative error index
• All-purpose criterion: validates the samples of a CRM rather than a transformation of it
• Going to be included in a new release of the BNPdensity R package
• Future work: compare $L^1$-type bounds (Ishwaran and James, 2001) in the Ferguson & Klass context and in size-biased settings (see the review by Campbell et al., 2016)
For more details and for extensive numerical illustrations:
Arbel and Prunster (2016). A moment-matching Ferguson and Klass algorithm. Statistics and Computing. arXiv:1606.02566
References
Julyan Arbel and Igor Prunster (2016). A moment-matching Ferguson and Klass algorithm. Statistics and Computing.
Anders Brix (1999). Generalized gamma measures and shot-noise Cox processes. Advances in Applied Probability, pages 929–953.
Trevor Campbell, Jonathan H Huggins, Jonathan How, and Tamara Broderick (2016). Truncated random measures. arXiv preprint arXiv:1603.00861.
Thomas S Ferguson and Michael J Klass (1972). A representation of independent increment processes without Gaussian components. The Annals of Mathematical Statistics, 43(5):1634–1643.
H. Ishwaran and L.F. James (2001). Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96(453):161–173.
Michael I Jordan (2010). Hierarchical models, nested models and completely random measures. Frontiers of Statistical Decision Making and Bayesian Analysis: in Honor of James O. Berger. New York: Springer.
John Kingman (1967). Completely random measures. Pacific Journal of Mathematics, 21(1):59–78.
Antonio Lijoi and Igor Prunster (2010). Models beyond the Dirichlet process. Bayesian nonparametrics, 28:80.
Eugenio Regazzini, Antonio Lijoi, and Igor Prunster (2003). Distributional results for means of normalized random measures with independent increments. Ann. Statist., 31(2):560–585.
Yee W. Teh and Dilan Gorur (2009). Indian buffet processes with power-law behavior. In Y. Bengio, D. Schuurmans, J.D. Lafferty, C.K.I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1838–1846. Curran Associates, Inc.