52
Tempered Variational Bayes ELBO maximization Consistency of VB Consistency of Variational Inference Badr-Eddine Chérief-Abdellatif Under the supervision of Pierre Alquier Second-year PhD students team days Ecole Doctorale Mathématiques Hadamard 20 May 2019 Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Consistency of Variational Inference

Badr-Eddine Chérief-AbdellatifUnder the supervision of Pierre Alquier

Second-year PhD students team daysEcole Doctorale Mathématiques Hadamard

20 May 2019

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 2: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Outline of the talk

1 Tempered Variational BayesTempered posteriorsVariational Bayes

2 ELBO maximizationMixture modelsModel selection

3 Consistency of VBTheoretical resultsEfficient algorithms

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 3: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Tempered posteriorsVariational Bayes

1 Tempered Variational BayesTempered posteriorsVariational Bayes

2 ELBO maximizationMixture modelsModel selection

3 Consistency of VBTheoretical resultsEfficient algorithms

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 4: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Tempered posteriorsVariational Bayes

Notations

Assume that we observe X1, . . . , Xn i.i.d from P0 in a modelM = {Pθ, θ ∈ Θ} associated with a likelihood Ln. We define aprior π on Θ.

The posterior

πn(dθ) ∝ Ln(θ)π(dθ).

The tempered posterior - 0 < α < 1

πn,α(dθ) ∝ [Ln(θ)]απ(dθ).

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 5: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Tempered posteriorsVariational Bayes

Notations

Assume that we observe X1, . . . , Xn i.i.d from P0 in a modelM = {Pθ, θ ∈ Θ} associated with a likelihood Ln. We define aprior π on Θ.

The posterior

πn(dθ) ∝ Ln(θ)π(dθ).

The tempered posterior - 0 < α < 1

πn,α(dθ) ∝ [Ln(θ)]απ(dθ).

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 6: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Tempered posteriorsVariational Bayes

Notations

Assume that we observe X1, . . . , Xn i.i.d from P0 in a modelM = {Pθ, θ ∈ Θ} associated with a likelihood Ln. We define aprior π on Θ.

The posterior

πn(dθ) ∝ Ln(θ)π(dθ).

The tempered posterior - 0 < α < 1

πn,α(dθ) ∝ [Ln(θ)]απ(dθ).

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 7: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Tempered posteriorsVariational Bayes

Various reasons to use a tempered posterior

Easier to sample from

G. Behrens, N. Friel & M. Hurn. (2012). Tuning tempered transitions. Statistics and Computing.

Robust to model misspecification

P. Grünwald and T. Van Ommen (2017). Inconsistency of Bayesian Inference for MisspecifiedLinear Models, and a Proposal for Repairing It. Bayesian Analysis.

Theoretical analysis easier

A. Bhattacharya, D. Pati & Y. Yang (2016). Bayesian fractional posteriors. Preprintarxiv :1611.01125.

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 8: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Tempered posteriorsVariational Bayes

Various reasons to use a tempered posterior

Easier to sample from

G. Behrens, N. Friel & M. Hurn. (2012). Tuning tempered transitions. Statistics and Computing.

Robust to model misspecification

P. Grünwald and T. Van Ommen (2017). Inconsistency of Bayesian Inference for MisspecifiedLinear Models, and a Proposal for Repairing It. Bayesian Analysis.

Theoretical analysis easier

A. Bhattacharya, D. Pati & Y. Yang (2016). Bayesian fractional posteriors. Preprintarxiv :1611.01125.

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 9: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Tempered posteriorsVariational Bayes

Various reasons to use a tempered posterior

Easier to sample from

G. Behrens, N. Friel & M. Hurn. (2012). Tuning tempered transitions. Statistics and Computing.

Robust to model misspecification

P. Grünwald and T. Van Ommen (2017). Inconsistency of Bayesian Inference for MisspecifiedLinear Models, and a Proposal for Repairing It. Bayesian Analysis.

Theoretical analysis easier

A. Bhattacharya, D. Pati & Y. Yang (2016). Bayesian fractional posteriors. Preprintarxiv :1611.01125.

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 10: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Tempered posteriorsVariational Bayes

1 Tempered Variational BayesTempered posteriorsVariational Bayes

2 ELBO maximizationMixture modelsModel selection

3 Consistency of VBTheoretical resultsEfficient algorithms

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 11: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Tempered posteriorsVariational Bayes

Variational Bayes

π̃n,α = arg minρ∈FK(ρ, πn,α)

= arg maxρ∈F

∫`n(θ)ρ(dθ)−K(ρ, π)

}.

Examples :parametric approximation

F ={N (µ,Σ) : µ ∈ Rd ,Σ ∈ S+

d

}.

mean-field approximation, Θ = Θ1 ×Θ2 and

F = {ρ : ρ(dθ) = ρ1(dθ1)× ρ2(dθ2)} .

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 12: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Tempered posteriorsVariational Bayes

Variational Bayes

π̃n,α = arg minρ∈FK(ρ, πn,α)

= arg maxρ∈F

∫`n(θ)ρ(dθ)−K(ρ, π)

}.

Examples :

parametric approximation

F ={N (µ,Σ) : µ ∈ Rd ,Σ ∈ S+

d

}.

mean-field approximation, Θ = Θ1 ×Θ2 and

F = {ρ : ρ(dθ) = ρ1(dθ1)× ρ2(dθ2)} .

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 13: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Tempered posteriorsVariational Bayes

Variational Bayes

π̃n,α = arg minρ∈FK(ρ, πn,α)

= arg maxρ∈F

∫`n(θ)ρ(dθ)−K(ρ, π)

}.

Examples :parametric approximation

F ={N (µ,Σ) : µ ∈ Rd ,Σ ∈ S+

d

}.

mean-field approximation, Θ = Θ1 ×Θ2 and

F = {ρ : ρ(dθ) = ρ1(dθ1)× ρ2(dθ2)} .

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 14: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Tempered posteriorsVariational Bayes

Variational Bayes

π̃n,α = arg minρ∈FK(ρ, πn,α)

= arg maxρ∈F

∫`n(θ)ρ(dθ)−K(ρ, π)

}.

Examples :parametric approximation

F ={N (µ,Σ) : µ ∈ Rd ,Σ ∈ S+

d

}.

mean-field approximation, Θ = Θ1 ×Θ2 and

F = {ρ : ρ(dθ) = ρ1(dθ1)× ρ2(dθ2)} .

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 15: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Mixture modelsModel selection

1 Tempered Variational BayesTempered posteriorsVariational Bayes

2 ELBO maximizationMixture modelsModel selection

3 Consistency of VBTheoretical resultsEfficient algorithms

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 16: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Mixture modelsModel selection

Mixture models

Mixture models

Pθ = Pp,θ1,...,θK =∑K

j=1 pjqθj ,

prior π : p = (p1, . . . , pK ) ∼ πp = D(α1, . . . , αK ) and theθj ’s are independent from πθ.

Tempered posterior :

Ln(θ)απ(θ) ∝

(n∏

i=1

K∑j=1

pjqθj (Xi)

πp(p)K∏j=1

πθ(θj).

Variational approximation :

π̃n,α(p, θ) = ρp(p)K∏j=1

ρj(θj).

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 17: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Mixture modelsModel selection

Mixture models

Mixture models

Pθ = Pp,θ1,...,θK =∑K

j=1 pjqθj ,prior π : p = (p1, . . . , pK ) ∼ πp = D(α1, . . . , αK ) and theθj ’s are independent from πθ.

Tempered posterior :

Ln(θ)απ(θ) ∝

(n∏

i=1

K∑j=1

pjqθj (Xi)

πp(p)K∏j=1

πθ(θj).

Variational approximation :

π̃n,α(p, θ) = ρp(p)K∏j=1

ρj(θj).

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 18: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Mixture modelsModel selection

Mixture models

Mixture models

Pθ = Pp,θ1,...,θK =∑K

j=1 pjqθj ,prior π : p = (p1, . . . , pK ) ∼ πp = D(α1, . . . , αK ) and theθj ’s are independent from πθ.

Tempered posterior :

Ln(θ)απ(θ) ∝

(n∏

i=1

K∑j=1

pjqθj (Xi)

πp(p)K∏j=1

πθ(θj).

Variational approximation :

π̃n,α(p, θ) = ρp(p)K∏j=1

ρj(θj).

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 19: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Mixture modelsModel selection

Mixture models

Mixture models

Pθ = Pp,θ1,...,θK =∑K

j=1 pjqθj ,prior π : p = (p1, . . . , pK ) ∼ πp = D(α1, . . . , αK ) and theθj ’s are independent from πθ.

Tempered posterior :

Ln(θ)απ(θ) ∝

(n∏

i=1

K∑j=1

pjqθj (Xi)

πp(p)K∏j=1

πθ(θj).

Variational approximation :

π̃n,α(p, θ) = ρp(p)K∏j=1

ρj(θj).

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 20: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Mixture modelsModel selection

ELBO maximization for mixtures

Optimization program

minρ=(ρp ,ρ1,...,ρK )

{− α

n∑i=1

∫log

( K∑j=1

pjqθj (Xi)

)ρ(dθ)

+K(ρp, πp

)+

K∑j=1

K(ρj , πj

)}

− log

( K∑j=1

pjqθj (Xi)

)= min

ωi∈SK

{−

K∑j=1

ωij log(pjqθj (Xi))

+K∑j=1

ωij log(ωi

j )

}Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 21: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Mixture modelsModel selection

Coordinate Descent algorithm

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 22: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Mixture modelsModel selection

Numerical example on Gaussian mixtures

B.-E. Chérief-Abdellatif & P. Alquier (2018). Consistency of Variational Bayes Inference forEstimation and Model Selection in Mixtures. Electronic Journal of Statistics.

Gaussian mixture∑3

j=1 pjN (θj , 1) and Gaussian prior on θj .Sample size n = 1000, we report the MAE over 10 replications.

Algo. p θ1 θ2 θ3

VBα=0.5 0.03 (0.02) 0.14 (0.30) 0.38 (1.11) 0.05 (0.05)VBα=1 0.03 (0.02) 0.14 (0.21) 0.36 (0.97) 0.06 (0.04)EM 0.03 (0.02) 0.14 (0.22) 0.36 (0.97) 0.06 (0.05)

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 23: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Mixture modelsModel selection

Numerical example on Gaussian mixtures

B.-E. Chérief-Abdellatif & P. Alquier (2018). Consistency of Variational Bayes Inference forEstimation and Model Selection in Mixtures. Electronic Journal of Statistics.

Gaussian mixture∑3

j=1 pjN (θj , 1) and Gaussian prior on θj .Sample size n = 1000, we report the MAE over 10 replications.

Algo. p θ1 θ2 θ3

VBα=0.5 0.03 (0.02) 0.14 (0.30) 0.38 (1.11) 0.05 (0.05)VBα=1 0.03 (0.02) 0.14 (0.21) 0.36 (0.97) 0.06 (0.04)EM 0.03 (0.02) 0.14 (0.22) 0.36 (0.97) 0.06 (0.05)

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 24: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Mixture modelsModel selection

1 Tempered Variational BayesTempered posteriorsVariational Bayes

2 ELBO maximizationMixture modelsModel selection

3 Consistency of VBTheoretical resultsEfficient algorithms

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 25: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Mixture modelsModel selection

Model selection

D. Blei, A. Kucukelbir & J. McAuliffe. Variational inference : A review for statisticians. JASA,2017.

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 26: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Mixture modelsModel selection

Model selection

Assume that we have a countable number of models, defineπ̃Kn,α a variational approximation of the tempered posterior in

model K :

ELBO maximization program

π̃Kn,α = arg max

ρK∈FK

∫`n(θK )ρK (dθK )−K

(ρK , πK

)}ELBO

ELBO(K ) = α

∫`n(θK )π̃K

n,α(dθK )−K(π̃Kn,α, πK )

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 27: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Mixture modelsModel selection

Model selection

Assume that we have a countable number of models, defineπ̃Kn,α a variational approximation of the tempered posterior in

model K :

ELBO maximization program

π̃Kn,α = arg max

ρK∈FK

∫`n(θK )ρK (dθK )−K

(ρK , πK

)}

ELBO

ELBO(K ) = α

∫`n(θK )π̃K

n,α(dθK )−K(π̃Kn,α, πK )

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 28: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Mixture modelsModel selection

Model selection

Assume that we have a countable number of models, defineπ̃Kn,α a variational approximation of the tempered posterior in

model K :

ELBO maximization program

π̃Kn,α = arg max

ρK∈FK

∫`n(θK )ρK (dθK )−K

(ρK , πK

)}ELBO

ELBO(K ) = α

∫`n(θK )π̃K

n,α(dθK )−K(π̃Kn,α, πK )

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 29: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Mixture modelsModel selection

ELBO criterion

Model selection criterion

K̂ = arg maxK≥1

{ELBO(K )− log

(1bK

)}

B.-E. Chérief-Abdellatif. Consistency of ELBO maximization for model selection. Proceedings ofAABI 2018.

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 30: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Mixture modelsModel selection

ELBO criterion

Model selection criterion

K̂ = arg maxK≥1

{ELBO(K )− log

(1bK

)}

B.-E. Chérief-Abdellatif. Consistency of ELBO maximization for model selection. Proceedings ofAABI 2018.

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 31: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

1 Tempered Variational BayesTempered posteriorsVariational Bayes

2 ELBO maximizationMixture modelsModel selection

3 Consistency of VBTheoretical resultsEfficient algorithms

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 32: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

Technical condition for posterior concentrationIf the model is well-specified (∃θ0 ∈ Θ, Pθ0 = P0) :

Prior mass condition for concentration of tempered posteriorsThe rate (rn) is such that

π[B(rn)] ≥ e−nrn

where B(r) = {θ ∈ Θ : K(Pθ0 ,Pθ) ≤ r}.

Prior mass condition for concentration of Variational BayesThe rate (rn) is such that there exists ρn ∈ F such that∫

K(Pθ0 ,Pθ)ρn(dθ) ≤ rn, and K(ρn, π) ≤ nrn.

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 33: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

Technical condition for posterior concentrationIf the model is well-specified (∃θ0 ∈ Θ, Pθ0 = P0) :

Prior mass condition for concentration of tempered posteriorsThe rate (rn) is such that

π[B(rn)] ≥ e−nrn

where B(r) = {θ ∈ Θ : K(Pθ0 ,Pθ) ≤ r}.

Prior mass condition for concentration of Variational BayesThe rate (rn) is such that there exists ρn ∈ F such that∫

K(Pθ0 ,Pθ)ρn(dθ) ≤ rn, and K(ρn, π) ≤ nrn.

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 34: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

Technical condition for posterior concentrationIf the model is well-specified (∃θ0 ∈ Θ, Pθ0 = P0) :

Prior mass condition for concentration of tempered posteriorsThe rate (rn) is such that

π[B(rn)] ≥ e−nrn

where B(r) = {θ ∈ Θ : K(Pθ0 ,Pθ) ≤ r}.

Prior mass condition for concentration of Variational BayesThe rate (rn) is such that there exists ρn ∈ F such that∫

K(Pθ0 ,Pθ)ρn(dθ) ≤ rn, and K(ρn, π) ≤ nrn.

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 35: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

Consistency of the variational approximation

P. Alquier & J. Ridgway (2017). Concentration of Tempered Posteriors and of their VariationalApproximations. Preprint arxiv :1706.09293.

Theorem (Alquier, Ridgway)

Under the prior mass condition, for any α ∈ (0, 1),

E[ ∫

Dα(Pθ,P0)π̃n,α(dθ)

]≤ 1 + α

1− αrn.

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 36: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

Consistency for mixture models

B.-E. Chérief-Abdellatif, P. Alquier. Consistency of Variational Bayes Inference for Estimation andModel Selection in Mixtures. Electronic Journal of Statistics, 2018.

Theorem (C.-A., Alquier)

Chose 2K≤ αj ≤ 1 and assume that estimation in (qθ)

(without mixture) at rate rn. Then

E[∫

Dα(Pp,θ1,...,θK ,Pp0,θ01 ,...,θ0K

)π̃n,α(dθ)

]≤ 1 + α

1− α2Krn.

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 37: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

Consistency of the true approximation

P. Alquier & J. Ridgway (2017). Concentration of Tempered Posteriors and of their VariationalApproximations. Preprint arxiv :1706.09293.

Theorem (Alquier, Ridgway)

If there is a true model (∃K0, ∃θ0 ∈ ΘK0 , Pθ0 = P0), thenunder the prior mass condition, for any α ∈ (0, 1),

E[ ∫

Dα(Pθ,P0)π̃K0

n,α(dθ)

]≤ 1 + α

1− αrn.

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 38: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

Consistency of the selected approximation

B.-E. Chérief-Abdellatif. Consistency of ELBO maximization for model selection. Proceedings ofAABI 2018.

Theorem (C.-A.)

If there is a true model (∃K0, ∃θ0 ∈ ΘK0 , Pθ0 = P0), thenunder the prior mass condition, for any α ∈ (0, 1),

E[ ∫

Dα(Pθ,P0)π̃K̂

n,α(dθ)

]≤ 1 + α

1− αrn +

log( 1bK0

)

n(1− α).

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 39: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

Robustness to misspecification (Gaussian mixtures)The true distribution P0 is such that E|X | < +∞.Let L ≥ 1, bK = 2−K , πK = DK (α1, . . . , αK )

⊗N (0,V2)⊗n and

rn,K =

[8K log(nK )

n

∨(8K log(nV)

n+

8KL2

nV2

)]+

K log(2)

n(1− α).

TheoremFor any α ∈ (0, 1),

E[ ∫

(Pθ,P

0)π̃K̂n,α(dθ)

]≤ inf

K≥0

1− αinf

θ∗∈SK×[−L,L]KK(P0,Pθ∗) +

1 + α

1− αrn,K

}.

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 40: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

Applications

If there is a true model (∃K0, ∃θ0 ∈ ΘK0 , Pθ0 = P0) :

Gaussian mixtures : θ = (p, (m1, σ21), ..., (mK , σ

2K ))

E[ ∫

Dα(Pθ,P0)π̃K̂

n,α(dθ)

]= O

(K0 log(nK0)

n

)

Probabilistic PCA : θ ∈ Rd×K

E[ ∫

Dα(Pθ,P0)π̃K̂

n,α(dθ)

]= O

(dK0 log(dn)

n

)

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 41: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

Applications

If there is a true model (∃K0, ∃θ0 ∈ ΘK0 , Pθ0 = P0) :

Gaussian mixtures : θ = (p, (m1, σ21), ..., (mK , σ

2K ))

E[ ∫

Dα(Pθ,P0)π̃K̂

n,α(dθ)

]= O

(K0 log(nK0)

n

)

Probabilistic PCA : θ ∈ Rd×K

E[ ∫

Dα(Pθ,P0)π̃K̂

n,α(dθ)

]= O

(dK0 log(dn)

n

)

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 42: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

Applications

If there is a true model (∃K0, ∃θ0 ∈ ΘK0 , Pθ0 = P0) :

Gaussian mixtures : θ = (p, (m1, σ21), ..., (mK , σ

2K ))

E[ ∫

Dα(Pθ,P0)π̃K̂

n,α(dθ)

]= O

(K0 log(nK0)

n

)

Probabilistic PCA : θ ∈ Rd×K

E[ ∫

Dα(Pθ,P0)π̃K̂

n,α(dθ)

]= O

(dK0 log(dn)

n

)

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 43: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

1 Tempered Variational BayesTempered posteriorsVariational Bayes

2 ELBO maximizationMixture modelsModel selection

3 Consistency of VBTheoretical resultsEfficient algorithms

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 44: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

Efficient algorithms

QuestionAre there efficient algorithms to (provably) compute π̃n,α ?

B.-E. Chérief-Abdellatif, P. Alquier & M. E. Khan. A Generalization Bound for Online VariationalInference. Preprint arXiv, 2018.

Parametric variational approximation :

F = {qµ, µ ∈ M} .

Objective : propose a way to update µt → µt+1 so that qµtleads to similar performances as the tempered posterior...

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 45: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

Efficient algorithms

QuestionAre there efficient algorithms to (provably) compute π̃n,α ?

B.-E. Chérief-Abdellatif, P. Alquier & M. E. Khan. A Generalization Bound for Online VariationalInference. Preprint arXiv, 2018.

Parametric variational approximation :

F = {qµ, µ ∈ M} .

Objective : propose a way to update µt → µt+1 so that qµtleads to similar performances as the tempered posterior...

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 46: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

Efficient algorithms

QuestionAre there efficient algorithms to (provably) compute π̃n,α ?

B.-E. Chérief-Abdellatif, P. Alquier & M. E. Khan. A Generalization Bound for Online VariationalInference. Preprint arXiv, 2018.

Parametric variational approximation :

F = {qµ, µ ∈ M} .

Objective : propose a way to update µt → µt+1 so that qµtleads to similar performances as the tempered posterior...

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 47: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

Some online strategiesAlgorithm 3 SVA (Sequential Variational Approximation)1: for t = 1, 2, . . . do2: θt = Eθ∼qµt

[θ],3: xt revealed, update

µt+1 = arg minµ∈M

[µT∇µ

t∑i=1

Eθ∼qµ[− log pθ(xi)] +K(qµ, π)

α

].

4: end for

SVB (Streaming Variational Bayes) has update

µt+1 = arg minµ∈M

[µT∇µEθ∼qµ[− log pθ(xt)] +

K(qµ, qµt )

α

].

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 48: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

Some online strategiesAlgorithm 3 SVA (Sequential Variational Approximation)1: for t = 1, 2, . . . do2: θt = Eθ∼qµt

[θ],3: xt revealed, update

µt+1 = arg minµ∈M

[µT∇µ

t∑i=1

Eθ∼qµ[− log pθ(xi)] +K(qµ, π)

α

].

4: end for

SVB (Streaming Variational Bayes) has update

µt+1 = arg minµ∈M

[µT∇µEθ∼qµ[− log pθ(xt)] +

K(qµ, qµt )

α

].

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 49: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

A regret bound for SVA

Theorem (C.-A., Alquier & Khan)

Assume that µ 7→ Eθ∼qµ[− log pθ(xt)] is L-Lipschitz andconvex.

Assume that µ 7→ K(pµ, π) is γ-strongly convex.Then SVA satisfies :

T∑t=1

[− log pθt (xt)]

≤ infµ∈M

{Eθ∼qµ

[T∑t=1

[− log pθ(xt)]

]+αL2T

γ+K(qµ, π)

α

}.

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 50: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

A regret bound for SVA

Theorem (C.-A., Alquier & Khan)

Assume that µ 7→ Eθ∼qµ[− log pθ(xt)] is L-Lipschitz andconvex. (this is for example the case as soon as thelog-likelihood is concave in θ and L-Lipschitz, and µ is alocation-scale parameter).

Assume that µ 7→ K(pµ, π) isγ-strongly convex. Then SVA satisfies :

T∑t=1

[− log pθt (xt)]

≤ infµ∈M

{Eθ∼qµ

[T∑t=1

[− log pθ(xt)]

]+αL2T

γ+K(qµ, π)

α

}.

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 51: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

A regret bound for SVA

Theorem (C.-A., Alquier & Khan)

Assume that µ 7→ Eθ∼qµ[− log pθ(xt)] is L-Lipschitz andconvex. Assume that µ 7→ K(pµ, π) is γ-strongly convex.Then SVA satisfies :

T∑t=1

[− log pθt (xt)]

≤ infµ∈M

{Eθ∼qµ

[T∑t=1

[− log pθ(xt)]

]+αL2T

γ+K(qµ, π)

α

}.

Badr-Eddine Chérief-Abdellatif Consistency of variational inference

Page 52: Consistency of Variational Inference · TemperedVariationalBayes ELBOmaximization ConsistencyofVB ConsistencyofVariationalInference Badr-EddineChérief-Abdellatif Under the supervision

Tempered Variational BayesELBO maximizationConsistency of VB

Theoretical resultsEfficient algorithms

Thank you !

Badr-Eddine Chérief-Abdellatif Consistency of variational inference