Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Projected Stein variational Newton:A fast and scalable Bayesian inference method in high dimensions
Peng ChenKeyi Wu, Joshua Chen, Thomas OLeary-Roseberry, Omar Ghattas
Oden Institute for Computational Engineering and SciencesThe University of Texas at Austin
RICAM Workshop on Optimization and Inversion under Uncertainty
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 1 / 65
Example: inversion in Antarctica ice sheet flow
uncertain parameter: basal sliding field in boundary conditionforward model: viscous, shear-thinning, incompressible fluid
−∇ · (η(u)(∇u +∇uT)− Ip) = ρg∇ · u = 0
data: (InSAR) satellite observation of surface ice flow velocity
T. Isaac, N. Petra, G. Stadler, O. Ghattas, JCP, 2015Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 2 / 65
Outline
1 Bayesian inversion
2 Stein variational methods
3 Projected Stein variational methods
4 Stein variational reduced basis methods
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 3 / 65
Outline
1 Bayesian inversion
2 Stein variational methods
3 Projected Stein variational methods
4 Stein variational reduced basis methods
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 4 / 65
Uncertainty parametrization
Example I: Karhunen–Loève expansionKarhunen–Loève expansion for β with mean β̄ and covariance C
β(x, θ) = β̄(x) +∑j≥1
√λjψj(x) θj,
(λj, ψj)j≥1: eigenpairs of a covariance C, θ = (θj)j≥1, uncorrelated, given by
θj =1√λj
∫D(κ− κ̄)ψj(x)dx.
Example II: dictionary basis representationWe can approximate the random field β by
β(x, θ) =∑j≥1
ψj(x) θj,
(ψj)j≥1 dictionary basis, e.g., wavelet or finite element basis.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 5 / 65
Bayesian inversion
We consider an abstract form of the parameter to data model
y = O(θ) + ξuncertain parameter: θ ∈ Θ ⊂ Rd
parameter-to-observable map Oobservation data: y ∈ Rn
noise ξ, e.g., ξ ∼ N (0,Γ)Bayes’ rule:
πy(θ)︸ ︷︷ ︸posterior
=1
π(y)π(y|θ)︸ ︷︷ ︸
likelihood
π0(θ)︸ ︷︷ ︸prior
,
with the model evidence
π(y) =∫
Θπ(y|θ)π0(θ)dθ.
Parameter θ
QoI s(θ) Foward ModelA(u , v, θ) = F(v)
Bayesian Inversionπy(θ) ∝ π(y |θ)π0(θ)
Observational Datay = ℬu + ξ
prior
posterior
π(y |θ) = πξ(y −ℬu )
$(θ) = ℬu (θ)
The central tasks: sample from posterior and compute statistics, e.g.,
Eπy [s] =∫
Θs(θ)πy(θ)dθ.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 6 / 65
Computational challenges
Computational challenges for Bayesian inversion:the posterior has complex geometry:non-Gaussian, multimodal, concentrating in a local regionthe parameter lives in high-dimensional spacescurse of dimensionality – complexity grows exponentiallythe map O is expensive to evaluate:involving solve of large-scale partial differential equations
complex geometry high dimensionality large-scale computationPeng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 7 / 65
Computational methods
Towards better design of MCMC to reduce # samples1 Langevin and Hamiltonian MCMC (local geometry using gradient,
Hessian, etc.) [Stuart et al., 2004, Girolami and Calderhead, 2011,Martin et al., 2012, Bui-Thanh and Girolami, 2014, Lan et al., 2016,Beskos et al., 2017]...
2 dimension reduction MCMC (intrinsic low-dimensionality) [Cui et.al., 2014, 2016, Constantine et. al., 2016]...
3 randomized/optimized MCMC (optimization for sampling)[Oliver, 2017, Wang et al., 2018, Wang et al., 2019]...
Direct posterior construction and statistical computation1 Laplace approximation (Gaussian posterior approximation)
[Bui-Thanh et al., 2013, Chen et al., 2017, Schillings et al., 2019]...2 deterministic quadrature (sparse Smolyak, high-order quasi-MC)
[Schillings and Schwab, 2013, Gantner and Schwab, 2016,Chen and Schwab, 2016, Chen et al., 2017]...
3 transport maps (polynomials, radial basis functions, deep neuralnetworks) [El Moselhy and Marzouk, 2012, Spantini et al., 2018,Rezende and Mohamed, 2015, Liu and Wang, 2016,Detommaso et al., 2018, Chen et al., 2019]...
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 8 / 65
Computational methods
Towards better design of MCMC to reduce # samples1 Langevin and Hamiltonian MCMC (local geometry using gradient,
Hessian, etc.) [Stuart et al., 2004, Girolami and Calderhead, 2011,Martin et al., 2012, Bui-Thanh and Girolami, 2014, Lan et al., 2016,Beskos et al., 2017]...
2 dimension reduction MCMC (intrinsic low-dimensionality) [Cui et.al., 2014, 2016, Constantine et. al., 2016]...
3 randomized/optimized MCMC (optimization for sampling)[Oliver, 2017, Wang et al., 2018, Wang et al., 2019]...
Direct posterior construction and statistical computation1 Laplace approximation (Gaussian posterior approximation)
[Bui-Thanh et al., 2013, Chen et al., 2017, Schillings et al., 2019]...2 deterministic quadrature (sparse Smolyak, high-order quasi-MC)
[Schillings and Schwab, 2013, Gantner and Schwab, 2016,Chen and Schwab, 2016, Chen et al., 2017]...
3 transport maps (polynomials, radial basis functions, deep neuralnetworks) [El Moselhy and Marzouk, 2012, Spantini et al., 2018,Rezende and Mohamed, 2015, Liu and Wang, 2016,Detommaso et al., 2018, Chen et al., 2019]...
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 8 / 65
Computational methods
Towards better design of MCMC to reduce # samples1 Langevin and Hamiltonian MCMC (local geometry using gradient,
Hessian, etc.) [Stuart et al., 2004, Girolami and Calderhead, 2011,Martin et al., 2012, Bui-Thanh and Girolami, 2014, Lan et al., 2016,Beskos et al., 2017]...
2 dimension reduction MCMC (intrinsic low-dimensionality) [Cui et.al., 2014, 2016, Constantine et. al., 2016]...
3 randomized/optimized MCMC (optimization for sampling)[Oliver, 2017, Wang et al., 2018, Wang et al., 2019]...
Direct posterior construction and statistical computation1 Laplace approximation (Gaussian posterior approximation)
[Bui-Thanh et al., 2013, Chen et al., 2017, Schillings et al., 2019]...2 deterministic quadrature (sparse Smolyak, high-order quasi-MC)
[Schillings and Schwab, 2013, Gantner and Schwab, 2016,Chen and Schwab, 2016, Chen et al., 2017]...
3 transport maps (polynomials, radial basis functions, deep neuralnetworks) [El Moselhy and Marzouk, 2012, Spantini et al., 2018,Rezende and Mohamed, 2015, Liu and Wang, 2016,Detommaso et al., 2018, Chen et al., 2019]...
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 8 / 65
Computational methods
Towards better design of MCMC to reduce # samples1 Langevin and Hamiltonian MCMC (local geometry using gradient,
Hessian, etc.) [Stuart et al., 2004, Girolami and Calderhead, 2011,Martin et al., 2012, Bui-Thanh and Girolami, 2014, Lan et al., 2016,Beskos et al., 2017]...
2 dimension reduction MCMC (intrinsic low-dimensionality) [Cui et.al., 2014, 2016, Constantine et. al., 2016]...
3 randomized/optimized MCMC (optimization for sampling)[Oliver, 2017, Wang et al., 2018, Wang et al., 2019]...
Direct posterior construction and statistical computation1 Laplace approximation (Gaussian posterior approximation)
[Bui-Thanh et al., 2013, Chen et al., 2017, Schillings et al., 2019]...2 deterministic quadrature (sparse Smolyak, high-order quasi-MC)
[Schillings and Schwab, 2013, Gantner and Schwab, 2016,Chen and Schwab, 2016, Chen et al., 2017]...
3 transport maps (polynomials, radial basis functions, deep neuralnetworks) [El Moselhy and Marzouk, 2012, Spantini et al., 2018,Rezende and Mohamed, 2015, Liu and Wang, 2016,Detommaso et al., 2018, Chen et al., 2019]...
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 8 / 65
Computational methods
Towards better design of MCMC to reduce # samples1 Langevin and Hamiltonian MCMC (local geometry using gradient,
Hessian, etc.) [Stuart et al., 2004, Girolami and Calderhead, 2011,Martin et al., 2012, Bui-Thanh and Girolami, 2014, Lan et al., 2016,Beskos et al., 2017]...
2 dimension reduction MCMC (intrinsic low-dimensionality) [Cui et.al., 2014, 2016, Constantine et. al., 2016]...
3 randomized/optimized MCMC (optimization for sampling)[Oliver, 2017, Wang et al., 2018, Wang et al., 2019]...
Direct posterior construction and statistical computation1 Laplace approximation (Gaussian posterior approximation)
[Bui-Thanh et al., 2013, Chen et al., 2017, Schillings et al., 2019]...2 deterministic quadrature (sparse Smolyak, high-order quasi-MC)
[Schillings and Schwab, 2013, Gantner and Schwab, 2016,Chen and Schwab, 2016, Chen et al., 2017]...
3 transport maps (polynomials, radial basis functions, deep neuralnetworks) [El Moselhy and Marzouk, 2012, Spantini et al., 2018,Rezende and Mohamed, 2015, Liu and Wang, 2016,Detommaso et al., 2018, Chen et al., 2019]...
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 8 / 65
Computational methods
Towards better design of MCMC to reduce # samples1 Langevin and Hamiltonian MCMC (local geometry using gradient,
Hessian, etc.) [Stuart et al., 2004, Girolami and Calderhead, 2011,Martin et al., 2012, Bui-Thanh and Girolami, 2014, Lan et al., 2016,Beskos et al., 2017]...
2 dimension reduction MCMC (intrinsic low-dimensionality) [Cui et.al., 2014, 2016, Constantine et. al., 2016]...
3 randomized/optimized MCMC (optimization for sampling)[Oliver, 2017, Wang et al., 2018, Wang et al., 2019]...
Direct posterior construction and statistical computation1 Laplace approximation (Gaussian posterior approximation)
[Bui-Thanh et al., 2013, Chen et al., 2017, Schillings et al., 2019]...2 deterministic quadrature (sparse Smolyak, high-order quasi-MC)
[Schillings and Schwab, 2013, Gantner and Schwab, 2016,Chen and Schwab, 2016, Chen et al., 2017]...
3 transport maps (polynomials, radial basis functions, deep neuralnetworks) [El Moselhy and Marzouk, 2012, Spantini et al., 2018,Rezende and Mohamed, 2015, Liu and Wang, 2016,Detommaso et al., 2018, Chen et al., 2019]...
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 8 / 65
Computational methods
Towards better design of MCMC to reduce # samples1 Langevin and Hamiltonian MCMC (local geometry using gradient,
Hessian, etc.) [Stuart et al., 2004, Girolami and Calderhead, 2011,Martin et al., 2012, Bui-Thanh and Girolami, 2014, Lan et al., 2016,Beskos et al., 2017]...
2 dimension reduction MCMC (intrinsic low-dimensionality) [Cui et.al., 2014, 2016, Constantine et. al., 2016]...
3 randomized/optimized MCMC (optimization for sampling)[Oliver, 2017, Wang et al., 2018, Wang et al., 2019]...
Direct posterior construction and statistical computation1 Laplace approximation (Gaussian posterior approximation)
[Bui-Thanh et al., 2013, Chen et al., 2017, Schillings et al., 2019]...2 deterministic quadrature (sparse Smolyak, high-order quasi-MC)
[Schillings and Schwab, 2013, Gantner and Schwab, 2016,Chen and Schwab, 2016, Chen et al., 2017]...
3 transport maps (polynomials, radial basis functions, deep neuralnetworks) [El Moselhy and Marzouk, 2012, Spantini et al., 2018,Rezende and Mohamed, 2015, Liu and Wang, 2016,Detommaso et al., 2018, Chen et al., 2019]...
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 8 / 65
Computational methods
Surrogate models to reduce the large-scale computation1 polynomial approximation (stochastic spectral, stochastic
collocation) [Marzouk et al., 2007, Marzouk and Xiu, 2009,Schwab and Stuart, 2012, Chen et al., 2017, Farcas et al., 2019]...
2 model reduction (POD, greedy reduced basis)[Wang and Zabaras, 2005, Lieberman et al., 2010,Nguyen et al., 2010, Lassila et al., 2013, Cui et al., 2015,Chen and Schwab, 2016, Chen and Ghattas, 2019]...
3 multilevel/multifidelity (MCMC, stochastic collocation) [Dodwell et.al., 2015, Teckentrup et. al., 2015, Scheichl et. al., 2017,Peherstorfer. et. al., 2018, Farcas et. al., 2019]...
Aim for this talk:Fast and scalable Bayesian inference in high dimensions by exploitingintrinsic low-dimensionality in both parameter and state spaces, using
projected transport map in parameter spacereduced basis approximation in state space
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 9 / 65
Computational methods
Surrogate models to reduce the large-scale computation1 polynomial approximation (stochastic spectral, stochastic
collocation) [Marzouk et al., 2007, Marzouk and Xiu, 2009,Schwab and Stuart, 2012, Chen et al., 2017, Farcas et al., 2019]...
2 model reduction (POD, greedy reduced basis)[Wang and Zabaras, 2005, Lieberman et al., 2010,Nguyen et al., 2010, Lassila et al., 2013, Cui et al., 2015,Chen and Schwab, 2016, Chen and Ghattas, 2019]...
3 multilevel/multifidelity (MCMC, stochastic collocation) [Dodwell et.al., 2015, Teckentrup et. al., 2015, Scheichl et. al., 2017,Peherstorfer. et. al., 2018, Farcas et. al., 2019]...
Aim for this talk:Fast and scalable Bayesian inference in high dimensions by exploitingintrinsic low-dimensionality in both parameter and state spaces, using
projected transport map in parameter spacereduced basis approximation in state space
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 9 / 65
Computational methods
Surrogate models to reduce the large-scale computation1 polynomial approximation (stochastic spectral, stochastic
collocation) [Marzouk et al., 2007, Marzouk and Xiu, 2009,Schwab and Stuart, 2012, Chen et al., 2017, Farcas et al., 2019]...
2 model reduction (POD, greedy reduced basis)[Wang and Zabaras, 2005, Lieberman et al., 2010,Nguyen et al., 2010, Lassila et al., 2013, Cui et al., 2015,Chen and Schwab, 2016, Chen and Ghattas, 2019]...
3 multilevel/multifidelity (MCMC, stochastic collocation) [Dodwell et.al., 2015, Teckentrup et. al., 2015, Scheichl et. al., 2017,Peherstorfer. et. al., 2018, Farcas et. al., 2019]...
Aim for this talk:Fast and scalable Bayesian inference in high dimensions by exploitingintrinsic low-dimensionality in both parameter and state spaces, using
projected transport map in parameter spacereduced basis approximation in state space
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 9 / 65
Computational methods
Surrogate models to reduce the large-scale computation1 polynomial approximation (stochastic spectral, stochastic
collocation) [Marzouk et al., 2007, Marzouk and Xiu, 2009,Schwab and Stuart, 2012, Chen et al., 2017, Farcas et al., 2019]...
2 model reduction (POD, greedy reduced basis)[Wang and Zabaras, 2005, Lieberman et al., 2010,Nguyen et al., 2010, Lassila et al., 2013, Cui et al., 2015,Chen and Schwab, 2016, Chen and Ghattas, 2019]...
3 multilevel/multifidelity (MCMC, stochastic collocation) [Dodwell et.al., 2015, Teckentrup et. al., 2015, Scheichl et. al., 2017,Peherstorfer. et. al., 2018, Farcas et. al., 2019]...
Aim for this talk:Fast and scalable Bayesian inference in high dimensions by exploitingintrinsic low-dimensionality in both parameter and state spaces, using
projected transport map in parameter spacereduced basis approximation in state space
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 9 / 65
Computational methods
Surrogate models to reduce the large-scale computation1 polynomial approximation (stochastic spectral, stochastic
collocation) [Marzouk et al., 2007, Marzouk and Xiu, 2009,Schwab and Stuart, 2012, Chen et al., 2017, Farcas et al., 2019]...
2 model reduction (POD, greedy reduced basis)[Wang and Zabaras, 2005, Lieberman et al., 2010,Nguyen et al., 2010, Lassila et al., 2013, Cui et al., 2015,Chen and Schwab, 2016, Chen and Ghattas, 2019]...
3 multilevel/multifidelity (MCMC, stochastic collocation) [Dodwell et.al., 2015, Teckentrup et. al., 2015, Scheichl et. al., 2017,Peherstorfer. et. al., 2018, Farcas et. al., 2019]...
Aim for this talk:Fast and scalable Bayesian inference in high dimensions by exploitingintrinsic low-dimensionality in both parameter and state spaces, using
projected transport map in parameter spacereduced basis approximation in state space
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 9 / 65
Outline
1 Bayesian inversion
2 Stein variational methods
3 Projected Stein variational methods
4 Stein variational reduced basis methods
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 10 / 65
Transport map
Find a transport map T : Rd → Rd, such that
θ ∼ π0 → T(θ) ∼ πy,
prior posterior
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 11 / 65
Kullback–Leibler divergence
Definition: Kullback–Leibler (KL) divergence
DKL(π1|π2) =∫
Θπ1(θ) log
(π1(θ)
π2(θ)
)dθ.
It measures the difference between two probability distribution
DKL(π1|π2) ≥ 0, and DKL(π1|π2) = 0 if and only if π1 = π2, a.e.
It is not symmetric, thus not a distance
DKL(π1|π2) 6= DKL(π2|π1)
Relation to (Shannon) information theory
DKL(π1|π2) = Eθ∼π1 [− log(π2)]︸ ︷︷ ︸cross entropy
−Eθ∼π1 [− log(π1)]︸ ︷︷ ︸entropy
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 12 / 65
Optimization for transport map
Find a transport map T : Rd → Rd, such that
θ ∼ π0 → T(θ) ∼ πy,
by minimizing the KL divergence
minT∈TDKL(T]π0|πy) ⇔ min
T∈TDKL(π0|T]πy).
T] is a pushforward map satisfying
T]π0(θ) = π0(T−1(θ))|det∇T−1(θ)|,
T] is a pullback map satisfying
T]πy(θ) = πy(T(θ))|det∇T(θ)|.
T is a tensor-product function space Hd = H⊗ · · · ⊗ H.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 13 / 65
Composition of transport map
Instead of seeking one complex (highly nonlinear) transport map T,we look for composition of a sequence of simple transport maps
T = TL ◦ TL−1 ◦ · · · ◦ T1 ◦ T0, L ∈ N,
perturbation of identity:
Tl(θ) = I(θ) + Ql(θ),
identity map I(θ) = θperturbation map Ql : Rd → Rd
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 14 / 65
Optimization of each transport map
At each l = 0, 1, . . . , we define
πl+1 := (Tl ◦ · · · ◦ T0)]π0 ⇐⇒ πl+1 = (Tl)]πlWe introduce a cost functional
Jl[Q] := DKL((I + Q)]πl|πy). (1)One step optimization of Jl(Q) w.r.t. Q leads to
Tl = I + αlQl,
with step size αl > 0 (learning rate, line search).
Optimization methodsGradient descent method: steepest descent [Liu and Wang, 2016]
Ql = −DJl[0].Newton method: solve the linear system [Detommaso et al., 2018]
D2Jl[0](V,Ql) = −DJl[0](V), ∀ V ∈ T .
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 15 / 65
Optimization of each transport map
Calculus of variationThe first variation DJl[0] at Q = 0 in direction V ∈ T
DJl[0](V) := −Eπl[(V(θ))>∇θ log(πy(θ)) + trace(∇θV(θ))
]The second variation D2Jl[0] at Q = 0 in directions V,W ∈ T
D2Jl[0](V,W) := −Eπl[(V(θ))>∇2θ log(πy(θ)) W(θ)− trace(∇θW(θ) ∇θV(θ))
]Recall the Bayes’ rule:
πy(θ)︸ ︷︷ ︸posterior
=1
π(y)π(y|θ)︸ ︷︷ ︸
likelihood
π0(θ)︸ ︷︷ ︸prior
,
which leads to
∇θ log(πy(θ)) =∇θπy(θ)πy(θ)
=∇θ(π(y|θ)π0(θ))π(y|θ)π0(θ)
Key observation: the intractable model evidence π(y) is canceled out.Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 16 / 65
Reproducing Kernel Hilbert Space (RKHS)
T is a tensor-product function space Hd = H⊗ · · · ⊗ H.
tensor-product polynomials[El Moselhy and Marzouk, 2012, Spantini et al., 2018],radial basis/kernel functions[Liu and Wang, 2016, Detommaso et al., 2018].
Reproducing Kernel Hilbert Space HThere exists a function kθ ∈ H for every θ ∈ Θ, such that
v(θ) = 〈v, kθ〉 ∀v ∈ H,
which implies existence of kθ′ ∈ H for every θ′ ∈ Θ such that
kθ′(θ) = 〈kθ′ , kθ〉 =: k(θ, θ′) reproducing kernel
Many choices: bilinear, polynomials, Bergman, radial basis functions
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 17 / 65
N-dimensional approximation of RKHS
Gaussian kernel
k(θ, θ′) = exp(− 1
2h(θ − θ′)>M(θ − θ′)
).
To account for the geometry of the posterior, [Detommaso et al., 2018]
M = H̄ := Eπl[−∇2θ log(πy(θ))
], h = d, v.s. M = I ∈ Rd×d.
Finite dimensional approximation of RKHS:
HlN = span(kl1(θ), . . . , klN(θ)) ⊂ H,
where the basis functions are taken as
kln(θ) = k(θ, θln), n = 1, . . . ,N,
where θln ∼ πl are particles transported from θ0n ∼ π0 by
θln = (Tl ◦ · · · ◦ T0)(θ0n), n = 1, . . . ,N.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 18 / 65
Stein variational gradient descent (SVGD)[Liu and Wang, 2016]
For DJl[0](V) = 〈DJl[0],V〉Hd , by the reproducing property
〈DJl[0],V〉Hd = −〈Eπl [∇θ log(πy(θ))k(θ, θ′) +∇θk(θ, θ′)],V(θ′)〉.
For gradient descent, we have (by notation kln(θ) = k(θ, θln))
Ql(θln) = −DJl[0](θln) = Eπl[∇θ log(πy(θ))kln(θ) +∇θkln(θ)
]Sample average approximation (SAA): θlm ∼ πl, m = 1, . . . ,N
Ql(θln) ≈1N
N∑m=1
∇θ log(πy(θlm))kln(θlm) +∇θkln(θlm).
Particle updates by the transport map
θl+1n = Tl(θln) := θ
ln + αl Ql(θ
ln), n = 1, . . . ,N.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 19 / 65
Stein variational Newton (SVN) [Detommaso et al., 2018]
We seek Ql ∈ T lN = (HlN)d, where HlN = span(kl1(θ), . . . , klN(θ)),
Ql(θ) =N∑
n=1
clnkln(θ),
where the coefficients cln ∈ Rd, with cl = (cl1, . . . , clN) ∈ RdN .
For the Newton system: find Ql ∈ T lN such that
D2Jl[0](V,Ql) = −DJl[0](V), ∀ V ∈ T lN ,
which, by using the reproducing property, becomes
Hcl = −gl,
gradient: gl = (gl1, . . . , glN) ∈ RdN , Hessian: H ∈ RdN×dN .
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 20 / 65
Stein variational Newton (SVN) [Detommaso et al., 2018]
The gradient gl = (gl1, . . . , glN) ∈ RdN , with glm ∈ Rd given by
glm = −1N
N∑i=1
∇θ log(πy(θli))klm(θli) +∇θklm(θli)
The Hessian H ∈ RdN×dN : with Hmn ∈ Rd×d given by
Hmn =1N
N∑i=1
−∇2θ log(πy(θli))klm(θli)kln(θli) +∇θklm(θli)(∇θkln(θli))>.
Decouple dN × dN system to N systems of size d × d
H̄mclm = −glm, m = 1, . . . ,N,
with diagonal approximation
H̄m =1N
N∑i=1
−∇2θ log(πy(θli))klm(θli)klm(θli) +∇θklm(θli)(∇θklm(θli))>.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 21 / 65
SVGD vs SVN with M = I vs M = H̄
G. Detommaso, T. Cui, Y. Marzouk, A. Spantini, R. Scheichl. A Stein variationalNewton method. NeurIPS, 2018.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 22 / 65
Outline
1 Bayesian inversion
2 Stein variational methods
3 Projected Stein variational methods
4 Stein variational reduced basis methods
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 23 / 65
Computational challenges in high dimensions
Curse of dimensionality: d � 1The number N of basis functions grows rapidly (exponentially) w.r.t. thedimension d to achieve map representation with required accuracy.
4 5 6 7 8 9 10Log2(Dimension)
0.8
0.6
0.4
0.2
0.0
0.2Lo
g10(
RMSE
of v
aria
nce)
SVGDSVNpSVN
P. Chen, K. Wu, J. Chen, T. O’Leary-Roseberry, O. Ghattas. Projected Steinvariational Newton: A fast and scalable Bayesian inference method in high
dimensions. NeurIPS, 2019.Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 24 / 65
Intrinsic low dimensionality
The posterior 6= the prior in a low-dimensional subspace.high correlation in different dimensions;forward map is smoothing/regularizing;parameters are anistropic, e.g., KL expansion.
100
101
102
103
j
10-6
10-4
10-2
100
102
√
λj
j−2(α = 2)
prior, case 1
posterior, case 1
rearranged, case 1
prior, case 2
posterior, case 2
rearranged, case 2
posterior, case 3
rearranged, case 3
P. Chen, U. Villa, O. Ghattas. Hessian-based adaptive sparse quadrature forinfinite-dimensional Bayesian inverse problems. CMAME, 2017.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 25 / 65
Projection [Constantine et. al. 2014, Cui et. al. 2014]
We denote a basis of the subspace of dimension r � d as
Ψ = (ψ1, . . . , ψr) ∈ Rd×r.
We project θ to the low-dimensional subspace as
θr =
r∑i=1
ψiψ>i θ = Ψw.
As a result, we consider the projected posterior
πry(θ) =1
πr(y)π(y|θr)π0(θ), (2)
where the maginal density
πr(y) = Eπ0 [π(y|θr)].
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 26 / 65
Projected Stein variational methods
By decomposition θ = θr + θ⊥, we have
πry(θ) = π(y|θr) π0(θr) π0(θ⊥|θr).
With θ⊥ frozen, by θr = Ψw, we define
p0(w) := π0(θr), py(w) := πry(θr) = π(y|θr)π0(θr).
We seek T = TL ◦ TL−1 ◦ · · · ◦ T1 ◦ T0 : Rr → Rr, such that
minT∈T
DKL(T]p0|py).
Apply SVGD/SVN in Rr for w, pSVGD/pSVN where
∇w log(py(w)) = Ψ>∇θ log(πry(θr)),
and∇2w log(py(w)) = Ψ>∇2θ log(πry(θr))Ψ.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 27 / 65
Basis construction
The basis functions Ψ for projection are obtained by
Hψi = λiC−10 ψi, i = 1, . . . , r,
which corresponds to the r largest (in | · |) eigenvalues, i.e.,|λ1| ≥ · · · ≥ |λr|. C0: prior covariance. With ηy(θ) = − log(π(y|θ))
Gradient-based subspace:
H = Eπ[∇θηy(θ)(∇θηy(θ))>
].
Hessian-based subspace:
H = Eπ[∇2θηy(θ)
].
Choice of the density π:density at step l, i.e., πl.
0 10 20 30 40i
1
0
1
2
3
4
Log1
0(|
i|)
# dimensions=289# dimensions=1,089# dimensions=4,225# dimensions=16,641
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 28 / 65
Algorithm 1 pSVN in parallel using MPI
1: Input: N prior samples, θ01, . . . , θ0N , in each of K cores.
2: Output: posterior samples θy1, . . . , θyN in each core.
3: Perform projection to get θn = θrn + θ⊥n and the samples wl−1n .4: Perform MPI Allgather for wl−1n , n = 1, . . . ,M.5: repeat6: Compute the gradient and Hessian.7: Perform MPI Allgather for the gradient and Hessian.8: Compute the kernel and its gradient.9: Assemble and solve Newton system for c1, . . . , cN .
10: Perform a line search to get wl1, . . . ,wlN .
11: Perform MPI Allgather for wln, n = 1, . . . ,N.12: Update the samples θrn = Ψwln + θ̄, n = 1, . . . ,N.13: Set l← l + 1.14: until A stopping criterion is met.15: Reconstruct samples θyn = θrn + θ⊥n , n = 1, . . . ,N.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 29 / 65
Algorithm 2 Adaptive pSVN
1: Input: N prior samples, θ01, . . . , θ0N , in each of K cores.
2: Output: posterior samples θy1, . . . , θyN in each core.
3: Set level l2 = 1, θl2−1n = θ0n, n = 1, . . . ,N.4: repeat5: Perform the eigendecomposition and form the bases Ψl2 .6: Apply Algorithm pSVN to update the samples
[θl21 , . . . , θl2N ] = pSVN([θ
l2−11 , . . . , θ
l2−1N ],K,Ψ
l2).7: Set l2 ← l2 + 1.8: until A stopping criterion is met.
Advantages:Avoids/alleviates the curse of dimensionality.Largely reduces computational cost with r � d.Converges faster in low-dimensional space.Parallel computation with reduced communication.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 30 / 65
Projection error estimates
Assumption
Assume that the parameter-to-observable map O satisfies:1 There exists a constant CO > 0 such that Eπ0 [||O(·)||Γ] ≤ CO.2 For every b > 0, there exists a constant Cb > 0 such that
||O(θ1)−O(θ2)||Γ ≤ Cb||θ1 − θ2||Θ, for max{||θ1||X, ||θ2||Θ} < b.
TheoremUnder Assumption 1, for Hessian-based projection, we have
DKL(πy |πry) ≤ C||θ − θr||Θ,
C independent of r. For gradient-based projection, based on a resultin [Zahm et. al., 2018], we obtain (with C independent of r)
DKL(πy |πry) ≤ Cd∑
i=r+1
λi.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 31 / 65
Numerical results: AccuracyWe first consider a linear parameter-to-observable map
O(θ) = Aθ, A = O(Bθ), B = (L + M)−1,where L and M are the discrete Laplacian and mass matrices in the PDEmodel −4u + u = θ, in (0, 1), u(0) = 0, u(1) = 1. Gaussian prior N (0,C0), C0is discretized from (I − 0.14)−1 with Laplace operator 4.
4 5 6 7 8 9 10Log2(Dimension)
0.8
0.6
0.4
0.2
0.0
0.2
Log1
0(RM
SE o
f var
ianc
e)
SVGDSVNpSVN
0 2 4 6 8# iterations
1.5
1.0
0.5
0.0
Log1
0(RM
SE o
f var
ianc
e)
SVGD N=32SVN N=32pSVN N=32SVGD N=512SVN N=512pSVN N=512
Decay of the RMSE of the L2 of pointwise variance of the parameter w.r.t.dimension d = 16, 64, 256, 1024 with N = 128 samples (left), and with N = 32,
and 512 samples in parameter dimension d = 256 w.r.t. # iterations (right).Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 32 / 65
Numerical results: AccuracyWe consider a nonlinear Bayesian inverse problem with
O(θ) = O(S(θ)), u = S(θ), −∇ · (eθ∇u) = 0, in (0, 1)2
Gaussian prior N (0,C0), where C0 is a discretization of (I − 0.14)−2. We testthe accuracy against a dimension-independent likelihood informed (DILI)MCMC method with 10,000 samples as reference.
0 5 10 15# iterations
1.2
1.0
0.8
0.6
0.4
0.2
0.0
Log1
0(RM
SE o
f mea
n)
SVN 32pSVN 32SVN 512pSVN 512
0 5 10 15# iterations
1.0
0.8
0.6
0.4
0.2
0.0
Log1
0(RM
SE o
f var
ianc
e)SVN 32pSVN 32SVN 512pSVN 512
Decay of the RMSE of the L2 of the mean (left) and pointwise variance (right)of the parameter with dimension d = 1089 and N = 32 and 512 samples.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 33 / 65
Numerical results: ScalabilityWe consider a nonlinear Bayesian inverse problem with
O(θ) = O(S(θ)), u = S(θ), −∇ · (eθ∇u) = 0, in (0, 1)2
Gaussian prior N (0,C0), where C0 is a discretization of (I − 0.14)−2. We testthe accuracy against a dimension-independent likelihood informed (DILI)MCMC method with 10,000 samples as reference.
2 4 6 8# iterations
2.0
1.5
1.0
0.5
0.0
Log1
0(st
ep n
orm
)
# samples=8# samples=32# samples=128# samples=512
0 1 2 3 4 5Log2(# processor cores)
2
0
2
4
6
8
10
Log2
(tim
e (s
))
totalvariationkernelsolvesampleO(N 1)
Left: Decay of the averaged norm of the update wl+1 − wl w.r.t. the iterationnumber l, with increasing number of samples. Right: Decay of the wall clock
time of different computational components w.r.t. increasing # cores.Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 34 / 65
Summary
Take away message:
SVN provides good samples for complex posterior.
pSVN is scalable to address the curse-of-dimensionality.
Ongoing:
Bayesian optimal experimental design with Keyi Wu.
Triangular map and data assimilation with Joshua Chen.
Deep learning for transport map with Tom O’Leary-Roseberry.
Gravitational wave inversion with Bassel Saleh, Alex Leviyev.
Integration with model reduction with Zihang Zhang.
Convergence analysis w.r.t. # particles, parameter dimensions.
Multilevel parallel implementation w.r.t. particles and PDE solves.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 35 / 65
Outline
1 Bayesian inversion
2 Stein variational methods
3 Projected Stein variational methods
4 Stein variational reduced basis methods
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 36 / 65
PDE-constrained Bayesian inversion
We have the data model
y = B(u(θ)) + ξ
where u is the solution of the PDE (in weak form)
A(u(θ), v; θ) = F(v) v ∈ V
B : V → Y is a vector of observational functionals.Examples: linear diffusion, elasiticity, Stokes flow, acoustic, etc.,
−∇ · (κ(θ)∇u) = f , in D,
with suitable boundary conditions, which leads to
A(u, v; θ) =∫
Dκ(x, θ)∇u(x, θ) · ∇v(x)dx, F(v) =
∫D
f (x)v(x)dx.
With Gaussian noise ξ ∈ N (0,Γ), we define the potential
ηy(θ) :=12
(y− B(u(θ)))TΓ−1(y− B(u(θ)))⇒ π(y|θ) = log(−ηy(θ)).
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 37 / 65
High-fidelity approximation of the potential ηyE.g. finite element, we consider: find uh ∈ Vh ⊂ V such that
A(uh, vh, θ) = F(vh) ∀vh ∈ Vh. (3)
Then the data model is given by
y = B(uh(θ)) + ξ,
then for ξ ∼ N (0,Γ) the likelihood function is given by
π(y|θ) = exp(−ηy(uh(θ))),
where the potential ηy(uh(θ)) (nonlinear w.r.t. uh)
ηy(uh(θ)) =12
(y− B(uh(θ)))TΓ−1(y− B(uh(θ))).
For SVGD, and the projected SVGD, we also need
−∇θ log(πy(θ)) = ∇θηy(uh(θ)) +∇θπ0(θ)π0(θ)
.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 38 / 65
High-fidelity approximation of the gradient ∇θηy
We form a Lagrangian
L(uh, ph, θ) = ηy(uh) + A(uh, ph, θ)− F(ph),
∂vLwh = 0 to obtain the adjoint ph, i.e., find ph ∈ V such that
A(wh, ph; θ) = −∂uηy|uh(wh) ∀wh ∈ Vh, (4)
where∂uηy|uh(wh) = −B(wh)
TΓ−1(y− B(uh)).
Then the gradient is given by
∇θηy(uh(θ)) = ∂θL(uh, ph; θ) = ∂θA(uh, ph, θ).
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 39 / 65
Model reductionHigh-fidelity approximationFinite element space Vh,
dim(Vh) = Nh
Given θ, find uh ∈ Vh s.t.
A(uh, vh; θ) = F(vh) ∀vh ∈ Vh
The algebraic system is
Ah(θ)uh = f h
V = [ψ1, . . . ,ψN ]
VT uh = uN
VTAh(θ)V = AN(θ)
VT f h = f N
Reduced basis approximationReduced basis space VN ⊂ Vh,
dim(VN) = N
Given θ, find uN ∈ VN s.t.
A(uN , vN ; θ) = F(vN) ∀vN ∈ VN
The algebraic system is
AN(θ)uN = f N
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 40 / 65
Model reduction: Building blocksPOD/SVDTraining samples
Ξt = {θn, n = 1, . . . ,Nt}
Compute snapshots
U = [uh(θ1), . . . , uh(θNt )]
Perform SVD
U = VΣWT
Extract bases V[1 : N, :]
N = argminn En(Σ) ≥ 1− ε
Greedy algorithmTraining samples
Ξt = {θn, n = 1, . . . ,Nt}
Initialize VN for N = 1 as
VN = span{uh(θ1)}
Pick next sample such that
θN+1 = argmaxθ∈Ξt ∆N(θ)
Update bases VN+1 as
VN ⊕ span{uh(θN+1)}
Offline-OnlineAffine assumption/approx.
A =∑Q
q=1 θq(θ)Aq
Offline computation once
AqN = VTAqhV
Online assemble
AN(θ) =∑Q
q=1 θq(θ)AqN
Online solve
AN(θ)uN = f N
Goal-oriented a-posteriori error estimate ∆N(θ) – dual weighted residual
∆N(θ) = A(uN , pN , θ)− F(pN)
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 41 / 65
Reduced basis approximation of the potential ηy
RB approximation for the adjoint problem: find pN ∈ WN s.t.
A(wN , pN , θ) = −∂uηy|uN (wN) ∀wN ∈ WN .
The goal-oriented a-posterior error estimate is given by
∆N(θ) = A(uN , pN , θ)− F(pN).
RB approximation for the potential ηy(θ):
ηy,N(θ) = ηy(uN(θ)).
Dual-weighted residual correction:
η∆y,N(θ) = ηy,N(θ) + ∆N(θ).
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 42 / 65
Reduced basis approximation of the gradient ∇θηyWith the RB state uN and adjoint pN , the gradient is given by
∇θηy(uN(θ)) = ∂θA(uN , pN ; θ).
For the modified potential η∆y,N(θ), we form the Lagrangian
L(uN , pN , ûN , p̂N ; θ) = η∆y,N(θ) + A(uN , ûN ; θ)− F(ûN)+ A(p̂N , pN ; θ) +∇uηy|uN (p̂N),
and solve the variational problem: find p̂N ∈ WN
A(p̂N ,wN ; θ) = F(wN)− A(uN ,wN ; θ), ∀wN ∈ WN ,
and the variational problem: find ûN ∈ VN
A(vN , ûN ; θ) = −A(vN , pN ; θ)− ∂uηy|uN (vN)−∇2uηy|uN (p̂N , vN), ∀vN ∈ VN ,
which leads to the gradient
∇θη∆y,N(θ) = ∂θL(uN , pN , ûN , p̂N ; θ).
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 43 / 65
Error estimates for the state and adjoint
Assumption: Well-posednessThe bilinear form A(·, ·; θ) : V × V → R and linear form F(·) : V → RsatisfyA1 At any θ ∈ Θ, there exist a coercivity constant α(θ) > 0 and a
continuity constant γ(θ) > 0 such that
α(θ)||w||2V ≤ A(w,w; θ) and A(w, v; θ) ≤ γ(θ)||w||V ||v||V , ∀w, v ∈ V.
The linear functional F(·) : V → R is bounded with norm
||F(·)||V′
Error estimates for the state u and adjoint p
Let eur (θ) and epr (θ) denote the RB state and adjoint errors
eur (θ) := uh(θ)− uN(θ), epr (θ) := ph(θ)− pN(θ).Let Ru(uN , ·; θ) denotes the residual of the state equation
Ru(uN , vh; θ) = A(uN , vh; θ)− F(vh; θ) ∀vh ∈ Vh,and Rp(pN , ·; θ) denotes the residual of the adjoint equation
Rp(wh, pN ; θ) = A(wh, pN ; θ) +∇uηy|uN (wh) ∀wh ∈ Vh.
Lemma: Error estimates for the state u and adjoint pUnder the well-posedness assumption, for any θ ∈ Θ, there holds
||eur (θ)||V ≤1
α(θ)||Ru(uN , ·; θ)||V′ ,
and||epr (θ)||V ≤
1α(θ)
||Rp(·, pN ; θ)||V′ +COα(θ)
||eur (θ)||V .
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 45 / 65
Error estimates for the potential ηy and gradient ∇θηy
Lemma: Error esstimates for ηy,N(θ) and η∆y,N(θ).
There exists constant C(θ) > 0 for each θ ∈ Θ, independent of N, s.t.
|eηr (θ)| := |ηy(θ)− ηy,N(θ)| ≤ C(θ)||eur (θ)||V .
There exists constant C1(θ) > 0 for each θ ∈ Θ, independent of N, s.t.
|e∆r (θ)| := |ηy(θ)− η∆y,N(θ)| ≤ C||eur (θ)||V(||eur (θ)||V + ||epr (θ)||V).
Lemma: Error esstimates for ∇θηy,N(θ) and ∇θη∆y,N(θ)There exist C1(θ),C2(θ) > 0 for each θ ∈ Θ, independent of N, s.t.
||∇θeηr (θ)||1 ≤ C1(θ)||∇θeur (θ)||Vd + C2(θ)||∇θuN(θ)||Vd ||eur (θ)||V .
There exist C1(θ), ,C2(θ),C3(θ),C4(θ) > 0, independent of N, such that
||∇θe∆r (θ)||1 ≤ C1||∇θeur (θ)||Vd ||epr (θ)||V + C2||∇θepr (θ)||Vd ||eur (θ)||V+ C3||eur (θ)||V ||epr (θ)||V + C4||∇θeur (θ)||Vd ||eur (θ)||V .
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 46 / 65
Error estimates for the posterior πy
Theorem: Error estimates for the posterior πy
DKL(πhy |πry) ≤ Eπhy [|eηr |] + Eπhy [| exp(e
ηr )− 1|],
andDKL(πhy |π∆y ) ≤ Eπhy
[|e∆r |
]+ Eπhy [| exp(e
∆r )− 1|].
Corollary: Error estimates for the posterior πyLet Θ1 =: {θ ∈ Θ : eηr (θ) < 1}, if
Eπhy (Θ\Θ1)[| exp(eηr )− 1|] < δEπhy [|e
ηr |]
for some constant δ > 0, we have
DKL(πhy |πry) ≤ (3 + δ)Eπhy [|eηr |] .
The same holds for DKL(πhy |π∆y ) ≤ (3 + δ)Eπhy[|e∆r |
].
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 47 / 65
Algorithm 3 Adaptive greedy algorithm with Stein samples
1: Input: samples θ0m ∼ π0, m = 1, . . . ,M, tolerance ε0r , update step k.2: Output: Stein samples θm, m = 1, . . . ,M.3: Initialization: at θ = θ01, solve the high-fidelity state and adjoint prob-
lems for uh and ph, set Vr = span{uh} and Wr = span{ph}, computethe reduced matrices and vectors for once.
4: while at step l = 0, k, 2k, . . . , of the SVGD algorithm do5: Compute the error indicator 4N(θlm) for m = 1, . . . ,M.6: while maxm=1,...,M |4N(θlm)| > εlr do7: Choose θ = argmaxθlm,m=1,...,M |4N(x
lm)|.
8: Solve the high-fidelity problems for uh and ph at θ.9: Enrich the spaces Vr = Vr
⊕span{uh}, Wr = Wr
⊕span{ph}.
10: Compute all the reduced matrices and vectors for once.11: Compute the error indicator 4N(θlm) for m = 1, . . . ,M.12: end while13: Perform SVGD update with RB approximations.14: Update the tolerance εlr according to gradient in SVGD algorithm.15: end while
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 48 / 65
Numerical example
We consider the diffusion problem
−∇ · (a(θ, x)∇u) = f (x), x ∈ D = (0, 1)2,where f = 1, the coefficient
a(θ, x) = 5 +∑
1≤i+j≤4
1√i2 + j2
θi,j cos(iπx1) cos(jπx2).
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 49 / 65
Numerical results: Comparison
(a) Samples at step l = 0, 9, 99 (b) marginal posterior
Figure: Comparison of (128) sample distribution driven by SVGD high-fidelityapproximation (blue) and reduced basis approximation (red).
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 50 / 65
Numerical results: Accuracy
(a) adaptive construction ηy (b) adaptive construction ∇θηy
(c) fixed construction ηy (d) fixed construction ∇θηy
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 51 / 65
Numerical results: Adaptive greedy algorithm
Figure: Tolerances for adaptive greedy algorithm (left);# reduced basis functions for difference initial tolerances (right)
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 52 / 65
Numerical results: Cost
FE adaptive RB fixed RBinitial tolerance ε0r n/a 1 0.1 0.01 0.00001
M = 64
DOF (Nh,Nr) 16641 20 31 49 62time to build RB n/a 4.4 7.1 12.2 15.8
time for evaluation 1.8× 103 4.4 4.8 5.8 7.3speedup factor 1 203 148 98 62
M=128
DOF (Nh,Nr) 16641 19 30 53 87time to build RB n/a 4.5 7.3 14.3 26.3
time for evaluation 3.5× 103 8.3 9.5 11.8 19.2speedup factor 1 267 212 137 78
Table: Comparison of high fidelity and reduced basis approximations ondegrees of freedom (DOF), CPU time for different tolerances and # samples
P. Chen, O. Ghattas. Stein variational reduced basis Bayesian inversion, 2019.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 53 / 65
Summary
Take away message:
Reduced basis methods reduce the computational cost whilepreserving physical structure with certified accuracy.Leverage goal-oriented adaptive construction of RB.
Ongoing:
RB for SVN.Parameter and state reduction by projected SV + RB.Extension to nonlinear and nonaffine problems.
Thank you for your attention!
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 54 / 65
References I
Beskos, A., Girolami, M., Lan, S., Farrell, P. E., and Stuart, A. M.(2017).Geometric mcmc for infinite-dimensional inverse problems.Journal of Computational Physics, 335:327 – 351.
Binev, P., Cohen, A., Dahmen, W., DeVore, R., Petrova, G., andWojtaszczyk, P. (2011).Convergence rates for greedy algorithms in reduced basismethods.SIAM Journal on Mathematical Analysis, 43(3):1457–1472.
Bui-Thanh, T., Ghattas, O., Martin, J., and Stadler, G. (2013).A computational framework for infinite-dimensional bayesianinverse problems part I: The linearized case, with application toglobal seismic inversion.SIAM Journal on Scientific Computing, 35(6):A2494–A2523.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 55 / 65
References II
Bui-Thanh, T. and Girolami, M. (2014).Solving large-scale pde-constrained bayesian inverse problemswith riemann manifold hamiltonian monte carlo.Inverse Problems, 30(11):114014.
Chen, P. and Ghattas, O. (2019).Stein variational reduced basis Bayesian inversion.in preparation.
Chen, P. and Schwab, C. (2016).Sparse-grid, reduced-basis Bayesian inversion:Nonaffine-parametric nonlinear equations.Journal of Computational Physics, 316:470 – 503.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 56 / 65
References III
Chen, P., Villa, U., and Ghattas, O. (2017).Hessian-based adaptive sparse quadrature for infinite-dimensionalBayesian inverse problems.Computer Methods in Applied Mechanics and Engineering,327:147–172.
Chen, P., Wu, K., Chen, J., O’Leary-Roseberry, T., and Ghattas, O.(2019).Projected stein variational Newton: A fast and scalable Bayesianinference method in high dimensions.arXiv preprint arXiv:1901.08659.
Cui, T., Marzouk, Y., and Willcox, K. (2015).Data-driven model reduction for the bayesian solution of inverseproblems.International Journal for Numerical Methods in Engineering,102(5):966–990.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 57 / 65
References IV
Detommaso, G., Cui, T., Marzouk, Y., Spantini, A., and Scheichl,R. (2018).A stein variational Newton method.In Advances in Neural Information Processing Systems, pages9187–9197.
DeVore, R., Petrova, G., and Wojtaszczyk, P. (2012).Greedy algorithms for reduced bases in Banach spaces.Arxiv preprint arXiv:1204.2290.
El Moselhy, T. and Marzouk, Y. (2012).Bayesian inference with optimal maps.Journal of Computational Physics.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 58 / 65
References V
Farcas, I.-G., Latz, J., Ullmann, E., Neckel, T., and Bunagrtz, H.-J.(2019).Multilevel adaptive sparse Leja approximations for Bayesianinverse problems.arXiv preprint arXiv:1904.12204.
Gantner, R. N. and Schwab, C. (2016).Computational higher order quasi-Monte Carlo integration.In Monte Carlo and Quasi-Monte Carlo Methods, pages 271–288.Springer.
Girolami, M. and Calderhead, B. (2011).Riemann manifold langevin and hamiltonian monte carlo methods.Journal of the Royal Statistical Society: Series B (StatisticalMethodology), 73(2):123–214.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 59 / 65
References VI
Lan, S., Bui-Thanh, T., Christie, M., and Girolami, M. (2016).Emulation of higher-order tensors in manifold monte carlo methodsfor bayesian inverse problems.Journal of Computational Physics, 308:81–101.
Lassila, T., Manzoni, A., Quarteroni, A., and Rozza, G. (2013).A reduced computational and geometrical framework for inverseproblems in hemodynamics.International journal for numerical methods in biomedicalengineering, 29(7):741–776.
Lieberman, C., Willcox, K., and Ghattas, O. (2010).Parameter and state model reduction for large-scale statisticalinverse problems.SIAM Journal on Scientific Computing, 32(5):2523–2542.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 60 / 65
References VII
Liu, Q. and Wang, D. (2016).Stein variational gradient descent: A general purpose Bayesianinference algorithm.In Advances In Neural Information Processing Systems, pages2378–2386.
Martin, J., Wilcox, L., Burstedde, C., and Ghattas, O. (2012).A stochastic Newton MCMC method for large-scale statisticalinverse problems with application to seismic inversion.SIAM Journal on Scientific Computing, 34(3):A1460–A1487.
Marzouk, Y., Najm, H., and Rahn, L. (2007).Stochastic spectral methods for efficient bayesian solution ofinverse problems.Journal of Computational Physics, 224(2):560–586.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 61 / 65
References VIII
Marzouk, Y. and Xiu, D. (2009).A stochastic collocation approach to Bayesian inference in inverseproblems.Communications in Computational Physics, 6(4):826–847.
Nguyen, C., Rozza, G., Huynh, D., and Patera, A. (2010).Reduced basis approximation and a posteriori error estimation forparametrized parabolic pdes; application to real-time bayesianparameter estimation.Technical report, John Wiley & Sons.
Oliver, D. S. (2017).Metropolized randomized maximum likelihood for improvedsampling from multimodal distributions.SIAM/ASA Journal on Uncertainty Quantification, 5(1):259–277.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 62 / 65
References IX
Rezende, D. J. and Mohamed, S. (2015).Variational inference with normalizing flows.arXiv preprint arXiv:1505.05770.
Schillings, C. and Schwab, C. (2013).Sparse, adaptive smolyak quadratures for bayesian inverseproblems.Inverse Problems, 29(6).
Schillings, C., Sprungk, B., and Wacker, P. (2019).On the convergence of the Laplace approximation andnoise-level-robustness of Laplace-based Monte Carlo methods forBayesian inverse problems.arXiv preprint arXiv:1901.03958.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 63 / 65
References X
Schwab, C. and Stuart, A. (2012).Sparse deterministic approximation of bayesian inverse problems.Inverse Problems, 28(4):045003.
Spantini, A., Bigoni, D., and Marzouk, Y. (2018).Inference via low-dimensional couplings.The Journal of Machine Learning Research, 19(1):2639–2709.
Stuart, A., Voss, J., and Wilberg, P. (2004).Conditional path sampling of sdes and the langevin mcmc method.Communications in Mathematical Sciences, 2(4):685–697.
Wang, J. and Zabaras, N. (2005).Using Bayesian statistics in the estimation of heat source inradiation.International journal of heat and mass transfer, 48(1):15–29.
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 64 / 65
References XI
Wang, K., Bui-Thanh, T., and Ghattas, O. (2018).A randomized maximum a posteriori method for posterior samplingof high dimensional nonlinear Bayesian inverse problems.SIAM Journal on Scientific Computing, 40(1):A142–A171.
Wang, Z., Cui, T., Bardsley, J., and Marzouk, Y. (2019).Scalable optimization-based sampling on function space.arXiv preprint arXiv:1903.00870.
Thank you for your attention!
Peng Chen (Oden Institute, UT Austin) pSVN & RB for Bayesian inversion November 11, 2019 65 / 65
Bayesian inversionStein variational methodsProjected Stein variational methodsStein variational reduced basis methods
fd@rm@0: