Model based Bayesian spatio-temporal survey design for species distribution modelling

Jia Liu, University of Helsinki

Joint work with Jarno Vanhatalo, University of Helsinki

Bayesian Statistics in the Big Data Era, CIRM, Marseille

29 November 2018




Outline

1 Introduction

2 The Bayesian modeling and LGCP

3 The design

4 Rejection design

5 Utility functions and computational challenge

6 Simulation study

7 Case study

8 Conclusion


Motivation

A central question of geostatistics is the prediction of spatial patterns over a region of interest (ROI) using data measured at a finite set of locations. The usual tool is a hierarchical Gaussian process model.

When the data are not fully observed, with a suitable model, the goodness of the spatial prediction and estimation depends on the spatial allocation of the measurement locations [Müller, 2007], i.e. the observational/experimental design.

The design in spatial data analysis: the spatial/spatiotemporal allocation of the data.

Gaussian vs. non-Gaussian observation processes in spatial analysis.

We study observational designs for spatiotemporal log-Gaussian Cox processes (LGCPs).


LGCPs

Why LGCPs?

An LGCP arises from an inhomogeneous Poisson process with intensity $\lambda$ whose logarithm is a Gaussian process.

In terms of the spatiotemporal observation design, the key question is when and where we should do the survey in order to learn the most about the essentials of $\lambda(\mathbf{s}, t)$.

The interest is in how the intensity surface varies over space and time.


Denote the study region by $D$, and a vector of spatiotemporal covariates by $\mathbf{x} = [\mathbf{s}^T, t] \in D$.

The (approximate) likelihood can be written

$$L(y_1, \dots, y_n \mid \lambda(\cdot)) = L(y_1, \dots, y_n \mid \lambda(\mathbf{x}_i)) = \prod_{i=1}^{n} \mathrm{Poisson}(y_i \mid \lambda(\mathbf{x}_i)) \tag{1}$$

where $n$ is the number of observed discretized locations, $y_i$ is the count observation at the $i$'th location $\mathbf{x}_i$, and $\log(\lambda(\mathbf{x})) = \mathbf{f} = [f(\mathbf{x}_1), \dots, f(\mathbf{x}_n)]^T$ is a vector of latent variables at those locations, which has a multivariate Gaussian distribution.
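The product form of likelihood (1) is usually evaluated on the log scale. The following is a minimal sketch in plain Python, assuming the latent values are given directly as log-intensities; the function name and the toy data are illustrative and not taken from the talk.

```python
import math

def poisson_log_likelihood(counts, latent_f):
    """Log of likelihood (1): sum_i log Poisson(y_i | lambda_i) with lambda_i = exp(f_i)."""
    if len(counts) != len(latent_f):
        raise ValueError("counts and latent_f must have the same length")
    total = 0.0
    for y, f in zip(counts, latent_f):
        # log Poisson(y | lam) = y * log(lam) - lam - log(y!), and log(lam) = f
        total += y * f - math.exp(f) - math.lgamma(y + 1)
    return total

# Hypothetical toy data: three locations with intensities 1, 2 and 1.
counts = [0, 2, 1]
latent_f = [0.0, math.log(2.0), 0.0]
print(poisson_log_likelihood(counts, latent_f))
```

Working with $\mathbf{f}$ rather than $\lambda$ keeps the intensity positive by construction, which is why the Gaussian prior is placed on the log scale.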


The additive model for spatiotemporal Gaussian process prior

Additive model

$$\log \lambda(\mathbf{x}) = f(\mathbf{s}, t) \sim \mathrm{GP}\big(\mu(\mathbf{s}, t),\; k(\mathbf{s}, \mathbf{s}') + k(t, t')\big). \tag{2}$$

$$f(\mathbf{s}, t) = \mu(\mathbf{s}, t) + g(\mathbf{s}) + h(t),$$

where the additive terms are mutually independent Gaussian processes, $g(\mathbf{s}) \sim \mathrm{GP}(0, k(\mathbf{s}, \mathbf{s}'))$ and $h(t) \sim \mathrm{GP}(0, k(t, t'))$.

Choices of covariance functions (e.g., Matérn, squared exponential, etc.).

Laplace approximation for posterior inference.

GPstuff [Vanhatalo et al., 2013] software vs. other alternatives.
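To make the additive covariance in (2) concrete, here is a minimal sketch in plain Python. It assumes a squared exponential kernel for both the spatial and the temporal term and points stored as tuples `(s_1, s_2, t)`; the function names, length scales and points are illustrative choices, not the settings used in the talk or in GPstuff.

```python
import math

def sq_exp(a, b, magnitude=1.0, length_scale=1.0):
    """Squared exponential covariance between two points given as tuples."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return magnitude ** 2 * math.exp(-0.5 * sq_dist / length_scale ** 2)

def additive_cov(x, x_prime, space_ls=1.0, time_ls=1.0):
    """k(x, x') = k(s, s') + k(t, t') for x = (s_1, s_2, t)."""
    s, t = x[:-1], x[-1:]
    s_p, t_p = x_prime[:-1], x_prime[-1:]
    return (sq_exp(s, s_p, length_scale=space_ls)
            + sq_exp(t, t_p, length_scale=time_ls))

def cov_matrix(points, **kwargs):
    """Prior covariance matrix of f at the given spatiotemporal points."""
    return [[additive_cov(p, q, **kwargs) for q in points] for p in points]

# Hypothetical points: same place at two times, and a second place.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 2.0)]
K = cov_matrix(points, space_ls=0.5, time_ls=1.0)
for row in K:
    print(row)
```

Because the terms are added, two observations at the same location stay correlated through the spatial term however far apart they are in time, and vice versa.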


What is the design?

What is the problem that arises from the design?

How to evaluate the design?


Expected utilities

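We will denote by $\mathcal{D}_n = \{d_n\}$ the set of all possible designs of size $n$ in domain $D$.

The expected utility is then defined as

$$U(d_n) = \int_{\mathcal{Y}} \int_{f} U(d_n, f, y)\, p(f \mid d_n, y)\, p(y \mid d_n)\, \mathrm{d}f\, \mathrm{d}y, \tag{3}$$

where $y \in \mathcal{Y}$ denotes the future data.

The integral is approximated by Monte Carlo (MC) simulation.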
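The Monte Carlo step can be sketched as follows. Drawing the latent function from its prior and then the data given the latent function yields joint draws of $(f, y)$, which is the same joint distribution as $p(f \mid d_n, y)\, p(y \mid d_n)$ in (3), so the expected utility is estimated by a plain average. This is a sketch under stated assumptions: the function names are hypothetical, and the toy model is Gaussian rather than an LGCP, chosen only because its expected utility is known in closed form.

```python
import random

def mc_expected_utility(design, sample_latent, sample_data, utility,
                        n_draws=1000, seed=0):
    """Monte Carlo estimate of U(d_n): average utility over joint draws (f, y)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        f = sample_latent(design, rng)   # f ~ p(f | d_n), the prior at the design
        y = sample_data(design, f, rng)  # y ~ p(y | f, d_n), simulated future data
        total += utility(design, f, y)
    return total / n_draws

# Hypothetical toy model (Gaussian, not an LGCP), only to exercise the estimator.
NOISE_SD = 0.5

def toy_latent(design, rng):
    return [rng.gauss(0.0, 1.0) for _ in design]

def toy_data(design, f, rng):
    return [fi + rng.gauss(0.0, NOISE_SD) for fi in f]

def toy_utility(design, f, y):
    # Negative mean squared error when y itself is used to predict f.
    return -sum((yi - fi) ** 2 for yi, fi in zip(y, f)) / len(f)

design = [(0.1, 0.2, 0.3), (0.5, 0.5, 0.5), (0.9, 0.1, 0.7)]
estimate = mc_expected_utility(design, toy_latent, toy_data, toy_utility,
                               n_draws=20000)
print(estimate)  # close to -NOISE_SD**2
```

In the LGCP setting the utility call is the expensive part, since each simulated data set requires its own posterior approximation before the utility can be evaluated.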


Spatial balance designs

The model-based optimal experimental design: simulated annealing algorithms [Müller, 1999, Müller et al., 2004], interactive MCMC methods [Amzal et al., 2006].

Spatial balance design: sampling methods that increase expected utilities and obtain good designs by means of good coverage of the survey region.

Halton and Sobol designs, the Fibonacci lattice designs, distance based designs (simple inhibitory, inhibitory plus close pairs lattice designs [Chipeta et al., 2016]), and the space-filling design [Nychka and Saltzman, 1998].
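As an illustration of one of the quasi-random designs listed above, the following is a minimal sketch of a Halton design in plain Python. It assumes the design space has been rescaled to the unit cube, with two spatial coordinates and one time coordinate; the function names and the choice of bases 2, 3 and 5 are illustrative.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton_design(n, bases=(2, 3, 5)):
    """First n Halton points in the unit cube, one coprime base per dimension."""
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, n + 1)]

for point in halton_design(5):
    print(point)
```

Each new point falls into the largest remaining gap along every axis, which is what gives the good coverage of the survey region that spatial balance designs aim for.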


Inclusion probability

An even probability.

Some locations are more informative than others.

Rejection sampling scheme: more weight is given to certain covariates which are a priori expected to be more informative.


Rejection design

The general algorithm of the rejection sampling design proceeds as follows:

1 Randomly generate a location $\mathbf{x}^*$ within the study domain (here any of the above random or quasi-random sequences can be used);

2 Calculate an inclusion probability $0 \le p(\mathbf{x}^*) \le 1$;

3 Accept the location with probability $p(\mathbf{x}^*)$. If accepted, set $\mathbf{x}_j = \mathbf{x}^*$ and increase $j = j + 1$. If rejected, keep $j = j$ and return to step 1;

4 Repeat steps 1-3 until the size of the design reaches $n$.
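The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the proposal is uniform on the unit cube, and the inclusion probability is a hypothetical bell-shaped function of time, standing in for prior knowledge that some part of the season is more informative. All names and parameter values are assumptions.

```python
import math
import random

def rejection_design(n, inclusion_prob, propose, seed=0, max_proposals=1_000_000):
    """Build a design of size n by accepting proposals with probability p(x*)."""
    rng = random.Random(seed)
    design = []
    for _ in range(max_proposals):
        if len(design) == n:                # step 4: stop once the design is full
            break
        candidate = propose(rng)            # step 1: generate a location x*
        p = inclusion_prob(candidate)       # step 2: inclusion probability
        if not 0.0 <= p <= 1.0:
            raise ValueError("inclusion probability must lie in [0, 1]")
        if rng.random() < p:                # step 3: accept with probability p
            design.append(candidate)
    if len(design) < n:
        raise RuntimeError("too few accepted locations; check inclusion_prob")
    return design

def propose_uniform(rng):
    """Hypothetical proposal: uniform location (s_1, s_2, t) in the unit cube."""
    return (rng.random(), rng.random(), rng.random())

def time_peaked_prob(x, peak=0.5, width=0.15):
    """Hypothetical inclusion probability, highest around time t = peak."""
    t = x[2]
    return math.exp(-0.5 * ((t - peak) / width) ** 2)

design = rejection_design(20, time_peaked_prob, propose_uniform, seed=1)
print(len(design), sum(x[2] for x in design) / len(design))
```

Replacing `propose_uniform` with a quasi-random sequence keeps the coverage properties of the underlying design while the inclusion probability concentrates effort where the prior expects more information.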


Two common utility functions in geostatistics

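We will consider two commonly used utilities in geostatistics.

(1) The average predictive variance (APV)

$$L_{\mathrm{APV}}(d_n) = \frac{1}{|D|} \sum_{y \in \mathbb{N}^n} p(y \mid d_n) \int_{\mathbf{x}^* \in D} \mathrm{Var}\{\lambda(\mathbf{x}^*) \mid d_n, y\}\, \mathrm{d}\mathbf{x}^*. \tag{4}$$

The MC approximation of (APV)

$$L_{\mathrm{APV}}(d_n) \approx \frac{1}{M} \sum_{j=1}^{M} \left[ \frac{1}{N} \sum_{\mathbf{x}^* \in X^*} \mathrm{Var}\{\lambda_j(\mathbf{x}^*) \mid d_n, Y_j\} \right],$$

with $M$ simulated data sets $Y_j$ and $N$ prediction locations in $X^*$.

The mean and variance of the intensity function follow from the log-normal distribution:

$$\mu(\lambda(\mathbf{x}^*)) = \exp\big(\mu(f(\mathbf{x}^*)) + \mathrm{Var}(f(\mathbf{x}^*))/2\big),$$

$$\mathrm{Var}[\lambda(\mathbf{x}^*)] = \big[\exp(\mathrm{Var}(f(\mathbf{x}^*))) - 1\big] \exp\big(2\mu(f(\mathbf{x}^*)) + \mathrm{Var}(f(\mathbf{x}^*))\big).$$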
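The two log-normal moment formulas translate directly into code. The sketch below, in plain Python, computes the mean and variance of the intensity from the mean and variance of the latent function at a prediction location, and averages the variances over a grid for a single simulated data set (the inner sum of the MC approximation). Function names and the toy inputs are illustrative assumptions.

```python
import math

def intensity_moments(f_mean, f_var):
    """Mean and variance of lambda = exp(f) when f is Gaussian (log-normal moments)."""
    mean = math.exp(f_mean + f_var / 2.0)
    var = (math.exp(f_var) - 1.0) * math.exp(2.0 * f_mean + f_var)
    return mean, var

def average_predictive_variance(f_means, f_vars):
    """Average of Var[lambda(x*)] over the prediction locations, for one data set."""
    variances = [intensity_moments(m, v)[1] for m, v in zip(f_means, f_vars)]
    return sum(variances) / len(variances)

# Hypothetical posterior summaries of f at three prediction locations.
f_means = [0.0, 0.5, -1.0]
f_vars = [1.0, 0.2, 0.05]
print(intensity_moments(0.0, 1.0))
print(average_predictive_variance(f_means, f_vars))
```

Note that the variance of the intensity grows with the latent mean as well as the latent variance, so under this utility locations with high expected abundance weigh more than they would under a variance criterion on $f$ itself.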


KL divergence

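The mutual information

$$U_{\mathrm{KL}}(d_n, Y) = \mathrm{KL}\big(\mathrm{d}P(f(\cdot) \mid X, Y) \,\|\, \mathrm{d}P(f(\cdot))\big) \;\Rightarrow\; U_{\mathrm{KL}}(d_n) = \sum_{y \in \mathbb{N}^n} p(y \mid d_n)\, \mathrm{KL}\big(\mathrm{d}P(f(\cdot) \mid X, y) \,\|\, \mathrm{d}P(f(\cdot))\big). \tag{5}$$

The Kullback-Leibler divergence (KL) [Kullback, 1987]

$$U_{\mathrm{KL}}(d_n, y) = \frac{1}{2}\Big(\log\big|K_* K_{*|y}^{-1}\big| + \mathrm{tr}\big(K_*^{-1} K_{*|y}\big) + (\mu_* - \mu_{*|y})^T K_*^{-1} (\mu_* - \mu_{*|y}) - c\Big), \tag{6}$$

where $c$ is the dimension of the covariance matrices $K_{*|y}$ and $K_*$.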
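Formula (6) is the KL divergence between two multivariate Gaussians, from the prior $N(\mu_*, K_*)$ to the (approximate) posterior $N(\mu_{*|y}, K_{*|y})$. A minimal NumPy sketch follows, assuming both covariance matrices are symmetric positive definite; the function name and the toy numbers are illustrative. It uses log-determinants and linear solves rather than explicit inverses, a common choice for numerical stability and not necessarily what the authors do.

```python
import numpy as np

def gaussian_kl(post_mean, post_cov, prior_mean, prior_cov):
    """KL( N(post_mean, post_cov) || N(prior_mean, prior_cov) ), as in formula (6)."""
    post_mean = np.asarray(post_mean, dtype=float)
    prior_mean = np.asarray(prior_mean, dtype=float)
    post_cov = np.asarray(post_cov, dtype=float)
    prior_cov = np.asarray(prior_cov, dtype=float)
    c = prior_mean.shape[0]
    # log|K_* K_{*|y}^{-1}| = log|K_*| - log|K_{*|y}|
    _, logdet_prior = np.linalg.slogdet(prior_cov)
    _, logdet_post = np.linalg.slogdet(post_cov)
    # tr(K_*^{-1} K_{*|y}) via a linear solve instead of an explicit inverse
    trace_term = np.trace(np.linalg.solve(prior_cov, post_cov))
    diff = prior_mean - post_mean
    quad_term = diff @ np.linalg.solve(prior_cov, diff)
    return 0.5 * (logdet_prior - logdet_post + trace_term + quad_term - c)

# Hypothetical two-dimensional example: the data shrink the variance and shift the mean.
prior_mean, prior_cov = [0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]]
post_mean, post_cov = [0.4, -0.2], [[0.5, 0.1], [0.1, 0.6]]
print(gaussian_kl(post_mean, post_cov, prior_mean, prior_cov))
```

The divergence is zero when the posterior equals the prior, so a design scores well under this utility only if the data it collects move the distribution of the latent function.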


The KL divergence

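The KL-divergence from the prior to the posterior

$$\begin{aligned}
\mathrm{KL}\big(\mathrm{d}P(f(\cdot) \mid y) \,\|\, \mathrm{d}P(f(\cdot))\big)
&= \int \log \frac{p(y \mid f(\cdot))\, \mathrm{d}P(f(\cdot))}{\mathrm{d}P(f(\cdot)) \int p(y \mid f(\cdot))\, \mathrm{d}P(f(\cdot))}\, \mathrm{d}P(f(\cdot) \mid y) \\
&= \int \log p(y \mid f(\cdot))\, \mathrm{d}P(f(\cdot) \mid y) - \log p(y) \\
&= \int \log p(y \mid \mathbf{f})\, \mathrm{d}P(\mathbf{f} \mid y) - \log p(y),
\end{aligned} \tag{7}$$

where $p(y) = \int p(y \mid f(\cdot))\, \mathrm{d}P(f(\cdot)) = \int p(y \mid \mathbf{f})\, p(\mathbf{f})\, \mathrm{d}\mathbf{f}$. When $X \subset D$, we get the last equality.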


Examples of spatiotemporal designs

A random draw from an additive GP with unimodal mean function along time (color surface) and samples from Sobol design (n = 30).


Poisson additive model for latent function, EAPV

The dimension of the designs, n = 100.


Poisson additive model, EKL


Case study

We design a survey to inform the spatial distribution of fish larval areas in the Finnish coastal region of the northern Baltic Sea. The data contain counts of several different species from the years 2007-2014, collected between early May and early July (calendar days 128-188). There are ten different covariates, including time and spatial regions.


Map of the case study area on the Finnish coastal region. The study region includes 229 429 very dense spatial grid cells and 20 weeks, in total 4 588 580 spatiotemporal grid cells.


Results

Posterior inference with monotonic constraints.


Shrink the survey region in the study.


The crosses connected with solid lines show the Monte Carlo estimate and the highlighted regions show the 95% credible interval of this estimate.


a) pike perch sampling design

b) herring sampling design


Conclusion

1 Realistic prior information can increase the expected utility of designs for observations that follow LGCPs.

2 The design with inclusion probability keeps randomness and inherits the advantages of the spatial balance designs.

3 We need good/optimal designs: reduced cost, good inference, etc.

4 This work has an arXiv version (arXiv:1808.09200).


Ongoing work

1 New computational algorithms to make the computation of utilities and related quantities (covariance matrices and their inversion, the Cholesky decomposition) feasible and efficient with big data.

2 Bayesian optimal design: new stochastic methods based on annealing simulations that work in high-dimensional cases. Study the discretized and continuous design spaces. Good proposals for fast mixing rates of the Markov chains.


References

Amzal, B., Bois, F. Y., Parent, E., and Robert, C. P. (2006). Bayesian-optimal design via interacting particle systems. Journal of the American Statistical Association, 101(474):773–785.

Chipeta, M., Terlouw, D., Phiri, K., and Diggle, P. (2016). Inhibitory geostatistical designs for spatial prediction taking account of uncertain covariance structure. Environmetrics.

Kullback, S. (1987). Letter to the editor: The Kullback-Leibler distance. American Statistician, 41(4):340–341.

Müller, P. (1999). Simulation based optimal design. Bayesian Statistics, 25:459–474.

Müller, P., Sansó, B., and De Iorio, M. (2004). Optimal Bayesian design by inhomogeneous Markov chain simulation. Journal of the American Statistical Association, 99(467):788–798.


Müller, W. G. (2007). Collecting spatial data.

Nychka, D. and Saltzman, N. (1998). Design of air-quality monitoring networks. In Case studies in environmental statistics, pages 51–76. Springer.

Vanhatalo, J., Riihimäki, J., Hartikainen, J., Jylänki, P., Tolvanen, V., and Vehtari, A. (2013). GPstuff: Bayesian modeling with Gaussian processes. Journal of Machine Learning Research, 14(Apr):1175–1179.


Thank you very much! Merci beaucoup!
