42
Brief Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics University of Missouri-Columbia Outline Bayesian Statistics Hierarchical Bayesian Models Examples Blending Tropical Surface Winds Long-Lead Prediction of SST Tornado Report Climatology Conclusion AMS Short Course, January, 2004

Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

  • Upload
    ngodan

  • View
    242

  • Download
    3

Embed Size (px)

Citation preview

Page 1: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Brief Introduction to (Hierarchical) Bayesian Methods

Christopher K. WikleDepartment of Statistics

University of Missouri-Columbia

Outline

• Bayesian Statistics

• Hierarchical Bayesian Models

• Examples

– Blending Tropical Surface Winds

– Long-Lead Prediction of SST

– Tornado Report Climatology

• Conclusion

AMS Short Course, January, 2004

1

Page 2: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Geophysical Analysis

Two types of geophysical models?

• Physical:

– Laws of classical physics

– Uncertainties: neglecting higher order terms; choice of space-time averaging (subgrid-scale parameterizations); linking sys-tems (coupling biological processes to weather), etc.

• Statistical:

– Descriptive

– Uncertainties: inefficient in complex settings; data not repre-sentative of full dynamics; application under system changes;sampling; estimation

2

Page 3: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

“Physical/Statistical Models”

Better to consider a “spectrum of models” (e.g., Berliner, 2003)

Deterministic ⇐⇒ Stochastic

Bayesian Approach:

• Natural for combining information sources while managing theiruncertainties.

– Multiple data sources

– Uncertain physics

– Physical models with stochastic parameters

– Expert opinion

• Predictive distributions of quantities of interest, conditioned on

observations

3

Page 4: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Distributional Notation

We use the short-hand notation [ ] for probability distribution.

Given random variables A and B, represent:

• Joint distribution of A and B: [A, B]

• Conditional distribution of A given B: [A|B]

• Marginal distribution of A: [A]

Also, note the following are equivalent:

Y |µ, σ2 ∼ N(µ, σ2)

Y = µ + ε, ε ∼ N(0, σ2)

where “∼” means “is distributed as”.

4

Page 5: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Bayesian Viewpoint

Bayes’ Theorem:

[Y |X ] =[Y,X ]

[X ]=

[X|Y ][Y ]∫[X|Y ][Y ]dY

For random variables with known distributions, this is a “fact” fromprobability theory. It is not controversial!

Bayesian Statistics:

Modeling unknown distributions and updating those models based ondata. Might be controversial!

5

Page 6: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Bayesian Statistics

[Y |X ] =[X|Y ][Y ]∫

[X|Y ][Y ]dY∝ [X|Y ][Y ]

• Want to make inference about Y , but we only observe X

• We update our uncertainty about Y after observing X

• [Y ] reflects our prior knowledge of Y

• [X|Y ] is the “likelihood”

• [Y |X ] is the posterior distribution

Bayesian Modeling:

Treat all unknowns as if they are random and evaluateprobabilistically

[Process|Data] ∝ [Data|Process][Process]

6

Page 7: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Simple Example 1

• Temperature observations: T (s1), T (s2), . . . , T (sn)

• These are observations of the true (unknown) process: θ

• We have prior information about θ, say θ0 from a “numerical model”but we don’t believe our model is perfect.

Probabilistically, we may believe:

[Data|Process] : T (si)|θ, σ2 ∼ N(θ, σ2)

[Process] : θ|θ0, τ2 ∼ N(θ0, τ

2)

7

Page 8: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

We want the posterior: [Process|Data], i.e.,

[θ|T (s1), . . . , T (sn)] ∝ [T (s1), . . . , T (sn)|θ, σ2][θ|θ0, τ2]

where for simplicity, we assume θ0, τ2, σ2 are known. Then,

θ|T (s1), . . . , T (sn) ∼ N(wT + (1− w)θ0,σ2τ 2

σ2 + nτ 2)

where

T =1

n

n∑i=1

T (si)

w =τ 2

τ 2 + σ2/n.

Note: this is a weighted average of the sample mean and the priormean, where the weights are related to our certainty about the obser-vations and prior.

8

Page 9: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Basic Bayes is Not New to Meteorology

• Epstein, early 1960’s used Bayesian decision making in applied me-teorology

• Weather Modification Research in 1970’s (e.g., Simpson et al. 1973,JAS)

• Data Assimilation can be viewed from Bayesian viewpoint (e.g.,Lorenc 1986, Q. J. Roy. Met. Soc.)

– Kalman filter is Bayesian (e.g., Meinhold and Singpurwalla, 1983)

– Ensemble Kalman Filters (Sequential Monte Carlo); e.g., Evensenand van Leeuwen, 2000, MWR

• Climate Change (e.g., Solow 1988; J. Clim.; Leroy 1998, J. Clim.)

• Satellite Retrieval (many!)

Hierarchical Bayes is new to meteorology!

9

Page 10: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Hierarchical Bayesian Viewpoint

Cornerstone of hierarchical modeling is conditional thinking.

Joint distribution can be represented as products of conditionals. e.g.,

[A, B, C] = [A|B, C][B|C][C]

NOTE: Our choice for this decomposition is based on what we knowabout the process, and what assumptions we are willing and able tomake for simplification:

e.g., conditional independence: [A|B,C]=[A|B].

Often easier to express conditional models than full joint models.

10

Page 11: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Hierarchical Bayesian Modeling of Geophysical Systems

Separate unknowns into two groups:

• process variables: actual physical quantities of interest (e.g., tem-perature, pressure, wind)

• model parameters: quantities introduced in model development(e.g., propagator matrices, measurement error and sub-grid scalevariances, unknown physical constants)

Basic Hierarchical Model

1. [data|process, parameters]

2. [process|parameters]

3. [parameters]

The posterior [process,parameters|data] is proportional to theproduct of these three distributions!

11

Page 12: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Simple Example 1 (cont.)

Say that in Example 1, we believe that the numerical model valueof θ is biased. We do not know exactly the bias, but suspect it isrelated to known factors x ≡ (x1, x2, . . . , xk)

′ (e.g., MOS). Then, ourhierarchical model has the following stages:

• Data Model:

T (si)|θ, σ2 ∼ N(θ, σ2)

• Process Model:

θ|θ0,x, β, τ 2 ∼ N(θ0 + x′β, τ 2)

• Parameter Model:

β|β0,Σ ∼ N(β0,Σ)

We then seek the posterior distribution: [θ, β|T (s1), . . . , T (sn)]

12

Page 13: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Complicated Processes

Usually, we can’t find analytically the normalizing constant in Bayes’theorem. Thus we can’t easily get the posterior distribution.

• Numerical integration (o.k. for low dimensions)

• Use Monte Carlo methods to sample from the posterior. Specifi-cally,

– Markov Chain Monte Carlo (MCMC)

∗ Sample from a Markov Chain that has the same ergodic dis-tribution as the posterior

∗ Don’t need to know normalizing constant

∗ e.g., Metropolis, Gibbs sampler algorithms

∗ revolutionized Bayesian statistics in the late 1980’s

– Very computationally intensive

13

Page 14: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Example 2: Tropical Ocean Winds

Problem: Blending tropical surface winds given high-resolution satel-lite scatterometer observations and low-resolution assimilated modeloutput. (Wikle, Milliff, Nychka, Berliner, 2001; JASA)

14

Page 15: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Wind Problem: Process Model Motivation I

Consider the linear shallow water equations (on an equatorial beta-plane):

∂u

∂t− β0yv + g

∂h

∂x= 0

∂v

∂t+ β0yu + g

∂h

∂y= 0

∂h

∂t+ h(

∂u

∂x+

∂v

∂y) = 0

• β0: constant related to rotation of earth

• g: gravitational acceleration

This linear system can be solved analytically, giving a series of travelingwaves (equatorial normal modes) - Observed in the tropics!

15

Page 16: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Wind Problem: Process/Parameter Model Motivation II

Empirical results suggest turbulent scaling behavior for near surfacetropical winds (e.g., Wikle, Milliff and Large, 1999; JAS)

In particular:

Sv(ω) ∝ σ2v

|ω|κ

where κ = 5/3, and ω is the spatial frequency.

16

Page 17: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Wind Model: Hierarchical Model Sketch (v-Component)

• Data Model: Observations at different resolutions

[Zs′t Za

′t]′ = Ktvt + εt, εt ∼ N(0,Σε)

• Process Model:vt = Φat + γt,

Φ - matrix containing “important” normal mode basis functions(importance determined from empirical studies, e.g., Wheeler andKiladis, 1999)

at - time-varying spectral coefficients

γt - multiresolution spatio-temporal process

• Parameter Models:

at = H(θ)at−1 + ηt, ηt ∼ N(0,Ση)

θ - priors based on equatorial wave theory and empirical studies

γt - priors to suggest turbulent power-law relationships

17

Page 18: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Wind Problem Results: Posterior Decomposition

18

Page 19: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Wind Problem Results: Tropical Cyclone Dale

19

Page 20: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Wind Blending: “Operational”

Tim Hoar, Geophysical Statistics Project, NCAR has made thismodel “operational” on QSCAT data (1/2 deg)

20

Page 21: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Example 3: Long Lead Prediction of SST

Goal: Predict Pacific SST anomalies (2◦ × 2◦ resolution) 7 months inadvance while realistically accounting for uncertainty. [Berliner,Wikle,Cressie, 2000, J. Climate]

Consider data {Z(si; t)} to be anomalies from monthly means.

Want to predict Z(s0; T + τ ); for τ = 7 months, from space-time dataD(T ) ≡ {ZT ,ZT−1, . . . ,Z1}.

21

Page 22: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Data Model

Dimension Reduction:

Zt = Φat + νt,

where

• νt ∼ Gau(0,Σν); measurement error and small-scale spatial vari-ability that is uncorrelated in time

• Φ; truncated empirical orthogonal function basis set, ob-tained from SVD of Z ≡ (Z1, . . . ,ZT ) .

22

Page 23: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Process Model

at+τ = µt + Htat + ηt+τ ,

where, ηt ∼ Gau(0,Ση) for all t.

Critical modeling assumption: Let Ht and µt be dependent on boththe current and future climate regimes:

Ht = H(It, Jt)

µt = µ(It, Jt),

where,

• It - classifies the current regime as “warm” (2), “normal” (1), or“cold” (0)

• Jt - anticipates a transition to one of the three regimes at time t+τ

23

Page 24: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Climate States

Current, It: [Threshold Model, e.g., Tong 1990]

It = 0, if SOIt > low threshold

= 1, if otherwise

= 2, if SOIt < upper threshold,

where SOIt is the Southern Oscillation Index (assumed “known”).

Future, Jt: [Latent (hidden) Process Model]

Jt = 0, if Wt > low threshold

= 1, if otherwise

= 2, if Wt < upper threshold,

where Wt is a latent process which anticipates the future climateregime.

24

Page 25: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Latent Process to Assess “Future”

Wt|βw, σ2w ∼ Gau(x′tβw, σ2

w)

where,

xt = (1, Ut, Utsin(2πmt

12), Utcos(

2πmt

12), U 2

t )′,

where

• Ut - the lowpass filtered E-W component of the wind at 10 metersabove the surface at 5◦ N and 157◦ E. (This is absolutely critical!)

• mt - index of month (0 - 11) at time t

25

Page 26: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Priors on Model Parameters

At next stage of hierarchy:

βw ∼ Gau(βSOI, cβvar(βSOI)),

σ2w ∼ IG(q, r)

where the mean and variance and IG (“Inverse Gamma”) parametersare obtained from fitting

SOIt+τ = x′tβSOI + et

(gives R2 ≈ .7 !)

• vec(Hj), j = 1, 2, 3 - Multivariate Normal distributions

• µj, j = 1, 2, 3 - Multivariate Normal distributions

• Covariance matrices - Wishart distributions

(“Empirical Bayes” priors)

26

Page 27: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Bayesian Prediction

MCMC Implementation

• Probability weighted mixtures:

E(aT+τ |D(T )) =2∑

j=0Pr{JT = j|D(T )}E(µ(IT , j) + H(IT , j)aT |D(T ))

• Pixelwise percentiles of [ΦaT+τ |D(T )]

• Summary measures of El Nino/La Nina (e.g., Nino 3.4 Index) andtheir posterior distributions

27

Page 28: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Results: Nino 3.4 Prediction of 10/97 from 3/97

28

Page 29: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Results: Posterior Predictions for 10/97 and 10/98

29

Page 30: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Results: Component Predictions for 10/98

30

Page 31: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Results: Out of Sample Nino 3.4 Posterior Predictions

31

Page 32: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Example 4: Tornado Counts (Wikle and Anderson, 2003; JGR-A)

Investigate possible dependence between observed tornado counts andclimate indices (ENSO, NAO, NPI).

NWS Storm Reports 1953-2001

32

Page 33: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Modeling Issues

Complicated Variability

• Non-normal data (counts)

• Observational uncertainty in counts

– Uncertainty in procedure for assigning Fujita scale (reliance onhuman observation and variable rating procedure)

– Demographic and technological influences

• Spatial variability in long-term frequency of tornado counts

• Temporal trends that may vary with space

• Exogenous climatological factors with effects that may vary

spatially (e.g., regionally)

33

Page 34: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Data Model

Y (si; t) - tornado count in grid box si in year t

Poisson Data Model

Y (si; t)|λ(si; t) ∼ Poisson(λ(si; t))

assumes conditional independence (i.e., counts are independent condi-tioned on the true Poisson intensity process, λ)

NOTE: In paper, we consider a “zero-inflated Poisson model”.

34

Page 35: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Process Model

log(λ(si; t) = β1(si)X1(t) + . . . + β6(si)X6(t) + η(si; t) + ε(si; t)

where,

• X1 - time

• X2 - ENSO index

• X3 - NPI index

• X4 - NAO index

• X5 - ENSO by NPI interaction

• X6 - ENSO by NAO interaction

• η(si; t) - spatially correlated temporal process (large-scale) [account-ing for other potential climate effects]

• ε(si; t) - uncorrelated random effect [ε(si; t) ∼ N(0, σ2ε )]

35

Page 36: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Parameter Models

βk ∼ N(0,Σ(θk)) - Spatially correlated random fields specified byvariance and dependence parameters (θk).

ηt ∼ N(µ,Σ(θη)) - spatial process, with mean µ and variance anddependence parameters given by θη.

Distributions must also be specified for the “hyper-parameters”: θk,θη, µ, and σ2

ε (see Wikle and Anderson (2003) for details)

36

Page 37: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Posterior Summary: Time Trend (β1) Coefficients

Posterior Mean and Posterior 95% “Credibility”

37

Page 38: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Posterior Summary: ENSO (β2) Coefficients

Posterior Mean and Posterior 95% “Credibility”

38

Page 39: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Posterior Summary: NPI (β3) Coefficients

Posterior Mean and Posterior 95% “Credibility”

39

Page 40: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Some Other Recent Hierarchical Bayesian Examples

• Air-Sea Interaction: (Berliner, Milliff, Wikle, 2003, JGR-O)

• Climate Change: (Berliner, Levine, Shea, 2000, J. Clim.),

• Stochastic Boundary Value Problems: (Wikle, Berliner, Milliff,2003, MWR)

• Nowcasting Radar: (Xu, Wikle, Fox, 2003)

• Hurricane Climatology: (Elsner and Bossak, 2001, J. Clim.)

• Analysis of Rainfall Data: (Sanso and Guenni, 1999, Applied Statis-tics)

40

Page 41: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

Conclusion

• Bayesian methods allow one to quantify uncertainty in all aspects ofthe problem (data, process, parameters) and distributional outputreflects the quantification of uncertainty

• Hierarchical Bayesian methods allow one to decompose the probleminto a series of simpler conditional models

– Can accommodate data, model, and parameter uncertainty

– Can accommodate data of different resolution/alignment (e.g.,GIS layers)

– Can accommodate complicated spatial, temporal, and spatio-temporal dependence

– Can accommodate physics (e.g., shallow water PDE; turbulence)

– Can accommodate expert opinion

• Downside: Complicated and computationally intense (canned soft-ware available for free, winBUGS)

41

Page 42: Brief Introduction to (Hierarchical) Bayesian Methods ... Introduction to (Hierarchical) Bayesian Methods Christopher K. Wikle Department of Statistics ... Wind Model: Hierarchical

* Most of the papers discussed here can be found on myweb page:

http://www.stat.missouri.edu/~wikle/

42