Hierarchical Dynamical Spatio-Temporal Models: Statistics ... · Hierarchical Dynamical Spatio-Temporal Models: Statistics for Spatio-Temporal Data, Ch. 7 John Paige Statistics Department

Hierarchical Dynamical Spatio-Temporal Models:Statistics for Spatio-Temporal Data, Ch. 7

John Paige

Statistics DepartmentUniversity of Washington

January 29, 2018

1

Hierarchical Dynamical Spatio-Temporal Models (DSTMs):Data Model (Sec. 7.1)

[{Z (x; r) : x ∈ Ds , r ∈ Dt} | {Y (s; t) : s ∈ Nx , t ∈ Nr} ,θD ]

I Z (x, r): observations at location x, time r

I Y (s; t): latent process at location s, time t

I θD : data model parameters, possibly varying in space/time

I Nx , Nr : neighborhoods of x and r in space and time

I D: Data model

2

Process Model

[Y (s; t)

∣∣∣ {Y (w; t − τ1) : w ∈ N (1)s

}, ...,

{Y (w; t − τp) : w ∈ N (p)

s

},θP

]

I N (1)s , ...,N (p)

s : neighborhoods of location s at time lags0, τ1, ..., τp

I θP : process model parameters, possibly varying in space/time

I P: process model

3

Parameter Model

[θD ,θP |θh]

I θD , θP , θh: data, process, and hyperparameters

4

Linear Mappings with Equal Dimensions

Zt = Yt + εt , εt ∼ N(0, σ2

ε I)

(7.8)

Z (s; t) = a + hY (s; t) + ε(s; t) E [ε(s; t)] = 0 (7.9)

Zt = at + diag(ht)Yt + εt εt ∼ N(0, σ2

ε I)

(7.10)

Zt = at + HtYt + εt εt ∼ N(0, σ2

ε I)

(7.11)

Zt = at + HtYt + εt εt ∼ N (0,Rt)

I (7.9): a, h are additive and multiplicative bias terms

I (7.10), (7.11): since at ,ht ,Ht vary in space and time,requires simplifying assumptions about the latent process anddata models

I (7.11): Rt can be modeled using a standard spatial covariancemodel

5

Linear Mappings with Unequal Dimensions: Intro

Zt = HtYt + εt εt ∼ (0,Rt) (7.12)

I Ht : mt × n

I εt are independent

What form can Ht take?

6

Linear Mappings with Unequal Dimensions:Incidence Matrices

Say we have 3 observation locations {x1, x2, x3} and two processlocations, {s1, s2}, with x1 = x2 = s1 and x3 = s2. We could thenwrite Ht as:

Ht =

1 01 00 1

.

This is an incidence matrix.

7

Linear Mappings with Unequal Dimensions:Change of Support

Say we have 3 observation (areal) locations {x1, x2, x3} and twoprocess (areal) locations, {s1, s2}. We could then write Ht as:

Ht =

h11 h12

h21 h22

h31 h32

(7.15)

hij =|xi ∩ sj ||xi |

(7.16)

where |xi | represents the area of region xi .

I Wikle and Berliner (2005) show this is optimal under ‘minor’assumptions

8

Linear Mappings with Unequal Dimensions:Earthquakes

For an earthquake at time t, we might want to model how theground sinks (subsidence), Zt , for an earthquake Yt :

Zt = GtYt + εt

where G is a matrix determining subsidence resulting from anearthquake

40.0

42.5

45.0

47.5

50.0

−127 −123Longitude

Latit

ude

0

5

10

15

●●●● ●●

●●●●●● ●●●●● ●●●●●●●●

●●●●

●

●●● ●●●● ●●● ● ● ●

●●●●●●●●●● ●●●●●

●●●

●●●●●●● ●●●

● ● ●● ●●●●●●

●●●● ●●● ●● ●● ●● ●●●●●

●●● ●●● ●●●●●●●●●●●● ●●●●●

●● ●

●●●●

●●●●●

●●●

●

●●●●

● ●●●●●●●●●

●● ● ●

●●●● ●●● ●●●

●●●●●● ●● ●●

●

●●●

●

●●●●●●

●●●●●●●●●

40.0

42.5

45.0

47.5

50.0

0.0 0.5 1.0 1.5 2.0Subsidence (m)

Latit

ude

40.0

42.5

45.0

47.5

50.0


Latit

ude

2

4

6

0

2

4

6

8

8.8 8.9 9.0 9.1 9.2Magnitudes

Den

sity

40.0

42.5

45.0

47.5

50.0


Latit

ude

0

5

10

15

●●●● ●●

●●●●●● ●●●●● ●●●●●●●●

●●●●

●

●●● ●●●● ●●● ● ● ●

●●●●●●●●●● ●●●●●

●●●

●●●●●●● ●●●

● ● ●● ●●●●●●

●●●● ●●● ●● ●● ●● ●●●●●

●●● ●●● ●●●●●●●●●●●● ●●●●●

●● ●

●●●●

●●●●●

●●●

●

●●●●

● ●●●●●●●●●

●● ● ●

●●●● ●●● ●●●

●●●●●● ●● ●●

●

●●●

●

●●●●●●

●●●●●●●●●

40.0

42.5

45.0

47.5

50.0

0.0 0.5 1.0 1.5 2.0Subsidence (m)

Latit

ude

40.0

42.5

45.0

47.5

50.0


Latit

ude

2

4

6

0

2

4

6

8

8.8 8.9 9.0 9.1 9.2Magnitudes

Den

sity

9

Linear Mappings with Unequal Dimensions:Dimension Reduction

Yt = Φαt + νt (7.24)

Zt = HtΦαt + Htνt + εt︸︷︷︸γt

(7.25)

I Replacing with γt leads to replacing process Yt with reduceddimension process αt .

I Examples choices of basis matrix Φ:I Spectral representationI Empirical Orthogonal Functions (EOFs)I Dynamical system dependent approachesI Smoothing kernels

I Without assumed structure for Φ, model identifiability difficult

10

Dimension Reduction (kinda): Spectral Representation


(7.25)

I Assume Yt ∼ N (0,R)

I Take Φ n × n so that αt ∼ N (0,Rα)

I If Φ is orthogonal, often the case that Φ has decorrelatingeffect:

Rα = Φ′RΦ ≈ diag(d)

I Can use multiresolution wavelet basis functions if data is on alattice (possibly using Ht incidence matrix for missing data)

11

Dimension Reduction: EOFs

Yt = Φαt + νt (7.24)


(7.25)

I Take eigendecomposition of Φ, use the first pα eigenvectors

I Then covariance in νt could be characterized using next pνeigenvectors:

Σν = cI +

pα+pν∑k=pα+1

λkΦkΦ′k

I Problems: missing data, low number of temporal replicates,sensitivity to geometry of spatial domain, basis might poorlyrepresent the dynamics

12

Process Models for the DSTM: Linear Models (Sec. 7.2)

We will consider vector autoregressive processes (VARs) of orderone:

Yt = MYt−1 + ηt ; t = 1, 2, ... E [ηt ] = 0 Var (ηt) = Σν

where ηt independent of Yt−1, E [Yt ] = 0, and Var (Yt) = ΣY .

13

Process Models for the DSTM: Lagged Nearest-NeighborModel

We will consider vector autoregressive processes (VARs) of orderone:

Yt(si ) =∑j∈Ni

mijYt−1(sj) + ηt(si )

I Ni : neighborhood of si

I ηtiid∼ N (0,Qt)

I Dynamics could determine mij , Nij . Larger lags could beconsidered

14

Process Models for the DSTM:PDE-Based Parameterizations

One-dimensional diffusion equation:

∂Y

∂t=

∂

∂x

(b(x)

∂Y

∂x

)∂Y

∂x≈ Y (x + ∆x ; t)− Y (x −∆x ; t)

2∆x

∂2Y

∂x2≈ Y (x + ∆x ; t)− 2Y (x ; t) + Y (x −∆x ; t)

∆2x

∂Y

∂t≈ Y (x ; t + ∆t)− Y (x ; t)

∆t

I x ∈ [0, L]: locationI b(x): diffusion coefficientsI Boundary conditions: Y (0; t) = Y0, Y (L; t) = YL,{Y (x ; 0) : 0 ≤ x ≤ L} (known or have prior distribution)

15


For three locations, x1, x2, x3, this yields:

Y (x ; t + ∆t) ≈ θ1(x)Y (x ; t) + θ2(x)Y (x + ∆x ; t)

+ θ3(x)Y (x −∆x ; t),

⇒

Y (x1; t + ∆t)Y (x2; t + ∆t)Y (x3; t + ∆t)

≈θ1(x1) θ2(x2) 0θ3(x2) θ1(x2) θ2(x2)

0 θ3(x3) θ1(x3)

Y (x1; t)Y (x2; t)Y (x3; t)

+

θ3(x1) 00 00 θ2(x3)

(Y0

YL

)

I θi (x) are known functions of ∆x ,∆t , b(x), b(x −∆x), andb(x + ∆x)

16


Add a stochastic term:

Yt+∆t = M(θ)Yt + M(b)(θ)Y(b)t + ηt+∆t

, ηtiid∼ N (0,Qη)

I θi (x) are known functions of ∆x ,∆t , b(x), b(x −∆x), andb(x + ∆x)

I This is a lagged nearest-neighbor model!

I Estimation of b(x) and Qη is nontrivial, may requiresimplifying assumptions

17

Process Models for the DSTM:Nonlinear Models (Sec. 7.3)

Nonlinear autoregressive model:

Yt =M(Yt−1,ηt ;θt), (7.62)

I M: nonlinear Markovian function control process evolution

18

Nonlinear Models: Local Linear Approximations

Take a Taylor expansion (delta method) of M(·) in (7.62):

Yt =M(Yt−1) + ηt (7.63)

M(Yt) ≈M(Yt−1) + Mt(Yt−1 − Yt−1)), E [Yt ] = Yt (7.64)

(Mt)ij =∂Mi (Y)

∂Yt(sj)

∣∣∣∣Yt=Yt−1

(7.65)

We can therefore write Yt using the form:

Yt = ct + MtYt−1 + ηt (7.66)

I Often times must determine M by estimating θt

19

Nonlinear Models: General Quadratic Nonlinearity

Take a second-order Taylor expansion of M(·) in (7.62):

Yt =M(Yt−1) + ηt (7.63)

M(Yt) ≈M(Yt−1) + Mt(Yt−1 − Yt−1))

+1

2(I⊗ (Yt − Yt)

′)Ht(Yt − Yt) (7.80)

Ht ≡ Ht(Yt) =

H1t(Yt)...

Hnt(Yt)

∣∣∣∣∣Yt=Yt

(7.81)

(Hit(Yt))kl =∂2Mi (Yt)

∂Y (sk)∂Yt(sl)

∣∣∣∣Yt=Yt

(7.82)

20

Multivariate DSTMs (Sec. 7.4)

What if we want to model multiple spatio-temporal processes thatare co-dependent (e.g. temperature, salinity, current speed in theocean)?

Yt = MYt−1 + ηt , ηtiid∼ N (0,Qη) (7.98)

Yt = (Y(1)′

t ... Y(K)′

t )′

I The key is in reducing the dimensionality of this problem

21

Multivariate DSTMs: Reduced Rank Approach

Yt = MYt−1 + ηt , ηtiid∼ N (0,Qη) (7.98)

Yt = (Y(1)′

t ... Y(K)′

t )′

Y(k)t = Φ(k)α

(k)t + ν

(k)t

I Use reduced rank representation of Φ(k)

I Can choose covariance structure of ν(k)t depending on

approximation of Φ(k)

22

Multivariate DSTMs: Modeling Via Common Processes

Now assume:

Yt =

∫Ds

H(s, x ;θ)αt(x) dx + γt(s) (7.103)

Yt ≡ (Y(1)t ... Y

(K)t )′

αt(s) ≡ (α(1)(s), ..., α(J)(s))′

γt(s) ≡ (γ(1)t (s), ..., γ

(K)t (s))′

for kernel matrix H(·, ·; ·)I Hence, we represent the K processes in terms of J processes

with J < K

I If the α(i)t (·) have simple structure (e.g. independent AR(1)

processes with unit variances), and if H has assumed simplestructure, this can be helpful

23

Multivariate DSTMs: Modeling Via Common Processes

Approximate the integral with sums:

Yt =

∫Ds

H(s, x ;θ)αt(x) dx + γt(s) (7.103)

⇒ Y(k)t (s) ≈

pα∑i=1

J∑j=1

h(kj(s, xi ;θ)α(j)t (xi ) (7.104)

⇒ Yt(s) ≈ H(s)αt + γt(s) (7.105)

for s, x1, x2, ..., xpα ∈ Ds .

24

Questions?

25

Documents

Hierarchical Dynamical Spatio-Temporal Models: Statistics ... · Hierarchical Dynamical Spatio-Temporal Models: Statistics for Spatio-Temporal Data, Ch. 7 John Paige Statistics Department