Upload
others
View
21
Download
0
Embed Size (px)
Citation preview
Hierarchical Dynamical Spatio-Temporal Models:Statistics for Spatio-Temporal Data, Ch. 7
John Paige
Statistics DepartmentUniversity of Washington
January 29, 2018
1
Hierarchical Dynamical Spatio-Temporal Models (DSTMs):Data Model (Sec. 7.1)
[{Z (x; r) : x ∈ Ds , r ∈ Dt} | {Y (s; t) : s ∈ Nx , t ∈ Nr} ,θD ]
I Z (x, r): observations at location x, time r
I Y (s; t): latent process at location s, time t
I θD : data model parameters, possibly varying in space/time
I Nx , Nr : neighborhoods of x and r in space and time
I D: Data model
2
Process Model
[Y (s; t)
∣∣∣ {Y (w; t − τ1) : w ∈ N (1)s
}, ...,
{Y (w; t − τp) : w ∈ N (p)
s
},θP
]
I N (1)s , ...,N (p)
s : neighborhoods of location s at time lags0, τ1, ..., τp
I θP : process model parameters, possibly varying in space/time
I P: process model
3
Parameter Model
[θD ,θP |θh]
I θD , θP , θh: data, process, and hyperparameters
4
Linear Mappings with Equal Dimensions
Zt = Yt + εt , εt ∼ N(0, σ2
ε I)
(7.8)
Z (s; t) = a + hY (s; t) + ε(s; t) E [ε(s; t)] = 0 (7.9)
Zt = at + diag(ht)Yt + εt εt ∼ N(0, σ2
ε I)
(7.10)
Zt = at + HtYt + εt εt ∼ N(0, σ2
ε I)
(7.11)
Zt = at + HtYt + εt εt ∼ N (0,Rt)
I (7.9): a, h are additive and multiplicative bias terms
I (7.10), (7.11): since at ,ht ,Ht vary in space and time,requires simplifying assumptions about the latent process anddata models
I (7.11): Rt can be modeled using a standard spatial covariancemodel
5
Linear Mappings with Unequal Dimensions: Intro
Zt = HtYt + εt εt ∼ (0,Rt) (7.12)
I Ht : mt × n
I εt are independent
What form can Ht take?
6
Linear Mappings with Unequal Dimensions:Incidence Matrices
Say we have 3 observation locations {x1, x2, x3} and two processlocations, {s1, s2}, with x1 = x2 = s1 and x3 = s2. We could thenwrite Ht as:
Ht =
1 01 00 1
.
This is an incidence matrix.
7
Linear Mappings with Unequal Dimensions:Change of Support
Say we have 3 observation (areal) locations {x1, x2, x3} and twoprocess (areal) locations, {s1, s2}. We could then write Ht as:
Ht =
h11 h12
h21 h22
h31 h32
(7.15)
hij =|xi ∩ sj ||xi |
(7.16)
where |xi | represents the area of region xi .
I Wikle and Berliner (2005) show this is optimal under ‘minor’assumptions
8
Linear Mappings with Unequal Dimensions:Earthquakes
For an earthquake at time t, we might want to model how theground sinks (subsidence), Zt , for an earthquake Yt :
Zt = GtYt + εt
where G is a matrix determining subsidence resulting from anearthquake
40.0
42.5
45.0
47.5
50.0
−127 −123Longitude
Latit
ude
0
5
10
15
●●●● ●●
●●●●●● ●●●●● ●●●●●●●●
●●●●
●
●●● ●●●● ●●● ● ● ●
●●●●●●●●●● ●●●●●
●●●
●●●●●●● ●●●
● ● ●● ●●●●●●
●●●● ●●● ●● ●● ●● ●●●●●
●●● ●●● ●●●●●●●●●●●● ●●●●●
●● ●
●●●●
●●●●●
●●●
●
●●●●
● ●●●●●●●●●
●● ● ●
●●●● ●●● ●●●
●●●●●● ●● ●●
●
●●●
●
●●●●●●
●●●●●●●●●
40.0
42.5
45.0
47.5
50.0
0.0 0.5 1.0 1.5 2.0Subsidence (m)
Latit
ude
40.0
42.5
45.0
47.5
50.0
−127 −123Longitude
Latit
ude
2
4
6
0
2
4
6
8
8.8 8.9 9.0 9.1 9.2Magnitudes
Den
sity
40.0
42.5
45.0
47.5
50.0
−127 −123Longitude
Latit
ude
0
5
10
15
●●●● ●●
●●●●●● ●●●●● ●●●●●●●●
●●●●
●
●●● ●●●● ●●● ● ● ●
●●●●●●●●●● ●●●●●
●●●
●●●●●●● ●●●
● ● ●● ●●●●●●
●●●● ●●● ●● ●● ●● ●●●●●
●●● ●●● ●●●●●●●●●●●● ●●●●●
●● ●
●●●●
●●●●●
●●●
●
●●●●
● ●●●●●●●●●
●● ● ●
●●●● ●●● ●●●
●●●●●● ●● ●●
●
●●●
●
●●●●●●
●●●●●●●●●
40.0
42.5
45.0
47.5
50.0
0.0 0.5 1.0 1.5 2.0Subsidence (m)
Latit
ude
40.0
42.5
45.0
47.5
50.0
−127 −123Longitude
Latit
ude
2
4
6
0
2
4
6
8
8.8 8.9 9.0 9.1 9.2Magnitudes
Den
sity
9
Linear Mappings with Unequal Dimensions:Dimension Reduction
Yt = Φαt + νt (7.24)
Zt = HtΦαt + Htνt + εt︸ ︷︷ ︸γt
(7.25)
I Replacing with γt leads to replacing process Yt with reduceddimension process αt .
I Examples choices of basis matrix Φ:I Spectral representationI Empirical Orthogonal Functions (EOFs)I Dynamical system dependent approachesI Smoothing kernels
I Without assumed structure for Φ, model identifiability difficult
10
Dimension Reduction (kinda): Spectral Representation
Zt = HtΦαt + Htνt + εt︸ ︷︷ ︸γt
(7.25)
I Assume Yt ∼ N (0,R)
I Take Φ n × n so that αt ∼ N (0,Rα)
I If Φ is orthogonal, often the case that Φ has decorrelatingeffect:
Rα = Φ′RΦ ≈ diag(d)
I Can use multiresolution wavelet basis functions if data is on alattice (possibly using Ht incidence matrix for missing data)
11
Dimension Reduction: EOFs
Yt = Φαt + νt (7.24)
Zt = HtΦαt + Htνt + εt︸ ︷︷ ︸γt
(7.25)
I Take eigendecomposition of Φ, use the first pα eigenvectors
I Then covariance in νt could be characterized using next pνeigenvectors:
Σν = cI +
pα+pν∑k=pα+1
λkΦkΦ′k
I Problems: missing data, low number of temporal replicates,sensitivity to geometry of spatial domain, basis might poorlyrepresent the dynamics
12
Process Models for the DSTM: Linear Models (Sec. 7.2)
We will consider vector autoregressive processes (VARs) of orderone:
Yt = MYt−1 + ηt ; t = 1, 2, ... E [ηt ] = 0 Var (ηt) = Σν
where ηt independent of Yt−1, E [Yt ] = 0, and Var (Yt) = ΣY .
13
Process Models for the DSTM: Lagged Nearest-NeighborModel
We will consider vector autoregressive processes (VARs) of orderone:
Yt(si ) =∑j∈Ni
mijYt−1(sj) + ηt(si )
I Ni : neighborhood of si
I ηtiid∼ N (0,Qt)
I Dynamics could determine mij , Nij . Larger lags could beconsidered
14
Process Models for the DSTM:PDE-Based Parameterizations
One-dimensional diffusion equation:
∂Y
∂t=
∂
∂x
(b(x)
∂Y
∂x
)∂Y
∂x≈ Y (x + ∆x ; t)− Y (x −∆x ; t)
2∆x
∂2Y
∂x2≈ Y (x + ∆x ; t)− 2Y (x ; t) + Y (x −∆x ; t)
∆2x
∂Y
∂t≈ Y (x ; t + ∆t)− Y (x ; t)
∆t
I x ∈ [0, L]: locationI b(x): diffusion coefficientsI Boundary conditions: Y (0; t) = Y0, Y (L; t) = YL,{Y (x ; 0) : 0 ≤ x ≤ L} (known or have prior distribution)
15
Process Models for the DSTM:PDE-Based Parameterizations
For three locations, x1, x2, x3, this yields:
Y (x ; t + ∆t) ≈ θ1(x)Y (x ; t) + θ2(x)Y (x + ∆x ; t)
+ θ3(x)Y (x −∆x ; t),
⇒
Y (x1; t + ∆t)Y (x2; t + ∆t)Y (x3; t + ∆t)
≈θ1(x1) θ2(x2) 0θ3(x2) θ1(x2) θ2(x2)
0 θ3(x3) θ1(x3)
Y (x1; t)Y (x2; t)Y (x3; t)
+
θ3(x1) 00 00 θ2(x3)
(Y0
YL
)
I θi (x) are known functions of ∆x ,∆t , b(x), b(x −∆x), andb(x + ∆x)
16
Process Models for the DSTM:PDE-Based Parameterizations
Add a stochastic term:
Yt+∆t = M(θ)Yt + M(b)(θ)Y(b)t + ηt+∆t
, ηtiid∼ N (0,Qη)
I θi (x) are known functions of ∆x ,∆t , b(x), b(x −∆x), andb(x + ∆x)
I This is a lagged nearest-neighbor model!
I Estimation of b(x) and Qη is nontrivial, may requiresimplifying assumptions
17
Process Models for the DSTM:Nonlinear Models (Sec. 7.3)
Nonlinear autoregressive model:
Yt =M(Yt−1,ηt ;θt), (7.62)
I M: nonlinear Markovian function control process evolution
18
Nonlinear Models: Local Linear Approximations
Take a Taylor expansion (delta method) of M(·) in (7.62):
Yt =M(Yt−1) + ηt (7.63)
M(Yt) ≈M(Yt−1) + Mt(Yt−1 − Yt−1)), E [Yt ] = Yt (7.64)
(Mt)ij =∂Mi (Y)
∂Yt(sj)
∣∣∣∣Yt=Yt−1
(7.65)
We can therefore write Yt using the form:
Yt = ct + MtYt−1 + ηt (7.66)
I Often times must determine M by estimating θt
19
Nonlinear Models: General Quadratic Nonlinearity
Take a second-order Taylor expansion of M(·) in (7.62):
Yt =M(Yt−1) + ηt (7.63)
M(Yt) ≈M(Yt−1) + Mt(Yt−1 − Yt−1))
+1
2(I⊗ (Yt − Yt)
′)Ht(Yt − Yt) (7.80)
Ht ≡ Ht(Yt) =
H1t(Yt)...
Hnt(Yt)
∣∣∣∣∣Yt=Yt
(7.81)
(Hit(Yt))kl =∂2Mi (Yt)
∂Y (sk)∂Yt(sl)
∣∣∣∣Yt=Yt
(7.82)
20
Multivariate DSTMs (Sec. 7.4)
What if we want to model multiple spatio-temporal processes thatare co-dependent (e.g. temperature, salinity, current speed in theocean)?
Yt = MYt−1 + ηt , ηtiid∼ N (0,Qη) (7.98)
Yt = (Y(1)′
t ... Y(K)′
t )′
I The key is in reducing the dimensionality of this problem
21
Multivariate DSTMs: Reduced Rank Approach
Yt = MYt−1 + ηt , ηtiid∼ N (0,Qη) (7.98)
Yt = (Y(1)′
t ... Y(K)′
t )′
Y(k)t = Φ(k)α
(k)t + ν
(k)t
I Use reduced rank representation of Φ(k)
I Can choose covariance structure of ν(k)t depending on
approximation of Φ(k)
22
Multivariate DSTMs: Modeling Via Common Processes
Now assume:
Yt =
∫Ds
H(s, x ;θ)αt(x) dx + γt(s) (7.103)
Yt ≡ (Y(1)t ... Y
(K)t )′
αt(s) ≡ (α(1)(s), ..., α(J)(s))′
γt(s) ≡ (γ(1)t (s), ..., γ
(K)t (s))′
for kernel matrix H(·, ·; ·)I Hence, we represent the K processes in terms of J processes
with J < K
I If the α(i)t (·) have simple structure (e.g. independent AR(1)
processes with unit variances), and if H has assumed simplestructure, this can be helpful
23
Multivariate DSTMs: Modeling Via Common Processes
Approximate the integral with sums:
Yt =
∫Ds
H(s, x ;θ)αt(x) dx + γt(s) (7.103)
⇒ Y(k)t (s) ≈
pα∑i=1
J∑j=1
h(kj(s, xi ;θ)α(j)t (xi ) (7.104)
⇒ Yt(s) ≈ H(s)αt + γt(s) (7.105)
for s, x1, x2, ..., xpα ∈ Ds .
24
Questions?
25