Spatial Statistics for Point Processes and Lattice Data (Part I)
Tonglin Zhang
Tonglin Zhang, Department of Statistics, Purdue University Spatial Statistics for Point and Lattice Data
Outline
- Introduction
- Examples
- Point Processes
- Lattice Data
- Popular Models
- Connection
Introduction
Spatial statistics has three areas:
- Geostatistics: fixed station observations (kriging, correlation functions, Gaussian processes, etc.).
- Point Processes: locations are random (point patterns, marked point patterns, K-functions, etc.).
- Lattice Data: aggregated unit-level data (cluster and clustering detection, spatial autocorrelation).
Examples
Point Processes: Tree Locations
Ambrosia dumosa is a drought-deciduous shrub, 20-60 cm in height, which is abundant on well-drained soils below one thousand meters elevation. The data were collected within a one-hectare (100 m × 100 m) area in the Colorado Desert in 1984. The data contain
- the locations of 4358 Ambrosia dumosa trees;
- the height of the plant canopy;
- the length of the major axis of the plant canopy;
- the length of the minor axis of the plant canopy;
- the volume of the plant canopy.
Figure: Tree locations in the Ambrosia dumosa plant data (both axes run from 0 to 100).
Point Processes: Forest Wildfires
Figure: The Alberta (Canada) forest wildfire data contain wildfire locations and area burned from 1996 to 2010 (horizontal axis: longitude, −120 to −110; vertical axis: latitude, 50 to 60).
Point Processes: Earthquakes
Figure: The Japan earthquake data contain earthquake locations and magnitudes from 2002 to 2011.
Lattice Data: Infant Mortality
Figure: County-level infant mortality rate per 100,000 in Guangxi, China in 2000. Legend classes (rate): 249-1188, 1188-1860, 1860-2237, 2237-3162, 3162-3939, 3939-7260.
Lattice Data: Crimes
Figure: County-level homicide rate (per 100,000) in the St. Louis area from 1978 to 1984. Legend classes (1979-84 rate): 0-1.41, 1.41-2.46, 2.46-3.12, 3.12-3.79, 3.79-4.51, 4.51-6.15, 6.15-46.57.
Lattice Data: Cancers
Figure: County-level male colorectal cancer rate (per 100,000) between 2003 and 2007. Legend classes: 56-91, 95-103, 104-114, 115-129, 130-151, 154-258. Map label: Marion.
Point Processes
Definition
Let $S \subseteq \mathbb{R}^d$ be the domain. Let $N(A)$ be the number of points in $A$ for any $A \subseteq S$. Then, the distribution of $N$ is given by
$$P_k[N(A_1) = n_1, \dots, N(A_k) = n_k]$$
for any $A_1, \dots, A_k \subseteq S$ and $k \in \mathbb{N}^+$.
Intensity Functions
The $k$-th order intensity function of $N$ (if it exists) is defined as
$$\lambda_k(s_1, \dots, s_k) = \lim_{|ds_i| \to 0,\ i = 1, \dots, k} \left\{ \frac{E[N(ds_1) \cdots N(ds_k)]}{|ds_1| \cdots |ds_k|} \right\},$$
where the $s_i$ are distinct points in $S$, $ds_i$ is an infinitesimal region containing $s_i \in S$, and $|ds_i|$ is the Lebesgue measure of $ds_i$.
- Both $P_k$ and $\lambda_k$ can be used to describe the distribution of $N$.
- People focus on $\lambda_k$ more than on $P_k$.
- For a Poisson point process, if $A_1, \dots, A_k$ are disjoint, then
$$P_k[N(A_1) = n_1, \dots, N(A_k) = n_k] = \prod_{i=1}^{k} \frac{[\mu(A_i)]^{n_i}}{n_i!} e^{-\mu(A_i)},$$
where $\mu$ is the mean measure.
- If $s_1, \dots, s_k$ are distinct, then
$$\lambda_k(s_1, \dots, s_k) = \prod_{i=1}^{k} \lambda(s_i).$$
- This shows that if $P_k$ and $\lambda_k$ exist for any $k$, then $P_k$ and $\lambda_k$ are equivalent.
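As a minimal sketch of the product formula for disjoint regions, the following Python function evaluates the joint probability of the counts under a Poisson point process. The function name `poisson_joint_prob` and the example values of the mean measure are illustrative assumptions, not part of the original slides.

```python
import math

def poisson_joint_prob(counts, means):
    """Joint probability P[N(A_1)=n_1, ..., N(A_k)=n_k] for disjoint A_i.

    counts: the counts n_i; means: the mean measures mu(A_i).
    Assumes a Poisson point process, so counts in disjoint regions
    are independent Poisson variables.
    """
    if len(counts) != len(means):
        raise ValueError("counts and means must have the same length")
    prob = 1.0
    for n, mu in zip(counts, means):
        # One factor of the product: mu^n / n! * exp(-mu)
        prob *= mu ** n / math.factorial(n) * math.exp(-mu)
    return prob

# Hypothetical example: two disjoint regions with mu(A_1) = 2.0, mu(A_2) = 0.5.
p = poisson_joint_prob([1, 0], [2.0, 0.5])
```

Here `p` is the probability of seeing exactly one point in the first region and none in the second.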
Mean and Variance Functions
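The mean measure of $N$ is
$$\mu(A) = \int_A \lambda(s)\,ds.$$
The covariance structure of $N$ is
$$\mathrm{Cov}[N(A_1), N(A_2)] = \int_{A_1} \int_{A_2} [\lambda_2(s_1, s_2) - \lambda(s_1)\lambda(s_2)]\,ds_2\,ds_1 + \int_{A_1 \cap A_2} \lambda(s)\,ds.$$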
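Let
$$g(s_1, s_2) = \frac{\lambda_2(s_1, s_2)}{\lambda(s_1)\lambda(s_2)}.$$
Then, $g$ is called the pair correlation function and the covariance structure is
$$\mathrm{Cov}[N(A_1), N(A_2)] = \int_{A_1} \int_{A_2} [g(s_1, s_2) - 1]\lambda(s_1)\lambda(s_2)\,ds_2\,ds_1 + \int_{A_1 \cap A_2} \lambda(s)\,ds.$$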
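As a short worked case, consider the Poisson point process, for which the second-order intensity factorizes as the product of first-order intensities. The pair correlation function is then identically 1, the double integral vanishes, and only the overlap term remains:

```latex
\lambda_2(s_1, s_2) = \lambda(s_1)\lambda(s_2)
\;\Longrightarrow\;
g(s_1, s_2) = 1
\;\Longrightarrow\;
\mathrm{Cov}[N(A_1), N(A_2)] = \int_{A_1 \cap A_2} \lambda(s)\,ds = \mu(A_1 \cap A_2).
```

In particular, taking $A_1 = A_2 = A$ gives $\mathrm{Var}[N(A)] = \mu(A)$, and counts in disjoint regions are uncorrelated.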
Strong Stationarity
A spatial point process $N$ is said to be strongly stationary if, for any measurable $A_1, \dots, A_k \in \mathcal{B}(\mathbb{R}^d)$, the joint distribution of
$$N(A_1 + s), \dots, N(A_k + s)$$
does not depend on $s$, where
$$A_i + s = \{s' + s : s' \in A_i\}.$$
Second-Order Stationarity
If $\lambda(s)$ is a constant and
$$\lambda_2(s_1, s_2) = \lambda_2(s_1 - s_2),$$
then $N$ is called second-order stationary. In addition, if
$$\lambda_2(s_1, s_2) = \lambda_2(\|s_1 - s_2\|),$$
then $N$ is called isotropic. If $N$ is isotropic, then
$$g(s_1, s_2) = g(\|s_1 - s_2\|).$$
K-functions
Suppose $N$ is stationary. Let $\lambda$ be the first-order intensity function. Then, the K-function is defined by
$$K(t) = \frac{1}{\lambda} E[\text{number of extra events within distance } t \text{ of a randomly chosen event}].$$
The L-function is
$$L(t) = \sqrt{\frac{K(t)}{\pi}} - t.$$
In real applications, $K(t)$ is more often used.
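The definition of the K-function can be illustrated with a naive plug-in estimate. The sketch below counts ordered pairs of distinct points within distance t and rescales by the window area. It ignores edge effects and uses the normalization area / (n(n − 1)); both are simplifying assumptions made here, and the function names are hypothetical.

```python
import math

def k_function_naive(points, area, t):
    """Naive estimate of K(t) for a planar point pattern (no edge correction).

    K_hat(t) = area / (n * (n - 1)) * #{ordered pairs i != j with d_ij <= t}
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= t:
                pairs += 1
    return area * pairs / (n * (n - 1))

def l_function(k_value, t):
    """L(t) = sqrt(K(t) / pi) - t, the planar form of the L-function."""
    return math.sqrt(k_value / math.pi) - t

# Hypothetical toy pattern: four points in a 10 x 10 window.
pts = [(1.0, 1.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0)]
k1 = k_function_naive(pts, area=100.0, t=1.5)
l1 = l_function(k1, 1.5)
```

In this toy pattern the points form two tight pairs, so the estimated L-value at t = 1.5 is positive, which is the direction usually read as clustering at that scale.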
Lattice Data
Formulation
A study area $S$ is partitioned into $m$ units $A_1, \dots, A_m$. For each unit $i$, there are at least
- Event counts: $y_i$;
- At-risk population sizes: $n_i$;
- Explanatory variables: $x_i$.
Cluster Detection
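Suppose
$$Y_i \sim \mathrm{Poisson}(n_i \theta_i),$$
where $\theta_i$ is called the incidence rate. If there is a cluster $C$ (a subset of $S$) in the study area, then it is assumed that $\theta_i = \theta_c$ if $i \in C$ and $\theta_i = \theta_0$ if $i \notin C$. One tests
$$H_0 : \theta_c = \theta_0 \leftrightarrow H_a : \theta_c > \theta_0.$$
Sometimes, one uses $H_a : \theta_c \neq \theta_0$.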
Spatial Scan Test
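Given $C$, the likelihood ratio statistic is
$$\Lambda_C = \frac{(y_C/n_C)^{y_C} (y_{\bar C}/n_{\bar C})^{y_{\bar C}}}{(y/n)^{y}},$$
where $y_C = \sum_{i \in C} y_i$, $y_{\bar C} = \sum_{i \notin C} y_i$, $y = \sum_{i=1}^{m} y_i$, $n_C = \sum_{i \in C} n_i$, $n_{\bar C} = \sum_{i \notin C} n_i$, and $n = \sum_{i=1}^{m} n_i$.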
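For a single candidate cluster, the likelihood ratio is easiest to evaluate on the log scale. The following sketch assumes the convention that a term with a zero count contributes 0, and the name `log_lr` and the example counts are hypothetical.

```python
import math

def log_lr(y_c, n_c, y_total, n_total):
    """log(Lambda_C) for the Poisson likelihood ratio of one candidate cluster C.

    y_c, n_c: events and at-risk population inside C.
    y_total, n_total: totals over all m units.
    Assumes 0 < n_c < n_total; a zero count contributes 0 (0 * log 0 = 0).
    """
    y_out, n_out = y_total - y_c, n_total - n_c

    def term(y, n):
        return y * math.log(y / n) if y > 0 else 0.0

    return term(y_c, n_c) + term(y_out, n_out) - term(y_total, n_total)

# Hypothetical counts: 30 events among 1000 people inside C,
# 70 events among 9000 people outside C.
value = log_lr(30, 1000, 100, 10000)

# Equal rates inside and outside give a log ratio of zero.
null_value = log_lr(10, 1000, 100, 10000)
```

The first call gives a positive value because the rate inside the candidate (0.03) differs from the rate outside (about 0.0078); the second shows that equal rates yield a ratio of 1.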
Since $C$ is unknown, it is often assumed that $C \in \mathcal{C}$, where $\mathcal{C}$ is a collection of candidate clusters. Then, the spatial scan statistic is
$$\Lambda = \sup_{C \in \mathcal{C}} \Lambda_C.$$
It can be seen that
- no explanatory variables are involved;
- data are Poisson; and
- disease rates are equal within clusters and equal outside of clusters, respectively.
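The supremum over candidate clusters can be sketched as a simple loop. The code below assumes the candidate collection is supplied as explicit sets of unit indices, each a proper non-empty subset of the units; how candidates are generated (for example, circular windows) is left open. All names and data are hypothetical.

```python
import math

def log_lr(y_c, n_c, y_total, n_total):
    """log(Lambda_C); a zero count contributes 0 (0 * log 0 = 0)."""
    def term(y, n):
        return y * math.log(y / n) if y > 0 else 0.0
    return (term(y_c, n_c)
            + term(y_total - y_c, n_total - n_c)
            - term(y_total, n_total))

def scan_statistic(y, n, candidates):
    """Return (max log Lambda_C, maximizing candidate) over the collection.

    y, n: per-unit event counts and at-risk population sizes.
    candidates: iterable of sets of unit indices (proper, non-empty subsets).
    """
    y_total, n_total = sum(y), sum(n)
    best_value, best_set = -math.inf, None
    for c in candidates:
        y_c = sum(y[i] for i in c)
        n_c = sum(n[i] for i in c)
        value = log_lr(y_c, n_c, y_total, n_total)
        if value > best_value:
            best_value, best_set = value, c
    return best_value, best_set

# Hypothetical data: four units of equal size, unit 0 with an elevated count.
y = [30, 10, 12, 9]
n = [1000, 1000, 1000, 1000]
candidates = [{0}, {1}, {0, 1}, {2, 3}]
best_value, best_set = scan_statistic(y, n, candidates)
```

Working on the log scale does not change the maximizer, since the logarithm is monotone.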
There are a few modifications:
- loglinear models: Zhang, T. and Lin, G. (2009). Spatial scan statistics in loglinear models. Computational Statistics and Data Analysis, 53, 2851-2858;
- overdispersion: Zhang, T., Zhang, Z. and Lin, G. (2012). Spatial scan statistics with overdispersion. Statistics in Medicine, 31, 762-774;
- zero inflation: Cancado, A.L.F., de-Silva, C.Q., and da Silva, M.F. (2014). A spatial scan statistic for zero-inflated Poisson process. Environmental and Ecological Statistics, 21, 627-650;
- zero inflation and overdispersion: de Lima, M.S., Duczmal, L.H., Neto, J.C., and Pinto, L.P. (2014). Spatial scan statistics for models with overdispersion and inflated zeros. Statistica Sinica, preprint (doi:10.5705/ss.2013.220w).
Disease Mapping
One can model
$$y_i \sim \mathrm{Poisson}(n_i \theta_i),$$
where
$$\theta_i = x_i^t \beta + U_i$$
and $U_i$ is a spatial random effect.
Popular Models
SAR (Spatial Autoregressive) Model
Let $u = (U_1, \dots, U_m)^T$. Then, the model assumes
$$u = \rho W u + \epsilon,$$
where $w_{ij} = 1/|\partial_i|$ if $j \in \partial_i$ (the neighbors of unit $i$) and $\epsilon \sim N(0, \sigma_u^2 I)$.
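To make the SAR equation concrete, the sketch below builds a row-standardized weight matrix from neighbor lists and solves the equation for u given one draw of the noise vector. It assumes that the weight is 0 for non-neighbors, that |ρ| < 1 (so the fixed-point iteration converges), and that the noise standard deviation is 1; the four-unit lattice and all names are hypothetical.

```python
import random

def row_standardized_weights(neighbors):
    """Build W with w_ij = 1/|neighbors of i| for neighbors j, else 0 (assumed)."""
    m = len(neighbors)
    w = [[0.0] * m for _ in range(m)]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            w[i][j] = 1.0 / len(nbrs)
    return w

def solve_sar(w, eps, rho, iters=500):
    """Solve u = rho * W u + eps by fixed-point iteration.

    Each row of W sums to 1, so for |rho| < 1 the map is a contraction
    and the iteration converges to the unique solution.
    """
    u = list(eps)
    for _ in range(iters):
        u = [rho * sum(wij * uj for wij, uj in zip(row, u)) + e
             for row, e in zip(w, eps)]
    return u

# Hypothetical lattice: four units on a line, 0 - 1 - 2 - 3.
nbrs = [[1], [0, 2], [1, 3], [2]]
W = row_standardized_weights(nbrs)

random.seed(0)
eps = [random.gauss(0.0, 1.0) for _ in range(4)]  # sigma_u = 1 (assumed)
u = solve_sar(W, eps, rho=0.5)

# Residual of the SAR equation; should be numerically zero.
residual = max(abs(ui - 0.5 * sum(wij * uj for wij, uj in zip(row, u)) - e)
               for ui, row, e in zip(u, W, eps))
```

The iteration is used here only to avoid a linear-algebra dependency; a direct solve of (I − ρW)u = ε would give the same u.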
CAR (Conditional Autoregressive) Model
Let $E_i$ be the expected value under the model without $U_i$. Then, the model assumes
$$u \sim N(0, \sigma_u^2 (I - W)^{-1} D),$$
where $\sigma_u^2 > 0$, $D = \mathrm{diag}(d_i)$ with $d_i = E_i^{-1}$, and $w_{ij} = \rho \sqrt{E_i/E_j}$ for $j \in \partial_i$.
Connection
- Any point process can be aggregated to lattice data without $x_i$.
- Any marked point process can be aggregated to lattice data with $x_i$.
- If there are point processes for events, for at-risk populations, and for explanatory variables, we can also get lattice data with $x_i$.
- Geostatistical data can also be used as $x_i$.
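The aggregation step in the first bullet can be sketched by counting points in the cells of a regular grid. A regular grid is an assumption made here for simplicity (the lattice examples earlier use counties), and the function name and coordinates are hypothetical.

```python
def aggregate_to_grid(points, xmin, xmax, ymin, ymax, nx, ny):
    """Count points in each cell of an nx-by-ny grid over the study area.

    Returns counts[row][col], with row indexing y and col indexing x.
    Points outside [xmin, xmax] x [ymin, ymax] are skipped.
    """
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue
        # Points on the upper boundary are assigned to the last cell.
        col = min(int((x - xmin) / (xmax - xmin) * nx), nx - 1)
        row = min(int((y - ymin) / (ymax - ymin) * ny), ny - 1)
        counts[row][col] += 1
    return counts

# Hypothetical point pattern in a 100 x 100 window, aggregated to a 2 x 2 lattice.
pts = [(5.0, 5.0), (55.0, 10.0), (60.0, 90.0), (100.0, 100.0)]
y = aggregate_to_grid(pts, 0.0, 100.0, 0.0, 100.0, nx=2, ny=2)
```

The resulting cell counts play the role of the event counts in the lattice formulation; marks attached to the points could be averaged per cell in the same loop to obtain explanatory variables.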