ST3454 Stochastic processes in Space and Time
Rozenn Dahyot
School of Computer Science & Statistics, Trinity College Dublin, Ireland
https://www.scss.tcd.ie/Rozenn.Dahyot/
Hilary term 2017
1
Space and time I
This course is concerned with the study of quantities (stochastic processes)
that vary over the spacetime domain.
The spacetime of our universe is usually interpreted as a Euclidean space,
with space consisting of three dimensions and time consisting of one
dimension (the 4th dimension).
2
Space and time II
In this course:
space x will be limited to R² (2D Euclidean space), as a local planar
approximation of the Earth's surface;
time t will be limited to R⁺.
3
Stochastic Processes I
We denote by s the stochastic process of interest, which may vary over the time
domain s(t), the space domain s(x), or spacetime s(x, t). For instance,
the stochastic process of interest may be the temperature in Ireland
(space) over the next few days (time).
4
Stochastic Processes II
Sometimes, s is not directly measurable (at any location in spacetime), but
instead we can observe another related stochastic process o.
A simple example: o is the temperature as read on a thermometer, while s
is the true temperature, with the relation o = s + ε, where ε is the noise
associated with the sensor (thermometer).
5
Course content overview I
Many techniques seen in this class are Linear Smoothing methods:
s(x, t) = Σ_{i=1}^n λ_i(x, t) s^(i) + ε(x, t)
having observed the stochastic process s at n (spacetime) locations
{(x^(i), t^(i), s^(i))}_{i=1,...,n}.
The goal is to estimate the quantity s at a new given location (x, t) and,
if possible, to get a measure of its uncertainty.
6
Course content overview II
Estimation depends on the model chosen for the weights λ and the
hypotheses about ε:
ŝ(x, t) = Σ_{i=1}^n λ_i(x, t) s^(i)   or, for short,   ŝ = Σ_{i=1}^n λ_i s^(i) = ⟨λ⃗|s⃗⟩
Note that Σ_{i=1}^n λ_i s^(i) = λ⃗ᵀ s⃗ = ⟨λ⃗|s⃗⟩.
Examples: Regression, ARIMA, Filtering, Kriging, Nadaraya-Watson estimator.
7
Content & Exam
We will attempt to study all the following:
Regression as a Linear Smoothing method
ARIMA models
Multivariate State-Space Model & Kalman Filter,
Kriging,
Functional data analysis
For illustration, some R labs may be organised.
Exam (100%):
8
References
Gaussian Markov Random Fields - Theory and Applications, H. Rue & L. Held, 2005.
Geostatistics for Environmental Scientists, R. Webster & M. A. Oliver, Wiley 2001.
Statistics for Spatial Data, N. A. C. Cressie, Wiley 1993.
Functional Data Analysis, J. O. Ramsay and B. W. Silverman, Springer 2006.
Statistical Models of Shape - Optimisation and Evaluation, R. Davies, C. Twining & C. Taylor, Springer 2008.
Linear Models for Multivariate, Time Series, and Spatial Data, R. Christensen, Springer 1991.
9
Stochastic processes I
Definition (stochastic process)
A spatio-temporal stochastic process s(x, t) is defined such that:
x is a spatial location,
t is the time,
s is a random quantity of interest; s takes values in the state space S.
For a two-state model, S = {S1, S2}, a probability P(s_{x,t} = S1) (or P(s_{x,t} = S2),
such that P(s_{x,t} = S1) + P(s_{x,t} = S2) = 1) is associated with s at the
spatiotemporal position (x, t) (noted s_{x,t}).
10
Stochastic processes II
Like the state space, the variables x and t can take values in discrete
(e.g. N, Z, N²) or continuous spaces (e.g. R, R², R³, or S², the unit
sphere).
When no direct observation is available for s (hidden random variable),
another stochastic process o(x, t), linked to s(x, t) and for which
observations are available, is defined.
11
Stochastic processes III
Many applications use stochastic processes to estimate s with its
associated uncertainty at a new (unobserved) location in spacetime.
For most examples in this course, we look at stochastic processes s that
are 1-dimensional (i.e. s is a random variable rather than a random vector).
12
Stochastic processes IV
Several cost functions (e.g. based on probability density functions) will be
presented in the course to compute a prediction (estimate) and its
associated uncertainty.
The Normal distribution is often used as a modelling hypothesis.
13
Regression as a Linear Smoothing method I
Imagine the following model for the stochastic process s:
s(t) = c + α t + ε(t)
Let us assume:
E[ε(t)] = 0, ∀t,
E[(ε(t))²] = σ², ∀t.
14
Regression as a Linear Smoothing method II
Questions. Assuming that we have collected n observations {(t^(i), s^(i))}_{i=1}^n:
propose an estimate ŝ for s at a new location t;
show that this estimate can be written as
ŝ(t) = Σ_{i=1}^n λ_i(t) s^(i)
15
Regression as a Linear Smoothing method III
Answers. Defining s⃗ = [s^(1) ··· s^(n)]ᵀ and the matrix
U = [ 1  t^(1) ]
    [ ⋮   ⋮    ]
    [ 1  t^(n) ]
then the least squares estimate minimising the sum of squared errors
Σ_{i=1}^n ε_i² = Σ_{i=1}^n (s^(i) − c − α t^(i))²
is
[ĉ, α̂]ᵀ = (UᵀU)⁻¹ Uᵀ s⃗
16
Regression as a Linear Smoothing method IV
Having compute estimates c and α an estimate for s at new temporal location
t can be proposed:
s = c+ α t = [1 t]
[c
α
]= [1 t] (UTU)−1UT︸ ︷︷ ︸
~λT
~s =
n∑i=1
λi s(i)
17
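This identity can be checked numerically. A minimal sketch (assuming numpy is available; the observations and the new location t are made up for illustration):

```python
import numpy as np

# made-up observations (t_i, s_i) for illustration
t_obs = np.array([0.0, 1.0, 2.0, 3.0])
s_obs = np.array([1.1, 2.9, 5.2, 6.8])

U = np.column_stack([np.ones_like(t_obs), t_obs])      # design matrix
c_hat, a_hat = np.linalg.solve(U.T @ U, U.T @ s_obs)   # least squares (c, alpha)

t_new = 1.5
# weights: lambda^T = [1 t] (U^T U)^{-1} U^T
lam = np.array([1.0, t_new]) @ np.linalg.inv(U.T @ U) @ U.T

s_direct   = c_hat + a_hat * t_new        # prediction from fitted coefficients
s_smoothed = lam @ s_obs                  # same prediction as a weighted sum

assert np.isclose(s_direct, s_smoothed)
```

The two expressions agree exactly: the prediction is a linear smoother with weights that depend only on the design, not on the observed values.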
Regression as a Linear Smoothing method V
We suggested the following model for the stochastic process s:
s(t) = c + α t + ε(t) with ε(t) ∼ N(0; σ²), ∀t
What are the limitations of such a model?
What other explanatory variables could be used?
18
Regression as a Linear Smoothing method VI
Exercise: Consider the model
s(x, t) = c + α t + β x + γ x t + ε(x, t) with ε(x, t) ∼ N(0; σ²), ∀t
n observations {(x^(i), t^(i), s^(i))}_{i=1,...,n} are available.
Propose an estimate ŝ at a new location (x, t).
Show that this solution can also be written ŝ(x, t) = Σ_{i=1}^n λ_i(x, t) s^(i).
19
Time series analysis I
Example time series: Beer production. The monthly Australian beer
production has been observed from Jan 1991 to Aug 1995.
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1991 164 148 152 144 155 125 153 146 138 190 192 192
1992 147 133 163 150 129 131 145 137 138 168 176 188
1993 139 143 150 154 137 129 128 140 143 151 177 184
1994 151 134 164 126 131 125 127 143 143 160 190 182
1995 138 136 152 127 151 130 119 153
20
Time series analysis II
[Figure: monthly beer production plotted against time, 1991-1995 (y-axis: 120-180).]
21
Time series analysis III
Questions:
What are the patterns in the beer data?
What model can you propose for this stochastic process? (What
explanatory variables would you use?)
22
ARIMA models I
Stochastic process s in the time domain:
AR(1)
{s(t + Δ) = c + φ_1 s(t) + ε(t + Δ)} ≡ {s_{t+1} = c + φ_1 s_t + ε_{t+1}}
MA(1)
{s(t + Δ) = c + φ_1 ε(t) + ε(t + Δ)} ≡ {s_{t+1} = c + φ_1 ε_t + ε_{t+1}}
with Δ a fixed interval of time.
23
ARIMA models II
Exercise: Assume a simple AR(1) model:
s_t = α s_{t-1} + ε_t with ε_t ∼ N(0, σ²) ∀t, |α| < 1
with observations {s_t^(1)}_{t=1,...,n}. Draw a probabilistic graph to model this time
series.
Gaussian Markov Random Fields - Theory and Applications, H. Rue & L. Held,
2005 (chp. 1).
24
ARIMA models III
Exercise: Assume a simple AR(1) model:
s_t = α s_{t-1} + ε_t with ε_t ∼ N(0, σ²) ∀t, |α| < 1
Show that
p(s_t|s_{t-1}, ..., s_1) = p(s_t|s_{t-1}) = N(s_t; α s_{t-1}, σ²)
25
ARIMA models IV
Exercise: Given a temporal stochastic process s with the following Markov property
p(s_t|s_{t-1}) = N(s_t; α s_{t-1}, σ²) ∀t
with observations {s_t^(1)}_{t=1,...,n}:
Estimate ŝ at the new time n + 1 with its confidence interval.
Give an estimate for s at the new time n + 2 with its confidence interval.
26
ARIMA models V
27
ARIMA models VI
We know
p(s_t|s_{t-1}) = N(s_t; α s_{t-1}, σ²) ∀t
so at n + 1:
p(s_{n+1}|s_n) = N(s_{n+1}; α s_n, σ²)
and we use the observation s_n^(1) that is available for s_n:
p(s_{n+1}|s_n^(1)) = N(s_{n+1}; α s_n^(1), σ²)
A good estimate is the expectation ŝ_{n+1} = α s_n^(1), and the 95% prediction interval is
ŝ_{n+1} ± 2σ.
28
ARIMA models VII
At n + 2:
p(s_{n+2}|s_{n+1}) = N(s_{n+2}; α s_{n+1}, σ²)
But we don't have an observation for s_{n+1}! So instead we use:
p(s_{n+2}|s_n) = ∫ p(s_{n+2}, s_{n+1}|s_n) ds_{n+1}
 = ∫ p(s_{n+2}|s_{n+1}, s_n) p(s_{n+1}|s_n) ds_{n+1}
 = ∫ p(s_{n+2}|s_{n+1}) p(s_{n+1}|s_n) ds_{n+1}
 = ∫ N(s_{n+2}; α s_{n+1}, σ²) N(s_{n+1}; α s_n, σ²) ds_{n+1}
 = N(s_{n+2}; α² s_n, σ² (1 + α²))
A good estimate is the expectation ŝ_{n+2} = α² s_n^(1) using the observation s_n^(1), and the
95% prediction interval is ŝ_{n+2} ± 2σ√(1 + α²).
29
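The two-step-ahead mean and variance can be checked by simulation. A minimal sketch assuming numpy (α, σ, the observed value s_n and the seed are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, sigma = 0.8, 1.0
s_n = 2.0                                    # observed value at time n

# simulate many two-step-ahead trajectories starting from s_n
eps1 = rng.normal(0.0, sigma, 100_000)
eps2 = rng.normal(0.0, sigma, 100_000)
s_n1 = alpha * s_n + eps1                    # s_{n+1} draws
s_n2 = alpha * s_n1 + eps2                   # s_{n+2} draws

# theory: E[s_{n+2}|s_n] = alpha^2 s_n, Var[s_{n+2}|s_n] = sigma^2 (1 + alpha^2)
assert abs(s_n2.mean() - alpha**2 * s_n) < 0.05
assert abs(s_n2.var() - sigma**2 * (1 + alpha**2)) < 0.1
```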
ARIMA models VIII
30
Filters I
31
Filters II
Given a hidden temporal stochastic process (s_1, ..., s_J) and an observed one
(o_1, ..., o_J) linked by
p_{s_1}(s_1): initial state
p_{s_j|s_{j-1}}(s_j|s_{j-1}): transition (order 1)
p_{o_j|s_j}(o_j|s_j): link between observed variable and hidden state
estimate the posterior p_{s_j|o_{1:j}}. Hint: express p_{s_j|o_{1:j}}(s_j|o_{1:j}) w.r.t.
p_{s_{j-1}|o_{1:j-1}}(s_{j-1}|o_{1:j-1}) (notation: o_{1:j-1} = (o_1, ..., o_{j-1})).
32
Filters III
p(s_j|o_{1:j})
= ∫ p(s_j, s_{j-1}|o_{1:j}) ds_{j-1}   (total probability theorem)
= (1/p(o_{1:j})) ∫ p(s_j, s_{j-1}, o_{1:j}) ds_{j-1}   (Bayes)
= (1/p(o_{1:j})) ∫ p(s_j, s_{j-1}, o_j, o_{1:j-1}) ds_{j-1}
= (p(o_{1:j-1})/p(o_{1:j})) ∫ p(o_j|s_j, s_{j-1}, o_{1:j-1}) p(s_j|s_{j-1}, o_{1:j-1}) p(s_{j-1}|o_{1:j-1}) ds_{j-1}   (Bayes)
= (p(o_{1:j-1})/p(o_{1:j})) ∫ p(o_j|s_j) p(s_j|s_{j-1}) p(s_{j-1}|o_{1:j-1}) ds_{j-1}
33
Filters IV
Hence
p(s_j|o_{1:j}) = (p(o_{1:j-1})/p(o_{1:j})) p(o_j|s_j) ∫ p(s_j|s_{j-1}) p(s_{j-1}|o_{1:j-1}) ds_{j-1}
given a hidden temporal stochastic process (s_1, ..., s_J) and an observed one
(o_1, ..., o_J) linked by
p_{s_1}(s_1): initial state
p_{s_j|s_{j-1}}(s_j|s_{j-1}): transition (order 1)
p_{o_j|s_j}(o_j|s_j): link between observed variable and hidden state
34
Filters V
In the general case, we have the following integral to solve:
p(s_j|o_{1:j}) = (p(o_{1:j-1})/p(o_{1:j})) p(o_j|s_j) ∫ p(s_j|s_{j-1}) p(s_{j-1}|o_{1:j-1}) ds_{j-1}
1 Prediction:
p(s_j|o_{1:j-1}) = ∫ p(s_j|s_{j-1}) p(s_{j-1}|o_{1:j-1}) ds_{j-1}
2 Update:
p(s_j|o_{1:j}) ∝ p(o_j|s_j) p(s_j|o_{1:j-1})
35
Filters VI
Having collected observations {o_t^(1)}_{t=1,...,j-1} (also noted o_{1:j-1}^(1)), the
probability density function used to predict s_j is
p(s_j|o_{1:j-1}^(1))
Once the new observation o_j^(1) has been collected at time j, the
probability density function taking this update into account is
p(s_j|o_{1:j}^(1))
36
Filters VII
37
Kalman Filter I
We have illustrated the idea behind Filtering with a transition model of
order 1 (This can be extended to any order). In practice, we need to be
able to calculate or compute p(sj |o1:j). Kalman filtering is one elegant
way of doing that.
In 1960, R.E. Kalman published his famous paper describing a recursive
solution to the discrete-data linear filtering problem.
The Kalman filter uses hypotheses of linearity (in the state transition
equation, and the observation equation) and Normal distributions.
38
Kalman Filter II
Exercise: Given
p(s_1) = N(s_1; µ_1, σ_1²)   (initial state pdf)
p(s_j|s_{j-1}) = N(s_j; α s_{j-1}, σ²)   (state transition pdf)
p(o_j|s_j) = N(o_j; s_j, σ_o²)   (observation pdf)
with the notation N(x; µ, σ²) corresponding to the normal pdf for the r.v. x with
mean µ and variance σ², show by induction that p(s_j|o_{1:j}) is a normal
distribution N(s_j; µ_j, σ_j²) and give the recursive formula to compute the
parameters (µ_j, σ_j²), assuming p(s_{j-1}|o_{1:j-1}) = N(s_{j-1}; µ_{j-1}, σ_{j-1}²).
39
Kalman Filter III
Answer.
1 At j = 1, we know the initial state s_1 is normally distributed (given in the
hypothesis).
2 Assuming at j − 1 that p(s_{j-1}|o_{1:j-1}) = N(s_{j-1}; µ_{j-1}, σ_{j-1}²), then solving
the prediction and update steps of filtering, we find that:
µ_j = α (1 − K_j) µ_{j-1} + K_j o_j
σ_j² = σ_o² K_j
K_j = (σ² + α² σ_{j-1}²) / (σ² + σ_o² + α² σ_{j-1}²)
40
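A minimal numerical sketch of this scalar recursion (numpy assumed; the model parameters and the observation sequence are made-up values for illustration):

```python
import numpy as np

alpha, sigma2, sigma2_o = 0.9, 0.5, 1.0     # transition and observation variances
mu, var = 0.0, 1.0                          # initial state: N(mu_1, sigma_1^2)

observations = [1.2, 0.7, 1.5, 0.9]         # made-up o_j values
for o in observations:
    # gain K_j = (sigma^2 + alpha^2 sigma_{j-1}^2) / (sigma^2 + sigma_o^2 + alpha^2 sigma_{j-1}^2)
    K = (sigma2 + alpha**2 * var) / (sigma2 + sigma2_o + alpha**2 * var)
    mu = alpha * (1 - K) * mu + K * o       # posterior mean mu_j
    var = sigma2_o * K                      # posterior variance sigma_j^2

assert 0 < K < 1 and var > 0
```

The gain K_j always lies in (0, 1): each update blends the propagated mean α µ_{j-1} with the new observation o_j.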
Kalman Filter IV
1 The prediction step corresponds to (using the result for the product ⟨N_A|N_B⟩ of normal densities):
p(s_j|o_{1:j-1}) = ∫ N(s_j; α s_{j-1}, σ²) N(s_{j-1}; µ_{j-1}, σ_{j-1}²) ds_{j-1}
 = (1/|α|) ∫ N(s_{j-1}; s_j/α, σ²/α²) N(s_{j-1}; µ_{j-1}, σ_{j-1}²) ds_{j-1}
 = (1/|α|) N(0; s_j/α − µ_{j-1}, σ²/α² + σ_{j-1}²)
 = N(s_j; α µ_{j-1}, σ² + α² σ_{j-1}²)
2 Update (rewriting is required to be able to identify (µ_j, σ_j²)):
p(s_j|o_{1:j}) ∝ N(o_j; s_j, σ_o²) N(s_j; α µ_{j-1}, σ² + α² σ_{j-1}²) = N(s_j; µ_j, σ_j²)
41
Kalman Filter V
These two parameters (µ_j, σ_j²) are enough to fully characterise the
Normal distribution p(s_j|o_{1:j}).
The available observations {o_j^(1)} are plugged into the expressions for (µ_j, σ_j²)
so that they can be computed iteratively.
42
State-Space Model I
In the multivariate case:
State equation, with s⃗_j a random vector:
s⃗_j = F s⃗_{j-1} + G ε⃗_j
where F is a square matrix, G is a matrix, and ε⃗_j ∼ N(0, Σ) is a random vector.
Observation equation:
o⃗_j = H s⃗_j + e⃗_j
with H a matrix and e⃗_j ∼ N(0, Σ_o).
43
State-Space Model II
Given p(s⃗_{j-1}|o⃗_{1:j-1}) = N(s⃗_{j-1}; µ_{j-1}, Σ_{j-1}):
prediction:
p(s⃗_j|o⃗_{1:j-1}) = N(s⃗_j; F µ_{j-1}, R) with R = F Σ_{j-1} Fᵀ + G Σ Gᵀ
update: p(s⃗_j|o⃗_{1:j}) = N(s⃗_j; µ_j, Σ_j) with
Σ_j = (Hᵀ Σ_o⁻¹ H + R⁻¹)⁻¹
and
µ_j = Σ_j (Hᵀ Σ_o⁻¹ o⃗_j + R⁻¹ F µ_{j-1})
44
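A minimal sketch of one prediction/update cycle of these equations (numpy assumed; all matrices below are toy values chosen for illustration, not from any real model):

```python
import numpy as np

# toy 2-D state-space model (matrices made up for illustration)
F = np.array([[0.9, 0.1], [0.0, 0.8]])
G = np.eye(2)
H = np.array([[1.0, 0.0]])                  # observe the first component only
Sigma   = 0.1 * np.eye(2)                   # state noise covariance
Sigma_o = np.array([[0.5]])                 # observation noise covariance

mu = np.zeros(2)                            # p(s_{j-1}|o_{1:j-1}) = N(mu, P)
P  = np.eye(2)
o_j = np.array([1.3])                       # new observation

# prediction: R = F P F^T + G Sigma G^T
R = F @ P @ F.T + G @ Sigma @ G.T
# update: Sigma_j = (H^T Sigma_o^{-1} H + R^{-1})^{-1}
Sigma_j = np.linalg.inv(H.T @ np.linalg.inv(Sigma_o) @ H + np.linalg.inv(R))
mu_j = Sigma_j @ (H.T @ np.linalg.inv(Sigma_o) @ o_j + np.linalg.inv(R) @ (F @ mu))

assert Sigma_j.shape == (2, 2)
assert np.all(np.linalg.eigvalsh(Sigma_j) > 0)   # a valid (positive definite) covariance
```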
State-Space Model III
Example
Consider the following model
1 Observation equation:
oj = sj + ej
2 State equation
AR(3) : sj = φ1 sj−1 + φ2 sj−2 + φ3 sj−3 + εj
Define ~sj , ~oj , H, F and G .
45
State-Space Model IV
Answer: Using matrix notation, the state equation is:
[ s_j     ]   [ φ_1 φ_2 φ_3 ] [ s_{j-1} ]   [ ε_j ]
[ s_{j-1} ] = [ 1   0   0   ] [ s_{j-2} ] + [ 0   ]
[ s_{j-2} ]   [ 0   1   0   ] [ s_{j-3} ]   [ 0   ]
   s⃗_j              F           s⃗_{j-1}      G ε⃗_j
and the observation equation is:
o_j = (1, 0, 0) s⃗_j + e_j
with H = (1, 0, 0).
46
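The companion-matrix construction can be verified directly. A minimal sketch (numpy assumed; the φ coefficients and state values are arbitrary):

```python
import numpy as np

phi = np.array([0.5, 0.3, 0.1])             # arbitrary AR(3) coefficients

# F stacks the AR coefficients over an identity shift, as on the slide
F = np.array([[phi[0], phi[1], phi[2]],
              [1.0,    0.0,    0.0],
              [0.0,    1.0,    0.0]])
H = np.array([1.0, 0.0, 0.0])

s_prev = np.array([1.0, 0.5, -0.2])         # (s_{j-1}, s_{j-2}, s_{j-3})
eps_j = 0.05
s_j = F @ s_prev + np.array([eps_j, 0.0, 0.0])

# first component reproduces the scalar AR(3) recursion
assert np.isclose(s_j[0], phi @ s_prev + eps_j)
# the rest of the state just shifts the history down by one
assert np.allclose(s_j[1:], s_prev[:2])
assert np.isclose(H @ s_j, s_j[0])          # the observation picks out s_j
```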
State-Space Model V
Exercises: Find F and G when sj follows
1 an MA(2)
2 an ARMA(2,2)
47
State-Space Model VI
F, G and H are assumed to be known and constant over time.
More general state space models allow these matrices to depend on time.
48
Introduction to geostatistics I
Founders: D. G. Krige and G. Matheron formulated the theory of
geostatistics and Kriging in the 1950s, with applications to the mining industry.
We want to predict at unobserved locations.
Geostatistics is the study of spatial data where the spatial correlation is
modelled through the variogram. The focus of geostatistical analyses is on
predicting responses at unsampled sites.
49
Introduction to geostatistics II
Applications:
hydrological data,
mining applications,
air quality studies
soil science data
biological applications
economic housing data
etc.
Geostatistics for Environmental Scientists, R. Webster & M. A. Oliver, Wiley
2001.
50
Introduction to geostatistics III
Let's consider:
J physical locations {x_j}_{j=1,...,J},
some information of interest (e.g. radioactivity levels) modelled as a
stochastic process at these locations, {s_j = s(x_j)}_{j=1,...,J},
one observation (or measurement) available for each site,
{s_j^(1)}_{j=1,...,J}.
At a new location x_0, we want to predict s_0.
51
Introduction to geostatistics IV
Content:
1 Ad hoc methods: spatial interpolation
2 Statistical modeling with Kriging
1 simple, ordinary and universal Kriging
2 modelling with covariance and/or variogram
3 Stationarity assumptions
52
Spatial interpolation I
Prediction of s at a new site x_0 can be expressed as a weighted average of the
data:
ŝ(x_0) = Σ_{j=1}^J λ_j s(x_j)
with the constraints:
(λ_j ≥ 0, ∀j) ∧ Σ_{j=1}^J λ_j = 1
53
Spatial interpolation II
1 Thiessen polygons (Voronoi polygons, Dirichlet tessellations)
2 Triangulation
3 Natural Neighbour interpolation
4 Inverse function of distance
5 Trend surface
54
Spatial interpolation III
1 Thiessen polygons (Voronoi polygons, Dirichlet tessellations)
ŝ(x_0) = s(x_j) with x_j = arg min_{i=1,...,J} ‖x_i − x_0‖
so we have binary weights:
λ_j = 1, and λ_i = 0 ∀i ≠ j
55
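A minimal sketch of these binary (nearest-neighbour) weights (numpy assumed; the sites and values are made up for illustration):

```python
import numpy as np

# made-up sampled sites and their values
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
s = np.array([2.0, 3.0, 5.0])

x0 = np.array([0.8, 0.1])                   # new location
j = np.argmin(np.linalg.norm(X - x0, axis=1))   # nearest site

# binary weights: lambda_j = 1 for the nearest site, 0 elsewhere
lam = np.zeros(len(X)); lam[j] = 1.0
s0 = lam @ s                                # Thiessen prediction

assert j == 1 and s0 == 3.0
```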
Spatial interpolation IV
Figure: Voronoi polygons
56
Spatial interpolation V
2 Triangulation. Sampling points are linked to their neighbours by straight
lines to create triangles that do not contain any of the points. For a
new position x_0 = (u_0, v_0) inside one of the triangles, say the one defined
by (x_1, x_2, x_3), then
λ_1 = |x_0 − x_3 ; x_2 − x_3| / |x_1 − x_3 ; x_2 − x_3|
with the notation
|x_0 − x_3 ; x_2 − x_3| = det( [ u_0 − u_3  u_2 − u_3 ]
                               [ v_0 − v_3  v_2 − v_3 ] )
λ_2 and λ_3 are defined in a similar fashion, and all the other λs are 0.
Unlike the Thiessen method, the resulting surface is continuous, but it still has
abrupt changes in gradient at the margins of the triangles.
57
Spatial interpolation VI
Figure: Triangulation: the weight λ_1 corresponds to the blue area divided by the
area of the triangle (x_1, x_2, x_3). Similarly, λ_2 corresponds to the green area
divided by the area of the triangle, and λ_3 to the pink area divided by the area
of the triangle.
58
Spatial interpolation VII
3 Natural Neighbour interpolation
59
Spatial interpolation VIII
60
Spatial interpolation IX
61
Spatial interpolation X
62
Spatial interpolation XI
λ_j = a_j / Σ_{j=1}^J a_j
with a_j = 0 if x_j is not a natural neighbour of x_0.
63
Spatial interpolation XII
4 Inverse function of distance
λ_j ∝ 1 / ‖x_j − x_0‖^β, β > 0
- The weights {λ_j}_{j=1,...,J} are scaled such that they sum up to 1.
- Usually, β = 2 (Euclidean distance).
- If x_0 = x_j, then ŝ(x_0) = s(x_j).
- There are no discontinuities in the map ŝ.
- There is no measure of the error.
64
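A minimal sketch of inverse-distance weighting (numpy assumed; the sites, values and x_0 are made up for illustration):

```python
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # made-up sites
s = np.array([2.0, 3.0, 5.0])
x0 = np.array([0.2, 0.1])                   # new location (not on a site)
beta = 2.0                                  # the usual choice

d = np.linalg.norm(X - x0, axis=1)          # distances ||x_j - x_0||
lam = d**(-beta)
lam /= lam.sum()                            # scale the weights to sum to 1

s0 = lam @ s
assert np.isclose(lam.sum(), 1.0)
assert s.min() <= s0 <= s.max()             # a weighted average of the data
```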
Spatial interpolation XIII
5 Trend Surface. This method proposes to do regression:
s(x) = µ(x) + ε
with the error term ε ∼ N(0, σ_ε²). The function µ is a parametric function
such as a plane or a quadratic, e.g.
µ(x = (u, v)) = b_0 + b_1 u + b_2 v
The coefficients b = (b_0, b_1, b_2)ᵀ can then be estimated by least squares
using the J observations.
Once b̂ is estimated, the prediction at the new location x_0 is computed by:
ŝ(x_0) = b̂_0 + b̂_1 u_0 + b̂_2 v_0
65
Spatial interpolation XIV
Limits of interpolation for prediction:
Some interpolators give a crude prediction and display the spatial
variation poorly.
The interpolators fail to provide any estimate of the error on the
prediction.
With the exception of the trend surface, these methods are deterministic;
however, the processes are stochastic by nature.
In practice, modelling with a trend surface is too simplistic to perform
well, and the uncertainty is the same everywhere.
66
Introduction: Kriging I
The aim of Kriging is to estimate the value of a random variable s at one
or more unsampled points or locations, from more or less sparse sample
data on a given support, say {s(x_1), ..., s(x_J)} at {x_1, ..., x_J}.
Different kinds of kriging methods exist, which differ in their
assumptions about the mean structure of the model:
E[s(x)] = µ(x), or E[s̃(x)] = 0 where s̃(x) = s(x) − µ(x)
67
Introduction: Kriging II
Different Kriging methods:
- Ordinary Kriging:
E[s(x)] = µ (µ is unknown)
- Simple Kriging:
E[s(x)] = µ (µ is known)
- Universal Kriging: the mean is unknown and depends on a linear model:
µ(x) = Σ_{p=0}^P β_p φ_p(x)
and the coefficients {β_p} need to be estimated.
68
Ordinary kriging I
Ordinary kriging is the most common type of kriging.
The underlying model assumption in ordinary kriging is:
E[s(x)] = µ
with µ unknown.
The stochastic process s has been observed at J sites (the r.v.
s(x_j) = s_j has one observation s_j^(1) associated with it).
69
Ordinary kriging II
The model for s(x_0) is:
s(x_0) − µ = Σ_{j=1}^J λ_j (s(x_j) − µ) + ε(x_0)
or
s(x_0) = Σ_{j=1}^J λ_j s(x_j) + µ (1 − Σ_{j=1}^J λ_j) + ε(x_0)
We filter out the unknown mean by requiring that the kriging weights sum to
1, leading to the ordinary kriging estimator:
s(x_0) = Σ_{j=1}^J λ_j s(x_j) + ε(x_0) subject to Σ_{j=1}^J λ_j = 1
70
Ordinary kriging III
ε(x_0) is the noise at position x_0 such that:
E[ε(x_0)] = 0
We want to estimate s(x_0); in other words, we need to get the appropriate
{λ_j}_{j=1,...,J}.
Estimation by mean squared error subject to a constraint:
(λ̂_1, ..., λ̂_J) = arg min_{λ_1,...,λ_J} E[ε(x_0)²] subject to Σ_{j=1}^J λ_j = 1
71
Ordinary kriging IV
This is solved using Lagrange multipliers. We define the energy J, which
depends on both {λ_j}_{j=1,...,J} and ψ:
({λ̂_j}_{j=1,...,J}, ψ̂) = arg min_{ψ, λ_1,...,λ_J} J(λ_1, ..., λ_J, ψ)
with
J(λ_1, ..., λ_J, ψ) = E[ε(x_0)²] + 2ψ (Σ_{j=1}^J λ_j − 1)
72
Ordinary kriging V
1 First we express the expectation of the squared error:
E[ε(x_0)²] = E[(s(x_0) − Σ_{j=1}^J λ_j s(x_j))²]
 = E[(s(x_0) − µ + µ − Σ_{j=1}^J λ_j s(x_j))²]
 = E[(s(x_0) − µ)²] − 2 Σ_{j=1}^J λ_j E[(s(x_0) − µ)(s(x_j) − µ)]
   + Σ_{i=1}^J Σ_{j=1}^J λ_i λ_j E[(s(x_j) − µ)(s(x_i) − µ)]
Remember that the covariance is defined as
Cov(s(x_j); s(x_i)) = c_ij = E[(s(x_j) − µ)(s(x_i) − µ)]
So the energy to minimise is:
J(λ_1, ..., λ_J, ψ) = c_00 − 2 Σ_{j=1}^J λ_j c_0j + Σ_{i=1}^J Σ_{j=1}^J λ_i λ_j c_ij + 2ψ (Σ_{j=1}^J λ_j − 1)
73
Ordinary kriging VI
2 Second, we differentiate J w.r.t. λ_k, k = 1, ..., J, and ψ; the minimum
of J is found when all the derivatives are equal to zero:
∂J/∂ψ = 0
∂J/∂λ_k = 0, ∀k = 1, ..., J
The derivative w.r.t. ψ is:
∂J/∂ψ = 2 (Σ_{j=1}^J λ_j − 1) = 0
The derivative w.r.t. λ_k is:
∂J/∂λ_k = 2ψ − 2 c_0k + 2 Σ_{j=1}^J λ_j c_jk = 0
74
Ordinary kriging VII
3 The solution is:
[ c_11 ··· c_1J 1 ] [ λ_1 ]   [ c_10 ]
[  ⋮    ⋱   ⋮   ⋮ ] [  ⋮  ] = [  ⋮   ]
[ c_J1 ··· c_JJ 1 ] [ λ_J ]   [ c_J0 ]
[ 1    ··· 1    0 ] [ ψ   ]   [ 1    ]
        A              λ         b
or
λ = A⁻¹ b
75
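A minimal sketch assembling and solving this system (numpy assumed; the covariance values and observations are made up for illustration, not from any real data):

```python
import numpy as np

# toy covariance matrix between 3 sites, and covariances to the target x_0
C = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
c0 = np.array([0.6, 0.7, 0.3])              # c_{j0} = Cov(s(x_j), s(x_0))

J = len(c0)
A = np.ones((J + 1, J + 1))                 # bordered system: last row/col of ones
A[:J, :J] = C
A[J, J] = 0.0
b = np.append(c0, 1.0)

sol = np.linalg.solve(A, b)
lam, psi = sol[:J], sol[J]

assert np.isclose(lam.sum(), 1.0)           # the kriging weights sum to 1

s_obs = np.array([2.1, 2.6, 1.9])           # made-up observations s_j^(1)
s0 = lam @ s_obs                            # ordinary kriging prediction
```

The last equation of the system enforces the constraint, so the solved weights sum to 1 by construction.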
Ordinary kriging VIII
Once you have the estimate λ̂, you can predict (using the observations):
ŝ(x_0) = Σ_{j=1}^J λ̂_j s_j^(1)
76
Simple Kriging I
Assumption for Simple Kriging:
the mean E[s(x)] = µ is known.
We estimate s(x_0) using the same relation as Ordinary Kriging:
s(x_0) − µ = Σ_{j=1}^J λ_j (s(x_j) − µ) + ε(x_0)
or
s(x_0) = Σ_{j=1}^J λ_j s(x_j) + µ (1 − Σ_{j=1}^J λ_j) + ε(x_0)
where µ is known. The λ_j no longer need to be constrained to sum to 1,
and the second term ensures that E[s(x)] = µ, ∀x.
77
Simple Kriging II
The hypothesis for the error is E[ε(x_0)] = 0, and we estimate {λ_j}_{j=1,...,J} such
that the mean squared error E[ε²(x_0)] is minimised.
The solution is then:
[ λ_1 ]   [ c_11 ··· c_1J ]⁻¹ [ c_10 ]
[  ⋮  ] = [  ⋮    ⋱   ⋮   ]   [  ⋮   ]
[ λ_J ]   [ c_J1 ··· c_JJ ]   [ c_J0 ]
Once you have the estimate λ̂, you can predict (using the observations):
ŝ(x_0) = Σ_{j=1}^J λ̂_j s_j^(1) + µ (1 − Σ_{j=1}^J λ̂_j)
78
Universal Kriging I
For a new location x_0, we have the following model:
s(x_0) − µ(x_0) = Σ_{j=1}^J λ_j (s(x_j) − µ(x_j)) + ε(x_0)
or
s(x_0) = Σ_{j=1}^J λ_j s(x_j) + µ(x_0) − Σ_{j=1}^J λ_j µ(x_j) + ε(x_0)
In Universal Kriging, the mean of s depends on the position x:
µ(x) = Σ_{p=0}^P β_p φ_p(x)
79
Universal Kriging II
Example of choice of the functions {φ_p}_{p=0,...,P} of x = (u, v) ∈ R²:
Linear trend (P = 2):
φ_0(x) = 1, φ_1(x) = u, φ_2(x) = v
Quadratic trend (P = 5): additionally
φ_3(x) = u², φ_4(x) = u v, φ_5(x) = v²
80
Universal Kriging III
In a similar fashion to ordinary kriging (we don't know the β_p, so µ is
unknown), we rewrite:
s(x_0) = Σ_{j=1}^J λ_j s(x_j) + µ(x_0) − Σ_{j=1}^J λ_j µ(x_j) + ε(x_0)
as
s(x_0) = Σ_{j=1}^J λ_j s(x_j) + ε(x_0) subject to the constraint µ(x_0) − Σ_{j=1}^J λ_j µ(x_j) = 0
81
Universal Kriging IV
Having µ(x) = Σ_{p=0}^P β_p φ_p(x), the constraint is equivalent to:
Σ_{p=0}^P β_p φ_p(x_0) = Σ_{p=0}^P β_p Σ_{j=1}^J λ_j φ_p(x_j)
This must hold for any combination of the β_p. Hence we have in fact P + 1 constraints:
φ_p(x_0) = Σ_{j=1}^J λ_j φ_p(x_j), ∀p = 0, ..., P
82
Universal Kriging V
Note that at p = 0, using φ_0(x) = 1, we recover the constraint Σ_{j=1}^J λ_j = 1.
This minimisation is solved by introducing P + 1 Lagrange multipliers:
(λ̂_1, ..., λ̂_J, m̂_0, ..., m̂_P) = arg min { E[ε(x_0)²] + Σ_{p=0}^P m_p (φ_p(x_0) − Σ_{j=1}^J λ_j φ_p(x_j)) }
83
Universal Kriging VI
The solution of Universal Kriging is:
[ C  Fᵀ ] [ λ_1 ··· λ_J  m_0  m_1 ··· m_P ]ᵀ = [ c_01 ··· c_0J  φ_0(x_0)  φ_1(x_0) ··· φ_P(x_0) ]ᵀ
[ F  0  ]
84
Universal Kriging VII
with
F = [ φ_0(x_1)  φ_0(x_2)  ···  φ_0(x_J) ]
    [ φ_1(x_1)  φ_1(x_2)  ···  φ_1(x_J) ]
    [ φ_2(x_1)  φ_2(x_2)  ···  φ_2(x_J) ]
    [    ⋮                          ⋮    ]
    [ φ_P(x_1)  φ_P(x_2)  ···  φ_P(x_J) ]
and
C = [ c_11 ··· c_1J ]
    [  ⋮    ⋱   ⋮   ]
    [ c_J1 ··· c_JJ ]
The prediction at x_0 is computed by:
ŝ_0 = Σ_{j=1}^J λ̂_j s_j^(1)
85
Remarks about Kriging I
The covariance function C[s(x), s(x′)] is assumed to be known
in all the solutions proposed here for Kriging.
In practice, such a function is not known and needs to be estimated.
Some hypotheses about the process will be used to ease this estimation.
We define the concept of the variogram as an alternative to the covariance,
and we see next which assumptions about the process will be
used for Kriging.
86
Remarks about Kriging II
Definition (variogram)
The variogram is defined as:
γ(s(x_i), s(x_j)) = γ_ij = (1/2) E[(s(x_i) − s(x_j))²]
When E[s(x_i) − s(x_j)] = 0, the variogram is linked to the covariance as
follows:
γ_ij = (1/2) (c_ii + c_jj − 2 c_ij)
87
Stationarity I
Definition (Strict Stationarity)
The distribution of the random process has certain attributes that are the
same everywhere. Strict stationarity indicates that for any number k of any
sites x1, x2, · · · , xk, the joint cumulative distribution of (s(x1), · · · , s(xk))
remains the same under an arbitrary translation h:
P (s(x1), · · · , s(xk)) = P (s(x1 + h), · · · , s(xk + h))
If the random process is strictly stationary, its moments, if they exist, are also
invariant under translations.
88
Stationarity II
Definition (weak stationarity)
A process s(x) is said second order stationary (or weakly stationary, or
wide-sense stationary) when
1 the mean of the process does not depend on x: E[s(x)] = µ
2 the variance of the process does not depend on x: E[(s(x)− µ)2] = σ2
3 the covariance between s(x) and s(x + h) only depends on h:
Cov[s(x), s(x + h)] = E[(s(x) − E[s(x)]) (s(x + h) − E[s(x + h)])]
 = E[(s(x) − µ) (s(x + h) − µ)]
 = C(h)
Note that C(0) = σ².
89
Stationarity III
For a weakly stationary process, the autocorrelation can also be defined
as:
ρ(h) = C(h) / C(0)
where C(0) is the covariance at lag 0, i.e. σ².
If a random field has a function C(h) that depends only on the distance
‖h‖ and not on its orientation, it is said to be isotropic.
90
Stationarity IV
Definition (Intrinsic Stationarity)
s(x) is said to be an intrinsic random function if:
E[s(x + h) − s(x)] = 0
and
Var[s(x + h) − s(x)] = 2 γ(h)
The function γ(h) is called the variogram.
91
Stationarity V
Exercises: For second-order stationary processes,
1 Express γ(h) w.r.t. C(h).
2 Express γ(h) w.r.t. ρ(h).
3 Show that a process that is weakly stationary is also intrinsically
stationary.
92
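The relations asked for in exercises 1 and 2 can be checked numerically for a particular covariance model. A sketch assuming an exponential covariance C(h) = σ² exp(−|h|/ρ) (an arbitrary choice made here for illustration):

```python
import numpy as np

sigma2, rho_len = 2.0, 1.5                  # variance and range (arbitrary values)

def C(h):                                   # exponential covariance model (assumed)
    return sigma2 * np.exp(-np.abs(h) / rho_len)

h = np.linspace(0.0, 5.0, 50)
gamma = C(0.0) - C(h)                       # variogram implied by the covariance
rho = C(h) / C(0.0)                         # autocorrelation

# gamma(h) = sigma^2 (1 - rho(h)) for a second-order stationary process
assert np.allclose(gamma, sigma2 * (1.0 - rho))
assert gamma[0] == 0.0 and np.all(np.diff(gamma) >= 0)
```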
Brownian Motion & Ornstein-Uhlenbeck Processes
A second order stationary process is also an intrinsically stationary process,
but an intrinsically stationary process is not always a second order stationary
process or a strictly stationary process.
To illustrate weak and intrinsic stationarity, we look at the following processes:
- Brownian Motion,
- Ornstein-Uhlenbeck Process.
93
Brownian Motion I
Definition (diffusion & Brownian Motion)
A diffusion is a continuous time Stochastic Process s(t) with the following
properties:
1 s(0) = 0,
2 s(t) has independent increments,
3 P (s(t2) | s(t1)) has a density function f(s(t2)|t1, t2, s(t1)).
Standard Brownian motion is a diffusion s(t), t ≥ 0 satisfying the following:
1 s(0) = 0.
2 s(t) has independent increments.
3 For t2 > t1, s(t2)− s(t1) ∼ N (0, σ2(t2 − t1)).
94
Brownian Motion II
s(t) has independent increments means that for all times
0 ≤ t_1 ≤ t_2 ≤ ··· ≤ t_n, the increments s(t_n) − s(t_{n-1}), s(t_{n-1}) − s(t_{n-2}), ...,
s(t_2) − s(t_1) are independent random variables.
For t_2 > t_1, s(t_2) − s(t_1) ∼ N(0, σ²(t_2 − t_1)) is equivalent to
s(t_2) | s(t_1) ∼ N(s(t_1), σ²(t_2 − t_1)):
s(t_2) given s(t_1) is normally distributed with mean s(t_1) and variance
σ²(t_2 − t_1).
95
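A simulation check of this increment property (numpy assumed; σ², the time grid and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, dt, n_steps, n_paths = 1.0, 0.01, 500, 5_000

# standard Brownian motion: s(0) = 0, independent N(0, sigma^2 dt) increments
increments = rng.normal(0.0, np.sqrt(sigma2 * dt), (n_paths, n_steps))
s = np.cumsum(increments, axis=1)

t2, t1 = 400, 100                           # two step indices (time = index * dt)
diff = s[:, t2 - 1] - s[:, t1 - 1]

# Var[s(t2) - s(t1)] = sigma^2 (t2 - t1) dt: it depends only on the lag
assert abs(diff.var() - sigma2 * (t2 - t1) * dt) < 0.3
assert abs(diff.mean()) < 0.15
```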
Brownian Motion III
Exercises:
1 Show that standard Brownian motion is intrinsically stationary.
2 Show that standard Brownian motion is not weakly stationary.
96
Ornstein-Uhlenbeck Process I
Definition (Ornstein-Uhlenbeck)
Let s(t) be standard Brownian motion. The process
V (t) = e−t s(e2t)
is called the Ornstein-Uhlenbeck process.
97
Ornstein-Uhlenbeck Process II
Show that V (t) is intrinsically stationary and weakly stationary.
98
Brownian Motion and Ornstein-Uhlenbeck processes
Historical remarks:
Robert Brown observes the ceaseless irregular motion of small particles in a
fluid; the motion is explained by believing the particles to be alive (1827-1829).
Goul puts forward a kinetic theory to explain the motion: it is due to the rapid
bombardment by a huge number of fluid molecules (1860).
Einstein presents the theory of "Brownian motion" (c. 1900). At the same
time (c. 1900), Bachelier defines it to model stock options.
The Ornstein-Uhlenbeck process was proposed by Uhlenbeck and
Ornstein (1930) in a physical modelling context, as an alternative to
Brownian Motion. The model has since been used in a wide variety of
application areas, e.g. in finance (see the Vasicek (1977) interest rate
model).
99
Kriging formulation with the variogram I
In Kriging, we have
s̃(x_0) = Σ_{j=1}^J λ_j s̃(x_j) + ε(x_0), where s̃(x) = s(x) − µ(x),
or
ε(x_0) = s̃(x_0) − Σ_{j=1}^J λ_j s̃(x_j)
with the assumption E[ε(x_0)] = 0.
100
Kriging formulation with the variogram II
The estimation is performed by:
(λ̂_1, ..., λ̂_J) = arg min E[ε(x_0)²]
solved
without constraint: Simple Kriging
with constraints: Universal/Ordinary Kriging
and the estimate is computed using the observations:
ŝ_0 = Σ_{j=1}^J λ̂_j s_j^(1)   (Ordinary/Universal Kriging)
or
ŝ_0 = Σ_{j=1}^J λ̂_j s_j^(1) + µ (1 − Σ_{j=1}^J λ̂_j)   (Simple Kriging)
101
Kriging formulation with the variogram III
Exercise: With the constraint Σ_{j=1}^J λ_j = 1 (true in Universal/Ordinary
Kriging):
1 Show that:
(s̃_0 − Σ_{j=1}^J λ_j s̃_j)² = −(1/2) Σ_{j=1}^J Σ_{i=1}^J λ_i λ_j (s̃_i − s̃_j)² + Σ_{j=1}^J λ_j (s̃_0 − s̃_j)²
2 Show that
E[ε(x_0)²] = −Σ_{j=1}^J Σ_{i=1}^J λ_i λ_j γ_ij + 2 Σ_{j=1}^J λ_j γ_0j
3 Deduce which stationarity assumptions (weak or intrinsic) will be used
for Simple, Ordinary and Universal Kriging.
102
Kriging formulation with the variogram IV
Definition (Kriging variance)
The minimised mean squared prediction error E[ε(x_0)²] is sometimes called
the Kriging variance:
σ_0² = 2 Σ_{j=1}^J λ_j γ_0j − Σ_{j=1}^J Σ_{i=1}^J λ_i λ_j γ_ij   (Ordinary/Universal Kriging)
or
σ_0² = c_00 − 2 Σ_{j=1}^J λ_j c_0j + Σ_{i=1}^J Σ_{j=1}^J λ_i λ_j c_ij   (Simple Kriging)
so a prediction interval can be constructed: ŝ_0 ± 2 σ_0.
103
Kriging formulation with the variogram V
Kriging    | Hypotheses µ(x) | Constraints | Prediction at x_0 | Variance | Stationarity?
Simple     |                 | -           |                   |          |
Ordinary   |                 |             |                   |          |
Universal  |                 |             |                   |          |
104
Modelling variogram I
We have presented the Kriging approaches used to perform prediction. These
approaches require knowledge of the covariance function or variogram of
the stochastic process s̃(x) = s(x) − µ(x).
Definition (variogram cloud)
For any set of data we can compute the semivariance for each pair of points
x_i and x_j individually as
γ(x_i, x_j) = γ_ij = (1/2) (s(x_i) − s(x_j))²
These values are plotted against the separation h = x_i − x_j between the sites.
When a variogram is isotropic, it depends only on the distance ‖h‖. A
scatterplot of this form is called the variogram cloud.
105
Modelling variogram II
Exercises:
1 How many points does the variogram cloud contain if the separation
distance h is not limited?
2 Consider x1 = (2, 3), x2 = (4, 4) and x3 = (4, 3) where we have measured
s1 = 2, s2 = 3 and s3 = 2.4. Assuming the variogram is isotropic, plot the
variogram cloud.
106
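A sketch computing the variogram cloud for the three sites of the exercise (numpy assumed):

```python
import numpy as np
from itertools import combinations

X = np.array([[2.0, 3.0], [4.0, 4.0], [4.0, 3.0]])   # x1, x2, x3 from the exercise
s = np.array([2.0, 3.0, 2.4])                        # measured values

cloud = []                                  # (distance, semivariance) pairs
for i, j in combinations(range(3), 2):
    h = np.linalg.norm(X[i] - X[j])         # isotropic: use the distance only
    gamma_ij = 0.5 * (s[i] - s[j])**2
    cloud.append((h, gamma_ij))

# pairs (1,2), (1,3), (2,3): distances sqrt(5), 2, 1
assert np.isclose(cloud[0][1], 0.5)         # 0.5 * (2 - 3)^2
assert np.isclose(cloud[1][1], 0.08)        # 0.5 * (2 - 2.4)^2
assert np.isclose(cloud[2][1], 0.18)        # 0.5 * (3 - 2.4)^2
```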
Modelling variogram III
Definition (Average semivariance)
The definition of the semivariance is
γ(h) = (1/2) E[(s(x) − s(x + h))²]
which can be estimated by:
γ̂(h) = (1 / (2 m(h))) Σ_{i=1}^{m(h)} (s(x_i) − s(x_i + h))²
where m(h) is the number of pairs of sites separated by the particular lag vector h.
The way it is computed depends on the sampling design.
107
Modelling variogram IV
Example: In labs, we explore the first tutorial given with the R-package gstat.
This tutorial looks at the Meuse data set that gives locations and topsoil heavy
metal concentrations, collected in a flood plain of the river Meuse, near the
village of Stein (NL). Heavy metal concentrations are from composite samples
of an area of approximately 15 m x 15 m.
108
Modelling variogram V
[Scatter plot: semivariance (0–3) vs distance (500–1500 m).]
Figure: Variogram cloud of log(zinc) in the Meuse data set.
109
Modelling variogram VI
[Plot: binned semivariance (0.2–0.6) vs distance (500–1500 m).]
Figure: Average semivariogram of log(zinc) in the Meuse data set.
110
Sampling designs I
Consider the region D ⊂ R2; the locations {xj}j=1,··· ,J can be chosen by:
1 Systematic Sampling (e.g. grid)
2 Pure random sampling
3 Stratified sampling
111
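The three designs above can be illustrated with a minimal numpy sketch that draws sites in the unit square D = [0, 1]² (the unit square and the function names are choices made for this example, not part of the course material):

```python
import numpy as np

rng = np.random.default_rng(0)

def systematic(j_side):
    """Systematic sampling: a regular j_side x j_side grid of cell centres."""
    g = (np.arange(j_side) + 0.5) / j_side
    gx, gy = np.meshgrid(g, g)
    return np.column_stack([gx.ravel(), gy.ravel()])

def pure_random(j):
    """Pure random sampling: j sites drawn uniformly over the unit square."""
    return rng.uniform(size=(j, 2))

def stratified(j_side):
    """Stratified sampling: one uniform draw inside each grid cell."""
    centres = systematic(j_side)
    jitter = rng.uniform(-0.5, 0.5, centres.shape) / j_side
    return centres + jitter
```

Systematic sampling guarantees even coverage, pure random sampling can leave gaps, and stratified sampling is a compromise between the two.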
Conclusion
We have seen how to estimate a variogram using the observations we
have for stochastic processes {s(xj)}j=1,··· ,J that are measured at each
site xj .
Can we estimate the variogram γ in this fashion when Universal Kriging
will be used to perform prediction at a new position x0?
What is the stochastic process for which the variogram is used in
Universal Kriging? Ordinary Kriging? Simple Kriging?
112
Variogram modelling for Universal Kriging I
While the differences s(xi)− s(xj) have constant mean for simple and ordinary
Kriging, this is not the case any more for universal Kriging, where E[s(x)] = µ(x)
varies with x.
We have collected observations for s but not for the residual s(x)− µ(x), and
we don't know the form of µ(x).
For instance, if we assume that s(x) = a x+ b (linear drift), then the
difference
s(x+ h)− s(x) = a h
so the estimated variogram (variogram plot) will be affected by this drift
and will look like a parabola:
γ(h) = (1/2) (a h)²
113
Variogram modelling for Universal Kriging II
1 The first method gets a quick estimate of the mean in order to remove it:
1 Specify a linear model µ(x) = Σ_{p=0}^{P} βp φp(x) on a chosen basis of functions
{φp}p=0,··· ,P .
2 Estimate {βp}p=0,··· ,P by least squares to give an estimate of the mean
µ(x) = Σ_{p=0}^{P} βp φp(x).
3 Compute the variogram of the residuals s(x)− µ(x).
114
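The three steps above can be sketched in numpy; the synthetic drift µ(x) = 2 x1 + 1, the noise level, and the binning choices below are assumptions made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic observations with a linear drift (assumed: mu(x) = 2*x1 + 1)
X = rng.uniform(0, 10, size=(200, 2))
s = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.5, size=200)

# steps 1-2: basis {1, x1, x2} and least-squares estimate of the drift
Phi = np.column_stack([np.ones(len(X)), X])
beta = np.linalg.lstsq(Phi, s, rcond=None)[0]
resid = s - Phi @ beta                       # step 3: residual process

def semivariogram(X, z, n_bins=10):
    """Binned empirical semivariogram: mean of 0.5*(z_i - z_j)^2 per distance bin."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices(len(X), k=1)
    dist, gam = d[iu], 0.5 * (z[iu[0]] - z[iu[1]]) ** 2
    edges = np.linspace(0, dist.max(), n_bins + 1)
    idx = np.digitize(dist, edges) - 1
    return np.array([gam[idx == b].mean() if np.any(idx == b) else np.nan
                     for b in range(n_bins)])

gamma_raw = semivariogram(X, s)        # inflated by the drift (parabola-like)
gamma_res = semivariogram(X, resid)    # bounded, close to the noise variance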
Variogram modelling for Universal Kriging III
2 The other approach to estimate the variogram works as follows:
1 Choose a theoretical variogram function.
2 Specify a linear model µ(x) = Σ_{p=0}^{P} βp φp(x) on a chosen basis of functions
{φp}p=0,··· ,P .
3 Estimate µ(x) (i.e. find {βp}p=0,··· ,P ) such that the residuals
s(x) = s(x)− µ(x) have a variogram that matches the chosen theoretical
one.
115
Variogram functions I
Several analytical functions have been proposed in the literature to model a
variogram. They all follow a few basic rules.
1 Limits on the variogram function.
▸ Behaviour near the origin. The way in which the variogram approaches the
origin is determined by the continuity (or lack of continuity) of the variable
s(x) itself. The semivariance at |h| = 0 is by definition 0. It can happen
however that experimental values give a positive intercept (nugget effect).
Near 0, the variogram can approach linearly, γ(h) ≈ b|h|, or quadratically,
γ(h) ≈ b|h|², as |h| → 0.
116
Variogram functions II
▸ Behaviour towards infinity. The variogram is constrained by:
lim_{|h|→∞} γ(h)/|h|² = 0
If it does not satisfy this, then the process is not entirely random (and is not
compatible with the intrinsic hypothesis).
117
Variogram functions III
2 Examples of models:
▸ Unbounded models:
γ(h) = ω |h|^α for 0 < α < 2
For α = 1 the variogram is linear.
▸ Bounded models. Based on experience, bounded variation is more common
than unbounded variation:
▸ bounded linear
▸ spherical
▸ exponential
▸ ...
118
Variogram functions IV
Once we have computed the variogram plot, we select the best analytical
(theoretical) variogram function by:
1 fitting the models to the observed variogram plot;
2 choosing the theoretical model that has the smallest RSS (residual sum of
squares) and/or AIC (Akaike Information Criterion).
Exercise: Assuming that s is weakly stationary, explain why the variogram
cannot be unbounded.
119
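A crude numpy sketch of this selection step, fitting two bounded models to a hypothetical empirical semivariogram by grid search and comparing their RSS (the data, parameter grids, and model list are made up for this example):

```python
import numpy as np

# hypothetical empirical semivariogram (distances h, estimates gamma_hat)
h = np.linspace(50, 1500, 15)
gamma_hat = 0.6 * (1 - np.exp(-h / 400)) \
    + np.random.default_rng(2).normal(0, 0.01, size=15)

def exponential(h, sill, rang):
    return sill * (1 - np.exp(-h / rang))

def spherical(h, sill, rang):
    r = np.clip(h / rang, 0, 1)
    return sill * (1.5 * r - 0.5 * r ** 3)

def best_fit(model, h, g):
    """Grid search over (sill, range) minimising the residual sum of squares."""
    sills = np.linspace(0.1, 1.5, 60)
    ranges = np.linspace(50, 2000, 80)
    rss = np.array([[np.sum((g - model(h, s0, r0)) ** 2) for r0 in ranges]
                    for s0 in sills])
    i, j = np.unravel_index(rss.argmin(), rss.shape)
    return sills[i], ranges[j], rss[i, j]

fits = {m.__name__: best_fit(m, h, gamma_hat) for m in (exponential, spherical)}
best = min(fits, key=lambda name: fits[name][2])   # smallest RSS wins
```

In practice one would use a proper optimiser and possibly weighted fitting, but the selection logic (compare RSS across candidate models) is the same.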
Introduction Spectral Analysis I
We consider a time series where observations have been collected at regular
intervals of time. One of the earliest general types of model for time series
treated them as a combination of
1 a trend describing the long-term behaviour of the series (deterministic)
2 a periodic or seasonal component describing variation within a cycle
(deterministic)
3 an error term describing random fluctuations of the series about the
trend and seasonal components (stochastic).
(s(tj) = µ(tj) + ε(tj)) ≡ (sj = µj + εj)
where µ captures the deterministic behaviour (while both s and ε are
stochastic).
120
Introduction Spectral Analysis II
The purpose of the trend term is to describe the long-term features of the
series (e.g. an overall tendency); it would usually correspond to some smooth,
slowly changing function.
By differencing the time series, the trend can easily be removed. So now we
assume that only a seasonal pattern remains.
For certain series we can specify the periodicity of the seasonal component
very accurately (e.g. in the case of economic or geophysical series which
contain a strict 12-month cycle).
However in other series it may not be possible to specify the period of the
cyclical term a priori or there may be several (non harmonically related)
periodic terms present with unknown periods.
121
Introduction Spectral Analysis III
We need special techniques to detect these hidden periodicities. We can
represent the seasonal component by a sum of sine and cosine terms (well
suited explanatory variables to capture periodic information):
sj = Σp (Ap cos(ωp j) + Bp sin(ωp j)) + εj
How to estimate {(Ap, Bp, ωp)} ?
122
Periodogram I
Definition (Periodogram)
Given a time series s1, s2, · · · , sJ , the periodogram is defined by:
I(ω) = (2/J) [ ( Σ_{j=1}^{J} sj cos(ω j) )² + ( Σ_{j=1}^{J} sj sin(ω j) )² ]
The function I(ω) is then computed for ωp = 2πp/J with p = 0, 1, · · · , J/2.
The periodogram is also sometimes represented w.r.t. the frequency f = ω/(2π).
This implies f ∈ { 0, 1/J, 2/J, · · · , 1/2 }.
123
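The definition above can be computed directly in numpy; the pure-sine series used below matches the example that follows (a sketch for illustration, not an optimised implementation):

```python
import numpy as np

def periodogram(s):
    """I(omega_p) = (2/J) [ (sum_j s_j cos(omega_p j))^2
                          + (sum_j s_j sin(omega_p j))^2 ]
    at the Fourier frequencies omega_p = 2*pi*p/J, p = 0, ..., J//2."""
    J = len(s)
    j = np.arange(1, J + 1)
    w = 2 * np.pi * np.arange(J // 2 + 1) / J
    c = (s * np.cos(np.outer(w, j))).sum(axis=1)
    d = (s * np.sin(np.outer(w, j))).sum(axis=1)
    return w / (2 * np.pi), (2 / J) * (c ** 2 + d ** 2)

J = 100
j = np.arange(1, J + 1)
s = np.cos(2 * np.pi * 5 / J * j)     # noiseless series with f = 5/J = 0.05
f, I = periodogram(s)                 # I peaks at p = 5, i.e. f = 0.05
```

For this noiseless input, orthogonality of the sinusoids makes the periodogram vanish everywhere except at f = 0.05.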
Periodogram II
Example (Pure sine time series)
Consider a noiseless time series:
sj = cos( (2π × 5/J) × j ), j = 1, · · · , J = 100
1 Compute numerically ω = 2π × 5/J.
2 Compute numerically the corresponding frequency f = ω/(2π).
3 Look at the periodogram and conclude.
124
Periodogram III
Figure: Pure sine time series sj = cos((2π × 5/J) × j), j = 1, · · · , J = 100.
125
Periodogram IV
Figure: Periodogram I(f).
126
Periodogram V
Example (Combination of pure sine time series)
Consider the time series:
sj = 2 cos( (2π × 5/J) × j ) + sin( (2π × 10/J) × j ), j = 1, · · · , J = 100
1 Compute numerically ω1 = 2π × 5/J and ω2 = 2π × 10/J.
2 Compute numerically the corresponding frequencies f1 and f2.
3 Look at the periodogram and conclude.
127
Periodogram VI
Figure: Combination of two pure sine time series.
128
Periodogram VII
Figure: Periodogram I(f).
129
Periodogram VIII
For a time series with a cyclical patterns:
sj = Σp (Ap cos(ωp j) + Bp sin(ωp j)) + εj , j = 1, · · · , J
By computing the periodogram, you can identify the most prevalent
frequencies {ωp}.
How would you propose to estimate the corresponding coefficients
{Ap, Bp} ?
130
Periodogram IX
Let's define the complex vector:
Vp = [ exp(−ıωp 1), exp(−ıωp 2), · · · , exp(−ıωp J) ]^T
= [ cos(ωp 1) − ı sin(ωp 1), · · · , cos(ωp J) − ı sin(ωp J) ]^T
You can check that the periodogram I(ωp) satisfies (with ~s = [s1, · · · , sJ ]^T ):
I(ωp) ∝ ‖< ~s|Vp >‖²
with < ·|· > the dot product and ‖ · ‖ the Euclidean norm.
131
Periodogram X
Definition (Inner product)
For complex vectors x and y of length J , the inner product is:
< x, y > = Σ_{n=1}^{J} xn ȳn
where ȳn is the complex conjugate of yn.
You can check also that (with ωp = 2πp/J ):
< Vp1 |Vp2 > = J δp1p2 = J × { 0 if p1 ≠ p2 ; 1 otherwise }
(δp1p2 denotes the Kronecker delta).
132
Periodogram XI
The set of vectors {Vp}p=0,··· ,J−1 forms a basis of the space C^J . So any
~s ∈ C^J can be written exactly (note V0 = VJ )
~s = (1/J) Σ_{p=0}^{J−1} θp Vp or ~s = (1/J) Σ_{p=1}^{J} θp Vp
Because here s is a time series of real numbers, it can be shown that
θp = θ̄_{J−p} (complex conjugate)
so only half of the coefficients {θp}p=0,··· ,J/2 are needed.
133
Periodogram XII
Definition (Discrete Fourier Transform and Power spectrum)
The discrete Fourier transform (DFT) of the time series s1, · · · , sJ is defined by:
θp = < ~s|Vp > = Σ_{j=1}^{J} sj exp(−ıωp j)
with ωp = 2πp/J. {θp}p=0,··· ,J−1 are the Fourier coefficients of the time series s
and:
~s = (1/J) Σ_{p=0}^{J−1} θp Vp
The power spectrum is defined as P(ωp) = |θp|², and the periodogram is
therefore proportional to the power spectral density of the signal.
134
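The definition can be checked against numpy's FFT: with the 1-based time index used here, θp is a phase-shifted version of np.fft.fft(s), so the moduli, and hence the power spectrum, agree. A small sketch with an arbitrary random series:

```python
import numpy as np

rng = np.random.default_rng(3)
J = 128
s = rng.normal(size=J)

# theta_p = sum_{j=1}^{J} s_j exp(-i omega_p j), omega_p = 2*pi*p/J
p = np.arange(J)
omega = 2 * np.pi * p / J
j = np.arange(1, J + 1)
theta = (s * np.exp(-1j * np.outer(omega, j))).sum(axis=1)

# same moduli as the 0-based FFT, which differs only by a phase factor
assert np.allclose(np.abs(theta), np.abs(np.fft.fft(s)))

# conjugate symmetry for a real series: theta_p = conj(theta_{J-p})
assert np.allclose(theta[1:], np.conj(theta[1:][::-1]))

P = np.abs(theta) ** 2        # power spectrum P(omega_p)
```

The conjugate-symmetry check confirms the earlier claim that only the coefficients up to p = J/2 carry information for a real series.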
Periodogram XIII
Reading and using a periodogram
High values of the periodogram indicate periodic patterns.
Detect the main maximum (or maxima) and identify the corresponding
frequency(ies) f .
The period of the cyclical pattern is recovered by computing 1/f . If the
time series has been recorded every month, then the period is 1/f
months.
135
Case study: Variable Star Brightness I
The brightness or the magnitude of a particular star at midnight is
observed on 600 consecutive nights
[Plot: Brightness vs Day (0–600); y-axis 0–35.]
136
Case study: Variable Star Brightness II
[Plot: Variable Star periodogram vs Frequency (0.0–0.5).]
Two prominent peaks appear at f1 = 0.035 and f2 = 0.04167 .
137
Case study: Variable Star Brightness III
Exercises:
1 What are the periods in days associated with f1 and f2 ?
2 What model would you suggest to fit to this time series?
138
Case study: Variable Star Brightness IV
Answers:
1 f1 corresponds to a 29-day period and f2 to a 24-day period.
2 model:
st = β0+β1 cos(2πf1t)+β2 sin(2πf1t)+β3 cos(2πf2t)+β4 sin(2πf2t)+εt
139
Conclusion
Remember that spectral analysis by discrete Fourier transform can be
applied when observations have been collected at regular intervals.
Frequency-domain analysis, or spectral analysis, has been found to be useful in (for instance):
▸ acoustics
▸ communication engineering
▸ geophysical science
▸ biomedical science
140
Fourier transform I
The Fourier transform is an important tool to analyse signals, discrete or
continuous.
A Wavelet Tour of Signal Processing, by S. Mallat, second edition.
Definition (Fourier transform)
The Fourier integral measures how much oscillation at the frequency ω there
is in the function f :
F(ω) = ∫_{−∞}^{∞} f(t) exp(−ıωt) dt
141
Fourier transform II
Definition (Inverse Fourier transform)
The inverse Fourier transform can be computed as:
f(t) = (1/2π) ∫_{−∞}^{∞} F(ω) exp(ıωt) dω
Exercises: Fourier transform
Compute the Fourier transforms of:
1
f(t) = { 1 for −T < t < T ; 0 otherwise }
2 f(t) = δ(t)
3 f(t) = δ(t− t0)
142
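As a worked sketch of exercise 1 (the box function), integrating the definition directly gives:

```latex
F(\omega) = \int_{-T}^{T} e^{-\imath\omega t}\, dt
          = \left[ \frac{e^{-\imath\omega t}}{-\imath\omega} \right]_{-T}^{T}
          = \frac{e^{\imath\omega T} - e^{-\imath\omega T}}{\imath\omega}
          = \frac{2\sin(\omega T)}{\omega}
```

the familiar sinc shape.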
Fourier transform III
4 f(t) = exp(−at) h(t) (h is the Heaviside step function)
5 f(t) = cos(ω0 t)
6 Show:
1 Linearity:
a x(t) + b y(t) ↔ a X(ω) + b Y(ω)
2 Time shifting:
x(t− t0) ↔ exp(−ıωt0) X(ω)
143
Fourier transform IV
Definition (Convolution)
The convolution of two functions x(t) and h(t) gives a function y(t) such that:
y(t) = x(t) ∗ h(t) = ∫_{−∞}^{+∞} x(τ) h(t− τ) dτ
Theorem (Convolution)
If y(t) = x(t) ∗ h(t) then:
Y (ω) = X(ω) ·H(ω)
Prove this theorem.
144
Two-dimensional Fourier transform I
Definition (2D Fourier Transform)
F(ωx, ωy) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) exp(−ıωx x) exp(−ıωy y) dx dy
Remember that spectral analysis is well suited to signals observed at
regular intervals. In particular, the 2D Fourier transform is suitable to analyse
images:
Definition (image)
An image can be understood as a stochastic process s(x) that has been
observed at J locations {xj}j=1,··· ,J lying on a regular 2D grid. For grey level
images, the state space of s is all integers between 0 (black) and 255 (white).
145
Two-dimensional Fourier transform II
Low pass filtering
146
Two-dimensional Fourier transform III
High pass filtering
147
Functional Data Analysis: Introduction I
Definition (functional data)
Functional data indicate that the collected (observed) data are observations
of functions varying over a continuum.
Functional Data Analysis , J.O. Ramsay and B. W. Silverman, Springer 2006.
148
Functional Data Analysis: Introduction II
Functional Data Analysis (FDA) aims at:
representing data in ways that help further analysis.
displaying the data so as to highlight various characteristics.
studying important sources of patterns and variation among the data.
149
Functional Data Analysis: Introduction III
Example: Height of Girls
150
Functional Data Analysis: Introduction IV
10 subjects took part in the study.
Each record has 31 discrete values of height that are not equally spaced.
There is uncertainty, or noise, in the collected height values (about 3 mm).
The underlying process can be assumed to be a smooth function µj(t) for
each subject j taking part in the study, and observations have been
collected for the stochastic process
151
Functional Data Analysis: Introduction V
For each record j, it is interesting to calculate an estimate of the function
µj(t) (Linear Smoothing methods).
When having a family of functions {µj(t)}j=1,··· ,J , it is interesting to
investigate the variations of this family of functions (functional Principal
Component Analysis).
For simplicity here, we are considering temporal stochastic processes. All
methods can easily be extended to spatio-temporal stochastic processes.
152
Linear Smoothing: Introduction I
We consider that we have observations {(t(i), s(i))}i=1,··· ,n
Definition (Linear smoothing)
A linear smoother estimates the function value s(t) by a linear combination of
the discrete observations:
s(t) = Σ_{i=1}^{n} λi(t) s(i) = < ~λ(t)|~s >
The behaviour of the smoother at t is controlled by the weights λi(t).
153
Linear Smoothing: Introduction II
Example of linear smoothing methods:
1 Kriging ✓ (seen already)
2 Regression on a basis of functions
3 Nadaraya-Watson estimator
Note that these methods have different hypotheses concerning the nature of
the noise ε.
154
Linear Smoothing: Regression on basis of functions I
A model can be proposed for the deterministic part of the process as
follows:
µ(t) = Σ_{k=0}^{K} θk φk(t) = < ~φ(t)|~θ >
The standard model for the error ε is to be N(0, σ²) and independent of
time (so E[s(t)] = µ(t) and Var[s(t)] = σ²). Observations collected are
{(t(i), s(i))}i=1,··· ,n
155
Linear Smoothing: Regression on basis of functions II
The coefficients {θk}k=0,··· ,K are estimated using least squares:
~θ = (Φ^T Φ)^{−1} Φ^T ~s
where ~s = [s(1), s(2), · · · , s(n)]^T and Φ is the n × (K + 1) matrix collecting the
values {φk(t(i))}. So the estimate for E[s(t)] is:
s(t) = < ~φ(t)|~θ > = ~φ(t)^T (Φ^T Φ)^{−1} Φ^T ~s = ~S(t)^T ~s
Exercise: What are the basis functions {φk}k=0,··· ,K that can be used?
156
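A minimal numpy sketch of this estimator with a Fourier basis; the synthetic signal, noise level, and choice K = 3 are arbitrary assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
t = np.sort(rng.uniform(0, 1, n))
s = np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t) + rng.normal(0, 0.2, n)

def fourier_design(t, K, omega=2 * np.pi):
    """Columns 1, sin(w t), cos(w t), sin(2w t), cos(2w t), ..., up to order K."""
    cols = [np.ones_like(t)]
    for k in range(1, K + 1):
        cols += [np.sin(k * omega * t), np.cos(k * omega * t)]
    return np.column_stack(cols)

Phi = fourier_design(t, K=3)
theta = np.linalg.solve(Phi.T @ Phi, Phi.T @ s)    # (Phi^T Phi)^{-1} Phi^T s
s_hat = Phi @ theta                                # smoothed values at the t(i)
```

The recovered coefficients sit close to the true ones (1 for sin(2πt), 0.5 for cos(4πt)), which is the behaviour the least-squares formula above predicts.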
Linear Smoothing: Regression on basis of functions III
Example: Vancouver precipitation data with 13 Fourier bases.
157
Linear Smoothing: Regression on basis of functions IV
Choice of Basis Expansions.
When performing least squares fitting of basis expansions, φ0(t), φ1(t), · · · is a
basis system for s. The choice of this basis system is important, in particular if
you want to explore time derivatives of s(t).
158
Linear Smoothing: Regression on basis of functions V
Monomial Basis φ0(t) = 1, φ1(t) = t, φ2(t) = t2, · · · , φK(t) = tK
Properties:
▸ numerically difficult for K > 6
▸ derivatives of φk(t) get simpler, but this is often not a desirable property
when fitting real-world data.
159
Linear Smoothing: Regression on basis of functions VI
[Panels: monomial-basis fits for K = 0, 1, 2, 3.]
160
Linear Smoothing: Regression on basis of functions
VII
Fourier basis:
{1, sin(ωt), cos(ωt), sin(2ωt), cos(2ωt), sin(3ωt), cos(3ωt), · · · , sin(Kωt), cos(Kωt)}
Properties:
▸ natural basis for periodic data
▸ derivatives retain complexity.
161
Linear Smoothing: Regression on basis of functions
VIII
[Panels: Fourier-basis fits for K = 3, 4, 5, 6.]
162
Linear Smoothing: Regression on basis of functions IX
Splines. Splines are polynomial segments joined end-to-end. They are
constrained to be smooth at the joints (called knots). The order (order =
degree+1) of the polynomial segments and the location of the knots
define the system.
163
Linear Smoothing: Regression on basis of functions X
[Panels: spline fits for K = 1, 2, 3, 4.]
164
Linear Smoothing: Regression on basis of functions XI
Other bases:
wavelets
...
How to choose the number K of functions?
Information Criterion: AIC, BIC.
Cross-Validation
165
Linear Smoothing: Nadaraya-Watson estimator I
The model is s(t) = µ(t) + ε(t) with E(ε(t)) = 0.
The estimate s(t) is computed as an expectation of s at time t:
E[s|t] = E[s(t)] = E[µ(t) + ε(t)] = µ(t)
By definition of expectation:
E[s|t] = ∫ s p_{s|t}(s|t) ds = ∫ s p_{st}(s, t) ds / p_t(t)
Note how Bayes' formula is used.
166
Linear Smoothing: Nadaraya-Watson estimator II
Since we don't know the true densities p_{st}(s, t) and p_t(t), the idea of the
Nadaraya-Watson estimator is to replace them by density estimates such
that the integral is easy to compute and the solution provides a smooth
estimate of s(t).
Consequently the Nadaraya-Watson estimator can be written:
E[s(t)] ≈ ∫ s p̂_{st}(s, t) ds / p̂_t(t)
= Σ_{i=1}^{n} (1/h) k((t− t(i))/h) s(i) / Σ_{i=1}^{n} (1/h) k((t− t(i))/h)
= Σ_{i=1}^{n} Si(t) s(i) = s(t)
167
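A compact numpy sketch of the estimator with a Gaussian kernel; the test signal and the bandwidth h = 0.05 are arbitrary choices for this example:

```python
import numpy as np

def nadaraya_watson(t_eval, t_obs, s_obs, h):
    """s_hat(t) = sum_i k((t - t_i)/h) s_i / sum_i k((t - t_i)/h)
    with a Gaussian kernel k (the 1/h factors cancel in the ratio)."""
    u = (t_eval[:, None] - t_obs[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    return (w * s_obs).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(5)
t_obs = np.sort(rng.uniform(0, 1, 300))
s_obs = np.sin(2 * np.pi * t_obs) + rng.normal(0, 0.3, 300)
t_grid = np.linspace(0.1, 0.9, 9)
s_hat = nadaraya_watson(t_grid, t_obs, s_obs, h=0.05)
```

A small h follows the data closely while a large h gives a smoother but more biased estimate, which is the bandwidth trade-off discussed next.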
Linear Smoothing: Nadaraya-Watson estimator III
Choosing the kernel k as the Dirac function is equivalent to using the
empirical density estimates for p_{st}(s, t) and p_t(t).
The resulting estimate for s(t) would not be smooth with the Dirac kernel,
and other kernels can be used (e.g. Gaussian, uniform, quadratic,
triangle, Epanechnikov, etc.).
168
Linear Smoothing: Nadaraya-Watson estimator IV
Definition (Kernel Density Estimate)
Consider a random variable x for which observations {x(i)}i=1,··· ,n have been
collected. Choosing an even function k such that ∫ k(x) dx = 1 and k(x) ≥ 0 ∀x,
then a kernel density estimate (KDE) for the density function px is:
px(x) = (1/nh) Σ_{i=1}^{n} k( (x− x(i))/h )
h is called the bandwidth and is a parameter controlling the level of smoothing.
169
Linear Smoothing: Conclusion and Extensions I
Conclusion:
Nadaraya-Watson estimator: the bandwidth needs to be
chosen/estimated.
Regression on a basis of functions: {θk} needs to be estimated, the basis
{φk(t)} needs to be chosen, and the number K of functions needs to be
selected.
170
Linear Smoothing: Conclusion and Extensions II
Extension: Regularised basis approach
When fitting the model µ(t) = Σ_{k=1}^{K} θk φk(t), one may add a smoothness
constraint to ensure that the estimated function is smooth.
If the basis of functions is differentiable, then D²µ(t) can be computed
and a constraint (prior) can be added to the cost function C used to estimate
the coefficients:
C(θ1, · · · , θK) = Σ_{i=1}^{n} (s(i) − µ(t(i)))² + λ ∫ [D²µ(t)]² dt
171
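A numpy sketch of this penalised criterion with a Fourier basis, where the roughness matrix R_kl = ∫ D²φk(t) D²φl(t) dt is approximated on a fine grid (the data, K, and λ below are arbitrary assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
n, K, omega = 100, 8, 2 * np.pi
t = np.sort(rng.uniform(0, 1, n))
s = np.sin(2 * np.pi * t) + rng.normal(0, 0.3, n)

def design(t, second_deriv=False):
    """Fourier design matrix; optionally its second derivative, using
    D2 sin(kw t) = -(kw)^2 sin(kw t) (and similarly for cos)."""
    cols = [np.zeros_like(t) if second_deriv else np.ones_like(t)]
    for k in range(1, K + 1):
        a = -(k * omega) ** 2 if second_deriv else 1.0
        cols += [a * np.sin(k * omega * t), a * np.cos(k * omega * t)]
    return np.column_stack(cols)

grid = np.linspace(0, 1, 2001)
D2 = design(grid, second_deriv=True)
R = D2.T @ D2 * (grid[1] - grid[0])     # penalty matrix for int [D2 mu]^2 dt

Phi = design(t)

def fit(lam):
    """Minimise ||s - Phi theta||^2 + lam * theta^T R theta (normal equations)."""
    return np.linalg.solve(Phi.T @ Phi + lam * R, Phi.T @ s)

theta_rough, theta_smooth = fit(0.0), fit(1e-6)
```

Increasing λ shrinks the high-frequency coefficients, so the penalised fit is never rougher than the unpenalised one.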
Linear Smoothing: Conclusion and Extensions III
Another example: consider growth, exponential growth or decay (e.g.
population, radioactive decay, economic indicators); we know that such a
system follows an ODE:
Lx = −γx + Dx = 0
Then fitting the model µ(t) = Σ_{k=1}^{K} θk φk(t) can be constrained with
C(θ1, · · · , θK) = Σ_{i=1}^{n} (s(i) − µ(t(i)))² + λ ∫ [Lµ(t)]² dt
where γ, if unknown, can be estimated.
172
Linear Smoothing: Conclusion and Extensions IV
Depending on the problem, derivatives can be used for regularisation,
adding a prior for the type of solutions we are looking for (e.g. smoothness,
or being subject to a known ODE).
Derivatives can also be the subject of interest in the study. For
instance in mechanics, if you collect spatial positions over time
(trajectories), one may be interested in studying the velocity (first
derivative) or acceleration (second derivative).
Linear Smoothing: Conclusion and Extensions V
Example: Height of Girls
174
Linear Smoothing: Conclusion and Extensions VI
175
Functional PCA I
Definition (Mean and variance functions)
Consider the functions {µj(t)}j=1,··· ,J that are observed instances of the
random function s(t); the mean function is the average of the functions:
E[s(t)] = µ̄(t) = (1/J) Σ_{j=1}^{J} µj(t)
and the variance function is:
Var[s(t)] = (1/(J − 1)) Σ_{j=1}^{J} (µj(t) − µ̄(t))²
176
Functional PCA II
Example (Height of Girls)
Height(t) = (1/10) Σ_{i=1}^{10} Heighti(t)
VarHeight(t) = (1/(10 − 1)) Σ_{i=1}^{10} (Heighti(t) − Height(t))²
177
Functional PCA III
FPCA, Principal Component Analysis for functions:
We compute the mean µ̄(t) and subtract it from each curve:
µ̃j(t) = µj(t) − µ̄(t)
Let v(t1, t2) = E[µ̃(t1) µ̃(t2)] be the covariance function; it can be
estimated by the sample covariance:
v̂(t1, t2) = (1/J) Σ_{j=1}^{J} µ̃j(t1) µ̃j(t2)
An eigencurve u(t) is then computed such that:
∀t1, ∫ v(t1, t) u(t) dt = λ u(t1) subject to ∫ u(t)² dt = 1
178
Functional PCA IV
A simple approach is to discretise the functions µ̃j(t) to a fine grid of N
sites equally spaced on the time line. This allows these functions to be
treated as finite-dimensional vectors, and standard PCA can be applied to
estimate the eigenvectors.
The continuous principal component u(t) is recovered by interpolating the
discrete eigenvector.
The choice of interpolation method does not matter when recovering the
eigencurve from the eigenvector as long as the spacing between the N
sampling sites is small.
This is the earliest approach to functional principal component analysis.
179
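A numpy sketch of this discretised FPCA on synthetic curves built from two known modes of variation (the grid, the modes, and the score variances are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(7)
N, J = 200, 50                          # grid sites and number of curves
t = np.linspace(0, 1, N)

# synthetic curves: common mean plus two orthogonal modes of variation
u1 = np.sqrt(2) * np.sin(2 * np.pi * t)
u2 = np.sqrt(2) * np.cos(2 * np.pi * t)
scores = rng.normal(0.0, [3.0, 1.0], size=(J, 2))
curves = 5 + t[None, :] + scores @ np.vstack([u1, u2])

# centre, form the discretised covariance v(t1, t2), and diagonalise it
centred = curves - curves.mean(axis=0)
V = centred.T @ centred / J
eigval, eigvec = np.linalg.eigh(V)      # eigenvalues in ascending order
pc1 = eigvec[:, -1] * np.sqrt(N)        # rescale so that int pc1^2 dt ~ 1
```

The leading eigenvector recovers, up to sign, the dominant mode u1, since its score variance (9) dwarfs that of u2 (1).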
Transforms I
Definition (Integral transform)
An integral transform T is defined as:
T[f(t)] ≡ F(u) = ∫_{t1}^{t2} K(t, u) f(t) dt
where K is the kernel function. This can also be understood as a linear
combination ('continuous sum') over a basis of functions K.
Example
The Laplace transform is an example of an integral transform with
K(t, u) = exp(−u t).
The Fourier transform is another example, with K(t, u) = exp(−ı u t).
180
Transforms II
Another useful example is the Radon transform:
Definition (Radon transform)
Given a function f defined on a domain x = (x, y) ∈ R², the Radon transform
of f is its integral along the line of equation ρ − x cos θ − y sin θ = 0, i.e.:
Rf(ρ, θ) = ∫_R ∫_R δ(ρ − x cos θ − y sin θ) f(x, y) dx dy
= ∫_{R²} δ(ρ − x^T n) f(x) dx
with δ(·) the Dirac delta function, ρ ∈ R, θ ∈ [−π/2, π/2] and n = (cos θ, sin θ)^T .
181
Transforms III
[Figure: geometry of the Radon transform: the line ρ − xᵀn = 0, with unit normal n at angle θ and signed distance ρ from the origin, in the (x, y) plane.]
182
Transforms IV
Exercises:
1 Assuming that f(x) = p_x(x) is the probability density function of x, what is Rf(ρ, θ) when θ = 0 and θ = π/2?
2 Assuming that f(x) = p_x(x) is the probability density function of x, is Rf(ρ, θ) a probability density function?
183
Application Radon Transform I
1 Consider the r.v. \( \mathbf{x} = (x, y) \) for which only one observation \( \mathbf{x}^{(1)} \) is available. Then the empirical density function is:
\[ p_{\mathbf{x}}(\mathbf{x}) = \delta(\mathbf{x} - \mathbf{x}^{(1)}) = \delta(x - x^{(1)})\, \delta(y - y^{(1)}) \]
What is the Radon transform of \( p_{\mathbf{x}}(\mathbf{x}) \)?
2 With two observations:
\[ p_{\mathbf{x}}(\mathbf{x}) = \frac{1}{2} \left[ \delta(\mathbf{x} - \mathbf{x}^{(1)}) + \delta(\mathbf{x} - \mathbf{x}^{(2)}) \right] \]
What is the Radon transform of \( p_{\mathbf{x}}(\mathbf{x}) \)?
184
Application Radon Transform II
[Figure: observed points in the (x, y) plane.]
185
Application Radon Transform III
[Figure: Radon transform of the data, shown in (θ, ρ) coordinates with θ ∈ [0°, 180°] and ρ ∈ [−60, 60].]
186
Application Radon Transform IV
[Figure: a second data set in the (x, y) plane.]
187
Application Radon Transform V
[Figure: the corresponding Radon transform in (θ, ρ) coordinates, θ ∈ [0°, 180°], ρ ∈ [−60, 60].]
Conclude on how the Radon transform can be used to detect perfect lineaments.
188
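One hedged way to see the detection idea in code: a discrete Radon transform can be approximated by rotating the image and summing its rows (scipy's `rotate` is used here; this is a sketch, not the method used to produce the slides' figures). A straight line in the image collapses into a single bright peak in the (θ, ρ) sinogram, whose coordinates give the line's orientation and offset:

```python
import numpy as np
from scipy.ndimage import rotate

# Test image: a single vertical line at column x = 60.
img = np.zeros((101, 101))
img[:, 60] = 1.0

# Approximate Radon transform: rotate, then project by summing over rows.
angles = np.arange(180)                          # theta in degrees
sinogram = np.column_stack(
    [rotate(img, ang, reshape=False, order=1).sum(axis=0) for ang in angles]
)

# The line shows up as the global maximum of the sinogram.
rho_idx, theta_idx = np.unravel_index(sinogram.argmax(), sinogram.shape)
print(angles[theta_idx], rho_idx)                # peak at angle 0, column 60
```

Thresholding the sinogram's peaks recovers all straight lines in an image at once, which is the basis of lineament detection.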
Application Radon Transform VI
Remarks:
The Hough transform is a method closely related to the Radon transform, used to find shapes (lines, circles, etc.).
The formulation of the Radon transform itself has been generalised, for instance to quadratic shapes:
The general quadratic Radon transform, Koen Denecker, Jeroen Van Overloop and Frank Sommen, Inverse Problems 14 (1998) 615-633
189
Application Radon Transform VII
Example: Archaeology
Detecting Roman land boundaries in aerial photographs using Radon transforms, D.J. Bescoby, Journal of Archaeological Science, Volume 33, Issue 5, May 2006, Pages 735-743
190
Application Radon Transform VIII
Example: crater detection on Mars
Method for Crater Detection From Martian Digital Topography Data Using Gradient Value/Orientation, Morphometry, Vote Analysis, Slip Tuning, and Calibration, Goran Salamuniccar and Sven Loncaric, IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 5, May 2010
191