1MaPhySto Workshop 9/04
Nonlinear Time Series Modeling
Part III: Nonlinear and Non-Gaussian State-Space Models

Richard A. Davis
Colorado State University
(http://www.stat.colostate.edu/~rdavis/lectures)

MaPhySto Workshop
Copenhagen
September 27-30, 2004
2MaPhySto Workshop 9/04
Part III: Nonlinear and Non-Gaussian State-Space Models

1. Introduction
   1.1 Motivating examples
   1.2 Linear state-space models
       Setup
       Random walk + noise
       Local linear trend + seasonality
       Kalman filtering and smoothing
       Other applications (missing values)
   1.3 Generalized state-space models
2. Observation-driven models
   2.1 GLARMA models for time series of counts
       Properties
       Existence and uniqueness of stationary distributions
       Maximum likelihood estimation and asymptotic theory
       Application to asthma data
   2.2 GLARMA extensions
       Bernoulli
   2.3 Other (BIN)
3MaPhySto Workshop 9/04
3. Parameter-driven models
3.1 Estimation
GLM
Importance sampling
Approximation to the likelihood
3.2 Simulation and Application
Time series of counts
Stochastic volatility
3.3 How good is the posterior approximation?
Posterior mode vs posterior mean
3.4 Application to estimating structural breaks
Poisson model
Stochastic volatility model
4MaPhySto Workshop 9/04
1.1 Motivating Examples: Daily Asthma Presentations (1990:1993)
[Figure: daily asthma presentation counts for the years 1990, 1991, 1992, and 1993, plotted by month (Jan-Dec); counts range from 0 to about 14 per day.]
5MaPhySto Workshop 9/04
Example: Pound-Dollar Exchange Rates (Oct 1, 1981 - Jun 28, 1985; Koopman website)

[Figure: daily log returns of the exchange rates, together with the sample ACF of the returns, of the squared returns, and of the absolute returns (lags 0-40).]
MaPhySto Workshop 9/04
1.2 Linear State-Space Models: Setup
Observation equation:
(1) Yt = Gt Xt + Wt , Wt ~ WN(0, Rt )
State equation:
(2) Xt+1 = Ft Xt + Vt , Vt ~ WN(0, Qt )
(Wt and Vt are uncorrelated.)
Def: Yt has a state-space representation if there exists a state-space
model for Yt given by (1) and (2).
MaPhySto Workshop 9/04
Example (ARMA(1,1)):
Suppose Yt follows the recursions
Yt = φ Yt-1 + Zt + θ Zt-1 , Zt~WN(0,σ2).
Let Xt be the AR(1) process defined by
Xt = φ Xt-1 + Zt
so that
Setting Xt = [Xt-1, Xt]', we obtain the state-space representation

(state equation)
   Xt+1 = [[0, 1], [0, φ]] Xt + [0, Zt+1]'

(observation equation)
   Yt = [θ, 1] Xt
MaPhySto Workshop 9/04
ARMA(1,1) cont (canonical observable representation):
Suppose Yt follows the recursions
Yt = φ Yt-1 + Zt + θ Zt-1 , Zt~WN(0,σ2).
and again let Xt be the AR(1) process defined by
Xt+1 = φ Xt + (φ + θ) Zt (state eqn)
Then
Yt = Xt+ Zt (observation eqn)
To see this, note
Yt − φ Yt-1 = Xt− φ Xt-1 + Zt − φ Zt-1
= (φ + θ) Zt-1 + Zt − φ Zt-1
= Zt + θ Zt-1
(Unlike the previous representation this state equation has dimension 1.)
MaPhySto Workshop 9/04
Random walk plus noise
The observed process is modeled as
Yt = Xt + Wt , Wt~WN(0,σw2),
where the local level Xt, follows the random walk
Xt+1 = Xt + Vt , Vt~WN(0,σv2).
Note that the differenced data follows a MA(1) process, i.e.,
Dt = Yt − Yt-1 = Xt − Xt-1 + Wt − Wt-1 = Vt-1 + Wt − Wt-1
has lag-1 correlation
   ρ(1) = −σw2 / (σv2 + 2σw2),
and ρ(1) = −1/2 if and only if σv2 = 0.
This latter condition is equivalent to the signal being constant.
MaPhySto Workshop 9/04
Random walk plus noise (cont)
The differenced process Dt can be expressed as the MA(1) process
Dt = Zt + θ Zt-1 , Zt~WN(0,σ2).
Matching up lag-1 correlations, we have
   ρ(1) = θ/(1 + θ2) = −σw2 / (σv2 + 2σw2),
so that the signal is constant if and only if θ = −1. This is referred to as the unit root problem.

Prediction. The best linear predictor of Yt+1 in terms of Y1, . . . , Yt is given by
   Ŷt+1 = Yt + (1 − at)(Ŷt − Yt) = at Yt + (1 − at) Ŷt .
For large t, at converges to θ + 1. The one-step predictor is given by exponential smoothing, which is known to be optimal for ARIMA(0,1,1) models!
MaPhySto Workshop 9/04
Local linear trend plus seasonality model
This model takes the form
Yt = Mt + St + Wt
trend seasonal noise
where
   Mt = Mt-1 + Bt-1 + Vt1 ,     (local linear trend)
   Bt = Bt-1 + Vt2 ,
   St = −St-1 − St-2 − . . . − St-11 + Vt3 .     (seasonal component with period 12)

State variable: Xt = [Mt, Bt, St, . . . , St-10]', so that
   Xt+1 = F Xt + Vt ,
   Yt = [1 0 1 0 . . . 0] Xt + Wt .
MaPhySto Workshop 9/04
Kalman Filter
Recursive method for producing:
• predictors of the signal P( Xt+1 | Y1, . . . , Yt).
• predictors of the series P( Yt+1 | Y1, . . . , Yt).
• filtered values of the signal P( Xt+1 | Y1, . . . , Yt+1).
• Gaussian likelihood of (Y1, . . . , Yn ).
Remarks:
1. Excellent method for calculating best predictors of missing
observations.
2. Can handle computation of Gaussian likelihood with missing
observations.
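As a concrete illustration (not from the original slides), here is a minimal sketch of the Kalman recursions for the random walk plus noise model Yt = Xt + Wt, Xt+1 = Xt + Vt described above; the noise variances, simulated data, and seed are arbitrary choices made for the example.

import numpy as np

def kalman_filter_local_level(y, sigma_w2, sigma_v2, x0=0.0, p0=1e7):
    # One-dimensional Kalman filter for Y_t = X_t + W_t, X_{t+1} = X_t + V_t.
    # Returns one-step predictors X_{t|t-1}, their MSEs, and the Gaussian log-likelihood.
    n = len(y)
    x_pred = np.empty(n)          # X_{t|t-1}
    p_pred = np.empty(n)          # MSE of X_{t|t-1}
    loglik = 0.0
    x, p = x0, p0                 # (near-)diffuse initialization
    for t in range(n):
        x_pred[t], p_pred[t] = x, p
        f = p + sigma_w2          # innovation variance
        v = y[t] - x              # innovation
        loglik += -0.5 * (np.log(2.0 * np.pi * f) + v * v / f)
        k = p / f                 # Kalman gain
        x, p = x + k * v, p * (1.0 - k)   # filtered values X_{t|t}, P_{t|t}
        p = p + sigma_v2          # predict: X_{t+1|t} = X_{t|t}, P_{t+1|t} = P_{t|t} + sigma_v2
    return x_pred, p_pred, loglik

rng = np.random.default_rng(0)
x_true = np.cumsum(rng.normal(scale=0.5, size=200))   # random walk signal
y = x_true + rng.normal(scale=1.0, size=200)          # noisy observations
x_pred, p_pred, ll = kalman_filter_local_level(y, sigma_w2=1.0, sigma_v2=0.25)
print(ll)

For large t the gain k converges to a constant, and the one-step predictor reduces to the exponential smoothing recursion noted for the differenced MA(1) representation.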
13MaPhySto Workshop 9/04
Example (airpass data):

[Figure: monthly international airline passenger counts (thousands), 1949-1961, with the one-step predictors plotted against the actual data.]
14MaPhySto Workshop 9/04
Predicted state components.

[Figure: predicted local level, slope, and seasonal components, 1949-1961 (thousands).]
15MaPhySto Workshop 9/04
Example (power usage): (with Sarah Streett)

[Figure: power usage series for 1992, 1993, 1994, and 1995.]
16MaPhySto Workshop 9/04
Goal: Forecast maximum power demand 1-2 weeks in advance.
Legendre polynomials: orthogonal polynomials with respect to dx on [0,1]:
   f0(x) = 1, f1(x) = 2x − 1, f2(x) = 6x2 − 6x + 1, . . .

Tentative regression model:
   Yt = β0 p0(t) + β1 p1(t) + β2 p2(t) + β3 p3(t) + β4 p5(t) + β5 p7(t) + β6 p9(t) + εt ,   t = 1, 2, …, 183,
where pj(t) = fj(t/183).

State-space formulation:
   Yt = pt βt + Wt ,    (observation equation)
   βt = βt-1 + Vt .     (state equation)
17MaPhySto Workshop 9/04
Polynomial fits (using an asymmetric loss function).

[Figure: Legendre polynomial fits for 1992, 1993, 1994, and 1995; o: Y(t), *: Yhat(t).]
18MaPhySto Workshop 9/04
The model:
Example: power usage (cont)
19MaPhySto Workshop 9/04
The model:
Example: power usage (cont)
Also, the variance is modeled as
where R(t) is the amount of rainfall on day t. If there is a large rainfall on a given day, then only small changes in the coefficients are allowed for the next several days.
20MaPhySto Workshop 9/04
Kalman recursions and prediction (Example: power usage, cont.)
21MaPhySto Workshop 9/04
Robust Kalman filter (Cipra and Romera 1991) (Example: power usage, cont.)
22MaPhySto Workshop 9/04
One-step predictions using the robustified Kalman filter:

[Figure: 1992 one-step predictions; o: Y(t), *: Yhat(t).]
23MaPhySto Workshop 9/04
One-step predictions using the robustified Kalman filter:

[Figure: 1993 one-step predictions; o: Y(t), *: Yhat(t).]
24MaPhySto Workshop 9/04
One-step predictions using the robustified Kalman filter:

[Figure: 1994 one-step predictions; o: Y(t), *: Yhat(t).]
25MaPhySto Workshop 9/04
One-step predictions using the robustified Kalman filter:

[Figure: 1995 one-step predictions; o: Y(t), *: Yhat(t).]
26MaPhySto Workshop 9/04
1.3 Generalized State-Space Models
Observations: y(t) = (y1, . . ., yt )
States: α(t) = (α1, . . ., αt )
Observation equation:
p(yt | αt ):= p(yt | αt , α(t-1), y(t-1) )
State equation:
- observation-driven: p(αt+1 | y(t)) := p(αt+1 | αt , α(t-1), y(t))
- parameter-driven:   p(αt+1 | αt) := p(αt+1 | αt , α(t-1), y(t))
27MaPhySto Workshop 9/04
Example for Time Series of Counts
Count data: Y1, . . . , Yn
Regression (explanatory) vector: xt
Model: Given xt and a stochastic process αt, the Yt are independent Poisson distributed with mean
µt = exp(xtT β + αt).
The distribution of the stochastic process αt may depend on a vector of
parameters γ.
Note: αt = 0 corresponds to standard Poisson regression model.
Primary objective: Inference about β.
28MaPhySto Workshop 9/04
Parameter-Driven Specification
Parameter-driven specification: (Assume Yt | µt is Poisson(µt ))
log µt = xtT β + αt ,
where αt is a stationary Gaussian process. e.g. (AR(1) process)
(αt + σ2/2) = φ(αt-1 + σ2/2) + εt , εt ~IID N(0, σ2(1-φ2)).
Advantages:
• properties of the model (ergodicity and mixing) are easy to derive;
• the regression parameters are interpretable:
   E(Yt) = exp(xtT β) E[exp(αt)] = exp(xtT β), if E[exp(αt)] = 1.

Disadvantages:
• estimation is difficult: the likelihood function is not easily calculated (MCEM, importance sampling, estimating equations);
• model building can be laborious;
• prediction is more difficult.
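A minimal simulation sketch of the parameter-driven Poisson model with the lognormal AR(1) latent process above (the regressors, parameter values, and seed are made up for illustration):

import numpy as np

def simulate_parameter_driven_poisson(x, beta, phi, sigma2, rng):
    # Y_t | alpha_t ~ Poisson(exp(x_t' beta + alpha_t)), with
    # (alpha_t + sigma2/2) = phi (alpha_{t-1} + sigma2/2) + eps_t,
    # eps_t ~ IID N(0, sigma2 (1 - phi^2)), so that E exp(alpha_t) = 1.
    n = x.shape[0]
    alpha = np.empty(n)
    a = rng.normal(scale=np.sqrt(sigma2))      # stationary start for the centered process
    for t in range(n):
        alpha[t] = a - sigma2 / 2.0
        a = phi * a + rng.normal(scale=np.sqrt(sigma2 * (1.0 - phi**2)))
    mu = np.exp(x @ beta + alpha)
    return rng.poisson(mu), alpha

rng = np.random.default_rng(1)
n = 500
x = np.column_stack([np.ones(n), np.cos(2 * np.pi * np.arange(n) / 12.0)])  # hypothetical regressors
y, alpha = simulate_parameter_driven_poisson(x, beta=np.array([0.7, 0.3]), phi=0.5, sigma2=0.3, rng=rng)
print(y[:10], y.mean())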
29MaPhySto Workshop 9/04
Observation-Driven Specification
Observation-driven specification: (Assume Yt | µt is Poisson(µt ))
log µt = xtT β + αt ,
where αt is a function of past observations Ys , s < t, e.g. αt = γ1Yt-1 + … + γpYt-p .

Advantages:
• likelihood is easy to calculate;
• prediction is straightforward (at least one lead-time ahead).

Disadvantages:
• stability behavior, such as stationarity and ergodicity, is difficult to derive;
• xtT β is not easily interpretable: in the special case above,
   E(Yt) = exp(xtT β) E[exp(γ1Yt-1 + … + γpYt-p)].
30MaPhySto Workshop 9/04
Financial Time Series Example-cont
A parameter-driven model for financial data (Taylor `86):Model:
Yt = σt Zt , Zt~IID N(0,1)
αt = γ + φαt-1 + εt , εt~IID N(0,σ2),
where αt = 2 log σt .
The resulting observation and state transition densities are
p(yt| αt) = n(yt ; 0, exp(αt ))
p(αt+1 | αt) = n(αt+1 ; γ + φαt , σ2 )
Properties:
• Martingale difference sequence.
• Stationary.
• Strongly mixing at a geometric rate.
• Estimation can be difficult-cannot calculate likelihood in closed form.
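A short simulation sketch of this stochastic volatility model (parameter values are illustrative only, not estimates from the slides):

import numpy as np

def simulate_sv(n, gamma, phi, sigma2, rng):
    # Y_t = sigma_t Z_t with alpha_t = 2 log sigma_t,
    # alpha_t = gamma + phi alpha_{t-1} + eps_t, eps_t ~ IID N(0, sigma2).
    alpha = np.empty(n)
    a = rng.normal(loc=gamma / (1.0 - phi), scale=np.sqrt(sigma2 / (1.0 - phi**2)))  # stationary start
    for t in range(n):
        alpha[t] = a
        a = gamma + phi * a + rng.normal(scale=np.sqrt(sigma2))
    y = np.exp(alpha / 2.0) * rng.normal(size=n)   # sigma_t = exp(alpha_t / 2)
    return y, alpha

rng = np.random.default_rng(2)
y, alpha = simulate_sv(800, gamma=-0.4, phi=0.95, sigma2=0.05, rng=rng)
print(np.std(y))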
31MaPhySto Workshop 9/04
Financial Time Series Example
An observation-driven model for financial data. Model (GARCH(p,q)):
   Yt = σt Zt ,   Zt ~ IID N(0,1),
   σt2 = α0 + α1 Yt-12 + . . . + αp Yt-p2 + β1 σt-12 + . . . + βq σt-q2 .

Special case (ARCH(1) = GARCH(1,0)): the resulting observation and state transition density/equations are
   p(yt | σt) = n(yt ; 0, σt2),
   σt2 = α0 + α1 Yt-12 .
Properties:
• Martingale difference sequence.
• Stationary for α1 ∈ [0, 2eE), where E is Euler's constant.
• Strongly mixing at a geometric rate.
• For general ARCH (GARCH), properties are difficult to establish.
32MaPhySto Workshop 9/04
2. Observation-Driven Models

2.1 GLARMA models for time series of counts
    Properties
    Existence and uniqueness of stationary distributions
    Maximum likelihood estimation and asymptotic theory
    Application to asthma data
2.2 GLARMA extensions
    Bernoulli case
2.3 Other
    BIN
33MaPhySto Workshop 9/04
2.1 Generalized Linear ARMA (GLARMA) Model for Poisson Counts

Model: Yt | µt is Poisson(µt ) with log µt = xtT β + αt .

Two components in the specification of αt (see also Shephard (1994)).

1. Uncorrelated (martingale difference) sequence: for λ > 0, define
   et = (Yt − µt) / µt^λ .
   (Specification of λ will be described later.)

2. Form a linear process driven by the MGD sequence et :
   log µt = xtT β + αt ,   αt = Σ_{i=1}^∞ ψi et-i .

Since the conditional mean µt is based on the whole past, the model is no longer Markov.
34MaPhySto Workshop 9/04
Properties of the Model
1. et is a MG difference sequence E(et | Ft-1) = 0
2. et is an uncorrelated sequence (follows from 1)
3. E(et2) = E(µt^{1−2λ}) = 1 if λ = .5.

4. Set
   Wt = log µt = xtT β + αt ,   et = (Yt − µt)/µt^λ ,   αt = Σ_{i=1}^∞ ψi et-i ,
so that
   E(Wt) = xtT β
and
   Var(Wt) = Σ_{i=1}^∞ ψi2 E(µt-i^{1−2λ}) = Σ_{i=1}^∞ ψi2   (if λ = .5).
35MaPhySto Workshop 9/04
Properties continued
5. Cov(Wt, Wt+h) = Σ_{i=1}^∞ ψi ψi+h E(µt-i^{1−2λ}).

It follows that Wt has properties similar to the latent process specification:
   Wt = xtT β + Σ_{i=1}^∞ ψi et-i ,
which, by using the results for the latent process case and assuming the linear process part is nearly Gaussian, gives
   E(e^{Wt}) = E(e^{xtT β + Σi ψi et-i}) ≈ e^{xtT β + Σi ψi2/2}.
By adjusting the intercept term, E(µt) can be interpreted as exp(xtT β).
36MaPhySto Workshop 9/04
Properties continued
6. (GLARMA model). Let Ut be an ARMA process driven by the MGD sequence et , i.e.,
   Ut = φ1Ut-1 + . . . + φpUt-p + et + θ1et-1 + . . . + θqet-q .
Then the best predictor of Ut based on the infinite past is
   Ût = Σ_{i=1}^∞ ψi et-i ,
where
   ψ(z) = Σ_{i=1}^∞ ψi z^i = θ(z)/φ(z) − 1.
The model for log µt is then
   Wt = xtT β + Zt ,
where
   Zt = Ût = φ1(Zt-1 + et-1) + . . . + φp(Zt-p + et-p) + θ1et-1 + . . . + θqet-q .
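The Zt recursion above is easy to compute directly. A minimal sketch (λ = 1/2; regressors and parameter values are arbitrary) that simulates a Poisson GLARMA(p,q) series together with its Wt and et sequences:

import numpy as np

def simulate_glarma_poisson(x, beta, phi, theta, lam, rng):
    # Y_t | F_{t-1} ~ Poisson(mu_t), W_t = log mu_t = x_t' beta + Z_t,
    # Z_t = sum_j phi_j (Z_{t-j} + e_{t-j}) + sum_k theta_k e_{t-k},
    # e_t = (Y_t - mu_t) / mu_t**lam.
    n = x.shape[0]
    p, q = len(phi), len(theta)
    Z, e, W = np.zeros(n), np.zeros(n), np.zeros(n)
    y = np.zeros(n, dtype=int)
    for t in range(n):
        z = 0.0
        for j in range(1, p + 1):
            if t - j >= 0:
                z += phi[j - 1] * (Z[t - j] + e[t - j])
        for k in range(1, q + 1):
            if t - k >= 0:
                z += theta[k - 1] * e[t - k]
        Z[t] = z
        W[t] = x[t] @ beta + z
        mu = np.exp(W[t])
        y[t] = rng.poisson(mu)
        e[t] = (y[t] - mu) / mu**lam
    return y, W, e

rng = np.random.default_rng(3)
n = 365
x = np.column_stack([np.ones(n), np.sin(2 * np.pi * np.arange(n) / 365.0)])  # hypothetical regressors
y, W, e = simulate_glarma_poisson(x, beta=np.array([1.0, 0.3]), phi=[], theta=[0.25], lam=0.5, rng=rng)
print(y[:12])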
37MaPhySto Workshop 9/04
Existence and uniqueness of a stationary distribution in the simple case.

Consider the simplest form of the model with λ = 1, given by
   Wt = β + γ (Yt-1 − e^{Wt-1}) e^{−Wt-1} .

Theorem: The Markov process Wt has a unique stationary distribution.

Idea of proof:
• The state space is [β − γ, ∞) (if γ > 0) and (−∞, β − γ] (if γ < 0).
• Wt satisfies Doeblin's condition: there exists a probability measure ν such that, for some m > 1, ε > 0, and δ > 0, ν(A) > ε implies Pm(x, A) ≥ δ for all x.
• The chain is strongly aperiodic.
• It follows that the chain Wt is uniformly ergodic (Thm 16.0.2 (iv) in Meyn and Tweedie (1993)).
38MaPhySto Workshop 9/04
Existence of a stationary distribution in the case .5 ≤ λ < 1.

Consider the process
   Wt = β + γ (Yt-1 − e^{Wt-1}) e^{−λWt-1} .

Proposition: The Markov process Wt has at least one stationary distribution.

Idea of proof:
• Wt is weak Feller.
• Wt is bounded in probability on average, i.e., for each x, the sequence
   k−1 Σ_{i=1}^k Pi(x, ·),   k = 1, 2, . . . ,
  is tight.
• There exists at least one stationary distribution (Thm 12.0.1 in M&T).

Lemma: If a MC Xt is weak Feller and P(x, ·), x ∈ X, is tight, then Xt is bounded in probability on average and hence has a stationary distribution.

Note: For our case, we can show tightness of P(x, ·), x ∈ X, using a Markov-style inequality.
39MaPhySto Workshop 9/04
Uniqueness of the stationary distribution in the case .5 ≤ λ < 1?

Theorem (M&T `93): If the Markov process Xt is an e-chain which is bounded in probability on average, then there exists a unique stationary distribution if and only if there exists a reachable point x*.

For the process Wt = β + γ (Yt-1 − e^{Wt-1}) e^{−λWt-1}, we have
• Wt is bounded in probability uniformly over the state space.
• Wt has a reachable point x* that is a zero of the equation 0 = x* + γ exp((1 − λ) x*).
• e-chain?

Reachable point: x* is a reachable point if, for every open set O containing x*,
   Σ_{n=1}^∞ Pn(x, O) > 0 for all x.

e-chain: For every continuous f with compact support, the sequence of functions Pn f, n = 1, . . . , is equicontinuous on compact sets.
40MaPhySto Workshop 9/04
Estimation for Poisson GLARMA

Model: Yt | µt is Poisson(µt ), with
   log µt = xtT β + αt ,   αt = Σ_{i=1}^∞ ψi et-i .

Let δ = (βT, γT)T be the parameter vector for the model (γ corresponds to the parameters in the linear process part).

Log-likelihood (ignoring the constant Σt log yt!):
   L(δ) = Σ_{t=1}^n ( Yt Wt(δ) − e^{Wt(δ)} ),
where
   Wt(δ) = xtT β + Σ_{i=1}^∞ ψi(δ) et-i(δ).

First and second derivatives of the likelihood can easily be computed recursively, and Newton-Raphson methods are then implementable. For example,
   ∂L(δ)/∂δ = Σ_{t=1}^n ( Yt − e^{Wt(δ)} ) ∂Wt(δ)/∂δ ,
and the term ∂Wt(δ)/∂δ can be computed recursively.
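As a sketch of the estimation step, the same log-likelihood L(δ) can be evaluated by running the et recursion for a trial δ and handed to a generic optimizer; this avoids the recursive derivative calculations of the slides, at the price of numerical differentiation. A GLARMA(0,q) version with placeholder data:

import numpy as np
from scipy.optimize import minimize

def glarma_negloglik(delta, y, x, q, lam=0.5):
    # Negative log-likelihood -L(delta) = -sum_t (y_t W_t - exp(W_t)),
    # with delta = (beta, theta_1, ..., theta_q) for a Poisson GLARMA(0,q) model.
    k = x.shape[1]
    beta, theta = delta[:k], delta[k:]
    n = len(y)
    e = np.zeros(n)
    L = 0.0
    for t in range(n):
        z = sum(theta[j] * e[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        W = x[t] @ beta + z
        mu = np.exp(W)
        L += y[t] * W - mu
        e[t] = (y[t] - mu) / mu**lam
    return -L

rng = np.random.default_rng(4)
n = 365
x = np.column_stack([np.ones(n), np.sin(2 * np.pi * np.arange(n) / 365.0)])
y = rng.poisson(np.exp(x @ np.array([1.0, 0.3])))        # placeholder count data
start = np.zeros(x.shape[1] + 1)                         # beta = 0, theta_1 = 0
fit = minimize(glarma_negloglik, start, args=(y, x, 1), method="BFGS")
print(fit.x)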
Model: Yt | µt is Poisson(µt )
41MaPhySto Workshop 9/04
Asymptotic Results for MLE

Define the array of random variables
   ηnt = n−1/2 ( Yt − e^{Wt(δ)} ) ∂Wt(δ)/∂δ .

Properties of ηnt :
• ηnt is a martingale difference sequence.
• Σ_{t=1}^n E( ηnt ηntT | Ft-1 ) →P V(δ).
• Σ_{t=1}^n E( ηnt ηntT I(|ηnt| > ε) | Ft-1 ) →P 0.

Using a MG central limit theorem, it “follows” that
   n1/2 ( δ̂ − δ ) →D N(0, V−1),
where
   V(δ) = lim_{n→∞} (1/n) Σ_{t=1}^n e^{Wt(δ)} ( ∂Wt(δ)/∂δ )( ∂Wt(δ)/∂δ )T .
42MaPhySto Workshop 9/04
Simulation Results

Model 1: Wt = β0 + γ (Yt-1 − e^{Wt-1}) e^{−Wt-1},  n = 500, nreps = 5000.

Parameter     Mean    SD      SD (from likelihood)
β0 = 1.50     1.499   0.0263  0.0265
γ  = 0.25     0.249   0.0403  0.0408
β0 = 1.50     1.499   0.0366  0.0364
γ  = 0.75     0.750   0.0218  0.0218
β0 = 3.00     3.000   0.0125  0.0125
γ  = 0.25     0.249   0.0431  0.0430
β0 = 3.00     3.000   0.0175  0.0174
γ  = 0.75     0.750   0.0270  0.0271

Model 2: Wt = β0 + β1 t/500 + γ (Yt-1 − e^{Wt-1}) e^{−Wt-1},  n = 500, nreps = 5000.

Parameter     Mean    SD      SD (from likelihood)
β0 = 1.00     1.000   0.0286  0.0284
β1 = 0.50     0.500   0.0035  0.0034
γ  = 0.25     0.248   0.0420  0.0426
β0 = 1.50     0.998   0.0795  0.0805
β1 = -.15     -.150   0.0171  0.0173
γ  = 0.25     0.247   0.0337  0.0339
43MaPhySto Workshop 9/04
Application to Sydney Asthma Count Data
Data: Y1, . . . , Y1461 daily asthma presentations in a Campbelltown hospital.
Preliminary analysis identified:
• no upward or downward trend;
• an annual cycle, modeled by cos(2πt/365), sin(2πt/365);
• a seasonal (school term) effect, modeled by
   Pij(t) = B(2.5, 5)−1 ((t − Tij)/100)^{2.5} (1 − (t − Tij)/100)^{5},
  where B(2.5, 5) is the beta function and Tij is the start of the jth school term in year i;
• a day-of-the-week effect, modeled by separate indicator variables for Sunday and Monday (increase in admittance on these days compared to Tues-Sat);
• of the meteorological variables (max/min temp, humidity) and pollution variables (ozone, NO, NO2), only humidity at lags of 12-20 days and NO2 (max) appear to have an association.
44MaPhySto Workshop 9/04
Model for Asthma Data

Trend function:
   xtT = (1, St, Mt, cos(2πt/365), sin(2πt/365), P11(t), P12(t), P21(t), P22(t), P31(t), P32(t), P41(t), P42(t), Ht, Nt),
where
   Ht = (1/7) Σ_{i=0}^{6} ht-12-i
and ht is the residual from an annual cycle fitted to the daily average humidity at 0900 and 1500.

Model for αt :
   MA(7): αt = θ7 et-7 .
45MaPhySto Workshop 9/04
Results for Asthma Data
Term              Est     SE     T-ratio
Intercept         0.583   0.062   9.46
Sunday effect     0.197   0.056   3.53
Monday effect     0.230   0.055   4.20
cos(2πt/365)     -0.214   0.039  -5.54
sin(2πt/365)      0.176   0.040   4.35
Term 1, 1990      0.200   0.056   3.54
Term 2, 1990      0.132   0.057   2.31
Term 1, 1991      0.087   0.066   1.32
Term 2, 1991      0.172   0.057   2.99
Term 1, 1992      0.254   0.055   4.66
Term 2, 1992      0.308   0.049   6.31
Term 1, 1993      0.439   0.050   8.77
Term 2, 1993      0.116   0.061   1.91
Humidity Ht/20    0.169   0.055   3.09
NO2 max          -0.104   0.033  -3.16
MA, lag 7 (θ7)    0.042   0.018   2.32
46MaPhySto Workshop 9/04
Asthma data: observed counts and conditional means.

[Figure: daily counts by day of year for 1990-1993, observed values with the fitted conditional means overlaid.]
47MaPhySto Workshop 9/04
2.2 GLARMA Extensions (Binary data)
Binary data: Y1, . . . , Yn
Regression (explanatory) variable: xt
Model: Distribution of the Yt given xt and the past is Bernoulli(pt), i.e.,
P(Yt = 1| Ft-1) = pt and P(Yt = 0 | Ft-1) = 1−pt.
As before, construct a MGD sequence
   et = (Yt − pt) / (pt(1 − pt))^{1/2},
and, using the logistic link function, the GLARMA model becomes
   Wt = log( pt / (1 − pt) ) = xtT β + Zt ,
with
   Zt = Ût = φ1(Zt-1 + et-1) + . . . + φp(Zt-p + et-p) + θ1et-1 + . . . + θqet-q .
48MaPhySto Workshop 9/04
A Simple GLARMA Model for Price Activity (R&S)
Model for price change: The price change Ci of the ith transaction has the following components:
• Yt activity 0,1
• Dt direction -1, 1
• St size 1, 2, 3, . . .
Rydberg and Shephard consider a model for these components. An autologistic model is used for Yt .
Simple GLARMA(0,1) model for price activity: Yt is a Bernoulli rvrepresenting a price change at the tth transaction. Assume Yt given Ft-1 is Bernoulli(pt), i.e.,
P(Yt = 1 | Ft-1) = pt = 1− P(Yt = 0 | Ft-1),
where
   pt = e^{Zt} / (1 + e^{Zt})
and
   Zt = σ et-1 = σ (Yt-1 − pt-1) / (pt-1(1 − pt-1))^{1/2}.
49MaPhySto Workshop 9/04
Existence of Stationary Solns for the Simple GLARMA Model
Consider the process
   Zt = σ (Yt-1 − pt-1) / (pt-1(1 − pt-1))^{1/2},
where Yt-1 is Bernoulli with parameter
   pt-1 = e^{Zt-1} / (1 + e^{Zt-1}).

Proposition: The Markov process Zt has a unique stationary distribution.

Idea of proof:
• Zt is an e-chain.
• Zt is bounded in probability uniformly over the state space.
• Zt possesses a reachable point (x* is the solution of x + σ e^{x/2} = 0).
50MaPhySto Workshop 9/04
2.3 BIN Models: A Modeling Framework for Stock Prices (Davis, Rydberg, Shephard, Streett)
Consider the model for the price of an asset at time t given by
   p(t) = p(0) + Σ_{i=1}^{N(t)} Zi ,
where
• N(t) is the number of trades up to time t,
• Zi is the price change of the ith transaction.

Then, for a fixed time period ∆,
   pt := p((t+1)∆) − p(t∆) = Σ_{i=N(t∆)+1}^{N((t+1)∆)} Zi
denotes the rate of return on the investment during the tth time interval, and
   Nt := N((t+1)∆) − N(t∆)
denotes the number of trades in [t∆, (t+1)∆).
51MaPhySto Workshop 9/04
The Bin Model for the Number of Trades
Bin(p,q) model: The distribution of the number of trades Nt in [t∆, (t+1)∆), conditional on information up to time t∆−, is Poisson with mean
   λt = α + Σ_{j=1}^p γj Nt-j + Σ_{j=1}^q δj λt-j ,   α ≥ 0,  0 ≤ γj, δj < 1.

Proposition: For the Bin(1,1) model,
   λt = α + γ Nt-1 + δ λt-1 ,
there exists a unique stationary solution.

Idea of proof:
• λt is an e-chain.
• λt is bounded in probability on average.
• λt possesses a reachable point (x* = α/(1 − γ)).
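A minimal simulation sketch of the Bin(1,1) recursion for the trade counts (parameter values are arbitrary):

import numpy as np

def simulate_bin11(n, alpha, gamma, delta, rng):
    # N_t | F_{t-1} ~ Poisson(lambda_t), lambda_t = alpha + gamma N_{t-1} + delta lambda_{t-1}.
    lam = np.empty(n)
    N = np.empty(n, dtype=int)
    lam_prev, N_prev = alpha / (1.0 - gamma - delta), 0   # rough starting value
    for t in range(n):
        lam[t] = alpha + gamma * N_prev + delta * lam_prev
        N[t] = rng.poisson(lam[t])
        lam_prev, N_prev = lam[t], N[t]
    return N, lam

rng = np.random.default_rng(5)
N, lam = simulate_bin11(1000, alpha=1.0, gamma=0.3, delta=0.5, rng=rng)
print(N.mean(), lam.mean())    # both roughly alpha / (1 - gamma - delta) = 5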
52MaPhySto Workshop 9/04
Summary Remarks for Observation-Driven Models
The observation-driven model for Poisson counts proposed here has the following properties.
1. Easily interpretable on the linear predictor scale and on the scale of the mean µt with the regression parameters directly interpretable as the amount by which the mean of the count process at time t will change for a unit change in the regressor variable.
2. An approximately unbiased plot of the µt can be generated by
   µ̂t = exp( Ŵt − .5 Σ_{i=1}^∞ ψ̂i2 ).

3. Is easy to predict with.

4. Provides a mechanism for adjusting the inference about the regression parameter β for a form of serial dependence.

5. Generalizes to ARMA type lag structure.

6. Estimation (approx MLE) is easy to carry out.
53MaPhySto Workshop 9/04
3. Parameter-Driven Models

3.1 Estimation
    GLM
    Importance sampling
    Approximation to the likelihood
3.2 Simulation and Application
    Time series of counts
    Stochastic volatility
3.3 How good is the posterior approximation?
    Posterior mode vs posterior mean
3.4 Application to estimating structural breaks (tomorrow)
    Poisson model
    Stochastic volatility model
54MaPhySto Workshop 9/04
Exponential Family Setup for Parameter-Driven Model
Time series data: Y1, . . . , Yn
Regression (explanatory) vector: xt
Observation equation:
p(yt | αt) = exp{ (αt + βΤxt) yt − b(αt + βΤxt) + c(yt) }.
State equation: αt follows an autoregressive process satisfying the recursions
αt = γ + φ1αt-1 + φ2αt-2+ … + φpαt-p+ εt ,
where εt ~ IID N(0, σ2).
Note: αt = 0 corresponds to a standard generalized linear model.
Original primary objective: Inference about β.
55MaPhySto Workshop 9/04
Estimating equations (Zeger `88):
Monte Carlo EM (Chan and Ledolter `95)
GLM (ignores the presence of the latent process, i.e., αt = 0.)
Importance sampling (Durbin & Koopman `01, Kuk `99, Kuk & Chen `97):
Approximate likelihood (Davis, Dunsmuir & Wang ’98)
Estimation Methods for Parameter Driven Models
56MaPhySto Workshop 9/04
Estimation Methods — Estimating Equations
Estimating equations (Zeger `88): Let β̂ be the solution to the equation
   (∂µT/∂β) Γn−1 (y − µ) = 0,
where µ = exp(Xβ) and Γn = var(Yn).

Iterative weighted least squares can be used to compute β̂. (See Zeger for details and asymptotic results.)
57MaPhySto Workshop 9/04
Estimation Methods — MCEM
Monte Carlo EM (Chan and Ledolter `95): Given ψ(k) from the k-thiteration, ψ(k+1) is computed in the two steps:
E-Step: Compute Q(ψ| ψ(k) ) = E(L(ψ; yn, αn) | yn, ψ(k) ),
• L(ψ; yn, αn) is the log-likelihood based on yn, αn
• expectation taken with respect to p(αn | yn, ψ(k) )
M-Step: Update ψ(k) by maximizing Q(ψ| ψ(k) ) with respect to ψ
Note: This procedure is relatively straightforward except for drawing samples from p(αn | yn, ψ(k) ) in the E-step. Chan and Ledolter use a Gibbs sampler for this.
58MaPhySto Workshop 9/04
Suppose Yt follows the linear model with time series errors given by
Yt = xtT β + Wt ,
where Wt is a stationary (ARMA) time series.
• Estimate β by ordinary least squares (OLS).
• OLS estimate has same asymptotic efficiency as MLE.
• The asymptotic covariance matrix of β̂OLS depends on the ARMA parameters.

• Identify and estimate the ARMA parameters using the estimated residuals
   Ŵt = Yt − xtT β̂OLS .

• Re-estimate β and the ARMA parameters using full MLE.
GLM estimation – (Linear Regression Model)
59MaPhySto Workshop 9/04
Estimation Methods Specialized to Poisson Example — GLM estimation

Model: Yt | αt , xt ∼ Pois(exp(xtT β + αt )).

GLM log-likelihood:
   l(β) = Σ_{t=1}^n [ yt xtT β − exp(xtT β) ] − log( Π_{t=1}^n yt! ).

(This likelihood ignores the presence of the latent process.)

Assumptions on regressors:
   Ωn,I  = n−1 Σ_{t=1}^n xt xtT µt → ΩI(β),
   Ωn,II = n−1 Σ_{t=1}^n Σ_{s=1}^n xt xsT µt µs γε(t − s) → ΩII(β).
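A minimal sketch of maximizing the GLM log-likelihood l(β) by Newton-Raphson (equivalently, iteratively reweighted least squares). Exactly as in the GLM analysis above, the latent process is ignored; the regressors, parameter values, and the latent stand-in used to generate data are placeholders.

import numpy as np

def poisson_glm(y, X, n_iter=25, tol=1e-10):
    # Newton-Raphson for l(beta) = sum_t [ y_t x_t' beta - exp(x_t' beta) ] (constant omitted).
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)                 # gradient of l(beta)
        info = X.T @ (X * mu[:, None])         # sum_t mu_t x_t x_t'
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta, np.linalg.inv(info)           # estimate and naive (Omega_I based) covariance

rng = np.random.default_rng(6)
n = 1000
X = np.column_stack([np.ones(n), np.cos(2 * np.pi * np.arange(n) / 12.0)])
alpha = np.sqrt(0.5) * rng.standard_normal(n)  # crude iid stand-in for a latent process
y = rng.poisson(np.exp(X @ np.array([0.5, 0.2]) + alpha))
beta_hat, cov_naive = poisson_glm(y, X)
print(beta_hat, np.sqrt(np.diag(cov_naive)))   # naive SEs understate the true variability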
60MaPhySto Workshop 9/04
Theory of GLM Estimation in Presence of Latent Process

Theorem (Davis, Dunsmuir, Wang `00). Let β̂ be the GLM estimate of β obtained by maximizing l(β) for the Poisson regression model with a stationary lognormal latent process. Then
   n1/2 ( β̂ − β ) →d N( 0, ΩI−1 + ΩI−1 ΩII ΩI−1 ).

Notes:
1. n−1 ΩI−1 is the asymptotic covariance matrix from a standard GLM analysis.
2. n−1 ΩI−1 ΩII ΩI−1 is the additional contribution due to the presence of the latent process.
3. The result is also valid for more general latent processes (mixing, etc.).
4. The xt can depend on the sample size n.
61MaPhySto Workshop 9/04
Conditions on the regressors hold for:
1. Trend functions: xnt = f(t/n), where f is a continuous function on [0,1]. In this case,
   n−1 Σ_{t=1}^n xt xtT µt → ∫01 f(t) f(t)T e^{βT f(t)} dt,
   n−1 Σ_{t=1}^n Σ_{s=1}^n xt xsT µt µs γε(t − s) → ( Σ_h γε(h) ) ∫01 f(t) f(t)T e^{2βT f(t)} dt.

Remark. xnt = (1, t/n) corresponds to linear regression and works. However, xt = (1, t) does not produce consistent estimates, for example if the true slope is negative.
When Does CLT Apply?
62MaPhySto Workshop 9/04
2. Harmonic functions to specify annual or weekly effects, e.g.,
xt = cos(2πt/7)
3. Stationary process. (e.g. seasonally adjusted temperature series.)
When Does CLT Apply? (cont)
63MaPhySto Workshop 9/04
Application to Polio Data

[Figure: monthly polio counts, 1970-1984; counts range from 0 to 14.]
64MaPhySto Workshop 9/04
Application to Model for Polio Data

Assume the αt follows a log-normal AR(1), where
   (αt + σ2/2) = φ(αt-1 + σ2/2) + ηt ,   ηt ~ IID N(0, σ2(1 − φ2)),
with φ = .82, σ2 = .57.

Term            Zeger β̂Z   s.e.    GLM β̂GLM   s.e.    Asym s.e.   Sim mean   Sim s.e.
Intercept        0.17       0.13    0.207       0.075   0.205        0.150      0.213
Trend (×10-3)   -4.35       2.68   -4.80        1.40    4.12        -4.89       3.94
cos(2πt/12)     -0.11       0.16   -0.15        0.097   0.157       -0.145      0.144
sin(2πt/12)     -.048       0.17   -0.53        0.109   0.168       -0.531      0.168
cos(2πt/6)       0.20       0.14    0.169       0.098   0.122        0.167      0.123
sin(2πt/6)      -0.41       0.14   -.432        0.101   0.125       -.440       0.125
65MaPhySto Workshop 9/04
Polio Data With Estimated Regression Function

[Figure: monthly polio counts, 1970-1984, with the estimated regression function overlaid.]
66MaPhySto Workshop 9/04
Estimation Methods — Importance Sampling (Durbin and Koopman)

Model: Yt | αt , xt ∼ Pois(exp(xtT β + αt )),
   αt = φ αt-1 + εt ,   εt ~ IID N(0, σ2).

Relative likelihood: Let ψ = (β, φ, σ2) and suppose g(yn, αn; ψ0) is an approximating joint density for Yn = (Y1, . . . , Yn)' and αn = (α1, . . . , αn)'. Then

   L(ψ) = ∫ p(yn | αn) p(αn) dαn
        = ∫ [ p(yn | αn) p(αn) / g(yn, αn; ψ0) ] g(yn, αn; ψ0) dαn
        = Lg(ψ0) ∫ [ p(yn | αn) p(αn) / g(yn, αn; ψ0) ] g(αn | yn; ψ0) dαn ,

so that
   L(ψ) / Lg(ψ0) = Eg[ p(yn | αn) p(αn) / g(yn, αn; ψ0) ],
where the expectation is taken with respect to the importance density g(αn | yn; ψ0).
67MaPhySto Workshop 9/04
Importance Sampling (cont)

   L(ψ) / Lg(ψ0) = Eg[ p(yn | αn) p(αn) / g(yn, αn; ψ0) ]
                 ≈ (1/N) Σ_{j=1}^N p(yn | αn(j)) p(αn(j)) / g(yn, αn(j); ψ0),

where αn(j), j = 1, . . . , N, are iid draws from g(αn | yn; ψ0).

Notes:
• This is a “one-sample” approximation to the relative likelihood. That is, for one realization of the α's, we have, in principle, an approximation to the whole likelihood function.
• The approximation is only good in a neighborhood of ψ0. Geyer suggests maximizing the ratio with respect to ψ and iterating, replacing ψ0 with the resulting maximizer ψ̂.
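A minimal sketch of the one-sample idea. The slides build g from a Gaussian approximating model (next slides); purely to keep this sketch short, the stationary AR(1) prior at ψ0 is used as the importance density instead, which is a valid but much less efficient choice, and all parameter values are illustrative.

import numpy as np

def draw_ar1_paths(n, N, phi, sigma2, rng):
    # N iid draws of a stationary AR(1) path alpha_1, ..., alpha_n under (phi, sigma2).
    a = np.empty((N, n))
    a[:, 0] = rng.normal(scale=np.sqrt(sigma2 / (1.0 - phi**2)), size=N)
    for t in range(1, n):
        a[:, t] = phi * a[:, t - 1] + rng.normal(scale=np.sqrt(sigma2), size=N)
    return a

def log_prior(a, phi, sigma2):
    # log p(alpha_n) for the stationary Gaussian AR(1).
    ll = -0.5 * (a[:, 0]**2 * (1.0 - phi**2) / sigma2 + np.log(2 * np.pi * sigma2 / (1.0 - phi**2)))
    resid = a[:, 1:] - phi * a[:, :-1]
    return ll + np.sum(-0.5 * (resid**2 / sigma2 + np.log(2 * np.pi * sigma2)), axis=1)

def log_obs(y, a, beta):
    # log p(y_n | alpha_n) for Y_t ~ Poisson(exp(beta + alpha_t)); log(y_t!) terms omitted.
    w = beta + a
    return np.sum(y * w - np.exp(w), axis=1)

rng = np.random.default_rng(7)
n, N = 200, 2000
alpha_true = draw_ar1_paths(n, 1, 0.5, 0.3, rng)[0]
y = rng.poisson(np.exp(0.7 + alpha_true))

psi0 = (0.7, 0.5, 0.3)                                   # (beta, phi, sigma2) defining the draws
a_draws = draw_ar1_paths(n, N, psi0[1], psi0[2], rng)    # one set of draws reused for every psi

def est_loglik(psi):
    beta, phi, sigma2 = psi
    logw = log_obs(y, a_draws, beta) + log_prior(a_draws, phi, sigma2) - log_prior(a_draws, psi0[1], psi0[2])
    m = logw.max()
    return m + np.log(np.mean(np.exp(logw - m)))         # log of the importance-sampling average

for phi in (0.3, 0.4, 0.5, 0.6, 0.7):                    # the same draws give the whole curve in phi
    print(phi, est_loglik((0.7, phi, 0.3)))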
68MaPhySto Workshop 9/04
Importance Sampling — example

Simulation example: Yt | αt ∼ Pois(exp(.7 + αt )),
   αt = .5 αt-1 + εt ,   εt ~ IID N(0, .3),   n = 200, N = 1000.

[Figure: estimated likelihood in φ for importance densities centered at phi_0 = -0.9, -0.367, 0.029, and 0.321.]
69MaPhySto Workshop 9/04
Importance Sampling — example

Simulation example: β = .7, φ = .5, σ2 = .3, n = 200, N = 1000; 50 realizations plotted.

[Figure: estimated likelihood curves in φ for importance densities centered at phi_0 = -0.5, -0.25, 0, 0.25, 0.5, and 0.75.]
70MaPhySto Workshop 9/04
Importance Sampling (cont)

• Instead, one can use ψ0 = ψ to get

   L(ψ) = Lg(ψ) Eg[ p(yn | αn; ψ) / g(yn | αn; ψ) ]
        ≈ Lg(ψ) (1/N) Σ_{j=1}^N p(yn | αn(j); ψ) / g(yn | αn(j); ψ),

where αn(j), j = 1, . . . , N, are iid draws from g(αn | yn; ψ) and, for each ψ, the αn(j) are generated using the same noise sequence. This is the idea behind Durbin and Koopman's implementation.
• Instead one can use ψ0 = ψ to get
71MaPhySto Workshop 9/04
Importance Sampling — example

Simulation example: Yt | αt ∼ Pois(exp(.7 + αt )),
   αt = .5 αt-1 + εt ,   εt ~ IID N(0, .3),   n = 200, N = 1000.

[Figure: estimated log-likelihood as a function of φ (0.3 to 0.7).]
72MaPhySto Workshop 9/04
Importance Sampling (cont)

Choice of importance density g:

Durbin and Koopman suggest a linear state-space approximating model
   Yt = µt + xtT β + αt + Zt ,   Zt ~ N(0, Ht),
with
   Ht = exp( −(xtT β + α̂t) ),   µt = yt − α̂t − xtT β − Ht ( yt − exp(xtT β + α̂t) ),
where the α̂t = Eg(αt | yn) are calculated recursively under the approximating model until convergence.

With this choice of approximating model, it turns out that
   g(αn | yn; ψ0) ~ N( Γn−1 ỹn , Γn−1 ),
where
   ỹn = yn − e^{Xβ + α̂n} + diag( e^{Xβ + α̂n} ) α̂n ,
   Γn = diag( e^{Xβ + α̂n} ) + ( E(αn αnT) )−1 .
73MaPhySto Workshop 9/04
Importance Sampling (cont)

Components required in the calculation:
• g(yn, αn)
• ỹnT Γn−1 ỹn and det(Γn)
• Γn−1 ỹn
• simulation from N( Γn−1 ỹn , Γn−1 ), which reduces to simulation from N(0, Γn−1)

Remark: These quantities can be computed quickly using a version of the innovations algorithm or the Kalman smoothing recursions.
74MaPhySto Workshop 9/04
Estimation Methods — Approximation to the likelihood

General setup:
   p(yn, αn) = p(yn | αn) p(αn) ∝ p(yn | αn) |G|^{1/2} exp{ −(αn − µ)T G (αn − µ)/2 },
where
   G−1 = E(αn − µ)(αn − µ)T .

Likelihood:
   L(ψ) = ∫ p(yn | αn) p(αn) dαn .

Log-likelihood based on yn and αn :
   l(ψ; yn, αn) = log p(yn, αn; ψ)
                = −(n/2) log(2π) + (1/2) log|G| + l(θ; yn | αn) − (1/2)(αn − µ)T G (αn − µ).
75MaPhySto Workshop 9/04
Estimation Methods — Approximation to the likelihood (cont)

If T(α; α*) denotes the second-order Taylor series expansion of l(θ; yn | αn) around α*, with remainder
   R(α; α*) = l(θ; yn | αn) − T(α; α*),
where α* is the mode of l(ψ; yn, αn), we have
   l(ψ; yn, αn) = −(n/2) log(2π) + (1/2) log|G| + h* − (1/2)(αn − µ)T G (αn − µ)
                  + (αn − α*)T G (α* − µ) − (1/2)(αn − α*)T K* (αn − α*) + R(αn; α*),
where
   h* = l(θ; yn | αn) |αn=α* ,
   K* = − ∂2 l(θ; yn | αn) / ∂αn ∂αnT |αn=α* .
76MaPhySto Workshop 9/04
Estimation Methods — Approximation to the likelihood (cont)

Let pa(αn | yn, ψ) be the posterior based on the log-likelihood neglecting R(α; α*), i.e.,
   pa(αn | yn, ψ) = φ( αn ; α*, (K* + G)−1 ).

Then
   L(ψ; yn) = ∫ L(ψ; yn, αn) dαn = La(ψ; yn) Era(ψ),
where
   La(ψ; yn) = ( |G|^{1/2} / |K* + G|^{1/2} ) exp{ h* − (α* − µ)T G (α* − µ)/2 }
and
   Era(ψ) = ∫ exp{ R(αn; α*) } pa(αn | yn; ψ) dαn .
77MaPhySto Workshop 9/04
Estimation Methods — Approximation to the likelihood (cont)

Notes:

1. The approximating posterior,
   pa(αn | yn, ψ) = φ( αn ; α*, (K* + G)−1 ),
is identical to the importance sampling density used by Durbin and Koopman for the case of exponential families.

2. In the traditional Bayesian setting, the posterior is approximately pa for large n (see Bernardo and Smith, 1994).
78MaPhySto Workshop 9/04
Estimation Methods — Approximation to the likelihood (cont)

1. Importance sampling (IS). Recall that
   L(ψ; yn) = La(ψ; yn) Era(ψ).
Estimate the likelihood by
   L̂(ψ; yn) = La(ψ; yn) Êra(ψ),
where
   Êra(ψ) = (1/N) Σ_{i=1}^N exp( R(α(i), α*) )
and α(j), j = 1, . . . , N, are iid draws from pa(αn | yn; ψ).
79MaPhySto Workshop 9/04
Estimation Methods — Approximation to the likelihood (cont)

   L(ψ; yn) = La(ψ; yn) Era(ψ).

2. Approximate likelihood (AL). If pa(αn | yn, ψ) is concentrated around αn = α*, then the error term Era(ψ) is close to 1. This suggests ignoring the error term and approximating the likelihood by
   L̂(ψ; yn) = La(ψ; yn).

3. Hybrid estimate (AIS). Approximate e(ψ) = log Era(ψ) by a linear function. Let ψ̂AL be the maximum likelihood estimator based on the objective function La(ψ). Then
   e(ψ) ≈ e(ψ̂AL) + ė(ψ̂AL)(ψ − ψ̂AL).
Ignoring the constant term, we have
   L̂(ψ; yn) ∝ La(ψ; yn) exp{ ė(ψ̂AL)(ψ − ψ̂AL) }.
Use one run of importance sampling to estimate ė(ψ̂AL).
80MaPhySto Workshop 9/04
Estimation Methods — Approximation to the likelihood (cont)

Case of exponential family:
   La(ψ; yn) = ( |G|^{1/2} / |K* + G|^{1/2} ) exp{ ynT α* − 1T b(α*) + 1T c(yn) − (α* − µ)T G (α* − µ)/2 },
where
   K* = diag( ∂2 b(αt)/∂αt2 |αt=αt* )
and α* is the solution of the equation
   yn − ∂b(αn)/∂αn − G(αn − µ) = 0.

Using a Taylor expansion, the latter equation can be solved iteratively.
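A minimal sketch of this iteration for the Poisson case, where b(α) = e^α on the signal scale, so that K* = diag(e^{α*}). Here G is taken to be the precision matrix of a stationary AR(1) latent process, and the data, offset, and parameter values are placeholders.

import numpy as np

def ar1_precision(n, phi, sigma2):
    # Precision matrix G (inverse covariance) of a stationary AR(1) with innovation variance sigma2.
    G = np.zeros((n, n))
    i = np.arange(n)
    G[i, i] = (1.0 + phi**2) / sigma2
    G[0, 0] = G[n - 1, n - 1] = 1.0 / sigma2
    G[i[:-1], i[:-1] + 1] = G[i[:-1] + 1, i[:-1]] = -phi / sigma2
    return G

def posterior_mode_poisson(y, offset, G, mu=None, n_iter=50, tol=1e-10):
    # Newton iterations for the mode of sum_t [ y_t(offset_t + a_t) - exp(offset_t + a_t) ]
    # - 0.5 (a - mu)' G (a - mu); i.e. solves y - b'(a) - G(a - mu) = 0.
    n = len(y)
    mu = np.zeros(n) if mu is None else mu
    a = np.zeros(n)
    for _ in range(n_iter):
        w = offset + a
        grad = y - np.exp(w) - G @ (a - mu)
        K = np.diag(np.exp(w))                 # K* = diag(b''), evaluated at the current iterate
        step = np.linalg.solve(K + G, grad)
        a = a + step
        if np.max(np.abs(step)) < tol:
            break
    return a, K

rng = np.random.default_rng(8)
n = 200
G = ar1_precision(n, phi=0.5, sigma2=0.3)
alpha = np.linalg.cholesky(np.linalg.inv(G)) @ rng.standard_normal(n)
y = rng.poisson(np.exp(0.7 + alpha))
a_star, K_star = posterior_mode_poisson(y, offset=np.full(n, 0.7), G=G)
# a_star and K_star are the ingredients of L_a(psi; y_n) and of p_a = N(a_star, (K_star + G)^{-1}).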
81MaPhySto Workshop 9/04
A closer look at log( Era(ψ) )

   log( Êra(ψ) ) = log( (1/N) Σ_{i=1}^N exp( R(α(i), α*) ) ),
   e(ψ) ≈ e(ψ̂AL) + ė(ψ̂AL)(ψ − ψ̂AL),
   Era(ψ) = ∫ exp{ R(αn; α*) } pa(αn | yn; ψ) dαn .

[Figure: log Era(ψ) as a function of φ (0.3 to 0.7), simulated values together with the linear approximation.]
82MaPhySto Workshop 9/04
Comparison between IS and AIS

Model: Yt | αt ∼ Pois(exp(.7 + αt )),  αt = .5 αt-1 + εt ,  εt ~ IID N(0, .3),  n = 200.

[Figure: estimated log-likelihood in φ (0.3 to 0.7); importance sampling vs the hybrid AL + IS.]
83MaPhySto Workshop 9/04
[Figure: estimated log-likelihood in φ (importance sampling) and log Era(ψ) with its linear approximation, based on
   log( Êra(ψ) ) = log( (1/1000) Σ_{i=1}^{1000} exp( R(α(i), α*) ) )  and  e(ψ) ≈ e(ψ̂AL) + ė(ψ̂AL)(ψ − ψ̂AL).]
84MaPhySto Workshop 9/04
Model: Yt | αt ∼ Pois(exp(1.0 + αt ))
αt = 1.25αt-1− .75αt-2+ εt , εt~IID N(0, .2), n = 200
Estimation methods:
• Importance sampling (N=100)
β φ1 φ2 σ2
mean 0.9943 1.2520 -.7536 0.1925
rmse 0.0832 0.0749 0.0668 0.0507
Comparison between IS and AIS
• Approximate likelihood + IS
β φ1 φ2 σ2
mean 0.9943 1.2520 -.7586 0.1925
rmse 0.0832 0.0749 0.0669 0.0507
85MaPhySto Workshop 9/04
Model: Yt | αt ∼ Pois(exp(1.0 + αt ))
αt = 1.25αt-1− .75αt-2+.2αt-3 + εt , εt~IID N(0, .2), n = 200
Estimation methods:
• Importance sampling (N=100)
β φ1 φ2 φ3 σ2
mean 1.0009 1.2473 -.7496 0.1896 0.1969
rmse 0.1139 0.2765 0.3815 0.2011 0.0783
Comparison between IS and AIS (cont)
• Approximate likelihood + IS
β φ1 φ2 φ3 σ2
mean 1.0009 1.2471 -.7493 0.1894 0.1969
rmse 0.1139 0.2759 0.3809 0.2009 0.0783
86MaPhySto Workshop 9/04
Model: Yt | αt ∼ Pois(exp(.7 + αt )), αt = .5 αt-1+ εt , εt~IID N(0, .3), n = 200
Estimation methods: Simulation results based on 1000 replications.
                               beta     phi      sigma2
Importance sampling (N=100)
   mean                        0.7038   0.4783   0.2939
   std                         0.0970   0.1364   0.0789
Approx likelihood
   mean                        0.7035   0.4627   0.2956
   std                         0.0972   0.1413   0.0785
AL + IS (AIS)
   mean                        0.7039   0.4784   0.2938
   std                         0.0969   0.1364   0.0789
87MaPhySto Workshop 9/04
[Figure: estimated sampling densities of the estimates of beta, phi, and sigma^2 under the approximate likelihood, importance sampling, and AL + IS (AIS) methods.]

Model: Yt | αt ∼ Pois(exp(.7 + αt )),  αt = .5 αt-1 + εt ,  εt ~ IID N(0, .3),  n = 200.
88MaPhySto Workshop 9/04
AR model order selection for αt:
Approx Like (AL)
p 0 1 2 3 4 5 13
AIC 518.0 512.3 512.3 513.9 512.3 514.2 527.4
log(like) -252.0 -248.1 -247.1 -246.9 -245.2 -245.1 -243.7
Application to Model Fitting for the Polio Data
Approx + IS (AIS)
p 0 1 2 3 4 5 13
AIC 519.8 512.6 512.1 513.9 511.5 514.3 527.4
log(like) -252.9 -248.3 -247.0 -246.9 -244.8 -245.2 -243.7
89MaPhySto Workshop 9/04
Application to Model Fitting for the Polio Data

Model for αt : αt = φ αt-1 + εt ,  εt ~ IID N(0, σ2).  Importance sampling (N=1000).

Term             IS est    SE       AIS est   SE       AL est    SE
Intercept        0.238     0.283    0.239     0.285    0.242     0.273
Trend (×10-3)   -3.744     2.823   -3.746     2.867   -3.814     2.767
cos(2πt/12)      0.160     0.152    0.161     0.151    0.162     0.142
sin(2πt/12)     -0.481     0.168   -0.480     0.164   -0.482     0.166
cos(2πt/6)       0.414     0.128    0.414     0.122    0.413     0.128
sin(2πt/6)      -0.011     0.126   -0.011     0.127   -0.011     0.129
φ                0.661     0.202    0.661     0.209    0.627     0.229
σ2               0.272     0.115    0.272     0.112    0.289     0.122
90MaPhySto Workshop 9/04
Simulation Results
Stochastic volatility model: Yt = σt Zt ,  Zt ~ IID N(0,1),
   αt = γ + φ αt-1 + εt ,  εt ~ IID N(0, σ2),  where αt = 2 log σt ;  n = 500, N = 100, NR = 500.

CV = 10:
        True     AL       RMSE    AIS      RMSE
γ       -.411    -.491    .210    -.481    .195
φ       0.950    0.940    .025    0.942    .023
σ       0.484    0.479    .065    0.477    .064

CV = 1:
        True     AL       RMSE    AIS      RMSE
γ       -.368    -.500    .341    -.483    .299
φ       0.950    0.932    .046    0.934    .040
σ       0.260    0.270    .068    0.269    .064
91MaPhySto Workshop 9/04
[Figure: for the fitted stochastic volatility model, the estimated log-likelihood in φ (0.940 to 0.960) and log Era(ψ) with its linear approximation, based on
   log( Êra(ψ) ) = log( (1/500) Σ_{i=1}^{500} exp( R(α(i), α*) ) )  and  e(ψ) ≈ e(ψ̂AL) + ė(ψ̂AL)(ψ − ψ̂AL).]
92MaPhySto Workshop 9/04
Application to Pound-Dollar Exchange Rates

Stochastic volatility model: Yt = σt Zt ,  Zt ~ IID N(0,1),
   αt = γ + φ αt-1 + εt ,  εt ~ IID N(0, σ2),  where αt = 2 log σt ;  N = 1000, B = 500.

        AL        BC        SE        AIS       BC        SE
γ      -0.0227   -0.0140    0.0198   -0.0230   -0.0153    0.0173
φ       0.9750    0.9845    0.0194    0.9747    0.9832    0.0166
σ2      0.0267    0.0228    0.0141    0.0273    0.0228    0.0138
93MaPhySto Workshop 9/04
Application to Sydney Asthma Count Data
Data: Y1, . . . , Y1461 daily asthma presentations in a Campbelltown hospital.
Preliminary analysis identified:
• no upward or downward trend;
• an annual cycle, modeled by cos(2πt/365), sin(2πt/365);
• a seasonal (school term) effect, modeled by
   Pij(t) = B(2.5, 5)−1 ((t − Tij)/100)^{2.5} (1 − (t − Tij)/100)^{5},
  where B(2.5, 5) is the beta function and Tij is the start of the jth school term in year i;
• a day-of-the-week effect, modeled by separate indicator variables for Sunday and Monday (increase in admittance on these days compared to Tues-Sat);
• of the meteorological variables (max/min temp, humidity) and pollution variables (ozone, NO, NO2), only humidity at lags of 12-20 days and NO2 (max) appear to have an association.
94MaPhySto Workshop 9/04
Results for Asthma Data (AL & AIS, N=100)

Term               AL       SE      AIS      SE
Intercept          0.568    .064    0.569    .066
Sunday effect      0.199    .053    0.199    .051
Monday effect      0.225    .053    0.226    .052
cos(2πt/365)      -0.214    .041   -0.214    .041
sin(2πt/365)       0.177    .043    0.177    .048
Term 1, 1990       0.199    .063    0.200    .065
Term 2, 1990       0.133    .066    0.133    .063
Term 1, 1991       0.085    .070    0.085    .078
Term 2, 1991       0.171    .064    0.171    .069
Term 1, 1992       0.249    .061    0.249    .066
Term 2, 1992       0.302    .060    0.302    .058
Term 1, 1993       0.431    .060    0.432    .062
Term 2, 1993       0.114    .067    0.114    .075
Humidity Ht/20     0.009    .003    0.009    .003
NO2 max           -0.101    .033   -0.101    .034
AR(1), φ           0.774    .327    0.786    .394
σ2                 0.011    .015    0.010    .013
95MaPhySto Workshop 9/04
Asthma data: observed counts and conditional mean.

[Figure: daily counts by day of year for 1990-1993, observed values with the fitted conditional mean overlaid.]
96MaPhySto Workshop 9/04
Is the posterior distribution close to normal?
Compare posterior mean with posterior mode: Can compute the posterior mean using SIR (sampling importance-resampling) or particle filtering.
Posterior mode: The mode of p(αn | yn) is α* found at the last iteration of AL.
Posterior mean: The mean of p(αn | yn) can be found using SIR.
Let α(1), α(2), . . . , α(N) be independent draws from the multivariate distribution pa(αn | yn). For N large, an approximate iid sample from p(αn | yn) can be obtained by drawing a random sample from α(1), α(2), . . . , α(N) with probabilities
   pi = wi / ( w1 + . . . + wN ),   wi = p(α(i) | yn) / pa(α(i) | yn) ∝ exp{ R(α(i); α*) },   i = 1, . . . , N.
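A minimal sketch of the SIR step, in the same Poisson setting as the mode-finding sketch above; it reuses α*, K*, and G, evaluates the remainder R(α; α*) directly from the Poisson log-density and its quadratic expansion, and resamples with the probabilities pi defined above.

import numpy as np

def sir_sample(y, offset, G, a_star, K_star, N=2000, M=500, rng=None):
    # Draw N paths from p_a = N(a*, (K*+G)^{-1}), weight by exp(R(a; a*)), resample M of them.
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    L = np.linalg.cholesky(np.linalg.inv(K_star + G))
    draws = a_star + (L @ rng.standard_normal((n, N))).T            # N x n draws from p_a

    def l_obs(a):                                                   # l(theta; y_n | alpha_n), Poisson case
        w = offset + a
        return np.sum(y * w - np.exp(w), axis=-1)

    w_star = offset + a_star
    d = draws - a_star
    taylor = l_obs(a_star) + d @ (y - np.exp(w_star)) \
             - 0.5 * np.einsum("ij,j,ij->i", d, np.exp(w_star), d)  # T(a; a*)
    logw = l_obs(draws) - taylor                                     # R(a; a*)
    w = np.exp(logw - logw.max())
    p = w / w.sum()
    keep = rng.choice(N, size=M, replace=True, p=p)                  # resample with probabilities p_i
    return draws[keep]

# Posterior mean via SIR vs posterior mode a_star (objects from the previous sketch):
# sample = sir_sample(y, np.full(n, 0.7), G, a_star, K_star, rng=rng)
# print(np.max(np.abs(sample.mean(axis=0) - a_star)))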
97MaPhySto Workshop 9/04
Posterior mean vs posterior mode?
Polio data: blue = mean, red = mode
[Figure: smoothed state vector over time.]
98MaPhySto Workshop 9/04
Posterior mean vs posterior mode?
Pound/US exchange rate data: blue = mean, red = mode
[Figure: posterior mean and posterior mode of the state over time.]
99MaPhySto Workshop 9/04
Is the posterior distribution close to normal?
Suppose α(1), α(2), . . . , α(M) are independent draws from the multivariate distribution p(αn | yn), generated using SIR. Then
   dj = (α(j) − α*)T (K* + G)(α(j) − α*),   j = 1, . . . , M,
should be approximately iid χn2.

[Figure: chi-square QQ plots of the dj (observed quantiles vs χn2 quantiles) for (n, M) = (50, 100), (100, 150), and (200, 250), three replications each.]

Correlations are not significant.
100MaPhySto Workshop 9/04
Summary Remarks
1. Importance sampling offers a nice clean method for estimation in parameter driven models.
2. Approximation to the likelihood is a non-simulation based procedure which may have great potential especially with large sample sizes and/or large number of explanatory variables.
3. The hybrid approach seems like a good compromise betweenimportance sampling and the approximate likelihood methods.
4. Approximate likelihood and hybrid approaches are amenable to bootstrapping procedures for bias correction.

5. Posterior mode matches posterior mean reasonably well.

6. Approximate likelihood approach may be useful for the problem of structural break detection, the subject of tomorrow's lecture.