The Durbin-Levinson Algorithm
Consider the problem of estimating the parameters of an AR(p) model.
• We already have the Yule-Walker equations
$$\vec{\phi} = \Gamma_p^{-1}\,\vec{\gamma}_p, \qquad \sigma_Z^2 = \gamma_X(0) - \vec{\phi}^{\,t}\,\vec{\gamma}_p.$$
• The Durbin-Levinson algorithm provides an alternative that avoids the matrix inversion in the Yule-Walker equations (for contrast, a minimal sketch of that direct solve appears right after this list).
• It is actually a prediction algorithm. We will see that it can also be used for parameter estimation for the AR(p) model.
• Nice side effects of using the DL-algorithm: we will automatically get partial autocorrelations and mean-squared errors associated with our predictions!
• The down-side is that it will not help us in our upcoming “dance war” with the math stats class.
• It may or may not run on squirrels. This is still an open problem.
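For that contrast, here is a minimal numpy sketch of the direct Yule-Walker solve; the function name and array conventions are ours, not from the notes:

```python
import numpy as np

def yule_walker(gamma, p):
    """Direct Yule-Walker solve for an AR(p):
    ~phi = Gamma_p^{-1} ~gamma_p,  sigma_Z^2 = gamma_X(0) - ~phi^t ~gamma_p.

    gamma : sequence of autocovariances gamma_X(0), ..., gamma_X(p).
    """
    # Gamma_p is the p x p Toeplitz matrix with (i, j) entry gamma_X(|i - j|).
    Gamma_p = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    gamma_p = np.asarray(gamma[1:p + 1])
    phi = np.linalg.solve(Gamma_p, gamma_p)  # the O(p^3) solve that DL avoids
    sigma2_Z = gamma[0] - phi @ gamma_p
    return phi, sigma2_Z
```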
The DL-algorithm is an example of a recursive prediction algorithm.
• Suppose we predict $X_{n+1}$ from $X_1, X_2, \ldots, X_n$.
• Suppose then that time goes by and we get to observe $X_{n+1}$, but now we want to predict $X_{n+2}$ from $X_1, X_2, \ldots, X_{n+1}$.
• We could
– start from “scratch”,
– or, we could use what we learned from predicting $X_{n+1}$ and update that somehow!
The setup for the DL-algorithm is a mean zero (otherwise subtract the mean, predict, and add it back), stationary process $\{X_t\}$ with covariance function $\gamma_X(h)$.
Notation:
• The best linear predictor of $X_{n+1}$ given $X_1, X_2, \ldots, X_n$:
$$\hat{X}_{n+1} = b_{nn}X_1 + b_{n,n-1}X_2 + \cdots + b_{n1}X_n = \sum_{i=1}^{n} b_{ni}\,X_{n-i+1}$$
• The mean squared prediction error is
$$v_n = E\left[\left(X_{n+1} - \hat{X}_{n+1}\right)^2\right]$$
We want to recursively compute the “best” $b$'s and, at the same time, compute the $v$'s.
In the proof of the DL-algorithm, it becomes apparent why
$$b_{nn} = \alpha_X(n) = \text{the PACF at lag } n.$$
Without further ado...
The Durbin-Levinson Algorithm
• Step Zero Set $b_{00} = 0$, $v_0 = \gamma_X(0)$, and $n = 1$.
• Step One Compute
$$b_{nn} = \left[\gamma_X(n) - \sum_{i=1}^{n-1} b_{n-1,i}\,\gamma_X(n-i)\right] v_{n-1}^{-1}.$$
• Step Two For $n \geq 2$, compute
$$\begin{pmatrix} b_{n1} \\ \vdots \\ b_{n,n-1} \end{pmatrix} = \begin{pmatrix} b_{n-1,1} \\ \vdots \\ b_{n-1,n-1} \end{pmatrix} - b_{nn}\begin{pmatrix} b_{n-1,n-1} \\ \vdots \\ b_{n-1,1} \end{pmatrix}.$$
• Step Three Compute
$$v_n = v_{n-1}\left(1 - b_{nn}^2\right).$$
Set $n = n + 1$ and return to Step One.
(Note: The DL-algorithm requires that $\gamma_X(0) > 0$ and that $\gamma_X(n) \to 0$ as $n \to \infty$.)
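The recursion translates directly into code. Here is a minimal Python sketch under the conventions above; the function name and the dictionary layout for the $b$'s are ours:

```python
import numpy as np

def durbin_levinson(gamma, max_lag):
    """Run the DL recursion on autocovariances gamma_X(0), ..., gamma_X(max_lag).

    Returns (b, v): b[n] is the coefficient vector (b_{n,1}, ..., b_{n,n}) of the
    predictor X^hat_{n+1} = sum_i b_{n,i} X_{n+1-i}, and v[n] is the MSE v_n.
    """
    b = {0: np.array([])}              # Step Zero: no coefficients yet (b_00 = 0)
    v = np.empty(max_lag + 1)
    v[0] = gamma[0]                    # v_0 = gamma_X(0)
    for n in range(1, max_lag + 1):
        # Step One: b_nn = [gamma_X(n) - sum_{i<n} b_{n-1,i} gamma_X(n-i)] / v_{n-1}
        bnn = (gamma[n] - sum(b[n - 1][i - 1] * gamma[n - i]
                              for i in range(1, n))) / v[n - 1]
        # Step Two: b_{n,j} = b_{n-1,j} - b_nn * b_{n-1,n-j}; note the reversed vector
        b[n] = np.append(b[n - 1] - bnn * b[n - 1][::-1], bnn)
        # Step Three: v_n = v_{n-1} (1 - b_nn^2)
        v[n] = v[n - 1] * (1 - bnn ** 2)
    return b, v
```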
Proof:
1. Set $A_1$ to be the span of $\{X_2, \ldots, X_n\}$. That is, let $A_1$ be the set of all random variables that can be formed from linear combinations of $X_2, \ldots, X_n$.
Let $A_2$ be the span of the single random variable $X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)$. Here, $L_{\{X_2,\ldots,X_n\}}(X_1)$ is our usual notation for the best linear predictor of $X_1$ based on $X_2, \ldots, X_n$. (As mentioned in class, it is the “projection of $X_1$ onto the subspace generated by $X_2, \ldots, X_n$” and is more commonly written as $P_{\mathrm{sp}\{X_2,\ldots,X_n\}}(X_1)$.)
Note: $\hat{X}_{n+1} = L_{\{X_1,\ldots,X_n\}}(X_{n+1})$ if and only if
• $\hat{X}_{n+1}$ is a linear combination of $X_1, \ldots, X_n$
• $E[(X_{n+1} - \hat{X}_{n+1})X_i] = 0$ for $i = 1, 2, \ldots, n$
(That second condition is from “the derivative set equal to zero” used in minimizing the MSE of the best linear predictor.)
2. Claim: $A_1$ and $A_2$ are “orthogonal” in the sense that if $Y_1 \in A_1$ and $Y_2 \in A_2$, then $E[Y_1Y_2] = 0$.
Proof of claim:
• $Y_1 \in A_1$ implies that $Y_1$ has the form
$$Y_1 = a_2X_2 + \cdots + a_nX_n.$$
• $Y_2 \in A_2$ implies that $Y_2$ has the form
$$Y_2 = a\left[X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right].$$
• So,
$$E[Y_1Y_2] = E\left[\left(\sum_{i=2}^{n} a_iX_i\right) \cdot a\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)\right] = a\sum_{i=2}^{n} a_i\,E\left[X_i\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)\right] = 0$$
because that expectation is zero for each $i$ in the sum.
3. Note that
$$\hat{X}_{n+1} = L_{\{X_1,\ldots,X_n\}}(X_{n+1}) = L_{A_1}(X_{n+1}) + L_{A_2}(X_{n+1}) = L_{\{X_2,\ldots,X_n\}}(X_{n+1}) + a\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)$$
for some $a \in \mathbb{R}$.
4. In general, if we want to find the best linear predictor of a random variable $Y$ based on a random variable $X$, say $\hat{Y} = aX$, we minimize
$$E\left[(Y - aX)^2\right]$$
with respect to $a$. It is easy to show that $a = E[XY]/E[X^2]$.
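Expanding the square shows why:
$$E[(Y - aX)^2] = E[Y^2] - 2a\,E[XY] + a^2E[X^2],$$
which is minimized where the derivative in $a$ vanishes: $-2E[XY] + 2a\,E[X^2] = 0$, i.e. $a = E[XY]/E[X^2]$.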
In our problem then, we have that
$$a = \frac{E\left[X_{n+1}\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)\right]}{E\left[\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)^2\right]}.$$
5. Note that $(X_1, \ldots, X_n)^t$, $(X_n, \ldots, X_1)^t$, and $(X_2, \ldots, X_{n+1})^t$, for example, all have the same variance-covariance matrix.
Since best linear prediction depends only on the variance-covariance matrix, the same coefficients that predict $X_{n+1}$ from $X_2, \ldots, X_n$ also “backwards predict” $X_1$ from $X_2, \ldots, X_n$, just attached to the variables in reverse order, since the lag differences are the same.
In our notation,
$$L_{\{X_2,\ldots,X_n\}}(X_{n+1}) = b_{n-1,n-1}X_2 + b_{n-1,n-2}X_3 + \cdots + b_{n-1,1}X_n = \sum_{i=1}^{n-1} b_{n-1,i}\,X_{n+1-i}$$
and
$$L_{\{X_2,\ldots,X_n\}}(X_1) = b_{n-1,1}X_2 + b_{n-1,2}X_3 + \cdots + b_{n-1,n-1}X_n = \sum_{i=1}^{n-1} b_{n-1,i}\,X_{i+1}.$$
So,
$$E\left[\left(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right)^2\right] = E\left[\left(X_{n+1} - L_{\{X_2,\ldots,X_n\}}(X_{n+1})\right)^2\right] = E\left[\left(X_n - L_{\{X_1,\ldots,X_{n-1}\}}(X_n)\right)^2\right] = v_{n-1}.$$
6. Therefore,
$$a = \frac{\gamma_X(n) - E\left[X_{n+1}\sum_{i=1}^{n-1} b_{n-1,i}\,X_{i+1}\right]}{v_{n-1}} = \left[\gamma_X(n) - \sum_{i=1}^{n-1} b_{n-1,i}\,\gamma_X(n-i)\right] v_{n-1}^{-1},$$
which is the formula given in Step One of the DL-algorithm!
7. Next,
$$\hat{X}_{n+1} = L_{A_1}(X_{n+1}) + a\left(X_1 - L_{A_1}(X_1)\right) = aX_1 + \sum_{i=1}^{n-1}\left(b_{n-1,i} - a\,b_{n-1,n-i}\right)X_{n+1-i}.$$
Hey! Wait! We know that $\hat{X}_{n+1} = \sum_{i=1}^{n} b_{ni}X_{n+1-i}$.
8. Since $\Gamma_n$ is invertible (because we assume here that $\gamma_X(0) > 0$ and that $\gamma_X(n) \to 0$ as $n \to \infty$), the two expressions for $\hat{X}_{n+1}$ in (7.) must coincide. Equating coefficients gives us
$$b_{nn} = a, \qquad b_{nj} = b_{n-1,j} - a\,b_{n-1,n-j} \quad \text{for } j = 1, \ldots, n-1.$$
This is Step Two of the DL-algorithm.
9. Now
$$\begin{aligned}
v_n &= E[(X_{n+1} - \hat{X}_{n+1})^2] \\
&= E[(X_{n+1} - L_{A_1}(X_{n+1}) - L_{A_2}(X_{n+1}))^2] \\
&= E[(X_{n+1} - L_{\{X_2,\ldots,X_n\}}(X_{n+1}))^2] - 2E[(X_{n+1} - L_{A_1}(X_{n+1}))\,L_{A_2}(X_{n+1})] + E[(L_{A_2}(X_{n+1}))^2] \\
&= v_{n-1} - 2E[X_{n+1}L_{A_2}(X_{n+1})] + E[a^2(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1))^2] \\
&= v_{n-1} + a^2v_{n-1} - 2E[X_{n+1}\cdot a(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1))] \\
&= (1 + a^2)v_{n-1} - 2a\,E[X_{n+1}(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1))],
\end{aligned}$$
where the cross term simplifies because $E[L_{A_1}(X_{n+1})\,L_{A_2}(X_{n+1})] = 0$ by the claim in (2.). But,
$$a = b_{nn} = \frac{E[X_{n+1}(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1))]}{v_{n-1}}.$$
So,
$$v_n = (1 + a^2)v_{n-1} - 2a \cdot a\,v_{n-1} = (1 - a^2)v_{n-1} = (1 - b_{nn}^2)v_{n-1},$$
which is Step Three of the DL-algorithm!
The PACF Connection:
During the proof of the DL-algorithm, we saw that
$$b_{nn} = a = \frac{E[X_{n+1}(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1))]}{E[(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1))^2]},$$
and that this may be rewritten (see step 5 of the DL-proof) as
$$= \frac{E[(X_{n+1} - L_{\{X_2,\ldots,X_n\}}(X_{n+1}))(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1))]}{\left(E[(X_1 - L_{\{X_2,\ldots,X_n\}}(X_1))^2]\right)^{1/2}\left(E[(X_{n+1} - L_{\{X_2,\ldots,X_n\}}(X_{n+1}))^2]\right)^{1/2}}.$$
But this is
$$\mathrm{Corr}\left(X_{n+1} - L_{\{X_2,\ldots,X_n\}}(X_{n+1}),\; X_1 - L_{\{X_2,\ldots,X_n\}}(X_1)\right),$$
which is the definition of $\alpha_X(n)$, the PACF of $\{X_t\}$ at lag $n$.
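Since $b_{nn} = \alpha_X(n)$, the durbin_levinson sketch from earlier doubles as a PACF calculator. A hypothetical helper (the name pacf_from_acvf is ours, and it relies on the earlier sketch):

```python
def pacf_from_acvf(gamma, max_lag):
    """PACF alpha_X(1), ..., alpha_X(max_lag) via the identity alpha_X(n) = b_nn."""
    b, _ = durbin_levinson(gamma, max_lag)           # defined in the earlier sketch
    return np.array([b[n][-1] for n in range(1, max_lag + 1)])  # last entry is b_nn
```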
Example: AR(2), $X_t = \phi_1X_{t-1} + \phi_2X_{t-2} + Z_t$ where $\{Z_t\} \sim WN(0, \sigma_Z^2)$, and $\phi_1$ and $\phi_2$ are known (we are doing prediction, not estimation of parameters) and are such that the process is causal.
• We wish to recursively predict $X_{n+1}$ for $n = 1, 2, \ldots$, based on previous values, and give the MSE of the predictions.
• We will need $\gamma_X(0), \gamma_X(1), \gamma_X(2), \ldots$, but we can solve for them individually as needed. That is, we set up the standard equations by multiplying the AR equation by $X_{t-k}$ and taking expectations:
$$\gamma_X(0) - \phi_1\gamma_X(1) - \phi_2\gamma_X(2) = \sigma_Z^2$$
$$\gamma_X(1) - \phi_1\gamma_X(0) - \phi_2\gamma_X(1) = 0$$
$$\gamma_X(2) - \phi_1\gamma_X(1) - \phi_2\gamma_X(0) = 0$$
which give us
$$\gamma_X(0) = \frac{1 - \phi_2}{(1 + \phi_2)(1 - \phi_1 - \phi_2)(1 + \phi_1 - \phi_2)}\,\sigma_Z^2,$$
$$\gamma_X(1) = \frac{\phi_1}{\text{same denominator}}\,\sigma_Z^2,$$
and
$$\gamma_X(2) = \frac{\phi_1^2 + \phi_2 - \phi_2^2}{\text{same denominator}}\,\sigma_Z^2.$$
Since, for $k \geq 2$, we have
$$\gamma_X(k) = \phi_1\gamma_X(k-1) + \phi_2\gamma_X(k-2),$$
we can easily get additional $\gamma$'s as needed.
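For computation, a minimal sketch of those formulas (ar2_acvf is an illustrative name; the parameters are assumed to satisfy the causality conditions, and numpy is imported as np above):

```python
def ar2_acvf(phi1, phi2, sigma2_Z, max_lag):
    """Autocovariances gamma_X(0), ..., gamma_X(max_lag) of a causal AR(2)."""
    denom = (1 + phi2) * (1 - phi1 - phi2) * (1 + phi1 - phi2)
    gamma = [(1 - phi2) * sigma2_Z / denom,                  # gamma_X(0)
             phi1 * sigma2_Z / denom,                        # gamma_X(1)
             (phi1**2 + phi2 - phi2**2) * sigma2_Z / denom]  # gamma_X(2)
    for k in range(3, max_lag + 1):
        # gamma_X(k) = phi1 * gamma_X(k-1) + phi2 * gamma_X(k-2)
        gamma.append(phi1 * gamma[k - 1] + phi2 * gamma[k - 2])
    return np.array(gamma[:max_lag + 1])
```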
The DL-algorithm:
n = 1
$$b_{00} = 0, \qquad v_0 = \gamma_X(0)$$
$$b_{11} = [\gamma_X(1)]\,v_0^{-1} = \frac{\gamma_X(1)}{\gamma_X(0)} = \rho_X(1) = \frac{\phi_1}{1 - \phi_2}$$
$$v_1 = v_0(1 - b_{11}^2) = \gamma_X(0)\left[1 - \frac{\phi_1^2}{(1 - \phi_2)^2}\right]$$
So, the best linear predictor of $X_2$ based on $X_1$ is
$$\hat{X}_2 = b_{11}X_1 = \frac{\phi_1}{1 - \phi_2}\,X_1$$
and the MSE of this predictor is $v_1$.
n = 2
$$b_{22} = \left[\gamma_X(2) - b_{11}\gamma_X(1)\right]v_1^{-1} = \frac{\gamma_X(2) - \frac{\gamma_X(1)}{\gamma_X(0)}\gamma_X(1)}{\gamma_X(0)\left[1 - \left(\frac{\gamma_X(1)}{\gamma_X(0)}\right)^2\right]} = \frac{\rho_X(2) - \rho_X^2(1)}{1 - \rho_X^2(1)} = \cdots = \phi_2$$
$$b_{21} = b_{11} - b_{22}b_{11} = \frac{\phi_1}{1 - \phi_2}(1 - \phi_2) = \phi_1$$
$$v_2 = v_1(1 - b_{22}^2) = \gamma_X(0)\left[1 - \frac{\phi_1^2}{(1 - \phi_2)^2}\right](1 - \phi_2^2)$$
So, the best linear predictor of $X_3$ based on $X_1$ and $X_2$ is
$$\hat{X}_3 = b_{22}X_1 + b_{21}X_2 = \phi_2X_1 + \phi_1X_2$$
(Hmmm... is this surprising?) and the MSE associated with this prediction is $v_2$.
n = 3
Continuing, we get
$$b_{33} = \cdots = \frac{\rho_X(3) - \phi_1\rho_X(2) - \phi_2\rho_X(1)}{1 - \phi_1\rho_X(1) - \phi_2\rho_X(2)},$$
and now the reason for the transformation to this $\rho$-representation becomes apparent: that numerator is zero for this AR(2) model, since $\rho_X(3) = \phi_1\rho_X(2) + \phi_2\rho_X(1)$ by the recursion above!
Now
$$\begin{pmatrix} b_{31} \\ b_{32} \end{pmatrix} = \begin{pmatrix} b_{21} \\ b_{22} \end{pmatrix} - b_{33}\begin{pmatrix} b_{22} \\ b_{21} \end{pmatrix} = \begin{pmatrix} b_{21} \\ b_{22} \end{pmatrix} = \begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix}.$$
So, the best linear predictor of $X_4$ given $X_1$, $X_2$, and $X_3$ is
$$\hat{X}_4 = b_{33}X_1 + b_{32}X_2 + b_{31}X_3 = \phi_2X_2 + \phi_1X_3.$$
(Hmmm, also not surprising!)
In fact, for all future $n$, we will get $b_{nn} = 0$ and
$$\hat{X}_{n+1} = \phi_1X_n + \phi_2X_{n-1}.$$
Note that we also now know the PACF for this AR(2) model:
$$\alpha_X(1) = b_{11} = \frac{\phi_1}{1 - \phi_2}, \qquad \alpha_X(2) = b_{22} = \phi_2, \qquad \alpha_X(3) = b_{33} = 0,$$
and
$$\alpha_X(n) = b_{nn} = 0 \quad \text{for } n \geq 3.$$
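As a numerical sanity check, we can chain the earlier sketches together; $\phi_1 = 0.5$, $\phi_2 = 0.3$ is an assumed causal pair chosen only for illustration:

```python
# Uses ar2_acvf and durbin_levinson from the sketches above.
phi1, phi2, sigma2_Z = 0.5, 0.3, 1.0
gamma = ar2_acvf(phi1, phi2, sigma2_Z, max_lag=5)
b, v = durbin_levinson(gamma, max_lag=5)

print(b[1])  # approximately [phi1 / (1 - phi2)] = [0.7142...], i.e. alpha_X(1)
print(b[2])  # approximately [phi1, phi2] = [0.5, 0.3], the AR coefficients
print(b[5])  # approximately [0.5, 0.3, 0, 0, 0]: b_nn = 0 for n >= 3
print(v[2])  # approximately sigma2_Z = 1.0: past lag 2 the AR(2) predictor is exact
```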