
The Durbin-Levinson Algorithm

Consider the problem of estimating the parameters of an AR(p) model.

• We already have the Yule-Walker equations

~φ = Γ_p^{−1} ~γ_p,    σ²_Z = γX(0) − ~φ^t ~γ_p.

• The Durbin-Levinson algorithm provides an alternative that avoids the matrix inversion in the Yule-Walker equations. (For comparison, a solve-based sketch of the Yule-Walker route appears right after this list.)

• It is actually a prediction algorithm. We will see that it also can be used for parameter estimation for the AR(p) model.

• Nice side effects of using the DL-algorithm: we will automatically get partial autocorrelations and mean-squared errors associated with our predictions!

• The down-side is that it will not help us in our upcoming “dance war” with the mathstats class.

• It may or may not run on squirrels. This is still an open problem.
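For comparison, here is a minimal Python/NumPy sketch of the Yule-Walker route (the helper name yule_walker_ar and the choice to solve the p × p system rather than invert Γ_p explicitly are illustrative, not part of the notes):

    import numpy as np
    from scipy.linalg import toeplitz

    def yule_walker_ar(gamma_hat, p):
        """Yule-Walker estimates for an AR(p) from autocovariances gamma_hat[0..p]."""
        gamma_hat = np.asarray(gamma_hat, dtype=float)
        Gamma_p = toeplitz(gamma_hat[:p])            # p x p matrix with entries gamma(|i - j|)
        gamma_p = gamma_hat[1:p + 1]                 # vector (gamma(1), ..., gamma(p))
        phi_hat = np.linalg.solve(Gamma_p, gamma_p)  # solve the p x p linear system
        sigma2_hat = gamma_hat[0] - phi_hat @ gamma_p
        return phi_hat, sigma2_hat

This p × p linear-algebra step is exactly the work the Durbin-Levinson recursion below sidesteps.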

The DL-algorithm is an example of a recursive prediction algorithm.

• Suppose we predict Xn+1 from X1, X2, . . . , Xn.

• Suppose then that time goes by and we get to observe Xn+1, but now we want to predict Xn+2 from X1, X2, . . . , Xn+1.

• We could

– start from “scratch”,

– or, we could use what we learned from predicting Xn+1 and update that somehow!

The setup for the DL-algorithm is a mean zero (otherwise subtract the mean, predict, and add it back), stationary process {Xt} with covariance function γX(h).

Notation:

• The best linear predictor of Xn+1 given X1, X2, . . . , Xn :

X̂_{n+1} = b_{nn} X1 + b_{n,n−1} X2 + · · · + b_{n1} Xn = ∑_{i=1}^{n} b_{ni} X_{n−i+1}

• The mean squared prediction error is

v_n = E[ ( X_{n+1} − X̂_{n+1} )² ]


We want to recursively compute the “best” b’s, and, at the same time, compute the v’s.

In the proof of the DL-algorithm, it becomes apparent why

bnn = αX(n) = the PACF at lag n.

Without further ado...

The Durbin-Levinson Algorithm

• Step Zero Set b00 = 0, v0 = γX(0), and n = 1.

• Step One Compute

b_{nn} = [ γX(n) − ∑_{i=1}^{n−1} b_{n−1,i} γX(n − i) ] · v_{n−1}^{−1}.

• Step Two For n ≥ 2, compute

( b_{n1}, . . . , b_{n,n−1} )^t = ( b_{n−1,1}, . . . , b_{n−1,n−1} )^t − b_{nn} ( b_{n−1,n−1}, . . . , b_{n−1,1} )^t.

• Step Three Compute

v_n = v_{n−1} ( 1 − b_{nn}² )

Set n = n + 1 and return to Step One.

(Note: The DL-algorithm requires that γX(0) > 0 and that γX(n) → 0 as n → ∞.)
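Steps Zero through Three translate directly into a short recursion. Here is a minimal Python/NumPy sketch (the function name durbin_levinson and the storage layout are illustrative choices, not from the notes); it takes the autocovariances γX(0), . . . , γX(N) and returns the coefficient vectors (b_{n1}, . . . , b_{nn}) together with the mean squared errors v_n:

    import numpy as np

    def durbin_levinson(gamma, max_lag):
        """Durbin-Levinson recursion from autocovariances gamma[0], ..., gamma[max_lag].

        Returns b, v where b[n] = (b_n1, ..., b_nn), the coefficients of the best
        linear predictor of X_{n+1} from X_1, ..., X_n, and v[n] is its MSE.
        """
        v = np.zeros(max_lag + 1)
        v[0] = gamma[0]                 # Step Zero: v_0 = gamma_X(0)
        b = [np.array([])]              # n = 0 placeholder (the notes' b_00 = 0 never enters the recursion)

        for n in range(1, max_lag + 1):
            prev = b[n - 1]             # (b_{n-1,1}, ..., b_{n-1,n-1})
            # Step One: b_nn = [gamma(n) - sum_i b_{n-1,i} gamma(n - i)] / v_{n-1}
            bnn = (gamma[n] - sum(prev[i] * gamma[n - 1 - i] for i in range(n - 1))) / v[n - 1]
            # Step Two: b_nj = b_{n-1,j} - b_nn * b_{n-1,n-j} for j = 1, ..., n-1
            new = np.empty(n)
            new[:n - 1] = prev - bnn * prev[::-1]
            new[n - 1] = bnn
            b.append(new)
            # Step Three: v_n = v_{n-1} (1 - b_nn^2)
            v[n] = v[n - 1] * (1.0 - bnn ** 2)

        return b, v

Given observed values x[0], . . . , x[n−1] for X1, . . . , Xn, the one-step predictor X̂_{n+1} = ∑_{i=1}^{n} b_{ni} X_{n−i+1} is, in this layout, float(np.dot(b[n], x[:n][::-1])), with MSE v[n].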

Proof:

1. Set A1 to be the span of {X2, . . . , Xn}. That is, let A1 be the set of all random variables that can be formed from linear combinations of X2, . . . , Xn.

Let A2 be the span of the single random variable X1 − L_{X2,...,Xn}(X1). Here, L_{X2,...,Xn}(X1) is our usual notation for the best linear predictor of X1 based on X2, . . . , Xn. (As mentioned in class, it is the “projection of X1 onto the subspace generated by X2, . . . , Xn” and is more commonly written as P_{sp{X2,...,Xn}}(X1).)

Note: X̂_{n+1} = L_{X1,...,Xn}(X_{n+1}) if and only if


• X̂_{n+1} is a linear combination of X1, . . . , Xn

• E[ ( X_{n+1} − X̂_{n+1} ) Xi ] = 0 for i = 1, 2, . . . , n

(That second condition is from “the derivative set equal to zero” used in minimizing the MSE of the best linear predictor.)

2. Claim: A1 and A2 are “orthogonal” in the sense that if Y1 ∈ A1 and Y2 ∈ A2 then E[Y1Y2] = 0.

Proof of claim:

• Y1 ∈ A1 implies that Y1 has the form

Y1 = a2X2 + · · · + anXn.

• Y2 ∈ A2 implies that Y2 has the form

Y2 = a [ X1 − L_{X2,...,Xn}(X1) ]

• So,

E[Y1 Y2] = E[ ( ∑_{i=2}^{n} ai Xi ) · a ( X1 − L_{X2,...,Xn}(X1) ) ]

         = a ∑_{i=2}^{n} ai E[ Xi ( X1 − L_{X2,...,Xn}(X1) ) ]

         = 0

because each of those expectations is zero: E[ Xi ( X1 − L_{X2,...,Xn}(X1) ) ] = 0 for i = 2, . . . , n is exactly the orthogonality condition characterizing L_{X2,...,Xn}(X1).

3. Note that

X̂_{n+1} = L_{X1,...,Xn}(X_{n+1}) = L_{A1}(X_{n+1}) + L_{A2}(X_{n+1}) = L_{X2,...,Xn}(X_{n+1}) + a ( X1 − L_{X2,...,Xn}(X1) )

for some a ∈ ℝ. (The projection splits this way because sp{X1, . . . , Xn} = A1 ⊕ A2 with A1 and A2 orthogonal, and A2 is spanned by the single random variable X1 − L_{X2,...,Xn}(X1).)

4. In general, if we want to find the best linear predictor of a random variable Y based on a random variable X:

Ŷ = aX, we minimize E[ ( Y − aX )² ] with respect to a.

It is easy to show (set the derivative −2 E[XY] + 2a E[X²] equal to zero) that a = E[XY]/E[X²].

In our problem then, we have that

a = E[ X_{n+1} ( X1 − L_{X2,...,Xn}(X1) ) ] / E[ ( X1 − L_{X2,...,Xn}(X1) )² ].


5. Note that (X1, . . . , Xn)^t, (Xn, . . . , X1)^t, and (X2, . . . , Xn+1)^t, for example, all have the same variance-covariance matrix.

Since best linear prediction depends only on the variance-covariance matrix, predicting Xn+1 from X2, . . . , Xn is the “same” prediction problem as predicting X1 from X2, . . . , Xn: the lag differences involved are identical, so the two predictors use the same coefficients (applied to the variables in reverse order) and have the same mean squared error.

In our notation,

L_{X2,...,Xn}(X_{n+1}) = b_{n−1,n−1} X2 + b_{n−1,n−2} X3 + · · · + b_{n−1,1} Xn = ∑_{i=1}^{n−1} b_{n−1,i} X_{n+1−i}

and

L_{X2,...,Xn}(X1) = b_{n−1,1} X2 + b_{n−1,2} X3 + · · · + b_{n−1,n−1} Xn = ∑_{i=1}^{n−1} b_{n−1,i} X_{i+1}.

So,

E[ ( X1 − L_{X2,...,Xn}(X1) )² ] = E[ ( X_{n+1} − L_{X2,...,Xn}(X_{n+1}) )² ]
                                 = E[ ( Xn − L_{X1,...,Xn−1}(Xn) )² ]
                                 = v_{n−1}

6. Therefore,

a = { γX(n) − E[ X_{n+1} ∑_{i=1}^{n−1} b_{n−1,i} X_{i+1} ] } / v_{n−1}
  = [ γX(n) − ∑_{i=1}^{n−1} b_{n−1,i} γX(n − i) ] · v_{n−1}^{−1},

which is the formula given in Step One of the DL-algorithm!

7. X̂_{n+1} = L_{A1}(X_{n+1}) + a ( X1 − L_{A1}(X1) )
           = a X1 + ∑_{i=1}^{n−1} ( b_{n−1,i} − a b_{n−1,n−i} ) X_{n+1−i}.

Hey! Wait! We know that X̂_{n+1} = ∑_{i=1}^{n} b_{ni} X_{n+1−i}.

8. Since Γn is invertible (we are assuming here that γX(0) > 0 and that γX(n) → 0 as n → ∞), the coefficients of X1, . . . , Xn in the best linear predictor are unique, so the two expressions for X̂_{n+1} in (7.) must agree coefficient by coefficient. Equating coefficients gives us

b_{nn} = a,   and   b_{nj} = b_{n−1,j} − a b_{n−1,n−j} for j = 1, . . . , n − 1.

This is Step Two of the DL-algorithm.

9. Now

v_n = E[ ( X_{n+1} − X̂_{n+1} )² ]
    = E[ ( X_{n+1} − L_{A1}(X_{n+1}) − L_{A2}(X_{n+1}) )² ]
    = E[ ( X_{n+1} − L_{X2,...,Xn}(X_{n+1}) )² ] − 2 E[ ( X_{n+1} − L_{A1}(X_{n+1}) ) · L_{A2}(X_{n+1}) ] + E[ ( L_{A2}(X_{n+1}) )² ]
    = v_{n−1} − 2 E[ X_{n+1} L_{A2}(X_{n+1}) ] + E[ a² ( X1 − L_{X2,...,Xn}(X1) )² ]
    = v_{n−1} + a² v_{n−1} − 2 E[ X_{n+1} · a ( X1 − L_{X2,...,Xn}(X1) ) ]
    = (1 + a²) v_{n−1} − 2a E[ X_{n+1} ( X1 − L_{X2,...,Xn}(X1) ) ].

(The first expectation in the third line equals v_{n−1} by step 5, and the L_{A1} part of the cross term drops out by the orthogonality shown in step 2.)

But,

a = b_{nn} = E[ X_{n+1} ( X1 − L_{X2,...,Xn}(X1) ) ] / v_{n−1}.

So, v_n = (1 + a²) v_{n−1} − 2a · a v_{n−1} = (1 − a²) v_{n−1} = (1 − b_{nn}²) v_{n−1},

which is Step Three of the DL-algorithm!
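As a quick numerical sanity check on all of this (reusing the durbin_levinson sketch above): for any valid autocovariance sequence, the coefficients (b_{n1}, . . . , b_{nn}) produced by the recursion should agree with the direct solution of the prediction equations Γ_n b = (γX(1), . . . , γX(n))^t, where Γ_n is the n × n matrix with entries γX(|i − j|). A small illustration, with an AR(1)-type autocovariance sequence chosen only for the check:

    import numpy as np
    from scipy.linalg import toeplitz

    n = 4
    gamma = [0.8 ** h for h in range(n + 1)]        # gamma_X(h) = 0.8^h, a valid ACVF
    b, v = durbin_levinson(gamma, max_lag=n)

    b_direct = np.linalg.solve(toeplitz(gamma[:n]), gamma[1:n + 1])
    print(np.allclose(b[n], b_direct))              # True: the recursion matches the direct solve
    print(np.isclose(v[n], gamma[0] - b[n] @ np.array(gamma[1:n + 1])))  # MSE check: v_n = gamma(0) - sum_i b_ni gamma(i)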

The PACF Connection:

During the proof of the DL-algorithm, we saw that

b_{nn} = a = E[ X_{n+1} ( X1 − L_{X2,...,Xn}(X1) ) ] / E[ ( X1 − L_{X2,...,Xn}(X1) )² ],

and that this may be rewritten (see step 5 of DL-proof) as

= E[ ( X_{n+1} − L_{X2,...,Xn}(X_{n+1}) ) ( X1 − L_{X2,...,Xn}(X1) ) ] / { ( E[ ( X_{n+1} − L_{X2,...,Xn}(X_{n+1}) )² ] )^{1/2} · ( E[ ( X1 − L_{X2,...,Xn}(X1) )² ] )^{1/2} }.

But this is the definition of

Corr( X_{n+1} − L_{X2,...,Xn}(X_{n+1}), X1 − L_{X2,...,Xn}(X1) ),

which is the definition of αX(n), the PACF of {Xt} at lag n.
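In code, the PACF simply falls out of the recursion as the last coefficient produced at each step (a snippet reusing durbin_levinson and the gamma, n from the check above):

    b, v = durbin_levinson(gamma, max_lag=n)
    pacf = [b[k][-1] for k in range(1, n + 1)]   # alpha_X(k) = b_kk
    print(pacf)                                  # roughly [0.8, 0, 0, 0] for that AR(1)-type sequence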

Example: AR(2), Xt = φ1 X_{t−1} + φ2 X_{t−2} + Zt, where {Zt} ∼ WN(0, σ²_Z), and φ1 and φ2 are known (we are doing prediction, not estimation of parameters) and are such that the process is causal.

• We wish to recursively predict Xn+1 for n = 1, 2, . . ., based on previous values, and give the MSE of the predictions.

• We will need γX(0), γX(1), γX(2), . . ., but we can solve for them individually as needed, i.e., we set up the standard equations by multiplying the AR equation by X_{t−k} and taking expectations:

γX(0) − φ1 γX(1) − φ2 γX(2) = σ²_Z
γX(1) − φ1 γX(0) − φ2 γX(1) = 0
γX(2) − φ1 γX(1) − φ2 γX(0) = 0


which give us

γX(0) = (1 − φ2) σ²_Z / [ (1 + φ2)(1 − φ1 − φ2)(1 + φ1 − φ2) ],

γX(1) = φ1 σ²_Z / [ same denominator ],

and

γX(2) = ( φ1² + φ2 − φ2² ) σ²_Z / [ same denominator ].

Since, for k ≥ 2 we have

γX(k) = φ1γX(k − 1) + φ2γX(k − 2),

we can easily get additional γ’s as needed.
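As a check on the algebra, the closed forms for γX(0) and γX(1) plus the recursion above are enough to generate as many autocovariances as we like; a small sketch (the helper name ar2_acvf is illustrative):

    def ar2_acvf(phi1, phi2, sigma2, max_lag):
        """Autocovariances gamma_X(0), ..., gamma_X(max_lag) of a causal AR(2)."""
        denom = (1 + phi2) * (1 - phi1 - phi2) * (1 + phi1 - phi2)   # common denominator
        gamma = [(1 - phi2) * sigma2 / denom, phi1 * sigma2 / denom]
        for k in range(2, max_lag + 1):
            # gamma_X(k) = phi1 * gamma_X(k-1) + phi2 * gamma_X(k-2) for k >= 2
            gamma.append(phi1 * gamma[-1] + phi2 * gamma[-2])
        return gamma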

The DL-algorithm:

n = 1

b00 = 0, v0 = γX(0)

b_{11} = [ γX(1) ] v_0^{−1} = γX(1)/γX(0) = ρX(1) = φ1/(1 − φ2)

v_1 = v_0 (1 − b_{11}²) = γX(0) [ 1 − φ1²/(1 − φ2)² ]

So, the best linear predictor of X2 based on X1 is

X̂_2 = b_{11} X1 = [ φ1/(1 − φ2) ] X1

and the MSE of this predictor is v1.

n = 2

b_{22} = [ γX(2) − b_{11} γX(1) ] v_1^{−1}
       = { γX(2) − [ γX(1)/γX(0) ] γX(1) } / { γX(0) [ 1 − ( γX(1)/γX(0) )² ] }
       = [ ρX(2) − ρX(1)² ] / [ 1 − ρX(1)² ]
       = · · · = φ2

b_{21} = b_{11} − b_{22} b_{11} = [ φ1/(1 − φ2) ] (1 − φ2) = φ1


v_2 = v_1 (1 − b_{22}²) = γX(0) [ 1 − φ1²/(1 − φ2)² ] (1 − φ2²)

So, the best linear predictor of X3 based on X1 and X2 is

X̂_3 = b_{22} X1 + b_{21} X2 = φ2 X1 + φ1 X2

(Hmmm... is this surprising?) and the MSE associated with this prediction is v2.

n = 3

Continuing, we get

b_{33} = · · · = [ ρX(3) − φ1 ρX(2) − φ2 ρX(1) ] / [ 1 − φ1 ρX(1) − φ2 ρX(2) ]

and now the reason for the transformation to this ρ-representation becomes apparent! That numerator is zero for this AR(2) model, since dividing γX(3) = φ1 γX(2) + φ2 γX(1) by γX(0) gives ρX(3) = φ1 ρX(2) + φ2 ρX(1).

Now

( b_{31}, b_{32} )^t = ( b_{21}, b_{22} )^t − b_{33} ( b_{22}, b_{21} )^t = ( b_{21}, b_{22} )^t = ( φ1, φ2 )^t

So, the best linear predictor of X4 given X1, X2, and X3 is

X̂_4 = b_{33} X1 + b_{32} X2 + b_{31} X3 = φ2 X2 + φ1 X3.

(Hmmm, also not surprising!)

In fact, for every n ≥ 3 we will get b_{nn} = 0 and

X̂_{n+1} = φ1 Xn + φ2 X_{n−1}.

Note that we also now know the PACF for this AR(2) model:

αX(1) = b_{11} = φ1/(1 − φ2),   αX(2) = b_{22} = φ2,   αX(3) = b_{33} = 0

and αX(n) = b_{nn} = 0 for all n ≥ 3 (the PACF of an AR(2) cuts off after lag 2).
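Putting the pieces together, a quick numerical check of this worked example (reusing the ar2_acvf and durbin_levinson sketches from earlier; the parameter values are illustrative and satisfy the causality conditions φ1 + φ2 < 1, φ2 − φ1 < 1, |φ2| < 1):

    phi1, phi2, sigma2 = 0.5, 0.3, 1.0
    gamma = ar2_acvf(phi1, phi2, sigma2, max_lag=5)
    b, v = durbin_levinson(gamma, max_lag=5)

    print(b[1], [phi1 / (1 - phi2)])             # b_11 = phi1 / (1 - phi2)
    print(b[2], [phi1, phi2])                    # (b_21, b_22) = (phi1, phi2)
    print(b[3], [phi1, phi2, 0.0])               # (b_31, b_32, b_33) = (phi1, phi2, 0)
    print(v[2], gamma[0] * (1 - (phi1 / (1 - phi2)) ** 2) * (1 - phi2 ** 2))   # v_2 as derived above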