
The Durbin-Levinson Algorithm

Consider the problem of estimating the parameters of an AR(p) model.

• We already have the Yule-Walker equations

~φ = Γ_p^{−1} ~γ_p,    σ²_Z = γX(0) − ~φ^t ~γ_p.

• The Durbin-Levinson algorithm provides an alternative that avoids the matrix inversion in the Yule-Walker equations. (For comparison, a solve-based sketch of the Yule-Walker route appears right after this list.)

• It is actually a prediction algorithm. We will see that it also can be used for parameter estimation for the AR(p) model.

• Nice side effects of using the DL-algorithm: we will automatically get partial autocorrelations and mean-squared errors associated with our predictions!

• The down-side is that it will not help us in our upcoming “dance war” with the mathstats class.

• It may or may not run on squirrels. This is still an open problem.
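For comparison, here is a minimal Python/NumPy sketch of the Yule-Walker route (the helper name yule_walker_ar and the choice to solve the p × p system rather than invert Γ_p explicitly are illustrative, not part of the notes):

    import numpy as np
    from scipy.linalg import toeplitz

    def yule_walker_ar(gamma_hat, p):
        """Yule-Walker estimates for an AR(p) from autocovariances gamma_hat[0..p]."""
        gamma_hat = np.asarray(gamma_hat, dtype=float)
        Gamma_p = toeplitz(gamma_hat[:p])            # p x p matrix with entries gamma(|i - j|)
        gamma_p = gamma_hat[1:p + 1]                 # vector (gamma(1), ..., gamma(p))
        phi_hat = np.linalg.solve(Gamma_p, gamma_p)  # solve the p x p linear system
        sigma2_hat = gamma_hat[0] - phi_hat @ gamma_p
        return phi_hat, sigma2_hat

This p × p linear-algebra step is exactly the work the Durbin-Levinson recursion below sidesteps.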

The DL-algorithm is an example of a recursive prediction algorithm.

• Suppose we predict Xn+1 from X1, X2, . . . , Xn.

• Suppose then that time goes by and we get to observe Xn+1, but now we want to predict Xn+2 from X1, X2, . . . , Xn+1.

• We could

– start from “scratch”,

– or, we could use what we learned from predicting Xn+1 and update that somehow!

The setup for the DL-algorithm is a mean zero (otherwise subtract the mean, predict, and add it back), stationary process {Xt} with covariance function γX(h).

Notation:

• The best linear predictor of Xn+1 given X1, X2, . . . , Xn :

X̂_{n+1} = b_{nn} X1 + b_{n,n−1} X2 + · · · + b_{n1} Xn = ∑_{i=1}^{n} b_{ni} X_{n−i+1}

• The mean squared prediction error is

v_n = E[ ( X_{n+1} − X̂_{n+1} )² ]


We want to recursively compute the “best” b’s, and, at the same time, compute the v’s.

In the proof of the DL-algorithm, it becomes apparent why

bnn = αX(n) = the PACF at lag n.

Without further ado...

The Durbin-Levinson Algorithm

• Step Zero Set b00 = 0, v0 = γX(0), and n = 1.

• Step One Compute

b_{nn} = [ γX(n) − ∑_{i=1}^{n−1} b_{n−1,i} γX(n − i) ] · v_{n−1}^{−1}.

• Step Two For n ≥ 2, compute

( b_{n1}, . . . , b_{n,n−1} )^t = ( b_{n−1,1}, . . . , b_{n−1,n−1} )^t − b_{nn} ( b_{n−1,n−1}, . . . , b_{n−1,1} )^t.

• Step Three Compute

v_n = v_{n−1} ( 1 − b_{nn}² )

Set n = n + 1 and return to Step One.

(Note: The DL-algorithm requires that γX(0) > 0 and that γX(n) → 0 as n → ∞.)
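Steps Zero through Three translate directly into a short recursion. Here is a minimal Python/NumPy sketch (the function name durbin_levinson and the storage layout are illustrative choices, not from the notes); it takes the autocovariances γX(0), . . . , γX(N) and returns the coefficient vectors (b_{n1}, . . . , b_{nn}) together with the mean squared errors v_n:

    import numpy as np

    def durbin_levinson(gamma, max_lag):
        """Durbin-Levinson recursion from autocovariances gamma[0], ..., gamma[max_lag].

        Returns b, v where b[n] = (b_n1, ..., b_nn), the coefficients of the best
        linear predictor of X_{n+1} from X_1, ..., X_n, and v[n] is its MSE.
        """
        v = np.zeros(max_lag + 1)
        v[0] = gamma[0]                 # Step Zero: v_0 = gamma_X(0)
        b = [np.array([])]              # n = 0 placeholder (the notes' b_00 = 0 never enters the recursion)

        for n in range(1, max_lag + 1):
            prev = b[n - 1]             # (b_{n-1,1}, ..., b_{n-1,n-1})
            # Step One: b_nn = [gamma(n) - sum_i b_{n-1,i} gamma(n - i)] / v_{n-1}
            bnn = (gamma[n] - sum(prev[i] * gamma[n - 1 - i] for i in range(n - 1))) / v[n - 1]
            # Step Two: b_nj = b_{n-1,j} - b_nn * b_{n-1,n-j} for j = 1, ..., n-1
            new = np.empty(n)
            new[:n - 1] = prev - bnn * prev[::-1]
            new[n - 1] = bnn
            b.append(new)
            # Step Three: v_n = v_{n-1} (1 - b_nn^2)
            v[n] = v[n - 1] * (1.0 - bnn ** 2)

        return b, v

Given observed values x[0], . . . , x[n−1] for X1, . . . , Xn, the one-step predictor X̂_{n+1} = ∑_{i=1}^{n} b_{ni} X_{n−i+1} is, in this layout, float(np.dot(b[n], x[:n][::-1])), with MSE v[n].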

Proof:

1. Set A1 to be the span of {X2, . . . , Xn}. That is, let A1 be the set of all random variables that can be formed from linear combinations of X2, . . . , Xn.

Let A2 be the span of the single random variable X1 − L_{X2,...,Xn}(X1). Here, L_{X2,...,Xn}(X1) is our usual notation for the best linear predictor of X1 based on X2, . . . , Xn. (As mentioned in class, it is the “projection of X1 onto the subspace generated by X2, . . . , Xn” and is more commonly written as P_{sp{X2,...,Xn}}(X1).)

Note: X̂_{n+1} = L_{X1,...,Xn}(X_{n+1}) if and only if


• X̂_{n+1} is a linear combination of X1, . . . , Xn

• E[ ( X_{n+1} − X̂_{n+1} ) Xi ] = 0 for i = 1, 2, . . . , n

(That second condition is from “the derivative set equal to zero” used in minimizing the MSE of the best linear predictor.)

2. Claim: A1 and A2 are “orthogonal” in the sense that if Y1 ∈ A1 and Y2 ∈ A2 then E[Y1Y2] = 0.

Proof of claim:

• Y1 ∈ A1 implies that Y1 has the form

Y1 = a2X2 + · · · + anXn.

• Y2 ∈ A2 implies that Y2 has the form

Y2 = a [ X1 − L_{X2,...,Xn}(X1) ]

• So,

E[Y1 Y2] = E[ ( ∑_{i=2}^{n} ai Xi ) · a ( X1 − L_{X2,...,Xn}(X1) ) ]

         = a ∑_{i=2}^{n} ai E[ Xi ( X1 − L_{X2,...,Xn}(X1) ) ]

         = 0

because each of those expectations is zero: E[ Xi ( X1 − L_{X2,...,Xn}(X1) ) ] = 0 for i = 2, . . . , n is exactly the orthogonality condition characterizing L_{X2,...,Xn}(X1).

3. Note that

X̂_{n+1} = L_{X1,...,Xn}(X_{n+1}) = L_{A1}(X_{n+1}) + L_{A2}(X_{n+1}) = L_{X2,...,Xn}(X_{n+1}) + a ( X1 − L_{X2,...,Xn}(X1) )

for some a ∈ ℝ. (The projection splits this way because sp{X1, . . . , Xn} = A1 ⊕ A2 with A1 and A2 orthogonal, and A2 is spanned by the single random variable X1 − L_{X2,...,Xn}(X1).)

4. In general, if we want to find the best linear predictor of a random variable Y based on a random variable X:

Ŷ = aX, we minimize E[ ( Y − aX )² ] with respect to a.

It is easy to show (set the derivative −2 E[XY] + 2a E[X²] equal to zero) that a = E[XY]/E[X²].

In our problem then, we have that

a = E[ X_{n+1} ( X1 − L_{X2,...,Xn}(X1) ) ] / E[ ( X1 − L_{X2,...,Xn}(X1) )² ].


5. Note that (X1, . . . , Xn)^t, (Xn, . . . , X1)^t, and (X2, . . . , Xn+1)^t, for example, all have the same variance-covariance matrix.

Since best linear prediction depends only on the variance-covariance matrix, predicting Xn+1 from X2, . . . , Xn is the “same” prediction problem as predicting X1 from X2, . . . , Xn: the lag differences involved are identical, so the two predictors use the same coefficients (applied to the variables in reverse order) and have the same mean squared error.

In our notation,

L_{X2,...,Xn}(X_{n+1}) = b_{n−1,n−1} X2 + b_{n−1,n−2} X3 + · · · + b_{n−1,1} Xn = ∑_{i=1}^{n−1} b_{n−1,i} X_{n+1−i}

and

L_{X2,...,Xn}(X1) = b_{n−1,1} X2 + b_{n−1,2} X3 + · · · + b_{n−1,n−1} Xn = ∑_{i=1}^{n−1} b_{n−1,i} X_{i+1}.

So,

E[ ( X1 − L_{X2,...,Xn}(X1) )² ] = E[ ( X_{n+1} − L_{X2,...,Xn}(X_{n+1}) )² ]
                                 = E[ ( Xn − L_{X1,...,Xn−1}(Xn) )² ]
                                 = v_{n−1}

6. Therefore,

a = { γX(n) − E[ X_{n+1} ∑_{i=1}^{n−1} b_{n−1,i} X_{i+1} ] } / v_{n−1}
  = [ γX(n) − ∑_{i=1}^{n−1} b_{n−1,i} γX(n − i) ] · v_{n−1}^{−1},

which is the formula given in Step One of the DL-algorithm!

7. X̂_{n+1} = L_{A1}(X_{n+1}) + a ( X1 − L_{A1}(X1) )
           = a X1 + ∑_{i=1}^{n−1} ( b_{n−1,i} − a b_{n−1,n−i} ) X_{n+1−i}.

Hey! Wait! We know that X̂_{n+1} = ∑_{i=1}^{n} b_{ni} X_{n+1−i}.

8. Since Γn is invertible (we are assuming here that γX(0) > 0 and that γX(n) → 0 as n → ∞), the coefficients of X1, . . . , Xn in the best linear predictor are unique, so the two expressions for X̂_{n+1} in (7.) must agree coefficient by coefficient. Equating coefficients gives us

b_{nn} = a,   and   b_{nj} = b_{n−1,j} − a b_{n−1,n−j} for j = 1, . . . , n − 1.

This is Step Two of the DL-algorithm.

9. Now

v_n = E[ ( X_{n+1} − X̂_{n+1} )² ]
    = E[ ( X_{n+1} − L_{A1}(X_{n+1}) − L_{A2}(X_{n+1}) )² ]
    = E[ ( X_{n+1} − L_{X2,...,Xn}(X_{n+1}) )² ] − 2 E[ ( X_{n+1} − L_{A1}(X_{n+1}) ) · L_{A2}(X_{n+1}) ] + E[ ( L_{A2}(X_{n+1}) )² ]
    = v_{n−1} − 2 E[ X_{n+1} L_{A2}(X_{n+1}) ] + E[ a² ( X1 − L_{X2,...,Xn}(X1) )² ]
    = v_{n−1} + a² v_{n−1} − 2 E[ X_{n+1} · a ( X1 − L_{X2,...,Xn}(X1) ) ]
    = (1 + a²) v_{n−1} − 2a E[ X_{n+1} ( X1 − L_{X2,...,Xn}(X1) ) ].

(The first expectation in the third line equals v_{n−1} by step 5, and the L_{A1} part of the cross term drops out by the orthogonality shown in step 2.)

But,

a = b_{nn} = E[ X_{n+1} ( X1 − L_{X2,...,Xn}(X1) ) ] / v_{n−1}.

So, v_n = (1 + a²) v_{n−1} − 2a · a v_{n−1} = (1 − a²) v_{n−1} = (1 − b_{nn}²) v_{n−1},

which is Step Three of the DL-algorithm!
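As a quick numerical sanity check on all of this (reusing the durbin_levinson sketch above): for any valid autocovariance sequence, the coefficients (b_{n1}, . . . , b_{nn}) produced by the recursion should agree with the direct solution of the prediction equations Γ_n b = (γX(1), . . . , γX(n))^t, where Γ_n is the n × n matrix with entries γX(|i − j|). A small illustration, with an AR(1)-type autocovariance sequence chosen only for the check:

    import numpy as np
    from scipy.linalg import toeplitz

    n = 4
    gamma = [0.8 ** h for h in range(n + 1)]        # gamma_X(h) = 0.8^h, a valid ACVF
    b, v = durbin_levinson(gamma, max_lag=n)

    b_direct = np.linalg.solve(toeplitz(gamma[:n]), gamma[1:n + 1])
    print(np.allclose(b[n], b_direct))              # True: the recursion matches the direct solve
    print(np.isclose(v[n], gamma[0] - b[n] @ np.array(gamma[1:n + 1])))  # MSE check: v_n = gamma(0) - sum_i b_ni gamma(i)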

The PACF Connection:

During the proof of the DL-algorithm, we saw that

b_{nn} = a = E[ X_{n+1} ( X1 − L_{X2,...,Xn}(X1) ) ] / E[ ( X1 − L_{X2,...,Xn}(X1) )² ],

and that this may be rewritten (see step 5 of DL-proof) as

= E[ ( X_{n+1} − L_{X2,...,Xn}(X_{n+1}) ) ( X1 − L_{X2,...,Xn}(X1) ) ] / { ( E[ ( X_{n+1} − L_{X2,...,Xn}(X_{n+1}) )² ] )^{1/2} · ( E[ ( X1 − L_{X2,...,Xn}(X1) )² ] )^{1/2} }.

But this is the definition of

Corr( X_{n+1} − L_{X2,...,Xn}(X_{n+1}), X1 − L_{X2,...,Xn}(X1) ),

which is the definition of αX(n), the PACF of {Xt} at lag n.
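In code, the PACF simply falls out of the recursion as the last coefficient produced at each step (a snippet reusing durbin_levinson and the gamma, n from the check above):

    b, v = durbin_levinson(gamma, max_lag=n)
    pacf = [b[k][-1] for k in range(1, n + 1)]   # alpha_X(k) = b_kk
    print(pacf)                                  # roughly [0.8, 0, 0, 0] for that AR(1)-type sequence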

Example: AR(2), Xt = φ1 X_{t−1} + φ2 X_{t−2} + Zt, where {Zt} ∼ WN(0, σ²_Z), and φ1 and φ2 are known (we are doing prediction, not estimation of parameters) and are such that the process is causal.

• We wish to recursively predict Xn+1 for n = 1, 2, . . ., based on previous values, and give the MSE of the predictions.

• We will need γX(0), γX(1), γX(2), . . ., but we can solve for them individually as needed, i.e., we set up the standard equations by multiplying the AR equation by X_{t−k} and taking expectations:

γX(0) − φ1 γX(1) − φ2 γX(2) = σ²_Z
γX(1) − φ1 γX(0) − φ2 γX(1) = 0
γX(2) − φ1 γX(1) − φ2 γX(0) = 0


which give us

γX(0) = (1 − φ2) σ²_Z / [ (1 + φ2)(1 − φ1 − φ2)(1 + φ1 − φ2) ],

γX(1) = φ1 σ²_Z / [ same denominator ],

and

γX(2) = ( φ1² + φ2 − φ2² ) σ²_Z / [ same denominator ].

Since, for k ≥ 2 we have

γX(k) = φ1γX(k − 1) + φ2γX(k − 2),

we can easily get additional γ’s as needed.
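As a check on the algebra, the closed forms for γX(0) and γX(1) plus the recursion above are enough to generate as many autocovariances as we like; a small sketch (the helper name ar2_acvf is illustrative):

    def ar2_acvf(phi1, phi2, sigma2, max_lag):
        """Autocovariances gamma_X(0), ..., gamma_X(max_lag) of a causal AR(2)."""
        denom = (1 + phi2) * (1 - phi1 - phi2) * (1 + phi1 - phi2)   # common denominator
        gamma = [(1 - phi2) * sigma2 / denom, phi1 * sigma2 / denom]
        for k in range(2, max_lag + 1):
            # gamma_X(k) = phi1 * gamma_X(k-1) + phi2 * gamma_X(k-2) for k >= 2
            gamma.append(phi1 * gamma[-1] + phi2 * gamma[-2])
        return gamma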

The DL-algorithm:

n = 1

b00 = 0, v0 = γX(0)

b_{11} = [ γX(1) ] v_0^{−1} = γX(1)/γX(0) = ρX(1) = φ1/(1 − φ2)

v_1 = v_0 (1 − b_{11}²) = γX(0) [ 1 − φ1²/(1 − φ2)² ]

So, the best linear predictor of X2 based on X1 is

X̂_2 = b_{11} X1 = [ φ1/(1 − φ2) ] X1

and the MSE of this predictor is v1.

n = 2

b_{22} = [ γX(2) − b_{11} γX(1) ] v_1^{−1}
       = { γX(2) − [ γX(1)/γX(0) ] γX(1) } / { γX(0) [ 1 − ( γX(1)/γX(0) )² ] }
       = [ ρX(2) − ρX(1)² ] / [ 1 − ρX(1)² ]
       = · · · = φ2

b_{21} = b_{11} − b_{22} b_{11} = [ φ1/(1 − φ2) ] (1 − φ2) = φ1


v_2 = v_1 (1 − b_{22}²) = γX(0) [ 1 − φ1²/(1 − φ2)² ] (1 − φ2²)

So, the best linear predictor of X3 based on X1 and X2 is

X̂_3 = b_{22} X1 + b_{21} X2 = φ2 X1 + φ1 X2

(Hmmm... is this surprising?) and the MSE associated with this prediction is v2.

n = 3

Continuing, we get

b_{33} = · · · = [ ρX(3) − φ1 ρX(2) − φ2 ρX(1) ] / [ 1 − φ1 ρX(1) − φ2 ρX(2) ]

and now the reason for the transformation to this ρ-representation becomes apparent! That numerator is zero for this AR(2) model, since dividing γX(3) = φ1 γX(2) + φ2 γX(1) by γX(0) gives ρX(3) = φ1 ρX(2) + φ2 ρX(1).

Now

( b_{31}, b_{32} )^t = ( b_{21}, b_{22} )^t − b_{33} ( b_{22}, b_{21} )^t = ( b_{21}, b_{22} )^t = ( φ1, φ2 )^t

So, the best linear predictor of X4 given X1, X2, and X3 is

X̂_4 = b_{33} X1 + b_{32} X2 + b_{31} X3 = φ2 X2 + φ1 X3.

(Hmmm, also not surprising!)

In fact, for every n ≥ 3 we will get b_{nn} = 0 and

X̂_{n+1} = φ1 Xn + φ2 X_{n−1}.

Note that we also now know the PACF for this AR(2) model:

αX(1) = b_{11} = φ1/(1 − φ2),   αX(2) = b_{22} = φ2,   αX(3) = b_{33} = 0

and αX(n) = b_{nn} = 0 for all n ≥ 3 (the PACF of an AR(2) cuts off after lag 2).
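Putting the pieces together, a quick numerical check of this worked example (reusing the ar2_acvf and durbin_levinson sketches from earlier; the parameter values are illustrative and satisfy the causality conditions φ1 + φ2 < 1, φ2 − φ1 < 1, |φ2| < 1):

    phi1, phi2, sigma2 = 0.5, 0.3, 1.0
    gamma = ar2_acvf(phi1, phi2, sigma2, max_lag=5)
    b, v = durbin_levinson(gamma, max_lag=5)

    print(b[1], [phi1 / (1 - phi2)])             # b_11 = phi1 / (1 - phi2)
    print(b[2], [phi1, phi2])                    # (b_21, b_22) = (phi1, phi2)
    print(b[3], [phi1, phi2, 0.0])               # (b_31, b_32, b_33) = (phi1, phi2, 0)
    print(v[2], gamma[0] * (1 - (phi1 / (1 - phi2)) ** 2) * (1 - phi2 ** 2))   # v_2 as derived above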