Introduction to Time Series Analysis. Lecture 15. · Introduction to Time Series Analysis. Lecture 16. 1. Review: Spectral density 2. Examples 3. Spectral distribution function. 4

Introduction to Time Series Analysis. Lecture 15.

Spectral Analysis

1. Spectral density: Facts and examples.

2. Spectral distribution function.

3. Wold’s decomposition.


Spectral Analysis

Idea: decompose a stationary time series $\{X_t\}$ into a combination of sinusoids, with random (and uncorrelated) coefficients.

Just as in Fourier analysis, where we decompose (deterministic) functions into combinations of sinusoids.

This is referred to as 'spectral analysis' or analysis in the 'frequency domain,' in contrast to the time domain approach we have considered so far.

The frequency domain approach considers regression on sinusoids; the time domain approach considers regression on past values of the time series.

A periodic time series

Consider
\[ X_t = A\sin(2\pi\nu t) + B\cos(2\pi\nu t) = C\sin(2\pi\nu t + \phi), \]
where $A, B$ are uncorrelated, mean zero, with variance $\sigma^2 = 1$, and $C^2 = A^2 + B^2$, $\tan\phi = B/A$. Then
\[ \mu_t = E[X_t] = 0, \qquad \gamma(t, t+h) = \cos(2\pi\nu h). \]
So $\{X_t\}$ is stationary.
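The autocovariance above can be checked exactly in a small special case. The sketch below assumes, purely for illustration, that A and B are independent and uniform on {-1, +1} (so they have mean zero, variance 1, and are uncorrelated); the function name is hypothetical. It averages X_t X_{t+h} over the four equally likely outcomes.

```python
import itertools
import math

def autocov_random_sinusoid(nu, t, h):
    """Exact E[X_t X_{t+h}] when A, B are independent and uniform on {-1, +1}."""
    outcomes = list(itertools.product([-1.0, 1.0], repeat=2))
    total = 0.0
    for a, b in outcomes:
        x_t = a * math.sin(2 * math.pi * nu * t) + b * math.cos(2 * math.pi * nu * t)
        x_th = (a * math.sin(2 * math.pi * nu * (t + h))
                + b * math.cos(2 * math.pi * nu * (t + h)))
        total += x_t * x_th
    return total / len(outcomes)

nu, h = 0.1, 3
# The result should not depend on t, and should equal cos(2 pi nu h).
values = [autocov_random_sinusoid(nu, t, h) for t in range(5)]
target = math.cos(2 * math.pi * nu * h)
```

Every entry of `values` agrees with `target` regardless of t, which is the stationarity claim on the slide.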

An aside: Some trigonometric identities

\[ \tan\theta = \frac{\sin\theta}{\cos\theta}, \qquad \sin^2\theta + \cos^2\theta = 1, \]
\[ \sin(a+b) = \sin a\cos b + \cos a\sin b, \qquad \cos(a+b) = \cos a\cos b - \sin a\sin b. \]


For $X_t = A\sin(2\pi\nu t) + B\cos(2\pi\nu t)$, with uncorrelated $A, B$ (mean 0, variance $\sigma^2$), $\gamma(h) = \sigma^2\cos(2\pi\nu h)$.

The autocovariance of the sum of two uncorrelated time series is the sum of their autocovariances. Thus, the autocovariance of a sum of random sinusoids is a sum of sinusoids with the corresponding frequencies:
\[ X_t = \sum_{j=1}^{k}\left(A_j\sin(2\pi\nu_j t) + B_j\cos(2\pi\nu_j t)\right), \qquad \gamma(h) = \sum_{j=1}^{k}\sigma_j^2\cos(2\pi\nu_j h), \]
where $A_j, B_j$ are uncorrelated, mean zero, and $\mathrm{Var}(A_j) = \mathrm{Var}(B_j) = \sigma_j^2$.


\[ X_t = \sum_{j=1}^{k}\left(A_j\sin(2\pi\nu_j t) + B_j\cos(2\pi\nu_j t)\right), \qquad \gamma(h) = \sum_{j=1}^{k}\sigma_j^2\cos(2\pi\nu_j h). \]
Thus, we can represent $\gamma(h)$ using a Fourier series. The coefficients are the variances of the sinusoidal components.

The spectral density is the continuous analog: the Fourier transform of $\gamma$.

(The analogous spectral representation of a stationary process $X_t$ involves a stochastic integral; a sum of discrete components at a finite number of frequencies is a special case. We won't consider this representation in this course.)

Spectral density

If a time series $\{X_t\}$ has autocovariance $\gamma$ satisfying $\sum_{h=-\infty}^{\infty}|\gamma(h)| < \infty$, then we define its spectral density as
\[ f(\nu) = \sum_{h=-\infty}^{\infty}\gamma(h)e^{-2\pi i\nu h} \]
for $-\infty < \nu < \infty$.

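Spectral density: Some facts

1. We have $\sum_{h=-\infty}^{\infty}\left|\gamma(h)e^{-2\pi i\nu h}\right| < \infty$.
This is because $|e^{i\theta}| = |\cos\theta + i\sin\theta| = (\cos^2\theta + \sin^2\theta)^{1/2} = 1$, and because of the absolute summability of $\gamma$.

2. $f$ is periodic, with period 1.
This is true since $e^{-2\pi i\nu h}$ is a periodic function of $\nu$ with period 1. Thus, we can restrict the domain of $f$ to $-1/2 \le \nu \le 1/2$. (The text does this.)

3. $f$ is even (that is, $f(\nu) = f(-\nu)$). To see this, write
\[ f(\nu) = \sum_{h=-\infty}^{-1}\gamma(h)e^{-2\pi i\nu h} + \gamma(0) + \sum_{h=1}^{\infty}\gamma(h)e^{-2\pi i\nu h}, \]
\begin{align*}
f(-\nu) &= \sum_{h=-\infty}^{-1}\gamma(h)e^{-2\pi i\nu(-h)} + \gamma(0) + \sum_{h=1}^{\infty}\gamma(h)e^{-2\pi i\nu(-h)} \\
&= \sum_{h=1}^{\infty}\gamma(-h)e^{-2\pi i\nu h} + \gamma(0) + \sum_{h=-\infty}^{-1}\gamma(-h)e^{-2\pi i\nu h} \\
&= f(\nu),
\end{align*}
where the last step uses $\gamma(-h) = \gamma(h)$.

4. $f(\nu) \ge 0$.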


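5. $\gamma(h) = \int_{-1/2}^{1/2} e^{2\pi i\nu h}f(\nu)\,d\nu$.
\begin{align*}
\int_{-1/2}^{1/2} e^{2\pi i\nu h}f(\nu)\,d\nu &= \int_{-1/2}^{1/2}\sum_{j=-\infty}^{\infty} e^{-2\pi i\nu(j-h)}\gamma(j)\,d\nu \\
&= \sum_{j=-\infty}^{\infty}\gamma(j)\int_{-1/2}^{1/2} e^{-2\pi i\nu(j-h)}\,d\nu \\
&= \gamma(h) + \sum_{j\ne h}\frac{\gamma(j)}{2\pi i(j-h)}\left(e^{\pi i(j-h)} - e^{-\pi i(j-h)}\right) \\
&= \gamma(h) + \sum_{j\ne h}\frac{\gamma(j)\sin(\pi(j-h))}{\pi(j-h)} = \gamma(h).
\end{align*}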

Example: White noise

For white noise $\{W_t\}$, we have seen that $\gamma(0) = \sigma_w^2$ and $\gamma(h) = 0$ for $h \ne 0$. Thus,
\[ f(\nu) = \sum_{h=-\infty}^{\infty}\gamma(h)e^{-2\pi i\nu h} = \gamma(0) = \sigma_w^2. \]
That is, the spectral density is constant across all frequencies: each frequency in the spectrum contributes equally to the variance. This is the origin of the name white noise: it is like white light, which is a uniform mixture of all frequencies in the visible spectrum.

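Example: AR(1)

For $X_t = \phi_1 X_{t-1} + W_t$, we have seen that $\gamma(h) = \sigma_w^2\phi_1^{|h|}/(1-\phi_1^2)$. Thus,
\begin{align*}
f(\nu) &= \sum_{h=-\infty}^{\infty}\gamma(h)e^{-2\pi i\nu h} = \frac{\sigma_w^2}{1-\phi_1^2}\sum_{h=-\infty}^{\infty}\phi_1^{|h|}e^{-2\pi i\nu h} \\
&= \frac{\sigma_w^2}{1-\phi_1^2}\left(1 + \sum_{h=1}^{\infty}\phi_1^{h}\left(e^{-2\pi i\nu h} + e^{2\pi i\nu h}\right)\right) \\
&= \frac{\sigma_w^2}{1-\phi_1^2}\left(1 + \frac{\phi_1 e^{-2\pi i\nu}}{1-\phi_1 e^{-2\pi i\nu}} + \frac{\phi_1 e^{2\pi i\nu}}{1-\phi_1 e^{2\pi i\nu}}\right) \\
&= \frac{\sigma_w^2}{1-\phi_1^2}\cdot\frac{1-\phi_1 e^{-2\pi i\nu}\,\phi_1 e^{2\pi i\nu}}{(1-\phi_1 e^{-2\pi i\nu})(1-\phi_1 e^{2\pi i\nu})} \\
&= \frac{\sigma_w^2}{1-2\phi_1\cos(2\pi\nu)+\phi_1^2}.
\end{align*}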
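As a numerical sanity check of the AR(1) derivation, the following sketch compares the closed form with a truncated version of the defining sum over the autocovariances. The function names, the truncation lag of 500 and the choice of $\phi_1 = 0.9$ are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def ar1_spectral_density(nu, phi1, sigma2_w=1.0):
    """Closed form sigma_w^2 / (1 - 2 phi1 cos(2 pi nu) + phi1^2)."""
    return sigma2_w / (1.0 - 2.0 * phi1 * np.cos(2.0 * np.pi * nu) + phi1 ** 2)

def spectral_density_from_acvf(nu, phi1, sigma2_w=1.0, max_lag=500):
    """Truncated sum over |h| <= max_lag of gamma(h) exp(-2 pi i nu h)."""
    h = np.arange(-max_lag, max_lag + 1)
    gamma = sigma2_w * phi1 ** np.abs(h) / (1.0 - phi1 ** 2)
    # gamma is even, so the imaginary parts cancel and the sum is real.
    return np.real(np.sum(gamma * np.exp(-2j * np.pi * nu * h)))

nus = np.linspace(0.0, 0.5, 11)
closed = np.array([ar1_spectral_density(nu, 0.9) for nu in nus])
summed = np.array([spectral_density_from_acvf(nu, 0.9) for nu in nus])
```

With $\phi_1 = 0.9$ and $\sigma_w^2 = 1$, the value at $\nu = 0$ is $1/(1-0.9)^2 = 100$, and the two arrays agree to floating-point accuracy because $0.9^{500}$ is negligible.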

Examples

White noise: $\{W_t\}$, $\gamma(0) = \sigma_w^2$ and $\gamma(h) = 0$ for $h \ne 0$.
\[ f(\nu) = \gamma(0) = \sigma_w^2. \]

AR(1): $X_t = \phi_1 X_{t-1} + W_t$, $\gamma(h) = \sigma_w^2\phi_1^{|h|}/(1-\phi_1^2)$.
\[ f(\nu) = \frac{\sigma_w^2}{1-2\phi_1\cos(2\pi\nu)+\phi_1^2}. \]

If $\phi_1 > 0$ (positive autocorrelation), the spectrum is dominated by low frequency components (smooth in the time domain).

If $\phi_1 < 0$ (negative autocorrelation), the spectrum is dominated by high frequency components (rough in the time domain).

Example: AR(1)

[Figure: Spectral density of AR(1): $X_t = +0.9X_{t-1} + W_t$. Horizontal axis: $\nu$ from 0 to 0.5; vertical axis: $f(\nu)$ from 0 to 100.]

[Figure: Spectral density of AR(1): $X_t = -0.9X_{t-1} + W_t$. Horizontal axis: $\nu$ from 0 to 0.5; vertical axis: $f(\nu)$ from 0 to 100.]


Introduction to Time Series Analysis. Lecture 16.

1. Review: Spectral density

2. Examples

3. Spectral distribution function.

4. Autocovariance generating function and spectral density.

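Review: Spectral density

If a time series $\{X_t\}$ has autocovariance $\gamma$ satisfying $\sum_{h=-\infty}^{\infty}|\gamma(h)| < \infty$, then we define its spectral density as
\[ f(\nu) = \sum_{h=-\infty}^{\infty}\gamma(h)e^{-2\pi i\nu h} \]
for $-\infty < \nu < \infty$.

1. $f(\nu)$ is real.

2. $f(\nu) \ge 0$.

3. $f$ is periodic, with period 1. So we restrict the domain of $f$ to $-1/2 \le \nu \le 1/2$.

4. $f$ is even (that is, $f(\nu) = f(-\nu)$).

5. $\gamma(h) = \int_{-1/2}^{1/2} e^{2\pi i\nu h}f(\nu)\,d\nu$.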


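Example: MA(1)

$X_t = W_t + \theta_1 W_{t-1}$.
\[ \gamma(h) = \begin{cases} \sigma_w^2(1+\theta_1^2) & \text{if } h = 0, \\ \sigma_w^2\theta_1 & \text{if } |h| = 1, \\ 0 & \text{otherwise.} \end{cases} \]
\begin{align*}
f(\nu) &= \sum_{h=-1}^{1}\gamma(h)e^{-2\pi i\nu h} \\
&= \gamma(0) + 2\gamma(1)\cos(2\pi\nu) \\
&= \sigma_w^2\left(1 + \theta_1^2 + 2\theta_1\cos(2\pi\nu)\right).
\end{align*}

If $\theta_1 > 0$ (positive autocorrelation), the spectrum is dominated by low frequency components (smooth in the time domain).

If $\theta_1 < 0$ (negative autocorrelation), the spectrum is dominated by high frequency components (rough in the time domain).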

Example: MA(1)

[Figure: Spectral density of MA(1): $X_t = W_t + 0.9W_{t-1}$. Horizontal axis: $\nu$ from 0 to 0.5; vertical axis: $f(\nu)$ from 0 to 4.]

[Figure: Spectral density of MA(1): $X_t = W_t - 0.9W_{t-1}$. Horizontal axis: $\nu$ from 0 to 0.5; vertical axis: $f(\nu)$ from 0 to 4.]



5. Rational spectra. Poles and zeros.

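Recall: A periodic time series

\begin{align*}
X_t &= \sum_{j=1}^{k}\left(A_j\sin(2\pi\nu_j t) + B_j\cos(2\pi\nu_j t)\right) \\
&= \sum_{j=1}^{k}(A_j^2 + B_j^2)^{1/2}\sin\left(2\pi\nu_j t + \tan^{-1}(B_j/A_j)\right).
\end{align*}
\[ E[X_t] = 0, \qquad \gamma(h) = \sum_{j=1}^{k}\sigma_j^2\cos(2\pi\nu_j h), \qquad \sum_{h}|\gamma(h)| = \infty. \]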

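Discrete spectral distribution function

For $X_t = A\sin(2\pi\lambda t) + B\cos(2\pi\lambda t)$, we have $\gamma(h) = \sigma^2\cos(2\pi\lambda h)$, and we can write
\[ \gamma(h) = \int_{-1/2}^{1/2} e^{2\pi i\nu h}\,dF(\nu), \]
where $F$ is the discrete distribution
\[ F(\nu) = \begin{cases} 0 & \text{if } \nu < -\lambda, \\ \sigma^2/2 & \text{if } -\lambda \le \nu < \lambda, \\ \sigma^2 & \text{otherwise.} \end{cases} \]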
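A short worked check of this representation may help (it is an added step, assuming $0 < \lambda < 1/2$ so that both jumps lie inside the range of integration). Since $F$ jumps by $\sigma^2/2$ at $\nu = -\lambda$ and again at $\nu = \lambda$, the integral reduces to a two-term sum:

```latex
\begin{align*}
\int_{-1/2}^{1/2} e^{2\pi i\nu h}\,dF(\nu)
  &= \frac{\sigma^2}{2}e^{-2\pi i\lambda h} + \frac{\sigma^2}{2}e^{2\pi i\lambda h} \\
  &= \sigma^2\cos(2\pi\lambda h) = \gamma(h).
\end{align*}
```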

The spectral distribution function

For any stationary $\{X_t\}$ with autocovariance $\gamma$, we can write
\[ \gamma(h) = \int_{-1/2}^{1/2} e^{2\pi i\nu h}\,dF(\nu), \]
where $F$ is the spectral distribution function of $\{X_t\}$.

We can split $F$ into three components: discrete, continuous, and singular.

If $\gamma$ is absolutely summable, $F$ is continuous: $dF(\nu) = f(\nu)\,d\nu$.

If $\gamma$ is a sum of sinusoids, $F$ is discrete.

For $X_t = \sum_{j=1}^{k}\left(A_j\sin(2\pi\nu_j t) + B_j\cos(2\pi\nu_j t)\right)$, the spectral distribution function is $F(\nu) = \sum_{j=1}^{k}\sigma_j^2F_j(\nu)$, where
\[ F_j(\nu) = \begin{cases} 0 & \text{if } \nu < -\nu_j, \\ 1/2 & \text{if } -\nu_j \le \nu < \nu_j, \\ 1 & \text{otherwise.} \end{cases} \]

Wold's decomposition

Notice that $X_t = \sum_{j=1}^{k}\left(A_j\sin(2\pi\nu_j t) + B_j\cos(2\pi\nu_j t)\right)$ is deterministic (once we've seen the past, we can predict the future without error).

Wold showed that every stationary process can be represented as
\[ X_t = X_t^{(d)} + X_t^{(n)}, \]
where $X_t^{(d)}$ is purely deterministic and $X_t^{(n)}$ is purely nondeterministic.

(c.f. the decomposition of a spectral distribution function as $F^{(d)} + F^{(c)}$.)

Example: $X_t = A\sin(2\pi\lambda t) + \dfrac{\theta(B)}{\phi(B)}W_t$.




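Autocovariance generating function and spectral density

Suppose $X_t$ is a linear process, so it can be written $X_t = \sum_{i=0}^{\infty}\psi_iW_{t-i} = \psi(B)W_t$.

Consider the autocovariance sequence,
\begin{align*}
\gamma_h &= \mathrm{Cov}(X_t, X_{t+h}) \\
&= E\left[\sum_{i=0}^{\infty}\psi_iW_{t-i}\sum_{j=0}^{\infty}\psi_jW_{t+h-j}\right] \\
&= \sigma_w^2\sum_{i=0}^{\infty}\psi_i\psi_{i+h}.
\end{align*}

Define the autocovariance generating function as
\[ \gamma(B) = \sum_{h=-\infty}^{\infty}\gamma_hB^h. \]
Then,
\begin{align*}
\gamma(B) &= \sigma_w^2\sum_{h=-\infty}^{\infty}\sum_{i=0}^{\infty}\psi_i\psi_{i+h}B^h \\
&= \sigma_w^2\sum_{i=0}^{\infty}\sum_{j=0}^{\infty}\psi_i\psi_jB^{j-i} \\
&= \sigma_w^2\sum_{i=0}^{\infty}\psi_iB^{-i}\sum_{j=0}^{\infty}\psi_jB^{j} = \sigma_w^2\psi(B^{-1})\psi(B).
\end{align*}

Notice that
\[ \gamma(B) = \sum_{h=-\infty}^{\infty}\gamma_hB^h, \]
\begin{align*}
f(\nu) &= \sum_{h=-\infty}^{\infty}\gamma_he^{-2\pi i\nu h} \\
&= \gamma\left(e^{-2\pi i\nu}\right) \\
&= \sigma_w^2\,\psi\left(e^{-2\pi i\nu}\right)\psi\left(e^{2\pi i\nu}\right) \\
&= \sigma_w^2\left|\psi\left(e^{2\pi i\nu}\right)\right|^2.
\end{align*}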


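For example, for an MA(q), we have $\psi(B) = \theta(B)$, so
\[ f(\nu) = \sigma_w^2\,\theta\left(e^{-2\pi i\nu}\right)\theta\left(e^{2\pi i\nu}\right) = \sigma_w^2\left|\theta\left(e^{-2\pi i\nu}\right)\right|^2. \]
For MA(1),
\begin{align*}
f(\nu) &= \sigma_w^2\left|1 + \theta_1e^{-2\pi i\nu}\right|^2 \\
&= \sigma_w^2\left|1 + \theta_1\cos(-2\pi\nu) + i\theta_1\sin(-2\pi\nu)\right|^2 \\
&= \sigma_w^2\left(1 + 2\theta_1\cos(2\pi\nu) + \theta_1^2\right).
\end{align*}

For an AR(p), we have $\psi(B) = 1/\phi(B)$, so
\[ f(\nu) = \frac{\sigma_w^2}{\phi\left(e^{-2\pi i\nu}\right)\phi\left(e^{2\pi i\nu}\right)} = \frac{\sigma_w^2}{\left|\phi\left(e^{-2\pi i\nu}\right)\right|^2}. \]
For AR(1),
\[ f(\nu) = \frac{\sigma_w^2}{\left|1 - \phi_1e^{-2\pi i\nu}\right|^2} = \frac{\sigma_w^2}{1 - 2\phi_1\cos(2\pi\nu) + \phi_1^2}. \]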
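The MA(q) and AR(p) formulas above suggest a single routine that evaluates both polynomials on the unit circle. The sketch below is one possible way to do that; the function name `arma_spectral_density` and its argument layout are hypothetical, and the sign convention assumed is $\phi(z) = 1 - \phi_1z - \cdots - \phi_pz^p$, $\theta(z) = 1 + \theta_1z + \cdots + \theta_qz^q$.

```python
import numpy as np

def arma_spectral_density(nu, ar=(), ma=(), sigma2_w=1.0):
    """f(nu) = sigma_w^2 |theta(z)|^2 / |phi(z)|^2 at z = exp(-2 pi i nu).

    ar = (phi_1, ..., phi_p) for phi(z) = 1 - phi_1 z - ... - phi_p z^p,
    ma = (theta_1, ..., theta_q) for theta(z) = 1 + theta_1 z + ... + theta_q z^q.
    """
    nu = np.asarray(nu, dtype=float)
    z = np.exp(-2j * np.pi * nu)
    phi = np.ones_like(z)
    for k, coef in enumerate(ar, start=1):
        phi = phi - coef * z ** k
    theta = np.ones_like(z)
    for k, coef in enumerate(ma, start=1):
        theta = theta + coef * z ** k
    return sigma2_w * np.abs(theta) ** 2 / np.abs(phi) ** 2

nus = np.linspace(0.0, 0.5, 6)
f_ar1 = arma_spectral_density(nus, ar=(0.9,))   # AR(1) with phi_1 = 0.9
f_ma1 = arma_spectral_density(nus, ma=(0.9,))   # MA(1) with theta_1 = 0.9
```

For the two one-parameter cases the output should match the closed forms derived on the slides.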

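Spectral density of a linear process

If $X_t$ is a linear process, it can be written $X_t = \sum_{i=0}^{\infty}\psi_iW_{t-i} = \psi(B)W_t$. Then
\[ f(\nu) = \sigma_w^2\left|\psi\left(e^{-2\pi i\nu}\right)\right|^2. \]
That is, the spectral density $f(\nu)$ of a linear process is $\sigma_w^2$ times the squared modulus of the $\psi$ (MA($\infty$)) polynomial at the point $e^{2\pi i\nu}$ on the unit circle.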


1. Review: Spectral distribution function, spectral density.


3. Examples.

4. Time-invariant linear filters

5. Frequency response


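Review: Spectral density and spectral distribution function

If a time series $\{X_t\}$ has autocovariance $\gamma$ satisfying $\sum_{h=-\infty}^{\infty}|\gamma(h)| < \infty$, then we define its spectral density as
\[ f(\nu) = \sum_{h=-\infty}^{\infty}\gamma(h)e^{-2\pi i\nu h} \]
for $-\infty < \nu < \infty$. We have
\[ \gamma(h) = \int_{-1/2}^{1/2} e^{2\pi i\nu h}f(\nu)\,d\nu = \int_{-1/2}^{1/2} e^{2\pi i\nu h}\,dF(\nu), \]
where $dF(\nu) = f(\nu)\,d\nu$.

$f$ measures how the variance of $X_t$ is distributed across the spectrum.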


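Spectral density of a linear process

For an ARMA(p,q), $\psi(B) = \theta(B)/\phi(B)$, so
\[ f(\nu) = \sigma_w^2\,\frac{\theta\left(e^{-2\pi i\nu}\right)\theta\left(e^{2\pi i\nu}\right)}{\phi\left(e^{-2\pi i\nu}\right)\phi\left(e^{2\pi i\nu}\right)} = \sigma_w^2\left|\frac{\theta\left(e^{-2\pi i\nu}\right)}{\phi\left(e^{-2\pi i\nu}\right)}\right|^2. \]
This is known as a rational spectrum.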

Rational spectra

Consider the factorization of $\theta$ and $\phi$ as
\[ \theta(z) = \theta_q(z - z_1)(z - z_2)\cdots(z - z_q), \]
\[ \phi(z) = \phi_p(z - p_1)(z - p_2)\cdots(z - p_p), \]
where $z_1, \ldots, z_q$ and $p_1, \ldots, p_p$ are called the zeros and poles.
\[ f(\nu) = \sigma_w^2\left|\frac{\theta_q\prod_{j=1}^{q}\left(e^{-2\pi i\nu} - z_j\right)}{\phi_p\prod_{j=1}^{p}\left(e^{-2\pi i\nu} - p_j\right)}\right|^2 = \sigma_w^2\,\frac{\theta_q^2\prod_{j=1}^{q}\left|e^{-2\pi i\nu} - z_j\right|^2}{\phi_p^2\prod_{j=1}^{p}\left|e^{-2\pi i\nu} - p_j\right|^2}. \]

As $\nu$ varies from 0 to 1/2, $e^{-2\pi i\nu}$ moves clockwise around the unit circle from 1 to $e^{-\pi i} = -1$.

The value of $f(\nu)$ goes up as this point moves closer to the poles $p_j$, and also as it moves further from the zeros $z_j$.

Example: ARMA

Recall AR(1): $\phi(z) = 1 - \phi_1z$. The pole is at $1/\phi_1$. If $\phi_1 > 0$, the pole is to the right of 1, so the spectral density decreases as $\nu$ moves away from 0. If $\phi_1 < 0$, the pole is to the left of $-1$, so the spectral density is at its maximum when $\nu = 0.5$.

Recall MA(1): $\theta(z) = 1 + \theta_1z$. The zero is at $-1/\theta_1$. If $\theta_1 > 0$, the zero is to the left of $-1$, so the spectral density decreases as $e^{-2\pi i\nu}$ moves towards $-1$ (that is, as $\nu$ increases towards 0.5). If $\theta_1 < 0$, the zero is to the right of 1, so the spectral density is at its minimum when $\nu = 0$.

(These positions assume $|\phi_1| < 1$ and $|\theta_1| < 1$.)

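Example: AR(2)

Consider $X_t = \phi_1X_{t-1} + \phi_2X_{t-2} + W_t$. Example 4.6 in the text considers this model with $\phi_1 = 1$, $\phi_2 = -0.9$, and $\sigma_w^2 = 1$. In this case, the poles are at
\[ p_1, p_2 \approx 0.5555 \pm i\,0.8958 \approx 1.054\,e^{\pm i\,1.01567} \approx 1.054\,e^{\pm 2\pi i\,0.16165}. \]
Thus, we have
\[ f(\nu) = \frac{\sigma_w^2}{\phi_2^2\left|e^{-2\pi i\nu} - p_1\right|^2\left|e^{-2\pi i\nu} - p_2\right|^2}, \]
and this gets very peaked when $e^{-2\pi i\nu}$ passes near $1.054\,e^{-2\pi i\,0.16165}$.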

Example: AR(2)

[Figure: Spectral density of AR(2): $X_t = X_{t-1} - 0.9X_{t-2} + W_t$. Horizontal axis: $\nu$ from 0 to 0.5; vertical axis: $f(\nu)$ from 0 to 140.]

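Example: Seasonal ARMA

Consider $X_t = \Phi_1X_{t-12} + W_t$.
\[ \psi(B) = \frac{1}{1 - \Phi_1B^{12}}, \]
\begin{align*}
f(\nu) &= \sigma_w^2\,\frac{1}{\left(1 - \Phi_1e^{-2\pi i12\nu}\right)\left(1 - \Phi_1e^{2\pi i12\nu}\right)} \\
&= \sigma_w^2\,\frac{1}{1 - 2\Phi_1\cos(24\pi\nu) + \Phi_1^2}.
\end{align*}
Notice that $f(\nu)$ is periodic with period 1/12.

[Figure: Spectral density of AR(1)$_{12}$: $X_t = +0.2X_{t-12} + W_t$. Horizontal axis: $\nu$ from 0 to 0.5; vertical axis: $f(\nu)$ from 0 to 1.6.]

Another view:
\[ 1 - \Phi_1z^{12} = 0 \iff z = re^{i\theta}, \quad\text{with } r = |\Phi_1|^{-1/12},\ e^{i12\theta} = e^{-i\arg(\Phi_1)}. \]
For $\Phi_1 > 0$, the twelve poles are at $|\Phi_1|^{-1/12}e^{ik\pi/6}$ for $k = 0, \pm1, \ldots, \pm5, 6$.

So the spectral density gets peaked as $e^{-2\pi i\nu}$ passes near
\[ |\Phi_1|^{-1/12}\times\left\{1, e^{-i\pi/6}, e^{-i\pi/3}, e^{-i\pi/2}, e^{-i2\pi/3}, e^{-i5\pi/6}, -1\right\}. \]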

Example: Multiplicative seasonal ARMA

Consider $(1 - \Phi_1B^{12})(1 - \phi_1B)X_t = W_t$.
\[ f(\nu) = \sigma_w^2\,\frac{1}{\left(1 - 2\Phi_1\cos(24\pi\nu) + \Phi_1^2\right)\left(1 - 2\phi_1\cos(2\pi\nu) + \phi_1^2\right)}. \]
This is a scaled product of the AR(1) spectrum and the (periodic) AR(1)$_{12}$ spectrum.

The AR(1)$_{12}$ poles give peaks when $e^{-2\pi i\nu}$ is at one of the 12th roots of 1; the AR(1) poles give a peak near $e^{-2\pi i\nu} = 1$.

[Figure: Spectral density of AR(1)AR(1)$_{12}$: $(1 + 0.5B)(1 + 0.2B^{12})X_t = W_t$. Horizontal axis: $\nu$ from 0 to 0.5; vertical axis: $f(\nu)$ from 0 to 7.]





Time-invariant linear filters

A filter is an operator; given a time series $\{X_t\}$, it maps to a time series $\{Y_t\}$. We can think of a linear process $X_t = \sum_{j=0}^{\infty}\psi_jW_{t-j}$ as the output of a causal linear filter with a white noise input.

A time series $\{Y_t\}$ is the output of a linear filter $A = \{a_{t,j} : t, j \in \mathbb{Z}\}$ with input $\{X_t\}$ if
\[ Y_t = \sum_{j=-\infty}^{\infty}a_{t,j}X_j. \]
If $a_{t,t-j}$ is independent of $t$ ($a_{t,t-j} = \psi_j$), then we say that the filter is time-invariant.

If $\psi_j = 0$ for $j < 0$, we say the filter $\psi$ is causal.

We'll see that the name 'filter' arises from the frequency domain viewpoint.

Time-invariant linear filters: Examples

1. $Y_t = X_{-t}$ is linear, but not time-invariant.

2. $Y_t = \frac{1}{3}(X_{t-1} + X_t + X_{t+1})$ is linear, time-invariant, but not causal:
\[ \psi_j = \begin{cases} 1/3 & \text{if } |j| \le 1, \\ 0 & \text{otherwise.} \end{cases} \]

3. For polynomials $\phi(B)$, $\theta(B)$ with roots outside the unit circle, $\psi(B) = \theta(B)/\phi(B)$ is a linear, time-invariant, causal filter.


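The operation
\[ \sum_{j=-\infty}^{\infty}\psi_jX_{t-j} \]
is called the convolution of $X$ with $\psi$.

The sequence $\psi$ is also called the impulse response, since the output $\{Y_t\}$ of the linear filter in response to a unit impulse,
\[ X_t = \begin{cases} 1 & \text{if } t = 0, \\ 0 & \text{otherwise,} \end{cases} \]
is
\[ Y_t = \psi(B)X_t = \sum_{j=-\infty}^{\infty}\psi_jX_{t-j} = \psi_t. \]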
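The impulse-response property is easy to see numerically for a short causal filter. In this sketch the three weights in `psi` are made-up values chosen only for illustration, and the infinite convolution is replaced by `numpy.convolve` on finite arrays.

```python
import numpy as np

psi = np.array([0.5, 0.3, 0.2])   # hypothetical causal weights psi_0, psi_1, psi_2
impulse = np.zeros(8)
impulse[0] = 1.0                   # X_t = 1 at t = 0, and 0 otherwise

# Full convolution y[t] = sum_j psi[j] * impulse[t - j], kept for t = 0..7.
response = np.convolve(psi, impulse)[: len(impulse)]
```

The array `response` starts with the filter weights themselves and is zero afterwards, which is the statement $Y_t = \psi_t$.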





Frequency response of a time-invariant linear filter

Suppose that $\{X_t\}$ has spectral density $f_x(\nu)$ and $\psi$ is stable, that is, $\sum_{j=-\infty}^{\infty}|\psi_j| < \infty$. Then $Y_t = \psi(B)X_t$ has spectral density
\[ f_y(\nu) = \left|\psi\left(e^{2\pi i\nu}\right)\right|^2f_x(\nu). \]
The function $\nu \mapsto \psi(e^{2\pi i\nu})$ (the polynomial $\psi(z)$ evaluated on the unit circle) is known as the frequency response or transfer function of the linear filter.

The squared modulus, $\nu \mapsto |\psi(e^{2\pi i\nu})|^2$, is known as the power transfer function of the filter.

We have seen that a linear process, $Y_t = \psi(B)W_t$, is a special case, since
\[ f_y(\nu) = |\psi(e^{2\pi i\nu})|^2\sigma_w^2 = |\psi(e^{2\pi i\nu})|^2f_w(\nu). \]
When we pass a time series $\{X_t\}$ through a linear filter, the spectral density is multiplied, frequency-by-frequency, by the squared modulus of the frequency response, $\nu \mapsto |\psi(e^{2\pi i\nu})|^2$.

This is a version of the equality $\mathrm{Var}(aX) = a^2\mathrm{Var}(X)$, but the equality is true for the component of the variance at every frequency.

This is also the origin of the name 'filter.'

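Frequency response of a filter: Details

Why is $f_y(\nu) = \left|\psi\left(e^{2\pi i\nu}\right)\right|^2f_x(\nu)$? First,
\begin{align*}
\gamma_y(h) &= E\left[\sum_{j=-\infty}^{\infty}\psi_jX_{t-j}\sum_{k=-\infty}^{\infty}\psi_kX_{t+h-k}\right] \\
&= \sum_{j=-\infty}^{\infty}\psi_j\sum_{k=-\infty}^{\infty}\psi_kE\left[X_{t+h-k}X_{t-j}\right] \\
&= \sum_{j=-\infty}^{\infty}\psi_j\sum_{k=-\infty}^{\infty}\psi_k\gamma_x(h+j-k) = \sum_{j=-\infty}^{\infty}\psi_j\sum_{l=-\infty}^{\infty}\psi_{h+j-l}\gamma_x(l).
\end{align*}
It is easy to check that $\sum_{j=-\infty}^{\infty}|\psi_j| < \infty$ and $\sum_{h=-\infty}^{\infty}|\gamma_x(h)| < \infty$ imply that $\sum_{h=-\infty}^{\infty}|\gamma_y(h)| < \infty$. Thus, the spectral density of $y$ is defined.

\begin{align*}
f_y(\nu) &= \sum_{h=-\infty}^{\infty}\gamma_y(h)e^{-2\pi i\nu h} \\
&= \sum_{h=-\infty}^{\infty}\sum_{j=-\infty}^{\infty}\psi_j\sum_{l=-\infty}^{\infty}\psi_{h+j-l}\gamma_x(l)e^{-2\pi i\nu h} \\
&= \sum_{j=-\infty}^{\infty}\psi_je^{2\pi i\nu j}\sum_{l=-\infty}^{\infty}\gamma_x(l)e^{-2\pi i\nu l}\sum_{h=-\infty}^{\infty}\psi_{h+j-l}e^{-2\pi i\nu(h+j-l)} \\
&= \psi\left(e^{2\pi i\nu}\right)f_x(\nu)\sum_{h=-\infty}^{\infty}\psi_he^{-2\pi i\nu h} \\
&= \left|\psi\left(e^{2\pi i\nu}\right)\right|^2f_x(\nu).
\end{align*}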
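The identity can be checked numerically on a small example where every sum is finite. The sketch below assumes an MA(1) input with $\theta_1 = 0.6$ and $\sigma_w^2 = 1$, filtered by a three-point moving average; these choices and all function names are illustrative. It computes $\gamma_y$ from the double sum, builds $f_y$ from $\gamma_y$, and compares with $|\psi(e^{2\pi i\nu})|^2f_x(\nu)$.

```python
import numpy as np

theta1 = 0.6
gamma_x = {0: 1.0 + theta1 ** 2, 1: theta1, -1: theta1}   # MA(1) autocovariance
psi = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}                     # three-point moving average

def gamma_y(h):
    # gamma_y(h) = sum_j sum_k psi_j psi_k gamma_x(h + j - k)
    return sum(pj * pk * gamma_x.get(h + j - k, 0.0)
               for j, pj in psi.items() for k, pk in psi.items())

def f_from_acvf(nu, gamma, max_lag):
    # sum over |h| <= max_lag of gamma(h) exp(-2 pi i nu h)
    return sum(gamma(h) * np.exp(-2j * np.pi * nu * h)
               for h in range(-max_lag, max_lag + 1)).real

def f_x(nu):
    return 1.0 + theta1 ** 2 + 2.0 * theta1 * np.cos(2.0 * np.pi * nu)

def power_transfer(nu):
    return abs(sum(pj * np.exp(2j * np.pi * nu * j) for j, pj in psi.items())) ** 2

nus = np.linspace(0.0, 0.5, 11)
# gamma_y vanishes for |h| > 3 here, so max_lag = 3 gives the exact sum.
lhs = np.array([f_from_acvf(nu, gamma_y, max_lag=3) for nu in nus])
rhs = np.array([power_transfer(nu) * f_x(nu) for nu in nus])
```

The arrays `lhs` and `rhs` agree at every frequency on the grid.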

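Frequency response: Examples

For a linear process $Y_t = \psi(B)W_t$, $f_y(\nu) = \left|\psi\left(e^{2\pi i\nu}\right)\right|^2\sigma_w^2$.

For an ARMA model, $\psi(B) = \theta(B)/\phi(B)$, so $\{Y_t\}$ has the rational spectrum
\[ f_y(\nu) = \sigma_w^2\left|\frac{\theta\left(e^{-2\pi i\nu}\right)}{\phi\left(e^{-2\pi i\nu}\right)}\right|^2 = \sigma_w^2\,\frac{\theta_q^2\prod_{j=1}^{q}\left|e^{-2\pi i\nu} - z_j\right|^2}{\phi_p^2\prod_{j=1}^{p}\left|e^{-2\pi i\nu} - p_j\right|^2}, \]
where $p_j$ and $z_j$ are the poles and zeros of the rational function $z \mapsto \theta(z)/\phi(z)$.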


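Consider the moving average
\[ Y_t = \frac{1}{2k+1}\sum_{j=-k}^{k}X_{t-j}. \]
This is a time invariant linear filter (but it is not causal). Its transfer function is the Dirichlet kernel
\[ \psi\left(e^{-2\pi i\nu}\right) = D_k(2\pi\nu) = \frac{1}{2k+1}\sum_{j=-k}^{k}e^{-2\pi ij\nu} = \begin{cases} 1 & \text{if } \nu = 0, \\ \dfrac{\sin(2\pi(k+1/2)\nu)}{(2k+1)\sin(\pi\nu)} & \text{otherwise.} \end{cases} \]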
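To illustrate the closed form, this sketch evaluates the moving-average transfer function both as the direct sum and via the Dirichlet kernel formula, for k = 5. The function names are hypothetical.

```python
import numpy as np

def moving_average_transfer(nu, k):
    """Direct sum (1 / (2k + 1)) * sum_{j=-k}^{k} exp(-2 pi i j nu)."""
    j = np.arange(-k, k + 1)
    return np.sum(np.exp(-2j * np.pi * j * nu)) / (2 * k + 1)

def dirichlet_closed_form(nu, k):
    """sin(2 pi (k + 1/2) nu) / ((2k + 1) sin(pi nu)), with value 1 where sin(pi nu) = 0."""
    if np.isclose(np.sin(np.pi * nu), 0.0):
        return 1.0
    return np.sin(2.0 * np.pi * (k + 0.5) * nu) / ((2 * k + 1) * np.sin(np.pi * nu))

k = 5
nus = np.linspace(0.0, 0.5, 21)
direct = np.array([moving_average_transfer(nu, k) for nu in nus])
closed = np.array([dirichlet_closed_form(nu, k) for nu in nus])
```

Because the weights are symmetric in j, the direct sum is real up to rounding, and it matches the closed form. Its values near $\nu = 0$ are close to 1, consistent with a filter that passes low frequencies.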

Example: Moving average

[Figure: Transfer function of moving average (k=5). Horizontal axis: $\nu$ from 0 to 0.5; vertical axis from −0.4 to 1.]

[Figure: Squared modulus of transfer function of moving average (k=5). Horizontal axis: $\nu$ from 0 to 0.5; vertical axis from 0 to 1.]

This is a low-pass filter: It preserves low frequencies and diminishes high frequencies. It is often used to estimate a monotonic trend component of a series.

Example: Differencing

Consider the first difference
\[ Y_t = (1 - B)X_t. \]
This is a time invariant, causal, linear filter.

Its transfer function is
\[ \psi\left(e^{-2\pi i\nu}\right) = 1 - e^{-2\pi i\nu}, \]
so $\left|\psi\left(e^{-2\pi i\nu}\right)\right|^2 = 2(1 - \cos(2\pi\nu))$.
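A brief numerical illustration of the differencing filter's power transfer function follows; the grid of frequencies is an arbitrary choice.

```python
import numpy as np

nus = np.linspace(0.0, 0.5, 11)
transfer = 1.0 - np.exp(-2j * np.pi * nus)      # psi(e^{-2 pi i nu}) for 1 - B
power = np.abs(transfer) ** 2                    # squared modulus
closed_form = 2.0 * (1.0 - np.cos(2.0 * np.pi * nus))
```

The power transfer function rises monotonically from 0 at $\nu = 0$ to 4 at $\nu = 1/2$, so low frequencies are suppressed and high frequencies are amplified.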


[Figure: Transfer function of first difference. Horizontal axis: $\nu$ from 0 to 0.5; vertical axis from 0 to 4.]

This is a high-pass filter: It preserves high frequencies and diminishes low frequencies. It is often used to eliminate a trend component of a series.


1. Review: Spectral density, rational spectra, linear filters.

2. Frequency response of linear filters.

3. Spectral estimation

4. Sample autocovariance

5. Discrete Fourier transform and the periodogram




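If a time series $\{X_t\}$ has autocovariance $\gamma$ satisfying $\sum_{h=-\infty}^{\infty}|\gamma(h)| < \infty$, then we define its spectral density as
\[ f(\nu) = \sum_{h=-\infty}^{\infty}\gamma(h)e^{-2\pi i\nu h} \]
for $-\infty < \nu < \infty$. We have
\[ \gamma(h) = \int_{-1/2}^{1/2} e^{2\pi i\nu h}f(\nu)\,d\nu. \]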

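Review: Rational spectra

For a linear time series with MA($\infty$) polynomial $\psi$,
\[ f(\nu) = \sigma_w^2\left|\psi\left(e^{2\pi i\nu}\right)\right|^2. \]
If it is an ARMA(p,q), we have
\[ f(\nu) = \sigma_w^2\left|\frac{\theta\left(e^{-2\pi i\nu}\right)}{\phi\left(e^{-2\pi i\nu}\right)}\right|^2 = \sigma_w^2\,\frac{\theta_q^2\prod_{j=1}^{q}\left|e^{-2\pi i\nu} - z_j\right|^2}{\phi_p^2\prod_{j=1}^{p}\left|e^{-2\pi i\nu} - p_j\right|^2}, \]
where $z_1, \ldots, z_q$ are the zeros (roots of $\theta(z)$) and $p_1, \ldots, p_p$ are the poles (roots of $\phi(z)$).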

Review: Time-invariant linear filters

A filter is an operator; given a time series $\{X_t\}$, it maps to a time series $\{Y_t\}$. A linear filter satisfies
\[ Y_t = \sum_{j=-\infty}^{\infty}a_{t,j}X_j. \]
time-invariant: $a_{t,t-j} = \psi_j$:
\[ Y_t = \sum_{j=-\infty}^{\infty}\psi_jX_{t-j}. \]
causal: $j < 0$ implies $\psi_j = 0$.
\[ Y_t = \sum_{j=0}^{\infty}\psi_jX_{t-j}. \]



Estimating the Spectrum: Outline

• We have seen that the spectral density gives an alternative view of stationary time series.

• Given a realization $x_1, \ldots, x_n$ of a time series, how can we estimate the spectral density?

• One approach: replace $\gamma(\cdot)$ in the definition
\[ f(\nu) = \sum_{h=-\infty}^{\infty}\gamma(h)e^{-2\pi i\nu h} \]
with the sample autocovariance $\hat\gamma(\cdot)$.

• Another approach, called the periodogram: compute $I(\nu)$, the squared modulus of the discrete Fourier transform (at frequencies $\nu = k/n$).

• These two approaches are identical at the Fourier frequencies $\nu = k/n$.

• The asymptotic expectation of the periodogram $I(\nu)$ is $f(\nu)$. We can derive some asymptotic properties, and hence do hypothesis testing.

• Unfortunately, the asymptotic variance of $I(\nu)$ is constant: it does not shrink as $n$ grows. So $I(\nu)$ is not a consistent estimator of $f(\nu)$.

• We can reduce the variance by smoothing the periodogram, that is, averaging over adjacent frequencies. If we average over a narrower range as $n \to \infty$, we can obtain a consistent estimator of the spectral density.

Estimating the spectrum: Sample autocovariance

Idea: use the sample autocovariance $\hat\gamma(\cdot)$, defined by
\[ \hat\gamma(h) = \frac{1}{n}\sum_{t=1}^{n-|h|}(x_{t+|h|} - \bar x)(x_t - \bar x), \quad\text{for } -n < h < n, \]
as an estimate of the autocovariance $\gamma(\cdot)$, and then use a sample version of
\[ f(\nu) = \sum_{h=-\infty}^{\infty}\gamma(h)e^{-2\pi i\nu h}. \]
That is, for $-1/2 \le \nu \le 1/2$, estimate $f(\nu)$ with
\[ \hat f(\nu) = \sum_{h=-n+1}^{n-1}\hat\gamma(h)e^{-2\pi i\nu h}. \]
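A direct transcription of these two formulas might look like the sketch below. The function names and the six data values are made up for illustration; nothing here is tuned for efficiency.

```python
import numpy as np

def sample_autocovariance(x, h):
    """gamma_hat(h) = (1/n) sum_{t=1}^{n-|h|} (x_{t+|h|} - xbar)(x_t - xbar)."""
    x = np.asarray(x, dtype=float)
    n, h = len(x), abs(h)
    d = x - x.mean()
    return np.dot(d[h:], d[: n - h]) / n

def spectral_estimate(x, nu):
    """f_hat(nu) = sum_{h=-n+1}^{n-1} gamma_hat(h) exp(-2 pi i nu h)."""
    n = len(x)
    lags = np.arange(-n + 1, n)
    gam = np.array([sample_autocovariance(x, h) for h in lags])
    # gamma_hat is even in h, so the imaginary parts cancel.
    return np.real(np.sum(gam * np.exp(-2j * np.pi * nu * lags)))

x = np.array([2.0, 4.0, 3.0, 7.0, 5.0, 6.0])   # made-up data, for illustration only
g0 = sample_autocovariance(x, 0)
f_quarter = spectral_estimate(x, 0.25)
```

Here `g0` is the sample variance with divisor n, and the estimate is even in $\nu$. At $\nu = 0$ it is zero, because the mean-corrected data sum to zero.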

Estimating the spectrum: Periodogram

Another approach to estimating the spectrum is called the periodogram. It was proposed in 1897 by Arthur Schuster (at Owens College, which later became part of the University of Manchester), who used it to investigate periodicity in the occurrence of earthquakes, and in sunspot activity.

Arthur Schuster, "On Lunar and Solar Periodicities of Earthquakes," Proceedings of the Royal Society of London, Vol. 61 (1897), pp. 455–465.

To define the periodogram, we need to introduce the discrete Fourier transform of a finite sequence $x_1, \ldots, x_n$.

Discrete Fourier transform

For a sequence $(x_1, \ldots, x_n)$, define the discrete Fourier transform (DFT) as $(X(\nu_0), X(\nu_1), \ldots, X(\nu_{n-1}))$, where
\[ X(\nu_k) = \frac{1}{\sqrt n}\sum_{t=1}^{n}x_te^{-2\pi i\nu_kt}, \]
and $\nu_k = k/n$ (for $k = 0, 1, \ldots, n-1$) are called the Fourier frequencies. (Think of $\{\nu_k : k = 0, \ldots, n-1\}$ as the discrete version of the frequency range $\nu \in [0, 1]$.)

First, let's show that we can view the DFT as a representation of $x$ in a different basis, the Fourier basis.


Consider the space $\mathbb{C}^n$ of vectors of $n$ complex numbers, with inner product $\langle a, b\rangle = a^*b$, where $a^*$ is the complex conjugate transpose of the vector $a \in \mathbb{C}^n$.

Suppose that a set $\{\phi_j : j = 0, 1, \ldots, n-1\}$ of $n$ vectors in $\mathbb{C}^n$ are orthonormal:
\[ \langle\phi_j, \phi_k\rangle = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{otherwise.} \end{cases} \]
Then these $\{\phi_j\}$ span the vector space $\mathbb{C}^n$, and so for any vector $x$, we can write $x$ in terms of this new orthonormal basis,
\[ x = \sum_{j=0}^{n-1}\langle\phi_j, x\rangle\phi_j. \]


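Consider the following set of $n$ vectors in $\mathbb{C}^n$:
\[ \left\{ e_j = \frac{1}{\sqrt n}\left(e^{2\pi i\nu_j}, e^{2\pi i2\nu_j}, \ldots, e^{2\pi in\nu_j}\right)' : j = 0, \ldots, n-1 \right\}. \]
It is easy to check that these vectors are orthonormal:
\begin{align*}
\langle e_j, e_k\rangle &= \frac{1}{n}\sum_{t=1}^{n}e^{2\pi it(\nu_k - \nu_j)} = \frac{1}{n}\sum_{t=1}^{n}\left(e^{2\pi i(k-j)/n}\right)^t \\
&= \begin{cases} 1 & \text{if } j = k, \\ \dfrac{1}{n}\,e^{2\pi i(k-j)/n}\,\dfrac{1 - \left(e^{2\pi i(k-j)/n}\right)^n}{1 - e^{2\pi i(k-j)/n}} & \text{otherwise} \end{cases} \\
&= \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{otherwise,} \end{cases}
\end{align*}
where we have used the fact that $S_n = \sum_{t=1}^{n}\alpha^t$ satisfies $\alpha S_n = S_n + \alpha^{n+1} - \alpha$ and so $S_n = \alpha(1 - \alpha^n)/(1 - \alpha)$ for $\alpha \ne 1$.

So we can represent the real vector $x = (x_1, \ldots, x_n)' \in \mathbb{C}^n$ in terms of this orthonormal basis,
\[ x = \sum_{j=0}^{n-1}\langle e_j, x\rangle e_j = \sum_{j=0}^{n-1}X(\nu_j)e_j. \]
That is, the vector of discrete Fourier transform coefficients $(X(\nu_0), \ldots, X(\nu_{n-1}))$ is the representation of $x$ in the Fourier basis.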
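The orthonormality and the change-of-basis representation can be confirmed numerically for a small n. The sketch below builds the Fourier basis vectors as the columns of a matrix `E`; the value n = 8 and the test vector are arbitrary choices.

```python
import numpy as np

n = 8
t = np.arange(1, n + 1)
# Column j is e_j = n^{-1/2} (exp(2 pi i nu_j), exp(2 pi i 2 nu_j), ..., exp(2 pi i n nu_j))'.
E = np.exp(2j * np.pi * np.outer(t, np.arange(n)) / n) / np.sqrt(n)

gram = E.conj().T @ E                  # entries <e_j, e_k> = e_j^* e_k

x = np.array([1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0, 2.0])   # arbitrary real vector
X = E.conj().T @ x                     # DFT coefficients X(nu_j) = <e_j, x>
x_rebuilt = E @ X                      # sum_j X(nu_j) e_j
```

The Gram matrix is the identity, and expanding x in the basis and summing back recovers x up to rounding.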


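An alternative way to represent the DFT is by separately considering the real and imaginary parts,
\begin{align*}
X(\nu_j) = \langle e_j, x\rangle &= \frac{1}{\sqrt n}\sum_{t=1}^{n}e^{-2\pi it\nu_j}x_t \\
&= \frac{1}{\sqrt n}\sum_{t=1}^{n}\cos(2\pi t\nu_j)x_t - i\,\frac{1}{\sqrt n}\sum_{t=1}^{n}\sin(2\pi t\nu_j)x_t \\
&= X_c(\nu_j) - iX_s(\nu_j),
\end{align*}
where this defines the sine and cosine transforms, $X_s$ and $X_c$, of $x$.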
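In practice the DFT is usually computed with a fast Fourier transform. The sketch below relates the definition used here (time index running from 1 to n, scaling $1/\sqrt n$) to `numpy.fft.fft`, which indexes time from 0 and applies no scaling, so a phase factor and a division by $\sqrt n$ are needed. The data vector is arbitrary.

```python
import numpy as np

x = np.array([1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0, 2.0])
n = len(x)
t = np.arange(1, n + 1)
k = np.arange(n)
nu = k / n

# Direct evaluation of the definition, with t running from 1 to n.
X_direct = np.array([np.sum(x * np.exp(-2j * np.pi * v * t)) for v in nu]) / np.sqrt(n)

# numpy.fft.fft indexes time from 0, so shifting to t = 1..n adds a phase factor.
X_fft = np.exp(-2j * np.pi * k / n) * np.fft.fft(x) / np.sqrt(n)

# Cosine and sine transforms, so that X = X_c - i X_s.
X_c = np.array([np.sum(np.cos(2 * np.pi * v * t) * x) for v in nu]) / np.sqrt(n)
X_s = np.array([np.sum(np.sin(2 * np.pi * v * t) * x) for v in nu]) / np.sqrt(n)
```

The phase factor has modulus 1, so it does not affect $|X(\nu_k)|^2$.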


1. Review: Spectral density estimation, sample autocovariance.

2. The periodogram and sample autocovariance.

3. Asymptotics of the periodogram.



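Review: Spectral density estimation

If a time series $\{X_t\}$ has autocovariance $\gamma$ satisfying $\sum_{h=-\infty}^{\infty}|\gamma(h)| < \infty$, then we define its spectral density as
\[ f(\nu) = \sum_{h=-\infty}^{\infty}\gamma(h)e^{-2\pi i\nu h} \]
for $-\infty < \nu < \infty$.

Review: Sample autocovariance

Idea: use the sample autocovariance $\hat\gamma(\cdot)$, defined by
\[ \hat\gamma(h) = \frac{1}{n}\sum_{t=1}^{n-|h|}(x_{t+|h|} - \bar x)(x_t - \bar x), \quad\text{for } -n < h < n, \]
as an estimate of the autocovariance $\gamma(\cdot)$, and then use
\[ \hat f(\nu) = \sum_{h=-n+1}^{n-1}\hat\gamma(h)e^{-2\pi i\nu h} \]
for $-1/2 \le \nu \le 1/2$.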



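Periodogram

The periodogram is defined as
\begin{align*}
I(\nu_j) &= |X(\nu_j)|^2 \\
&= \frac{1}{n}\left|\sum_{t=1}^{n}e^{-2\pi it\nu_j}x_t\right|^2 \\
&= X_c^2(\nu_j) + X_s^2(\nu_j),
\end{align*}
\[ X_c(\nu_j) = \frac{1}{\sqrt n}\sum_{t=1}^{n}\cos(2\pi t\nu_j)x_t, \qquad X_s(\nu_j) = \frac{1}{\sqrt n}\sum_{t=1}^{n}\sin(2\pi t\nu_j)x_t. \]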

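Periodogram

Since $I(\nu_j) = |X(\nu_j)|^2$ for one of the Fourier frequencies $\nu_j = j/n$ (for $j = 0, 1, \ldots, n-1$), the orthonormality of the $e_j$ implies that we can write
\begin{align*}
x^*x &= \left(\sum_{j=0}^{n-1}X(\nu_j)e_j\right)^*\left(\sum_{j=0}^{n-1}X(\nu_j)e_j\right) \\
&= \sum_{j=0}^{n-1}|X(\nu_j)|^2 = \sum_{j=0}^{n-1}I(\nu_j).
\end{align*}
For $\bar x = 0$, we can write this as
\[ \hat\sigma_x^2 = \frac{1}{n}\sum_{t=1}^{n}x_t^2 = \frac{1}{n}\sum_{j=0}^{n-1}I(\nu_j). \]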
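This decomposition of the sample variance across Fourier frequencies can be verified directly. The sketch below uses `numpy.fft.fft` for the periodogram; the data are arbitrary and are mean-corrected first so that $\bar x = 0$, as assumed above.

```python
import numpy as np

x = np.array([1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0, 2.0])
x = x - x.mean()                     # enforce xbar = 0
n = len(x)

# |X(nu_j)|^2 does not depend on whether time is indexed from 0 or from 1.
periodogram = np.abs(np.fft.fft(x)) ** 2 / n   # I(nu_j) at nu_j = j/n, j = 0..n-1

sum_of_squares = np.sum(x ** 2)
sample_variance = np.mean(x ** 2)
```

The periodogram ordinates sum to $x^*x$, and their average equals the sample variance.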

Periodogram

This is the discrete analog of the identity
\[ \sigma_x^2 = \gamma_x(0) = \int_{-1/2}^{1/2}f_x(\nu)\,d\nu. \]
(Think of $I(\nu_j)$ as the discrete version of $f(\nu)$ at the frequency $\nu_j = j/n$, and think of $(1/n)\sum_{\nu_j}\cdot$ as the discrete version of $\int_\nu\cdot\,d\nu$.)

Why is the periodogram at a Fourier frequency (that is, $\nu = \nu_j$) the same as computing $f(\nu)$ from the sample autocovariance?

Almost the same: they are not the same at $\nu_0 = 0$ when $\bar x \ne 0$.

But if either $\bar x = 0$, or we consider a Fourier frequency $\nu_j$ with $j \in \{1, \ldots, n-1\}$, then


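\begin{align*}
I(\nu_j) &= \frac{1}{n}\left|\sum_{t=1}^{n}e^{-2\pi it\nu_j}x_t\right|^2 = \frac{1}{n}\left|\sum_{t=1}^{n}e^{-2\pi it\nu_j}(x_t - \bar x)\right|^2 \\
&= \frac{1}{n}\left(\sum_{t=1}^{n}e^{-2\pi it\nu_j}(x_t - \bar x)\right)\left(\sum_{t=1}^{n}e^{2\pi it\nu_j}(x_t - \bar x)\right) \\
&= \frac{1}{n}\sum_{s,t}e^{-2\pi i(s-t)\nu_j}(x_s - \bar x)(x_t - \bar x) = \sum_{h=-n+1}^{n-1}\hat\gamma(h)e^{-2\pi ih\nu_j},
\end{align*}
where the fact that $\nu_j \ne 0$ implies $\sum_{t=1}^{n}e^{-2\pi it\nu_j} = 0$ (we showed this when we were verifying the orthonormality of the Fourier basis) has allowed us to subtract the sample mean in that case.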
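The equality, and its failure at $\nu_0 = 0$ when $\bar x \ne 0$, can both be seen on a small made-up data set. The sketch below computes the periodogram of the raw data with `numpy.fft.fft` and compares it with the sample-autocovariance estimate at the Fourier frequencies; names and data are illustrative.

```python
import numpy as np

x = np.array([2.0, 4.0, 3.0, 7.0, 5.0, 6.0, 1.0])   # made-up data with nonzero mean
n = len(x)
d = x - x.mean()

periodogram = np.abs(np.fft.fft(x)) ** 2 / n          # I(nu_j), j = 0..n-1, raw data

lags = np.arange(-n + 1, n)
gamma_hat = np.array([np.dot(d[abs(h):], d[: n - abs(h)]) / n for h in lags])

def f_hat(nu):
    # sum over h of gamma_hat(h) exp(-2 pi i h nu); real because gamma_hat is even
    return np.real(np.sum(gamma_hat * np.exp(-2j * np.pi * nu * lags)))

f_hat_at_fourier = np.array([f_hat(j / n) for j in range(n)])
```

For j = 1, ..., n-1 the two agree. At j = 0 the periodogram of the raw data equals $n\bar x^2$, while the autocovariance-based estimate is 0.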

Asymptotic properties of the periodogram

We want to understand the asymptotic behavior of the periodogram $I(\nu)$ at a particular frequency $\nu$, as $n$ increases. We'll see that its expectation converges to $f(\nu)$.

We'll start with a simple example: Suppose that $X_1, \ldots, X_n$ are i.i.d. $N(0, \sigma^2)$ (Gaussian white noise). From the definitions,
\[ X_c(\nu_j) = \frac{1}{\sqrt n}\sum_{t=1}^{n}\cos(2\pi t\nu_j)x_t, \qquad X_s(\nu_j) = \frac{1}{\sqrt n}\sum_{t=1}^{n}\sin(2\pi t\nu_j)x_t, \]
we have that $X_c(\nu_j)$ and $X_s(\nu_j)$ are normal, with
\[ E X_c(\nu_j) = E X_s(\nu_j) = 0. \]


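Also,
\begin{align*}
\mathrm{Var}(X_c(\nu_j)) &= \frac{\sigma^2}{n}\sum_{t=1}^{n}\cos^2(2\pi t\nu_j) \\
&= \frac{\sigma^2}{2n}\sum_{t=1}^{n}\left(\cos(4\pi t\nu_j) + 1\right) = \frac{\sigma^2}{2},
\end{align*}
since $\sum_{t=1}^{n}\cos(4\pi t\nu_j) = 0$ for a Fourier frequency with $\nu_j \notin \{0, 1/2\}$. Similarly, $\mathrm{Var}(X_s(\nu_j)) = \sigma^2/2$.

Also,
\begin{align*}
\mathrm{Cov}(X_c(\nu_j), X_s(\nu_j)) &= \frac{\sigma^2}{n}\sum_{t=1}^{n}\cos(2\pi t\nu_j)\sin(2\pi t\nu_j) \\
&= \frac{\sigma^2}{2n}\sum_{t=1}^{n}\sin(4\pi t\nu_j) = 0,
\end{align*}
and
\[ \mathrm{Cov}(X_c(\nu_j), X_c(\nu_k)) = 0, \qquad \mathrm{Cov}(X_s(\nu_j), X_s(\nu_k)) = 0, \qquad \mathrm{Cov}(X_c(\nu_j), X_s(\nu_k)) = 0, \]
for any $j \ne k$.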


That is, if $X_1, \ldots, X_n$ are i.i.d. $N(0, \sigma^2)$ (Gaussian white noise; $f(\nu) = \sigma^2$), then the $X_c(\nu_j)$ and $X_s(\nu_j)$ are all i.i.d. $N(0, \sigma^2/2)$. Thus,
\[ \frac{2}{\sigma^2}I(\nu_j) = \frac{2}{\sigma^2}\left(X_c^2(\nu_j) + X_s^2(\nu_j)\right) \sim \chi_2^2. \]
So for the case of Gaussian white noise, the periodogram has a chi-squared distribution that depends on the variance $\sigma^2$ (which, in this case, is the spectral density).
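A small Monte Carlo experiment illustrates the chi-squared claim. The sketch below simulates Gaussian white noise, computes the periodogram at one Fourier frequency per replication, and compares the first two moments of $2I(\nu_j)/\sigma^2$ with those of a $\chi_2^2$ variable (mean 2, variance 4). The seed, sample size, number of replications and frequency index are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, reps, j = 64, 2.0, 20000, 5

# Each row is one realization of Gaussian white noise with variance sigma2.
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
periodogram_j = np.abs(np.fft.fft(x, axis=1)[:, j]) ** 2 / n   # I(nu_j) per replication

scaled = 2.0 * periodogram_j / sigma2   # should behave like a chi-squared(2) sample
mean_scaled = scaled.mean()
var_scaled = scaled.var()
```

The variance of the scaled periodogram stays near 4 whatever the value of n, which is the numerical face of the statement that the periodogram is not a consistent estimator.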


Under more general conditions (e.g., normal $\{X_t\}$, or linear process $\{X_t\}$ with rapidly decaying ACF), the $X_c(\nu_j)$, $X_s(\nu_j)$ are all asymptotically independent and $N(0, f(\nu_j)/2)$.

Consider a frequency $\nu$. For a given value of $n$, let $\hat\nu^{(n)}$ be the closest Fourier frequency (that is, $\hat\nu^{(n)} = j/n$ for a value of $j$ that minimizes $|\nu - j/n|$). As $n$ increases, $\hat\nu^{(n)} \to \nu$, and (under the same conditions that ensure the asymptotic normality and independence of the sine/cosine transforms), $f(\hat\nu^{(n)}) \to f(\nu)$.

In that case, we have
\[ \frac{2}{f(\nu)}I(\hat\nu^{(n)}) = \frac{2}{f(\nu)}\left(X_c^2(\hat\nu^{(n)}) + X_s^2(\hat\nu^{(n)})\right) \xrightarrow{d} \chi_2^2. \]
Thus,
\begin{align*}
E\,I(\hat\nu^{(n)}) &= \frac{f(\nu)}{2}\,E\left(\frac{2}{f(\nu)}\left(X_c^2(\hat\nu^{(n)}) + X_s^2(\hat\nu^{(n)})\right)\right) \\
&\to \frac{f(\nu)}{2}\,E(Z_1^2 + Z_2^2) = f(\nu),
\end{align*}
where $Z_1, Z_2$ are independent $N(0, 1)$. Thus, the periodogram is asymptotically unbiased.
