
A SHORT COURSE OF TIME-SERIES ANALYSIS AND FORECASTING

At the Institute for Advanced Studies, Vienna, from March 22nd to April 2nd, 1993

Lecturer : D.S.G. Pollock

Queen Mary and Westfield College, The University of London

This course is concerned with the methods of time-series modelling which are applicable in econometrics and throughout a wide range of disciplines in the physical and social sciences. The course is for nonspecialists who may be interested in pursuing this topic as an adjunct to their other studies and who might envisage employing the techniques of time-series analysis in empirical enquiries within the context of their own disciplines.

The course is mathematically self-contained in the sense that the requisite results are presented either in the lectures themselves or in the accompanying text. The techniques of the frequency domain and the time domain are given an equal emphasis in this course.

Week 1

1 Trends in Time Series

2 Cycles in Time Series

3 Models and Methods of Time-Series Analysis

4 Time-Series Analysis in the Frequency Domain

5 Linear Stochastic Models

Week 2

6 State-Space Analysis and Structural Time-Series Models

7 Forecasting with ARIMA Models

8 Identification and Estimation of ARIMA Models

9 Identification and Estimation in the Frequency Domain

10 Seasonality and Linear Filtering

In addition, there will be a public lecture on the topic of The Methods of Time-Series Analysis, which is to take place on ***** in ***** at *****. This lecture will give a broad overview of the mathematical themes of time-series analysis and of the historical development of the subject; and it is intended for an audience with no significant knowledge of the subject.


LECTURES IN TIME-SERIES ANALYSIS AND FORECASTING

by

D.S.G. Pollock

Queen Mary and Westfield College, The University of London

These two booklets contain some of the material of the courses titled Methods of Time-Series Analysis and Economic Forecasting which have been taught in the Department of Economics of Queen Mary College in recent years. The material is presented in the form of a series of ten lectures for a course given at the Institute for Advanced Studies in Vienna titled A Short Course in Time-Series Analysis.

Book 1

1 Trends in Economic Time Series

2 Seasons and Cycles in Time Series

3 Models and Methods of Time-Series Analysis

4 Time-Series Analysis in the Frequency Domain

5 Linear Stochastic Models

Book 2

6 State-Space Analysis and Structural Time-Series Models

7 Forecasting with ARIMA Models

8 Identification and Estimation of ARIMA Models

9 Identification and Estimation in the Frequency Domain

10 Seasonality and Linear Filtering


THE METHODS OF TIME-SERIES ANALYSIS

by

D.S.G. Pollock

Queen Mary and Westfield College, The University of London

This paper describes some of the principal themes of time-series analysis and it gives an historical account of their development.

There are two distinct yet broadly equivalent modes of time-series analysis which may be pursued. On the one hand, there are the time-domain methods which have their origin in the classical theory of correlation; and they lead inevitably towards the construction of structural or parametric models of the autoregressive moving-average type. On the other hand are the frequency-domain methods of spectral analysis, which are based on an extension of the methods of Fourier analysis.

The paper describes the developments which led to the synthesis of the two branches of time-series analysis and it indicates how this synthesis was achieved.

It remains true that the majority of time-series analysts operate principally in one or other of the two domains. Such specialisation is often influenced by the academic discipline to which the analyst adheres. However, it is clear that there are many advantages to be derived from pursuing the two modes of analysis concurrently.

Address for correspondence:

D.S.G. Pollock
Department of Economics
Queen Mary College
University of London
Mile End Road
London E1 4NS

Tel: +44-71-975-5096
Fax: +44-71-975-5500


LECTURE 1

Trends in Economic Time Series

In many time series, broad movements can be discerned which evolve more gradually than the other motions which are evident. These gradual changes are described as trends and cycles. The changes which are of a transitory nature are described as fluctuations.

In some cases, the trend should be regarded as nothing more than the accumulated effect of the fluctuations. In other cases, we feel that the trends and the fluctuations represent different sorts of influences, and we are inclined to decompose the time series into the corresponding components.

In economics, it is traditional to decompose time series into a variety of components, some or all of which may be present in a particular instance. If {Y_t} is the sequence of values of an economic index, then its generic element is liable to be expressed as

(1.1)    Y_t = T_t + C_t + S_t + \varepsilon_t,

where

    T_t is the global trend,
    C_t is a secular cycle,
    S_t is the seasonal variation and
    ε_t is an irregular component.

Many of the more prominent macroeconomic indicators are amenable to a decomposition of the sort depicted above. One can imagine, for example, a quarterly index of Gross National Product which appears to be following an exponential growth trend {T_t}.

The growth trend might be obscured, to some extent, by a superimposed cycle {C_t} with a period of roughly four and a half years, which happens to correspond, more or less, to the average lifetime of the legislative assembly. The reasons for this curious coincidence need not concern us here.

The ghost of an annual cycle {S_t} might also be apparent in the index; and this could well be a reflection of the fact that some economic activities, such as building construction, are significantly affected by the weather and by the duration of sunlight.

When the foregoing components—the trend, the secular cycle and the seasonal cycle—have been extracted from the index, the residue should correspond to an irregular component {ε_t} for which no unique explanation can be offered. This component ought to resemble a time series generated by a so-called stationary stochastic process. Such a series has the characteristic that any segment of consecutive elements looks much like any other segment of the same duration, regardless of the date at which it begins or ends.

If the residue follows a trend, or if it manifests a more or less regular pattern, then it contains features which ought to have been attributed to the other components; and we should set about the task of redefining them.

There are two distinct purposes for which we might wish to effect such a decomposition. The first purpose is to give a summary description of the salient features of the time series. Thus, if we eliminate the irregular and seasonal components from the series, we are left with an index which may give a clearer picture of the more important features. This might help us to gain an insight into the fundamental workings of the economic or social structure which has generated the series.

The other purpose in decomposing the series is to predict its future values. For each component of the time series, a particular method of prediction is appropriate. By combining the separate predictions of the components, a forecast can be derived which may be superior to one derived by a method which pays no attention to the underlying structure of the time series.

Extracting the Trend

There are essentially two ways of extracting trends from a time series. The first way is to apply to the series a variety of so-called filters which annihilate or nullify all of the components which are not regarded as trends.

A filter is a carefully crafted moving average which spans a number of data points and which attributes a weight to each of them. The weights should sum to unity to ensure that the filter does not systematically inflate or deflate the values of the series. Thus, for example, the following moving average might serve to eliminate the annual cycle from an economic series which is recorded at quarterly intervals:

(1.2)    Y_t = \frac{1}{16}\left\{ Y_{t+3} + 2Y_{t+2} + 3Y_{t+1} + 4Y_t + 3Y_{t-1} + 2Y_{t-2} + Y_{t-3} \right\}.
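As an aside, the filter of equation (1.2) can be applied with a discrete convolution. The following minimal sketch, in which the quarterly series y is hypothetical and is constructed to contain a trend and an annual cycle, illustrates the point:

import numpy as np

# Weights of the filter in equation (1.2); they sum to unity.
weights = np.array([1, 2, 3, 4, 3, 2, 1]) / 16.0

# A hypothetical quarterly series: a linear trend plus a four-period cycle.
t = np.arange(40)
y = 0.5 * t + 3.0 * np.sin(2 * np.pi * t / 4)

# 'valid' mode loses three points at either end, where the seven-point
# moving average cannot be formed.
trend = np.convolve(y, weights, mode="valid")

print(trend[:5])   # the annual cycle has been eliminated from these values

The filter has a gain of zero at the annual frequency, which is why the cycle vanishes exactly in this noiseless example.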

Another filter with a wider span and a different profile of weights might serve to eliminate the four-and-a-half-year cycle which is present in our imaginary series of Gross National Product.


Finally, a filter could be designed which smooths away the irregularities of the index which defy systematic explanation. The order in which the three filters are applied is immaterial; and what is left after they have been applied should give a picture of the underlying trend {T_t} of the index.

Other collections of filters, applied in series, might serve to isolate the other components {C_t} and {S_t} which are to be found in equation (1).

The process of filtering is often a good way of deriving an index which represents the more important historical characteristics of the time series. However, it generates no model for the underlying trends; and it suggests no way of predicting their future values.

The alternative way of extracting the trend from the index is to fit some function which is capable of adapting itself to whatever form the trend happens to display. Different functions are appropriate to different forms of trend; and some functions which analysts tend to favour seem almost always to be inappropriate. Once an analytic function has been fitted to the series, it may be used to provide extrapolative forecasts of the trend.

Polynomial Trends

Amongst the mathematical functions which suggest themselves as means of modelling a trend is a pth-degree polynomial whose argument is the time index t:

(1.3)    \phi(t) = \phi_0 + \phi_1 t + \cdots + \phi_p t^p.

When there is no theory to specify a mathematical form for the trend, it may be possible to approximate it by a polynomial of low degree. This notion is suggested by the formal result that every analytic mathematical function can be expanded as a power series, which is an indefinite sum whose terms contain rising powers of the argument. Thus the polynomial in t may be construed as an approximation to an analytic function which is obtained by discarding all but the leading terms of a power-series expansion.

There are also arguments from physics which suggest that first-degree and second-degree polynomials in t, which are linear and quadratic time trends in other words, are common in the natural world. The thought occurs to us that such trends might also arise in the social world.

According to a well-known dictum,

Every body continues in its state of rest or of uniform motion in a straight line unless it is compelled to change that state by forces impressed upon it.

This is Newton's first law of motion. The kinematic equation for the distance covered by a body moving with constant velocity in a straight line is

(1.4)    x = x_0 + ut,


where u is the uniform velocity, and x_0 represents the initial position of the body at time t = 0. This is nothing but a first-degree polynomial in t.

Newton’s second law of motion asserts that

The change of motion is proportional to the motive force impressed; and is made in the direction of the straight line in which the force is impressed.

In modern language, this is expressed by saying that the acceleration of a body along a straight line is proportional to the force which is applied in that direction. The kinematic equation for the distance travelled under uniformly accelerated rectilinear motion is

(1.5)    x = x_0 + u_0 t + \tfrac{1}{2} a t^2,

where u_0 is the velocity at time t = 0 and a is the constant acceleration due to the motive force. This is just a quadratic in t.

A linear or a quadratic function may be appropriate if the trend in question is monotonically increasing or decreasing. In other cases, polynomials of higher degrees might be fitted. Figure 1 is the result of fitting a cubic function to an economic time series by least-squares regression.

[Figure 1. A cubic function fitted to data on meat consumption in the United States, 1919–1941.]


It might be felt that there are salient features in the data which are not captured by the cubic polynomial. In that case, the recourse might be to increase the degree of the polynomial by one. The result will be a curve which fits the data more closely. Also, it will be found that one of the branches of the polynomial—the left branch in this case—has changed direction. The values found by extrapolating the quartic function backwards in time will differ radically from those found by extrapolating the cubic function.

In general, the effect of altering the degree of the polynomial by one will be to alter the direction of one or other of the branches of the fitted function; and, from the point of view of forecasting, this is a highly unsatisfactory circumstance. Another feature of a polynomial function is that its branches tend to plus or minus infinity with increasing rapidity as the argument increases or decreases beyond a range of central values where the function has its stationary points and its points of inflection. This might also be regarded as an undesirable property for a function which is to be used in extrapolative forecasting.

Some care has to be taken in fitting a polynomial time trend by the method of least-squares regression. A straightforward procedure, which comes immediately to mind, is to form a matrix X of regressors in which the generic row [t^0, t, t^2, . . . , t^p] contains rising powers of the argument t. The annual data on meat consumption, for example, which are plotted in Figure 1, run from 1919 to 1941; and these dates might be taken as the initial and terminal values of t. In that case, there would be a vast difference in the values of the elements of the matrix X. For, whereas t^0 = 1 for all values of t = 1919, . . . , 1941, we should find that, when t = 1941, the value of t^3 is in excess of 7,300 million. Clearly, such a disparity of numbers taxes the precision of the computer.

An obvious recourse is to recode the values of t. Thus, we might take t = −11, . . . , 11 for the range of the argument. The change would affect only the value of the intercept term φ_0, which could be adjusted ex post. Unfortunately, such a recourse is not always adequate to ensure the numerical accuracy of the computation. The reason lies in the peculiarly ill-conditioned nature of the matrix (X′X)^{−1} of cross products.

In fact, a specialised procedure of polynomial regression is often called for in which the functions t^0, t, . . . , t^p are replaced by a set of so-called orthogonal polynomials which give rise to vectors of regressors whose cross products are zero-valued. The estimated coefficients associated with these orthogonal polynomials can be converted into the coefficients φ_0, φ_1, . . . , φ_p of equation (3).
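A sketch of such a procedure is given below; it uses the Chebyshev basis of numpy's polynomial package to stand in for the orthogonal polynomials of the text, and the data are hypothetical:

import numpy as np
from numpy.polynomial import Chebyshev, Polynomial

rng = np.random.default_rng(0)
t = np.arange(1919, 1942)                    # calendar years, as in Figure 1
y = 160 + 0.5 * (t - 1930) - 0.05 * (t - 1930) ** 2 + rng.normal(0, 2, t.size)

# Fitting in an orthogonal basis: the years are mapped internally onto
# [-1, 1], which avoids the ill-conditioning of raw powers of t.
cheb = Chebyshev.fit(t, y, deg=3)

# The coefficients of equation (1.3) in the raw time variable can be
# recovered by converting back to the power basis.
phi = cheb.convert(kind=Polynomial)
print(phi.coef)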

Exponential and Logistic Trends

The notion of exponential or geometric growth is common in economics, where it is closely related to the idea of compound interest. Consider a financial asset with an annual rate of return of γ. The annual growth factor for an investment of unit value is (1 + γ). If α units were invested at time t = 0, and if the returns were compounded with the principal on an annual basis, then the value of the investment at time t would be given by

(1.6)    y_t = \alpha(1 + \gamma)^t.

An investment which is compounded twice a year has an annual growth factor of (1 + \frac{1}{2}\gamma)^2, and one which is compounded quarterly has a growth factor of (1 + \frac{1}{4}\gamma)^4. If an investment were compounded continuously, then its growth factor would be \lim_{n \to \infty}(1 + \frac{1}{n}\gamma)^n = e^{\gamma}. The value of the asset at time t would be given by

(1.7)    y = \alpha e^{\gamma t};

and this is the equation for exponential growth.

The equation of exponential growth is a solution of the differential equation

(1.8)    \frac{dy}{dt} = \gamma y.

The implication of the differential equation is that the absolute rate of growth in y is proportional to the value already attained by y. It is equivalent to say that the proportional rate of growth (1/y)(dy/dt) is constant.

An exponential growth trend can be fitted to observations y_1, . . . , y_n, sampled at regular intervals, by applying ordinary least-squares regression to the equation

(1.9)    \ln y_t = \ln\alpha + \gamma t + \varepsilon_t.

This is obtained by taking the logarithm of equation (7) and adding a disturbance term ε_t. An alternative parametrisation is obtained by setting λ = e^γ. Then the transformed growth equation becomes

(1.10)    \ln y_t = \ln\alpha + (\ln\lambda) t + \varepsilon_t,

and the geometric growth rate is λ − 1.

Whereas unhindered exponential growth might well be a possibility for certain monetary or financial quantities, it is implausible to suggest that such a process can be sustained for long when real resources are involved. Since real resources are finite, we expect there to be upper limits to the levels which can be attained by economic variables.
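Returning to equations (1.9) and (1.10), the log-linear fitting procedure can be sketched as follows; the series is simulated, with assumed parameters α = 100 and γ = 0.03:

import numpy as np

rng = np.random.default_rng(1)
t = np.arange(50)

# Simulated exponential growth with a disturbance in the logarithm,
# as in equation (1.9).
y = 100 * np.exp(0.03 * t + rng.normal(0, 0.02, t.size))

# Regress ln(y) on a constant and t by ordinary least squares.
X = np.column_stack([np.ones(t.size), t])
(ln_alpha, gamma), *_ = np.linalg.lstsq(X, np.log(y), rcond=None)

print(np.exp(ln_alpha), gamma)   # estimates of alpha and gamma
print(np.exp(gamma) - 1)         # the geometric growth rate lambda - 1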

For an example of a trend with an upper bound, we might imagine a process whereby the ownership of a consumer durable grows until the majority of households or individuals are in possession of it. Good examples are provided by the sales of domestic electrical appliances such as fridges and colour television sets.

Typically, when the new durable is introduced, the rate of sales is slow. Then, as information about the durable, or experience of it, is spread amongst consumers, the sales begin to accelerate. For a time, their cumulated total might appear to follow an exponential growth path. Then come the first signs that the market is being saturated; and there is a point of inflection in the cumulative curve where its second derivative—which is the rate of increase in sales per period—passes from positive to negative. Eventually, as the level of ownership approaches the saturation point, the rate of sales will decline to a constant level, which may be at zero, if the good is wholly durable, or at a small positive replacement rate if it is not.

It is very difficult to specify the dynamics of a process such as the one we have described whenever there are replacement sales to be taken into account. The reason is that the replacement sales depend not only on the size of the ownership of the durable goods but also upon the age of the stock of goods. The latter is a function, at least in an early period, of the way in which sales have grown at the outset. Often we have to be content with modelling only the growth of ownership.

One of the simplest ways of modelling the growth of ownership is to employ the so-called logistic curve. This classical device has its origins in the mathematics of biology, where it has been used to model the growth of a population of animals in an environment with limited food resources.

[Figure 2. The logistic function e^x/(1 + e^x) and its derivative. For large negative values of x, the function and its derivative are close. In the case of the exponential function e^x, they coincide for all values of x.]


The simplest version of the function is given by

(1.11)    \pi(x) = \frac{1}{1 + e^{-x}} = \frac{e^x}{1 + e^x}.

The second expression comes from multiplying top and bottom of the first expression by e^x. The logistic curve varies between a value of zero, which is approached as x → −∞, and a value of unity, which is approached as x → +∞. At the mid point, where x = 0, the value of the function is π(0) = 1/2. These characteristics can be understood easily in reference to the first expression.

The alternative expression for the logistic curve also lends itself to an interpretation. We may begin by noting that, for large negative values of x, the term 1 + e^x, which is found in the denominator, is not significantly different from unity. Therefore, as x increases from such values towards zero, the logistic function closely resembles an exponential function. By the time x reaches zero, the denominator, with a value of 2, is already significantly affected by the term e^x. At that point, there is an inflection in the curve as the rate of increase in π begins to decline. Thereafter, the rate of increase declines rapidly toward zero, with the effect that the value of π never exceeds unity.

The inverse mapping x = x(π) is easily derived. Consider

(1.12)    1 - \pi = \frac{1 + e^x}{1 + e^x} - \frac{e^x}{1 + e^x} = \frac{1}{1 + e^x} = \frac{\pi}{e^x}.

This is rearranged to give

(1.13)    e^x = \frac{\pi}{1 - \pi},

whence the inverse function is found by taking natural logarithms:

(1.14)    x(\pi) = \ln\left\{ \frac{\pi}{1 - \pi} \right\}.

The logistic curve needs to be elaborated before it can be fitted flexibly to a set of observations y_1, . . . , y_n tending to an upper asymptote. The general form of the function is

(1.15)    y(t) = \frac{\gamma}{1 + e^{-h(t)}} = \frac{\gamma e^{h(t)}}{1 + e^{h(t)}}; \qquad h(t) = \alpha + \beta t.

Here γ is the upper asymptote of the function, which is the saturation level of ownership in the example of the consumer durable. The parameters β and α determine respectively the rate of ascent of the function and the mid point of its ascent, measured on the time-axis.

It can be seen that

(1.16)    \ln\left\{ \frac{y(t)}{\gamma - y(t)} \right\} = h(t).

Therefore, with the inclusion of a residual term, the equation for the generic element of the sample is

(1.17)    \ln\left\{ \frac{y_t}{\gamma - y_t} \right\} = \alpha + \beta t + e_t.

For a given value of γ, one may calculate the value of the dependent variable on the LHS. Then the values of α and β may be found by least-squares regression.

The value of γ may also be determined according to the criterion of minimising the sum of squares of the residuals. A crude procedure would entail running numerous regressions, each with a different value for γ. The definitive value would be the one from the regression with the least residual sum of squares. There are other procedures for finding the minimising value of γ of a more systematic and efficient nature which might be used instead. Amongst these are the methods of Golden Section Search and Fibonacci Search, which are presented in many texts of numerical analysis.
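The crude grid-search procedure may be sketched as follows; the ownership series and the grid of trial values for γ are both hypothetical:

import numpy as np

def logistic_rss(y, t, gamma):
    # For a trial saturation level gamma, form the dependent variable of
    # equation (1.17) and regress it on a constant and t; return the
    # residual sum of squares.
    z = np.log(y / (gamma - y))
    X = np.column_stack([np.ones(t.size), t])
    _, rss, *_ = np.linalg.lstsq(X, z, rcond=None)
    return rss[0]

t = np.arange(30)
y = 10.0 / (1 + np.exp(4 - 0.4 * t))                    # saturation level 10
y = y + np.random.default_rng(2).normal(0, 0.02, t.size)

# Run a regression for each trial gamma and keep the best fit; the trial
# values must exceed the largest observation.
trial_gammas = np.linspace(y.max() + 0.1, 15.0, 200)
best = min(trial_gammas, key=lambda g: logistic_rss(y, t, g))
print("minimising gamma:", best)

The systematic alternatives mentioned above, such as Golden Section Search, would merely replace the exhaustive grid by a cleverer sequence of trial values.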

The objection may be raised that the domain of the logistic function is the entire real line—which spans all of time from creation to eternity—whereas the sales history of a consumer durable dates only from the time when it is introduced to the market. The problem might be overcome by replacing the time variable t in equation (15) by its logarithm and by allowing t to take only nonnegative values. Then, whilst t ∈ [0, ∞), we still have ln(t) ∈ (−∞, ∞), which is the entire domain of the logistic function.

[Figure 3. The function y(t) = γ/(1 + exp{α − β ln(t)}) with γ = 1, α = 4 and β = 7. The positive values of t are the domain of the function.]


There are many curves which will serve the purpose of modelling a sigmoidal growth process. Their number is equal, at least, to the number of theoretical probability density functions—for the corresponding (cumulative) distribution functions rise monotonically from zero to unity in ways which are suggestive of processes of bounded growth.

In fact, we do not need to have an analytic form for a cumulative function before it can be fitted to a growth process. It is enough to have a table of values of a standardised form of the function. An example is provided by the normal density function whose distribution function is regularly fitted to data points in the course of probit analysis. In this case, the fitting involves finding values for the location parameter μ and the dispersion parameter σ² by which the standard normal function is converted into an arbitrary normal function. Nowadays, there are efficient procedures for numerical optimisation which can accomplish such tasks with ease.

Flexible Trends

If the purpose of decomposing a time series is to form predictions of its components, then it is important to obtain adequate representations of these components at every point within the sample period. The device which is most appropriate to the extrapolative forecasting of a trend is rarely the best means of representing it within the sample. An extrapolation is usually based upon a simple analytic function; and any attempt to make the function reflect the local variations of the sample will endow it with global characteristics which may affect the forecasts adversely.

One way of modelling the local characteristics of a trend without prejudicing its global characteristics is to use a segmented curve. In many applications, it has been found that a curve with cubic polynomial segments is appropriate. The segments must be joined in a way which avoids evident discontinuities. In practice, the requirement is usually for continuous first-order and second-order derivatives. A curve whose segments are joined in this way is described as a cubic spline.

A spline is a draughtsman's tool which was once used in drawing smooth curves. It is a thin flexible piece of wood which was clamped to a series of pins which were placed along the path of the curve which had to be described. Some of the essential properties of a mathematical spline can be understood by bearing the real spline in mind. The pins to which a draughtsman clamped his spline correspond to the data points through which we might interpolate a mathematical spline. The segments of the mathematical spline would be joined at the data points.

The cubic spline becomes a device for modelling a trend when, instead of passing through the data points, it is allowed, in the interests of smoothness, to deviate from them. The Reinsch smoothing spline is fitted by minimising a criterion function which imposes both a penalty for deviating from the data points and a penalty for excessive curvature in the segments. The measure of curvature is based upon second derivatives, whilst the measure of deviation is the sum of the squared distances of the points from the curve. A single parameter λ governs the trade-off between the objectives of smoothness and goodness of fit.

[Figure 4. Cubic smoothing splines, with λ = 0.75 and λ = 0.125, fitted to data on meat consumption in the United States, 1919–1941.]

As an analogy for the smoothing spline, one might think of attaching the draughtsman's spline to the pins by springs instead of by clamps. The precise form of the curve would depend upon the stiffness of the spline and the forces exerted by the springs. The degree of flexibility of the spline corresponds to the value of λ. The forces exerted by ordinary springs are proportional to their extension; and, in this respect, the analogy, which requires the forces to be proportional to the squares of their extensions, is imperfect.

Figure 4 shows the consequences of fitting the smoothing spline to the data on meat consumption which is also used in Figure 1, where a cubic polynomial has been fitted. It is a matter of judgment how the value of λ should be chosen so as to reflect the trend.
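A rough computational counterpart is available in scipy, as sketched below; note that the smoothing factor s of UnivariateSpline is parametrised differently from the λ of the Reinsch criterion, so the correspondence is only loose, and the data are hypothetical:

import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(3)
t = np.arange(1919, 1942).astype(float)
y = 160.0 + 8.0 * np.sin((t - 1919) / 4.0) + rng.normal(0, 2, t.size)

# s bounds the permitted residual sum of squares: a larger s yields a
# smoother curve, in the same spirit as a heavier curvature penalty.
spline = UnivariateSpline(t, y, k=3, s=len(t) * 4.0)

trend = spline(t)    # fitted trend values at the sample points
print(trend[:3])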

There are various ways in which the curve of a cubic spline may be extrapolated to form forecasts of the trend. In normal circumstances, when the ends of the spline are left free, the second derivatives are zero-valued and the extrapolation is linear. However, it is possible to clamp the ends of the spline in a way which imposes a value on their first derivatives. In that case, the extrapolation is quadratic.

Stochastic Trends

It is possible that what is perceived as a trend is the result of the accumulation of small stochastic fluctuations which have no systematic basis. In that case, there are some clearly defined ways of removing the trend from the data, as well as of extrapolating it into the future.

The simplest model embodying a stochastic trend is the so-called first-order random walk. Let {y_t} be the random-walk sequence. Then its value at time t is obtained from the previous value via the equation

(1.18)    y_t = y_{t-1} + \varepsilon_t.

Here ε_t is an element of a white-noise sequence of independently and identically distributed random variables with

(1.19)    E(\varepsilon_t) = 0 \quad \text{and} \quad V(\varepsilon_t) = \sigma^2 \quad \text{for all } t.

By a process of back-substitution, the following expression can be derived:

(1.20)    y_t = y_0 + \{ \varepsilon_t + \varepsilon_{t-1} + \cdots + \varepsilon_1 \}.

[Figure 5. A sequence generated by a white-noise process.]

This depicts y_t as the sum of an initial value y_0 and of an accumulation of stochastic increments. If y_0 has a fixed finite value, then the mean and the variance of y_t are given by

(1.21)    E(y_t) = y_0 \quad \text{and} \quad V(y_t) = t\sigma^2.

There is no central tendency in the random-walk process; and, if its starting point is in the indefinite past rather than at time t = 0, then the mean and variance are undefined.

To reduce the random walk to a stationary stochastic process, it is necessary only to take its first differences. Thus

(1.22)    y_t - y_{t-1} = \varepsilon_t.

The values of a random walk, as the name implies, have a tendency to wander haphazardly. However, if the variance of the white-noise process is small, then the values of the stochastic increments will also be small and the random walk will wander slowly. It is debatable whether the outcome of such a process deserves to be called a trend.
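A random walk of the kind shown in Figures 5 and 6 is easily imitated, since it is nothing but the cumulative sum of a white-noise sequence; a minimal sketch:

import numpy as np

rng = np.random.default_rng(4)

# White noise and the random walk formed by accumulating it (eq. 1.20).
epsilon = rng.normal(0.0, 1.0, 100)      # sigma = 1
y = np.cumsum(epsilon)                   # first-order random walk with y_0 = 0

# Differencing recovers the stationary white-noise sequence (eq. 1.22).
assert np.allclose(np.diff(y), epsilon[1:])

# A smaller sigma makes the walk wander more slowly.
slow = np.cumsum(rng.normal(0.0, 0.1, 100))
print(y[-1], slow[-1])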

A first-order random walk over a surface is what is known as Brownian motion. For a physical example of Brownian motion, one can imagine small particles, such as pollen grains, floating on the surface of a viscous liquid. The viscosity might be expected to bring the particles to a halt quickly if they were in motion. However, if the particles are very light, then they will dart hither and thither on the surface of the liquid under the impact of its molecules which are themselves in constant motion.

[Figure 6. A first-order random walk.]

[Figure 7. A second-order random walk.]

There is no better way of predicting the outcome of a random walk than to take the most recently observed value and to extrapolate it indefinitely into the future. This is demonstrated by taking the expected values of the elements of the equation

(1.23)    y_{t+h} = y_{t+h-1} + \varepsilon_{t+h},

which represents the value which lies h periods ahead at time t. The expectations, which are conditional upon the information of the set I_t = \{y_t, y_{t-1}, \ldots\} containing observations on the series up to time t, may be denoted as follows:

(1.24)    E(y_{t+h}|I_t) = \begin{cases} y_{t+h|t}, & \text{if } h > 0; \\ y_{t+h}, & \text{if } h \leq 0. \end{cases}

In these terms, the predictions of the values of the random walk for h > 1 periods ahead and for one period ahead are given, respectively, by

(1.25)    E(y_{t+h}|I_t) = y_{t+h|t} = y_{t+h-1|t},
          E(y_{t+1}|I_t) = y_{t+1|t} = y_t.

The first of these, which comes from (23), depends upon the fact that E(ε_{t+h}|I_t) = 0. The second, which comes from taking expectations in the equation y_{t+1} = y_t + ε_{t+1}, uses the fact that the value of y_t is already known. The implication of the two equations is that y_t serves as the optimal predictor for all future values of the random walk.

A second-order random walk is formed by accumulating the values of a first-order process. Thus, if {ε_t} and {y_t} are respectively a white-noise sequence and the sequence from a first-order random walk, then

(1.26)    z_t = z_{t-1} + y_t
              = z_{t-1} + y_{t-1} + \varepsilon_t
              = 2z_{t-1} - z_{t-2} + \varepsilon_t

defines the second-order random walk. Here the final expression is obtained by setting y_{t-1} = z_{t-1} - z_{t-2} in the second expression. It is clear that, to reduce the sequence {z_t} to the stationary white-noise sequence, we must take first differences twice in succession.

The nature of a second-order process can be understood by recognising that it represents a trend in which the slope—which is its first difference—follows a random walk. If the random walk wanders slowly, then the slope of this trend is liable to change only gradually. Therefore, for extended periods, the second-order random walk may appear to follow a linear time trend.

For a physical analogy of a second-order random walk, we can imagine a body in motion which suffers a series of small impacts. If the kinetic energy of the body is large relative to the energy of the impacts, then its linear motion will be disturbed only slightly. In order to predict where the body might be in some future period, we simply extrapolate its linear motion free from disturbances.

To demonstrate that the forecast function for a second-order random walk is a straight line, we may take the expectations, which are conditional upon I_t, of the elements of the equation

(1.27)    z_{t+h} = 2z_{t+h-1} - z_{t+h-2} + \varepsilon_{t+h}.

For h periods ahead and for one period ahead, this gives

(1.28)    E(z_{t+h}|I_t) = z_{t+h|t} = 2z_{t+h-1|t} - z_{t+h-2|t},
          E(z_{t+1}|I_t) = z_{t+1|t} = 2z_t - z_{t-1},

which together serve to define a simple iterative scheme. It is straightforward to confirm that these difference equations have an analytic solution of the form

(1.29)    z_{t+h|t} = \alpha + \beta h \quad \text{with} \quad \alpha = z_t \ \text{and} \ \beta = z_t - z_{t-1},

which generates a linear time trend.

It is possible to define random walks of higher orders. Thus a third-order random walk is formed by accumulating the values of a second-order process. A third-order process can be expected to give rise to local quadratic trends; and the appropriate way of predicting its values is by quadratic extrapolation.
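A brief sketch, with simulated data, of the linear forecast function of equations (1.28) and (1.29):

import numpy as np

rng = np.random.default_rng(5)

# A second-order random walk: white noise accumulated twice (eq. 1.26).
z = np.cumsum(np.cumsum(rng.normal(0.0, 1.0, 100)))

# Forecasts h steps ahead lie on the line alpha + beta*h of eq. (1.29).
alpha, beta = z[-1], z[-1] - z[-2]
h = np.arange(1, 11)
forecasts = alpha + beta * h

# The same values emerge from iterating the scheme of eq. (1.28).
prev, cur = z[-1], 2 * z[-1] - z[-2]
iterated = [cur]
for _ in range(9):
    prev, cur = cur, 2 * cur - prev
    iterated.append(cur)

assert np.allclose(forecasts, iterated)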

A stochastic trend of the random-walk variety may be elaborated by the addition of an irregular component. A simple model consists of a first-order random walk with an added white-noise component. The model is specified by the equations

(1.30)    y_t = \xi_t + \eta_t,
          \xi_t = \xi_{t-1} + \nu_t,

wherein η_t and ν_t are generated by two mutually independent white-noise processes.

The equations combine to give

(1.31)    y_t - y_{t-1} = \xi_t - \xi_{t-1} + \eta_t - \eta_{t-1}
                        = \nu_t + \eta_t - \eta_{t-1}.


The expression on the RHS can be reformulated to give

(1.32)    \nu_t + \eta_t - \eta_{t-1} = \varepsilon_t - \mu\varepsilon_{t-1},

where ε_t and ε_{t−1} are elements of a white-noise sequence and μ is a parameter of an appropriate value. Thus, the combination of the random walk and white noise gives rise to the single equation

(1.33)    y_t = y_{t-1} + \varepsilon_t - \mu\varepsilon_{t-1}.

The forecast for h steps ahead, which is obtained by taking expectations in the equation y_{t+h} = y_{t+h−1} + ε_{t+h} − με_{t+h−1}, is given by

(1.34)    E(y_{t+h}|I_t) = y_{t+h|t} = y_{t+h-1|t}.

The forecast for one step ahead, which is obtained from the equation y_{t+1} = y_t + ε_{t+1} − με_t, is

(1.35)    E(y_{t+1}|I_t) = y_{t+1|t} = y_t - \mu\varepsilon_t
                         = y_t - \mu(y_t - y_{t|t-1})
                         = (1 - \mu)y_t + \mu y_{t|t-1}.

The result y_{t|t−1} = y_{t−1} − με_{t−1}, which leads to the identity ε_t = y_t − y_{t|t−1} upon which the second equality of (35) depends, reflects the fact that, if the information at time t−1 consists of the elements of the set I_{t−1} = {y_{t−1}, y_{t−2}, . . .} and the value of μ, then ε_{t−1} is a known quantity which is unaffected by the process of taking expectations.

By applying a straightforward process of back-substitution to the final equation of (35), it will be found that

(1.36)    y_{t+1|t} = (1 - \mu)(y_t + \mu y_{t-1} + \cdots + \mu^{t-1} y_1) + \mu^t y_0
                    = (1 - \mu)\{ y_t + \mu y_{t-1} + \mu^2 y_{t-2} + \cdots \},

where the final expression stands for an infinite series. This is a so-called exponentially-weighted moving average; and it is the basis of the widely-used forecasting procedure known as exponential smoothing.

where the final expression stands for an infinite series. This is a so-calledexponentially-weighted moving average; and it is the basis of the widely-usedforecasting procedure known as exponential smoothing.

To form the one-step-ahead forecast y_{t+1|t} in the manner indicated by the first of the equations under (36), an initial value y_0 is required. Equation (34) indicates that all the succeeding forecasts y_{t+2|t}, y_{t+3|t} etc. have the same value as the one-step-ahead forecast.
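A minimal sketch of the smoothing recursion implied by the final equation of (35), with hypothetical data and an assumed value μ = 0.4:

import numpy as np

def exponential_smoothing(y, mu, y0=0.0):
    # One-step-ahead forecasts via y_{t+1|t} = (1 - mu)*y_t + mu*y_{t|t-1}.
    forecast = y0
    out = []
    for value in y:
        forecast = (1 - mu) * value + mu * forecast
        out.append(forecast)
    return np.array(out)

rng = np.random.default_rng(6)
y = np.cumsum(rng.normal(0, 1, 100))             # a hypothetical series

print(exponential_smoothing(y, mu=0.4)[-1])      # forecast of the next value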

It will transpire, in subsequent lectures, that equation (33) is a simple example of an autoregressive integrated moving-average or ARIMA model. There exists a readily accessible general theory of the forecasting of ARIMA processes which we shall expound at length.


References

Eubank, R.L., (1988), Spline Smoothing and Nonparametric Regression, Marcel Dekker Inc., New York.

Hamming, R.W., (1989), Digital Filters: Third Edition, Prentice-Hall Inc., Englewood Cliffs, N.J.

Ratkowsky, D.L., (1985), Nonlinear Regression Modelling: A Unified Approach, Marcel Dekker Inc., New York.

Reinsch, C.H., (1967), "Smoothing by Spline Functions", Numerische Mathematik, 10, 177–183.

Schoenberg, I.J., (1964), "Spline Functions and the Problem of Graduation", Proceedings of the National Academy of Sciences, 52, 947–950.

De Vos, A.F. and I.J. Steyn, (1990), "Stochastic Nonlinearity: A Firm Basis for the Flexible Functional Form", Research Memorandum 1990-13, Vrije Universiteit, Amsterdam.


LECTURE 2

Seasons and Cycles in Time Series

Cycles of a regular nature are often encountered in physics and engineering. Consider a point moving with constant speed in a circle of radius ρ. The point might be the axis of the 'big end' of a connecting rod which joins a piston to a flywheel. Let time t be reckoned from an instant when the radius joining the point to the centre is at an angle of θ below the horizontal. If the point is projected onto the horizontal axis, then the distance of the projection from the centre is given by

(2.1)    x = \rho\cos(\omega t - \theta).

The movement of the projection back and forth along the horizontal axis is described as simple harmonic motion.

The parameters of the function are as follows:

ρ is the amplitude,

ω is the angular velocity or frequency and

θ is the phase displacement.

The angular velocity is measured in radians per unit period. The quantity 2π/ω measures the period of the cycle. The phase displacement, also measured in radians, indicates the extent to which the cosine function has been displaced by a shift along the time axis. Thus, instead of the peak of the function occurring at time t = 0, as it would with an ordinary cosine function, it now occurs at time t = θ/ω.

Using the compound-angle formula cos(A − B) = cos A cos B + sin A sin B, we can rewrite equation (1) as

(2.2)    x = \rho\cos\theta\cos(\omega t) + \rho\sin\theta\sin(\omega t)
           = \alpha\cos(\omega t) + \beta\sin(\omega t),

with

(2.3)    \alpha = \rho\cos\theta, \quad \beta = \rho\sin\theta \quad \text{and} \quad \alpha^2 + \beta^2 = \rho^2.


Extracting a Regular Cyclical Component

A cyclical component which is concealed beneath other motions may be extracted from a data sequence by a straightforward application of the method of linear regression. An equation may be written in the form of

(2.4)    y_t = \alpha c_t(\omega) + \beta s_t(\omega) + e_t; \qquad t = 0, \ldots, T - 1,

where c_t(ω) = cos(ωt) and s_t(ω) = sin(ωt). To avoid the need for an intercept term, the values of the dependent variable should be deviations about a mean value. In matrix terms, equation (4) becomes

(2.5)    y = [\, c \ \ s \,]\begin{bmatrix} \alpha \\ \beta \end{bmatrix} + e,

where c = [c_0, . . . , c_{T−1}]′, s = [s_0, . . . , s_{T−1}]′ and e = [e_0, . . . , e_{T−1}]′ are vectors of T elements. The parameters α, β can be found by running regressions for a wide range of values of ω and by selecting the regression which delivers the lowest value for the residual sum of squares.
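A sketch of this search over ω, with a simulated series in which a cycle at an assumed frequency of 0.3 radians is concealed beneath noise:

import numpy as np

rng = np.random.default_rng(7)
T = 200
t = np.arange(T)

# A hidden cycle at omega = 0.3 radians per period, buried in noise.
y = 2.0 * np.cos(0.3 * t - 1.0) + rng.normal(0, 1.5, T)
y = y - y.mean()                         # deviations about the mean value

def cycle_rss(omega):
    # Regress y on cos(omega*t) and sin(omega*t), as in equation (2.5),
    # and return the residual sum of squares.
    X = np.column_stack([np.cos(omega * t), np.sin(omega * t)])
    _, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
    return rss[0]

omegas = np.linspace(0.01, 3.1, 1000)
best = min(omegas, key=cycle_rss)
print("estimated frequency:", best)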

Such a technique may be used for extracting a seasonal component from an economic time series; and, in that case, we know in advance what value to give to ω. For the seasonality of economic activities is related, ultimately, to the near-perfect regularities of the solar system which are reflected in the annual calendar.

It may be unreasonable to expect that an idealised seasonal cycle can be represented by a simple sinusoidal function. However, wave forms of a more complicated nature may be synthesised by employing a series of sine and cosine functions whose frequencies are integer multiples of the fundamental seasonal frequency. If there are s = 2n observations per annum, then a general model for a seasonal fluctuation would comprise the frequencies

(2.6)    \omega_j = \frac{2\pi j}{s}, \qquad j = 0, \ldots, n = \frac{s}{2},

which are equally spaced in the interval [0, π]. Such a series of frequencies is described as an harmonic scale.

A model of seasonal fluctuation comprising the full set of harmonically-related frequencies would take the form of

(2.7)    y_t = \sum_{j=0}^{n}\{ \alpha_j\cos(\omega_j t) + \beta_j\sin(\omega_j t) \} + e_t,

where e_t is a residual element which might represent an irregular white-noise component in the process underlying the data.


[Figure 1. Trigonometrical functions, of frequencies ω_1 = π/2 and ω_2 = π, associated with a quarterly model of a seasonal fluctuation.]

At first sight, it appears that there are s + 2 components in the sum. However, when s is even, we have

(2.8)    \sin(\omega_0 t) = \sin(0) = 0,
         \cos(\omega_0 t) = \cos(0) = 1,
         \sin(\omega_n t) = \sin(\pi t) = 0,
         \cos(\omega_n t) = \cos(\pi t) = (-1)^t.

Therefore there are only s nonzero coefficients to be determined.

This simple seasonal model is illustrated adequately by the case of quarterly data. Matters are no more complicated in the case of monthly data. When there are four observations per annum, we have ω_0 = 0, ω_1 = π/2 and ω_2 = π; and equation (7) assumes the form of

(2.9)    y_t = \alpha_0 + \alpha_1\cos\Bigl(\frac{\pi t}{2}\Bigr) + \beta_1\sin\Bigl(\frac{\pi t}{2}\Bigr) + \alpha_2(-1)^t + e_t.

If the four seasons are indexed by j = 0, . . . , 3, then the values from the year τ can be represented by the following matrix equation:

(2.10)    \begin{bmatrix} y_{\tau 0} \\ y_{\tau 1} \\ y_{\tau 2} \\ y_{\tau 3} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & -1 \\ 1 & -1 & 0 & 1 \\ 1 & 0 & -1 & -1 \end{bmatrix} \begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \beta_1 \\ \alpha_2 \end{bmatrix} + \begin{bmatrix} e_{\tau 0} \\ e_{\tau 1} \\ e_{\tau 2} \\ e_{\tau 3} \end{bmatrix}.


It will be observed that the vectors of the matrix are mutually orthogonal. When the data consist of T = 4p observations which span p years, the coefficients of the equation are given by

(2.11)    \alpha_0 = \frac{1}{T}\sum_{t=0}^{T-1} y_t,
          \alpha_1 = \frac{2}{T}\sum_{\tau=1}^{p}(y_{\tau 0} - y_{\tau 2}),
          \beta_1 = \frac{2}{T}\sum_{\tau=1}^{p}(y_{\tau 1} - y_{\tau 3}),
          \alpha_2 = \frac{1}{T}\sum_{\tau=1}^{p}(y_{\tau 0} - y_{\tau 1} + y_{\tau 2} - y_{\tau 3}).

It is the mutual orthogonality of the vectors of 'explanatory' variables which accounts for the simplicity of these formulae.
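The formulae of (2.11) are easily evaluated; in the following sketch, the quarterly data Y[tau, j] are hypothetical:

import numpy as np

rng = np.random.default_rng(8)
p = 10                                     # ten years of quarterly data
Y = rng.normal(0, 1, (p, 4))               # hypothetical; Y[tau, j] = y_{tau j}
Y = Y + np.array([1.0, -0.5, -1.0, 0.5])   # an additive seasonal pattern
T = 4 * p

# The coefficient formulae of equation (2.11).
alpha0 = Y.sum() / T
alpha1 = 2 * (Y[:, 0] - Y[:, 2]).sum() / T
beta1 = 2 * (Y[:, 1] - Y[:, 3]).sum() / T
alpha2 = (Y[:, 0] - Y[:, 1] + Y[:, 2] - Y[:, 3]).sum() / T

print(alpha0, alpha1, beta1, alpha2)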

An alternative model of seasonality, which is used more often by econometricians, assigns an individual dummy variable to each season. Thus, in place of equation (10), we may take

(2.12)    \begin{bmatrix} y_{\tau 0} \\ y_{\tau 1} \\ y_{\tau 2} \\ y_{\tau 3} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \delta_0 \\ \delta_1 \\ \delta_2 \\ \delta_3 \end{bmatrix} + \begin{bmatrix} e_{\tau 0} \\ e_{\tau 1} \\ e_{\tau 2} \\ e_{\tau 3} \end{bmatrix},

where

(2.13)    \delta_j = \frac{4}{T}\sum_{\tau=1}^{p} y_{\tau j}, \qquad j = 0, \ldots, 3.

A comparison of equations (10) and (12) establishes the mapping from the coefficients of the trigonometrical functions to the coefficients of the dummy variables. The inverse mapping is

(2.14)    \begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \beta_1 \\ \alpha_2 \end{bmatrix} = \begin{bmatrix} \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{2} & 0 & -\frac{1}{2} & 0 \\ 0 & \frac{1}{2} & 0 & -\frac{1}{2} \\ \frac{1}{4} & -\frac{1}{4} & \frac{1}{4} & -\frac{1}{4} \end{bmatrix} \begin{bmatrix} \delta_0 \\ \delta_1 \\ \delta_2 \\ \delta_3 \end{bmatrix}.

Another way of parametrising the model of seasonality is to adopt the following form:

(2.15)    \begin{bmatrix} y_{\tau 0} \\ y_{\tau 1} \\ y_{\tau 2} \\ y_{\tau 3} \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \phi \\ \gamma_0 \\ \gamma_1 \\ \gamma_2 \end{bmatrix} + \begin{bmatrix} e_{\tau 0} \\ e_{\tau 1} \\ e_{\tau 2} \\ e_{\tau 3} \end{bmatrix}.


This scheme is unbalanced in that it does not treat each season in the same manner. An attempt might be made to correct this feature by adding to the matrix an extra column with a unit at the bottom and with zeros elsewhere and by introducing an accompanying parameter γ_3. However, the columns of the resulting matrix will be linearly dependent; and this will make the parameters indeterminate unless an additional constraint is imposed which sets γ_0 + · · · + γ_3 = 0.

The problem highlights a difficulty which might arise if either of the schemes under (10) or (12) were fitted to the data by multiple regression in the company of a polynomial φ(t) = φ_0 + φ_1 t + · · · + φ_p t^p designed to capture a trend. To make such a regression viable, one would have to eliminate the intercept parameter φ_0.

Irregular Cycles

Whereas it seems reasonable to model a seasonal fluctuation in terms of trigonometrical functions, it is difficult to accept that other cycles in economic activity should have such regularity.

A classic expression of skepticism was made by Slutsky [19] in a famous article of 1927:

Suppose we are inclined to believe in the reality of the strict periodicity of the business cycle, such, for example, as the eight-year period postulated by Moore. Then we should encounter another difficulty. Wherein lies the source of this regularity? What is the mechanism of causality which, decade after decade, reproduces the same sinusoidal wave which rises and falls on the surface of the social ocean with the regularity of day and night?

It seems that something other than a perfectly regular sinusoidal component is required to model the secular fluctuations of economic activity which are described as business cycles.

To obtain a model for a seasonal fluctuation, it has been enough to modify the equation of harmonic motion by superimposing a disturbance term which affects the amplitude. To generate a cycle which is more fundamentally affected by randomness, we must construct a model which has random effects in both the phase and the amplitude.

To begin, let us imagine, once more, a point on the circumference of a circle of radius ρ which is travelling with an angular velocity of ω. At the instant t = 0, when the point makes a positive angle of θ with the horizontal axis, the coordinates are given by

(2.16)    (\alpha, \beta) = (\rho\cos\theta, \rho\sin\theta).


To find the coordinates of the point after it has rotated through an angle of ω in one period of time, we may rotate the component vectors (α, 0) and (0, β) separately and add them. The rotation of the components is depicted as follows:

(2.17)    (\alpha, 0) \stackrel{\omega}{\longrightarrow} (\alpha\cos\omega, \ \alpha\sin\omega),
          (0, \beta) \stackrel{\omega}{\longrightarrow} (-\beta\sin\omega, \ \beta\cos\omega).

Their addition gives

(2.18)    (\alpha, \beta) \stackrel{\omega}{\longrightarrow} (y, z) = (\alpha\cos\omega - \beta\sin\omega, \ \alpha\sin\omega + \beta\cos\omega).

In matrix terms, the transformation becomes

(2.19)    \begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{bmatrix}\begin{bmatrix} \alpha \\ \beta \end{bmatrix}.

To find the values of the coordinates at a time which is an integral number of periods ahead, we may transform the vector [y, z]′ by premultiplying it the appropriate number of times by the matrix of the rotation. Alternatively, we may replace ω in equation (19) by whatever angle will be reached at the time in question. In effect, equation (19) specifies the horizontal and vertical components of a circular motion which amount to a pair of synchronous harmonic motions.

To introduce the appropriate irregularities into the motion, we may add a random disturbance term to each of its components. The discrete-time equation of the resulting motion may be expressed as follows:

(2.20)    \begin{bmatrix} y_t \\ z_t \end{bmatrix} = \begin{bmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{bmatrix}\begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix} + \begin{bmatrix} \upsilon_t \\ \zeta_t \end{bmatrix}.

Now the character of the motion is radically altered. There is no longer any bound on the amplitudes which the components might acquire in the long run; and there is, likewise, a tendency for the phases of their cycles to drift without limit. Nevertheless, in the absence of uncommonly large disturbances, the trajectories of y and z are liable, in a limited period, to resemble those of the simple harmonic motions.

It is easy to decouple the equations of y and z. The first of the equations within the matrix expression can be written as

(2.21)    y_t = c y_{t-1} - s z_{t-1} + \upsilon_t,

where c = cos ω and s = sin ω. The second equation may be lagged by one period and rearranged to give

(2.22)    z_{t-1} - c z_{t-2} = s y_{t-2} + \zeta_{t-1}.


By taking the first difference of equation (21) and by using equation (22) to eliminate the values of z, we get

(2.23)    y_t - c y_{t-1} = c y_{t-1} - c^2 y_{t-2} - s z_{t-1} + c s z_{t-2} + \upsilon_t - c\upsilon_{t-1}
                          = c y_{t-1} - c^2 y_{t-2} - s^2 y_{t-2} - s\zeta_{t-1} + \upsilon_t - c\upsilon_{t-1}.

If we use the result that y_{t-2}\cos^2\omega + y_{t-2}\sin^2\omega = y_{t-2}, and if we collect the disturbances to form a new variable ε_t = υ_t − cυ_{t−1} − sζ_{t−1}, then we can rearrange the second equality to give

(2.24)    y_t = 2\cos\omega \, y_{t-1} - y_{t-2} + \varepsilon_t.

Here it is not true in general that the sequence of disturbances {ε_t} will be white noise. However, if we specify that, within equation (20),

(2.25)    \begin{bmatrix} \upsilon_t \\ \zeta_t \end{bmatrix} = \begin{bmatrix} -\sin\omega \\ \cos\omega \end{bmatrix}\eta_t,

where {η_t} is a white-noise sequence, then the lagged terms within ε_t will cancel, leaving a sequence whose elements are mutually uncorrelated.

A sequence generated by equation (24) when {ε_t} is a white-noise sequence is depicted in Figure 2.

[Figure 2. A quasi-cyclical sequence generated by the equation y_t = 2 cos ω y_{t−1} − y_{t−2} + ε_t when ω = 20°.]
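A minimal sketch which imitates the sequence of Figure 2, with ω = 20° as in the caption:

import numpy as np

rng = np.random.default_rng(9)
omega = np.deg2rad(20.0)            # 20 degrees, as in Figure 2
T = 100

y = np.zeros(T)
epsilon = rng.normal(0, 1, T)
for t in range(2, T):
    # Equation (2.24): a quasi-cyclical second-order recursion.
    y[t] = 2 * np.cos(omega) * y[t - 1] - y[t - 2] + epsilon[t]

# The period of the underlying cycle is 2*pi/omega = 18 observations.
print(2 * np.pi / omega)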


It is interesting to recognise that equation (24) becomes the equation of a second-order random walk in the case where ω = 0. The second-order random walk gives rise to trends which can remain virtually linear over considerable periods.

Whereas there is little difficulty in understanding that an accumulation of purely random disturbances can give rise to a linear trend, there is often surprise at the fact that such disturbances can also generate cycles which are more or less regular. An understanding of this phenomenon can be reached by considering a physical analogy. One such analogy, which is very apposite, was provided by Yule, whose article of 1927 introduced the concept of a second-order autoregressive process, of which equation (24) is a limiting case. Yule's purpose was to explain, in terms of random causes, a cycle of roughly 11 years which characterises the Wolfer sunspot index.

Yule invited his readers to imagine a pendulum attached to a recording device. Any deviations from perfectly harmonic motion which might be recorded must be the result of superimposed errors of observation, which could be all but eliminated if a long sequence of observations were subjected to a regression analysis.

The recording apparatus is left to itself and unfortunately boys get into the room and start pelting the pendulum with peas, sometimes from one side and sometimes from the other. The motion is now affected not by superposed fluctuations but by true disturbances, and the effect on the graph will be of an entirely different kind. The graph will remain surprisingly smooth, but amplitude and phase will vary continuously.

The phenomenon described by Yule is due to the inertia of the pendulum. In the short term, the impacts of the peas impart very little energy to the system compared with the sum of its kinetic and potential energies at any point in time. However, on taking a longer view, we can see that, in the absence of clock weights, the system is driven by the impacts alone.

The Fourier Decomposition of a Time Series

In spite of the notion that a regular trigonometrical function is an inappropriate means for modelling an economic cycle other than a seasonal fluctuation, there are good reasons to persist with the business of explaining a data sequence in terms of such functions.

The Fourier decomposition of a series is a matter of explaining the series entirely as a composition of sinusoidal functions. Thus it is possible to represent the generic element of the sample as

(2.26)    y_t = \sum_{j=0}^{n}\{ \alpha_j\cos(\omega_j t) + \beta_j\sin(\omega_j t) \}.


Assuming that T = 2n is even, this sum comprises T functions whose frequencies

(2.27)    \omega_j = \frac{2\pi j}{T}, \qquad j = 0, \ldots, n = \frac{T}{2},

are at equally spaced points in the interval [0, π].

As we might infer from our analysis of a seasonal fluctuation, there are as many nonzero elements in the sum under (26) as there are data points, for the reason that two of the functions within the sum—namely sin(ω_0 t) = sin(0) and sin(ω_n t) = sin(πt)—are identically zero. It follows that the mapping from the sample values to the coefficients constitutes a one-to-one invertible transformation. The same conclusion arises in the slightly more complicated case where T is odd.

The angular velocity ω_j = 2πj/T relates to a pair of trigonometrical components which accomplish j cycles in the T periods spanned by the data. The highest velocity ω_n = π corresponds to the so-called Nyquist frequency. If a component with a frequency in excess of π were included in the sum in (26), then its effect would be indistinguishable from that of a component with a frequency in the range [0, π].

To demonstrate this, consider the case of a pure cosine wave of unit amplitude and zero phase whose frequency ω lies in the interval π < ω < 2π. Let ω* = 2π − ω. Then

(2.28)    \cos(\omega t) = \cos\{(2\pi - \omega^*)t\}
                         = \cos(2\pi t)\cos(\omega^* t) + \sin(2\pi t)\sin(\omega^* t)
                         = \cos(\omega^* t),

since cos(2πt) = 1 and sin(2πt) = 0 for integer values of t; which indicates that ω and ω* are observationally indistinguishable. Here, ω* ∈ [0, π] is described as the alias of ω > π.

For an illustration of the problem of aliasing, let us imagine that a person observes the sea level at 6 a.m. and 6 p.m. each day. He should notice a very gradual recession and advance of the water level, the frequency of the cycle being f = 1/28, which amounts to one tide in 14 days. In fact, the true frequency is f = 1 − 1/28, which gives 27 tides in 14 days. Observing the sea level every six hours should enable him to infer the correct frequency.
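The aliasing identity of equation (2.28) is easily checked numerically at integer observation times; the frequency chosen below is arbitrary:

import numpy as np

t = np.arange(20)                  # integer observation times
omega = 1.75 * np.pi               # a frequency above the Nyquist value pi
omega_star = 2 * np.pi - omega     # its alias 0.25*pi, which lies in [0, pi]

# Sampled at integer t, the two cosines are indistinguishable.
assert np.allclose(np.cos(omega * t), np.cos(omega_star * t))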

Calculation of the Fourier Coefficients

For heuristic purposes, we can imagine calculating the Fourier coefficients using an ordinary regression procedure to fit equation (26) to the data. In this case, there would be no regression residuals, for the reason that we are 'estimating' a total of T coefficients from T data points; so we are actually solving a set of T linear equations in T unknowns.


A reason for not using a multiple regression procedure is that, in this case, the vectors of the 'explanatory' variables are mutually orthogonal. Therefore T applications of a univariate regression procedure would be appropriate to our purpose.

Let cj = [c0j, . . . , cT−1,j]′ and sj = [s0j, . . . , sT−1,j]′ represent vectors of T values of the generic functions cos(ωjt) and sin(ωjt) respectively. Then there are the following orthogonality conditions:

(2.29) c′i cj = 0 if i ≠ j,
       s′i sj = 0 if i ≠ j,
       c′i sj = 0 for all i, j.

In addition, there are the following sums of squares:

(2.30) c′0 c0 = c′n cn = T,
       s′0 s0 = s′n sn = 0,
       c′j cj = s′j sj = T/2 for 0 < j < n.

The ‘regression’ formulae for the Fourier coefficients are therefore

(2.31) α0 = (i′i)⁻¹ i′y = (1/T)∑t yt = ȳ,

(2.32) αj = (c′j cj)⁻¹ c′j y = (2/T)∑t yt cos(ωjt),

(2.33) βj = (s′j sj)⁻¹ s′j y = (2/T)∑t yt sin(ωjt).
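For illustration, the 'regression' formulae can be implemented directly. The following Python sketch, assuming numpy and using hypothetical random data, computes the coefficients for an even sample size T = 2n and verifies that they reconstruct the sample exactly.

import numpy as np

y = np.random.default_rng(0).normal(size=16)   # hypothetical data
T = len(y); n = T // 2
t = np.arange(T)

alpha = np.zeros(n + 1); beta = np.zeros(n + 1)
alpha[0] = y.mean()                            # equation (2.31)
for j in range(1, n + 1):
    w = 2 * np.pi * j / T
    scale = T if j == n else T / 2             # c'_n c_n = T at the Nyquist frequency
    alpha[j] = y @ np.cos(w * t) / scale       # equation (2.32)
    beta[j] = y @ np.sin(w * t) / scale        # equation (2.33); zero at j = 0, n

# The T nonzero coefficients reconstruct the sample exactly:
y_hat = sum(alpha[j] * np.cos(2 * np.pi * j / T * t)
            + beta[j] * np.sin(2 * np.pi * j / T * t) for j in range(n + 1))
print(np.allclose(y, y_hat))                   # True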

By pursuing the analogy of multiple regression, we can understand that there is a complete decomposition of the sum of squares of the elements of y which is given by

(2.34) y′y = α0² i′i + ∑j αj² c′j cj + ∑j βj² s′j sj.

Now consider writing α0² i′i = ȳ² i′i = ȳ′ȳ, where ȳ′ = [ȳ, . . . , ȳ] is the vector whose repeated element is the sample mean ȳ. It follows that y′y − α0² i′i = y′y − ȳ′ȳ = (y − ȳ)′(y − ȳ). Therefore we can rewrite the equation as

(2.35) (y − ȳ)′(y − ȳ) = (T/2)∑j {αj² + βj²} = (T/2)∑j ρj²,


and it follows that we can express the variance of the sample as

(2.36) (1/T)∑_{t=0}^{T−1} (yt − ȳ)² = (1/2)∑_{j=1}^{n} (αj² + βj²)
                                   = (2/T²)∑j {(∑t yt cos ωjt)² + (∑t yt sin ωjt)²}.

The proportion of the variance which is attributable to the component at frequency ωj is (αj² + βj²)/2 = ρj²/2, where ρj is the amplitude of the component.

The number of the Fourier frequencies increases at the same rate as the sample size T. Therefore, if the variance of the sample remains finite, and if there are no regular harmonic components in the process generating the data, then we can expect the proportion of the variance attributed to the individual frequencies to decline as the sample size increases. If there is such a regular component within the process, then we can expect the proportion of the variance attributable to it to converge to a finite value as the sample size increases.

In order to provide a graphical representation of the decomposition of the sample variance, we must scale the elements of equation (36) by a factor of T. The graph of the function I(ωj) = (T/2)(αj² + βj²) is known as the periodogram.
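Since αj and βj are proportional to the cosine and sine sums, the periodogram may equally be computed as I(ωj) = (2/T)|∑t (yt − ȳ)e^{−iωjt}|², which is amenable to the fast Fourier transform. A minimal Python sketch, assuming numpy, follows.

import numpy as np

def periodogram(y):
    # Returns the frequencies w_j = 2*pi*j/T in (0, pi] and the ordinates
    # I(w_j) = (2/T)|sum_t (y_t - ybar) exp(-i w_j t)|^2.
    y = np.asarray(y, dtype=float)
    T = len(y)
    d = np.fft.fft(y - y.mean())         # the complex sums at each Fourier frequency
    j = np.arange(1, T // 2 + 1)
    return 2 * np.pi * j / T, (2.0 / T) * np.abs(d[j]) ** 2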

Figure 3. The periodogram of Wolfer's Sunspot Numbers 1749–1924.


There are many impressive examples where the estimation of the periodogram has revealed the presence of regular harmonic components in a data series which might otherwise have passed undetected. One of the best-known examples concerns the analysis of the brightness or magnitude of the star T. Ursa Major. It was shown by Whittaker and Robinson in 1924 that this series could be described almost completely in terms of two trigonometrical functions with periods of 24 and 29 days.

The attempts to discover underlying components in economic time series have been less successful. One application of periodogram analysis which was a notorious failure was its use by William Beveridge in 1921 and 1922 to analyse a long series of European wheat prices. The periodogram had so many peaks that at least twenty possible hidden periodicities could be picked out, and this seemed to be many more than could be accounted for by plausible explanations within the realms of economic history.

Such findings seem to diminish the importance of periodogram analysis in econometrics. However, the fundamental importance of the periodogram is established once it is recognised that it represents nothing less than the Fourier transform of the sequence of empirical autocovariances.

The Empirical Autocovariances

A natural way of representing the serial dependence of the elements of a data sequence is to estimate their autocovariances. The empirical autocovariance of lag τ is defined by the formula

(2.37) cτ = (1/T)∑_{t=τ}^{T−1} (yt − ȳ)(yt−τ − ȳ).

The empirical autocorrelation of lag τ is defined by rτ = cτ/c0, where c0, which is formally the autocovariance of lag 0, is the variance of the sequence. The autocorrelation provides a measure of the relatedness of data points separated by τ periods which is independent of the units of measurement.

It is straightforward to establish the relationship between the periodogram and the sequence of autocovariances. The periodogram may be written as

(2.38) I(ωj) = (2/T)[{∑_{t=0}^{T−1} cos(ωjt)(yt − ȳ)}² + {∑_{t=0}^{T−1} sin(ωjt)(yt − ȳ)}²].

The identity ∑t cos(ωjt)(yt − ȳ) = ∑t cos(ωjt)yt follows from the fact that, by construction, ∑t cos(ωjt) = 0 for all j ≠ 0. Expanding the expression in (38) gives

(2.39) I(ωj) = (2/T){∑t ∑s cos(ωjt) cos(ωjs)(yt − ȳ)(ys − ȳ)}
             + (2/T){∑t ∑s sin(ωjt) sin(ωjs)(yt − ȳ)(ys − ȳ)},

and, by using the identity cos(A) cos(B) + sin(A) sin(B) = cos(A − B), we can rewrite this as

(2.40) I(ωj) = (2/T){∑t ∑s cos(ωj[t − s])(yt − ȳ)(ys − ȳ)}.

Next, on defining τ = t − s and writing cτ = ∑t (yt − ȳ)(yt−τ − ȳ)/T, we can reduce the latter expression to

(2.41) I(ωj) = 2∑_{τ=1−T}^{T−1} cos(ωjτ) cτ,

which is a Fourier transform of the sequence of empirical autocovariances.
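The identity under (41) can be verified numerically. The following Python sketch assumes numpy and the periodogram() function sketched above; it computes the empirical autocovariances of (2.37) and reconstructs the periodogram from them, using the fact that c−τ = cτ.

import numpy as np

def autocovariances(y):
    # c_tau of (2.37), for tau = 0, ..., T-1
    y = np.asarray(y, dtype=float); T = len(y); d = y - y.mean()
    return np.array([d[tau:] @ d[:T - tau] / T for tau in range(T)])

y = np.random.default_rng(1).normal(size=64)   # hypothetical data
T = len(y); c = autocovariances(y)
w, I = periodogram(y)
# I(w_j) = 2{c_0 + 2 sum_{tau=1}^{T-1} cos(w_j tau) c_tau}
I_acv = np.array([2 * (c[0] + 2 * sum(np.cos(wj * tau) * c[tau]
                                      for tau in range(1, T))) for wj in w])
print(np.allclose(I, I_acv))                   # True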

An Appendix on Harmonic Cycles

Lemma 1. Let ωj = 2πj/T, where j ∈ {1, . . . , T/2} if T is even and j ∈ {1, . . . , (T − 1)/2} if T is odd. Then

∑_{t=0}^{T−1} cos(ωjt) = ∑_{t=0}^{T−1} sin(ωjt) = 0.

Proof. By Euler's equations, we have

∑_{t=0}^{T−1} cos(ωjt) = (1/2)∑_{t=0}^{T−1} exp(i2πjt/T) + (1/2)∑_{t=0}^{T−1} exp(−i2πjt/T).

By using the formula 1 + λ + · · · + λ^{T−1} = (1 − λ^T)/(1 − λ), we find that

∑_{t=0}^{T−1} exp(i2πjt/T) = {1 − exp(i2πj)}/{1 − exp(i2πj/T)}.

But exp(i2πj) = cos(2πj) + i sin(2πj) = 1, so the numerator in the expression above is zero; and hence ∑t exp(i2πjt/T) = 0. By similar means, we can show


that ∑t exp(−i2πjt/T) = 0; and, therefore, it follows that ∑t cos(ωjt) = 0. An analogous proof shows that ∑t sin(ωjt) = 0.

Lemma 2. Let ωj = 2πj/T, where j ∈ {0, 1, . . . , T/2} if T is even and j ∈ {0, 1, . . . , (T − 1)/2} if T is odd. Then, provided that, in (a) and (b), j and k are neither 0 nor T/2,

(a) ∑_{t=0}^{T−1} cos(ωjt) cos(ωkt) = 0, if j ≠ k; T/2, if j = k.

(b) ∑_{t=0}^{T−1} sin(ωjt) sin(ωkt) = 0, if j ≠ k; T/2, if j = k.

(c) ∑_{t=0}^{T−1} cos(ωjt) sin(ωkt) = 0 for all j, k.

Proof. From the formula cos A cos B = ½{cos(A + B) + cos(A − B)}, we have

∑_{t=0}^{T−1} cos(ωjt) cos(ωkt) = (1/2)∑t {cos([ωj + ωk]t) + cos([ωj − ωk]t)}
                              = (1/2)∑_{t=0}^{T−1} {cos(2π[j + k]t/T) + cos(2π[j − k]t/T)}.

We find, in consequence of Lemma 1, that, if j ≠ k, then both terms on the RHS vanish, and thus we have the first part of (a). If j = k, then cos(2π[j − k]t/T) = cos 0 = 1 and so, whilst the first term vanishes, the second term yields the value of T under summation. This gives the second part of (a).

The proofs of (b) and (c) follow along similar lines.

References

Beveridge, Sir W. H., (1921), "Weather and Harvest Cycles." Economic Journal, 31, 429–452.

Beveridge, Sir W. H., (1922), "Wheat Prices and Rainfall in Western Europe." Journal of the Royal Statistical Society, 85, 412–478.

Moore, H. L., (1914), "Economic Cycles: Their Law and Cause." New York: Macmillan.

Slutsky, E., (1937), "The Summation of Random Causes as the Source of Cyclical Processes." Econometrica, 5, 105–146.

Yule, G. U., (1927), "On a Method of Investigating Periodicities in Disturbed Series with Special Reference to Wolfer's Sunspot Numbers." Philosophical Transactions of the Royal Society, 89, 1–64.


LECTURE 3

Models and Methods of Time-Series Analysis

A time-series model is one which postulates a relationship amongst a number of temporal sequences or time series. An example is provided by the simple regression model

(3.1) y(t) = x(t)β + ε(t),

where y(t) = {yt; t = 0, ±1, ±2, . . .} is a sequence, indexed by the time subscript t, which is a combination of an observable signal sequence x(t) = {xt} and an unobservable white-noise sequence ε(t) = {εt} of independently and identically distributed random variables.

A more general model, which we shall call the general temporal regression model, is one which postulates a relationship comprising any number of consecutive elements of x(t), y(t) and ε(t). The model may be represented by the equation

(3.2) ∑_{i=0}^{p} αi y(t − i) = ∑_{i=0}^{k} βi x(t − i) + ∑_{i=0}^{q} µi ε(t − i),

where it is usually taken for granted that α0 = 1. This normalisation of the leading coefficient on the LHS identifies y(t) as the output sequence. Any of the sums in the equation can be infinite; but, if the model is to be viable, the sequences of coefficients {αi}, {βi} and {µi} can depend on only a limited number of parameters.

Although it is convenient to write the general model in the form of (2), it is also common to represent it by the equation

(3.3) y(t) = ∑_{i=1}^{p} φi y(t − i) + ∑_{i=0}^{k} βi x(t − i) + ∑_{i=0}^{q} µi ε(t − i),

where φi = −αi for i = 1, . . . , p. This places the lagged versions of the sequence y(t) on the RHS in the company of the input sequence x(t) and its lags.


Whereas engineers are liable to describe this as a feedback model, economists are more likely to describe it as a model with lagged dependent variables.

The foregoing models are termed regression models by virtue of the inclusion of the observable explanatory sequence x(t). When x(t) is deleted, we obtain a simpler unconditional linear stochastic model:

(3.4) ∑_{i=0}^{p} αi y(t − i) = ∑_{i=0}^{q} µi ε(t − i).

This is the autoregressive moving-average (ARMA) model.

A time-series model can often assume a variety of forms. Consider a simple dynamic regression model of the form

(3.5) y(t) = φy(t− 1) + x(t)β + ε(t),

where there is a single lagged dependent variable. By repeated substitution, we obtain

(3.6) y(t) = φy(t − 1) + βx(t) + ε(t)
           = φ²y(t − 2) + β{x(t) + φx(t − 1)} + ε(t) + φε(t − 1)
           ...
           = φⁿy(t − n) + β{x(t) + φx(t − 1) + · · · + φ^{n−1}x(t − n + 1)}
             + ε(t) + φε(t − 1) + · · · + φ^{n−1}ε(t − n + 1).

If |φ| < 1, then lim(n → ∞)φⁿ = 0; and it follows that, if x(t) and ε(t) are bounded sequences, then, as the number of repeated substitutions increases indefinitely, the equation will tend to the limiting form of

(3.7) y(t) = β∑_{i=0}^{∞} φⁱ x(t − i) + ∑_{i=0}^{∞} φⁱ ε(t − i).

It is notable that, by this process of repeated substitution, the feedback structure has been eliminated from the model. As a result, it becomes easier to assess the impact upon the output sequence of changes in the values of the input sequence. The direct mapping from the input sequence to the output sequence is described by engineers as a transfer function or as a filter.

For models more complicated than the one above, the method of repeated substitution, if pursued directly, becomes intractable. Thus we are motivated to use more powerful algebraic methods to effect the transformation of the equation. This leads us to consider the use of the so-called lag operator. A proper understanding of the lag operator depends upon a knowledge of the algebra of polynomials and of rational functions.


The Algebra of the Lag Operator

A sequence x(t) = {xt; t = 0, ±1, ±2, . . .} is any function mapping from the set of integers Z = {0, ±1, ±2, . . .} to the real line. If the set of integers represents a set of dates separated by unit intervals, then x(t) is described as a temporal sequence or a time series.

The set of all time series represents a vector space, and various linear transformations or operators can be defined over the space. The simplest of these is the lag operator L, which is defined by

(3.8) Lx(t) = x(t− 1).

Now, L{Lx(t)} = Lx(t − 1) = x(t − 2); so it makes sense to define L² by L²x(t) = x(t − 2). More generally, Lᵏx(t) = x(t − k) and, likewise, L⁻ᵏx(t) = x(t + k). Other operators are the difference operator ∇ = I − L, which has the effect that

(3.9) ∇x(t) = x(t)− x(t− 1),

the forward-difference operator ∆ = L⁻¹ − I, and the summation operator S = (I − L)⁻¹ = {I + L + L² + · · ·}, which has the effect that

(3.10) Sx(t) = ∑_{i=0}^{∞} x(t − i).

In general, we can define polynomials of the lag operator of the form p(L) = p0 + p1L + · · · + pnLⁿ = ∑ piLⁱ, having the effect that

(3.11) p(L)x(t) = p0x(t) + p1x(t − 1) + · · · + pnx(t − n)
              = ∑_{i=0}^{n} pi x(t − i).
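The application of a lag-operator polynomial to a finite data sequence amounts to a one-sided convolution. A minimal Python sketch follows, assuming numpy and treating the pre-sample values of x(t) as zeros.

import numpy as np

def apply_lag_polynomial(p, x):
    # p = [p0, p1, ..., pn]; returns p0*x(t) + p1*x(t-1) + ... + pn*x(t-n),
    # with implicit zeros standing in for the pre-sample values.
    return np.convolve(x, p)[:len(x)]

x = np.arange(6, dtype=float)
print(apply_lag_polynomial([1.0, -1.0], x))   # the difference operator (I - L)x(t)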

In these terms, the equation under (2) of the general temporal model becomes

(3.12) α(L)y(t) = β(L)x(t) + µ(L)ε(t).

The advantage which comes from defining polynomials in the lag operator stems from the fact that they are isomorphic to the set of ordinary algebraic polynomials. Thus we can rely upon what we know about ordinary polynomials to treat problems concerning lag-operator polynomials.


Algebraic Polynomials

Consider the equation φ0 + φ1z + φ2z² = 0. Once the equation has been divided by φ2, it can be factorised as (z − λ1)(z − λ2) = 0, where λ1, λ2 are the roots or zeros of the equation, which are given by the formula

(3.13) λ = {−φ1 ± √(φ1² − 4φ2φ0)}/(2φ2).

If φ1² ≥ 4φ2φ0, then the roots λ1, λ2 are real. If φ1² = 4φ2φ0, then λ1 = λ2. If φ1² < 4φ2φ0, then the roots are the conjugate complex numbers λ = α + iβ, λ* = α − iβ, where i = √−1.

There are three alternative ways of representing the conjugate complex numbers λ and λ*:

(3.14) λ = α + iβ = ρ(cos θ + i sin θ) = ρe^{iθ},
       λ* = α − iβ = ρ(cos θ − i sin θ) = ρe^{−iθ},

where

(3.15) ρ = √(α² + β²) and θ = tan⁻¹(β/α).

These are called, respectively, the Cartesian form, the trigonometrical form and the exponential form.

The Cartesian and trigonometrical representations are understood by considering the Argand diagram:

Figure 1. The Argand Diagram showing a complex number λ = α + iβ and its conjugate λ* = α − iβ.


The exponential form is understood by considering the following series expansions of cos θ and i sin θ about the point θ = 0:

(3.16) cos θ = {1 − θ²/2! + θ⁴/4! − θ⁶/6! + · · ·},
       i sin θ = {iθ − iθ³/3! + iθ⁵/5! − iθ⁷/7! + · · ·}.

Adding these gives

(3.17) cos θ + i sin θ = {1 + iθ − θ²/2! − iθ³/3! + θ⁴/4! + · · ·} = e^{iθ}.

Likewise, by subtraction, we get

(3.18) cos θ − i sin θ = {1 − iθ − θ²/2! + iθ³/3! + θ⁴/4! − · · ·} = e^{−iθ}.

These are Euler’s equations. It follows from adding (17) and (18) that

(3.19) cos θ = (e^{iθ} + e^{−iθ})/2.

Subtracting (18) from (17) gives

(3.20) sin θ = (−i/2)(e^{iθ} − e^{−iθ}) = (1/2i)(e^{iθ} − e^{−iθ}).

Now consider the general equation of the nth order:

(3.21) φ0 + φ1z + φ2z² + · · · + φnzⁿ = 0.

On dividing by φn, we can factorise this as

(3.22) (z − λ1)(z − λ2) · · · (z − λn) = 0,

where some of the roots may be real and others may be complex. The complex roots come in conjugate pairs, so that, if λ = α + iβ is a complex root, then there is a corresponding root λ* = α − iβ such that the product (z − λ)(z − λ*) = z² − 2αz + (α² + β²) is real and quadratic. When we multiply the n factors together, we obtain the expansion

(3.23) 0 = zⁿ − ∑i λi z^{n−1} + ∑_{i<j} λiλj z^{n−2} − · · · + (−1)ⁿ λ1λ2 · · · λn.


This can be compared with the expression (φ0/φn) + (φ1/φn)z + · · · + zⁿ = 0. By equating the coefficients of the two expressions, we find that (φ0/φn) = (−1)ⁿ ∏ λi or, equivalently,

(3.24) φn = φ0 ∏_{i=1}^{n} (−λi)⁻¹.

Thus we can express the polynomial in any of the following forms:

(3.25) ∑ φi zⁱ = φn ∏ (z − λi)
             = φ0 ∏ (−λi)⁻¹ ∏ (z − λi)
             = φ0 ∏ (1 − z/λi).

We should also note that, if λ is a root of the primary equation ∑ φi zⁱ = 0, where rising powers of z are associated with rising indices on the coefficients, then µ = 1/λ is a root of the equation ∑ φi z^{n−i} = 0, which has declining powers of z instead. This follows since ∑ φi λⁱ = ∑ φi µ⁻ⁱ = 0 implies that µⁿ ∑ φi µ⁻ⁱ = ∑ φi µ^{n−i} = 0. Confusion can arise from not knowing which of the two equations one is dealing with.

Rational Functions of Polynomials

If δ(z) and γ(z) are polynomial functions of z of degrees d and g respectively, with d < g, then the ratio δ(z)/γ(z) is described as a proper rational function. We shall often encounter expressions of the form

(3.26) y(t) = {δ(L)/γ(L)} x(t).

For this to have a meaningful interpretation in the context of a time-series model, we normally require that y(t) should be a bounded sequence whenever x(t) is bounded. The necessary and sufficient condition for the boundedness of y(t), in that case, is that the series expansion of δ(z)/γ(z) should be convergent whenever |z| ≤ 1. We can determine whether or not the sequence will converge by expressing the ratio δ(z)/γ(z) as a sum of partial fractions. The basic result is as follows:

(3.27) If δ(z)/γ(z) = δ(z)/{γ1(z)γ2(z)} is a proper rational function, and if γ1(z) and γ2(z) have no common factor, then the function can be uniquely expressed as

       δ(z)/γ(z) = δ1(z)/γ1(z) + δ2(z)/γ2(z),

where δ1(z)/γ1(z) and δ2(z)/γ2(z) are proper rational functions.


Imagine that γ(z) = ∏ (1 − z/λi). Then repeated applications of this basic result enable us to write

(3.28) δ(z)/γ(z) = κ1/(1 − z/λ1) + κ2/(1 − z/λ2) + · · · + κg/(1 − z/λg).

By adding the terms on the RHS, we find an expression with a numerator of degree g − 1. By equating the terms of this numerator with the terms of δ(z), we can find the values κ1, κ2, . . . , κg. The convergence of the expansion of δ(z)/γ(z) is then a straightforward matter; for the series converges if and only if the expansion of each of the partial fractions converges. For the expansion

(3.29) κ/(1 − z/λ) = κ{1 + z/λ + (z/λ)² + · · ·}

to converge when |z| ≤ 1, it is necessary and sufficient that |λ| > 1.

Example. Consider the function

(3.30) 3z/(1 + z − 2z²) = 3z/{(1 − z)(1 + 2z)}
                       = κ1/(1 − z) + κ2/(1 + 2z)
                       = {κ1(1 + 2z) + κ2(1 − z)}/{(1 − z)(1 + 2z)}.

Equating the terms of the numerator gives

(3.31) 3z = (2κ1 − κ2)z + (κ1 + κ2),

so κ2 = −κ1, which gives 3 = 2κ1 − κ2 = 3κ1; and thus we have κ1 = 1, κ2 = −1.
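The decomposition may be checked with a computer-algebra system. The following sketch assumes the sympy library; its apart function returns an expression equivalent to 1/(1 − z) − 1/(1 + 2z), confirming that κ1 = 1 and κ2 = −1.

import sympy as sp

z = sp.symbols('z')
expr = 3*z / ((1 - z) * (1 + 2*z))
# Partial-fraction decomposition of (3.30):
print(sp.apart(expr, z))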

Linear Difference Equations

An nth-order linear difference equation is a relationship amongst n + 1 consecutive elements of a sequence x(t) of the form

(3.32) α0 x(t) + α1 x(t − 1) + · · · + αn x(t − n) = u(t),

where u(t) is some specified sequence which is described as the forcing function. The equation can be written, in summary notation, as

(3.33) α(L)x(t) = u(t),


where α(L) = α0 + α1L + · · · + αnLⁿ. If n consecutive values of x(t) are given, say x1, x2, . . . , xn, then the relationship can be used to find the succeeding value xn+1. In this way, so long as u(t) is fully specified, it is possible to generate any number of the succeeding elements of the sequence. The values of the sequence prior to t = 1 can be generated likewise; and thus, in effect, we can deduce the function x(t) from the difference equation. However, instead of a recursive solution, we often seek an analytic expression for x(t).

The function x(t; c), expressing the analytic solution, will comprise a set of n constants in c = [c1, c2, . . . , cn]′ which can be determined once we are given a set of n consecutive values of x(t) which are called initial conditions. The general analytic solution of the equation α(L)x(t) = u(t) is expressed as x(t; c) = y(t; c) + z(t), where y(t; c) is the general solution of the homogeneous equation α(L)y(t) = 0 and z(t) = α⁻¹(L)u(t) is called a particular solution of the inhomogeneous equation.

We may solve the difference equation in three steps. First, we find the general solution of the homogeneous equation. Next, we find the particular solution z(t), which embodies no unknown quantities. Finally, we use the n initial values of x to determine the constants c1, c2, . . . , cn. We shall discuss in detail only the solution of the homogeneous equation.

Solution of the Homogeneous Difference Equation

If λj is a root of the equation α(z) = α0 + α1z + · · · + αnzⁿ = 0, such that α(λj) = 0, then yj(t) = (1/λj)ᵗ is a solution of the equation α(L)y(t) = 0. This can be seen by considering the expression

(3.34) α(L)(1/λj)ᵗ = (α0 + α1L + · · · + αnLⁿ)(1/λj)ᵗ
                  = α0(1/λj)ᵗ + α1(1/λj)^{t−1} + · · · + αn(1/λj)^{t−n}
                  = (α0 + α1λj + · · · + αnλjⁿ)(1/λj)ᵗ
                  = α(λj)(1/λj)ᵗ = 0.

Alternatively, one may consider the factorisation α(L) = α0 ∏i (1 − L/λi). Within this product is the term 1 − L/λj; and, since

(1 − L/λj)(1/λj)ᵗ = (1/λj)ᵗ − (1/λj)ᵗ = 0,

it follows that α(L)(1/λj)ᵗ = 0.


The general solution, in the case where α(z) = 0 has distinct real roots, is given by

(3.35) y(t; c) = c1(1/λ1)ᵗ + c2(1/λ2)ᵗ + · · · + cn(1/λn)ᵗ,

where c1, c2, . . . , cn are the constants which are determined by the initial conditions.

In the case where two roots coincide at a value of λj, the equation α(L)y(t) = 0 has the solutions y1(t) = (1/λj)ᵗ and y2(t) = t(1/λj)ᵗ. To show this, let us extract the term (1 − L/λj)² from the factorisation α(L) = α0 ∏i (1 − L/λi). Then, according to the previous argument, we have (1 − L/λj)²(1/λj)ᵗ = 0; but, also, we have

(3.36) (1 − L/λj)² t(1/λj)ᵗ = (1 − 2L/λj + L²/λj²) t(1/λj)ᵗ
                           = t(1/λj)ᵗ − 2(t − 1)(1/λj)ᵗ + (t − 2)(1/λj)ᵗ = 0.

In general, if there are r repeated roots with a value of λj, then all of (1/λj)ᵗ, t(1/λj)ᵗ, t²(1/λj)ᵗ, . . . , t^{r−1}(1/λj)ᵗ are solutions to the equation α(L)y(t) = 0.

A particularly important special case arises when there are r repeated roots of unit value. Then the functions 1, t, t², . . . , t^{r−1} are all solutions to the homogeneous equation. With each solution is associated a coefficient which can be determined in view of the initial conditions. If these coefficients are d0, d1, d2, . . . , d_{r−1}, then, within the general solution of the homogeneous equation, there will be found the term d0 + d1t + d2t² + · · · + d_{r−1}t^{r−1}, which represents a polynomial in t of degree r − 1.

The 2nd-order Difference Equation with Complex Roots

Imagine that the 2nd-order equation α(L)y(t) = α0y(t) + α1y(t − 1) + α2y(t − 2) = 0 is such that α(z) = 0 has complex roots λ = 1/µ and λ* = 1/µ*. If λ, λ* are conjugate complex numbers, then so too are µ, µ*. Therefore, let us write

(3.37) µ = γ + iδ = κ(cos ω + i sin ω) = κe^{iω},
       µ* = γ − iδ = κ(cos ω − i sin ω) = κe^{−iω}.

These will appear in a general solution of the difference equation of the form

(3.38) y(t) = cµᵗ + c*(µ*)ᵗ.


Figure 2. The solution of the homogeneous difference equation (1 − 1.69L + 0.81L²)y(t) = 0 for the initial conditions y0 = 1 and y1 = 3.69. The time lag of the phase displacement p1 and the duration of the cycle p2 are also indicated.
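The trajectory of Figure 2 can be generated recursively. The following Python sketch, assuming numpy, rearranges the equation as y(t) = 1.69y(t − 1) − 0.81y(t − 2) and recovers the damping factor κ and the period p2 = 2π/ω from the correspondence with equation (3.43) below.

import numpy as np

y = np.zeros(26)
y[0], y[1] = 1.0, 3.69                    # the initial conditions of Figure 2
for t in range(2, len(y)):
    y[t] = 1.69 * y[t-1] - 0.81 * y[t-2]  # the homogeneous recursion

# Matching 1 - 2*kappa*cos(omega)*L + kappa^2*L^2 to 1 - 1.69L + 0.81L^2:
kappa = np.sqrt(0.81)
omega = np.arccos(1.69 / (2 * kappa))
print(kappa, 2 * np.pi / omega)           # damping factor and the period p2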

This represents a real-valued sequence; and, since a real term must equal its own conjugate, it follows that c and c* must be conjugate numbers of the form

(3.39) c* = ρ(cos θ + i sin θ) = ρe^{iθ},
       c = ρ(cos θ − i sin θ) = ρe^{−iθ}.

Thus the general solution becomes

(3.40) cµᵗ + c*(µ*)ᵗ = ρe^{−iθ}(κe^{iω})ᵗ + ρe^{iθ}(κe^{−iω})ᵗ
                    = ρκᵗ{e^{i(ωt−θ)} + e^{−i(ωt−θ)}}
                    = 2ρκᵗ cos(ωt − θ).

To analyse the final expression, consider first the factor cos(ωt − θ). This is a displaced cosine wave. The value ω, which is a number of radians per unit period, is called the angular velocity or the angular frequency of the wave. The value f = ω/2π is its frequency in cycles per unit period. The duration of one cycle, also called the period, is r = 2π/ω.

The term θ is called the phase displacement of the cosine wave, and it serves to shift the cosine function along the axis of t so that, in the absence of damping, the peak would occur at the value of t = θ/ω instead of at t = 0.


Next consider the term κᵗ, wherein κ = √(γ² + δ²) is the modulus of the complex roots. When κ has a value of less than unity, it becomes a damping factor which serves to attenuate the cosine wave as t increases. The damping also serves to shift the peaks of the cosine function slightly to the left.

Finally, the factor 2ρ affects the initial amplitude of the cosine wave, which is the value which it assumes when t = 0. Since ρ is just the modulus of the values c and c*, this amplitude reflects the initial conditions. The phase angle θ is also a product of the initial conditions.

It is instructive to derive an expression for the second-order difference equation which is in terms of the parameters of the trigonometrical or exponential representations of a pair of complex roots. Consider

(3.41) α(z) = α0(1 − µz)(1 − µ*z)
           = α0{1 − (µ + µ*)z + µµ*z²}.

From (37) it follows that

(3.42) µ + µ* = 2κ cos ω and µµ* = κ².

Therefore the polynomial operator which is entailed by the difference equation is

(3.43) α0 + α1L + α2L² = α0(1 − 2κ cos ω L + κ²L²);

and it is usual to set α0 = 1. This representation indicates that a necessary condition for the roots to be complex, which is not a sufficient condition, is that α2/α0 > 0.

It is easy to ascertain by inspection whether or not the second-order difference equation is stable. The condition that the roots of α(z) = 0 must lie outside the unit circle, which is necessary and sufficient for stability, imposes certain restrictions on the coefficients of α(z) which can be checked easily.

We can reveal these conditions most readily by considering the auxiliary polynomial ρ(z) = z²α(z⁻¹), whose roots, which are the inverses of those of α(z), must lie inside the unit circle. Let the roots of ρ(z), which might be real or complex, be denoted by µ1, µ2. Then we can write

(3.44) ρ(z) = α0z² + α1z + α2
           = α0(z − µ1)(z − µ2)
           = α0{z² − (µ1 + µ2)z + µ1µ2},

where it is assumed that α0 > 0. This indicates that α2/α0 = µ1µ2. Therefore the conditions |µ1|, |µ2| < 1 imply that

(3.45) −α0 < α2 < α0.


If the roots are complex conjugate numbers µ, µ* = γ ± iδ, then this condition will ensure that µ*µ = α2/α0 < 1, which is the condition that they are within the unit circle.

Now consider the fact that, if α0 > 0, then the function ρ(z) will have a minimum value over the real line which is greater than zero if the roots are complex and no greater than zero if they are real. If the roots are real, then they will be found in the interval (−1, 1) if and only if

(3.46) ρ(−1) = α0 − α1 + α2 > 0 and ρ(1) = α0 + α1 + α2 > 0.

If the roots are complex, then these conditions are bound to be satisfied.

From these arguments, it follows that the conditions under (45) and (46) in combination are necessary and sufficient to ensure that the roots of ρ(z) = 0 are within the unit circle and that the roots of α(z) = 0 are outside.
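The conditions under (45) and (46) amount to a simple test which can be coded directly. A minimal Python sketch follows; the example coefficients are those of the polynomial of Figure 2.

def is_stable(a0, a1, a2):
    # All roots of a0 + a1*z + a2*z^2 lie outside the unit circle (a0 > 0 assumed)
    # iff -a0 < a2 < a0, a0 - a1 + a2 > 0 and a0 + a1 + a2 > 0.
    return (-a0 < a2 < a0) and (a0 - a1 + a2 > 0) and (a0 + a1 + a2 > 0)

print(is_stable(1.0, -1.69, 0.81))   # True: the polynomial of Figure 2 is stable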

State-Space Models

An nth-order difference equation in a single variable can be transformed into a first-order system in n variables which are the elements of a so-called state vector.

There is a wide variety of alternative forms which can be assumed by a first-order vector difference equation corresponding to the nth-order scalar equation. However, certain of these are described as canonical forms by virtue of special structures in the matrix.

In demonstrating one of the more common canonical forms, let us consider again the nth-order difference equation of (32), in reference to which we may define the following variables:

(3.47) ξ1(t) = x(t),
       ξ2(t) = ξ1(t − 1) = x(t − 1),
       ...
       ξn(t) = ξn−1(t − 1) = x(t − n + 1).

On the basis of these definitions, a first-order vector equation may be constructed in the form of

(3.48) ⎡ ξ1(t) ⎤   ⎡ −α1 · · · −αn−1 −αn ⎤ ⎡ ξ1(t−1) ⎤   ⎡ 1 ⎤
       ⎢ ξ2(t) ⎥   ⎢  1  · · ·   0    0  ⎥ ⎢ ξ2(t−1) ⎥   ⎢ 0 ⎥
       ⎢   ⋮   ⎥ = ⎢  ⋮   ⋱     ⋮    ⋮  ⎥ ⎢    ⋮    ⎥ + ⎢ ⋮ ⎥ ε(t).
       ⎣ ξn(t) ⎦   ⎣  0  · · ·   1    0  ⎦ ⎣ ξn(t−1) ⎦   ⎣ 0 ⎦


The matrix in this structure is sometimes described as the companion form. Here it is manifest, in view of the definitions under (47), that the leading equation of the system, which is

(3.49) ξ1(t) = −α1ξ1(t − 1) − · · · − αnξn(t − 1) + ε(t),

is precisely the equation under (32).
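The companion-form recursion is easily simulated. The following Python sketch, assuming numpy and using hypothetical third-order coefficients, constructs the matrix of (48) and propagates the state vector under a white-noise disturbance.

import numpy as np

alpha = np.array([0.5, -0.2, 0.1])      # hypothetical alpha_1, alpha_2, alpha_3
n = len(alpha)
A = np.zeros((n, n))
A[0, :] = -alpha                        # first row carries -alpha_1, ..., -alpha_n
A[1:, :-1] = np.eye(n - 1)              # the subdiagonal shifts the state downwards

rng = np.random.default_rng(2)
xi = np.zeros(n)
for _ in range(100):
    e = np.zeros(n); e[0] = rng.normal()   # the disturbance enters the first equation
    xi = A @ xi + e
print(xi)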

Example. An example of a system which is not in a canonical form is provided by the following matrix equation:

(3.50) ⎡y(t)⎤     ⎡cos ω  −sin ω⎤ ⎡y(t−1)⎤   ⎡υ(t)⎤
       ⎣z(t)⎦ = κ ⎣sin ω   cos ω⎦ ⎣z(t−1)⎦ + ⎣ζ(t)⎦.

With the use of the lag operator, the equation can also be written as

(3.51) ⎡1 − κ cos ωL    κ sin ωL  ⎤ ⎡y(t)⎤   ⎡υ(t)⎤
       ⎣ −κ sin ωL    1 − κ cos ωL⎦ ⎣z(t)⎦ = ⎣ζ(t)⎦.

On premultiplying the equation by the inverse of the matrix on the LHS, we get

(3.52) ⎡y(t)⎤                           ⎡1 − κ cos ωL   −κ sin ωL  ⎤ ⎡υ(t)⎤
       ⎣z(t)⎦ = (1 − 2κ cos ωL + κ²L²)⁻¹ ⎣  κ sin ωL    1 − κ cos ωL⎦ ⎣ζ(t)⎦.

A special case arises when

(3.53) ⎡υ(t)⎤   ⎡−sin ω⎤
       ⎣ζ(t)⎦ = ⎣ cos ω⎦ η(t),

where η(t) is a white-noise sequence. Then the equation becomes

(3.54) ⎡y(t)⎤                           ⎡−sin ω⎤
       ⎣z(t)⎦ = (1 − 2κ cos ωL + κ²L²)⁻¹ ⎣ cos ω⎦ η(t).

On defining ε(t) = −sin ω η(t), we may write the first of these equations as

(3.55) (1 − 2κ cos ωL + κ²L²)y(t) = ε(t).

This is just a second-order difference equation with a white-noise forcing function; and, by virtue of the inclusion of the damping factor κ ∈ [0, 1), it represents a generalisation of the equation to be found under (2.24).


Transfer Functions

Consider again the simple dynamic model of equation (5):

(3.56) y(t) = φy(t− 1) + x(t)β + ε(t).

With the use of the lag operator, this can be rewritten as

(3.57) (1− φL)y(t) = βx(t) + ε(t)

or, equivalently, as

(3.58) y(t) = {β/(1 − φL)} x(t) + {1/(1 − φL)} ε(t).

The latter is the so-called rational transfer-function form of the equation. The operator L within the transfer functions or filters can be replaced by a complex number z. Then the transfer function which is associated with the signal x(t) becomes

(3.59) β/(1 − φz) = β{1 + φz + φ²z² + · · ·},

where the RHS comes from a familiar power-series expansion.

The sequence {β, βφ, βφ², . . .} of the coefficients of the expansion constitutes the impulse response of the transfer function. That is to say, if we imagine that, on the input side, the signal is a unit-impulse sequence of the form

(3.60) x(t) = {. . . , 0, 1, 0, 0, . . .},

which has zero values at all but one instant, then its mapping through the transfer function would result in an output sequence of

(3.61) r(t) = {. . . , 0, β, βφ, βφ², . . .}.

Another important concept is the step response of the filter. We may imagine that the input sequence is zero-valued up to a point in time when it assumes a constant unit value:

(3.62) x(t) = {. . . , 0, 1, 1, 1, . . .}.

The mapping of this sequence through the transfer function would result in an output sequence of

(3.63) s(t) = {. . . , 0, β, β + βφ, β + βφ + βφ², . . .},


whose elements, from the point when the step occurs in x(t), are simply the partial sums of the impulse-response sequence. This sequence of partial sums {β, β + βφ, β + βφ + βφ², . . .} is described as the step response. Given that |φ| < 1, the step response converges to the value

(3.64) γ = β/(1 − φ),

which is described as the steady-state gain or the long-term multiplier of the transfer function.

These various concepts apply to models of any order. Consider the equation

(3.65) α(L)y(t) = β(L)x(t) + ε(t),

where

(3.66) α(L) = 1 + α1L + · · · + αpL^p = 1 − φ1L − · · · − φpL^p,
       β(L) = β0 + β1L + · · · + βkL^k

are polynomials of the lag operator. The transfer-function form of the model is simply

(3.67) y(t) = {β(L)/α(L)} x(t) + {1/α(L)} ε(t).

The rational function associated with x(t) has a series expansion

(3.68) β(z)/α(z) = ω(z) = {ω0 + ω1z + ω2z² + · · ·};

and the sequence of the coefficients of this expansion constitutes the impulse-response function. The partial sums of the coefficients constitute the step-response function. The gain of the transfer function is defined by

(3.69) γ = β(1)/α(1) = (β0 + β1 + · · · + βk)/(1 + α1 + · · · + αp).

The method of finding the coefficients of the series expansion of the transfer function in the general case can be illustrated by the second-order case:

(3.70) (β0 + β1z)/(1 − φ1z − φ2z²) = {ω0 + ω1z + ω2z² + · · ·}.


We rewrite this equation as

(3.71) β0 + β1z = {1 − φ1z − φ2z²}{ω0 + ω1z + ω2z² + · · ·}.

Then, by performing the multiplication on the RHS, and by equating the coefficients of the same powers of z on the two sides of the equation, we find that

(3.72) β0 = ω0,                          ω0 = β0,
       β1 = ω1 − φ1ω0,                   ω1 = β1 + φ1ω0,
       0 = ω2 − φ1ω1 − φ2ω0,             ω2 = φ1ω1 + φ2ω0,
        ⋮                                 ⋮
       0 = ωn − φ1ωn−1 − φ2ωn−2,         ωn = φ1ωn−1 + φ2ωn−2.

The necessary and sufficient condition for the convergence of the sequence {ωi} is that the roots of the primary polynomial equation 1 − φ1z − φ2z² = 0 should lie outside the unit circle or, equivalently, that the roots of the auxiliary equation z² − φ1z − φ2 = 0—which are the inverses of the former roots—should lie inside the unit circle. If the roots of these equations are real, then the sequence will converge monotonically to zero whereas, if the roots are complex-valued, then the sequence will converge in the manner of a damped sinusoid.

It is clear that the equation

(3.73) ω(n) = φ1ω(n− 1) + φ2ω(n− 2),

which serves to generate the elements of the impulse response, is nothing but a second-order homogeneous difference equation. In fact, Figure 2, which has been presented as the solution to a homogeneous difference equation, represents the impulse response of the transfer function (1 + 2L)/(1 − 1.69L + 0.81L²).
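The recursion of (72) provides a simple means of computing the impulse response. The following Python sketch uses the parameters β0 = 1, β1 = 2, φ1 = 1.69 and φ2 = −0.81, which correspond to the transfer function cited above.

def impulse_response(beta0, beta1, phi1, phi2, n):
    # omega_0 = beta_0, omega_1 = beta_1 + phi_1*omega_0, and thereafter
    # omega_n = phi_1*omega_{n-1} + phi_2*omega_{n-2}, as in (3.72)-(3.73).
    omega = [beta0, beta1 + phi1 * beta0]
    for _ in range(2, n):
        omega.append(phi1 * omega[-1] + phi2 * omega[-2])
    return omega

print(impulse_response(1.0, 2.0, 1.69, -0.81, 8))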

In the light of this result, it is apparent that the coefficients of the denominator polynomial 1 − φ1z − φ2z² serve to determine the period and the damping factor of a complex impulse response. The coefficients in the numerator polynomial β0 + β1z serve to determine the initial amplitude of the response and its phase lag. It seems that all four coefficients must be present if a second-order transfer function is to have complete flexibility in modelling a dynamic response.

The Frequency Response

In many applications within forecasting and time-series analysis, it is of interest to consider the response of a transfer function to a signal which is a simple sinusoid. As we have indicated in a previous lecture, it is possible


Figure 3. The gain of the transfer function (1 + 2L²)/(1 − 1.69L + 0.81L²).

Figure 4. The phase diagram of the transfer function (1 + 2L²)/(1 − 1.69L + 0.81L²).


to represent a finite sequence as a sum of sine and cosine functions whose frequencies are integer multiples of a fundamental frequency. More generally, it is possible, as we shall see later, to represent an arbitrary stationary stochastic process as a combination of an infinite number of sine and cosine functions whose frequencies range continuously in the interval [0, π]. It follows that the effect of a transfer function upon stationary signals can be characterised in terms of its effect upon the sinusoidal functions.

Consider, therefore, the consequences of mapping the signal x(t) = cos(ωt) through the transfer function γ(L) = γ0 + γ1L + · · · + γgL^g. The output is

(3.74) y(t) = γ(L) cos(ωt) = ∑_{j=0}^{g} γj cos(ω[t − j]).

The trigonometrical identity cos(A − B) = cos A cos B + sin A sin B enables us to write this as

(3.75) y(t) = {∑j γj cos(ωj)} cos(ωt) + {∑j γj sin(ωj)} sin(ωt)
            = α cos(ωt) + β sin(ωt) = ρ cos(ωt − θ).

Here we have defined

(3.76) α = ∑_{j=0}^{g} γj cos(ωj), β = ∑_{j=0}^{g} γj sin(ωj),
       ρ = √(α² + β²) and θ = tan⁻¹(β/α).

It can be seen from (75) that the effect of the filter upon the signal is twofold. First, there is a gain effect, whereby the amplitude of the sinusoid has been increased or diminished by a factor of ρ. Also, there is a phase effect, whereby the peak of the sinusoid is displaced by a time delay of θ/ω periods. Figures 3 and 4 represent the two effects of a simple rational transfer function on the set of sinusoids whose frequencies range from 0 to π.
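The gain and the phase effect can be computed by evaluating the transfer function at z = e^{−iω}. The following Python sketch, assuming numpy, evaluates the filter of Figures 3 and 4 over a grid of frequencies in [0, π].

import numpy as np

def frequency_response(beta, alpha, omega):
    # Evaluates beta(z)/alpha(z) at z = exp(-i*omega).
    z = np.exp(-1j * omega)
    num = sum(b * z**j for j, b in enumerate(beta))
    den = sum(a * z**j for j, a in enumerate(alpha))
    return num / den

omega = np.linspace(0, np.pi, 256)
H = frequency_response([1.0, 0.0, 2.0], [1.0, -1.69, 0.81], omega)
gain, phase = np.abs(H), np.angle(H)   # the effects shown in Figures 3 and 4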


LECTURE 4

Time-Series Analysis in the Frequency Domain

A sequence is a function mapping from a set of integers, described as the index set, onto the real line or into a subset thereof. A time series is a sequence whose index corresponds to consecutive dates separated by a unit time interval.

In the statistical analysis of time series, the elements of the sequence are regarded as a set of random variables. Usually, no notational distinction is made between these random variables and their realised values. It is important, nevertheless, to bear the distinction in mind.

In order to analyse a statistical time series, it must be assumed that the structure of the statistical or stochastic process which generates the observations is essentially invariant through time. The conventional assumptions are summarised in the condition of stationarity. In its strong form, the condition requires that any two segments of equal length which are extracted from the time series must have identical multivariate probability density functions. The condition of weak stationarity requires only that the elements of the time series should have a common finite expected value and that the autocovariance of two elements should depend only on their temporal separation.

A fundamental process, from which many other stationary processes may be derived, is the so-called white-noise process, which consists of a sequence of uncorrelated random variables, each with a zero mean and the same finite variance. By passing white noise through a linear filter, a sequence whose elements are serially correlated can be generated. In fact, virtually every stationary stochastic process may be depicted as the product of a filtering operation applied to white noise. This result follows from the Cramér–Wold Theorem, which will be presented after we have introduced the concepts underlying the spectral representation of a time series.

The spectral representation is rooted in the basic notion of Fourier analysis, which is that well-behaved functions can be approximated over a finite interval, to any degree of accuracy, by a weighted combination of sine and cosine functions whose harmonically rising frequencies are integral multiples of a fundamental frequency. Such linear combinations are described as Fourier sums or Fourier series. Of course, the notion applies to sequences as well; for any number of well-behaved functions may be interpolated through the coordinates of a finite sequence.

We shall approach the Fourier analysis of stochastic processes via the exact Fourier representation of a finite sequence. This is extended to provide a representation of an infinite sequence in terms of an infinity of trigonometrical functions whose frequencies range continuously in the interval [0, π]. The trigonometrical functions and their weighting functions are gathered under a Fourier–Stieltjes integral. It is remarkable that, whereas a Fourier sum serves only to define a strictly periodic function, a Fourier integral suffices to represent an aperiodic time series generated by a stationary stochastic process.

The Fourier integral is also used to represent the underlying stochastic process. This is achieved by describing the stochastic processes which generate the weighting functions. There are two such weighting processes, associated respectively with the sine and cosine functions; and their common variance, which is a function f(ω), ω ∈ [0, π], is the so-called spectral density function.

The relationship between the spectral density function and the sequence of autocovariances, which is summarised in the Wiener–Khintchine theorem, provides a link between the time-domain and the frequency-domain analyses. The sequence of autocovariances may be obtained from the Fourier transform of the spectral density function and the spectral density function is, conversely, a Fourier transform of the autocovariances.

Stationarity

Consider two vectors of n+ 1 consecutive elements from the process y(t):

(4.1) [yt, yt+1, . . . , yt+n] and [ys, ys+1, . . . , ys+n].

Then y(t) = {yt; t = 0, ±1, ±2, . . .} is strictly stationary if the joint probability density functions of the two vectors are the same for any values of t and s, regardless of the size of n. On the assumption that the first and second-order moments of the distribution are finite, the condition of stationarity implies that all the elements of y(t) have the same expected value and that the covariance between any pair of elements of the sequences is a function only of their temporal separation. Thus,

(4.2) E(yt) = µ and C(yt, ys) = γ|t−s|.

On their own, the conditions of (2) constitute the conditions of weak stationarity.

A normal process is completely characterised by its mean and its autocovariances. Therefore, a normal process y(t) which satisfies the conditions for weak stationarity is also stationary in the strict sense.


The Autocovariance Function

The covariance between two elements yt and ys of a process y(t) which are separated by τ = |t − s| intervals of time is known as the autocovariance at lag τ and is denoted by γτ. The autocorrelation at lag τ, denoted by ρτ, is defined by

(4.3) ρτ = γτ/γ0,

where γ0 is the variance of the process y(t).

The stationarity conditions imply that the autocovariances of y(t) satisfy

the equality

(4.4) γτ = γ−τ

for all values of τ.

The autocovariance matrix of a stationary process corresponding to the n elements y0, y1, . . . , yn−1 is given by

(4.5) Γ = ⎡ γ0    γ1    γ2   · · ·  γn−1 ⎤
          ⎢ γ1    γ0    γ1   · · ·  γn−2 ⎥
          ⎢ γ2    γ1    γ0   · · ·  γn−3 ⎥
          ⎢  ⋮     ⋮     ⋮     ⋱     ⋮   ⎥
          ⎣ γn−1  γn−2  γn−3 · · ·  γ0   ⎦.

The sequences {γτ} and {ρτ} are described as the autocovariance and autocorrelation functions respectively.

The Filtering of White Noise

A white-noise process is a sequence ε(t) of uncorrelated random variables with mean zero and common variance σε². Thus

(4.6) E(εt) = 0 for all t,
      E(εtεs) = σε², if t = s;
                0,   if t ≠ s.

By a process of linear filtering, a variety of time series may be constructed whose elements display complex interdependencies. A finite linear filter, also called a moving-average operator, is a polynomial in the lag operator of the form µ(L) = µ0 + µ1L + · · · + µqL^q. The effect of this filter on ε(t) is described by the equation

(4.7) y(t) = µ(L)ε(t)
           = µ0ε(t) + µ1ε(t − 1) + µ2ε(t − 2) + · · · + µqε(t − q)
           = ∑_{i=0}^{q} µi ε(t − i).


The operator µ(L) may also be described as the transfer function which maps the input sequence ε(t) into the output sequence y(t).

An operator µ(L) = {µ0 + µ1L + µ2L² + · · ·} with an indefinite number of terms in rising powers of L may also be considered. However, for this to be practical, the coefficients {µ0, µ1, µ2, . . .} must be functions of a limited number of fundamental parameters. In addition, it is required that

(4.8) ∑i |µi| < ∞.

Given the value of σε² = V{ε(t)}, the autocovariances of the filtered sequence y(t) = µ(L)ε(t) may be determined by evaluating the expression

(4.9) γτ = E(yt yt−τ)
         = E(∑i µi εt−i ∑j µj εt−τ−j)
         = ∑i ∑j µi µj E(εt−i εt−τ−j).

From equation (6), it follows that

(4.10) γτ = σε² ∑j µj µj+τ;

and so the variance of the filtered sequence is

(4.11) γ0 = σε² ∑j µj².

The condition under equation (8) guarantees that these quantities are finite, as is required by the condition of stationarity.
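The formulae of (10) and (11) can be implemented directly for a finite moving average. A minimal Python sketch, assuming numpy and hypothetical filter coefficients, is as follows.

import numpy as np

def ma_autocovariances(mu, sigma2):
    # gamma_tau = sigma2 * sum_j mu_j * mu_{j+tau}, for tau = 0, ..., q;
    # the autocovariances vanish beyond the order q of the filter.
    mu = np.asarray(mu, dtype=float)
    q = len(mu) - 1
    return np.array([sigma2 * mu[:len(mu)-tau] @ mu[tau:] for tau in range(q + 1)])

print(ma_autocovariances([1.0, 0.5, 0.25], 1.0))   # gamma_0, gamma_1, gamma_2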

The z-transform

In the subsequent analysis, it will prove helpful to present the results in the notation of the z-transform. The z-transform of the infinite sequence y(t) = {yt; t = 0, ±1, ±2, . . .} is defined by

(4.12) y(z) = ∑_{t=−∞}^{∞} yt zᵗ.

Here z is a complex number which may be placed on the perimeter of the unit circle, provided that the series converges. Thus z = e^{−iω} with ω ∈ [0, 2π].


If y(t) = µ0ε(t) + µ1ε(t − 1) + · · · + µqε(t − q) = µ(L)ε(t) is a moving-average process, then the z-transform of the sequence of moving-average coefficients is the polynomial µ(z) = µ0 + µ1z + · · · + µqz^q, which has the same form as the operator µ(L).

The z-transform of a sequence of autocovariances is called the autocovariance generating function. For the moving-average process, this is given by

(4.13) γ(z) = σε² µ(z)µ(z⁻¹)
            = σε² ∑i µi zⁱ ∑j µj z⁻ʲ
            = σε² ∑i ∑j µi µj z^{i−j}
            = ∑τ {σε² ∑j µj µj+τ} z^τ, with τ = i − j,
            = ∑_{τ=−∞}^{∞} γτ z^τ.

The final equality is by virtue of equation (10).

The Fourier Representation of a Sequence

According to the basic result of Fourier analysis, it is always possible to approximate an arbitrary analytic function defined over a finite interval of the real line, to any desired degree of accuracy, by a weighted sum of sine and cosine functions of harmonically increasing frequencies.

Similar results apply in the case of sequences, which may be regarded as functions mapping from the set of integers onto the real line. For a sample of T observations y0, . . . , yT−1, it is possible to devise an expression in the form

(4.14) yt = ∑_{j=0}^{n} {αj cos(ωjt) + βj sin(ωjt)},

wherein ωj = 2πj/T is a multiple of the fundamental frequency ω1 = 2π/T. Thus, the elements of a finite sequence can be expressed exactly in terms of sines and cosines. This expression is called the Fourier decomposition of yt, and the set of coefficients {αj, βj; j = 0, 1, . . . , n} are called the Fourier coefficients.

When T is even, we have n = T/2; and it follows that

(4.15) sin(ω0t) = sin(0) = 0,
       cos(ω0t) = cos(0) = 1,
       sin(ωnt) = sin(πt) = 0,
       cos(ωnt) = cos(πt) = (−1)ᵗ.


Therefore, equation (14) becomes

(4.16) yt = α0 + ∑_{j=1}^{n−1} {αj cos(ωjt) + βj sin(ωjt)} + αn(−1)ᵗ.

When T is odd, we have n = (T − 1)/2; and then equation (14) becomes

(4.17) yt = α0 + ∑_{j=1}^{n} {αj cos(ωjt) + βj sin(ωjt)}.

In both cases, there are T nonzero coefficients amongst the set {αj, βj; j = 0, 1, . . . , n}; and the mapping from the sample values to the coefficients constitutes a one-to-one invertible transformation.

In equation (16), the frequencies of the trigonometric functions range from ω1 = 2π/T to ωn = π; whereas, in equation (17), they range from ω1 = 2π/T to ωn = π(T − 1)/T. The frequency π is the so-called Nyquist frequency.

Although the process generating the data may contain components of frequencies higher than the Nyquist frequency, these will not be detected when it is sampled regularly at unit intervals of time. In fact, the effects on the process of components with frequencies in excess of the Nyquist value will be confounded with those whose frequencies fall below it.

To demonstrate this, consider the case where the process contains a component which is a pure cosine wave of unit amplitude and zero phase whose frequency ω lies in the interval π < ω < 2π. Let ω* = 2π − ω. Then

(4.18) cos(ωt) = cos{(2π − ω*)t}
             = cos(2πt) cos(ω*t) + sin(2πt) sin(ω*t)
             = cos(ω*t),

since t is an integer; which indicates that ω and ω* are observationally indistinguishable. Here, ω* < π is described as the alias of ω > π.

The Spectral Representation of a Stationary Process

By allowing the value of n in the expression (14) to tend to infinity, it is possible to express a sequence of indefinite length in terms of a sum of sine and cosine functions. However, in the limit as n → ∞, the coefficients αj, βj tend to vanish; and therefore an alternative representation in terms of differentials is called for.

By writing αj = dA(ωj), βj = dB(ωj), where A(ω), B(ω) are step functions with discontinuities at the points {ωj; j = 0, . . . , n}, the expression (14) can be rendered as

(4.19) yt = ∑j {cos(ωjt) dA(ωj) + sin(ωjt) dB(ωj)}.


Figure 1. The graph of 134 observations on the monthly purchase of clothing after a logarithmic transformation and the removal of a linear trend, together with the corresponding periodogram.


In the limit, as n → ∞, the summation is replaced by an integral to give the expression

(4.20) y(t) = ∫₀^π {cos(ωt) dA(ω) + sin(ωt) dB(ω)}.

Here, cos(ωt) and sin(ωt), and therefore y(t), may be regarded as infinite sequences defined over the entire set of positive and negative integers.

Since A(ω) and B(ω) are discontinuous functions for which no derivatives exist, one must avoid using α(ω)dω and β(ω)dω in place of dA(ω) and dB(ω). Moreover, the integral in equation (20) is a Fourier–Stieltjes integral.

In order to derive a statistical theory for the process that generates y(t), one must make some assumptions concerning the functions A(ω) and B(ω). So far, the sequence y(t) has been interpreted as a realisation of a stochastic process. If y(t) is regarded as the stochastic process itself, then the functions A(ω), B(ω) must, likewise, be regarded as stochastic processes defined over the interval [0, π]. A single realisation of these processes now corresponds to a single realisation of the process y(t).

The first assumption to be made is that the functions A(ω) and B(ω) represent a pair of stochastic processes of zero mean which are indexed on the continuous parameter ω. Thus

(4.21) E{dA(ω)} = E{dB(ω)} = 0.

The second and third assumptions are that the two processes are mutually uncorrelated and that non-overlapping increments within each process are uncorrelated. Thus

(4.22) E{dA(ω) dB(λ)} = 0 for all ω, λ,
       E{dA(ω) dA(λ)} = 0 if ω ≠ λ,
       E{dB(ω) dB(λ)} = 0 if ω ≠ λ.

The final assumption is that the variance of the increments is given by

(4.23) V{dA(ω)} = V{dB(ω)} = 2 dF(ω) = 2 f(ω) dω.

We can see that, unlike A(ω) and B(ω), F(ω) is a continuous differentiable function. The function F(ω) and its derivative f(ω) are the spectral distribution function and the spectral density function, respectively.

In order to express equation (20) in terms of complex exponentials, we may define a pair of conjugate complex stochastic processes:

(4.24) dZ(ω) = (1/2){dA(ω) − i dB(ω)},
       dZ*(ω) = (1/2){dA(ω) + i dB(ω)}.


Also, we may extend the domain of the functions A(ω), B(ω) from [0, π] to [−π, π] by regarding A(ω) as an even function, such that A(−ω) = A(ω), and by regarding B(ω) as an odd function, such that B(−ω) = −B(ω). Then we have

(4.25) dZ∗(ω) = dZ(−ω).

From the conditions under (22), it follows that

(4.26) E{dZ(ω) dZ*(λ)} = 0 if ω ≠ λ,
       E{dZ(ω) dZ*(ω)} = f(ω) dω.

These results may be used to reexpress equation (20) as

(4.27) y(t) = ∫₀^π [{(e^{iωt} + e^{−iωt})/2} dA(ω) − i{(e^{iωt} − e^{−iωt})/2} dB(ω)]
            = ∫₀^π [e^{iωt}{dA(ω) − i dB(ω)}/2 + e^{−iωt}{dA(ω) + i dB(ω)}/2]
            = ∫₀^π {e^{iωt} dZ(ω) + e^{−iωt} dZ*(ω)}.

When the integral is extended over the range [−π, π], this becomes

(4.28) y(t) = ∫_{−π}^{π} e^{iωt} dZ(ω).

This is commonly described as the spectral representation of the process y(t).

The Autocovariances and the Spectral Density Function

The sequence of the autocovariances of the process y(t) may be expressed in terms of the spectrum of the process. From equation (28), it follows that the autocovariance of y(t) at lag τ = t − k is given by

(4.29) γτ = C(yt, yk) = E{∫ω e^{iωt} dZ(ω) ∫λ e^{−iλk} dZ*(λ)}
          = ∫ω ∫λ e^{iωt} e^{−iλk} E{dZ(ω) dZ*(λ)}
          = ∫ω e^{iωτ} E{dZ(ω) dZ*(ω)}
          = ∫ω e^{iωτ} f(ω) dω.


Figure 2. The theoretical autocorrelation function of the ARMA(2, 2) process (1 − 1.344L + 0.902L²)y(t) = (1 − 1.691L + 0.810L²)ε(t) and (below) the corresponding spectral density function.


Here the final equalities are derived by using the results (25) and (26). This equation indicates that the Fourier transform of the spectrum is the autocovariance function.

The inverse mapping from the autocovariances to the spectrum is given by

(4.30) f(ω) = (1/2π) ∑_{τ=−∞}^{∞} γτ e^{−iωτ}
            = (1/2π){γ0 + 2∑_{τ=1}^{∞} γτ cos(ωτ)}.

This function is directly comparable to the periodogram of a data sequence which is defined under (2.41). However, the periodogram has T empirical autocovariances c0, . . . , cT−1 in place of an indefinite number of theoretical autocovariances. Also, it differs from the spectrum by a scalar factor of 4π. In many texts, equation (30) serves as the primary definition of the spectrum.

To demonstrate the relationship which exists between equations (29) and (30), we may substitute the latter into the former to give

(4.31) γτ = ∫_{−π}^{π} e^{iωτ} {(1/2π) ∑_{κ=−∞}^{∞} γκ e^{−iωκ}} dω
          = (1/2π) ∑_{κ=−∞}^{∞} γκ ∫_{−π}^{π} e^{iω(τ−κ)} dω.

From the fact that

(4.32) ∫_{−π}^{π} e^{iω(τ−κ)} dω = 2π, if κ = τ; 0, if κ ≠ τ,

it can be seen that the RHS of the equation reduces to γτ. This serves to show that equations (29) and (30) do indeed represent a Fourier transform and its inverse.

The essential interpretation of the spectral density function is indicated by the equation

(4.33) γ0 = ∫ω f(ω) dω,

which comes from setting τ = 0 in equation (29). This equation shows how the variance or 'power' of y(t), which is γ0, is attributed to the cyclical components of which the process is composed.


It is easy to see that a flat spectrum corresponds to the autocovariance function which characterises a white-noise process ε(t). Let fε = fε(ω) be the flat spectrum. Then, from equation (30), it follows that

(4.34) γ0 = ∫_{−π}^{π} fε(ω) dω = 2π fε,

and, from equation (29), it follows that

(4.35) γτ = ∫_{−π}^{π} fε(ω) e^{iωτ} dω = fε ∫_{−π}^{π} e^{iωτ} dω = 0 for τ ≠ 0.

These are the same as the conditions under (6) which have served to define a white-noise process. When the variance is denoted by σε², the expression for the spectrum of the white-noise process becomes

(4.36) fε(ω) = σε²/(2π).

Canonical Factorisation of the Spectral Density Function

Let y(t) be a stationary stochastic process whose spectrum is fy(ω). Since fy(ω) ≥ 0, it is always possible to find a complex function µ(ω) such that

(4.37) fy(ω) = (1/2π) µ(ω)µ*(ω).

For a wide class of stochastic processes, the function µ(ω) may be constructed in such a way that it can be expanded as a one-sided Fourier series:

(4.38) µ(ω) = Σ_{j=0}^{∞} µj e^{−iωj}.

On defining

(4.39) dZε(ω) = dZy(ω)/µ(ω),


the spectral representation of the process y(t) given in equation (28) may be rewritten as

(4.40) y(t) = ∫_ω e^{iωt} µ(ω) dZε(ω).

Expanding the expression of µ(ω) and interchanging the order of integration and summation gives

(4.41)
y(t) = ∫_ω e^{iωt} ( Σ_j µj e^{−iωj} ) dZε(ω)

     = Σ_j µj { ∫_ω e^{iω(t−j)} dZε(ω) }

     = Σ_j µj ε(t − j),

where we have defined

(4.42) ε(t) = ∫_ω e^{iωt} dZε(ω).

The spectrum of ε(t) is given by

(4.43)
E{dZε(ω)dZ*ε(ω)} = E{dZy(ω)dZ*y(ω)}/{µ(ω)µ*(ω)}

                 = fy(ω)dω/{µ(ω)µ*(ω)}

                 = (1/2π)dω.

Hence ε(t) is identified as a white-noise process with unit variance. Therefore equation (41) represents a moving-average process; and what our analysis implies is that virtually every stationary stochastic process can be represented in this way.

The Frequency-Domain Analysis of Filtering

It is a straightforward matter to derive the spectrum of a process y(t) = µ(L)x(t) which is formed by mapping the process x(t) through a linear filter.

Taking the spectral representation of the process x(t) to be

(4.44) x(t) = ∫_ω e^{iωt} dZx(ω),


we have

(4.45)
y(t) = Σ_j µj x(t − j)

     = Σ_j µj { ∫_ω e^{iω(t−j)} dZx(ω) }

     = ∫_ω e^{iωt} ( Σ_j µj e^{−iωj} ) dZx(ω).

On writing Σ µj e^{−iωj} = µ(ω), this becomes

(4.46)
y(t) = ∫_ω e^{iωt} µ(ω) dZx(ω)

     = ∫_ω e^{iωt} dZy(ω).

It follows that the spectral density function fy(ω) of the filtered process y(t) is given by

(4.47)
fy(ω)dω = E{dZy(ω)dZ*y(ω)}
        = µ(ω)µ*(ω) E{dZx(ω)dZ*x(ω)}
        = |µ(ω)|² fx(ω)dω.

In the case of the process defined in equation (7), where y(t) is obtained by filtering a white-noise sequence, the result is specialised to give

(4.48)
fy(ω) = |µ(ω)|² fε(ω)

      = (σ²ε/2π) |µ(ω)|².

Let µ(z) = Σ µj z^j denote the z-transform of the sequence {µj}. Then

(4.49)
|µ(z)|² = µ(z)µ(z⁻¹)

        = Σ_τ Σ_j µj µj+τ z^τ.

It follows that, when z = e^{−iω}, equation (48) can be written as

(4.50)
fy(ω) = (σ²ε/2π) µ(z)µ(z⁻¹)

      = (1/2π) Σ_τ { σ²ε Σ_j µj µj+τ } z^τ.


But, according to equation (10), γτ = σ²ε Σ_j µj µj+τ is the autocovariance of lag τ of the process y(t). Therefore, the function fy(ω) can be written as

(4.51)
fy(ω) = (1/2π) Σ_{τ=−∞}^{∞} e^{−iωτ} γτ

      = (1/2π) { γ0 + 2 Σ_{τ=1}^{∞} γτ cos(ωτ) },

which indicates that the spectral density function is the Fourier transform of the autocovariance function of the filtered sequence. This is known as the Wiener–Khintchine theorem. The importance of this theorem is that it provides a link between the time domain and the frequency domain.
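The equality of the two routes to fy(ω), through the gain of the filter in (48) and through the autocovariances in (51), is easy to confirm numerically. The following sketch (the MA(1) filter and the noise variance are arbitrary assumptions) computes the spectrum both ways:

import numpy as np

mu = np.array([1.0, -0.5])       # filter weights mu_0, mu_1 (an arbitrary example)
sigma2 = 2.0                     # variance of the white-noise input

omega = np.linspace(0.0, np.pi, 5)
z = np.exp(-1j * omega)
f_gain = sigma2 / (2.0*np.pi) * np.abs(mu[0] + mu[1]*z)**2   # route (48)

g0 = sigma2 * (mu[0]**2 + mu[1]**2)     # gamma_0, from equation (10)
g1 = sigma2 * mu[0] * mu[1]             # gamma_1
f_acv = (g0 + 2.0*g1*np.cos(omega)) / (2.0*np.pi)            # route (51)

print(np.allclose(f_gain, f_acv))       # True: the two routes agree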

The Gain and Phase

The complex-valued function µ(ω), which is entailed in the process of linear filtering, can be written as

(4.52) µ(ω) = |µ(ω)| e^{−iθ(ω)},

where

(4.53)
|µ(ω)|² = { Σ_{j=0}^{∞} µj cos(ωj) }² + { Σ_{j=0}^{∞} µj sin(ωj) }²,

θ(ω) = arctan{ Σ µj sin(ωj) / Σ µj cos(ωj) }.

The function |µ(ω)|, which is described as the gain of the filter, indicates the extent to which the amplitudes of the cyclical components of which x(t) is composed are altered in the process of filtering.

The function θ(ω), which is described as the phase displacement and which gives a measure in radians, indicates the extent to which the cyclical components are displaced along the time axis.

The substitution of expression (52) in equation (46) gives

(4.54) y(t) = ∫_{−π}^{π} e^{i{ωt−θ(ω)}} |µ(ω)| dZx(ω).

The importance of this equation is that it summarises the two effects of the filter.
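For a concrete illustration of (53), the sketch below (assuming, purely for the example, a two-point moving-average filter with weights 0.5 and 0.5) computes the gain and the phase displacement on a grid of frequencies; for this symmetric filter the phase works out as θ(ω) = ω/2, a displacement of half a sampling interval:

import numpy as np

mu = np.array([0.5, 0.5])                  # a two-point moving-average filter
omega = np.linspace(0.0, 3.0, 7)           # frequencies below pi

c = sum(m * np.cos(omega * j) for j, m in enumerate(mu))
s = sum(m * np.sin(omega * j) for j, m in enumerate(mu))
gain = np.sqrt(c**2 + s**2)                # |mu(omega)| of equation (53)
theta = np.arctan2(s, c)                   # phase displacement theta(omega)

print(np.allclose(theta, omega / 2.0))     # True
print(gain.round(3))                       # the gain falls away from 1 towards 0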


LECTURE 5

Linear Stochastic Models

Autocovariances of a Stationary Process

A temporal stochastic process is simply a sequence of random variables indexed by a time subscript. Such a process can be denoted by x(t). The element of the sequence at the point t = τ is xτ = x(τ).

Let {xτ+1, xτ+2, . . . , xτ+n} denote n consecutive elements of the sequence. Then the process is said to be strictly stationary if the joint probability distribution of the elements does not depend on τ, regardless of the size of n. This means that any two segments of the sequence of equal length have identical probability density functions. In consequence, the decision on where to place the time origin is arbitrary; and the argument τ can be omitted. Some further implications of stationarity are that

(5.1) E(xt) = µ < ∞ for all t and C(xτ+t, xτ+s) = γ|t−s|.

The latter condition means that the covariance of any two elements depends only on their temporal separation |t − s|. Notice that, if the elements of the sequence are normally distributed, then the two conditions are sufficient to establish strict stationarity. On their own, they constitute the conditions of weak or 2nd-order stationarity.

The condition on the covariances implies that the dispersion matrix of the vector [x1, x2, . . . , xn] is a bisymmetric Laurent matrix of the form

(5.2)
Γ = [ γ0    γ1    γ2    ...  γn−1 ]
    [ γ1    γ0    γ1    ...  γn−2 ]
    [ γ2    γ1    γ0    ...  γn−3 ]
    [ ...                         ]
    [ γn−1  γn−2  γn−3  ...  γ0   ],

wherein the generic element in the (i, j)th position is γ|i−j| = C(xi, xj). Given that a sequence of observations of a time series represents only a segment of a single realisation of a stochastic process, one might imagine that there is little chance of making valid inferences about the parameters of the process.


However, provided that the process x(t) is stationary and provided that the statistical dependencies between widely separated elements of the sequence are weak, it is possible to estimate consistently those parameters of the process which express the dependence of proximate elements of the sequence. If one is prepared to make sufficiently strong assumptions about the nature of the process, then a knowledge of such parameters may be all that is needed for a complete characterisation of the process.

Moving-Average Processes

The qth-order moving-average process, or MA(q) process, is defined by the equation

(5.3) y(t) = µ0ε(t) + µ1ε(t− 1) + · · ·+ µqε(t− q),

where ε(t) is a white-noise process consisting of a sequence of independently and identically distributed random variables with zero expectations. The equation is normalised either by setting µ0 = 1 or by setting V{ε(t)} = σ²ε = 1. The equation can be written in summary notation as y(t) = µ(L)ε(t), where µ(L) = µ0 + µ1L + · · · + µqL^q is a polynomial in the lag operator.

A moving-average process is clearly stationary, since any two elements yt and ys represent the same function of the vectors [εt, εt−1, . . . , εt−q] and [εs, εs−1, . . . , εs−q], which are identically distributed. In addition to the condition of stationarity, it is usually required that a moving-average process should be invertible, such that it can be expressed in the form of µ⁻¹(L)y(t) = ε(t), where the LHS embodies a convergent sum of past values of y(t). This is an infinite-order autoregressive representation of the process. The representation is available only if all the roots of the equation µ(z) = µ0 + µ1z + · · · + µqz^q = 0 lie outside the unit circle. This conclusion follows from our discussion of partial fractions.

As an example, let us consider the first-order moving-average process which is defined by

(5.4) y(t) = ε(t)− θε(t− 1) = (1− θL)ε(t).

Provided that |θ| < 1, this can be written in autoregressive form as

(5.5)
ε(t) = (1 − θL)⁻¹y(t)
     = { y(t) + θy(t − 1) + θ²y(t − 2) + · · · }.

Imagine that |θ| > 1 instead. Then, to obtain a convergent series, we have to write

(5.6)
y(t + 1) = ε(t + 1) − θε(t)
         = −θ(1 − L⁻¹/θ)ε(t),


where L⁻¹ε(t) = ε(t + 1). This gives

(5.7)
ε(t) = −θ⁻¹(1 − L⁻¹/θ)⁻¹y(t + 1)
     = −{ y(t + 1)/θ + y(t + 2)/θ² + y(t + 3)/θ³ + · · · }.

Normally, an expression such as this, which embodies future values of y(t), would have no reasonable meaning.

It is straightforward to generate the sequence of autocovariances from a knowledge of the parameters of the moving-average process and of the variance of the white-noise process. Consider

(5.8)
γτ = E(yt yt−τ)

   = E{ Σ_i µi εt−i Σ_j µj εt−τ−j }

   = Σ_i Σ_j µi µj E(εt−i εt−τ−j).

Since ε(t) is a sequence of independently and identically distributed random variables with zero expectations, it follows that

(5.9) E(εt−i εt−τ−j) = { 0, if i ≠ τ + j;  σ²ε, if i = τ + j. }

Therefore

(5.10) γτ = σ²ε Σ_j µj µj+τ.

Now let τ = 0, 1, . . . , q. This gives

(5.11)
γ0 = σ²ε(µ0² + µ1² + · · · + µq²),

γ1 = σ²ε(µ0µ1 + µ1µ2 + · · · + µq−1µq),

...

γq = σ²ε µ0µq.

Also, γτ = 0 for all τ > q.

The first-order moving-average process y(t) = ε(t) − θε(t − 1) has the following autocovariances:

(5.12)
γ0 = σ²ε(1 + θ²),

γ1 = −σ²ε θ,

γτ = 0 if τ > 1.


Thus, for a vector y = [y1, y2, . . . , yT]′ of T consecutive elements from a first-order moving-average process, the dispersion matrix is

(5.13)
D(y) = σ²ε [ 1+θ²  −θ     0     ...  0    ]
           [ −θ    1+θ²   −θ    ...  0    ]
           [ 0     −θ     1+θ²  ...  0    ]
           [ ...                          ]
           [ 0     0      0     ...  1+θ² ].

In general, the dispersion matrix of a qth-order moving-average process has q subdiagonal and q supradiagonal bands of nonzero elements and zero elements elsewhere.
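The banded structure can be exhibited numerically. The sketch below (with the arbitrary choices θ = 0.5 and σ²ε = 1) builds the theoretical dispersion matrix (13) of an MA(1) process and checks it against the sample covariances of many simulated vectors:

import numpy as np

theta, sigma2, T = 0.5, 1.0, 5

# theoretical dispersion matrix (13) of the MA(1) process
D = sigma2 * ((1 + theta**2) * np.eye(T)
              - theta * (np.eye(T, k=1) + np.eye(T, k=-1)))
print(D)

# empirical check: sample covariances from many simulated vectors
rng = np.random.default_rng(0)
e = rng.normal(size=(100000, T + 1))
y = e[:, 1:] - theta * e[:, :-1]            # y(t) = ε(t) − θε(t−1)
print(np.cov(y, rowvar=False).round(2))     # close to D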

It is also helpful to define an autocovariance generating function, which is a power series whose coefficients are the autocovariances γτ for successive values of τ. This is denoted by

(5.14) γ(z) = Σ_τ γτ z^τ; with τ = {0, ±1, ±2, . . .} and γτ = γ−τ.

The generating function is also called the z-transform of the autocovariance function.

The autocovariance generating function of the qth-order moving-average process can be found quite readily. Consider the convolution

(5.15)
µ(z)µ(z⁻¹) = Σ_i µi z^i Σ_j µj z^{−j}

           = Σ_i Σ_j µi µj z^{i−j}

           = Σ_τ ( Σ_j µj µj+τ ) z^τ,   τ = i − j.

By referring to the expression for the autocovariance of lag τ of a moving-average process given under (10), it can be seen that the autocovariance generating function is just

(5.16) γ(z) = σ²ε µ(z)µ(z⁻¹).
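Since the coefficients of µ(z)µ(z⁻¹) are obtained by convolving the sequence {µj} with its own reversal, the autocovariances of an MA(q) process can be generated in one line. A minimal sketch (the helper name is our own):

import numpy as np

def ma_autocovariances(mu, sigma2=1.0):
    # coefficients of gamma(z) = sigma2 * mu(z) mu(1/z), equation (16);
    # the convolution yields the coefficients of z^{-q}, ..., z^0, ..., z^q
    full = sigma2 * np.convolve(mu, mu[::-1])
    q = len(mu) - 1
    return full[q:]                     # gamma_0, gamma_1, ..., gamma_q

print(ma_autocovariances([1.0, -0.5])) # MA(1) with θ = 0.5: [1.25, -0.5], cf. (12)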

Autoregressive Processes

The pth-order autoregressive process, or AR(p) process, is defined by the equation

(5.17) α0y(t) + α1y(t− 1) + · · ·+ αpy(t− p) = ε(t).


This equation is invariably normalised by setting α0 = 1, although it would be possible to set σ²ε = 1 instead. The equation can be written in summary notation as α(L)y(t) = ε(t), where α(L) = α0 + α1L + · · · + αpL^p. For the process to be stationary, the roots of the equation α(z) = α0 + α1z + · · · + αpz^p = 0 must lie outside the unit circle. This condition enables us to write the autoregressive process as an infinite-order moving-average process in the form of y(t) = α⁻¹(L)ε(t).

As an example, let us consider the first-order autoregressive process which is defined by

(5.18)
ε(t) = y(t) − φy(t − 1)
     = (1 − φL)y(t).

Provided that the process is stationary with |φ| < 1, it can be represented in moving-average form as

(5.19)
y(t) = (1 − φL)⁻¹ε(t)
     = { ε(t) + φε(t − 1) + φ²ε(t − 2) + · · · }.

The autocovariances of the process can be found by using the formula of (10), which is applicable to moving-average processes of finite or infinite order. Thus

(5.20)
γτ = E(yt yt−τ)

   = E{ Σ_i φ^i εt−i Σ_j φ^j εt−τ−j }

   = Σ_i Σ_j φ^i φ^j E(εt−i εt−τ−j);

and the result under (9) indicates that

(5.21)
γτ = σ²ε Σ_j φ^j φ^{j+τ}

   = σ²ε φ^τ / (1 − φ²).

For a vector y = [y1, y2, . . . , yT]′ of T consecutive elements from a first-order autoregressive process, the dispersion matrix has the form

(5.22)
D(y) = σ²ε/(1 − φ²) [ 1         φ         φ²        ...  φ^{T−1} ]
                    [ φ         1         φ         ...  φ^{T−2} ]
                    [ φ²        φ         1         ...  φ^{T−3} ]
                    [ ...                                        ]
                    [ φ^{T−1}   φ^{T−2}   φ^{T−3}   ...  1       ].


To find the autocovariance generating function for the general pth-order autoregressive process, we may consider again the function α(z) = Σ_i αi z^i. Since an autoregressive process may be treated as an infinite-order moving-average process, it follows that

(5.23) γ(z) = σ²ε / { α(z)α(z⁻¹) }.

For an alternative way of finding the autocovariances of the pth-order process, consider multiplying Σ_i αi yt−i = εt by yt−τ and taking expectations to give

(5.24) Σ_i αi E(yt−i yt−τ) = E(εt yt−τ).

Taking account of the normalisation α0 = 1, we find that

(5.25) E(εt yt−τ) = { σ²ε, if τ = 0;  0, if τ > 0. }

Therefore, on setting E(yt−i yt−τ) = γτ−i, equation (24) gives

(5.26) Σ_i αi γτ−i = { σ²ε, if τ = 0;  0, if τ > 0. }

The second of these is a homogeneous difference equation which enables us to generate the sequence {γp, γp+1, . . .} once p starting values γ0, γ1, . . . , γp−1 are known. By letting τ = 0, 1, . . . , p in (26), we generate a set of p + 1 equations which can be arrayed in matrix form as follows:

(5.27)
[ γ0   γ1    γ2    ...  γp   ] [ 1  ]   [ σ²ε ]
[ γ1   γ0    γ1    ...  γp−1 ] [ α1 ]   [ 0   ]
[ γ2   γ1    γ0    ...  γp−2 ] [ α2 ] = [ 0   ]
[ ...                        ] [ .. ]   [ ... ]
[ γp   γp−1  γp−2  ...  γ0   ] [ αp ]   [ 0   ].

These are called the Yule–Walker equations, and they can be used either for generating the values γ0, γ1, . . . , γp from the values α1, . . . , αp, σ²ε or vice versa.
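Both uses of the Yule–Walker equations amount to solving a small linear system. The sketch below (our own arrangement, for an AR(2) with arbitrary stationary coefficients) recovers γ0, γ1, γ2 from α1, α2, σ²ε, and then recovers the parameters from the autocovariances:

import numpy as np

a1, a2, sigma2 = -1.0, 0.5, 1.0          # an arbitrary stationary AR(2)

# from parameters to autocovariances: the folded system of (28)
A = np.array([[1.0, a1,       a2 ],
              [a1,  1.0 + a2, 0.0],
              [a2,  a1,       1.0]])
g0, g1, g2 = np.linalg.solve(A, [sigma2, 0.0, 0.0])

# from autocovariances back to parameters: rows tau = 1, 2 of (26)
G = np.array([[g0, g1],
              [g1, g0]])
b1, b2 = np.linalg.solve(G, [-g1, -g2])  # recover alpha_1, alpha_2
s2 = g0 + b1*g1 + b2*g2                  # recover sigma^2 from the row tau = 0
print(round(g0, 3), round(g1, 3), round(g2, 3), round(b1, 3), round(b2, 3), round(s2, 3))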

For an example of the two uses of the Yule–Walker equations, let us consider the second-order autoregressive process. In that case, we have

(5.28)
[ γ0  γ1  γ2 ] [ α0 ]    [ α2  α1  α0  0   0  ] [ γ2 ]
[ γ1  γ0  γ1 ] [ α1 ] =  [ 0   α2  α1  α0  0  ] [ γ1 ]
[ γ2  γ1  γ0 ] [ α2 ]    [ 0   0   α2  α1  α0 ] [ γ0 ]
                                                [ γ1 ]
                                                [ γ2 ]

                         [ α0  α1     α2 ] [ γ0 ]    [ σ²ε ]
                      =  [ α1  α0+α2  0  ] [ γ1 ] =  [ 0   ]
                         [ α2  α1     α0 ] [ γ2 ]    [ 0   ].


Given α0 = 1 and the values for γ0, γ1, γ2, we can find σ²ε and α1, α2. Conversely, given α0, α1, α2 and σ²ε, we can find γ0, γ1, γ2. It is worth recalling at this juncture that the normalisation σ²ε = 1 might have been chosen instead of α0 = 1. This would have rendered the equations more easily intelligible. Notice also how the matrix following the first equality is folded across the axis which divides it vertically to give the matrix which follows the second equality. Pleasing effects of this sort often arise in time-series analysis.

The Partial Autocorrelation Function

Let αr(r) be the coefficient associated with y(t − r) in an autoregressive process of order r whose parameters correspond to the autocovariances γ0, γ1, . . . , γr. Then the sequence {αr(r); r = 1, 2, . . .} of such coefficients, whose index corresponds to models of increasing orders, constitutes the partial autocorrelation function. In effect, αr(r) indicates the role, in explaining the variance of y(t), which is due to y(t − r) when y(t − 1), . . . , y(t − r + 1) are also taken into account.

Much of the theoretical importance of the partial autocorrelation function is due to the fact that, when γ0 is added, it represents an alternative way of conveying the information which is present in the sequence of autocorrelations. Its role in identifying the order of an autoregressive process is evident; for, if αr(r) ≠ 0 and if αp(p) = 0 for all p > r, then it is clearly implied that the process has an order of r.

The sequence of partial autocorrelations may be computed efficiently via the recursive Durbin–Levinson algorithm, which uses the coefficients of the AR model of order r as the basis for calculating the coefficients of the model of order r + 1.

To derive the algorithm, let us imagine that we already have the values α0(r) = 1, α1(r), . . . , αr(r). Then, by extending the set of rth-order Yule–Walker equations to which these values correspond, we can derive the system

(5.29)
[ γ0    γ1    ...  γr    γr+1 ] [ 1     ]   [ σ²(r) ]
[ γ1    γ0    ...  γr−1  γr   ] [ α1(r) ]   [ 0     ]
[ ...                         ] [ ...   ] = [ ...   ]
[ γr    γr−1  ...  γ0    γ1   ] [ αr(r) ]   [ 0     ]
[ γr+1  γr    ...  γ1    γ0   ] [ 0     ]   [ g     ],

wherein

(5.30) g = Σ_{j=0}^{r} αj(r) γr+1−j with α0(r) = 1.


The system can also be written as

(5.31)
[ γ0    γ1    ...  γr    γr+1 ] [ 0     ]   [ g     ]
[ γ1    γ0    ...  γr−1  γr   ] [ αr(r) ]   [ 0     ]
[ ...                         ] [ ...   ] = [ ...   ]
[ γr    γr−1  ...  γ0    γ1   ] [ α1(r) ]   [ 0     ]
[ γr+1  γr    ...  γ1    γ0   ] [ 1     ]   [ σ²(r) ].

The two systems of equations (29) and (31) can be combined to give

(5.32)
[ γ0    γ1    ...  γr    γr+1 ] [ 1              ]   [ σ²(r) + cg ]
[ γ1    γ0    ...  γr−1  γr   ] [ α1(r) + cαr(r) ]   [ 0          ]
[ ...                         ] [ ...            ] = [ ...        ]
[ γr    γr−1  ...  γ0    γ1   ] [ αr(r) + cα1(r) ]   [ 0          ]
[ γr+1  γr    ...  γ1    γ0   ] [ c              ]   [ g + cσ²(r) ].

If we take the coefficient of the combination to be

(5.33) c = −g/σ²(r),

then the final element in the vector on the RHS becomes zero and the system becomes the set of Yule–Walker equations of order r + 1. The solution of the equations, from the last element αr+1(r+1) = c through to the variance term σ²(r+1), is given by

(5.34)
αr+1(r+1) = −(1/σ²(r)) { Σ_{j=0}^{r} αj(r) γr+1−j },

[ α1(r+1) ]   [ α1(r) ]               [ αr(r) ]
[ ...     ] = [ ...   ] + αr+1(r+1) × [ ...   ]
[ αr(r+1) ]   [ αr(r) ]               [ α1(r) ],

σ²(r+1) = σ²(r) { 1 − (αr+1(r+1))² }.

Thus the solution of the Yule–Walker system of order r + 1 is easily derived from the solution of the system of order r, and there is scope for devising a recursive procedure. The starting values for the recursion are

(5.35) α1(1) = −γ1/γ0 and σ²(1) = γ0 { 1 − (α1(1))² }.


Autoregressive Moving Average Processes

The autoregressive moving-average process of orders p and q, which is referred to as the ARMA(p, q) process, is defined by the equation

(5.36)
α0y(t) + α1y(t − 1) + · · · + αpy(t − p) = µ0ε(t) + µ1ε(t − 1) + · · · + µqε(t − q).

The equation is normalised by setting α0 = 1 and by setting either µ0 = 1 or σ²ε = 1. A more summary expression for the equation is α(L)y(t) = µ(L)ε(t). Provided that the roots of the equation α(z) = 0 lie outside the unit circle, the process can be represented by the equation y(t) = α⁻¹(L)µ(L)ε(t), which corresponds to an infinite-order moving-average process. Conversely, provided the roots of the equation µ(z) = 0 lie outside the unit circle, the process can be represented by the equation µ⁻¹(L)α(L)y(t) = ε(t), which corresponds to an infinite-order autoregressive process.

By considering the moving-average form of the process, and by noting the form of the autocovariance generating function for such a process which is given by equation (16), it can be seen that the autocovariance generating function for the autoregressive moving-average process is

(5.37) γ(z) = σ²ε µ(z)µ(z⁻¹) / { α(z)α(z⁻¹) }.

This generating function, which is of some theoretical interest, does not provide a practical means of finding the autocovariances. To find these, let us consider multiplying the equation Σ_i αi yt−i = Σ_i µi εt−i by yt−τ and taking expectations. This gives

(5.38) Σ_i αi γτ−i = Σ_i µi δi−τ,

where γτ−i = E(yt−τ yt−i) and δi−τ = E(yt−τ εt−i). Since εt−i is uncorrelated with yt−τ whenever it is subsequent to the latter, it follows that δi−τ = 0 if τ > i. Since the index i in the RHS of the equation (38) runs from 0 to q, it follows that

(5.39) Σ_i αi γτ−i = 0 if τ > q.

Given the q + 1 nonzero values δ0, δ1, . . . , δq, and p initial values γ0, γ1, . . . , γp−1 for the autocovariances, the equations can be solved recursively to obtain the subsequent values {γp, γp+1, . . .}.


To find the requisite values δ0, δ1, . . . , δq, consider multiplying the equation Σ_i αi yt−i = Σ_i µi εt−i by εt−τ and taking expectations. This gives

(5.40) Σ_i αi δτ−i = µτ σ²ε,

where δτ−i = E(yt−i εt−τ). The equation may be rewritten as

(5.41) δτ = (1/α0) ( µτ σ²ε − Σ_{i≥1} αi δτ−i ),

and, by setting τ = 0, 1, . . . , q, we can generate recursively the required values δ0, δ1, . . . , δq.

Example. Consider the ARMA(2, 2) model which gives the equation

(5.42) α0yt + α1yt−1 + α2yt−2 = µ0εt + µ1εt−1 + µ2εt−2.

Multiplying by yt, yt−1 and yt−2 and taking expectations gives

(5.43)
[ γ0  γ1  γ2 ] [ α0 ]   [ δ0  δ1  δ2 ] [ µ0 ]
[ γ1  γ0  γ1 ] [ α1 ] = [ 0   δ0  δ1 ] [ µ1 ]
[ γ2  γ1  γ0 ] [ α2 ]   [ 0   0   δ0 ] [ µ2 ].

Multiplying by εt, εt−1 and εt−2 and taking expectations gives

(5.44)
[ δ0  0   0  ] [ α0 ]   [ σ²ε  0    0   ] [ µ0 ]
[ δ1  δ0  0  ] [ α1 ] = [ 0    σ²ε  0   ] [ µ1 ]
[ δ2  δ1  δ0 ] [ α2 ]   [ 0    0    σ²ε ] [ µ2 ].

When the latter equations are written as

(5.45)
[ α0  0   0  ] [ δ0 ]        [ µ0 ]
[ α1  α0  0  ] [ δ1 ] = σ²ε  [ µ1 ]
[ α2  α1  α0 ] [ δ2 ]        [ µ2 ],

they can be solved recursively for δ0, δ1 and δ2 on the assumption that the values of α0, α1, α2 and σ²ε are known. Notice that, when we adopt the normalisation α0 = µ0 = 1, we get δ0 = σ²ε. When the equations (43) are rewritten as

(5.46)
[ α0  α1     α2 ] [ γ0 ]   [ µ0  µ1  µ2 ] [ δ0 ]
[ α1  α0+α2  0  ] [ γ1 ] = [ µ1  µ2  0  ] [ δ1 ]
[ α2  α1     α0 ] [ γ2 ]   [ µ2  0   0  ] [ δ2 ],

they can be solved for γ0, γ1 and γ2. Thus the starting values are obtained which enable the equation

(5.47) α0γτ + α1γτ−1 + α2γτ−2 = 0; τ > 2

to be solved recursively to generate the succeeding values {γ3, γ4, . . .} of the autocovariances.
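The whole procedure of equations (45)-(47) can be carried out in a few lines. The sketch below (our own arrangement, using the ARMA(2, 2) coefficients of Figure 2 and the normalisation α0 = µ0 = 1, σ²ε = 1) generates the first ten autocovariances:

import numpy as np

alpha = np.array([1.0, -1.344, 0.902])
mu = np.array([1.0, -1.691, 0.810])
sigma2 = 1.0

# solve (45) recursively for delta_0, delta_1, delta_2
delta = np.zeros(3)
for tau in range(3):
    delta[tau] = sigma2*mu[tau] - sum(alpha[i]*delta[tau-i] for i in range(1, tau+1))

# solve (46) for the starting values gamma_0, gamma_1, gamma_2
A = np.array([[alpha[0], alpha[1],            alpha[2]],
              [alpha[1], alpha[0] + alpha[2], 0.0     ],
              [alpha[2], alpha[1],            alpha[0]]])
M = np.array([[mu[0], mu[1], mu[2]],
              [mu[1], mu[2], 0.0  ],
              [mu[2], 0.0,   0.0  ]])
gamma = list(np.linalg.solve(A, M @ delta))

# extend the sequence with the homogeneous recursion (47)
for tau in range(3, 10):
    gamma.append(-alpha[1]*gamma[-1] - alpha[2]*gamma[-2])

print(np.round(np.array(gamma) / gamma[0], 3))  # autocorrelations, cf. Figure 2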


THE METHODS OF TIME-SERIES ANALYSIS

by

D.S.G. Pollock

Queen Mary and Westfield College,The University of London

The methods to be presented in this lecture are designed for the purpose of analysing series of statistical observations taken at regular intervals in time. The methods have a wide range of applications. We can cite astronomy [18], meteorology [9], seismology [21], oceanography [11], communications engineering and signal processing [16], the control of continuous process plants [20], neurology and electroencephalography [1], [25], and economics [10]; and this list is by no means complete.

1. The Frequency Domain and the Time Domain

The methods apply, in the main, to what are described as stationary or non-evolutionary time series. Such series manifest statistical properties which are invariant throughout time, so that the behaviour during one epoch is the same as it would be during any other.

When we speak of a weakly stationary or covariance-stationary process, we have in mind a sequence of random variables y(t) = {yt; t = 0, ±1, ±2, . . .}, representing the potential observations of the process, which have a common finite expected value E(yt) = µ and a set of autocovariances C(yt, ys) = E{(yt − µ)(ys − µ)} = γ|t−s| which depend only on the temporal separation τ = |t − s| of the dates t and s and not on their absolute values. We also commonly require of such a process that lim(τ → ∞) γτ = 0, which is to say that the correlation between increasingly remote elements of the sequence tends to zero. This is a way of expressing the notion that the events of the past have a diminishing effect upon the present as they recede in time. In an appendix to the paper, we review the definitions of mathematical expectations and covariances.

There are two distinct yet broadly equivalent modes of time-series analysis which may be pursued. On the one hand are the time-domain methods which have their origin in the classical theory of correlation. Such methods deal preponderantly with the autocovariance functions and the cross-covariance functions of the series, and they lead inevitably towards the construction of structural or parametric models of the autoregressive moving-average type for


single series and of the transfer-function type for two or more causally related series. Many of the methods which are used to estimate the parameters of these models can be viewed as sophisticated variants of the method of linear regression.

On the other hand are the frequency-domain methods of spectral analysis. These are based on an extension of the methods of Fourier analysis which originate in the idea that, over a finite interval, any analytic function can be approximated, to whatever degree of accuracy is desired, by taking a weighted sum of sine and cosine functions of harmonically increasing frequencies.

2. Harmonic Analysis

The astronomers are usually given credit for being the first to apply the methods of Fourier analysis to time series. Their endeavours could be described as the search for hidden periodicities within astronomical data. Typical examples were the attempts to uncover periodicities within the activities recorded by the Wolfer sunspot index and in the indices of luminosity of variable stars.

The relevant methods were developed over a long period of time. Lagrange [13] suggested methods for detecting hidden periodicities in 1772 and 1778. The Dutchman Buys-Ballot [6] propounded effective computational procedures for the statistical analysis of astronomical data in 1847. However, we should probably credit Sir Arthur Schuster [17], who in 1898 propounded the technique of periodogram analysis, with being the progenitor of the modern methods for analysing time series in the frequency domain.

In essence, these frequency-domain methods envisaged a model underlying the observations which takes the form of

(1)
y(t) = Σ_j ρj cos(ωjt − θj) + ε(t)

     = Σ_j { αj cos(ωjt) + βj sin(ωjt) } + ε(t),

where αj = ρj cos θj and βj = ρj sin θj, and where ε(t) is a sequence of independently and identically distributed random variables which we call a white-noise process. Thus the model depicts the series y(t) as a weighted sum of perfectly regular periodic components upon which is superimposed a random component.

The factor ρj = √(αj² + βj²) is called the amplitude of the jth periodic component, and it indicates the importance of that component within the sum. Since the variance of a cosine function, which is also called its mean-square deviation, is just one half, and since cosine functions at different frequencies are uncorrelated, it follows that the variance of y(t) is expressible as V{y(t)} = (1/2) Σ_j ρj² + σ²ε, where σ²ε = V{ε(t)} is the variance of the noise.

The periodogram is simply a device for determining how much of the variance of y(t) is attributable to any given harmonic component. Its value at


ωj = 2πj/T, calculated from a sample y0, . . . , yT−1 comprising T observations on y(t), is given by

(2)
I(ωj) = (2/T) [ { Σ_t yt cos(ωjt) }² + { Σ_t yt sin(ωjt) }² ]

      = (T/2) { a²(ωj) + b²(ωj) }.

If y(t) does indeed comprise only a finite number of well-defined harmonic components, then it can be shown that 2I(ωj)/T is a consistent estimator of ρj², in the sense that it converges to the latter in probability as the size T of the sample of the observations on y(t) increases.

Figure 1. The graph of a sine function.

Figure 2. Graph of a sine function with small random fluctuations superimposed.
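The following sketch (all the particulars, a sine of period 12 in a sample of T = 96 with added noise, are arbitrary choices for illustration) evaluates the periodogram of equation (2) and picks out the hidden periodicity:

import numpy as np

rng = np.random.default_rng(0)
T = 96
t = np.arange(T)
y = np.cos(2*np.pi*t/12 - 0.3) + rng.normal(scale=0.5, size=T)   # harmonic plus noise

def periodogram(y):
    # ordinates I(omega_j) of equation (2) at omega_j = 2 pi j / T
    T = len(y)
    tt = np.arange(T)
    x = y - y.mean()
    js = np.arange(1, T // 2)
    I = np.array([(2.0/T) * ((x*np.cos(2*np.pi*j/T*tt)).sum()**2
                             + (x*np.sin(2*np.pi*j/T*tt)).sum()**2) for j in js])
    return js, I

js, I = periodogram(y)
print(js[np.argmax(I)], round(2*I.max()/T, 2))   # j = 8 (period 96/8 = 12), approx rho^2 = 1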


The process by which the ordinates of the periodogram converge upon the squared values of the harmonic amplitudes was well expressed by Yule [24] in a seminal article of 1927:

If we take a curve representing a simple harmonic function of time, and superpose on the ordinates small random errors, the only effect is to make the graph somewhat irregular, leaving the suggestion of periodicity still clear to the eye. If the errors are increased in magnitude, the graph becomes more irregular, the suggestion of periodicity more obscure, and we have only sufficiently to increase the “errors” to mask completely any appearance of periodicity. But, however large the errors, periodogram analysis is applicable to such a curve, and, given a sufficient number of periods, should yield a close approximation to the period and amplitude of the underlying harmonic function.

We should not quote this passage without mentioning that Yule proceeded to question whether the hypothesis underlying periodogram analysis, which postulates the equation under (1), was an appropriate hypothesis for all cases.


Figure 3. Wolfer’s Sunspot Numbers 1749–1924.


A highly successful application of periodogram analysis was that of Whittaker and Robinson [22] who, in 1924, showed that the series recording the brightness or magnitude of the star T. Ursa Major over 600 days could be fitted almost exactly by the sum of two harmonic functions with periods of 24 and 29 days. This led to the suggestion that what was being observed was actually a two-star system wherein the larger star periodically masked the smaller, brighter star. Somewhat less successful were the attempts of Arthur Schuster himself [18] in 1906 to substantiate the claim that there is an eleven-year cycle in the activity recorded by the Wolfer sunspot index.

Other applications of the method of periodogram analysis were even less successful; and one application which was a significant failure was its use by William Beveridge [2], [3] in 1921 and 1922 to analyse a long series of European wheat prices. The periodogram of this data had so many peaks that at least twenty possible hidden periodicities could be picked out, and this seemed to be many more than could be accounted for by plausible explanations within the realm of economic history. Such experiences seemed to point to the inappropriateness to economic circumstances of a model containing perfectly regular cycles. A classic expression of disbelief was made by Slutsky [19] in another article of 1927:

Suppose we are inclined to believe in the reality of the strict periodicity of the business cycle, such, for example, as the eight-year period postulated by Moore [14]. Then we should encounter another difficulty. Wherein lies the source of this regularity? What is the mechanism of causality which, decade after decade, reproduces the same sinusoidal wave which rises and falls on the surface of the social ocean with the regularity of day and night?

3. Autoregressive and Moving-Average Models

The next major episode in the history of the development of time-series analysis took place in the time domain, and it began with the two articles of 1927 by Yule [24] and Slutsky [19] from which we have already quoted. In both articles, we find a rejection of the model with deterministic harmonic components in favour of models more firmly rooted in the notion of random causes. In a wonderfully figurative exposition, Yule invited his readers to imagine a pendulum attached to a recording device and left to swing. Then any deviations from perfectly harmonic motion which might be recorded must be the result of errors of observation which could be all but eliminated if a long sequence of observations were subjected to a periodogram analysis. Next, Yule enjoined the reader to imagine that the regular swing of the pendulum is interrupted by small boys who get into the room and start pelting the pendulum with peas, sometimes from one side and sometimes from the other. The motion is now affected not by superposed fluctuations but by true disturbances.


In this example, Yule contrives a perfect analogy for the autoregressive time-series model. To explain the analogy, let us begin by considering a homogeneous second-order difference equation of the form

(3) y(t) = φ1y(t− 1) + φ2y(t− 2).

Given the initial values y−1 and y−2, this equation can be used recursively to generate an ensuing sequence {y0, y1, . . .}. This sequence will show a regular pattern of behaviour whose nature depends on the parameters φ1 and φ2. If these parameters are such that the roots of the quadratic equation z² − φ1z − φ2 = 0 are complex and less than unity in modulus, then the sequence of values will show a damped sinusoidal behaviour, just as a clock pendulum will which is left to swing without the assistance of the falling weights. In fact, in such a case, the general solution to the difference equation will take the form of

(4) y(t) = αρ^t cos(ωt − θ),

where the modulus ρ, which has a value between 0 and 1, is now the damping factor which is responsible for the attenuation of the swing as the time t elapses.

The autoregressive model which Yule was proposing takes the form of

(5) y(t) = φ1y(t− 1) + φ2y(t− 2) + ε(t),

where ε(t) is, once more, a white-noise sequence. Now, instead of masking the regular periodicity of the pendulum, the white noise has actually become the engine which drives the pendulum by striking it randomly in one direction and another. Its haphazard influence has replaced the steady force of the falling weights. Nevertheless, the pendulum will still manifest a deceptively regular motion which is liable, if the sequence of observations is short and contains insufficient contrary evidence, to be misinterpreted as the effect of an underlying mechanism.

In his article of 1927, Yule attempted to explain the Wolfer index in terms of the second-order autoregressive model of equation (5). From the empirical autocovariances of the sample represented in Figure 3, he estimated the values φ1 = 1.343 and φ2 = −0.655. The general solution of the corresponding homogeneous difference equation has a damping factor of ρ = 0.809 and an angular velocity of ω = 33.96°. The angular velocity indicates a period of 10.6 years, which is a little shorter than the 11-year period obtained by Schuster in his periodogram analysis of the same data. In Figure 4, we show a series which has been generated artificially from Yule's equation, together with a series generated by the equation y(t) = 1.576y(t − 1) − 0.903y(t − 2) + ε(t). The homogeneous difference equation which corresponds to the latter has the same value of ω as before. Its damping factor has the value ρ = 0.95, and this increase accounts for the greater regularity of the second series.
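The damping factor and the angular velocity are obtained from the complex roots of the auxiliary equation. A minimal sketch (the function name is our own) reproduces Yule's figures:

import numpy as np

def damping_and_period(phi1, phi2):
    # modulus rho and angular velocity omega (in degrees) of the complex roots
    # of z^2 - phi1 z - phi2 = 0, for the damped cycle of equation (4)
    root = np.roots([1.0, -phi1, -phi2])[0]
    rho = abs(root)
    omega = np.degrees(abs(np.angle(root)))
    return rho, omega, 360.0 / omega          # the last value is the period

print(damping_and_period(1.343, -0.655))      # approx (0.809, 33.96, 10.6)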


Figure 4. A series generated by Yule's equation y(t) = 1.343y(t − 1) − 0.655y(t − 2) + ε(t).

Figure 5. A series generated by the equation y(t) = 1.576y(t − 1) − 0.903y(t − 2) + ε(t).

Neither of our two series accurately mimics the sunspot index; although the second series seems closer to it than the series generated by Yule's equation. An obvious feature of the sunspot index which is not shared by the artificial series is the fact that the numbers are constrained to be nonnegative. To relieve this constraint, we might apply to Wolf's numbers yt a transformation of the form log(yt + λ) or of the more general form {(yt + λ)^κ − 1}/κ, such as has been advocated by Box and Cox [4]. A transformed series could be more closely mimicked.

The contributions to time-series analysis made by Yule [24] and Slutsky [19] in 1927 were complementary: in fact, the two authors grasped opposite ends of the same pole. For ten years, Slutsky's paper was available only in its


original Russian version; but its contents became widely known within a much shorter period.

Slutsky posed the same question as did Yule, and in much the same manner. Was it possible, he asked, that a definite structure of a connection between chaotically random elements could form them into a system of more or less regular waves? Slutsky proceeded to demonstrate this possibility by methods which were partly analytic and partly inductive. He discriminated between coherent series whose elements were serially correlated and incoherent or purely random series of the sort which we have described as white noise. As to the coherent series, he declared that

their origin may be extremely varied, but it seems probable that an especially prominent role is played in nature by the process of moving summation with weights of one kind or another; by this process coherent series are obtained from other coherent series or from incoherent series.

By taking, as his basis, a purely random series obtained by the People's Commissariat of Finance in drawing the numbers of a government lottery loan, and by repeatedly taking moving summations, Slutsky was able to generate a series which closely mimicked an index, of a distinctly undulatory nature, of the English business cycle from 1855 to 1877.

The general form of Slutsky's moving summation can be expressed by writing

(6) y(t) = µ0ε(t) + µ1ε(t− 1) + · · ·+ µqε(t− q),

where ε(t) is a white-noise process. This is nowadays called a qth-order moving-average process, and it is readily compared to an autoregressive process of the sort depicted under (5). The more general pth-order autoregressive process can be expressed by writing

(7) α0y(t) + α1y(t− 1) + · · ·+ αpy(t− p) = ε(t).

Thus, whereas the autoregressive process depends upon a linear combination of the function y(t) with its own lagged values, the moving-average process depends upon a similar combination of the function ε(t) with its lagged values. The affinity of the two sorts of process is further confirmed when it is recognised that an autoregressive process of finite order is equivalent to a moving-average process of infinite order and that, conversely, a finite-order moving-average process is just an infinite-order autoregressive process.


4. Generalised Harmonic Analysis

The next step to be taken in the development of the theory of time series was to generalise the traditional method of periodogram analysis in such a way as to overcome the problems which arise when the model depicted under (1) is clearly inappropriate.

At first sight, it would not seem possible to describe a covariance-stationary process, whose only regularities are statistical ones, as a linear combination of perfectly regular periodic components. However, any difficulties which we might envisage can be overcome if we are prepared to accept a description which is in terms of a non-denumerable infinity of periodic components. Thus, on replacing the so-called Fourier sum within equation (1) by a Fourier integral, and by deleting the term ε(t), whose effect is now absorbed by the integrand, we obtain an expression in the form of

(8) y(t) = ∫_0^π { cos(ωt)dA(ω) + sin(ωt)dB(ω) }.

Here we write dA(ω) and dB(ω) rather than α(ω)dω and β(ω)dω because there can be no presumption that the functions A(ω) and B(ω) are continuous. As it stands, this expression is devoid of any statistical interpretation. Moreover, if we are talking of only a single realisation of the process y(t), then the generalised functions A(ω) and B(ω) will reflect the unique peculiarities of that realisation and will not be amenable to any systematic description.

However, a fruitful interpretation can be given to these functions if we consider the observable sequence y(t) = {yt; t = 0, ±1, ±2, . . .} to be a particular realisation which has been drawn from an infinite population representing all possible realisations of the process. For, if this population is subject to statistical regularities, then it is reasonable to regard dA(ω) and dB(ω) as mutually uncorrelated random variables with well-defined distributions which depend upon the parameters of the population.

We may therefore assume that, for any value of ω,

(9) E{dA(ω)} = E{dB(ω)} = 0 and E{dA(ω)dB(ω)} = 0.

Moreover, to express the discontinuous nature of the generalised functions, we assume that, for any two distinct values ω and λ in their domain, we have

(10) E{dA(ω)dA(λ)} = E{dB(ω)dB(λ)} = 0,

which means that A(ω) and B(ω) are stochastic processes, indexed on the frequency parameter ω rather than on time, which are uncorrelated in non-overlapping intervals. Finally, we assume that dA(ω) and dB(ω) have a common variance, so that

(11) V{dA(ω)} = V{dB(ω)} = dG(ω).


Figure 6. The spectrum of the process y(t) = 1.343y(t − 1) − 0.655y(t − 2) + ε(t) which generated the series in Figure 4. A series of a more regular nature would be generated if the spectrum were more narrowly concentrated around its modal value.

Given the assumption of the mutual uncorrelatedness of dA(ω) and dB(ω), it therefore follows from (8) that the variance of y(t) is expressible as

(12)
V{y(t)} = ∫_0^π [ cos²(ωt)V{dA(ω)} + sin²(ωt)V{dB(ω)} ]

        = ∫_0^π dG(ω).

The function G(ω), which is called the spectral distribution, tells us how much of the variance is attributable to the periodic components whose frequencies range continuously from 0 to ω. If none of these components contributes more than an infinitesimal amount to the total variance, then the function G(ω) is absolutely continuous, and we can write dG(ω) = g(ω)dω under the integral of equation (12). The new function g(ω), which is called the spectral density function or the spectrum, is directly analogous to the function expressing the squared amplitude which is associated with each component in the simple harmonic model discussed in our earlier sections.


5. Smoothing the Periodogram

It might be imagined that there is little hope of obtaining worthwhile estimates of the parameters of the population from which the single available realisation y(t) has been drawn. However, provided that y(t) is a stationary process, and provided that the statistical dependencies between widely separated elements are weak, the single realisation contains all the information which is necessary for the estimation of the spectral density function. In fact, a modified version of the traditional periodogram analysis is sufficient for the purpose of estimating the spectral density.

In some respects, the problems posed by the estimation of the spectral density are similar to those posed by the estimation of a continuous probability density function of unknown functional form. It is fruitless to attempt directly to estimate the ordinates of such a function. Instead, we might set about our task by constructing a histogram or bar chart to show the relative frequencies with which the observations that have been drawn from the distribution fall within broad intervals. Then, by passing a curve through the mid points of the tops of the bars, we could construct an envelope that might approximate to the sought-after density function. A more sophisticated estimation procedure would not group the observations into the fixed intervals of a histogram; instead it would record the number of observations falling within a moving interval. Moreover, a consistent method of estimation, which aims at converging upon the true function as the number of observations increases, would vary the width of the moving interval with the size of the sample, diminishing it sufficiently slowly as the sample size increases for the number of sample points falling within any interval to increase without bound.

A common method for estimating the spectral density is very similar to the one which we have described for estimating a probability density function. Instead of basing itself on raw sample observations, as does the method of density-function estimation, it bases itself upon the ordinates of a periodogram which has been fitted to the observations on y(t). This procedure for spectral estimation is therefore called smoothing the periodogram.
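A minimal sketch of the procedure (the AR(2) test series, the use of a uniform moving window and its half-width m are all arbitrary choices made for illustration):

import numpy as np

rng = np.random.default_rng(0)
T = 1024
e = rng.normal(size=T)
y = np.zeros(T)
for t in range(2, T):                        # an AR(2) series with a spectral peak
    y[t] = 1.343*y[t-1] - 0.655*y[t-2] + e[t]

x = y - y.mean()
I = (2.0/T) * np.abs(np.fft.rfft(x))**2      # the periodogram of equation (2), via the FFT

m = 8                                        # half-width of the moving window
kernel = np.ones(2*m + 1) / (2*m + 1)
f_hat = np.convolve(I, kernel, mode='same')  # the smoothed-periodogram estimate
print(np.argmax(f_hat))                      # the index of the estimated spectral peak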

A disadvantage of the procedure, which for many years inhibited its widespread use, lies in the fact that calculating the periodogram by what would seem to be the obvious methods can be vastly time-consuming. Indeed, it was not until the mid 1960's that wholly practical computational methods were developed.

6. The Equivalence of the Two Domains

It is remarkable that such a simple technique as smoothing the periodogram should provide a theoretical resolution to the problems encountered by Beveridge and others in their attempts to detect the hidden periodicities in economic and astronomical data. Even more remarkable is the way in which


the generalised harmonic analysis that gave rise to the concept of the spectral density of a time series should prove itself to be wholly conformable with the alternative methods of time-series analysis in the time domain which arose largely as a consequence of the failure of the traditional methods of periodogram analysis.

The synthesis of the two branches of time-series analysis was achieved independently and almost simultaneously in the early 1930's by Norbert Wiener [23] in America and A. Khintchine [12] in Russia. The Wiener–Khintchine theorem indicates that there is a one-to-one relationship between the autocovariance function of a stationary process and its spectral density function. The relationship is expressed, in one direction, by writing

(13) g(ω) = (1/π) Σ_{τ=−∞}^{∞} γτ cos(ωτ); γτ = γ−τ,

where g(ω) is the spectral density function and {γτ; τ = 0, 1, 2, . . .} is the sequence of the autocovariances of the series y(t).

The relationship is invertible in the sense that it is equally possible to express each of the autocovariances as a function of the spectral density:

(14) γτ = ∫_0^π cos(ωτ) g(ω) dω.

If we set τ = 0, then cos(ωτ) = 1, and we obtain, once more, the equation (12) which neatly expresses the way in which the variance γ0 = V{y(t)} of the series y(t) is attributable to the constituent harmonic components; for g(ω) is simply the expected value of the squared amplitude of the component at frequency ω.

We have stated the relationships of the Wiener–Khintchine theorem in terms of the theoretical spectral density function g(ω) and the true autocovariance function {γτ; τ = 0, 1, 2, . . .}. An analogous relationship holds between the periodogram I(ωj) defined in (2) and the sample autocovariance function {cτ; τ = 0, 1, . . . , T − 1}, where cτ = Σ(yt − ȳ)(yt−τ − ȳ)/T. Thus, in the appendix, we demonstrate the identity

(15) I(ωj) = 2 Σ_{τ=1−T}^{T−1} cτ cos(ωjτ); cτ = c−τ.

The upshot of the Wiener–Khintchine theorem is that many of the techniques of time-series analysis can, in theory, be expressed in two mathematically equivalent ways which may differ markedly in their conceptual qualities.

Often, a problem which appears to be intractable from the point of view of one of the domains of time-series analysis becomes quite manageable when


translated into the other domain. A good example is provided by the matter of spectral estimation. Given that there are difficulties in computing all T of the ordinates of the periodogram when the sample size is large, we are impelled to look for a method of spectral estimation which depends not upon smoothing the periodogram but upon performing some equivalent operation upon the sequence of autocovariances. The fact that there is a one-to-one correspondence between the spectrum and the sequence of autocovariances assures us that this equivalent operation must exist; though there is, of course, no guarantee that it will be easy to perform.

Figure 7. The periodogram of Wolfer's Sunspot Numbers 1749–1924.

In fact, the operation which we perform upon the sample autocovariances is simple. For, if the sequence of autocovariances {cτ; τ = 0, 1, . . . , T − 1} in (15) is replaced by a modified sequence {wτcτ; τ = 0, 1, . . . , T − 1} incorporating a specially devised set of declining weights {wτ; τ = 0, 1, . . . , T − 1}, then an effect which is much the same as that of smoothing the periodogram can be achieved. Moreover, it may be relatively straightforward to calculate the weighted autocovariance function.
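The following sketch illustrates the weighted-autocovariance estimate. The lag window is Parzen's [15]; the truncation point M, the test series, and the scaling g(ω) = (1/π){w0c0 + 2Σ wτcτ cos(ωτ)}, which matches equation (13), are our own choices made for illustration:

import numpy as np

def parzen(u):
    # Parzen's lag window on 0 <= u <= 1
    u = np.abs(u)
    return np.where(u <= 0.5, 1 - 6*u**2 + 6*u**3,
                    np.where(u <= 1.0, 2*(1 - u)**3, 0.0))

def spectrum_estimate(y, M, omegas):
    # weighted-autocovariance (Blackman-Tukey) estimate of g(omega)
    T = len(y)
    x = y - y.mean()
    c = np.array([(x[tau:] * x[:T - tau]).sum() / T for tau in range(M + 1)])
    w = parzen(np.arange(M + 1) / M)
    taus = np.arange(1, M + 1)
    return np.array([(w[0]*c[0] + 2.0*np.sum(w[1:]*c[1:]*np.cos(om*taus))) / np.pi
                     for om in omegas])

rng = np.random.default_rng(0)
T = 1024
e = rng.normal(size=T)
y = np.zeros(T)
for t in range(2, T):
    y[t] = 1.343*y[t-1] - 0.655*y[t-2] + e[t]

print(spectrum_estimate(y, M=48, omegas=np.linspace(0, np.pi, 9)).round(2))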

The task of devising appropriate sets of weights provided a major research topic in time-series analysis in the 1950's and early 1960's. Together with the task of devising equivalent procedures for smoothing the periodogram, it came to be known as spectral carpentry.


Figure 8. The spectrum of the sunspot numbers calculated from the autocovariances using Parzen's [15] system of weights.

7. The Maturing of Time-Series Analysis

In retrospect, it seems that time-series analysis reached its maturity in the 1970's when significant developments occurred in both of its domains.

A major development in the frequency domain occurred when Cooley and Tukey [7] described an algorithm which greatly reduces the effort involved in computing the periodogram. The Fast Fourier Transform, as this algorithm has come to be known, allied with advances in computer technology, has enabled the routine analysis of extensive sets of data; and it has transformed the procedure of smoothing the periodogram into a practical method of spectral estimation.

The contemporaneous developments in the time domain were influenced by an important book by Box and Jenkins [5]. These authors developed the time-domain methodology by collating some of its major themes and by applying it to such important functions as forecasting and control. They demonstrated how wide had become the scope of time-series analysis by applying it to problems as diverse as the forecasting of airline passenger numbers and the analysis of combustion processes in a gas furnace. They also adapted the methodology to the computer.

Many of the current practitioners of time-series analysis have learnt their skills in recent years, during a time when the subject has been expanding rapidly. Lacking a longer perspective, it is difficult for them to gauge the significance of the recent practical advances. One might be surprised to hear, for example,


that as late as 1971 Granger and Hughes [8] were capable of declaring that Beveridge's calculation of the periodogram of the wheat price index, comprising 300 ordinates, was the most extensive calculation of its type to date. Nowadays, computations of this order are performed on a routine basis using microcomputers containing specially designed chips which are dedicated to the purpose.

The rapidity of the recent developments also belies the fact that time-series analysis has had a long history. The frequency domain of time-series analysis, to which the idea of the harmonic decomposition of a function is central, is an inheritance from Euler (1707–1783), d'Alembert (1717–1783), Lagrange (1736–1813) and Fourier (1768–1830). The search for hidden periodicities was a dominant theme of 19th century science. It has been transmogrified through the refinements of Wiener's Generalised Harmonic Analysis, which has enabled us to understand how cyclical phenomena can arise out of the aggregation of random causes. The parts of time-series analysis which bear a truly 20th-century stamp are the time-domain models which originate with Slutsky and Yule and the computational technology which renders the methods of both domains practical.

The effect of the revolution in digital electronic computing upon the practicability of time-series analysis can be gauged by inspecting the purely mechanical devices (such as the Henrici–Conradi and Michelson–Stratton harmonic analysers invented in the 1890's) which were once used, with very limited success, to grapple with problems which are nowadays almost routine. These devices, some of which are displayed in London's Science Museum, also serve to remind us that many of the developments of applied mathematics which startle us with their modernity were foreshadowed many years ago.

Mathematical Appendix

Mathematical Expectations

The mathematical expectation or the expected value of a random variable x is defined by

(i) E(x) = ∫_{−∞}^{∞} x dF(x),

where F(x) is the probability distribution function of x. The probability distribution function is defined by the expression F(x*) = P{x < x*}, which denotes the probability that x assumes a value less than x*. If F(x) is a continuous function, then we can write dF(x) = f(x)dx in equation (i). The function f(x) = dF(x)/dx is called the probability density function.

If y(t) = {yt; t = 0, ±1, ±2, . . .} is a stationary stochastic process, then E(yt) = µ is the same value for all t.


If y0, . . . , yT−1 is a sample of T values generated by the process, then we may estimate µ from the sample mean

(ii) ȳ = (1/T) Σ_{t=0}^{T−1} yt.

Autocovariances

The autocovariance of lag τ of a stationary stochastic process y(t) is defined by

(iii) γτ = E{(yt − µ)(yt−τ − µ)}.

The autocovariance of lag τ provides a measure of the relatedness of the elements of the sequence y(t) which are separated by τ time periods.

The variance, which is denoted by V{y(t)} = γ0 and defined by

(iv) γ0 = E{(yt − µ)²},

is a measure of the dispersion of the elements of y(t). It is formally the autocovariance of lag zero.

If yt and yt−τ are statistically independent, then their joint probability density function is the product of their individual probability density functions, so that f(yt, yt−τ) = f(yt)f(yt−τ). It follows that

(v) γτ = E(yt − µ)E(yt−τ − µ) = 0 for all τ ≠ 0.

If y0, . . . , yT−1 is a sample from the process, and if τ < T, then we may estimate γτ from the sample autocovariance or empirical autocovariance of lag τ:

(vi) cτ = (1/T) Σ_{t=τ}^{T−1} (yt − ȳ)(yt−τ − ȳ).

The periodogram and the autocovariance function

The periodogram is defined by

(vii) I(ωj) = (2/T) [ { Σ_{t=0}^{T−1} cos(ωjt)(yt − ȳ) }² + { Σ_{t=0}^{T−1} sin(ωjt)(yt − ȳ) }² ].


The identity Σ_t cos(ωjt)(yt − ȳ) = Σ_t cos(ωjt)yt follows from the fact that, by construction, Σ_t cos(ωjt) = 0 for all j. Hence the above expression has the same value as the expression in (2). Expanding the expression in (vii) gives

(viii)
I(ωj) = (2/T) { Σ_t Σ_s cos(ωjt) cos(ωjs)(yt − ȳ)(ys − ȳ) }
      + (2/T) { Σ_t Σ_s sin(ωjt) sin(ωjs)(yt − ȳ)(ys − ȳ) },

and, by using the identity cos(A) cos(B) + sin(A) sin(B) = cos(A − B), we can rewrite this as

(ix) I(ωj) = (2/T) { Σ_t Σ_s cos(ωj[t − s])(yt − ȳ)(ys − ȳ) }.

Next, on defining τ = t − s and writing cτ = Σ_t (yt − ȳ)(yt−τ − ȳ)/T, we can reduce the latter expression to

(x) I(ωj) = 2 Σ_{τ=1−T}^{T−1} cos(ωjτ) cτ,

which appears in the text as equation (15).

References

[1] Alberts, W. W., L. E. Wright and B. Feinstein (1965), “Physiological Mechanisms of Tremor and Rigidity in Parkinsonism.” Confinia Neurologica, 26, 318–327.

[2] Beveridge, Sir W. H. (1921), “Weather and Harvest Cycles.” Economic Journal, 31, 429–452.

[3] Beveridge, Sir W. H. (1922), “Wheat Prices and Rainfall in Western Europe.” Journal of the Royal Statistical Society, 85, 412–478.

[4] Box, G. E. P. and D. R. Cox (1964), “An Analysis of Transformations.” Journal of the Royal Statistical Society, Series B, 26, 211–243.

[5] Box, G. E. P. and G. M. Jenkins (1970), Time Series Analysis, Forecasting and Control. Holden–Day: San Francisco.

[6] Buys-Ballot, C. D. H. (1847), “Les Changements Periodiques de Temperature.” Utrecht.

[7] Cooley, J. W. and J. W. Tukey (1965), “An Algorithm for the Machine Calculation of Complex Fourier Series.” Mathematics of Computation, 19, 297–301.

[8] Granger, C. W. J. and A. O. Hughes (1971), “A New Look at Some Old Data: The Beveridge Wheat Price Series.” Journal of the Royal Statistical Society, Series A, 134, 413–428.

[9] Groves, G. W. and E. J. Hannan (1968), “Time-Series Regression of Sea Level on Weather.” Review of Geophysics, 6, 129–174.

[10] Gudmundson, G. (1971), “Time-Series Analysis of Imports, Exports and other Economic Variables.” Journal of the Royal Statistical Society, Series A, 134, 383.

[11] Hasselmann, K., W. Munk and G. MacDonald (1963), “Bispectrum of Ocean Waves.” In Time Series Analysis, M. Rosenblatt (ed.), 125–139. John Wiley and Sons: New York.

[12] Khintchine, A. (1934), “Korrelationstheorie der Stationaren Stochastischen Prozesse.” Math. Ann., 109, 604–615.

[13] Lagrange, E. (1772, 1778), “Oeuvres.”

[14] Moore, H. L. (1914), Economic Cycles: Their Laws and Cause. Macmillan: New York.

[15] Parzen, E. (1957), “On Consistent Estimates of the Spectrum of a Stationary Time Series.” Annals of Mathematical Statistics, 28, 329–348.

[16] Rice, S. O. (1963), “Noise in FM Receivers.” In Time Series Analysis, M. Rosenblatt (ed.), 395–422. John Wiley and Sons: New York.

[17] Schuster, Sir A. (1898), “On the Investigation of Hidden Periodicities with Application to a Supposed Twenty-Six Day Period of Meteorological Phenomena.” Terrestrial Magnetism, 3, 13–41.

[18] Schuster, Sir A. (1906), “On the Periodicities of Sunspots.” Philosophical Transactions of the Royal Society, Series A, 206, 69–100.

[19] Slutsky, E. (1937), “The Summation of Random Causes as the Source of Cyclical Processes.” Econometrica, 5, 105–146.

[20] Tee, L. H. and S. U. Wu (1972), “An Application of Stochastic and Dynamic Models for the Control of a Papermaking Process.” Technometrics, 14, 481–496.

[21] Tukey, J. W. (1965), “Data Analysis and the Frontiers of Geophysics.” Science, 148, 1283–1289.

[22] Whittaker, E. T. and G. Robinson (1924), The Calculus of Observations, A Treatise on Numerical Mathematics. Blackie and Sons: London.

[23] Wiener, N. (1930), “Generalised Harmonic Analysis.” Acta Mathematica, 35, 117–258.

[24] Yule, G. U. (1927), “On a Method of Investigating Periodicities in Disturbed Series with Special Reference to Wolfer's Sunspot Numbers.” Philosophical Transactions of the Royal Society, 89, 1–64.

[25] Yuzuriha, T. (1960), “The Autocorrelation Curves of Schizophrenic Brain Waves and the Power Spectrum.” Psych. Neurol. Jap., 26, 911–924.

18

Page 97: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

THE METHODS OF TIME-SERIES ANALYSIS

by

D.S.G. Pollock

Queen Mary and Westfield College,The University of London

This paper describes some of the principal themes of time-series analysis

and it gives an historical account of their development.

There are two distinct yet broadly equivalent modes of time-series anal-

ysis which may be pursued. On the one hand there are the time-domain

methods which have their origin in the classical theory of correlation; and

they lead inevitably towards the construction of structural or parametric

models of the autoregressive moving-average type. On the other hand are

the frequency-domain methods of spectral analysis which are based on an

extension of the methods of Fourier analysis.

The paper describes the developments which led to the synthesis of

the two branches of time-series analysis and it indicates how this synthesis

was achieved.

It remains true that the majority of time-series analysts operate prin-

cipally in one or other of the two domains. Such specialisation is often

influenced by the academic discipline to which the analyst adheres. How-

ever, it is clear that there are many advantages to be derived from pursuing

the two modes of analysis concurrently.

Address for correspondence:

D.S.G. PollockDepartment of EconomicsQueen Mary CollegeUniversity of LondonMile End RoadLondon E1 4 NS

Tel : +44-71-975-5096Fax : +44-71-975-5500

19

Page 98: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

LECTURE 7

Forecastingwith ARMA Models

Minimum Mean-Square Error Prediction

Imagine that y(t) is a stationary stochastic process with E{y(t)} = 0.We may be interested in predicting values of this process several periods intothe future on the basis of its observed history. This history is contained inthe so-called information set. In practice, the latter is always a finite set{yt, yt−1, . . . , yt−p} representing the recent past. Nevertheless, in developingthe theory of prediction, it is also useful to consider an infinite information setIt = {yt, yt−1, . . . , yt−p, . . .} representing the entire past.

We shall denote the prediction of yt+h which is made at the time t byyt+h|t or by yt+h when it is clear that we are predicting h steps ahead.

The criterion which is commonly used in judging the performance of anestimator or predictor y of a random variable y is its mean-square error definedby E{(y − y)2}. If all of the available information on y is summarised in itsmarginal distribution, then the minimum-mean-square-error prediction is sim-ply the expected value E(y). However, if y is statistically related to anotherrandom variable x whose value can be observed, and if the form of the jointdistribution of x and y is known, then the minimum-mean-square-error predic-tion of y is the conditional expectation E(y|x). This proposition may be statedformally:

(1) Let y = y(x) be the conditional expectation of y given x which isalso expressed as y = E(y|x). Then E{(y − y)2} ≤ E{(y − π)2},where π = π(x) is any other function of x.

Proof. Consider

(2)E{

(y − π)2}

= E[{

(y − y) + (y − π)}2]

= E{

(y − y)2}

+ 2E{

(y − y)(y − π)}

+ E{

(y − π)2}

1

Page 99: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : A SHORT COURSE OF TIME-SERIES ANALYSIS

Within the second term, there is

(3)

E{

(y − y)(y − π)}

=

∫x

∫y

(y − y)(y − π)f(x, y)∂y∂x

=

∫x

{∫y

(y − y)f(y|x)∂y

}(y − π)f(x)∂x

= 0.

Here the second equality depends upon the factorisation f(x, y) = f(y|x)f(x)which expresses the joint probability density function of x and y as the productof the conditional density function of y given x and the marginal density func-tion of x. The final equality depends upon the fact that

∫(y − y)f(y|x)∂y =

E(y|x) − E(y|x) = 0. Therefore E{(y − π)2} = E{(y − y)2} + E{(y − π)2} ≥E{(y − y)2}, and the assertion is proved.

The definition of the conditional expectation implies that

(4)

E(xy) =

∫x

∫y

xyf(x, y)∂y∂x

=

∫x

x

{∫y

yf(y|x)∂y

}f(x)∂x

= E(xy).

When the equation E(xy) = E(xy) is rewritten as

(5) E{x(y − y)

}= 0,

it may be described as an orthogonality condition. This condition indicatesthat the prediction error y − y is uncorrelated with x. The result is intuitivelyappealing; for, if the error were correlated with x, we should not using theinformation of x efficiently in forming y.

The proposition of (1) is readily generalised to accommodate the casewhere, in place of the scalar x, there is a vector x = [x1, . . . , xp]

′. This gen-eralisation indicates that the minimum-mean-square-error prediction of yt+hgiven the information in {yt, yt−1, . . . , yt−p} is the conditional expectationE(yt+h|yt, yt−1, . . . , yt−p).

In order to determine the conditional expectation of yt+h given {yt, yt−1,. . . , yt−p}, we need to known the functional form of the joint probability den-sity function all of these variables. In lieu of precise knowledge, we are oftenprepared to assume that the distribution is normal. In that case, it follows thatthe conditional expectation of yt+h is a linear function of {yt, yt−1, . . . , yt−p};and so the problem of predicting yt+h becomes a matter of forming a linear

2

Page 100: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : FORECASTING

regression. Even if we are not prepared to assume that the joint distributionof the variables in normal, we may be prepared, nevertheless, to base the pre-diction of y upon a linear function of {yt, yt−1, . . . , yt−p}. In that case, thecriterion of minimum-mean-square-error linear prediction is satisfied by form-ing yt+h = φ1yy + φ2yt−1 + · · ·+ φp+1yt−p from the values φ1, . . . , φp+1 whichminimise

(6)

E{

(yt+h − yt+h)2}

= E

{(yt+h −

p+1∑j=1

φjyt−j+1

)2}

= γ0 − 2∑j

φjγh+j−1 +∑i

∑j

φiφjγi−j ,

wherein γi−j = E(εt−iεt−j). This is a linear least-squares regression problemwhich leads to a set of p+ 1 orthogonality conditions described as the normalequations:

(7)E{

(yt+h − yt+h)yt−j+1

}= γh+j−1 −

p∑i=1

φiγi−j

= 0 ; j = 1, . . . , p+ 1.

In matrix terms, these are

(8)

γ0 γ1 . . . γpγ1 γ0 . . . γp−1

......

. . ....

γp γp−1 . . . γ0

φ1

φ2...

φp+1

=

γhγh+1

...γh+p

.Notice that, for the one-step-ahead prediction of yt+1, they are nothing but theYule–Walker equations.

In the case of an optimal predictor which combines previous values of theseries, it follows from the orthogonality principle that the forecast errors areuncorrelated with the previous predictions.

A result of this sort is familiar to economists in connection with the so-called efficient-markets hypothesis. A financial market is efficient if the prices ofthe traded assets constitute optimal forecasts of their discounted future returns,which consist of interest and dividend payments and of capital gains.

According to the hypothesis, the changes in asset prices will be uncorre-lated with the past or present price levels; which is to say that asset prices willfollow random walks. Moreover, it should not be possible for someone who isappraised only of the past history of asset prices to reap speculative profits ona systematic and regular basis.

3

Page 101: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : A SHORT COURSE OF TIME-SERIES ANALYSIS

Forecasting with ARMA Models

So far, we have avoided making specific assumptions about the nature ofthe process y(t). We are greatly assisted in the business of developing practicalforecasting procedures if we can assume that y(t) is generated by an ARMAprocess such that

(9) y(t) =µ(L)

α(L)ε(t) = ψ(L)ε(t).

We shall continue to assume, for the sake of simplicity, that the forecastsare based on the information contained in the infinite set {yt, yt−1, yt−2, . . .} =It comprising all values that have been taken by the variable up to the presenttime t. Knowing the parameters in ψ(L) enables us to recover the sequence{εt, εt−1, εt−2, . . .} from the sequence {yt, yt−1, yt−2, . . .} and vice versa; so ei-ther of these constitute the information set. This equivalence implies that theforecasts may be expressed in terms {yt} or in terms {εt} or as a combinationof the elements of both sets.

Let us write the realisations of equation (9) as

(10)yt+h = {ψ0εt+h + ψ1εt+h−1 + · · ·+ ψh−1εt+1}

+ {ψhεt + ψh+1εt−1 + · · ·}.

Here the first term on the RHS embodies disturbances subsequent to the timet when the forecast is made, and the second term embodies disturbances whichare within the information set {εt, εt−1, εt−2, . . .}. Let us now define a forecast-ing function, based on the information set, which takes the form of

(11) yt+h|t = {ρhεt + ρh+1εt−1 + · · ·}.

Then, given that ε(t) is a white-noise process, it follows that the mean squareof the error in the forecast h periods ahead is given by

(12) E{

(yt+h − yt+h)2}

= σ2ε

h−1∑i=0

ψ2i + σ2

ε

∞∑i=h

(ψi − ρi)2.

Clearly, the mean-square error is minimised by setting ρi = ψi; and so theoptimal forecast is given by

(13) yt+h|t = {ψhεt + ψh+1εt−1 + · · ·}.

This might have been derived from the equation y(t + h) = ψ(L)ε(t + h),which generates the true value of yt+h, simply by putting zeros in place of theunobserved disturbances εt+1, εt+2, . . . , εt+h which lie in the future when the

4

Page 102: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : FORECASTING

forecast is made. Notice that, on the assumption that the process is stationary,the mean-square error of the forecast tends to the value of

(14) V{y(t)

}= σ2

ε

∑ψ2i

as the lead time h of the forecast increases. This is nothing but the variance ofthe process y(t).

The optimal forecast of (5) may also be derived by specifying that theforecast error should be uncorrelated with the disturbances up to the time ofmaking the forecast. For, if the forecast errors were correlated with some ofthe elements of the information set, then, as we have noted before, we wouldnot be using the information efficiently, and we could not be generating opti-mal forecasts. To demonstrate this result anew, let us consider the covariancebetween the forecast error and the disturbance εt−i:

(15)

E{

(yt+h − yt+h)εt−i}

=h∑k=1

ψh−kE(εt+kεt−i)

+∞∑j=0

(ψh+j − ρh+j)E(εt−jεt−i)

= σ2ε(ψh+i − ρh+i).

Here the final equality follows from the fact that

(16) E(εt−jεt−i) =

{σ2ε , if i = j,

0, if i 6= j.

If the covariance in (15) is to be equal to zero for all values of i ≥ 0, then wemust have ρi = ψi for all i, which means that the forecasting function must bethe one that has been specified already under (13).

It is helpful, sometimes, to have a functional notation for describing theprocess which generates the h-steps-ahead forecast. The notation provided byWhittle (1963) is widely used. To derive this, let us begin by writing

(17) y(t+ h) ={L−hψ(L)

}ε(t).

On the LHS, there are not only the lagged sequences {ε(t), ε(t−1), . . .} but alsothe sequences ε(t+ h) = L−hε(t), . . . , ε(t+ 1) = L−1ε(t), which are associatedwith negative powers of L which serve to shift a sequence forwards in time. Let{L−hψ(L)}+ be defined as the part of the operator containing only nonnegativepowers of L. Then the forecasting function can be expressed as

(18)

y(t+ h|t) ={L−hψ(L)

}+ε(t),

=

{ψ(L)

Lh

}+

1

ψ(L)y(t).

5

Page 103: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : A SHORT COURSE OF TIME-SERIES ANALYSIS

Example. Consider an ARMA (1, 1) process represented by the equation

(19) (1− φL)y(t) = (1− θL)ε(t).

The function which generates the sequence of forecasts h steps ahead is givenby

(20)

y(t+ h|t) =

{L−h

[1 +

(φ− θ)L1− φL

]}+

ε(t)

= φh−1 (φ− θ)1− φL ε(t)

= φh−1 (φ− θ)1− θL y(t).

When θ = 0, this gives the simple result that y(t+ h|t) = φhy(t).

Generating The Forecasts Recursively

We have already seen that the optimal (minimum-mean-square-error) fore-cast of yt+h can be regarded as the conditional expectation of yt+h given theinformation set It which comprises the values of {εt, εt−1, εt−2, . . .} or equallythe values of {yt, yt−1, yt−2, . . .}. On taking expectations of y(t) and ε(t) con-ditional on It, we find that

(21)

E(yt+k|It) = yt+k|t if k > 0,

E(yt−j |It) = yt−j if j ≥ 0,

E(εt+k|It) = 0 if k > 0,

E(εt−j |It) = εt−j if j ≥ 0.

In this notation, the forecast h periods ahead is

(22)

E(yt+h|It) =

h∑k=1

ψh−kE(εt+k|It) +

∞∑j=0

ψh+jE(εt−j |It)

=∞∑j=0

ψh+jεt−j .

In practice, the forecasts may be generated using a recursion based on theequation

(23)y(t) = −

{α1y(t− 1) + α2y(t− 2) + · · ·+ αpy(t− p)

}+ µ0ε(t) + µ1ε(t− 1) + · · ·+ µqε(t− q).

6

Page 104: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : FORECASTING

By taking the conditional expectation of this function, we get

(24)yt+h = −{α1yt+h−1 + · · ·+ αpyt+h−p}

+ µhεt + · · ·+ µqεt+h−q when 0 < h ≤ p, q,

(25) yt+h = −{α1yt+h−1 + · · ·+ αpyt+h−p} if q < h ≤ p,

(26)yt+h = −{α1yt+h−1 + · · ·+ αpyt+h−p}

+ µhεt + · · ·+ µqεt+h−q if p < h ≤ q,

and

(27) yt+h = −{α1yt+h−1 + · · ·+ αpyt+h−p} when p, q < h.

It can be from (27) that, for h > p, q, the forecasting function becomes apth-order homogeneous difference equation in y. The p values of y(t) fromt = r = max(p, q) to t = r− p+ 1 serve as the starting values for the equation.

The behaviour of the forecast function beyond the reach of the startingvalues can be characterised in terms of the roots of the autoregressive operator.It may be assumed that none of the roots of α(L) = 0 lie inside the unit circle;for, if there were roots inside the circle, then the process would be radicallyunstable. If all of the roots are less than unity, then yt+h will converge tozero as h increases. If one of the roots of α(L) = 0 is unity, then we have anARIMA(p, 1, q) model; and the general solution of the homogeneous equationof (27) will include a constant term which represents the product of the unitroot with an coefficient which is determined by the starting values. Hence theforecast will tend to a nonzero constant. If two of the roots are unity, thenthe general solution will embody a linear time trend which is the asymptote towhich the forecasts will tend. In general, if d of the roots are unity, then thegeneral solution will comprise a polynomial in t of order d− 1.

The forecasts can be updated easily once the coefficients in the expansionof ψ(L) = µ(L)/α(L) have been obtained. Consider

(28)yt+h|t+1 = {ψh−1εt+1 + ψhεt + ψh+1εt−1 + · · ·} and

yt+h|t = {ψhεt + ψh+1εt−1 + ψh+2εt−2 + · · ·}.

The first of these is the forecast for h − 1 periods ahead made at time t + 1whilst the second is the forecast for h periods ahead made at time t. It can beseen that

(29) yt+h|t+1 = yt+h|t + ψh−1εt+1,

7

Page 105: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : A SHORT COURSE OF TIME-SERIES ANALYSIS

where εt+1 = yt+1 − yt+1 is the current disturbance at time t+ 1. The later isalso the prediction error of the one-step-ahead forecast made at time t.

Example. For an example of the analytic form of the forecast function, wemay consider the Integrated Autoregressive (IAR) Process defined by

(30){

1− (1 + φ)L+ φL2}y(t) = ε(t),

wherein φ ∈ (0, 1). The roots of the auxiliary equation z2 − (1 + φ)z + φ = 0are z = 1 and z = φ. The solution of the homogeneous difference equation

(31){

1− (1 + φ)L+ φL2}y(t+ h|t) = 0,

which defines the forecast function, is

(32) y(t+ h|t) = c1 + c2φh,

where c1 and c2 are constants which reflect the initial conditions. These con-stants are found by solving the equations

(33)yt−1 = c1 + c2φ

−1,

yt = c1 + c2.

The solutions are

(34) c1 =yt − φyt−1

1− φ and c2 =φ

φ− 1(yt − yt−1).

The long-term forecast is y = c1 which is the asymptote to which the forecaststend as the lead period h increases.

Ad-hoc Methods of Forecasting

There are some time-honoured methods of forecasting which, when anal-ysed carefully, reveal themselves to be the methods which are appropriate tosome simple ARIMA models which might be suggested by a priori reason-ing. Two of the leading examples are provided by the method of exponentialsmoothing and the Holt–Winters trend-extrapolation method.

Exponential Smoothing. A common forecasting procedure is exponentialsmoothing. This depends upon taking a weighted average of past values of thetime series with the weights following a geometrically declining pattern. Thefunction generating the one-step-ahead forecasts can be written as

(35)y(t+ 1|t) =

(1− θ)1− θL y(t)

= (1− θ){y(t) + θy(t− 1) + θ2y(t− 2) + · · ·

}.

8

Page 106: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : FORECASTING

On multiplying both sides of this equation by 1− θL and rearranging, we get

(36) y(t+ 1|t) = θy(t|t− 1) + (1− θ)y(t),

which shows that the current forecast for one step ahead is a convex combina-tion of the previous forecast and the value which actually transpired.

The method of exponential smoothing corresponds to the optimal fore-casting procedure for the ARIMA(0, 1, 1) model (1 − L)y(t) = (1 − θL)ε(t),which is better described as an IMA(1, 1) model. To see this, let us considerthe ARMA(1, 1) model y(t)− φy(t− 1) = ε(t)− θε(t− 1). This gives

(37)

y(t+ 1|t) = φy(t)− θε(t)

= φy(t)− θ (1− φL)

1− θL y(t)

={(1− θL)φ− (1− φL)θ}

1− θL y(t)

=(φ− θ)1− θL y(t).

On setting φ = 1, which converts the ARMA(1, 1) model to an IMA(1, 1) model,we obtain precisely the forecasting function of (35).

The Holt–Winters Method. The Holt–Winters algorithm is useful in ex-trapolating local linear trends. The prediction h periods ahead of a seriesy(t) = {yt, t = 0,±1,±2, . . .} which is made at time t is given by

(38) yt+h|t = αt + βth,

where

(39)αt = λyt + (1− λ)(αt−1 + βt−1)

= λyt + (1− λ)yt|t−1

is the estimate of an intercept or levels parameter formed at time t and

(40) βt = µ(αt − αt−1) + (1− µ)βt−1

is the estimate of the slope parameter, likewise formed at time t. The coeffi-cients λ, µ ∈ (0, 1] are the smoothing parameters.

The algorithm may also be expressed in error-correction form. Let

(41) et = yt − yt|t−1 = yt − αt−1 − βt−1

9

Page 107: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : A SHORT COURSE OF TIME-SERIES ANALYSIS

be the error at time t arising from the prediction of yt on the basis of informationavailable at time t− 1. Then the formula for the levels parameter can be givenas

(42)αt = λet + yt|t−1

= λet + αt−1 + βt−1,

which, on rearranging, becomes

(43) αt − αt−1 = λet + βt−1.

When the latter is drafted into equation (40), we get an analogous expressionfor the slope parameter:

(44)βt = µ(λet + βt−1) + (1− µ)βt−1

= λµet + βt−1.

In order reveal the underlying nature of this method, it is helpful to com-bine the two equations (42) and (44) in a simple state-space model:

(45)

[α(t)

β(t)

]=

[1 10 1

] [α(t− 1)

β(t− 1)

]+

[λλµ

]e(t).

This can be rearranged to give

(46)

[1− L −L

0 1− L

] [α(t)

β(t)

]=

[λλµ

]e(t).

The solution of the latter is

(47)

[α(t)

β(t)

]=

1

(1− L)2

[1− L L

0 1− L

] [λλµ

]e(t).

Therefore, from (38), it follows that

(48)

y(t+ 1|t) = α(t) + β(t)

=(λ+ λµ)e(t) + λe(t− 1)

(1− L)2.

This can be recognised as the forecasting function of an IMA(2, 2) model ofthe form

(49) (I − L)2y(t) = µ0ε(t) + µ1ε(t− 1) + µ2ε(t− 2)

10

Page 108: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : FORECASTING

for which

(50) y(t+ 1|t) =µ1ε(t) + µ2ε(t− 1)

(1− L)2.

The Local Trend Model. There are various arguments which suggest thatan IMA(2, 2) model might be a natural model to adopt. The simplest of thesearguments arises from an elaboration of a second-order random walk whichadds an ordinary white-noise disturbance to the tend. The resulting modelmay be expressed in two equations

(51)(I − L)2ξ(t) = ν(t),

y(t) = ξ(t) + η(t),

where ν(t) and η(t) are mutually independent white-noise processes. Combiningthe equations, and using the notation ∇ = 1− L, gives

(52)

y(t) =ν(t)

∇2+ η(t)

=ν(t) +∇2η(t)

∇2.

Here the numerator ν(t)+∇2η(t) = {ν(t)+η(t)}−2η(t−1)+η(t−2) constitutesan second-order MA process.

Slightly more elaborate models with the same outcome have also beenproposed. Thus the so-called structural model consists of the equations

(53)

y(t) = µ(t) + ε(t),

µ(t) = µ(t− 1) + β(t− 1) + η(t),

β(t) = β(t− 1) + ζ(t).

Working backwards from the final equation gives

(54)

β(t) =ζ(t)

∇ ,

µ(t) =β(t− 1)

∇ +η(t)

∇=ζ(t− 1)

∇2+η(t)

∇ ,

y(t) =ζ(t− 1)

∇2+η(t)

∇ + ε(t)

=ζ(t− 1) +∇η(t) +∇2ε(t)

∇2.

11

Page 109: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : A SHORT COURSE OF TIME-SERIES ANALYSIS

Once more, the numerator constitutes a second-order MA process.

Equivalent Forecasting Functions

Consider a model which combines a global linear trend with an autoregres-sive disturbance process:

(55) y(t) = γ0 + γ1t+ε(t)

I − φL.

The formation of an h-step-ahead prediction is straightforward; for we canseparate the forecast function into two additive parts.

The first part of the function is the extrapolation of the global linear trend.This takes the form of

(56)zt+h|t = γ0 + γ1(t+ h)

= zt + γ1h

where zt = γ0 + γ1t.The second part is the prediction associated with the AR(1) disturbance

term η(t) = (I −φL)−1ε(t). The following iterative scheme is provides a recur-sive solution to the problem of generating the forecasts:

(57)

ηt+1|t = φηt,

ηt+2|t = φηt+1|t,

ηt+3|t = φηt+2|t, etc.

Notice that the analytic solution of the associated difference equation is just

(58) ηt+h|t = φhηt.

This reminds us that, whenever we can express the forecast function in termsof a linear recursion, we can also express it in an analytic form embodying theroots of a polynomial lag operator. The operator in this case is the AR(1)operator I−φL. Since, by assumption, |φ| < 1, it is clear that the contributionof the disturbance part to the overall forecast function

(59) yt+h|t = zt+h|t + ηt+h|t,

becomes negligible when h becomes large.Consider the limiting case when φ→ 1. Now, in place of an AR(1) distur-

bance process, we have to consider a random-walk process. We know that theforecast function of a random walk consists of nothing more than a constant

12

Page 110: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : FORECASTING

function. On adding this constant to the linear function zt+h|t = γ0 + γ1(t+h)we continue to have a simple linear forecast function.

Another way of looking at the problem depends upon writing equation(55) as

(60) (I − φL){y(t)− γ0 − γ1t

}= ε(t).

Setting φ = 1 turns the operator I−φL into the difference operator I−L = ∇.But ∇γ0 = 0 and ∇γ1t = γ1, so equation (60) with φ = 1 can also be writtenas

(61) ∇y(t) = γ1 + ε(t).

This is the equation of a process which is described as random walk with drift.Yet another way of expressing the process is via the equation y(t) = y(t− 1) +γ1 + ε(t).

It is intuitively clear that, if the random walk process ∇z(t) = ε(t) isassociated with a constant forecast function, and if z(t) = y(t)− γ0− γ1t, theny(t) will be associated with a linear forecast function.

The purpose of this example has been to offer a limiting case where mod-els with local stochastic trends—ie. random walk and unit root models—andmodels with global polynomial trends come together. Finally, we should noticethat the model of random walk with drift has the same linear forecast functionas the model

(62) ∇2y(t) = ε(t)

which has two unit roots in the AR operator.

13

Page 111: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

LECTURE 8

The Identificationof ARIMA Models

As we have established in a previous lecture, there is a one-to-one cor-respondence between the parameters of an ARMA(p, q) model, including thevariance of the disturbance, and the leading p + q + 1 elements of the auto-covariance function. Given the true autocovariances of a process, we mightbe able to discern the orders p and q of its autoregressive and moving-averageoperators and, given these orders, we should then be able to deduce the valuesof the parameters.

There are two other functions, prominent in time-series analysis, fromwhich it is possible to recover the parameters of an ARMA process. Theseare the partial autocorrelation function and the spectral density function. Theappearance of each of these functions gives an indication of the nature of theunderlying process to which they belong; and, in theory, the business of iden-tifying the model and of recovering its parameters can be conducted on thebasis of any of them. In practice, the process is assisted by taking account ofall three functions.

The empirical versions of the three functions which are used in a model-building exercise may differ considerably from their theoretical counterparts.Even when the data are truly generated by an ARMA process, the samplingerrors which affect the empirical functions can lead one to identify the wrongmodel. This hazard is revealed by sampling experiments. When the data comefrom the real world, the notion that there is an underlying ARMA processis a fiction, and the business of model identification becomes more doubtful.Then there may be no such thing as the correct model; and the choice amongstalternative models must be made partly with a view their intended uses.

The Autocorrelation Functions

The techniques of model identification which are most commonly used werepropounded originally by Box and Jenkins (1972). Their basic tools were thesample autocorrelation function and the partial autocorrelation function. Weshall describe these functions and their use separately from the spectral densityfunction which ought, perhaps, to be used more often in selecting models.The fact that spectral density function is often overlooked is probably due to

1

Page 112: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : ECONOMIC FORECASTING 1992/3

an unfamiliarity with frequency-domain analysis on the part of many modelbuilders.

Autocorrelation function (ACF). Given a sample y0, y1, . . . , yT−1 of Tobservations, we define the sample autocorrelation function to be the sequenceof values

(1) rτ = cτ/c0, τ = 0, 1, . . . , T − 1,

wherein

(2) cτ =1

T

T−1∑t=τ

(yt − y)(yt−τ − y)

is the empirical autocovariance at lag τ and c0 is the sample variance. Oneshould note that, as the value of the lag increases, the number of observationscomprised in the empirical autocovariance diminishes until the final elementcT−1 = T−1(y0 − y)(yT−1 − y) is reached which comprises only the first andlast mean-adjusted observations.

In plotting the sequence {rτ}, we shall omit the value of r0 which is in-variably unity. Moreover, in interpreting the plot, one should be wary of givingtoo much credence to the empirical autocorrelations at lag values which aresignificantly high in relation to the size of the sample.

Partial autocorrelation function (PACF). The sample partial autocor-relation pτ at lag τ is simply the correlation between the two sets of residualsobtained from regressing the elements yt and yt−τ on the set of interveningvalues y1, y2, . . . , yt−τ+1. The partial autocorrelation measures the dependencebetween yt and yt−τ after the effect of the intervening values has been removed.

The sample partial autocorrelation pτ is virtually the same quantity asthe estimated coefficient of lag τ obtained by fitting an autoregressive model oforder τ to the data. Indeed, the difference between the two quantities vanishesas the sample size increases. The Durbin–Levinson algorithm provides an effi-cient way of computing the sequence {pτ} of partial autocorrelations from thesequence of {cτ} of autocovariances. It can be seen, in view of this algorithm,that the information in {cτ} is equivalent to the information contained jointlyin {pτ} and c0. Therefore the sample autocorrelation function {rt} and thesample partial autocorrelation function {pt} are equivalent in terms of theirinformation content.

The Methodology of Box and Jenkins

The model-building methodology of Box and Jenkins, relies heavily uponthe two functions {rt} and {pt} defined above. It involves a cycle comprisingthe three stages of model selection, model estimation and model checking. Inview of the difficulties of selecting an appropriate model, it is envisaged thatthe cycle might have to be repeated several times and that, at the end, theremight be more than one model of the same series.

2

Page 113: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : ARIMA IDENTIFICATION

15.5

16.0

16.5

17.0

17.5

18.0

18.5

0 50 100 150

0.00

0.25

0.50

0.75

1.00

0 5 10 15 20 25

0.00

0.25

0.50

0.75

1.00

−0.25

−0.50

0 5 10 15 20 25

Figure 1. The concentration readings from a chemical process with the

autocorrelation function and the autocorrelation function of the differences.

3

Page 114: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : ECONOMIC FORECASTING 1992/3

Reduction to stationarity. The first step, which is taken before embarkingon the cycle, is to examine the time plot of the data and to judge whether ornot it could be the outcome of a stationary process. If a trend is evident inthe data, then it must be removed. A variety of techniques of trend removal,which include the fitting of parametric curves and of spline functions, havebeen discussed in previous lectures. When such a function is fitted, it is to thesequence of residuals that the ARMA model is applied.

However, Box and Jenkins were inclined to believe that many empiricalseries can be modelled adequately by supposing that some suitable differenceof the process is stationary. Thus the process generating the observed seriesy(t) might be modelled by the ARIMA(p, d, q) equation

(3) α(L)∇dy(t) = µ(L)ε(t),

wherein ∇d = (I − L)d is the dth power of the difference operator. In thatcase, the differenced series z(t) = ∇dy(t) will be described by a stationaryARMA(p, q) model. The inverse operator ∇−1 is the summing or integratingoperator, which accounts for the fact that the model depicted by equation (3)is described an autoregressive integrated moving-average model.

To determine whether stationarity has been achieved, either by trend re-moval or by differencing, one may examine the autocorrelation sequence of theresidual or processed series. The sequence corresponding to a stationary processshould converge quite rapidly to zero as the value of the lag increases. An em-pirical autocorrelation function which exhibits a smooth pattern of significantvalues at high lags indicates a nonstationary series.

An example is provided by Figure 1 where a comparison is made betweenthe autocorrelation function of the original series and that of its differences.Although the original series does not appear to embody a systematic trend,it does drift in a haphazard manner which suggests a random walk; and it isappropriate to apply the difference operator.

Once the degree of differencing has been determined, the autoregressiveand moving-average orders are selected by examining the sample autocorrela-tions and sample partial autocorrelations. The characteristics of pure autore-gressive and pure moving-average process are easily spotted. Those of a mixedautoregressive moving-average model are not so easily unravelled.

Moving-average processes. The theoretical autocorrelation function {ρτ}of a pure moving-average process of order q has ρτ = 0 for all τ > q. Thecorresponding partial autocorrelation function {πτ} is liable to decay towardszero gradually. To judge whether the corresponding sample autocorrelationfunction {rτ} shows evidence of a truncation, we need some scale by which tojudge the significance of the values of its elements.

4

Page 115: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : ARIMA IDENTIFICATION

0 1 2 3 4

0 −1 −2 −3−4 −5

0 25 50 75 100

0.00

0.25

0.50

0.75

1.00

−0.25

0 5 10 15 20 25

0.000.250.500.751.00

−0.25−0.50−0.75

0 5 10 15 20 25Figure 2. The graph of 120 observations on a simulated series generated

by the MA(2) process y(t) = (1 + 0.90L+ 0.81L2)ε(t) together with the

theoretical and empirical ACF’s (middle) and the theoretical and empirical

PACF’s (bottom). The theoretical values correspond to the solid bars.

5

Page 116: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : ECONOMIC FORECASTING 1992/3

As a guide to determining whether the parent autocorrelations are in factzero after lag q, we may use a result of Bartlett [1946] which shows that, for asample of size T , the standard deviation of rτ is approximately

(4)1√T

{1 + 2(r2

1 + r22 + · · ·+ r2

q)}1/2

for τ > q.

The result is also given by Fuller [1976, p. 237]. A simpler measure of the scaleof the autocorrelations is provided by the limits of ±1.96/

√T which are the

approximate 95% confidence bounds for the autocorrelations of a white-noisesequence. These bounds are represented by the dashed horizontal lines on theaccompanying graphs.

Autoregressive processes. The theoretical autocorrelation function {ρτ}of a pure autoregressive process of order p obeys a homogeneous differenceequation based upon the autoregressive operator α(L) = 1 +α1L+ · · ·+αpL

p.That is to say

(5) ρτ = −(α1ρτ−1 + · · ·+ αpρτ−p) for all τ ≥ p.

In general, the sequence generated by this equation will represent a mixture ofdamped exponential and sinusoidal functions. If the sequence is of a sinusoidalnature, then the presence of complex roots in the operator α(L) is indicated.One can expect the empirical autocovariance function of a pure AR process tobe of the same nature as its theoretical parent.

It is the partial autocorrelation function which serves most clearly to iden-tify a pure AR process. The theoretical partial autocorrelations function {πτ}of a AR(p) process has πτ = 0 for all τ > p. Likewise, all elements of thesample partial autocorrelation function are expected to be close to zero for lagsgreater than p, which corresponds to the fact that they are simply estimatesof zero-valued parameters. The significance of the values of the partial auto-correlations is judged by the fact that, for a pth order process, their standarddeviations for all lags greater that p are approximated by 1/

√T . Thus the

bounds of ±1.96/√T are also plotted on the graph of the partial autocorrela-

tion function.

Mixed processes. In the case of a mixed ARMA(p, q) process, neither thetheoretical autocorrelation function not the theoretical partial autocorrelationfunction have any abrupt cutoffs. Indeed, there is little that can be inferredfrom either of these functions or from their empirical counterparts beyond thefact that neither a pure MA model nor a pure AR model would be inappropriate.On its own, the autocovariance function of an ARMA(p, q) process is not easilydistinguished from that of a pure AR process. In particular, its elements γτsatisfy the same difference equation as that of a pure AR model for all valuesof τ > max(p, q).

6

Page 117: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : ARIMA IDENTIFICATION

0

5

10

15

0

−5

−10

−15

0 25 50 75 100

0.00

0.25

0.50

0.75

1.00

−0.25

−0.50

0 5 10 15 20 25

0.000.250.500.751.00

−0.25−0.50−0.75−1.00

0 5 10 15 20 25Figure 3. The graph of 120 observations on a simulated series generated

by the AR(2) process (1− 1.69L+ 0.81L2)y(t) = ε(t) together with the

theoretical and empirical ACF’s (middle) and the theoretical and empirical

PACF’s (bottom). The theoretical values correspond to the solid bars.

7

Page 118: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : ECONOMIC FORECASTING 1992/3

0 10 20 30 40

0 −10−20 −30

0 20 50 75 100

0.00

0.25

0.50

0.75

1.00

−0.25

−0.50

0 5 10 15 20 25

0.000.250.500.751.00

−0.25−0.50−0.75−1.00

0 5 10 15 20 25Figure 4. The graph of 120 observations on a simulated series generated by the

ARMA(2, 2) process (1−1.69L+0.81L2)y(t) = (1+0.90L+0.81L2)ε(t) together

with the theoretical and emprical ACF’s (middle) and the theoretical and empirical

PACF’s (bottom). The theoretical values correspond to the solid bars.

8

Page 119: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : ARIMA IDENTIFICATION

There is good reason to regard mixed models as more appropriate in prac-tice than pure models of either variety. For a start, there is the fact that arational transfer function is far more effective in approximating an arbitraryimpulse response than is an autoregressive transfer function, whose parametersare confined to the denominator, or a moving-average transfer function, whichhas its parameters in the numerator. Indeed, it might be appropriate, some-times, to approximate a pure process of a high order by a more parsimoniousmixed model.

Mixed models are also favoured by the fact that the sum of any two mu-tually independent autoregressive process gives rise to an ARMA process. Lety(t) and z(t) be autoregressive processes of orders p and r respectively whichare described by the equations α(L)y(t) = ε(t) and ρ(L)z(t) = η(t), whereinε(t) and η(t) are mutually independent white-noise processes. Then their sumwill be

(6)

y(t) + z(t) =ε(t)

α(L)+η(t)

ρ(L)

=ρ(L)ε(t) + α(L)η(t)

α(L)ρ(L)=

µ(L)ζ(t)

α(L)ρ(L),

where µ(L)ζ(t) = ρ(L)ε(t) + α(L)η(t) constitutes a moving-average process oforder max(p, r).

In economics, where the data series are highly aggregated, mixed modelswould seem to be called for often. In the context of electrical and mechanicalengineering, there may be some justification for pure AR models. Here there isoften abundant data, sufficient to sustain the estimation of pure autoregressivemodels of high order. Therefore the principle of parametric parsimony is lesspersuasive than it might be in an econometric context. However, pure ARmodels perform poorly whenever the data is affected by errors of observation;and, in this respect, a mixed model is liable to be more robust. One canunderstand this feature of mixed models by recognising that the sum of a pureAR(p) process an a white-noise process is an ARMA(p, p) process.

9

Page 120: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

LECTURE 9

Nonparametric Estimation ofthe Spectral Density Function

The Spectrum and the Periodogram

The spectral density of a stochastic process is defined by

(1) f(ω) =1

{γ0 + 2

∞∑τ=1

γτ cos(ωτ)

}, ω ∈ [0, π].

The obvious way to estimate this function is to replace the unknown autoco-variances {γτ} by the corresponding empirical moments {cτ} where

(2) cτ =1

T

T−1∑t=τ

(yt−τ − y)(yt − y) if τ ≤ T − 1.

Notice that, beyond a lag of τ = T − 1, the autocovariances are not estimablesince

(3) cT−1 =1

T(y0 − y)(yT−1 − y)

comprises the first and the last elements of the sample; and therefore, we mustset cτ = 0 when τ > T − 1. Thus we obtain a sample spectrum in the form of

(4) fr(ω) =1

{c0 + 2

T−1∑τ=1

cτ cos(ωτ)

}.

The sample spectrum defined in this way is just 1/4π times the periodogramof the sample which is given by

(5)

I(ωj) = 2

{c0 + 2

T−1∑τ=1

cτ cos(ωjτ)

}

=

{[∑t

yt cos(ωjt)

]2

+

[∑t

yt sin(ωjt)

]2}=T

2

{α2j + β2

j

},

1

Page 121: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : A SHORT COURSE OF TIME-SERIES ANALYSIS

where

(6) αj =1

T

∑t

yt cosωjt and βj =1

T

∑t

yt sinωjt.

As we have defined it above, the periodogram has just n ordinates which cor-respond to the values

(7)ωj = 0, 2

π

T, . . . , π

(T − 1)

Twhen T is odd, or

ωj = 0, 2π

T, . . . , π when T is even.

Although this method of estimating the spectrum via the periodogrammay result, in some cases, in unbiased estimates of the corresponding ordinatesof the spectral density function, it does not result in consistent estimates. Thisis hardly suprising when we recall that, in the case where T is even, the Fourierdecomposition of the sample y0, . . . , yT−1, upon which the method is directlybased, requires us to determine the T coefficients

α0, (α1, β1), . . . , (αn−1, βn−1), βn,

where n = T/2, from a total of T observations. For a set of parameters to beestimated consistently, we require that the amount of the relevant informationwhich is available should increase with the size of the sample; and this cannothapppen in the present case.

These conclusions can be illustrated quite simply in the case where y(t) =ε(t) is a white-noise sequence with a uniform spectrum f(ω) = σ2/2π over therange {−π ≤ ω ≤ π}. The values of αj and βj which characterize the samplespectrum and the periodogram are precisely the ones which would result fromfitting the regression model

(8) y(t) = αj cos(ωjt) + βj sin(ωjt) + ε(t)

to the to the data y0, . . . , yT−1. From the ordinary theory of linear regression,it follows that, if the population values which are estimated by αj and βj arein fact zero, which they must be on the assumption that y(t) = ε(t), then

(9)

1

σ2

{α2j

∑t

cos2(ωjt) + β2j

∑t

sin2(ωjt)

}=

T

2σ2

(α2j + β2

j

)=Ijσ2

has a chi-square distribution of two degrees of freedom. The variance of achi-square distribution of k degrees of freedom is just 2k. Thus we find that

2

Page 122: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : SPECTRAL ESTIMATION

V (Ij/σ2) = 4; whence it follows that the variance of the spectral estimate

fr(ωj) = Ij/4π is

(10) V {fr(ωj)} =σ4

4π2= f2(ωj).

Clearly, this value does not diminish as T increases.A further consequence of using the periodogram directly to estimate the

spectrum is that the estimators of f(ωj) and f(ωk) will be uncorrelated for allj 6= k. This follows from the orthogonality of the sine and cosine functionswhich serve as a basis for the Fourier decomposition of the sample. The factthat adjacent values of the estimated spectrum are uncorrelated means that itwill have a particularly volatile appearance.

Spectrum Averaging

One way of improving the properties of the estimate of f(ωj) is to comprisewithin the estimator several adjacent values from the periodogram. Thus wemay define a new estimator in the form of

(11) fs(ωj) =k=m∑k=−m

µkfr(ωj−k).

In addition to the value of the periodogram at the point ωj , this comprisesa further m adjacent values falling on either side. The set of weights

{µ−m, µ1−m, . . . , µm−1, µm}

should sum to unity as well being symmetric in the sense that µ−k = µk. Theydefine what is known as a spectral window. Some obvious problems arise indefining values of the estimate towards the boundaries of the set of frequencies{ωj ; 0 ≤ ωj ≤ π}. These problems can be overcome by treating the spectrumas symmetric about the points 0 and π so that, for example, we define

(12) fs(π) = µ0fr(π) + 2

m∑k=1

µkfr(π − ωk).

The estimate fs(ωj) comprises a total of M = 2m + 1 ordinates of theperiodogram which span an interval of Q = 4mπ/T radians. This number ofradians Q is the so-called bandwidth of the estimator. If Q is kept constant,then M increases at the same rate as T . This means that, in spite of theincreasing sample size, we are denied the advantage of increasing the acuityor resolution of our estimation; so that narrow peaks in the spectrum, which

3

Page 123: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : A SHORT COURSE OF TIME-SERIES ANALYSIS

have been smoothed over, may escape detection. Conversely, if we maintainthe value of M , then the size of the bandwith will decrease with T , and wemay retain some of the disadvantages of the original periodogram. Ideally, weshould allow M to increase at a slower rate than T so that, as M →∞, we willhave Q→ 0.

Weighting in the Time Domain

An alternative approach to spectral estimation is to give differentialweighting to the estimated autocovariances comprised in our formula for thesample spectrum, so that diminishing weights are given to the values of cτ as τincreases. This seems reasonable since the precision of these estimates decreasesas τ increases. If the series of weights associated with the the autocovariancesc0, c1, . . . , cT−1 are denoted by m0,m1, . . . ,mT−1, then our revised estimatorfor the spectrum takes the form of

(13) fw(ω) =1

{m0c0 + 2

T−1∑τ=1

mτ cτ cos(ωτ)

}.

The series of weights define what is described as a lag window. If the weightsare zero-valued beyond mR , then we describe R as the truncation point.

A wide variety of lag windows have been defined. Amongst those whichare used nowadays are the Tukey–Hanning window defined by

(14) mτ =1

2

{1 + cos

(πτR

)}; τ = 0, 1, . . . , R

and the Parzen window defined by

(15)

mτ = 1− 6( τR

)2

+ 6( τR

)3

; 0 ≤ τ ≤ 12R,

mτ = 2(

1− τ

R

)3

; 12R ≤ τ ≤ R.

The Relationship between Smoothing and Weighting

It would be suprising if we were unable to interpret the method of smooth-ing the periodogram in terms of an equivalent method of weighting the auto-covariance function and vice versa.

Consider the smoothed periodogram defined by

(16) fs(ωj) =

m∑k=−m

µkfr(ωj−k).

4

Page 124: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : SPECTRAL ESTIMATION

Given that the ordinates of the original periodogram I(ωj) corrrespond to thepoints ωj defined in (7), it follows that fr(ωj−k) = fr(ωj − ωk), where ωk =2πk/T . Therefore, on substituting

(17)

fr(ωj−k) =1

T−1∑τ=1−T

cτ exp(−iωj−kτ)

=1

∑τ

cτ exp(−i[ωj − ωk]τ)

into (16), we get

(18)

fs(ωj) =∑k

µk

{1

∑τ

cτ exp(−i[ωj − ωk]τ)

}=

1

∑τ

{∑k

µk exp(iωkτ)

}cτ exp(−iωjτ)

=1

∑τ

mτ cτ exp(−iωjτ)

where

(19) mτ =m∑

k=−mµke

iωkτ ; ωk =2πk

T

is the finite Fourier transform of the sequence of weights

{µ−m, µ1−m, . . . , µm−1, µm}

which define the spectral window.The final expression under (18) would be the same as our expression for

the spectral estimator given under (13) were it not for the fact that we havedefined the present function over the set of values {ωj ; j = 1, . . . , n} insteadof over the interval ω = [0, π], and for the fact that we have used a complexexponential expression instead of a cosine.

It is also possible to demonstrate an inverse relationship whereby a spec-tral estimator which depends upon weighting the autocovariance function isequivalent to another estimator which smooths the periodogram. Consider aspectral estimator in the form of

(20) fw(ω0) =1

T−1∑τ=1−T

mτ cτ exp(−iω0τ).

5

Page 125: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : A SHORT COURSE OF TIME-SERIES ANALYSIS

where

(21) mτ =

∫ω

u(ω)eiωτ dω

has an inverse Fourier transform given by

(22) u(ω) =1

∞∑τ=−∞

mτe−iωτ

On substituting the expression for mτ from (21) into (20), we get

(23)

fw(ω0) =1

∑τ

{∫ω

u(ω)eiωτ dω

}cτe−iω0τ

=

∫ω

u(ω)

{ ∑τ

cτei(ω−ω0)τ

}dω

=

∫ω

u(ω)fr(ω0 − ω)dω.

This shows that the technique of weighting the autocovariance function cor-responds, in general, to a technique of smoothing the periodogram. However,to sustain this interpretation, we must define the periodogram not just at nfrequency points {ωj ; j = 1, . . . , n}, as we have done in (5), but over the entireinterval [−π, π]. Notice that, on setting ω = 0 in (21), we get

(24) m0 =

∫ω

u(ω)dω

It is desirable that the weighting function should integrate to unity over therelevant range, and this requires us to set m0 = 1. The latter is exactly thevalue by which we would expect to weight the estimated variance c0 within theformula in (13) which defines the spectral estimator fw(ω).

6

Page 126: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

LECTURE 10

Seasonal Models andSeasonal Adjustment

So far we have relied upon the method of trigonometrical regression forbuilding models which can be used for forecasting seasonal economic time series.It has proved necessary, invariably, to perform the preliminary task of elimi-nating a trend from the data before determining the seasonal pattern from theresiduals. In most of the cases which we have analysed, the trend has beenmodelled quite successfully by a simple analytic function such as a quadratic.However, it is not always possible to find an analytic function which serves thepurpose. In some cases a stochastic trend seems to be more appropriate. Sucha trend is generated by an autoregressive operator with units roots. Once astochastic unit-root model has been adopted for the trend, it seems naturalto model the pattern of seasonal fluctuations in the same manner by usingautoregressive operators with complex-valued roots of unit modulus.

The General Multiplicative Seasonal Model

Let

(1) z(t) = ∇dy(t)

be a de-trended series which exhibits seasonal behaviour with a periodicity of speriods. Imagine, for the sake of argument, that the period between successiveobservations is one month, which means that the seasons have a cycle of s = 12months. Once the trend has been extracted from the original series y(t) bydifferencing, we would expect to find a strong relationship between the valuesof observations taken in the same month of successive years. In the simplestcircumstances, we might find that the difference between yt and yt−12 is a smallrandom quantity. If the sequence of the twelve-period differences were whitenoise, then we should have a relationship of the form

(2) z(t) = z(t− 12) + ε(t) or, equivalently, ∇12y(t) = ε(t).

This is ostensibly an autoregressive model with an operator in the form of∇12 = 1 − L12. However, it is interesting to note in passing that, if y(t) were

1

Page 127: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK: TIME SERIES AND FORECASTING

generated by a regression model in the form of

(3) y(t) =6∑j=0

ρj cos(ωj − θj) + η(t),

where ωj = πj/6 = j × 30◦, then we should have

(4) (1− L12)y(t) = η(t)− η(t− 12) = ζ(t);

and, if the disturbance sequence η(t) were white noise, then the residual termζ(t) = η(t)− η(t− 12) would show the following pattern of correlation:

(5) C(ζt, ζt−j) =

{σ2, if j mod 12 = 0;

0, otherwise.

It can be imagined that a more complicated relationship stretches over theyears which connects the months of the calender. By a simple analogy with theordinary ARMA model, we can devise a model of the form

(6) Φ(L12)∇D12z(t) = Θ(L12)η(t),

where Φ(z) is a polynomial of degree P and Θ(z) is a polynomial of degreeQ. In effect, this model is applied to twelve separate time series—one for eachmonth of the year—whose observations are seperated by yearly intervals. Ifη(t) were a white-noise sequence of independently and identically distributedrandom variables, then there would be no connection between the twelve timeseries.

If there is a connection between successive months within the year, thenthere should be a pattern of serial correlation amongst the elements of thedisturbance process η(t). One might propose to model this pattern using asecond ARMA of the form

(7) α(L)η(t) = µ(L)ε(t),

where α(z) is a polynomial of degree p and µ(z) is a polynomial of degree q.The various components of our analysis can now be assembled. By com-

bining equations (1) (6) and (7), we can derive the following general model forthe sequence y(t):

(8) Φ(L12)α(L)∇D12∇dy(t) = Θ(L12)µ(L)ε(t).

A model of this sort has been described by Box and Jenkins as the generalmultiplicative seasonal model. To denote such a model in a summary fashion,

2

Page 128: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK : SEASONALITY

they describe it as an ARIMA (P,D,Q) × (p, d, q) model. Although, in thegeneral version of the model, the seasonal difference operator ∇12 is raised tothe power D; it is unusual to find values other that D = 0, 1.

Factorisation of The Seasonal Difference Operator

The equation under (8) should be regarded as a portmanteau in whicha collection of simplified models can be placed. The profusion of symbols inequation (8) tends to suggest a model which is too complicated to be of practicaluse. Moreover, even with ∇12 in place of ∇D12, there is a redundancy in thenotation to we should draw attention. This redundancy arises from the fact thatthe seasonal difference operator ∇D12 already contains the operator ∇ = I−L asone of its factors. Therefore, unless this factor is eliminated, there is a dangerthat the original sequence y(t) will be subjected, inadvertently, to one moredifferencing operation than is intended.

The twelve factors of the operator ∇D12 = I − L12 contain the so-calledtwelfth-order roots of unity which are the solutions of the algebraic equation1 = z12. The factorisation may be demonstrated in three stages. To begin, itis easy to see that

(9)I − L12 = (I − L)(I + L+ L2 + · · ·+ L11)

= (I − L)(I + L2 + L4 + · · ·+ L10)(I + L).

The next step is to recognise that

(10)(I + L2 + L4 + · · ·+ L10)

= (1−√3L+ L2)(I − L+ L2)(I + L2)(I + L+ L2)(1 +√

3L+ L2).

Finally, it can be see that the generic quadratic factor has the form of

(11) 1− 2 cos(ωj)L+ L2 = (1− eiωjL)(1− e−iωjL).

where ωj = πj/6 = j × 30◦.Figure 1 shows the disposition of the twelfth roots of unity around the unit

circle in the complex plane.A cursory inspection of equation (9) indicates that the first-order difference

operator ∇ = I −L is indeed one of the factors of ∇12 = I −L12. Therefore, ifthe sequence y(t) has been reduced to stationarity already by the applicationof d first-order differencing operations, then its subsequent differencing via theoperator ∇12 is unnecessary and is liable to destroy some of the characteristicsof the sequence which ought to be captured by the ARIMA model.

The factorisation of the seasonal difference operator also helps to explainhow the seasonal ARMA model can give rise to seemingly regular cycles of theappropriate duration.

3

Page 129: A Short Course of Time-Series Analysis and Forecasting by D S G Pollock

D.S.G. POLLOCK: TIME SERIES AND FORECASTING

− i

i

−1 1Re

Im

Figure 1. The 12th roots of unity inscribed in the unit circle.

Consider a simple second-order autoregressive model with complex-valued roots of unit modulus:

(12) {I − 2cos(ω_j)L + L^2} y_j(t) = ε_j(t).

Such a model can give rise to quite regular cycles whose average duration is 2π/ω_j periods. The graph of the sequence generated by a model with ω_j = ω_1 = π/6 = 30° is given in Figure 2. Now consider generating the full set of stochastic sequences y_j(t) for j = 1, . . . , 5. Also included in this set should be the sequences y_0(t) and y_6(t) generated by the first-order equations

(13) (I − L)y_0(t) = ε_0(t) and (I + L)y_6(t) = ε_6(t).

These sequences, which resemble trigonometrical functions, will be harmonically related in the manner of the trigonometrical functions comprised by equation (3), which also provides a model for a seasonal time series. It follows that a good representation of a seasonal economic time series can be obtained by taking a weighted combination of the stochastic sequences.

For simplicity, imagine that the white-noise sequences ε_j(t); j = 0, . . . , 6 are mutually independent and that their variances can take a variety of values. Then the sum of the stochastic sequences will be given by

(14) y(t) = Σ_{j=0}^{6} y_j(t)
          = ε_0(t)/(I − L) + Σ_{j=1}^{5} ε_j(t)/{I − 2cos(ω_j)L + L^2} + ε_6(t)/(I + L).


Figure 2. The graph of 84 observations on a simulated series generated by the AR(2) process (I − 1.732L + L^2)y(t) = ε(t).
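A series of the kind shown in Figure 2 is easily simulated. The following Python sketch generates the AR(2) process of equation (12) with ω_j = π/6, for which 2cos(ω_j) = 1.732 to three decimal places; the seed is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed
T, omega = 84, np.pi / 6.0           # 84 observations, as in Figure 2
eps = rng.standard_normal(T)
y = np.zeros(T)
for t in range(2, T):
    # (I - 2cos(w)L + L^2)y(t) = e(t)  becomes
    # y(t) = 2cos(w)y(t-1) - y(t-2) + e(t).
    y[t] = 2.0 * np.cos(omega) * y[t - 1] - y[t - 2] + eps[t]
print(np.round(y[:24], 2))           # roughly two 12-period cycles
```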

The terms on the RHS of this expression can be combined. Their common denominator is simply the operator ∇_12 = I − L^12. The numerator is a sum of 7 mutually independent moving-average processes, each of order 10 or 11. Their sum amounts to an MA(11) process, which can be denoted by η(t) = θ(L)ε(t). Thus the combination of the harmonically related unit-root autoregressive processes gives rise to a seasonal process in the form of

(15) y(t) = {θ(L)/(I − L^12)} ε(t) or, equivalently, ∇_12 y(t) = θ(L)ε(t).
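This claim can be illustrated by simulation. The following Python sketch generates the seven unit-root processes, sums them, and inspects the sample autocorrelations of the seasonal difference of the sum, which should be negligible beyond lag 11 if the MA(11) description is correct; the seed and the sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 600
y = np.zeros(T)

# j = 0 and j = 6: the first-order processes of equation (13).
for sign in (+1.0, -1.0):            # (I - L)y0 = e0 and (I + L)y6 = e6
    e = rng.standard_normal(T)
    yj = np.zeros(T)
    for t in range(1, T):
        yj[t] = sign * yj[t - 1] + e[t]
    y += yj

# j = 1, ..., 5: the second-order processes of equation (12).
for j in range(1, 6):
    w = np.pi * j / 6.0
    e = rng.standard_normal(T)
    yj = np.zeros(T)
    for t in range(2, T):
        yj[t] = 2.0 * np.cos(w) * yj[t - 1] - yj[t - 2] + e[t]
    y += yj

# The seasonal difference of the sum should behave as an MA(11) process.
d = y[12:] - y[:-12]
acf = [np.corrcoef(d[k:], d[:len(d) - k])[0, 1] for k in range(1, 20)]
print(np.round(acf, 2))              # near zero beyond lag 11
```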

The equation of this model is contained within the portmanteau equation of the general multiplicative model given under (8). However, although it represents a simplification of the general model, it still contains a number of parameters which is liable to prove excessive. A typical model, which contains only a few parameters, is the ARIMA (0, 1, 1) × (0, 1, 1) model which Box and Jenkins fitted to the logarithms of the AIRPASS data. The AIRPASS model takes the form of

(16) (I − L^12)(I − L)y(t) = (I − θL^12)(I − µL)ε(t).

Notice how the unit-root autoregressive operators I − L^12 and I − L are coupled with the moving-average operators I − θL^12 and I − µL respectively. These serve to enhance the regularity of the stochastic cycles and to smooth the trend.
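For illustration, the airline model can be fitted with the statsmodels package. The sketch below assumes that a copy of the AIRPASS series can be fetched from the Rdatasets collection over the network and that the resulting data frame exposes a column named value; note also that statsmodels writes its moving-average operators as (1 + θL), so that the reported coefficients carry the opposite signs to those of equation (16).

```python
import numpy as np
import statsmodels.api as sm

# Fetch the classic airline-passenger data (requires a network connection;
# the column name "value" is an assumption about the Rdatasets layout).
airpass = sm.datasets.get_rdataset("AirPassengers").data["value"]
y = np.log(airpass)

# The airline model of equation (16): ARIMA (0,1,1) x (0,1,1), period 12,
# applied to the logarithms of the data.
model = sm.tsa.SARIMAX(y, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12))
res = model.fit(disp=False)
print(res.params)        # MA coefficients, signs reversed relative to (16)
print(res.forecast(12))  # forecasts for the following year
```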


Forecasting with Unit-Root Seasonal Models

Although their appearances are superficially similar, the seasonal economic series and the series generated by equations such as (16) are, fundamentally, of very different natures. In the case of the series generated by a unit-root stochastic difference equation, there is no bound, in the long run, on the amplitude of the cycles. Also there is a tendency for the phases of the cycles to drift without limit. If the latter were a feature of the monthly time series of consumer expenditures, for example, then we could not expect the annual boom in sales to occur at a definite time of the year. In fact, it occurs invariably at Christmas time.

The advantage of unit-root seasonal models does not lie in the realism with which they describe the processes which generate the economic data series. For that purpose the trigonometrical model seems more appropriate. Their advantage lies, instead, in their ability to forecast the seasonal series.

The simplest of the seasonal unit-root models is the one which is specified by equation (2). This is a twelfth-order difference equation with a white-noise forcing function. In generating forecasts from the model, we need only replace the elements of ε(t) which lie in the future by their zero-valued expectations. Then the forecasts may be obtained iteratively from a homogeneous difference equation in which the initial conditions are simply the values of y(t) observed over the preceding twelve months. In effect, we observe the most recent annual cycle and we extrapolate its form exactly, year in and year out, into the indefinite future.
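In computational terms, this forecast function amounts to copying the most recent year forward indefinitely. A minimal Python sketch, with an illustrative function name:

```python
import numpy as np

def forecast_seasonal_random_walk(y, h, s=12):
    """Forecast y(t) = y(t - s) + e(t): each future period repeats the
    corresponding period of the last observed annual cycle."""
    y = np.asarray(y, dtype=float)
    last_cycle = y[-s:]                  # the preceding twelve months
    return np.array([last_cycle[k % s] for k in range(h)])

# Example: extrapolate the last year over the next two years.
y = np.arange(48.0)                      # stand-in monthly data
print(forecast_seasonal_random_walk(y, h=24))
```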

A somewhat different forecasting rule is associated with the model defined by the equation

(17) (I − L^12)y(t) = (I − θL^12)ε(t).

This equation is analogous to the simple IMA(1, 1) equation in the form of

(18) (I − L)y(t) = (I − θL)ε(t),

which was considered at the beginning of the course. The latter equation was obtained by combining a first-order random walk with a white-noise error of observation. The two equations, whose combination gives rise to (18), are

(19) ξ(t) = ξ(t − 1) + ν(t),
     y(t) = ξ(t) + η(t),

wherein ν(t) and η(t) are generated by two mutually independent white-noise processes.


Figure 3. The sample trajectory and the forecast function of the nonstationary 12th-order process y(t) = y(t − 12) + ε(t).

Equation (17), which represents the seasonal model which was used by Box and Jenkins, is generated by combining the following equations, which are analogous to those under (19):

(20) ξ(t) = ξ(t − 12) + ν(t),
     y(t) = ξ(t) + η(t).

Here ν(t) and η(t) continue to represent a pair of independent white-noise processes.

The procedure for forecasting the IMA model consisted of extrapolating into the indefinite future a constant value y_{t+1|t} which represents the one-step-ahead forecast made at time t. The forecast itself was obtained from a geometrically weighted combination of all past values of the sequence y(t), which represent erroneous observations on the random-walk process ξ(t). The forecasts for the seasonal model of (17) are obtained by extrapolating a so-called annual reference cycle into the future so that it applies in every successive year. The reference cycle is constructed by taking a geometrically weighted combination of all past annual cycles. The analogy with the IMA model is perfect!
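The construction of the reference cycle can be made concrete. The following Python sketch assumes a monthly sample comprising complete calendar years; for each month of the year it forms a geometrically weighted average of that month's past values, with weight θ^k attached to the value from k years back, and with the weights normalised over the finite sample.

```python
import numpy as np

def seasonal_reference_forecast(y, theta, h, s=12):
    """Forecast (I - L^s)y(t) = (I - theta L^s)e(t) via a reference cycle:
    each month is a discounted average of that month in all past years."""
    y = np.asarray(y, dtype=float)
    ref = np.zeros(s)
    for m in range(s):
        past = y[len(y) - s + m::-s]           # this month's history, newest first
        w = theta ** np.arange(len(past))      # geometric discounting by year
        ref[m] = np.sum(w * past) / np.sum(w)  # normalised for the finite sample
    return np.array([ref[k % s] for k in range(h)])

# Example: forecast the next year from ten years of stand-in monthly data.
y = np.tile(np.sin(np.pi * np.arange(12) / 6.0), 10)
print(np.round(seasonal_reference_forecast(y, theta=0.5, h=12), 3))
```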

It is interesting to compare the forecast function of a stochastic unit-root seasonal model of (17) with the forecast function of the corresponding trigonometrical model represented by (3). In the latter case, the forecast function depends upon a reference cycle which is the average of all of the annual cycles which are represented by the data set from which the regression parameters have been computed. The stochastic model seems to have the advantage that, in forming its average of previous annual cycles, it gives more weight to recent years. However, it is not difficult to contrive a regression model which has the same feature.
