
Journal of Econometrics 6 (1977) 289-308. © North-Holland Publishing Company

DIFFERENCING OF RANDOM WALKS AND NEAR RANDOM WALKS*

Nicholas J. GONEDES and Harry V. ROBERTS

Graduate School of Business, University of Chicago, Chicago, IL 60637, USA

Received July 1975, final version received April 1977

The traditional rationale for differencing time series data is to attain stationarity. For a nearly non-stationary first-order autoregressive process, AR(1) with positive slope parameter near unity, we were led to a complementary rationale. If one suspects near non-stationarity of the AR(1) process, if the sample size is 'small' or 'moderate', and if good one-step-ahead prediction performance is the goal, then it is wise to difference the data and treat the differences as observations on a stationary AR(1) process. Estimation by Ordinary Least Squares then appears to be at least as satisfactory as nonlinear least squares. Use of differencing for an already stationary process can be motivated by Bayesian concepts: differencing can be viewed as an easy way to incorporate non-diffuse prior judgement (that the process is nearly non-stationary) into one's analysis. Random walks and near random walks are often encountered in economics. Unless one's sample size is large, the same statistical analyses apply to either.

1. Introduction and overview

Time series data with strong statistical resemblances to random walks frequently arise in economics and business: for example, logs of stock prices, logs of the annual real gross national product, logs of the annual sales of many companies, and the annual velocity of money. An alternative model for such data is the stationary first-order autoregressive process [AR(1)] with slope parameter close to +1, which is the upper boundary of stationarity; observations on such a process can closely resemble observations on a random walk process. From both substantive and statistical points of view, it may seem appropriate to use the stationary AR(1) model as a basis for statistical analysis. Unfortunately, an attempt to use this model to analyze data exhibiting random-walk-like properties will encounter serious and well-known estimation problems, even if the

*This paper incorporates material from our earlier paper 'Autoregression versus ordinary regression: Lessons for inference', Report 7531, Center for Mathematical Studies in Business and Economics, University of Chicago, July 1975. The authors are grateful for the comments and suggestions received from many colleagues. At the risk of serious omissions, we wish to acknowledge explicitly the help of Eugene Fama, John Gould, Dennis Lindley, Robert Ling, Albert Madansky, Charles Nelson, S.J. Park, William Wecker, Arnold Zellner and the participants in the Econometrics Colloquium, University of Chicago, and the NBER-NSF Seminar on Bayesian Inference in Econometrics, Chicago, May 1975.


'true' process is AR(1) stationary. Further, the data from samples of sizes common in economic research will provide little guidance in distinguishing between stationary AR(1) and random walk.

The random walk (RW) model specifies that at time t,

$$\tilde{y}_t = \beta_0 + \tilde{y}_{t-1} + \tilde{\varepsilon}_t, \qquad (1)$$

where ε̃_t is an independent, identically distributed disturbance with zero mean. If β₀ = 0, the model is called a random walk without drift. It is usually specified that ε̃_t is normally distributed. The stationary first-order autoregressive model [AR(1)] is defined by

$$\tilde{y}_t = \beta_0 + \beta_1\tilde{y}_{t-1} + \tilde{\varepsilon}_t, \qquad (2)$$

with |β₁| < 1. The random walk model of (1) is a non-stationary variant for which β₁ = 1.

When the stationary AR(1) model holds and β₁ is close to +1, the observations on ỹ_t in finite samples will resemble observations on a random walk; sequence plots and sample autocorrelations, for example, will be similar. But the stationary AR(1) model, even in these instances, has some tendency towards mean reversion: the expected movement of the next observation toward the process mean is (1 − β₁) times the current distance from that mean.
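To make this resemblance concrete, here is a minimal simulation sketch (ours, not the paper's; the seed and the choice β₁ = 0.95 are assumptions for illustration) comparing short-lag sample autocorrelations of a stationary AR(1) with those of a driftless random walk at n = 60.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility
n, beta1 = 60, 0.95

# Stationary AR(1) of eq. (2) with beta0 = 0, started from its marginal distribution.
ar1 = np.empty(n)
ar1[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - beta1**2))
for t in range(1, n):
    ar1[t] = beta1 * ar1[t - 1] + rng.normal()

# Driftless random walk of eq. (1) with beta0 = 0.
rw = np.cumsum(rng.normal(size=n))

def acf(x, k):
    """Lag-k sample autocorrelation."""
    x = x - x.mean()
    return float((x[:-k] * x[k:]).sum() / (x * x).sum())

for name, series in [("AR(1), beta1 = 0.95", ar1), ("random walk", rw)]:
    print(name, [round(acf(series, k), 2) for k in (1, 2, 3)])
# Both series show large, slowly decaying short-lag autocorrelations, so plots and
# correlograms from samples of this size are hard to tell apart.
```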

Unfortunately, any attempt to apply the stationary AR(1) model to a moderate-sized sample using the specification of a diffuse prior distribution (or using standard sampling-theory methods) will encounter the full force of well-known inferential problems. These problems have been studied extensively by statisticians following the sampling-theory tradition; see, for example, Copas (1966), Hurwicz (1950), Kendall (1954), Dixon (1944), Marriott and Pope (1954), Orcutt and Winokur (1969), Orcutt (1948), Cochrane and Orcutt (1949), Orcutt and Cochrane (1949), Shenton and Johnson (1965), and White (1961), among others. The problem posed by the sampling bias of the ordinary least squares (OLS) slope estimator b̃₁ of β₁ has attracted most attention. For small sample size n and β₁ close to +1, the case of greatest interest for us, the bias is negative and substantial. An approximate expression for the bias (to order n⁻¹) is

$$E(\tilde{b}_1 \mid \beta_1) - \beta_1 = -\frac{1}{n}(1 + 3\beta_1); \qquad (3)$$

see Marriott and Pope (1954, eq. (4.07)). If, for example, n = 30 and β₁ = 0.92, the bias is a sobering −0.125.
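As a quick numerical check of (3) (our sketch, not part of the paper), the approximation reproduces the figure just quoted:

```python
def approx_ols_bias(n, beta1):
    """Approximate bias of the OLS slope estimate, to order 1/n, from eq. (3)."""
    return -(1.0 + 3.0 * beta1) / n

print(approx_ols_bias(30, 0.92))  # about -0.125, the value cited in the text
```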


Attempts by Orcutt and Winokur (1969) and Copas (1966), among others, to correct for the OLS bias were not uniformly successful.¹

The stationary AR(1) model also poses a starting-value problem for sampling theory: ordinary least squares estimators must be interpreted conditionally on the first observation ỹ₁ = y₁. This problem is discussed by Box and Jenkins (1970, ch. 7), where a general nonlinear estimation procedure involving backward forecasting is proposed as a remedy. It is also discussed by Zellner (1971, sec. 7.1).

The estimation problems surrounding stationary AR(1) with strongly positive β₁ may frustrate attempts to compare it with the random walk model in analysis of apparent random walk behavior. Since important theoretical issues can turn on the adoption of one or the other model and since the problem of data analysis for predictive purposes is important in its own right, we were led to further study of the inferential problems posed by the stationary AR(1) model.

2. Estimation of the stationary AR(1) model

Our first thought was that the poor estimation performance of OLS applied to the stationary AR(1) model with β₁ close to +1 and n moderate or small was traceable to the loss of information potentially obtainable from the first sample observation y₁. This information can be captured by appropriate treatment of the relevant likelihood function. We shall briefly review some facts bearing on this issue. We shall view the time series as a single n-dimensional observation and specify that it is an observation from an n-dimensional normal distribution. Let y′ = [y₁, y₂, …, y_n] be an n-dimensional vector each component, y_t, of which is a realization of ỹ_t as defined by the stationary AR(1) process in (2). Then ỹ is an n-dimensional normal random vector with an n-dimensional mean vector μ′ = [μ, μ, …, μ], where μ is the mean of the AR(1) process, and with an n × n positive definite symmetric covariance matrix Σ.

For the stationary AR(1) process, the mean vector μ and covariance matrix Σ can be readily expressed in terms of the parameters β₀, β₁, and σε² = var(ε̃_t) of expression (2); see, for example, Zellner (1971, p. 188). For every t, the marginal distribution of ỹ_t is normal with mean μ = β₀/(1 − β₁) and variance σ_y² = σε²/(1 − β₁²). The marginal covariance of ỹ_t and ỹ_s is β₁^{|t−s|} σε²(1 − β₁²)⁻¹. Hence the covariance matrix Σ is

$$\Sigma = \frac{\sigma_\varepsilon^2}{1-\beta_1^2}
\begin{bmatrix}
1 & \beta_1 & \beta_1^2 & \cdots & \beta_1^{n-1}\\
\beta_1 & 1 & \beta_1 & \cdots & \beta_1^{n-2}\\
\vdots & \vdots & \vdots & & \vdots\\
\beta_1^{n-1} & \beta_1^{n-2} & \beta_1^{n-3} & \cdots & 1
\end{bmatrix}. \qquad (4)$$

¹The special case of the stationary AR(1) model with β₀ = 0 was examined by Thornber (1967), who found favourable properties of Bayesian estimators relative to sampling-theory estimators. Related work was done by Copas (1966), who also provided results on the starting-value problem discussed below.


The determinant |Σ| of Σ is σε^{2n}/(1 − β₁²), and the inverse is

$$\Sigma^{-1} = \sigma_\varepsilon^{-2}
\begin{bmatrix}
1 & -\beta_1 & 0 & \cdots & 0 & 0\\
-\beta_1 & 1+\beta_1^2 & -\beta_1 & \cdots & 0 & 0\\
0 & -\beta_1 & 1+\beta_1^2 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & -\beta_1 & 1
\end{bmatrix}; \qquad (5)$$

see Graybill (1969, p. 182) or Press (1972, p. 24).
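The closed forms in (4) and (5) are easy to confirm numerically; the sketch below (ours, using numpy at an arbitrarily chosen small n) builds Σ and checks both the determinant and the tridiagonal inverse.

```python
import numpy as np

def ar1_cov(n, beta1, sigma2):
    """Covariance matrix (4): entry (i, j) is beta1**|i-j| * sigma2 / (1 - beta1**2)."""
    idx = np.arange(n)
    return sigma2 / (1.0 - beta1**2) * beta1 ** np.abs(idx[:, None] - idx[None, :])

n, beta1, sigma2 = 6, 0.9, 1.0
Sigma = ar1_cov(n, beta1, sigma2)

# Determinant check: |Sigma| = sigma2**n / (1 - beta1**2).
print(np.isclose(np.linalg.det(Sigma), sigma2**n / (1.0 - beta1**2)))  # True

# Inverse check against the tridiagonal matrix in (5).
T = (1.0 + beta1**2) * np.eye(n)
T[0, 0] = T[-1, -1] = 1.0
T += np.diag(-beta1 * np.ones(n - 1), 1) + np.diag(-beta1 * np.ones(n - 1), -1)
print(np.allclose(np.linalg.inv(Sigma), T / sigma2))  # True
```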

The likelihood function for the parameters μ and Σ given the realization ỹ = y is, up to an arbitrary positive multiple,

$$\ell(\mu, \Sigma \mid y) \propto |\Sigma|^{-1/2} \exp\{-\tfrac{1}{2}(y-\mu)'\Sigma^{-1}(y-\mu)\}. \qquad (6)$$

If the explicit expressions for the inverse and determinant of the covariance matrix from (4) and (5) are substituted into (6), the likelihood function in terms of μ, β₁, and σε² is

$$\ell(\mu, \beta_1, \sigma_\varepsilon^2 \mid y) \propto (1-\beta_1^2)^{1/2}\,\sigma_\varepsilon^{-1}
\exp\Big\{-\frac{1}{2\sigma_\varepsilon^2}(1-\beta_1^2)(y_1-\mu)^2\Big\}
\times \sigma_\varepsilon^{-(n-1)}
\exp\Big\{-\frac{1}{2\sigma_\varepsilon^2}\sum_{t=2}^{n}\big[y_t-\mu-\beta_1(y_{t-1}-\mu)\big]^2\Big\}. \qquad (7)$$

Note that the process mean μ rather than the intercept β₀ is used to parameterize (7). In this parameterization, the stationary AR(1) process is represented by the scheme

$$\tilde{y}_t - \mu = \beta_1(\tilde{y}_{t-1} - \mu) + \tilde{\varepsilon}_t. \qquad (8)$$

Note that y₁, the first sample observation, appears as the first y_t in both (6) and (7). There is no need to introduce a fictitious observation, y₀, at time 0 that has to be treated as an unknown parameter. Nor is there a need to reflect explicitly all values ε̃₁, ε̃₀, ε̃₋₁, ε̃₋₂, …, extending into the indefinite past,


as do Box and Jenkins (1970, app. A7.4, p. 273) in their formulation of the general likelihood function for any autoregressive moving average (ARMA) model. Their formulation suggests the estimation method of 'backward forecasting', where one approximates a finite number of the ε's from ε₁ extending backward into the past; see Box and Jenkins (1970, ch. 7). Our formulation, which is given by Box and Jenkins in their Appendix A7.5, p. 276, gives the exact likelihood function for μ, β₁, and σε² as a function of the observed sample vector y. In addition to conceptual simplicity, there may be a computational advantage in treating the first-order autoregressive process as a special case and working with the exact likelihood function, rather than working with approximation methods that cope with the starting value problem of the general ARMA model.
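For concreteness, a minimal sketch (ours, not the authors' code) of the exact log-likelihood implied by (7), written as a function of (μ, β₁, σε²):

```python
import numpy as np

def exact_loglik(y, mu, beta1, sigma2):
    """Exact AR(1) log-likelihood of eq. (7), up to an additive constant.

    The first terms come from the marginal normal density of y[0]; the
    remainder from the conditional densities of y[1], ..., y[n-1].
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    ll = (0.5 * np.log(1.0 - beta1**2)
          - 0.5 * np.log(sigma2)
          - (1.0 - beta1**2) * (y[0] - mu) ** 2 / (2.0 * sigma2))
    resid = (y[1:] - mu) - beta1 * (y[:-1] - mu)
    ll += -(n - 1) / 2.0 * np.log(sigma2) - (resid**2).sum() / (2.0 * sigma2)
    return ll

# Illustrative values only: a positively autocorrelated sample is more likely
# under beta1 = 0.9 than under beta1 = -0.9.
y = [0.3, 0.5, 0.9, 0.7, 1.2, 0.8]
print(exact_loglik(y, 0.0, 0.9, 1.0) > exact_loglik(y, 0.0, -0.9, 1.0))  # True
```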

The factorization of the likelihood function in the second form of (7) is of interest. The first factor,

$$(1-\beta_1^2)^{1/2}\,\sigma_\varepsilon^{-1}\exp\Big\{-\frac{1}{2\sigma_\varepsilon^2}(1-\beta_1^2)(y_1-\mu)^2\Big\}, \qquad (9)$$

is of special interest because of the role of the first observation y₁ in our attempt to isolate the estimation problems of OLS. Since this expression is proportional to a univariate normal density with mean μ and standard deviation σε/(1 − β₁²)^{1/2}, we can conclude from the recognition method [Antelman and Roberts (1971)] that this normal density is the sampling density of the first observation ỹ₁

given μ, β₁, and σε. Thus, we can think of the first observation as a drawing from the marginal distribution of the autoregressive process, which has mean μ and variance σε²/(1 − β₁²). Intuitively, we can think of y₁ as providing information about the general level μ of the process, although it also contributes information about the parameters β₁ and σε².

The second factor in the second form of the likelihood function is proportional to the familiar sampling density of the n − 1 observations in the regression of ỹ_t on y_{t−1}; in this sense autoregression parallels ordinary regression.

Now we adopt an explicitly Bayesian view and turn to prior densities. For the prior marginal density of μ and σε, we shall work with the conventional improper, jointly diffuse density

$$p(\mu, \sigma_\varepsilon) \propto \sigma_\varepsilon^{-1}, \qquad -\infty < \mu < \infty, \quad \sigma_\varepsilon > 0. \qquad (10)$$

We shall further specify that, independently of μ and σε, the prior density of β₁ is

$$p(\beta_1) \propto (1-\beta_1)^{-1/2}(1+\beta_1)^{-1/2}, \qquad 0 \le |\beta_1| < 1. \qquad (11)$$

This density is suggested by an approximate application of Jeffreys' invariance theory [see Zellner (1971, p. 190)]. As can be seen by the substitution w =


(1 + β₁)/2, which leads to the standardized beta density proportional to k w^{−1/2}(1 − w)^{−1/2} for k > 0 and 0 < w < 1, it is a proper beta density; it is U-shaped with a minimum at β₁ = 0, and it increases without bound as β₁ approaches either −1 or +1.
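A quick numerical sketch (ours) confirms that (11) is proper: under w = (1 + β₁)/2 it becomes the Beta(1/2, 1/2) kernel, whose integral is B(1/2, 1/2) = π.

```python
import numpy as np

# Midpoint-rule integral of the unnormalized prior (11) over (-1, 1).
m = 2_000_000
b = -1.0 + (np.arange(m) + 0.5) * (2.0 / m)   # midpoints, avoiding the endpoints
density = (1.0 - b) ** -0.5 * (1.0 + b) ** -0.5
print(density.sum() * (2.0 / m), np.pi)       # both about 3.14, so the prior is proper
```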

The joint prior density for all three parameters of the model is therefore the product of (10) and (11), which is the improper density

$$p(\mu, \beta_1, \sigma_\varepsilon) \propto (1-\beta_1)^{-1/2}(1+\beta_1)^{-1/2}\,\sigma_\varepsilon^{-1}, \qquad (12)$$

with μ, β₁, and σε defined over the ranges noted for (10) and (11).

By Bayes' theorem the joint posterior density of μ, β₁, and σε is proportional to the product of the joint prior density (12) and the likelihood function (7),

$$p(\mu, \beta_1, \sigma_\varepsilon \mid y) \propto \sigma_\varepsilon^{-(n+1)}
\exp\Big\{-\frac{1}{2\sigma_\varepsilon^2}\Big[(1-\beta_1^2)(y_1-\mu)^2
+ \sum_{t=2}^{n}\big[y_t-\mu-\beta_1(y_{t-1}-\mu)\big]^2\Big]\Big\}. \qquad (13)$$

Note that the factors involving β₁ in the joint prior density (12) cancel the lead factor involving β₁ in the likelihood function (7).

The mode of the joint posterior density (13) of μ, β₁, and σε will be used as a point estimator of the stationary AR(1) model's parameters. For |β₁| < 1, maximization of (13) by elementary calculus yields necessary conditions given by the following simultaneous equations in μ and β₁:

$$\mu = \frac{(n-1)(\bar{y}_2 - \beta_1\bar{y}_1) + (1+\beta_1)y_1}{(n-1)(1-\beta_1) + (1+\beta_1)}, \qquad (14)$$

$$\beta_1 = \frac{\sum_{t=2}^{n}(y_t-\bar{y}_2)(y_{t-1}-\bar{y}_1) + (n-1)(\bar{y}_2-\mu)(\bar{y}_1-\mu)}
{\sum_{t=2}^{n}(y_{t-1}-\bar{y}_1)^2 + (n-1)(\bar{y}_1-\mu)^2 - (y_1-\mu)^2}, \qquad (15)$$

where ȳ₁ is the sample mean of the first n − 1 observations and ȳ₂ is the sample mean of the last n − 1 observations. Solution of these equations for μ and β₁, when admissible values of β₁ result, yields the estimates μ̂ and β̂₁ that we shall henceforth call Modal. Except for problems of numerical evaluation, these estimates should be the same as those yielded by the general nonlinear least squares estimation method, with backward forecasting, recommended by Box and Jenkins (1970). The Modal estimate σ̂ε² corresponding to these estimates is

$$\hat{\sigma}_\varepsilon^2 = \frac{\sum_{t=2}^{n}\big[(y_t-\hat{\mu})-\hat{\beta}_1(y_{t-1}-\hat{\mu})\big]^2 + (1-\hat{\beta}_1^2)(y_1-\hat{\mu})^2}{n+1}. \qquad (16)$$


Note that the first observation, y₁, is reflected in all of these expressions, contrary to the counterpart expressions for OLS estimators.

If the method of maximum likelihood is applied to the second factor on the second line of (7), one obtains the usual OLS estimates of β₀, β₁, and σε².
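The paper publishes no code, but the Modal point estimates can be computed by simple fixed-point iteration on (14) and (15); the sketch below (ours) assumes convergence from crude starting values and does not handle inadmissible cases with |β̂₁| ≥ 1.

```python
import numpy as np

def modal_estimates(y, tol=1e-10, max_iter=500):
    """Iterate the simultaneous equations (14)-(15), then evaluate (16)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    ybar1, ybar2 = y[:-1].mean(), y[1:].mean()  # means of first / last n-1 observations
    mu, beta1 = y.mean(), 0.0                   # crude starting values
    for _ in range(max_iter):
        mu_new = (((n - 1) * (ybar2 - beta1 * ybar1) + (1 + beta1) * y[0])
                  / ((n - 1) * (1 - beta1) + (1 + beta1)))             # eq. (14)
        num = (((y[1:] - ybar2) * (y[:-1] - ybar1)).sum()
               + (n - 1) * (ybar2 - mu_new) * (ybar1 - mu_new))
        den = (((y[:-1] - ybar1) ** 2).sum()
               + (n - 1) * (ybar1 - mu_new) ** 2 - (y[0] - mu_new) ** 2)
        beta1_new = num / den                                          # eq. (15)
        converged = abs(mu_new - mu) < tol and abs(beta1_new - beta1) < tol
        mu, beta1 = mu_new, beta1_new
        if converged:
            break
    resid = (y[1:] - mu) - beta1 * (y[:-1] - mu)
    sigma2 = ((resid**2).sum()
              + (1 - beta1**2) * (y[0] - mu) ** 2) / (n + 1)           # eq. (16)
    return mu, beta1, sigma2
```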

3. Simulations: OLS and Modal estimation and prediction

The objective of this section is to report empirical sampling distributions of OLS and Modal estimators and of predictions derived from them for new data. All results are based on the specification that the simulated data are drawings from the stationary AR(1) process defined by (2).

For purposes of comparison, we also provide prediction results for two other models: (1) the strict random walk model without drift, which can be interpreted as the non-stationary AR(1) model defined by β₀ = 0 and β₁ = +1, and (2) the stationary AR(1) model applied to the first differences, Δỹ_t = ỹ_t − ỹ_{t−1}. Strictly speaking, these two models are inconsistent with the stationary AR(1) model used to generate our simulated data. Under appropriate conditions, however, these alternative models are not too far removed from the model that generated our data.

The strict random walk model without drift requires no estimation from the data; the prediction scheme is simply to use the current observation as a point estimate of the next. The stationary AR(1) model applied to first differences must be estimated from the sample evidence. For this purpose we used both OLS and Modal estimation. In referring to prediction results for the differenced data, we shall describe these procedures as OLS-Δ and Modal-Δ. For all models and estimation methods, each forecast is a forecast of the level of ỹ_t for some t.
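Written out, the one-step prediction rules just described are as follows (a sketch of ours; the fitted coefficients passed in are hypothetical):

```python
def predict_rw(y_t):
    """Random walk without drift: the current level forecasts the next."""
    return y_t

def predict_ar1(y_t, b0, b1):
    """Fitted AR(1) on levels: y_hat = b0 + b1 * y_t."""
    return b0 + b1 * y_t

def predict_ar1_diff(y_t, y_tm1, b0d, b1d):
    """Fitted AR(1) on differences: forecast the next difference, add it to the level."""
    return y_t + b0d + b1d * (y_t - y_tm1)

# Example with hypothetical estimates: current level 2.0, previous level 1.6.
print(predict_rw(2.0))                        # 2.0
print(predict_ar1(2.0, 0.1, 0.8))             # 1.7
print(predict_ar1_diff(2.0, 1.6, 0.0, 0.1))   # 2.04
```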

Each replication of our simulation entails the generation of a sample from a stationary AR(1) process. Parameterized in terms of μ, this process is expressed by (8), repeated here for convenience,

$$\tilde{y}_t - \mu = \beta_1(\tilde{y}_{t-1} - \mu) + \tilde{\varepsilon}_t. \qquad (17)$$

In all simulations the variance of ε̃_t was set equal to unity and the process mean μ was set equal to zero, which implies that the intercept β₀ of (2) is also zero, since β₀ = μ(1 − β₁).

Our aim was to represent performance of estimation procedures for small and moderate sample sizes and for parameter values of β₁ near +1, but also extending downward to 0 for comparison. Thus we show results for seven values of β₁ (0.99, 0.95, 0.90, 0.70, 0.50, 0.20, and 0) for each of three sample sizes, n = 20, 30, and 60. All empirical distributions summarized below are based on 50 replications for each combination of β₁ and n. All computations


were performed on a Hewlett-Packard 2000F Time Sharing System at the Graduate School of Business, the University of Chicago. Details of our simulation procedure are given in the appendix.

3.1. Estimation results

We shall discuss only the results for β₁, the parameter of central interest. The most salient findings are shown in table 1. Tables containing more detailed results are available from the authors. Results are available for μ, β₀, β₁, and σε.

The OLS means for β₁ are in good agreement with the approximate bias expression given in section 1: −n⁻¹(1 + 3β₁). The Modal estimates displayed consistently smaller estimated bias than did OLS, except for β₁ = 0, where the two are about the same. The superiority of the Modal estimates over OLS increases as β₁ increases from 0 toward 1, but the Modal estimates nonetheless display substantial bias: for β₁ = 0.99 and n = 20, for example, the mean of the Modal estimates was only 0.8286. Hence it appears that Modal estimation moderates but by no means eliminates the nasty estimation problems inherent in OLS.

Table 1

Mean and mean squared error of sample estimates of slope β₁, based on 50 (occasionally fewer) replications at various combinations of β₁ and n; OLS versus Modal estimates.

                          Mean                            Mean squared error
β₁      Estimate    n = 20     n = 30     n = 60     n = 20     n = 30     n = 60
0.99    OLS         0.7865     0.7641     0.8915     0.0676     0.0758     0.0144
        Modal       0.8286     0.8117     0.9200     0.0508     0.0554     0.0088
0.95    OLS         0.7536     0.7676     0.8736     0.0659     0.0609     0.0112
        Modal       0.8076     0.8031     0.8966     0.0448     0.0494     0.0079
0.90    OLS         0.7142     0.7250     0.8239     0.0596     0.0589     0.0116
        Modal       0.7627     0.7592     0.8436     0.0443     0.0503     0.0093
0.70    OLS         0.5521     0.5591     0.6345     0.0638     0.0516     0.0126
        Modal       0.5851     0.5820     0.6474     0.0620     0.0480     0.0119
0.50    OLS         0.3766     0.3888     0.4441     0.0661     0.0469     0.0137
        Modal       0.4005     0.4024     0.4522     0.0693     0.0458     0.0131
0.20    OLS         0.1187     0.1270     0.1571     0.0596     0.0387     0.0143
        Modal       0.1298     0.1298     0.1591     0.0654     0.0402     0.0143
0.00    OLS        -0.0542    -0.0496    -0.0321     0.0538     0.0341     0.0138
        Modal      -0.0528    -0.0528    -0.0331     0.0596     0.0364     0.0143

For β₁ ≥ 0.70, the mean squared error is always less for the Modal estimates than for OLS, and the margin increases as β₁ gets closer to +1. For smaller


values of β₁, the two methods differ little in mean squared error, but there appears to be a slight edge for OLS.²

3.2. Prediction results

We now turn to simulation results for one-step-ahead predictions. We shall concentrate on the empirical means of sample mean squared prediction errors over the predictive samples of size 20, of which there are usually 50 for each empirical mean. These empirical means are summarized in table 2.³

Modal estimation shows modestly lower MSE's than does OLS for the larger β₁'s and smaller n's, but the margin narrows rapidly as β₁ is reduced or n is increased. From β₁ of 0.50 down to 0, the results are virtually the same for the two estimators. This provides support for our conjecture that exploitation of the information contained in the first observation would alleviate some of the estimation problems discussed in section 1. But the advantage of Modal estimation over OLS is overshadowed by the performance of other procedures, beginning with the random walk (RW) model.

From table 2 it is seen that for β₁ from 0.90 to 0.99 and for all sample sizes, the mean MSE for RW is substantially lower than for either OLS or Modal estimation. Even at β₁ = 0.70, RW shows a lower mean MSE for n = 20 and n = 30. Only for β₁ ≤ 0.50 does RW fall behind at all three sample sizes; it is of course very bad at β₁ = 0.⁴

Thus an analysis based on an incorrect model, RW, yields better predictive results over a surprising range of β₁'s than those achieved by Modal or OLS estimation directed toward the correct model of the simulation, stationary AR(1). Further, the sample sizes for which this happens are realistic for many serious applications of econometrics. For example, at β₁ = 0.90 and n = 30, RW has

²In table 1 all MSE's of parameter estimates were computed indirectly as the sum of the squared bias and squared standard deviation, the latter based on the divisor n − 1. In results for predictions of future observations, discussed subsequently, MSE's were computed directly.

Standard deviations are not displayed explicitly, but these do not differ as substantially as do some of the means and mean squared errors. The Modal estimate has a lower standard deviation for β₁ = 0.95 and β₁ = 0.99, but for lower values there is, on balance, a slight edge for OLS.

The empirical distributions for both OLS and Modal estimates depart sharply from normality for the largest values of β₁, with a preponderance of substantially negative skewness and substantially positive kurtosis coefficients. These departures, which were also quite visible in normal probability plots made during our study, are less noticeable for larger n's and smaller β₁'s.

³The empirical distributions of individual MSE's are skewed to the right for all n and β₁, both for OLS and Modal predictions. The skewness is most extreme for small n and large β₁.

⁴Since RW requires no statistical estimation, and since the 20 randomly generated ε̃'s are the same for each of the three n's at each β₁, the RW means should in principle be the same on each line of the table. Actually, some slight differences occur for two reasons: (1) the starting value for each predictive sample is different for the three n's, even though the 20 ε's are the same; (2) some replications were deleted because of inadmissible estimates of β̂₁, as explained in the appendix.


Table 2

Mean of mean squared prediction errors for 20 new observations, based on 50 (occasionally fewer) replications at various combinations of β₁ and n, for OLS, Modal, and RW.

β₁      Procedure    n = 20     n = 30     n = 60
0.99    OLS          1.9785     2.2261     1.2837
        Modal        1.7534     1.8714     1.2034
        RW           1.0402     1.0807     1.0688
0.95    OLS          2.1737     1.7816     1.2422
        Modal        1.6880     1.6182     1.2007
        RW           1.0484     1.0891     1.0855
0.90    OLS          1.8256     1.5349     1.1942
        Modal        1.5891     1.4543     1.1719
        RW           1.0888     1.1126     1.1036
0.70    OLS          1.3743     1.2433     1.1247
        Modal        1.3431     1.2343     1.1214
        RW           1.2269     1.2298     1.2269
0.50    OLS          1.2872     1.1825     1.1113
        Modal        1.2825     1.1839     1.1112
        RW           1.3832     1.3903     1.3890
0.20    OLS          1.2382     1.1579     1.1059
        Modal        1.2426     1.1631     1.1072
        RW           1.7183     1.7231     1.7232
0.00    OLS          1.2245     1.1544     1.1035
        Modal        1.2297     1.1604     1.1050
        RW           2.0553     2.0553     2.0553

a mean MSE of 1.1126 as opposed to 1.4543 and 1.5349 for the Modal and OLS procedures, respectively. Yet β₁ = 0.90 represents a substantial degree of mean reversion and n = 30 represents the number of years usually available for annual time series since World War II.

We also studied predictions based on the AR(1) model applied to first differences. Like the RW model, this model is incorrect for our simulations; unlike the RW model, it requires estimation of parameters from the data. Table 3 is the counterpart of table 2 when OLS and Modal estimation are applied to the differences Δỹ_t. For all β₁ and all n, the differences between OLS-Δ and Modal-Δ are so small that we shall simply discuss OLS-Δ.⁵

For the applications of interest to us, the results for β₁ ≥ 0.90 are especially important. In this range OLS-Δ very substantially outperforms OLS and Modal estimates and is only slightly poorer than RW. Even at β₁ = 0.70, OLS-Δ

⁵The mean MSE's of predictions for RW are essentially the same as in table 2, the slight differences being attributable to the fact that all 50 replications were usable for every combination of β₁ and n when estimation was based on differences, whereas there were deletions, explained in the appendix, in the earlier simulations based on levels.


Table 3

Mean of mean squared prediction errors for 20 new observations, based on 50 replications at various combinations of β₁ and n, for OLS-Δ, Modal-Δ, and RW.

β₁      Procedure    n = 20     n = 30     n = 60
0.99    OLS-Δ        1.2344     1.1647     1.1093
        Modal-Δ      1.2455     1.1631     1.1108
        RW           1.0868     1.0702     1.0728
0.95    OLS-Δ        1.2524     1.1697     1.1105
        Modal-Δ      1.2604     1.1692     1.1118
        RW           1.0829     1.0832     1.0855
0.90    OLS-Δ        1.2539     1.1794     1.1198
        Modal-Δ      1.2597     1.1800     1.1209
        RW           1.1065     1.1038     1.1036
0.70    OLS-Δ        1.3109     1.2718     1.2174
        Modal-Δ      1.3139     1.2734     1.2184
        RW           1.2269     1.2298     1.2269
0.50    OLS-Δ        1.4050     1.3763     1.3281
        Modal-Δ      1.4065     1.3780     1.3293
        RW           1.3832     1.3903     1.3891
0.20    OLS-Δ        1.5558     1.5248     1.4888
        Modal-Δ      1.5570     1.5268     1.4902
        RW           1.7183     1.7231     1.7232
0.00    OLS-Δ        1.6557     1.6171     1.5901
        Modal-Δ      1.6575     1.6193     1.5917
        RW           2.0553     2.0553     2.0553

outperforms OLS and Modal estimates for n = 20 and is only slightly worse at n = 30. Of course, the performance of OLS-Δ deteriorates as β₁ decreases from 0.70 to 0.

The predictive performance of OLS-Δ (and, equally, of Modal-Δ), and that of RW, overshadows the results for Modal estimation applied to the original data. If an analyst has strong prior evidence that he is dealing with a stationary AR(1) process with β₁ ≥ 0.90, or perhaps even β₁ ≥ 0.70, and if his sample size is within the range studied here, he would be better off (at least as far as one-step predictions are concerned) to apply OLS to the differences rather than to follow the apparently orthodox procedure of analyzing the stationary AR(1) directly by OLS or Modal estimation.

The apparent unorthodoxy of this advice disappears under more careful scrutiny. The RW model can be regarded as a rough device for approximating the result of a Bayesian analysis that places high prior probability on values of β₁ close to +1. Although it is less obvious, we shall show in section 5 how OLS-Δ and Modal-Δ can be regarded in the same light, although they of course permit and require statistical fitting.


The traditional rationale for differencing time series has been to transform a non-stationary process into a stationary one. Our results point to a different, though complementary, role. They suggest that differencing data from a stationary AR(1) process and analyzing the differences as if the original data were drawings from a non-stationary process can be wise when near non-stationarity of the original process is strongly suspected. Unlike the traditional rationale, our rationale is applicable only when the sample size is not so large as to permit good performance of standard estimation techniques on the original, undifferenced data.

The results point to another useful consequence of differencing. When differencing is applied to a nearly nonstationary AR(1) process, OLS estimators and their associated statistical error formulas may be entirely satisfactory for practical estimation.⁶

4. Why standard procedures do poorly in the nearly non-stationary case

Suppose that sufficient a priori information were available to justify the assessment that the process mean μ could be regarded as a known number, say μ = 0. Then the least squares estimate of β₁ is

$$\tilde{b}_1 = \sum_{t=2}^{n} y_{t-1}y_t \Big/ \sum_{t=2}^{n} y_{t-1}^2.$$

If we approximate the sampling expectation of b̃₁ by the ratio of the expectation of the numerator to that of the denominator, we find that

$$E\tilde{b}_1 \approx \frac{(n-1)\beta_1/(1-\beta_1^2)}{(n-1)/(1-\beta_1^2)} = \beta_1.$$

To this order of approximation, the bias problem disappears. We therefore conjectured that the estimation difficulties documented in earlier sections would largely disappear if μ could be treated as known.
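A small Monte Carlo sketch of this conjecture (ours; the seed and replication count are arbitrary choices) compares the usual OLS slope with the known-mean estimator above:

```python
import numpy as np

rng = np.random.default_rng(1)  # assumed seed
n, beta1, reps = 20, 0.9, 2000

ols, known_mean = [], []
for _ in range(reps):
    y = np.empty(n)
    y[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - beta1**2))  # y1 from the marginal
    for t in range(1, n):
        y[t] = beta1 * y[t - 1] + rng.normal()
    x, z = y[:-1], y[1:]
    xc, zc = x - x.mean(), z - z.mean()
    ols.append((xc * zc).sum() / (xc * xc).sum())       # slope with estimated mean
    known_mean.append((x * z).sum() / (x * x).sum())    # slope with mu = 0 imposed

print(np.mean(ols) - beta1, np.mean(known_mean) - beta1)
# The known-mean estimator shows markedly smaller (though not zero) bias.
```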

The essential nature of the problem can be illuminated by a single simulation. Consider AR(1) with μ = 0, β₁ = 0.90, and unit variance of the disturbance ε̃. The marginal standard deviation of this process is 1/(1 − β₁²)^{1/2} = 2.294. We started out the simulation at y₁ = 5, slightly more than two standard deviations

⁶One referee has reminded us that a first-order moving average model applied to the differences should, in the circumstances of interest, be about equally effective. Although we have not explored this suggestion, we are inclined to agree with the conclusion, since any process whose sole departure from random behaviour is reflected in a modest first-order autocorrelation coefficient (say −0.2 to 0.2) can ordinarily be equally well fitted by an AR(1) or an MA(1) model. The differences Δỹ_t resulting from near-random walk series ỹ_t are typically of this description.


above the process mean. For all 375 observations, the sample mean and standard deviation were −0.110 and 2.223, respectively, in close conformity with the process values of 0 and 2.294, respectively. Further, a normal probability plot suggested reasonably close conformity with the normal model. The OLS autoregression gave the equation for fitted values as ŷ_t = −0.014 + 0.898y_{t−1}, close to the model equation Eỹ_t = 0 + 0.9y_{t−1}.

Fig. 1. Simulation of AR(1) with μ = 0, β₁ = 0.9, var(ε̃) = 1, y₁ = 5 and n = 375; scatter plot for all 374 pairs (y_{t−1}, y_t).

Hence the empirical joint distribution of the 374 pairs (y_{t−1}, y_t), shown as a standardized scatter plot in fig. 1, gives an idea of the joint distribution of ỹ_t and ỹ_{t−1}. The OLS fitted line −0.014 + 0.898y_{t−1} is superimposed on this plot.

For the first 20 observations, the corresponding scatter plot is shown in fig. 2. Here the OLS equation is ŷ_t = 0.544 + 0.581y_{t−1}. The means and standard deviations of the 19 y_{t−1}'s and y_t's, displayed within fig. 2, are far removed from the process values. In particular, the means differ substantially from 0 and the standard deviations are much smaller than 2.294. Although our choice of starting value of y₁ = 5 makes the example a little extreme, all these features


Fig. 2. Simulation of AR(1) with μ = 0, β₁ = 0.9, var(ε̃) = 1, y₁ = 5 and n = 375; scatter plot for the first 19 pairs (y_{t−1}, y_t).

are typical of what to expect when β₁ is close to +1, as we have found from other simulations. The evidence from these simulations can be explained by the reasoning sketched below.⁷

Fig. 2 shows two regression lines besides the OLS fit for all 374 pairs: the OLS fit 0.544 + 0.581y_{t−1} for the first 20 observations, and the OLS fit for these same observations with the specification μ = 0, which is 0 + 0.772y_{t−1}. The latter is much closer to the model 0 + 0.9y_{t−1}, a fact that reflects a general phenomenon that is almost self-evident from a glance at the graphs. That is, for β₁ = 0.90 and n = 20, the sample variance tends to be much less than that of the AR(1) process. The sample points in fig. 2 trace out a subregion of the scatter plot of fig. 1 for all 374 points. The best-fitting regression line is typically much flatter in this subregion, as is indicated by expression (3) in section 1 for the least-squares bias in slope estimation.

The curtailment of sample variance can be seen by examination of the expectation of the sample variance S̃² = Σ(ỹ_t − ȳ)²/(n − 1) when the variance of the

⁷A detailed analytical treatment appears in the previous version of the paper, which is available from the authors.


disturbance ε̃_t is unity,

$$E\tilde{S}^2 = \frac{1}{1-\beta_1^2}\left\{1 - \frac{2\beta_1}{n(n-1)}\left[\frac{n(1-\beta_1^{\,n-1})}{1-\beta_1} - \frac{1-n\beta_1^{\,n-1}+(n-1)\beta_1^{\,n}}{(1-\beta_1)^2}\right]\right\}. \qquad (18)$$

For n = 20 and β₁ = 0, (18) gives ES̃² = 1, the familiar result that the expectation of the sample variance is equal to that of the process in simple random sampling. For n = 20 and β₁ = 0.90, the process variance is 1/(1 − β₁²) = 5.263; (18) shows that the expectation of the sample variance is only 2.467, less than half that of the process.
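A numerical sketch (ours) of (18), reproducing the two cases just cited:

```python
def expected_sample_var(n, beta1):
    """E(S^2) for a stationary AR(1) with unit disturbance variance, eq. (18)."""
    if beta1 == 0.0:
        return 1.0  # simple random sampling
    inner = (n * (1.0 - beta1 ** (n - 1)) / (1.0 - beta1)
             - (1.0 - n * beta1 ** (n - 1) + (n - 1) * beta1**n) / (1.0 - beta1) ** 2)
    return (1.0 - 2.0 * beta1 * inner / (n * (n - 1))) / (1.0 - beta1**2)

print(expected_sample_var(20, 0.0))  # 1.0
print(expected_sample_var(20, 0.9))  # about 2.467, versus the process variance 5.263
```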

To confirm the importance of precise knowledge about μ in making inferences about AR(1) processes, we reran the simulations of section 3 setting all sample means equal to the true mean μ = 0. The results are given in tables available from the authors. These results are consistent with the conjecture that the major difficulties in estimating the stationary AR(1) model with β₁ near unity turn on uncertainty about μ. They also confirm conjectures made many years ago by Orcutt as to the 'likely' sources of bias in estimation of serial correlation coefficients; see, for example, the discussion in Orcutt (1948, sec. V).

5. Why does differencing work well?

When the general linear regression model with normal, homoskedastic disturbances is appropriate in an application, the statistician has everything going for him. Theoretical desiderata such as the existence of similar regions and minimal sufficient statistics make for well-behaved inferences regardless of true parameter values (or of the statistician's views on the foundations of statistics). An approximation to this happy state of affairs obtains for the AR(1) process even when n is small if β₁ is, say, within the interval (0, 0.5), as is seen by the simulation results of section 3. When β₁ is close to +1, however, much larger n's or precise knowledge of the process mean are needed for satisfactory inferences, but in many applications neither is available. In these applications the behavior of parameter estimates and of one-step-ahead predictions, whether made by OLS or Modal estimation, does not conform to what we expect in well-behaved applications. Bias and mean squared error of one-step-ahead predictions are actually much worse than for RW, OLS-Δ, and Modal-Δ, all of which are based on the wrong model.

Our results can be better understood by an appeal to Bayesian concepts. In the well-behaved applications of the linear model, a jointly diffuse prior distribution will lead to a posterior distribution that can be thought of intuitively as the mirror image in the parameter space of the sampling distribution of

Page 16: Differencing of random walks and near random walks

304 N. J. Gonedes and H. V. Roberts, Diflerencing of random walks and near random walks

sufficient statistics in the sample space. Thus, in the simple model of a single normal mean with random sampling, the posterior probability that the process mean is above the posterior mean is 0.5 if the prior distribution is diffuse. But few staunch Bayesians, confronted by a 'near random walk', would assess the probability at 0.5 that β₁ was less than a Modal estimate computed from a sample of 20. If n = 20, it is little consolation to realize that the likelihood function would eventually dominate the diffuse joint prior distribution described in section 2 as n increases.

The fully Bayesian approach would require the assessment of a joint prior distribution for μ, β₁, and σε² that would be non-diffuse at least for the marginal distribution of β₁. Differencing, however, can be regarded as a way to achieve much the same result. We now explain why this is so.

If ỹ_t evolves from a stationary AR(1) process, then Δỹ_t evolves according to an autoregressive-moving average ARMA(1, 1) model with a process mean of zero and moving average parameter of unity. In practice, however, statisticians differencing data from a nearly nonstationary AR(1) process are likely to contemplate the differenced series as a sample from a stationary AR(1) process or perhaps a pure moving average process, MA(1). We shall consider the AR(1) model for the differences.

Suppose then that the process for ỹ_t is approximated by the AR(1) model on Δỹ_t,

$$\Delta\tilde{y}_t = \beta_0' + \beta_1'\Delta\tilde{y}_{t-1} + \tilde{\varepsilon}_t', \qquad |\beta_1'| < 1, \qquad (19)$$

where β₀′ may or may not be constrained to equal zero. In terms of ỹ_t this model can be written

$$\tilde{y}_t = \beta_0' + (1+\beta_1')\tilde{y}_{t-1} - \beta_1'\tilde{y}_{t-2} + \tilde{\varepsilon}_t'. \qquad (20)$$

The model in (20) defines an AR(2) process that is not stationary because the slope coefficients are constrained to sum to unity: (1 + β₁′) + (−β₁′) = 1. The price of differencing is the loss of any possibility of drawing an inference about the mean of the AR(1) stationary model that is here assumed to be correct.
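A short sketch (ours; the seed and β₁ are assumed values) of the identity between (19) and (20): fit AR(1) by OLS to the differences of a simulated near random walk and read off the implied constrained AR(2) slopes.

```python
import numpy as np

rng = np.random.default_rng(2)  # assumed seed
n, beta1 = 60, 0.95

# Simulate the stationary AR(1) of (2) with beta0 = 0.
y = np.empty(n)
y[0] = rng.normal(scale=1.0 / np.sqrt(1.0 - beta1**2))
for t in range(1, n):
    y[t] = beta1 * y[t - 1] + rng.normal()

# OLS fit of the AR(1) model (19) to the differences.
d = np.diff(y)
X = np.column_stack([np.ones(len(d) - 1), d[:-1]])
b0d, b1d = np.linalg.lstsq(X, d[1:], rcond=None)[0]

# Implied constrained AR(2) representation (20) for the levels:
# y_t = b0d + (1 + b1d) * y_{t-1} - b1d * y_{t-2} + e_t.
print((1 + b1d) + (-b1d))  # the slopes sum to one (up to floating point)
```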

If we write the general AR(2) process as

$$\tilde{y}_t = \beta_0 + \beta_1\tilde{y}_{t-1} + \beta_2\tilde{y}_{t-2} + \tilde{\varepsilon}_t,$$

we can visualize the parameter subspace of β₁ and β₂ as in the diagram below [see Box and Jenkins (1970, pp. 58-59)]. The region of stationarity for (β₁, β₂) is the interior of the triangle. The constraint β₁ + β₂ = 1 implied by (19) requires that the parameters (β₁, β₂) lie on the right-hand boundary of the triangle or its extension, as shown.


Fig. 3

The AR(1) stationary model, by contrast, constrains (β₁, β₂) to the interior horizontal segment at height β₂ = 0. The intersection of this horizontal segment with the right-hand boundary defines the RW model.

From the viewpoint of AR(2) as in (20), the specification β₁ + β₂ = 1 can be interpreted as a non-diffuse prior density, since all the marginal probability is placed on a one-dimensional subspace, with RW included as the special case β₁ = 1, β₂ = 0. From the viewpoint of AR(1) as in (19), however, the same specification can be regarded as consistent with the assessment of a diffuse prior density for β₁′. But our simulations suggest that OLS inferences based on AR(1) for Δỹ_t will have sampling properties very much like OLS inferences in the standard linear model. The inferences about β₁′ from the simulations of section 3 typically put most of the posterior probability close to zero if we start from a diffuse prior. Those simulations suggest further that the differenced model so estimated will perform well in prediction for small and moderate n and large β₁, even though the differenced model is literally disjoint from the stationary AR(1) model for ỹ_t.

By differencing, we alter the model from AR(1) stationary to AR(2) constrained and non-stationary. At the same time we implement the prior judgment that we are dealing with an application that entails a near random walk, and we obtain the computational advantages of working with a diffuse prior distribution in estimation of the substitute model.

If the process is really AR(1) stationary, the advantages of differencing diminish for larger sample sizes. But as sample sizes grow, so do the possibilities of structural changes that erode simple models. As Gonedes (1973) has shown, there can come a point at which further nominal expansion of sample size leads to poorer predictive performance of the fitted model when it is tried out on new data. Hence the sample sizes of our simulations, 20 through 60, may represent an important segment of practical econometric work.

The usual reason for differencing in time series work is to attain stationarity. Our work suggests a complementary rationale under which differencing is applied when one suspects a model that is stationary but near the boundary of non-stationarity. The differences can then be analyzed by a model known to be incorrect but that can be expected to give more satisfactory predictive performance.


Appendix

To start out each sample, we generated a value ỹ₁ from the marginal distribution of ỹ_t, which is a normal distribution with mean μ = 0 and

$$\mathrm{var}(\tilde{y}_t) = \sigma_\varepsilon^2/(1-\beta_1^2) = 1/(1-\beta_1^2),$$

since the variance of the disturbance was set equal to 1. By this device the sample for each replication can be viewed as an independent drawing from the process. If we had always set ỹ₁ = 0, a procedure sometimes followed, the simulations would have tended to overrepresent the behavior of the process when it is close to its mean [except, of course, for β₁ = 0, when (17) reduces to the independent normal process]. This can have an important effect on estimation results, particularly when β₁ is close to unity. If we had permitted the observed y_n from one simulation to be the starting value from which ỹ₁ of the next replication is generated, we would have induced a dependence between successive replications (again, β₁ = 0 is the exception).
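A minimal sketch (ours; the seed is an arbitrary choice) of the starting-value device just described:

```python
import numpy as np

def simulate_ar1(n, beta1, rng):
    """One replication: draw y1 from the process marginal, then recurse by (17)."""
    y = np.empty(n)
    y[0] = rng.normal(scale=np.sqrt(1.0 / (1.0 - beta1**2)))  # marginal of y_t
    for t in range(1, n):
        y[t] = beta1 * y[t - 1] + rng.normal()                # unit-variance disturbance
    return y

rng = np.random.default_rng(3)  # assumed seed
replications = [simulate_ar1(60, 0.99, rng) for _ in range(50)]  # independent drawings
```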

For each replication, we first generated the sample for n = 60 by the scheme just described. This required 60 normal random numbers with mean zero and variance unity based on 60 rectangular random numbers generated by the pseudorandom number algorithm of Hewlett-Packard Time Sharing BASIC. Since there are 50 replications for each β₁ and n, we thus generated 60 × 50 = 3,000 observations on ε̃_t. For each replication, the samples of sizes 20 and 30 were drawn by taking every third and every second observation, respectively, from the sample of 60. Finally, each replication was used once for each of the seven values of β₁ from 0 to 0.99. Thus on each replication everything was held constant except for the value of β₁.

On the basis of the statistical analysis of each replication for each value of β₁, we made one-step-ahead predictions for 20 additional observations. All 20 predictions were made without updating the coefficient estimates from the original sample of 20, 30, or 60; the fitted model was applied repeatedly for each new observation. The 20 additional observations were generated as follows. Imagine a circular arrangement of all the 3,000 observations on ε̃_t. The 50 replications for statistical estimation were obtained by using each consecutive group of 60 numbers as just explained. The prediction sample of 20 new observations for each group of 60 numbers was generated from the first 20 observations in the next set of 60 observations on ε̃_t. The starting value for each such prediction sample was set equal to the final observation, y_n, of the corresponding estimation sample. All predictions are predictions of y_t, obtained directly or, for the differenced models, by adding the forecast of Δỹ_t to y_{t−1}.

For certain combinations of β₁ and n, the OLS and/or Modal estimates occasionally violated the constraint β̂₁ < 1. These results are excluded from our summaries, both of empirical distributions of estimates and of predictive tests


on the 20 additional observations. The numbers of excluded estimates are available from the authors, as are tables of more detailed results than those presented in the text. No estimates had to be excluded for β₁ ≤ 0.70 for all n, and for β₁ = 0.95 and β₁ = 0.90 for n = 60. In order to determine whether the exclusions substantially affected our inferences, we assigned a value of 0.999 to any estimate of β₁ that exceeded unity and then recomputed the means of the empirical distributions of the resulting slope estimates. Although all means of empirical distributions were, of course, increased, the increase had no substantial effect on the comparisons between OLS and Modal estimates.

The problem of inadmissible slope estimates did not arise for the estimation procedures based on differences, OLS-Δ and Modal-Δ. But since the predictive performance of the random walk was included as a base of comparison in the predictive tests of both OLS and Modal estimation and OLS-Δ and Modal-Δ, there were occasional slight discrepancies between the predictive performance of the RW model in the two sets of comparisons. For example, consider β₁ = 0.99 and n = 20, for which, in accord with our criterion, we used only 35 of the 50 replications. The sample mean of the 35 mean squared errors on the predictive test for the RW model is 1.0402, as is shown in the corresponding summary tables for OLS and Modal estimates. But in the tables for OLS-Δ and Modal-Δ, where all 50 replications were used, the sample mean of the 50 mean squared errors for RW is 1.0863. All such differences are small and unsystematic.

References

Antelman, Gordon and Harry V. Roberts, 1971, Finding sampling distributions by a recognition method, Journal of the American Statistical Association, March, 136-141.

Box, G.E.P. and G.M. Jenkins, 1970, Time series analysis: Forecasting and control (Holden- Day, San Francisco, CA).

Cochrane, Donald and Guy H. Orcutt, 1949, Application of least squares regression to relationships containing autocorrelated error terms, Journal of the American Statistical Association, March, 32-61.

Copas, J.B., 1966, Monte Carlo results for estimation in a stable Markov time series, Journal of the Royal Statistical Society, Ser. A, 129, pt. 1, 110-116.

Dixon, Wilfred J., 1944, Further contributions to the problem of serial correlation, Annals of Mathematical Statistics, 119-144.

Gonedes, Nicholas J., 1973, Evidence on the information content of accounting numbers, Journal of Financial and Quantitative Analysis, June, 407-444.

Graybill, F.A., 1969, Introduction to matrices with applications in statistics (Wadsworth, Belmont, CA).

Hurwicz, Leonid, 1950, Least squares bias in time series, in: T. Koopmans, ed., Statistical inference in dynamic economic models (Wiley, New York) 365-383.

Kendall, M.G., 1954, Note on bias in the estimation of autocorrelation, Biometrika 41, 403-404.

Marriott, F.H.C. and J.A. Pope, 1954, Bias in the estimation of autocorrelations, Biometrika 41, 393-403.

Orcutt, Guy H., 1948, A study of the autoregressive nature of the time series used for Tinbergen's model of the economic system of the United States, 1919-1932 (with discussions), Journal of the Royal Statistical Society, Ser. B, 10, no. 1, 1-53.


Orcutt, Guy H. and Donald Cochrane, 1949, A sampling study of the merits of autoregressive and reduced form transformations in regression analysis, Journal of the American Statistical Association, Sept., 356-372.

Orcutt, Guy H. and H.S. Winokur, Jr., 1969, First order autoregression: Inference, estimation and prediction, Econometrica, Jan., 1-14.

Press, S. James, 1972, Applied multivariate analysis (Holt, Rinehart and Winston, New York).

Shenton, L.R. and W.L. Johnson, 1965, Moments of a serial correlation coefficient, Journal of the Royal Statistical Society, Ser. B, 27, no. 2, 308-320.

Thornber, Hodson, 1967, Finite sample Monte Carlo studies: An autoregressive illustration, Journal of the American Statistical Association, Sept., 801-818.

White, J.S., 1961, Asymptotic expansions for the mean and variance of the serial correlation coefficient, Biometrika 48, 85-94.

Yule, G. Udny, 1926, Why do we sometimes get nonsense-correlations between time series? A study in sampling and the nature of time series (with discussions), Journal of the Royal Statistical Society 89, no. 1, Jan., 1-69.

Zellner, Arnold, 1971, Introduction to Bayesian inference in econometrics (Wiley, New York).