    ADVANCED FORECASTING MODELS USING SAS SOFTWARE

    Girish Kumar Jha

    IARI, Pusa, New Delhi 110 012 [email protected]

1. Transfer Function Model

Univariate ARIMA models are useful for the analysis and forecasting of a single time series. In such cases we can only relate the series to its own past and do not explicitly use the information contained in other pertinent time series. In many situations, however, a time series is not only related to its own past but may also be influenced by the present and past values of other time series. Models that can accommodate such situations are referred to as transfer function models (Box et al., 1994). Transfer function models, which are extensions of the familiar linear regression model, have been widely used in various fields of research. In macroeconomics, transfer function models can be used to study the dynamic interrelationships among the variables in an economic system. In marketing, these models are used to determine the factors, such as advertisement, competition, or economic conditions, that may affect the sale of certain products. Because of their close relationship with regression models, transfer function models are also referred to as dynamic regression models (Pankratz, 1983).

The transfer function approach to modeling a time series is a multivariate way of modeling the various lag structures found in the data. It is similar to a distributed lag model in traditional econometrics. There may seem to be a close relationship between transfer function models and multiple regression (OLS), but transfer function models differ from regression models in the way they use the explanatory variables to forecast the dependent variable. A simple transfer function model assumes a contemporaneous relationship between the explanatory variables and the dependent variable (the forecast of an explanatory variable at time t+1 explains the behavior of the dependent variable at time t+1).
A general transfer function model extends the simple transfer function approach to include previous, or lagged, values of the explanatory variables (a general transfer function can use the forecast of an explanatory variable at time t+1 to explain the behavior of the dependent variable at time t+2). Transfer function models can use more than one explanatory variable, but the explanatory variables must be linearly independent of each other. Transfer function models use forecast values of the explanatory variables to forecast the values of the dependent variable, and the variability in the forecasts of the explanatory variables is incorporated into the forecasts of the dependent variable. To model the dependent variable with a simple transfer function model, we need to perform more tasks than are required for a regression model. The following steps are involved in modeling a simple transfer function:

1. Identify a model to describe the explanatory variables
2. Estimate the model for the explanatory variables
3. Identify and estimate the regression model for the dependent variable, using the explanatory variables and an appropriate error process
4. Forecast the dependent variable
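The steps above can be sketched in miniature with simulated data. The following Python snippet is purely illustrative (it is not part of the original SAS material; the series, coefficients, and helper names are all hypothetical, and plain least squares stands in for PROC ARIMA): it fits an AR(1) model to the explanatory series, regresses the dependent series on it, and then chains the two forecasts.

```python
# Illustrative sketch of the four transfer-function steps with simulated data
# (hypothetical series and coefficients; OLS stands in for PROC ARIMA).
import numpy as np

rng = np.random.default_rng(42)
n = 400

# Steps 1-2: identify and estimate a model for the explanatory series x.
# Here x is simulated from, and then refitted as, an AR(1) process.
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.normal()
A = np.column_stack([np.ones(n - 1), x[:-1]])
c_hat, phi_hat = np.linalg.lstsq(A, x[1:], rcond=None)[0]

# Step 3: estimate the regression model for the dependent series y.
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=n)
B = np.column_stack([np.ones(n), x])
b0_hat, b1_hat = np.linalg.lstsq(B, y, rcond=None)[0]

# Step 4: forecast x one step ahead, then feed that forecast into the model
# for y -- the forecast of y inherits the uncertainty in the forecast of x.
x_next = c_hat + phi_hat * x[-1]
y_next = b0_hat + b1_hat * x_next
print(phi_hat, b1_hat, y_next)
```

This makes concrete why the explanatory variable must be modeled first: the forecast of y at t+1 cannot be computed without a forecast of x at t+1.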


Thus, we have to model the explanatory variables before using them to model the dependent variable, and then forecast with the transfer function model. Forecasting with a regression model does not require any modeling of the explanatory variables.

1.1 Example of Transfer Function Model

For example, suppose we want to model the effect of an advertising campaign on sales. As we know, the effect of an advertising campaign lasts for some time beyond the end of the campaign. Hence, monthly sales figures (y) may be modeled as a function of the advertising expenditure in each of the past few months. We will model the sales series as a regression against the advertising expenditure of the current month and the past few months. We can use PROC ARIMA to fit a simple transfer function model, as illustrated by the following SAS statements:

data sale;
title 'Estimate the Model for the dependent Variable';
title2 "t=time y=sale volume x = advertising expenditure";
input t y x;
datalines;
1 12.0 15
2 20.5 16
3 21.0 18
4 15.5 27
5 15.3 21
6 23.5 49
7 24.5 21
8 21.3 22
9 23.5 28
10 28.0 36
11 24.0 40
12 15.5 3
13 17.3 21
14 25.3 29
15 25.0 62
16 36.5 65
17 36.5 46
18 29.6 44
19 30.5 33
20 28.0 62
21 26.0 22
22 21.5 12
23 19.7 24
24 19.0 3
25 16.0 5
26 20.7 14
27 26.5 36
28 30.6 40
29 32.3 49


30 29.5 7
31 28.3 52
32 31.3 65
33 32.2 17
34 26.4 5
35 23.4 17
36 16.4 1
;

proc print data=sale;
run;

proc arima data=sale;
   identify var=y crosscorr=(x) noprint;
   estimate input=((1 2 3) x);
run;

The output for the above SAS code is given below:

Estimate the Model for the dependent Variable
t=time y=sale volume x = advertising expenditure

The ARIMA Procedure

Conditional Least Squares Estimation

                         Standard               Approx
Parameter    Estimate       Error    t Value    Pr > |t|    Lag    Variable    Shift
MU           13.61539     1.87392       7.27


Autocorrelation Check of Residuals

 To        Chi-            Pr >
Lag      Square     DF    ChiSq    --------------------Autocorrelations--------------------
  6       12.80      6   0.0463     0.482   0.191   0.248   0.110   0.016   0.071
 12       17.73     12   0.1240     0.248   0.192   0.041  -0.008   0.000  -0.087
 18       19.54     18   0.3592    -0.067  -0.050  -0.134   0.011   0.048  -0.031
 24       48.37     24   0.0023    -0.042  -0.036  -0.152  -0.367  -0.289  -0.140

Model for variable y
Estimated Intercept 13.61539

The ARIMA Procedure
Input Number 1
Input Variable x
Numerator Factors
Factor 1: 0.14644 + 0.15063 B**(1) + 0.05018 B**(2) + 0.0272 B**(3)

The CROSSCORR= option of the IDENTIFY statement prints sample cross-correlation functions that show the correlation between the response series and the input series at different lags. The sample cross-correlation function can be used to help identify the form of the transfer function appropriate for an input series. In this case, the following model has been estimated:

    Y_t = μ + (ω_0 + ω_1 B + ω_2 B^2 + ω_3 B^3) X_t + a_t

where B is the backshift operator. This example models the effect of advertising expenditure (x) on sales (y) as a linear function of the current and three most recent values of advertising expenditure. It is equivalent to a multiple linear regression of sales (y) on x, LAG(x), LAG2(x), and LAG3(x). This is an example of a transfer function with one numerator factor. The numerator factors of a transfer function for an input series are like the MA part of an ARMA model for the noise series. We can also use transfer functions with denominator factors. The denominator factors of a transfer function for an input series are like the AR part of an ARMA model for the noise series. Denominator factors introduce exponentially weighted, infinite distributed lags into the transfer function. To specify transfer functions with denominator factors, we place the denominator factors after a slash (/) in the INPUT= option.
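Since a one-numerator-factor transfer function is equivalent to a regression of y on x, LAG(x), LAG2(x), and LAG3(x), the equivalence is easy to check numerically. The following Python sketch is illustrative only (the coefficients are made-up values loosely echoing the SAS output above, and the data are simulated, not the sales data):

```python
# Check: a one-numerator-factor transfer function is a regression of y on
# x and its first three lags (simulated data, hypothetical coefficients).
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
omega = np.array([0.15, 0.15, 0.05, 0.03])   # w0..w3 (made-up values)
mu = 13.6

# Build y_t = mu + w0*x_t + w1*x_{t-1} + w2*x_{t-2} + w3*x_{t-3} + noise.
lags = np.column_stack([np.roll(x, i) for i in range(4)])
y = mu + lags @ omega + rng.normal(scale=0.1, size=n)

# Drop the first 3 rows, where np.roll wraps around, and fit by OLS.
design = np.column_stack([np.ones(n), lags])[3:]
coef = np.linalg.lstsq(design, y[3:], rcond=None)[0]
print(np.round(coef, 2))   # close to [13.6, 0.15, 0.15, 0.05, 0.03]
```

Ordinary least squares recovers the intercept and the four numerator weights, which is exactly what the CLS estimates in the SAS output correspond to.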
For example, the following statements estimate the advertising expenditure effect as an infinite distributed lag model with exponentially declining weights:

proc arima data=sale;
   identify var=y crosscorr=x;
   estimate input=( / (1) x );
run;

The transfer function specified by these statements is:

    [ω_0 / (1 − δ_1 B)] X_t


This transfer function can also be written in the equivalent form:

    ω_0 (1 + δ_1 B + δ_1^2 B^2 + δ_1^3 B^3 + ...) X_t = ω_0 Σ_{i≥0} δ_1^i B^i X_t

which converges for |δ_1| < 1.
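The equivalence of the recursive (denominator-factor) form and the exponentially weighted infinite distributed lag can be verified numerically. The following Python sketch uses arbitrary illustrative values for ω_0 and δ_1 (written w0 and d1 in the code):

```python
# Numerical check that w0/(1 - d1*B) applied to x equals the infinite
# distributed lag w0 * sum_i d1**i * x_{t-i} (illustrative parameter values).
import numpy as np

rng = np.random.default_rng(1)
w0, d1 = 0.8, 0.6
x = rng.normal(size=50)

# Recursive form implied by the denominator factor: v_t = w0*x_t + d1*v_{t-1}
v = np.zeros_like(x)
for t in range(len(x)):
    v[t] = w0 * x[t] + (d1 * v[t - 1] if t > 0 else 0.0)

# Explicit sum of exponentially declining weights over the available history
s = np.array([sum(w0 * d1**i * x[t - i] for i in range(t + 1))
              for t in range(len(x))])

print(np.max(np.abs(v - s)))   # essentially zero: the two forms agree
```

The recursion needs only one stored value per step, which is why a single denominator factor can represent an infinite lag structure.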

This transfer function can be used with intervention inputs. When it is used with a pulse function input, the result is an intervention effect that dies out gradually over time. When it is used with a step function input, the result is an intervention effect that increases gradually to a limiting value.

2. Volatility Forecasting

One of the main assumptions of standard regression analysis, and of regression models with autocorrelated errors, is that the variance σ² of the errors is constant. In many practical applications this assumption may not be realistic. For example, in financial investment it is generally agreed that stock market volatility is rarely constant over time; indeed, the study of market volatility as it evolves over time is the main interest of many researchers and investors. A model that incorporates the possibility of a nonconstant error variance is called a heteroscedasticity model. Many approaches can be used to deal with heteroscedasticity. For example, weighted regression is often used if the error variance at different times is known. In practice, however, the error variance is normally unknown, so models that account for the heteroscedasticity are needed.

Volatility has been one of the most active and successful areas of research in time series econometrics and economic forecasting in recent decades. Volatility refers to the variability of the random (unforeseen) component of a time series. In economic theory, volatility connotes two principal concepts: variability and uncertainty, the former describing overall movement and the latter referring to movement that is unpredictable. There are various ways of measuring price volatility. The naive approach treats all price movements as indicative of instability and simply calculates the standard deviation of the price index. This approach does not account for predictable components, such as trends, in the price evolution process, and thereby overstates the uncertainty.
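The extent to which the naive measure overstates uncertainty can be seen on a toy example. The following Python sketch (illustrative, with made-up prices) computes the naive standard deviation of a perfectly predictable trending price series, alongside the standard deviation of log(P_t / P_{t-1}) for comparison:

```python
# Naive volatility (std of the price index) versus the std of log price
# relatives log(P_t / P_{t-1}), on a perfectly predictable trend.
import numpy as np

t = np.arange(60)
p = 100.0 * 1.01**t            # deterministic 1% growth: no real uncertainty

naive = np.std(p)              # large: the trend itself inflates the measure
log_returns = np.diff(np.log(p))
ratio = np.std(log_returns)    # ~0: every movement is predictable

print(naive, ratio)
```

A purely deterministic trend contains no uncertainty at all, yet the naive measure reports substantial "volatility", while the log-relative measure correctly reports essentially zero.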
A better and more useful method of measuring instability is the ratio method. In this method, the instability of the series is calculated as the standard deviation of log(P_t / P_{t-1}) over a period, where P_t is the price in period t and P_{t-1} is the price in period t-1. A third approach distinguishes between predictable and unpredictable components of the price series, but assumes that price volatility remains time invariant. A fourth approach distinguishes not only between predictable and unpredictable components of prices, but also allows the variance of the unpredictable element to be time varying. Such time-varying conditional variances can be estimated using a Generalized Autoregressive Conditional Heteroscedasticity (GARCH) model.

2.1 ARCH

The original model of autoregressive conditional heteroscedasticity (ARCH), introduced by Engle (1982), has the conditional variance equation

    h_t = α_0 + α_1 ε_{t-1}^2 + α_2 ε_{t-2}^2 + ... + α_p ε_{t-p}^2


where the constraints on the coefficients (α_0 > 0, α_i ≥ 0) are necessary to ensure that the conditional variance h_t is always positive; ε_t denotes the unexpected return. This is the ARCH conditional variance specification, with a memory of p periods. The model captures the conditional heteroscedasticity of returns by using a moving average of past squared unexpected returns: if a major market movement in either direction occurred i periods ago (i ≤ p), the effect is to increase today's conditional variance. This means that we are more likely to see a large market move today, so large movements tend to follow large movements of either sign, a phenomenon known as volatility clustering.

2.2 Vanilla GARCH

The generalization of Engle's ARCH model by Bollerslev (1986) adds q autoregressive terms to the moving average of squared unexpected returns. Then it takes the form

    h_t = α_0 + Σ_{i=1}^{p} α_i ε_{t-i}^2 + Σ_{j=1}^{q} β_j h_{t-j}

The parsimonious GARCH(1,1) model, which has just one lagged squared error and one autoregressive term, is the most commonly used:

    h_t = α_0 + α_1 ε_{t-1}^2 + β_1 h_{t-1}

It is equivalent to an infinite ARCH model, with exponentially declining weights on the past squared errors. In the above model, the sum α_1 + β_1 gives the degree of persistence of volatility in the series: the closer the sum is to 1, the greater the tendency of volatility to persist. If the sum exceeds 1, it is indicative of an explosive series with a tendency to meander away from the mean value. GARCH estimates are used to identify periods of high volatility and volatility clustering. The constant α_0 determines the long-term average level of volatility to which GARCH forecasts converge. Unlike the lag coefficients, its value is quite sensitive to the length of the data period used to estimate the model: if a period of many years is used, during which there were extreme market movements, the estimate of α_0 will be high.

2.3 Integrated GARCH

When α_1 + β_1 = 1, we can put β_1 = λ and α_1 = 1 − λ and write the GARCH(1,1) model as

    h_t = α_0 + (1 − λ) ε_{t-1}^2 + λ h_{t-1}

This is a non-stationary GARCH model called the integrated GARCH (I-GARCH) model, for which term structure forecasts do not converge. Our main interest in the I-GARCH model is that, when α_0 = 0, it is equivalent to an infinite Exponentially Weighted Moving Average (EWMA).

2.4 Example of GARCH Modeling

In this example, we consider a bivariate series containing 46 monthly observations on Mumbai and Delhi spot prices for onion from January 1988 to October 1991, measured in rupees per kilogram (Rs/kg). Mumbai is the main market for Maharashtra, which is one of the major onion-growing states. The objective is to examine whether one can predict the Delhi spot price from the current spot price in Mumbai using a time series regression model. This is a situation of regression with time series errors and unequal variances.
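Before turning to the SAS example, the GARCH(1,1) recursion can be illustrated with a small simulation. The following Python sketch uses hypothetical parameter values, not values fitted to any data; it generates returns whose conditional variance follows h_t = α_0 + α_1 ε_{t-1}^2 + β_1 h_{t-1}:

```python
# Simulate a GARCH(1,1) process: h_t = a0 + a1*e_{t-1}**2 + b1*h_{t-1}.
# Parameter values are hypothetical; a1 + b1 = 0.95 gives persistent volatility.
import numpy as np

rng = np.random.default_rng(7)
a0, a1, b1 = 0.05, 0.10, 0.85
n = 2000
e = np.zeros(n)                        # unexpected returns
h = np.zeros(n)                        # conditional variances
h[0] = a0 / (1.0 - a1 - b1)            # start at the unconditional variance
e[0] = np.sqrt(h[0]) * rng.normal()
for t in range(1, n):
    h[t] = a0 + a1 * e[t - 1]**2 + b1 * h[t - 1]
    e[t] = np.sqrt(h[t]) * rng.normal()

# The sample variance hovers near the unconditional variance a0/(1 - a1 - b1);
# setting a0 = 0 and a1 = 1 - lam in the same recursion gives the I-GARCH /
# EWMA scheme h_t = (1 - lam)*e_{t-1}**2 + lam*h_{t-1}, whose forecasts do
# not converge to a long-run level.
print(e.var(), a0 / (1 - a1 - b1))
```

Plotting the simulated e series would show the volatility clustering described above: quiet stretches interrupted by bursts of large movements of either sign.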


Title1 'Spot prices of onion in Delhi and Mumbai';
Title2 '(January 1988 to October 1991)';

DATA onion;
   Observations = _N_;
   INPUT Year Month Delhi Mumbai;
CARDS;
1988 1 1.875 2.065
1988 2 1.898 1.988
1988 3 1.643 1.818
1988 4 1.332 1.493
1988 5 1.262 1.383
1988 6 1.24 1.378
1988 7 1.265 1.433
1988 8 1.31 1.543
1988 9 1.467 1.713
1988 10 1.5 1.688
1988 11 1.633 1.908
1988 12 1.78 2.207
1989 1 1.803 2.173
1989 2 1.472 1.74
1989 3 1.247 1.458
1989 4 1.273 1.515
1989 5 1.373 1.642
1989 6 1.408 1.687
1989 7 1.378 1.643
1989 8 1.375 1.575
1989 9 1.308 1.513
1989 10 1.315 1.555
1989 11 1.447 1.765
1989 12 1.75 2.102
1990 1 2.203 2.42
1990 2 1.623 1.982
1990 3 1.263 1.487
1990 4 1.252 1.468
1990 5 1.252 1.45
1990 6 1.277 1.46
1990 7 1.252 1.418
1990 8 1.22 1.34
1990 9 1.24 1.353
1990 10 1.412 1.568
1990 11 1.807 2.052
1990 12 1.903 2.237
1991 1 1.627 1.83
1991 2 1.223 1.34
1991 3 1.208 1.318


1991 4 1.208 1.32
1991 5 1.205 1.298
1991 6 1.165 1.258
1991 7 1.02 1.117
1991 8 1.065 1.137
1991 9 1.287 1.368
1991 10 1.613 1.732
;

PROC AUTOREG DATA=onion;
   MODEL Delhi = Mumbai / NLAG=1 GARCH=(Q=1);
   OUTPUT OUT=onion R=Residual P=Predicted LCL=Low95CL UCL=Up95CL;
RUN;

PROC PRINT DATA=onion;
RUN;

The output for the above SAS code is given below:

Spot prices of onion in Delhi and Mumbai
(January 1988 to October 1991)

The AUTOREG Procedure
Dependent Variable    Delhi

Ordinary Least Squares Estimates

SSE                 0.1650337    DFE                     44
MSE                   0.00375    Root MSE           0.06124
SBC                -120.79173    AIC             -124.44902
Regress R-Square       0.9444    Total R-Square      0.9444
Durbin-Watson          0.9324

                            Standard               Approx
Variable     DF   Estimate     Error    t Value   Pr > |t|
Intercept     1     0.1184    0.0487       2.43     0.0192
Mumbai        1     0.8038    0.0294      27.35


Estimates of Autocorrelations

Lag    Covariance    Correlation
  0       0.00359       1.000000
  1       0.00170       0.473597

Preliminary MSE 0.00278

Estimates of Autoregressive Parameters

                        Standard
Lag    Coefficient         Error    t Value
  1      -0.473597      0.134312      -3.53

Algorithm converged.

The AUTOREG Procedure

GARCH Estimates

SSE               0.12768143    Observations            46
MSE                  0.00278    Uncond Var               .
Log Likelihood    80.3575828    Total R-Square      0.9570
SBC               -141.57196    AIC             -150.71517
Normality Test        0.0440    Pr > ChiSq          0.9782

Table: Estimation of the GARCH(0,1) model

                            Standard               Approx
Variable     DF   Estimate     Error    t Value   Pr > |t|
Intercept     1     0.0677    0.0223       3.04     0.0023
Mumbai        1     0.8584    0.0111      77.01


Obs  Predicted  Residual  Low95CL  Up95CL  Observations  Year  Month  Delhi  Mumbai
 12    1.91520  -0.13520  1.78290  2.04750      12       1988    12   1.780  2.207
 13    1.81503  -0.01203  1.68363  1.94642      13       1989     1   1.803  2.173
 14    1.47714  -0.00514  1.35274  1.60154      14       1989     2   1.472  1.740
 15    1.26142  -0.01442  1.13668  1.38616      15       1989     3   1.247  1.458
 16    1.32139  -0.04839  1.19703  1.44574      16       1989     4   1.273  1.515
 17    1.41555  -0.04255  1.29149  1.53962      17       1989     5   1.373  1.642
 18    1.44835  -0.04035  1.32419  1.57251      18       1989     6   1.408  1.687
 19    1.40823  -0.03023  1.28416  1.53230      19       1989     7   1.378  1.643
 20    1.35489   0.02011  1.23077  1.47901      20       1989     8   1.375  1.575
 21    1.33752  -0.02952  1.21316  1.46188      21       1989     9   1.308  1.513
 22    1.36465  -0.04965  1.24047  1.48883      22       1989    10   1.315  1.555
 23    1.52610  -0.07910  1.40154  1.65066      23       1989    11   1.447  1.765
 24    1.78412  -0.03412  1.65446  1.91378      24       1989    12   1.750  2.102
 25    2.06597   0.13703  1.92696  2.20499      25       1990     1   2.203  2.420
 26    1.80658  -0.18358  1.67934  1.93381      26       1990     2   1.623  1.982
 27    1.24958   0.01342  1.12506  1.37411      27       1990     3   1.263  1.487
 28    1.27529  -0.02329  1.15063  1.39995      28       1990     4   1.252  1.468
 29    1.26327  -0.01127  1.13847  1.38808      29       1990     5   1.252  1.450
 30    1.28186  -0.00486  1.15714  1.40658      30       1990     6   1.277  1.460
 31    1.25644  -0.00444  1.13133  1.38155      31       1990     7   1.252  1.418
 32    1.19664   0.02336  1.07059  1.32269      32       1990     8   1.220  1.340
 33    1.23043   0.00957  1.10456  1.35630      33       1990     9   1.240  1.353
 34    1.42070  -0.00870  1.29657  1.54484      34       1990    10   1.412  1.568
 35    1.82802  -0.02102  1.69945  1.95659      35       1990    11   1.807  2.052
 36    1.97358  -0.07058  1.84045  2.10672      36       1990    12   1.903  2.237
 37    1.58357   0.04343  1.45845  1.70869      37       1991     1   1.627  1.830
 38    1.21047   0.01253  1.08442  1.33651      38       1991     2   1.223  1.340
 39    1.20233   0.00567  1.07596  1.32869      39       1991     3   1.208  1.318
 40    1.20656   0.00144  1.08022  1.33290      40       1991     4   1.208  1.320
 41    1.18656   0.01844  1.05989  1.31324      41       1991     5   1.205  1.298
 42    1.16251   0.00249  1.03517  1.28986      42       1991     6   1.165  1.258
 43    1.03782  -0.01782  0.90751  1.16812      43       1991     7   1.020  1.117
 44    1.03946   0.02554  0.90963  1.16930      44       1991     8   1.065  1.137
 45    1.25577   0.03123  1.13009  1.38144      45       1991     9   1.287  1.368
 46    1.58357   0.02943  1.45921  1.70792      46       1991    10   1.613  1.732

The detailed interpretation of the above analyses will be discussed in the class.

References

Bollerslev, T. (1986). Generalized autoregressive conditional heteroscedasticity. Journal of Econometrics, 31, 307-327.

Box, G.E.P., Jenkins, G.M. and Reinsel, G.C. (1994). Time Series Analysis: Forecasting and Control. Pearson Education, Delhi.

Croxton, F.E., Cowden, D.J. and Klein, S. (1979). Applied General Statistics. Prentice Hall of India Pvt. Ltd., New Delhi.

Engle, R. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50, 987-1007.

Makridakis, S., Wheelwright, S.C. and Hyndman, R.J. (1998). Forecasting Methods and Applications, 3rd Edition. John Wiley, New York.

Pankratz, A. (1983). Forecasting with Univariate Box-Jenkins Models: Concepts and Cases. John Wiley, New York.