Time Series Forecasting - Part I
- What is a Time Series?
- Components of Time Series
- Evaluation Methods of Forecasts
- Smoothing Methods of Time Series
- Time Series Decomposition
by Duong Tuan Anh
Faculty of Computer Science and Engineering
September 2011
What is a Time Series?
A time series is a collection of observations made sequentially in time.
[Figure: example time series plot. Horizontal axis: 0 to 500; vertical axis: 23 to 29.]
Examples: Financial time series, scientific time series
A study of a random sample of 4,000 graphics from 15 of the world’s newspapers published between 1974 and 1989 found that more than 75% of all graphics were time series.
Time series models
- Regression models: predict the response over time of the variable under study to changes in one or more of the explanatory variables.
- Deterministic models of time series
- Stochastic models of time series
All three kinds of models can be used for forecasting.
Components of a time series
The pattern or behavior of the data in a time series has several components.
Theoretically, any time series can be decomposed into:
- Trend
- Cyclical
- Seasonal
- Irregular
However, this decomposition is often not straightforward because these factors interact.
Trend component
The trend component accounts for the gradual shifting of the time series to relatively higher or lower values over a long period of time.
Trend is usually the result of long-term factors such as changes in the population, demographics, technology, or consumer preferences.
Seasonal component
The seasonal component accounts for regular patterns of variability within certain time periods, such as a year.
The variability does not always correspond with the seasons of the year (i.e. winter, spring, summer, fall).
There can be, for example, within-week or within-day “seasonal” behavior.
Cyclical component
Any regular pattern of sequences of values above and below the trend line lasting more than one year can be attributed to the cyclical component.
Usually, this component is due to multiyear cyclical movements in the economy.
Evaluating Forecasting Methods
- A forecasting method is selected, often by intuition, previous experience, or the availability of computing resources.
- Divide the data into two sections: an initialization part and a test part.
- Use the forecasting technique to determine the fitted values for the initialization data set.
- Use the forecasting technique to forecast the test data set and determine the forecast errors.
- Evaluate the errors (MAD, MPE, MSD, MAPE).
- Use the technique as it is, modify it, or develop a new model.
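The evaluation procedure can be sketched in Python. This is only an illustration under stated assumptions: the data values, the split point `n_init`, and the naive "previous value" forecaster are hypothetical stand-ins, and any forecasting technique could take the place of `naive_forecasts`.

```python
def split_series(series, n_init):
    """Split a series into an initialization part and a test part."""
    return series[:n_init], series[n_init:]


def naive_forecasts(history, test_part):
    """One-step-ahead naive forecasts: each forecast is the previous actual value.

    This stand-in technique is an assumption made for the example.
    """
    forecasts = []
    last = history[-1]
    for y in test_part:
        forecasts.append(last)
        last = y  # the actual value is known before the next forecast is made
    return forecasts


# Hypothetical data, chosen only for illustration.
series = [12, 15, 14, 16, 18, 17, 19, 21]
init_part, test_part = split_series(series, n_init=5)

forecasts = naive_forecasts(init_part, test_part)
errors = [y - f for y, f in zip(test_part, forecasts)]  # forecast errors on the test part
mad_test = sum(abs(e) for e in errors) / len(errors)    # mean absolute deviation
print(forecasts, errors, mad_test)
```

The error measure computed on the test part is what would be compared across candidate techniques in the last two steps.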
Evaluation Methods of Forecasts
Three measures of the accuracy of the fitted models are reported for each of the forecasting and smoothing methods: MAPE, MAD, and MSD.
For all three measures, the smaller the value, the better the fit of the model.
Use these statistics to compare the fit of the different methods.
MAPE (Mean Absolute Percentage Error) measures the accuracy of the fitted time series values. It expresses accuracy as a percentage.
$$\mathrm{MAPE} = \frac{\sum_{t=1}^{n} \left| (y_t - y'_t) / y_t \right|}{n} \times 100 \qquad (y_t \neq 0)$$
where $y_t$ is the actual value, $y'_t$ is the fitted value and n is the number of observations.
MAPE, MAD, and MSD
MAD (Mean Absolute Deviation) expresses accuracy in the same units as the data, which helps conceptualize the amount of error.
$$\mathrm{MAD} = \frac{\sum_{t=1}^{n} \left| y_t - y'_t \right|}{n}$$
where $y_t$ is the actual value, $y'_t$ is the fitted value and n is the number of observations.
MSD (Mean Squared Deviation) is more sensitive than MAD to an unusually large forecast error.
$$\mathrm{MSD} = \frac{\sum_{t=1}^{n} (y_t - y'_t)^2}{n}$$
where $y_t$ is the actual value, $y'_t$ is the fitted value and n is the number of observations.
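The three accuracy measures translate directly into code. The sketch below is a plain-Python illustration; the function names and the sample `actual` and `fitted` values are assumptions made for the example.

```python
def mape(actual, fitted):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    n = len(actual)
    return sum(abs((y - f) / y) for y, f in zip(actual, fitted)) / n * 100


def mad(actual, fitted):
    """Mean absolute deviation, in the same units as the data."""
    n = len(actual)
    return sum(abs(y - f) for y, f in zip(actual, fitted)) / n


def msd(actual, fitted):
    """Mean squared deviation; large errors are penalized more heavily."""
    n = len(actual)
    return sum((y - f) ** 2 for y, f in zip(actual, fitted)) / n


# Hypothetical values for illustration.
actual = [100.0, 200.0, 400.0]
fitted = [110.0, 180.0, 400.0]
print(mape(actual, fitted), mad(actual, fitted), msd(actual, fitted))
```

Because MSD squares each error, the single error of 20 contributes four times as much as the error of 10, whereas in MAD it contributes only twice as much.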
Methods of smoothing time series
- Arithmetic Moving Average
- Exponential Smoothing Methods
- Holt-Winters method for Exponential Smoothing
Smoothing a time series means eliminating some of its short-term fluctuations.
Smoothing can also be done to remove seasonal fluctuations, i.e., to deseasonalize a time series.
These models are deterministic in that no reference is made to the sources or nature of the underlying randomness in the series.
The models involve extrapolation techniques.
Averaging Methods
Simple Averages - quick, inexpensive (should only be used on stationary data)
Moving Average method consists of computing an average of the most recent n data values for the series and using this average for forecasting the value of the time series for the next period.
Moving averages are useful if one can assume that the item to be forecast will stay steady over time.
Series of arithmetic means: used only for smoothing, provides an overall impression of the data over time.
$$\text{Moving Average} = \frac{\sum (\text{most recent } n \text{ data values})}{n}$$
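A minimal Python sketch of the moving average, both as a forecast for the next period and as a smoother. The function names and the `sales` values are assumptions made for the illustration.

```python
def moving_average_forecast(series, n):
    """Forecast for the next period: the mean of the most recent n values."""
    if n < 1 or n > len(series):
        raise ValueError("n must be between 1 and len(series)")
    return sum(series[-n:]) / n


def moving_average_smooth(series, n):
    """Trailing moving averages; the first one is available once n values exist."""
    return [sum(series[i - n + 1 : i + 1]) / n for i in range(n - 1, len(series))]


# Hypothetical data for illustration.
sales = [10, 12, 11, 13, 15, 14]
print(moving_average_forecast(sales, 3))  # mean of 13, 15, 14
print(moving_average_smooth(sales, 3))
```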
Moving average methods
- Works best with stationary data.
- The smaller the number n, the more weight is given to recent periods. A smaller number is desirable when there are sudden shifts in the level of the series.
- The greater the number, the less weight is given to more recent periods.
- The larger the order of the moving average, the greater the smoothing effect. A larger n is desirable when there are wide, infrequent fluctuations in the data.
- By smoothing recent actual values, the method removes randomness.
Weighted Moving Averages
Weighted Moving Average: places more weight on recent observations. The sum of the weights needs to equal 1.
Used when a trend is present. Older data are usually less important.
$$\mathrm{WMA} = \frac{\sum (\text{weight for period } n)(\text{value in period } n)}{\sum \text{weights}}$$
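The weighted moving average can be sketched as follows. The ordering convention for `weights` (oldest first, most recent last) and the sample values are assumptions of this example. Dividing by the sum of the weights means the result is still a proper average if the weights are not normalized to 1.

```python
def weighted_moving_average(series, weights):
    """Weighted average of the last len(weights) values.

    weights[0] applies to the oldest of those values, weights[-1] to the most recent.
    """
    if not weights or len(weights) > len(series):
        raise ValueError("need between 1 and len(series) weights")
    recent = series[-len(weights):]
    return sum(w * y for w, y in zip(weights, recent)) / sum(weights)


# Hypothetical data: the most recent observation gets the largest weight.
sales = [10, 12, 11, 13, 15, 14]
print(weighted_moving_average(sales, [0.2, 0.3, 0.5]))
```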
Notes on Moving Averages
MA models do not provide information about forecast confidence.
We cannot calculate standard errors.
We cannot explain the stochastic component of the time series. This stochastic component creates the error in our forecast.
Exponential Smoothing Methods
- Single Exponential Smoothing (Averaging)
- Double Exponential Smoothing and Holt’s Method
- Winters’ Model
Note:
- Single Exponential Smoothing is for series without trend and without a seasonal component.
- Double Exponential Smoothing is for series with trend and without a seasonal component.
- Winters’ model is for series with trend and a seasonal component.
Single Exponential Smoothing
Single exponential smoothing continually revises a forecast in light of more recent experience. It averages (smooths) past values of a series in a decreasing (exponential) manner: the observations are weighted, with more weight given to the more recent observations.
$$A_t = \alpha Y_{t-1} + (1 - \alpha) A_{t-1} \qquad \text{(S1)}$$
New forecast = $\alpha$ (old observation) + $(1 - \alpha)$ (old forecast)
Here we denote the original series by $Y_t$ and the smoothed series by $A_t$.
The equation can be rewritten as:
$$A_t = A_{t-1} + \alpha (Y_{t-1} - A_{t-1})$$
Looking at the formula, the new forecast is really the old forecast plus $\alpha$ times the error in the old forecast.
To get started, we need a smoothing constant $\alpha$, an initial forecast, and an actual value. We can use the first actual value as the initial forecast, or we can average the first n observations.
The smoothing constant serves as the weighting factor. When $\alpha$ is close to 1, the new forecast will include a substantial adjustment for any error that occurred in the preceding forecast. When $\alpha$ is close to 0, the new forecast is very similar to the old forecast.
Single Exponential Smoothing (cont.)
The choice of the smoothing constant $\alpha$ is not arbitrary; it generally falls between 0.1 and 0.5. If we want predictions to be stable and random variation smoothed, use a small $\alpha$. If we want a rapid response, a larger value is required.
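A minimal Python sketch of single exponential smoothing in the forecast form of equation (S1). It assumes the first actual value is used as the initial forecast unless `initial` is supplied; the function name and the data are hypothetical.

```python
def single_exponential_smoothing(series, alpha, initial=None):
    """Return forecasts using A_t = alpha * Y_{t-1} + (1 - alpha) * A_{t-1}.

    forecasts[i] is the forecast for series[i]; the last entry is the forecast
    for the period after the end of the series.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie in [0, 1]")
    # Assumption: start from the first actual value unless an initial forecast is given.
    forecasts = [series[0] if initial is None else initial]
    for y in series:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts


# Hypothetical data for illustration.
demand = [10, 20, 30]
print(single_exponential_smoothing(demand, alpha=0.5))
```

With `alpha=1` the forecast simply repeats the previous actual value, and with `alpha=0` it never moves from the initial forecast, which matches the description of the two extremes.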
Why Exponential?
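$$A_t = \alpha Y_{t-1} + (1 - \alpha) A_{t-1}$$
$$A_{t-1} = \alpha Y_{t-2} + (1 - \alpha) A_{t-2}$$
$$A_{t-2} = \alpha Y_{t-3} + (1 - \alpha) A_{t-3}$$
Substituting repeatedly gives
$$A_t = \alpha Y_{t-1} + \alpha (1 - \alpha) Y_{t-2} + \alpha (1 - \alpha)^2 Y_{t-3} + \dots + \alpha (1 - \alpha)^k Y_{t-(k+1)} + \dots$$
The weight $\alpha (1 - \alpha)^k$ decreases exponentially as $k$ increases.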
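The exponentially decaying weights can be listed explicitly. In this sketch the function name `smoothing_weights` and the choice `alpha = 0.5` are assumptions made for illustration.

```python
def smoothing_weights(alpha, k):
    """Weights alpha * (1 - alpha)**j for j = 0..k-1, applied to Y_{t-1}, ..., Y_{t-k}."""
    return [alpha * (1 - alpha) ** j for j in range(k)]


weights = smoothing_weights(alpha=0.5, k=4)
print(weights)       # each weight is (1 - alpha) times the previous one
print(sum(weights))  # equals 1 - (1 - alpha)**k, approaching 1 as k grows
```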
[Figure: Sales data, single exponential smoothing with $\alpha = 0.1$. Actual, smoothed, and forecast sales plotted against time. Smoothing constant alpha: 0.100. MAPE: 37.0, MAD: 134.9, MSD: 27735.5.]
The small $\alpha$ here smooths the data.
[Figure: Sales data, single exponential smoothing with $\alpha = 0.6$. Actual, smoothed, and forecast sales plotted against time. Smoothing constant alpha: 0.600. MAPE: 36.5, MAD: 134.5, MSD: 22248.4.]
The large $\alpha$ in this example responds quickly to the data.
Tracking
Use a tracking signal (a measure of errors over time) and set limits. For example, if we forecast n periods, count the number of negative and positive errors. If the number of positive errors is substantially less or greater than n/2, then the process is out of control.
We can also use a 95% prediction interval ($\pm 1.96 \sqrt{\mathrm{MSE}}$). If the forecast error is outside of the interval, use a new optimal $\alpha$.
Looking back at the single exponential smoothing with $\alpha = 0.1$: $\pm 1.96 \sqrt{24261} \approx \pm 305$. Observation #21 is out of control. We need to re-evaluate the $\alpha$ level because this technique is biased.
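Both tracking checks can be sketched in a few lines of Python. The function names and the sample `errors` are hypothetical; only the value 24261 is taken from the text.

```python
import math


def sign_counts(errors):
    """Count positive and negative forecast errors."""
    positive = sum(1 for e in errors if e > 0)
    negative = sum(1 for e in errors if e < 0)
    return positive, negative


def out_of_control(errors, z=1.96):
    """Return the limit z * sqrt(MSE) and the positions of errors outside +/- limit."""
    mse = sum(e * e for e in errors) / len(errors)
    limit = z * math.sqrt(mse)
    return limit, [i for i, e in enumerate(errors) if abs(e) > limit]


# The limit quoted in the text for an MSE of 24261.
text_limit = 1.96 * math.sqrt(24261)

# Hypothetical forecast errors; the last one is unusually large.
errors = [10, -20, 15, -5, 400]
limit, flagged = out_of_control(errors)
print(round(text_limit), sign_counts(errors), flagged)
```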
Exponential Smoothing Adjusted for Trend: Holt’s method
In some situations, the observed data are trending and contain information that allows the anticipation of future upward movement.
In that case, a linear trend forecast function is needed.
Holt’s smoothing method allows for an evolving local linear trend in a time series and can be used to forecast.
When there is a trend, an estimate of the current slope and the current level is required.
Holt’s Method
Holt’s method uses two coefficients:
- $\alpha$ is the smoothing constant for the level.
- $\beta$ is the trend smoothing constant, used to remove random error.
Advantage of Holt’s method: it provides flexibility in selecting the rates at which the level and trend are tracked.
Equations in Holt’s method
The exponentially smoothed series, or the current level estimate:
$$A_t = \alpha Y_t + (1 - \alpha)(A_{t-1} + T_{t-1}) \qquad \text{(S2)}$$
The trend estimate:
$$T_t = \beta (A_t - A_{t-1}) + (1 - \beta) T_{t-1} \qquad \text{(S3)}$$
Forecast p periods into the future:
$$Y'_{t+p} = A_t + p T_t$$
where
- $A_t$ = new smoothed value (estimate of current level)
- $Y_t$ = new actual value at time t
- $T_t$ = trend estimate
- $Y'_{t+p}$ = forecast for p periods into the future
- $\alpha$ = smoothing constant for the level
- $\beta$ = smoothing constant for the trend estimate
How to initiate Holt’s method
To get started, initial values for A and T in equations (S2) and (S3) must be determined.
One approach is to set A1 to Y1 and T1 to zero.
The second approach is to use the average of the first five or six observations as A1. T1 is then estimated by the slope of a line that is fit to these five or six observations.
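A minimal Python sketch of Holt’s method, using the first initialization approach (level set to the first observation, trend set to zero). The names `alpha` and `beta` for the level and trend smoothing constants, the function name, and the data are assumptions of this example.

```python
def holt(series, alpha, beta, horizon=1):
    """Holt's linear trend smoothing, started with A_1 = Y_1 and T_1 = 0.

    Returns the final level, the final trend, and forecasts for 1..horizon periods ahead.
    """
    level, trend = series[0], 0.0
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)    # level update (S2)
        trend = beta * (level - prev_level) + (1 - beta) * trend  # trend update (S3)
    forecasts = [level + p * trend for p in range(1, horizon + 1)]
    return level, trend, forecasts


# Hypothetical series with a steady increase of 2 per period.
series = [10, 12, 14, 16]
print(holt(series, alpha=0.5, beta=0.5, horizon=2))
```

Because the trend starts at zero, the smoothed level lags behind a trending series for the first few periods; starting from a slope fitted to the first five or six observations reduces that lag.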
Holt’s method
[Figure: Holt exponential smoothing with parameters $\alpha = 1.0$ and $\beta = 0.099$ for a time series of electricity consumption.]
Winters’ Method
Winters’ method is an easy way to account for seasonality when data have a seasonal pattern.
It extends Holt’s method to include an estimate for seasonality:
- $\alpha$ is the smoothing constant for the level.
- $\beta$ is the trend smoothing constant, used to remove random error.
- $\gamma$ is the smoothing constant for seasonality.
The level equation removes seasonal effects by dividing the observation by a seasonal index. The forecast is modified by multiplying by a seasonal index.
The four equations used in Winters’ (multiplicative) smoothing are:
The smoothed series, or level estimate:
$$A_t = \alpha \frac{Y_t}{S_{t-s}} + (1 - \alpha)(A_{t-1} + T_{t-1})$$
The trend estimate:
$$T_t = \beta (A_t - A_{t-1}) + (1 - \beta) T_{t-1}$$
The seasonality estimate:
$$S_t = \gamma \frac{Y_t}{A_t} + (1 - \gamma) S_{t-s}$$
Forecast p periods into the future:
$$Y'_{t+p} = (A_t + p T_t) S_{t-s+p}$$
where
- $A_t$ = new smoothed value (estimate of current level)
- $Y_t$ = new actual value at time t
- $T_t$ = trend estimate
- $S_t$ = seasonal estimate
- $Y'_{t+p}$ = forecast for p periods into the future
- $\alpha$ = smoothing constant for the level
- $\beta$ = smoothing constant for the trend estimate
- $\gamma$ = smoothing constant for the seasonality estimate
- p = periods to be forecast into the future
- s = length of seasonality
Winters’ method is also called triple exponential smoothing.
How to initiate Winters’ method
To begin Winters’ method, the initial values for the smoothed series $A_t$, the trend $T_t$ and the seasonal indices $S_t$ must be set.
One approach is to set the first estimate of $A_t$ to $Y_1$. The trend is set to 0 and the seasonal indices are each set to 1.0.
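A minimal Python sketch of multiplicative Winters’ smoothing with the simple initialization just described (level equal to the first observation, trend 0, all seasonal indices 1.0). The parameter names `alpha`, `beta`, and `gamma` for the level, trend, and seasonality constants are assumptions of this example, as are the function name and data. For horizons longer than one season it reuses the latest available index for each season.

```python
def winters(series, s, alpha, beta, gamma, horizon):
    """Multiplicative Winters' smoothing; returns forecasts for 1..horizon periods ahead.

    s is the length of the seasonal cycle. Observations must be positive.
    """
    level, trend = series[0], 0.0
    seasonal = [1.0] * s  # seasonal[t % s] holds the latest index for that season
    for t in range(1, len(series)):
        y = series[t]
        prev_level = level
        old_index = seasonal[t % s]  # index from one season earlier, S_{t-s}
        level = alpha * y / old_index + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % s] = gamma * y / level + (1 - gamma) * old_index
    last = len(series) - 1
    return [(level + p * trend) * seasonal[(last + p) % s]
            for p in range(1, horizon + 1)]


# Hypothetical quarterly data (s = 4) with a repeating pattern.
quarterly = [80.0, 120.0, 100.0, 100.0] * 3
print(winters(quarterly, s=4, alpha=0.4, beta=0.1, gamma=0.3, horizon=4))
```

Starting every seasonal index at 1.0 means the method needs a few full cycles before the indices reflect the seasonal pattern.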
[Figure: Winters’ method applied to the sales data. Actual, smoothed, and forecast sales plotted against time. Smoothing constants as labeled in the plot: Alpha (level) 0.400, Gamma (trend) 0.100, Delta (season) 0.300. MAPE: 15.21, MAD: 63.55, MSD: 7636.86.]
Decomposition
Decomposition is a procedure to identify the component factors of a time series.
How the components relate to the original series is expressed by a model that writes the time series variable Y in terms of the components T (trend), C (cycle), S (seasonal) and I (irregular).
There are two such models: the additive components model and the multiplicative components model.
It is difficult to deal with the cyclical component of a time series. To keep things simple, we assume that any cycle in the data is part of the trend.
Additive model: $Y_t = T_t + S_t + I_t$
Multiplicative model: $Y_t = T_t \times S_t \times I_t$
Additive and multiplicative models
The additive model works best when the time series has roughly the same variability through the length of the series. That is, all the values of the series fall within a band of constant width centered on the trend.
The multiplicative model works best when the variability of the time series increases with the level. That is, the spread of the values of the series becomes larger as the trend increases. See the figure in the next slide.
Most economic time series have seasonal variation that increases with the level of the series, so the multiplicative model is suitable for them.
(a) A time series with constant variability
(b) A time series with variability increasing with level
Trend equations
Trend can be described by a straight line or a smooth curve.
Linear trend: $T'_t = a + bt$
Here $T'_t$ is the predicted value for the trend at time t. The symbol t represents time and takes integer values 1, 2, 3, … The slope b is the average increase or decrease in T for each one-period increase in time.
Time trend equations can be fit to the data using the method of least squares. Recall that this method selects the values of the coefficients in the trend equation (e.g. a and b) so that the estimated trend values $T'_t$ are close to the actual values $Y_t$, as measured by the sum of squared errors criterion:
$$\mathrm{SSE} = \sum_t (Y_t - T'_t)^2$$
(See Appendix of this chapter for how to find a and b)
Trend line for the Car Registrations Time Series
Additional trend curves
The life cycle of a new product has three stages: introduction, growth, and maturity and saturation.
A curve is needed to model the trend over the life cycle of a new product.
A simple function that allows for curvature is the quadratic trend:
$$T'_t = b_0 + b_1 t + b_2 t^2$$
When a time series starts slowly and then appears to be increasing at an increasing rate, an exponential trend can be used:
$$T'_t = b_0 b_1^t$$
The coefficient $b_1$ is related to the growth rate: the trend value is multiplied by $b_1$ each period, so it grows by $100(b_1 - 1)$% per period.
The increase in the number of salespeople is not constant. It appears as if increasingly larger numbers of people are being added in the later years.
An exponential trend curve fit to the salespeople data has the equation:
$$T'_t = 10.016 (1.313)^t$$
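The fitted exponential trend can be evaluated directly. The sketch below uses the two coefficients from the equation; the function name and the choice of time points are assumptions made for illustration.

```python
def exponential_trend(t, b0=10.016, b1=1.313):
    """Exponential trend value b0 * b1**t at time t."""
    return b0 * b1 ** t


# Implied growth per period, in percent: each step multiplies the trend by b1.
growth_rate_percent = (1.313 - 1) * 100

for t in range(1, 4):
    print(t, round(exponential_trend(t), 2))
print(round(growth_rate_percent, 1))
```

Each period’s trend value is 1.313 times the previous one, which is why the absolute increases get larger in the later years.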
Seasonality
There are several methods for measuring seasonal variation.
The basic idea: first estimate and remove the trend from the original series, and then smooth out the irregular component. This leaves data containing only seasonal variation.
The seasonal values are collected and summarized to produce a number for each observed interval of the year (week, month, quarter, and so on)
Identification of seasonal component
The identification of the seasonal component in a time series differs from trend analysis in two ways:
- The trend is determined directly from the original data, but the seasonal component is determined indirectly after eliminating the other components from the data.
- The trend is represented by one best-fitting curve, but a separate seasonal value has to be computed for each observed interval.
If an additive decomposition is employed, estimates of the trend, seasonal, and irregular components are added together to produce the original series.
If a multiplicative decomposition is employed, estimates of the individual components must be multiplied together to produce the original series.
Seasonal indices
The seasonal indices measure the seasonal variation in the series.
Seasonal indices are percentages that show changes over time.
Example: With monthly data, a seasonal index of 1.0 for a particular month means the expected value for that month is 1/12 the total for the year.
An index of 1.25 for a different month implies the observation for that month is expected to be 25% more than 1/12 of the annual total.
A monthly index of 0.80 indicates that the expected level of that month is 20% less than 1/12 the total for the year.
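The interpretation of a monthly seasonal index can be checked with a small calculation. The annual total of 1200 and the function name are hypothetical values chosen for the illustration.

```python
def expected_period_value(annual_total, seasonal_index, periods=12):
    """Expected value for one period: the index times an equal share of the annual total."""
    return seasonal_index * annual_total / periods


annual_total = 1200  # hypothetical annual total, so an equal monthly share is 100
for index in (1.0, 1.25, 0.80):
    print(index, expected_period_value(annual_total, index))
```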
Seasonal adjustment
After the seasonal component has been isolated, it can be used to calculate seasonally adjusted data.
Seasonal adjustment techniques are ad hoc methods of computing seasonal indices and using those indices to deseasonalize the series by removing the seasonal variation.
For a multiplicative decomposition, the seasonally adjusted data are computed by dividing the original data by the seasonal component (i.e. the seasonal index):
deseasonalized data = raw data/seasonal index
Seasonal adjustment technique
Seasonal adjustment techniques are based on the idea that a time series $y_t$ can be represented as the product of four components:
$$y_t = T \times S \times C \times I$$
The objective is to eliminate the seasonal component S.
First, we try to isolate the combined trend and cyclical components $T \times C$. This cannot be done exactly; instead, an ad hoc smoothing procedure is used to estimate $T \times C$ by smoothing the seasonal and irregular fluctuations out of the original time series.
For example, suppose that $y_t$ consists of monthly data. Then a 12-month average $y^m_t$ is computed:
$$y^m_t = \frac{1}{12}\left(y_{t+6} + \dots + y_t + y_{t-1} + \dots + y_{t-5}\right)$$
Presumably $y^m_t$ is relatively free of seasonal and irregular fluctuations and is thus an estimate of $T \times C$.
Now, we divide the original data by this estimate of $T \times C$ to obtain an estimate of the combined seasonal and irregular components $S \times I$.
Seasonal adjustment technique (cont.)
$$S \times I = \frac{y_t}{y^m_t} = z_t$$
The next step is to eliminate the irregular component I in order to obtain the seasonal index. To do this, we average the values of $S \times I$ corresponding to the same month.
In other words, suppose that $y_1$ (and hence $z_1$) corresponds to January, $y_2$ to February, etc., and there are 48 months of data. We thus compute
$$z^m_1 = \tfrac{1}{4}(z_1 + z_{13} + z_{25} + z_{37})$$
$$z^m_2 = \tfrac{1}{4}(z_2 + z_{14} + z_{26} + z_{38})$$
$$\vdots$$
$$z^m_{12} = \tfrac{1}{4}(z_{12} + z_{24} + z_{36} + z_{48})$$
Seasonal adjustment technique (cont.)
The rationale here is that when the seasonal-irregular percentages zt are averaged for each month (each quarter if the data are quarterly), the irregular fluctuations will be largely smoothed out.
The 12 averages $z^m_1, \dots, z^m_{12}$ will then be estimates of the seasonal indices. Their sum should be close to 12.
The deseasonalization of the original series $y_t$ is now straightforward; just divide each value in the series by its corresponding seasonal index.
Thus, the seasonally adjusted series $y^a_t$ is obtained from
$$y^a_1 = \frac{y_1}{z^m_1}, \quad y^a_2 = \frac{y_2}{z^m_2}, \quad \dots, \quad y^a_{12} = \frac{y_{12}}{z^m_{12}}, \quad \text{etc.}$$
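The seasonal adjustment steps can be sketched in Python. This is an illustration under stated assumptions rather than a production procedure: the function names are hypothetical, the period must be even, the first observation is taken to belong to the first season, and each seasonal index averages only those ratios for which the moving average can actually be computed (it is undefined near the two ends of the series).

```python
def seasonal_indices(y, period=12):
    """Estimate one seasonal index per season from ratios to a moving average."""
    half = period // 2
    if period % 2 != 0 or len(y) < 2 * period - 1:
        raise ValueError("need an even period and at least 2*period - 1 observations")
    ratios = [[] for _ in range(period)]
    # The window for position t covers y[t-half+1] .. y[t+half],
    # which is y_{t-5} .. y_{t+6} for monthly data.
    for t in range(half - 1, len(y) - half):
        window = y[t - half + 1 : t + half + 1]
        moving_avg = sum(window) / period             # estimate of T x C
        ratios[t % period].append(y[t] / moving_avg)  # z_t, estimate of S x I
    # Averaging the ratios for the same season smooths out the irregular component.
    return [sum(r) / len(r) for r in ratios]


def deseasonalize(y, indices):
    """Divide each value by the seasonal index of its season."""
    period = len(indices)
    return [value / indices[t % period] for t, value in enumerate(y)]


# Hypothetical quarterly data (period 4) with a fixed seasonal pattern and no trend.
quarterly = [50.0, 150.0, 100.0, 100.0] * 3
indices = seasonal_indices(quarterly, period=4)
adjusted = deseasonalize(quarterly, indices)
print(indices, sum(indices), adjusted[:4])
```

With a strong trend in the data the indices will generally not sum exactly to the number of seasons, which is why their sum is described only as close to 12 for monthly data.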
Appendix: Least-square parameter estimates
Our goal is to minimize $\sum (Y_i - Y'_i)^2$, where $Y'_i = a + bX_i$ is the fitted value of Y corresponding to a particular observation $X_i$.
We minimize the expression by taking the partial derivatives with respect to a and to b, setting each equal to 0, and solving the resulting pair of simultaneous equations:
$$\frac{\partial}{\partial a} \sum (Y_i - a - bX_i)^2 = -2 \sum (Y_i - a - bX_i) \qquad \text{(A.1)}$$
$$\frac{\partial}{\partial b} \sum (Y_i - a - bX_i)^2 = -2 \sum X_i (Y_i - a - bX_i) \qquad \text{(A.2)}$$
Least-square parameter estimates
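Equating these derivatives to zero and dividing by -2, we get
$$\sum (Y_i - a - bX_i) = 0 \qquad \text{(A.3)}$$
$$\sum X_i (Y_i - a - bX_i) = 0 \qquad \text{(A.4)}$$
Finally, by rewriting Eqs. (A.3) and (A.4), we obtain the pair of simultaneous equations:
$$\sum Y_i = aN + b \sum X_i \qquad \text{(A.5)}$$
$$\sum X_i Y_i = a \sum X_i + b \sum X_i^2 \qquad \text{(A.6)}$$
Now we can solve for a and b simultaneously by multiplying Eq. (A.5) by $\sum X_i$ and Eq. (A.6) by N:
$$\sum X_i \sum Y_i = aN \sum X_i + b \left(\sum X_i\right)^2 \qquad \text{(A.7)}$$
$$N \sum X_i Y_i = aN \sum X_i + bN \sum X_i^2 \qquad \text{(A.8)}$$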
Least-square parameter estimates (cont.)
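Subtracting Eq. (A.7) from Eq. (A.8), we get
$$N \sum X_i Y_i - \sum X_i \sum Y_i = b \left[ N \sum X_i^2 - \left(\sum X_i\right)^2 \right] \qquad \text{(A.9)}$$
from which it follows that
$$b = \frac{N \sum X_i Y_i - \sum X_i \sum Y_i}{N \sum X_i^2 - \left(\sum X_i\right)^2} \qquad \text{(A.10)}$$
Given b, we may calculate a from Eq. (A.5):
$$a = \frac{\sum Y_i - b \sum X_i}{N} \qquad \text{(A.11)}$$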
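Equations (A.10) and (A.11) can be applied directly in code. The sketch below is a plain-Python illustration; the function name and the sample data are assumptions. For a time trend, the X values are simply the time index 1, 2, 3, …

```python
def least_squares_line(x, y):
    """Return intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    denominator = n * sum_xx - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # Eq. (A.10)
    a = (sum_y - b * sum_x) / n                     # Eq. (A.11)
    return a, b


# Hypothetical data lying exactly on the line Y = 1 + 2t.
t = [1, 2, 3, 4]
values = [3, 5, 7, 9]
a, b = least_squares_line(t, values)
sse = sum((yi - (a + b * ti)) ** 2 for ti, yi in zip(t, values))
print(a, b, sse)
```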