Business Forecasting
Chapter 4: Data Collection and Analysis in Forecasting
Chapter Topics
Preliminary Adjustments to Data
Data Transformation
Patterns in Time Series Data
The Classical Decomposition Method
Preliminary Data Adjustments
Trading Day Adjustments
Price Change Adjustments
Population Change Adjustments
Trading Day Adjustments
Yr     J   F   M   A   M   J   J   A   S   O   N   D
2005   21  20  23  21  22  22  21  23  22  21  22  22
2006   22  20  23  20  23  22  21  23  22  22  22  21
2007   23  20  22  21  23  21  22  23  20  23  22  21
Trading Day Adjustment
Average Trading Days for Each Month
Month        Average Number of Trading Days
January      22.00
February     20.00
March        22.67
April        20.67
May          22.67
June         21.67
July         21.33
August       23.00
September    21.33
October      22.00
November     22.00
December     21.33
Trading Day Adjustments
Trading Day Computation for October
Year   Trading Days   Trading Day Coefficient   Actual Data   Adjusted Data
2005   21             21/22 = 0.954             5,000,000     5,241,090
2006   22             22/22 = 1.000             5,000,000     5,000,000
2007   23             23/22 = 1.045             4,000,000     3,827,751
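The October computation above can be sketched in a few lines of Python. This is only an illustration: the names `trading_days`, `average_trading_days`, and `adjust` are invented for the example, and the coefficient is left unrounded, so the results differ slightly from the table, which rounds the coefficient to three decimals before dividing.

```python
# Trading days per month (January..December), copied from the table above.
trading_days = {
    2005: [21, 20, 23, 21, 22, 22, 21, 23, 22, 21, 22, 22],
    2006: [22, 20, 23, 20, 23, 22, 21, 23, 22, 22, 22, 21],
    2007: [23, 20, 22, 21, 23, 21, 22, 23, 20, 23, 22, 21],
}
OCTOBER = 9  # zero-based month index


def average_trading_days(month):
    """Average number of trading days for one month across all years."""
    counts = [days[month] for days in trading_days.values()]
    return sum(counts) / len(counts)


def adjust(actual, year, month):
    """Divide the actual figure by the (unrounded) trading day coefficient."""
    coefficient = trading_days[year][month] / average_trading_days(month)
    return actual / coefficient


october_actual = {2005: 5_000_000, 2006: 5_000_000, 2007: 4_000_000}
october_adjusted = {
    year: adjust(value, year, OCTOBER) for year, value in october_actual.items()
}
for year, value in october_adjusted.items():
    print(year, round(value))
```

A month with fewer trading days than average is adjusted upward and a month with more is adjusted downward, which is the pattern the table shows for 2005 and 2007.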
Price Change Adjustments
Compunet Sales Data, 1990–2005
(1)    (2)              (3)           (4)           (5)        (6)
       Compunet Sales   Computer      Software      Price      Sales in
       in Current $     Price Index   Price Index   Index*     Constant $
Year   (Thousands)      1995=100      1995=100      1995=100   (Thousands)
1990    89.0             58.40         63.81         59.48     149.63
1991    90.0             57.03         71.98         60.02     149.94
1992    95.4             66.57         77.61         68.78     138.71
1993    96.6             72.47         86.19         75.21     128.44
1994    98.9             79.27         91.55         81.73     121.01
1995    99.4            100.00        100.00        100.00      99.40
1996   100.2            110.14        114.61        111.03      90.24
1997   106.2            123.15        144.10        127.34      83.40
1998   107.5            131.92        166.22        138.78      77.46
1999   108.3            145.23        204.56        157.10      68.94
2000   120.0            153.40        236.19        169.96      70.60
2001   105.0            129.20        234.18        150.20      69.91
2002   103.8            116.79        224.66        138.37      75.02
2003   102.1            117.70        229.76        140.11      72.87
2004    98.7            124.51        247.05        149.02      66.23
2005    99.6            128.74        260.05        155.01      64.26
Price Change Adjustments
Having computed the price index, we are now able to deflate the sales revenue with the weighted price index in the following way:
$$\text{Deflated sales}_{1990} = \frac{(89.0)(100)}{59.48} = 149.63$$
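The deflation step can be illustrated with a short Python sketch. It assumes a handful of rows copied from the Compunet table (current-dollar sales and the weighted price index); the function name `deflate` is made up for the example.

```python
# (year, sales in current $ thousands, weighted price index with 1995 = 100)
# A few rows copied from the Compunet table.
rows = [
    (1990, 89.0, 59.48),
    (1995, 99.4, 100.00),
    (2000, 120.0, 169.96),
    (2001, 105.0, 150.20),
]


def deflate(current_sales, price_index):
    """Convert current-dollar sales to constant (base-year) dollars."""
    return current_sales / price_index * 100


constant_sales = {
    year: round(deflate(sales, index), 2) for year, sales, index in rows
}
print(constant_sales)
```

In the base year the index is 100, so current and constant dollars coincide; in every other year the gap between the two reflects the price change.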
Price Change Adjustments
To see the impact of separating out the effect of price level changes, we graph computer sales in constant and current dollars.
Figure 4.1 Computer Sales in Current and Constant Dollars
[Figure: Sales (vertical axis) against Time (horizontal axis); series: Current Dollars, Constant Dollars]
Population Change Adjustments
Disposable Personal Income and Per Capita Income for the U.S., 1990–2005
       Disposable Income       Population      Per Capita Disposable
Year   (Billions of Dollars)   (in Millions)   Income ($)
1990   4285.8                  250.2           17,129.50
1991   4464.3                  253.5           17,610.65
……     ……                      …….             ……….
……     ……                      …….             ……….
1999   6695.0                  279.3           23,970.64
2000   7194.0                  282.4           25,474.50
2001   7486.8                  285.4           26,232.66
2002   7830.1                  288.3           27,159.56
2003   8162.5                  291.1           28,040.19
2004   8681.6                  293.9           29,539.30
2005   9036.1                  296.7           30,455.34
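The per capita column is simply income divided by population after both are put in the same units. A minimal Python sketch, using three rows from the table above (the function name `per_capita` is an assumption made for illustration):

```python
# (year, disposable income in billions of dollars, population in millions)
rows = [
    (1990, 4285.8, 250.2),
    (1991, 4464.3, 253.5),
    (2005, 9036.1, 296.7),
]


def per_capita(income_billions, population_millions):
    """Income per person in dollars."""
    return (income_billions * 1e9) / (population_millions * 1e6)


per_capita_income = {
    year: per_capita(income, population) for year, income, population in rows
}
for year, value in per_capita_income.items():
    print(year, f"{value:,.2f}")
```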
Data Transformation
Most appropriate remedial measure for variance heterogeneity.
Original data are converted into a new scale, resulting in a new data set that is expected to satisfy the condition of homogeneity of variance.
Several transformation techniques are available.
Data Transformation
Linear Transformation: An important assumption in using the regression model for forecasting is that the pattern of observations is linear.
Obviously, there are many situations in which this is not a valid assumption.
For example, if we were forecasting monthly sales and it was believed that those sales varied according to the season of the year, then the assumption of linearity would not hold.
Linear Transformation
A forecasting equation may be of the form:
$$Y = \alpha X^{\beta} e^{u}$$
The above could easily be transformed into a linear form for estimation purposes:
$$\log Y = \log \alpha + \beta \log X + u$$
Logarithmic Transformation
Southwest Airlines Operating Revenue between 1990 and 2006
       Operating Revenue    Logarithmic
Year   (Million Dollars)    Transformation
1990   1237                 3.09
1991   1379                 3.14
1992   1803                 3.26
……     ……                   ……
……     ……                   ……
2003   5937                 3.77
2004   6530                 3.81
2005   7584                 3.88
2006   9086                 3.96
Logarithmic Transformation
Figure 4.2 Actual and Logarithmically Transformed Operating Revenue for Southwest Airlines
[Figure: Operating Revenue (left axis) and Log of Operating Revenue (right axis) against Time, 1988–2008; series: Actual, Log]
Square Root Transformation
Southwest Airlines Operating Revenue for the Years 1990–2006
       Operating Revenue    Square Root
Year   (Million Dollars)    Transformation
1990   1,237                35.17
1991   1,379                37.13
……     ……                   ……
……     ……                   ……
2002   5,522                74.31
2003   5,937                77.05
2004   6,530                80.81
2005   7,584                87.09
2006   9,086                95.32
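Both transformations of the Southwest Airlines revenue series can be reproduced with the standard library. The sketch below assumes common (base-10) logarithms, which is what the tabulated values imply; the dictionary names are chosen only for this example.

```python
import math

# Operating revenue in million dollars, rows shown in the tables above.
revenue = {
    1990: 1237, 1991: 1379, 1992: 1803,
    2003: 5937, 2004: 6530, 2005: 7584, 2006: 9086,
}

# Base-10 logarithm and square root, rounded to two decimals as in the slides.
log_revenue = {year: round(math.log10(value), 2) for year, value in revenue.items()}
sqrt_revenue = {year: round(math.sqrt(value), 2) for year, value in revenue.items()}

for year in revenue:
    print(year, revenue[year], log_revenue[year], sqrt_revenue[year])
```

Both transformations compress the larger values more than the smaller ones, with the logarithm compressing far more strongly than the square root.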
Square Root Transformation (Scaled Square Root Data)
[Figure: Operating Revenue (left axis, 0 to 10,000) and Square Root (right axis, 0 to 120) against Time, 1988–2008]
Square Root Transformation (Unscaled Square Root Data)
[Figure: Actual and Transformed series on a common axis (0 to 10,000) over periods 1 to 17]
Classical Time Series Model
Secular Trend (T )
Seasonal Variation (S )
Cyclical Variation (C )
Random or Irregular Variation
Trend
Linear Trend
$$Y_t = a + bx$$
Non-linear Trend
$$Y_t = e^{a + bt}$$
$$Y_t = a + bT + cT^2$$
Trend
Computing the Linear Trend
The Freehand Method
The Semi-average Method
The Least Squares Method
Freehand Method
U.S. Private Fixed Investment for the Years 1995–2005
       Total Private Fixed Investment
Year   ($ Billion)
1995   1,112.9
1996   1,209.9
1997   1,317.8
1998   1,438.4
1999   1,558.8
2000   1,679.0
2001   1,646.1
2002   1,570.2
2003   1,649.8
2004   1,830.6
2005   2,036.2
Freehand Method
Since a linear trend by this method is simply an approximation of a straight line equation, we have to determine the intercept and the slope of the line.
$$Y_t = a + bx$$
Based on our data, we have:
$$\hat{Y}_t = 1{,}112.9 + 83.94x$$
Freehand Method
[Figure: Actual values and the freehand Trend Line; vertical axis: Millions of Dollars; horizontal axis: Time]
Freehand Method
Now we can use this equation to make a forecast of the trend. For example, the forecast for 2006 would be:
$$\hat{Y}_t = 1{,}112.9 + 83.94(12)$$
$$\hat{Y}_{2006} = 2{,}120.18 \text{ billion dollars}$$
Freehand Method
Based on your understanding, what are the pitfalls of using the freehand method?
Simple method but not objective. Why not objective?
The Semi-average Method
Simple but objective method in fitting a trend line.
Divides the data into two equal parts and computes the average for each part.
The computed averages for each part provide two points on a straight line.
The slope of the line is computed by taking the difference between the averages and dividing it by half of the total number of observations.
The Semi-average Method
Fitting a Straight Line by the Semi-Average Method to Income from the Export of Durable Goods, 1996–2005
Year   Income   Semi-total   Semi-average   Coded Time
1996   394.9                                −2
1997   466.2                                −1
1998   481.2    2,415.1      483.02          0
1999   503.6                                 1
2000   569.2                                 2
2001   522.2                                 3
2002   491.2                                 4
2003   499.8    2,679        535.8           5
2004   556.1                                 6
2005   609.7                                 7
The Semi-average Method
We see that the intercept of the line is: 483.02
The slope is:
$$b = \frac{535.80 - 483.02}{5} = 10.56$$
The fitted equation is:
$$\hat{Y}_t = 483.02 + 10.56x$$
The Semi-average Method
For the year 2005, the forecast revenue from export of durable goods is:
$$\hat{Y}_t = 483.02 + 10.56(7)$$
$$\hat{Y}_t = \$556.94 \text{ billion}$$
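The semi-average steps can be sketched in Python using the export income series from the table. The variable and function names are illustrative assumptions, and the slope is kept unrounded, so the 2005 value comes out as about 556.91 rather than the 556.94 obtained with the rounded slope of 10.56.

```python
years = list(range(1996, 2006))
income = [394.9, 466.2, 481.2, 503.6, 569.2, 522.2, 491.2, 499.8, 556.1, 609.7]

half = len(income) // 2

# Average of each half; each average sits at the middle year of its half.
first_average = sum(income[:half]) / half
second_average = sum(income[half:]) / half

# The two midpoints are `half` years apart, so that is the divisor for the slope.
slope = (second_average - first_average) / half

# Coded time x = 0 at the middle year of the first half (1998 here).
origin_year = years[half // 2]


def trend(year):
    """Trend value from the semi-average line for a given calendar year."""
    return first_average + slope * (year - origin_year)


print(round(first_average, 2), round(slope, 2), round(trend(2005), 2))
```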
The Least Squares Method
Provides the best method of fitting a trend.
The intercept and the slope are computed as follows (with time coded so that $\sum x = 0$):
$$a = \frac{\sum Y}{n}$$
$$b = \frac{\sum xY}{\sum x^2}$$
The Least Squares Method
Using the data from the previous example, we have:
$$a = \frac{5{,}094.1}{10} = 509.4$$
$$b = \frac{2{,}571.3}{330} = 7.79$$
The Least Squares Method
The fitted trend line equation is:
$$\hat{Y}_t = 509.4 + 7.79x$$
Note: Since x is measured in half years, we have to multiply the slope by two to get the change per full year.
x = 0 in 2000½ (midway between 2000 and 2001)
1 x = ½ year
Y = Billions of Dollars
The Least Squares Method
To compare the two methods, we note:
Least squares:
$$\hat{Y}_t = 509.4 + 15.58x$$
Semi-average:
$$\hat{Y}_t = 483.02 + 10.56x$$
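The least squares figures can be checked with a short Python sketch. It assumes the half-year coding described in the note (x runs −9, −7, …, 7, 9 so that the x values sum to zero); the variable names are chosen for the example only.

```python
income = [394.9, 466.2, 481.2, 503.6, 569.2, 522.2, 491.2, 499.8, 556.1, 609.7]
n = len(income)

# Half-year coding for an even number of years: -9, -7, ..., 7, 9.
x = [2 * i - (n - 1) for i in range(n)]

# With sum(x) == 0 the normal equations reduce to these two formulas.
a = sum(income) / n
b = sum(xi * yi for xi, yi in zip(x, income)) / sum(xi * xi for xi in x)

# b is the change per half year; double it for the change per year.
annual_slope = 2 * b

print(round(a, 1), round(b, 2), round(annual_slope, 2))
```

The annual slope of about 15.58 is the figure to compare with the semi-average slope of 10.56, since the latter is already expressed per year.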
Nonlinear Trend
In many business and economic environments we observe that the time series does not follow a constant rate of increase or decrease, but instead rises or falls at a changing rate.
Whenever there is a dramatic change in production technology, we expect the trend line not to follow a constant linear pattern.
Nonlinear Trend
A polynomial function often describes such business conditions well.
$$\hat{Y}_t = a + bx + cx^2$$
A second-degree parabola provides a good historical description of an increase or decrease per time period.
Nonlinear Trend
To solve for the constants a, b, and c in the previous equation, we use the following simultaneous equations:
$$\sum Y = na + c\sum x^2$$
$$\sum x^2 Y = a\sum x^2 + c\sum x^4$$
$$b = \frac{\sum xY}{\sum x^2}$$
Nonlinear Trend
World Carbon Emissions from Fossil Fuel Burning, 1982–1994
Year    Million tonnes
X       Y         x     xY         x²Y        x²     x⁴
1982    4,960    −6    −29,760    178,560     36    1,296
1983    4,947    −5    −24,735    123,675     25      625
1984    5,109    −4    −20,436     81,744     16      256
1985    5,282    −3    −15,846     47,538      9       81
1986    5,464    −2    −10,928     21,856      4       16
1987    5,584    −1     −5,584      5,584      1        1
1988    5,801     0          0          0      0        0
1989    5,912     1      5,912      5,912      1        1
1990    5,941     2     11,882     23,764      4       16
1991    6,026     3     18,078     54,234      9       81
1992    5,910     4     23,640     94,560     16      256
1993    5,893     5     29,465    147,325     25      625
1994    5,925     6     35,550    213,300     36    1,296
Total  72,754     0     17,238    998,052    182    4,550
Nonlinear Trend
The data from the table is used to compute the following:
$$\hat{Y}_t = 5{,}739.8 + 94.71x - 10.24x^2$$
x = 0 in 1988
1 x = one year
Y = million tonnes
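The three constants of the second-degree trend can be computed directly from the simultaneous equations. The following Python sketch uses the carbon emissions series with x coded from −6 to 6; the variable names are assumptions made for the example.

```python
emissions = [4960, 4947, 5109, 5282, 5464, 5584, 5801,
             5912, 5941, 6026, 5910, 5893, 5925]
n = len(emissions)

# Coded time with x = 0 in the middle year (1988): -6, -5, ..., 5, 6.
x = [i - n // 2 for i in range(n)]

sum_y = sum(emissions)
sum_xy = sum(xi * yi for xi, yi in zip(x, emissions))
sum_x2y = sum(xi ** 2 * yi for xi, yi in zip(x, emissions))
sum_x2 = sum(xi ** 2 for xi in x)
sum_x4 = sum(xi ** 4 for xi in x)

# b comes from its own equation because sum(x) and sum(x^3) are zero.
b = sum_xy / sum_x2

# Solve the remaining two equations for a and c:
#   sum_y   = n*a      + c*sum_x2
#   sum_x2y = a*sum_x2 + c*sum_x4
c = (n * sum_x2y - sum_x2 * sum_y) / (n * sum_x4 - sum_x2 ** 2)
a = (sum_y - c * sum_x2) / n

print(round(a, 1), round(b, 2), round(c, 2))
```

The negative coefficient on the squared term reflects the flattening of emissions toward the end of the period.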
Logarithmic Trend
When a series changes at a roughly constant percentage rate rather than by a constant amount, we fit a logarithmic trend line.
This is common when dealing with economic growth.
The logarithmic trend equation is:
$$\log Y_t = \log a + x \log b$$
Logarithmic Trend
The least squares trend is computed as:
$$\log a = \frac{\sum \log Y}{n}$$
$$\log b = \frac{\sum (x \log Y)}{\sum x^2}$$
Logarithmic Trend Example
        Chinese Exports
Year    ($100 Million)    log Y     x      x log Y    x²
1990      620.9           2.793    −15    −41.89      225
1991      719.1           2.857    −13    −37.13      169
1992      849.4           2.929    −11    −32.22      121
1993      917.4           2.963     −9    −26.66       81
1994    1,210.1           3.083     −7    −21.57       49
1995    1,487.8           3.173     −5    −15.86       25
1996    1,510.5           3.179     −3     −9.53        9
1997    1,827.9           3.262     −1     −3.26        1
1998    1,837.1           3.264      1      3.26        1
1999    1,949.3           3.290      3      9.86        9
2000    2,492.0           3.397      5     16.98       25
2001    2,661.0           3.425      7     23.97       49
2002    3,256.0           3.513      9     31.61       81
2003    4,382.28          3.642     11     40.05      121
2004    5,933.2           3.773     13     49.05      169
2005    7,619.5           3.882     15     58.22      225
Total                    52.42     0.00    44.89    1360.0
Logarithmic Trend Example (continued)
$$\log a = \frac{\sum \log Y}{n} = \frac{52.42}{16} = 3.28$$
$$\log b = \frac{\sum (x \log Y)}{\sum x^2} = \frac{44.89}{1360} = 0.033$$
Logarithmic Trend Example (continued)
The estimated trend line equation is:
$$\log \hat{Y}_t = 3.28 + 0.033x$$
x = 0 in 1997½
1 x = ½ year
Logarithmic Trend (continued)
Check the goodness of fit by substituting two data points, such as 1992 and 2003, into the fitted equation.
For 1992, we will have:
$$\log \hat{Y}_{1992} = 3.28 + 0.033(-11)$$
$$\log \hat{Y}_{1992} = 2.917$$
Logarithmic Trend (continued)
For 2003, we will have:
$$\log \hat{Y}_{2003} = 3.28 + 0.033(11)$$
$$\log \hat{Y}_{2003} = 3.643$$
Logarithmic Trend
Interpretation of the estimated trend line would be similar to a linear trend. However, before we can interpret the estimated values, we have to convert the log values into actual values of the data points.
This is done by taking the antilog.
Logarithmic Trend
The results are:
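$$\hat{Y}_{1992} = \text{antilog}(\log \hat{Y}_{1992}) = \text{antilog } 2.917$$
$$\hat{Y}_{1992} = 826.04$$
And
$$\hat{Y}_{2003} = \text{antilog}(\log \hat{Y}_{2003}) = \text{antilog } 3.643$$
$$\hat{Y}_{2003} = 4{,}395.42$$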
Logarithmic Trend
To determine the rate of change or the slope of the line we have:
R = antilog 0.033 = 1.079
Since the rate of change (r) was defined as R − 1, then
r = 1.079 − 1 = 0.079
r = 7.9 percent per half-year
Therefore the growth rate is 15.8%, or about 16%, per year.
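The whole logarithmic trend calculation can be sketched in Python from the Chinese exports series. The sketch assumes base-10 logarithms and the half-year coding of the table (x from −15 to 15); because it works with unrounded logarithms, its results agree with the slide values only to about two or three decimals. Variable names are illustrative.

```python
import math

# Chinese exports ($100 million), 1990-2005, copied from the table above.
exports = [620.9, 719.1, 849.4, 917.4, 1210.1, 1487.8, 1510.5, 1827.9,
           1837.1, 1949.3, 2492.0, 2661.0, 3256.0, 4382.28, 5933.2, 7619.5]
n = len(exports)

# Half-year coding for an even number of years: -15, -13, ..., 13, 15.
x = [2 * i - (n - 1) for i in range(n)]
log_y = [math.log10(value) for value in exports]

# Least squares on the logarithms (valid because sum(x) == 0).
log_a = sum(log_y) / n
log_b = sum(xi * li for xi, li in zip(x, log_y)) / sum(xi * xi for xi in x)

# Antilog of the slope gives the growth factor R; r = R - 1 is the rate
# of change per unit of x, which is half a year in this coding.
growth_factor = 10 ** log_b
rate_per_half_year = growth_factor - 1

print(round(log_a, 2), round(log_b, 3), round(rate_per_half_year, 3))
```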
Other Approaches to Trend Line
Two more sophisticated methods of determining whether there is a trend in the data:
Differencing
Autocorrelation (Box–Jenkins Methodology)
Differencing allows the analyst to see whether a linear equation, a second-degree polynomial, or a higher-degree equation should be used to determine a trend.
Differencing
First Difference
$$\Delta Y_t = Y_t - Y_{t-1}$$
Second Difference
$$\Delta^2 Y_t = \Delta Y_t - \Delta Y_{t-1}$$
Differencing Method Example
First and Second Difference of Hypothetical Data
Yt        First Difference    Second Difference
20,000
22,000    2,000
24,300    2,300               300
26,900    2,600               300
29,800    2,900               300
33,000    3,200               300
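The differencing table can be reproduced with a small helper. This Python sketch uses the hypothetical series from the slide; the function name `difference` is made up for the example.

```python
def difference(series):
    """Return the list of successive differences Y_t - Y_(t-1)."""
    return [current - previous for previous, current in zip(series, series[1:])]


y = [20_000, 22_000, 24_300, 26_900, 29_800, 33_000]

first_difference = difference(y)
second_difference = difference(first_difference)

print(first_difference)
print(second_difference)
```

Here the first differences keep growing while the second differences are constant, which is the pattern that points to a second-degree polynomial rather than a straight line.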
Seasonal Analysis
Seasonal variation is defined as a predictable and repetitive movement observed around a trend line within a period of 1 year or less.
There are several reasons for measuring seasonal variations.
When analyzing the data from a time series, it is important to be able to know how much of the variation in the data is due to the seasonal factors.
Seasonal Variation (Continued)
We may use seasonal variation patterns in making projections or forecasts of a short-term nature.
By eliminating the seasonal variation from a time series, we may discover the cyclical pattern of the time series.
Seasonal Variation
Computation of Ratio of Original Data to Moving Average
(1)            (2)            (3)         (4)        (5)
               Passenger      12-month               Ratio of Original
Year and       Enplanement    Moving      Moving     Data to Moving
Month          (Million)      Total       Average    Average, %
2001 Jan.      44.107
     Feb.      43.177
     March     53.055
     April     50.792
     May       51.120
     June      53.471
     July      55.803         560.359     46.70      119.50
     Aug.      56.405         554.809     46.23      122.00
     Sept.     30.546         550.277     45.86       66.61
     Oct.      40.291         545.723     45.48       88.60
Seasonal Variation
To compute a seasonal index, we do the following:
sum the modified means
Year   Jan.   Feb.   Mar.    Apr.    May     June    July    Aug.    Sep.   Oct.    Nov.   Dec.
2001                                                 119.5   122.0   66.6    88.6   90.4    91.5
2002   87.0   87.9   111.4   102.5   104.7   108.6   111.1   110.3   86.2   103.2   94.2   105.7
2003   91.3   86.6   104.7    97.7   101.5   107.4   114.7   110.8   90.3   101.2   94.4    99.1
2004   86.7   89.0   105.8   103.5   102.2   109.0   113.6   108.6   90.0   101.6   96.7    97.6
2005   88.6   86.6   108.0   100.4   104.8   109.0   114.0
Mean   87.4   87.1   106.2   100.2   102.8   108.3   112.9   109.9   88.8   102.0   95.1    96.1
Seasonal Variation
         Mean of Middle    Seasonal
Month    Three Ratios      Index
Jan.      87.42             87.68
Feb.      87.05             87.31
March    106.19            106.50
April    100.19            100.50
May      102.80            103.11
June     108.32            108.64
July     112.91            113.25
Aug.     109.90            110.23
Sep.      88.81             89.07
Oct.     102.00            102.31
Nov.      95.09             95.38
Dec.      96.07             96.36
Seasonal Variation
If production is full, we expect the index to equal 100 for each month. If not, we have to adjust it by computing a correction factor.
Compute the seasonal index.
$$\text{Correction Factor} = \frac{1200}{1197.97} = 1.002$$
Seasonal Variation
Partial Deseasonalized Data for Passenger Enplanement, 2001–2005
(1)             (2)           (3)         Deseasonalized
Year and        Passengers    Seasonal    Passenger Enplanement
Month           (Million)     Index       [col 2 ÷ col 3] × 100
2001 Jan.       44.107         87.68      50.306
     Feb.       43.177         87.31      49.452
     March      53.055        106.50      49.814
     April      50.792        100.50      50.539
…..             ..…….         ………         …….
2005 Sept.      50.776         89.07      57.005
     Oct.       53.971        102.31      52.754
     Nov.       52.962         95.38      55.530
     Dec.       53.007         96.36      55.008
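Deseasonalizing is a single division per observation. The Python sketch below uses three rows from the table above; the function name `deseasonalize` is an assumption for illustration, and its unrounded results can differ from the tabulated values in the third decimal.

```python
# (label, passengers in millions, seasonal index) copied from the table above.
rows = [
    ("2001 Jan.", 44.107, 87.68),
    ("2001 Feb.", 43.177, 87.31),
    ("2005 Dec.", 53.007, 96.36),
]


def deseasonalize(actual, seasonal_index):
    """Remove the seasonal effect: (actual / index) * 100."""
    return actual / seasonal_index * 100


deseasonalized = {
    label: deseasonalize(actual, index) for label, actual, index in rows
}
for label, value in deseasonalized.items():
    print(label, round(value, 3))
```

Months with an index below 100, such as January, are raised by the adjustment, while months with an index above 100 are lowered.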
Cyclical Variation
Similar to seasonal variation except that it occurs every 5 to 10 years.
There is a systematic pattern in the data that mirrors what is happening in the economy.
Movements from a recession to a depression or recession to recovery follow a cycle.
Every time series has a random component. If there were no random component, we would have perfect prediction of future values. However, this is not the case with real-world conditions.
The cyclical component is measured as a proportion of the trend.
Cyclical Variation
Citrus Received by the Cooperative during 1994–2006, and the Estimated Trend
        Boxes of Citrus
Year    (in 1,000)         Trend
X       Y                  Yt
1994     6.5                6.7
1995     7.2                7.1
1996     7.6                7.5
1997     8.4                7.9
1998     8.5                8.3
1999     8.0                8.7
2000     8.6                8.9
2001     8.9                8.7
2002     9.5                9.5
2003    10.2                9.9
2004    10.6               10.3
2005    10.8               10.7
2006    11.0               11.1
Cyclical Variation
[Figure: Cyclical Fluctuations around the Trend Line; vertical axis: Boxes (in thousands); horizontal axis: Time; series: Boxes of Citrus, Trend of Boxes of Citrus]
Cyclical Variation
Calculation of Percent of Trend
        Boxes of Citrus              Percent of Trend
Year    (in 1,000)         Trend     (Y/Yt × 100)
X       Y                  Yt
1994     6.5                6.7       97.0
1995     7.2                7.1      101.4
1996     7.6                7.5      101.3
1997     8.4                7.9      106.3
1998     8.5                8.3      102.4
1999     8.0                8.7       91.9
2000     8.6                8.9       96.6
2001     8.9                8.7      102.3
2002     9.5                9.5      100.0
2003    10.2                9.9      103.0
2004    10.6               10.3      102.9
2005    10.8               10.7      100.9
2006    11.0               11.1       99.1
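The percent-of-trend column can be recomputed with a short Python sketch from the actual and trend values in the table. The names are illustrative, and a value may differ from the table by 0.1 where the slide rounded differently.

```python
years = list(range(1994, 2007))
boxes = [6.5, 7.2, 7.6, 8.4, 8.5, 8.0, 8.6, 8.9, 9.5, 10.2, 10.6, 10.8, 11.0]
trend = [6.7, 7.1, 7.5, 7.9, 8.3, 8.7, 8.9, 8.7, 9.5, 9.9, 10.3, 10.7, 11.1]

# Cyclical component as a percentage of trend: (Y / Yt) * 100.
percent_of_trend = {
    year: round(actual / fitted * 100, 1)
    for year, actual, fitted in zip(years, boxes, trend)
}

for year, value in percent_of_trend.items():
    position = "above" if value > 100 else "at or below"
    print(year, value, position, "trend")
```

Values above 100 mark years in which receipts ran above the trend line, and values below 100 mark years in which they fell short of it.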
Cyclical Variation Example
[Figure: Percent Trend (vertical axis, 80 to 110) against Time, 1994–2006; series: Percent Trend, Linear (Percent Trend)]
Chapter Summary
Preliminary Adjustments to Data:
Trading Day Adjustment
Price Adjustment
Population Adjustment
Data Transformation
Patterns in Time Series Data
The Classical Decomposition Method