8/3/2019 Business Statistics Project-Final Version
http://slidepdf.com/reader/full/business-statistics-project-final-version 1/12
Business Statistics Project
Time series analysis
MSc Business Administration
Angelo Delle Piane
Ruggero Cardinaletti
Business Statistics Project
Time series: Austria, Residents.
Our project is based on analyzing the time series of Austrian tourism, focusing our attention on the
residents.
We would like to start by plotting some graphs that will help us understand how the series behaves, whether it has a seasonality and a trend, and whether it is better to convert the time series into its logarithm or not.
First of all, let’s take a look at the time series provided:
[Figure "Time series": monthly arrivals of residents from 1990M01 to 2010M09; x-axis Time, y-axis Arrivals (0 to 1,000,000).]
From the plot we can identify a seasonal pattern that recurs every year. From 1995 onwards we can also perceive a slight trend that increases over time. The first issue we face is whether transforming the series into its logarithm would help stabilize the time series itself, and therefore improve our model.
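The effect of the log transform can be sketched in a few lines. This is a hypothetical illustration on synthetic numbers, not our actual data set: when the seasonal swings grow with the level of the series, taking log10 makes their amplitude constant.

```python
import numpy as np

# Hypothetical illustration (synthetic numbers, not the real data):
# a monthly series whose seasonal swings grow with the level.
t = np.arange(240)                                   # 20 years of months
level = 300000 * np.exp(0.002 * t)                   # slowly rising trend
season = 1 + 0.4 * np.sin(2 * np.pi * t / 12)        # 12-month cycle
y = level * season

# On the raw scale the seasonal amplitude grows over time...
raw_first = y[:12].max() - y[:12].min()
raw_last = y[-12:].max() - y[-12:].min()

# ...but on the log10 scale it is constant, because
# log10(level * season) = log10(level) + log10(season).
ly = np.log10(y)
log_first = ly[:12].max() - ly[:12].min()
log_last = ly[-12:].max() - ly[-12:].min()
```

The same reasoning explains why our logged series sits in the 5,3 to 6,1 range: those are base-10 logarithms of arrival counts between roughly 200,000 and 1,000,000.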
[Figure "Time series transformed into LOG": log values from about 5,3 to 6,1; x-axis Months, Nov 1990 to Feb 2010; the red line marks the trend.]
Once we convert the time series, the amplitude of the seasonal fluctuations seems more stable, and the trend effect, marked with the red line, appears reduced. Plotting the annual growth (from 1991 onwards) we obtain confirmation that a positive trend exists.
[Figure "Annual Growth": values from -2,5 to 2,5; x-axis Months, 1990M01 to 2009M03.]
We do not know how to explain the drastic decrease in 1998, but we can make a hypothesis about the lack of growth over 2007/08, probably due to the financial crisis.
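The annual growth series can be reproduced, under the assumption that it is the 12-month difference of the log series, with a short sketch on synthetic data:

```python
import numpy as np

# Sketch on synthetic data: annual growth taken as the year-on-year change
# of the log10 series, 100 * (log10(y_t) - log10(y_{t-12})). The 12-month
# difference cancels a stable seasonal pattern, leaving only the trend.
t = np.arange(240)
y = 300000 * np.exp(0.002 * t) * (1 + 0.4 * np.sin(2 * np.pi * t / 12))
ly = np.log10(y)
annual_growth = 100 * (ly[12:] - ly[:-12])
# With a purely deterministic trend every year-on-year change is the same
# positive number, which is what "a positive trend exists" means here.
```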
Taking the transformed time series, we can plot the ACF to see how autocorrelated the series is.
[Figure "Autocorrelogram": autocorrelation from -1 to 1 over lags 0 to 24; x-axis Lag, y-axis Autocorrelation.]
Looking at the ACF, we can identify a clear lag-1 autocorrelation as well as a seasonality (as long as we do not take into account the series' trend, which increases over time); this gives us evidence that the time series can be forecasted effectively.
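The sample autocorrelations behind such a correlogram can be computed directly; a minimal sketch, where `acf` is our own helper rather than a library function:

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation r_k = sum (x_t - m)(x_{t+k} - m) / sum (x_t - m)^2."""
    x = np.asarray(x, float)
    xc = x - x.mean()
    denom = float(np.dot(xc, xc))
    return np.array([1.0] + [float(np.dot(xc[:-k], xc[k:])) / denom
                             for k in range(1, nlags + 1)])

# A series with a 12-month cycle shows high autocorrelation at lag 1 and
# spikes at the seasonal lags 12 and 24, as in the correlogram above.
t = np.arange(240)
r = acf(np.sin(2 * np.pi * t / 12), 24)
```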
Modelling the time series
We need to find the best model to forecast the monthly arrivals in 2011 in our time series. To do
this, we can predict the 2010 values by using different models and compare the results with the
original data of 2010 at hand. Once we have found the best solution, we shall use it to forecast the
following year arrivals.
The first model we decide to fit to our time series is a time series regression with a polynomial trend and deterministic seasonality. We start by extending our data set with a constant equal to 1, seasonal dummy variables for each month, and t (the month number) together with its polynomial powers. To avoid estimating the model by constrained least squares, we reparameterize it in two ways: by dropping the intercept or, alternatively, by dropping one of the seasonal dummies. We use the first reparameterization to create our models.
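The no-intercept parameterization can be sketched as follows; the coefficients are synthetic and we use plain NumPy least squares rather than the package we actually worked with:

```python
import numpy as np

# Design matrix: 12 monthly dummies (no constant column, so there is no
# collinearity) plus a linear trend t; higher powers of t would be added
# as extra columns for the polynomial versions.
n = 240
t = np.arange(1, n + 1)
D = np.eye(12)[(t - 1) % 12]            # one dummy column per month
X = np.column_stack([D, t])

# Synthetic "true" coefficients, purely for illustration.
beta_true = np.concatenate([np.linspace(5.4, 5.9, 12), [0.001]])
y = X @ beta_true

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
# Each dummy coefficient is that month's level; subtracting their mean
# gives the seasonal effect we plot below.
seasonal_effect = beta_hat[:12] - beta_hat[:12].mean()
```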
Regression with no intercept
The regression tells us that the fit of the model is quite good, since the adjusted R-squared is around 0,99558, very close to one, and the standard error is very small (0,02479).
With the data we retrieve from the regression we calculate the seasonal effect, and plot the graph of
the seasonal pattern:
[Figure "Seasonal pattern": seasonal effect from -0,15 to 0,15 over months 1 to 12.]
Looking at this pattern, we can see a high season peaking in August and a low season in December. Since our time series refers to the tourist arrivals of Austrian residents, we deduce that they prefer to travel around their own country in the summertime rather than during winter, probably due to the harsh weather conditions.
[Figure "Prediction with NO intercept": log austria and PredictionNo.int over observations 1 to 244; y-axis from 5,2 to 6,1.]
We see that the model has some problems predicting the series, but it still behaves quite well.
To improve the model, let us see what happens if we include polynomial trends.
To be brief, we will compare the three polynomial regressions we have created, identifying the best goodness of fit by analyzing the AIC and SIC values, the adjusted R-squared and the standard error.
                    Quadratic (t²)   Cubic (t³)   Fourth degree (t⁴)
AIC                 0,065716         0,061996     0,058274
SIC                 0,000714         0,000722     0,000727
Adjusted R-squared  0,995558192      0,995539     0,995519
Standard error      0,023476152      0,02338      0,023243
At first sight we would choose the model with the lowest AIC; in this case we should pick the fourth-degree regression. However, to be certain, we consult the SIC, a criterion that penalizes extra parameters more heavily, and it suggests a different solution: the quadratic regression model (which is also the one with the highest adjusted R-squared).
Let’s plot the model to see how it is able to forecast the series for the year 2010.
[Figure "Prediction with t^2": Ypredict-t² and Austria LOG from Jan 1990 to Sep 2010; y-axis from 5,3 to 6,1.]
We can observe that the forecast has improved on the previous model we plotted, but we can still find discrepancies, mainly during the first and last years of the time series. We would like to forecast the series with other models in order to reduce these errors and achieve a better prediction.
ARMA model
Before applying any ARMA or ARIMA model, we first analyze the ACF and PACF and calculate the Ljung-Box and Box-Pierce tests to check whether the data could be assumed to be white noise.
Statistic    Lag   Value      p-value
Box-Pierce   6     542.747    < 0,0001
Ljung-Box    6     553.109    < 0,0001
Box-Pierce   12    1158.448   < 0,0001
Ljung-Box    12    1200.443   < 0,0001
From the results we see that both tests agree that the data cannot be assumed to be generated by a white noise process, since both the Ljung-Box and Box-Pierce statistics have very small p-values.
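The two statistics are simple enough to compute by hand; a sketch with our own helper on a synthetic series, not the output of our software:

```python
import numpy as np

def portmanteau(x, m):
    """Box-Pierce Q = n * sum r_k^2 and Ljung-Box Q' = n(n+2) * sum r_k^2 / (n-k)."""
    x = np.asarray(x, float)
    n = len(x)
    xc = x - x.mean()
    denom = float(np.dot(xc, xc))
    r = np.array([float(np.dot(xc[:-k], xc[k:])) / denom for k in range(1, m + 1)])
    q_bp = n * np.sum(r ** 2)
    q_lb = n * (n + 2) * np.sum(r ** 2 / (n - np.arange(1, m + 1)))
    return q_bp, q_lb

# A strongly autocorrelated series (here a synthetic random walk with the
# same length as our sample, 228 observations) gives statistics far above
# the chi-squared(6) 5% critical value of about 12.59, so the white-noise
# hypothesis is rejected, as it is for our data.
rng = np.random.default_rng(3)
q_bp, q_lb = portmanteau(np.cumsum(rng.normal(size=228)), 6)
```

Note that the Ljung-Box statistic is always slightly larger than the Box-Pierce one, since each term is inflated by (n+2)/(n-k); this matches the ordering of the values in our table.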
In order to obtain a more accurate model, we try different ARMA models, varying the p and q parameters from (1,1) to (3,3). From the results, we will choose the one with the lowest AIC or AICC (the AIC corrected for finite sample sizes).
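The AICC reported by our software presumably follows the standard small-sample correction; a sketch (k is the number of estimated parameters, n the sample size):

```python
def aicc(aic, n, k):
    # AICc = AIC + 2k(k+1)/(n - k - 1); the extra penalty on parameter
    # count vanishes as the sample size n grows.
    return aic + 2 * k * (k + 1) / (n - k - 1)
```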
p   q   P   Q   AICC
1   1   0   0   -608,943
1   2   0   0   -603,749
1   3   0   0   -626,290
2   1   0   0   -607,462
2   2   0   0   -603,694
2   3   0   0   -599,156
3   1   0   0   -603,676
3   2   0   0   -624,412
3   3   0   0   -623,899
As we can see, the model we choose is the ARMA(1,3), since it has the lowest AICC.
Let us see how this model works with our time series and how effectively it can forecast the 2010 values.
[Figure "ARMA (1,3)": LOG Austria, ARIMA fit, validation, prediction and 95% bounds; y-axis from 5,3 to 6.]
It is clear that the ARMA model, which does not take the seasonal effect into account, tends to make a prediction that heads back to the mean (the value 5,676). However, since our time series has seasonality, we should use a model that considers it, such as ARIMA.
ARIMA model and AIRLINE
In the ARIMA models we must consider the parameters p, d and q (and their seasonal counterparts P, D and Q). To begin, we look for the best parameters in a range from (p=0, d=1, q=1, P=0, D=1, Q=1) to (p=2, d=1, q=2, P=1, D=1, Q=1):
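The role of d=1 and D=1 can be checked with a quick sketch: applying one regular and one seasonal difference, (1-B)(1-B^12), to a synthetic trend-plus-seasonal series removes both components.

```python
import numpy as np

# Synthetic log-like series: linear trend plus a fixed 12-month cycle.
t = np.arange(240)
y = 5.5 + 0.001 * t + 0.1 * np.sin(2 * np.pi * t / 12)

d1 = np.diff(y)           # (1 - B) y_t    : removes the linear trend
d12 = d1[12:] - d1[:-12]  # (1 - B^12) ... : removes the seasonal cycle
# What remains is numerically zero: the double difference has taken out
# both trend and seasonality, leaving the MA terms to model the noise.
```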
p   q   P   Q   AICC
0   1   0   1   -1067,410
0   1   1   1   -1063,640
0   2   0   1   -1062,627
0   2   1   1   -1058,948
1   1   0   1   -1062,526
1   1   1   1   -1058,811
1   2   0   1   -1057,313
1   2   1   1   -1053,537
2   1   0   1   -1058,208
2   1   1   1   -1054,955
2   2   0   1   -1052,804
2   2   1   1   -1049,574
The results tell us that the best model is the ARIMA(0,1,1)(0,1,1)12, with both p and P equal to zero, d=D=1 and q=Q=1. This model is known as the AIRLINE model, and it seems to be the most precise one. Why?
Goodness of fit statistics:

Observations              228,000
SSE                       0,019
MAPE (Diff)               108,693
MAPE                      0,112
WN variance               0,000
WN variance (estimate)    0,000
-2 Log(Like.)             -1083,522
FPE                       0,000
AIC                       -1077,522
AICC                      -1077,408
SBC                       -1067,410
MSE (mean square error)   0,0000814507
[Figure "ARIMA (0,1,1) = AIRLINE": LOG Austria, fit, validation, prediction, 95% bounds and year 2010 original data; y-axis from 5,3 to 6,1.]
The AIRLINE model provides the lowest AIC observed and, most importantly, the lowest MSE value (obtained by dividing the SSE by the degrees of freedom), which explains the goodness of fit of the model itself; the graph points to the same conclusion.
In fact, comparing the prediction for the year 2010 (yellow line) with the original data for the same year (crossed green line), we realize that this is the best forecast for 2010 we have made so far. Nevertheless, we want to let this forecasting method compete with another model: the Holt-Winters one.
Seasonal Holt-Winters model
The Holt-Winters model can be built with either multiplicative or additive seasonality; in our case we have chosen the former.
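A minimal sketch of the multiplicative Holt-Winters recursions follows; this is our own simplified implementation with assumed smoothing parameters, not the estimator used by the package:

```python
import numpy as np

def holt_winters_mult(y, m=12, alpha=0.3, beta=0.05, gamma=0.1):
    """One-step-ahead fitted values from multiplicative Holt-Winters updates."""
    y = np.asarray(y, float)
    level = y[:m].mean()
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m    # crude initial slope
    season = y[:m] / level                            # initial seasonal indices
    fitted = []
    for i in range(m, len(y)):
        s = season[i % m]
        fitted.append((level + trend) * s)            # forecast before seeing y[i]
        new_level = alpha * y[i] / s + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[i % m] = gamma * y[i] / new_level + (1 - gamma) * s
        level = new_level
    return np.array(fitted)

# On a smooth synthetic series the one-step forecasts track the data closely.
t = np.arange(240)
y = (5.5 + 0.001 * t) * (1 + 0.05 * np.sin(2 * np.pi * t / 12))
fitted = holt_winters_mult(y)
```

The multiplicative form divides and multiplies by the seasonal index, which is why it suits series whose seasonal amplitude scales with the level.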
[Figure "Holt-Winters / Seasonal multiplicative (LOG Austria)": fit, validation, prediction, 95% bounds and months of 2010; y-axis from 5,3 to 6,2.]
The plot of this model does not differ much from the previous one; however, to be certain of the results, we must look at the summary statistics:

Statistic   Value
DF          212,000
SSE         0,085
MSE         0,0004007071
RMSE        0,020
MAPE        0,254
MPE         -0,019
MAE         0,014
R²          0,969
The R², at first glance, confirms that the results are quite acceptable, but to compare the two models we have to use the same measure, so we must look at the MSE for the Holt-Winters as well.
Seasonal Holt-Winters MSE 0,0004007071 > AIRLINE MSE 0,0000814507

According to the MSE comparison, the AIRLINE model was able to forecast the time series for the year 2010 (given all the previous data) in the most reliable way. This is why we chose to apply it also for forecasting what will happen in 2011.
Forecasting 2011 arrivals of residents with AIRLINE
The prediction for 2011 is performed with the best model since, this time, we cannot compare it with the original data (future values). Furthermore, since our data end in November 2010, we produce a 12-step-ahead forecast up to November 2011.
[Figure "AIRLINE for year 2011": LOG Austria, fit, validation, prediction and 95% bounds; y-axis from 5,3 to 6,1.]
Log(Austria) predicted
Dec-10   5,790
Jan-11   5,819
Feb-11   5,807
Mar-11   5,852
Apr-11   5,795
May-11   5,883
Jun-11   5,901
Jul-11   5,928
Aug-11   5,984
Sep-11   5,925
Oct-11   5,886
Nov-11   5,806
These are the values produced by the AIRLINE forecast (highlighted in green in the original chart), and they reflect both the seasonal pattern and the increasing trend observed throughout the whole time series.
Conclusion
After analyzing a series of different models, we have come to the conclusion that the best model for our time series is the ARIMA(0,1,1), known as the AIRLINE model, because its summary statistics present a lower mean square error together with a lower AIC value. Furthermore, compared with the other models, it produced the most precise fit by visual inspection.
In fact:
- The polynomial regression model elaborates the forecast well, but has some flaws, especially in correcting itself step by step and adapting to changes.
- The ARMA combinations take into account neither the seasonal pattern nor the trend of the time series, since the model tends to revert to the mean in the forecast.
- The seasonal Holt-Winters model produces a more accurate forecast than the previous ones; yet, looking at the statistics, it is overtaken by the AIRLINE model, even though, in our opinion, we could have obtained a fairly good prediction with this model too.