Business Statistics Project  Time series analysis MsC. Business Administration Angelo Delle Piane Ruggero Cardinaletti

Business Statistics Project-Final Version


8/3/2019 Business Statistics Project-Final Version

http://slidepdf.com/reader/full/business-statistics-project-final-version 1/12

Business Statistics Project

 Time series analysis

MSc Business Administration

Angelo Delle Piane

Ruggero Cardinaletti


Business Statistics Project

 Time series: Austria, Residents.

Our project analyzes the time series of Austrian tourism, focusing on the arrivals of residents.

We start by plotting some graphs that help us understand how the series behaves, whether it has seasonality and a trend, and whether it is better to convert the time series into its logarithm.

First of all, let’s take a look at the time series provided:

[Figure: Time series. x-axis: Time (1990M01 to 2010M09); y-axis: Arrivals (0 to 1000000).]

From the plot we can identify a seasonal pattern that recurs every year. From 1995 onwards we can also perceive a slight upward trend.

The first question we face is whether transforming the series into its logarithm would help stabilize it and therefore improve our model.
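The effect of the transformation can be illustrated with a minimal sketch. The data below are hypothetical (a level growing 3% per year multiplied by a fixed seasonal factor), not the Austrian series, and the base-10 logarithm is an assumption suggested by the plotted scale rather than something the report states.

```python
import math

# Hypothetical monthly series: the seasonal swing is proportional to the level,
# so it widens in raw units as the level grows.
seasonal_factor = [0.6, 0.7, 0.8, 0.9, 1.1, 1.3, 1.5, 1.6, 1.2, 0.9, 0.7, 0.7]
arrivals = [400_000 * 1.03 ** (t // 12) * seasonal_factor[t % 12] for t in range(120)]

# Assumed base-10 logarithm of the series.
log_arrivals = [math.log10(v) for v in arrivals]


def yearly_range(series, year):
    """Max minus min over the 12 months of the given year (0-based)."""
    window = series[12 * year: 12 * (year + 1)]
    return max(window) - min(window)


raw_first, raw_last = yearly_range(arrivals, 0), yearly_range(arrivals, 9)
log_first, log_last = yearly_range(log_arrivals, 0), yearly_range(log_arrivals, 9)

print(f"raw seasonal range: {raw_first:.0f} -> {raw_last:.0f}")
print(f"log seasonal range: {log_first:.4f} -> {log_last:.4f}")
```

In this toy example the raw seasonal range widens over the years while the range of the logged series stays constant, which is the kind of stabilization the transformation is meant to achieve.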


[Figure: Time series transformed into LOG. Axis labels: TIME, Months (Nov 1990 to Feb 2010); vertical scale 5,3 to 6,1.]

Once we convert the time series, the amplitude of the seasonal fluctuations seems more stable, and the trend effect, marked by the red line, appears reduced. Plotting the annual growth (from 1991 onwards) confirms that a positive trend exists.

[Figure: Annual Growth. x-axis: Months (1990M01 to 2009M03); vertical scale -2,5 to 2,5.]

We cannot explain the drastic decrease in 1998, but we can hypothesize that the lack of growth over 2007/08 is probably due to the financial crisis.

Using the transformed time series, we plot the ACF to see how autocorrelated the series is.


[Figure: Autocorrelogram. x-axis: Lag (0 to 24); y-axis: Autocorrelation (-1 to 1).]

Looking at the ACF, we can identify a clear lag-1 autocorrelation as well as seasonality (as long as we do not take into account the series’ trend, which increases over time); this gives us evidence that the time series can be forecast effectively.
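For reference, the sample autocorrelation behind an autocorrelogram can be computed as in the sketch below. The input is a hypothetical 12-period sine wave standing in for a seasonal monthly series; the function name `acf` is ours, not that of the software used in the report.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag (biased estimator, divisor n)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


# Hypothetical seasonal series: ten years of a pure 12-month cycle.
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
r = acf(series, 24)

for lag in (1, 6, 12, 24):
    print(f"lag {lag:2d}: {r[lag]: .3f}")
```

A seasonal series shows up as peaks at lags 12 and 24 and troughs half a cycle away, which is the pattern to look for in the autocorrelogram.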

Modelling the time series

We need to find the best model to forecast the monthly arrivals in 2011. To do this, we predict the 2010 values with different models and compare the results with the original 2010 data at hand. Once we have found the best solution, we shall use it to forecast the following year’s arrivals.
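The validation scheme described here (hold back the last year, forecast it, compare with the actual values) can be sketched as follows. The series and the seasonal-naive baseline are hypothetical illustrations, not the report's data or models.

```python
import math


def split_holdout(series, horizon=12):
    """Return (training part, last `horizon` observations)."""
    return series[:-horizon], series[-horizon:]


def mse(actual, predicted):
    """Mean squared forecast error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def seasonal_naive(train, horizon=12, period=12):
    """Baseline forecast: repeat the last observed seasonal cycle."""
    return [train[len(train) - period + (h % period)] for h in range(horizon)]


# Hypothetical log series: seasonal cycle plus a small linear trend.
series = [5.6 + 0.1 * math.sin(2 * math.pi * t / 12) + 0.0005 * t for t in range(48)]

train, test = split_holdout(series)
forecast = seasonal_naive(train)

holdout_mse = mse(test, forecast)
holdout_mape = mape(test, forecast)
print(f"hold-out MSE:  {holdout_mse:.8f}")
print(f"hold-out MAPE: {holdout_mape:.4f} %")
```

Any candidate model can be dropped in place of `seasonal_naive`, so that all models are scored on the same held-out year.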

The first model we fit is a time series regression with polynomial trend and deterministic seasonality. We start by extending our data set with a constant equal to 1 and a seasonal dummy variable for each month, along with t (the month index) and its powers. A constant together with all twelve dummies would be perfectly collinear, so to avoid estimating the model by constrained least squares we can reparametrize it in two ways: by dropping the intercept or, alternatively, by dropping one of the seasonal dummies. We use the first reparameterization to create our models.
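A minimal sketch of the no-intercept parameterization follows: twelve monthly dummies plus polynomial trend terms, estimated by ordinary least squares through the normal equations. The function names, the synthetic data and the hand-written solver are all our assumptions; the report itself does not show how its regression was computed.

```python
def design_row(t, degree):
    """Regressors for 0-based month index t: 12 dummies, then (t+1)**1 .. (t+1)**degree."""
    dummies = [1.0 if t % 12 == m else 0.0 for m in range(12)]
    trend = [float(t + 1) ** p for p in range(1, degree + 1)]
    return dummies + trend


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: a fixed monthly level plus a linear trend of 0.001 per month.
monthly_level = [5.60, 5.58, 5.62, 5.61, 5.66, 5.70, 5.74, 5.78, 5.69, 5.63, 5.57, 5.55]
y = [monthly_level[t % 12] + 0.001 * (t + 1) for t in range(48)]

coef = ols([design_row(t, degree=1) for t in range(48)], y)
print("August dummy:", round(coef[7], 4), " trend slope:", round(coef[12], 6))
```

Because there is no separate constant column, each dummy coefficient is directly the level of its month, and the twelve dummies never sum to an existing column, so the normal equations remain solvable.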

Regression with no intercept

The regression tells us that the fit of the model is quite good, since the adjusted R-squared is around 0,99558, very close to one, and the standard error is very small (0,02479).

With the coefficients from the regression we calculate the seasonal effects and plot the seasonal pattern:


[Figure: Seasonal pattern. x-axis: Months (1 to 12); vertical scale -0,15 to 0,15.]

From this pattern we can see that the high season peaks in August and the low season falls in December. Since our time series refers to the tourist arrivals of Austrian residents, we deduce that they prefer to travel around their own country in summer rather than in winter, probably because of the harsh weather conditions.
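One common way to turn the monthly dummy coefficients of a no-intercept regression into seasonal effects is to subtract their average, so that the effects sum to zero. The sketch below assumes that convention and uses hypothetical coefficients; the report does not state the exact calculation it used.

```python
# Hypothetical dummy coefficients (January .. December) from a no-intercept fit.
month_coef = [5.60, 5.58, 5.62, 5.61, 5.66, 5.70, 5.74, 5.78, 5.69, 5.63, 5.57, 5.55]

# Centre the coefficients: each effect is the month's deviation from the average level.
mean_level = sum(month_coef) / len(month_coef)
seasonal_effect = [c - mean_level for c in month_coef]

# Months are reported 1-based, as on the seasonal-pattern plot.
peak_month = seasonal_effect.index(max(seasonal_effect)) + 1
trough_month = seasonal_effect.index(min(seasonal_effect)) + 1

for month, effect in enumerate(seasonal_effect, start=1):
    print(f"month {month:2d}: {effect:+.4f}")
print("peak month:", peak_month, " trough month:", trough_month)
```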

[Figure: Prediction with NO intercept. Series: log austria, PredictionNo.int; x-axis: observation number (1 to 244); vertical scale 5,2 to 6,1.]

We see that the model has some problems predicting the series but still behaves quite well.

To improve the model, let’s see what happens if we include polynomial trends.


To be brief, we compare the three polynomial regressions we have created and identify the one with the best goodness of fit by analyzing the AIC and SIC values, the adjusted R-squared and the standard error.

                     Quadratic regression (t²)   Cubic regression (t³)   Fourth-degree regression (t⁴)
AIC                  0,065716                    0,061996                0,058274
SIC                  0,000714                    0,000722                0,000727
Adjusted R-squared   0,995558192                 0,995539                0,995519
Standard error       0,023476152                 0,02338                 0,023243

At first sight we would choose the model with the lowest AIC, which in this case is the fourth-degree regression. However, to be certain, we also calculate the SIC, which penalizes additional parameters more heavily; it suggests a different solution, the quadratic regression model (which is also the one with the highest adjusted R-squared).
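The trade-off the SIC makes can be sketched numerically. The form used below, SIC = n^(k/n) · SSE/n, is an assumption on our part, as are n = 240 estimation months (1990 to 2009) and the parameter counts (12 dummies plus the trend terms); none of these is stated in the report. Under these assumptions the results land close to the SIC row of the table, but this should be read as a plausibility check, not as the authors' calculation.

```python
N_OBS = 240  # assumed: monthly observations used for estimation (1990-2009)


def sic(std_error, k, n=N_OBS):
    """Schwarz criterion in the assumed form n**(k/n) * SSE/n.

    SSE is recovered from the regression standard error s via s**2 = SSE / (n - k).
    """
    sse = std_error ** 2 * (n - k)
    return n ** (k / n) * sse / n


# (standard error from the table, assumed number of estimated coefficients)
models = {
    "quadratic": (0.023476152, 14),
    "cubic": (0.02338, 15),
    "fourth-degree": (0.023243, 16),
}

sic_values = {name: sic(se, k) for name, (se, k) in models.items()}
best = min(sic_values, key=sic_values.get)

for name, value in sic_values.items():
    print(f"{name:14s} SIC = {value:.6f}")
print("lowest SIC:", best)
```

The standard error falls slightly with each added trend term, but the penalty factor n^(k/n) grows faster, which is why the smallest model comes out ahead on this criterion.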

Let’s plot the model to see how well it forecasts the series for the year 2010.

[Figure: Prediction with t^2. Series: Ypredict-t^2, Austria LOG; x-axis: Jan 1990 to Sep 2010; vertical scale 5,3 to 6,1.]

We can observe that the forecast has improved over the previous model, but we still find discrepancies, mainly during the first and last years of the time series. We would like to forecast the series with other models in order to reduce these errors and achieve a better prediction.

ARMA model


Before applying any ARMA or ARIMA model, we analyzed the ACF and PACF and computed the Ljung-Box and Box-Pierce tests to check whether the data could be assumed to be white noise.

Statistic    Lag   Value      p-value
Box-Pierce   6     542.747    < 0,0001
Ljung-Box    6     553.109    < 0,0001
Box-Pierce   12    1158.448   < 0,0001
Ljung-Box    12    1200.443   < 0,0001

From the results we can see that both tests agree: the data cannot be assumed to be generated by a white noise process, since both the Ljung-Box and the Box-Pierce statistics have very small p-values.
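Both portmanteau statistics are simple functions of the sample autocorrelations: Box-Pierce is n·Σ r_k², and Ljung-Box is n(n+2)·Σ r_k²/(n−k), summed over lags 1 to h. The sketch below applies them to a hypothetical seasonal series, not to the Austrian data; the function names are ours.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def box_pierce(series, h):
    """Q_BP = n * sum_{k=1..h} r_k^2."""
    n = len(series)
    r = acf(series, h)
    return n * sum(r[k] ** 2 for k in range(1, h + 1))


def ljung_box(series, h):
    """Q_LB = n (n + 2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(series)
    r = acf(series, h)
    return n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, h + 1))


# 95% critical values of the chi-square distribution with 6 and 12 degrees of freedom.
CHI2_95 = {6: 12.592, 12: 21.026}

# Hypothetical strongly seasonal series.
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]

q_bp = box_pierce(series, 6)
q_lb = ljung_box(series, 6)
print(f"Box-Pierce(6) = {q_bp:.1f}, Ljung-Box(6) = {q_lb:.1f}, critical value = {CHI2_95[6]}")
```

A statistic far above the critical value rejects the white-noise hypothesis. The Ljung-Box value always exceeds the Box-Pierce value for the same data, as it does in the table.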

To find a more accurate model, we tried different ARMA models, with the p and q parameters ranging from (1,1) to (3,3). From the results we choose the one with the lowest AIC or AICC (the AIC corrected for finite sample sizes).

p   q   P   Q   AICC
1   1   0   0   -608,943
1   2   0   0   -603,749
1   3   0   0   -626,290
2   1   0   0   -607,462
2   2   0   0   -603,694
2   3   0   0   -599,156
3   1   0   0   -603,676
3   2   0   0   -624,412
3   3   0   0   -623,899

As we can see, we choose the ARMA (1,3) model, since it has the lowest AICC.

Let’s see how this model works with our time series and how effectively it can forecast the 2010 values.
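The selection step itself is a minimum search over the grid. A short sketch, using the AICC values reported in the ARMA table (decimal commas written as points):

```python
# AICC by (p, q), copied from the ARMA table in the report.
aicc = {
    (1, 1): -608.943, (1, 2): -603.749, (1, 3): -626.290,
    (2, 1): -607.462, (2, 2): -603.694, (2, 3): -599.156,
    (3, 1): -603.676, (3, 2): -624.412, (3, 3): -623.899,
}

# The preferred order is the one with the smallest (most negative) AICC.
best_order = min(aicc, key=aicc.get)
ranked = sorted(aicc, key=aicc.get)

print("best (p, q):", best_order)
print("runner-up:  ", ranked[1], "gap:", round(aicc[ranked[1]] - aicc[best_order], 3))
```

The gap to the runner-up is under two AICC units, so the ranking between the top candidates is not overwhelming; this is worth keeping in mind when reading the table.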


[Figure: ARMA (1,3). Series: LOG Austria, ARIMA (LOG Austria), Validation, Prediction, Lower bound (95%), Upper bound (95%); x-axis: TIME; y-axis: LOG Austria (5,3 to 6).]

It’s clear that the ARMA model, which doesn’t take the seasonal effect into account, produces a prediction that heads back to the mean (the value 5,676).

However, since our time series has seasonality, we should use a model that accounts for it, such as a seasonal ARIMA.
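The reversion to the mean is easiest to see in the simplest stationary case, an AR(1), whose h-step forecast is mean + phi^h · (last value − mean). The sketch below uses that simplified model with a hypothetical coefficient and starting value; only the mean 5,676 comes from the report, and the fitted ARMA (1,3) has additional moving-average terms that affect the first three steps.

```python
def ar1_forecast(last_value, mean, phi, horizon):
    """h-step-ahead forecasts of a stationary AR(1): mean + phi**h * (last_value - mean)."""
    return [mean + phi ** h * (last_value - mean) for h in range(1, horizon + 1)]


MEAN = 5.676  # series mean reported in the text

# Hypothetical last observation and AR coefficient, for illustration only.
path = ar1_forecast(last_value=5.90, mean=MEAN, phi=0.8, horizon=24)

for h in (1, 6, 12, 24):
    print(f"h = {h:2d}: forecast = {path[h - 1]:.4f}")
```

Whatever the starting point, the forecasts decay geometrically towards the mean and carry no seasonal pattern, which matches the flat prediction seen in the plot.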

ARIMA model and AIRLINE

In the ARIMA models we should consider the parameters p, d and q, together with their seasonal counterparts P, D and Q. To begin, we look for the best parameters in a range from (p=0,d=1,q=1,P=0,D=1,Q=1) to (p=2,d=1,q=2,P=1,D=1,Q=1):

p   q   P   Q   AICC
0   1   0   1   -1067,410
0   1   1   1   -1063,640
0   2   0   1   -1062,627
0   2   1   1   -1058,948
1   1   0   1   -1062,526
1   1   1   1   -1058,811
1   2   0   1   -1057,313
1   2   1   1   -1053,537
2   1   0   1   -1058,208
2   1   1   1   -1054,955
2   2   0   1   -1052,804
2   2   1   1   -1049,574

The results tell us that the best model is the ARIMA (0,1,1) with seasonal part (0,1,1), that is, p=P=0, d=D=1 and q=Q=1. This model is known as the AIRLINE model, and it seems to be the most precise one. Why?

Goodness of fit statistics:

Observations              228,000
SSE                       0,019
MAPE(Diff)                108,693
MAPE                      0,112
WN Variance               0,000
WN Variance (estimate)    0,000
-2Log(Like.)              -1083,522
FPE                       0,000
AIC                       -1077,522
AICC                      -1077,408
SBC                       -1067,410
MSE (mean square error)   0,0000814507
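With d = D = 1, the airline model is fitted to the series after one seasonal and one regular difference, that is, after applying (1 − B)(1 − B^12). A minimal sketch of that step on a hypothetical series (linear trend plus a fixed monthly pattern) follows; the function names and data are ours, and the period of 12 is assumed from the monthly frequency.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both values exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def airline_difference(series, period=12):
    """Apply (1 - B)(1 - B**period): a seasonal difference followed by a regular one."""
    return difference(difference(series, period), 1)


# Hypothetical monthly log series with 228 points: trend plus fixed seasonal pattern.
pattern = [-0.08, -0.10, -0.06, -0.07, -0.02, 0.02, 0.06, 0.10, 0.01, -0.05, -0.11, -0.13]
n_obs = 228
series = [5.5 + 0.001 * t + pattern[t % 12] for t in range(n_obs)]

stationary = airline_difference(series)

print("observations before:", len(series))
print("observations after: ", len(stationary))
print("largest remaining value:", max(abs(v) for v in stationary))
```

The double differencing removes both the trend and the fixed seasonal pattern, at the cost of the first 13 observations: 228 values leave 215.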

[Figure: ARIMA (0,1,1) = AIRLINE. Series: LOG Austria, ARIMA (LOG Austria), Validation, Prediction, Lower bound (95%), Upper bound (95%), Year 2010 original data; x-axis: TIME; y-axis: LOG Austria (5,3 to 6,1).]

The AIRLINE model gives us the lowest AIC observed and, most importantly, the lowest MSE (obtained by dividing the SSE by the degrees of freedom), both of which point to the goodness of fit of the model; the graph supports the same conclusion.
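The three information criteria in the goodness-of-fit statistics can be reproduced from the reported −2Log(Like.) with the textbook formulas AIC = −2logL + 2k, AICC = AIC + 2k(k+1)/(n−k−1) and SBC = −2logL + k·ln(n). The values of k and n below are our assumptions (three estimated parameters, and 228 − 13 = 215 observations left after differencing); the report states neither.

```python
import math

neg2_loglik = -1083.522  # "-2Log(Like.)" from the goodness-of-fit statistics
k = 3                    # assumed: theta, seasonal Theta and the noise variance
n = 228 - 13             # assumed: observations remaining after d = 1 and D = 1 (period 12)

aic = neg2_loglik + 2 * k
aicc = aic + 2 * k * (k + 1) / (n - k - 1)
sbc = neg2_loglik + k * math.log(n)

print(f"AIC  = {aic:.3f}")
print(f"AICC = {aicc:.3f}")
print(f"SBC  = {sbc:.3f}")
```

Under these assumptions the three results agree with the reported AIC, AICC and SBC to three decimals. Note also that the SBC value, −1067,410, is the same number that appears in the first row of the AICC column of the ARIMA selection table, so that column may in fact list SBC values.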

In fact, comparing the prediction for 2010 (yellow line) with the original data for the same year (crossed green line), we see that this is the best forecast for 2010 that we have produced so far. Nevertheless, we want to let this forecasting method compete with another model: the Holt-Winters one.

Seasonal Holt-Winters model

The Holt-Winters model can be built with either multiplicative or additive seasonality; in our case we have chosen the former.
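For readers unfamiliar with the method, the multiplicative Holt-Winters recursions can be sketched as below. The initialization from the first two seasons, the smoothing constants and the toy data are all assumptions chosen for illustration; the report does not say which settings its software used.

```python
def holt_winters_multiplicative(y, period, alpha, beta, gamma, horizon):
    """Multiplicative Holt-Winters smoothing; returns `horizon` forecasts.

    Level, trend and seasonal factors are initialised from the first two
    seasons, which is one simple convention among several.
    """
    first = sum(y[:period]) / period
    second = sum(y[period:2 * period]) / period
    level = first
    trend = (second - first) / period
    season = [y[i] / first for i in range(period)]

    for t in range(period, len(y)):
        s_old = season[t % period]
        prev_level = level
        # Deseasonalise the new observation, then smooth level, trend and season.
        level = alpha * y[t] / s_old + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % period] = gamma * y[t] / level + (1 - gamma) * s_old

    n = len(y)
    return [(level + h * trend) * season[(n + h - 1) % period]
            for h in range(1, horizon + 1)]


# Toy quarterly data: constant level 100 times a fixed seasonal factor.
pattern = [0.8, 1.0, 1.2, 1.0]
y = [100 * pattern[t % 4] for t in range(16)]

forecast = holt_winters_multiplicative(y, period=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4)
print([round(f, 2) for f in forecast])
```

In the multiplicative form the seasonal factors scale the level, whereas in the additive form they are added to it; the rest of the recursion is the same.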


[Figure: Holt-Winters / Seasonal multiplicative (LOG Austria). Series: LOG Austria, Holt-Winters (LOG Austria), Validation, Prediction, Lower bound (95%), Upper bound (95%), Months of 2010; x-axis: TIME; y-axis: LOG Austria (5,3 to 6,2).]

The plot of this model doesn’t differ much from the previous one; however, to be certain of the results, we must look at the summary statistics:

Statistic   Value
DF          212,000
SSE         0,085
MSE         0,0004007071
RMSE        0,020
MAPE        0,254
MPE         -0,019
MAE         0,014
R²          0,969

The R² (0,969), at first glance, confirms that the results are quite acceptable, but to compare the two models we have to use the same measure, so we must look at the MSE for the Holt-Winters model as well.

Seasonal Holt-Winters        AIRLINE
MSE 0,0004007071        >    MSE 0,0000814507

According to the MSE, the AIRLINE model forecast the time series for the year 2010 (given all the previous data) most reliably. This is why we chose to apply it to forecast what will happen in 2011 as well.

Forecasting 2011 arrivals of residents with AIRLINE

The prediction for 2011 is performed with the best model since, this time, we cannot compare it with the original data (future values). Furthermore, since our data end in November 2010, we produce a 12-step-ahead forecast up to November 2011.


[Figure: AIRLINE for year 2011. Series: LOG Austria, ARIMA (LOG Austria), Validation, Prediction, Lower bound (95%), Upper bound (95%); x-axis: TIME; y-axis: LOG Austria (5,3 to 6,1).]

Month    Log (Austria) predicted
Dec-10   5,790
Jan-11   5,819
Feb-11   5,807
Mar-11   5,852
Apr-11   5,795
May-11   5,883
Jun-11   5,901
Jul-11   5,928
Aug-11   5,984
Sep-11   5,925
Oct-11   5,886
Nov-11   5,806

Highlighted in green are the values produced by the AIRLINE forecast, which reflect both the seasonal pattern and the increasing trend observed throughout the whole time series.
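To read the forecast in numbers of arrivals, the logged values have to be transformed back. The sketch below assumes the LOG series is a base-10 logarithm, which the plotted scales suggest (about 5,3 to 6,1 for arrivals between roughly 200 000 and 1 000 000) but the report does not state; the month labels are translated from the table.

```python
# Forecast of the logged series, copied from the table (decimal commas as points).
log_forecast = {
    "Dec-10": 5.790, "Jan-11": 5.819, "Feb-11": 5.807, "Mar-11": 5.852,
    "Apr-11": 5.795, "May-11": 5.883, "Jun-11": 5.901, "Jul-11": 5.928,
    "Aug-11": 5.984, "Sep-11": 5.925, "Oct-11": 5.886, "Nov-11": 5.806,
}

# Assumed base-10 back-transformation: arrivals = 10 ** log value.
arrivals_forecast = {month: 10 ** value for month, value in log_forecast.items()}

peak_month = max(arrivals_forecast, key=arrivals_forecast.get)
low_month = min(arrivals_forecast, key=arrivals_forecast.get)

for month, value in arrivals_forecast.items():
    print(f"{month}: {value:,.0f}")
print("highest:", peak_month, " lowest:", low_month)
```

Under that assumption the forecast peak falls in August 2011, in line with the seasonal pattern found earlier. A plain back-transformation of a log forecast gives a median-type rather than a mean forecast, so the figures should be taken as indicative.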

Conclusion

After analyzing a series of different models, we conclude that the best model for our time series is the ARIMA (0,1,1), known as the AIRLINE model, because its summary statistics show a lower mean square error together with a lower AIC value. Furthermore, compared with the other models, it produced the graph that looks most precise by eye.

In fact:


• The polynomial regression model produces a good forecast but has some flaws, especially in correcting itself step by step and adapting to changes.

• The ARMA combinations take into account neither the seasonal pattern nor the trend of the time series, since the model’s forecast tends to revert to the mean.

• The seasonal Holt-Winters model produces a more accurate forecast than the previous ones, yet the statistics show that it is outperformed by the AIRLINE model, even though, in our opinion, we could have obtained a fairly good prediction with this model too.