
Cross-validation aggregation for forecasting

www.lancs.ac.uk

Devon K. Barrow, Sven F. Crone

Outline

1. Motivation
2. Cross-validation and model selection
3. Cross-validation aggregation
4. Empirical evaluation
5. Conclusions and future work

Motivation

• Scenario: the statistician constructs a model and wishes to estimate the error rate of this model when used to predict future values.

Bootstrapping (Efron, 1979) vs. cross-validation (Stone, 1974):

Goal
  – Bootstrapping: estimating generalisation error.
  – Cross-validation: estimating generalisation error.

Procedure
  – Bootstrapping: random sampling with replacement from a single learning set (bootstrap samples); the validation set is the same as the original learning set.
  – Cross-validation: splits the data into mutually exclusive subsets, using one subset to train each model and the remaining part as a validation sample (Arlot & Celisse, 2010).

Properties
  – Bootstrapping: low variance, but downward biased (Efron and Tibshirani, 1997).
  – Cross-validation: the generalisation error estimate is nearly unbiased but can be highly variable (Efron and Tibshirani, 1997).

Forecast aggregation

1996: Breiman introduces bootstrapping and aggregation. Bagging (Breiman, 1996) aggregates the outputs of models trained on bootstrap samples.

Bagging for time series forecasting:
• Forecasting with many predictors (Watson 2005)
• Macro-economic time series, e.g. consumer price inflation (Inoue & Kilian 2008)
• Volatility prediction (Hillebrand & Medeiros 2010)
• Small datasets with few observations (Langella 2010)
• Combined with other approaches, e.g. feature selection via PCA (Lin and Zhu 2007)

[Figure: citation results for publications on bagging for time series. (a) Published items in each year; (b) citations in each year.]

Research gap: in contrast to bootstrapping, cross-validation has not been used for forecast aggregation.

Research contribution: we propose to combine the benefits of cross-validation and forecast aggregation – Crogging.

Motivation: The Bagging algorithm

• Inputs: learning set $S = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)\}$
• Select the number of bootstraps $K$
• For $k = 1$ to $K$:
  – Generate a bootstrap sample $S_k$ from $S$ (using your favourite bootstrap method)
  – Using training set $S_k$, estimate a model $\hat{m}_k$ such that $\hat{m}_k(\mathbf{x}_i) \approx y_i$
• Combine the $K$ models to obtain:

$$M(\mathbf{x}) = \frac{1}{K} \sum_{k=1}^{K} \hat{m}_k(\mathbf{x})$$
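To make the procedure concrete, here is a minimal sketch in Python. The base learner (a linear model fitted by least squares) and the names `fit_ar`, `predict_ar` and `bagging` are illustrative stand-ins, not the neural networks used later in the talk; the slides also leave the bootstrap method open, and for time series a block bootstrap is often preferred over the plain i.i.d. resampling shown here.

```python
import numpy as np

def fit_ar(X, y):
    """Stand-in base learner: linear model fitted by least squares."""
    coef, *_ = np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)
    return coef

def predict_ar(coef, X):
    return np.c_[np.ones(len(X)), X] @ coef

def bagging(fit, predict, X, y, K=30, seed=0):
    """Fit K models on bootstrap samples S_k (rows drawn with replacement)
    and return the aggregated forecaster M(x) = (1/K) * sum_k m_k(x)."""
    rng = np.random.default_rng(seed)
    N = len(y)
    models = []
    for _ in range(K):
        idx = rng.integers(0, N, size=N)  # bootstrap sample of the learning set
        models.append(fit(X[idx], y[idx]))
    return lambda X_new: np.mean([predict(m, X_new) for m in models], axis=0)

# Usage sketch: X holds lagged values, y the next observation.
# bagged = bagging(fit_ar, predict_ar, X, y)
# forecasts = bagged(X_new)
```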


Cross-validation: Background

• Cross-validation is a widely used strategy for:
  – Estimating the predictive accuracy of a model
  – Performing model selection, e.g.:
    • Choosing among variables in a regression or the degrees of freedom of a nonparametric model (selection for identification)
    • Parameter estimation and tuning (selection for estimation)

• Main features:
  – Main idea: test the model on data not used in estimation
  – The data is split once or several times
  – Part of the data is used for training each model (the training sample), and the remaining part is used for estimating the prediction error of the model (the validation sample)

Cross-validation: How it works

• K-fold cross-validation: the data is divided into K samples (each containing one or more observations): Sample 1, Sample 2, ..., Sample K-1, Sample K. In turn, each sample serves as the validation set while the remaining samples are used for estimation; this is repeated K times, so every sample is used for validation exactly once.

Cross-validation strategies

• k-fold cross-validation
  – Divides the data into k non-overlapping and mutually exclusive sub-samples of approximately equal size
  – If k = 2: 2-fold cross-validation
  – If k = 10: 10-fold cross-validation
  – If k = N: leave-one-out cross-validation (LOOCV)
• Monte Carlo cross-validation
  – Randomly split the data into two sub-samples (training and validation) multiple times, each time drawing randomly without replacement
• Hold-out method
  – A single split into two data sub-samples
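As an illustration, the three strategies can be expressed as generators of (training, validation) index sets. A minimal sketch, with the caveat that random shuffling of observations is an assumption suited to cross-sectional data; for time series one may prefer to keep temporal ordering or blocks intact.

```python
import numpy as np

def kfold_splits(N, k, seed=0):
    """k non-overlapping, mutually exclusive folds of roughly equal size;
    k = N yields leave-one-out cross-validation (LOOCV)."""
    idx = np.random.default_rng(seed).permutation(N)
    folds = np.array_split(idx, k)
    for i, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, val

def monte_carlo_splits(N, n_val, repeats, seed=0):
    """Repeated random train/validation splits, drawn without replacement."""
    rng = np.random.default_rng(seed)
    for _ in range(repeats):
        idx = rng.permutation(N)
        yield idx[n_val:], idx[:n_val]

def holdout_split(N, n_val, seed=0):
    """A single split into two data sub-samples."""
    idx = np.random.default_rng(seed).permutation(N)
    return idx[n_val:], idx[:n_val]
```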

Cross-validation: model selection

• Goal: select the model having the smallest generalisation error
• Compute an approximation of the generalisation error, defined as follows:

$$E_{gen}(m) = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{m}(\mathbf{x}_i) \right)^2$$

• Estimate model m on the training set; the error on the validation set for sample k is then:

$$E_k(m) = \frac{1}{N_K} \sum_{i=1}^{N_K} \left( y_i^{val} - \hat{m}(\mathbf{x}_i^{val}) \right)^2$$

• Estimate the generalisation error after K repetitions as the average error across all repetitions:

$$\hat{E}_{gen}(m) = \frac{1}{K} \sum_{k=1}^{K} E_k(m)$$

Select the model with the smallest generalisation error.
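Put together, model selection by cross-validation amounts to computing the estimate $\hat{E}_{gen}$ for each candidate and keeping the smallest. A minimal self-contained sketch, assuming generic `fit`/`predict` callables and the squared-error loss used above:

```python
import numpy as np

def cv_generalisation_error(fit, predict, X, y, k=10, seed=0):
    """Estimate E_gen(m) as the average validation error over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train], y[train])
        # E_k(m): squared error on the held-out validation sample
        errors.append(np.mean((y[val] - predict(model, X[val])) ** 2))
    return np.mean(errors)  # (1/K) * sum_k E_k(m)

# Selection: evaluate each candidate and keep the best, e.g.
# best = min(candidates, key=lambda c: cv_generalisation_error(c.fit, c.predict, X, y))
```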


Cross-validation aggregation: Crogging

But what about the K models estimated on the different data sets?

• In model selection, the model finally obtained is the one built on all the data (no data reserved for validation)
  – However, predictive accuracy is judged on models built on different parts of the data
  – These supplementary models are thrown away after they have served their purpose

• The proposed approach:
  – Save the predictions made by the K estimated models
  – This gives a prediction for every observation in the training sample, derived from a model that was built when that observation was in the validation sample
  – Average across the predictions from the K models to produce the final prediction:

$$M_t = \frac{1}{K} \sum_{k=1}^{K} \hat{m}_k(\mathbf{x}_t)$$

  – In the case of neural networks, the validation samples are also used for early stopping of training
  – Multiple initialisations are averaged together with cross-validation aggregation (to reduce variance)
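The following is a minimal sketch of the Crogging idea under the same generic `fit`/`predict` conventions as before; it is not the authors' exact implementation. In particular, early stopping of each network on its validation fold, and averaging over multiple random initialisations, are only indicated in comments.

```python
import numpy as np

def crogging(fit, predict, X, y, k=10, seed=0):
    """Cross-validation aggregation: keep ALL K models from k-fold CV and
    average their forecasts, M(x_t) = (1/K) * sum_k m_k(x_t)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    models = []
    for i, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # For neural networks, (X[val], y[val]) would also serve as the
        # early-stopping set; several initialisations per fold could be
        # trained here and averaged as well.
        models.append(fit(X[train], y[train]))
    return lambda X_new: np.mean([predict(m, X_new) for m in models], axis=0)
```

The contrast with model selection is that the K fold-wise models are retained and combined rather than discarded after scoring.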


Evaluation: Design and implementation

• Time series data
  – NN3 dataset: 111 time series from the NN3 competition (Crone, Hibon, and Nikolopoulos 2011)

                Complete dataset       Reduced dataset
                Short      Long        Normal     Difficult    SUM
Non-seasonal    25 (NS)    25 (NL)     4 (NN)     3 (ND)       57
Seasonal        25 (SS)    25 (SL)     4 (SN)     -            54
SUM             50         50          8          3            111

Summary description of the NN3 competition time series dataset

[Figure: plots of 10 time series from the NN3 dataset, NN3_101 to NN3_110]

• The following experimental setup is used:
  – Forecast horizon: 12 months
  – Holdout period: 18 months
  – Error measures: SMAPE and MASE
  – Rolling-origin evaluation (Tashman, 2000)

• Neural network specification:
  – A univariate multilayer perceptron (MLP) with lags Yt up to Yt-13
  – Each MLP contains a single hidden layer with two hidden nodes and a single output node with a linear identity function; the hyperbolic tangent transfer function is used in the hidden layer
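For reference, a sketch of the two error measures in Python. These follow common definitions (SMAPE as used in the NN competitions, MASE after Hyndman & Koehler 2006); the exact variants used in this evaluation may differ in detail.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE in percent: mean of |y - f| / ((|y| + |f|) / 2)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs(actual - forecast)
                           / ((np.abs(actual) + np.abs(forecast)) / 2.0))

def mase(actual, forecast, insample, m=1):
    """Mean absolute scaled error: out-of-sample MAE scaled by the
    in-sample MAE of the (seasonal) naive forecast; m is the seasonal lag."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    insample = np.asarray(insample, float)
    scale = np.mean(np.abs(insample[m:] - insample[:-m]))
    return np.mean(np.abs(actual - forecast)) / scale
```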

Evaluation: Findings

• Across all time series:
  – On the validation set, Monte Carlo cross-validation is always best
  – All Crogging variants outperform the benchmark Bagging algorithm and the hold-out method (NN model averaging)

MASE:
Method      Train   Validation   Test
BESTMLP     1.25    0.96         1.49
HOLDOUT     0.64    0.75         1.20
BAG         0.76    0.70         1.21
MONTECV     0.76    0.41         1.16
10FOLDCV    0.69    0.45         1.07
2FOLDCV     0.73    0.60         1.15

SMAPE:
Method      Train   Validation   Test
BESTMLP     12.36   11.10        17.89
HOLDOUT     11.78   12.57        16.08
BAG         12.95   13.17        16.32
MONTECV     13.81   8.29         15.35
10FOLDCV    12.65   8.94         15.52
2FOLDCV     13.68   11.19        15.29

MASE and SMAPE averaged over all time series on the training, validation and test datasets

[Figure: boxplots of the MASE and SMAPE averaged over all time series for the different methods; the line of reference represents the median value of the distributions]

Length   Method     Forecast horizon
                    1-3     4-12    13-18   1-18
Long     BESTMLP    10.79   16.59   20.02   16.77
         HOLDOUT    9.34    14.96   16.20   14.43
         BAG        9.74    15.46   16.38   14.81
         MONTECV    10.86   15.16   15.43   14.54
         10FOLDCV   10.39   14.04   14.82   13.69
         2FOLDCV    9.03    14.64   15.69   14.06

SMAPE on the test set averaged over long time series for short, medium and long forecast horizons

• Data conditions:
  – Long time series: 10-fold cross-validation has the smallest error for medium to long horizons, and over forecast lead times 1-18

Length   Method     Forecast horizon
                    1-3     4-12    13-18   1-18
Short    BESTMLP    16.83   17.03   20.66   18.20
         HOLDOUT    17.59   17.04   20.12   18.16
         BAG        17.20   17.27   20.96   18.49
         MONTECV    15.47   14.71   19.05   16.28
         10FOLDCV   16.00   15.91   20.25   17.37
         2FOLDCV    15.86   14.51   18.95   16.21

SMAPE on the test set averaged over short time series for short, medium and long forecast horizons

  – Short time series: 2-fold cross-validation and Monte Carlo cross-validation outperform 10-fold cross-validation for all forecast horizons

[Figure: boxplots of the SMAPE averaged across long (left) and short (right) time series]

• NN3 Competition:

                   Average errors   Ranking all methods   Ranking NN/CI
                   SMAPE   MASE     SMAPE   MASE          SMAPE   MASE
B09 Wildi          14.84   1.13     1       2             -       -
B07 Theta          14.89   1.13     2       2             -       -
C27 Illies         15.18   1.25     3       9             1       7
** 2FOLDCV         15.29   1.15     4       3             2       2
** MONTECV         15.35   1.16     5       4             3       3
B03 ForecastPro    15.44   1.17     6       5             -       -
...                ...     ...      ...     ...           ...     ...
** BAG             16.32   1.21     13      8             7       5
...                ...     ...      ...     ...           ...     ...
B00 AutomatANN     16.81   1.21     14      8             8       5
** MLP             17.89   1.50     15      10            9       6


Conclusions and future work

Crogging is not a forecasting method! It is a general method for improving the accuracy of a forecast model.

• Conclusions
  – Cross-validation aggregation outperforms model selection, Bagging, and the current approach to model averaging, which uses a single hold-out (validation) sample
  – It is especially effective when the amount of data available for training the model is limited, as shown for short time series
  – Improvements in forecast accuracy increase with the forecast horizon
  – It offers promising results on the NN3 competition

• Future work
  – Perform bias-variance decomposition and analysis
  – Consider base model types other than neural networks
  – Evaluate forecast accuracy on a larger set of time series: the M3 competition data (3003 time series, an established benchmark)

Devon K. Barrow
Lancaster University Management School
Centre for Forecasting
Lancaster, LA1 4YX, UK
Tel.: +44 (0) 7960271368
Email: [email protected]

Questions?