Professor Robert Fildes, Dr Tao Huang, Dr Didier Soopramanien
The value of competitive information in forecasting FMCG retail product sales and category effects
Outline
- The research question
- Literature summary
- Our contributions
- Incorporating competitive information
- Accounting for the change of the market environment
- Data and experimental design
- Results and insights
- Conclusion
We forecast retailer product sales (demand) at the product level (e.g. SKU/UPC).
Accurate forecasts are important for inventory planning (e.g. to avoid over-stock and out-of-stock conditions).
We want to improve the accuracy!
The Research question
But surely this has been done!
[Figure: The 'shape' of the data series (surviving labels: 100%, 60%)]
Many retailers use simple statistical methods to generate initial 'baseline forecasts' and then rely on managers to make adjustments for promotional events.
- Cooper et al. (1999): PromoCast® to estimate the adjustments based on historical information
- Fildes et al. (2009): Mechanisms to help managers improve their adjustments.
Other studies proposed technically sophisticated methods that try to utilize the price/promotional information of the focal product more effectively: Aburto and Weber (2007): ANN; Ali et al. (2009): Regression tree.
What has been proposed?
We incorporate competitive information
- Competitive price and competitive promotions
- Strong influencing factors on product sales
- The data are available
- Previous studies all overlooked the competitive information in forecasting
We account for the change of the market environment
- In reality the effects of price/promotions change over time
- Ignoring this fact leads to forecast bias
We validate our proposals
How do we contribute?
Define competitive information
The high dimensionality problem: too many predictors. There are typically 100-200 items within each product category, so it is impossible to simplify or even estimate the model if we include them all.
Method 1: we apply variable selection methods
– Most famous: stepwise regression? Heavily criticized for retaining irrelevant variables and ignoring relevant variables (see Harrell, 2001)
– Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani 1996; Turlach 2000)
– Autometrics (Hendry and Krolzig)
We use a combination of stepwise regression and LASSO (but surely there are alternative algorithms!)
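To make the variable-selection idea concrete, here is a minimal sketch of LASSO fitted by coordinate descent on simulated data. It is not the procedure used in the study (which combines stepwise regression with LASSO); the data, the penalty value, and all function and variable names are assumptions made for illustration only.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero; return exactly 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=200):
    """Minimise (1/2n)*||y - Xb||^2 + penalty*||b||_1 (no intercept).

    X is a list of rows; columns of X and the vector y are assumed centred.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residuals when all coefficients are zero
    for _ in range(n_sweeps):
        for j in range(p):
            # Covariance of column j with the residual that excludes column j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * coef[j]
            new = soft_threshold(rho, penalty) / col_sq[j]
            if new != coef[j]:
                delta = new - coef[j]
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                coef[j] = new
    return coef


# Hypothetical data: 120 weeks, 8 competitor price series, of which only
# series 0 and 3 actually drive the focal product's (log) sales.
random.seed(0)
N_WEEKS, N_COMPETITORS = 120, 8
X = [[random.gauss(0, 1) for _ in range(N_COMPETITORS)] for _ in range(N_WEEKS)]
y = [0.8 * row[0] - 0.5 * row[3] + random.gauss(0, 0.1) for row in X]

# Centre the columns and the response so that no intercept is needed.
col_means = [sum(row[j] for row in X) / N_WEEKS for j in range(N_COMPETITORS)]
X = [[row[j] - col_means[j] for j in range(N_COMPETITORS)] for row in X]
y_mean = sum(y) / N_WEEKS
y = [value - y_mean for value in y]

coef = lasso_coordinate_descent(X, y, penalty=0.1)
selected = [j for j, b in enumerate(coef) if b != 0.0]
print("selected competitor series:", selected)
```

The L1 penalty sets the coefficients of weak predictors exactly to zero, which is what turns the estimator into a selection device. The retained coefficients are shrunk toward zero, so a second, unpenalised fit on the selected variables is a common follow-up.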
Define competitive information
The high dimensionality problem: too many predictors. There are typically 100-200 items within each product category, so it is impossible to simplify or even estimate the model if we include them all.
Method 2: we apply Principal Component Analysis (PCA)
– To condense a large number of competitive explanatory variables into a handful of diffusion indexes (DI)
– Diffusion indexes perform well in forecasting macroeconomic variables (Stock and Watson, 2002)
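As a rough illustration of how a diffusion index condenses many competitive series into one, the sketch below extracts the first principal component by power iteration. The simulated data (ten competitor series driven by one common factor) and all names are assumptions, not the study's data or code; in practice the variables would usually be standardised first and several components retained.

```python
import random


def first_principal_component(rows, n_iter=200):
    """Leading eigenvector of the sample covariance matrix (power iteration)
    and the corresponding component scores for each row."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[j] * c[k] for c in centred) / (n - 1) for k in range(p)]
           for j in range(p)]
    vec = [1.0] * p
    for _ in range(n_iter):
        vec = [sum(cov[j][k] * vec[k] for k in range(p)) for j in range(p)]
        norm = sum(v * v for v in vec) ** 0.5
        vec = [v / norm for v in vec]
    scores = [sum(c[j] * vec[j] for j in range(p)) for c in centred]
    return vec, scores


def correlation(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov_ab / (var_a * var_b) ** 0.5


# Hypothetical data: 10 competitor series sharing one common factor.
random.seed(0)
N_WEEKS, N_ITEMS = 120, 10
factor = [random.gauss(0, 1) for _ in range(N_WEEKS)]
loadings = [random.uniform(0.5, 1.5) for _ in range(N_ITEMS)]
rows = [[loadings[j] * f + random.gauss(0, 0.5) for j in range(N_ITEMS)]
        for f in factor]

weights, diffusion_index = first_principal_component(rows)
corr = correlation(diffusion_index, factor)
print(f"correlation between the index and the common factor: {corr:.3f}")
```

The single index can then replace the ten original series as a regressor, which is how the dimensionality problem is sidestepped.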
We incorporate the following competitive information (explanatory variables selected by LASSO/stepwise OR diffusion indexes constructed by PCA) into Autoregressive Distributed Lag (ADL) models and then simplify the model following the general-to-specific modelling strategy (Hendry 1995).
The econometric model has good interpretability and has also proved to be effective in other areas: tourism data in Song and Witt (2003); airline passenger flow data in Fildes, Wei et al. (2009).
Incorporate competitive information
Start with a general model:
Simplify the
An example: The general ADL model
Start with a general model containing:
- Product sales (in logs) as the dependent variable
- Lags of product sales
- Lags of own price/promotions
- Lags of competitive price/promotions (selected by LASSO/stepwise); OR lags of diffusion indexes (constructed by PCA)
For simplicity here we do not show weekly indicators and dummies for calendar events.
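The model equation itself did not survive on the slide. One generic way to write an ADL model with the terms listed above is sketched here; the symbols, lag orders $p$ and $q$, and the choice to start the distributed lags at $j = 0$ are assumptions for illustration, not the authors' exact specification.

```latex
\ln S_t = \alpha
        + \sum_{i=1}^{p} \phi_i \,\ln S_{t-i}
        + \sum_{j=0}^{q} \beta_j^{\top} x_{t-j}
        + \sum_{j=0}^{q} \gamma_j^{\top} z_{t-j}
        + u_t
```

Here $S_t$ is unit sales of the focal product in week $t$, $x_t$ collects its own price and promotion variables, $z_t$ collects either the selected competitive variables or the diffusion indexes, and $u_t$ is the error term. General-to-specific modelling then drops the terms that are not statistically relevant.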
The effect of price and promotions (on product sales) changes over time owing to:
- Economic conditions (consumers are more price/promotion sensitive during an economic crunch)
- Changes in consumer tastes
- Competitive activities
- New product entry
- Changes in any other driving factors that are related to price and promotions but not included in the model
The change of the market environment
If we settle for a model with constant parameters when in fact the effects of price and promotions are changing over time:
- The model will be subject to structural break
- It will be exposed to forecast failure, i.e. forecasts are biased, forecast error variance is also slightly inflated, and overall forecasting performance is poor compared to the model's in-sample fit (Clements and Hendry, 1999)
What happens if we ignore it?
[Figure: simulated weekly sales; axes: Sales (0-16) vs Weeks (1-118); series: Actual]
An example of how structural break causes forecast bias
Consumers' demand increases, but they also become more price sensitive (in reality, the timing of the change is UNKNOWN).
Before the break: $y = 10 - 2x + u$
After the break: $y = 14 - 3x + u$
Simulated data (y: sales, x: price, x ~ Unif(0,1), u ~ Unif(0,1))
[Figure: simulated weekly sales; axes: Sales (0-16) vs Weeks (1-118); series: Actual, Predict]
Now we build a model with constant parameters.
The deterministic mean of the model with constant parameters will be a WEIGHTED AVERAGE for the data before and after the structural break:
Fitted model: $\hat{y} = 12.4 - 2.3x$
Before the break: $y = 10 - 2x + u$
After the break: $y = 14 - 3x + u$
An example of how structural break causes forecast bias
[Figure: simulated weekly sales; axes: Sales (0-16) vs Weeks (1-118); series: Actual, Predict; annotation: Forecast bias]
The fitted model with constant parameters, $\hat{y} = 12.4 - 2.3x$, is a WEIGHTED AVERAGE of the regimes before ($y = 10 - 2x + u$) and after ($y = 14 - 3x + u$) the structural break.
The model obviously under-forecasts in the forecast period.
An example of how structural break causes forecast bias
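The simulated example can be reproduced in outline with a few lines of code. This is a sketch under stated assumptions: the slides do not give the break date, so `BREAK_WEEK = 71` is a made-up value, the estimation sample is taken to be weeks 1-100, and the random draws will not match the slide's fitted line exactly.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


random.seed(1)
BREAK_WEEK = 71  # assumed for illustration; unknown to the forecaster
weeks = range(1, 119)
price = [random.uniform(0, 1) for _ in weeks]
sales = []
for week, x in zip(weeks, price):
    u = random.uniform(0, 1)
    if week < BREAK_WEEK:
        sales.append(10 - 2 * x + u)   # regime before the break
    else:
        sales.append(14 - 3 * x + u)   # regime after the break

# Estimate one constant-parameter model on weeks 1-100, forecast weeks 101-118.
intercept, slope = fit_line(price[:100], sales[:100])
errors = [y - (intercept + slope * x)
          for x, y in zip(price[100:], sales[100:])]
mean_error = sum(errors) / len(errors)

print(f"pooled fit: y = {intercept:.2f} + ({slope:.2f}) x")
print(f"mean forecast error (actual - forecast): {mean_error:.2f}")
```

A clearly positive mean error means the pooled model under-forecasts after the break, which is the forecast bias the slide points at.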
[Figure: bar chart, Percentage of Models Subject to Structural Break (Chow test, α = 0.05); series: Specification, Rolling; y-axis: 0% to 100%; bar labels as extracted: 98%, 100%, 90%, 100%, 95%, 100%, 100%, 96%, 92%, 83%, 88%, 91%, 88%, 92%]
Test for structural break
ADL models with LASSO/stepwise OR diffusion factors: subject to structural break
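For reference, the standard form of the Chow test statistic is given below. The notation is an assumption for illustration: $RSS_p$ is the residual sum of squares of the model fitted to the pooled sample, $RSS_1$ and $RSS_2$ are those of separate fits to the two sub-samples of sizes $n_1$ and $n_2$, and $k$ is the number of estimated parameters.

```latex
F = \frac{\bigl(RSS_p - (RSS_1 + RSS_2)\bigr) / k}
         {(RSS_1 + RSS_2) / (n_1 + n_2 - 2k)}
```

Under the null hypothesis of constant parameters (and the usual regression assumptions), $F$ follows an $F(k,\; n_1 + n_2 - 2k)$ distribution, so a model is flagged as subject to structural break when $F$ exceeds the critical value at the chosen significance level.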
Models subject to structural break are exposed to forecast bias.
If we can mitigate this bias, we may improve the forecasting performance.
One way is to allow the parameters to vary over time, e.g. as AR(1) processes:
$y_t = \mathrm{int}_t + \beta_t x_t + u_t; \quad \mathrm{int}_t = r\,\mathrm{int}_{t-1} + \varepsilon_t; \quad \beta_t = e\,\beta_{t-1} + \eta_t$
Performance is poor: the presumed functional form can hardly explain how the effects of price and promotions change over time.
Offsetting the forecast bias
Offsetting the forecast bias
Models subject to structural break are exposed to forecast bias.
If we can mitigate this bias, we may be able to improve the forecasting performance.
Alternatively, we estimate and then offset the forecast bias: Intercept Correction (IC).
[Figure: simulated weekly sales; axes: Sales (0-16) vs Weeks (1-118); series: Actual, Predict; annotations: Forecast bias, Estimate the forecast bias]
An Example of Intercept Correction
Estimate the forecast bias based on the data around the forecast origin. E.g. we take an average of the errors, assuming they are ALL caused by forecast bias…
[Figure: simulated weekly sales; axes: Sales (0-16) vs Weeks (1-118); series: Actual, Predict; annotation: Offset the forecast bias]
Then we offset the bias in the forecast period using the 'estimated bias'.
An Example of Intercept Correction
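A minimal sketch of intercept correction on the same kind of simulated data follows. The break date (week 71), the forecast origin (week 100), and the number of recent weeks used to estimate the bias (`RECENT = 8`) are all assumptions chosen for illustration; the slides do not state them.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(n_weeks=118, break_week=71, seed=1):
    """Sales with a structural break at an assumed (hypothetical) week."""
    rng = random.Random(seed)
    price, sales = [], []
    for week in range(1, n_weeks + 1):
        x, u = rng.uniform(0, 1), rng.uniform(0, 1)
        level, slope = (10, -2) if week < break_week else (14, -3)
        price.append(x)
        sales.append(level + slope * x + u)
    return price, sales


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


price, sales = simulate()
ORIGIN = 100   # last week of the estimation sample
RECENT = 8     # assumed number of weeks used to estimate the bias

a, b = fit_line(price[:ORIGIN], sales[:ORIGIN])

# Step 1: estimate the bias as the average residual just before the origin.
recent_resid = [sales[t] - (a + b * price[t])
                for t in range(ORIGIN - RECENT, ORIGIN)]
bias_estimate = sum(recent_resid) / RECENT

# Step 2: offset every forecast by the estimated bias.
raw = [a + b * x for x in price[ORIGIN:]]
corrected = [f + bias_estimate for f in raw]

mae_raw = mae(sales[ORIGIN:], raw)
mae_ic = mae(sales[ORIGIN:], corrected)
print(f"estimated bias: {bias_estimate:.2f}")
print(f"MAE without correction: {mae_raw:.2f}, with correction: {mae_ic:.2f}")
```

The correction only shifts the level of the forecasts; the slope is still the pooled estimate, so some error remains when price sensitivity has changed as well.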
Models subject to structural break are exposed to forecast bias.
Rather than offsetting the forecast bias, we may trade off forecast bias against (reduced) forecast error variance by combining the forecasts generated by models with various lengths of estimation window: Estimation Window Combining (EWC) (Pesaran and Timmermann, 2007)
A trade-off against forecast bias
[Figure: simulated weekly sales; axes: Sales (0-16) vs Weeks; series: Actual, Predict]
An example of combining forecasts
Ideally, we would use only the data AFTER the structural break, but the break time is UNKNOWN.
Before the break: $y = 10 - 2x + u$
After the break: $y = 14 - 3x + u$
Simulated data (y: sales, x: price, x ~ Unif(0,1), u ~ Unif(0,1))
We can use only the data close to the forecast origin. The model may then not be subject to structural break, but it will have inflated forecast error variance (because less information is used).
Simulated data (y: sales, x: price, x ~ Unif(0,1), u ~ Unif(0,1))
[Figure: simulated weekly sales; axes: Sales (0-16) vs Weeks; series: Actual, Predict; annotation: Forecast 1]
An example of combining forecasts
[Figure: simulated weekly sales; axes: Sales (0-16) vs Weeks; series: Actual, Predict; annotations: Estimation with full sample data, Forecast 2]
At the other extreme, we can use ALL the data in the sample. The forecasts are then biased, but the forecast error variance is smaller (compared to the previous scenario).
Simulated data (y: sales, x: price, x ~ Unif(0,1), u ~ Unif(0,1))
Thus we can trade off (incurring) forecast bias against (reducing) forecast error variance by estimating the same model, $y = \mathrm{int} + \beta x$, with various estimation windows:
- Estimate the model using data [80, 100] and generate Forecast 1
- Estimate the model using data [1, 100] and generate Forecast 2
Finally we take an average of Forecast 1 and Forecast 2; the final forecasts may be more accurate (as explained by the philosophy of forecast combination).
Combining forecasts
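The two-window combination described above can be sketched as follows. The windows [80, 100] and [1, 100] come from the slide; the break date (week 71), the seed, and the equal weighting are assumptions for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(n_weeks=118, break_week=71, seed=1):
    """Sales with a structural break at an assumed (hypothetical) week."""
    rng = random.Random(seed)
    price, sales = [], []
    for week in range(1, n_weeks + 1):
        x, u = rng.uniform(0, 1), rng.uniform(0, 1)
        level, slope = (10, -2) if week < break_week else (14, -3)
        price.append(x)
        sales.append(level + slope * x + u)
    return price, sales


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


price, sales = simulate()
ORIGIN = 100
WINDOWS = [(80, 100), (1, 100)]  # inclusive week ranges, as on the slide

forecasts = []
for start, end in WINDOWS:
    a, b = fit_line(price[start - 1:end], sales[start - 1:end])
    forecasts.append([a + b * x for x in price[ORIGIN:]])

# Equal-weight average of the forecasts from the different windows.
combined = [sum(values) / len(values) for values in zip(*forecasts)]

actual = sales[ORIGIN:]
mae_short, mae_full = mae(actual, forecasts[0]), mae(actual, forecasts[1])
mae_combined = mae(actual, combined)
print(f"MAE short window: {mae_short:.2f}")
print(f"MAE full sample:  {mae_full:.2f}")
print(f"MAE combined:     {mae_combined:.2f}")
```

In this toy setting the short window happens to lie entirely after the assumed break, so it may well beat the combination. The point of combining is that the forecaster does not know this in advance, and the average hedges against choosing the wrong window.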
Dominick's Finer Foods, a large retail chain in the Chicago area in the U.S. (data available from the University of Chicago website)
- Unit sales, price, and promotions at the UPC level; weekly data
- Promotions include "Simple price discount" (75%), "Bonus buy" (25%), and "Coupons" (less than 1%); we use one variable to represent them
- Aggregated across 83 stores based on All Commodity Volume (the revenue of the store)
- 122 items in 6 product categories: Soft Drinks, Frozen-Juices, Canned Soup, Bath Soap, Front-end-Candies, and Bathroom Tissue
- Items are selected with relatively large sales volumes
Data
Fixed-window rolling forecast:
- Estimation period: 120 weeks
- Forecast horizons: 1, 1-4, and 1-12 weeks ahead
- 70 rolling events for each item
Model specification:
- 200 weeks
- Ideally the model would be re-specified every time. This can be simplified by assuming foreknowledge of the data and of the model that would ideally be selected (Fildes et al. 2009)
Error measures: MAPE, symmetric MAPE, MASE, and AvgRelMAE
Experiment design
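For readers unfamiliar with the error measures, here is a small sketch of three of them using common textbook definitions. The exact variants used in the study are not given on the slides (symmetric MAPE in particular has several definitions), so treat these formulas and the toy numbers as assumptions. AvgRelMAE, a geometric mean of relative MAEs across series, is omitted.

```python
def mape(actual, forecast):
    """Mean absolute percentage error (as a fraction)."""
    return sum(abs(a - f) / abs(a)
               for a, f in zip(actual, forecast)) / len(actual)


def smape(actual, forecast):
    """Symmetric MAPE: absolute error relative to the mean of |a| and |f|."""
    return sum(2 * abs(a - f) / (abs(a) + abs(f))
               for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, history):
    """MAE scaled by the in-sample MAE of the one-step naive forecast."""
    naive_mae = sum(abs(history[t] - history[t - 1])
                    for t in range(1, len(history))) / (len(history) - 1)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / naive_mae


# Toy numbers, made up for illustration.
history = [90.0, 100.0, 120.0]   # in-sample sales
actual = [100.0, 200.0]          # out-of-sample sales
forecast = [110.0, 180.0]

mape_value = mape(actual, forecast)
smape_value = smape(actual, forecast)
mase_value = mase(actual, forecast, history)
print(f"MAPE {mape_value:.4f}, sMAPE {smape_value:.4f}, MASE {mase_value:.4f}")
```

A MASE of 1 means the forecasts are, on average, exactly as accurate as the in-sample naive forecast; values below 1 indicate an improvement over it.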
Candidate models of two dimensions: 1) competitive information and 2) offsetting forecast bias
Here we show the symmetric MAPE results for forecast horizon 1-12 weeks ahead, results based on other error measures are similar.
Candidate models and results
Models of 2 dimensions | Ignoring the change of market environment | Intercept Correction (IC) | Estimation window combining (EWC)
No competitive information | ADL-OWN (25.6%); Base-times-lift (32.6%) | ADL-OWN-IC | ADL-OWN-EWC
LASSO/stepwise | ADL | ADL-IC | ADL-EWC
Diffusion Factor | ADL-DI | ADL-DI-IC | ADL-DI-EWC
(Percentages are symmetric MAPE.)
Incorporating competitive information does improve forecasting accuracy.
ADL and ADL-DI both outperform ADL-OWN.
Models of 2 dimensions | Ignoring the change of market environment | Intercept Correction (IC) | Estimation window combining (EWC)
No competitive information | ADL-OWN (25.6%) | ADL-OWN-IC | ADL-OWN-EWC
LASSO/stepwise | ADL (23.8%, better performance: +) | ADL-IC | ADL-EWC
Diffusion Factor | ADL-DI (23.0%, better performance: ++) | ADL-DI-IC | ADL-DI-EWC
(Percentages are symmetric MAPE.)
Candidate models and results
Models of 2 dimensions | Ignoring the change of market environment | Intercept Correction (IC) | Estimation window combining (EWC)
No competitive information | ADL-OWN (25.6%) | ADL-OWN-IC (23.9%) | ADL-OWN-EWC (24.1%)
LASSO/stepwise | ADL (23.8%) | ADL-IC (23.0%) | ADL-EWC (23.3%)
Diffusion Factor | ADL-DI (23.0%) | ADL-DI-IC (22.5%) | ADL-DI-EWC (22.8%)
(Percentages are symmetric MAPE.)
Accounting for the change of the market environment does improve forecasting accuracy.
Models with IC and EWC all outperform their counterparts.
IC and EWC improve the performance of the models with and without competitive information.
Candidate models and results
We can improve the forecasting accuracy by incorporating competitive information:
PCA and LASSO/stepwise
Results and insights
ADL-DI versus ADL-OWN | 1-12 wks ahead | 1-4 wks ahead | 1 week ahead
Promoted | -6.1% | -4.6% | -3.5%
Non-promoted | -14.6% | -12.0% | -8.6%

ADL versus ADL-OWN | 1-12 wks ahead | 1-4 wks ahead | 1 week ahead
Promoted | -3.3% | -0.2% | 1.2%
Non-promoted | -10.2% | -7.0% | -4.6%

ADL and ADL-DI substantially outperform ADL-OWN when the focal product is not being promoted. A possible reason is that retailers try to avoid promoting competing products at the same time, so if the focal product is being promoted, there tends to be less promotional information on other competitive items.
We can improve the forecasting accuracy by offsetting potential forecast bias using
Intercept Correction (IC) and Estimation Window Combining (EWC)
Results and insights
ADL-OWN-IC versus ADL-OWN | 1-12 wks ahead | 1-4 wks ahead | 1 week ahead
Promoted | -2.0% | -0.8% | 0.7%
Non-promoted | -9.6% | -10.2% | -9.7%

ADL-OWN-EWC versus ADL-OWN | 1-12 wks ahead | 1-4 wks ahead | 1 week ahead
Promoted | -0.3% | 0.0% | -0.6%
Non-promoted | -8.3% | -8.0% | -7.1%

In the absence of competitive information, by offsetting the potential forecast bias of the ADL-OWN model, we achieve substantially higher forecasting accuracy, mainly for the forecast period when the focal product is not being promoted.
We can improve the forecasting accuracy by offsetting potential forecast bias using
Intercept Correction (IC) and Estimation Window Combining (EWC)
Results and insights
ADL-EWC versus ADL | 1-12 wks ahead | 1-4 wks ahead | 1 week ahead
Promoted | 0.8% | 0.6% | -0.9%
Non-promoted | -3.2% | -4.2% | -4.4%

ADL-IC versus ADL | 1-12 wks ahead | 1-4 wks ahead | 1 week ahead
Promoted | -1.1% | -0.9% | -0.7%
Non-promoted | -4.6% | -5.1% | -4.6%

WITH competitive information, by offsetting the potential forecast bias of the ADL model, we achieve substantially higher forecasting accuracy, mainly for the forecast period when the focal product is not being promoted.
We can improve the forecasting accuracy by offsetting potential forecast bias using
Intercept Correction (IC) and Estimation Window Combining (EWC)
Results and insights
ADL-DI-EWC versus ADL-DI | 1-12 wks ahead | 1-4 wks ahead | 1 week ahead
Promoted | -1.7% | -2.8% | -4.0%
Non-promoted | -6.2% | -7.3% | -6.8%

ADL-DI-IC versus ADL-DI | 1-12 wks ahead | 1-4 wks ahead | 1 week ahead
Promoted | -3.7% | -5.0% | -4.9%
Non-promoted | -7.8% | -8.3% | -6.9%

WITH competitive information, by offsetting the potential forecast bias of the ADL-DI model, we achieve substantially higher forecasting accuracy, mainly for the forecast period when the focal product is not being promoted.
We can improve the forecasting accuracy by:
- Incorporating competitive information: PCA and LASSO/stepwise
- Accounting for the change of the market environment: Intercept Correction (IC) and Estimation Window Combining (EWC)
The advantage of the new models mainly comes from the forecast period when the focal product is not on promotion.
The best model is the ADL model with diffusion indexes and intercept correction (i.e. ADL-DI-IC).
Summary
Thank you! Questions?