The value of competitive information in forecasting FMCG retail … · 2017-04-07 · FMCG retail product sales and category effects . Outline The research question Literature summary

Professor Robert Fildes [email protected] Dr Tao Huang [email protected] Dr Didier Soopramanien [email protected]

The value of competitive information in forecasting FMCG retail product sales and category effects

Outline
– The research question
– Literature summary
– Our contributions
– Incorporating competitive information
– Account for the change of the market environment
– Data and experimental design
– Results and insights
– Conclusion

We forecast retailer product sales (demand) at the product level (e.g. SKU/UPC).

Accurate forecasts are important for inventory planning (e.g. to avoid over-stock and out-of-stock conditions).

We want to improve the accuracy!

The Research question

But surely this has been done!

[Figure: The ‘shape’ of the data series]

Many retailers use simple statistical methods to generate initial ‘baseline forecasts’ and then rely on managers to adjust them for promotional events.
- Cooper et al. (1999): PromoCast® to estimate the adjustments based on historical information
- Fildes et al. (2009): Mechanisms to help managers improve their adjustments

Other studies proposed technically sophisticated methods that try to utilize the price/promotional information of the focal product more effectively. Aburto and Weber (2007): ANN; Ali et al. (2009): Regression tree.

What has been proposed?

We incorporate competitive information

– Competitive price and competitive promotions
– Strong influencing factors on product sales
– The data are available
– Previous studies all overlooked competitive information in forecasting

We account for the change of the market environment

– In reality the effects of price/promotions change over time
– Ignoring this fact leads to forecast bias

We validate our proposals

How do we contribute?

Define competitive information
The high dimensionality problem: too many predictors. There are typically 100-200 items within each product category, so the model is impossible to reduce, or even to estimate, if we take them all.
Method 1: we apply variable selection methods

– Most famous: stepwise regression? Heavily criticized for retaining irrelevant variables and ignoring relevant variables (see Harrell, 2001)

– Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani 1996; Turlach 2000)

– Autometrics (Hendry and Krolzig)

We use a combination of stepwise regression and LASSO (but surely there are alternative algorithms!)
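The LASSO step of Method 1 can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the study: the data are simulated, the penalty `lam=0.2`, the 20 hypothetical competitor items, and the function names are all invented for the example, and the stepwise stage is left out. It uses coordinate descent on standardized predictors.

```python
import random

def standardize(col):
    """Center a column and scale it to unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd for v in col]

def soft_threshold(rho, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso(columns, y, lam, sweeps=200):
    """Coordinate descent for (1/2n)||y - Zb||^2 + lam * ||b||_1.

    `columns` holds standardized predictors and `y` is centered.
    """
    n = len(y)
    coefs = [0.0] * len(columns)
    resid = list(y)  # residuals for the current (all-zero) coefficients
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # covariance of predictor j with its partial residual
            rho = sum(c * r for c, r in zip(col, resid)) / n + coefs[j]
            new = soft_threshold(rho, lam)
            if new != coefs[j]:
                delta = new - coefs[j]
                resid = [r - c * delta for c, r in zip(col, resid)]
                coefs[j] = new
    return coefs

rng = random.Random(1)
n_weeks, n_items = 120, 20
# Hypothetical competitor prices; only items 0 and 1 truly affect sales.
prices = [[rng.random() for _ in range(n_weeks)] for _ in range(n_items)]
sales = [5 + 4 * prices[0][t] - 3 * prices[1][t] + rng.gauss(0, 0.3)
         for t in range(n_weeks)]

z = [standardize(col) for col in prices]
mean_sales = sum(sales) / n_weeks
y = [s - mean_sales for s in sales]

coefs = lasso(z, y, lam=0.2)
selected = [j for j, b in enumerate(coefs) if b != 0.0]
print("selected competitor items:", selected)
```

The penalty shrinks the coefficients of irrelevant competitor items to exactly zero, which is what makes LASSO usable as a variable selection step before the ADL model is estimated. In practice the penalty would be tuned (e.g. by cross-validation) rather than fixed.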

Define competitive information
The high dimensionality problem: too many predictors. There are typically 100-200 items within each product category, so the model is impossible to reduce or even to estimate.
Method 2: we apply Principal Component Analysis (PCA)

– To condense a large number of competitive explanatory variables into a handful of diffusion indexes (DI)

– Diffusion indexes have performed well in forecasting macroeconomic variables (Stock and Watson, 2002)
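A minimal sketch of how a diffusion index could be built from many competitor series. It assumes simulated data in which ten hypothetical competitor prices share one common factor, and it extracts only the first principal component by power iteration; the names and parameters are illustrative, not taken from the study.

```python
import random

def standardize(col):
    """Center a column and scale it to unit (population) variance."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd for v in col]

def correlation(a, b):
    """Pearson correlation of two equally long sequences."""
    za, zb = standardize(a), standardize(b)
    return sum(p * q for p, q in zip(za, zb)) / len(a)

def first_principal_component(columns, iterations=100):
    """Loadings of the first PC via power iteration on the correlation matrix."""
    n, p = len(columns[0]), len(columns)
    corr = [[sum(a * b for a, b in zip(columns[i], columns[j])) / n
             for j in range(p)] for i in range(p)]
    v = [1.0 / p ** 0.5] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

rng = random.Random(2)
n_weeks, n_items = 120, 10
# Hypothetical common driver of all competitor prices, plus item-level noise.
factor = [rng.gauss(0, 1) for _ in range(n_weeks)]
prices = [[f + 0.5 * rng.gauss(0, 1) for f in factor] for _ in range(n_items)]

z = [standardize(col) for col in prices]
loadings = first_principal_component(z)
# The diffusion index is the weekly score on the first principal component.
diffusion_index = [sum(loadings[j] * z[j][t] for j in range(n_items))
                   for t in range(n_weeks)]
print("correlation with the common factor:",
      round(abs(correlation(diffusion_index, factor)), 3))
```

A single index replaces ten correlated predictors here; with 100-200 items per category, a handful of such indexes would enter the ADL model in place of the individual competitor variables.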

We incorporate the following competitive information (explanatory variables selected by LASSO/stepwise OR diffusion indexes constructed by PCA) into Autoregressive Distributed Lag (ADL) models and then simplify the model following the general-to-specific modelling strategy (Hendry 1995).

The econometric model has good interpretability and has also proved effective in other areas: tourism data in Song and Witt (2003); airline passenger flow data in Fildes, Wei et al. (2009).

Incorporate competitive information

An example: The general ADL model

Start with a general model, then simplify it. The terms of the general model are:
– Product sales (in logs): the dependent variable
– Lag of product sales
– Lags of own price/promotions
– Lags of competitive price/promotions (selected by LASSO/stepwise); OR lags of diffusion indexes (constructed by PCA)

For simplicity here we do not show weekly indicators and dummies for calendar events.
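The terms listed for the general ADL model can be written out as one equation. The slide's own formula did not survive, so the notation below is an assumption: $S_t$ is product sales, $\mathbf{x}_t$ collects own price/promotion variables, $\mathbf{z}_t$ collects either the selected competitive variables or the diffusion indexes, and $p$, $q$ are hypothetical lag orders.

```latex
\ln S_t = \alpha
        + \sum_{i=1}^{p} \phi_i \ln S_{t-i}
        + \sum_{i=0}^{q} \boldsymbol{\beta}_i^{\top} \mathbf{x}_{t-i}
        + \sum_{i=0}^{q} \boldsymbol{\gamma}_i^{\top} \mathbf{z}_{t-i}
        + u_t
```

General-to-specific modelling would then drop insignificant lags from this general form to reach a simpler final model.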


The effects of price and promotions (on product sales) change over time owing to:
– Economic conditions (consumers are more price/promotion sensitive during an economic crunch)
– Changes in consumer tastes
– Competitive activities
– New product entry
– Changes in any other driving factors that are related to price and promotions but not included in the model

The change of the market environment

If we settle for a model with constant parameters when in fact the effects of price and promotions are changing over time:
– The model will be subject to structural break
– And be exposed to forecast failure, i.e. forecasts are biased, forecast error variance is also slightly inflated, and overall forecasting performance is poor compared to the model’s in-sample fit (Clements and Hendry, 1999)

What happens if we ignore it?

[Figure: simulated weekly sales. Series: Actual. x-axis: Weeks (1 to 118); y-axis: Sales (0 to 16)]

An example of how structural break causes forecast bias

Consumers’ demand increases but they also become more price sensitive (in reality, the timing of the change is UNKNOWN):

Before the break: $y = 10 - 2x + u$
After the break: $y = 14 - 3x + u$

Simulated data ($y$: sales, $x$: price, $x \sim \mathrm{Unif}(0,1)$, $u \sim \mathrm{Unif}(0,1)$)

[Figure: simulated weekly sales with the fit of a constant-parameter model. Series: Actual, Predict. x-axis: Weeks; y-axis: Sales (0 to 16)]

The deterministic mean of the model with constant parameters will be a WEIGHTED AVERAGE for the data before and after the structural break:

Estimated model: $y = 12.4 - 2.3x$
Before the break: $y = 10 - 2x + u$
After the break: $y = 14 - 3x + u$

Now we build a model with constant parameters


[Figure: the same plot extended into the forecast period, with the gap between Actual and Predict marked "Forecast bias". Series: Actual, Predict. x-axis: Weeks; y-axis: Sales (0 to 16)]

The model obviously under-forecasts in the forecast period: the constant-parameter fit is $y = 12.4 - 2.3x$, while the data now follow $y = 14 - 3x + u$.
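The simulated example can be reproduced approximately with a short script. This is a sketch under assumptions: the slides do not state when the break occurs, so week 60 is a hypothetical choice, and the estimation sample (weeks 1-100) and forecast period (weeks 101-120) are inferred from the plots. The fitted coefficients will therefore differ somewhat from those on the slide.

```python
import random

def ols(xs, ys):
    """Closed-form simple regression of y on x: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def simulate(n_weeks=120, break_week=60, seed=0):
    """Price x and sales y; the regime changes after break_week (assumed)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for t in range(1, n_weeks + 1):
        x = rng.random()  # x ~ Unif(0, 1)
        u = rng.random()  # u ~ Unif(0, 1)
        y = 10 - 2 * x + u if t <= break_week else 14 - 3 * x + u
        xs.append(x)
        ys.append(y)
    return xs, ys

xs, ys = simulate()
# Constant-parameter model estimated on weeks 1-100, spanning both regimes.
intercept, slope = ols(xs[:100], ys[:100])
# Forecast errors (actual minus forecast) for weeks 101-120.
forecast_errors = [y - (intercept + slope * x)
                   for x, y in zip(xs[100:], ys[100:])]
mean_error = sum(forecast_errors) / len(forecast_errors)

print(f"fitted model: y = {intercept:.1f} {slope:+.1f}x")
print(f"mean forecast error (actual - forecast): {mean_error:.2f}")
```

The fitted intercept falls between the two regimes because the estimation sample mixes them, so forecasts made after the break sit systematically below the actual sales: a positive mean forecast error, i.e. forecast bias.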


[Figure: Percentage of Models Subject to Structural Break (Chow test, α = 0.05). Legend: Specification, Rolling. y-axis: 0% to 100%. Bar labels as extracted: 98%, 100%, 90%, 100%, 95%, 100%, 100%; 96%, 92%, 83%, 88%, 91%, 88%, 92%]

Test for structural break

ADL models with LASSO/stepwise OR diffusion factors: subject to structural break
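For readers unfamiliar with the Chow test, the sketch below computes its F statistic for a single-regressor model on simulated data. It is an illustration only: the study applies the test to ADL models with many regressors, while here the candidate break date (week 60), the simulated series, and the approximate 5% critical value are all assumptions made for the example.

```python
import random

def rss(xs, ys):
    """Residual sum of squares from a simple OLS fit of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))

def chow_f(xs, ys, split, k=2):
    """Chow F statistic for a break after observation `split`.

    k is the number of parameters per regime (intercept and slope).
    """
    n = len(xs)
    rss_pooled = rss(xs, ys)
    rss_split = rss(xs[:split], ys[:split]) + rss(xs[split:], ys[split:])
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))

def simulate(break_week, n_weeks=120, seed=0):
    """Sales/price data; pass break_week=None for a stable series."""
    rng = random.Random(seed)
    xs, ys = [], []
    for t in range(1, n_weeks + 1):
        x, u = rng.random(), rng.random()
        after = break_week is not None and t > break_week
        xs.append(x)
        ys.append(14 - 3 * x + u if after else 10 - 2 * x + u)
    return xs, ys

CRITICAL_5PCT = 3.07  # approximate F(2, 116) critical value, assumed from tables

xs_b, ys_b = simulate(break_week=60)
xs_s, ys_s = simulate(break_week=None)
f_break = chow_f(xs_b, ys_b, split=60)
f_stable = chow_f(xs_s, ys_s, split=60)

print(f"series with break: F = {f_break:.1f}, reject = {f_break > CRITICAL_5PCT}")
print(f"stable series:     F = {f_stable:.2f}, reject = {f_stable > CRITICAL_5PCT}")
```

The statistic compares the fit of one pooled model against separate models before and after the candidate date; a large value means the parameters differ between the two sub-samples.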

Models subject to structural break are exposed to forecast bias.

If we can mitigate this bias, we may improve the forecasting performance.

One way is to allow the parameters to vary over time, e.g. as AR(1) processes.

Performance is poor: the presumed functional form can hardly explain how the effects of price and promotions change over time.

Offsetting the forecast bias

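Time-varying parameters, AR(1) example:

$y_t = \mathrm{int}_t + \beta_t x_t + u_t; \quad \mathrm{int}_t = r\,\mathrm{int}_{t-1} + \varepsilon_t; \quad \beta_t = e\,\beta_{t-1} + \eta_t$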

Offsetting the forecast bias

Alternatively, we estimate and then offset the forecast bias: Intercept Correction (IC).

[Figure: simulated weekly sales with the constant-parameter fit; the gap between Actual and Predict near the forecast origin is marked "Forecast bias" and "Estimate the forecast bias". Series: Actual, Predict. x-axis: Weeks; y-axis: Sales (0 to 16)]

An Example of Intercept Correction

Estimate the forecast bias based on the data around the forecast origin. E.g. we take an average of the errors, assuming they are ALL caused by forecast bias…

[Figure: the same plot with the forecasts shifted by the estimated bias, marked "Offset the forecast bias". Series: Actual, Predict. x-axis: Weeks; y-axis: Sales (0 to 16)]

Then we offset the bias in the forecast period using the ‘estimated bias’.
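The two steps of intercept correction (estimate the bias near the forecast origin, then add it to the forecasts) can be sketched on the simulated series. The break week (60) and the number of recent errors averaged (`m = 10`) are hypothetical choices, not values from the study.

```python
import random

def ols(xs, ys):
    """Closed-form simple regression of y on x: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def simulate(n_weeks=120, break_week=60, seed=0):
    """Price x and sales y; the regime changes after break_week (assumed)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for t in range(1, n_weeks + 1):
        x, u = rng.random(), rng.random()
        xs.append(x)
        ys.append(10 - 2 * x + u if t <= break_week else 14 - 3 * x + u)
    return xs, ys

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

xs, ys = simulate()
origin = 100  # forecast origin: estimate on weeks 1-100
intercept, slope = ols(xs[:origin], ys[:origin])

# Step 1: estimate the bias as the average of the last m in-sample errors.
m = 10  # hypothetical number of recent weeks
recent_errors = [ys[t] - (intercept + slope * xs[t])
                 for t in range(origin - m, origin)]
bias = sum(recent_errors) / m

# Step 2: offset the bias in the forecast period (weeks 101-120).
raw = [intercept + slope * x for x in xs[origin:]]
corrected = [f + bias for f in raw]

mae_raw = mae(ys[origin:], raw)
mae_ic = mae(ys[origin:], corrected)
print(f"estimated bias: {bias:.2f}")
print(f"MAE without correction: {mae_raw:.2f}, with correction: {mae_ic:.2f}")
```

The correction works here because the recent errors are dominated by the shift in regime rather than by noise; when recent errors are mostly noise, the same adjustment would add variance without removing bias.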

Models subject to structural break are exposed to forecast bias.

Rather than offsetting the forecast bias, we may trade off forecast bias against (reduced) forecast error variance by combining the forecasts generated by models with estimation windows of various lengths: Estimation Window Combining (EWC) (Pesaran and Timmermann, 2007)

A trade-off against forecast bias

[Figure: simulated weekly sales. Series: Actual, Predict. x-axis: Weeks; y-axis: Sales (0 to 16)]

An example of combining forecasts

Ideally, we would use the data AFTER the structural break, but the break time is UNKNOWN.

Before the break: $y = 10 - 2x + u$
After the break: $y = 14 - 3x + u$

We can only use the data close to the forecast origin: the model may not be subject to structural break, but will have inflated forecast error variance (because less information is used).


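[Figure: forecasts from the model estimated on data close to the forecast origin, labelled "Forecast 1". Series: Actual, Predict. x-axis: Weeks; y-axis: Sales (0 to 16)]

An example of combining forecasts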

[Figure: forecasts from the model estimated on the full sample, labelled "Forecast 2". Series: Actual, Predict. x-axis: Weeks; y-axis: Sales (0 to 16)]

Estimation with full sample data

At the other extreme, we can use ALL the data in the sample; the forecasts are then biased, but the forecast error variance is smaller (compared to the previous scenario).

Thus we can trade off (incurring) forecast bias against (reducing) forecast error variance: we estimate the same model, $y = \mathrm{int} + \beta x$, with various estimation windows:
– Estimate the model using data [80, 100], generate forecasts as Forecast 1
– Estimate the model using data [1, 100], generate forecasts as Forecast 2

Finally we take an average of Forecast 1 and Forecast 2; the final forecasts may be more accurate (explained by the philosophy of forecast combination).

Combining forecasts
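The two-window combination described above can be sketched as follows, again on simulated data with a hypothetical break at week 60. Only the two windows named on the slide are used; a fuller implementation would average over more window lengths.

```python
import random

def ols(xs, ys):
    """Closed-form simple regression of y on x: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def simulate(n_weeks=120, break_week=60, seed=0):
    """Price x and sales y; the regime changes after break_week (assumed)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for t in range(1, n_weeks + 1):
        x, u = rng.random(), rng.random()
        xs.append(x)
        ys.append(10 - 2 * x + u if t <= break_week else 14 - 3 * x + u)
    return xs, ys

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def forecast_from_window(xs, ys, start_week, end_week, future_x):
    """Fit on weeks start_week..end_week (1-based, inclusive), then forecast."""
    a, b = ols(xs[start_week - 1:end_week], ys[start_week - 1:end_week])
    return [a + b * x for x in future_x]

xs, ys = simulate()
future_x, future_y = xs[100:], ys[100:]  # weeks 101-120

forecast_1 = forecast_from_window(xs, ys, 80, 100, future_x)  # short window
forecast_2 = forecast_from_window(xs, ys, 1, 100, future_x)   # full sample
combined = [(f1 + f2) / 2 for f1, f2 in zip(forecast_1, forecast_2)]

mae_1, mae_2 = mae(future_y, forecast_1), mae(future_y, forecast_2)
mae_c = mae(future_y, combined)
print(f"MAE short window: {mae_1:.2f}, full sample: {mae_2:.2f}, "
      f"combined: {mae_c:.2f}")
```

The combined forecast's MAE can never exceed the average of the two individual MAEs. In this clean simulation the short window alone happens to do better because the break is large and lies well before week 80; the combination is a hedge for the realistic case where the break date and size are unknown.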

Dominick’s Finer Foods, a large retail chain in the Chicago area in the U.S. (data available from the University of Chicago website)
– Unit sales, price, and promotions at the UPC level; weekly data
– Promotions include “Simple price discount” (75%), “Bonus buy” (25%), and “Coupons” (less than 1%); we use one variable to represent them
– Aggregated across 83 stores based on All Commodity Volume (the revenue of the store)
– 122 items in 6 product categories: Soft Drinks, Frozen-Juices, Canned Soup, Bath Soap, Front-end-Candies, and Bathroom Tissue
– Items with relatively large sales volumes are selected

Data

Fixed window rolling forecast:
– Estimation period: 120 weeks
– Forecast horizons: 1, 1-4, and 1-12 weeks ahead
– 70 rolling events for each item

Model specification: 200 weeks
– Ideally the model could be re-specified every time. This can be simplified assuming foreknowledge of the data, and the model that would ideally be selected (Fildes et al. 2009)

Error measures: MAPE, symmetric MAPE, MASE, and AvgRelMAE

Experiment design
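The four error measures can be written down compactly. The definitions below are common textbook forms and are assumptions about what the study used: symmetric MAPE in particular has several variants, MASE is scaled here by the in-sample one-step naive forecast, and AvgRelMAE is taken as the geometric mean across items of each item's MAE relative to a benchmark.

```python
import math

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - f) / abs(a)
                     for a, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    """Symmetric MAPE, in percent (one common variant)."""
    return 100 * sum(2 * abs(a - f) / (abs(a) + abs(f))
                     for a, f in zip(actual, forecast)) / len(actual)

def mase(actual, forecast, insample):
    """MAE scaled by the in-sample MAE of the one-step naive forecast."""
    naive_mae = sum(abs(insample[t] - insample[t - 1])
                    for t in range(1, len(insample))) / (len(insample) - 1)
    return mae(actual, forecast) / naive_mae

def avg_rel_mae(model_maes, benchmark_maes):
    """Geometric mean across items of MAE(model) / MAE(benchmark)."""
    logs = [math.log(m / b) for m, b in zip(model_maes, benchmark_maes)]
    return math.exp(sum(logs) / len(logs))

# Small made-up numbers, only to show the calls.
actual, forecast = [100, 200], [110, 180]
print("MAPE:", mape(actual, forecast))
print("sMAPE:", round(smape(actual, forecast), 3))
print("MASE:", mase([16], [15], insample=[10, 12, 14]))
print("AvgRelMAE:", avg_rel_mae([1, 4], [2, 2]))
```

MASE and AvgRelMAE are scale-free, which matters when items with very different sales volumes are averaged together; values below 1 indicate the model beats the reference forecast.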

Candidate models of two dimensions: 1) competitive information and 2) offsetting forecast bias

Here we show the symmetric MAPE results for forecast horizon 1-12 weeks ahead; results based on other error measures are similar.

Candidate models and results

Models of 2 dimensions     | Ignoring the change of market environment | Intercept Correction (IC) | Estimation window combining (EWC)
No competitive information | ADL-OWN; Base-times-lift                  | ADL-OWN-IC                | ADL-OWN-EWC
LASSO/stepwise             | ADL                                       | ADL-IC                    | ADL-EWC
Diffusion Factor           | ADL-DI                                    | ADL-DI-IC                 | ADL-DI-EWC

symmetric MAPE: ADL-OWN 25.6%; Base-times-lift 32.6%

Incorporating competitive information does improve forecasting accuracy.

ADL and ADL-DI both outperform ADL-OWN.

Model (ignoring the change of market environment) | symmetric MAPE | Better performance
ADL-OWN (no competitive information)              | 25.6%          |
ADL (LASSO/stepwise)                              | 23.8%          | +
ADL-DI (diffusion factor)                         | 23.0%          | ++



Accounting for the change of the market environment does improve forecasting accuracy.

Models with IC and EWC all outperform their counterparts.

symmetric MAPE             | Ignoring the change of market environment | Intercept Correction (IC) | Estimation window combining (EWC)
No competitive information | ADL-OWN: 25.6%                            | ADL-OWN-IC: 23.9%         | ADL-OWN-EWC: 24.1%
LASSO/stepwise             | ADL: 23.8%                                | ADL-IC: 23.0%             | ADL-EWC: 23.3%
Diffusion Factor           | ADL-DI: 23.0%                             | ADL-DI-IC: 22.5%          | ADL-DI-EWC: 22.8%

IC and EWC improve the performance of the models with and without competitive information.

We can improve the forecasting accuracy by incorporating competitive information: PCA and LASSO/stepwise

Results and insights

ADL-DI versus ADL-OWN | 1-12 wks ahead | 1-4 wks ahead | 1 week ahead
Promoted              | -6.1%          | -4.6%         | -3.5%
Non-promoted          | -14.6%         | -12.0%        | -8.6%

ADL versus ADL-OWN    | 1-12 wks ahead | 1-4 wks ahead | 1 week ahead
Promoted              | -3.3%          | -0.2%         | 1.2%
Non-promoted          | -10.2%         | -7.0%         | -4.6%

ADL and ADL-DI substantially outperform ADL-OWN when the focal product is not being promoted. A possible reason is that retailers try to avoid promoting competing products at the same time, so if the focal product is being promoted, there tends to be less promotional information on other competitive items.

We can improve the forecasting accuracy by offsetting potential forecast bias using Intercept Correction (IC) and Estimation Window Combining (EWC)

ADL-OWN-IC versus ADL-OWN  | 1-12 wks ahead | 1-4 wks ahead | 1 week ahead
Promoted                   | -2.0%          | -0.8%         | 0.7%
Non-promoted               | -9.6%          | -10.2%        | -9.7%

ADL-OWN-EWC versus ADL-OWN | 1-12 wks ahead | 1-4 wks ahead | 1 week ahead
Promoted                   | -0.3%          | 0.0%          | -0.6%
Non-promoted               | -8.3%          | -8.0%         | -7.1%

In the absence of competitive information, by offsetting the potential forecast bias of the ADL-OWN model, we achieve substantially higher forecasting accuracy, mainly for the forecast period when the focal product is not being promoted.




ADL-EWC versus ADL | 1-12 wks ahead | 1-4 wks ahead | 1 week ahead
Promoted           | 0.8%           | 0.6%          | -0.9%
Non-promoted       | -3.2%          | -4.2%         | -4.4%

ADL-IC versus ADL  | 1-12 wks ahead | 1-4 wks ahead | 1 week ahead

WITH competitive information, by offsetting the potential forecast bias of the ADL model, we achieve substantially higher forecasting accuracy, mainly for the forecast period when the focal product is not being promoted.




ADL-DI-EWC versus ADL-DI | 1-12 wks ahead | 1-4 wks ahead | 1 week ahead

ADL-DI-IC versus ADL-DI  | 1-12 wks ahead | 1-4 wks ahead | 1 week ahead

WITH competitive information, by offsetting the potential forecast bias of the ADL-DI model, we achieve substantially higher forecasting accuracy, mainly for the forecast period when the focal product is not being promoted.

We can improve the forecasting accuracy by:
– Incorporating competitive information: PCA and LASSO/stepwise
– Accounting for the change of the market environment: Intercept Correction (IC) and Estimation Window Combining (EWC)

The advantage of the new models mainly comes from the forecast period when the focal product is not on promotion.

The best model is the ADL model with diffusion indexes and intercept correction (i.e. ADL-DI-IC).

Summary

Thank you! Questions?

Professor Robert Fildes [email protected] Dr Tao Huang [email protected] Dr Didier Soopramanien [email protected]
