

A Model to Prevent Stockouts in Retail using Time Series Sales Forecasting

M.W.T. Gemmink
University of Twente
P.O. Box 217, 7500AE Enschede
The Netherlands

[email protected]

ABSTRACT

Stockouts have a huge impact on retailers and customer satisfaction. In recent years, machine learning has proved useful for accurate time series sales forecasting, and accurate forecasts help retailers make accurate purchasing decisions. However, many retailers still rely on old-fashioned forecasting methods. In this paper a model is proposed that aims to tackle this problem and provide retailers worldwide with a methodology to make time series sales forecasting using machine learning accessible. The proposed model is created and verified based on a case study at a large retailer. All steps needed for a retailer to gain valuable business understanding and make decisions that help to prevent stockouts are discussed, from data exploration and processing to the actual forecasting techniques. The research shows that every article needs to be handled individually to get optimal results. While the forecasting technique implemented in the case study is more successful in maintaining a healthy inventory level, it is not able to prevent stockouts in all cases. The model could easily be extended with more advanced techniques, making it a valuable tool for retailers worldwide.

Keywords

Stockouts, Retail, Machine Learning, Sales Forecasting, Time Series, ARIMA

1. INTRODUCTION

The goal of retailers worldwide is to provide the right products at the right time while maintaining a healthy stock. Retailers need to have a good overview of how many items they have in stock and when they need to re-stock their merchandise. Despite modern sales forecasting techniques using machine learning, retailers still use a very traditional way of resupplying their stores [9]. To prevent stockouts, retailers often have large safety stocks [19]. But despite the safety stocks, the chance of a stockout is huge, with potentially a big impact on customer satisfaction and sales [7, 11]. To prevent stockouts and increase profitability, retailers need to have an accurate prediction of customer demand [19].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
27th Twente Student Conference on IT, July 7, 2017, Enschede, The Netherlands.
Copyright 2017, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science.

In recent years, many machine learning techniques have been used successfully for sales forecasting in retail. But because of the diversity of the retail sector, it is difficult for retailers to pick the optimal technique depending on the article and business domain. This could potentially lead to less accurate forecasts and ultimately to a stockout.

This research aims to tackle this problem and provide retailers worldwide with a methodology to make sales forecasting using machine learning accessible and effective. All steps needed for a retailer to gain valuable business understanding and make decisions that help to prevent stockouts will be discussed, from data exploration and processing to the actual forecasting technique(s).

The goal of this research is therefore to find out if and to what extent we can help in preventing stockouts in retail. The results will form the basis of a new model that can help retailers from different business domains make adequate decisions on when and which machine learning technique to use to tackle their inventory or sales problem. The proposed model will be based and verified on a case study at a large retailer.

Based on the aforementioned problems, this paper addresses the following research question:

How and to what extent can retailers prevent stockouts using sales forecasting?

The research question consists of sub-questions, based on the case study:

RQ1 What important details can be identified from the data set?

RQ2 Which methods are applicable for processing the data to be able to make a sales forecast?

RQ3 What machine learning techniques are applicable for sales forecasting and to what extent?

RQ4 To what extent can these forecasting techniques help to prevent stockouts?

RQ5 How can other retailers use the proposed techniques?

In section 2 the related work is discussed to give insight into the available literature on stockouts and ways to prevent stockouts in the future. The methodology to tackle the research questions is discussed in section 3. The data exploration and processing can be found in sections 4 and 5. The results of the forecasting technique and the proposed model can be found in section 6. The research is then concluded in section 7 and its limitations and future work are assessed in sections 8 and 9.


2. RELATED WORK

A lot of literature has been published regarding sales forecasting, the causes and the impact of stockouts and opportunities for retailers when a stockout occurs.

Sales forecasting has always been a popular subject in literature, which makes sense because of the importance for retail to maintain a healthy inventory. Xia et al. (2012) identified two groups of forecasting methods: the classical methods based on mathematical and statistical models, and modern methods that use artificial intelligence, such as machine learning [19]. The classical methods are not always satisfactory when it comes to complex real-world problems. They proposed an extreme learning machine (ELM) with adaptive metrics of inputs to improve forecasting accuracy. The method outperformed other methods such as auto-regression. Wong et al. (2010) also used ELM for medium-term fashion sales forecasting but integrated a harmony search algorithm to improve the network generalization performance [18].

Lastly, Chen et al. (2011) used a sales forecasting system based on Gray Extreme Learning Machine (GELM) [5]. The experimental results showed that GELM outperformed several sales forecasting methods which are based on back-propagation neural networks.

Chu et al. (2003) compared different linear and nonlinear models for retail sales forecasting and found that nonlinear models are able to outperform their linear counterparts [6]. The overall best model in their research was the neural network built on deseasonalized time series data. A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time [2].

A model that is often used in sales forecasting is the Bass model, which describes how products get adopted in the population and is often used for new products. Lee et al. (2014) proposed a machine learning based approach to find the two most difficult and important variables of the Bass model [13]. Fan et al. (2017) also used the Bass model, but in a different context. They combined the Bass model with a sentiment analysis based on online reviews and historical sales data and found that it has a higher accuracy than the standard Bass model [10].

The causes of these stockouts have also been examined. Ehrenthal et al. (2013) indicate that the causes of retail stockouts are specific to the retailer, store, category and product [9]; there is no solution that works everywhere. This partially explains why stockouts still occur today.

The impact of a stockout is more costly than most companies imagine, as Corsten et al. (2004) found in a study with more than 600 retail outlets [7]. Campo et al. (2004) researched what consumers buy in a category during an out-of-stock period and found that these periods can affect all three purchase decisions, for example by reducing the probability of purchase incidence and leading to purchases of smaller quantities. One year later these researchers discussed the differences and similarities of stockouts and permanent assortment reductions [3]. Sloot et al. (2005) identified what effect brand equity and the hedonic level of products have on the decision making of consumers in case of a stockout [15]. Intended behavior during a stockout can however differ from the actual behavior: Zinn et al. (2008) found that the intended behavior is a good indicator when the customer quits the search but rather poor when customers delay the search [20].

Ehrenthal et al. (2013) proposed a procedure to help store managers in reducing the chance of a stockout to below the global average of 8.3 percent [9]. Peinkofer et al. (2016) investigated the impact of limited inventory availability disclosure on customer responses, but in contrast to their hypothesis that it would increase customer competition and therefore decrease the negative impact, the results showed that customers were actually more dissatisfied. Improvements to store operations, coordination of store delivery and shelf replenishments are described as key factors in reducing stockouts [14]. Breugelmans et al. (2006) investigated three scenarios in a stockout period and found that suggesting a replacement slightly reduces the purchase cancellation rate. This effect however disappears when replacement products are significantly more expensive [1].

3. METHOD OF RESEARCH

This research is based on a case study at a large fashion retailer with more than 100 stores. The retailer selected so-called ‘continuity articles’, which are articles that may never be out of stock. In order to forecast sales and prevent stockouts, one first needs to explore the data. The aim of the data exploration is to discover important details and correlations that might be of interest later on; this is done by visualizing the data. The data will be aggregated to fixed periods, so the forecasting can be treated as a time series problem.

To make the data set ready for forecasting, one needs to process the data. A data warehouse will be created that contains all the important sales facts, such as the respective article and store. With the data warehouse in place, one is more flexible to experiment with different approaches. Many time series forecasting models rely on the series being stationary, so multiple techniques and transformations will be tested to see which one fits a particular article and yields the best results.

When the data is processed and ready to be used for forecasting sales, a forecasting method will be used to see to what extent it can accurately forecast sales. The results will be compared to other forecasting techniques such as the ELM proposed by Xia et al. (2012) [19]. Validation of the results is a crucial part of this research. By dividing the available data into a training and a testing set, the accuracy of the forecasting techniques can be identified. This will be done using the root mean squared error (RMSE), because it gives retailers a logical impression of the performance when compared to other forecasting techniques. The RMSE will also be combined with the average sales to give retailers an idea of the actual accuracy of the technique.

Based on the results of the previous steps, a model will be constructed that can help retailers from all branches to implement and use time series forecasting techniques. The model should be self-explanatory. The results will be discussed, as well as how the model can be implemented and extended. To be able to answer the research question, the model and sales forecasting technique are validated. This is done by checking whether the sales forecasting model is able to lower the chance of a stockout, by combining and comparing the sales forecasting technique with the actual stock level data.

The final step is to conclude the research and to validate to what extent other retailers could benefit from the proposed model, and also to identify its limitations and what future work is needed to make the model more applicable for retailers worldwide.


Table 1. Articles used in this data set.

Article      Target Group   Type
Article 1    men            pants
Article 2    men            t-shirts
Article 3    men            t-shirts
Article 4    men            denim
Article 5    men            denim
Article 6    women          denim
Article 7    boys           t-shirts
Article 8    boys           denim
Article 9    girls          denim
Article 10   girls          denim

4. DATA EXPLORATION

The first step in data analysis is to explore and visualize the data, gaining valuable insights.

The data set consists of weekly sales and stock data from a large fashion retailer operating in The Netherlands, Belgium and Luxembourg. The time period of the data ranges from July 2016 till May 2017. Ten articles have been selected that are labeled as ‘continuity articles’, which means they should always be in stock, making those the most interesting for this case study. The details of the ‘continuity articles’ can be found in Table 1.

For the remainder of this section, three articles have been selected to increase the readability.

Figure 1. Pieces sold for three articles in 2016.

Figure 2. Discount percentage of revenue for three articles in 2016.

As can be seen in Figure 1, the pieces sold are very different per article. Whereas articles 9 and 10 have a minor positive trend, article 3 is sold at a steady rate. The graph also shows that the seasonality for article 10 is very high. The peaks in October and November can partially be explained by the fact that those articles were sold with a discount in that period, as shown in Figure 2. However, a discount does not necessarily imply that article sales will increase during a sale period: a high discount on article 3 did not significantly increase the sales.

These insights indicate that not all articles can be treated equally, because of their huge differences in trend, seasonality and discount dependence.

5. DATA PROCESSING

The next step in the analysis is the processing of the data. In this section the data processing is discussed, starting in section 5.1 with an initial processing step, namely creating a data warehouse. In section 5.2 stationarity techniques are discussed.

5.1 Data Warehouse

A data warehouse was created containing all the facts. The star schema can be found in Figure 3.

Figure 3. Data warehouse for sales data.

5.2 Stationarity

A time series is said to be stationary if its statistical properties such as mean and variance remain constant over time. This is important because most time series models work on the assumption that the data is stationary. Let $\{X_t\}$ be a stochastic process and let $F_X(x_{t_1+\tau}, \ldots, x_{t_k+\tau})$ represent the cumulative distribution function of the joint distribution of $\{X_t\}$ at times $t_1+\tau, \ldots, t_k+\tau$. Then $\{X_t\}$ is said to be stationary if, for all $k$, for all $\tau$, and for all $t_1, \ldots, t_k$,

$$F_X(x_{t_1+\tau}, \ldots, x_{t_k+\tau}) = F_X(x_{t_1}, \ldots, x_{t_k}).$$

Since $\tau$ does not affect $F_X(\cdot)$, $F_X$ is not a function of time [2].

To test the stationarity of a time series the Augmented Dickey-Fuller (ADF) test is used [8].

The ADF test for a unit root assesses the null hypothesis of a unit root using the model:

$$y_t = c + \gamma t + \phi y_{t-1} + \beta_1 \Delta y_{t-1} + \ldots + \beta_p \Delta y_{t-p} + \varepsilon_t$$

Where:

• $\Delta$ is the differencing operator, such that $\Delta y_t = y_t - y_{t-1}$.

• The number of lagged difference terms, $p$, is user specified.

• $\varepsilon_t$ is a mean zero innovation process.


Figure 4. Two figures that illustrate the rolling mean and standard deviation of the sales of article 7, both originally and after differencing.

The null hypothesis of a unit root is $H_0: \phi = 1$.

Under the alternative hypothesis, $\phi < 1$. Variants of the model allow for different growth characteristics. The model with $\gamma = 0$ has no trend component, and the model with $c = 0$ and $\gamma = 0$ has no drift or trend. A test that fails to reject the null hypothesis fails to reject the possibility of a unit root.
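To make the test statistic concrete, the following is a minimal sketch of a simplified Dickey-Fuller test, not the full ADF procedure used in the case study. It assumes no trend term ($\gamma = 0$) and no lagged difference terms ($p = 0$), and rewrites the model as $\Delta y_t = c + \rho y_{t-1} + \varepsilon_t$ with $\rho = \phi - 1$, so that the unit-root null becomes $\rho = 0$. The function name, the synthetic series and the use of the asymptotic 5% critical value of about -2.86 for the constant-only case are illustrative assumptions.

```python
import math
import random


def dickey_fuller_t(y):
    """t-statistic of rho in the regression  dy_t = c + rho * y_{t-1} + e_t.

    rho corresponds to (phi - 1), so the unit-root null is rho = 0.
    Simplification: no trend term and no lagged differences (p = 0).
    """
    lagged = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    n = len(dy)
    mean_x = sum(lagged) / n
    mean_dy = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_dy) for x, d in zip(lagged, dy))
    rho = sxy / sxx                      # ordinary least squares slope
    c = mean_dy - rho * mean_x           # ordinary least squares intercept
    rss = sum((d - c - rho * x) ** 2 for x, d in zip(lagged, dy))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return rho / std_err


random.seed(1)
noise = [random.gauss(0, 1) for _ in range(200)]  # stationary by construction

walk = []                                          # random walk: has a unit root
level = 0.0
for shock in noise:
    level += shock
    walk.append(level)

CRITICAL_5_PERCENT = -2.86  # asymptotic value, constant-only case (assumption)
for name, series in (("noise", noise), ("walk", walk)):
    t_stat = dickey_fuller_t(series)
    print(name, round(t_stat, 2), "reject unit root:", t_stat < CRITICAL_5_PERCENT)
```

A strongly negative statistic, as for the white-noise series, leads to rejecting the unit root. In practice a library implementation of the full ADF test, which also reports a p-value, would be preferable to this hand-rolled version.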

A lot of products in the data set are far from stationary. That is why in this paper multiple techniques are implemented which aim to transform the original time series to a more stationary version. The underlying principle is to model or estimate the trend and seasonality in the series and remove those to get a stationary series [2].

Transformations are used first to reduce trend in a simple but sometimes effective way. The log of a time series, for example, can be useful for series with a positive trend, because it penalizes higher values more than lower values. The following transformations are used: none, the log of the series or the square root of the series.

The transformed series with the highest confidence of being stationary (tested with the ADF test) will be used in the techniques that are mentioned next.

A common approach is the Moving Average (MA). This approach takes the moving average based on the frequency of the time series. Note that if this approach takes the average of the last n values, then the rolling mean will not be defined for the first n−1 values. This technique can be used in combination with the log transform. A drawback of the aforementioned technique is that the time period has to be strictly defined. This makes the technique less useful in complex situations [2].
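The moving-average step can be sketched as follows. This is a minimal illustration that assumes the rolling mean is used as a trend estimate and subtracted from the log-transformed series; the function names, the window of 4 and the sales numbers are made up and are not taken from the case study.

```python
import math


def rolling_mean(series, window):
    """Trailing mean over the last `window` values; None where undefined."""
    means = []
    for i in range(len(series)):
        if i + 1 < window:
            means.append(None)  # fewer than `window` observations so far
        else:
            chunk = series[i + 1 - window : i + 1]
            means.append(sum(chunk) / window)
    return means


def remove_moving_average(series, window):
    """Subtract the rolling mean; the first window-1 points are dropped."""
    means = rolling_mean(series, window)
    return [x - m for x, m in zip(series, means) if m is not None]


weekly_sales = [120, 130, 125, 140, 150, 145, 160, 170]  # made-up numbers
log_sales = [math.log(x) for x in weekly_sales]
detrended = remove_moving_average(log_sales, window=4)
print(len(weekly_sales), "observations ->", len(detrended), "after detrending")
```

The shortened output illustrates the point made above: with a window of n values, the first n−1 observations have no rolling mean and are lost.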

An alternative for the MA is the Exponentially Weighted Moving Average (EWMA), where a weight is assigned to all previous values with a decay factor. The difficulty in this technique is determining the halflife or amount of exponential decay. This depends strongly on the business domain [2].
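A minimal sketch of the EWMA, assuming the common convention in which the halflife is the number of periods after which a past observation's weight has dropped to one half, and assuming the recursive form of the average. The function name, the halflife of 2 and the sample values are illustrative only.

```python
import math


def ewma(series, halflife):
    """Exponentially weighted moving average in recursive form.

    The smoothing factor alpha is derived from the halflife so that
    (1 - alpha) ** halflife == 0.5.
    """
    alpha = 1.0 - math.exp(-math.log(2.0) / halflife)
    smoothed = [series[0]]  # start the recursion at the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


weekly_sales = [120, 130, 125, 140, 150, 145, 160, 170]  # made-up numbers
trend = ewma(weekly_sales, halflife=2)
detrended = [x - s for x, s in zip(weekly_sales, trend)]
print([round(v, 1) for v in detrended])
```

Unlike the plain moving average, this version is defined from the first observation onwards, so no data points are lost; the price is the extra halflife parameter that has to be chosen per business domain.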

Differencing is a more advanced technique which works better in cases with high seasonality. Differencing is taking the difference between an observation and the previous one. An example of how differencing was used successfully in the case study can be found in Figure 4, where the rolling mean became much more stable, therefore increasing the stationarity of the series. A window of 10 was used, which explains why the first 9 weeks have no rolling mean and standard deviation [2].
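Differencing itself is a one-line operation. The sketch below is illustrative: the `lag` parameter is an assumption added here to show that the same idea extends to seasonal differencing, and the numbers are made up.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both exist."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


trending = [10, 12, 14, 16, 18]  # made-up series with a linear trend
print(difference(trending))      # the constant slope is all that remains
```

A series with a linear trend becomes constant after one round of differencing, which is why the rolling mean in Figure 4 flattens out.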

What is interesting about the data set when using these trend and seasonality eliminating techniques is that every article requires a different technique and transformation, as can be seen for three articles in Table 2. The results for all articles can be found in Appendix A. The techniques are very successful in making the series stationary: for all articles, the stationary confidence levels can be increased to at least 98%. To illustrate the importance of this adaptive approach, Table 3 contains the confidence levels if the worst technique were used. While the confidence levels are still high, the confidence of article 4 declined from 99.87% to 98.87%.

Table 2. Transformations (TR) and trend/season techniques (TS) used to get the highest level of confidence the series is stationary.

Article     ADF      TR      TS      ADF end
Article 1   70.59%   log     MA      99.96%
Article 6   86.23%   log     EWMA    100.00%
Article 7   51.70%   as is   MA      100.00%

Table 3. Techniques and transformations used that return the worst result.

Article     ADF      TR      TS             ADF end
Article 1   70.59%   as is   EWMA           95.19%
Article 4   99.87%   log     Differencing   98.87%
Article 7   51.70%   log     EWMA           99.96%

6. RESULTS

In this section the forecasting results are discussed in section 6.1. The forecasting results and the results from the previous sections are combined into the proposed model discussed in section 6.2.

6.1 Sales Forecasting

Having the techniques and transformations in place to make any time series stationary, one can start using models to forecast the sales for a particular article. In this paper, the Autoregressive Integrated Moving Average model (ARIMA) is used. ARIMA is generally the model used to forecast time series, and other more complex models are often compared with the results of the ARIMA model [2]. For validation purposes the training data consists of the data from 2016 and the test data will be the available data in 2017.

The ARIMA model is a popular technique used for time series forecasting. The model can be defined as:

$$y'_t = c + \phi_1 y'_{t-1} + \ldots + \phi_p y'_{t-p} + \theta_1 e_{t-1} + \ldots + \theta_q e_{t-q} + e_t$$

Where $y'_t$ is the differenced series. The predictors on the right hand side include both lagged values of $y_t$ and lagged errors. We call this an ARIMA($p$, $d$, $q$) model, where:

• p = order of the autoregressive part;

• d = degree of first differencing involved;

• q = order of the moving average part;
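As a worked instance, the general form can be written out for the ARIMA(2,1,1) configuration used later in this section. This is only an expansion of the definition above with $p = 2$, $d = 1$ and $q = 1$; the one-step-ahead forecast in the last line assumes that forecasts on the differenced scale are added back to the last observed value.

```latex
\begin{aligned}
y'_t &= y_t - y_{t-1}, \\
y'_t &= c + \phi_1 y'_{t-1} + \phi_2 y'_{t-2} + \theta_1 e_{t-1} + e_t, \\
\hat{y}_{t+1} &= y_t + \hat{y}'_{t+1}.
\end{aligned}
```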


Table 4. The RMSE of the ARIMA(2,1,1) model.

Article      Average   RMSE     RMSE %
Article 1    42.59     12.08    28.36%
Article 2    443.13    72.81    16.43%
Article 3    103.02    18.05    17.52%
Article 4    555.71    217.57   39.15%
Article 5    517.41    241.07   46.59%
Article 6    395.83    286.64   67.87%
Article 7    201.54    52.83    26.21%
Article 8    413.10    143.67   34.78%
Article 9    384.54    162.37   42.22%
Article 10   591.98    397.95   67.22%

Selecting appropriate values for p, d and q can be difficult. Often the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) are plotted to determine q and p; d = 1 because the series is differenced once [2]. In Appendix B, the ACF and PACF plots of article 10 can be found as well as an explanation on how to pick the optimal values for p and q. The implementation used in this paper is ARIMA with p = 2, d = 1 and q = 1, because it was able to produce good results for all articles. It is important to note that it would be better to fit different ARIMA models for different time series. Using this implementation the root mean squared errors (RMSE) of the articles were found, see Table 4. The RMSE on its own is only useful when comparing the accuracy of multiple techniques on the same series, because it is scale dependent [12]. Therefore the RMSE % is also reported, which expresses the RMSE as a percentage of the average number of pieces sold per week.
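The two accuracy measures can be sketched as follows. The function names are illustrative; the numbers in the usage line are the Average and RMSE values of Article 1 from Table 4.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rmse_percentage(rmse_value, average_weekly_sales):
    """RMSE expressed as a percentage of the average pieces sold per week."""
    return 100.0 * rmse_value / average_weekly_sales


# Article 1 in Table 4: average 42.59 pieces per week, RMSE 12.08.
print(round(rmse_percentage(12.08, 42.59), 2))
```

Dividing by the average weekly sales removes the scale dependence, so the percentage can be compared across articles with very different sales volumes.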

For all articles in the data set the forecasts can be found in Appendix C. It is important to note that the stock levels are also taken into account, assuming there is no initial stock. This way one is able to see the impact on the stock if the purchasing department were to buy according to the sales forecast. When starting with no initial stock, 6 articles would go out of stock.

If one takes the implemented ARIMA model and the stock at the end of the last week of 2016, the forecasts show promising results. Even though the forecast is somewhat off, the stock levels remain more constant over time with the forecasts than the actual stock levels in 2017. The forecasts of articles 4, 6 and 10 can be found in Figure 5. For almost all articles, the amount of stock needed was reduced. Therefore, if the forecasting model were used as the only input for the purchasing department, costs of stock would decrease substantially. All stock forecasts can be found in Appendix D.
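The stock comparison described above can be sketched as a simple weekly bookkeeping loop. The paper does not spell out the exact bookkeeping, so the rules here are assumptions: each week the purchased quantity equals the forecast and arrives before that week's sales, and demand that cannot be met is lost rather than backordered. The function name and all numbers are made up.

```python
def simulate_stock(initial_stock, purchases, actual_sales):
    """Simulate end-of-week stock when `purchases` arrive before sales.

    Returns the stock level at the end of each week and the 0-based
    indices of the weeks in which demand exceeded the available stock.
    """
    stock = initial_stock
    levels, stockout_weeks = [], []
    for week, (bought, sold) in enumerate(zip(purchases, actual_sales)):
        available = stock + bought
        if sold > available:
            stockout_weeks.append(week)
        stock = max(available - sold, 0)  # unmet demand is lost (assumption)
        levels.append(stock)
    return levels, stockout_weeks


forecast = [100, 110, 120, 130]  # made-up weekly sales forecast
actual = [90, 125, 115, 140]     # made-up actual weekly sales

levels, stockouts = simulate_stock(0, forecast, actual)
print("no initial stock:", levels, "stockout weeks:", stockouts)

levels, stockouts = simulate_stock(50, forecast, actual)
print("initial stock 50:", levels, "stockout weeks:", stockouts)
```

With no initial stock, every week in which the forecast is too low produces a stockout; starting from a realistic opening stock absorbs those forecast errors, which matches the contrast between the two scenarios discussed above.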

There is a side note however: the model forecasts the sales of an article but does not take into account the different variations of the articles, such as colors and sizes. It also does not forecast the distribution over hundreds of stores. This is because if all article variations and stores were taken into account, forecasting would become very difficult, as the sales of a specific article in a specific store would most of the time be zero. Despite this limited precision, the forecasting technique could help the retailer maintain a healthy inventory on a country or region wide level. To be able to make a forecast for all variations and stores, a more complex and complete sales forecasting technique needs to be implemented, for example the (G)ELM techniques proposed by Xia et al. (2012) and Wong et al. (2010) [19, 18].

Figure 5. The actual and predicted sales, as well as the stock levels for articles 4, 6 and 10.

6.2 Model for retail sales forecasting

The aim of this research was to give retailers worldwide a model or approach to tackle time series problems. Based on the steps taken in the case study, a model has been created. The proposed model can be found in Figure 6 and is partially based on the CRISP-DM model [17], a data mining process model that describes commonly used approaches that data mining experts use to tackle problems. The following subsections discuss the steps in the model in more detail. The goal of the model is for retailers to gain business understanding that helps in making important business decisions.

In order to be able to process the article data efficiently, retailers should construct a data warehouse containing all the important and changing facts. This makes faster analysis and on-line analytical processing (OLAP) possible, which is crucial for retailers with hundreds or even thousands of articles. Chaudhuri et al. (1997) provide an overview of data warehousing and OLAP technologies that retailers could use [4]. The model is built on the assumption that there is a data warehouse in place before continuing.


Figure 6. Proposed model to gain business knowledge using the most accurate forecasting techniques.

6.2.1 Data visualization

The first step for each article is data visualization. Data visualization is defined by Ward et al. (2010) as the communication of information using graphical representations. The benefit of using data visualization is that a graphical representation can contain a wealth of information that can be processed much more quickly than a comparable page of words [16]. What data and relations the retailers should visualize is based on the business domain and the business questions that the retailer wants to answer. The insights of this step already help the retailer gain business understanding.

6.2.2 Data transformation and preparation

With the data warehouse in place, the next step is the data transformation. The transformation phase is the first and relatively easy step in obtaining stationary series. Having stationary series is crucial for many time series forecasting models, and often a simple transformation can lead to a higher ADF confidence. If the transformation of the time series does not result in a high ADF stationary confidence, one could use other techniques. Some of the techniques that could be used to increase stationarity are:

• Moving Average (MA)

• Exponentially Weighted Moving Average (EWMA)

• Differencing

As shown in Table 2 in Section 5.2, every article requires a different technique for the optimal result. That is why this step in the model is recursive, so that the optimal technique is always used.
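The candidate series for this step could be generated along the following lines. This is a minimal sketch, not the implementation used in the case study: it assumes that the trend estimated by MA or EWMA is subtracted from the (optionally log-transformed) series, and the function names, the window of 4 weeks and the smoothing factor of 0.5 are illustrative choices. Scoring each candidate with an ADF test is left to a statistics library.

```python
import math


def moving_average(series, window):
    """Trailing moving average; the first window-1 positions are None."""
    out = [None] * (window - 1)
    for i in range(window - 1, len(series)):
        out.append(sum(series[i - window + 1:i + 1]) / window)
    return out


def ewma(series, alpha):
    """Exponentially weighted moving average with smoothing factor alpha."""
    out = [series[0]]
    for x in series[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out


def difference(series, lag=1):
    """Difference between each value and the value `lag` periods earlier."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def subtract_trend(series, trend):
    """Remove an estimated trend, skipping positions where it is undefined."""
    return [x - t for x, t in zip(series, trend) if t is not None]


def candidates(series, window=4, alpha=0.5):
    """Every (transformation, technique) combination, keyed by its names.

    Assumption: all values are positive, otherwise the log is undefined.
    """
    bases = {"as is": list(series), "log": [math.log(x) for x in series]}
    out = {}
    for name, base in bases.items():
        out[(name, "MA")] = subtract_trend(base, moving_average(base, window))
        out[(name, "EWMA")] = subtract_trend(base, ewma(base, alpha))
        out[(name, "differencing")] = difference(base)
    return out
```

Each of the six resulting series would then be tested for stationarity, and the combination with the highest ADF confidence kept for the article.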

6.2.3 Forecasting techniques

Another recursive step in the model is the forecasting phase. Here one or multiple forecasting techniques could be implemented and tested to see which one works best for the given time series problem concerning an article. To compare different forecasting techniques one could use the root mean squared error (RMSE). It is important to note that one needs to transform the time series back to its original scale to see the real forecasts. How this is done depends on the choices made in the data transformation and preparation phase.
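As an illustration of the comparison and the back-transformation, the sketch below computes the RMSE, selects the technique with the lowest error, and undoes one specific preparation choice (a log transform followed by first-order differencing). The function names are hypothetical, and other preparation choices would need their own inverse.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def invert_log_difference(last_log_value, differenced_forecast):
    """Undo first-order differencing and a log transform.

    last_log_value is the log of the last observed value before the
    forecast horizon; each forecast step is added back cumulatively.
    """
    level = last_log_value
    restored = []
    for step in differenced_forecast:
        level += step
        restored.append(math.exp(level))
    return restored


def best_technique(actual, forecasts_by_name):
    """Name of the technique whose forecast has the lowest RMSE."""
    return min(forecasts_by_name,
               key=lambda name: rmse(actual, forecasts_by_name[name]))
```

The forecasts passed to `best_technique` should already be on the original scale, so that the errors of different techniques are comparable.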

6.2.4 Evaluation

Even though the previous step automatically picks the optimal forecasting technique, it is important to evaluate the steps taken and the accuracy of the forecasts, because even when the optimal steps are taken the results could still be unsatisfactory. The accuracy can be tested by combining and comparing the sales forecast with the actual stock levels to see the impact. Before proceeding to the deployment of the model, retailers should thoroughly evaluate the model and review the steps executed to construct it. At the end of this phase the retailer should be certain that the business objectives are achieved using these techniques.
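One way to make this comparison concrete is to simulate the stock level that would result from purchasing according to the forecast. The sketch below assumes that each period's purchase equals the sales forecast for that period and that a negative level represents demand that could not be met; both the function names and these assumptions are illustrative.

```python
def simulate_stock(initial_stock, purchases, sales):
    """Stock level at the end of each period.

    Assumption: the purchase for a period arrives before that period's
    sales are served, so the level changes by purchase minus sales.
    """
    levels = []
    stock = initial_stock
    for bought, sold in zip(purchases, sales):
        stock += bought - sold
        levels.append(stock)
    return levels


def stockout_periods(levels):
    """Indices of the periods in which the simulated stock is negative."""
    return [i for i, level in enumerate(levels) if level < 0]
```

Comparing the periods flagged here with the stockouts in the retailer's actual stock levels indicates whether the forecast would have improved on current purchasing.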

6.2.5 Deployment

When one knows which data preparation and forecasting techniques work well for a given article, one can deploy the technique to support business decisions. Depending on the requirements, the deployment phase could be as simple as generating a report, or more complex, such as creating an algorithm that the retailer could use to make forecasts.

7. CONCLUSION

In this paper, the following research question is addressed:

How and to what extent can retailers prevent stockouts using sales forecasting?

(RQ1) The most important observation in the data set of ten fashion articles was the huge difference among the articles. Whereas some articles were very dependent on the discount rate, others showed almost no correlation. Some articles had a very positive trend over the year, others a slightly negative one. The data exploration made clear that data visualization is a crucial part of gaining business knowledge and being able to make accurate forecasts.

(RQ2) Most time series forecasting models require the data to be stationary. In order to test whether a data set was stationary, the Augmented Dickey-Fuller (ADF) test was used [8]. To increase the confidence of a stationary series, first a transformation was used, for example the log of the data. This was used successfully to reduce the trend in the data, because higher values are penalized more. After the transformation, other techniques were used as well, such as the Moving Average (MA), Exponentially Weighted Moving Average (EWMA) and differencing. Combining the results of all possible combinations of transformations and techniques showed that every article requires a different technique to get the highest ADF confidence.

(RQ3) The Autoregressive Integrated Moving Average (ARIMA) model has been used to make sales forecasts for the articles. The model showed promising results as it performed quite well in forecasting sales. However, more advanced techniques proposed in the literature have the potential to outperform it and make the forecast more reliable.

(RQ4) The actual stock levels and the forecast of the articles were taken into account to see whether the forecast could help in preventing stockouts. The results of the ARIMA forecast differed per article. This can partially be explained by the fact that just the basic implementation of the ARIMA model was used, without any optimization. Most forecasts, however, would lead to lower stock levels, which means lower inventory costs for the retailer. The forecasting technique would be insufficient if all article variations such as size, color and store were taken into account. More advanced techniques are required to make proper forecasts on such a small scale.

(RQ5) In this paper a model is proposed that takes all steps taken in this case study into account. The model is adaptive, which means that it is able to pick the most accurate path for making time series forecasts. The model needs to be repeated for each article, because no two articles are the same and to keep it compatible with retailers from different sectors. The model can easily be implemented in the form of an algorithm and can be extended to use more techniques if required.

The forecasting technique used is far from perfect. The proposed model, however, provides retailers with a method to handle advanced time series problems in a concise and easy-to-understand manner.

8. LIMITATIONS

The proposed model is based on the assumption that no single forecasting method fits all articles and that multiple methods should be implemented to get optimal results. This holds true for making the data stationary, but it has not been implemented and tested for forecasting. The model is tested on weekly sales data aggregated across all stores in the case study and has not been tested at store level or on variants of the articles.

9. FUTURE WORK

Based on the limitations mentioned in the previous section, future work is needed to:

• validate the model with data sets from more and different retailers.

• improve the current sales forecasting technique.

• add new, more advanced forecasting techniques.

• create an algorithm that is able to make forecasts based on the model.

• include article variants and aggregation techniques.

10. ACKNOWLEDGEMENT

I would like to thank Chintan Amrit from the University of Twente for his help and guidance during the research period. I would also like to thank Erik van der Kuil for peer reviewing my writing and providing useful feedback.

11. REFERENCES

[1] E. Breugelmans, K. Campo, and E. Gijsbrechts. Opportunities for active stock-out management in online stores: The impact of the stock-out policy on online stock-out reactions. Journal of Retailing, 82(3):215–228, 2006.

[2] P. Brockwell and R. Davis. Introduction to Time Series and Forecasting. 2002.

[3] K. Campo, E. Gijsbrechts, and P. Nisol. Dynamics in consumer response to product unavailability: Do stock-out reactions signal response to permanent assortment reductions? Journal of Business Research, 57(8):834–843, 2004.

[4] S. Chaudhuri and U. Dayal. An overview of data warehousing and OLAP technology. ACM SIGMOD Record, 26(1):65–74, 1997.

[5] F. L. Chen and T. Y. Ou. Sales forecasting system based on Gray extreme learning machine with Taguchi method in retail industry. Expert Systems with Applications, 38:1336–1345, 2011.

[6] C.-w. Chu and G. Peter. A comparative study of linear and nonlinear models for aggregate retail sales forecasting. 86:217–231, 2003.

[7] D. Corsten and T. Gruen. Stock-Outs Cause Walkouts. Harvard Business Review, 82(5):26–28, 2004.

[8] D. A. Dickey and W. A. Fuller. Distribution of the Estimators for Autoregressive Time Series With a Unit Root. Journal of the American Statistical Association, 74(366):427, 1979.

[9] J. C. Ehrenthal and W. Stolzle. An examination of the causes for retail stockouts. International Journal of Physical Distribution & Logistics Management, 43(1):54–69, 2013.

[10] Z.-P. Fan, Y.-J. Che, and Z.-Y. Chen. Product sales forecasting using online reviews and historical sales data: A method combining the Bass model and sentiment analysis. Journal of Business Research, 74:90–100, May 2017.

[11] R. Helm, T. Hegenbart, and H. Endres. Explaining customer reactions to real stockouts. Review of Managerial Science, 2013.

[12] R. J. Hyndman and A. B. Koehler. Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4):679–688, 2006.

[13] H. Lee, S. G. Kim, H. woo Park, and P. Kang. Pre-launch new product demand forecasting using the Bass model: A statistical and machine learning-based approach. Technological Forecasting and Social Change, 86:49–64, 2014.

[14] S. T. Peinkofer, T. L. Esper, and E. Howlett. Hurry! Sale Ends Soon: The Impact of Limited Inventory Availability Disclosure on Consumer Responses to Online Stockouts. Journal of Business Logistics, 37(3):231–246, 2016.

[15] L. M. Sloot, P. C. Verhoef, and P. H. Franses. The impact of brand equity and the hedonic level of products on consumer stock-out reactions. Journal of Retailing, 81(1):15–34, 2005.

[16] M. O. Ward, G. Grinstein, and D. Keim. Interactive data visualization: foundations, techniques, and applications. CRC Press, 2010.

[17] R. Wirth and J. Hipp. CRISP-DM: Towards a standard process model for data mining. In Proceedings of the 4th international conference on the practical applications of knowledge discovery and data mining, pages 29–39, 2000.

[18] W. K. Wong and Z. X. Guo. A hybrid intelligent model for medium-term sales forecasting in fashion retail supply chains using extreme learning machine and harmony search algorithm. International Journal of Production Economics, 128(2):614–624, 2010.

[19] M. Xia, Y. Zhang, L. Weng, and X. Ye. Fashion retailing forecasting based on extreme learning machine with adaptive metrics of inputs. Knowledge-Based Systems, 36:253–259, 2012.

[20] W. Zinn and P. C. Liu. A comparison of actual and intended consumer behaviour in response to retail stockouts. Journal of Business Logistics, 29(2):141–159, 2008.

APPENDIX

A. STATIONARITY TABLE

Table 5 shows the results of the Augmented Dickey-Fuller (ADF) test when the optimal transformation and technique are used. Apart from article 4, all articles would reach an ADF confidence of more than 99%.

Table 5. Stationarity techniques and transformations used to reach the highest ADF confidence.

Article      ADF start (%)    Transformation (TR)   Trend and seasonality (TS)   ADF end (%)
Article 1    70.5858146283    log                   MA                           99.9640817419
Article 2    93.3127505709    log                   MA                           99.9999897913
Article 3    99.999556628     log                   MA                           99.9999999986
Article 4    99.8696573623    as is                 MA                           98.8683384923
Article 5    51.4311988562    log                   MA                           99.9311811191
Article 6    86.2275066794    log                   EWMA                         99.998564317
Article 7    51.7030300125    as is                 MA                           99.9999999881
Article 8    99.8809438209    log                   MA                           99.9999990141
Article 9    99.3895165044    as is                 MA                           99.9999264893
Article 10   99.759231731     as is                 MA                           99.9999993571

B. ACF AND PACF PLOT

The Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) of article 10 can be found in Figure 7. In this plot, the two dotted lines on either side of 0 mark the confidence interval. The plot can be used to pick the optimal p and q values for the ARIMA(p, d, q) model [2]:

• p - The lag value where the PACF chart crosses the upper confidence interval for the first time.

• q - The lag value where the ACF chart crosses the upper confidence interval for the first time.

Figure 7. The ACF and PACF plot of article 10.

For article 10, this would mean ARIMA(1,1,1) would be optimal, assuming the series is differenced once (d = 1).
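The rule of thumb for q can be sketched as follows. The sketch rests on two assumptions that the text does not spell out: the upper confidence bound is taken as the common approximation 1.96/sqrt(n) for n observations, and "crosses" is read as the first lag at which the value falls below that bound. The function names are hypothetical. Applying `first_crossing` to PACF values would give p in the same way; computing the PACF itself is left to a statistics library.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def first_crossing(values, n_obs):
    """First lag whose value lies below the approximate upper 95% bound.

    Returns None if no value within the given lags falls below the bound.
    """
    bound = 1.96 / math.sqrt(n_obs)
    for lag, value in enumerate(values):
        if value < bound:
            return lag
    return None
```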

C. FORECASTS WITHOUT INITIAL STOCK

In Figure 8 the forecasts without initial stock are visualized. The stock level is therefore defined as the cumulative sum of the sales forecast minus the cumulative sum of the actual sales, to show what would happen over time.

Figure 8. Sales forecast and the effect on the stock level assuming there is no initial stock.
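Written out, the definition above reads as follows. The symbols are introduced here for illustration only and do not appear in the paper: $\hat{y}_i$ is the sales forecast for period $i$, $y_i$ the actual sales, and $S_t$ the resulting stock level after period $t$.

```latex
S_t = \sum_{i=1}^{t} \hat{y}_i - \sum_{i=1}^{t} y_i
    = S_{t-1} + \hat{y}_t - y_t, \qquad S_0 = 0
```

As a made-up example, forecasts of 10, 10 and 10 units against actual sales of 8, 12 and 11 units give $S_1 = 2$, $S_2 = 0$ and $S_3 = -1$, so the third period would end in a stockout.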

D. ACTUAL VS. FORECASTED STOCK LEVELS

Figure 9 visualizes the actual stock levels of the retailer alongside the stock levels that would result if the retailer purchased exactly according to the sales forecast. The forecasts can be found in Appendix C.

Figure 9. Impact of the sales forecast on actual stock.
