Chintan Gandhi Capstone Project
Project: Predictive Analytics Capstone
Task 1: Determine Store Formats for Existing Stores
• What is the optimal number of store formats? How did you arrive at that number?
Figure 1. Adjusted Rand Indices and Calinski-Harabasz Indices
Figure 2. Adjusted Rand Indices and Calinski-Harabasz Indices
Based on the adjusted Rand (AR) and Calinski-Harabasz (CH) indices, the data can be segmented into 3, 4 or 5 clusters, since these have the highest median values.
Figure 3. IQR, median, maximum and minimum values for 3, 4 and 5 clusters
From the values in Figure 3, the median is highest for 3 clusters. The interquartile ranges for 4 and 5 clusters are narrower than for 3 clusters, but the differences between the medians are much larger than the differences between the interquartile ranges. Hence, the cluster model will use 3 clusters.
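The comparison above can be sketched in a few lines of Python. This is only an illustration of the selection rule (prefer the highest median, then look at the spread); the index values below are made-up placeholders, not the values shown in Figure 3, and the names `summarize` and `ar_by_k` are hypothetical.

```python
from statistics import median, quantiles


def summarize(values):
    """Return the median and interquartile range of a list of index values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"median": median(values), "iqr": q3 - q1}


# Placeholder adjusted Rand index values from repeated clustering runs.
# These are NOT the values from the report; they only mimic its pattern.
ar_by_k = {
    3: [0.55, 0.60, 0.62, 0.65, 0.70],
    4: [0.40, 0.43, 0.45, 0.46, 0.48],
    5: [0.38, 0.40, 0.41, 0.43, 0.44],
}

summary = {k: summarize(v) for k, v in ar_by_k.items()}

# Pick the number of clusters with the highest median index value.
best_k = max(summary, key=lambda k: summary[k]["median"])
```

With these placeholder values, 3 clusters has the highest median even though its interquartile range is wider, which mirrors the trade-off described above.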
• How many stores fall into each store format?
Figure 4. Cluster information obtained from the K-Means clustering model
Figure 4 shows the cluster information from the K-means clustering model. For each cluster, the size is the number of stores that fall into that segment.
• Based on the results of the clustering model, what is one way that the clusters differ from
one another?
Figure 5. Variance seen in each category in the clusters
From Figure 5, the Dairy value is 0.70 for cluster 2, -0.76 for cluster 1 and -0.01 for cluster 3. Because these values are distinctly apart, the three clusters differ in their Dairy sales: clusters 1 and 2 are the most different, and cluster 3 is intermediate.
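The pairwise gaps behind this statement can be checked with a short calculation. The sketch below uses only the three Dairy values quoted above; the variable names are illustrative.

```python
from itertools import combinations

# Dairy values per cluster, as quoted in the text for Figure 5.
dairy = {1: -0.76, 2: 0.70, 3: -0.01}

# Absolute gap between every pair of clusters.
gaps = {
    (a, b): abs(dairy[a] - dairy[b])
    for a, b in combinations(sorted(dairy), 2)
}

# The pair with the largest gap is the most different on Dairy.
most_different = max(gaps, key=gaps.get)

for pair, gap in gaps.items():
    print(pair, round(gap, 2))
```

The gap between clusters 1 and 2 is the largest, while cluster 3 sits roughly midway between them.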
• Please provide a Tableau visualization (saved as a Tableau Public file) that shows the location of the stores, uses color to show cluster, and size to show total sales.
Figure 6. Cluster map of store segments
Task 2: Formats for New Stores
• What methodology did you use to predict the best store format for the new stores? Why
did you choose that methodology? (Remember to use a 20% validation sample with
Random Seed = 3 to test differences in models.)
Figure 7. Decision Tree, Forest Model and Boosted Model comparison
The Boosted Model gives an accuracy of 82.35%. The PPV is 80% for cluster 1, 67% for
cluster 2 and 100% for cluster 3. The F1 score is 88.89%.
The Decision Tree Model gives an accuracy of 70.59%. The PPV is 60% for cluster 1, 67% for
cluster 2 and 83.3% for cluster 3. The F1 score is 76.85%.
The Forest Model gives an accuracy of 82.35%. The PPV is 75% for cluster 1, 80% for
cluster 2 and 87.5% for cluster 3. The F1 score is 82.35%.
The Boosted Model will be used for classification. Of the three models, it has the highest
PPV for cluster 1 and cluster 3, so it identifies stores in those segments most reliably. It
also has the highest F1 score.
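For reference, accuracy and PPV can be derived from a confusion matrix as sketched below. The matrix is hypothetical: its diagonal and column totals were chosen so that the accuracy and PPV values match those reported for the Forest Model, but the placement of the misclassified stores is an assumption, so the recall and per-class F1 values it produces are illustrative only. The function name `classification_metrics` is not from the source.

```python
def classification_metrics(matrix):
    """Compute accuracy and per-class PPV (precision), recall and F1.

    matrix[i][j] is the number of stores whose actual cluster is i + 1
    and whose predicted cluster is j + 1.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    ppv, recall, f1 = [], [], []
    for c in range(n):
        predicted_c = sum(matrix[r][c] for r in range(n))  # column total
        actual_c = sum(matrix[c])                          # row total
        p = matrix[c][c] / predicted_c if predicted_c else 0.0
        r = matrix[c][c] / actual_c if actual_c else 0.0
        ppv.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return {"accuracy": correct / total, "ppv": ppv, "recall": recall, "f1": f1}


# Hypothetical 17-store validation sample (rows: actual, columns: predicted).
# The off-diagonal cells are assumed, not taken from the report.
confusion = [
    [3, 1, 0],
    [0, 4, 1],
    [1, 0, 7],
]

metrics = classification_metrics(confusion)
print(round(metrics["accuracy"] * 100, 2), metrics["ppv"])
```

PPV depends only on the columns of the matrix (what was predicted), whereas recall depends on the rows (what was actually there), which is why two models with equal accuracy can still differ in PPV and F1.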
• What format do each of the 10 new stores fall into? Please fill in the table below.
Store Number Segment
S0086 3
S0087 2
S0088 1
S0089 2
S0090 2
S0091 1
S0092 2
S0093 1
S0094 2
S0095 2
Figure 8. Segments identified for new stores
Task 3: Predicting Produce Sales
1. What type of ETS or ARIMA model did you use for each forecast? Use ETS(a,m,n) or
ARIMA(ar, i, ma) notation. How did you come to that decision?
Figure 9. Time Series decomposition plot
Figure 9 is used to determine the ETS model. In the seasonal plot, the magnitude of
the seasonality changes slightly every year, although this is barely noticeable in the
graph. To account for this, a multiplicative method will be used for the seasonal
component. The trend line is neither linear nor quadratic, so no trend method will be
used. The remainder plot shows errors of varying magnitude, so a multiplicative method
will be used for the error component. The ETS model will therefore be an MNM model,
i.e. ETS(M,N,M).
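For reference, an ETS model with multiplicative error, no trend and multiplicative seasonality is usually written in state-space form as below. The symbols are introduced here for illustration and do not appear in the source: \ell_t is the level, s_t the seasonal component, m = 12 the seasonal period, and \alpha and \gamma the smoothing parameters, whose fitted values are not reported.

```latex
\begin{aligned}
y_t    &= \ell_{t-1}\, s_{t-m}\,(1 + \varepsilon_t) \\
\ell_t &= \ell_{t-1}\,(1 + \alpha\,\varepsilon_t) \\
s_t    &= s_{t-m}\,(1 + \gamma\,\varepsilon_t)
\end{aligned}
```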
Figure 9. ACF and PACF plots from Time Series decomposition
From Figure 9, the ACF plot shows that the series is not stationary and has a seasonal
pattern. The time series will be differenced to obtain a stationary series.
Figure 10. ACF and PACF plot for seasonal difference
From Figure 10, the ACF plot shows that the seasonally differenced series is still not
stationary, so it is differenced further by taking a seasonal first difference.
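The two differencing steps can be sketched as follows. This is a minimal illustration on a synthetic monthly series (quadratic trend plus a repeating 12-month pattern), not the produce sales data; the helper `difference` is a hypothetical name.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Synthetic data for illustration only: quadratic trend plus a yearly pattern.
pattern = [5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 4, 3]
series = [t * t + pattern[t % 12] for t in range(48)]

# Step 1: seasonal difference (lag 12). The yearly pattern cancels out,
# but a linear drift remains, so the result is still not stationary.
seasonal_diff = difference(series, lag=12)

# Step 2: first difference of the seasonally differenced series.
# The drift is removed and only a constant is left.
seasonal_first_diff = difference(seasonal_diff, lag=1)
```

Each step shortens the series: the seasonal difference drops the first 12 observations and the first difference drops one more.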
Figure 11. ACF and PACF plot for seasonal first difference
From the seasonal first difference shown in Figure 11, the series is now stationary. For
the non-seasonal component, the ACF plot shows a negative correlation at lag 1 that
then cuts off to zero, while the PACF plot drops to zero gradually. This indicates an
MA(1) term, so q = 1. Since a first difference was taken, d = 1. For the seasonal
component, the correlation at lag 12 is negative and significant in both the ACF and
the PACF, and at lag 24 it cuts off to zero in both plots. This indicates a seasonal
MA(1) term, so Q = 1. Since a seasonal difference was taken, D = 1. The value of m is
12 because the seasonal period is 12 months. The ARIMA model is
ARIMA(0,1,1)(0,1,1)[12].
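Written out with the backshift operator B (where B y_t = y_{t-1}), an ARIMA(0,1,1)(0,1,1)[12] model has the form below. The coefficient symbols \theta_1 and \Theta_1 are introduced here for illustration; their fitted values are not reported in the source.

```latex
(1 - B)\,(1 - B^{12})\, y_t = (1 + \theta_1 B)\,(1 + \Theta_1 B^{12})\, \varepsilon_t
```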
Figure 12. ETS MNM model errors
Figure 13. ARIMA(0,1,1)(0,1,1)[12] model errors
Looking at Figure 12 and Figure 13, the RMSE and MASE of the ARIMA model are
smaller than those of the ETS MNM model.
Figure 14. Errors from both models from TS comparison
From Figure 14, the ETS model gives smaller errors than the ARIMA model when
used to predict the holdout sample: its RMSE and MAPE values are both lower.
Based on this analysis, the ETS MNM model will be used to forecast future
values.
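The error measures used in this comparison can be computed as sketched below. The numbers are placeholders, not the holdout values from the report, and the MASE scaling shown (in-sample mean absolute error of a naive forecast with lag `m`) is one common definition; the tool that produced the figures may scale differently.

```python
from math import sqrt


def rmse(actual, forecast):
    """Root mean squared error."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, train, m=1):
    """Mean absolute error scaled by the in-sample MAE of a naive lag-m forecast."""
    scale = sum(abs(train[t] - train[t - m]) for t in range(m, len(train))) / (len(train) - m)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return mae / scale


# Placeholder data for illustration only.
train = [100, 110, 120, 130, 140]
actual = [150, 160]      # holdout observations
forecast = [145, 170]    # model predictions for the holdout period

print(rmse(actual, forecast), mape(actual, forecast), mase(actual, forecast, train))
```

Comparing models on the holdout sample rather than on the fitted data guards against choosing a model that merely fits the training period more closely.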
2. Please provide a table of your forecasts for existing and new stores. Also, provide
visualization of your forecasts that includes historical data, existing stores forecasts, and
new stores forecasts.
Figure 15. Forecasted Produce Sales for Existing and New Stores
Figure 16. Historical and forecasted produce sales