
Predictive Analytics for Seoul Retail case study


Decision Support Systems Final Project May 5, 2016

Seoul Retail Case Analysis

Seoul Retail Case Decision Support Systems Final Project


ANALYSTS

KUMAR RADHAKRISHNAN

SHWETA VAIDYA

SNEHA SALIAN

SNEHAL DATTA


Case study and the approach taken to solve it:

In the given case study, Mr. Choe has collected aggregated sales data on five franchise stores of company Q. It is assumed that 95% of the sales come from Japanese tourists, and that factors such as the yen/won ratio, discounts, weekdays/holidays, weather and location might affect sales at the different stores. As the franchisee of Store B, Mr. Choe is mainly interested in identifying the factors that affect sales in his store, the extent to which they do so, and how these factors are similar or dissimilar across the five stores.

To analyze the factors affecting each store and compare the stores against each other and against Store B, we first divided the entire data set into five data sets, one for each store.
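The per-store split described above can be sketched in a few lines of Python. This is only an illustration and not the tool used in the project; the field names `store` and `total_sales` and the sample records are hypothetical.

```python
from collections import defaultdict


def split_by_store(records, key="store"):
    """Group sales records into one list per store."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)


# Made-up records standing in for the aggregated sales data.
records = [
    {"store": "A", "total_sales": 9_500_000},
    {"store": "B", "total_sales": 11_200_000},
    {"store": "A", "total_sales": 8_700_000},
]
per_store = split_by_store(records)
```

Each value in `per_store` can then be modelled on its own, which mirrors the one-data-set-per-store approach taken here.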

The variables No of customers, No of items, Store Code, Store Name and Average sales per item/customer were rejected because they are not significant for the analysis to be conducted. The variables that capture the distances of the store from the metro and the main street were also rejected, since they are the same for every record in a single-store data set; the exception is Store C, whose location changed midway. Total Sales was chosen as the target variable, with its level kept at Interval so that the results can be interpreted in terms of average sales.

To handle missing data while building the models, we imputed the variables using the Tree method, and we handled skewness by transforming the variables. A log transform was used for positively skewed data, whereas a square transform was used for negatively skewed data.
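As a rough sketch of why these transforms help, the snippet below measures skewness before and after transforming two small samples. The samples are made up for illustration, the moment-based skewness formula is one common definition chosen here as an assumption, and the natural log is assumed.

```python
import math
from statistics import mean


def skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Made-up samples: one with a long right tail, one with a long left tail.
right_skewed = [1, 2, 2, 3, 3, 4, 5, 8, 20, 60]
left_skewed = [1, 6, 8, 9, 9, 10, 10, 10]

# Log compresses the right tail; squaring stretches the upper end.
log_transformed = [math.log(v) for v in right_skewed]
squared = [v ** 2 for v in left_skewed]

skew_before_log = skewness(right_skewed)
skew_after_log = skewness(log_transformed)
skew_before_square = skewness(left_skewed)
skew_after_square = skewness(squared)
```

In both cases the skewness moves closer to zero after the transform, which is the effect the transforms are meant to achieve before regression.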

To analyze this case study, we used Decision Trees and Linear Regression models, as the factors affecting sales at each store had to be determined. Neural network models are used purely for prediction and do not help in drawing inferences. Interactive trees are mainly used when the predictions must enforce business processes, rules or standards that have been followed for a long time, which is not the case in this case study. Hence, neither Neural network models nor interactive trees were used for comparison and analysis. The Decision trees and Regression models helped us draw conclusions about the different factors affecting sales across all the stores.

1. INDIVIDUAL STORE ANALYSIS

The models built for all the stores take ‘Total Sales’ as the target variable; the other variables are set to Input or Rejected as described above.


• BUILDING DECISION TREES

We built the default decision tree and the 4-way decision tree for all the stores and compared the performance of the two to see which is more suitable for identifying the factors affecting sales in each store.
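To make the idea of a tree on an interval target concrete, here is a minimal sketch of how a single binary split can be chosen: every candidate threshold is tried, and the one that minimizes the summed squared error around the two leaf means is kept. The squared-error criterion and the sample values are assumptions for illustration; the report does not state which splitting criterion its software used. A 4-way tree differs in that it allows up to four branches per node instead of two.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean of the values."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_binary_split(xs, ys):
    """Return (threshold, left_mean, right_mean) minimizing total SSE,
    or None when all x values are identical."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold, mean(left), mean(right))
    return None if best is None else best[1:]


# Made-up yen/won ratios and daily sales (in millions).
yen_won = [12.8, 13.0, 13.2, 13.5, 13.9, 14.2]
sales = [8.0, 9.0, 8.5, 12.5, 13.0, 13.5]
threshold, low_mean, high_mean = best_binary_split(yen_won, sales)
```

The leaf means play the role of the "average sales" figures quoted for each branch in the store analyses.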

• BUILDING REGRESSION MODELS

Before building the regression models, we had to perform a couple of steps, imputing and transforming, to account for missing data and for the skewness of the data.

Imputed variable

The Japanese tourist variable has a lot of missing values which have been imputed using the Tree method.

Transform variables:

The following transformations have been done to the variables:

• The Discount variable is skewed to the right. Hence, we apply a log transform to it and use the transformed variable.

• The IMP_Japanese_Tourists variable is skewed to the right. Hence, we apply a log transform to it and use the transformed variable.

• The YenWonRatio variable is skewed to the left. Hence, we apply a square root transform to it and use the transformed variable. (Initially we applied a square transform to this variable, but that did not reduce the skewness much.)

[Figure omitted: before applying the transform]


Comparison between three regressions:

We built three regression models (Forward, Backward and Stepwise) for each store. On comparing the three regression models, the best-performing one was selected. We then compared this best regression model against the decision trees. Eventually, the best model overall was used for the analysis, but in some cases we also took significant inputs from the other model to determine the different factors affecting the sales.
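The comparison step can be sketched as follows. The report does not say which fit statistic was used to rank the models, so average squared error on held-out data is assumed here; the actual values and predictions are made up for illustration.

```python
def average_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pick_best_model(actual, predictions_by_model):
    """Return the name of the lowest-error model and all model scores."""
    scores = {
        name: average_squared_error(actual, preds)
        for name, preds in predictions_by_model.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical held-out sales (in millions) and model predictions.
actual = [10.0, 12.0, 9.0, 14.0]
predictions = {
    "forward": [11.0, 12.5, 8.0, 13.0],
    "backward": [10.5, 12.0, 9.5, 13.5],
    "stepwise": [9.0, 13.0, 10.0, 15.0],
}
best_name, scores = pick_best_model(actual, predictions)
```

The same function can rank the winning regression against the decision trees, provided all models are scored on the same records.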


STORE A:

Final Model Selection

Explanation for Store A:

It is to be noted here that the regression model performs better than the 4-way decision tree. The decision tree has not been an apt model for this case and has been ignored by the software. Hence, we take the regression model as our final model and base our final results on it. However, we also observe that the factors influencing Total Sales, that is Japanese Tourists, YenWonRatio, Weekday and Year, are identified by both the 4-way decision tree and the regression model. In this case we require only the factors and their importance, so we can go ahead with the regression model. As we additionally do not need to form any marketing strategies or recommendations, we do not need to consider the decision tree.

Factors predicting Sales for Store A:

(1) We can see that the factors LOG_IMP_Japanese_Tourists, SQRT_YenWonRatio and Weekday (especially Tuesday, Saturday and Sunday) have a highly significant impact on the Total Sales for Store A.

(2) For a 1 unit increase in the square root of YenWonRatio, the Total Sales increases by 23530836.

(3) For a 1 unit increase in the log of the imputed variable for Japanese Tourists, the Total Sales increases by 5973152.


(4) If the day of the week is Saturday, the Total Sales increases by 3332584.

(5) If the day of the week is Sunday, the Total Sales increases by 3484581.

(6) If the day of the week is Tuesday, the Total Sales decreases by 3046056, whereas if the day of the week is Thursday, the Total Sales decreases by 1442564.
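Because two of the inputs are transformed, a "1 unit increase" refers to the transformed scale, not the raw variable. The sketch below converts the Store A coefficients from items (2) and (3) back into effects of raw changes. It assumes the log is the natural log, and the example changes (ratio from 13 to 14, tourists from 2000 to 4000) are illustrative values, not figures from the data.

```python
import math

COEF_SQRT_YEN_WON = 23530836  # Store A coefficient on sqrt(YenWonRatio)
COEF_LOG_TOURISTS = 5973152   # Store A coefficient on log(Japanese Tourists)


def sales_change_sqrt(old_ratio, new_ratio, coef=COEF_SQRT_YEN_WON):
    """Predicted change in Total Sales when the raw ratio changes."""
    return coef * (math.sqrt(new_ratio) - math.sqrt(old_ratio))


def sales_change_log(old_count, new_count, coef=COEF_LOG_TOURISTS):
    """Predicted change in Total Sales when the raw tourist count changes
    (natural log assumed)."""
    return coef * (math.log(new_count) - math.log(old_count))


# Illustrative raw changes, holding all other inputs fixed.
ratio_effect = sales_change_sqrt(13.0, 14.0)
doubling_effect = sales_change_log(2000, 4000)
```

Under these assumptions, a one-point rise in the raw ratio is worth far less than the full coefficient, because the square root of the ratio moves by only about 0.14; likewise, doubling the tourist count adds the log coefficient times ln 2 regardless of the starting count.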

Final Analysis for Store A:

From the above model we can conclude that for Store A, sales are higher on weekends. In addition, sales are higher when more Japanese tourists visit the store and when the yen/won ratio is high. During the midweek, the store does not perform well in terms of sales.


STORE B:

Final Model Selection

Explanation for Store B:

It is to be noted here that the performance of the Default decision tree is better than that of the 4-Way decision tree. The regression analysis has not been an apt model for this case and has been ignored by the software. Hence, in order to depict the factors that govern the sales of this store, we take the decision tree as our final model and give our analysis drawing inferences from it.


Factors predicting Sales for Store B:

(1) When the yen/won ratio is higher than 13.39 and a lot of Japanese tourists visit the store on a long weekend (Friday to Monday), average sales are higher (13193327).

(2) When the number of Japanese customers visiting is less than 2697 but the store offers a larger discount (greater than 1601845), the highest average sales (15431388) are recorded. Even with a smaller discount (less than 1601845), sales remain comparable (10895878).

(3) During the other days of the week, the month is the main factor affecting sales at this store. For example, apart from January, February, June and July, the store makes higher average sales when the yen/won ratio is high (11099218).


Final Analysis for Store B:

From the above model, we can conclude that Store B’s highest average sales are recorded when Japanese tourists visit on longer weekends. When fewer Japanese tourists visit, the store offers discounts to increase the average sales. On normal weekdays, average sales depend heavily on the monthly fluctuation in the yen/won ratio: in months where the yen/won ratio is high, higher sales are recorded at this store.


STORE C

For this store, in addition to the variables selected as input for the other models, we also selected as input the variables capturing the distances of the store from the metro stations and the main street in feet, because the location of this store changed after the new ownership. This factor therefore also needs to be considered when drawing inferences.

Final Model Selection


Explanation for Store C:

It is to be noted here that the performance of the 4-way decision tree is better than that of the default decision tree. The regression analysis has not been an apt model for this case and has been ignored by the software. Hence, in order to depict the factors that govern the sales of this store, we take the decision tree as our final model and give our analysis drawing inferences from it.

Factors predicting Sales for Store C:

(1) When the distance between the main street and Store C is less than 787 and the discount offered is greater than 3508700, high average sales (13672987) are captured.

(2) When the distance between the main street and Store C is greater than 1030, high average sales are captured in the months of February and September, when the number of Japanese tourists visiting is high (more than 4352).

(3) In the same branch as above, high average sales are also captured in the months of October and November.

Final Analysis for Store C:

From the above model, we can conclude that Store C’s highest average sales were recorded when the Store’s location was changed after new ownership. This change must have attracted a lot of Japanese customers as the store’s location was moved closer to the main street making it easier for the tourists to commute to the store. It seems like the Store at this location also started giving out discounts to increase the average sales.

The store at the previous location saw high average sales in the months of February and September. These months mark the change in seasons: spring sets in during February and fall during September, which might have attracted a lot of Japanese tourists for season-change shopping. This might be the reason for the high average sales captured during those months.

Also, this store attracts decent end of the year average sales.


STORE D

Final Model Selection

Explanation for Store D:

It is to be noted here that even though the default decision tree performs better than the stepwise regression on model comparison, we can see from the graph plot that the stepwise regression performs much better than the default decision tree up to the 20th percentile. Hence, we look at the significant factors in the regression model and examine the details of those factors through the decision tree.


Factors predicting Sales at Store D:

(1) We can see that the factors LOG_IMP_Japanese_Tourists, LOG_Discount, Month, SQRT_YenWonRatio and Weekday (especially Saturday and Sunday) have a highly significant impact on the Total Sales for Store D

(2) For 1 unit increase in the square root of YenWonRatio, the Total Sales increases by 7245711

(3) For 1 unit increase in the log of the imputed variable for Japanese Tourists, the Total Sales is increased by 4299398

(4) For 1 unit increase in the log of the discount variable, the Total Sales increases by 917073

(5) For each unit of progression towards the end of the year (Month variable), the Total Sales increase by 182209


(6) If the day of the week is Saturday, the Total Sales is increased by 2597662

(7) If the day of the week is Sunday, the Total Sales is increased by 1935091

(8) When the outlook is snowy, the Total Sales is decreased by 736506

We can see from the decision tree that it considers the same factors for predicting Total Sales of Store D, that is, YenWonRatio, Japanese Tourists, Discount, Month and Weekday. The only difference from the regression model is that it does not consider Holiday and Outlook and instead uses AverageHighTemp.

Final Analysis for Store D:

Store D attracts the highest average sales on weekends when the yen/won ratio is high. It also attracts a large number of tourists on weekends when the temperature is high. So, when the days are sunny, more tourists visit the store on weekends, contributing to high average sales, whereas when the weather is snowy the store sees a decline in sales.

This store offers discounts on long weekends, which contributes to high average sales. The store also offers discounts when the yen/won ratio is low, which produces a decent amount of average sales. However, it does not perform well during the midweek with respect to sales.


STORE E:

Final Model Selection

Explanation for Store E:

It is to be noted here that the performance of the decision tree is better than that of the 4-way decision tree. The regression analysis has not been an apt model for this case and has been ignored by the software. Hence, in order to depict the factors that govern the sales of this store, we take the decision tree as our final model and base our final results on it.


Factors predicting Sales for Store E:

(1) When the discount at this store is very high (higher than 116490), high average sales are encountered.

(2) During the months of October, November, February, March and April, the store captures high average sales by offering moderate discounts (between 2250 and 890465) to the Japanese tourists visiting the store.

(3) The average sales are high when more Japanese tourists (more than 25165) visit the store during the long weekends. The store also garners moderate average sales when moderate discounts (between 5650 and 890465) are offered to these tourists when they visit on non-holidays.

Final Analysis for Store E:

Store E depends heavily on discounts for high average sales. It seems to follow a discount strategy for attracting customers and generating sales.

Not only does it give high discounts to attract more tourist customers during long weekends, it also seems to offer discounts on weekdays to increase its customers and, in turn, its sales.

This store also seems to give moderate discounts in specific months such as February, March, April, October and November. This seems to be an end-of-season sale in which it might be offering old products at a discount before the new season begins. The store does not seem to perform well in the months of January, September and December; the reason might be the weather conditions (snow and rain).

2. How factors differ across stores and how in particular do they compare against the key factors for Store B

While Store A’s highest average sales happen during weekends, Store B’s highest average sales happen during longer weekends. Like Store B, Store D also sees high average sales on longer weekends, but it offers discounts as well. So Store D seems to be a competitor to Store B. Also note that Store D is closer to the metro station than Store B. To stay ahead of the competition, Store B can come up with exclusive discount strategies to attract more customers. Store D, however, does not perform well on weekdays, so it too can come up with discount strategies to attract more customers during the week.

Store C is benefiting from moving closer to the main street, which seems to be increasing its average sales. It also seems to be attracting customers by offering discounts. Store E achieves high average sales mainly through discounting strategies, but it does not perform well in certain months such as January, September and December, where weather (snow and rain) might be the reason.


While Store A and Store B are the only stores that benefit substantially from a high yen/won ratio, Store D benefits from good weather conditions on weekends.

While Store B offers discounts when there are fewer Japanese customers, so as to increase average sales by attracting customers, Store D offers discounts on long weekends to do so, whereas Store E offers discounts not only during weekends but also at season ends.

Apart from analyzing the stores individually and comparing the factors affecting sales across all the stores, we also analyzed the entire data set to see whether any other factors contribute to the sales. To do this, we included the distance from the metro stations in feet and the distance from the main street in feet, to see whether the location factor contributes to the overall sales.

After running the decision trees, we could see that the first splitting criterion was the distance from metro station Y. Although Store D is closest to metro station Y, the highest average sales were not captured in its branch but in the Store A branch; Store A turns out to be the second closest to the metro station. On further evaluation of the tree, the next splitting criterion on Store A’s branch was the distance from the main street.

Since, as per the estimates, 95% of the sales are contributed by Japanese tourists, it can be assumed that they take trains as their mode of transport and find it convenient to visit stores that are nearer to the metro station. After trains, the next mode of transport they use might be buses, which usually have stops located close to the main streets to make it easier for people to commute. Store A is second closest to the metro station and closest to the main street, whereas Store D, though close to the metro station, is quite far from the main street, which means the tourists might have to walk that distance. This might be the reason the higher average sales fall in Store A’s branch rather than Store D’s branch: the tourists must find it easier to reach Store A than Store D by public transport.

3. Impact of the change in ownership at Store A and change in ownership and location at Store C

Store A: Since Japanese Tourist data is available only from Feb 14 2012, we imputed this variable in the Excel file using the mean method and then split the data into two data sets, one up to the closing date and the other from the reopening date under the new ownership, in order to compare the two. To see the impact of this change, we used total sales as our target variable and examined the change in sales with the change in ownership and the factors affecting this change.

Models: As in the individual store comparison, here too we used only Decision trees and Regression models in our analysis, for the reasons discussed earlier.

Data: Store A – Old Ownership

Final Model Selection


Explanation: It is to be noted here that on model comparison the backward regression performs better than the decision tree; we can see from the graph plot that the backward regression performs much better than the default decision tree up to the 20th percentile. Still, we look at the significant factors in the regression model and examine the details of those factors through the decision tree to make our analysis.

Factors Predicting Sales:

From the decision tree we can draw the inference that high average sales are captured on weekends.


(1) We can see that factors such as months (July and March) and weekends (Saturday and Sunday) have a highly significant impact on the Total Sales for Old Store A.

(2) In the month of July, the Total Sales are expected to increase by 3922360, whereas in the month of March, the Total Sales are expected to increase by 2918128.

(3) On a Saturday, the Total Sales are expected to increase by 3156712, whereas on a Sunday, the Total Sales are expected to increase by 3649853.

(4) On a Tuesday, the Total Sales are expected to decrease by 4551638, whereas in the month of January, the Total Sales are expected to decrease by 5166529.


Final Analysis:

From the above models, we can conclude that for the Old Store A, sales are higher on weekends, while during the midweek the store does not perform well in terms of sales. The store performs well in the months of March and July, whereas sales fall during the months of January and October.

Data: Store A – New Ownership

Final Model Selection:

Explanation: It is to be noted here that on model comparison the stepwise regression performs better than the other regression models and the decision tree. We can see from the graph plot that the stepwise regression performs much better than the default decision tree up to the 20th percentile. Still, we look at the significant factors in the regression model and examine the details of those factors through the decision tree to make our analysis.

Factors affecting Sales:

From the decision tree we can draw the inference that high average sales are captured when more Japanese tourists visit (more than 3142). In the new store, a high yen/won ratio also contributes to high average sales when there are fewer than 3142 Japanese tourists.


(1) We can see that factors such as weekends (Saturday and Sunday), the yen/won ratio and Japanese tourists have a highly significant impact on the Total Sales for New Store A.

(2) With a 1 unit increase in Japanese Tourists, the Total Sales are expected to increase by 745.7.

(3) With a 1 unit increase in the log value of the yen/won ratio, the Total Sales are expected to increase by 132820.

(4) On a Saturday, the Total Sales are expected to increase by 3425798, whereas on a Sunday, the Total Sales are expected to increase by 3820310.

(5) On a Tuesday, the Total Sales are expected to decrease by 2155700, whereas on a Thursday, the Total Sales are expected to decrease by 2556596.

Final Analysis: Irrespective of the new ownership, the high average sales on weekends and the low sales on weekdays remain constant. However, the store under the new ownership seems to be attracting more Japanese customers. The change in ownership might have improved the customer service, which attracted more customers. Also, because of the increase in Japanese tourists visiting the store, the yen/won ratio is also contributing to higher sales.

Store C: When we analyzed the data for Store C in question 1, the distinguishing factor was the distance (location), which indicates that sales at this store have been mainly affected by the new location and ownership. The change in ownership and location seems to have positively impacted the total sales at this store.

Final Analysis:

From the models generated, we can conclude that Store C’s highest average sales were recorded when the Store’s location was changed after new ownership. This location change must have attracted a lot of Japanese customers as the store’s location was moved closer to the main street making it easier for the tourists to commute to the store. It looks like the new Owner at Store C came up with discount strategies to increase the average sales.

The store at the previous location saw high average sales in the months of February and September. These months mark the change in seasons: spring sets in during February and fall during September, which might have attracted a lot of Japanese tourists for season-change shopping, resulting in higher sales. The old store also attracts decent end-of-year average sales. In contrast to the old Store C, the new Store C seems to perform better because of the location and discount factors.