Bookbinder's Club Customer Choice Assessment
Team Watson
Simon Campbell, Farah Chandrima, Kirby Deng, Brandon Hewitt, Vivian Ko, Mark Rousseau
Agenda
1 • Background
2 • Key Issue
3 • Objective
4 • Analysis & Insights
5 • Recommendation
6 • Q&A
Background
Evolving industry
• New distribution channels and business models (superstores and online)
• These new channels have massive reach and selection
Evolving competition
• Competitors like Amazon have robust customer analytics
• Deep understanding of customers is key to success
Key Takeaways:
• More traditional competitors must find new ways to compete or risk being eliminated
• Analytics provides a powerful opportunity to enhance effectiveness
Key Issue
Key Issue:
• Intensifying competition driving a need for change
Challenge:
• Use predictive marketing models to improve effectiveness of direct mail efforts
Recommendation:
• Increase ROI by using our Customer Choice model
• Build, test and implement models for all genres
Objective
Objective: identify the best model to improve customer targeting to drive profitability for BBBC
1. Identify customer pool
2. Build: RFM, Regression, Customer Choice
3. Test model
4. Run best-performing model
5. Target most profitable customers
6. Increase profit
Analysis – based on sample of 2,300
Key Takeaway: Customer Choice Model performs the best
[Figure: profit ($) by % of customers sorted by most likely to purchase, with series added panel by panel: RFM, Linear Regression, Customer Choice]
Insights – based on sample of 50,000
Powerful variables
– Last Purchased (less recent implies more likely to purchase)
– # of art history books purchased (+)
– # of children’s books purchased (-)
– # of cook books purchased (-)
– # of DIY books purchased (-)
Profit by model
BASE: $13,553
RFM: $13,988 (3% lift over base)
Regression: $22,102 (63% lift over base)
Customer Choice: $23,149 (71% lift over base)
Customer Choice Model
- Sort customers from highest to lowest likelihood to purchase
- Target the top 40% of your customers who are most likely to purchase
Recommendation
Update model on an ongoing basis
Utilize existing data and apply similar model for all genres
Use Customer Choice Logit Model for direct mail campaign
THANK YOU!
QUESTIONS?
APPENDIX: Model details
Appendix 1 - Detailed analysis of each model
Model                    Base          RFM           Customer Choice   Regression
# of Customers           50000         50000         50000             50000
Target (%)               100.00%       90.00%        40.00%            50.00%
Expected response rate   9.03%         9.42%         17.72%            15.04%
Total variable cost      $130,701.25   $121,448.25   $90,082.00        $98,030.00
Revenue                  $144,254.25   $135,436.05   $113,230.80       $120,132.00
Profit                   $13,553.00    $13,987.80    $23,148.80        $22,102.00
Delta over base          0             3.21%         70.80%            63.08%
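The Appendix 1 figures can be reproduced from a few unit values. The sketch below is a minimal illustration, not the team's actual spreadsheet: the unit costs and price are inferred from the decile tables in the later appendices (they are not stated in the deck), and the function and variable names are hypothetical.

```python
# Unit economics inferred from the decile tables (assumptions, not stated in the deck).
MAIL_COST = 0.65    # cost of mailing one brochure ($)
BOOK_COST = 15.00   # cost of one book ($)
OVERHEAD = 6.75     # overhead per book sold ($)
PRICE = 31.95       # selling price per book ($)


def campaign_profit(customers, target_share, response_rate):
    """Return (total variable cost, revenue, profit) for one mailing."""
    mailed = customers * target_share
    buyers = round(mailed * response_rate)  # whole books sold
    cost = mailed * MAIL_COST + buyers * (BOOK_COST + OVERHEAD)
    revenue = buyers * PRICE
    return cost, revenue, revenue - cost


# Target share and expected response rate per model, copied from Appendix 1.
MODELS = {
    "Base": (1.00, 0.0903),
    "RFM": (0.90, 0.0942),
    "Customer Choice": (0.40, 0.1772),
    "Regression": (0.50, 0.1504),
}

base_profit = campaign_profit(50_000, *MODELS["Base"])[2]
for name, (share, rate) in MODELS.items():
    cost, revenue, profit = campaign_profit(50_000, share, rate)
    lift = profit / base_profit - 1
    print(f"{name:16s} cost={cost:,.2f} revenue={revenue:,.2f} "
          f"profit={profit:,.2f} delta over base={lift:.2%}")
```

Mailing fewer, better-targeted customers lowers revenue but lowers mailing cost by more, which is why the Customer Choice model earns the highest profit on the smallest mailing.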
Appendix 2 - RFM Model
RFM (Recency, Frequency, Monetary)
- Assumes that customers who have bought more recently, buy more frequently and spend more money are the most likely to accept a new product offering
Insights
- The expected patterns for the RFM model were not apparent in this sample data. Non-buyers of the Art History book actually had higher mean RFM recency and frequency scores than buyers
- Response rates for the Art History book did not rise with RFM group as expected; the highest response rates were spread across most RFM segments
Advantages
- Simple to use
- Easy to interpret
- Easy to manage
Disadvantages
- Hard to test for accuracy
- The model is not forward looking
- Uses a limited number of variables, so it is easy to lose sight of key insights about your customers and what drives their choice
RFM Model – Decile performance
Decile  # customers  Purchases  Cost of mail  Overhead of books  Books purchased  Total variable cost  Revenue   Profit (RFM)
10      230          23         149.5         155.25             345              649.75               734.85    85.1
20      460          36         299           243                540              1082                 1150.2    68.2
30      690          53         448.5         357.75             795              1601.25              1693.35   92.1
40      920          74         598           499.5              1110             2207.5               2364.3    156.8
50      1150         97         747.5         654.75             1455             2857.25              3099.15   241.9
60      1380         124        897           837                1860             3594                 3961.8    367.8
70      1610         147        1046.5        992.25             2205             4243.75              4696.65   452.9
80      1840         163        1196          1100.25            2445             4741.25              5207.85   466.6
90      2070         195        1345.5        1316.25            2925             5586.75              6230.25   643.5
100     2300         204        1495          1377               3060             5932                 6517.8    585.8
RFM Lift Curve
[Figure: cumulative profit ($) by decile for the RFM model]
Appendix 3 - Logit Model
Customer Choice Logit Model
• Logit model: On our sample data, the logit model with all variables included correctly predicted about 81% of the cases (observed choice equal to predicted choice). It was far more accurate for customers who did not purchase the book (91% hit rate) than for customers who did purchase it (40% hit rate).
• When we applied the model built on our sample to the holdout data (2,300 records), it predicted 2,050 of the 2,300 cases correctly (89%). It correctly predicted 37% of the customers who chose the book and 94% of those who did not. Results were very similar when first purchase was kept in the logit model. We classified a customer as a predicted purchaser when the predicted probability was greater than 0.5, on the assumption that a probability above 0.5 means the customer is leaning more towards purchasing than not purchasing.
• Because the coefficient for first purchase was not significant, we removed that variable and reran the model to see whether the results improved. The improvement was slight, not significant.
Interpretation of Results
• Last purchase and # of art books purchased had a significant positive impact on the customer’s choice to purchase. The following variables had significant negative impacts:
– Gender (male meant less likely to purchase)
– Frequency
– # of children’s books purchased
– # of youth books purchased
– # of cook books purchased
– # of DIY books purchased
• In the appendix we have details of the effect of each variable (in terms of elasticity); the items below had greater elasticity than the others. A 10% increase in last purchase means a 9.9% increase in the likelihood of purchasing the book. A 10% increase in historical purchases of art books means a 1.9% increase in the likelihood of purchase. A 10% increase in frequency means a 5% decrease in the likelihood of purchase.
Logit Model
Confusion Matrix
Observed / Predicted Choice   Response   Dummy (No Choice)
Response                      161        78
Dummy (No Choice)             239        1122
Coefficients
Variable           Coefficient estimate
Gender             -0.86606
Amount purchased   0.001836
Frequency          -0.09033
Last purchase      0.553669
P_Child            -0.81818
P_Youth            -0.64249
P_Cook             -0.93301
P_DIY              -0.91011
P_Art              0.664337
Const-1            -0.28339
Baseline
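To make the coefficient table concrete, the sketch below scores one customer with a standard binary logit, P = 1 / (1 + exp(-utility)), and applies the 0.5 cutoff. It is an illustration under assumptions: the deck does not give the variable coding or units, so Gender = 1 is taken to mean male (consistent with the negative coefficient), the example customer is made up, and the function names are hypothetical.

```python
import math

# Coefficient estimates copied from the Coefficients table.
COEFFICIENTS = {
    "Gender": -0.86606,
    "Amount purchased": 0.001836,
    "Frequency": -0.09033,
    "Last purchase": 0.553669,
    "P_Child": -0.81818,
    "P_Youth": -0.64249,
    "P_Cook": -0.93301,
    "P_DIY": -0.91011,
    "P_Art": 0.664337,
}
CONSTANT = -0.28339  # "Const-1" in the table


def purchase_probability(customer):
    """Binary logit probability; variables missing from `customer` count as 0."""
    utility = CONSTANT + sum(
        coef * customer.get(name, 0.0) for name, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-utility))


def predicted_purchase(customer, cutoff=0.5):
    """Classify as a purchaser when the probability exceeds the cutoff."""
    return purchase_probability(customer) > cutoff


# Hypothetical customer, invented for illustration (not from the BBBC data).
example = {"Gender": 1, "Amount purchased": 200, "Frequency": 10,
           "Last purchase": 2, "P_Art": 1}
print(round(purchase_probability(example), 3), predicted_purchase(example))
```

Sorting customers by this probability, rather than only applying the cutoff, is what produces the decile ranking used for targeting.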
Logit Model
Advantages
• Robust model since it allows us to include several discriminant variables
• Great tool for prediction and analysis when the dependent variable is categorical (in our case 1, 0)
• It is flexible as we do not have to meet the linear regression assumptions
Disadvantages
• Can be difficult to use (the most time consuming of the three models to build)
• It requires more data to get better results
Logit Model Decile Performance
Decile  # customers  # customers purchased  Portion purchased (%)  Cost of brochures  Overhead on books  Cost of books  Total cost  Revenue    Profit
10      230          84                     41.18%                 $149.50            $567.00            $1,260.00      $1,976.50   $2,683.80  $707.30
20      460          121                    59.31%                 $299.00            $816.75            $1,815.00      $2,930.75   $3,865.95  $935.20
30      690          144                    70.59%                 $448.50            $972.00            $2,160.00      $3,580.50   $4,600.80  $1,020.30
40      920          163                    79.90%                 $598.00            $1,100.25          $2,445.00      $4,143.25   $5,207.85  $1,064.60
50      1150         174                    85.29%                 $747.50            $1,174.50          $2,610.00      $4,532.00   $5,559.30  $1,027.30
60      1380         182                    89.22%                 $897.00            $1,228.50          $2,730.00      $4,855.50   $5,814.90  $959.40
70      1610         191                    93.63%                 $1,046.50          $1,289.25          $2,865.00      $5,200.75   $6,102.45  $901.70
80      1840         199                    97.55%                 $1,196.00          $1,343.25          $2,985.00      $5,524.25   $6,358.05  $833.80
90      2070         203                    99.51%                 $1,345.50          $1,370.25          $3,045.00      $5,760.75   $6,485.85  $725.10
100     2300         204                    100.00%                $1,495.00          $1,377.00          $3,060.00      $5,932.00   $6,517.80  $585.80
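The profit column of this table peaks at the 40% row, which is where the recommended mailing depth comes from. The sketch below recomputes cumulative profit from the purchaser counts and picks the best depth. The unit values are inferred from the table's cost and revenue columns (they are not stated in the deck) and the names are hypothetical.

```python
# Unit values inferred from the table columns (assumptions, not stated in the deck).
MAIL_COST = 0.65                         # cost of one brochure ($)
MARGIN_PER_BOOK = 31.95 - 15.00 - 6.75   # price minus book cost and overhead ($)

CUSTOMERS_PER_DECILE = 230
# Cumulative purchasers at each decile, copied from the table above.
CUMULATIVE_BUYERS = [84, 121, 144, 163, 174, 182, 191, 199, 203, 204]


def cumulative_profit(decile_index, buyers):
    """Profit from mailing every customer up to and including this decile."""
    mailed = CUSTOMERS_PER_DECILE * (decile_index + 1)
    return buyers * MARGIN_PER_BOOK - mailed * MAIL_COST


profits = [cumulative_profit(i, b) for i, b in enumerate(CUMULATIVE_BUYERS)]
best = max(range(len(profits)), key=profits.__getitem__)
best_depth = (best + 1) * 10
print(f"Best mailing depth: top {best_depth}% with profit ${profits[best]:,.2f}")
```

Past the peak, each extra decile adds mailing cost faster than it adds buyers, so cumulative profit falls back toward the mail-everyone figure.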
Logit Lift Curve
[Figure: cumulative profit ($) by decile for the logit model]
Appendix 4 - Regression Model (model 1)
• BASIC ASSUMPTION 1 - Linear Model is appropriate: The data is clearly not normally distributed. Its distribution is bimodal, because our dependent variable has only 2 possible values: 1 for choice and 0 for no choice. We can still run a linear regression; however, we will have to transform its results.
Regression Model (model 2)
• Frequency, Gender, Art and Cook have the highest t-values and high levels of significance. We simplify our regression model and include only these variables. The R-squared and adjusted R-squared decrease only slightly. All variables remain highly significant, and their t-values increase.
• BASIC ASSUMPTION 2 – Residuals follow a normal distribution: The residuals are not normally distributed; however, they are closer to normal in model 2 than in model 1.
Regression Statistics: Model 2 for Choice__0_1 (4 variables, n=1600)
R-Squared  Adj.R-Sqr.  Std.Err.Reg.  Std. Dev.  # Cases  # Missing  t(2.50%,1595)  Conf. level
0.200      0.198       0.388         0.433      1600     0          1.961          95.0%
Coefficient Estimates: Model 2 for Choice__0_1 (4 variables, n=1600)
Variable   Coefficient  Std.Err.  t-Stat.  P-value  Lower95%  Upper95%   Std. Dev.  Std. Coeff.
Constant   0.412        0.024     17.143   0.000    0.365     0.459
Frequency  -0.011       0.001240  -8.866   0.000    -0.013    -0.008562  7.841      -0.199
Gender     -0.125       0.020     -6.109   0.000    -0.165    -0.085     0.474      -0.137
P_Art      0.216        0.013     16.048   0.000    0.190     0.243      0.735      0.367
P_Cook     -0.048       0.009506  -5.000   0.000    -0.066    -0.029     1.040      -0.114
Summary of results
• The variables Frequency, Gender, P_Art, and P_Cook are all highly significant in explaining a customer’s choice to buy. We use these variables and our model (described below) to test against the holdout sample and calculate the logit odds.
• Our model is: Y hat = 0.412 + (-0.011)x1 + (-0.125)x2 + 0.216x3 + (-0.048)x4, where x1 = Frequency, x2 = Gender, x3 = P_Art and x4 = P_Cook
• To convert these results to a probability we use the formula logitodds / (1 + logitodds), which gives the probability of customer purchase.
• All values greater than 0.5 were considered a predicted purchase. We then matched predicted choice (purchase or not purchase) against the actual choice. We had only 388 correct matches, or 17% accuracy. The table below demonstrates this process.
• About 20% of the variability in a customer’s choice to buy or not buy can be explained by our model. P_Art and Frequency were the two factors that had the most influence on a customer’s decision to buy.
• Advantages
- Easy to interpret
- Tells us how each independent variable influences the dependent variable
- Works well with a continuous dependent variable (not an advantage in our case)
• Disadvantages
- Does not work well with discrete variables (in our model, Choice was 1 or 0: purchase or not purchase)
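The scoring steps in the summary can be sketched as follows. This is an illustration only: the coefficients come from the model 2 table, but the deck does not define "logitodds", so reading it as exp(Y hat) is an assumption, as are the example customer and the function names.

```python
import math

# Coefficients of the simplified regression (model 2), from the table above.
INTERCEPT = 0.412
COEFFICIENTS = {"Frequency": -0.011, "Gender": -0.125, "P_Art": 0.216, "P_Cook": -0.048}


def linear_score(customer):
    """Y hat = 0.412 - 0.011*Frequency - 0.125*Gender + 0.216*P_Art - 0.048*P_Cook."""
    return INTERCEPT + sum(coef * customer[name] for name, coef in COEFFICIENTS.items())


def probability_from_odds(odds):
    """The conversion quoted in the summary: odds / (1 + odds)."""
    return odds / (1.0 + odds)


# Hypothetical customer, invented for illustration (not from the BBBC data).
customer = {"Frequency": 10, "Gender": 1, "P_Art": 1, "P_Cook": 0}
score = linear_score(customer)

# ASSUMPTION: "logitodds" is read here as exp(Y hat); the deck does not define it.
probability = probability_from_odds(math.exp(score))
predicted_purchase = probability > 0.5
print(f"score={score:.3f} probability={probability:.3f} "
      f"predicted purchase={predicted_purchase}")
```

Under this reading, any customer with a positive Y hat lands above the 0.5 cutoff, so the choice of transformation matters a great deal for how many customers are classified as purchasers.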
Regression Model Decile Performance
Decile  Total observations  Purchase at decile (%)  # people purchased  Cost of brochures  Overhead on books  Cost of books  Total cost  Revenue    Profit
10      230                 38.73%                  79                  $149.50            $533.25            $1,185.00      $1,867.75   $2,524.05  $656.30
20      460                 55.88%                  114                 $299.00            $769.50            $1,710.00      $2,778.50   $3,642.30  $863.80
30      690                 66.67%                  136                 $448.50            $918.00            $2,040.00      $3,406.50   $4,345.20  $938.70
40      920                 76.96%                  157                 $598.00            $1,059.75          $2,355.00      $4,012.75   $5,016.15  $1,003.40
50      1150                84.80%                  173                 $747.50            $1,167.75          $2,595.00      $4,510.25   $5,527.35  $1,017.10
60      1380                89.22%                  182                 $897.00            $1,228.50          $2,730.00      $4,855.50   $5,814.90  $959.40
70      1610                92.65%                  189                 $1,046.50          $1,275.75          $2,835.00      $5,157.25   $6,038.55  $881.30
80      1840                96.57%                  197                 $1,196.00          $1,329.75          $2,955.00      $5,480.75   $6,294.15  $813.40
90      2070                99.51%                  203                 $1,345.50          $1,370.25          $3,045.00      $5,760.75   $6,485.85  $725.10
100     2300                100.00%                 204                 $1,495.00          $1,377.00          $3,060.00      $5,932.00   $6,517.80  $585.80
Regression Model Lift Curve
[Figure: profit ($) by decile; chart label: "Profit for each model"]