BADM Project Hyper Market Classifying Biscuit Brand ... B4...¢  Classifying Biscuit Brand Switchers

  • View
    1

  • Download
    0

Embed Size (px)

Text of BADM Project Hyper Market Classifying Biscuit Brand ... B4...¢  Classifying Biscuit Brand...

  • Group B4 - Minesweepers Aditi Vaish | Pranav Maranganty | Kevin John | Deepak Agnihotri | Archana Rajan

    BADM Project – Hyper Market

    Classifying Biscuit Brand Switchers for Targeted Marketing for a Biscuit Manufacturer

  • Business Problem • Client Profile and Background

    • Minesweepers Biscuits (MSB) based out of Denmark

    • Renowned brand internationally but limited brand presence in India

    • Expansion to India with new products and innovative promotions

    • Competition

    • Brands like Britannia and Parle own a big pool of loyalists

    • Business Objective

    • Increase Trial Rate of MSB products and hence Market Share by partnering with Hyper Market

    • Improve Marketing Efficiency by targeting only the “Brand Switchers”

    • Offer samples and personalized promotions at checkout counters or at kiosks of the Hyper Market to the “Brand Switchers”

    All Consumers

    Selected Consumers for better ROI on marketing

    0

    2000

    4000

    6000

    8000

    10000

    Number of Enrollments

  • Data Mining Problem • Objective

    • Classify new customers as “Brand Loyalist” (1) and “Brand Switchers” (0) based on demographics and purchase patterns in “Ready Food”

    • New Customer – A customer who has made 2 purchases from the “Ready Food” department

    Business Rules

    Classification

    Continuous Evaluation and Classification

    MSB’s Target

    “Brand Loyalist”: A person who has purchased biscuits at least 3 times and purchased the same brand over 50% of the times

  • Data • Key Inputs ( Vary depending on the various models)

    • Output

    • Loyalists? (0 – Brand Switcher , 1 – Brand Loyalist)

    Customer Demographics

    Historical Purchase Pattern ( Ready Food)

    Last Purchase (Ready Food)

    Second Last Purchase (Ready Food)

    Age Sex Marital Status Enrollment Store

    Average Basket Price Average Basket Quantity Average Basket Unique Count Number of Baskets Standard Deviation of Basket Price Standard Deviation of Basket Quantity

    Quantity Price Unique SKU Count

    Quantity Price Unique SKU Count

    Data Partition

    Training Set : 2172 Validation Set : 1303 Test Set: 868 Hold Out for Model Evaluation: 500

    Data

    Total Unique Customer: 4843 Brand Loyalists : 2886 (59.6%) Brand Switchers: 1957 (40.4%)

    A person who has purchased biscuits at least 3 times

    Initial Classification: A person who has purchased biscuits at least 3 times and purchased the same brand over 50% of the times

  • Methods

    CART Pruned Tree : 3 Nodes # Input Variables: 18

    Naïve Bayes # Input Variables : 6

    Class # Cases # Errors % Error

    0 524 208 39.69

    1 779 334 42.88

    Overall 1303 542 41.60

    Class # Cases # Errors % Error

    0 905 241 26.63

    1 1267 456 35.99

    Overall 2172 697 32.09

    Training Set Test Set

    Class # Cases # Errors % Error

    0 359 146 40.67

    1 509 235 46.17

    Overall 868 381 43.89

    Validation Set

    K-NN Best K: 5 # Input Variables : 14

    Class # Cases # Errors % Error

    0 905 112 12.38

    1 1267 613 48.38

    Overall 2172 725 33.38

    Class # Cases # Errors % Error

    0 524 148 28.24

    1 779 508 65.21

    Overall 1303 656 50.35

    Class # Cases # Errors % Error

    0 359 95 26.46

    1 509 313 61.49

    Overall 868 408 47.00

    Value of k % Error

    Training

    % Error

    Validation

    1 0.00 44.67

    2 23.94 49.12

    3 23.16 45.13

    4 27.53 47.51

    5 28.68 43.13

    6 30.66 47.12

    7 30.34 43.90

    8 31.72 45.89

    9 32.27 43.75

    10 32.97 44.74

    Probability Cutoff: 0.4

    Class # Cases # Errors % Error

    0 905 0 0.00

    1 1267 0 0.00

    Overall 2172 0 0.00

    Class # Cases # Errors % Error

    0 524 189 36.07

    1 779 321 41.21

    Overall 1303 510 39.14

    Class # Cases # Errors % Error

    0 359 125 34.82

    1 509 215 42.24

    Overall 868 340 39.17

    Logistic Regression - Stepwise Initial No. of Variables: 21 # Variables based on Cp: 20

    Class # Cases # Errors % Error

    0 905 386 42.65

    1 1267 444 35.04

    Overall 2172 830 38.21

    Class # Cases # Errors % Error

    0 524 222 42.37

    1 779 272 34.92

    Overall 1303 494 37.91

    Class # Cases # Errors % Error

    0 359 142 39.55

    1 509 198 38.90

    Overall 868 340 39.17

    Coefficient

    -47.1890297

    0.00544622

    1.11986947

    1.03916395

    -0.67768991

    0.1495695

    0.09804565

    0.07624547

    -0.05565267

    -0.00135054

    -0.04568335

    0.00852676

    0.00019505

    0.02833186

    0.00110265

    0.01369065

    0.00012139

    -0.00228719

    -0.00255478

    0.00014995

    Input variables

    Average Basket Quantity

    Constant term

    Age

    Sex_F

    Sex_M

    Enrollment Store_1001

    Enrollment Store_1002

    Marital Status_N

    Marital Status_Y

    Email_Y

    Average Basket Price

    Number of Baskets

    StdDev of Basket Price

    StdDev of Basket Quantity

    Last Transaction Date

    Last Purchase Unique Count

    Last Purchase Price

    Last Purchase Quantity

    Second Last Purchase

    Second Last Purchase Price

    We tried different # variables, but all had some over fitting

  • Model Evaluation (Test Data) • Naïve Rule is considered as the benchmark

    for evaluation with all customers tagged as “Brand Loyalist”

    • Key Metrics for Evaluation

    • Sensitivity (0  1)

    • All models are better than the benchmark

    • % Total Error

    • Logistic Regression and CART fare better than benchmark

    • Misclassification Costs

    • INR 120 for 0  1 (Customer Value)

    • INR 20 for 1  0 (Coupons and Samples)

    • All models fare better than benchmark

    • Holdout Evaluation - K-NN has some over fitting

    100.00%

    0.00%

    41.36%

    0.00%

    20.00%

    40.00%

    60.00%

    80.00%

    100.00%

    Sensitivity Specificity % Total Error

    Benchmark Logistic Regression Naïve Bayes K-NN CART

    61080

    21000 22220 17660 19300

    0

    20000

    40000

    60000

    80000

    Benchmark Logistic Regression

    Naïve Bayes K-NN CART

    Logistic Regression

    Class # Cases # Errors % Error

    0 190 86 45.26%

    1 310 102 32.90%

    Overall 500 188 37.60%

    Class # Cases # Errors % Error

    0 190 144 75.79%

    1 310 73 23.55%

    Overall 500 217 43.40%

    CART

  • • Deploy Logistic Regression Model for classifying new customers

    • Low Misclassification Costs

    • Similar Accuracy across all Data

    • Better Overall Error and Sensitivity

    • Easy Deployment of Model / Stable Model

    • Continuously improve the model by updating the classifications and adding more data

    • Include External Demographics data to improve the model

    • Expand model to include other products of MSB

    Recommendations

    Next Steps