Marketing Analytics
Group 2: Logan Moore, Jennifer Eickert, Madeline Rynkiewicz, Lauryn Jashinski
Model Overview
Special Considerations
The main factors affecting the performance of our model were how we optimized the selected attributes and the parameters within the decision tree.
1. Weight by Gini Index: We ran six different weighting operators and Gini Index provided the most balanced results.
2. Select by Weight: This allowed us to easily choose the top 10 attributes on which to base the model.
3. Replace Missing Values: After trial and error, the model predicted best when missing attribute values were replaced with averages.
4. Filter Examples: To find the best prediction, we rigorously examined the weight of each attribute, the mean and standard deviation of each attribute, and the overall effect of outliers on the model. (*Important note: ‘Custom Filters’ can only be applied in RapidMiner 6, downloaded on Logan’s personal computer)
Trial and Error Process for Filtering
| Chi-Square | Weight | Info Gain | Weight | Gini | Weight |
|---|---|---|---|---|---|
| eqpdays | 1 | eqpdays | 1 | eqpdays | 1 |
| months | 0.644121 | months | 0.797907 | months | 0.790851 |
| retcalls | 0.45587 | mou | 0.248054 | mou | 0.232883 |
| webcap | 0.37495 | retcalls | 0.19691 | retcalls | 0.193697 |
| creditde | 0.28094 | webcap | 0.160545 | webcap | 0.160489 |
| changem | 0.266191 | incalls | 0.136913 | incalls | 0.137575 |
| changer | 0.243359 | creditde | 0.120039 | creditde | 0.12025 |
| mou | 0.156781 | changem | 0.110988 | changem | 0.111546 |
| retaccpt | 0.135313 | changer | 0.106593 | outcalls | 0.105707 |
| phones | 0.079818 | outcalls | 0.105258 | unansvce | 0.104361 |
| Deviation | Weight | Uncertainty | Weight | Correlation | Weight |
|---|---|---|---|---|---|
| eqpdays | 1 | mou | 1 | retcalls | 1 |
| retcalls | 0.623101 | changem | 0.494791 | eqpdays | 0.804925 |
| webcap | 0.59594 | eqpdays | 0.489207 | webcap | 0.684583 |
| creditde | 0.515741 | revenue | 0.084119 | changer | 0.60161 |
| mou | 0.500321 | changer | 0.075769 | creditde | 0.498742 |
| incalls | 0.372892 | unansvce | 0.072891 | months | 0.487047 |
| retaccpt | 0.353905 | outcalls | 0.066249 | changem | 0.467689 |
| phones | 0.334674 | incalls | 0.031656 | retaccpt | 0.326418 |
| outcalls | 0.331368 | blckvce | 0.02 | mou | 0.215435 |
| changem | 0.327312 | months | 0.018341 | callwait | 0.182593 |
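The Select by Weight step can be illustrated outside RapidMiner with a minimal Python sketch. The weights are the Gini column of the first weight table; `select_by_weight` is a hypothetical helper written for illustration, not a RapidMiner function, and the choice of `k = 6` is only an example.

```python
# Normalized Gini Index weights, taken from the Gini column of the first weight table.
GINI_WEIGHTS = {
    "eqpdays": 1.0,
    "months": 0.790851,
    "mou": 0.232883,
    "retcalls": 0.193697,
    "webcap": 0.160489,
    "incalls": 0.137575,
    "creditde": 0.12025,
    "changem": 0.111546,
    "outcalls": 0.105707,
    "unansvce": 0.104361,
}


def select_by_weight(weights, k):
    """Return the names of the k attributes with the largest weights.

    Hypothetical helper that mimics a "top k" attribute selection.
    """
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Example: keep the six highest-weighted attributes.
top_six = select_by_weight(GINI_WEIGHTS, 6)
print(top_six)
```

Changing the weighting measure only changes the dictionary passed in, so the same selection can be compared across Chi-Square, Info Gain, and the other measures.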
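MASTER FILTER @95%

| Attribute | Lower bound | Upper bound |
|---|---|---|
| eqpdays | | < 907 |
| months | | < 44 |
| mou | | < 1000 |
| retcalls | | < 1 |
| webcap | NA | NA |
| creditde | NA | NA |
| incalls | | < 42 |
| outcalls | | < 95 |
| changem | > -500 | < 500 |
| retaccpt | | < 1 |
| changer | | < 204 |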
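As a rough illustration of the Replace Missing Values and Filter Examples steps, the sketch below applies the master filter bounds to rows stored as Python dictionaries. The bounds come from the master filter; the helper names and the three sample rows are hypothetical, and treating every bound as a strict inequality is an assumption based on the `<` and `>` signs shown.

```python
# Exclusive (lower, upper) bounds per attribute; None means no bound on that side.
# webcap and creditde are omitted because the master filter lists them as NA.
MASTER_FILTER = {
    "eqpdays": (None, 907),
    "months": (None, 44),
    "mou": (None, 1000),
    "retcalls": (None, 1),
    "incalls": (None, 42),
    "outcalls": (None, 95),
    "changem": (-500, 500),
    "retaccpt": (None, 1),
    "changer": (None, 204),
}


def replace_missing_with_mean(rows, attribute):
    """Fill None values of one attribute with the mean of its observed values."""
    observed = [row[attribute] for row in rows if row.get(attribute) is not None]
    if not observed:
        return rows  # nothing to average; leave the rows unchanged
    mean = sum(observed) / len(observed)
    return [
        {**row, attribute: mean} if row.get(attribute) is None else row
        for row in rows
    ]


def passes_filter(row, bounds):
    """Return True if every bounded attribute present in the row is within bounds."""
    for name, (low, high) in bounds.items():
        value = row.get(name)
        if value is None:
            continue  # assumption: missing values are imputed in a separate step
        if low is not None and value <= low:
            return False
        if high is not None and value >= high:
            return False
    return True


def filter_examples(rows, bounds):
    """Keep only the rows that satisfy all bounds."""
    return [row for row in rows if passes_filter(row, bounds)]


# Hypothetical sample rows, for illustration only.
rows = [
    {"eqpdays": 300, "months": 12, "mou": 450.0, "changem": -20.0},
    {"eqpdays": 1200, "months": 50, "mou": 300.0, "changem": 10.0},
    {"eqpdays": 150, "months": 8, "mou": None, "changem": 40.0},
]
filled = replace_missing_with_mean(rows, "mou")
kept = filter_examples(filled, MASTER_FILTER)
print(len(kept), "of", len(rows), "rows kept")
```

Keeping the bounds in one dictionary makes it easy to test a variant filter, for example by dropping a single attribute's entry and re-running the model.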
5. Decision Tree: The Gini Index was used as the splitting criterion within the decision tree, which corresponds to the Gini Index weighting measure. Decision trees make few assumptions about the data and do not assume normal distributions. This is especially useful because the values of some attributes were skewed. The tree's parameters were tuned by trial and error.
(Figure: decision tree parameters, Base vs. Optimized)
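For readers unfamiliar with the Gini criterion, the following minimal sketch shows how Gini impurity is computed for a set of class labels and for a candidate split. It is a textbook formulation, not RapidMiner's internal code, and the label counts are made up for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def split_gini(left, right):
    """Size-weighted average impurity of the two branches of a split."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Hypothetical churn labels on each side of a candidate split.
left = ["Yes", "Yes", "Yes", "No"]
right = ["Yes", "No", "No", "No"]
parent = left + right

print(gini_impurity(parent))    # impurity before the split
print(split_gini(left, right))  # impurity after the split
```

A split is preferred when the weighted impurity after the split is lower than the impurity of the parent node; the tree repeats this comparison across attributes and thresholds.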
Model 1 Performance
(Figures: Training, Validation, Scoring, Filters)
This model performs solidly: the ‘No’ validation is well above 40%, and the ‘Yes’ validation is relatively high at 76.95%. The prediction of ‘Yes’ and ‘No’ in the scoring data is balanced, and the ‘Yes’ predictions can be held with reasonable confidence. The model actually performs better on the validation data than on the training data, which is an anomaly but further indicates solid all-around performance. Five filters were chosen to remove outliers of highly weighted attributes, which adequately scrubbed the data. Further research into individual records that contain outlier values may boost both ‘Yes’ and ‘No’ validation performance. This is a very rigorous process, even with adequate RapidMiner operators, and it exceeds the scope of this course.
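The per-class ‘Yes’ and ‘No’ percentages can be read from a confusion matrix. The sketch below assumes they correspond to class recall, which the slides do not state explicitly; class precision is included for comparison. The counts are hypothetical and are not the model's actual results.

```python
def class_recall(confusion, label):
    """Share of actual `label` examples that were predicted as `label`.

    `confusion[actual][predicted]` holds the example count.
    """
    row = confusion[label]
    total = sum(row.values())
    return row[label] / total if total else 0.0


def class_precision(confusion, label):
    """Share of examples predicted as `label` that actually are `label`."""
    predicted = sum(row[label] for row in confusion.values())
    return confusion[label][label] / predicted if predicted else 0.0


# Hypothetical validation counts, for illustration only.
confusion = {
    "Yes": {"Yes": 80, "No": 20},
    "No": {"Yes": 45, "No": 55},
}

yes_recall = class_recall(confusion, "Yes")
no_recall = class_recall(confusion, "No")
yes_precision = class_precision(confusion, "Yes")
print(yes_recall, no_recall, yes_precision)
```

Reporting both classes side by side shows when a gain for one class comes at the expense of the other.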
Model 2 Performance
(Figures: Training, Validation, Scoring, Filters)
By simply removing ‘outcalls’ from the custom filter, the performance of the model changed drastically. This model can predict churn with 86% confidence, an increase of about 9 percentage points over the previous model. However, the retention prediction drops considerably (21%). Ultimately, the marginal gain in churn validation performance skews this model, and it appears that too many customers are now predicted to churn.
Model 3 Performance
(Figures: Training, Validation, Scoring, Filters)
When only the 6 most important attributes are selected and filtered accordingly, the validation performance of this model reflects the higher ‘Yes’ of Model 2 and the higher ‘No’ of Model 1 with respect to whether a customer will churn. Once again, the validation performance for retention seems too low, leaving the prediction too far out of balance.
Profile of Churning Customers
Customers who churn are expected to have fewer days with their equipment and fewer months than loyal customers. They tend not to place calls to the retention team or accept retention offers. They are also more likely to have poor credit.