#pawcon@pythongeert
Opening Keynote:
Le Mariage Parfait: Combining Logit and Ensemble Modeling for Increased Customer Churn Detection
PAW London Oct 29 – 30, 2014
Geert Verstraeten, PhD
Program Chair, PAW London
prediction
our domain is about predicting the future, but we’re not the community that uses a crystal ball
prediction
instead, by knowing what lies behind us, we make an estimated guess about what’s ahead of us
prediction
example 1: predicting the stock market (keynote by John Elder, PAW London 2013)
prediction
example 2: predicting player injuries (Leicester Tigers rugby team, PAW London 2012)
prediction
example 3: predicting breakdown of the sewerage network in Wales (PAW London 2012)
prediction
in many cases, such as targeted marketing and credit scoring, decision makers are playing a game of Guess Who
predictive analytics can help decision makers win the game
big data
we add business value based on analysis of all kinds of data, including big data
[Gartner hype cycle: technology trigger → peak of inflated expectations → trough of disillusionment → plateau of productivity]
hype cycle
while big data is at the peak of the hype cycle, predictive analytics is in the plateau of productivity
tools
there are a lot of tools out there, i.e. the algorithms that we use to predict; we'll only explain the basic tools used in our study
people who listen to Radiohead are claimed to be more intelligent than those who listen to Beyoncé
[scatter plot: average SAT score (x-axis, 1300–1600) vs. Radiohead chart rank (y-axis, 1–10) for schools including MIT, Harvard College, Yale University, University of Chicago, Wellesley College, University of Southern California and Princeton University; data from musicthatmakesyoudumb.virgil.gr]
linear regression explained using some data from this study
linear regression
Y = α + β · X                      (Y = rank Radiohead, X = SAT score)
Y = α + β₁ · X₁ + β₂ · X₂ + ...
linear regression explained using some data from this study
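That straight-line fit can be sketched in a few lines of plain Python; the (SAT score, rank) pairs below are made up for illustration, not the actual data from the study.

```python
# Ordinary least squares fit of Y = alpha + beta * X,
# via the closed-form solution for simple linear regression.
def fit_ols(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    beta = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            / sum((x - mean_x) ** 2 for x in xs))
    alpha = mean_y - beta * mean_x
    return alpha, beta

# illustrative pairs: average SAT score (X) vs. Radiohead chart rank (Y)
sat = [1300, 1400, 1500, 1600]
rank = [9, 7, 4, 2]
alpha, beta = fit_ols(sat, rank)   # beta is negative: higher SAT, better rank
```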
[the same scatter plot of SAT score vs. Radiohead rank, shown again to motivate logistic regression]
logistic regression explained using some data from this study
[scatter plot: average SAT score (x-axis, 1300–1600) vs. probability that Radiohead is in the school's top 3 (y-axis, 0–1), with an S-shaped logistic curve through the points]
logistic regression explained using some data from this study
logistic regression
p = 1 / (1 + e^−(α + β · X))                (p = probability Radiohead in top 3, X = SAT score)
p = 1 / (1 + e^−(α + β₁ · X₁ + β₂ · X₂ + ...))
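The logistic link turns any score α + β · X into a probability between 0 and 1. A minimal sketch, with illustrative coefficients rather than fitted ones:

```python
import math

def logistic(x, alpha, beta):
    # p = 1 / (1 + e^-(alpha + beta * x)), always strictly between 0 and 1
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

# at x = -alpha / beta the linear part is zero, so p is exactly 0.5;
# the curve rises towards 1 as alpha + beta * x grows
p_mid = logistic(1450, alpha=-29.0, beta=0.02)
p_high = logistic(1600, alpha=-29.0, beta=0.02)
```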
[scatter plot: average SAT score (x-axis, 1300–1600) vs. number of undergraduates (y-axis, 0–18,000) for the same schools plus University of Pennsylvania, partitioned into regions labelled 0%, 50% and 75%]
decision trees explained using some data from this study
decision tree
[decision tree: root node 50%; first split on SAT score at 1,450 into nodes of 25% and 75%; one branch split again on #undergraduates at 9,000 into leaves of 0% and 50%]
decision trees explained using some data from this study
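A tree like this is just nested if-tests. In the sketch below, which of the two SAT branches is split again on school size is an assumption; the leaf values are the proportions shown on the slide.

```python
def tree_predict(sat, n_undergrads):
    # first split: average SAT score at 1,450
    if sat >= 1450:
        return 0.75                 # high-SAT leaf
    # second split (assumed here to be on the low-SAT branch):
    # number of undergraduates at 9,000
    if n_undergrads >= 9000:
        return 0.50
    return 0.0
```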
multiplicity
this is multiplicity photography
multiplicity
in predictive analytics, multiplicity of good models means that several models often deliver comparable results
all models are wrong, but some are useful
George Box
simplicity is theultimate sophistication
Leonardo Da Vinci
multiplicity
ensemble techniques
youtube.com/watch?v=gBnvGS4u3F0
explaining ensembles through a video with penguins and a whale
bagging
original data:
ID  Target
1   1
2   0
3   1
4   0
5   0

bootstrap sample 1:
ID  Target
5   0
5   0
3   1
1   1
1   1

bootstrap sample 2:
ID  Target
2   0
4   0
1   1
4   0
1   1

explaining ensembles without a video with penguins and a whale
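Bootstrap samples like those two are drawn with replacement and have the same size as the original data, so some IDs repeat and others drop out. A minimal sketch:

```python
import random

def bootstrap(ids, rng):
    # draw len(ids) observations with replacement
    return [rng.choice(ids) for _ in ids]

rng = random.Random(42)
original = [1, 2, 3, 4, 5]
samples = [bootstrap(original, rng) for _ in range(2)]
# each sample keeps the original size but may repeat or omit IDs;
# one model is then fitted per sample and their outputs combined
```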
random forests
[figure: several decision trees side by side, each grown on a different bootstrap sample]
random forests are an ensemble of trees
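Each tree in the forest is grown on its own bootstrap sample (with a random subset of candidate features per split), and the forest's prediction is simply the average of the trees' predictions. The toy "trees" below are stand-ins for fitted models:

```python
def forest_predict(trees, x):
    # average the predicted probabilities of all trees in the ensemble
    return sum(tree(x) for tree in trees) / len(trees)

# three toy trees, each pretending to return a churn probability
toy_trees = [lambda x: 0.75, lambda x: 0.50, lambda x: 0.25]
p = forest_predict(toy_trees, x=None)
```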
today
Véronique Van Vlasselaer
Fraud Detection
Phil Winters
Internet of Things
Ajay Kelkar
Cross-selling
Arthur Von Eschen
Cheating Detection
Duncan Ross
Data for Good
the program of PAW London 2014, Wednesday Oct 29
tomorrow
Dieter Debels
Buying flows
Lukas Vermeer
Personalised Recommendations
Gergely Kalmár
Online Behavior
Aaron Ferguson
Malicious Use of Anonymizers
Tom Khabaza
Time-series Data Mining
the program of PAW London 2014, Thursday Oct 30
keynotes
David Hand
The Improbability Principle
John Elder
The Peril of Vast Search
Dean Abbott
The Revolution in Retail Customer Intelligence
keynotes of PAW London 2014
Geert Verstraeten
@pythongeert
pythonpredictions.com
pawcon.com/london
my personal mariage parfait
retention
retention is an important marketing problem for many organizations and industries
predicting churn
churn prediction is a common application where the number of churners is typically small compared to the number of customers that stay
predicting churn
oversampling typically does not work well
predicting churn
mixed results for undersampling
our study
can we combine
undersampling + bagging + logistic regression
and improve on logistic regression?
our study
the idea was to apply bagging, but in a way that always keeps all churners and draws only a proportion of non-churners in each sample
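A sketch of that sampling scheme, with illustrative names throughout: every bag keeps all churners and draws a fresh random subset of non-churners; one logit model would then be fitted per bag and their scores averaged.

```python
import random

def u_bags(churners, non_churners, n_bags, n_keep, seed=0):
    # each bag: ALL churners plus a random sample of n_keep non-churners
    rng = random.Random(seed)
    return [churners + rng.sample(non_churners, n_keep)
            for _ in range(n_bags)]

churners = [f"c{i}" for i in range(5)]       # the rare class, kept intact
non_churners = [f"n{i}" for i in range(95)]
# n_keep = len(churners) gives balanced bags, as in the study's setup
bags = u_bags(churners, non_churners, n_bags=100, n_keep=5)
```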
Kristof Coussement
@KCoussement
ieseg.fr
the mariage parfait of Kristof Coussement, professor of marketing in France, who performed the research
the prize
the prize was needed to convince Kristof to spend his time on this study – 'a research bet'
11 real-life European churn datasets in retail, telco and financial services
variables: 15 – 236
customers: 876 – 74,122
incidence: 1.8% – 47%
our study
A Bagging-Based Undersampling Strategy for Classification: A Customer Churn Prediction Application – IFORS 2014 Barcelona
we compared
• logistic regression
• logit + bagging
• logit + undersampling
• logit + undersampling + bagging
• random forests
our study
A Bagging-Based Undersampling Strategy for Classification: A Customer Churn Prediction Application – IFORS 2014 Barcelona
‘u-bagging’
• logistic regression with a correlation-based feature selection filter
• bagging uses 100 samples
• undersampling: balanced sampling
• 5 x 2-fold cross-validation, evaluated by AUC
• Wilcoxon signed-rank test & Friedman test with Bonferroni–Dunn post-hoc test
details
A Bagging-Based Undersampling Strategy for Classification: A Customer Churn Prediction Application – IFORS 2014 Barcelona
AUC
the probability that a randomly selected churner will be ranked higher than a randomly selected non-churner
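That definition can be computed directly by comparing every churner/non-churner score pair, with ties counting as half; the scores below are made up:

```python
from itertools import product

def auc(churner_scores, stayer_scores):
    # probability that a random churner outscores a random non-churner
    pairs = list(product(churner_scores, stayer_scores))
    wins = sum(1.0 if c > s else 0.5 if c == s else 0.0 for c, s in pairs)
    return wins / len(pairs)

a = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])   # 8 of 9 pairs ranked correctly
```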
results
[bar chart: AUC increase (y-axis, −0.005 to 0.025) per dataset (x-axis, 1 – 11)]
bagging increases performance significantly when compared to logistic regression
results
[bar chart: AUC increase (y-axis, −0.025 to 0.005) per dataset (x-axis, 1 – 11)]
undersampling significantly decreases performance when compared to logistic regression
results
[bar chart: AUC increase (y-axis, 0 to 0.02) per dataset (x-axis, 1 – 11)]
u-bagging increases performance significantly when compared to logistic regression
results
[bar chart: AUC increase (y-axis, −0.015 to 0.01) per dataset (x-axis, 1 – 11)]
u-bagging is competitive with bagging
results
[bar chart: AUC increase (y-axis, −0.08 to 0.06) per dataset (x-axis, 1 – 11)]
u-bagging is competitive with random forests
results
u-bagging has the top average ranking on the 11 real-life churn datasets

average ranking (1 = best):
u-bagging            1.6
bagging              2.2
logistic regression  3.4
random forests       3.5
undersampling        4.4
results
u-bagging is much more efficient than bagging when incidence is low

[chart: number of observations processed per method]
logistic regression            100
bagging                        10,000
undersampling (5% incidence)   10
u-bagging (5% incidence)       1,000
afterthoughts
a more efficient approach to bagging
undersampling vs oversampling
afterthoughts
multiplicity of good models
simplicity is the ultimate sophistication
parallel processing in big data