
THESIS/DISSERTATION APPROVED BY

Vivek Patil, Ph.D., Dissertation Chair

Digitally signed by Sudhakar Raju, Date: 2019.07.03 13:24:59 -05'00'

Sudhakar Raju, Ph.D., Committee Member

Matthew T. Seevers, Ph.D., Committee Member


THE MANY TYPES OF CHURN AND THEIR PREDICTIVE MODELS

By

CHANDRASEKHAR (CHANDU) VALLURI

A DISSERTATION

Submitted to the Faculty of the Graduate School of Creighton University in Partial Fulfillment of the Requirements for the Degree of Doctor of Business Administration

OMAHA, NE

August 12, 2019


Copyright (2019) Chandrasekhar (Chandu) Valluri

This document is copyrighted material. Under copyright law, no part of this document

may be reproduced without the expressed permission of the author.


Abstract

This dissertation consists of two chapters. Chapter 1 provides a detailed account of

customer, business and employee churn and how each of these churn types has been studied in

extant literature. Consequently, a more comprehensive definition of churn that is grounded in the

notion of various types of partnerships (B2C, B2B and B2E) is presented. Additionally, a churn

framework is developed after analyzing leading research in marketing and non-marketing journals.

This analysis reveals that churn can be attributed to a combination of both entity and non-entity

characteristics, although entity characteristics appear to be more common. Methodologically, churn

continues to be studied through various supervised learning methods in a variety of industries.

Finally, a proactive approach to churn management consisting of various propositions that link each

of the churn types to firm profitability is outlined. It is my hope that future churn research will

adopt this empirically based approach.

Chapter 2 specifically addresses the example of a customer character model as a determinant

of customer auto loan churn among a unique population of subprime borrowers. The banking

industry proxies for character include residential months, employment months, net worth, assets to

age ratio and others. Ultimately, the customer character model is compared to a full model consisting of

the 4Cs of capacity, collateral, credit and character for churn prediction. The results reveal that there

is a difference between the full model and the restricted model. Furthermore, the various supervised

classification methods of logistic regression, linear discriminant analysis (LDA), decision trees (DTs)

and random forests (RF) are applied and compared in terms of multiple predictive performance

measures. The random forest classifier reports the strongest performance. Additionally,

the customer character variable of residential months proves important when conducting logistic

regression, and net worth when conducting decision tree analysis. The random forest classification

method details the most important character variables to be net worth, followed by assets to age


ratio, employment months and finally residential months.

Keywords: churn, business churn, employee churn, auto loans, 4C’s, classification techniques


Acknowledgements

I would like to firstly acknowledge Creighton University, for providing me with the unique

opportunity to pursue a DBA degree in a flexible format. It has been my ambition to undertake a

doctoral degree for some time and Creighton University has made this possible. I would also like to

extend my gratitude to the staff and faculty of the DBA program who have supported me

throughout this educational journey. My deepest gratitude goes to my fellow cohort members. I

have enjoyed learning with and have learnt a lot from you all. In particular, I would like to

acknowledge two of my closest cohort members, Dr. Keith Olson and Dr. Derek Kruse. I will

forever be grateful to have learnt and grown with you. I look forward to our continued friendship

and professional development together.

My sincere gratitude to my committee members: Dr. Vivek Patil, Dr. Sudhakar Raju and Dr.

Matthew Seevers. Each one of you has provided me with invaluable mentorship over the duration of

this program and most importantly the dissertation phase. Dr. Patil, thank you for serving as my

dissertation chair, for patiently mentoring and motivating me when most needed. I am truly in awe

of all that you know and the humility with which you continually seek to know. Dr. Sudhakar

Raju, thank you for introducing me to the field of data analytics, nurturing my competence in R and

for deepening my quantitative and technical abilities. It is because of you that I have come so far in

R. You are truly one of the most gifted and humble individuals I have ever met. Dr. Matt Seevers,

your detailed reviews and constructive feedback throughout the manuscript process have greatly

enhanced the quality of this work. It has been an immense pleasure to work with such a caring and

meticulous scholar like yourself. I look forward to continuing to learn and publish with you all.

I thank my most loving and encouraging wife Shantal who has made countless sacrifices

over the past three years in order to see me complete my studies. If it were not for Shantal, earning


this degree would not have been possible. Without you, I would not have had the ability to persist.

Thank you for always believing in me and for making this a reality. You continually are my

strength and inspiration. To my loving son Sai and daughter Khushi, I hope Daddy’s

accomplishment will instill a life-long appreciation for education in the both of you. It is my hope

that you will relentlessly strive toward the achievement of your dreams throughout your lives.

I also want to thank my parents. As the son of an astrophysicist, I am named after the Nobel

Laureate Subrahmanyan Chandrasekhar who is my father’s idol. Today, I am humbled to have

completed my Doctorate, and follow in the footsteps of my father, who is my idol. To my mother,

thank you for always encouraging me to follow my heart and pursue my dreams. You have always

instilled the meaning of faith, patience and perseverance in me. I would also like to acknowledge my

sister Silpa Valluri Gudur who has been my cheerleader and confidant since childhood. To the team

of Infuzn Foods, thank you for managing the business when I was not always able to be there. You

have all taught me the true meaning of family and of the importance of supporting one another

always.

Lastly, it is only with the blessing of my Guru, (spiritual teacher) Bhagawan Sri Sathya Sai

Baba that this endeavor could have been accomplished. My Guru always taught me that the “end of

education is good character." It is my every intention to utilize this great opportunity in a more

enriching manner to positively contribute to the betterment of society.


Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Abbreviations
INTRODUCTION

Chapter 1
  1.1 Customer churn
  1.2 Business Churn
  1.3 Employee Churn
  1.4 Churn Today and Beyond
  1.5 Churn and Profitability
  1.6 Implications and Contribution

Chapter 2
  2.1 Introduction
  2.2 Churn and Banking
  2.3 Subprime Auto Lending
  2.4 Financial Risks to Subprime Auto Lending
  2.5 The 4 C's to Lending
  2.6 Dataset
  2.7 Modeling Techniques
      Target Variable
      Quantitative and Qualitative (Class) Response Variables
      Logistic Regression
      Logistic Function
      Linear Discriminant Analysis
      Decision Trees (DT)
      Random Forests
  2.8 Estimation Specifications
  2.9 Comparing Model Performance
  2.10 Variables of Study
  2.11 Results
      Descriptive Statistics
      Histograms
      Correlations and Variance Inflation Factor Tests
      Logistic Regression and Confusion Matrices
      LDA and Confusion Matrices
      Decision Trees and Confusion Matrices
      Random Forests and Confusion Matrices
      ROC Curves and AUC Values
  2.12 Discussion
  2.13 Limitations and Future Research
  2.14 Contribution
  2.15 References


LIST OF ILLUSTRATIONS

Figures

Figure 1 Multifaceted Impact of Churn on Firm Profitability
Figure 2.1 Anatomy of a Decision Tree
Figure 2.2 Residential Months
Figure 2.3 Employment Months
Figure 2.4 Assets to Age
Figure 2.5 Net Worth
Figure 2.6 FICO Score
Figure 2.7 Classification Tree for the Full Model (4Cs)
Figure 2.8 Classification Tree for the Restricted Model (Character)
Figure 2.9 Variable Importance Plot of the Random Forest (Full Model)
Figure 2.10 Variable Importance Plot of the Random Forest (Restricted Model)
Figure 2.11 ROC Curves for Logistic Regression
Figure 2.12 ROC Curves for Linear Discriminant Analysis Models
Figure 2.13 ROC Curves for Decision Tree Models
Figure 2.14 ROC Curves for Random Forest Models


Tables

Table 1.1 Churn Framework
Table 2.1 Advantages and Disadvantages of Each of the Classification Methods
Table 2.2 Elements of the Confusion Matrix
Table 2.3 Summary of Key Variables
Table 2.4 Number of Churners and Non-Churners (Original Dataset)
Table 2.5 Number of Churners and Non-Churners (Complete Dataset)
Table 2.6 Descriptive Statistics for Complete Dataset
Table 2.7 Correlation Coefficients for Restricted Model
Table 2.8 Correlation Coefficients for Full Model
Table 2.9 Variance Inflation Factor Scores for Full Model (Complete Dataset)
Table 2.10 Variance Inflation Factor Scores for Restricted Model (Complete Dataset)
Table 2.11 Number of Balanced Churners and Non-Churners by Dataset Type
Table 2.12 Restricted Model vs. Full Model (Testing of H1)
Table 2.13 Logistic Regression Results for Training Set
Table 2.14 Logistic Regression Confusion Matrix for Full Model (Test Set)
Table 2.15 Logistic Regression Confusion Matrix for Restricted Model (Test Set)
Table 2.16 Confusion Matrix Measures for Logistic Regression
Table 2.17 LDA Functions (Full Model) on Training Set
Table 2.18 LDA Functions (Restricted Model) on Training Set
Table 2.19 Maximum LDA Fitted Score Difference on Training Set
Table 2.20 Linear Discriminant Analysis Confusion Matrix for Full Model (Test Set)
Table 2.21 Linear Discriminant Analysis Confusion Matrix for Restricted Model (Test Set)
Table 2.22 Confusion Matrix Measures for Linear Discriminant Analysis
Table 2.23 Decision Tree Confusion Matrix for Full Model (Test Set)
Table 2.24 Decision Tree Confusion Matrix for Restricted Model (Test Set)
Table 2.25 Confusion Matrix Measures for Decision Trees
Table 2.26 Random Forest Mean Decrease Gini Values (Full Model) on Training Set
Table 2.27 Random Forest Mean Decrease Gini Values (Restricted Model) on Training Set
Table 2.28 Random Forest Confusion Matrix for Full Model (Test Set)
Table 2.29 Random Forest Confusion Matrix for Restricted Model (Test Set)
Table 2.30 Confusion Matrix Measures for Random Forests
Table 2.31 Confusion Matrix Comparison Metrics for Various Models
Table 2.32 Model Comparison with AUC Metrics


LIST OF ABBREVIATIONS

CRM-Customer Relationship Management

CLV-Customer Lifetime Value

B2C-Business to Consumer

B2B-Business to Business

B2E-Business to Employee

4Cs-Character, Capacity, Collateral and Credit

LR-Logistic Regression

LDA-Linear Discriminant Analysis

DT-Decision Tree

RF-Random Forest

NB-Naïve Bayes

KNN-K nearest neighbors

SVM-Support Vector Machine

LLM- Logit Leaf Model

SMS-Short Messaging Service

CART-Classification and Regression Tree

GAM-Generalized Additive Model

RSM-Random Subspace Method

MNP-Mobile Number Portability

AI-Artificial Intelligence

RFM-Recency, Frequency and Monetary

LRFMP-Length, Recency, Frequency, Monetary and Profit

LRFM-Longevity, Recency, Frequency and Monetary

CFRM-Product Categories, Frequency, Recency and Monetary

BART-Bayesian Additive Regression Trees

NN-Neural Networks

RFMTC-Recency, Frequency, Monetary, First Time Purchase and Customer Churn Probability

FMCG-Fast Moving Consumer Goods

ROC-Receiver Operating Characteristic (Curve)

AUC-Area Under the Curve

EOR-Employee Organization Relationship

XGBoost-Extreme Gradient Boosting

HRIS-Human Resource Information System

RLV-Residual Lifetime Value

E-Entity

NE-Non-Entity

MAS-Money Attitude Scale

LTV-Loan to Value

MLE-Maximum Likelihood Estimation

LL-Log Likelihood

AIC-Akaike Information Criterion

ANOVA-Analysis of Variance

CHAID-Chi-Square Automatic Interaction Detection

TP-True Positive

TN-True Negative

FP-False Positive

FN-False Negative

PPV-Positive Predictive Value

OOB-Out-of-Bag Error

QDA-Quadratic Discriminant Analysis

MS-Mass spectrometry


INTRODUCTION

A growing number of firms in a variety of industries seek to prevent customer churn on the

belief that customer retention is invaluable to increasing firm profitability (Van den Poel and

Lariviere, 2004). The ability to predict churn in organizational settings has become an increasingly

common customer relationship management (CRM) practice (Risselada et al., 2010) and is vital to

customer lifetime value (CLV) modeling (Neslin et al., 2006; Fader and Hardie, 2010; Braun and

Schweidel, 2011). Consequently, effective churn management contributes to building competitive

advantage (Garcia et al., 2017). The following dissertation seeks to provide a comprehensive survey

of this relational construct and its predictive modeling applications.

Chapter 1 describes the three major types of churn (customer, business and employee) and how

each of these forms of churn has been studied in extant literature. Subsequently, the related

constructs of turnover, attrition and defection are also briefly discussed. By differentiating between

the various forms of churn within an organization, I propose a more comprehensive definition of

churn that is grounded in the notion of various types of partnerships (B2C, B2B and B2E)

according to the type of churn transpiring. The presented churn framework reveals that the churn

phenomena can be attributed to a combination of both entity and non-entity characteristics,

although entity characteristics appear to be more common than non-entity factors when conducting

churn analysis. Methodologically, the framework demonstrates that churn continues to be studied

through various supervised learning methods in a variety of industries. The churn concept is also

researched in a variety of disciplines that include marketing, management, computer science and

operations research. Finally, a proactive approach to churn management consisting of various


propositions that link each of the churn types to firm profitability is outlined. It is my hope that

future churn research will adopt this empirical approach.

Chapter 2 specifically addresses the example of customer character as an entity determinant of

customer auto loan churn among a unique sample of subprime borrowers. This chapter compares

the impact of both entity and non-entity variables on auto loan churn. The study of auto loan churn

is integral as over 107 million Americans in 2017 incurred auto loan debt compared to roughly 80

million in 2012 (Federal Reserve Bank of New York, 2016). Furthermore, after the great financial

crisis, U.S. auto loans now exceed home mortgages (Long, 2017). This chapter develops a full

model of customer auto loan churn that consists of the 4C’s of character, capacity, collateral and

credit and compares it to a restricted model consisting of customer character variables and a common set of control

variables. More specifically, I assess the impact of the various customer character variables on auto

loan churn. The results indicate that the restricted model is indeed different from the full model.

The customer character variable of residential months is significant when conducting logistic

regression, and net worth when conducting decision tree analysis.

The random forest classification method details the most important character variables to be

net worth, followed by the assets to age ratio, employment months and finally residential months.

Therefore, the initial findings of this paper guide lenders to continue to carefully assess customer

character when issuing loans to subprime borrowers. From a practical perspective, continued efforts

toward effective character screening are recommended in order to develop more accurate customer

profiles for the purposes of target marketing and customer retention. This study also deepens

understanding of subprime credit markets and the role of both entity and non-entity variables.


The supervised machine learning classification techniques of logistic regression, linear

discriminant analysis, decision trees and random forests are applied to test predictive performance.

Each of these supervised learning methods is compared on multiple performance measures.

Collectively, the results indicate that the random forest classifier reports the highest

accuracy (lowest error rate), recall and precision, and the largest area under the curve (AUC).

Lastly, the use of multiple classification techniques to aid in more efficient churn prediction as

advocated in the churn literature is novel to this domain of predictive research (Verbeke et al.,

2011).
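For readers unfamiliar with these measures, the R sketch below shows how accuracy, recall, precision and AUC are typically computed from a confusion matrix. The vectors of actual outcomes and predicted probabilities are hypothetical placeholders, not the Chapter 2 test-set results, and the AUC line assumes the pROC package is available.

```r
# Illustrative only: 'actual' and 'prob' are hypothetical stand-ins for test-set
# outcomes and predicted churn probabilities, not the study's data.
library(pROC)

actual <- factor(c(1, 0, 1, 1, 0, 0, 1, 0), levels = c(0, 1))   # 1 = churner
prob   <- c(0.82, 0.35, 0.66, 0.48, 0.22, 0.55, 0.91, 0.10)     # predicted churn probabilities
pred   <- factor(ifelse(prob > 0.5, 1, 0), levels = c(0, 1))    # classify at a 0.5 cutoff

cm <- table(Predicted = pred, Actual = actual)                  # 2 x 2 confusion matrix
TP <- cm["1", "1"]; TN <- cm["0", "0"]
FP <- cm["1", "0"]; FN <- cm["0", "1"]

accuracy  <- (TP + TN) / sum(cm)                 # share of all cases classified correctly
recall    <- TP / (TP + FN)                      # sensitivity: churners correctly flagged
precision <- TP / (TP + FP)                      # positive predictive value
auc_value <- as.numeric(auc(roc(actual, prob)))  # area under the ROC curve

round(c(accuracy = accuracy, recall = recall, precision = precision, AUC = auc_value), 3)
```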


CHAPTER 1

1.1 Customer churn

Customer churn (commonly referred to simply as churn) is the notion that customers can refrain

from doing business with a provider by discontinuing purchases of the good or service provided by

the firm (Gordini and Veglio, 2017; Tamaddoni et al., 2016; Knox and Van Oest, 2014; Sharma and

Panigrahi, 2011). Hence, customer churn is akin to the concepts of voluntary turnover, attrition and

defection (Dawson et al., 2014; Clemente-Ciscar et al., 2014; Kamakura, 2005). Additionally, variants

to the churn definition can exist by industry and company (Clemente-Ciscar et al., 2014; Glady et al.,

2009; Buckinx and Van den Poel, 2005). Churn commonly arises in transactional dealings that can

either be contractual (Ascarza and Hardie, 2013; Fader and Hardie, 2010) or non-contractual

(Clemente-Ciscar et al., 2014; Buckinx and Van den Poel, 2005), although contractual churn has

been more frequently studied (Jahromi et al., 2014). The churn rate measures the proportion of

churners in the total customer base over a specific time period (Mozer et al., 2000). Churn rates are

known to vary by industry and company (McDonald, 2010) and can be measured at different time

instances according to standard industry practices (Ascarza et al., 2018). Therefore, churn rates are

regularly monitored and churn management involves the process of retaining customers (Sharma

and Panigrahi, 2011) by employing preventive techniques to combat customer attrition, customer

turnover or customer defection. Firms rank customers by their estimated probability of leaving the

organization, and those with the highest probability scores serve as targets for incentivized retention

campaigns (Coussement and De Bock, 2013; Gordini and Veglio, 2017).
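As a simple illustration of this churn-rate measurement and ranking logic (not the modeling approach taken in Chapter 2), the R sketch below computes a period churn rate and flags the customers with the highest predicted churn probabilities for a retention campaign. The customers data frame, its columns and the 5% targeting rule are hypothetical assumptions.

```r
# Hypothetical illustration of churn-rate measurement and proactive targeting;
# 'customers' and its columns are illustrative, not the dissertation's data.
set.seed(1)
customers <- data.frame(
  id         = 1:1000,
  churned    = rbinom(1000, 1, 0.15),  # observed churn flag for the period
  churn_prob = runif(1000)             # predicted churn probability from some fitted model
)

# Churn rate: churners as a share of the total customer base in the period
churn_rate <- mean(customers$churned)

# Proactive targeting: rank by predicted probability, take the top 5% for retention offers
ranked  <- customers[order(customers$churn_prob, decreasing = TRUE), ]
targets <- head(ranked, n = ceiling(0.05 * nrow(ranked)))

churn_rate      # period churn rate, roughly 0.15 here
nrow(targets)   # 50 customers flagged for an incentivized retention campaign
```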

A growing number of firms in a variety of industries seek to prevent customer churn on the

belief that customer retention is invaluable to increasing firm profitability (Van den Poel and

Lariviere, 2004). Thus, customer churn is an integral component to customer relationship

management (CRM) (Risselada et al., 2010) as well as to customer lifetime value (CLV) modeling


(Neslin et al., 2006; Braun and Schweidel, 2011). From an organizational perspective, customers are

often motivated to churn due to both controllable and uncontrollable factors (Braun and

Schweidel, 2011). Controllable factors include quality concerns, price changes, privacy concerns,

attractive competitor offerings, low switching costs, general customer dissatisfaction and many

other reasons (Tamaddoni et al., 2014; Braun and Schweidel, 2011). Braun and Schweidel (2011)

argue that uncontrollable factors may be triggered by various changes in customer circumstances

like personal lifestyle changes, family challenges, job relocation, economic changes, death and many

others. However, in general the ability to prevent controllable churn is critical to customer

retention and is far more advantageous than costly new customer acquisitions (Sharma and

Panigrahi, 2011; Keramati et al., 2016; Ganesh et al., 2000).

Correspondingly, good customer retention is known to prevent negative word of mouth

advertising (Ganesh et al., 2000), declining sales (Rust and Zahorik, 1993) and can contribute

towards firm profits (Van den Poel and Lariviere, 2004). Moreover, the ability to identify churners

enables companies to offer special incentives and apply specific customer retention strategies

(Jahromi et al., 2014). Accordingly, effective churn management is an integral component to helping

ensure competitive advantage (Garcia et al., 2017). Consequently, churn analysis has been applied in

a variety of industries that include: telecommunications (Amin et al., 2018; Zhao et al., 2017; Tsai

and Lu, 2009), finance (Van den Poel and Lariviere, 2004), cellular networks (Sharma and Panigrahi,

2011), online gaming (Suh and Alhaery, 2016), nursing (Dawson et al., 2014; Hughes et al., 2015),

retail (Yu et al., 2011; Buckinx and Van den Poel, 2005), sporting (McDonald, 2010) and many

others due to its importance to marketing practice.

Burez and Van den Poel (2007) document that churn management has largely been practiced

from two perspectives. Organizations can either employ a reactive (untargeted) or proactive


(targeted) approach to churn management. The reactive approach entails the customer approaching

the firm about wanting to forfeit usage of the good or service and often involves firms reacting by

offering an incentive to retain the customer’s business. Proactive approaches to churn management

involve organizations using predictive modeling techniques that identify prospective churners in

advance. A growing number of organizations are initiating proactive approaches to anticipating

customer churn as it is critical to firm performance and survival.

Essentially, two major streams of customer churn research have emerged in the literature.

The first focal area of churn research has examined various causes of churn. For example, Braun and

Schweidel (2011) determine a series of eight controllable and two uncontrollable drivers to

telecommunications churn and study the impact of these factors on CLV. Sulaimon et al. (2016) also

identified various drivers of customer churn in the Nigerian telecommunications industry. More

specifically, they found that dubious promotions, weak cellular coverage, unwanted calls and short

messaging services (SMS), inefficient internet service, poor message delivery and complexity of

mobile number portability (MNP) all increased the probability of churn. Furthermore, Coussement

and De Bock (2013) examined the effects of sixty different churn predictors (both behavioral and

demographic variables) among gamblers using multiple data analysis methods. Similar studies

investigating the drivers or causes of churn have been conducted in a variety of industries (Dawson

et al., 2014; McDonald, 2010; Rowley and Purcell, 2001).

The second stream of research has focused on developing newer, more sophisticated churn

modeling techniques (including hybrid techniques) and utilizing a combination of different

techniques to aid in more efficient churn prediction (Verbeke et al., 2011). Principally, predictive

churn models are an example of binary classification (Coussement et al., 2008) and are referred to as

supervised learning (i.e. directed) classification models. Common examples of supervised learning


models that have been applied to churn research include: naïve Bayes (NB), logistic regression (LR),

k-nearest neighbors (KNN), discriminant analysis, neural networks (NN), survival analysis, decision

trees (DTs), support vector machines (SVM), random forests (RFs), Markov chains, the hybrid logit

leaf model (LLM) and the ensemble learning methods of bagging and boosting (Verbeke et al.,

2011; Sharma and Panigrahi, 2011; Tamaddoni et al., 2014; Zhao et al., 2017).

A comprehensive review of extant literature reveals that two dominant families of supervised

learning classification techniques have emerged in predictive churn modeling: 1) logistic regression

and 2) decision trees (De Caigny et al., 2018). Verbeke et al. (2011) showed that both these algorithm

forms provide the advantages of predictive power and comprehensibility (De Caigny et al., 2018).

An increasing number of churn prediction techniques relate to the artificial intelligence (AI)

discipline as well (Risselada et al., 2010). Many more studies cite the usage of combination

techniques and hybrid techniques for the purposes of benchmarking and accuracy comparisons (De

Caigny et al., 2018).
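To make the two dominant model families concrete, the following R sketch fits a logistic regression and a classification tree to simulated churn data and compares their test-set accuracy. The data, predictor names and 0.5 cutoff are illustrative assumptions; the sketch does not reproduce any study reviewed here.

```r
# Minimal sketch of the two dominant churn-modeling families on simulated data.
# All variables here are illustrative; they are not the dissertation's predictors.
library(rpart)

set.seed(42)
n <- 2000
dat <- data.frame(
  tenure_months = rpois(n, 24),
  monthly_spend = round(runif(n, 10, 120), 2),
  complaints    = rpois(n, 0.5)
)
# Simulated outcome: shorter tenure and more complaints raise churn probability
dat$churn <- rbinom(n, 1, plogis(-1 + 0.6 * dat$complaints - 0.05 * dat$tenure_months))

train_idx <- sample(seq_len(n), size = 0.7 * n)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Family 1: logistic regression
lr_fit  <- glm(churn ~ tenure_months + monthly_spend + complaints,
               data = train, family = binomial)
lr_prob <- predict(lr_fit, newdata = test, type = "response")

# Family 2: classification (decision) tree
dt_fit  <- rpart(factor(churn) ~ tenure_months + monthly_spend + complaints,
                 data = train, method = "class")
dt_prob <- predict(dt_fit, newdata = test, type = "prob")[, "1"]

# Test-set accuracy of each family at a 0.5 cutoff
mean(as.numeric(lr_prob > 0.5) == test$churn)
mean(as.numeric(dt_prob > 0.5) == test$churn)
```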

1.2 Business Churn

Business churn is an extension of customer churn in that the customer is another business.

Limited empirical churn research has been conducted in the B2B context (Tamaddoni et al., 2014)

due to the limited collection of big data in this arena versus the B2C setting (Wiersema, 2013). Even

in instances where B2B data has been available, the ability to transform this data for the purposes of

analysis can be difficult (Tamaddoni et al., 2014). Nevertheless, a greater understanding of the

implications of B2B churn is critical, as business customers are noted to make larger transactions

more frequently and are typically fewer in number compared to customers in the B2C context.

Thus, business attrition can be a significant contributor to financial loss for a company (Tamaddoni

et al., 2014) and business retention efforts in the B2B context are increasingly important due to

increased online and global competition (Waxer, 2011).


Of the limited number of churn studies in the B2B context, most have largely focused on

understanding the implications of B2B churn on predicting customer profitability or for the

purposes of resource allocation (Tamaddoni et al., 2014). For example, Rust et al. (2011) proposed a

simulation model to predict business customer profitability by utilizing archival firm data. In this

study, a technology firm sells various technological products to other businesses (business

customers). Business customer profitability results from calculating the net present value and net

profit of each business customer. Profitability is modeled on the basis of propensity to spend, gross

profit, predicted number of marketing contacts and marketing cost per customer. Business

characteristics (number of employees, industry category), past marketing contacts, past purchase

behaviors and various control variables (state of the economy) all influence the propensity to spend

and gross profit. Ultimately, Rust et al. (2011) were able to show that their predictive customer

profitability model had more predictive strength when compared to naive models.

Extending this thought process, Tamaddoni et al. (2014) model business churn in a non-

contractual setting. The authors obtained transactional business customer data from an Australian

online retailer who sold everyday routine products to other businesses. The predictor variables in the

model consisted of recency (calculation of when items were most recently purchased within a certain

time frame) and frequency (total number of purchases within a given time frame) as well as

monetary variables (changes in total business customer spending during a particular time period).

These recency, frequency and monetary (RFM) predictor variables were identified as appropriate

variables to model business churn based on previous studies and have been used as appropriate

measures to model customer value (Coussement and De Bock, 2013). After training models

consisting of logistic regression, various types of decision trees and boosting techniques,

the fitted models were applied to the test set. Ultimately, both recency and frequency were

important predictors of B2B churn. Less frequent purchases and longer purchase durations


contributed to the probability of churning. Additionally, changes in total spending amounts

revealed less impact on churn. The authors also employed cumulative lift charts and found that

boosting models performed the best in isolating business churners.
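To show how recency, frequency and monetary (RFM) predictors of this kind can be derived from raw transaction records, the short R sketch below builds the three variables from a hypothetical transaction log. The transactions table, the 180-day observation window and the customer identifiers are assumptions for illustration only, not Tamaddoni et al.'s data.

```r
# Hypothetical derivation of RFM predictors from a transaction log; the table,
# window length and identifiers are illustrative assumptions.
set.seed(7)
transactions <- data.frame(
  customer_id = sample(1:50, 400, replace = TRUE),
  day         = sample(1:180, 400, replace = TRUE),   # purchase day within the window
  amount      = round(runif(400, 20, 500), 2)         # purchase amount
)
window_end <- 180

monetary  <- tapply(transactions$amount, transactions$customer_id, sum)     # total spend
frequency <- tapply(transactions$amount, transactions$customer_id, length)  # purchase count
last_buy  <- tapply(transactions$day,    transactions$customer_id, max)     # most recent purchase day

rfm <- data.frame(
  customer_id = as.integer(names(monetary)),
  recency     = window_end - as.vector(last_buy),  # days since the last purchase
  frequency   = as.vector(frequency),
  monetary    = as.vector(monetary)
)
head(rfm)   # one row of churn predictors per business customer
```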

Chen et al. (2015) note that the predictors of the RFM customer value model vary by

industry and mention other adaptations of the RFM model based on the uniqueness of the business.

The measures of length, recency, frequency, monetary and profit (LRFMP) to model logistic

business churn have also been used (Chen et al., 2015). Other variations include: relationship

longevity, recency, frequency and monetary variables (LRFM), product categories, frequency,

recency and monetary (CFRM) and the RFMTC (recency, frequency, monetary, first time purchase

and customer churn probability). In essence, using unique measures of RFM or its variants as they

relate to a specific industry assists in developing models of business churn (Chen et al., 2015).

Most recently, Gordini and Veglio (2017) model B2B churn in an Italian e-commerce fast

moving consumer goods (FMCG) company with 80,000 customers by employing the technique of

support vector machine (SVMauc) and comparing the predictive accuracy of this method to

traditional approaches like logistic regression and neural networks. The authors argue that in the

B2B context, few churn prediction models exist. Of those that do exist, conflicting results have

ensued and more intentional model design that is specific to the B2B context is required. Ultimately,

the authors find predictive superiority in the SVMauc model in comparison to the other methods and

concur with the predictive merits of this method as found in other studies examining high churn

industries (Coussement and Van den Poel, 2008). Hence, the authors suggest that this method may

be versatile enough to predict B2B churn in any industry.


1.3 Employee Churn

Formally, very little has been written about employee churn (Strohmeier and Piazza, 2012).

Essentially, the concept of employee churn is equivalent to voluntary employee turnover or

employee attrition (Saradhi and Palshikar, 2011). Employee turnover and attrition have been studied

extensively in the management literature (Cotton and Tuttle, 1986; Griffeth and Hom, 1995;

Griffeth et al., 2000; Hom et al., 2017). Saradhi and Palshikar (2011) argue that employee churn is

similar but distinct from customer churn. Much like B2C and B2B churn, employee churn also poses

significant challenges for a firm. For example, employee attrition is known to be financially costly to

a firm. It results in a drop in labor productivity for the affected organization, can lead to customer

attrition, thereby requiring additional customer retention investments, and creates an assortment of

unanticipated additional tasks that can be time intensive and strenuous (Duda and Zurkova, 2013).

Mobley (1977) specifically modeled how job dissatisfaction can lead to employee turnover

and depicted the various steps involved in this process. He identified a series of determinants of why

employees quit their jobs which include: employee characteristics, organizational factors, job

expectations, labor market influences and employee values (Jha, 2009). Likewise, a meta-analysis by

Cotton and Tuttle (1986) identified a total of twenty-six variable correlates to employee turnover

after analyzing one hundred and twenty academic papers on the subject. Cotton and Tuttle

categorized these variables as external, work related and personal correlates. Examples of external

correlates include employment opinions, accession rates, union presence and the unemployment

rate. Work correlates entail worker compensation (pay), performance, multiple measures of

satisfaction, organizational commitment, role clarity and task repetitiveness. Finally, common

personal correlates include employee age, length of tenure with an employer, gender, behavioral

intentions, intelligence and other measures. Along similar lines Udechukwu and Mujtaba (2007)

identified two broad categories contributing to voluntary employee turnover. These two categories


are: internal (employer) and external (social affiliate) factors relating to individual employees. Jha

(2009) identified both individual and organizational factors of turnover intentions. Individual factors

include: employee personality, employee skills and abilities, organizational justice, cognitive and non-

cognitive factors, ethnicity, age and other variables. Common organizational factors entail job stress,

organizational commitment, organizational culture, organizational support, supervisor gender, social

comparison and others (Jha, 2009).

Most recently, Hom and Griffeth (2017) examined the employee turnover literature over the

past one hundred years and attributed the antecedents of employee turnover to the quality of the

relationship between an employee and the organization. More specifically, the researchers

identify the role of offered inducements (i.e. salary, training, benefits, job security and others) and

expected contributions (performance requirements and organizational

commitment) as key determinants of the relationship. This extensive review listed the various causes

of employee turnover being attributed to job features, job dissatisfaction, labor market,

professional/occupational and family pressure variables among others. Again, it should be noted

that certain variables will be more or less influential in predicting employee turnover based on the

industry that is being studied.

Machine learning algorithms are increasingly being used in human resource

management research (Saradhi and Palshikar, 2011; Strohmeier and Piazza, 2012; Punnoose and Ajit,

2016). Saradhi and Palshikar (2011) are among a select few researchers who have formally studied

employee churn. The researchers developed an employee churn model utilizing a support vector

machine classification technique on employer work history data. The authors developed three

separate models to study employee churn. In the first model they factored twelve out of twenty-five

possible worker attribute variables (e.g. past experience in years, designation level, experience in

parent organization,


department, employee location, age, gender, marital status, qualification, experience in client

organization, client designation, on-site/off-site worker and billed versus not-billed compensation).

The second and third models included twenty derived variables (number of promotions in parent

organization, promotions in client organization, number of billed/unbilled months, time spent onsite

and offsite, number of transfers and others). After separating the data into training and test sets, and

comparing the accuracy rates from support vector machine analysis, random forests and basic

Bayesian models, the authors found that support vector machine classification was the most

accurate in predicting employee churn. Thus, the authors were able to show the legitimacy of

employing kernel based churn prediction methods like support vector machine analysis. Moreover,

Saradhi and Palshikar (2011) were also able to develop an employee value model on resigned,

released and retained employees. In doing so, they extended the notion of the customer value model

to an employee value model based on their ability to predict employee churn. However, their

employee value model requires empirical testing (Saradhi and Palshikar, 2011).

Strohmeier and Piazza (2012) conducted a comprehensive secondary research review of

data mining techniques specific to the four functional domains of human resource management

which entail staffing, development, compensation and performance management from 1991-2011.

The authors’ final analysis of one hundred articles found that sixty-seven of the articles concerned

staffing, of which twenty-one were purely concerned with employee turnover and retention-

based studies. Of all the turnover studies, the researchers found that the classification methods of

decision trees, discriminant analysis, support vector machines and neural networks were most

frequently used.

Along similar lines, Punnoose and Ajit (2016) compared the performance of a total of seven

supervised learning employee turnover models: logistic regression, naïve Bayes, random forest, k-

nearest neighbor, linear discriminant analysis, support vector machine and


extreme gradient boosting on the Human Resource Information System (HRIS) data of the

leadership team of a global retailer over an eighteen month timeline. The researchers found that

extreme gradient boosting performed best among the supervised learning methods employed in

predicting employee turnover.

Correspondingly, Yiget and Shourabizadeh (2017) applied the classification techniques of

decision trees, logistic regression, support vector machines, random forest, k-nearest neighbor and

naïve Bayes on employee attrition data provided by IBM. Each of the supervised learning methods

was compared on the basis of multiple confusion matrix measures like accuracy, precision, recall

and F- measures. The researchers found that the support vector machine method exhibited superior

performance in terms of accuracy, precision and the F measure. Additionally, a series of

dissertations at both the master's and doctoral levels have also been written (Zehra, 2014; Attri, 2018)

that predict employee churn. Table 1.1 highlights other studies that have utilized machine learning

methods to predict employee turnover.

1.4 Churn Today and Beyond

A number of studies have sought to synthesize existing work in this field (Chen et al., 2015;

Sharma and Panigrahi, 2011; De Caigny et al., 2018). Most recently, work by De Caigny et al. (2018)

referenced a series of studies portraying different churn prediction models from 2012-2017 as

published in the following non-marketing journals: European Journal of Operations Research, Expert

Systems with Applications, Decision Support Systems and Journal of Business Research. Their paper focused on

summarizing the methodological approaches to each study. This paper extends the authors' work by

creating a churn framework that includes De Caigny et al.'s (2018) churn literature review as well as

articles published in a select sample of top tier marketing journals from 2012-2017 according to the

latest report from Scimago Journal Ranking (2017). The specific journals referenced include Journal

of Marketing, Journal of Marketing Research, Marketing Science, Marketing Letters, Quantitative Marketing and


Economics, and Industrial Marketing Management, due to their frequent publication of churn articles and

their largely empirical focus. Each of the articles extracted from the previously

mentioned journals was obtained by searching under the keywords churn and business churn.

Moreover, the ensuing churn framework also includes articles written during the 2012-2017 time

horizon that explicitly investigated the key search terms employee churn/employee turnover and predictive

analytics. The employee churn articles for analysis were obtained through Google Scholar, ABI

Inform and EBSCO database searches. It is important to note that De Caigny et al.'s (2018) review

emphasized the classification methods used, performance evaluation (confusion matrix, AUC, lift

charts, calibration plots) and the industries studied.

Table 1.1 identifies the authors, year of publication, journal title, industry context, type of

churn investigated, summary of key variables explored, and the predictive techniques utilized. The

proposed framework extends existing work by grouping existing studies by churn type and

categorizing the various churn drivers into either entity (i.e. customer attributes) or non-entity (i.e.

non-customer-based attributes that include macroeconomic, product/service attributes and

competitive offerings) based characteristics across industries. Furthermore, each of the

methodological techniques used in these works are also listed along with the industries that were

analyzed.

Previously, Braun and Schweidel (2011) classified churn determinants as either being due to

controllable or uncontrollable factors. On a similar note, Coussement and De Bock (2013) examined

the effects of sixty different predictor variables and classified churn determinants as either

behavioral or demographic variables. This framework integrates both these viewpoints by

recognizing that churn behavior can be the result of both individual as well as non-individual

factors (external correlates). More precisely, the framework relies on the entity set model view of

data (Chen, 1976; Kanjilal, 2008) that classifies an entity as an object that is distinct, easily


identifiable and contains information that can be utilized for modeling purposes. Common examples

of entities include individuals, consumers, employees, businesses and events. Consequently, non-

entities are classified as non-objects and often include macroeconomic variables. For the purposes

of this dissertation, I specifically regard entity variables as end consumer, business customer and

employee characteristics and non-entity variables to include product/service details, macroeconomic

variables, customer firm interactions and competitive offerings. The contents of Table 1.1 are

detailed below.

Table 1.1 Churn Framework

Author(s) and Date of Publication (DOP)

Journal Type and Title M= Marketing O = Other

Churn Context (Industry)

Churn Type (Customer, Business or Employee)

Variables Studied Categorization of Churn Drivers (E= Entity characteristic , NE = Non- entity characteristic

Predictive Modeling Techniques

Coussement and De Bock (2013)

O (Journal of Business Research)

Gambling Customer Customer Demographic and Behavioral characteristics

E decision trees (CART) Random Forests (RFs) Generalized Additive Models (GAM), ensemble learners (Bagging and Random Subspace Method (RSM)

Coussement et al. (2017)

O (Decision Support Systems)

Telecom Customer 956 churn drivers, a combination of Customer behavior, customer company interaction variables, subscription details and Customer demographics

E and NE Logistic Regression, decision trees (CHAID) Bagging, Boosting, Ensemble learners, naïve Bayes, neural networks and SVM

Moeyersoms and Martens (2015)

O (Decision Support Systems)

Energy Customer Socio demographic and consumption variables

E Decision Trees (C4.5), logistic regression, SVM

Kumar et al. (2018)

M (Journal of Marketing Research)

Telecom Customer Customer Behavioral activities and firm marketing activities

E and NE Largely based on survival analysis method

Tamaddoni et al. (2014)

M (Industrial Marketing Management)

Online retail Business Customer Value Model (RFM predictors)

E Logistic regression, decision trees (CART ), Ensemble (Boosting)

Page 30: THESIS/DISSERTATION APPROVED BY - Creighton University

28 | P a g e

Gordini and Veglio (2017)

M (Industrial Marketing Management)

Online Retail Business Customer Value Model (RFM predictors) Customer data points

E Logistic regression, Neural network, SVM, SVMauc

Chen et al. (2015)

O (Information Systems E- Business Management)

Logistics Business Customer Value Model (LRFMP)

E Logistic regression, Decision tree, SVM, Artificial Neural Network

Huang et al. (2012)

O (Expert Systems with Applications)

Telecom Customer Demographic, grant support, customer account info, Henley segmentation variables, service and telephone line info, historical info, telephone and complaint info

E and NE Logistic Regression, Linear Classifiers, Naive Bayes, Decision Trees (C4.5), Multilayer Perceptron Neural Networks, SVM and the Evolutionary Data Mining Algorithm

Tang et al. (2014)

O (European Journal of Operations Research)

Financial Services

Customer Socio-demographic, Macroeconomic

E and NE Probit hazard rate model with orthogonal polynomial approximation

De Bock and Van den Poel (2012)

O (Expert Systems with Applications)

6 data sets that include: 2 Supermarket chains, bank, DIY supplies, Telecom, Mail order garments

Customer Customer demographic, Historical information, Financial information

E GAM, GAM ensembles, bagging, Random Forests, Logistic regression, Random Subspace Method (RSM)

Verbeke et al. (2012)

O (European Journal of Operations Research)

11 Telecom Datasets

Customer Socio demographic data, Call behavior data, Financial info, Marketing related variables

E and NE Logistic Regression, Decision trees, logistic model tree, bagging, boosting, Random Forests, KNN, Neural Networks, rule induction techniques, Naive Bayes, Bayesian Networks, SVM

Ballings and Van den Poel (2012)

O (Expert Systems with Applications)

Newspaper Industry

Customer Customer variables (Sociodemographic) Relationship Variables (RFM, Length of Relationship (LOR)

E and NE Logistic regression, classification trees, bagging

Chen et al. (2012)

O (European Journal of Operational Research)

3 datasets Food Telecom Adventure

Customer and business

Customer Variables Transactional details Longitudinal Behavioral details

E logistic regression, proportional hazard model SVM, neural networks, decision tree, RFs boosting

Ascarza et al. (2016)

M Telecom Customer Behavioral Details

E Probit Analysis

Page 31: THESIS/DISSERTATION APPROVED BY - Creighton University

29 | P a g e

(Journal of Marketing Research)

Giudicati et al. M Health Club Customer Socialization, E and NE Probit and (2013) (Marketing Contract tenure, duration

Letters) Usage, estimation Demographics, techniques Coexperience network

Knox and Oest M Online Internet Customer Focus is on non- E Residual (2014) (Journal of Retailer transactional data Lifetime Value

Marketing) (prior complaints, (RLV) recoveries, prior customer purchases)

Ma et al. (2015)

M (Marketing Letters)

Telecom Business Trial and repurchase data

E and NE The Pareto/ negative binomial distribution (NBD) model

Markov Chain Monte Carlo Simulation

Becker et al. M Internet Customer Behavioral, E and NE Hazard model (2014) (Marketing Services psychometric, and

Letters) advertising data points of customers along with competitive firm info

Balachander and Ghosh (2013)

M

(Quantitative Marketing and Economics)

Telecom/Wirele ss Services

Customer

Demographic and behavioral, multiple product purchase information by customer

E and NE Hierarchical Bayesian methods

Hom and Griffeth (2017)

O (Journal of Applied Psychology)

*This paper was a non-empirical study

N/A Employee N/A N/A Ordinary Least Square Regression (OLS), survival analysis, hazard function analysis, Structural Equation Modeling (SEM), Cox regression, call for big data methods

Strohemier and Piazza (2012)

O (Expert Systems with Applications)

*This paper was a non-empirical study

N/A Employee N/A N/A Decision Trees, SVMs, Cluster analysis, Associative analysis, Neural networks, Discriminant analysis, Time series, Logistic regression, Multiple linear regression, Rough set, Naïve Bayes, Multidimensional scaling and Learning preference models

Punnoose and Ajit (2016)

O (International Journal of Advanced Artificial Intelligence)

Retail Employee Demographic, Compensation details, Unemployment rate, household income data

E and NE Logistic regression, Naïve Bayesian, Random forest, K-nearest neighbor (KNN), Linear discriminant analysis (LDA), Support vector machine (SVM) and Extreme Gradient Boosting (XGBoost)

Yiget and Shourabizadeh (2017)

O (2017 International Artificial Intelligence and Data Processing Symposium (IDAP))

IBM internal data Employee Demographic, skills, experience, nature of work

E Decision trees, Logistic Regression, Support vector machines, Random forest, K-nearest neighbor (KNN) and Naïve Bayes

Alao and Adeyemo (2013)

O (Computing, Information Systems & Development Informatics)

Higher Education Employee Demographic and job related variables

E Decision Tree Methods of CHAID, C4.5 and REPtree

Table 1.1 reinforces much of what the literature has already documented. The churn

phenomenon continues to be studied in traditional industries like telecom (Verbeke et al., 2012;

Huang et al., 2012) as well as banking and finance (Tang et al., 2014). However, churn analysis is also

being applied to non-traditional industries like gambling (Coussement and De Bock, 2013), online

retail (Tamaddoni et al., 2014), health clubs (Giudicati et al., 2013), energy (Moeyersoms and

Martens, 2015) and adventure (Chen et al., 2012). Moreover, there is also evidence to suggest that

certain research studies are examining multiple industries as part of the same study (Chen et al.,

2012; De Bock and Van den Poel, 2012; Ascarza, 2018). Amongst the three major types of churn,

customer churn still remains the dominant form of churn. Only five of the surveyed papers

examined the topic of business churn during the 2012-2017 time frame. Furthermore, the topic of

employee churn has also been minimally investigated and is housed in non-marketing publications.

Thus, these findings suggest that the topics of business and employee churn warrant further research

exploration. However, it is important to note that Chen et al. (2012) planted the initial seed to study


both customer and business churn together. Additionally, of the 24 churn prediction studies

published, a growing number of these papers appeared in non-marketing journals, demonstrating the

high applicability of the churn construct to multiple disciplines. The literature analysis also reveals

that the bulk of studies focus on entity determinants (i.e. predominantly some form of customer

characteristic variables). The most common entity variables include socio demographic and

behavioral characteristics. With respect to non-entity determinants, product and customer firm

interactions appear to be the most visible. The recent work of Ascarza both in 2016 and 2018

suggests that experimental designs are also being incorporated into churn analysis in addition to just

pure archival analysis. Finally, with reference to modeling methods, logistic regression and decision

trees remain the most popular of classification methods and are regularly utilized for benchmarking

purposes. Each of the studies examined also utilizes multiple methods for benchmarking and

comparative analysis. Increasingly, more studies are using support vector machine and neural

networks as common methods along with various ensemble methods (random forests, bagging,

boosting). Nevertheless, specialized techniques like the Random Subspace Method, SVMauc methods

founded on survival analysis, as well as evolutionary data mining algorithms are

also being developed. Notably, there is also more evidence of non-transactional churn research being

conducted with unique customer base and business value analysis models being adopted (Knox and

Oest, 2014; Ma et al., 2015).

The literature analysis has shown that regardless of the type of churn studied, churn is very

much a relational construct and should be viewed in this capacity. Therefore, at the very least, the

churn construct involves a discontinuation of a relationship between at least two parties. Hence, I

propose a more encompassing definition of churn that recognizes this construct as the severing of a

type of partnership between at least two entities in an organizational system (i.e. B2C, B2B, and

B2E) based on the specific type of churn transpiring. Additionally, it should also be noted that the


various types of churn mentioned above have been studied in isolation. The above literature review

has revealed that churn is multifaceted and can impact an organization in multiple ways. Therefore,

it is argued that this construct should not be restricted to one form. Rather, in instances where

applicable, churn analysis should include its varied forms with the individual and combined impact

of each churn type on the financial performance of the firm. Furthermore, Table 1.1 also highlights

that there are many different drivers of churn. Thus, I have categorized the various churn drivers

into entity (customer, business and employee details) and non-entity (product/service attributes,

macroeconomic variables, customer firm interactions and competitive offerings) determinants.

Finally, although a multitude of predictive churn techniques have been applied, the practice of

employing multiple supervised learning methods appears to be standard to churn research.

1.5 Churn and Profitability

The relationship between churn and profitability was investigated by Reichheld and Sasser

(1990) who showed that reducing customer defection rates (churn) increased company profits with

varying effects by industry. A number of researchers have been able to develop customer profit

retention campaigns by modeling customer attrition behavior (Neslin et al., 2006; Verbeke et al.,

2012; Jahromi et al., 2014). Neslin et al. (2006) determined that firm profitability is a function of

customer churn probability, customer likelihood of accepting a retention offer, the business cost of

the retention effort and CLV to the organization. Verbeke et al. (2012) calculated the maximum

profit that could be realized from a retention campaign based on the customers who report the

highest attrition rates among a sample of telecommunications customers. Finally, Jahromi et al.

(2014) established a B2B churn model for non-contractual customers by comparing the data mining

techniques of decision trees, ensemble learning and logistic regression on the basis of predictive

accuracy. By doing so, the researchers were able to develop a profit retention campaign targeted

towards their business customers.


Numerous studies have been conducted to investigate the impact of employee turnover on

firm profitability either directly or indirectly (Hogan 1992; Ongori, 2007; Heavey et al., 2013; Hom

et al., 2017). Hogan (1992) estimated the employee turnover cost to be between $400 and $4,000

per employee. Some researchers have found that there is a negative relationship between

employee turnover and both sales and profit margins (Heavey et al., 2013; Park and Shaw, 2013). However, I

have not come across any study that analyzes the combined impact of each of the churn types on

company profitability. Consequently, a series of four propositions is developed

and suggestions on how each of the churn types can be tested in future research are described.

P1: The combined effects of customer and business churn entities are negatively related to firm profitability

P2: The combined effects of customer and employee churn entities are negatively related to firm profitability

P3: The combined effects of business and employee churn are negatively related to firm profitability

P4: The combined effects of customer, business and employee churn are negatively related to firm profitability

Figure 1: Multifaceted Impact of Churn on Firm Profitability

I believe that the combined impact of customer and business churn (B2C and B2B) is very

applicable to organizations that contain a significant portfolio of both consumer as well as business

client customers. For example, firms in the financial services, accounting,

telecommunications and legal services industries are all good candidates where the dual effects of each


of these churn types can be studied as they rely on both types of consumer groups for financial

stability. The combined study of customer (B2C) and employee (B2E) churn, as well as business (B2B) churn, is

well suited to organizations that suffer from high rates of employee attrition. High employee

attrition rates may be due to a variety of factors, including job demands, stress, burnout,

working conditions, compensation, job satisfaction and a host of other variables. Typically, sales

oriented roles in the financial services and insurance industry are well suited for these types of churn

analysis. Additionally, the accounting, hospitality, legal services, IT industry, trucking and nursing

industries are also good subjects. Finally, the combined effect of all three churn types can be tested

on traditional high churn industries like telecommunications, banking and financial services as well

as non-traditional churn industries like legal, accounting and IT to name a few.

1.6 Discussion and Contribution

Henceforth, it is my hope that future churn research adopt this multifaceted view of the

churn construct and apply it empirically. Hopefully, this will inspire further research in business and

employee churn, which has been scarce. Moreover, research scholars and practitioners are encouraged

to view churn as a relational construct that, if not managed effectively, results in the severing of

various types of partnerships (B2C, B2B, and B2E) according to the type of churn transpiring. The

churn framework, as depicted in Table 1.1 reveals that churn can be attributed to a combination of

both entity and non-entity characteristics across industries. Greater emphasis on incorporating non-

entity determinants to churn modeling is recommended. There is also a growing opportunity to

study churn in non-traditional industries like gambling, adventure, online retail as well as many

others and to combine archival analysis with experimental design as well as research non-

transactional situations. Interestingly, the literature review also reveals that none of the churn types

has been studied from the viewpoint of an event study. This may be an interesting research

opportunity. Methodologically, the churn construct should continue to be studied by employing


comparative supervised learning methods. It is my hope that the proposed framework will motivate

future churn research by providing an organized landscape of existing work in the field. Churn

scholars are encouraged to test the proposed propositions in future churn studies. Finally, churn

scholars are also encouraged to conduct a meta-analysis (Ascarza et al., 2018) of the churn literature

based on the secondary research already published.


Chapter 2

2.1 Introduction

This chapter seeks to understand churn in the context of the banking world. More

specifically, the chapter examines the determinants of used car customer auto loan churn. The

presented empirical study tests the explicit impact of a customer character model (i.e. a group of

entity variables) on auto loan churn among subprime borrowers and compares this model to

the overall impact of a full model consisting of the 4 C's (character, collateral, capacity and credit)

assuming a common set of control variables. Both the explanatory and predictive power of each of

the models is analyzed. Ledolter (2013) states that in big data business studies, commonly fewer than

five to ten percent of the predictor variables in a dataset prove influential.

Therefore, this study seeks to examine the role of the customer character variables of residential

months, employment months, assets to age and net worth on auto loan churn. Various auto loan

churn prediction models are also developed and compared on the basis of different predictive

performance measures. Testing the explicit influence of customer character (individual borrower

characteristics) on auto loan churn has not been previously studied and is important when

considering whether to lend to a population of subprime borrowers who are often constrained on the basis

of credit, capital and collateral.

The study of auto loan churn is integral as over 107 million Americans in 2017 incurred auto

loan debt compared to roughly 80 million in 2012 (Federal Reserve Bank of New York, 2016). After

the great financial crisis, U.S. auto loans now exceed home mortgages (Long, 2017) and the demand

for used vehicles continues to be strong (“Cox Automotive 2018 Used Car Market Report &

Outlook”, 2018). Auto loans comprise a significant portion of asset backed securities (Scully, 2017)

that can be obtained either directly or indirectly. Direct lending entails obtaining the auto loan

directly from a financial institution while indirect lending involves loan contracts issued by the


automotive dealer (Agarwal et al., 2008). It is essential to note that the risks of subprime auto

lending have been minimally investigated in the literature (Heitfield and Sabarwal, 2004; Agarwal et

al., 2008) and this study examines subprime auto loan risk from the viewpoint of customer churn

which transpires after a direct loan agreement has been established. The next section begins by

briefly describing extant banking churn literature and then transitions to the topic of subprime auto

loan churn.

2.2 Churn and Banking

Banking churn has been examined by multiple researchers in varying contexts (Chiang et al.,

2003; Khan et al., 2010; Ali & Arıtürk, 2014) as it is a perpetual problem to financial institutions.

Banking churn is best defined as a banking customer severing a B2C or B2B partnership. Specific to

evaluating customer churn in the banking industry, Zoric (2016) contended that customers who used

multiple banking products were less likely to churn. Zoric constructed a neural network model that

accounted for the role of gender, age, average monthly income, consumer status (pensioner, student,

employed, unemployed), whether the customer banked online or not and whether they used two or

more bank products. Empirically, Zoric was able to show that customers who used fewer than three

products were most likely to churn. Zoric’s findings suggest that bankers should offer customer

incentives that would entice high-risk churners to purchase additional banking products.

Keramati et al. (2016) studied customer banking churn in the context of electronic banking.

The authors studied the influence of customer dissatisfaction (length of customer association,

number of customer complaints), service usage (the total number and transactional amounts of

usage) and a host of customer related variables (age, gender, type of career, educational level) on

churn for various electronic banking portals (internet bank, telephone bank, mobile bank, ATM).

Using decision tree analysis, the researchers were able to isolate a specific profile for churners.


Credit card churn has also been studied by various research scholars (Anil Kumar and Ravi,

2008; Farquad et al., 2014; Rajamohamed and Manokaran, 2017). Nie et al. (2011) predicted

customer credit card churn for a Chinese bank by using both logistic regression and decision tree

techniques. Specifically, the researchers obtained a data warehouse of sixty million customers

containing personal details for each credit card holder, basic credit card information points,

transactional information, abnormal usage data and risk details. In sum, this consisted of 135

variables. After testing for multicollinearity, the researchers found that logistic regression was

superior to decision trees in predicting credit card churn. Given that customer banking churn has

been investigated in a number of different contexts (i.e. retail banking, credit card churn and

electronic banking churn), this study extends banking churn research by specifically studying the role

of subprime auto loan churn.

2.3 Subprime Auto Lending

Subprime auto lending has been on the rise since 2008 (Seide, 2012) but is very costly and

risky to financial institutions (Heitfield and Sabarwal, 2004; Agarwal et al., 2008). Subprime lending

differs from prime lending in that it is characterized as financial lending to higher risk borrowers as

evinced by their lower credit scores and poorer credit history, higher loan to value ratios (LTVs),

lower safety net reserves and asset holdings (Courchane et al., 2004). A subprime borrower generally

possesses a credit score in the range of 501-600 (Huffman, 2018) and borrowers whose credit scores

are below this range are categorized as deep subprime borrowers. This form of lending enables

banks to charge higher interest rates and processing fees and to impose prepayment penalties on subprime

borrowers (Courchane et al., 2004). However, it also poses risks to financial institutions. These risks

are further described below.


2.4 Financial Risks to Subprime Auto Lending

The various drivers to auto loan prepayment and default for both prime and subprime

borrowers have been minimally investigated. Notable exceptions include Heitfield and Sabarwal

(2004) and Agarwal et al. (2008). Heitfield and Sabarwal (2004) examined the case of subprime

borrowers on a sample of 3,595 monthly pool loan observations obtained from various automobile

finance companies designated by Moody’s. The authors utilized hazard function analysis to observe

the impact of loan age (seasoning), macroeconomic variables (unemployment rate, household debt-

service burden, personal bankruptcy cases, and one year treasury rates) and loan issuers (differences

among subprime lenders) on loan default and prepayments. The researchers found that prepayment

rates increase with loan age and that the current market interest rate did not impact prepayment

rates. Moreover, as a borrower's credit history improved there was a tendency to switch to prime

rate loans and prepay by selling one’s vehicle. Loan defaults were largely triggered by disruptions to

household income. The authors notably observed customer heterogeneity among subprime

borrowers and recommended financial institutions develop target market strategies to address

differing subprime borrower segments.

Along similar lines, Agarwal et al. (2008) conducted a regression analysis on a sample of

24,384 direct auto loans issued over a five year period by a large financial institution. The authors

were able to capture the effects of customer borrower risk factors (FICO score, monthly income,

monthly disposable income, borrower age) economic conditions (county unemployment rate, the

three year treasury note rate), consumption characteristics (type and model of car purchased, year of

vehicle being financed, loan amount, loan to value ratio (LTV), contract rate) and a host of control

variables (state of residence dummies, owner age, age of loan, quarter dummies, state specific

effects) in predicting both default and prepayment outcomes. Essentially, the researchers found that

auto loan prepayments are likely to increase on new cars, lower credit risk customers are more likely


to prepay, a rise in the LTV ratio decreases the odds of prepayment, income increases result in a rise

of prepayments and the odds of prepayment increase when there is a decline in the treasury note

rate. Furthermore, auto loan defaults were more likely with an increase in the LTV ratio. Used car

owners were more likely to default with a rise in the unemployment rate. Lastly, the odds for default

increased for lower FICO score customers and with a decline in the three year treasury note rate.

Bhardwaj and Bhattacharjee (2010) specifically studied personality traits to predict auto loan

default using a survey methodology. The authors examined two of the four dimensions of the

Money Attitude Scale (MAS) (anxiety and power prestige) on a sample of 501 Indian banking

consumers and determined consumer likelihood to both seek auto loan services and default on loan

payments using discriminant analysis. The authors found that consumers who scored high on power

prestige defaulted less, while those who exhibited high anxiety were more likely to default.

Thus, the extant literature reveals two major financial risks with auto loan lending. The first

is the risk of customer loan default. Customer default is an issue because it impacts the financial

bottom line of lending institutions and restricts their lending potential. Furthermore, loan defaulting

results in lower credit scores which impact credit accessibility (Sengupta and Bhardwaj, 2015). The

second major risk is prepayment of the loan. This entails the borrower paying the loan early and

poses the problem of lost interest income to the auto loan lender (Agarwal et al., 2008). The

switching behavior results in the loss of business income, impacts CLV (Neslin et al., 2006; Fader

and Hardie, 2010; Braun and Schweidel, 2011) and dampens relationship marketing efforts

(Risselada et al., 2010). From the perspective of the borrower, the action of prepaying the loan can

entail penalty fees to the borrower which may have to be incurred at the expense of qualifying for a

prime rate loan. Nevertheless, prepayment is in the best interest of the consumer as it improves their

financial status and helps ensure they are not subjected to unethical lending practices. This


safeguards society and fosters deeper trust and confidence in lending institutions. Hence, churn is

advantageous to the consumer but disadvantageous to the lending institution. The action of

prepaying the auto loan or switching to a prime rate loan becomes the basis for defining auto loan

churn (Agarwal et al., 2008) and is described further in the methodology.

2.5 The 4 C’s to lending

The 4 C’s of lending refer to: Character, Collateral, Capacity and Credit. These four factors

are important because they influence the consumer and business lending process by epitomizing a

combination of the head, heart and gut (Jankowicz and Hisrich, 1987). The 4 C’s entail a

combination of both quantitative as well as qualitative assessment (Jankowicz and Hisrich, 1987). Firstly,

character refers to individual borrower characteristics that are used to evaluate credit worthiness.

Quantifying character is not always easy and firms must be conscientious about finding appropriate

proxy measures for this particular element. Secondly, collateral comprises the down payment or

pledge borrowers make to obtain a loan (Jimenez and Saurina, 2004). Thirdly, capacity pertains to the

borrower's ability to repay and finally credit refers to the borrower's credit history (score), which

serves as a track record of their ability to meet financial obligations (Kiisel, 2013). Very few scholarly

studies have investigated the impact of the 4 C's on consumer and business lending practice (Haron

and Shanmugam, 1994; Jimenez and Saurina, 2004). Key insights from some of these studies are

briefly described below.

Haron et al. (1994) examined differences in Malaysian loan officer evaluations of customer

character, capacity, capital, collateral and condition among a sample of business owners seeking a

business loan. The authors' inclusion of condition factored in the environment in which a business

operates. Haron et al. (1994) found that assessing customer character was most influential in making

loan decisions but remains a common challenge for loan officers. In this study, the authors defined

customer character as “attitude, level of knowledge in all aspects of business, and the way of


thinking of the borrower” (Haron and Shanmugam, 1994, p.90). Correspondingly, Jimenez and

Saurina (2004) explicitly examined the influence of collateralized loans on defaulting among Spanish

financial institutions. They found that collateralized loans accounted for a higher level of bank loan

defaults. Additionally, Moti et al. (2012) also mentioned the individual factors of customer character,

credit, capacity, collateral and conditions as critical to evaluating microfinance requests.

Nevertheless, I am not aware of studies that have explicitly investigated the 4 C’s in the

churn literature. It is argued that effective customer character assessment is all the more critical

when providing financial services to a population of subprime borrowers since additional factors

besides borrower credit scores, repayment capacity and collateral need to be considered. Moreover,

established industry proxies for customer character are believed to serve as measures of customer

reputation (“Innovative Funding Services”, 2019) and serve as signals of trustworthiness,

commitment and dedication to the repayment process. Most often, customer character is critically

assessed when the other 3Cs are difficult to evaluate and do not provide adequate information to

make a lending decision. Also, it is believed that effective screening for customer character has the

ability to strengthen firm-customer relationships, thereby decreasing the odds of severing the

partnership. Subsequently, the following hypothesis is developed and serves as the basis

for empirical analysis.

H1: There is a difference between the full model (4Cs) and the restricted model (customer character) in churn

prediction

2.6 Dataset

The dataset for this study has been obtained from a Midwestern credit union that has both

branch and loan office locations in three major Midwestern states. This credit union is unique in that

it has a large number of subprime customers. Customer data was supplied after taking precautionary


measures to omit direct client identification details in order to preserve privacy. The dataset includes

all direct used car auto loans issued by the credit union over a three year time interval (2014-2016) by

merging customer loan application details, credit bureau information, credit union membership

details, customer demographic as well as actual loan details. In total, this consists of a combination

of 22,261 churn and non-churn observations. Each of these observations represents transactional

data of used car auto loan contracts. The data set has randomly been split into training (i.e.

approximately 80% of 5,543 churn observations) and test sets (i.e. roughly 20% of 5,543 churn

observations), as is standard practice in churn model generation (Keramati et al., 2016; Koturwar et al.,

2015; James et al., 2013) in order to avoid overfitting. This splitting process is further explained

below in the results section. For the purposes of building churn models, the original dataset of

22,261 observations will be balanced (an equal number of churners and non-churners) as the class

imbalance problem can bias the classification prediction model (Chawla et al., 2009). However, it

should be noted that prior to balancing, the dataset was heavily skewed towards a high incidence of

churners (i.e. the majority class, which consisted of roughly 75% of the dataset).
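As a rough sketch of the balancing and splitting steps just described, the snippet below undersamples the majority class and then performs an approximately 80/20 split with scikit-learn. The file name and the churn column label are assumptions for illustration, not the credit union's actual field names.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file and column names; the actual warehouse schema is not public.
loans = pd.read_csv("auto_loans.csv")

# Balance the classes by undersampling the majority class (churners, roughly 75%).
churners = loans[loans["churn"] == 1]
non_churners = loans[loans["churn"] == 0]
n = min(len(churners), len(non_churners))
balanced = pd.concat([
    churners.sample(n=n, random_state=42),
    non_churners.sample(n=n, random_state=42),
])

# Roughly 80/20 train/test split, stratified so both sets remain balanced.
train, test = train_test_split(
    balanced, test_size=0.20, stratify=balanced["churn"], random_state=42
)
print(len(train), len(test))
```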

2.7 Modeling Techniques

Before churn prediction modelling techniques can be applied, a detailed account of the basic

vocabulary that constitutes these models must be understood. Such vocabulary includes: target variable,

quantitative versus qualitative variables and categorical variables. The notion of class is also

central to churn modeling and is also described below.

Target Variable:

A target variable is also commonly referred to as an outcome variable, response variable or

dependent variable. The prediction or estimation of a target variable involves the usage of various

predictor variables (i.e. commonly denoted as xi where i= 1, 2, 3 ….). Target variables are commonly

marked as Yi, where Yi = β0 + β1xi + e in simple regression form (Takezawa, 2014).


Quantitative and Qualitative (Categorical) Response Variables:

Variables are either categorized as being quantitative or qualitative in nature. A quantitative

response variable takes on a numeric value and can include responses such as a person’s height,

income or age. Statisticians typically refer to quantitative responses as candidates for regression

based problems. Alternatively, a non-numeric variable that possesses various categories or classes is

known as a qualitative variable. Common examples of qualitative variables include one’s gender,

default/no default on paying a loan and detection/non-detection of a particular disease. Qualitative

response variables are referred to as classification problems and serve as the basis of this dissertation

(James et al., 2013).

The ability to predict dependent variables which are categorical or qualitative (i.e. have

classes) requires utilizing classification techniques (James et al., 2013). There are a number of

different classification techniques (i.e. classifiers) that can be employed to predict a categorical

variable. Generally speaking, classification techniques are grouped into either supervised or

unsupervised statistical learning categories. Supervised learning (i.e. directed) revolves around

building a predictive model for an outcome variable or estimating its value based on various

predictor (input) variables. The outcome variable’s class is known in advance. In contrast,

unsupervised learning (undirected) is descriptive and entails only knowing the input variables. There

is no output variable or class with unsupervised learning. Since this study seeks to predict auto loan

churn and the target variable is qualitative and known in advance (i.e. churn/no churn), supervised learning

classification techniques are employed (Koturwar et al. 2015, James et al., 2013). More specifically,

churn models are an example of binary classification (Coussement et al., 2008).

The widespread traditional supervised learning methods of logistic regression, linear

discriminant analysis, decision trees and random forests are applied as part of this analysis (James et

al., 2013; De Caigny, 2018). To the best of my knowledge, none of these techniques have been


collectively applied to model auto loan churn to date. Subsequently, the results of each classification

technique will also be compared on the basis of accuracy and error rates as well as other model

performance measures which are commonly obtained from confusion matrices. A detailed

description of each of these methods is captured below.

Logistic Regression:

Logistic regression is a form of regression that is appropriately used when the dependent

variable in a study is dichotomous or categorical (binary) and when there is interest in knowing the

probability that a dependent variable is contained in a particular category (Nie et al., 2011; James et

al., 2013; Coussement et al., 2017). Since the target variable is discrete (Delen, 2010) the probability

of the dependent variable being contained in a particular class will always be greater than 0 and less

than 1. Logistic regression models are founded on the logistic function. This function produces an S

shaped curve. The equation of the logistic function is depicted below (James et al., 2013).

Logistic Function:

p(x) = e^(β0 + β1X) / (1 + e^(β0 + β1X))

which is equivalent to

p(x) = 1 / (1 + e^(-(β0 + β1X)))
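A small numerical illustration of the logistic function is given below; the coefficient values are made up purely to show how the S-shaped curve maps any input onto a probability strictly between 0 and 1.

```python
import numpy as np

def logistic(x, b0=-2.0, b1=0.05):
    # p(x) = 1 / (1 + e^-(b0 + b1*x)); the output always lies strictly between 0 and 1.
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

# Illustrative inputs, e.g. months at current residence.
for months in (3, 24, 120):
    print(months, round(logistic(months), 3))
```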

The method of maximum likelihood estimation (MLE) is used to determine the parameters

of the logistic model. In order to employ this parametric technique, firstly linearity has to be shown

between the predictor variables and the logit of the dependent variable. This is determined by

examining the interactive effects of each of the predictor variables with its log transformation (Field,

2012). Additionally, there should not be multicollinearity between predictor variables. Finally, much

like ordinary regression, independence of errors must be observed in the dataset (Field, 2012).


Logistic regression is an attractive classification technique because it provides the advantages

of predictive power and comprehensibility (Verbeke et al., 2011; Risselada et al., 2010; Martens et al.,

2011) even against more sophisticated data mining techniques (Coussement et al., 2017).

Additionally, it is able to model linear relationships effectively (De Caigny et al., 2018) and is relevant

as it provides robust results that have been used as common benchmark measures by which other

predictive churn techniques can be compared (Buckinx and Van den Poel, 2005; Coussement et al.,

2017). Moreover, this classification technique is appealing due to easy interpretability (Owczarczuk,

2010). However, the primary limitations to this classification technique include rigid normality

assumptions (Delen, 2010), challenges when modeling interactive effects (De Caigny et al., 2018)

and non-linear relationships between the predictor and outcome variable (Zhang et al., 2007; De

Caigny et al., 2018). Yet, despite its limitations, logistic regression has been a widely used technique

to model churn in a variety of industries (Delen, 2010; Nie et al., 2011; De Caigny et al., 2018).

Consequently, I seek to apply this classification technique for the purposes of this study.

The performance of Logistic Regression models can be assessed by examining a number of

different measures. The log-likelihood (LL) statistic is akin to the residual sum of squares in multiple

regression (Field, 2012), accounting for unexplained information when comparing fitted models. The

mathematical expression shown below follows the representation as illustrated by Field (2012).

Log-likelihood = ∑ (i = 1 to N) [Yi ln(P(Yi)) + (1 - Yi) ln(1 - P(Yi))]

When comparing models, larger log-likelihood values denote more unexplained

observations. An extension of the log-likelihood measure is deviance which is (-2*log-likelihood or -

2LL). The comparison of deviance values between models enables the calculation of the likelihood

ratio containing a chi-squared distribution with degrees of freedom calculated according to the


difference in the number of predictor variables between each of the models in the study.

Mathematically, following the nomenclature as laid out by Field (2012), this is shown as follows:

χ2 = 2LL(new model) - 2LL(base model)

df = k(new model) - k(base model)

The Akaike information criterion (AIC) is commonly used to evaluate goodness of model fit

when conducting logistic regression analysis. AIC measures enable model comparisons by penalizing

models that contain more explanatory variables (Field, 2012). Smaller AIC values denote better fit.

The formula to calculate AIC is:

AIC = -2LL + 2k, where k = the number of predictors.

Lastly, when conducting regression analysis, new models and base models are compared on

the basis of an F test obtained by conducting an Analysis of Variance (ANOVA). The F test, also

referred to as the model chi-square statistic in regression analysis, allows for easy model comparison

(Field, 2012).
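The deviance, likelihood-ratio and AIC comparisons described above could be carried out along the following lines with statsmodels. This is only a sketch on simulated data; the full/restricted split of predictors and the variable names are illustrative of the approach, not the study's actual estimation.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 1000
X_full = pd.DataFrame({
    "character": rng.normal(size=n),
    "credit": rng.normal(size=n),
    "capacity": rng.normal(size=n),
    "collateral": rng.normal(size=n),
    "controls": rng.normal(size=n),
})
# Simulate a binary churn outcome driven mainly by two of the predictors.
lin = -0.3 + 0.8 * X_full["character"] + 0.4 * X_full["credit"]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin)))

full = sm.Logit(y, sm.add_constant(X_full)).fit(disp=0)
restricted = sm.Logit(y, sm.add_constant(X_full[["character", "controls"]])).fit(disp=0)

# Likelihood-ratio (deviance difference) test: chi-squared with df equal to the
# difference in the number of predictors between the two models.
lr_stat = 2 * (full.llf - restricted.llf)
df_diff = int(full.df_model - restricted.df_model)
p_value = stats.chi2.sf(lr_stat, df_diff)

print(f"AIC full: {full.aic:.1f}   AIC restricted: {restricted.aic:.1f}")
print(f"LR chi-square: {lr_stat:.2f}   df: {df_diff}   p-value: {p_value:.4f}")
```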

Linear Discriminant Analysis:

Linear Discriminant analysis (LDA) is a widely applied machine learning classification

technique that also has been utilized in churn analysis (Sharma and Panigrahi, 2011; Khan et al.,

2015). This classification technique is also popular as it resembles multiple linear regression and can

generate multiple equations for each state of the outcome variable (Gnanadesikan, 1988).

Additionally, it is a recommended classification technique when unbalanced samples are used and

when the target variables are reported as non-numerical (Khan et al., 2015). Moreover, it is also

popular due to its ability to provide a confusion matrix for analysis which can be compared to the

confusion matrix obtained from logistic regression. According to James et al. (2013), LDA is


applicable when two or more response categories exist and each of the classes is well separated.

LDA assumes that the class observations drawn are normal and tends to perform better than logistic

regression when these assumptions are met (James et al., 2013).
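A minimal scikit-learn sketch of LDA classification is shown below on synthetic data; the predictors are illustrative placeholders rather than the study's variables, and the resulting confusion matrix is the kind that would be compared against the logistic regression matrix.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=2000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)
lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)

print("Test accuracy:", round(lda.score(X_te, y_te), 3))
print(confusion_matrix(y_te, lda.predict(X_te)))
```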

Decision Trees (DT):

DT classification methods involve building tree models that consist of a series of predictors.

Each of these predictors (attributes) within a training set are split repetitively until pure subsets are

obtained. This process of repetitive splitting is influenced by a particular entity’s (i.e. customer)

characteristics (Denison et al., 1998; James et al., 2013). The basic anatomy of a DT comprises

both decision nodes and leaf nodes. The decision node represents a predictor variable and signifies the

point where binary splits transpire. Decision nodes are also known as internal nodes (Mendez et al.,

2008). The leaf node, also known as the terminal node (Mendez et al., 2008), represents the

output variable (binary outcome variable) and graphically is depicted as the end of the branch. It is

the terminal node that serves as the basis for churn prediction, for it reports the category with the

majority of cases. Extant literature has revealed that there are four major DT machine learning

algorithms that are commonly utilized: 1) Classification and Regression Trees (CART) 2) C4.5 3)

chi-squared automatic interaction detection (CHAID) and 4) C5.0. DTs serve as the foundation to

other tree methods like random forests and ensemble forests which essentially involve aggregating

multiple decision trees (James et al., 2013). Figure 2.1 below provides a schematic of the basic

anatomy of a DT.


Figure 2.1. Anatomy of a Decision Tree (illustrative splits on Employer < 6 months and Current Residence < 3 months, with Yes/No branches leading to Churn and No Churn outcomes)

The process of binary splitting an attribute relies on selecting the right attributes to split.

Correct attribute selection is dependent on calculating either entropy measures (C4.5) or choosing

the Gini criterion (CART) based on the type of DT algorithm (Verbeke et al., 2012). DT analysis is

quite popular due to simplicity, graphical layout and ease of interpretation (James et al., 2013;

Höppner et al., 2017). DTs provide an appropriate schematic to model both quantitative and

qualitative decision making questions without needing to create dummy variables or transformations

(James et al., 2013; Höppner et al., 2017). Moreover, DTs are also able to monitor non-linearities

and are easy to compute (Höppner et al., 2017). However, DTs also have their disadvantages.

DT results may not always be as predictively accurate as other methods. Furthermore, minor

changes in the dataset can result in non-robust predictions (James et al., 2013). However, this

classification technique has been used frequently to model churn (De Caigny, 2018; Verbeke et al.,

2011; Owczarczuk, 2010). For example, Coussement and De Bock (2013) reference Murthy (1998)

who observed over 300 references to decision tree analysis in a variety of research studies.
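A brief sketch of a CART-style decision tree (Gini criterion) with scikit-learn follows; the data are synthetic and the feature names are hypothetical stand-ins for the kinds of predictors discussed in this study.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))
y = (X[:, 0] > 0.2).astype(int)   # toy churn rule, for illustration only

# CART-style tree using the Gini splitting criterion.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=2)
tree.fit(X, y)

# Print the fitted splits (internal nodes) and class outcomes (leaf nodes).
print(export_text(tree, feature_names=["residential_months", "net_worth", "ltv"]))
```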



Random Forests:

Random forests (RFs) are an extension of decision trees. The forest component to this

supervised black box learning method comprises many decision trees (Breiman, 2001;

James et al., 2013; Mendez et al., 2008). According to Mendez et al. (2008) as many as 500 unpruned

trees can exist in a RF. The random component of RFs is constructed by utilizing bootstrapped

training samples with replacement (i.e. bagging). This means that a random percentage of

respondents (e.g. 20%) and a random percentage of predictor variables (e.g. also 20%) are used to

build each decision tree in the forest. By only sampling a certain number of predictors at a time, the

correlation between decision trees is reduced (Mendez et al., 2008). The classification outcome from

each tree is determined to arrive at an overall prediction through a process of majority voting.

Collectively, the category that has obtained the most votes across all the trees sampled determines

the predictive outcome (Mendez et al., 2008).

James et al. (2013) discuss the advantages of random forests. Random forests are known to

improve predictive accuracy as more than one tree is being assessed. Since multiple trees are being

assessed, the degree of variance in the model is also reduced thereby reducing any overfitting of the

data. This classification technique also assesses for variable importance by predictive ability (Mendez

et al., 2008) and does not require adherence to linearity assumptions between any independent and

dependent variables and is forgiving of outlier values (James et al., 2013). Variable importance scores

enable data scientists to decipher which predictor variables are most important using the Gini index.

However, like other classification techniques, there are also disadvantages. The aggregation of

multiple DTs poses interpretability challenges, as it can be difficult to ascertain the direct impact of

predictor variables on the dependent variable (Mendez et al., 2008). Furthermore, this supervised

method does not work well with small samples.
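The sketch below illustrates the random forest mechanics described above (bootstrapped trees, a random subset of predictors at each split, and Gini-based variable importance scores) using scikit-learn on synthetic data; the feature names are illustrative stand-ins for the character proxies.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
names = ["net_worth", "assets_to_age", "employment_months", "residential_months"]
X = rng.normal(size=(2000, 4))
y = (0.9 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(size=2000) > 0).astype(int)

rf = RandomForestClassifier(
    n_estimators=500,        # many unpruned trees, in the spirit of Mendez et al. (2008)
    max_features="sqrt",     # sample a subset of predictors at each split
    oob_score=True,          # out-of-bag accuracy from the bootstrap samples
    random_state=3,
).fit(X, y)

print("OOB accuracy:", round(rf.oob_score_, 3))
for name, imp in sorted(zip(names, rf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:>20s}  importance = {imp:.3f}")
```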



Table 2.1 Advantages and Disadvantages of Each of the Classification Methods

Logistic Regression
Advantages: Predictive power; comprehensibility/interpretability; models linear relationships; benchmark against which other classification methods are compared
Disadvantages: Rigid normality assumptions; challenges when modeling interactive effects and non-linear relationships; problematic with correlated predictors; challenging when the number of predictors is greater than the sample

Linear Discriminant Analysis
Advantages: Intuitive like MLR; conducive to unbalanced samples; works well with small samples
Disadvantages: Normality assumptions exist; multicollinearity issues

Decision Trees
Advantages: Simplicity and ease of interpretation with graphical basis; no linearity assumptions; no need for dummy variables or transformations
Disadvantages: Not always as predictively accurate as other methods; minor changes in the dataset can result in non-robust predictions; overfitting

Random Forests
Advantages: Multiple trees are assessed; reduces overfitting; no linearity assumptions; able to obtain variable importance measures; allows for outliers and noise in training data
Disadvantages: Hard to interpret; does not work well with small samples

2.8 Estimation Specifications:

Consequently, the following equations are derived for logistic regression and LDA:

logit(p) = ln(odds of p) = b0 + b1X1 + b2X2 + b3X3 + b4X4 + b5X5

odds of p = e^(b0 + b1X1 + b2X2 + b3X3 + b4X4 + b5X5)

probability of p = odds of p / (1 + odds of p)

p* = 1 / (1 + e^(-(b0 + b1X1 + b2X2 + b3X3 + b4X4 + b5X5)))

logit(q) = ln(odds of q) = b0 + b1X1 + b5X5

odds of q = e^(b0 + b1X1 + b5X5)

probability of q = odds of q / (1 + odds of q)

q* = 1 / (1 + e^(-(b0 + b1X1 + b5X5)))

where X1 = Character, X2 = Credit, X3 = Capacity, X4 = Collateral and X5 = Controls; p* represents the probability of auto loan churn for the Full Model and q* represents the probability of auto loan churn for the Restricted Model.

This translates to:

Logistic Full Model = Intercept + Character + Credit + Capacity + Collateral + Controls (1)

Logistic Restricted Model = Intercept + Character + Controls (2)

Similarly, the LDA equations are mathematically expressed as:

No Churn outcome Full Model = b0+ ∑bkXk (k = 1, 2, 3, 4, 5)

Yes Churn outcome Full Model = b0 + ∑bkXk (k = 1, 2, 3, 4, 5)

No Churn Outcome Restricted Model = b0 +∑bkXk (k = 1, 5)

Yes Churn Outcome Restricted Model = b0 +∑bkXk (k = 1, 5)

This translates to the following:

No Churn Full Model = Intercept + Character + Credit + Capacity + Collateral + Controls (3)



Yes Churn Full Model = Intercept + Character + Credit + Capacity + Collateral + Controls (4)

No Churn Restricted Model = Intercept + Character + Controls (5)

Yes Churn Restricted Model = Intercept + Character + Controls (6)

2.9 Comparing Model Performance

To evaluate the effectiveness of each of these supervised learning methods (i.e. classifiers),

confusion matrices will be used. A confusion matrix is a 2X2 table that enables the comparison

between a predicted binary classifier and the actual class observed in both the training and testing

datasets. The comparison between the actual and predicted class values enables the calculation of

both error (i.e. misclassification) and accuracy rates (James et al., 2013; Chen, 2015). In addition to

error and accuracy rates, three other metrics which include precision, recall and F1 (F score) can also

be calculated with confusion matrices. The addition of these measures is strongly suggested as the

sole reliance on accuracy and error rates can be problematic if there is a class imbalance issue with

the dataset. Despite balancing the dataset (i.e. see below), the confusion matrix analysis

includes all of these predictive measures. However, for the purposes of predictive performance

evaluation, the final analysis will concentrate on accuracy measures (i.e. also error rates) as there is

no class imbalance problem (Chawla et al., 2009).

Table 2.2 displays the general layout of a confusion matrix where TP stands for True

Positive, FN stands for False Negative, FP represents False Positive and TN is the

abbreviation for True Negative (Ohsaki et al., 2017). In the context of this paper, TP implies

correctly classifying an actual churner as a churner. TN means correctly classifying a non-

churner as a non-churner. A FN classification means classifying a churner as a non-churner

and is regarded as a type II error, and a FP classification is classifying an actual non-churner

as a churner (i.e. a type I error). Thus, misclassifications are due to FN and FP classifications.

FN misclassifications are viewed as more costly than FP classifications in this study as the


loss of a customer leads to lost interest earnings for the bank. Similarly, FP classifications

lead to wasted marketing efforts (Van Haver, 2013) on a secure customer.

Chen et al. (2015) provide the formulas for accuracy, precision, recall and F1 below. Collectively, these measurements will be compared among the various predicted models.

Table 2.2 Elements of the Confusion Matrix

                                Actual
Prediction          No Churn        Churn
No Churn            TN              FN
Churn               FP              TP

Accuracy = (TP + TN) / (TP + FP + FN + TN) = 1 - Error Rate (1)

Error = (FP + FN) / (TP + FP + FN + TN) = 1 - Accuracy Rate (2)

Precision = TP / (TP + FP), i.e. TP divided by the total predicted positives (3)

Recall = TP / (TP + FN) = TP / P, i.e. TP divided by the actual positives (4)

F1 = (2 * Precision * Recall) / (Precision + Recall) (5)
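As a small illustration of equations (1)-(5), the sketch below computes each measure from hypothetical confusion-matrix counts; the counts themselves are made up purely for demonstration.

```python
# Hypothetical confusion-matrix counts for illustration only.
TP, TN, FP, FN = 420, 395, 55, 80

accuracy = (TP + TN) / (TP + TN + FP + FN)
error = (FP + FN) / (TP + TN + FP + FN)          # equals 1 - accuracy
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} error={error:.3f} "
      f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```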


Accuracy rates denote the number of correct predictions as a percentage of the total number

of predictions. Accuracy rates lie in the range of 0 to 1, with higher values denoting higher accuracy.

In contrast, lower error rates (i.e. also known as misclassification rates) signify better model

performance (i.e. more accuracy). However, accuracy measures assume equal misclassification costs

between FN and FP categorizations (Van Haver, 2013).

Precision, also known as the positive predictive value (PPV) is the ratio of the number of

correct positive predictions to the total number of positive predictions. Precision values range from

0 to 1 with the best precision value being 1.0 and the worst 0. Precision identifies how often a

positive prediction is actually correct. However, precision

does not account for both types of misclassification. An increase in the precision measure typically implies a

decrease in the recall measure.

Recall or sensitivity, is the number of correct positives (TP) as a percentage of the total

number of actual positives. This measure is also known as the TP rate. The best recall value is 1.0

and the worst is 0. Recall denotes how often the classifier predicts a TP occurrence by obtaining the

TP rate. Recall measures do not account for both types of misclassification (Van Haver, 2013). An

increase in recall typically implies a decrease in precision.

Finally, the F1 measure is the harmonic mean (weighted average) of the precision and recall

measures. F1 scores also are in the range of 0 to 1 with higher scores indicating better performance.

It is important to note that F1 scores attach equal importance to both precision and recall measures

and do not factor true negatives into the analysis. The F1 scores are attractive as they factor both

measures of precision and recall into the analysis and are not reliant on a single measure. Besides F1


scores there are also F2 (recall is weighted greater than precision) and F0.5 (precision is weighted

greater than recall) scores as well (Sokolova et al., 2006).

Additionally, confusion matrices also allow for the creation of a Receiver Operating

Characteristics (ROC) curve which is a popular tool to compare models (James et al., 2013;

Coussement et al., 2017) due to its intuitive nature and ability to provide clear statistical

interpretations (Coussement et al., 2017). This curve visually illustrates the overall performance of

each predicted classifier by examining the area under the curve (AUC) at several threshold values

(i.e. for different probability cut off values of the dependent variable). Essentially, the larger the

AUC value the more accurate the model (i.e. a higher AUC value means the predicted classifier is

correctly classifying a churner from a non-churner in the dataset). An accurate classifier should yield

an AUC value larger than 0.5 (Verbeke et al., 2012).

The ROC curve contains two dimensions (i.e. shown as the x and y axes of

the plot). The y axis displays the True Positive (TP) rate and entails the percentage of churners that

are correctly classified as churners. The TP rate is referred to as sensitivity. The x axis is typically

depicted as (1-specificity), where specificity amounts to the TN rate. Thus, specificity denotes the

percentage of non-churners that are correctly classified as non- churners at various threshold values.

The x axis is also often referred to as the False Positive rate. It is important to note that both the

measures of sensitivity and specificity share an inverse relationship with one another. Hence, an

ROC curve reveals accuracy, sensitivity (TPR) and 1-specificity among each of the predicted

classifiers and serves as one additional way to compare each of the predictive models in this study

(James et al., 2013).
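A hedged sketch of how an ROC curve and AUC value might be computed with scikit-learn is shown below. The data and the logistic classifier are synthetic stand-ins; in practice, each fitted churn model's predicted probabilities on the test set would be passed to roc_curve and roc_auc_score in the same way.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 4))
y = (X[:, 0] - 0.5 * X[:, 2] + rng.normal(size=2000) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
probs = clf.predict_proba(X)[:, 1]           # predicted churn probabilities

fpr, tpr, thresholds = roc_curve(y, probs)   # (1 - specificity) vs. sensitivity
print("AUC:", round(roc_auc_score(y, probs), 3))
```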

A series of studies have compared the predictive performance of different classification methods

(Abu-Nimeh et al., 2007). For example, Abu-Nimeh et al. (2007) investigated the predictive performance of seven

classifiers in predicting phishing emails on the basis of various confusion matrix measures. The authors determined


that random forest reported the highest accuracy rate (lowest error rate). In terms of evaluating predictive

performance according to AUC measures, the study found neural networks, random forest, support vector

machine, logistic regression, Bayesian additive regression trees and decision trees in order of highest to lowest AUC

measures.

Likewise, in the world of bioinformatics, Wu et al. (2003), compared the different classification methods of

linear discriminant analysis, quadratic discriminant analysis, k-nearest neighbor, support vector machine, bagging

and boosting decision trees and random forests on the ability to predict ovarian cancer in patients from mass

spectrometry (MS) data. Again, the authors found that random forest outperformed the other classification

methods on the basis of the prediction error rate (misclassification rate).

Finally, Barboza et al. (2017) compared the traditional predictive models of logistic regression, linear

discriminant analysis and the early artificial intelligence (AI) model of artificial neural networks to the machine

learning algorithms of support vector machines, bagging, boosting, and random forest to predict corporate

bankruptcy among ten thousand North American firms. The authors found that machine learning methods

reported ten percent more accuracy than traditional statistical and AI methods. In fact, random forest reported the

highest accuracy at 87% and an AUC of 0.92, making it the best performing of the classification methods.

Consequently, given the predictive performance indicators described above through confusion matrix

analysis, ROC curve comparison descriptions, AUC characteristics and the outline of the advantages and

disadvantages of each of the classification methods summarized in Table 2.1, the second hypothesis reads as

follows:

H2: The black box supervised learning method of RF is superior in predictive performance to the

other classification methods


2.10 Variables of Study

A number of variables have been extracted from the credit union’s data warehouse which largely consist of

raw, unaltered data. Only two variables, age and FICO score, needed to be transformed due to raw data issues.

Firstly, it was observed that there were a few cases of missing values for the AGE variable (did not record age of

customer at time of loan application) and a few instances where the values supplied for a particular customer were

questionable (i.e. age of 125 years). Consequently, any missing or erroneous values for this variable were neglected

and designated with an N/A label. Secondly, the raw data displayed a few records with a zero FICO score for select

loan transactions. While it is certainly possible for a customer to have a zero FICO score (especially if they are new

to the country or don’t have prior credit history), a FICO score of 525 served as the minimum score after

calculating the charge off ratio for each loan contract. In other words, the observations that reported a zero FICO

score approximately contained the same charge off ratio as those observations where the FICO score was 525.

Consequently, a zero FICO score was altered to 525 in order to serve as a predictor variable for the purposes of

analysis. This variable is denoted as FICON in the dataset. After making these adjustments a total of

22,261observations were available.
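To make these cleaning steps concrete, the following is a minimal pandas sketch of the two transformations described above; the file name, the 18-100 age bounds and the column labels (AGE, FICO, FICON) are hypothetical stand-ins for the warehouse extract rather than the credit union's actual rules.

```python
import numpy as np
import pandas as pd

# Hypothetical extract of the credit union's loan-level data warehouse.
loans = pd.read_csv("auto_loan_extract.csv")

# AGE: flag missing or implausible values (e.g., an age of 125) as not available.
# The 18-100 bounds are illustrative rather than the credit union's actual rule.
loans["AGE"] = loans["AGE"].where(loans["AGE"].between(18, 100), np.nan)

# FICO: recode zero scores to the 525 floor, since zero-score contracts showed
# roughly the same charge-off ratio as contracts scored at 525; store as FICON.
loans["FICON"] = loans["FICO"].replace(0, 525)
```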

The ability to determine accurate proxy variables to measure each of the 4Cs is helpful for modeling real-world phenomena (Stahlecker and Trenkler, 1993). As mentioned previously, character refers to individual borrower characteristics that are used to evaluate creditworthiness. This often includes personal and psychological details (Moti et al., 2012). Examples of personal details include income, occupation, age, personality, self-concept and life cycle stage. Psychological details are intangible and entail understanding the consumer's inner worth (Moti et al., 2012). Therefore, it is argued that customer character assessment encompasses an understanding of both personal and psychological details.

Table 1.1 identified the various determinants of churn based on the literature review of articles published after 2011. Table 1.1 clearly demonstrated that churn behavior is due to a combination of both entity and non-entity determinants. Applying this categorization scheme to the present problem, Table 2.3 illustrates how each of the variables has been classified. Furthermore, adhering to industry loan underwriting principles, each of the 4Cs is proxied as follows (Credit Guru Inc., 2019). The control variables outlined below have been obtained from previous studies (Heitfield and Sabarwal, 2004; Agarwal et al., 2008) and encompass a combination of both entity and non-entity determinants.

It is important to note that churn in the context of this study refers to prepayment of the auto loan or the switching behavior of the borrower (customer) to another, prime, lender. While prepayment and switching behavior are certainly advantageous to the borrower, they result in lost interest income to the financial institution issuing the auto loan. This lost income can be very significant when there is a large volume of churners (roughly 75% in this study). Therefore, churn here is specific to prepayment and switching behavior and should not be confused with loan default behavior.


Table 2.3 Summary of Key Variables (Determinants: E = Entity, NE = Non-Entity)

Variable Group | Variables | Type | Determinant
DV/Target variable | Churn (yes/no): prepayment of the loan or switching to a prime lender, i.e. the loan was closed before the end of its term | Binary (yes/no) | Not applicable
Character (IV of interest) | Residential Months (time in current residence, months); Employment Months (time with current employer, months); Assets to Age Ratio (ratio); Net Worth (dollars) | Integer and numeric values | E
Collateral | Loan to Value Ratio (ratio); Term of Loan (months) | Numeric and integer values | E
Capacity | Monthly Income (dollars); Existence of Co-borrower (factor); Debt to Income (ratio); Disposable Income (dollars) | Integer and numeric values | E
Credit | FICO (score) | Numeric value | E
Controls | Age (years); Exposure (value; a measure of risk that is greatest when there is a loss to the principal value of the asset); Unemployment Rate in the zip code of customer residence (percentage); Marital Status (factor: married, not married); Number of Dependents (count) | Integer and numeric values | E and NE


2.11 Results

Descriptive Statistics

Table 2.4 illustrates the number of churners and non-churners in the original sample prior to balancing the dataset and omitting missing values. Notice that the original dataset contains 16,669 churners and 5,592 non-churners, so there is a clear class imbalance problem: roughly 75% of the original dataset consists of churners. After removing missing values from the sample, the dataset was trimmed to 21,869 observations. Table 2.5 shows the number of churners and non-churners after this correction, which represents the complete dataset. This table also illustrates that the ratio of churners to non-churners in the complete dataset is 2.945; in other words, for every non-churner in the complete dataset there are approximately three churners. It is imperative to recognize this ratio when partitioning the complete dataset into training and validation (test) sets, as described further below, in order to build accurate churn prediction models.

Table 2.6 suggests that the average customer has lived in their present place of residence for a little more than seven years and has been employed for over six years. The average auto loan term is over three years. Moreover, the average customer has a net worth of $17,731.00, with considerable spread (variance) in this measure; the reliability of this measure is therefore questionable (i.e. a high standard error). The average loan to value ratio is approximately 1.5, meaning that both the average and the median consumer have borrowed more than the value of the vehicle securing the loan. Interestingly, both the average and median FICO scores surpass 600, which exceeds the common subprime borrower range of 501-600 (Huffman, 2018).

Table 2.4 Number of Churners and Non-Churners (Original Dataset)

Churned: 16,669
Non-Churned: 5,592
Total Sample Size: 22,261
Churn Rate: 75%


Table 2.6 Descriptive Statistics from Complete Dataset

Variable | N | Mean | Standard Deviation | Median | Standard Error
Residential months | 21,869 | 85.04 | 99.14 | 48 | 0.67
Employment months | 21,869 | 74.06 | 85.33 | 42 | 0.58
Term of loan | 21,869 | 39.12 | 12.09 | 38 | 0.08
Net worth | 21,869 | 17,731.48 | 98,520.23 | 1,544.19 | 666.21
Monthly income | 21,869 | 3,752.89 | 2,595.80 | 312 | 17.55
Coborrower | 21,869 | 0.21 | 0.41 | 0 | 0
Loan to value | 21,869 | 1.48 | 0.75 | 1.36 | 0.01
Assets to age | 21,869 | 1,688.10 | 2,907.04 | 2,907.04 | 19.66
FICO | 21,869 | 625.31 | 77.45 | 620 | 0.52

Histograms

A series of histograms is presented for the proxied variables of character and credit from the complete dataset (Figures 2.2-2.6). Both residential months and employment months are concentrated at lower values with long right tails and contain outliers. However, the major proxied variable for credit (i.e. the FICO score) is more normally distributed.


Figure 2.2 Residential Months

Figure 2.3 Employment Months


Figure 2.4 Assets to Age

Figure 2.5 Net Worth


Figure 2.6 FICO Score

Correlations and Variance Inflation Factor Tests

Table 2.7 shows that there are no multicollinearity issues among the industry-established proxy variables for character: each correlation coefficient is below .80 (Field, 2012). However, there are multicollinearity issues among the 4Cs and the identified controls. In particular, monthly income and disposable income are highly correlated (i.e. 0.79), as shown in Table 2.8. A variance inflation factor (VIF) statistic was also calculated for each variable and verified these results, as shown in Tables 2.9 and 2.10. The VIF test can detect subtler collinearity by assessing each predictor's linear relationship with the remaining predictors (Field, 2012). Field (2012) refers to Myers (1990), who argues that VIF values larger than 10 are problematic and indicate multicollinearity. This is clearly illustrated by the monthly income and disposable income variables in the training dataset, both of which are proxy variables for capacity.
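As a minimal sketch of this VIF screen, assuming the candidate predictors sit in a pandas DataFrame, statsmodels' variance_inflation_factor can reproduce the style of output in Tables 2.9 and 2.10; the function name vif_table and its argument are illustrative rather than the exact procedure used here.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(predictors: pd.DataFrame) -> pd.DataFrame:
    """One VIF per predictor; values above 10 flag multicollinearity (Myers, 1990)."""
    X = sm.add_constant(predictors)  # add an intercept for each auxiliary regression
    scores = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
    out = pd.DataFrame({"variable": X.columns, "VIF": scores})
    return out[out["variable"] != "const"].sort_values("VIF", ascending=False)
```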


Table 2.7 Correlation Coefficients for Restricted Model

 | Residential months | Employment months | Net worth | Assets to age
Residential months | 1 | | |
Employment months | 0.209 | 1 | |
Net worth | 0.161 | 0.160 | 1 |
Assets to age | 0.186 | 0.197 | 0.383 | 1

Table 2.8 Correlation Coefficients for Full Model (lower-triangular; columns follow the same order as the rows below: Residential Months, Employment Months, Net Worth, Assets to Age, Loan to Value, Term, Monthly Income, Coborrower, Debt to Income, Disposable Income, FICO)

Residential Months 1

Employment Months 0.209 1

Net Worth 0.161 0.16 1

Assets to Age 0.186 0.197 0.383 1

Loan to Value -0.051 -0.011 -0.116 -0.118 1

Term 0.028 0.029 0.036 0.139 0.016 1

Monthly Income 0.082 0.18 0.093 0.403 -0.052 0.121 1

Coborrower 0.059 0.074 0.059 0.244 -0.033 0.08 0.303 1

Debt to Income -0.02 -0.005 -0.125 0.035 0.036 0.078 -0.191 -0.028 1

Disposable Income 0.066 0.147 0.117 0.309 -0.054 0.073 0.794 0.257 -0.365 1

FICO 0.169 0.172 0.15 0.301 -0.018 0.079 0.184 0.145 0.013 0.14 1


Table 2.9 Variance Inflation Factor Scores for Full Model (Complete Dataset)

Variable | VIF
Residential months | 1.276
Employment months | 1.387
Net worth | 3.127
Assets to age | 4.519
Term of loan | 1.167
Loan to value | 1.170
Coborrower | 1.176
Debt to income | 1.190
FICO | 1.312
Age | 1.506
Exposure | 1.343
Unemployment rate | 1.031
Marital status | 1.018
Dependent number | 1.052
Monthly income | 20.798
Disposable income | 16.856

Table 2.10 Variance Inflation Factor Scores for Restricted Model (Complete Dataset)

Variable | VIF
Residential months | 1.239
Employment months | 1.341
Net worth | 2.148
Assets to age | 2.165
Age | 1.390
Exposure | 1.033
Unemployment rate | 1.020
Marital status | 1.009
Dependent number | 1.023

In order to test the major hypotheses (H1 and H2) and build the proposed models, it is standard practice to randomize, balance and partition the complete dataset into training and validation (test) sets. The balanced training dataset consists of 9,086 observations with 56 different variables; it contains 4,543 churners and 4,543 non-churners (i.e. approximately 80% of the 5,543 non-churners), as shown in Table 2.11. Table 2.11 also shows the number of churners and non-churners in the test set. The complete dataset (Table 2.5) indicated that the ratio of churners to non-churners was 2.945 (approximately three). Preserving this original ratio, the test set contains a total of 3,945 observations: 1,000 non-churners (i.e. the remaining 20% of the 5,543 non-churners) and 2,945 churners.

Table 2.11 Number of Balanced Churners and Non-Churners by Dataset Type

Dataset Type | Customers (Churned) | Customers (Non-Churned)
Training Dataset | 4,543 | 4,543
Test Dataset | 2,945 | 1,000
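A minimal sketch of this balancing and partitioning step is shown below, assuming churn is coded 1/0 in a single DataFrame; the exact counts in Table 2.11 come from the author's own partition, so the sketch only illustrates the logic.

```python
import pandas as pd

def balanced_split(df: pd.DataFrame, target: str = "churn", seed: int = 1):
    """Balanced training set plus a test set that keeps the original churn ratio.

    Roughly 80% of the non-churners anchor the training set and are matched
    one-to-one with randomly drawn churners; the held-out non-churners form the
    test set together with enough churners to preserve the complete-dataset
    ratio of churners to non-churners (about 2.945 here).
    """
    churners, non_churners = df[df[target] == 1], df[df[target] == 0]
    ratio = len(churners) / len(non_churners)

    train_nc = non_churners.sample(frac=0.8, random_state=seed)
    test_nc = non_churners.drop(train_nc.index)

    train_c = churners.sample(n=len(train_nc), random_state=seed)
    test_c = churners.drop(train_c.index).sample(
        n=int(round(ratio * len(test_nc))), random_state=seed)

    train = pd.concat([train_nc, train_c]).sample(frac=1.0, random_state=seed)
    test = pd.concat([test_nc, test_c]).sample(frac=1.0, random_state=seed)
    return train, test
```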

Chi-Square, Logistic Regression and Confusion Matrices

In explicitly testing H1 (that there is a difference between the full model and the restricted model), an analysis of deviance (ANOVA) reveals a difference (deviance) between the models of 223.49 (df = 7). The chi-square test yields a test value of 223.49 (i.e. approximately 224) with 7 degrees of freedom between the two models, obtained from the equation below:

χ² = 2[LL Full Model − LL Restricted Model] and df = k Full Model − k Restricted Model

Since the test value of approximately 224 is greater than both critical values of 14.07 (for p <= 0.05) and 18.48 (for p <= 0.01), the result clearly illustrates that there is a difference between the two models in churn prediction. Therefore, H1 is accepted, as summarized in Table 2.12.
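Substituting the log-likelihoods reported for the two training-set models (Table 2.13) into the likelihood-ratio equation above reproduces the reported statistic:

```latex
\chi^{2} = 2\,[\mathrm{LL}_{\text{full}} - \mathrm{LL}_{\text{restricted}}]
         = 2\,[(-6104.059) - (-6215.803)] \approx 223.49,
\qquad
df = k_{\text{full}} - k_{\text{restricted}} = 17 - 10 = 7 .
```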

Logistic Regression and Confusion Matrices

Table 2.13 reveals the results from performing logistic regression with the full (4Cs) and restricted (character) models on the training set. The log-likelihood for the full model (−6,104.059, df = 17) is higher (less negative) than that of the restricted model (−6,215.803, df = 10), indicating that the restricted model leaves more information unexplained (Field, 2012). The deviance value for the full


model is 12,208.118 and for the restricted model 12,431.606. The Akaike information criterion (AIC) compares models of the same outcome variable on the basis of fit. The AIC value for the full model is 12,242 and for the restricted model 12,452; the full model is therefore the better fit, since it has the lower value (Field, 2012). However, in percentage terms the worsening in AIC is minimal (i.e. 1.7%) and the restricted model is easier to interpret than the full model. In terms of significance, the full model shows significance for residential months (p < 0.01), loan to value (p < 0.01), term (p < 0.01) and the FICO score (p < 0.01). Additionally, the exposure (p < 0.01), age (p < 0.01), marital status (p < 0.10) and unemployment rate (p < 0.01) variables are also significant. Of all the significant variables in the full model, the character proxies appear once (residential months), the collateral proxies twice (loan to value and term) and the credit proxy once (FICO score), with none of the capacity variables significant; the control proxies of exposure, age and unemployment appear most frequently. The full model also suggests that the odds of a customer churning decrease with each additional month of occupancy in one's current residence, with longer term loan contracts, with larger loan to value ratios, with being single and with a rise in the unemployment rate in a particular zip code, whereas the odds of churning increase with the FICO score, age and exposure figures.
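As a check on the fit statistics above, both deviance and AIC follow directly from the reported log-likelihoods:

```latex
D = -2\,\mathrm{LL}, \qquad \mathrm{AIC} = D + 2k :
\qquad
\mathrm{AIC}_{\text{full}} = 12{,}208.118 + 2(17) \approx 12{,}242.12,
\qquad
\mathrm{AIC}_{\text{restricted}} = 12{,}431.606 + 2(10) \approx 12{,}451.61 .
```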

The restricted model reveals significance only for the character variable of residential months (p < 0.1) and the control variables of age (p < 0.01) and the unemployment rate (p < 0.001). Most importantly, the odds of a customer churning decrease with each additional month of occupancy in one's current residence (staying in the same home) and with a rise in the unemployment rate of the zip code a consumer resides in, while the odds of churning increase with consumer age. Jointly, with respect to painting a customer profile from both models, in order to better predict non-churn the bank should consider a customer who has lived in the same place of residence longer, takes a longer term loan, reports a higher loan to value ratio, is single, is younger in age and lives in an area with a higher unemployment rate.

Finally, the confusion matrices for the full and restricted models on the validation (test) set at the .50 probability cutoff level are shown in Tables 2.14 and 2.15. This initial threshold value of .50 was chosen because it is regarded as the default probability cutoff for confusion matrix analysis in binary classification (Freeman and Moisen, 2008). Table 2.16 summarizes the results of the confusion matrix analysis. For the full model, the accuracy rate is 0.578, which implies an error rate of 0.422; the recall (sensitivity) measure is 0.584, the precision measure is 0.796 and the F1 measure is 0.674. The restricted model reveals an accuracy rate of 0.585 and an error rate of 0.415; the recall (sensitivity) measure is 0.622, the precision measure is 0.777 and the F1 measure is 0.691. In comparing the two models, the full model reports a lower accuracy rate (higher error rate) and a lower sensitivity value, so it identifies actual churners less often. However, the full model is superior in precision (i.e. when it predicts churn, it is correct more frequently).
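A compact sketch of this step, assuming scikit-learn (whose logistic regression applies an L2 penalty by default, unlike an unpenalized GLM fit) and churn coded as the positive class, is shown below; the function and variable names are illustrative.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def logit_confusion_measures(X_train, y_train, X_test, y_test, threshold=0.50):
    """Fit a logistic regression and report the Table 2.16-style measures
    at a given probability cutoff (0.50 is the default used above)."""
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    churn_prob = model.predict_proba(X_test)[:, 1]        # P(churn)
    pred = (churn_prob >= threshold).astype(int)
    acc = accuracy_score(y_test, pred)
    return {
        "accuracy": acc,
        "error": 1 - acc,
        "precision": precision_score(y_test, pred),  # correct among predicted churners
        "recall": recall_score(y_test, pred),        # actual churners identified
        "f1": f1_score(y_test, pred),
    }
```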

Table 2.12 Restricted Model vs. Full Model (Testing of H1)

Model | Residual Df | Residual Deviance | Df | Deviance
Restricted Model | 9,076 | 12,432 | |
Full Model | 9,069 | 12,208 | 7 | 223.49


Table 2.13 Logistic Regression Results for Training Set (dependent variable: churn; standard errors in parentheses)

Variable | (1) Full Model (4Cs) | (2) Restricted Model (Character)
Residential months | -0.001*** (0.0002) | -0.0004* (0.0002)
Employment months | 0.00002 (0.0003) | 0.0003 (0.0003)
Assets to age | -0.00002 (0.00002) | -0.00000 (0.00001)
Net worth | 0.00000 (0.00000) | 0.00000 (0.00000)
Loan to value | -0.272*** (0.033) |
Term of loan | -0.022*** (0.002) |
Monthly income | -0.00003 (0.00004) |
Coborrower | 0.093 (0.057) |
Debt to income | 0.001 (0.001) |
Disposable income | 0.00005 (0.00004) |
FICO | 0.001*** (0.0003) |
Age | 0.005*** (0.002) | 0.006*** (0.002)
Exposure | 0.00003*** (0.00001) | -0.00000 (0.00001)
Unemployment rate | -0.071*** (0.007) | -0.076*** (0.007)
Marital status | -0.235* (0.133) | -0.196 (0.131)
Dependent number | 0.006 (0.027) | 0.003 (0.027)
Constant | 0.762*** (0.245) | 0.466*** (0.153)
Observations | 9,086 | 9,086
Log Likelihood | -6,104.059 | -6,215.803
Akaike Inf. Crit. | 12,242.120 | 12,451.610

Note: *p<0.1; **p<0.05; ***p<0.01

Table 2.14 Logistic Regression Confusion Matrix for Full Model (Test Set)*

Prediction | Actual: No Churn | Actual: Churn
No Churn | 559 | 1,225
Churn | 441 | 1,720

*at threshold probability value of 0.50

Table 2.15 Logistic Regression Confusion Matrix for Restricted Model (Test Set)*

Prediction | Actual: No Churn | Actual: Churn
No Churn | 475 | 1,112
Churn | 525 | 1,833

*at threshold probability value of 0.50


Table 2.16 Confusion Matrix Measures for Logistic Regression

Supervised Learning Method | Accuracy Rate | Error Rate | Precision | Recall | F1
Logistic Regression Full Model | 0.578 | 0.422 | 0.796 | 0.584 | 0.674
Logistic Regression Restricted Model | 0.585 | 0.415 | 0.777 | 0.622 | 0.691

LDA and Confusion Matrices

Tables 2.17 and 2.18 show the results of the LDA analysis on the training set for both the full model (4Cs) and the restricted model (character). Examining the magnitudes of the functional coefficients alone, which serve as the bases for equations (3-6) above, the LDA results for the full model (Table 2.17) indicate that, holding all other variables constant: the higher the net worth of the customer, the more likely the customer is to be classified as churning; as age increases, a customer is more likely to be categorized as a churner; the higher one's FICO score, the more likely one is to be classified as a churner; the longer an individual lives in their current residence, the less likely they are to be placed in the churn group; and the more months of consistent employment they have had, the more likely they are to be placed in the churn class. For the character variable of residential months, the LDA analysis for the full model reports the same no-churn categorization displayed by logistic regression. Much like logistic regression, increases in age and the FICO score also contribute towards churn classification (keeping other variables constant).

Table 2.18 for the restricted model shows that, with other variables held constant, the likelihood of being placed in the churn group increases with age; the longer an individual lives in their current residence, the lower the possibility of churn (more likely to be categorized as a non-churner); as employment months increase, the likelihood of churning (being placed in the churn group) increases; and as net worth increases, a customer is more likely to be classified as a churner. It is important to note that marital status (i.e. being married) reports large coefficient values for both the full and restricted models and suggests that someone who is married is less likely to be placed in the churn group (assuming all other variables are constant). Collectively, if the bank wants to avoid prepayment or switching behavior, it is looking to profile an individual who has lived in their current residence for a longer span of time, has worked for a shorter amount of time, has a lower net worth and FICO score, is younger in age and is married.

Obtaining the maximum and minimum values (observations) of the difference in LDA fitted scores for the full model, between consumers classified as churners and those placed in the non-churner category, further clarifies the mechanism of the LDA classifier, as depicted by the equation below. While the LDA function output provides the functional coefficients of each predictor variable (Table 2.17), the LDA scores output provides a fitted score for each observation of the dataset for each of the two class types (churn and no churn).

LDA Fitted Score Difference = LDA Fitted Score for Churn Class − LDA Fitted Score for Non-Churn Class

The analysis shows that the maximum difference between churner and non-churner scores was observed for observation 4 (1.946182) and the minimum for observation 2,920 (−3.091956) of the training set. Based on these findings, customer 4 would be classified as a churner and customer 2,920 as a non-churner. Additionally, knowing each of the predictor variable values for observation 4, as shown in Table 2.19, it is possible to fit these values to the full LDA model and obtain a churn score of 37.8 and a no-churn score of 35.8. Thus, observation 4 is indeed classified as a churner, which is also obtained by taking the maximum of the LDA fitted score difference described above. The same approach can be applied to the restricted model.
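A brief sketch of this scoring idea, using scikit-learn's LDA (whose two-class decision_function is analogous to, though not identical with, the fitted-score difference defined above), is given below; the names are illustrative.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def lda_score_extremes(X_train, y_train):
    """Per-observation churn-versus-no-churn discriminant scores.

    For two classes, decision_function returns a single signed score per
    observation (positive favours the churn class), so its largest and smallest
    values identify the observations classified most confidently as a churner
    and a non-churner, respectively (observations 4 and 2,920 above)."""
    lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
    scores = lda.decision_function(X_train)
    return scores, int(np.argmax(scores)), int(np.argmin(scores))
```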


Extending this analysis, Tables 2.20 and 2.21 illustrate the confusion matrices for both the full and restricted models on the test set. The accuracy and error rates for the full model are 0.578 and 0.422; the recall (sensitivity) measure is 0.585, the precision measure is 0.796 and the F1 measure is 0.674. For the restricted model, the accuracy rate is 0.585 and the error rate is 0.415; the recall (sensitivity) measure is 0.622, the precision measure is 0.777 and the F1 measure is 0.691. In comparing the two models, the restricted model reports a better accuracy rate (lower error rate) and sensitivity (it identifies actual churn most often) and consequently has a higher F1 measure. It should be noted that the full model reports a higher precision measure, meaning that when it predicts churn it is correct more frequently; for the full model, 79.6% of its churn predictions are correct. Table 2.22 summarizes the confusion matrix measures for both LDA models.

Table 2.17 LDA Functions (Full Model) on Training Set

Variable | No (no churn) | Yes (churn)
Constant | -65.18178579 | -64.44793745
Net worth | 0.00000417 | 0.00000460
Employment months | -0.01618987 | -0.01616672
Residential months | -0.01578385 | -0.01643476
Assets to age | -0.00085295 | -0.00086995
Age | 0.17644586 | 0.18135669
Exposure | -0.00041280 | -0.00038223
Marital status | 36.02943529 | 35.79255199
Dependent number | 1.59157170 | 1.59804896
Loan to value | 3.72494932 | 3.45732283
Term of loan | 0.25099681 | 0.22879825
Unemployment rate | 0.73574717 | 0.66836854
Monthly income | -0.00008586 | -0.00011254
Coborrower | -2.75981188 | -2.66441456
Debt to income | 0.03453945 | 0.03537227
Disposable income | 0.00045194 | 0.00049612
FICO | 0.11161923 | 0.11308888


Table 2.18 LDA Functions (Restricted Model) on Training Set

Variable | No (no churn) | Yes (churn)
Constant | -26.21329270 | -25.75181275
Residential months | -0.00269112 | -0.00314409
Employment months | -0.00614195 | -0.00586904
Net worth | -0.00001011 | -0.00000964
Assets to age | 0.00032183 | 0.00031302
Age | 0.25652232 | 0.26183407
Exposure | 0.00016247 | 0.00016110
Marital status | 37.75371120 | 37.53159175
Dependent number | 0.92484994 | 0.91915613
Coborrower | -0.87267715 | -0.72914424
Unemployment rate | 0.54514069 | 0.47355570

Table 2.19 Predictor Values for the Observation with the Maximum LDA Fitted Score Difference (Training Set)

Residential months: 96
Employment months: 120
Assets to age: 42,797.24
Net worth: 1,146,608
Loan to value: 1.0852
Term: 61
Monthly income: 44,450
Coborrower: 1
Debt to income: 27.477
Disposable income: 33,321.88
FICO: 755
Age: 47
Exposure: 61,850
Unemployment rate: 3.8
Marital status: 1
Dependent number: 1

Table 2.20 Linear Discriminant Analysis Confusion Matrix for Full Model (Test Set)

Prediction | Actual: No Churn | Actual: Churn
No Churn | 555 | 1,223
Churn | 444 | 1,722


Table 2.21 Linear Discriminant Analysis Confusion Matrix for Restricted Model (Test Set)

Prediction | Actual: No Churn | Actual: Churn
No Churn | 474 | 1,113
Churn | 526 | 1,832

Table 2.22 Confusion Matrix Measures for Linear Discriminant Analysis

Supervised Learning Method | Accuracy Rate | Error Rate | Precision | Recall | F1
Linear Discriminant Analysis Full Model | 0.577 | 0.423 | 0.795 | 0.585 | 0.674
Linear Discriminant Analysis Restricted Model | 0.585 | 0.415 | 0.777 | 0.622 | 0.691

Decision Trees and Confusion Matrices

Figure 2.7 shows the classification tree for the full model after pruning the tree using cross validation to avoid overfitting (Kuhn and Johnson, 2013). The key variables in the full model consist of the unemployment rate (control), loan to value (collateral), FICO score (credit) and the term of the loan (collateral), based on the splitting criteria established for each of these variables. In other words, the classifier has identified five potential questions along these variables, with the specific criteria shown below, to aid in the classification of an unknown customer. In its entirety, the full model decision tree predicts that a customer who lives in a place of higher unemployment (i.e. greater than 8.6%), has a loan to value ratio greater than one, has a credit score less than 577 and takes on a loan agreement that exceeds fifty months is more likely to be classified as a non-churner. As a result, the collateral proxies (term and loan to value) and the credit proxy (FICO) have greater influence on decision tree classification, and none of the character variables factor into the decision-making process in the full model. For the purposes of target marketing aimed at preventing customer churn, the bank should focus on areas of high unemployment where consumers take on more debt than the value of their assets, carry lower credit scores and need to sign longer term loans.

Similarly, the classification tree was also generated for the restricted model after pruning. Figure 2.8 reveals that the key variables of interest in the restricted model are the unemployment rate (control), exposure (control) and net worth (character). Therefore, only the character variable of net worth was selected for splitting, at a cut-off point of $2,150. However, net worth appears to be the least influential factor in the questioning process that determines final customer classification.
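The sketch below illustrates one way to grow and prune such a tree, assuming scikit-learn's CART implementation with cross-validated cost-complexity pruning (ccp_alpha); it mirrors the pruning idea cited from Kuhn and Johnson (2013) rather than reproducing the exact procedure used here.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

def pruned_classification_tree(X_train, y_train, folds: int = 10):
    """Grow a full tree, then pick the cost-complexity penalty by cross-validation."""
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
        X_train, y_train)
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        param_grid={"ccp_alpha": path.ccp_alphas},   # candidate pruning penalties
        cv=folds,
        scoring="accuracy",
    )
    search.fit(X_train, y_train)
    return search.best_estimator_                    # the pruned tree
```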

Tables 2.23 and 2.24 illustrate the decision tree confusion matrices. For the full model, the accuracy rate is 0.563, which implies an error rate of 0.437; the recall (sensitivity) measure is 0.549, the precision measure is 0.804 and the F1 measure is 0.652. The restricted model reveals an accuracy rate of 0.578 and an error rate of 0.422; recall is 0.599, precision is 0.786 and F1 is 0.680. In comparing the full and restricted models, the restricted model reports a higher accuracy (lower error rate), sensitivity value and F1 score, whereas the full model is superior in precision (its churn predictions are correct more frequently), as illustrated in Table 2.25. Therefore, the restricted model's superiority in accuracy, error, sensitivity and F1 scores supports the logistic regression and linear discriminant analysis findings above.


Figure 2.7 Classification Tree for the Full Model (4Cs)

Each node of the decision tree illustrates the majority rule (Yes=churn No= No churn). The yes majority is signified in green and the no majority in blue. The decision rule sends each of the root and internal nodes with a Yes decision to the left branch and a no decision to the right branch. The terminal leaf nodes are shown at the very bottom.


Figure 2.8. Classification Tree for Restricted Model (Character)

Each node of the decision tree illustrates the majority rule (Yes=churn No= No churn). The yes majority is signified in green and the no majority in blue. The decision rule sends each of the root and internal nodes with a Yes decision to the left branch and a no decision to the right branch. The terminal leaf nodes are shown at the very bottom.

Table 2.23 Decision Tree Confusion Matrix for Full Model (Test Set)

Actual | Prediction: No Churn | Prediction: Churn
No Churn | 606 | 394
Churn | 1,328 | 1,617

Table 2.24 Decision Tree Confusion Matrix for Restricted Model (Test Set)

Actual | Prediction: No Churn | Prediction: Churn
No Churn | 519 | 481
Churn | 1,182 | 1,763


Table 2.25 Confusion Matrix Measures for Decision Trees

Supervised Learning Method | Accuracy Rate | Error Rate | Precision | Recall | F1
Decision Tree Full Model | 0.563 | 0.437 | 0.804 | 0.549 | 0.652
Decision Tree Restricted Model | 0.578 | 0.422 | 0.786 | 0.599 | 0.680

Random Forests and Confusion Matrices

Random forest models were generated for both the full (4Cs) and restricted (character) models. Each forest consisted of 500 trees (the default value) with eight variables randomly considered at each split of an individual tree, grown through a process of bootstrap aggregation. Adhering to convention, eight variables were chosen for splitting because this approximates the square root of the total number of predictors in the full model (James et al., 2013). Eight variables were also retained for the restricted model, as the selection of a smaller number of variables contributes to overfitting (James et al., 2013). Table 2.26 displays the mean decrease Gini value for each variable of the full model, while Figure 2.9 visually captures the variable importance plot. The mean decrease Gini value is a measure of a variable's importance in determining the outcome variable; a higher value indicates a more important variable. From most to least important, the loan to value ratio (collateral), FICO score (credit), unemployment rate (control) and net worth (character) are the top four variables in the full random forest model. Of all the 4Cs, the character variables do not appear as critical to the churn screening process. The full random forest model reports an out-of-bag (OOB) error estimate of 41.04%, which is equivalent to the misclassification rate; that is, 58.96% of the OOB samples were correctly classified by the full model random forest. When the number of variables tried at each split is seven, the OOB estimate drops to 41.03%; when it is increased to nine, the OOB estimate is 41.0%. Thus, there is minimal difference in the OOB estimates whether 7, 8 or 9 variables are considered at each split.
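A minimal sketch of such a forest, assuming scikit-learn (where feature_importances_ is the normalized mean decrease in impurity, the analogue of the mean decrease Gini values in Tables 2.26-2.27), is shown below; the names are illustrative.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def churn_forest(X_train, y_train, mtry: int = 8, n_trees: int = 500):
    """500 bootstrap-aggregated trees with `mtry` variables tried at each split.

    oob_score_ is the share of out-of-bag samples classified correctly, so
    1 - oob_score_ corresponds to the OOB error estimates quoted above."""
    forest = RandomForestClassifier(
        n_estimators=n_trees,
        max_features=mtry,
        oob_score=True,
        random_state=0,
    ).fit(X_train, y_train)
    importance = pd.Series(
        forest.feature_importances_, index=X_train.columns
    ).sort_values(ascending=False)      # impurity-based variable importance
    return forest, 1 - forest.oob_score_, importance
```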

Table 2.27 illustrates the mean decrease Gini value for each variable of the restricted model, while Figure 2.10 shows the corresponding variable importance plot. In order of importance, net worth, assets to age, employment months and residential months are determined to be the most critical of the character variables. Therefore, in the process of character screening, this analysis suggests that bankers who lend to subprime consumers should carefully assess the individual customer characteristics of net worth, followed by the assets to age ratio, the number of months of employment and the number of months at the current residence. Of the control variables, exposure, the unemployment rate, age and dependent count are the most influential. The OOB error estimate for the restricted model is 45.21%, indicating that 54.79% of the OOB samples were correctly classified by the restricted model random forest. Fine tuning the random forest character model at seven and nine variables per split results in OOB estimates of 45.23% and 45.18%, respectively; the OOB estimate with four variables considered at each split is 45.5%, with five variables 45.0% and with six variables 44.8%. Therefore, considering six variables at each split results in the lowest OOB estimate.


Table 2.26 Random Forest Mean Decrease Gini Values (Full Model) on Training Set

Predictor Variable | Mean Decrease Gini
Residential months | 296.07158
Employment months | 295.69732
Assets to age | 353.26721
Net worth | 370.45416
Loan to value | 424.80057
Term of loan | 285.98579
Monthly income | 312.26361
Coborrower | 32.84926
Debt to income | 366.21672
Disposable income | 312.07062
FICO | 404.92616
Age | 290.75531
Exposure | 346.87356
Unemployment rate | 380.96567
Marital status | 15.05951
Dependent number | 54.29179

Table 2.27 Random Forest Mean Decrease Gini Values (Restricted Model) on Training Set

Predictor Variable | Mean Decrease Gini
Residential months | 536.01113
Employment months | 550.10248
Assets to age | 733.94139
Net worth | 742.85347
Age | 539.0949
Exposure | 660.44052
Unemployment rate | 595.44523
Marital status | 26.45672
Dependent number | 92.62189
Coborrower | 64.87512


Figure 2.9 Variable Importance Plot of the Random Forest (Full Model)

Figure 2.10 Variable Importance Plot of Random Forest (Restricted Model)


Table 2.28 displays the confusion matrix for the full model, while Table 2.29 illustrates the confusion matrix for the restricted model. The accuracy measure for the full model is 0.597, which implies an error rate of 0.403; the recall measure is 0.607, the precision measure is 0.805 and the F1 score is 0.692. Similar measures are obtained for the restricted model: the accuracy rate is 0.575 and the error rate is 0.425, while the sensitivity measure is 0.592, the precision measure is 0.786 and the F1 score is 0.675. Unlike with the other classification methods employed above, the full model reports a higher accuracy rate (lower error rate), sensitivity measure, precision figure and F1 score. With this classification method, the full model simply predicts churn both more often and more correctly than the restricted model.

Table 2.28 Random Forest Confusion Matrix for Full Model (Test Set)

Prediction | Actual: No Churn | Actual: Churn
No Churn | 566 | 1,156
Churn | 434 | 1,789

Table 2.29 Random Forest Confusion Matrix for Restricted Model (Test Set)

Prediction | Actual: No Churn | Actual: Churn
No Churn | 525 | 1,202
Churn | 475 | 1,743

Table 2.30 Confusion Matrix Measures for Random Forests

Supervised Learning Method | Accuracy Rate | Error Rate | Precision | Recall | F1
Random Forest Full Model | 0.597 | 0.403 | 0.805 | 0.607 | 0.692
Random Forest Restricted Model | 0.575 | 0.425 | 0.786 | 0.592 | 0.675


Finally, examining the confusion matrix results for all the classification methods collectively (Table 2.31), the restricted models for logistic regression, linear discriminant analysis and decision trees each report higher accuracy rates (lower error rates), larger sensitivity measures and larger F1 scores than their full-model counterparts. This suggests that, for these classification methods, the restricted character model is able to predict churn effectively, and that concentrating on the character and control variables provides a strong basis for predictive churn modeling. Yet the full models for these classification methods outperform their restricted counterparts on precision, implying that when these full models predict churn they are correct more frequently. The random forest full model illustrates superior performance on all of the confusion matrix measures (accuracy, error, precision, recall and F1 score). Therefore, on the basis of confusion matrix analysis, H2 is supported. More specifically, the decision tree full model reports the lowest accuracy (highest error), while the random forest full model, followed by the decision tree full model, reports the largest precision values; the tree-based classification methods therefore perform well on this predictive performance measure. Interestingly, we observe predictive compatibility between the logistic regression and linear discriminant analysis measures of accuracy and sensitivity, which confirms previous findings of similar predictive performance between these two classification methods (Pohar et al., 2004). Finally, the F1 score is highest for the random forest full model, followed by the logistic regression restricted model. The strong performance of logistic regression on the basis of accuracy, sensitivity and the F1 measure reaffirms earlier findings that have praised the predictive power and easy comprehensibility of logistic regression (Verbeke et al., 2011; Risselada et al., 2010; Martens et al., 2011), even against more sophisticated data mining techniques (Coussement et al., 2017).

Table 2.32 reports the AUC values for each of the classification methods. The full models report higher AUC values than the restricted models. The AUC value is highest for the random forest (0.632), followed by the logistic regression (0.606), linear discriminant analysis (0.606) and decision tree (0.594) full models. Thus, the random forest classifier has a 63.2% chance of discriminating between the positive class (churner) and the negative class (non-churner). Even for the restricted models, the AUC measures follow the same hierarchy of predictive performance, although there is a three-way tie in AUC scores (logistic regression, linear discriminant analysis and decision tree) among the restricted models. Therefore, H2 is supported when AUC is used as the performance measure. However, it is important to note that the AUC values are not spectacular and, depending on the classification method utilized, are only marginally superior to those of a random classifier. Figures 2.11-2.14 display the ROC curves and AUC values for both the full and restricted models for each of the classification methods.
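The ROC/AUC comparison can be reproduced with a short helper of the following kind, assuming predicted churn probabilities from each fitted classifier; a value of 0.632, for instance, means a randomly chosen churner is ranked above a randomly chosen non-churner 63.2% of the time.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def roc_summary(y_test, churn_prob):
    """AUC plus the (false positive rate, true positive rate) points of the ROC curve."""
    fpr, tpr, _ = roc_curve(y_test, churn_prob)      # churn is the positive class
    return roc_auc_score(y_test, churn_prob), np.column_stack([fpr, tpr])
```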

Table 2.31 Confusion Matrix Comparison Metrics for Various Models

Classification Method | Accuracy Rate | Error Rate | Precision | Recall | F1
Logistic Regression Full Model | 0.579 | 0.421 | 0.797 | 0.585 | 0.675
Logistic Regression Restricted Model | 0.585 | 0.415 | 0.777 | 0.622 | 0.691
Linear Discriminant Analysis Full Model | 0.578 | 0.422 | 0.796 | 0.585 | 0.674
Linear Discriminant Analysis Restricted Model | 0.585 | 0.415 | 0.777 | 0.622 | 0.691
Decision Tree Full Model | 0.563 | 0.437 | 0.804 | 0.549 | 0.652
Decision Tree Restricted Model | 0.578 | 0.422 | 0.786 | 0.599 | 0.68
Random Forest Full Model | 0.597 | 0.403 | 0.805 | 0.607 | 0.692
Random Forest Restricted Model | 0.575 | 0.425 | 0.786 | 0.592 | 0.675


Table 2.32 Model Comparison with AUC Metrics

Classification Method | AUC
Logistic Regression Full Model | 0.606
Logistic Regression Restricted Model | 0.565
Linear Discriminant Analysis Full Model | 0.606
Linear Discriminant Analysis Restricted Model | 0.565
Decision Tree Full Model | 0.594
Decision Tree Restricted Model | 0.565
Random Forest Full Model | 0.632
Random Forest Restricted Model | 0.583


ROC Curves and AUC Values for Various Supervised Learning Methods

Figure 2.11 ROC Curves for Logistic Regression

Figure 2.12 ROC Curves for Linear Discriminant Analysis Models


Figure 2.13 ROC Curves for Decision Tree Models

Figure 2.14 ROC Curves for Random Forest Models


2.12 Discussion

To recap, this study had two primary goals. First, it intends to better understand the role of character variables in customer auto loan churn modeling for subprime lenders. Second, it seeks to evaluate and compare the predictive performance of the classification methods of logistic regression, linear discriminant analysis, decision trees and random forests. Based on the findings reported above, a series of implications is drawn.

With respect to the first goal, the findings suggest that assessing the role of character variables is complex and that their influence varies according to the classification method employed. The findings from logistic regression show that the character variable of residential months was most important to auto loan churn prediction, while both the decision tree and random forest methods highlight the explanatory power of consumer net worth. These differences make sense given that each classification method carries its own set of assumptions and its own inherent advantages and limitations, as discussed in the paper. Therefore, no unanimous conclusion can be drawn about which character variables are most critical to auto loan churn across all of the methods employed.

Yet the findings of this study do shed additional light on the customer profile the bank should be seeking in order to avoid churn. For example, the results from logistic regression indicate that the bank should consider a customer who has lived in the same place of residence longer, qualifies for a longer term loan, reports a higher loan to value ratio, is single, is younger in age and lives in an area with a higher unemployment rate. Similar profiles were developed using linear discriminant analysis and decision tree analysis. Hence, further customer profiling based on the 4Cs will support customer value analysis efforts (Chen et al., 2015).

The findings of the paper suggest that the random forest classification method provides the most detail on the behavior of these character variables. Hence, the credit union may find value in adopting the ordering of net worth, assets to age, employment months and residential months in its subprime lending screening practices. Interestingly, of the 4Cs, none of the capacity variables generated significant results. This is rather surprising given that one's ability to repay a loan is considered critical to lending practices (Kiisel, 2013). Yet it is possible that capacity is effectively captured by credit, despite there being no multicollinearity between the FICO score (the credit proxy) and any of the capacity variables.

Nonetheless, it is worth noting the influential role of the unemployment rate (a non-entity variable) in the analysis. Both the logistic regression and decision tree analyses indicate that the higher the unemployment rate in the neighborhood in which the customer resides, the less likely they are to churn. This is in line with Agarwal et al. (2008), who report greater default behavior when the unemployment rate rises. Both this paper's conclusion and that of Agarwal et al. (2008) stimulate a further question: does the lack of prepayment or consumer switching behavior nudge consumers toward default? Moreover, as this credit union seeks expansion possibilities, the analysis suggests that it should consider markets with higher unemployment figures in order to retain customer loyalty. This credit union has focused, and should continue to focus, on smaller rural markets where unemployment poses challenges.

Similarly, both the logistic regression and decision tree analyses show that higher customer credit scores are associated with more prepayment or switching (churning), as demonstrated by Agarwal et al. (2008) and Heitfield and Sabarwal (2004). The effect of a rise in the LTV ratio shown in the logistic regression and decision tree analyses also affirms Agarwal et al. (2008), who found that an increase in the loan to value ratio leads to non-churning. However, both the decision tree and logistic regression results contradict the finding of Heitfield and Sabarwal (2004) that prepayment rates increase with loan age; in this analysis, the opposite is reported, and the finding from this paper appears to be the more intuitively logical one. The differences in results may be partially explained by the fact that the studies differed in the variables factored into the analysis (e.g., treasury note rate, state fixed effects, marital status, coborrower existence), in the methods used and in the time periods studied.

The initial findings of the paper certainly raise ethical and moral concerns from a customer profiling and predatory business practice perspective. While the results suggest targeting customers who reside in areas of higher unemployment, possess larger loan to value ratios, hold lower credit scores, have less net worth, are single, are younger in age and qualify for longer term loans, the analysis does not advocate taking unfair advantage of the financial vulnerability of this subprime segment. At its core, the credit union must remain dedicated to its mission of providing financial opportunity to those who have limited financial recourse, even at the cost of customer churn. However, the credit union may gain a competitive advantage over other subprime lenders by developing programs that accelerate the path to prepayment or prime rate qualification. Indeed, it may eventually be worthwhile to consider offering prime rate loans as part of its lending portfolio with the intent to retain the initial subprime consumer base. In doing so, both customers and the credit union will be better served financially as well as ethically. From a marketing strategy standpoint, the credit union may implement proactive marketing campaigns emphasizing the path to prepayment and prime rate qualification.

Predictively speaking, the findings of this chapter reinforce some key points found in extant literature. The study affirms that no single measure of predictive performance can serve as the basis for all aspects of comparison (Hand, 2006; Abu-Nimeh et al., 2007). For the purposes of this analysis, I concentrated on the traditional measures of accuracy (error rates), having corrected for the class imbalance issue (Chawla, 2009), but still included other confusion matrix measures for comparative purposes. The analysis confirms the similarity in confusion matrix accuracy rates between logistic regression and linear discriminant analysis (Pohar et al., 2004). Moreover, the superior performance of the logistic regression, linear discriminant analysis and decision tree restricted models suggests that, for these classification methods, the restricted (character) model predicts churn effectively and that concentrating on the character and control variables provides a strong basis for predictive churn modeling. Similarly, the findings reveal that the tree-based classification methods perform especially well on the performance measure of precision. While multiple classification methods were employed to evaluate predictive performance, the superior performance of the random forest classifier over the other chosen methods, on the basis of accuracy rates and AUC values, is also in line with extant literature (Abu-Nimeh et al., 2007; Wu et al., 2003; Barboza et al., 2017). This is likely due to its non-parametric assumptions, computational power, leniency toward outliers and avoidance of overfitting. In sum, this study reinforces these findings for a unique customer base in an industry where machine learning algorithms have not previously been presented academically.

2.13 Limitations and Future Research

This study is subject to a set of limitations. First, the focus is on only one firm within the lending industry, whose presence is limited to the Midwest. Additionally, the firm is a credit union and operates with the distinct mission of providing services to a vulnerable set of consumers (subprime borrowers). Therefore, the auto loan churn prediction models derived here are specific to this consumer segment and are not externally applicable to other segments. Second, there may be other proxy variables for each of the 4Cs and for customer character that have not yet been identified. For example, the study by Agarwal et al. (2008) factored in a host of consumption characteristics (type and model of car purchased, year of the vehicle being financed, loan amount, etc.) that are not fully accounted for in this study. This creates omitted variable bias (Clarke, 2005) and means this study is limited in the number of variables used; the inclusion of more consumption characteristics and other variables might also improve the predictive accuracy of the various classifiers. Third, while


multiple confusion matrix measures are used in the analysis, certain measures such as Matthews' correlation coefficient and Cohen's kappa are not included. These measures are less commonly used but are attractive because they factor in all four cells of the confusion matrix (true positives, true negatives, false positives and false negatives) and work well when there is a class imbalance problem (Chen et al., 2015). An extension of this paper may benefit from the inclusion of these two measures, as well as from recognizing that this analysis gives equal importance to false negative and false positive classifications; future research may therefore apply cost-sensitive measures that penalize false positive and false negative misclassifications differently (Abu-Nimeh et al., 2007; Hand, 2006). Fourth, related to confusion matrix analysis, the confusion matrices for both logistic regression models are based on a threshold probability of 0.50; as this dissertation is extended, it may be wise to evaluate confusion matrices at different threshold values in order to determine the effect these thresholds have on the various performance indicators. Fifth, as a future study, it is recommended that comparisons between prepayment and default behavior be examined, as was done by Agarwal et al. (2008) and Heitfield and Sabarwal (2004), with an updated dataset. Sixth, to more explicitly test the impact of each of the character variables, the study could benefit from logistic regression analogues to principal component analysis (i.e. relative weight analysis or dominance analysis). Subsequently, fine tuning the random forest models could also be beneficial, in the form of adjusting the number of variables considered at each split of a tree and the number of trees factored into model building, to determine the optimal (lowest) OOB error.

Future research may warrant employing other traditional as well as black box supervised techniques (e.g., naive Bayes, k-nearest neighbors, support vector machines, neural networks, Markov chains and ensemble learning), as well as different decision tree algorithms (CHAID, C4.5, C5.0, etc.) as points of comparison with the CART decision tree algorithm used in this analysis. Likewise, the logistic regression analysis could be extended to incorporate the different forms of stepwise regression (i.e. forward, backward and their combination). Additionally, although both confusion matrix analysis and ROC curves/AUC values were used to evaluate the classification models, lift curves and calibration plots could also be included in future analysis (Vuk and Curk, 2006).


Finally, the models are built from the data available for 2014-2016. Eventually, the staying power of the models generated will have to be evaluated (Neslin et al., 2006).

2.14 Contribution

The presented chapter extends existing literature by empirically examining the combined impact of the 4Cs on customer churn modeling through a series of entity and non-entity variables. Moreover, this chapter explores the impact of a customer character model comprising the specific entity variables of residential months, employment months, assets to age and net worth, as these variables are believed to be very important to customer screening among subprime lenders (Credit Guru Inc., 2019). The analysis has shown that there is a clear distinction between the full model (4Cs) and the restricted model (customer character). While the role of the customer character variables is non-uniform across methods, some preliminary information with respect to customer profiling for churn prevention target marketing is obtained, as advocated by Keramati et al. (2016) and Heitfield and Sabarwal (2004). Thus, this paper guides future research towards screening these character variables more precisely, with the intention of aiding customer retention and customer value analysis efforts (Chen et al., 2015) for this unique population. Yet both the suggested customer retention and the customer value analysis efforts are recommended only if they adhere to societal moral and ethical standards.

Finally, this study combines multiple classification techniques to aid in more efficient churn prediction, as advocated in the churn literature (Verbeke et al., 2011). This entailed utilizing both traditional and black box supervised learning methods and a variety of performance indicators. The superior performance of the logistic regression, linear discriminant analysis and decision tree restricted models suggests that, for these classification methods, the restricted (character) model is able to predict churn effectively and that concentrating on the character and control variables provides a strong basis for predictive churn modeling for this financial institution. Additionally, the superior performance of the random forest model as a basis for predicting churn also supports existing research. Hence, the described analysis is among the first to apply multiple machine learning classification algorithms to this unique scenario and customer segment, which has not been empirically reported in the academic literature to date.


References

Abu-Nimeh, S., Nappa, D., Wang, X., & Nair, S. (2007, October). A comparison of machine learning techniques for phishing detection. In Proceedings of the anti-phishing working groups 2nd annual eCrime researchers summit (pp. 60-69). ACM.

Agarwal, S., Ambrose, B. W., & Chomsisengphet, S. (2008). Determinants of automobile loan default and prepayment. Economic Perspectives, 32(3), 17-27.

Ali, Ö. G., & Arıtürk, U. (2014). Dynamic churn prediction framework with more effective use of rare event data: The case of private banking. Expert Systems with Applications, 41(17), 7889-7903.

Alao, D. A. B. A., & Adeyemo, A. B. (2013). Analyzing employee attrition using decision tree algorithms. Computing, Information Systems, Development Informatics and Allied Research Journal, 4, 18-28

Amin, A., Al-Obeidat, F., Shah, B., Adnan, A., Loo, J., & Anwar, S. (2018). Customer churn prediction in telecommunication industry using data certainty. Journal of Business Research, 1-12

Anil Kumar, D., & Ravi, V. (2008). Predicting credit card customer churn in banks using data mining. International Journal of Data Analysis Techniques and Strategies, 1(1), 4-28

Ascarza, E., Netzer, O., & Hardie, B. G. (2018). Some customers would rather leave without saying goodbye. Marketing Science, 37(1), 54-77.

Ascarza, E., Iyengar, R., & Schleicher, M. (2016). The perils of proactive churn prevention using plan recommendations: Evidence from a field experiment. Journal of Marketing Research, 53(1), 46-60.

Ascarza, E., & Hardie, B. G. (2013). A joint model of usage and churn in contractual settings. Marketing Science, 32(4), 570-590.

Attri, T. (2018). Why an Employee Leaves: Predicting using Data Mining Techniques. (Master’s dissertation, National College of Ireland).

Balachander, S., & Ghosh, B. (2013). Bayesian estimation of a simultaneous probit model using error augmentation: An application to multi-buying and churning behavior. Quantitative Marketing and Economics, 11(4), 437-458.

Barboza, F., Kimura, H., & Altman, E. (2017). Machine learning models and bankruptcy prediction. Expert Systems with Applications, 83, 405-417.

Becker, G. (1964). Human Capital. New York: National Bureau of Economic Research.

Bhardwaj, S., & Bhattacharjee, K. (2010). Modeling money attitudes to predict loan default. IUP Journal of Bank Management, 9(1/2), 12.


Braun, M., & Schweidel, D. A. (2011). Modeling customer lifetimes with multiple causes of churn. Marketing Science, 30(5), 881-902.

Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-32.

Buckinx, W., & Van den Poel, D. (2005). Customer base analysis: partial defection of behaviorally loyal clients in a non-contractual FMCG retail setting. European Journal of Operational Research, 164(1), 252-268.

Burez, J., & Van den Poel, D. (2007). CRM at a pay-tv company: Using analytical models to reduce customer attrition by targeted marketing for subscription services. Expert Systems with Applications, 32(2), 277-288.

Chaudhary, S., & Chaudhari, S. (2015). Relationship between psychological capital, job satisfaction and turnover intention of bank employees. Indian Journal of Health and Wellbeing, 6(8), 816-819.

Chawla, N. V. (2009). Data mining for imbalanced datasets: An overview. In Data Mining and Knowledge Discovery Handbook (pp. 875-886). Springer, Boston, MA.

Chen, K., Hu, Y., & Hsieh, Y. (2015). Predicting customer churn from valuable B2B customers in the logistics industry: A case study. Information Systems and E-Business Management, 13(3), 475-494.

Chen, Z. Y., Fan, Z. P., & Sun, M. (2012). A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data. European Journal of operational research, 223(2), 461-472.

Chen, P. P. S. (1976). The entity-relationship model—toward a unified view of data. ACM Transactions on Database Systems (TODS), 1(1), 9-36.

Chiang, D. A., Wang, Y. F., Lee, S. L., & Lin, C. J. (2003). Goal-oriented sequential pattern for network banking churn analysis. Expert Systems with Applications, 25(3), 293-302

Clarke, K. A. (2005). The phantom menace: Omitted variable bias in econometric research. Conflict management and peace science, 22(4), 341-352.

Clemente-Císcar, M., San Matías, S., & Giner-Bosch, V. (2014). A methodology based on profitability criteria for defining the partial defection of customers in non-contractual settings. European Journal of Operational Research, 239(1), 276-285.

Courchane, M. J., Surette, B. J., & Zorn, P. M. (2004). Subprime borrowers: Mortgage transitions and outcomes. The Journal of Real Estate Finance and Economics, 29(4), 365-392

Coussement, K., & Van den Poel, D. (2008). Churn prediction in subscription services: An application of support vector machines while comparing two parameter-selection techniques. Expert Systems with Applications, 34(1), 313-327


Coussement, K., & De Bock, K.W. (2013). Customer churn prediction in the online gambling industry: The beneficial effect of ensemble learning. Journal of Business Research, 66(9), 1629-1636.

Coussement, K., Lessmann, S., & Verstraeten, G. (2017). A comparative analysis of data preparation algorithms for customer churn prediction: A case study in the telecommunication industry. Decision Support Systems, 95, 27-36

Cotton, J. L., & Tuttle, J. M. (1986). Employee turnover: A meta-analysis and review with implications for research. Academy of Management Review, 11(1), 55-70.

Cox Automotive 2018 Used Car Market Report & Outlook Forecast Higher Used-Vehicle Sales for 2018 and a Decline in New-Car Sales (2018). Retrieved from: https://www.coxautoinc.com/news/cox-automotive-2018-used-car-market-report-outlook-forecast-higher-used-vehicle-sales-for-2018-and-a-decline-in-new-car-sales/

Credit Guru Inc. (2019). What are the 4Cs of Credit? Retrieved from https://www.creditguru.com/index.php/credit-management/commercial-credit-management-articles/79-what-are-the-4-cs-of-credit

Dawson, A. J., Stasa, H., Roche, M. A., Homer, C. S., & Duffield, C. (2014). Nursing churn and turnover in Australian hospitals: nurses perceptions and suggestions for supportive strategies. BMC Nursing, 13(1), 11.

De Bock, K. W., & Van den Poel, D. (2012). Reconciling performance and interpretability in customer churn prediction using ensemble learning based on generalized additive models. Expert Systems With Applications, 39(8), 6816-6826.

De Caigny, A., Coussement, K., & De Bock, K. W. (2018). A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees. European Journal of Operational Research, 269(2), 760-772

Delen, D. (2010). A comparative analysis of machine learning techniques for student retention management. Decision Support Systems, 49(4), 498-506.

Denison, D. G., Mallick, B. K., & Smith, A. F. (1998). A bayesian CART algorithm. Biometrika, 85(2), 363-377.

Duda, J., & Žůrková, L. (2013). Costs of employee turnover. Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, 61(7), 2071-2075.

Elliott, K. M., & Shin, D. (2002). Student satisfaction: An alternative approach to assessing this important concept. Journal of Higher Education Policy and Management, 24(2), 197-209.

Fader, P. S., & Hardie, B. G. (2010). Customer-base valuation in a contractual setting: The perils of ignoring heterogeneity. Marketing Science, 29(1), 85-93.

Farquad, M. A. H., Ravi, V., & Raju, S. B. (2014). Churn prediction using comprehensible support vector machine: An analytical CRM application. Applied Soft Computing, 19, 31-40.


Federal Reserve Bank of New York. (2016). Quarterly report on household debt and credit. Retrieved from: https://www.newyorkfed.org/medialibrary/interactives/householdcredit/data/pdf/HHDC_2017Q2.pdf

Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. Thousand Oaks, CA: SAGE.

Fix, E., & Hodges, J. L. (1951). An important contribution to nonparametric discriminant analysis and density estimation. International Statistical Review, 57(3), 233-247.

Freeman, E. A., & Moisen, G. (2008). Presence Absence: An R package for presence absence analysis. Journal of Statistical Software. 23 (11): 31 pp.

Ganesh, J., Arnold, M. J., & Reynolds, K. E. (2000). Understanding the customer base of service providers: an examination of the differences between switchers and stayers. Journal of Marketing, 64(3), 65-87.

García, D. L., Nebot, À., & Vellido, A. (2017). Intelligent data analysis approaches to churn as a business problem: a survey. Knowledge and Information Systems, 51(3), 719-774.

Giudicati, G., Riccaboni, M., & Romiti, A. (2013). Experience, socialization and customer retention: Lessons from the dance floor. Marketing Letters, 24(4), 409-422.

Glady, N., Baesens, B., & Croux, C. (2009). Modeling churn using customer lifetime value. European Journal of Operational Research, 197(1), 402-411

Gnanadesikan, R. (1988). Discriminant analysis and clustering. National Academies.

Gordini, N., & Veglio, V. (2017). Customers churn prediction and marketing retention strategies. An application of support vector machines based on the AUC parameter-selection technique in B2B e-commerce industry. Industrial Marketing Management, 62, 100-107.

Griffeth, R. W., & Hom, P. W. (1995). The employee turnover process. Research in Personnel and Human Resources Management, 13(3), 245-293

Griffeth, R. W., Hom, P. W., & Gaertner, S. (2000). A meta-analysis of antecedents and correlates of employee turnover: Update, moderator tests, and research implications for the next millennium. Journal of Management, 26(3), 463-488.

Hand, D.J., (2006). Classifier technology and the illusion of progress. Statistical Science, 21(1): 15

Haron, S., & Shanmugam, B. (1994). Lending to small business in Malaysia. Journal of Small Business Management, 32(4), 88.

Hassouna, M., Tarhini, A., Elyas, T., & AbouTrab, M. S. (2016). Customer Churn in Mobile Markets A Comparison of Techniques. arXiv preprint arXiv: 1607.07792.


Heavey, A. L., Holwerda, J. A., & Hausknecht, J. P. (2013). Causes and consequences of collective turnover: A meta-analytic review. Journal of Applied Psychology, 98(3), 412.

Heitfield, E., & Sabarwal, T. (2004). What drives default and prepayment on subprime auto loans? The Journal of Real Estate Finance and Economics, 29(4), 457-477.

Hogan, J. J. (1992). Turnover and what to do about it. Cornell Hotel and Restaurant Administration Quarterly, 33(1), 40-45.

Hom, P. W., Lee, T. W., Shaw, J. D., & Hausknecht, J. P. (2017). One hundred years of employee turnover theory and research. Journal of Applied Psychology, 102(3), 530.

Höppner, S., Stripling, E., Baesens, B., & Verdonck, T. (2017). Profit Driven Decision Trees for Churn Prediction. arXiv preprint arXiv:1712.08101. Retrieved from: https://arxiv.org/abs/1712.08101

Huang, B., Kechadi, M. T., & Buckley, B. (2012). Customer churn prediction in telecommunications. Expert Systems with Applications, 39(1), 1414-1425.

Huffman, M. (2018). "Subprime auto loan defaults hit 20-year high in March: Consumers with poor credit may find it harder to buy a car." Retrieved from: https://www.consumeraffairs.com/news/subprime-auto-loan-defaults-hit-20-year-high-in-march-051618.html

Hughes, R. G., Bobay, K. L., Jolly, N. A., & Suby, C. (2015). Comparison of nurse staffing based on changes in unit‐level workload associated with patient churn. Journal of Nursing Management, 23(3), 390-400.

Innovative Funding Services (2019). The Four C’s Of Credit. Retrieved from https://www.ifsautoloans.com/the-four-cs-of-credit/

Islam, M. J., Wu, Q. J., Ahmadi, M., & Sid-Ahmed, M. A. (2007, November). Investigating the performance of naive-bayes classifiers and k-nearest neighbor classifiers. In Convergence Information Technology, 2007. International Conference on (pp. 1541-1546). IEEE.

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). New York: springer.

Jahromi, A. T., Stakhovych, S., & Ewing, M. (2014). Managing B2B customer churn, retention and profitability. Industrial Marketing Management, 43(7), 1258-1268.

Jankowicz, A. D., & Hisrich, R. D. (1987). Intuition in small business lending decisions. Journal of Small Business Management, 25(3), 45

Jha, S. (2009). Determinants of employee turnover intentions: A review. Management Today, 9(2), 26-33.


Jiménez, G., & Saurina, J. (2004). Collateral, type of lender and relationship banking as determinants of credit risk. Journal of Banking & Finance, 28(9), 2191-2212.

Kahane, L. (2008). Regression basics (2nd ed.) Los Angeles: Sage Publications.

Kamakura, W., Mela, C., Ansari, A., Bodapati, A., Fader, P., Iyengar, R., et al. (2005). Choice models and customer relationship management. Marketing Letters, 16(3), 279–291.

Kanjilal, J., & Microsoft Corporation. (2008). Entity Framework tutorial learn to build a better data access layer with ADO.NET Entity Framework and ADO.NET data services. Birmingham, U.K.: Packt Pub.

Keramati, A., Ghaneei, H., & Mirmohammadi, S. M. (2016). Developing a prediction model for customer churn from electronic banking services using data mining. Financial Innovation, 2(1), 1-13.

Khan, A. A., Jamwal, S., & Sepehri, M. M. (2010). Applying data mining to customer churn prediction in an internet service provider. International Journal of Computer Applications, 9(7), 8-14.

Khan, M. R., Manoj, J., Singh, A., & Blumenstock, J. (2015). Behavioral modeling for churn prediction: Early indicators and accurate predictors of custom defection and loyalty. In Big Data Congress, IEEE International Congress, 677-680

Kiisel, T. (2013) “The Five ‘C’s of Small Business Lending”. Retrieved from: http://www.easybib.com/guides/citation-guides/apa-format/how-to-cite-a-website-apa/

Knox, G., & Van Oest, R. (2014). Customer complaints and recovery effectiveness: A customer base approach. Journal of Marketing, 78(5), 42-57.

Koturwar, P., Girase, S., & Mukhopadhyay, D. (2015). A survey of classification techniques in the area of big data. arXiv preprint arXiv:1503.07477.

Kuhn, M., & Johnson, K. (2013). Applied predictive modeling (Vol. 26). New York: Springer.

Kumar, V., Leszkiewicz, A., & Herbst, A. (2018). Are you back for good or still shopping around? Investigating customers' repeat churn behavior. Journal of Marketing Research, 55(2), 208-225.

Ledolter, J. (2013). Data mining and business analytics with R. John Wiley & Sons.

Long, H. (2017) “A record 107 million Americans have car loans”. Retrieved from: https://money.cnn.com/2017/05/19/news/economy/us-auto-loans-soaring/index.html

Loveman, G. W. (1998). Employee satisfaction, customer loyalty, and financial performance: an empirical examination of the service profit chain in retail banking. Journal of Service Research, 1(1), 18-31.


Ma, S., Tan, H., & Shu, F. (2015). When is the best time to reactivate your inactive customers? Marketing Letters, 26(1), 81-98.

Martens, D., Vanthienen, J., Verbeke, W., & Baesens, B. (2011). Performance of classification models from a user perspective. Decision Support Systems, 51(4), 782-793.

McDonald, H. (2010). The factors influencing churn rates among season ticket holders: An empirical analysis. Journal of Sport management, 24(6), 676-701.

McWatters, M. J. (2018, April 24). McWatters to Hatch: Eliminating CU exemption would create issues. Retrieved from https://www.nafcu.org/newsroom/mcwatters-hatch-eliminating-cu-exemption-would-create-issues

Mendez, G., Buskirk, T. D., Lohr, S., & Haag, S. (2008). Factors associated with persistence in science and engineering majors: An exploratory study using classification trees and random forests. Journal of Engineering Education, 97(1), 57-70.

Mobley, W. H. (1977). Intermediate linkages in the relationship between job satisfaction and employee turnover. Journal of Applied Psychology, 62(2), 237.

Moti, H. O., Masinde, J. S., Mugenda, N. G., & Sindani, M. N. (2012). Effectiveness of credit management system on loan performance: empirical evidence from micro finance sector in Kenya. International Journal of Business, Humanities and Technology, 2(6), 99-108.

Mozer, M. C., Wolniewicz, R. H., Grimes, D. B., Johnson, E., & Kaushansky, H. (2000). Churn reduction in the wireless industry. In Advances in Neural Information Processing Systems (pp. 935-941).

Moeyersoms, J., & Martens, D. (2015). Including high-cardinality attributes in predictive models: A case study in churn prediction in the energy sector. Decision support systems, 72, 72-81.

Neslin, S. A., Gupta, S., Kamakura, W., Lu, J., & Mason, C. H. (2006). Defection detection: Measuring and understanding the predictive accuracy of customer churn models. Journal of Marketing Research, 43(2), 204-211.

Nie, G., Rowe, W., Zhang, L., Tian, Y., & Shi, Y. (2011). Credit card churn forecasting by logistic regression and decision tree. Expert Systems with Applications, 38(12), 15273-15285.

Ohsaki, M., Wang, P., Matsuda, K., Katagiri, S., Watanabe, H., & Ralescu, A. (2017). Confusion-matrix-based kernel logistic regression for imbalanced data classification. IEEE Transactions on Knowledge and Data Engineering, 29(9), 1806-1819.

Ongori, H. (2007). A review of the literature on employee turnover. African Journal of Business Management 1(3), 49-54

Owczarczuk, M. (2010). Churn models for prepaid customers in the cellular telecommunication industry using large data marts. Expert Systems with Applications, 37(6), 4710-4712.


Park, T. Y., & Shaw, J. D. (2013). Turnover rates and organizational performance: A meta-analysis. Journal of Applied Psychology, 98, 268–309.

Parvin, H., Alizadeh, H., & Minaei-Bidgoli, B. (2008, October). MKNN: Modified k-nearest neighbor. In Proceedings of the World Congress on Engineering and Computer Science (Vol. 1). Citeseer.

Pohar, M., Blas, M., & Turk, S. (2004). Comparison of logistic regression and linear discriminant analysis: a simulation study. Metodoloski zvezki, 1(1), 143.

Punnoose, R., & Ajit, P. (2016). Prediction of employee turnover in organizations using machine learning algorithms. Algorithms, 4(5), C5.

Rajamohamed, R., & Manokaran, J. (2017). Improved credit card churn prediction based on rough clustering and supervised learning techniques. Cluster Computing, 21(1), 65-77.

Rasiah, D. (2010). Review of literature and theories on determinants of commercial bank profitability. Journal of Performance Management, 23(1), 23.

Reichheld, F., & Sasser, W. (1990). Zero defections: Quality comes to services: To learn how to keep customers, track the ones you lose (includes data on MBNA America credit card business). Harvard Business Review, 68(5), 105-11.

Risselada, H., Verhoef, P. C., & Bijmolt, T. H. (2010). Staying Power of Churn Prediction Models. Journal of Interactive Marketing, 24(3), 198-208.

Rowley, G., & Purcell, K. (2001). ‘As cooks go, she went’: is labour churn inevitable?. International Journal of Hospitality Management, 20(2), 163-185.

Rust, R. T., & Zahorik, A. J. (1993). Customer satisfaction, customer retention, and market share. Journal of Retailing, 69(2), 193-215.

Rust, R. T., Kumar, V., & Venkatesan, R. (2011).Will the frog change into a prince? Predicting future customer profitability. International Journal of Research in Marketing, 28(4), 281–294.

Saradhi, V. V., & Palshikar, G. K. (2011). Employee churn prediction. Expert Systems with Applications, 38(3), 1999-2006.

Sattar, S., & Ahmed, S. (2014). Factors effecting employee turnover in banking sector. Developing Country Studies, 4(3), 110-115.

Seide, C. K. (2012). Consumer financial protection post Dodd-Frank: Solutions to protect consumers against wrongful foreclosure practices and predatory subprime auto lending. U. PUERTO RICO BUS. LJ, 3, 219-233.

Sengupta, R., & Bhardwaj, G. (2015). Credit scoring and loan default. International Review of Finance, 15(2), 139-167.


Scully, M. (2017). "'Deep Subprime' Auto Loans Are Surging." Retrieved from: https://www.bloomberg.com/news/articles/2017-03-28/-deep-subprime-becomes-norm-in-car-loan-market-analysts-say

Sharma, R. R., & Rajan, S. (2017). Evaluating prediction of customer churn behavior based on artificial bee colony algorithm. International Journal of Engineering and Computer Science, 6(1), 20017-20021.

Sharma, A., & Panigrahi, P. K. (2011). A neural network based approach for predicting customer churn in cellular network services. International Journal of Computer Applications, 27(11), 26-31.

Shukla, S., & Sinha, A. (2013). Employee Turnover in banking sector: Empirical evidence. IOSR Journal of Humanities and Social Science, 11(5), 57-61.

Sokolova, M., Japkowicz, N., & Szpakowicz, S. (2006, December). Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. In Australasian joint conference on artificial intelligence (pp. 1015-1021). Springer, Berlin, Heidelberg.

Stahlecker, P., & Trenkler, G. (1993). Some further results on the use of proxy variables in prediction. The Review of Economics and Statistics, 707-711.

Strohmeier, S., & Piazza, F. (2013). Domain driven data mining in human resource management: A review of current research. Expert Systems with Applications, 40(7), 2410-2420.

Suh, E., & Alhaery, M. (2016). Customer Retention: Reducing Online Casino Player Churn through the Application of Predictive Modeling. UNLV Gaming Research & Review Journal, 20(2), 63

Sulaimon, O. S., Emmanuel, O. E., & Bilqis, B. B. (2016). Relevant Drivers for Customer Churn and Retention Decision in the Nigerian Mobile Telecommunication Industry. Journal of Competitiveness, 8(3). 1-16

Tamaddoni, J., Stakhovych S., & Ewing, M. (2014). Managing B2B customer churn, retention and profitability. Industrial Marketing Management, 43(7), 1258-1268.

Tamaddoni, J., Stakhovych, S., & Ewing, M. (2016). Comparing churn prediction techniques and assessing their performance: a contingent perspective. Journal of Service Research, 19(2), 123-141.

Takezawa, K. (2014). Learning regression analysis by simulation. Springer.

Tang, L., Thomas, L., Fletcher, M., Pan, J., & Marshall, A. (2014). Assessing the impact of derived behavior information on customer attrition in the financial service industry. European Journal of Operational Research, 236(2), 624-633.

Tsai, C. F., & Lu, Y. H. (2009). Customer churn prediction by hybrid neural networks. Expert Systems with Applications, 36(10), 12547-12553.

Udechukwu, I. I., & Mujtaba, B. G. (2007). Determining the probability that an employee will stay or leave the organization: A mathematical and theoretical model for organizations. Human Resource Development Review, 6(2), 164-184.


Van den Poel, D., & Lariviere, B. (2004). Customer attrition analysis for financial services using proportional hazard models. European Journal of Operational Research, 157(1), 196-217.

Van Haver, J. Benchmarking analytical techniques for churn modelling in a B2B context.

Verbeke, W., Martens, D., Mues, C., & Baesens, B. (2011). Building comprehensible customer churn prediction models with advanced rule induction techniques. Expert Systems with Applications, 38(3), 2354-2364.

Verbeke, W., Dejaeger, K., Martens, D., Hur, J., & Baesens, B. (2012). New insights into churn prediction in the telecommunication sector: A profit driven data mining approach. European Journal of Operational Research, 218(1), 211-229.

Viswanath, P., & Sarma, T. H. (2011, September). An improvement to k-nearest neighbor classifier. In Recent Advances in Intelligent Computational Systems (RAICS), 2011 IEEE (pp. 227-231). IEEE.

Vuk, M., & Curk, T. (2006). ROC curve, lift chart and calibration plot. Metodoloski zvezki, 3(1), 89.

Waxer, C. (2011). Loyalty and the B2B customer. Loyalty360 — The Loyalty Marketer's Association

Wei, C. P., & Chiu, I. T. (2002). Turning telecommunications call details to churn prediction: a data mining approach. Expert Systems with Applications, 23(2), 103-112.

Wiersema, F. (2013). The B2B Agenda: The current state of B2B marketing and a look ahead. Industrial Marketing Management, 42(4), 470–488.

Wu, B., Abbott, T., Fishman, D., McMurray, W., Mor, G., Stone, K., & Zhao, H. (2003). Comparison of statistical methods for classification of ovarian cancer using mass spectrometry data. Bioinformatics, 19(13), 1636-1643.

Yiğit, İ. O., & Shourabizadeh, H. (2017, September). An approach for predicting employee churn by using data mining. In 2017 International Artificial Intelligence and Data Processing Symposium (IDAP) (pp. 1-4). IEEE.

Yu, X., Guo, S., Guo, J., & Huang, X. (2011). An extended support vector machine forecasting framework for customer churn in e-commerce. Expert Systems with Applications, 38(3), 1425-1430.

Zehra, O.K., (2014). Employee turnover prediction using machine learning based methods (Doctoral dissertation, Middle East Technical University).

Zhang, Y., Qi, J., Shu, H., & Cao, J. (2007, October). A hybrid KNN-LR classifier and its application in customer churn prediction. In Systems, Man and Cybernetics, 2007. ISIC. IEEE International Conference on (pp. 3265-3269). IEEE.

Zhang, M. L., & Zhou, Z. H. (2007). ML-KNN: A lazy learning approach to multi-label learning. Pattern recognition, 40(7), 2038-2048.


Zhao, L., Gao, Q., Dong, X., Dong, A., & Dong, X. (2017). K-local maximum margin feature extraction algorithm for churn prediction in telecom. Cluster Computing, 20(2), 1401-1409.

Zoric, A. (2016). Predicting customer churn in banking industry using neural networks. Interdisciplinary Description of Complex Systems: INDECS, 14(2), 116-124.
