Customer Lifetime Value in the Mobile Phone Market in Iceland Anna Guðrún Birgisdóttir August 2013

Customer Lifetime Value in the Mobile Phone

Market in Iceland

Author

Anna Guðrún Birgisdóttir

Hverfisgata 23

220 Hafnarfjörður, Iceland

Telephone: 00354 6617171

E-mail: [email protected]

Student number: s1660411

University of Groningen

Faculty of Economics and Business

Master Thesis Business Administration

Specialization Marketing Research

First supervisor: Dr. M.C. Non

Second supervisor: Dr. H. Risselada

August 18, 2013

Management Summary

Customer lifetime value is of growing interest to many businesses, especially those

that provide services of any kind. Customer lifetime value gives the business an idea about

how valuable a customer is to the business and enables it to target the most valuable ones in

order to retain them. In recent years, the mobile telephone market in Iceland has become

increasingly competitive as new telecommunications companies entered the market. This has

resulted in consumers who search for better prices and service or products. The consequences

are that customer churn has increased and it is necessary for mobile phone providers to be

able to predict churn accurately as this in turn affects customer lifetime value which decreases

as the probability of churn increases. Customer lifetime value can then be used to segment the

customer database which makes it easier to custom-make services and products which suit the

customers’ needs. Two classification methods, (1) logistic regression and (2) decision trees,

were used on two separate sets of data, in which customers were labeled as churn or non-churn,

in order to make a model that predicts churn. The data sets consisted of post-paid customers

on one hand and pre-paid customers on the other. This model was then used to calculate

customer lifetime values of all customers at an Icelandic telecom which then gave some

insight into which customers are most valuable and what characterizes them. The customers

were then segmented based on their customer lifetime value.

Keywords: telecommunications companies, mobile phone market, churn prediction, customer

lifetime value, segmentation.

Preface

After a long journey working on this research I want to thank those who have in any way

supported me or assisted me on the way.

I first would like to express my gratitude to my supervisor Dr. Marielle C. Non at the

University of Groningen in the Netherlands. She has been very patient and extremely helpful

during this time as it is not easy conducting this type of work mainly through emails. Her

advice and comments have been valuable and helped me to see this through. My gratitude to

Dr. Hans Risselada for his comments on improvements.

I want to thank my contacts at Telecom X for their support and interest in this research as well

as patience. I also want to thank them for the opportunity to write this thesis in cooperation

with Telecom X and for providing me with the necessary data and information to be able to

work on this analysis.

Finally, I want to thank my whole family for their support and kindness during this time. My

parents Hildigunnur and Birgir and parents-in-law Stefanía and Ingimar for helping with my

two sons, also my sister-in-law Freyja Björk who helped me get in contact with the staff at

Telecom X. Many thanks as well to my other sister-in-law Inga Jóna for her supportive and

motivating talks, and to my two brothers Björn Gunnar and Birgir Örn for their moral support. My

deepest appreciation to my husband Stefán for being patient and being there for me and

helping in any way possible and to our two beautiful sons, Stefán Gunnar and Birgir Hrafn.

Table of Contents

Management Summary
Preface
Table of Contents
List of Figures
List of Tables
1. Introduction
1.1 Telecommunications Industry
1.1.1 Telecommunications Industry in Iceland
1.1.2 The Icelandic Telecommunications Company
1.2 Research Questions
1.3 Structure of the Thesis
2. Theoretical Framework
2.1 Customer Lifetime Value
2.1.1 CLV Model
2.1.2 Margin
2.1.3 Discount Rate
2.1.4 Retention rate (1-Churn)
2.2 Segmentation
2.3 Conceptual Model
2.4 Summary
3. Methodology
3.1 Research Design
3.2 Sample
3.3 Variables
3.4 Plan of Analysis
3.4.1 Average Revenue per User (ARPU)
3.4.2 The Discount Rate (WACC)
3.4.3 Churn Analysis
3.4.4 The CLV Calculation
3.4.5 Segmentation
3.5 Summary
4. Data preparation
4.1. Sampling
4.2. The time aspect
4.3. Independent variables
4.5 Summary
5. Results
5.1 Post-paid customers
5.1.1 Sample description
5.1.2 Multicollinearity
5.1.3 Principal component analysis
5.1.4 Logistic regression
5.1.5 Decision Tree
5.2 Pre-paid customers
5.2.1 Sample description
5.2.2 Multicollinearity
5.2.3 Principal component analysis
5.2.4 Logistic Regression
5.1.5 Decision Tree
5.3 Hypotheses
5.4 CLV calculations
5.4.1 Segmentation
5.5 Summary
6. Conclusion and recommendations
6.1 Recommendations
6.2 Limitations and future research
References
Appendix I

List of Figures

Figure 2-1: Conceptual model of the Customer Lifetime Value
Figure 3-1: An example of a decision tree for churn
Figure 3-2: An example of a ROC curve
Figure 4-1: The time window of the analysis
Figure 5-1: ROC curve for the logistic regression in the post-paid training sample
Figure 5-2: ROC curve for the decision tree in the post-paid training sample
Figure 5-3: ROC curve for the logistic regression for the pre-paid training sample
Figure 5-4: ROC curve for the decision tree for the pre-paid training sample

List of Tables

Table 2-1: Market share in the mobile phone market in Iceland
Table 2-2: Market share in the post- and pre-paid mobile phone markets in Iceland in 2008 and 2012
Table 3-1: Confusion matrix
Table 4-1: Distribution of the data used in the training and testing sets
Table 5-1: Marital status of customers in the post-paid training sample
Table 5-2: Family size of customers in the post-paid training sample
Table 5-3: Residence of customers in the post-paid training sample
Table 5-4: Crosstable of Status*Gender in the post-paid training sample
Table 5-5: Cronbach’s alpha for the components for the post-paid training sample
Table 5-6: Comparison of PCA and PA eigenvalues in the post-paid training sample
Table 5-7: Results from the logistic regression for the post-paid training sample
Table 5-8: Classification Table for the logistic regression for the post-paid training sample
Table 5-9: Classification table for the logistic regression for the post-paid testing sample
Table 5-10: Risk estimates of different growing methods for the post-paid training sample
Table 5-11: Classification table for unpruned decision tree in the post-paid training sample
Table 5-12: Classification table for pruned decision tree in the post-paid training sample
Table 5-13: Classification table for decision tree in the post-paid testing sample
Table 5-14: Marital status of customers in the pre-paid training sample
Table 5-15: Family size of customers in the pre-paid training sample
Table 5-16: Residence of customers in the pre-paid training sample
Table 5-17: Crosstable of Status*Gender for the pre-paid training sample
Table 5-18: Cronbach’s alpha for the components for the pre-paid training sample
Table 5-19: Comparison of PCA and PA eigenvalues for the pre-paid training sample
Table 5-20: Results from the logistic regression in the pre-paid training sample
Table 5-21: Classification Table for the logistic regression for the pre-paid training sample
Table 5-22: Classification table for the logistic regression for the pre-paid testing sample
Table 5-23: Risk estimates of different growing methods for the pre-paid training sample
Table 5-24: Classification table for the unpruned decision tree for pre-paid training sample
Table 5-25: Classification table for the pruned decision tree for pre-paid training sample
Table 5-26: Classification table for the decision tree in the pre-paid testing sample

1. Introduction

Economies today are becoming primarily service-based and companies get a large part of

their revenue from creating and sustaining long-term relationships with their customers

(Kumar and Shah, 2009). Most companies are concerned with the revenue that their

customers generate, as well as the associated cost of acquiring and maintaining these

customers. One of the biggest benefits of retaining an existing customer is that the profits that

the customer generates over time tend to accelerate. One reason for this is that revenues from customers

usually grow over time. They often start using a new product or service slowly in the

beginning but as they become more accustomed to it, they use it more. Another reason is that

it is more efficient to serve old, existing customers which can reduce costs. Customers’

familiarity with the company’s products and services makes them less reliant on employees

for assistance. Satisfied existing customers also generate referrals, as they recommend

the company to others. The final reason is that in some industries, existing customers even

pay higher prices than new ones, as the new ones are often offered special trial discounts

when they start the relationship with a company. One major concern is to ascertain which of

the customers will be most profitable. Upon such discovery companies may aspire to retain

these customers for some time as repeat purchases by established customers normally require

less marketing effort, as much as 90% less, compared to new customers who are purchasing

for the first time (Berger and Nasr, 1998; Dhar and Glazer, 2003). Companies should be

aware of their customers’ worth, attempt to understand their lifetime value and in turn apply it

as a guiding concept for marketing decisions and in developing marketing strategies.

For over a decade, companies have invested vast amounts in Customer Relationship

Management (CRM) systems. These systems provide opportunities to quickly gather

information about the customers, along with identifying the most profitable ones to the

company over time. Furthermore, CRM may help companies increase loyalty among the

customers as a consequence of customization of the company’s services and products (Rigby

et al., 2002). Some of the essential metrics of CRM have been customer satisfaction,

retention, acquisition and loyalty but recently concepts like “customer lifetime value” (CLV)

and “past customer value” (Kumar and Reinartz, 2006) along with “churn” have become

centers of attention. Managing customers on the basis of customer lifetime value has become

one of the most popular and effective ways of doing business in recent years. What makes

the CLV metric so appealing is that it helps companies acquire, grow, and retain customers who are

considered profitable to the company, and foster profitable CRM through proper marketing

interventions. CLV has therefore become known as a key customer value metric that is

necessary to manage customers’ profitability. By maximizing CLV, and therefore

customer equity (the sum of the lifetime values of the company’s customers), companies can

increase their profits (Abe, 2009; Borle et al., 2008; Gupta et al., 2006; Kumar and Shah,

2009; Venkatesan and Kumar, 2004).
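
As a rough illustration of how CLV and customer equity relate, the following Python sketch sums discounted expected margins per customer and then adds the results over the customer base. It uses a common simplified form of the CLV formula, which is not necessarily the model developed later in this thesis. The function names, the fixed time horizon, and all example figures are assumptions made purely for illustration.

```python
def customer_lifetime_value(margin, churn_prob, discount_rate, periods):
    """Sum of expected, discounted margins over a fixed number of periods.

    Assumes a constant margin, churn probability and discount rate per period
    (an illustrative simplification, not the thesis's fitted model).
    """
    retention = 1.0 - churn_prob
    return sum(
        margin * retention ** t / (1.0 + discount_rate) ** t
        for t in range(1, periods + 1)
    )


def customer_equity(customers, discount_rate, periods):
    """Customer equity: the sum of the lifetime values of all customers."""
    return sum(
        customer_lifetime_value(c["margin"], c["churn_prob"], discount_rate, periods)
        for c in customers
    )


# Hypothetical customers: monthly margin and monthly churn probability.
customers = [
    {"margin": 30.0, "churn_prob": 0.02},
    {"margin": 30.0, "churn_prob": 0.10},
]
loyal, at_risk = (
    customer_lifetime_value(c["margin"], c["churn_prob"], 0.01, 24) for c in customers
)
print(loyal > at_risk)  # the customer with the lower churn probability is worth more
print(customer_equity(customers, 0.01, 24))
```

With equal margins, the customer with the higher churn probability has the lower CLV, which is why accurate churn prediction matters for the calculation.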

Companies today have vast opportunities to interact directly with customers by

collecting and mining information and subsequently tailoring their products and offerings

accordingly. Customers even expect to interact closely with the respective companies and

have some influence on the creation of the products and services which they purchase and

use. Companies wishing to stay competitive have therefore moved from simply

marketing products to the masses towards cultivating and serving their customers on a more

customized basis, resulting in maximization of customer lifetime value. Communication

consequently becomes reciprocal and is individualized or tightly targeted at narrow segments.

By promoting the company’s products or services to the customer in this manner, the

company can build long-term relationships with its customers (Rust et al., 2010). Customer

relationships evolve over time, as do the customer’s needs and wants. Companies can utilize

the information they gather and any changes therein, by providing customers with updated

offers on different products or services. The changes can for example be tracked with

demographic data and customer purchase patterns (Rust et al., 2010).

Use of interactive and database technology allows companies to accumulate a wide

range of data about individual customers’ needs and preferences. This data can then be used

to customize products and services accordingly. The more companies learn about their customers’

needs, the better they can respond to their requirements and offer exactly what customers

want, when they want it. This gives a company a great competitive advantage (Pine II et al.,

1995).

Calculating CLV can help companies find out which customers they want to build a

relationship with. Each customer has different needs and preferences, as well as a

different current and potential value to the company. Companies can divide their

customer base into groups or segments, based on customer lifetime values. These segments

range from including the most profitable customers, with whom the company should broaden

and deepen its relationship, to the least profitable ones, whom the company may wish to let go

or not focus on in particular. Segmenting the customer base in this manner makes it easier to

choose a suitable response for each segment: investing in profitable relationships to win them

back or grow them, managing costs so that lower-margin segments become worthwhile, or even

terminating customer relationships in unattractive segments (Niraj et al.,

2001; Rigby et al., 2002). Companies can use predictive modeling to identify the customers

who are most profitable, as well as those customers with the greatest profit potential and those

likeliest to cancel their accounts (Davenport, 2006). By using CLV, companies can develop

their long-term relationships with customers and define their strategies better.
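
The idea of dividing the customer base into segments by CLV can be sketched in Python as follows. This is only a minimal illustration: the use of three equal-sized bands, the segment labels, and the CLV figures are assumptions, and the segmentation actually applied in this thesis is described in its results chapter.

```python
def segment_by_clv(clv_by_customer, labels=("high", "medium", "low")):
    """Rank customers by CLV and split them into equal-sized bands."""
    ranked = sorted(clv_by_customer, key=clv_by_customer.get, reverse=True)
    n = len(ranked)
    segments = {}
    for rank, customer_id in enumerate(ranked):
        # Band 0 is the top band; len(labels) - 1 is the bottom band.
        band = rank * len(labels) // n
        segments[customer_id] = labels[band]
    return segments


# Hypothetical CLV figures per customer id.
clv_by_customer = {"A": 950.0, "B": 120.0, "C": 480.0, "D": 35.0, "E": 610.0, "F": 260.0}
print(segment_by_clv(clv_by_customer))
```

Rank-based bands are used here only because they need no further assumptions; fixed CLV thresholds or a clustering method would be equally valid choices.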

In this thesis, CLV for an Icelandic telecom will be calculated and an attempt made to

shed light on the factors that influence CLV. In the next section, background on the

telecommunications industry and the telecom will be given. The research questions are

presented in the section after that.

1.1 Telecommunications Industry

Companies offering mobile telecommunications form part of the service industry. In recent

years the telecommunications industry has been opened up by deregulation, new technologies

and new competitors, making competition in this market extremely fierce. As the markets for

mobile telecommunications in many countries are getting to the stage of maturity, the industry

is moving towards retaining existing customers instead of focusing only on attracting new

ones. Furthermore the environment of the mobile telecommunications industry has undergone

extensive changes. One of these changes is the shift of mobile

telecommunications services from voice-centered communication towards a combination

of multimedia and high-speed data communication. Further influences relate to the expansion

of the wireless Internet and the fact that customers are now able to switch mobile network

operators and still keep the same phone number they had before (mobile number portability

(MNP)). All this leads to stronger competition between companies within this industry. In

such an environment of extreme competition and rapid customer churn, an accurate

calculation of customer value and targeted customer segmentation are significant factors for

successful CRM. Consequently, its implementation requires careful consideration. Models for

customer lifetime value (CLV) can be used to identify differences in profitability

among market segments. One of the greatest influences on CLV is the churn rate,

which is something that a company can actually have an effect on. Mobile service providers

therefore pay more attention to churn prediction and management as that could help maximize

CLV. The mobile service providers should be able to predict the churn rate for individual

customers to see which subscribers are at risk of changing services and to calculate their

customer lifetime values to sort out the most valuable ones. This information can then be used

to improve customer segmentation and to develop strategies directed at

customers (Kim et al., 2004; Wei and Chiu, 2002).
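
To make the idea of predicting churn for individual customers concrete, the Python sketch below scores customers with a logistic model and flags those above a probability threshold. Logistic regression is one of the two classification methods used in this thesis, but the variables (`tenure_years`, `complaints`), the coefficients, and the threshold shown here are hypothetical and are not taken from the fitted models.

```python
import math


def churn_probability(features, coefficients, intercept):
    """Logistic model: P(churn) = 1 / (1 + exp(-(intercept + sum(b_i * x_i))))."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def customers_at_risk(customers, coefficients, intercept, threshold=0.5):
    """Return the ids of customers whose predicted churn probability meets the threshold."""
    return [
        customer_id
        for customer_id, features in customers.items()
        if churn_probability(features, coefficients, intercept) >= threshold
    ]


# Hypothetical coefficients and customers, for illustration only.
coefficients = {"tenure_years": -0.8, "complaints": 1.2}
customers = {
    "A": {"tenure_years": 6.0, "complaints": 0.0},
    "B": {"tenure_years": 0.5, "complaints": 2.0},
}
print(customers_at_risk(customers, coefficients, intercept=-0.5))
```

The predicted probabilities can then feed into a CLV calculation, so that customers who are both valuable and at risk can be prioritized for retention efforts.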

1.1.1 Telecommunications Industry in Iceland

Companies in telecommunications in Europe have undergone extensive transformations since

the 1980s, primarily due to the deregulation and liberalization of the European

telecommunications market. They have gone from being public monopolies, owned and

governed by the state, to being privatized and market driven (Eliassen and From, 2007). This

liberalization began somewhat later in Iceland, from the late 1990s to the early 2000s. In 2011 five

telecoms provided mobile phone services in Iceland, both fixed (post-paid) and pre-paid

subscriptions. They are Siminn hf., Fjarskipti ehf. (Vodafone), Nova ehf., IP-fjarskipti ehf.

(Tal) and Alterna Tel. ehf. Overall, at the end of 2010 there were 375,430 mobile

subscriptions, an increase of more than 15% since 2008.

Table 2-1 shows the development of the market share of each of the telecommunications

companies in the mobile phone market in Iceland in 2008, 2010 and 2012.

Table 2-1: Market share in the mobile phone market in Iceland

Telecommunications company    Market share
                              2008      2010      2012
Siminn                        51.6%     41.8%     37.4%
Vodafone                      34.9%     30.9%     28.9%
Nova                          8.2%      22%       28.3%
Tal                           5.4%      4.5%      5.0%
Alterna                       ...       0.8%      0.4%

Table 2-1 shows the overall market share for the five telecoms in the mobile phone

market in Iceland. As the table shows, Siminn’s market share has decreased from 51.6% in

2008 to 37.4% in 2012. Vodafone and Tal have also lost market share, although Tal regained

some of its share between 2010 and 2012. At the same time Nova, which targets young people,

has increased its market share substantially, from 8.2% to 28.3%. Table 2-2

shows the telecoms’ market share for post-paid subscriptions and for pre-paid

subscriptions in Iceland in 2008 and 2012 (Post- and Telecom

Administration, 2010 and 2012). In the post-paid mobile phone market, Siminn had the largest

market share in 2012, at 48.3%, down from 54.0% in 2008. Over the same period,

Nova more than doubled its market share. Tal also gained some market share, while

Vodafone, like Siminn, lost share.

Table 2-2: Market share in the post- and pre-paid mobile phone markets in Iceland in 2008 and 2012

Telecommunications company    Market share in post-paid subscriptions
                              2008      2012
Siminn                        54.0%     48.3%
Vodafone                      37.1%     33.7%
Nova                          4.8%      11.6%
Tal                           4.0%      5.6%
Alterna                       ...       0.8%

In the post-paid subscriptions market, Siminn and Vodafone have strong market

positions and can be regarded as market leaders. Nevertheless, both telecoms have, as stated

above, lost some of their market share to Nova and Tal. In the pre-paid mobile phone market

(see Table 2-2, pre-paid subscriptions), Siminn no longer has the market leading position. Nova

is now the market leader with 49.3%, up from only 12.5% in 2008. Siminn has a 23.2% market

share, which is down from 48.4% in 2008. The market share for both Vodafone and Tal has

also decreased since 2008 (Post- and Telecom Administration, 2010 and 2012). Competition

now seems to be strong between the three largest telecoms: Nova, Siminn, and Vodafone,

which used to be second. These figures show that competition in the mobile phone market

has changed rapidly in recent years, going from a near-duopoly to a more

competitive environment. At the beginning of 2011, a new telecommunications company,

Hringdu, was established, making the competition even fiercer. Telecom X also operates under

restrictions. It is, for example, prohibited from bundling its products and services,

meaning it cannot offer more than one product

or service together as one combined product or offer a discount on one product if another one

is bought simultaneously. There are further restrictions on offering valuable customers special

offers or advertising special packages of products or services, making it more difficult for the

telecom to market its products and grow its business. Another fact that sets the

telecommunications industry in Iceland apart from that of neighboring countries is that in

Iceland companies do not apply binding contracts. This is not a consequence of legal

requirements, but rather an example of development spurred by the strong competition within

the local market. The outcome is that customers do not have to sign a contract binding them

to one telecom for any given time period. Customers can therefore switch telecom

providers whenever they choose, perhaps making them even less loyal, as those who seek

good deals will have a higher probability of churning. New customers tend to be more prone

to be lost within the first few years. The customers who switch providers every few years are

more likely to be younger, less-established households, with fewer relationships with the

company and fewer total products. This is in line with current developments at the telecom.

Table 2-2 (pre-paid subscriptions): Market share in the pre-paid mobile phone market in Iceland in 2008 and 2012

Telecommunications company    Market share in pre-paid subscriptions
                              2008      2012
Nova                          12.5%     49.3%
Siminn                        48.4%     23.2%
Vodafone                      31.9%     23.2%
Tal                           7.1%      4.2%
Alterna                       ...       0.2%

1.1.2 The Icelandic Telecommunications Company

This research project is conducted for Telecom X. It offers a full range of

telecommunication services, including telephone, mobile phone, television and Internet

subscriptions.

The consumer market in Iceland is small in general, with just over 318,000

people living in the country (Statistics Iceland, 2011), making competition in any industry fierce

and difficult. For this reason, companies have to both hold on to their existing customers and

try to attract new ones. In Iceland five telecoms provide mobile phone service and there are

375,430 mobile subscriptions (Post- and Telecom Administration, 2010). The number of

telecoms is similar to that in the other Nordic countries, where the population, on the other

hand, ranges from 4 to 10 million inhabitants per country. In an attempt to acquire

new customers, telecoms in Iceland have contacted customers directly who have a

subscription with a competitor and offered them deals in order to entice them to switch. This

method has in turn resulted in disloyal customers, who seem to leave after a short period of

time, following cheaper offers from other competitors. However, this method of marketing is

less practiced nowadays as it has been shown to be ineffective. Advertising campaigns are

also frequent, especially in the market for young customers.

In late 2009 the telecom introduced a pre-paid card service especially aimed at

younger people. It had seen its market share in the 16-34 age group decrease since the

beginning of 2009, most likely because of the marketing activities of competitors like Nova and

Tal. This age group is amongst the most valuable customers, since they both talk more and

send text messages more frequently compared to older age groups.

1.2 Research Questions

This research project is concerned with evaluating customer churn and then using those

results among other components to calculate the customer lifetime value for customers at the

telecom.

In this research, the aim is to answer the following questions:

Marketing research problem

Is Customer Lifetime Value useful for a mobile phone provider?

The Research Questions

1. Which factors have an effect on the customer lifetime value of mobile phone

customers?

2. Which factors have an effect on the churn probability of mobile phone customers?

1.3 Structure of the Thesis

This thesis consists of six chapters. The next chapter discusses the theoretical framework

related to the concepts that are evaluated in this research and will be used to construct the

models. A conceptual framework will be presented along with hypotheses. The research

design is outlined in chapter 3, where the research method, data collection and plan of

analysis are described. Chapter 4 describes the data preparation, where the sampling and time

aspect of the thesis are structured. The independent variables are then listed and described.

The results of the analysis are provided in chapter 5, first from the churn analysis and then

secondly from the CLV calculations. Conclusions and recommendations based on the results

follow in chapter 6.

2. Theoretical Framework

The first section of this chapter reviews the literature related to the concepts of

customer lifetime value and churn. In addition, hypotheses will be formulated which are then

used to build the conceptual framework.

2.1 Customer Lifetime Value

Marketing is more or less about attracting customers who are profitable and keeping them. It

is not advisable for a company to try to pursue and satisfy every single customer; instead, it

should concentrate on those customers who generate revenue for the company and are likely

to stay for a while. A customer is profitable when the revenue that comes from that

person, household or company exceeds the company’s customer-related costs of

attracting, selling to and serving that customer. This excess revenue is called customer lifetime

value (Berger and Nasr, 1998). Customer lifetime value has been defined in several

studies. It is the present value of all future profits obtained from a customer over

the life of the relationship with a company. CLV can be generally defined as the total net profit a

company can expect from a customer over their lifecycle (Gupta et al., 2006; Gupta and

Lehmann, 2003; Kumar and Shah, 2009; Niraj et al., 2001; Novo, 2004). Long-lifetime

customers have for a while been considered to be more profitable to a company. This

approach is customer-centric and treats customers as assets and focuses both on acquiring as

well as retaining customers. The customers who are retained can then form a basis of

sustained competitive advantage (Jain and Singh, 2002). Companies’ actions in marketing

have an influence on the behavior of customers, like acquisition, retention and cross-selling.

This then affects the CLV of customers or their profitability to a company (Gupta et al.,

2006).

CLV is becoming increasingly important as a marketing metric, both in academic

research and practice. Many international companies such as IBM, ING, and Capital One are

using CLV as a tool to measure and manage the success of their business. There are a number

of factors that might explain the increasing interest in this concept. In the first place, to show a

return on marketing investment, it is not enough to have marketing metrics like brand

awareness, attitudes or even sales and share. According to Blattberg et al. (2001), customers

are not all equally profitable so they suggest that companies might either terminate the

relationship with some customers who turn out to be unprofitable or allocate different

resources to different groups of customers depending on their profitability. This is impossible

with financial metrics like aggregate profit and stock price of a company. Even if these

measures are practical, they have limited diagnostic capability. CLV is on the other hand a

disaggregate metric and can therefore be used for the purpose of identifying profitable

customers and allocation of resources (Gupta et al., 2006; Kumar and Reinartz 2006).

Today the focus of marketing has gone from being product driven to being customer

driven (Rust et al., 2000). Companies increasingly get their revenue from creating and

nourishing long-term relationships with their customers, especially as modern economies

become largely service-based. Marketing should therefore work on achieving maximum

customer lifetime value and customer equity, which is the sum of the lifetime values of the

company’s customers, minus their acquisition and retention costs (Gupta et al., 2006;

Hanssens et al, 2008). CLV models are useful for market segmentation and the allocation of

marketing resources for acquisition, retention and cross-selling. Not all customers have the

same value to a company, which demonstrates the need to end relationships with unprofitable customers or

allocate resources differently. CLV of current and future customers is also a good proxy of

overall firm value (Gupta et al., 2006; Hwang et al., 2004). By understanding the factors that

have an influence on the lifetime value of customers, companies can use that knowledge when

developing strategies such as loyalty programs and cross-selling (Kumar et al., 2004).

Companies increasingly look at customers in terms of their lifetime value, or the net

present value of customers’ profit over a specific number of months. CLV is a robust and

clear-cut measure that shows the profitability and possibility of churn at an individual

customer level (Lu, 2003). Companies can use customer lifetime value to develop customer

loyalty and customer acquisition programs as well as treatment strategies for their existing

customers to maximize customer value. For those customers who are newly acquired,

companies can use customer lifetime value to develop strategies to grow the right customers

(Berger and Nasr, 1998; Davenport, 2006; Lu, 2003; Schweidel et al., 2011). Regarding the

calculation of CLV, two types of context are usually distinguished. The first is the

“non-contractual” context, where customer defection is not observed by the company and the

relationship between customer purchase behavior and CLV is unclear. Consequently, a longer

customer lifetime does not automatically mean a higher CLV, as customers divide their

spending among many companies, which makes it more difficult to predict future purchases. The

second is the “contractual” context (like a mobile phone subscription), where it is possible to detect

customer defection and a longer customer relationship may mean that a customer will have a

higher CLV due to increased cumulative profits (Bolton, 1998; Borle et al., 2008; Reinartz

and Kumar, 2000, 2003). Other concepts that are used to categorize customers are “lost-for-

good” and “always-a-share”. In the former case, a customer is considered to be loyal and

committed to one company, which is similar to contractual circumstances. If lost customers

return to a company they are treated as new ones. A customer retention model is used to

calculate CLV where a retention rate is estimated based on historical data. The retention rate

(equal to 1 minus the churn rate) is the probability that a customer will continue the relationship

with a company. In the case of “always-a-share”, customers can easily switch between

companies and do not give any one company all of their business. This is equivalent to non-

contractual circumstances. A customer migration model is used in these situations to calculate

CLV where the recency of last purchase is applied in order to predict the probability that a

customer will make a repeat purchase in a period (Berger and Nasr, 1998; Rust et al., 2004).

In the case of this telecom, the customer relationships are of a contractual nature and therefore

can be regarded as “lost-for-good” if customers leave. However, the contracts do not specify a duration,

since customers of Icelandic telecoms are not bound for a specific time as is the custom in

many other countries. They can therefore terminate the relationship whenever they want.

Customer lifetime value is calculated differently across industries. The

telecommunications industry has a highly competitive market where customers can choose

between multiple service providers and also vigorously exercise their rights of switching from

one service provider to another. Customers request tailored products along with better

services at lower prices, while service providers focus on customer acquisition as their

business goal. On average, the telecommunications industry faces an annual churn rate of 20-40%,

and Lu (2003) stated that recruiting a new customer costs 5-10 times more than retaining an

existing customer. Existing customers are also more likely to generate

more cash flow and profit as they are less sensitive to price. This has resulted in companies’

greater concentration on customer retention (Lu, 2003; Eiben et al., 1999; Ahn et al., 2006).

One of the main concerns for operators is therefore to retain highly profitable customers by

setting up strategies and processes to keep them longer by presenting them with tailored

products and services (Lu, 2003). With the increasing maturity of the telecommunications

market, it is not enough anymore for the telecoms to predict customer churn. Therefore they

have proceeded with examining customers in terms of customer lifetime value. Telecoms now

differentiate both between which customers stay longer and those who stay shorter, as well as

between those who are highly profitable and those who are less profitable or not at all (Lu,

2003).

A company can build a customer database if it wants to focus on establishing long-

term relationships with its customers. With the database, the company can identify its

customers, track their transactions and even predict changes in their purchase patterns at an

individual level. The information in the databases about customers’ purchase patterns can also

be analyzed to target and retain the right customers and distinguish between active and

defected customers (Batislam et al., 2007).

2.1.1 CLV Model

The CLV model consists of three elements. These are a discount rate, customer churn and

margin. These elements will be discussed later in the chapter but first, the CLV model used in

this research is shown and explained.

One of the difficulties regarding the prediction of CLV is that there are many models

and approaches to apply and they depend also on the industry within which the company

operates. The life circumstances of customers also change along with their preferences which

can then have an effect on purchasing behavior over different periods. Therefore the length of

the period under consideration has to be decided on (Ryals, 2002). Unlike the discounted cash

flow approach which is used in finance, CLV can be estimated on the individual customer or

segment level. The strength of the telecom’s dataset is that longitudinal transaction data is

available for each customer of this company. This makes it possible to calculate CLV at the

individual customer level and uncover the customer-centric measures that drive CLV (Kumar

and Shah, 2009).

As noted earlier, there are many studies on calculating customer value. For the

purpose of this research, the following CLV model, proposed by Gupta and Lehmann (2003) and

Gupta et al. (2006), will be used. This model is based on a model by Berger and Nasr (1998)

and its use is quite straightforward. This is also an advantage as it could be used again by

the marketing personnel at the telecom and other variations of the model can be used based on

the specific task at hand and availability of data.

The model is shown in Equation (2-1).

$$\mathrm{CLV} = m \cdot \frac{r}{1 + d - r} \qquad \text{(2-1)}$$

where,

m    margin (ARPU)
d    the discount rate (WACC)
r    the retention rate or 1-churn
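
As a rough illustration of how the three elements combine, the sketch below assumes the closed form of Gupta and Lehmann (2003), CLV = m · r / (1 + d − r), which holds for a constant margin, a constant retention rate and an infinite horizon. The function name and the input figures are hypothetical and are not taken from the telecom's data; m, d and r must all refer to the same period length.

```python
def customer_lifetime_value(margin, discount_rate, retention_rate):
    """Return CLV = m * r / (1 + d - r).

    Assumes the Gupta and Lehmann (2003) closed form with a constant
    margin m, discount rate d and retention rate r per period.
    """
    if not 0.0 <= retention_rate <= 1.0:
        raise ValueError("retention_rate must lie in [0, 1]")
    denominator = 1.0 + discount_rate - retention_rate
    if denominator <= 0.0:
        raise ValueError("1 + d - r must be positive")
    return margin * retention_rate / denominator


# Hypothetical customer: margin of 100 per period, 10% discount rate,
# 80% retention rate (i.e. a churn probability of 20%).
example_clv = customer_lifetime_value(margin=100.0,
                                      discount_rate=0.10,
                                      retention_rate=0.80)
```

The ratio r / (1 + d − r) acts as a multiplier on the margin, so a higher churn probability lowers CLV directly through r.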

2.1.2 Margin

Margin often refers to the net profit of a company (revenue minus costs) divided by revenue.

However, in this case the costs are unknown so the metric used in this research will be

Average Revenue per User (ARPU). It represents the total revenue a telecom receives

in a month divided by the number of subscribers. It is frequently used by industry observers

and regulators to evaluate the performance of the mobile telephone market (McCloughan and

Lyons, 2006).

2.1.3 Discount Rate

As with the calculation of CLV, there are different ways of calculating the discount rate. For

this research, the most common method was chosen.

Weighted-Average Cost of Capital

The discount rate used in the CLV model is the weighted average cost of capital (WACC).

The cost of capital for a company is defined as the opportunity cost of capital for the

company’s existing assets. It is used in finance to value new assets that have the same risk as

the old ones. Therefore weighted-average cost of capital is a method of assessing the company

cost of capital and it also incorporates an adjustment for the taxes a company saves when it

borrows (Brealey et al., 2004). This means that WACC is the “expected rate of return on a

portfolio of all the firm’s securities, adjusted for tax savings due to interest payments.”

(Brealey et al., 2004, p.325). This measurement is recommended to be used in calculating

CLV (Ryals and Knox, 2007). Each category of capital has to be proportionately weighted to

attain the WACC. Included in the calculation of WACC are all capital sources (e.g. bonds,

common stock, preferred stock and any other long-term debt). It is calculated by multiplying

the cost of each capital component by its proportional weight and then summing (Brealey et

al., 2004). The equation is as follows:

$$\mathrm{WACC} = \frac{D}{V} \cdot R_d \cdot (1 - T_c) + \frac{E}{V} \cdot R_e \qquad \text{(2-2)}$$

where,

D      market value of the company’s debt
E      market value of the company’s equity
V      E + D
D/V    percentage of financing that is debt
E/V    percentage of financing that is equity
Rd     cost of debt
Re     cost of equity
Tc     corporate tax rate
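
The calculation can be sketched as follows, assuming the textbook after-tax form WACC = (D/V) · Rd · (1 − Tc) + (E/V) · Re with a single class of debt and a single class of equity. The function name and all figures are hypothetical; they are not Telecom X's actual capital structure.

```python
def weighted_average_cost_of_capital(debt, equity, cost_of_debt,
                                     cost_of_equity, tax_rate):
    """Return WACC = (D/V) * Rd * (1 - Tc) + (E/V) * Re, with V = D + E."""
    total_value = debt + equity
    if total_value <= 0:
        raise ValueError("debt + equity must be positive")
    debt_weight = debt / total_value      # D/V
    equity_weight = equity / total_value  # E/V
    after_tax_cost_of_debt = cost_of_debt * (1.0 - tax_rate)
    return debt_weight * after_tax_cost_of_debt + equity_weight * cost_of_equity


# Hypothetical firm: 40 of debt at 6%, 60 of equity at 12%, 20% tax rate.
example_wacc = weighted_average_cost_of_capital(debt=40.0, equity=60.0,
                                                cost_of_debt=0.06,
                                                cost_of_equity=0.12,
                                                tax_rate=0.20)
```

Only the debt term is multiplied by (1 − Tc), which reflects the tax saving on interest payments described above.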

2.1.4 Retention rate (1-Churn)

Retention rate is the third and last element in the CLV model used in this research. The

retention rate is the probability of a customer being “alive” or staying with a company. This is

the same as 1-churn which is one of the key elements to calculate CLV. Therefore, it is

important to have accurate predictions of churn probabilities, especially if CLV is to be used

for allocating marketing resources (Risselada et al., 2010). Customer churn, which is the

propensity of customers to cease doing business with a company in a given time period, has

become a significant problem for many companies (Neslin et al., 2006). Wei and Chiu (2002)

describe subscribers churning in mobile phone telecommunications as subscribers transferring

from one telecommunications company to another. Customers often churn from one company

to another, searching for better rates or services. Corporations in the United States of America

lose on average half of their customers every five years. Most of these corporations have

little insight into why customers defect and can therefore do little or nothing about it. They do

not measure customer defections, make little attempt to prevent them from defecting and do

not use the defections as a guide for improvements. By examining the cause of customer

defections, companies can detect business practices that need to be dealt with and

sometimes even win back lost customers and reestablish the relationship on firmer ground

(Reichheld, 1996). Companies have conventionally given the most attention to acquiring

customers, both those who have never bought the product before and those who are presently customers of

a competitor. Many companies have now started focusing on customer retention, where they

design their strategies to hold on to their current customers (Winer, 2001).

In the telecommunications industry, churn refers to subscribers moving from one

company to another. Subscribers tend to look for better rates or services so many of them

churn recurrently, moving between providers (Wei and Chiu, 2002). Customer churn directly

determines how long a customer stays with a company and thereby influences the

future profit generated for the company and, in turn, the customer’s lifetime value to

that company. It is therefore very important to take churn into account in the CLV model (Neslin et

al., 2006; Hwang et al., 2004). Wheaton (2000) wrote in his article about CLV for bank

customers that it is more profitable for a company to retain a mature, high-balance account

than to acquire a new, lower-balance account. New accounts tend to be more prone to be

lost in the first few years. Customers who switch accounts every few years are more likely

to be younger, less-established households that buy fewer products from the company. This is

in line with what is happening at the telecom. Customers who have subscribed in the last

few years are more likely to churn and go elsewhere.

Untargeted and targeted approaches are the two basic approaches to manage customer

churn. The untargeted approaches rely on a superior product and mass advertising to retain

customers and improve brand loyalty. With targeted approaches however, the customers who

are likely to churn must be detected. They should be provided with either a direct incentive or

customized service plan to stay with the company. An example would be to segment their

telecommunications calling behavior and provide them with market competitive service plans

(Neslin et al., 2006). There are two types of targeted approaches, reactive and proactive. With

the former type, the company does not do anything until the customer contacts it to

cancel his account. Then the company makes the customer an offer to stay. With the latter

type, the company first attempts to identify the customers who are likely to churn in the

future. These customers are then targeted with special programs or incentives to prevent them

from churning. Targeted proactive programs therefore have the possible advantages of lower

incentive costs and the customers who are at risk of churning will not get accustomed to

negotiating for better deals in order to stay with the company as they would with a reactive

approach (Neslin et al., 2006). Reichheld (1996) argues that a remarkable increase in profits

could derive from small increases in customer retention rates. A company that manages to

retain 5% more customers can improve the bottom line by 25-80%. And the increase of

customer retention by just 2% has the same effect as a cost reduction of 10% (Roofthooft,

2010).
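
The sensitivity of customer value to retention can be illustrated with a small calculation. The sketch below is not a reproduction of Reichheld's (1996) analysis; it assumes the closed form CLV = m · r / (1 + d − r) of Gupta and Lehmann (2003), reads "5% more customers" as five percentage points of retention, and uses made-up inputs.

```python
def clv(margin, discount_rate, retention_rate):
    """Assumed closed form CLV = m * r / (1 + d - r)."""
    return margin * retention_rate / (1.0 + discount_rate - retention_rate)


# Hypothetical inputs: margin of 100 per period and a 10% discount rate.
MARGIN, DISCOUNT = 100.0, 0.10

base = clv(MARGIN, DISCOUNT, 0.80)      # retention rate of 80%
improved = clv(MARGIN, DISCOUNT, 0.85)  # retention raised by 5 points
relative_gain = improved / base - 1.0

print(f"CLV at r = 0.80: {base:.2f}")
print(f"CLV at r = 0.85: {improved:.2f}")
print(f"Relative gain:   {relative_gain:.1%}")
```

With these illustrative inputs the value per customer rises by 27.5%. Because r appears in both the numerator and the denominator, a small change in retention produces a more than proportional change in CLV.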

2.1.4.1 Customer Churn Determinants

As with the CLV, there are several factors that have an influence on the churn rate. These

factors will now be discussed and hypotheses formulated. The hypotheses are stated for post-paid

and pre-paid customers separately because, first of all, different features are available for each

type of subscription. There is, for example, information on the number of products or services

bought by post-paid customers and the amount and frequency of refill for the pre-paid

customers. Another reason is that, as shown in section 1.1.1, there has been much more churn

among pre-paid customers at the telecom as its market share has decreased significantly in the

last few years. Therefore, pre-paid customers will probably have higher predicted probability

of churn which then leads to lower CLV. Pre-paid customers most likely have lower margin

or ARPU as one can imagine they use their phone as little as possible to save their pre-paid

credit or have friends within the same network which they can call for free.

Customer Satisfaction, loyalty and relationship length

Whether customers are satisfied with a company or not hinges on how they evaluate the

overall experience of their purchase and consumption and also on how the customers perceive

the quality of the services. Satisfaction has become known, along with loyalty, as a strong predictor of

customer churn (Eshghi et al., 2007; Seo et al., 2008). Satisfaction has been shown to be a

strong predictor of loyalty, especially in the service sector, including wireless service

providers (Gerpott et al., 2001; Kim and Yoon, 2004). This emphasizes the significance of

both customer satisfaction and loyalty to companies’ survival and growth in the long-term

(Edvardsson, et al., 2000; Eshghi et al., 2007). Satisfaction of mobile phone customers can be

related to several factors, one of which is the length of the relationship between the customer

and the service provider. The longer the duration of the relationship, the more experience and

knowledge the customer has about the service provider. This means higher switching costs

because if customers switch service provider, they have to give up their familiarity with the

provider’s features and have to adapt to different features with the new provider (Seo et al.,

2008). Longer customer relationships also indicate greater customer satisfaction (Reinartz and

Kumar, 2003). As customers get accustomed to the service offered by the provider and know

what they can expect, they get more satisfied than they would be with an unfamiliar provider

in a new relationship (Bolton, 1998). Therefore, the following hypothesis is formulated:

H1: Length of customer relationship has a negative effect on (a) post-paid and (b) pre-

paid customer churn probability.

Level of Service Usage

Monthly charge, unpaid balances, number of calls, and minutes of monthly use are some of

the service usage factors that have been used in previous studies (Keramati and Ardabili,

2011). These factors will be used in this research along with number of text messages sent as

a measure of the level of usage by each customer. Ahn et al. (2006) showed that usage is

positively related to churn, meaning that heavy users are more likely to churn. Therefore the

following hypothesis is stated related to the level of usage:

H2: Level of usage has a positive effect on (a) post-paid and (b) pre-paid customer churn

probability.

Customer Demographics

The customer demographic variables taken into account are age, gender, marital status, and

geographic area of residence. It is not quite clear how these demographics are related to

customer churn probability. As mentioned earlier, Wheaton (2000) suggested that younger

customers are more likely to churn than older ones. At the telecom, younger customers might

either be following friends who move to another telecom, or be less loyal and tend

to take cheaper offers when they can or follow new trends. A study by Seo et al. (2008) showed

that older customers are more likely to stay with the same provider so the following

hypothesis is stated:

H3: Age has a negative effect on (a) post-paid and (b) pre-paid customer churn

probability.

2.2 Segmentation

In marketing, a segment is a significant concept. Segmentation has become more efficient

with the development of database marketing techniques, along with CLV and churn

prediction. There are many ways of segmenting the customer database. One option is to

segment it based on CLV, where the customer base is sorted in descending order by value

and then split into ten equal segments. The most profitable customers are in one

segment (usually the top 10%), the second highest group of customers in another segment (the

next 10%) and so on until there is a segment with the most unprofitable customers. A segment

represents a set of customers who will be treated as one unit for planning, carrying out and

inspecting the results of marketing campaigns. A segment is generally considered to be

“homogeneous”, meaning that the customers in it are similar, at least for the examination of a

property or the planning of a campaign (Rosset et al., 2003).

When the CLV has been calculated for the customers, companies can aggregate the

customers to almost any number of discrete segments which can then be used for example to

develop acquisition or retention strategies that are relevant and cost effective. Companies that

have a large number of customers with small sales to each customer could benefit from

models that help segmenting the customer base based on customer lifetime (Jain and Singh,

2002; Kumar and Shah, 2009). Segments with customers who have medium but stable

profitability could add a higher potential value to the company than customers who are highly

profitable but have a high risk of churning in the future.

Marketers are interested in the differences between consumers, which can vary

considerably. These differences can be based on, amongst other factors, geography,

demographics, personality, lifestyle, psychographics, behavior, decision-making processes,

purchasing approaches and situation factors. The fact that these differences exist makes it

important for a company to develop market segmentation strategies as it is believed to be

more profitable to treat specific types of customers in differing ways rather than treating them

all the same. The customers with a mobile phone subscription at the telecom will be

segmented by separating them into ten deciles based on their individual CLV. The 1st decile

includes the top 10% most valuable customers at the telecom according to their CLV and the

10th decile includes the 10% least valuable customers.
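
The decile segmentation described above can be sketched as follows. The function name, the customer ids and the CLV figures are hypothetical; ties are broken by input order, and with fewer than ten customers some deciles stay empty, which is an assumption of this sketch rather than a rule stated in the thesis.

```python
def assign_deciles(clv_by_customer):
    """Map each customer id to a decile from 1 (highest CLV) to 10 (lowest).

    Customers are sorted in descending order of CLV and the ranked list
    is cut into ten groups of (nearly) equal size.
    """
    ranked = sorted(clv_by_customer, key=clv_by_customer.get, reverse=True)
    n = len(ranked)
    return {customer_id: (rank * 10) // n + 1
            for rank, customer_id in enumerate(ranked)}


# Hypothetical customer base of 20 customers with CLV values 1.0 to 20.0.
example_clv = {f"c{i}": float(i) for i in range(1, 21)}
example_deciles = assign_deciles(example_clv)
```

With 20 customers, each decile holds two of them: the two highest-value customers fall in the 1st decile and the two lowest in the 10th.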

2.3 Conceptual Model

The conceptual model (see Figure 2-1) is built on the hypotheses in the previous section.

[Figure 2-1: Conceptual model of the Customer Lifetime Value. Customer satisfaction (H1), level of service usage (H2) and age (H3) influence the churn rate; the churn rate, ARPU and the discount rate determine customer lifetime value (CLV), which is then used for segmentation.]

The conceptual model shows how customer satisfaction, loyalty and length of relationship,

level of service usage and customer age affect churn. The margin (ARPU), discount rate

(WACC) and churn rate are then used to calculate the customer lifetime value for the

telecom’s customers, which in turn can then be used to segment the customer database.

2.4 Summary

This chapter describes the situation in the telecommunications industry in Iceland and the

harsh competition in this industry with the arrival of new competitors. The main concept of

the thesis, Customer Lifetime Value, is also covered in this chapter. This concept has been the

focus of many companies in the service industry all over the world and is getting increasing

attention. Companies seek to find out which of their customers have the most value for them

and can then use that information to tailor their product selection to the customers’ wants

and needs or to retain those customers who are most at risk of churning.

The CLV model used in this thesis is outlined and its elements explained. Customer

churn is the most important part of this model but at the same time the most difficult to

calculate. There are several determinants of churn which have either a positive or negative

influence on the churn rate and hypotheses are formulated about the determinants. CLV can

then be used to segment the customers for a better overview of the most valuable customers.

Finally, the conceptual model is defined.

3. Methodology

3.1 Research Design

The research design and related important issues are discussed in this chapter. This research is

quantitative as hypotheses formulated in the previous chapter will be tested with numerical

data from a customer database owned by the telecom. The sample is described briefly along

with the variables in the analysis. Section 3.4 describes the plan of the analysis, where the

classification methods used for the churn analysis are explained.

3.2 Sample

The objective was to attain a sample that consists of mobile phone customers at the telecom

that is heterogeneous in terms of gender and age. Two datasets were constructed for the

quantitative analysis: one consisted of just over 33,000 randomly chosen customers with a

post-paid mobile phone subscription (subscription paid at the end of the month) at the telecom,

and the other consisted of around 22,000 customers with a pre-paid mobile phone subscription

(where customers have to buy a recharge when the previous one runs out).

The data used for this research is panel data containing usage histories of mobile

phone subscriptions. The datasets were based on the customer database and call log provided

by the telecom and are aggregated monthly. Each sample data set is divided into two parts,

a training set and a validation (testing) set, before the analysis is executed. The models are first

developed on the training set, and the resulting probability models are then validated by applying the

estimated equation to the testing set. For the post-paid sample, a training sample of 4379 customers was

obtained and a testing sample of 28737 customers. For the pre-paid sample, a training sample

of 5995 customers was obtained and a testing sample of 15906 customers. The samples are

described in more detail in Section 4.1.
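
A split into training and testing sets of the reported sizes can be sketched as below. How the two sets were actually drawn is described in Section 4.1 and is not reproduced here; a simple random split with a fixed seed is assumed purely for illustration, and the customer ids are placeholders.

```python
import random


def split_sample(customer_ids, n_train, seed=0):
    """Randomly split customer ids into a training list and a testing list."""
    ids = list(customer_ids)
    if not 0 <= n_train <= len(ids):
        raise ValueError("n_train must be between 0 and the sample size")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]


# Post-paid sample sizes reported in the text: 4379 training + 28737 testing.
postpaid_ids = range(4379 + 28737)  # placeholder ids, not real customers
train_ids, test_ids = split_sample(postpaid_ids, n_train=4379)
```

Keeping the two lists disjoint ensures that the model is validated on customers it has not seen during estimation.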

3.3 Variables

There are several variables in the dataset. The dependent variable in the churn analysis is

churn probability. For post-paid customers, a customer is defined as a churner when he or she

switches telecoms. For pre-paid customers, a customer is defined as a churner when he or she

switches telecoms or has not used the number or made a refill for three consecutive months.

The independent variables are related to the mobile phone customers and can be divided into

five categories: customer demographics, billing data, refill history (applies to pre-paid

customers only), calling pattern, and call detail records billed (dcr billed). A list of these

variables can be seen in Table I-1 in Appendix I. The data did not include any previous

targeted marketing efforts or information about competition efforts. Various demographic

variables will be used as control variables in the analysis to see whether they have an effect on

churn or not. These variables include gender, marital status, family size and rate plan among

others. This is discussed further in Section 4.3.
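
The churn definitions given at the start of this section can be expressed as a labeling rule. The sketch below assumes that a pre-paid month counts as inactive when the number was neither used nor refilled; the function names and the input layout (one boolean per month) are hypothetical and do not describe the telecom's database.

```python
def is_postpaid_churner(switched_telecom):
    """A post-paid customer churns when he or she switches telecoms."""
    return bool(switched_telecom)


def is_prepaid_churner(switched_telecom, monthly_activity, window=3):
    """Label a pre-paid customer as a churner.

    monthly_activity holds one boolean per month, True if the number was
    used or refilled in that month (assumed reading of the definition).
    The customer churns on switching telecoms or after `window`
    consecutive inactive months.
    """
    if switched_telecom:
        return True
    inactive_run = 0
    for active in monthly_activity:
        inactive_run = 0 if active else inactive_run + 1
        if inactive_run >= window:
            return True
    return False
```

For example, under these assumptions a pre-paid customer with the activity pattern active, inactive, inactive, inactive would be labeled a churner, whereas one with two inactive months followed by an active month would not.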

3.4 Plan of Analysis

As discussed in the previous chapter, the individual CLV model consists of three elements.

These are the discount rate, the margin/profit and the churn probabilities. The methods used to

calculate these elements will be discussed in the following sections. The margin (ARPU) is

explained first, then the discount rate, and then the churn model together with the two

classification methods used to predict churn. The method for calculating the CLV is discussed

briefly, and finally the method for segmenting the mobile phone customer base. The data in

this research were analyzed using the Statistical Package for the Social Sciences

(SPSS 19).

The mobile telecommunications market is divided into business and residential

customers. For this research, the business customers are excluded given that they primarily

use mobile services to earn income and they usually do not decide themselves whether to sign

or extend a subscription contract. Since there is much less available information about pre-

paid customers than post-paid customers, there will be separate analyses for these two groups.

Pre-paid customers are not required to give up their name or any other personal information

so usually the available information is restricted to customer behavior like mobile phone

usage.

3.4.1 Average Revenue per User (ARPU)

As stated in Section 2.2.2, ARPU is calculated each month. For this research, the ARPU is

calculated by summing the total charges paid by a customer over the three-month

observation period and dividing by three. This is done for both the post-paid and pre-paid

samples.
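The ARPU calculation described above can be sketched in a few lines. This is a minimal illustration only: the function name `arpu` and the example charges are hypothetical and are not taken from the Telecom X data.

```python
def arpu(monthly_charges):
    """Average revenue per user over the observation period.

    monthly_charges holds the total charge for each month of the
    observation period (three values for a three-month period).
    """
    if not monthly_charges:
        raise ValueError("at least one monthly charge is required")
    return sum(monthly_charges) / len(monthly_charges)


# Hypothetical charges for one customer (illustrative values only)
example_charges = [4200.0, 3900.0, 4500.0]
example_arpu = arpu(example_charges)
```

The same function applies to post-paid and pre-paid customers, as long as the total charge per month is available for each customer.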


3.4.2 The Discount Rate (WACC)

The most recent figure for the weighted average cost of capital (WACC) at Telecom X will be

used for the calculation of CLV.

3.4.3 Churn Analysis

Two methods will be used to predict customer churn: logistic regression and classification

trees. These methods have both been widely studied and have good predictive performance

(Neslin et al., 2006; Risselada et al., 2010).

Logistic Regression

Binomial logistic regression was conducted to test the hypotheses formed in chapter 2. This

type of regression has been broadly used and examined in predictive data mining to predict

customer churn in various industries such as retail, financial services and

telecommunications (Samimi and Aghaie, 2011). This method is chosen since the target

variable, customer churn, is not continuous but discrete or categorical (churn or not churn).

The effect of direct factors (i.e., subscription length, amount of charge, number of calls) on

customer churn can be examined with this method. Logistic regression can identify both the

customers who are likely to churn and the drivers of churn. The model

was estimated using a fixed set of variables from the dataset as described in section 3.3 above.

The logistic regression is conducted to examine the relationship between the customer

churn which is entered into the model as the dependent variable and the other factors

(including subscription length, amount of charge and number of calls) which were entered as

the independent variables. The basic model for the logistic model can be written as:

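$$P(\mathrm{churn} = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_m x_m)}} \tag{3-1}$$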

where churn is customer churn (a binary class label {0,1}), x is the input data, and the

parameters β0 (intercept) and β1 to βm are estimated with the maximum likelihood (ML)

estimation which is the only method to use for individual level data (Allison, 1999). The

probability of a customer churning increases by the amount that is determined by Equation (3-

1) with a unit increase in the independent variable when the coefficient for the independent

variable is positive. Maximum likelihood estimators have good properties in large samples

and are consistent. This means that the probability that the estimate is close to the true value


grows as the sample size gets larger. ML also works well with categorical

dependent variables, as in this case (Allison, 1999). The Wald chi-square statistic is used to

test the significance of the individual coefficients that are obtained through the maximum

likelihood estimation (Allison, 1999).
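To make the role of the coefficients concrete, the sketch below evaluates the standard logistic form, P(churn = 1 | x) = 1 / (1 + exp(-(β0 + β1x1 + ... + βmxm))), for given coefficients. The function names are hypothetical and any coefficients passed in are illustrative; they are not estimates from this research.

```python
import math


def churn_probability(x, beta0, betas):
    """Predicted churn probability under the standard logistic model."""
    if len(x) != len(betas):
        raise ValueError("x and betas must have the same length")
    # Linear predictor: intercept plus the weighted sum of the inputs
    z = beta0 + sum(b * v for b, v in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(beta):
    """Multiplicative change in the odds of churn for a unit increase in x."""
    return math.exp(beta)
```

A positive coefficient gives an odds ratio above 1, so the predicted churn probability rises with the variable; a negative coefficient gives an odds ratio below 1.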

Decision Trees

One machine-learning method that can be used for constructing prediction models from data

is the classification tree, also called a decision tree. The prediction models are obtained by

partitioning the data space repeatedly and fitting a simple prediction model within each

partition. It is then possible to represent the partitioning graphically as a decision tree (Loh,

2011). Decision trees have attracted great attention from both researchers and practitioners

and have become the most popular data mining tools among managers because of their practical

use (Neslin et al., 2006). The decision tree splits the customer dataset successively into

mutually exclusive discrete subsets and each customer is assigned to one subset or the other

(Risselada et al., 2010). It is an intuitive and easy-to-implement predictive modeling

technique. The trees are a sequence of criteria for classifying customers according to metrics

such as likelihood of churn. The pictorial visualization of a decision tree makes it easy to

operate and communicate (Witten and Frank, 2005). The purpose is to build a tree so that the

values of a categorical dependent variable (churn in this instance) can be predicted based on

the values of the continuous and/or categorical independent variables. The decision tree

algorithms create groups that consist of individuals based on a criterion which is selected for

splitting a group. The groups are called nodes which form a branching node tree. The

dependent variable is at the top of the tree and is the root node. It consists of all cases in the

sample. Each node in the tree can be split into two nodes, called child nodes. The original

node is then the parent node. This partitioning process can be employed repeatedly where

each child node can be split in two. If a node has no child nodes, it is called a terminal node or

a leaf (Harper and Winslett, 2006). An example of a decision tree for churn is shown in Figure

3-1. Churn is the root node and the tree splits the customers in the sample into

three groups (nodes 2, 3 and 4). Those customers who are in a family of two or more people

or those who are single males are more likely to be active customers. However single female

customers are more likely to churn.


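[Figure 3-1: decision tree. Node 0 (root, Churn) is split on family size into Node 1 (1 person) and Node 2 (2 people; > 2 people; classification: censoring). Node 1 is split on gender into Node 3 (female; classification: churn) and Node 4 (male; classification: censoring).]

Figure 3-1: An example of a decision tree for churn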
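The example tree in Figure 3-1 can be read as a short sequence of nested rules. The sketch below encodes those rules as described in the text; the function name and the string labels are hypothetical, and the tree itself is only the illustrative example, not a model estimated in this research.

```python
def classify_example_tree(family_size, gender):
    """Apply the rules of the illustrative tree in Figure 3-1."""
    # First split: family size
    if family_size >= 2:
        return "censoring"  # node 2: families of two or more people
    # Second split (single-person households only): gender
    if gender == "female":
        return "churn"      # node 3: single females
    return "censoring"      # node 4: single males
```

Each customer ends up in exactly one terminal node, which is what makes the subsets mutually exclusive.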

Decision trees can be used for segmentation where people are identified as being

members of a specific group, or for prediction where rules are formed and used to predict

future events such as churn, as with logistic regression. They can also be used to reduce data

and for variable screening where useful subsets of predictor variables are selected from a

larger set of variables. The dependent and independent variables used in creating decision

trees can be nominal, ordinal or scale. There are four methods in SPSS that can be used to

grow the decision trees:

CHAID, which stands for Chi-squared Automatic Interaction Detection where the independent

variable which has the strongest interaction with the dependent variable is chosen.

Exhaustive CHAID, which is a modification of CHAID. It inspects all possible splits for each

predictor or independent variable.

CRT, which stands for Classification and Regression Trees. It splits the data into homogeneous

segments with respect to the dependent variable. The classification tree is generated by using the

Gini index of diversity to choose the best splitting decision for the nodes.

QUEST, which stands for Quick, Unbiased, Efficient Statistical Tree. This method is fast and

avoids the bias of other methods in favor of predictors that have many categories. It can only be

specified if the dependent variable is nominal.
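The Gini index of diversity mentioned for CRT can be illustrated with the textbook definition, 1 minus the sum of squared class proportions in a node. The sketch below assumes this textbook form and a binary split; it does not claim to reproduce the exact SPSS implementation, and the function names are hypothetical.

```python
def gini_impurity(counts):
    """Gini index of a node, given the number of cases in each class."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def split_gini(left_counts, right_counts):
    """Weighted Gini index of the two child nodes produced by a split."""
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    return (n_left / n) * gini_impurity(left_counts) + (n_right / n) * gini_impurity(right_counts)
```

A candidate split is better the lower its weighted Gini index is relative to the parent node; a node containing only churners or only non-churners has an index of 0.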

With both the CRT and QUEST methods, a tree can be pruned to decrease the level of

complexity of the tree’s structural design and to avoid overfitting the model. A tree is grown

until the stopping criteria are met. The tree is then trimmed automatically to the smallest

subtree based on the specified maximum difference in risk.

The advantage that decision trees have over other classification methods, including

logistic regression, is that there are no assumptions made regarding the distribution of the

independent variables. They can therefore deal with data that is highly skewed along with


categorical independent variables with ordinal or non-ordinal structure. This reduces the time

spent on analysis and the trees are fairly simple to interpret.

3.4.3.1 Model Performance Evaluation

There are several ways to evaluate the performance of a prediction model. Two methods were

used in this analysis: the confusion matrix and the ROC curve. They are described below.

Confusion Matrix

The classification methods (e.g. logistic regression, decision tree) used produce “raw data”

during testing which are counts of correct and incorrect classifications from each class. This

information can then be presented in a confusion matrix which is a form of contingency table

that illustrates the differences between the true and predicted classes for a set of labeled

examples. A confusion matrix is shown in Table 3-1. It has four possible outcomes, where Tp

and Tn are the number of true positives (a case is positive and classified as positive) and true

negatives (a case is negative and classified as negative) respectively. Fp (also Type I error) is

the number of false positives, where a case is negative but classified as positive. Fn (also Type II

error) is the number of false negatives, where a case is positive but classified as negative. Cn

and Cp are the row totals, i.e. the numbers of truly negative and positive examples. Rn and

Rp are the numbers of predicted negative and positive examples and N is the total number of examples

(Bradley, 1997; Fawcett, 2006).

Table 3-1: Confusion matrix

                      Predicted negative   Predicted positive   Total
Observed negative     Tn                   Fp                   Cn
Observed positive     Fn                   Tp                   Cp
Total                 Rn                   Rp                   N

Some significant information can be extracted from the table to illustrate certain

performance criteria.

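Positive predictive value (also called precision) is the proportion of instances classified as positive which are truly positive:

$$\mathrm{PPV} = \frac{T_p}{R_p}, \quad \text{where } R_p = F_p + T_p \tag{3-2}$$

The false positive rate (also called false alarm rate) is the proportion of negative instances which were classified incorrectly as positive:

$$\mathrm{FPR} = \frac{F_p}{C_n} \tag{3-3}$$

Negative predictive value is the proportion of instances classified as negative which are truly negative:

$$\mathrm{NPV} = \frac{T_n}{R_n}, \quad \text{where } R_n = F_n + T_n \tag{3-4}$$

The false negative rate is the proportion of positive instances which were classified incorrectly as negative:

$$\mathrm{FNR} = \frac{F_n}{C_p} \tag{3-5}$$

Sensitivity (also called hit rate or recall) is the proportion of positive instances which were classified correctly:

$$\text{Sensitivity} = \frac{T_p}{C_p}, \quad \text{where } C_p = T_p + F_n \tag{3-6}$$

Specificity is the proportion of negative instances which were classified correctly:

$$\text{Specificity} = \frac{T_n}{C_n}, \quad \text{where } C_n = T_n + F_p \tag{3-7}$$

Overall accuracy is the proportion of all instances which were classified correctly, where N = Cn + Cp = Rn + Rp is the total number of examples:

$$\text{Accuracy} = \frac{T_p + T_n}{N} = \frac{T_p + T_n}{C_n + C_p} \tag{3-8}$$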

In the case of customers at the telecom, those who are in the true positive category are

those who churned and were correctly classified as churners. Those in the false positive category

were non-churners classified as churners and those in the false negative category were

churners incorrectly classified as non-churners. Customers in the true negative category were

non-churners correctly classified as non-churners. Sensitivity indicates the model’s capability

to identify positive results (churn). It is the probability of a customer being predicted as

churner, given that the customer has churned. Specificity indicates the model’s capability to

identify negative results (non-churn). This is the probability of a customer being predicted as

non-churner, given that the customer has not churned. A model with high sensitivity has a low

Type II error rate while a model with high specificity has a low Type I error rate. Sensitivity and

specificity are also terms associated with ROC curves which will be discussed next.
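The performance criteria can be computed directly from the four cells of the confusion matrix. The sketch below uses the standard definitions in terms of Tp, Fp, Fn and Tn; the function name and dictionary keys are hypothetical, and it assumes that every row and column total is non-zero.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Performance criteria derived from the four confusion matrix cells."""
    cp = tp + fn   # truly positive cases (churners)
    cn = tn + fp   # truly negative cases (non-churners)
    rp = tp + fp   # cases predicted as positive
    rn = tn + fn   # cases predicted as negative
    n = cp + cn    # total number of cases
    return {
        "sensitivity": tp / cp,
        "specificity": tn / cn,
        "positive_predictive_value": tp / rp,
        "negative_predictive_value": tn / rn,
        "false_positive_rate": fp / cn,
        "false_negative_rate": fn / cp,
        "accuracy": (tp + tn) / n,
    }
```

With a highly skewed testing set, overall accuracy can be high even when sensitivity is low, which is why sensitivity and specificity are reported separately.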

The ROC Curve

The Receiver Operating Characteristic (ROC) curve is a helpful technique to visualize the

performance of a classification method (e.g. logistic regression) with the intention to select a

fitting operation point, or decision threshold. A figure is obtained by plotting the performance

of a binary classifier system while the discrimination threshold (cutoff point) is varied. It is a

cross-validated estimate of the classification method’s overall accuracy (probability of a

correct response) (Bradley, 1997). If, for example, the threshold is changed from .5 (the


default threshold) to .7, the model will make fewer positive predictions. The ROC curve

then represents all possible combinations of values in the confusion matrix and it can be used

to find the probability threshold which yields the highest overall accuracy for the model.

ROC graphs are two-dimensional graphs where the true positive rate (sensitivity) is plotted

on the Y axis and the false positive rate (1 - specificity) is plotted on the X axis. This graph

represents tradeoffs between benefits (true positives) and costs (false positives). An example

of a ROC curve is shown in Figure 3-2. The ideal diagnostic test would be in the top left

corner (0,1) where 100% sensitivity and 100% specificity are demonstrated. At this point, all

positive and negative cases are correctly classified. At the point in the lower left corner, all

cases are classified as negative and in the upper right corner, all cases are classified as

positive. The cutoff point for the prediction model can be adjusted either to increase Tp at

the cost of increasing Fp, or to decrease Fp at the cost of decreasing Tp.

Figure 3-2: An example of a ROC curve (Source: Deshpande, 2011)

The diagonal line y = x (blue line) portrays a model which randomly guesses the class

(churn or non-churn) and the red line represents the results from the classifier/model. Any test

results that are above the diagonal line are better than random guessing, whereas results below

the line are worse than random guessing.
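The construction of a ROC curve by varying the cutoff point can be sketched as follows. Each distinct predicted probability is used once as a threshold, and a case is classified as positive when its score is at or above the threshold. The function name and this thresholding convention are assumptions made for illustration.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    labels are 1 for churn and 0 for non-churn; scores are predicted
    churn probabilities.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]  # threshold above every score: all cases negative
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points
```

The first point is the lower left corner, where all cases are classified as negative, and the last point is the upper right corner, where all cases are classified as positive.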

Area under an ROC curve (AUC)

A customary method to compare classifiers is to calculate the area under the ROC curve

(AUC). Its value will always be between 0 and 1.0. The AUC of a classifier corresponds to

the probability that the classifier will rank a positive case chosen randomly higher than a

negative case which is randomly chosen (Fawcett, 2006). The accuracy of a classifier is

measured by the AUC where 1 depicts a perfect test and an area of .5 depicts a test that is

valueless.
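The ranking interpretation of the AUC can be checked by comparing every positive case with every negative case. The sketch below counts the share of pairs in which the positive case has the higher score, with ties counted as one half. The function name is hypothetical, and the pairwise loop is chosen for clarity rather than speed.

```python
def auc_by_ranking(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

A classifier that ranks every churner above every non-churner obtains 1, while scores that carry no information obtain values close to .5.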


3.4.3.2 Model Validation

The logistic regression models and decision tree models created for the post- and pre-paid training

samples will be validated on separate testing samples which are unbalanced to replicate real

world data.

3.4.4 The CLV Calculation

After estimating the average revenue per user, the discount rate and the churn for individual

customers in the data sets, the next step is to use these results and estimate the CLV for each

customer by using model (2-1) discussed in section 2.1.1.

3.4.5 Segmentation

As stated in Section 2.2, the post-paid and pre-paid customers in the customer database at

Telecom X will be segmented based on their individual CLV. There will be ten segments of

equal size for each type of subscription. The 1st decile consists of the least valuable

customers while the most valuable customers are in the 10th decile. These deciles will be

described to give some insight into what the customers within each decile have in common.
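The decile segmentation can be sketched by ranking customers on their CLV and cutting the ranking into ten parts. The function name is hypothetical, and the handling of sample sizes that are not divisible by ten is an assumption; the segments are exactly equal in size only when the number of customers is a multiple of ten.

```python
def assign_deciles(clv_values):
    """Return a decile (1 to 10) for each customer, in the input order.

    Decile 1 holds the least valuable customers and decile 10 the most
    valuable ones.
    """
    n = len(clv_values)
    # Customer indices sorted from lowest to highest CLV
    order = sorted(range(n), key=lambda i: clv_values[i])
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles
```

The resulting decile labels can then be used to group the customers and summarize the demographic and usage variables within each segment.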

3.5 Summary

In this chapter, the research design of the thesis was discussed. Next, the data sets collected at

the telecom were described briefly, followed by the plan of analysis, where the calculation of

ARPU and WACC was explained. Churn analysis was described, and the two methods

used, logistic regression and decision trees, were outlined. The models will be evaluated for

prediction accuracy and validated on separate data sets. Then the calculation of CLV was

shown and finally the segmentation was covered.


4. Data preparation

To be able to make a churn model, it is essential to have the right data. The data should include

information about demographics, revenue and call detail records. The telecom has a data

warehouse which stores the necessary data that is required to make a churn model. Section 4.1

discusses the sampling and difficulty with skewness in churn data. Section 4.2 explains the

time aspect of the data extraction and Section 4.3 shows the categorization of the features that

the database encompasses.

4.1. Sampling

There are practical problems related to churn modeling. In a company that offers continuous

service, such as a telecom or a bank, the percentage of those who defect will always be

somewhat small in any time period. Therefore, a sample from the general population of

customers will only acquire a comparatively small number of defectors, even if the sample is

large. Consequently, it is difficult to reliably distinguish between churn

(rare events) and non-churn (Rust and Metters, 1996). To deal with this problem, some

authors have recommended that the training set, which is used to estimate the model, be a

balanced sample, meaning that it consists of equal numbers of churners and non-churners

(Rust and Metters, 1996; Coussement and Van den Poel, 2008). This means under-sampling,

where cases which belong to the majority class (here, non-churn) are discarded until there are

even numbers of both classes. The distribution of the data used in the training and testing sets

for modeling churn for both post-paid subscriptions and pre-paid subscriptions is shown in

Table 4-1.
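Under-sampling of the majority class can be sketched as follows. The function name, the `is_churner` callback and the fixed random seed are assumptions made for illustration; the sketch draws a random subset of the majority class of the same size as the minority class, which is one common way to obtain a balanced sample and not necessarily the exact procedure used at the telecom.

```python
import random


def undersample(customers, is_churner, seed=0):
    """Return a balanced sample by discarding cases from the majority class."""
    churners = [c for c in customers if is_churner(c)]
    non_churners = [c for c in customers if not is_churner(c)]
    # The smaller group is kept whole; the larger one is sampled down
    minority, majority = sorted((churners, non_churners), key=len)
    rng = random.Random(seed)
    kept = rng.sample(majority, len(minority))
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced
```

Only the training set is balanced in this way; the testing set keeps its original, skewed class distribution.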

Table 4-1: Distribution of the data used in the training and testing sets

                                      Post-paid subscription   Pre-paid subscription
Training dataset
  Number of customers who churned     2190                     2922
  Number of customers still active    2189                     3073
Testing dataset
  Number of customers who churned     827                      1016
  Number of customers still active    27910                    14890

The training set for post-paid customers consists of 2190 churners and 2189 non-

churners. The testing set consists of 827 (2.9%) churners and 27910 non-churners. The

pre-paid sample initially has a total of 8469 customers in the training sample and almost

38000 in the testing sample. One of the disadvantages of the pre-paid sample, though, is the

proportion of missing data. Since customers with a pre-paid subscription do not need to

submit personal demographic information about themselves, demographic variables are those

with the most missing data or up to 40%. To find out if there is a significant difference

between those customers with demographic information and those without it, independent-samples

t-tests were done with all the independent continuous variables. The results were that

the means for the independent variables for the two groups were not the same, except for one

variable, “Average voice outin volume ratio”. Since there is a difference between these two

groups, it would likely be necessary to make separate logistic regression models for the

two groups, as it is difficult to fill in missing values, especially for variables with

only two categories such as “Gender”. The decision was therefore taken to exclude those customers

with no demographic information in both the training and testing sets. The final training set

consisted of a total of 5995 customers, 2922 who churned and 3073 who did not. The testing

dataset consisted of 1016 churners (6.4%) and 14890 who were still active. The proportion of

those who churned with respect to those who did not is higher in the two data sets combined

for pre-paid customers, with 17.98% churners. In the two combined post-paid datasets, there

are 9.11% churners. One possible reason for this difference could be that it is easier for pre-

paid customers to terminate their subscription at the telecom and they could therefore be more

prone to follow a better offer at another telecom. Average age is lower for the pre-paid data

sets (37 years in the training set and 40 years in the testing set) than for the post-paid data sets

(49 years in the training set and 52 years in the testing set). This could be a signal of younger

people being less loyal than older people.

4.2. The time aspect

To select the data needed for predicting churn, a time window is used that consists of an

observation period where features for each customer are extracted and a performance period

where customers are labeled as churn or non-churn (see Figure 4-1). The

length of the time frame used for analysis can vary and depends on the industry under

inspection. The observation period was set for three months where monthly aggregated

transaction activity and other information were gathered for each customer. Kumar et al.

(2007) found that the optimal performance period was three months for a telecom company

but for this research a performance period of five months was used where the customers were

followed. The customers are then labeled either as churn or censoring (non-churn) since the

timeline is censored: what happens after the performance period is unknown. Customer A in


Figure 4-1 churned during the performance period and therefore is classified as churn (1).

Customer B was still active at the end of the performance period and is classified as censoring

(0) (Nie et al., 2011).

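[Figure 4-1: timeline from T1 to T3. The observation period (feature extraction) runs from T1 to T2 and the performance period (class labeling) runs from T2 to T3. Customer A churns during the performance period (Y = 1); Customer B is still active at T3 and is labeled as censoring (Y = 0).]

Figure 4-1: The time window of the analysis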

All of the customers included in the data set are active at the beginning of the

performance period. A longer performance period was used to collect as many churners as

possible. Because churn is a rare event in the customer database for the telecom, two different

performance periods of five months were used, the first from 1 July 2010 to 1 December 2010

(interval between T2 and T3) with an observation period from April to June 2010 (interval

between T1 and T2). The second performance period was from 1 December 2010 to 1 May

2011 with observation period from September to November 2010. The first and half of the

second data set were used to create a balanced training set and the other half of the second

data set was used to create the testing set used for validation of the classification models

without under-sampling so that it reflects real world data which has a highly skewed class

distribution.
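The class labeling within the time window can be sketched as a simple date comparison. The function name is hypothetical, and treating the start of the performance period as inclusive and the end as exclusive is an assumption, since the text does not state how boundary dates were handled.

```python
from datetime import date


def label_customer(churn_date, performance_start, performance_end):
    """Return 1 (churn) or 0 (censoring) for one customer.

    churn_date is None when no churn has been recorded for the customer.
    """
    if churn_date is not None and performance_start <= churn_date < performance_end:
        return 1
    # Still active at the end of the window: what happens later is unknown
    return 0


# First performance period used in this research
first_start, first_end = date(2010, 7, 1), date(2010, 12, 1)
```

A customer who leaves after the end of the performance period is still labeled as censoring, because only events inside the window are observed.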

4.3. Independent variables

The mobile phone company has a large data warehouse from which the data needed for the

analysis in this research can be extracted. Based on previous research in this field, the

customer data that will be used to predict churn can be divided in three main descriptor

categories that include the input of prospective explanatory descriptors. These descriptors are,

as previously said, shown in Appendix I and are personal demographics, revenue and

customer behavior (Xie et al. 2009). For this research, the descriptors have been categorized

in a little more detail as follows:


1. Demographics are personal data of a given customer, such as age, gender, marital status, place

of residence, family size, rate plan, whether or not the customer is the registered payer,

tenure, which is the number of days a customer was or is active, and finally customer status,

which indicates whether or not he/she churned. The variable “Rate plan” had up to 20

different categories where some categories had many cases and other categories had very few

cases. For this reason, it was impossible to use this variable and it was removed from the

analysis.

2. Billing data shows the number of billed services, number of billed products, billed amount due

to mobile phone usage, discount amount a customer receives, total billed amount and ratio of

both mobile usage and discount versus total billed amount. These variables apply only to post-

paid customers.

3. Refill history applies only to customers with pre-paid subscriptions. These descriptors are

refill frequency and amount and total refill frequency and amount.

4. Calling pattern are descriptors created for both post-paid and pre-paid customers. They are

related to inside and outside network and abroad call volume and frequency, total originating

and terminating call volume and frequency, total sent and received text messages, ratio of

inside/outside network and abroad calls versus total originating call volume/frequency, ratio of

originating calls versus terminating call volume and ratio of text messages sent versus text

messages received.

5. CDR (call detail records) billed are descriptors of charged amount due to inside/outside

network and abroad calls, ratio of inside/outside network and abroad calls versus total charged

amount, charged amount due to text messages sent inside/outside network and abroad, ratio of

inside/outside network and abroad text messages sent versus total charged amount and then

total charged amount. These are also created for both types of subscriptions.

Besides the above mentioned descriptors, there are also derivatives such as the maximum

value over the three months and average value over the three months. These features are also

all listed in Appendix I. The demographic features are extracted at the beginning of the

observation period but the features in the other categories are extracted for each of the three

months of the observation period.

4.5 Summary

In this chapter, data preparation has been detailed. There was a large set of information to go

through for both pre-paid and post-paid customers and it was therefore necessary to examine it

well before the main analysis. A sizeable proportion of missing values existed in the pre-paid

datasets and as there was a difference between customers who submitted demographic

information and those who did not, cases with missing values were excluded.


The time aspect of the research was then outlined. Two time windows

of eight months each were used in the research, where the observation period was three

months and the performance period was five months. Finally, further details were given about

the numerous independent variables in the data sets, which relate to demographics, billing

data, refill history, calling pattern and call detail records.


5. Results

The results from the analysis are presented in this chapter. Firstly, the sample with the post-

paid customers will be described in Section 5.1 and the results shown, both for the training

and testing samples. Then results for the pre-paid customers are presented in Section 5.2, both

for the training and testing samples. The outcomes of the hypotheses presented in chapter 2

are discussed in Section 5.3. Finally, the results from the CLV calculations and the

segmentation are presented and discussed in Section 5.3.

5.1 Post-paid customers

5.1.1 Sample description

In this section the sample of post-paid customers at the telecom is analyzed. These are

customers who receive their bill at the end of each month. The sample used to train the

churn model consisted of 2190 churners and 2189 non-churners or total of 4379 post-paid

customers.

Since non-churners are still active customers with the telecom, it is impossible to

calculate mean values for the independent variables such as tenure as this will continue to an

unknown date in the future. Therefore the mean values are calculated for the time period that

is used to extract the data. The mean for the customers’ age was 49.19 years (with a SD =

14.828). The youngest customer is 18 years of age and the oldest customer is 98 years of age.

There were more males than females in this sample or 2488 (56.8%) and 1891 respectively.

Table 5-1 shows the marital status of the customers. Most of them are married/in a registered

partnership (53.2%). 28.9% are unmarried, 11.8% are either divorced or separated and 5.0%

widowed. Customers with an unknown status were 0.6% of the sample.

Table 5-1: Marital status of customers in the post-paid training sample

Marital status                   Frequency   Percent   Cumulative %
Married/registered partnership   2328        53.2      53.2
Unmarried                        1265        28.9      82.1
Divorced                         441         10.1      92.2
Widowed                          218         5.0       97.2
Separated                        73          1.7       98.9
Other                            26          0.6       99.5
Marital status unknown           28          0.6       100.0
Total                            4379        100.0

Two categories “Married (not living together)” and “Icelander living abroad” were

combined into 1 category “Other” since there were so few in each category. This category


consists of 26 customers or 0.6%. Regarding the customers’ family size, which is shown in

Table 5-2, most customers in the sample were single individuals or 3244 (74.1%). 922 were in

a family consisting of 2 people, 169 in a family of 3 people, 35 were in a family of 4 people

and 9 were in a family of 5 people or more. Because there is a large difference in the

frequencies in the first two categories and the last three, these last three categories were

combined into one. The third category includes 213 customers or 4.9% of the sample.

Table 5-2: Family size of customers in the post-paid training sample

Family size Frequency Percent Cumulative %

1 person 3244 74.1 74.1

2 people 922 21.1 95.1

3 people or more 213 4.9 100.0

Total 4379 100.0

The customers were fairly dispersed over the country, considering that 2/3 of the

Icelandic population lives within the greater capital area of Reykjavik. Table 5-3 shows the

distribution of the sample over the country. The majority of the customers live within the

greater capital area, or 2516 (57.5%), followed by 668 customers who live in the southern part

of Iceland. 61 customers have an unknown location.

Table 5-3: Residence of customers in the post-paid training sample

Land area          Frequency   Percent   Cumulative %
Capital area       2516        57.5      57.5
Western Iceland    365         8.3       65.8
Northern Iceland   592         13.5      79.3
Eastern Iceland    177         4.0       83.4
Southern Iceland   668         15.3      98.6
Unknown            61          1.4       100.0
Total              4379        100.0

The mean for tenure, which refers to the number of days that an individual was or has been a customer, was 2177 days (SD = 1752.5).

The maximum number of days was 5917 (approximately 16 years) and the minimum number

of days was 31. The average number of various additional services (besides the basic post-

paid service itself) offered by the telecom that customers bought, was 3 (SD = 2.54). The

maximum number of services bought was 59 and some customers bought no additional

services. For the number of various products bought, the average was 29.8 (SD = 21.4), where

the maximum number of products bought was 149.7 and some customers bought none.

Examples of services and products are any additional services or products customers can add

to their subscription like mobile internet, calling friends for free, internet at home, land line

and television.


Table 5-4 shows the results from a chi-square test for independence. This test was

done to explore the relationship between customers’ status (churn or non-churn) and other

categorical variables. The table shows that 991 females (52.4% of the females) and 1199 males

(48.2% of the males) churned. For a 2x2 table like this, there can

be overestimation of the chi-square value but the Yates’ Correction for Continuity

compensates for that. The value is 7.467, with 1 degree of freedom (df) with an associated

significance level of .006 which is smaller than the alpha value of .05 (see Table II-1 in

Appendix II). The conclusion is made that the proportion of males who churn is significantly

different from the proportion of females who churn. However the value of Phi is -.042 (p =

.006) which indicates that the relationship between the two variables in the table is weak (see

Table II-2 in Appendix II).

Table 5-4: Crosstable of Status*Gender in the post-paid training sample

                                   Gender
Status                             Female    Male      Total
Non-churn   Count                  900       1289      2189
            % within gender        47.6%     51.8%     50.0%
Churn       Count                  991       1199      2190
            % within gender        52.4%     48.2%     50.0%
Total       Count                  1891      2488      4379
            % within gender        100.0%    100.0%    100.0%
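The continuity-corrected chi-square and phi for the Status*Gender table can be recomputed from the four cell counts. The following is a minimal sketch, assuming the standard Yates formula for a 2x2 table and using the cell counts implied by the row and column totals (900 and 1289 non-churners, 991 and 1199 churners). The function names are illustrative and not part of SPSS.

```python
import math

def yates_chi_square(a, b, c, d):
    """Chi-square with Yates' continuity correction for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    return numerator / (row1 * row2 * col1 * col2)

def phi_coefficient(a, b, c, d):
    """Phi for the same 2x2 table; its sign depends on how the categories are ordered."""
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return (a * d - b * c) / math.sqrt(row1 * row2 * col1 * col2)

# Rows: non-churn, churn. Columns: female, male.
chi2 = yates_chi_square(900, 1289, 991, 1199)
phi = phi_coefficient(900, 1289, 991, 1199)
print(round(chi2, 3), round(phi, 3))
```

With these counts the sketch gives values that round to 7.467 and -.042, the figures reported for the test.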

The same test was also done for family size, land area, whether or not the customer is the payer, marital status and total charge groups. Since the other demographic variables, except “Is payer”, have 3 or more categories, Cramer’s V is the appropriate statistic instead of phi. For “Family size”, the Pearson chi-square is 14.454 with p = .001 (df = 2). This means that there is a significant association between status (churn or non-churn) and the number of people in the family. Cramer’s V is 0.057 (p = .001), which indicates a weak relationship between the two variables. The chi-square for “Land area” is 64.788 with p = .001 (df = 5) and Cramer’s V is 0.122 (p = .001). So, as with the former variable, there is an association between status and residence, and the relationship is somewhat stronger. For “Marital status”, the Pearson chi-square is 90.961 with p = .001 (df = 6) and Cramer’s V is 0.144 (p = .001). For “Total charge groups”, the Pearson chi-square is 265.911 with p = .001 (df = 1) and Cramer’s V is 0.246 (p = .001). The variable “Is payer” was the only demographic variable where the results were not significant. As this variable only has two categories, yes or no, the result is a 2x2 table. Therefore the value for Continuity Correction from the chi-square tests is used. This value is 1.531 with p = .248 (df = 1). The phi is 0.019 (p = .216). This means that the proportion of churners does not differ significantly between customers who pay the bill themselves and those who do not.
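Cramer's V can be derived directly from the Pearson chi-square, the sample size and the table dimensions. The sketch below assumes the usual definition V = sqrt(χ² / (N · min(r - 1, c - 1))) and N = 4379 for every test, that is, no missing cases. The number of categories per variable is inferred from the reported degrees of freedom.

```python
import math

def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V for an n_rows x n_cols contingency table."""
    k = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi_square / (n * k))

N = 4379  # post-paid training sample
# Status has two categories (rows); columns inferred from df = columns - 1
reported = {
    "Family size": (14.454, 3),
    "Land area": (64.788, 6),
    "Marital status": (90.961, 7),
    "Total charge groups": (265.911, 2),
}
v_values = {name: cramers_v(chi2, N, 2, cols) for name, (chi2, cols) in reported.items()}
for name, v in v_values.items():
    print(f"{name}: V = {v:.3f}")
```

Because status has only two categories, min(r - 1, c - 1) equals 1 in every case, so V reduces to sqrt(χ²/N).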

To see if there was a significant difference in the mean of different continuous

variables for those who churned and those who did not, an independent samples t-test was

done (see Table II-3 in Appendix II). Out of the 42 variables regarding customer age, tenure

and averages, 34 of them had a significance level for the Levene’s test lower than .05

indicating that the variance of scores for the two groups (churners and non-churners) is not the

same. The variables where there was no significant difference in the variance were “Average

amount gsm”, “Average ratio gsm”, “Average abroad total charge ratio”, “Average text

message innet total charge ratio” and “Average text message abroad total charge ratio”. For

the t-test for equality of means, which says whether there is a significant difference between

those who churned and those who did not, eight variables had a significance value higher than

.05. For all the other variables, there is a significant difference in the mean values between the

two independent groups (churn and non-churn). The eight insignificant variables were “Average

ratio gsm”, “Average abroad volume”, “Average abroad volume ratio”, “Average abroad

frequency ratio”, “Average voice outin volume ratio”, “Average abroad total charge ratio”,

“Average text message abroad charge” and “Average text message abroad total charge ratio”.

Of the 33 variables with maximum values, three had a significance level for Levene’s test higher than .05, indicating that the variance of scores for churners and non-churners was the same. These variables were “Maximum abroad total charge ratio”, “Maximum text messages innet total charge ratio” and “Maximum text messages abroad total charge ratio”. Eight variables had p > .05 for the t-test for equality of means, so there was no significant difference in the mean values between the two independent groups. These variables were “Maximum voice outin volume ratio”, “Maximum abroad volume”, “Maximum abroad volume ratio”, “Maximum abroad frequency ratio”, “Maximum innet total charge ratio”, “Maximum abroad total charge ratio”, “Maximum text messages abroad charge” and “Maximum text messages abroad total charge ratio”.

Finally, to find out the effect size, eta squared is calculated. It indicates the magnitude of the differences between the two status groups. The equation for eta squared is

$$\eta^2 = \frac{t^2}{t^2 + (N_1 + N_2 - 2)} \tag{5-1}$$

where,

t           the t-value from the t-test for equality of means

N_1, N_2    number of individuals in each group

When the outcome of this formula is multiplied by 100, it can be expressed as a percentage. The values of eta squared range from an extremely low value of 5.78 × 10⁻⁷ to 0.08294, so the effect size ranges from extremely small to medium. For the variable with the highest eta squared value, “Average outnet frequency ratio”, 8.294% of the variance in that variable is explained by customer status; for the other variables the percentage is lower.
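As an illustration of the effect size calculation, the sketch below assumes the common form of eta squared for an independent samples t-test, t² / (t² + N1 + N2 - 2). The t-value is hypothetical. The group sizes are those of the non-churn and churn groups in the training sample.

```python
def eta_squared(t, n1, n2):
    """Eta squared for an independent samples t-test with group sizes n1 and n2."""
    return t ** 2 / (t ** 2 + (n1 + n2 - 2))

# Hypothetical t-value, not taken from Appendix II
effect = eta_squared(t=-5.0, n1=2189, n2=2190)
print(f"{effect * 100:.3f}% of the variance explained by customer status")
```

The sign of t does not matter because it is squared, and with over 4000 cases even a clearly significant t-value corresponds to a small share of explained variance.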

5.1.2 Multicollinearity

The presence of multicollinearity in the data can be a problem as it can affect the parameters

of the regression model. Correlation between any two independent variables should not be too

high (Field, 2009). To check whether multicollinearity is an issue in the data, one can look at

the tolerance and VIF (Variance Inflation Factor) statistics which are obtained from linear

regression using the same dependent and independent variables as in the logistic regression. A

tolerance value < 0.1 and a VIF value > 10 imply a serious collinearity problem (Field, 2009;

Menard, 2001). This procedure was followed by using all 82 independent variables, both

demographics and usage variables. 63 out of 66 variables, which were related to mobile phone

usage, had a tolerance value < 0.1 revealing that there is an issue with multicollinearity in the

data. However all of the demographics had a tolerance > 0.1. It is difficult to determine the

best manner of dealing with multicollinearity as it is impossible to know which variable

should be left out. One option would be to run a factor analysis on those variables involved in

the multicollinearity and use the factor scores as a predictor. Another option would be to

acknowledge the unreliability of the model (Field, 2009; Tabachnick and Fidell, 2001). The

decision was taken to run a principal component analysis on all 66 usage variables.
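The relation between tolerance and VIF can be sketched as follows. Tolerance is 1 - R², where R² comes from regressing one predictor on all the other predictors, and VIF is its reciprocal. The R² values below are hypothetical and only illustrate the thresholds of 0.1 and 10. The function names are illustrative.

```python
def tolerance_and_vif(r_squared):
    """r_squared: R^2 from regressing one predictor on all other predictors (must be < 1)."""
    tolerance = 1.0 - r_squared
    vif = 1.0 / tolerance
    return tolerance, vif

def has_serious_collinearity(r_squared):
    """Apply the thresholds tolerance < 0.1 or VIF > 10."""
    tolerance, vif = tolerance_and_vif(r_squared)
    return tolerance < 0.1 or vif > 10

for r2 in (0.50, 0.92, 0.99):  # hypothetical auxiliary R^2 values
    tol, vif = tolerance_and_vif(r2)
    print(r2, round(tol, 3), round(vif, 1), has_serious_collinearity(r2))
```

Since VIF is the reciprocal of tolerance, the two thresholds flag the same variables. A predictor is flagged once more than 90% of its variance is explained by the other predictors.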

5.1.3 Principal component analysis

To see if the 63 variables which were involved in the multicollinearity form coherent subsets, a Principal Component Analysis (PCA) with Varimax rotation was performed. However, the decision was made to use all 66 variables related to usage in the PCA. Those variables that correlate with one another are combined into subsets or components which are independent of other subsets. PCA was chosen for this analysis because multicollinearity is not a problem for this method (Field, 2009; Tabachnick and Fidell, 2001).

As mentioned before, 66 variables were used in the PCA. These items were related to

usage and charges, both inside and outside the telecom’s network and abroad (see Table II-4

in Appendix II for the list of variables used). After the initial run, the Kaiser-Meyer-Olkin

measure confirmed the sampling adequacy for the analysis (KMO = .811) and Bartlett’s

test of sphericity was significant (χ² = 691523.602, df = 2145, p = .001). This implies that correlations

between the variables are large enough to conduct PCA (Field, 2009). Two variables,

“Average voice outin volume ratio” and “Maximum voice outin volume ratio”, had individual

KMO values < .5 (from the Anti-image Matrices) so the latter variable was removed. In the

second run with 65 variables, all variables had KMO values > .5 and their communalities

were from .572 and over which implies that they are all applicable for the analysis. According

to the Kaiser’s criterion of eigenvalues over 1, 12 components would be a suitable solution

and together they explained 87.766% of the variance in the data. The scree plot (shown in

Figure II-1 in Appendix II) showed points of inflexion at 2, 4, 5, 7, 11, 12, 16 and 22

components, meaning that a solution with 1, 3, 4, 6, 10, 11, 15 or 21 components would be

appropriate. After extraction, the components are usually rotated in order to maximize high

correlations and minimize the low ones. The most frequently used method of rotation is

Varimax which is an orthogonal rotation. This method simplifies components by maximizing

the variance of factor loadings by making high loadings higher and low loadings lower for

each component (Agresti and Finlay, 1997). The analysis was run again, extracting 12

components, based on Kaiser’s criterion. To make the rotated component matrix easier to

interpret, coefficient values below .3 were suppressed. When the rotated component matrix

was checked, some items had only low loadings in two or more components (difference in

loadings over components less than .2) which made it difficult to decide in which component

to place these items. The analysis was then rerun, taking one variable out at a time until all

items loaded highly on only one component. After these variables were taken out, three components

consisted of only low loadings, so the number of components was reduced to nine. The final solution

consisted of 53 items loading on nine components, after 13 items had been omitted altogether.
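The degrees of freedom reported for Bartlett's test of sphericity follow from the number of variables p as p(p - 1)/2, the number of distinct correlations among them. A small check, with an illustrative function name:

```python
def bartlett_df(n_variables):
    """Degrees of freedom for Bartlett's test of sphericity: p(p - 1)/2."""
    return n_variables * (n_variables - 1) // 2

print(bartlett_df(66))  # initial run with all 66 usage variables
print(bartlett_df(46))  # a run with 46 variables
```

For 66 variables this gives 2145, which matches the df reported for the initial run.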

5.1.3.1 Internal consistency reliability analysis

To assess the reliability of a multi-item scale, the most commonly used method is an evaluation of the scale’s internal consistency reliability. Cronbach’s alpha is a widely used index for this purpose and it should be > .7 to be acceptable (Polit, 2010). Table 5-5 shows the results of the internal consistency reliability analysis for eight of the nine components. The first column shows the number of the component, the second column shows the number of items in each component and the third column shows the value for Cronbach’s alpha.

Table 5-5: Cronbach’s alpha for the components for the post-paid training sample

Component   N of items   Cronbach’s alpha
1           21           .911
2           6            .979
3           6            .963
4           5            .000
5           4            .653
6           4            .641
7           4            .571
8           2            .827

The Cronbach’s alpha for component 4 was .000 but would increase to .916 if the item

“Average abroad volume” was deleted. For component 5 it was .635 but increased to .714

when “Maximum text message outnet total charge ratio” was deleted. Component 6 had an

alpha = .641 but increased to .721 by deleting “Maximum text message innet total charge

ratio”. For component 7, the alpha was .571 and would not increase by deleting an item. It

was not possible to calculate the Cronbach’s alpha for the ninth component as it only consists

of one item. Cronbach’s alphas are then acceptable for all components except the seventh.

The alpha coefficients are essential indicators of the quality of the instrument so high

reliability is crucial to success in hypothesis testing (Polit, 2010). Therefore eight components

were used in the subsequent analysis, as one component was excluded due to its low alpha.
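For reference, Cronbach's alpha can be computed from the item variances and the variance of the summed scale. The sketch below assumes the standard formula α = k/(k - 1) · (1 - Σ item variances / variance of total score). The item scores are hypothetical and do not come from the telecom data.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, with the same cases in the same order."""
    k = len(items)
    sum_item_variances = sum(variance(scores) for scores in items)
    totals = [sum(case) for case in zip(*items)]  # summed scale score per case
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))

# Hypothetical scores for three items across five customers
scores = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [1, 3, 3, 5, 5]]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

Rerunning such a function with one item left out corresponds to the "alpha if item deleted" figures used above to decide which items to drop.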

5.1.3.2 Results from the PCA

After deleting the seven variables mentioned in the previous section, the PCA was rerun using

46 variables. The variables omitted from the analysis are shown in Table II-5 in Appendix II.

As mentioned above, Kaiser’s criterion of eigenvalues over 1 is often used to determine the

number of factors in PCA along with the scree plot. In this case, as shown in Table II-6 in

Appendix II, nine components had eigenvalues > 1. As previously stated, a solution with nine

or more components did not make sense as they would only include low loadings and the

Cronbach’s alpha was too low for one of the components. Therefore the decision was made to

continue with eight components and they explained 89.154% of the total variance in the data

(see Table II-6 in Appendix II). The KMO measure for this final analysis was .830 and the

Bartlett’s test of sphericity χ2 = 475859.216 (df = 1035, p = .001). Communalities and the

individual KMOs (item communalities) were all > .5. These values signify the amount of

variance in the item that is explained by the extracted components (Pett et al, 2003). As there

are no χ² goodness-of-fit tests on hand for PCA, a comparable function can be executed by

examining the residual correlation matrix. This matrix is created by subtracting the

reproduced correlation matrix which is produced by the components from the actual

correlation matrix. The table “Reproduced Correlations” produced by SPSS showed residuals

that are the difference between actual and reproduced correlations as explained earlier. The

residuals give an indication about the goodness of fit for the analysis. 11% (156) of the

residuals had absolute values > .05 which is well below 50% of all the residuals. This

conclusion indicates that the extracted component solution represents a good fit for the data

(Pett et al, 2003).

The resulting components are shown in Table II-7 in Appendix II, which is the rotated component matrix generated using the Varimax rotation. In general, the emerging component structure was quite clear and easy to interpret. The first component is the largest and accounted for 36.296% of the variance. It had 21 items, which can be seen in the table (only loadings above .3 are shown to simplify it). It represents both average and maximum values (over the five month observation period) of usage and charges both within and outside of the telecom’s network, and average and maximum total originating and terminating phone calls. The second component accounted for 17.059% of the variance and had six items. It represents average and maximum ratios related to usage and charge outside the telecom’s network. The third component accounted for 9.138% of the variance and consisted of six variables with average and maximum ratios related to usage and charge inside the telecom’s network. The fourth component accounted for 7.821% of the variance and consisted of four variables related to usage abroad ratios. The fifth component accounted for 6.789% of the variance and consisted of three variables with charges for sending text messages outside the telecom’s network. The sixth component accounted for 3.775% of the variance and consisted of three variables with charges for sending text messages inside the telecom’s network. The seventh component accounted for 3.298% of the variance and consisted of two variables of charges for sending text messages abroad. The eighth component accounted for 2.684% of the variance and consisted of one item, which is the ratio between originating and terminating call volume. The following description of the eight components is based on the items in each of them:

Component 1: Usage/charges inside and outside the telecom’s network

Component 2: Usage/charge ratios outside the telecom’s network

Component 3: Usage/charge ratios inside the telecom’s network

Component 4: Usage ratios abroad

Component 5: Text message charge ratios outside the telecom’s network

Component 6: Text message charge ratios inside the telecom’s network

Component 7: Charges abroad

Component 8: Ratio of voice volume out/in

5.1.3.3 Parallel analysis

As seen in the previous section, it can be difficult to determine the number of components to extract, since neither Kaiser’s criterion nor the scree plot gave a clear picture of how many components to extract. Parallel analysis (PA) was proposed by Horn (1965) as a technique that generates random variables to determine the number of components to retain, and it has proven to be dependably precise in determining the threshold for significant components (Franklin et al, 1995; Ledesma and Valero-Mora, 2007). It is a Monte Carlo simulation technique and an improved alternative to other commonly used techniques such as Kaiser’s criterion and the scree plot used previously.

It is suggested to use the eigenvalue that corresponds to a given percentile of the distribution of eigenvalues obtained from the random data (Ledesma and Valero-Mora, 2007). The 95th percentile was used in this case and the number of samples generated was 1000. Components whose eigenvalues from the PCA are greater than the respective eigenvalues from the PA of the random data should be retained. Those components with eigenvalues below their respective PA eigenvalue threshold are most likely inaccurate (Franklin et al, 1995). The results from the PA are shown in Table 5-6.

Table 5-6: Comparison of PCA and PA eigenvalues in the post-paid training sample

Component   PCA eigenvalue   PA eigenvalue
1           16.700           1.217
2           7.847            1.195
3           5.203            1.178
4           3.600            1.165
5           3.123            1.153
6           1.736            1.142
7           1.517            1.132
8           1.235            1.122
9           1.056            1.113
10          .915             1.105
...*        ...              ...
46          .001             .832

*A number of components have been omitted from the table to save space

This table shows that for the first eight components, the eigenvalues from the principal component analysis are larger, but from the ninth component onwards, the eigenvalues from the parallel analysis are larger. These results suggest that an eight component solution would be appropriate, supporting the previous results obtained from the principal component analysis.
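The retention rule applied to Table 5-6 can be written out as a short comparison. This sketch covers only the decision step, using the first ten eigenvalue pairs from the table. It does not reproduce the Monte Carlo simulation that generated the PA eigenvalues, and the function name is illustrative.

```python
# First ten rows of Table 5-6
pca_eigenvalues = [16.700, 7.847, 5.203, 3.600, 3.123, 1.736, 1.517, 1.235, 1.056, 0.915]
pa_eigenvalues = [1.217, 1.195, 1.178, 1.165, 1.153, 1.142, 1.132, 1.122, 1.113, 1.105]

def components_to_retain(observed, random_threshold):
    """Count leading components whose observed eigenvalue exceeds the PA eigenvalue."""
    retained = 0
    for obs, rand in zip(observed, random_threshold):
        if obs <= rand:
            break  # stop at the first component that fails the comparison
        retained += 1
    return retained

pa_count = components_to_retain(pca_eigenvalues, pa_eigenvalues)
kaiser_count = sum(1 for ev in pca_eigenvalues if ev > 1)
print(pa_count, kaiser_count)
```

On these values the parallel analysis rule retains eight components, whereas Kaiser's criterion of eigenvalues over 1 would retain nine.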

5.1.4 Logistic regression

After dealing with the multicollinearity in the dataset by running a PCA, the next step is the

logistic regression, as there was not an issue with multicollinearity among the components. A

logistic model is fitted to the data to test the research hypotheses previously stated concerning

the relationship between the likelihood that a customer will churn and various features related

to his or her demographics and telephone usage. These variables are related to demographic

information along with the eight components from the PCA. Multicollinearity was also

checked for among the 20 variables omitted from the PCA analysis (see Table II-5 in

Appendix II) and 15 of them did not have this problem. The variables “Maximum text

messages in”, “Maximum text messages out”, “Maximum abroad volume”, “Maximum

abroad frequency” and “Maximum text messages abroad charge” were excluded based on

high VIF/low tolerance values. In all, 38 variables were used in the following analysis.

First, all 38 variables and components were entered into the logistic regression (see Table II-8 in Appendix II for the variables used). Since the decision to include the independent variables was based on prior knowledge and research, the method used was Enter, which forces all variables into the model in one block. After the first run, several of the variables were insignificant with p > .05 (see Table II-9 in Appendix II). The insignificant variables were removed one at a time, starting with the highest p-value, and the regression was run again after each removal.

The final model consisted of 13 statistically significant variables, related to demographics,

usage/charges and components. The variables included in the final model are shown in Table

5-7. The groups in the categorical variables that are significant are marked with a *.

Table 5-7: Results from the logistic regression for the post-paid training sample

                                                                               95% C.I. for Exp(B)
Variable                                            B        S.E.    Exp(B)    Lower    Upper
Constant                                            1.179    0.162   3.253
Customer age                                        -0.028   0.003   0.973     0.968    0.978
Family size: 1 person (reference)
Family size: 2 people*                              -0.195   0.086   0.823     0.695    0.974
Family size: 3 people or more                       -0.330   0.175   0.719     0.510    1.013
Land area: Capital area (reference)
Land area: Western Iceland                          -0.064   0.124   0.938     0.735    1.196
Land area: Northern Iceland                         -0.086   0.101   0.919     0.754    1.121
Land area: Eastern Iceland*                         -1.112   0.195   0.329     0.224    0.482
Land area: Southern Iceland                         -0.018   0.096   1.019     0.844    1.230
Land area: Unknown                                  -0.009   0.303   0.991     0.547    1.797
Tenure 3 years                                      -0.073   0.023   0.930     0.888    0.973
Average number of services                          -0.031   0.026   0.945     0.899    0.994
Average number of products/5                        0.061    0.016   1.063     1.030    1.097
Average abroad total charge ratio                   0.705    0.334   2.024     1.052    3.894
Maximum text messages outnet total charge ratio     1.310    0.421   3.707     1.623    8.468
Total charge groups: Heavy users                    0.226    0.094   1.253     1.043    1.506
C1: Usage/charges in-/outside network               0.205    0.048   1.228     1.117    1.350
C2: Usage/charge ratios outside network             0.526    0.038   1.691     1.571    1.821
C3: Usage/charge ratios inside network              0.205    0.038   1.227     1.139    1.322
C4: Usage ratios abroad                             -0.095   0.043   0.909     0.835    0.990

Note: R² = .129 (McFadden’s ρ²), .163 (Cox & Snell), .218 (Nagelkerke). Model χ² = 781.449, p < .001.
The groups marked with * are statistically significant at the .05 level.

Table 5-7 shows the regression coefficients under the heading B, the standard errors, the odds ratios under the heading Exp(B) and the 95% confidence interval around the odds ratio. There are a number of ways to measure the strength of association for a model and they are an analog to R² in multiple linear regression. The value for Cox and Snell R² was .163. This measure is based on log-likelihoods and also takes into account sample size. The Nagelkerke R² was higher, or .218. McFadden’s ρ², which is intended to mimic an R², is another way to test the strength of association. The equation is:

$$\rho^2 = 1 - \frac{LL(B)}{LL(0)} \tag{5-2}$$

where,

LL(B)    the -2 Log likelihood value for the final model

LL(0)    the -2 Log likelihood value for the baseline model including only the constant

The value for the final logistic model was .129 but this measure tends to be much lower than R² for multiple regression (Tabachnick and Fidell, 2001). The baseline model had a -2Log likelihood value of 6070.583, which signifies the fit of the most basic model, including only the constant, to the data. After entering the independent variables into the model, the -2Log likelihood value became lower, or 5321.468, which indicates that the model including the independent variables has better predictive power than the model including only the constant. The model was significant with a chi-square = 781.449 (df = 18, p = .001). The Hosmer and Lemeshow Test is a goodness of fit test which helps to determine whether the model is correctly specified, that is, how well the model fits the data. If the p-value is below .05, the model is rejected, indicating a poor model fit. For the final model, p = .004, which is lower than the .05 level (chi-square = 22.778, df = 8). The Hosmer and Lemeshow test has however received some criticism when applied to large datasets, as it tends to give a low p-value even though the model fits the data well.
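The three strength-of-association measures can be recomputed from the baseline -2Log likelihood, the model chi-square and the sample size. The sketch below assumes the standard definitions of McFadden's ρ², Cox and Snell R² and Nagelkerke R², and uses the baseline value of 6070.583, the model chi-square of 781.449 reported in the text and N = 4379. The function name is illustrative.

```python
import math

def pseudo_r_squared(neg2ll_null, model_chi_square, n):
    """Return (McFadden, Cox & Snell, Nagelkerke) from -2LL of the null model,
    the model chi-square (drop in -2LL) and the sample size."""
    mcfadden = model_chi_square / neg2ll_null           # equals 1 - LL(B)/LL(0)
    cox_snell = 1 - math.exp(-model_chi_square / n)
    max_cox_snell = 1 - math.exp(-neg2ll_null / n)      # upper bound of Cox & Snell
    nagelkerke = cox_snell / max_cox_snell
    return mcfadden, cox_snell, nagelkerke

mcfadden, cox_snell, nagelkerke = pseudo_r_squared(6070.583, 781.449, 4379)
print(round(mcfadden, 3), round(cox_snell, 3), round(nagelkerke, 3))
```

With these inputs the sketch gives values that round to .129, .163 and .218. Nagelkerke's measure is higher because it rescales Cox and Snell's R² by its maximum attainable value, which is about .75 for a balanced sample.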

5.1.4.1 Results of the logistic regression

The results from the logistic regression imply that in this sample of customers, as the age of a customer increases by one year, the odds of churning are multiplied by 0.97, with other factors controlled. This means that the older the customers get, the less likely they are to churn. For the variable “Family size”, group 2 was significant. This implies that the odds of churning for customers in a family of 2 people are 0.82 times the odds for customers living alone. For “Land area”, only group 4 was significant, so the odds of churning for customers who live in the Eastern part of Iceland are 0.33 times the odds for those who live in the base area, which is the capital area. “Tenure” has virtually no effect when measured in days, as the “B” value is 0.000. Since this variable shows for how many days a customer has been active, a 1 day increase is a very small change. To see if it made a difference, a new variable was computed by dividing “Tenure” by 365 to get a variable based on 1 year. When this variable was entered into the model, the “B” value was -0.024 (SE = .008) and the odds of churning are multiplied by 0.977 as the variable increases by 1 year. With a “Tenure” variable based on 3 years, the “B” value is -0.073 and the odds of churning are multiplied by 0.930 as the variable increases by three years. This shows that the longer a customer stays, the less likely he or she is to churn. The decision was taken to have “Tenure” based on three years in the final model, as shown in Table 5-7.

The next variable in the table is “Average number of services” over the three months observation period. As a customer buys one more service, the odds that he or she will churn are multiplied by 0.945. The next variable is “Average number of products”. For every unit increase in the average number of products that a customer buys, the odds of churn are multiplied by 1.012. However, since the “B” value for this variable is rather small (0.012), it only adds a negligible amount to the prediction of churn and has the least effect on churn of all the significant variables. When this variable is based on five products, the “B” value increases to 0.061, so the effect of this variable is somewhat higher. As the variable increases by five products, the odds of churning are multiplied by 1.063. Out of the 15 variables omitted from the PCA analysis, two ended up in the final model. The former was “Average abroad total charge ratio”. As this variable increases by 1 unit, the odds of churning are multiplied by 2.02. In essence, this means that the more customers pay for usage abroad, the more likely they are to churn. The latter variable, “Maximum text messages outnet total charge ratio”, has the largest effect on churn with a “B” value of 1.310. The odds that a customer churns are multiplied by 3.7 when the value for this variable increases by 1 unit. Thus, the higher the maximum amount of total charge for sending text messages outside the telecom’s network, the more likely the customer is to churn. For “Total charge groups”, where the customers have been divided into two groups, light and heavy users, the odds of a customer churning are 1.25 times as high for heavy users as for light users. Those who use the mobile phone and other products and services at the telecom more, and therefore get larger billed amounts, may be more dissatisfied with paying so much and therefore likelier to churn. The first of the components that was significant was Component 1: Usage/charges inside/outside the telecom’s network. When this component increases by 1 unit, the odds of churning are multiplied by 1.2. As Component 2: Usage/charge ratios outside the network increases by 1 unit, the odds of churning are multiplied by 1.7. As Component 3: Usage/charge ratios inside the network increases by 1 unit, the odds of churning are multiplied by 1.23. And finally, for Component 4: Usage ratios abroad, when it increases by 1 unit, the odds of churning are multiplied by 0.91.
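The odds ratios and their confidence intervals follow directly from the coefficients and standard errors. The sketch below assumes the usual Wald interval, exp(B ± 1.96 · S.E.), and uses the “Tenure 3 years” row as input. It also illustrates why rescaling a predictor changes B: dividing a variable by c multiplies its coefficient by c. The function name is illustrative.

```python
import math

def odds_ratio_with_ci(b, se, z=1.96):
    """Return Exp(B) and the bounds of its 95% Wald confidence interval."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# "Tenure 3 years": B = -0.073, S.E. = 0.023
odds, lower, upper = odds_ratio_with_ci(-0.073, 0.023)
print(round(odds, 3), round(lower, 3), round(upper, 3))

# Rescaling: a coefficient of -0.024 per year corresponds to 3 * -0.024 per three years
b_per_year = -0.024
odds_three_years = math.exp(3 * b_per_year)
print(round(odds_three_years, 3))
```

Because the published B and S.E. are rounded to three decimals, the recomputed bounds agree with the tabulated interval only to within rounding. An odds ratio below 1 with an interval that excludes 1 indicates a significant decrease in the odds of churning.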

The proportion of cases that were correctly classified is shown in Table 5-8. This table documents the validity of predicted probabilities. The rows show the observed or actual values of the dependent variable and the columns show the predicted values. According to this table, with a cutoff point at .5, of the 2189 customers who did not churn, the model correctly classified 1376 as non-churn. The model did better at predicting those who would churn, correctly classifying 1564 of 2190. This is also reflected in the magnitude of sensitivity (71.4%) in contrast to that of specificity (62.9%). The former measures the proportion of events (churn) that were correctly classified while the latter measures the proportion of correctly classified nonevents (non-churn). The false positive rate, calculated here as the proportion of nonevents (non-churn) misclassified as events (churn), is 37.1%. The false negative rate, 28.6%, is calculated as the proportion of events misclassified as nonevents (Peng et al, 2002). The calculations based on equations (3-2) to (3-8) are shown beneath Table 5-8. The negative predictive value was 68.7% and the positive predictive value was 65.8%.

The overall rate of successful classification was 67.1%, which is a moderate improvement on the 50% correct classification with the model that includes only the constant.

Table 5-8: Classification table for the logistic regression for the post-paid training sampleᵃ

                                      Predicted
                                      Status after 5 months
Observed                              Censoring   Churn     Percentage correct
Status after 5 months   Censoring     1376        813       62.9%
                        Churn         626         1564      71.4%
Overall percentage                    68.7%       65.8%     67.1%

ᵃ The cut value is .500

Note: Sensitivity = 1564/(1564+626) = 71.4%. Specificity = 1376/(1376+813) = 62.9%. False positive = 813/(813+1376) = 37.1%. False negative = 626/(626+1564) = 28.6%. Positive prediction = 1564/(1564+813) = 65.8%. Negative prediction = 1376/(1376+626) = 68.7%. Overall accuracy = (1376+1564)/(1376+626+813+1564) = 67.1%.
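The calculations in the note can be reproduced from the four cells of the classification table. The following sketch uses the same denominators as the note. The function and key names are illustrative.

```python
def classification_metrics(tn, fp, fn, tp):
    """Rates from a 2x2 classification table (tn/fp: observed non-churn, fn/tp: observed churn)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_positive": fp / (fp + tn),
        "false_negative": fn / (fn + tp),
        "positive_prediction": tp / (tp + fp),
        "negative_prediction": tn / (tn + fn),
        "accuracy": (tn + tp) / (tn + fp + fn + tp),
    }

# Cells of the training sample classification table at a cut value of .5
metrics = classification_metrics(tn=1376, fp=813, fn=626, tp=1564)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these definitions the false positive rate is the complement of specificity and the false negative rate is the complement of sensitivity.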

The ROC curve

The ROC curve is shown in Figure 5-1. The blue line (the ROC curve) is the predicted probability based on the results from the logistic regression and the green line shows the results obtained by chance alone. The area under the curve (AUC) = .729 with a 95% confidence interval of (.714, .744), which is reasonably good and indicates that the fitted model is better than the base model with only the constant.

Figure 5-1: ROC curve for the logistic regression in the post-paid training sample (legend: predicted probability, reference line)

One can conclude that the logistic regression classifies the group of churn significantly

better than by chance as the area under the curve is significantly different from .5 (p = .001).
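The area under the ROC curve can be read as the probability that a randomly chosen churner receives a higher predicted score than a randomly chosen non-churner. The sketch below computes it in that pairwise way. The scores are hypothetical and are not taken from the telecom data.

```python
def auc(scores_pos, scores_neg):
    """Share of (churner, non-churner) pairs in which the churner scores higher; ties count as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

churn_scores = [0.9, 0.7, 0.6, 0.4]      # hypothetical predicted probabilities
non_churn_scores = [0.8, 0.5, 0.3, 0.2]  # hypothetical predicted probabilities
area = auc(churn_scores, non_churn_scores)
print(area)
```

A value of .5 corresponds to the reference line, where the scores do not separate the two groups at all. A value of 1 means that every churner scores higher than every non-churner.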

5.1.4.2 Linearity of the logit

A linear relationship between the dependent variable and the independent variables is one of

the assumptions in ordinary regression (Field, 2009; Tabachnick and Fidell, 2001). However,

since the dependent variable in logistic regression has only two categories it is necessary to

use the log (or logit) of the data. For the assumption of linearity in logistic regression to hold,

there has to be a linear relationship between the continuous independent variables and the

logit of the outcome variable. To test this, interactions between each continuous variable and

the log of itself were included in the logistic regression along with all independent variables.

All the interaction terms were insignificant since their significance values were greater than

.05. This implies that the assumption of linearity of the logit has been met for the continuous

variables in the data (Field, 2009; Tabachnick and Fidell, 2001).
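The interaction terms used in this check, often called the Box-Tidwell approach, are simply each continuous predictor multiplied by its own natural logarithm. A minimal sketch of how such a term could be constructed is given below, assuming strictly positive values. The tenure values are hypothetical, and fitting the regression itself is left to the statistical package.

```python
import math

def log_interaction_terms(values):
    """Return x * ln(x) for each value; the values must be strictly positive."""
    return [x * math.log(x) for x in values]

tenure_years = [0.5, 1.0, 3.0, 10.0]  # hypothetical tenure values in years
terms = log_interaction_terms(tenure_years)
print([round(t, 3) for t in terms])
```

Each term is entered alongside the original predictor, and a significant coefficient for the term would indicate a violation of the linearity assumption for that predictor.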

5.1.4.3 Validation

To validate the results of the logistic regression, the logistic model was tested on another

sample from Telecom X. The main difference with this sample is that it is not balanced like

the training sample. There are 28737 cases in this testing sample, 27910 active customers and

only 827 churners or 2.9%. This reflects the real life situation where the percentage of churn

is very low on a monthly basis. 46% of the sample are women, 96.6% pay the bill themselves

and 50% are light users. Similar to the training sample, most of the customers come from the

greater capital area or 55.2%, 14.5% come from the northern part of Iceland and the same

goes for southern Iceland. 8.8% come from the western part and 5.4% come from the eastern

part. This is very similar to the distribution in the training sample. 1.7% had an unknown

residency. The size of family was somewhat different as there were up to 11 people in a

family in the testing sample but up to 7 in the training sample. As with the training sample,

there are far fewer cases in each of the categories from 4 people to 11 people so a new

variable was made with 3 categories, 1 person, 2 people and 3 people or more, to make the

categories more similar in size. Finally, of the demographic variables, marital status was also

quite comparable with the training set. The category with married people was largest, or

54.6%. 25.9% were unmarried, 10.6% divorced, 6.1% widowed and 1.2% separated. Finally,

0.8% had another marital status than specified above. 207 or 0.7% had an unknown marital

status. The mean age was slightly higher in the testing sample or 52.26 years (SD = 15.03).

The youngest customer was 18 and the oldest was 101 years old. The mean for tenure was

1641.12 days (SD = 1563.95). The minimum number of days was 31 and the highest number

was 4544 or over 12 years. The average number of products was 27 compared to 29.8

products in the training sample. The mean for the average amount paid for mobile phone

usage over the three months in the testing sample is lower than in the training sample, or

8076.8 ISK and 9103.1 ISK respectively.

To test the resulting logistic regression model produced with the training sample,

scores are calculated in the testing data set based on the model. SPSS makes it possible to

export the model information to a new data set which is used for testing the model and

produces scores for each case in the new set. Those cases with a score > .5 were classified as


churn, those with a score <= .5 were classified as non-churn. The predicted status was then

compared with the actual status for each case. The results of the overall classification are

shown in Table 5-9.

Table 5-9: Classification table for the logistic regression for the post-paid testing sample (a)

Observed status after 5 months    Predicted: Censoring    Predicted: Churn    Percentage correct

Censoring    14739    13171    52.8%

Churn    137    690    83.4%

Overall percentage    99.1%    5.0%    53.7%

a. The cut value is .500

Note: Sensitivity = 690/(690+137) = 83.4%. Specificity = 14739/(14739+13171) = 52.8%. False positive =

13171/(13171+14739) = 47.2%. False negative = 137/(137+690) = 16.6%. Positive prediction = 690/(690+13171) =

5.0%. Negative prediction = 14739/(14739+137) = 99.1%. Overall accuracy = (14739+690)/(14739+137+13171+690)

= 53.7%.
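The figures in the note can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name and dictionary keys are hypothetical, the thesis itself used SPSS, and churn is treated as the positive class.

```python
def classification_metrics(tn, fp, fn, tp):
    """Return the rates from a 2x2 classification table as fractions.

    tn/fp/fn/tp are counts of true negatives, false positives,
    false negatives and true positives (churn = positive class).
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Counts taken from Table 5-9 (post-paid testing sample).
metrics = classification_metrics(tn=14739, fp=13171, fn=137, tp=690)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The same function can be applied to the counts of any other classification table in this chapter.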

The model correctly predicted the status for churn in 690 cases or 83.4% of the 827

customers that actually churned. It performed worse at predicting non-churn, as it had a

specificity of 52.8%. The false positive rate or Type I error was 47.2% as it predicted 13171

cases as churners which were actually still active. Finally, the false negative or Type II error

was relatively low, or 16.6% as it predicted 137 as non-churn who had actually churned. In

this case, it is better to have a lower Type II error since it is worse to misclassify someone

who churned as non-churn than to misclassify someone who is still active as churn. As a

result of the high number of false positives, the overall accuracy was 53.7% which is lower

than with the training sample of 67.1%. The model had a very low positive predictive value of

5.0% as it incorrectly predicted so many non-churn cases as churn. The negative predictive

value was particularly high or 99.1%.
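The very low positive predictive value follows largely from the low churn rate in the testing sample. The sketch below applies Bayes' rule to the sensitivity and specificity reported above. The function name is hypothetical, and the 50% prevalence case is a made-up comparison scenario, not a result from the thesis.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of predicted churners that really churn, via Bayes' rule."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Sensitivity and specificity from the testing sample (83.4% and 52.8%).
# Observed churn rate in the testing sample: 827 of 28737 customers.
realistic = positive_predictive_value(0.834, 0.528, 827 / 28737)

# Hypothetical comparison: same rates if half of the customers churned.
balanced = positive_predictive_value(0.834, 0.528, 0.50)

print(f"PPV at observed churn rate: {realistic:.1%}")
print(f"PPV at a 50% churn rate:    {balanced:.1%}")
```

With identical sensitivity and specificity, the predictive value of a churn prediction drops sharply once churners are only about 3% of the customers.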

5.1.5 Decision Tree

This procedure generates a model that is a tree-based classification model. What this model

does is to categorize cases into groups or it predicts values of a dependent variable based on

the independent variables’ values.

With large datasets, the speed of classification can decrease and the average depth of a

decision tree can get deeper. This means that the tree’s structure grows to be large and

complicated. By using the results from the PCA previously done, noise data can be filtered

and the dimensions of the data set are reduced (Hu et al., 2009). This was supported by

research by Piramuthu (2008), where the results showed that when multicollinearity was

present in the data, reducing either the dimensionality or the size of the sample with factor


analysis or cluster analysis would improve the performance of the decision tree, both reducing

the size of the tree and decreasing the predicting error.

5.1.5.1 Results of the Decision Tree

As described in section 3.4.2, there are four growing methods that can be used. They were all

used in turn with the intention of comparing the results to see which is the best method. The

same 38 variables were used as in the logistic regression done previously.

The methods were used in the same order as they appear in section 3.4.2. CHAID

generated a decision tree with 35 nodes, 22 terminal nodes and a depth of 3, which specifies

the number of levels below the root node. Exhaustive CHAID generated a decision tree with

28 nodes, 18 terminal nodes and a depth of 3. CRT produced a tree with 19 nodes, 10 terminal

nodes and a depth of 5. Lastly, QUEST produced a tree with 31 nodes, 16 terminal nodes and

also a depth of 5. CHAID used nine of the 38 variables in the analysis to create a tree and

exhaustive CHAID included seven of the 38 variables used in the analysis, QUEST used 26

variables and CRT used all the variables except two, “Is payer” and “Gender”. Those

variables that did not make a significant contribution were dropped from the model. For the

methods CHAID, exhaustive CHAID and CRT, the variable “Average text messages in” was

the best predictor for customers’ status. With the QUEST method, customer age was the best

predictor.

CRT was the method that had the highest overall classification value of 67.6%.

Exhaustive CHAID had the second highest value of 67.1%, CHAID had 66.8% and QUEST

had 65.8%. However, exhaustive CHAID had the highest sensitivity of 77.8% which means

that it predicted churn correctly in almost 78% of the cases and this is what the analysis is

focusing on. The second highest was QUEST with a sensitivity of 76.7%. CRT had 75.8%

and CHAID had 73.7%. CHAID had the highest specificity of 59.9% so it did best of the four

methods in predicting for non-churn. CRT had 59.5%, exhaustive CHAID 56.4% and QUEST

had 54.9%. Table 5-10 on the next page shows the risk estimate of the four different growing

methods and as can be seen, CRT had the lowest risk estimate of .324. It indicates that the

category that was predicted by the model (churn or non-churn) was wrong in 32.4% of the

cases. This means that the “risk” of misclassifying a customer is around 32%. QUEST had the

highest risk estimate of .342. All four methods had a standard error of .007.


Table 5-10: Risk estimates of different growing methods for the post-paid training sample

Method Estimate Standard error

CHAID .332 .007

ExCHAID .329 .007

CRT .324 .007

QUEST .342 .007
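The risk estimate is the proportion of misclassified cases, so its standard error can be approximated with the usual binomial formula. The sketch below assumes that formula, which may differ in detail from the SPSS computation, and takes the training sample size of 4379 as the sum of the four cells in Table 5-11. The function name is hypothetical.

```python
import math


def risk_standard_error(risk, n_cases):
    """Binomial standard error of a misclassification rate (assumed formula)."""
    return math.sqrt(risk * (1.0 - risk) / n_cases)


n_training = 1235 + 486 + 954 + 1704  # cases in the post-paid training sample

risk_estimates = {"CHAID": 0.332, "ExCHAID": 0.329, "CRT": 0.324, "QUEST": 0.342}
for method, risk in risk_estimates.items():
    print(f"{method}: risk = {risk:.3f}, SE = {risk_standard_error(risk, n_training):.3f}")
```

Under this assumption, all four methods give a standard error that rounds to .007, in line with the table.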

Based on these results, the method chosen for the final decision tree was exhaustive

CHAID. It had the highest sensitivity and the overall correct percentage was not much lower

than that of CRT. Table II-10 in Appendix II shows the tree table which shows most of the

fundamental information from the tree diagram in the form of a table. It shows the predictors

that were used to predict churn and the split values for each predictor. It also shows whether

churn or non-churn was predicted in each node and how many cases are of each group in the

node. Table 5-11 shows the classification results of churn and censoring from the training

sample with the exhaustive CHAID method. For those customers with the status churn, the

decision tree predicted churn accurately for 77.8% of them.

Table 5-11: Classification table for unpruned decision tree in the post-paid training sample (a)

Observed status after 5 months    Predicted: Censoring    Predicted: Churn    Percentage correct

Censoring    1235    954    56.4%

Churn    486    1704    77.8%

Overall percentage    71.6%    64.1%    67.1%

a. Growing method: Exhaustive CHAID

Note: Sensitivity = 1704/(1704+486) = 77.8%. Specificity = 1235/(1235+954) = 56.4%. False positive =

954/(954+1235) = 43.6%. False negative = 486/(486+1704) = 22.2%. Positive prediction = 1704/(1704+954) = 64.1%.

Negative prediction = 1235/(1235+486) = 71.6%. Overall accuracy = (1235+1704)/(1235+486+954+1704) = 67.1%.

As the decision tree produced with the exhaustive CHAID method is rather large, the

CRT method can be used to avoid overfitting the model by pruning the tree. The results of the

pruned tree are presented in Table 5-12. There was a negligible reduction in the overall

accuracy with the pruned tree as it went from 67.1% to 67.0%. There was a larger difference

in the sensitivity as it reduced from 77.8% to 74.2%. It did do better with predicting non-

churn as the specificity increased from 56.4% to 59.8%.

Table 5-12: Classification table for pruned decision tree in the post-paid training sample (a)

Observed status after 5 months    Predicted: Censoring    Predicted: Churn    Percentage correct

Censoring    1309    880    59.8%

Churn    566    1624    74.2%

Overall percentage    69.8%    64.9%    67.0%

a. Growing method: CRT

Note: Sensitivity = 1624/(1624+566) = 74.2%. Specificity = 1309/(1309+880) = 59.8%. False positive =

880/(880+1309) = 40.2%. False negative = 566/(566+1624) = 25.8%. Positive prediction = 1624/(1624+880) = 64.9%.

Negative prediction = 1309/(1309+566) = 69.8%. Overall accuracy = (1309+1624)/(1309+566+880+1624) = 67.0%.


With the pruned tree, there was a minute increase in the risk estimate from .329 to

.330. This pruned model also predicted fewer churned customers correctly, or 1624, however

it predicted more non-churners correctly, or 1309. The difference can also be seen in the

model summary since the tree goes from having 19 nodes and 10 terminal nodes to having 9

nodes and 5 terminal nodes. The depth of the tree went from 5 to 4. The pruned tree table is

shown in Table II-11 and the tree diagram in Figure II-2 in Appendix II. “Average number of

text messages sent” is the best predictor in the decision tree. Also in the tree are “Tenure”,

“Customer age” and “Land area” which were significant in the logistic regression as well.

74.9% of the customers who sent fewer than 7.17 text messages over the three month period

were still active while 60.5% of the customers who sent more than 7.17 text messages on

average over the three month period have churned. The next best predictor for those who sent

more than 7.17 messages was “Tenure” and for those who have stayed less than 1359.5 days,

68.5% of them had churned. Slightly more than half (53.5%) of those who have stayed longer,

churned. This shows that customers who have stayed longer are less likely to churn, which

concurs with the results from the logistic regression. For customers who have stayed longer

than 1359.5 days, 56.8% of those who are older than 53.5 years old were still active. Of

those who are younger than 53.5 years old, 58.3% of them have churned. So younger

customers are more likely to churn which is the same effect as in the logistic regression.

Finally, of those in the node with customers younger than 53.5 years old, 24.5% of them who

live in Eastern Iceland have churned. Of those in the other categories for “Land area”, 60%

have churned which was also the same as in the logistic regression. Overall, the same effects

apply for the variables that are significant in both classification methods.
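The splits described in this paragraph can be written out as a small scoring function. The sketch below is an illustration only: the function names are hypothetical, the churn shares are the node percentages quoted in the text, and the side on which a case exactly equal to a split value falls is an assumption.

```python
def churn_score(avg_texts_sent, tenure_days, age, land_area):
    """Share of churners in the terminal node that a customer falls into."""
    if avg_texts_sent <= 7.17:
        return 1.0 - 0.749      # 74.9% of this node were still active
    if tenure_days <= 1359.5:
        return 0.685            # short tenure: 68.5% churned
    if age > 53.5:
        return 1.0 - 0.568      # older customers: 56.8% were still active
    if land_area == "Eastern Iceland":
        return 0.245            # 24.5% churned
    return 0.60                 # all other land areas: 60% churned


def predict_status(score, cut=0.5):
    """Label a case as churn when its score exceeds the cut value."""
    return "churn" if score > cut else "non-churn"


# Hypothetical customer: many text messages, long tenure, young, capital area.
example = churn_score(avg_texts_sent=12.0, tenure_days=2000, age=35, land_area="Capital area")
print(example, predict_status(example))
```

Following a case from the first split to a terminal node in this way is the same logic used later when the tree is applied to the testing sample with a cut value of .5.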

Gain is calculated for each terminal node to show the node’s performance. Gain is the

percentage of total cases predicted for in the target category (in this case churn) in a terminal

node of the total cases of the target category in the whole sample. In this decision tree, node 2

has the highest gain of 85.2%. A gains chart plots the gain percent for the whole tree. The

gains chart is shown in Figure II-3 in Appendix II. It implies that the model is a moderately

good one. These cumulative charts always start at 0% and end at 100% as one goes from one

end to the other. For a good model, the line will rise steeply toward 100% and then level off.

A model that follows the diagonal reference line gives no information.


The ROC curve

The ROC curve for the decision tree is shown in Figure 5-2. AUC for the decision tree = .697

with 95% confidence interval (.681, .712) and a p = .001 so it is a statistically significant ROC

curve. AUC of .697 indicates that it is a fair test. The AUC is lower for the decision tree than

for the logistic regression, which was .729.

Figure 5-2: ROC curve for the decision tree in the post-paid training sample (lines plotted: predicted probability and reference line)

5.1.5.2 Validation

This procedure evaluates how well the tree structure generalizes to a larger population. To test

the decision tree model generated using the training sample, the decision tree diagram is

followed down and the cases in the testing sample labeled churn or non-churn according to

the probability of churn or non-churn in each terminal node. SPSS makes a score for each

case based on the decision tree model and those with a score > .5 are labeled as churn, those

with a score <= .5 are labeled as non-churn. This sample was described in section 5.1.4.3. The

results are shown in Table 5-13. As the table shows, the decision tree model predicted the

status correctly for 630 churners out of the 827, or 76.2% which is the sensitivity.

Table 5-13: Classification table for decision tree in the post-paid testing sample (a)

Observed status after 5 months    Predicted: Censoring    Predicted: Churn    Percentage correct

Censoring    16714    11196    59.9%

Churn    197    630    76.2%

Overall percentage    98.8%    5.3%    60.4%

a. Growing method: CRT

Note: Sensitivity = 630/(630+197) = 76.2%. Specificity = 16714/(16714+11196) = 59.9%. False positive =

11196/(11196+16714) = 40.1%. False negative = 197/(197+630) = 23.8%. Positive prediction = 630/(630+11196) =

5.3%. Negative prediction = 16714/(16714+197) = 98.8%. Overall accuracy =

(16714+630)/(16714+197+11196+630) = 60.4%.


The specificity was 59.9% meaning that the model predicts correctly for 60% of those

customers who did not churn. The false positive rate was high or 40.1% as the model

predicted many cases as churners which were in fact still active. This is the same as with the

logistic regression. The false negative rate was 23.8%. The overall accuracy was 60.4% which

is higher than with the logistic regression which had 53.7%. The positive predictive value was

extremely low at 5.3% as the model predicted so many cases incorrectly as churn. However,

the negative predictive value was 98.8%.

Based on the results from the validation of both the logistic regression and the decision

tree, it is difficult to see which method should be recommended for churn prediction. One

could make the deduction that the logistic regression would be more suitable of these two

classification methods. The overall accuracy is relatively lower for the testing sample with the

logistic regression but it had a higher sensitivity, therefore predicting churn more accurately

which is of main interest here. Otherwise, there is perhaps not a straightforward way to

compare these two classification methods as they differ in many ways. It seems that the

logistic regression is somewhat better at predicting churn and with this method, one can see

the size of the effect each significant variable has on churn and whether there is a positive or

negative relationship between the variables and churn. The decision tree on the other hand, is

convenient as the tree diagram can show how one choice leads to another as one follows

down the tree. The tree illustrates the relationship between different attributes (like gender,

tenure, usage and billing) and possible end results (churn or non-churn). Both methods

therefore have their own specific characteristics.


5.2 Pre-paid customers

5.2.1 Sample description

The sample of pre-paid customers at the telecom is analyzed in this chapter. Pre-paid

customers are those who purchase credit in advance of service use. The training sample for

the pre-paid customers is a bit larger than the sample for the post-paid customers as it consists

of 5995 cases. The reason is that there are more churners among pre-paid customers during

the five months performance period than among the post-paid customers. This sample is

balanced like the post-paid sample, with 2922 (48.74%) churners and 3073 non-churners

(51.26%).

The mean age for the sample of 5995 cases that were used, was 37.32 years (SD =

18.33). The maximum age was 99 years and the minimum age was 6 years. For the marital

status of the customers in the training sample, most of them are unmarried or 53.2%. 32.3%

are married or in a registered partnership. Like in the sample with post-paid customers, the

two categories with the fewest number of cases, “Married (not living together)” and

“Icelander living abroad” were combined into one category named “Other”. This category has

53 cases or 0.9%. All other categories were left the same (see Table 5-14).

Table 5-14: Marital status of customers in the pre-paid training sample

Marital status Frequency Percent Cumulative %

Unmarried 3189 53.2 53.2

Married/registered partnership 1934 32.3 85.5

Divorced 449 7.5 93.0

Widowed 172 2.9 95.9

Separated 72 1.2 97.1

Other 53 0.9 98.0

Marital status unknown 126 2.1 100.0

Total 5995 100.0

Regarding the customers’ family size, most customers in the sample were single

individuals or 4251 (70.9%). The second largest category was the family size of 2 people or

1226 (20.5%) cases. 1 customer was in a family of 13. Categories with family sizes larger

than 2 people included fairly few cases each, so they too were combined into 1 category,

resulting in a new variable with 3 categories, 1 person, 2 people and 3 people or more with

518 cases (8.6%) (see Table 5-15 on the next page).


Table 5-15: Family size of customers in the pre-paid training sample

Family size Frequency Percent Cumulative %

1 person 4251 70.9 70.9

2 people 1226 20.5 91.4

3 people or more 518 8.6 100.0

Total 5995 100.0

The customers were again fairly dispersed over the country, with the majority of the

customers living within the greater capital area, or 2861 (47.7%), followed by 1037 (17.3%)

customers who live in the Northern part of Iceland (see Table 5-16).

Table 5-16: Residence of customers in the pre-paid training sample

Land area Frequency Percent Cumulative %

Capital area 2861 47.7 47.7

Western Iceland 546 9.1 56.8

Northern Iceland 1037 17.3 74.1

Eastern Iceland 307 5.1 79.2

Southern Iceland 953 15.9 95.1

Unknown 291 4.9 100.0

Total 5995 100.0

291 (4.9%) customers have an unknown location. The mean for tenure was 1460.36

days or 4 years (SD = 1334.93). This is considerably lower than the mean for the post-paid

sample which was 2177 days. The maximum number of days was 5815 (approximately 16

years) and the minimum number of days was 32. In this sample there are no variables for the

average number of products or services bought, but instead the average refill frequency and

average refill amount. The average for the former, “Average refill frequency” was 2.16 (SD =

2.77). The maximum number of average refills bought was 31.67 and some customers did not

buy a refill during the three months observation period. There were 3109 (51.9%) females in

the sample and 2886 males. Table 5-17 shows that out of the 5995 individuals that provided

demographic information, 1513 females (48.7% of total females) and 1409 males (48.8% of

total males) churned during the five months performance period.

Table 5-17: Crosstable of Status*Gender for the pre-paid training sample

Gender: Female Male Total

Status Non-churn Count 1596 1477 3073

% within gender 51.3% 51.2% 51.3%

Churn Count 1513 1409 2922

% within gender 48.7% 48.8% 48.7%

Total Count 3109 2886 5995

% within gender 100.0% 100.0% 100.0%

The Continuity Correction for the Chi-square test was .009 (df = 1) and the significance

value was .924 (see Table II-12 in Appendix II) which is larger than the alpha value of .05.

Consequently the results are not significant, meaning that the proportion of females that churn


is not significantly different from the proportion of males that churn. As with the post-paid

sample, the same test was done for family size, land area, marital status and total charge

groups. For “Family size” the Pearson Chi-square = 3.254 (df = 2) with a p = .197, thus there

is no difference in status and number of people in the family. For “Land area” the chi-square

= 132.934 (df = 5) and p = .001 so there was a difference in status and where people live. The

Cramer’s V = .149 suggesting a somewhat strong relationship. The Chi-square for “Marital

status” = 209.997 (df = 6) and a p = .001. So there is a difference between customer status and

family status and the Cramer’s V = .187 (p = .001) indicating a relatively strong relationship.

Finally, for the “Total charge groups” the Chi-square = 179.623 (df = 1) and p = .001. This

relationship is also significant and the Cramer’s V = .173 (p = .001). Since the

variable “Is payer” only has one category left (of two) after filtering out customers with no

demographic information, it was omitted from the analysis.
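The continuity-corrected chi-square for gender and the Cramer's V values can be checked by hand. The sketch below uses the standard textbook formulas with the counts from Table 5-17. The function names are hypothetical, and the thesis obtained these statistics from SPSS.

```python
import math


def yates_chi_square_2x2(a, b, c, d):
    """Chi-square with continuity correction for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2.0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def cramers_v(chi_square, n_cases, n_rows, n_cols):
    """Cramer's V effect size for an n_rows x n_cols contingency table."""
    return math.sqrt(chi_square / (n_cases * (min(n_rows, n_cols) - 1)))


# Table 5-17: rows = non-churn, churn; columns = female, male.
gender_chi_square = yates_chi_square_2x2(1596, 1477, 1513, 1409)
print(f"Gender, continuity-corrected chi-square: {gender_chi_square:.3f}")

# Reported Pearson chi-square values; status has 2 categories in each test.
print(f"Land area V:      {cramers_v(132.934, 5995, 2, 6):.3f}")
print(f"Marital status V: {cramers_v(209.997, 5995, 2, 7):.3f}")
```

Because customer status has only two categories, V reduces to the square root of the chi-square divided by the sample size in every test reported here.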

The next step was to discover if there was a significant difference in the mean of the

continuous variables and the customer status by doing an independent samples t-test. Out of

the 61 variables related to demographics, averages and maximum values, 48 were significant

with a p < .05 in the Levene’s test (see Table II-13 in Appendix II). For these variables, the

variance of scores for the two groups in customer status is not the same. The insignificant

variables in the Levene’s test were “Average innet frequency”, “Average total out volume”,

“Average abroad volume ratio”, “Average abroad frequency ratio”, “Average voice outin

volume ratio”, “Average innet charge”, “Average abroad total charge ratio”, “Maximum voice

outin volume ratio”, “Maximum innet frequency”, “Maximum total out volume”, “Maximum

outnet volume ratio”, “Maximum outnet frequency ratio” and “Maximum innet charge”. For

the t-test for equality of means, 19 variables were insignificant with a p > .05 showing that

there is not a significant difference in the mean values for the two groups for customer status.

They were “Maximum refill amount”, “Average innet volume”, “Average abroad volume”,

“Average innet volume ratio”, “Average innet frequency ratio”, “Average abroad volume

ratio”, “Average abroad frequency ratio”, “Average voice outin volume ratio”, “Average

abroad charge”, “Average abroad total charge ratio”, “Maximum voice outin volume ratio”,

“Maximum innet volume”, “Maximum abroad volume”, “Maximum innet volume ratio”,

“Maximum innet frequency ratio”, “Maximum abroad volume ratio”, “Maximum abroad

frequency ratio”, “Maximum abroad charge” and “Maximum abroad total charge ratio”.

Then to measure the effect size statistics, Eta squared was calculated (see equation 5-1)

and the values ranged from 7.58 × 10⁻⁶ to .0818 which is similar to the post-paid sample.


“Average outnet total charge ratio” had the highest eta and therefore 8.18% of the variance in

this variable is explained by customer status.

5.2.2 Multicollinearity

To check whether multicollinearity is an issue in the data, the tolerance and VIF (Variance

Inflation Factor) statistics were looked at. A total of 58 variables related to usage and charge

were entered into linear regression and all but 5 had a tolerance value < .1 and VIF > 10

revealing that there is an issue with multicollinearity in the data. The decision was taken to

run a principal component analysis on all 58 usage and charge variables.
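Tolerance and VIF are two views of the same quantity: the R² obtained when one predictor is regressed on all the others. The sketch below shows the relationship and the cut-offs used above. The function names and the R² of .95 are hypothetical illustrations, not values from the data.

```python
def tolerance_and_vif(r_squared):
    """Tolerance and VIF for a predictor, given its R^2 on the other predictors."""
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance


def signals_multicollinearity(tolerance, vif):
    """Apply the cut-offs used in the text: tolerance < .1 and VIF > 10."""
    return tolerance < 0.1 and vif > 10.0


# Hypothetical predictor that is 95% explained by the other predictors.
tolerance, vif = tolerance_and_vif(0.95)
print(f"tolerance = {tolerance:.2f}, VIF = {vif:.1f}, flagged = "
      f"{signals_multicollinearity(tolerance, vif)}")
```

Since VIF is the reciprocal of tolerance, the two cut-offs flag exactly the same variables.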

5.2.3 Principal component analysis

To see if the 53 variables which were involved in the multicollinearity form coherent subsets,

a Principal component analysis (PCA) was performed.

The 58 items used in the PCA were related to usage and charges, both inside and

outside the telecom’s network as well as abroad (see Table II-14 in Appendix II). The values

for the KMO and Bartlett’s test with all items included were .796 and 779787.833

respectively and a significance of .001 (df = 1653) which confirmed the sampling adequacy

for the analysis. Only “Maximum refill amount” had an individual KMO value < .5 and was

removed. By doing so, the communality for “Average refill amount” became < .5 and was

also removed. After the removal of these two items, all other items in the analysis had an

individual KMO and communality > .5 which implied that they were all applicable for the

PCA. The number of eigenvalues > 1 suggested that 12 components would be an appropriate

solution and the scree plot showed a point of inflexion at 2, 4, 7, 8 and 17 implying a solution

with 1, 3, 6, 7 or 16 components (see Figure II-4 in Appendix II). The components were next

rotated with the Varimax rotation method. After removing the two items mentioned above, a

solution with 12 components was not appropriate anymore since one component only

consisted of very low loadings. A solution with 11 components was tried but still one

component had only low loadings so ten components were extracted. After omitting a total of

six items based on low KMO values or low loadings (see Table II-15 in Appendix II), a

solution of ten components was the result.


5.2.3.1 Internal consistency reliability analysis

Table 5-18 shows the results of the consistency reliability analysis for the ten components.

The Cronbach’s alpha was > .7 for all components except number eight, but would increase to

.731 by taking out “Maximum total in frequency”. Therefore all ten components were used in

the PCA analysis.

Table 5-18: Cronbach’s alpha for the components for the pre-paid training sample

Component N of items Cronbach’s alpha

1 8 .873

2 6 .867

3 6 .954

4 8 .872

5 6 .945

6 6 .942

7 4 .969

8 4 .653

9 2 .730

10 2 .887
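For readers who want to see how such a reliability coefficient is obtained, the sketch below implements the standard Cronbach's alpha formula in plain Python. The function name and the example scores are hypothetical and do not come from the telecom data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of scores per case."""
    k = len(items)
    # Total score per case = sum of that case's scores across all items.
    totals = [sum(case_scores) for case_scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1.0 - item_variance_sum / variance(totals))


# Hypothetical component with three items measured on five customers.
example_items = [
    [2.0, 4.0, 3.0, 5.0, 1.0],
    [3.0, 5.0, 3.0, 4.0, 2.0],
    [2.0, 5.0, 4.0, 5.0, 1.0],
]
print(f"alpha = {cronbach_alpha(example_items):.3f}")
```

Alpha rises toward 1 as the items in a component move together, which is why removing a poorly fitting item can raise it above the .7 threshold.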

5.2.3.2 Results from the PCA

After deleting one more variable based on Section 5.2.3.1, the PCA was rerun. Two more

variables were omitted based on low loadings, “Average total in freq” and “Average outnet

frequency”. By doing this, one component had no loadings so nine components were

extracted. The KMO and Bartlett’s test had values of .764 and 624287.937 (df = 1176 and p =

.001) respectively. There were 7% (90) of the residuals with absolute values > .05. All

communalities were > .5 and the total variance explained by the nine components was

86.382% (see Table II-16 in Appendix II) and as can be seen, nine components have

eigenvalues > 1 so this supports a solution with nine components. The resulting components

are shown in Table II-17 in Appendix II which is the rotated component matrix. Only

loadings above .3 are shown in the table to simplify it. The emerged component structure was

quite clear and easy to interpret. The first component contained nine items and accounted for

26.695% of the variance. It represents usage inside and outside the telecom’s network and

total in and out usage. The second component had a variance of 18.172% and had six items. It

represents usage and charges abroad. The third component had a variance of 11.726% and

consisted of six items which are ratios related to usage and charges outside the network. The

fourth component had a variance of 7.588% and also consisted of six items related to ratios

regarding usage and charges abroad. The fifth component had a variance of 6.490% and


consisted of eight items related to charges inside and outside the network along with

frequency of refills. The sixth component had a variance of 5.327% and consisted of six items

related to ratios with usage and charge inside the network. The seventh component had a

variance of 3.890% and consisted of four items related to sent and received text messages.

The eighth component had a variance of 3.434% and consists of two items, average and

maximum voice outin volume ratio. The ninth and last component had a variance of 3.061%

and had two items, average and maximum ratios for sent text messages related to text

messages received. The following description of the nine components is based on the items in

each of them:

Component 1: Usage inside and outside the telecom’s network

Component 2: Usage/charges abroad

Component 3: Usage/charge ratios outside telecom’s network

Component 4: Usage/charge ratios abroad

Component 5: Refills and charges inside and outside the telecom’s network

Component 6: Usage/charge ratios inside the telecom’s network

Component 7: Text messages sent and received

Component 8: Ratio of voice volume out/in

Component 9: Ratio of sent and received text messages

5.2.3.3 Parallel analysis

Parallel analysis (PA) was used again as a technique to generate random variables to

determine the number of retained components to compare with the PCA results. The results

from the PA are shown in Table 5-19. This table shows that for the first nine components, the

eigenvalues are larger from the principal component analysis (PCA eigenvalue) but at the

tenth component, the eigenvalues from the parallel analysis (PA eigenvalue) become larger.

These results suggest that a nine component solution would be appropriate supporting the

previous results from the PCA.

Table 5-19: Comparison of PCA and PA eigenvalues for the pre-paid training sample

Component PCA eigenvalue PA eigenvalue

1 13.080 1.191

2 8.904 1.171

3 5.746 1.158

4 3.718 1.146

5 3.180 1.137

6 2.610 1.128

7 1.906 1.120

8 1.683 1.116

9 1.500 1.104

10 .966 1.097

... ... ...*

49 .001 .850

*A number of components have been omitted from the table to save space
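The retention rule behind parallel analysis is a simple comparison: keep components for as long as the observed eigenvalue exceeds the eigenvalue obtained from random data. The sketch below applies this rule to the first ten rows of Table 5-19. The function name is hypothetical.

```python
def components_to_retain(pca_eigenvalues, pa_eigenvalues):
    """Count leading components whose PCA eigenvalue exceeds the PA eigenvalue."""
    retained = 0
    for observed, random_data in zip(pca_eigenvalues, pa_eigenvalues):
        if observed <= random_data:
            break  # stop at the first component that random data can match
        retained += 1
    return retained


# First ten components from Table 5-19.
pca = [13.080, 8.904, 5.746, 3.718, 3.180, 2.610, 1.906, 1.683, 1.500, 0.966]
pa = [1.191, 1.171, 1.158, 1.146, 1.137, 1.128, 1.120, 1.116, 1.104, 1.097]

n_components = components_to_retain(pca, pa)
print(n_components)
```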


5.2.4 Logistic Regression

The same procedure was followed with the logistic regression for the pre-paid sample as for

the post-paid sample. All the demographic variables and the nine components were used

along with “Average refill amount”, “Maximum outnet volume”, “Maximum outnet

frequency”, “Average total out frequency”, “Average total in frequency” and “Average outnet

frequency” since there was no problem with multicollinearity among them.

First, all 22 variables and components were entered into the logistic regression (see

Table II-18 in Appendix II). The results with all the variables are shown in Table II-19 in

Appendix II. Several of the variables were insignificant with a p > .05 and were thus removed

from the analysis, based on the highest p-value. The final model had 14 significant variables

at the .05 level (see Table 5-20). “Tenure”, “Maximum outnet volume” and “Average total in

frequency” had very low “B” values and therefore virtually no effect on churn, hence new

variables based on three years were created and used in the model. The groups in the

categorical variables that were significant are marked with a *.

Table 5-20: Results from the logistic regression in the pre-paid training sample

Variable B S.E. Exp(B) Lower Upper (Lower and Upper are the bounds of the 95% C.I. for Exp(B))

Constant 1.118 .118 3.060

Customer age -.028 .003 .972 .967 .977

Marital status: Unmarried (reference category)

Marital status: Married .089 .092 1.093 .912 1.310

Marital status: Widowed* -.534 .257 .586 .354 .971

Marital status: Separated .284 .292 1.328 .750 2.351

Marital status: Divorced .053 .140 1.054 .802 1.386

Marital status: Other* .800 .359 2.227 1.102 4.499

Marital status: Unknown -.068 .240 .934 .584 1.495

Land area: Capital area (reference category)

Land area: Western Iceland -.161 .117 .851 .676 1.071

Land area: Northern Iceland -.177 .092 .838 .700 1.002

Land area: Eastern Iceland* -.320 .159 .726 .532 .991

Land area: Southern Iceland .039 .092 1.039 .868 1.244

Land area: Unknown* -.830 .174 .436 .310 .613

Tenure 3 years -.164 .028 .849 .804 .896

Maximum outnet volume 3 years -.031 .014 .970 .944 .996

Maximum outnet frequency .042 .004 1.043 1.035 1.051

Average total in frequency 3 years 7.830 1.167 2514.817 255.582 24744.720

Average outnet frequency -.046 .004 .995 .947 .963

C1: Usage in-/outside network -.193 .059 .824 .735 .925

C3: Usage/charge ratios outside network .528 .044 1.696 1.556 1.849

C4: Usage/charge ratios abroad .090 .033 1.094 1.025 1.168

C5: Refills and charges in-/outside network .100 .051 1.105 1.001 1.220

C6: Usage/charge ratios inside network .112 .035 1.119 1.044 1.199

C7: Text messages sent/received .113 .042 1.119 1.031 1.215

Note: R² = .300 (McFadden’s ρ²), .340 (Cox & Snell), .453 (Nagelkerke). Model χ² = 2490.976, p < .00.

The groups marked with * are statistically significant at the .05 level.

The Cox & Snell R² was .340, the Nagelkerke R² was .453 and McFadden’s ρ² was .300. The baseline model, which includes only the constant, had a -2 log-likelihood of 8307.031. The fitted model was significant, with a chi-square of 2490.976 (df = 23, p = .001), and the -2 log-likelihood decreased to 5816.054, which shows that including the variables improves the model. The Hosmer and Lemeshow test was not significant (p = .435), indicating a good model fit.
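The three pseudo-R² values can be reproduced from the two -2 log-likelihood values reported above. The following is a minimal sketch rather than SPSS output. It assumes the standard definitions of the three measures and a training sample of n = 5995 cases, which is taken from the classification table (3073 non-churners plus 2922 churners); the function name is illustrative only.

```python
import math

def pseudo_r2(neg2ll_null, neg2ll_model, n):
    """Return McFadden, Cox & Snell and Nagelkerke pseudo-R2 values.

    neg2ll_null  : -2 log-likelihood of the constant-only model
    neg2ll_model : -2 log-likelihood of the fitted model
    n            : number of cases (assumed, from the classification table)
    """
    mcfadden = 1 - neg2ll_model / neg2ll_null
    cox_snell = 1 - math.exp(-(neg2ll_null - neg2ll_model) / n)
    # Nagelkerke rescales Cox & Snell by its maximum attainable value.
    max_cox_snell = 1 - math.exp(-neg2ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return mcfadden, cox_snell, nagelkerke

mcfadden, cox_snell, nagelkerke = pseudo_r2(8307.031, 5816.054, 5995)
print(f"McFadden {mcfadden:.3f}, Cox & Snell {cox_snell:.3f}, "
      f"Nagelkerke {nagelkerke:.3f}")
```

The difference between the two -2 log-likelihood values is also the model chi-square, which is why all three measures follow from the same two numbers.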

5.1.4.1 Results of logistic regression

The results from the logistic regression show that, for the first variable “Customer age”, each additional year of age multiplies the odds of churning by .97, with the other factors held constant. Customer age had the smallest effect on churn. In other words, the older customers get, the less likely they are to churn. For the variable “Marital status”, groups 3 and 6 were significant. The odds of churning for widowed customers are .59 times those of customers living alone, whereas the odds for customers with the marital status “Other” are 2.23 times those of customers living alone. For “Land area”, groups 4 and 6 were significant: for customers who live in the Eastern part of Iceland or have an unknown address, the odds of churning are .73 and .44 times (respectively) those of customers who live in the base area, which is the capital area. The variable “Tenure” was treated in the same way here as in the post-paid sample: a new variable based on 3 years was computed to give a better indication of the effect of this variable on churn. The odds of churning are multiplied by .85 as tenure increases by 3 years. This shows that the longer a customer stays, the less likely he or she is to churn.

The next variable in the table is “Maximum outnet volume”, based on three years. As this variable increases by one unit, the odds of churning are multiplied by .970. The next variable is “Maximum outnet frequency”: for every one-unit increase in this variable, the odds of churning are multiplied by 1.043. As “Average total in frequency”, also based on three years because of its low “B” value, increases by one unit, the odds of churning are multiplied by 2514.82. This variable had by far the largest effect on churn, with a “B” value of 7.830. As “Average outnet frequency” increases by one unit, the odds of churning are multiplied by .955. The first of the components that was significant was Component 1: Usage in-/outside the telecom’s network. This is the only component that had a negative effect on churn: when it increases by one unit, the odds of churning are multiplied by .824. For the remaining components, a one-unit increase multiplies the odds of churning by 1.696 for Component 3: Usage/charge ratios outside the network, by 1.094 for Component 4: Usage/charge ratios abroad, by 1.105 for Component 5: Refills and charges in-/outside the network, by 1.119 for Component 6: Usage/charge ratios inside the network and by 1.119 for Component 7: Text messages sent/received.
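The Exp(B) values and their confidence intervals follow directly from the “B” and “S.E.” columns of Table 5-20. The sketch below assumes the usual Wald interval, exp(B ± 1.96 × S.E.), and uses the rounded values printed in the table, so the last digit can differ slightly from the SPSS output. Exp(B) is an odds ratio: it multiplies the odds of churning, not the probability.

```python
import math

def odds_ratio_with_ci(b, se, z=1.96):
    """Return Exp(B) and its Wald confidence interval (95% for z = 1.96)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# B and S.E. as printed (rounded) in Table 5-20
rows = [
    ("Marital status: Widowed", -0.534, 0.257),
    ("Land area: Eastern Iceland", -0.320, 0.159),
    ("C3: Usage/charge ratios outside network", 0.528, 0.044),
]
for name, b, se in rows:
    odds_ratio, lower, upper = odds_ratio_with_ci(b, se)
    print(f"{name}: Exp(B) = {odds_ratio:.3f}, "
          f"95% CI ({lower:.3f}, {upper:.3f})")
```

An interval that excludes 1, as for all three rows above, corresponds to a variable or group that is significant at the .05 level.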

The proportion of cases that were correctly classified is shown in the classification table in Table 5-21. According to this table, with a cutoff point of .5, the model correctly classified 2223 of the 3073 customers who did not churn as not likely to churn. The model did better at predicting those who would churn: of the 2922 customers who actually churned, 2328 were correctly classified as likely to churn. This is reflected in the sensitivity (79.7%), meaning that the model predicted correctly for almost 80% of those who churned. The specificity was 72.3%, so the model did not predict as well for those who did not churn. The false positive rate, computed here as the share of predicted churners who did not churn, was 26.8%, and the false negative rate, the share of predicted non-churners who did churn, was 21.1%. The overall rate of successful classification was 75.9%, which is a clear improvement on the 51.3% correct classification of the model that includes only the constant. This model predicted better overall than the model for the post-paid sample, which had an overall correct classification rate of 67.1%.

Table 5-21: Classification table for the logistic regression for the pre-paid training sampleᵃ

                                      Predicted
                                      Status after 5 months     Percentage
Observed                              Censoring       Churn     correct
Status after 5 months   Censoring          2223         850       72.3%
                        Churn               594        2328       79.7%
Overall percentage                        78.9%       73.3%       75.9%

a. The cut value is .500
Note: Sensitivity = 2328/(2328+594) = 79.7%. Specificity = 2223/(2223+850) = 72.3%. False positive = 850/(850+2328) = 26.8%. False negative = 594/(594+2223) = 21.1%. Positive prediction = 2328/(2328+850) = 73.3%. Negative prediction = 2223/(2223+594) = 78.9%. Overall accuracy = (2223+2328)/(2223+594+850+2328) = 75.9%.
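The measures in the note can be computed from the four cell counts of a classification table. The following sketch is illustrative only (the function and key names are assumptions) and treats churn as the positive class.

```python
def classification_metrics(tn, fp, fn, tp):
    """Summary measures for a 2x2 classification table (churn = positive)."""
    total = tn + fp + fn + tp
    return {
        "sensitivity": tp / (tp + fn),          # churners predicted as churn
        "specificity": tn / (tn + fp),          # non-churners predicted as non-churn
        "positive_prediction": tp / (tp + fp),  # predicted churners who churned
        "negative_prediction": tn / (tn + fn),  # predicted non-churners who stayed
        "accuracy": (tn + tp) / total,
    }

# Cell counts from the logistic regression on the pre-paid training sample
metrics = classification_metrics(tn=2223, fp=850, fn=594, tp=2328)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The same function can be applied to the counts of any of the other classification tables in this chapter.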

The ROC curve

The ROC curve is shown in Figure 5-3. The area under the curve (AUC) for the fitted model applied to the training data set was .838, with a 95% confidence interval of (.828, .847) and p = .001. Logistic regression seems to predict better for the pre-paid sample than for the post-paid sample, as the AUC is higher.

Figure 5-3: ROC curve for the logistic regression for the pre-paid training sample (curves shown: predicted probability and reference line)
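The AUC can be read as the probability that a randomly chosen churner receives a higher predicted probability than a randomly chosen non-churner. The sketch below computes it by comparing all such pairs. The scores are hypothetical and are not taken from the thesis data; the SPSS procedure used for the reported value may be implemented differently.

```python
def auc(scores_churn, scores_non_churn):
    """Area under the ROC curve via pairwise comparison of scores.

    Each (churner, non-churner) pair counts 1 if the churner has the
    higher score and 0.5 if the two scores are tied.
    """
    wins = 0.0
    for s_pos in scores_churn:
        for s_neg in scores_non_churn:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(scores_churn) * len(scores_non_churn))

# Hypothetical predicted probabilities
churn_scores = [0.9, 0.8, 0.4]
non_churn_scores = [0.7, 0.3, 0.2]
print(round(auc(churn_scores, non_churn_scores), 3))
```

A value of .5 corresponds to the reference line in the figure, and a value of 1 to perfect separation of churners and non-churners.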

5.1.4.2 Linearity of the logit

To test the linearity of the logit, the interaction between each continuous variable and the log of itself was included in the logistic regression along with all the independent variables. All the interaction terms were non-significant, since their significance values were greater than .05. This implies that the assumption of linearity of the logit has been met for the continuous variables in the data.
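The interaction terms used in this test can be constructed as follows. This is a minimal sketch that assumes the natural logarithm and strictly positive values; the variable names are hypothetical, and fitting the logistic regression with the added terms is not shown.

```python
import math

def log_interaction_terms(rows, continuous_vars):
    """Add an x * ln(x) column for each continuous predictor.

    rows is a list of dicts, one per customer. Values must be positive,
    because ln(x) is undefined otherwise.
    """
    augmented = []
    for row in rows:
        new_row = dict(row)
        for var in continuous_vars:
            x = row[var]
            if x <= 0:
                raise ValueError(f"{var} must be positive to take its log")
            new_row[f"{var}_x_ln"] = x * math.log(x)
        augmented.append(new_row)
    return augmented

# Hypothetical customers; the variable names are illustrative only
customers = [{"age": 40.0, "tenure_days": 1327.0},
             {"age": 25.0, "tenure_days": 31.0}]
for row in log_interaction_terms(customers, ["age", "tenure_days"]):
    print(row)
```

If an added term is significant when the model is refitted, the corresponding variable violates the linearity assumption.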

5.1.4.3 Validation

The next step was to validate the logistic model by applying it to the testing sample. Like the testing sample for the post-paid customers, this sample was highly skewed with regard to customer status: of a total of 15906 customers, 1016 (6.4%) had churned. Of the cases, 8600 (54.1%) were men; 10684 (67.2%) were single, 3547 (22.3%) were in a family of two and 1675 (10.5%) were in a family of three or more people. Just under half of the cases (47.4%) lived in the capital area, 16.6% in the Northern part of Iceland, 14.6% in the Southern part, 8.5% in the Western part and 6.0% in the Eastern part, while 6.9% had an unknown residence. The largest group of customers in this sample was unmarried (48.8%); 35.0% were married, 9.6% were divorced or separated, 3.4% were widowed, 1.2% had another marital status and 2.1% had an unknown marital status. The mean age was 40.5 years (SD = 19.03); the lowest age was 2 years and the highest was 98. The mean tenure was 1327.4 days (SD = 1185.2), which is about 3.6 years. The shortest tenure was 31 days and the longest was 4541 days, or 12.4 years. The average refill frequency over the three-month observation period was 1.7 times (SD = 2.59). Some customers never refilled during the observation period, while the highest refill frequency was 31.33 times. The average refill amount was ISK 1805.62 (SD = 23010.1). The lowest amount was ISK 0.00, since some customers never refilled, and the highest amount was ISK 2508149.8.

To test how well the logistic model created from the training sample predicts new cases, SPSS created scores in the testing dataset based on the logistic regression model. Cases with a score (predicted probability) > .5 were labeled as churn, the others as non-churn. The results of the overall classification are shown in Table 5-22.
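The scoring step can be illustrated as follows. In the analysis the scores were produced by SPSS; this sketch only shows the logistic function and the .5 cutoff, and the logit values are hypothetical.

```python
import math

def predicted_probability(logit):
    """Logistic function: convert a linear predictor (logit) to a probability."""
    return 1.0 / (1.0 + math.exp(-logit))

def label(probability, cutoff=0.5):
    """Label a case as churn when its score exceeds the cutoff."""
    return "churn" if probability > cutoff else "non-churn"

# Hypothetical logits (the constant plus B times each predictor value)
for logit in (-1.2, 0.0, 0.8):
    p = predicted_probability(logit)
    print(f"logit {logit:+.1f} -> p = {p:.3f} -> {label(p)}")
```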

Table 5-22: Classification table for the logistic regression for the pre-paid testing sampleᵃ

                                      Predicted
                                      Status after 5 months     Percentage
Observed                              Censoring       Churn     correct
Status after 5 months   Censoring          7689        7201       51.6%
                        Churn               179         837       82.3%
Overall percentage                        97.7%       10.4%       53.6%

a. The cut value is .500
Note: Sensitivity = 837/(837+179) = 82.3%. Specificity = 7689/(7689+7201) = 51.6%. False positive = 7201/(7201+837) = 89.6%. False negative = 179/(179+7689) = 2.3%. Positive prediction = 837/(837+7201) = 10.4%. Negative prediction = 7689/(7689+179) = 97.7%. Overall accuracy = (7689+837)/(7689+179+7201+837) = 53.6%.

The sensitivity was 82.3%, which is higher than the 79.7% for the training sample. The specificity was 51.6%, which is considerably lower than the 72.3% for the training sample, so the model predicts correctly for just over 51% of the non-churn cases. The false positive rate was very high, at 89.6%, which reflects the fact that the model predicts as churn a large number of cases that are in fact non-churn. The false negative rate, by contrast, was very low, at 2.3%. The overall accuracy of the model is 53.6%, which is much lower than the 75.9% for the training sample and not much better than a model based only on chance. In conclusion, the model does well at predicting churn, which is its purpose in this analysis, but works rather poorly at predicting non-churn and has a high false positive rate.
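The share of predicted churners who actually churn depends not only on sensitivity and specificity but also on how common churn is in the sample. The sketch below applies Bayes' rule to the testing-sample sensitivity and specificity at two churn rates: the 6.4% of the testing sample and the 48.7% of the training sample. It is an illustration of the arithmetic and not part of the original analysis.

```python
def positive_prediction(sensitivity, specificity, churn_rate):
    """Share of predicted churners who actually churn (Bayes' rule)."""
    true_pos = sensitivity * churn_rate
    false_pos = (1 - specificity) * (1 - churn_rate)
    return true_pos / (true_pos + false_pos)

# Sensitivity and specificity of the logistic model on the testing sample
sens = 837 / (837 + 179)
spec = 7689 / (7689 + 7201)

testing_rate = 1016 / 15906   # share of churners in the testing sample
training_rate = 2922 / 5995   # share of churners in the training sample

print(f"testing churn rate:  {positive_prediction(sens, spec, testing_rate):.1%}")
print(f"training churn rate: {positive_prediction(sens, spec, training_rate):.1%}")
```

With the same sensitivity and specificity, the positive prediction is much lower in the skewed testing sample, so the skew accounts for a large part of the high share of false positives among predicted churners.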

5.1.5 Decision Tree

In this section, a tree-based classification model is produced in the same way as the decision tree model for the post-paid sample. The results are presented in the following section.

5.1.5.1 Results of Decision Tree

The same 22 variables were used to create the decision tree as in the logistic regression analysis. The CHAID method produced a tree with 52 nodes, 34 terminal nodes and a depth of 3; ten variables were included in the model. The Exhaustive CHAID method produced a tree with 51 nodes, 32 terminal nodes and, like CHAID, a depth of 3; this method used 11 variables in the model. The CRT method produced a tree with far fewer nodes: 25 nodes, 13 terminal nodes and a depth of 5 (7 nodes, 4 terminal nodes and a depth of 3 with pruning). It used all variables for the model except “Gender”. The last method used in this analysis, QUEST, yielded a tree with 29 nodes, 15 terminal nodes and a depth of 5 (15 nodes, 8 terminal nodes and a depth of 4 with pruning). It used 18 variables in the model. All methods had “Average outnet frequency” as the primary predictor, except QUEST, which had “Customer age” as the primary predictor.

Exhaustive CHAID had the highest overall classification percentage, 75.4%, which is considerably higher than the highest overall percentage for the post-paid sample, 67.6%. CHAID had a value of 75.3%, CRT a slightly lower value of 73.9% and QUEST the lowest percentage, 70.6%. Table 5-23 shows the risk estimates of the four methods. Exhaustive CHAID had the lowest estimate, .244, indicating that the model misclassified 24.4% of the cases.

Table 5-23: Risk estimates of different growing methods for the pre-paid training sample

Method      Estimate    Standard error
CHAID         .247          .006
ExCHAID       .244          .006
CRT           .256          .006
QUEST         .289          .006
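The standard errors in the table are consistent with the binomial standard error of a proportion. The sketch below assumes that formula and a training sample of n = 5995 cases (3073 non-churners plus 2922 churners); it is not taken from the SPSS documentation.

```python
import math

def risk_standard_error(risk, n):
    """Binomial standard error of a misclassification rate."""
    return math.sqrt(risk * (1 - risk) / n)

n_training = 5995  # assumed: 3073 non-churners + 2922 churners
estimates = [("CHAID", 0.247), ("ExCHAID", 0.244),
             ("CRT", 0.256), ("QUEST", 0.289)]
for method, risk in estimates:
    se = risk_standard_error(risk, n_training)
    print(f"{method}: risk = {risk:.3f}, standard error = {se:.3f}")
```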

Based on these results and the fact that Exhaustive CHAID correctly predicted 83.6% of the churn cases compared with 82.5% for CHAID, Exhaustive CHAID was chosen as the best method in this instance. The tree diagram with Exhaustive CHAID is very large and complicated to read, so the tree diagram shown in Figure II-5 in Appendix II is a pruned tree grown with the CRT method. The classification is shown in Table 5-24. The sensitivity was, as already stated, 83.6% and the specificity was 67.7%. False positives were 28.1% and false negatives were 19.7%.

Table 5-24: Classification table for the unpruned decision tree for pre-paid training sampleᵃ

                                      Predicted
                                      Status after 5 months     Percentage
Observed                              Censoring       Churn     correct
Status after 5 months   Censoring          2134         939       67.7%
                        Churn               522        2400       83.6%
Overall percentage                        80.3%       71.9%       75.6%

a. Growing method: Exhaustive CHAID
Note: Sensitivity = 2400/(2400+522) = 82.1%. Specificity = 2134/(2134+939) = 69.4%. False positive = 939/(939+2400) = 28.1%. False negative = 522/(522+2134) = 19.7%. Positive prediction = 2400/(2400+939) = 71.9%. Negative prediction = 2134/(2134+522) = 80.3%. Overall accuracy = (2134+2400)/(2134+522+939+2400) = 75.6%.

The tree table is shown in Table II-20 in Appendix II. As can be seen, this table is considerably larger than the one for the post-paid training sample and, as in that case, the tree could be pruned with the CRT growing method. Table II-21 in Appendix II shows the pruned tree table and Table 5-25 shows the overall classification with the pruned tree model. The overall correct classification is now 73.9%, the sensitivity is 79.2% and the specificity is 68.8%.

Table 5-25: Classification table for the pruned decision tree for pre-paid training sampleᵃ

                                      Predicted
                                      Status after 5 months     Percentage
Observed                              Censoring       Churn     correct
Status after 5 months   Censoring          2115         958       68.8%
                        Churn               607        2315       79.2%
Overall percentage                        77.7%       70.7%       73.9%

a. Growing method: CRT
Note: Sensitivity = 2315/(2315+607) = 79.2%. Specificity = 2115/(2115+958) = 68.8%. False positive = 958/(958+2115) = 31.2%. False negative = 607/(607+2315) = 20.8%. Positive prediction = 2315/(2315+958) = 70.7%. Negative prediction = 2115/(2115+607) = 77.7%. Overall accuracy = (2115+2315)/(2115+607+958+2315) = 73.9%.

“Average outnet frequency” was the best predictor of customer status, as it is in node 1, the first split of the tree. The tree diagram with the Exhaustive CHAID method was very large and difficult to follow, so a pruned tree with the CRT growing method is shown in Figure II-5 in Appendix II. This diagram is much clearer and easier to follow. Of the customers who made more than 182.75 calls outside the telecom’s network, only 1.9% churned. This corresponds with the results from the logistic regression. Slightly more than half of those who made fewer than 182.75 outside calls churned. Within this latter category, 26.7% of those with an “Average total in frequency” lower than 8.42 churned, compared with 66.6% of those with an “Average total in frequency” higher than 8.42, which is the same effect as in the logistic regression. The last predictor in the tree was “Customer age”: 70.7% of those who are 55.5 years or younger churned, compared with 37.1% of those who are older than 55.5 years. This also corresponds with the results from the logistic regression.
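The splits described above can be written as a small set of rules. The following is a sketch only, based on the description in the text and not on the SPSS tree itself. It assumes that the age split sits under the branch with the higher “Average total in frequency”, which is consistent with a pruned tree of 7 nodes and 4 terminal nodes, and the handling of values exactly on a split point is also an assumption.

```python
def pruned_tree_churn_rate(avg_outnet_freq, avg_total_in_freq, age):
    """Return the churn share of the terminal node a customer falls into."""
    if avg_outnet_freq > 182.75:
        return 0.019
    if avg_total_in_freq < 8.42:
        return 0.267
    # Assumed placement: the age split follows the higher
    # "Average total in frequency" branch.
    if age <= 55.5:
        return 0.707
    return 0.371

def predict(avg_outnet_freq, avg_total_in_freq, age, cutoff=0.5):
    """Label a customer as churn when the node's churn share exceeds the cutoff."""
    rate = pruned_tree_churn_rate(avg_outnet_freq, avg_total_in_freq, age)
    return "churn" if rate > cutoff else "non-churn"

# Hypothetical customer: few outside calls, many incoming calls, 30 years old
print(predict(avg_outnet_freq=100.0, avg_total_in_freq=20.0, age=30.0))
```

Under these assumptions only one terminal node has a churn share above .5, so that node alone produces churn predictions.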

The gains chart shown in Figure II-6 in Appendix II illustrates that the model is a

moderately good one.

The ROC curve

The ROC curve for the decision tree is shown in Figure 5-4. The AUC for the decision tree was .834, with a 95% confidence interval of (.824, .844) and p = .001.

Figure 5-4: ROC curve for the decision tree for the pre-paid training sample (curves shown: predicted probability and reference line)

5.1.5.2 Validation

To validate the decision tree model that resulted from the training data set, the pruned tree (Figure II-5 in Appendix II) was used to make predictions for the cases in the testing sample, for which SPSS produced scores based on the tree model. The results are shown in the classification table in Table 5-26. As the table shows, the sensitivity of the decision tree, 77.5%, was lower than that of the logistic regression. The specificity, however, was higher than with the logistic regression, at 55.1%. The false positive rate was 44.9% and the false negative rate was 22.5%. The overall accuracy was only 56.5%, which is lower than the 73.9% for the training sample and not much better than a model based only on chance, which has an overall accuracy of 50%. The decision tree does slightly better overall than the logistic regression, which had an overall accuracy of 53.6%.

Table 5-26: Classification table for the decision tree in the pre-paid testing sampleᵃ

                                      Predicted
                                      Status after 5 months     Percentage
Observed                              Censoring       Churn     correct
Status after 5 months   Censoring          8197        6693       55.1%
                        Churn               229         787       77.5%
Overall percentage                        97.3%       10.5%       56.5%

a. The cut value is .500
Note: Sensitivity = 787/(787+229) = 77.5%. Specificity = 8197/(8197+6693) = 55.1%. False positive = 6693/(6693+8197) = 44.9%. False negative = 229/(229+787) = 22.5%. Positive prediction = 787/(787+6693) = 10.5%. Negative prediction = 8197/(8197+229) = 97.3%. Overall accuracy = (8197+787)/(8197+229+6693+787) = 56.5%.

Based on the results from both classification methods on the testing sample, logistic regression is preferable. It has a slightly lower specificity than the decision tree (51.6% versus 55.1%), but it has a higher sensitivity (82.3% versus 77.5%) and is therefore better at predicting churn.

5.3 Hypotheses

Regarding the hypotheses stated in Section 2.2.3.1, the results of the logistic regression and

the decision tree show that for:

H1: Customer tenure

(a) Post-paid customers: Tenure was significant in both the logistic regression model and the decision tree for the post-paid training sample, so it has an effect on churn probability. It had a negative “B” value in the logistic regression model, and in the decision tree almost 70% of those with lower tenure had churned. Therefore, the longer a customer stays with the telecom, the less likely they are to churn. Hence this hypothesis is supported.

(b) Pre-paid customers: Tenure was also significant in the pre-paid sample in both the logistic regression and the unpruned decision tree. It had a negative “B” value in the logistic regression, so it has a negative effect on churn. This is also the case in the decision tree, as those who have been with the telecom for a shorter time are more likely to churn. This part of the hypothesis is supported.

H2: Level of usage

(a) Post-paid customers: The usage factors had some effect on customer churn in the post-paid sample, as the variables “Average abroad total charge ratio” and “Maximum text messages outnet total charge ratio”, along with components 1, 2 and 3, which are composed of usage and charges inside and outside the telecom’s network and abroad, were significant. These variables had a positive “B” value, so as their values increase, customers are more likely to churn. For “Total charge groups” used in the logistic regression, heavy users were more likely to churn. “Average text messages in” was significant in the decision tree, and more messages received was related to churn. These results support the hypothesis. However, component 4 (related to usage ratios abroad) had a negative “B” value in the logistic regression, meaning that as it increases, the customer is less likely to churn. Consequently, this hypothesis is only partly supported.

(b) Pre-paid customers: The same holds for the influence of usage and charges on customer churn in the pre-paid sample. Four independent variables were significant, along with six components that contain items of both usage and charges. However, three of these variables have a negative “B” value and thus a negative effect on churn: as they increase by one unit, the likelihood of churn decreases. For the remaining variables, the “B” value was positive, meaning that as they increase by one unit, the likelihood of customer churn increases. For “Average outnet frequency” in the decision tree, the more customers call outside the network, the less likely they are to churn, which is the opposite of what was hypothesized. However, as the “Average total in frequency” increases, customers are more likely to churn. As a result, this hypothesis is partly supported.

H3: Customer age

(a) Post-paid customers: Customer age was significant in both classification models and the “B” value in the logistic regression was negative, meaning that as customers get older, they are less likely to churn. The same holds for the decision tree, and thus this hypothesis is supported.

(b) Pre-paid customers: Customer age was also significant in the pre-paid sample in both the logistic regression and the decision tree. It had a negative “B” value in the logistic regression, and in the decision tree those in the lower age category are more likely to churn, so this hypothesis is also supported.

5.4 CLV calculations

In this section, model equation (2-1) was used to calculate the customer lifetime value for each customer. The margin in the formula is the same as ARPU (see Section 2.2.1.1) and the discount rate was discussed in Section 2.2.2.2 (see equation (2-2)). The third and last element of the CLV model is the retention rate, that is, 1 minus the probability of churn. The predicted probabilities of churn for post-paid and pre-paid customers from the logistic regressions in the churn analysis in Sections 5.1 and 5.2 were used.
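How the three elements combine can be illustrated with a short sketch. Equation (2-1) is not reproduced in this section, so the code assumes a common finite-horizon form in which the margin of each period is weighted by the probability that the customer is still retained and is then discounted; the exact form of equation (2-1) may differ. All names and numbers are hypothetical.

```python
def customer_lifetime_value(margin, churn_probability, discount_rate, periods):
    """Discounted expected margin over a fixed number of periods.

    Assumed form (may differ from equation (2-1)):
        CLV = sum over t = 1..periods of margin * r**t / (1 + d)**t
    where r = 1 - churn probability is the retention rate.
    """
    retention = 1.0 - churn_probability
    return sum(
        margin * retention ** t / (1.0 + discount_rate) ** t
        for t in range(1, periods + 1)
    )

# Hypothetical customer: margin (ARPU) of ISK 3000 per period
for churn in (0.05, 0.20, 0.50):
    clv = customer_lifetime_value(3000.0, churn, 0.01, 36)
    print(f"churn probability {churn:.2f}: CLV = ISK {clv:.0f}")
```

In this form CLV falls as the churn probability rises, and a customer with a margin of zero has a CLV of zero.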

5.4.1 Segmentation

The customers in both the post-paid and pre-paid samples were segmented by ranking the cases into ten equally sized groups (deciles) based on their CLV values. The purpose of doing this is to let Telecom X see what distinguishes the customers in the top decile (the top 10%) from those in the other deciles, since these are the customers who are most valuable to the company. When these valuable customers also have a high probability of churn, the company could act by offering them something that might reduce that probability. Telecom X can also look at the other segments and see, for example, which customers have a low CLV and a high probability of churn, and then know that it would not be worth spending marketing resources on trying to retain them.
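The ranking into deciles can be sketched as follows. The ranking in this analysis was done in SPSS; the code below is only an illustration with hypothetical CLV values. It splits tied values arbitrarily, whereas a ranking procedure that keeps tied values together can yield fewer than ten groups.

```python
def assign_deciles(clv_values, groups=10):
    """Rank customers by CLV and return a segment number per customer.

    Segment 1 holds the lowest CLV values and segment `groups` the highest.
    """
    order = sorted(range(len(clv_values)), key=lambda i: clv_values[i])
    segments = [0] * len(clv_values)
    for rank, index in enumerate(order):
        segments[index] = rank * groups // len(clv_values) + 1
    return segments

# Hypothetical CLV values for 20 customers, listed from highest to lowest
clv = [float(v) for v in range(20, 0, -1)]
print(assign_deciles(clv))
```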

Post-paid customers

The means of the CLV in the ten segments, obtained from a one-way ANOVA, are shown in Table II-22 in Appendix II. Segment 10 contains the 10% most valuable customers and segment 1 the 10% least valuable customers. The ANOVA test (F = 5503.97, p < .001) implies that the mean CLV differs between the segments (see Table II-23 in Appendix II). The difference is statistically significant between all the segments except segments 1 and 2.

To find out whether the ten segments differ in their demographic characteristics, cross-tabulations and one-way ANOVA were conducted. These revealed statistically significant differences between the ten segments on all the demographic variables, although the differences between segments were not large. The customers in the highest segments also have the highest average tenure. Overall, this segmentation reveals a clear difference between segments 1 and 10, but the differences among the segments in between these two are not as clear.
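The F statistic of a one-way ANOVA compares the variation between segment means with the variation within segments. The sketch below uses hypothetical values for three small groups and does not reproduce the reported F value, which was computed in SPSS on the full sample.

```python
def one_way_anova_f(groups):
    """Return the F statistic and its degrees of freedom for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the values around their own group mean
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical CLV values for three segments
segments = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_anova_f(segments))
```

Because the segments are formed by ranking on CLV itself, a very large F value for CLV across segments is to be expected.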

Pre-paid customers

The same procedure was used for the combined sample of pre-paid customers. However, only nine ranks or segments were created, as 22.9% of the sample had a CLV of 0.00.

Cross-tabulations and one-way ANOVA were again conducted to see whether the segments differ on any of the demographic variables. All the tests were statistically significant, so there is a difference between the segments, as with the post-paid sample. However, the difference is not always large. Customers in the lower segments are younger than those in the higher segments. Customers in the higher segments also make more frequent refills, and for higher amounts. Finally, the customers in the higher segments have been with the telecom longer than those in the lower segments. As with the combined post-paid sample, the differences among the segments between segments 2 and 10 are not very clear.

5.5 Summary

In this chapter, the results of the thesis have been presented and discussed in detail.

Logistic regression and decision trees are two classification methods that have been used frequently in research on churn. They are generally considered reliable and give good predictions. Here, both have been used and their results compared, for post-paid customers and pre-paid customers.

Several factors influence customer churn, both demographic factors and factors related to customers’ usage of the mobile phone and the charges for that usage. Some of these effects are positive and some are negative. As one might expect, as usage increases, the charges increase, and that increases the likelihood of churn. Age and tenure, however, have a negative effect, so as they increase, a customer is less likely to churn. One can conclude that older customers are more loyal to their telecom, but it could also be that younger customers are more prone to change and more likely to follow novelties and better deals. Tenure is also associated with loyalty: those who have been with the telecom for a long time are less likely to churn. This suggests that they like what they are getting from the telecom, but it could also mean that if these customers buy other products and services from Telecom X, it is more difficult for them to churn. Of the total of 33116 customers in the combined post-paid sample, 3017 (9.1%) had churned during the five-month performance period, while among the 21901 customers in the combined pre-paid sample, 3938 (18.0%) had churned. This is a large difference, which could be due to the fact that it is easier to cancel a pre-paid subscription than a post-paid subscription.

For the post-paid training sample, both classification methods had almost the same overall classification result: 67.1% for the logistic regression and 67.0% for the decision tree (pruned tree). The decision tree did better at predicting churn, as it predicted correctly in 74.2% of the cases, compared with 71.4% for the logistic regression in the training sample. The classifiers did better, however, with the pre-paid training sample. The logistic regression classified 75.9% of the cases correctly overall and the decision tree 75.4%, so there was not much difference between the two methods. The decision tree again did better at predicting churn, with a correct prediction in 82.5% of the cases compared with 79.7% for the logistic regression.

Customer lifetime value was calculated for the post-paid and pre-paid samples, which showed that the calculation can be quite straightforward; its most complicated element is predicting churn. CLV differed significantly across some of the independent variables, and it can be helpful to see where the differences lie. The samples were then segmented based on the CLVs, and the top 10% and bottom 10% deciles were described to show the difference between them and to show that the segmentation can be used to identify valuable customers in danger of churning.

6. Conclusion and recommendations

The results of the churn analysis are in accordance with the literature on churn, as the hypotheses were for the most part supported. For the post-paid customers, “Customer age”, “Tenure” and “Land area” were the predictors chosen by both classification methods, which indicates the importance of these predictors. If the information from the unpruned decision tree is taken into account, components 2 and 3 are also chosen by both methods, revealing that usage and charge ratios both inside and outside the telecom’s network have an effect on churn probability. For the pre-paid customers, “Customer age” along with “Average outnet frequency” and “Average total in frequency” were chosen by both classification methods. If the unpruned decision tree is used, however, “Tenure”, “Family size”, “Marital status”, “Average refill amount”, “Maximum outnet frequency” and components 5 and 7 were also chosen by both methods.

The overall accuracy on the testing samples was not very high for either method, ranging between 53.6% and 60.4%. The decision tree had the highest overall accuracy, 60.4%, for the post-paid sample. It is not easy to decide which method is better at predicting churn, as both methods have their own advantages, such as the effect sizes provided by the logistic regression and the tree diagram provided by the decision tree.

6.1 Recommendations

The most valuable customers at Telecom X can be identified by their high CLV. By combining CLV with the probability of churn, customers with the highest CLV and a high probability of churn can be identified, and they should be targeted with tailor-made solutions in order to retain them. It is easy to see what characterizes these customers, such as place of residence, tenure, gender and usage of the mobile phone service, which helps in providing a solution that better suits their individual needs. Customers with a low or medium CLV and a low probability of churn could also be targeted in order to increase their CLV while keeping their churn probability low. Customers with a low CLV and a high probability of churn should be disregarded, as marketing resources are limited and it is not feasible to try to retain every customer.
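These recommendations amount to a simple decision rule on two dimensions. The sketch below is one possible way of writing it down; the thresholds that separate high, medium and low CLV and high and low churn probability are hypothetical and would have to be chosen by Telecom X.

```python
def retention_action(clv, churn_probability, high_clv, low_clv, high_churn=0.5):
    """Map a customer to the action suggested by CLV and churn probability.

    high_clv, low_clv and high_churn are hypothetical thresholds.
    """
    likely_to_churn = churn_probability >= high_churn
    if clv >= high_clv and likely_to_churn:
        return "target with a tailor-made retention offer"
    if clv < high_clv and not likely_to_churn:
        return "consider targeting to increase CLV"
    if clv <= low_clv and likely_to_churn:
        return "disregard"
    # Combinations not covered by the recommendations above
    return "no specific action stated"

# Hypothetical customers: (CLV in ISK, predicted churn probability)
for clv, churn in [(90000.0, 0.8), (40000.0, 0.1), (5000.0, 0.9)]:
    print(clv, churn, "->", retention_action(clv, churn, 80000.0, 20000.0))
```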

The customers that Telecom X should focus on would, for example, be heavy users with a post-paid subscription, because they have higher CLVs than light users and a much higher likelihood of churn. Those who are single have the highest probability of churn and the highest CLV of the three family-size groups. This is consistent with the fact that customers who are unmarried or separated also have the highest CLV and a somewhat higher probability of churn.

CLV can also be used to segment the customer database which produces segments or

groups of people with similar characteristics and CLVs. Telecom X can use this information

to see which segments are most profitable and which the least profitable. This makes it

possible to make a product or service that would be suitable for a group of people with similar

needs.

With respect to the research questions in Section 1.2, these conclusions show that

CLV can be useful for a mobile phone provider as it can show which customers are most

profitable and which ones are least profitable. The factors influencing customer churn

probability, and therefore also CLV, were elaborated on in the results chapter.

6.2 Limitations and future research

Limitations of this research are specified in this section and future research in this field

proposed.

One of the limitations of this research was that the overall accuracy of the classification methods was low. The goal of future research should be to increase this accuracy, for example by comparing these results with those of other classification methods. In this analysis, only one model for calculating CLV was used. Just as using various classification methods to predict churn would be preferable in order to see which one performs best, using different models to calculate CLV would be advisable for comparison. Different models can also give distinct insights into the determinants of CLV.

One of the disadvantages of the pre-paid dataset is that it has a large proportion of missing values. This makes any analysis more difficult, and it is never an easy task to decide what should be done with missing values. One option is to fill in the missing values where possible, but that can be very complicated; another option is to leave them out, as was done in this research. Finally, there was limited information about ARPU. No information was available about the costs related to customers, so these should be taken into account alongside the ARPU values in future research.

Future research should also strive to gather and use more information about the customers, such as the number of times a customer has churned and returned and the number of times a customer has contacted the information service, needed help or filed a complaint, as this gives an idea of customer satisfaction. Information about competitors’ advertising campaigns and other activities should be gathered to find out whether they influence churn, which in turn has an effect on CLV. One aspect that is also gaining more attention is the customer’s network, meaning the customer’s friends and family, coworkers, acquaintances and so on. These people have a large influence on a person, and many customers change their telecom provider just to follow someone in their network, for example because of lower rates or convenience. There were many missing values for the demographic variables in the pre-paid sample, since customers with a pre-paid subscription do not need to provide any information. A separate analysis of these customers could be carried out to see whether this group behaves differently from those who did provide demographic information.

Appendix I

Table II-1: Independent variables in the churn analysis for post-paid and pre-paid subscribers

Variable name | Description | Group
status | Status can be churn or censoring (dependent variable) | Demographics
customer_age | Customer's age | Demographics
family_size | Family size | Demographics
gender | Gender | Demographics
land_area | Land area | Demographics
marital_status | Marital status | Demographics
rateplan | Rate plan | Demographics
ispayer | Customer is the payer for his own service account or not | Demographics
tenure | Tenure (how long customer has been in this status) | Demographics
total charge groups | Customer is either in high usage or low usage group based on total charge | Demographics
avg_num_service | Average number of billed services over the three months of data extraction | Billing data
avg_num_product | Average number of billed products over the three months of data extraction | Billing data
avg_amount_gsm | Average billed amount due to GSM usage over the three months of data extraction | Billing data
avg_amount_discount | Average discount amount over the three months of data extraction | Billing data
avg_ratio_gsm | Average ratio of GSM usage to total billed amount over the three months of data extraction | Billing data
avg_ratio_discount | Average ratio of discount to total billed amount over the three months of data extraction | Billing data
avg_mysum | Average total billed amount over the three months of data extraction | Billing data
max_refill_freq | The maximum refill frequency in a month over the three months of data extraction | Refill history
max_refill_amount | The maximum refill amount in a month over the three months of data extraction | Refill history
avg_refill_freq | Average refill frequency over the three months of data extraction | Refill history
avg_refill_amount | Average monthly refill amount over the three months of data extraction | Refill history
avg_innet_vol | Average monthly inside network call volume over the three months of data extraction | Calling pattern
max_innet_vol | The maximum inside network call volume in a month over the three months of data extraction | Calling pattern
avg_innet_freq | Average monthly inside network call frequency over the three months of data extraction | Calling pattern
max_innet_freq | The maximum inside network call frequency in a month over the three months of data extraction | Calling pattern
avg_outnet_vol | Average monthly outside network call volume over the three months of data extraction | Calling pattern
max_outnet_vol | The maximum outside network call volume in a month over the three months of data extraction | Calling pattern
avg_outnet_freq | Average monthly outside network call frequency over the three months of data extraction | Calling pattern
max_outnet_freq | The maximum outside network call frequency in a month over the three months of data extraction | Calling pattern
avg_abroad_vol | Average monthly abroad call volume over the three months of data extraction | Calling pattern
max_abroad_vol | The maximum abroad call volume in a month over the three months of data extraction | Calling pattern
avg_abroad_freq | Average monthly abroad call frequency over the three months of data extraction | Calling pattern
max_abroad_freq | The maximum abroad call frequency in a month over the three months of data extraction | Calling pattern
avg_innet_vol_ratio | Average ratio of inside network to total originating call volume over the three months of data extraction | Calling pattern
max_innet_vol_ratio | The maximum inside network to total call volume ratio in a month over the three months of data extraction | Calling pattern
avg_innet_freq_ratio | Average ratio of inside network to total originating call frequency over the three months of data extraction | Calling pattern
max_innet_freq_ratio | The maximum inside network to total call frequency ratio in a month over the three months of data extraction | Calling pattern
avg_outnet_vol_ratio | Average ratio of outside network to total originating call volume over the three months of data extraction | Calling pattern
max_outnet_vol_ratio | The maximum outside network to total call volume ratio in a month over the three months of data extraction | Calling pattern
avg_outnet_freq_ratio | Average ratio of outside network to total originating call frequency over the three months of data extraction | Calling pattern
max_outnet_freq_ratio | The maximum outside network to total call frequency ratio in a month over the three months of data extraction | Calling pattern
avg_abroad_vol_ratio | Average ratio of abroad to total originating call volume over the three months of data extraction | Calling pattern
max_abroad_vol_ratio | The maximum abroad to total call volume ratio in a month over the three months of data extraction | Calling pattern
avg_abroad_freq_ratio | Average ratio of abroad to total originating call frequency over the three months of data extraction | Calling pattern
max_abroad_freq_ratio | The maximum abroad to total call frequency ratio in a month over the three months of data extraction | Calling pattern
avg_voice_outin_vol_ratio | Average ratio of originating to terminating call volume over the three months of data extraction | Calling pattern
max_voice_outin_vol_ratio | The maximum originating to terminating call volume ratio in a month over the three months of data extraction | Calling pattern
avg_sms_outin_ratio | Average ratio of sending to receiving SMS frequency over the three months of data extraction | Calling pattern
max_sms_outin_ratio | The maximum sending to receiving SMS frequency ratio in a month over the three months of data extraction | Calling pattern
avg_totalout_vol | Average monthly total originating call volume over the three months of data extraction | Calling pattern
max_totalout_vol | The maximum total originating call volume in a month over the three months of data extraction | Calling pattern
avg_totalout_freq | Average monthly total inside network call frequency over the three months of data extraction | Calling pattern
max_totalout_freq | The maximum total originating call frequency in a month over the three months of data extraction | Calling pattern
avg_totalin_vol | Average monthly total terminating call volume over the three months of data extraction | Calling pattern
max_totalin_vol | The maximum total terminating call volume in a month over the three months of data extraction | Calling pattern
avg_totalin_freq | Average monthly total terminating call frequency over the three months of data extraction | Calling pattern
max_totalin_freq | The maximum total terminating call frequency in a month over the three months of data extraction | Calling pattern
avg_smsout | Average monthly sending SMS frequency over the three months of data extraction | Calling pattern
max_smsout | The maximum sending SMS frequency in a month over the three months of data extraction | Calling pattern
avg_smsin | Average monthly receiving SMS frequency over the three months of data extraction | Calling pattern
max_smsin | The maximum receiving SMS frequency in a month over the three months of data extraction | Calling pattern
avg_innet_charge | Average charged amount due to inside network call over the three months of data extraction | cdr billed
avg_outnet_charge | Average charged amount due to outside network call over the three months of data extraction | cdr billed
avg_abroad_charge | Average charged amount due to abroad call over the three months of data extraction | cdr billed
avg_innet_tcharge_rat | Average ratio of inside network call to total charged amount over the three months of data extraction | cdr billed
avg_outnet_tcharge_rat | Average ratio of outside network call to total charge amount over the three months of data extraction | cdr billed
avg_abroad_tcharge_rat | Average ratio of abroad call to total charge amount over the three months of data extraction | cdr billed
avg_sms_innet_charge | Average charged amount due to sending SMS inside network over the three months of data extraction | cdr billed
avg_sms_outnet_charge | Average charged amount due to sending SMS outside network over the three months of data extraction | cdr billed
avg_sms_abroad_charge | Average charged amount due to sending SMS abroad over the three months of data extraction | cdr billed
avg_sms_innet_tcharge_rat | Average ratio of inside network SMS sending to total charged amount over the three months of data extraction | cdr billed
avg_sms_outnet_tcharge_rat | Average ratio of outside network SMS sending to total charged amount over the three months of data extraction | cdr billed
avg_sms_abroad_tcharge_rat | Average ratio of abroad SMS sending to total charged amount over the three months of data extraction | cdr billed
avg_tcharge | Average total charged amount over the three months of data extraction | cdr billed
max_innet_charge | The maximum charged amount due to inside network call in a month over the three months of data extraction | cdr billed
max_outnet_charge | The maximum charged amount due to outside network call in a month over the three months of data extraction | cdr billed
max_abroad_charge | The maximum charged amount due to abroad call in a month over the three months of data extraction | cdr billed
max_innet_tcharge_rat | The maximum inside network call to total charged amount ratio in a month over the three months of data extraction | cdr billed
max_outnet_tcharge_rat | The maximum outside network call to total charged amount ratio in a month over the three months of data extraction | cdr billed
max_abroad_tcharge_rat | The maximum abroad call to total charged amount ratio in a month over the three months of data extraction | cdr billed
max_s_innet_charge | The maximum charged amount due to sending SMS inside network in a month over the three months of data extraction | cdr billed
max_s_outnet_charge | The maximum charged amount due to sending SMS outside network in a month over the three months of data extraction | cdr billed
max_s_abroad_charge | The maximum charged amount due to sending SMS abroad in a month over the three months of data extraction | cdr billed
max_s_innet_tcharge_rat | The maximum inside network SMS sending to total charged amount ratio in a month over the three months of data extraction | cdr billed
max_s_outnet_tcharge_rat | The maximum outside network SMS sending to total charged amount ratio in a month over the three months of data extraction | cdr billed
max_s_abroad_tcharge_rat | The maximum abroad SMS sending to total charged amount ratio in a month over the three months of data extraction | cdr billed
max_tcharge | The maximum total charged amount in a month over the three months of data extraction | cdr billed
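Most variables in the table come in pairs, an average and a monthly maximum over the three months of data extraction. The sketch below shows how such a pair could be derived from three monthly values. The function name and the refill amounts are hypothetical, and treating a month without activity as zero is an assumption of this example rather than a documented choice of this research.

```python
def three_month_features(name, monthly_values):
    """Build the avg_<name> and max_<name> features from three monthly values."""
    if len(monthly_values) != 3:
        raise ValueError("expected exactly three monthly values")
    return {
        f"avg_{name}": sum(monthly_values) / 3,
        f"max_{name}": max(monthly_values),
    }


# Hypothetical refill amounts for one pre-paid customer over three months.
features = three_month_features("refill_amount", [1000, 0, 2000])
print(features)
```

The same pattern applies to the call volume, call frequency and charge variables; only the monthly input values differ.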
