


Texas A&M International University

Research Information Online

Theses and Dissertations

6-4-2015

Feature selection in credit scoring- a quadratic programming approach solving with bisection method based on Tabu search

Jun Huang

Follow this and additional works at: https://rio.tamiu.edu/etds

Recommended Citation
Huang, Jun, "Feature selection in credit scoring- a quadratic programming approach solving with bisection method based on Tabu search" (2015). Theses and Dissertations. 2. https://rio.tamiu.edu/etds/2

This Dissertation is brought to you for free and open access by Research Information Online. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of Research Information Online. For more information, please contact [email protected], [email protected], [email protected], [email protected].


FEATURE SELECTION IN CREDIT SCORING- A QUADRATIC PROGRAMMING

APPROACH SOLVING WITH BISECTION METHOD BASED ON TABU SEARCH

A Dissertation

by

JUN HUANG

Submitted to Texas A&M International University

in partial fulfillment of the requirements

for the degree of

DOCTOR OF PHILOSOPHY

May 2014

Major Subject: International Business Administration


FEATURE SELECTION IN CREDIT SCORING- A QUADRATIC PROGRAMMING

APPROACH SOLVING WITH BISECTION METHOD BASED ON TABU SEARCH

Copyright 2014 Jun Huang


FEATURE SELECTION IN CREDIT SCORING- A QUADRATIC PROGRAMMING

APPROACH SOLVING WITH BISECTION METHOD BASED ON TABU SEARCH

A Dissertation

by

JUN HUANG

Submitted to Texas A&M International University

in partial fulfillment of the requirements

for the degree of

DOCTOR OF PHILOSOPHY

Approved as to style and content by:

Chair of Committee, Haibo Wang

Committee Members, Jacqueline R Mayfield

Milton R Mayfield

Runchang Lin

Head of Department, Nereu Florencio Kock

May 2014

Major Subject: International Business Administration


ABSTRACT

Feature Selection in Credit Scoring- A Quadratic Programming Approach Solving with Bisection

Method Based On Tabu Search (May 2014)

Jun Huang, Master of Science, Texas A&M International University;

Chair of Committee: Haibo Wang

Credit risk is one of the most important topics in risk management. It is also the major risk that banks and financial institutions encounter, as stated in the Basel capital accord. As a form of credit risk measurement, credit scoring is a credit evaluation process intended to reduce the current and expected risk of a customer being a bad credit. Credit scoring models typically use a set of features to predict the credit status of applicants as good credit (unlikely to default) or bad credit (more likely to default). However, with the fast growth of the credit industry and the ease of collecting and storing information afforded by new technologies, a huge amount of customer information is available. Feature selection, or subset selection, is therefore essential to handle irrelevant, redundant, or misleading features in order to improve predictive (classification) accuracy and to reduce the high complexity, intensive computation, and instability that affect most credit scoring models.

In this study, a hybrid model is developed for credit scoring problems to predict the

classification accuracy based on selected subsets by first establishing a correlation coefficient

based binary quadratic programming model for feature selection. The model is then solved with

the bisection method based on Tabu search algorithm (BMTS) and provides optional subsets of

features in different sizes from which the satisfactory subsets for credit scoring models are


selected based on both the size and the overall classification accuracy rate (OCAR). The results of this proposed BMTS+SVM method, tested on two benchmark credit datasets, shed light on improving existing credit scoring systems with flexibility and robustness.

This validated method is then used in an international business context to test data on U.S. and Chinese companies in order to find out the subsets of features that act as key factors in distinguishing good credit companies from bad credit companies in these two countries. Finally, the performance of classification models using different classifiers, in terms of OCAR and misclassification cost, is evaluated on the U.S. and Chinese datasets. Cutoff values that give the highest OCAR and minimum misclassification cost are also discussed.


ACKNOWLEDGEMENTS

I would like to first thank Dr. Haibo Wang for his constant guidance, personal attention, suggestions, endless encouragement, and full support during the last four and a half years of my graduate study and research. Special thanks go to my committee members, Dr. Jacqueline R Mayfield, Dr. Milton R Mayfield, and Dr. Runchang Lin, for their invaluable advice and feedback. Also, I would like to express my sincere appreciation to the visiting scholar Dr. Zhibin Xiong, who had many valuable discussions with me during my dissertation research.

Finally, I would like to express my utmost gratitude to my family- my parents, younger sister, and parents-in-law- whose unparalleled support and constant encouragement helped me sail through the rigorous journey of the PhD program. I extend my deepest appreciation to my beloved wife, Weiwei Wu, for her unconditional love, understanding, and inspiration. Her endless support and encouragement throughout the entire doctorate program contributed greatly to my success. A true blessing, she is indeed the highly valued significant other.


TABLE OF CONTENTS

Page

ABSTRACT ............................................................................................................................. iv

ACKNOWLEDGMENTS ....................................................................................................... vi

TABLE OF CONTENTS ........................................................................................................ vii

LIST OF TABLES .....................................................................................................................x

LIST OF FIGURES ................................................................................................................ xii

CHAPTER

I INTRODUCTION .........................................................................................................1

Background ....................................................................................................................1

Purpose and Contribution ..............................................................................................3

II LITERATURE REVIEW ..............................................................................................7

Credit Risk Management ...............................................................................................7

Credit Scoring ................................................................................................................8

Discriminant Analysis ......................................................................................12

Logistic Regression ......................................................................................... 14

Decision Trees .................................................................................................15

Neural Networks ..............................................................................................17

Genetic Programming ..................................................................................... 20

Support Vector Machines ................................................................................23

Feature Selection ..........................................................................................................25

III METHODOLOGY ......................................................................................................32

Model Construction .....................................................................................................32

Algorithm .....................................................................................................................34


SVM Classifier.............................................................................................................41

Cross Validation...........................................................................................................42

IV EXPERIMENT RESULTS AND COMPARISON ANALYSIS ................................44

Validation of the Method on Two Benchmark Datasets ..............................................44

Results and Comparison Analysis ...............................................................................47

V APPLICATION OF THE CREDIT SCORING AT CORPORATE LEVEL ..............54

Reviews of Applications of Credit Scoring at Corporate Level ..................................54

A Study of Credit Scoring for the U.S. and Chinese Companies ................................58

Model Predictive Performance and Evaluation ...........................................................73

ROC Curve...................................................................................................................75

Misclassification Cost ..................................................................................................79

Identification of Cutoff Value ......................................................................................81

VI CONCLUSION AND DISCUSSION .........................................................................84

Summary ......................................................................................................................84

Discussion and Future Research ..................................................................................86

REFERENCES ........................................................................................................................89

APPENDIX

A EXAMPLE OF SOLUTIONS FOR MODEL 3.1 .....................................................108

B STATISTICAL DESCRIPTION OF THE U.S. DATASET .....................................113

C STATISTICAL DESCRIPTION OF CHINESE DATASET ....................................114

D DEFINITIONS OF LONG TERM CREDIT RATINGS FROM S&P ......................115

E COMPLETE SELECTED SUBSETS AND OCAR FOR THE U.S. DATASET .....116

F COMPLETE SELECTED SUBSETS AND OCAR FOR CHINESE DATASET ....118


G SENSITIVITY AND 1-SPECIFICITY FOR THE U.S. DATASET ........................122

H SENSITIVITY AND 1-SPECIFICITY FOR CHINESE DATASET .......................124

VITA ......................................................................................................................................128


LIST OF TABLES

Page

Table 1: Summary of customer credit scoring models ............................................................11

Table 2: Penalty conversion .....................................................................................................36

Table 3: Statistic description for Australian and German datasets ..........................................45

Table 4: Complete subsets of features associated with given α for Australian dataset ...........45

Table 5: Complete subsets of features associated with given α for German dataset ...............46

Table 6: OCAR for Australian case and comparison ...............................................................48

Table 7: OCAR of selected subsets for Australian case and comparison ................................51

Table 8: OCAR for German case and comparison ..................................................................52

Table 9: OCAR of selected subsets for German case and comparison ....................................52

Table 10: Financial ratios in bankruptcy prediction literatures ...............................................56

Table 11: Financial ratios for the U.S. and Chinese companies ..............................................59

Table 12: Description of the U.S. and Chinese datasets ..........................................................65

Table 13: OCAR for the U.S. dataset and comparison ............................................................67

Table 14: OCAR for Chinese dataset and comparison ............................................................67

Table 15: Comparison of financial ratios between the U.S. dataset and S&P .........................69

Table 16: Comparison of financial ratios between the U.S. and Chinese dataset ...................70

Table 17: ANOVA for features in operating ratios from the U.S. dataset ...............................71

Table 18: ANOVA for features in operating ratios from Chinese dataset ...............................71

Table 19: Description of training and testing data for the U.S. and Chinese datasets .............74

Table 20: OCAR of five classifiers for the U.S. and Chinese datasets ....................................74

Table 21: AUC of different classifiers for the U.S. and Chinese datasets ...............................76

Table 22: OCAR in new cutoff value of five classifiers for the U.S. and Chinese datasets ....78


Table 23: Misclassification cost for the U.S. and Chinese datasets.........................................80

Table 24: Misclassification cost with new cutoff values for the U.S. dataset .........................83


LIST OF FIGURES

Page

Fig. 1: Dissertation structure ......................................................................................................6

Fig. 2: Relationship between number of research papers and year ..........................................10

Fig. 3: Logistic function P .......................................................................................................15

Fig. 4: An example of a decision tree ......................................................................................16

Fig. 5: An example of a neuron ...............................................................................................18

Fig. 6: An example of a neural network ..................................................................................18

Fig. 7: An example of expression of GP ..................................................................................21

Fig. 8: An example of mutation in GP .....................................................................................21

Fig. 9: An example of crossover in GP ....................................................................................22

Fig. 10: An example of a SVM in the two-dimensional space ................................................24

Fig. 11: Flowchart of filter approaches. ...................................................................................26

Fig. 12: Flowchart of wrapper approaches ..............................................................................29

Fig. 13: Flowchart of the BMTS+SVM method ......................................................................40

Fig. 14: An example of the cross validation ............................................................................42

Fig. 15: Relationship between number of features and α ........................................................47

Fig. 16: Structure of credit scoring study at corporate level ....................................................66

Fig. 17: ROC for the U.S. dataset ............................................................................................77

Fig. 18: ROC for Chinese dataset ............................................................................................77


This dissertation is modeled on Expert Systems with Applications.

CHAPTER I

INTRODUCTION

1.1 Background

Credit risk is one of the most important topics in risk management. It is also the major risk that banks and financial institutions encounter, as stated in the Basel capital accord (Stephanou & Mendoza, 2005). With the rapid development of the credit industry and the increasing complexity of banking activities, various credit risk problems have arisen. For instance, growing defaults from borrowers have led to an increase of non-performing assets in banks, which may even cause bank bankruptcies (World Bank, 2013); bondholders or investors suffer great losses when a company defaults because it cannot pay interest on time; and credit risk problems even bear some responsibility for financial catastrophes such as the 2008 Global Financial Crisis (Utzig, 2010). Therefore, the development and establishment of credit risk measurement is extremely important to mitigate these risks.

Historically, financial institutions have relied on loan officers' experience, using techniques such as the 5 Cs, to assess credit quality. However, with the increasing complexity of banking activities, qualitative methods based on human judgment could not meet the needs of credit risk management, and credit risk measurements dominated by quantitative methods became increasingly popular. As a form of credit risk measurement, credit scoring is a credit evaluation process that reduces the current and expected risk of a customer being a bad credit so that losses due to bad debt can be mitigated (Abdou & Pointon, 2011).

Generally, credit scoring models apply statistical approaches and artificial intelligence approaches (Huang, Chen & Wang, 2007). A main stream of building credit scoring models is to develop classification models so that, based on the analysis of the past performance of


consumers, future credit applicants can be classified into one of the predefined classes, typically

good class (unlikely to default) or bad class (more likely to default), according to the properties

that describe demographic characteristics, economic or financial conditions of the applicants

(García, Marqués & Sánchez, 2012). A variety of credit scoring models have been developed,

including statistical classification approaches, such as logistic regression, linear discriminant

analysis, factor analysis, and probit regression; and artificial intelligence approaches, such as

expert systems, fuzzy algorithms, genetic programming, neural networks, support vector machines,

etc. (Šušteršič, Mramor & Zupan, 2009). The benefits of developing credit scoring models

include reducing the cost of credit analysis, enabling faster credit decisions, better examination

of existing accounts and prioritizing collections (Brill, 1998). For example, a Louisiana bank

called Hibernia Corporation reported that they processed 100 applications per month for small

business lending before implementing credit scoring in 1993 with seven loan officers. By 1995,

the same number of loan officers processed 1,100 applications per month. Also the business

loan portfolio increased from $100 million to $600 million during from 1993 to 1995.

Moreover, fewer bad loans were made by the bank (Lawson, 1995).

With the rapid growth of the credit industry and the ease of collecting and storing information enabled by new technologies, especially after the rise of e-commerce, a huge amount of information on customer behavior is available (Wollan, 2008). However, the inclusion of high dimensional data often leads to high complexity, intensive computation, instability, or lack of predictive accuracy for most classification models (Liu & Schumann, 2005). Feature selection, or subset selection, is therefore necessary to reduce the number of features used in order to achieve better prediction accurately and efficiently.


In many real world problems, feature selection is also considered a preprocessing of the variables before other sophisticated analysis tools are applied. It is well known that keeping uninformative variables in the model increases the variance of the response variable and thus affects the predictive performance of the model. Feature selection can help to improve decision making by (1) improving prediction performance by eliminating uninformative variables; (2) providing faster and cost-effective predictors, which saves the cost of collecting data and builds less computationally expensive models; and (3) providing a better understanding of the underlying process and making the model more interpretable (Guyon & Elisseeff, 2003). Hence, feature selection is very important in building classification models.

1.2 Purpose and Contribution

In this study, a hybrid model for the credit scoring problem is developed by establishing a correlation coefficient based binary quadratic programming model for feature selection in the first phase. The model is then solved by a bisection method based on the Tabu search algorithm (BMTS), which provides optional subsets of features in different sizes. In the second phase, the satisfactory subsets for credit scoring models are selected based on both the size (number of features in a subset) and the predictive performance in terms of overall classification accuracy rate (OCAR), which is derived from 10-fold cross validation Support Vector Machines (SVM). The presented hybrid model, using the BMTS+SVM method, not only reduces the computational effort of the classifier but also provides flexible options so that a tradeoff between accuracy and the size of the subset is available.
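To make the second phase concrete, the following minimal Python sketch shows how the OCAR of one candidate feature subset could be estimated with a 10-fold cross-validated SVM; the dataset, the feature dimensions, and the candidate subset in the example are hypothetical placeholders rather than the dissertation's actual data or code.

# Minimal sketch: estimating OCAR for one candidate feature subset with a
# 10-fold cross-validated SVM. Dataset, feature sizes, and the subset below
# are hypothetical placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def ocar_of_subset(X, y, subset_idx, folds=10):
    """Return the mean 10-fold classification accuracy (OCAR) for the
    features indexed by subset_idx."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    scores = cross_val_score(model, X[:, subset_idx], y, cv=folds,
                             scoring="accuracy")
    return scores.mean()

# Example with synthetic data standing in for a credit dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14))          # 200 applicants, 14 features
y = rng.integers(0, 2, size=200)        # 1 = good credit, 0 = bad credit
candidate_subset = [0, 3, 5, 8]         # a subset proposed by the BMTS phase
print("OCAR:", ocar_of_subset(X, y, candidate_subset))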

This proposed BMTS+SVM method is validated with two benchmark credit datasets and can be applied to determine the key factors that provide the best discriminating power in identifying good credit and bad credit customers from a pool of factors. Therefore, an application of the method is


then illustrated in an international business context on the U.S. and Chinese companies in order

to find out the subsets of features that act as the key factors in differentiating between

creditworthy companies (CWCs), companies that are unlikely to default, and less creditworthy

companies (LCWCs), companies that are more likely to default, in these two countries. The most

useful factors, in terms of financial categories and financial ratios, are first identified for the U.S. companies. The four financial categories are those with profitability, solvency, cash flow, and leverage ratios, and they are in line with the four financial categories to which the 8 financial ratios provided by the widely recognized credit rating agency Standard & Poor's belong. Similarly, we found the same four financial categories for Chinese companies, with an additional financial category of operating ratios. This indicates that the key financial categories that discriminate best between CWCs and LCWCs may vary across countries. Moreover, the application of the findings is twofold. On one hand, managers of financial institutions can pay more attention to the ratios in the key financial categories, especially the most representative ratios selected with our proposed method, so that the managers are able to gain a better understanding of the credit status of their applicants before making any further decisions. On the other hand, companies that attempt to borrow money from financial institutions are able to attain a clear vision of the most important financial factors for being considered a creditworthy company and what improvements are needed immediately to increase the chance of receiving loans.

Finally, the performance of different classification models (models using different classifiers

including support vector machines, discriminant analysis, logistic regression, decision tree, and

neural networks) in terms of OCAR and misclassification cost is evaluated based on the U.S. and

Chinese datasets. Cutoff values which give the highest overall classification accuracy rate and

minimum misclassification cost are also discussed. The results show that SVM has stable and


slightly better overall performance. However, there is no strong evidence showing that a

particular classifier significantly outperforms the others.

In sum, the contribution of this study is that it develops a hybrid model using the BMTS+SVM method that performs competitively well in predicting the classification accuracy for the credit scoring problem. The method not only reduces the computational effort of the classifier but also provides flexible options so that a tradeoff between accuracy and the size of the subset is available.

In regard to application, the method is used at corporate level to identify key factors in

differentiating between creditworthy and less creditworthy companies in both the U.S. and China

to provide some insights and guidance for the managers in both financial institutions and

borrowing companies.

The dissertation is organized as follows. In Chapter 2, we give an overview of the related

work of credit risk management, credit scoring models, and feature selection. Chapter 3

introduces the construction of the subset selection model and the way to identify the subsets using the bisection method for different values of the parameter α, based on the Tabu search algorithm. Chapter 4 presents the results of the subsets selected with the proposed binary quadratic programming model and the OCAR derived from the SVM classifier. These results are also compared with some classic approaches and the results from other studies. In Chapter 5, an application of the proposed method is presented in an international business context for the U.S. and Chinese companies, and the performances of different classification models are evaluated. Discussion and concluding remarks are presented in Chapter 6. The structure of the dissertation is given in Fig. 1.


Fig. 1. Dissertation structure


CHAPTER II

LITERATURE REVIEW

2.1 Credit Risk Management

A simple definition of credit risk (also referred to as default risk or counterparty risk) is the

potential that a borrower or counterparty will fail to meet its contractual obligations in

accordance with agreed terms. The objective of credit risk management is to maximize a

financial institution’s risk-adjusted rate of return by maintaining credit risk exposure within

acceptable parameters. For most financial institutions, lending is the largest and most obvious

source of credit risk, and financial institutions have encountered difficulties over the past years for a variety of reasons. A main cause of these difficulties is that credit standards for borrowers and counterparties are too lax (Njanike, 2009). Therefore, it is vital for financial institutions to

establish well-defined credit granting criteria to approve credit in a safe and sound manner. In

addition, credit risk assessment was actively promoted by the Basel Committee on Banking Supervision (BCBS), which issued the Basel accord, a set of recommendations for regulations in the banking industry (Stephanou & Mendoza, 2005). The new Basel capital accord (Basel II), developed since 1999, proposed a series of new regulatory frameworks to measure credit risk. It focused on a variety of risk identification and measurement methods, including the standardized approach and the internal ratings-based (IRB) approach. While the standardized approach allows less sophisticated banks to use external credit ratings to classify the bank's assets into risk classes, the

IRB approach was particularly emphasized by the new accord which relies heavily on the bank’s

own experience in determining the risk characteristics, and encouraged banks to develop and use

better risk management techniques and models (Stephanou & Mendoza, 2005).


Credit risk measurements can be classified chronologically into classic and modern methods. The classic credit risk measurements are more like expert systems that relied mostly on human experts' experience to judge the probability of default. The 5 Cs method is such a measurement, in which credit and loan decisions are made by the judgment of experts based on five factors, namely Character, Capital, Capacity, Collateral, and Cycle conditions. Character measures the reputation of a borrower. Capital looks at a borrower's equity investment and debt ratio to see its financial commitment. Capacity measures a borrower's ability to repay a loan. Collateral or third-party guarantees are additional forms of securing the loan. Finally, Cycle conditions measure how sensitive a borrower's sales are to the overall economy. However, this method may be inconsistent and subjective, as it specifies no weighting scheme that would consistently order the 5 Cs in terms of their relative importance in forecasting the probability of default.

2.2 Credit Scoring

Due to the increasing complexity of banking activities, the qualitative methods based on human experts could no longer meet the needs of credit risk management. Banks and financial

institutions were looking for more effective measurements to assist and support the complex

credit risk management. Modern credit risk measurements dominated by quantitative methods

were becoming increasingly prevalent.

Modern methods of credit risk measurement can be traced to an options-theoretic structural

approach pioneered by Merton (1974), and a reduced form approach. He proposed a model to

evaluate the credit risk of a company by considering the company’s equity as a call option on its

assets with a strike price equal to the debt repayment amount. It assumed that a company issues

zero-coupon debt that will become due at a future time. The company defaults if the market


value of the firm’s assets falls below the value of its promised debt. The probability of default (PD) is computed directly from the distance to default (DD) as follows in equation 2.1:

DD = \frac{\text{Market Value of Assets} - \text{Default Point}}{\text{Market Value of Assets} \times \text{Asset Volatility}} \qquad (2.1)

The higher the DD, the lower the PD. In Merton’s (1974) model, a log normal distribution

was assumed when converting the DD into a PD estimate. However, this distributional

assumption is often violated in practice. Thus some other models based on Merton’s (1974) model use alternative approaches to project the DD into a PD estimate. For example, KMV, developed by Moody’s, determines an empirical estimate of the PD based on a historical database of default rates, denoted as the expected default frequency (EDF). Unlike the structural approach models, which

made assumptions about the dynamics of a firm’s assets, capital structure, and its debt and

shareholders, the reduced form approach models made no assumptions about why a default

occurs. Default is not tied to the dynamics of asset prices but is based on an exogenous Poisson

process. Credit Risk Plus, a model developed by Credit Suisse Financial Products (CSFP), is such a reduced form model (Crouhy, Galai & Mark, 2000).
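As a simple numerical illustration of equation 2.1 and of the normal-distribution mapping from DD to PD assumed in Merton's model, the short Python sketch below computes both quantities; the input values are made up for illustration and do not describe any real firm.

# Sketch of equation 2.1 and the lognormal DD-to-PD mapping in Merton's model.
# Input values are illustrative only.
from statistics import NormalDist

market_value_assets = 120.0   # market value of the firm's assets
default_point = 80.0          # value of promised debt (default point)
asset_volatility = 0.25       # annualized asset volatility

# Equation 2.1: DD = (asset value - default point) / (asset value * volatility)
dd = (market_value_assets - default_point) / (market_value_assets * asset_volatility)

# Under the distributional assumption, PD is the probability of falling below
# the default point, i.e. the standard normal tail beyond DD.
pd_estimate = 1.0 - NormalDist().cdf(dd)

print(f"DD = {dd:.3f}, PD = {pd_estimate:.4f}")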

In the past few decades, as a form of quantitative credit risk measurement, credit scoring has been widely used as a credit evaluation process to evaluate the potential risk posed by lending money to consumers and to mitigate losses due to bad debt. Credit scoring models reduce the cost of credit analysis, enhance the credit decision, and save time and effort (Ong, Huang & Tzeng, 2005). Generally, credit scoring models use two types of approaches: statistical approaches and the more recent artificial intelligence approaches. A main stream of building credit scoring models is to develop classification models so that information from


applications is used to separate the applicants into good and bad credit risks according to the

properties that describe demographic characteristics, economic or financial conditions of the

applicants. These models usually use statistical methods, e.g., discriminant analysis, logistic regression, factor analysis, and probit regression, and artificial intelligence approaches, e.g., expert systems, fuzzy algorithms, genetic programming, neural networks, and SVM (Falangis, 2007; Hand & Henley, 1997; Kim & Sohn, 2004).

In order to discover the trend of credit scoring models over recent years, we used the SCOPUS database as our main source with the keyword “credit scoring”. Since feature selection is the core content of this dissertation, we narrowed down the search by adding AND “feature selection” OR “subset selection” OR “variable selection”. The search results are presented in Fig. 2.

Fig. 2. Relationship between number of research papers and year

We can see the growth trend of this research area during recent years, with a big jump in 2009 after the 2008 credit crisis, and 93% of the sources are journal articles and conference papers. It becomes clear that this area is one of the interest areas of Computer Science, Engineering, Decision Sciences, Mathematics, Business, Management, Accounting, Social


Science, and Economics/Finance. The models or solution approaches used to evaluate credit scoring and the comments on these approaches are summarized in Table 1.

Table 1
Summary of customer credit scoring models.

Model (Approach) | No. of papers | Representative references | Comments
Ant Colony Optimization / Particle Swarm Optimization | 3 | Marinakis, Marinaki, Doumpos, and Zopounidis (2009) | Both methods are easy to implement and very fast to converge, but they often find local optima and are difficult to analyze theoretically.
Bayesian Network | 3 | Hsieh and Hung (2010) | Bayesian networks incorporate uncertainty in the model and handle data from all sources, including missing data, but they rely on expert input and capture less spatial and temporal dynamics due to the lack of feedback loops.
Case-Based Reasoning (Decision Tree) | 25 | Cho, Hong, and Ha (2010) | Case-based reasoning is intuitive due to the system's self-learning capability and is easy to develop, but it will not provide an optimal solution and there is no perfect match with the cases in the system.
Fuzzy Sets | 7 | Zimmermann and Zysno (1983) | Fuzzy sets have great flexibility in variable types and data input and can be easy to design and understand from the rules. However, they are hard to formulate as a mathematical model.
Genetic Algorithm / Artificial Neural Network | 38 | Oreski, Oreski, and Oreski (2012) | Both methods are self-guided and self-organized, with a high degree of flexibility and robustness, and can be implemented with parallelism, but both produce chance-dependent outcomes and are computationally intensive.
LDA, Logistic Regression | 12 | Altman (1968); Yap, Ong, and Husain (2011) | Logistic regression has great flexibility in terms of variables and relationships but requires much more data to achieve stable and meaningful results. (Conventional approaches such as logistic regression and LDA are mainly used for comparison purposes.)
Monte Carlo Simulation | 4 | Paisittanand and Olson (2006) | Monte Carlo simulation has an unbiased estimator and is easy to implement, but it requires a great deal of computation time due to the large number of simulations.
Multi-Objective Optimization | 1 | Wang and Huang (2009) | Multi-objective optimization is simple and easy to use, with each objective addressed in the model, but it is difficult to assign weights and combine different types of optimization into a single formulation.
Rough Set | 2 | Wang, Hedar, Wang, and Ma (2012) | Rough sets yield ‘if-then’ rules involving ordinal values to perform classification tasks, but they can sometimes be impractical to apply, as they may lead to an empty set; they are also sensitive to changes in data and can be inaccurate.
Support Vector Machine | 48 | Bellotti and Crook (2009) | SVM provides a unique optimal solution based on the choice of kernel functions. However, it has high algorithmic complexity and requires extensive memory.


A review of the studies in Table 1 reveals that conventional statistical techniques are rarely used alone as credit scoring models; they were used in the studies for comparison with other sophisticated methods. Among the artificial intelligence techniques, which have gained popularity nowadays, SVM is the dominant method in credit scoring with 48 out of 168 papers, followed by neural networks, genetic algorithms, decision trees, fuzzy sets, Monte Carlo simulation, Bayesian networks, ant colony optimization, and other methods. The two most frequently applied statistical techniques and the four most frequently applied artificial intelligence techniques shown in Table 1 are discussed in more detail in the following sections of this chapter.

2.2.1 Discriminant Analysis

Discriminant analysis (DA) was proposed by Fisher (1936) for classification and discrimination purposes where the dependent variable is a nonmetric variable. It is a parametric statistical technique used in situations in which the primary objective is to identify the group to which an object belongs. The technique is referred to as two-group discriminant analysis when two groups are involved, while it is referred to as multiple discriminant analysis (MDA) when multiple (three or more) groups are involved. DA involves deriving a variate. The discriminant variate, also known as the discriminant function, is the linear combination of the independent variables that will discriminate best between the objects in the groups, and it is obtained by computing the variate’s weights for each independent variable to maximize the differences between the groups, as in equation 2.2:

Z_{jk} = a + W_1 X_{1k} + W_2 X_{2k} + \cdots + W_n X_{nk} \qquad (2.2)


where $Z_{jk}$ is the discriminant score of discriminant function $j$ for object $k$, $a$ is the intercept, $W_i$ is the discriminant weight for independent variable $i$, and $X_{ik}$ is independent variable $i$ for object $k$. In a word, Fisher’s discriminant analysis uses the weights $W_1, W_2, \ldots, W_n$ to construct a discriminant function ($Z$ score) that maximizes the ratio, $D^2$, of the between-class variation to the within-class variation, shown as equation 2.3:

D^2 = \frac{(\bar{Z}_1 - \bar{Z}_2)^2}{S_Z^2} \qquad (2.3)

where $\bar{Z}_1$ and $\bar{Z}_2$ are the sample means of $Z$ (when two classes are present), and $S_Z^2$ is the pooled estimate of the sample variance.
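The following short Python sketch (with made-up two-group data and weights) illustrates equations 2.2 and 2.3: discriminant scores are formed as a weighted sum of the independent variables, and the ratio of between-class to within-class variation is computed from the group means and the pooled variance of the scores.

# Sketch of equations 2.2 and 2.3 with illustrative data: compute discriminant
# scores Z = a + W1*X1 + W2*X2 for two groups and Fisher's ratio D^2.
import numpy as np

a, W = 0.0, np.array([0.8, -0.5])          # intercept and discriminant weights
group1 = np.array([[1.0, 2.0], [1.5, 1.8], [0.9, 2.2]])   # objects in class 1
group2 = np.array([[3.0, 0.5], [2.8, 0.7], [3.2, 0.4]])   # objects in class 2

z1 = a + group1 @ W                         # discriminant scores, class 1
z2 = a + group2 @ W                         # discriminant scores, class 2

# Pooled estimate of the score variance.
n1, n2 = len(z1), len(z2)
pooled_var = ((n1 - 1) * z1.var(ddof=1) + (n2 - 1) * z2.var(ddof=1)) / (n1 + n2 - 2)

# Equation 2.3: ratio of between-class to within-class variation.
D2 = (z1.mean() - z2.mean()) ** 2 / pooled_var
print(f"D^2 = {D2:.3f}")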

The discriminant functions can be used to determine to which group each case most likely belongs, and in general we classify a case as belonging to the group for which it has the highest discriminant score (Hair, Black, Babin & Alderson, 2010). The discriminant approach is still one of the most broadly established techniques and has been treated as the benchmark for other modern classification approaches in credit scoring applications (Aidi & Sari, 2012; Altman, 1968; Danenas, Garsva & Gudas, 2011; Glen, 2003; Jo, Han & Lee, 1997; Lam & Moy, 2002; Swicegood & Clark, 2001).

DA is based on a number of assumptions including normality of independent variables,

linearity of relationships, lack of multicollinearity among independent variables, and

homogeneity of variance/covariance (variances among group variables are the same across levels

of predictors). A criticism of using DA is the violation of these assumptions. However, the

evidence is mixed regarding the sensitivity of discriminant analysis to violation of these


assumptions (Eisenbeis, 1978; Hair et al., 2010; Karels & Prakash, 1987; Lacher, Coats, Sharma

& Fant, 1995).

2.2.2 Logistic Regression

Logistic regression, along with discriminant analysis, is also one of the most widely used statistical techniques in the field. It is a form of regression that is formulated to predict and explain a binary categorical variable. When compared with DA, logistic regression is limited to the prediction of only a two-group dependent measure. However, logistic regression has the advantage of being less affected than DA when the basic assumptions, particularly normality of the independent variables and equal variance, are not met. Since the binary dependent variable is either 0 or 1, the predicted value (probability) must be bounded to fall within the same range. To define a relationship bounded by 0 and 1, logistic regression uses the logistic curve to represent the relationship between the independent and dependent variables. At very low levels of the independent variable, the probability approaches 0 but never reaches it. Likewise, at very high levels of the independent variable, the probability approaches 1 but never reaches it. The probability is computed with the following equation, known as the logistic function, in equation 2.4:

P(x) = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n}} \qquad (2.4)

where $P$ is the probability of the dependent variable equaling a case, $e$ is the base of the natural logarithm (about 2.718), and the $\beta$s are the parameters of the model. A graph of the logistic function P is shown in Fig. 3.

The parameters are usually estimated with maximum likelihood estimation, under the assumption that observations are independent, by the following equation 2.5:


l(\beta) = \prod_{i=1}^{n} P(x_i)^{y_i} \left[1 - P(x_i)\right]^{1 - y_i} \qquad (2.5)

where $\beta$ denotes the unknown parameters, and $x_i$ and $y_i$ are the $i$th observation ($i = 1, \ldots, n$). The logistic function takes an input with any value from negative infinity to positive infinity, whereas the output is confined to values between 0 and 1 and hence is interpretable as a probability of the dependent variable equaling a case. Logistic regression has been widely used in credit scoring applications (Lee & Jung, 1999; Martin, 1977; Nie, Rowe, Zhang, Tian & Shi, 2011; Salehi & Mansoury, 2011; Srinivasan & Kim, 1987; Ye, Li, Feng & Wang, 2011). Although logistic regression can perform well in many applications, the accuracy of the logistic regression model decreases when the relationships between variables are non-linear (Akkoç, 2012).

Fig. 3. Logistic function P
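As an illustration of equation 2.4, the Python sketch below (synthetic data, illustrative only) fits a logistic regression model and recovers the predicted probabilities from the fitted parameters with the logistic function.

# Sketch of logistic regression as a binary classifier (equation 2.4).
# The data are synthetic placeholders for applicant features and credit status.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))                 # e.g. income, debt ratio, age, ...
true_beta = np.array([1.2, -0.8, 0.5, 0.0])
y = (rng.random(300) < 1 / (1 + np.exp(-(X @ true_beta)))).astype(int)

model = LogisticRegression().fit(X, y)

# Equation 2.4 applied manually with the fitted parameters:
eta = model.intercept_ + X @ model.coef_.ravel()
p_manual = np.exp(eta) / (1 + np.exp(eta))

# Matches the probabilities returned by the fitted model.
print(np.allclose(p_manual, model.predict_proba(X)[:, 1]))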

2.2.3 Decision Trees

Decision trees are well-known classification techniques that organize information extracted

from a training dataset in a tree structure composed of the internal nodes, the leaf nodes, and


branches. The internal nodes represent the input attributes, the leaf nodes represent the classification, and the branches represent conjunctions of attributes. The algorithm begins by selecting an attribute to place at the root node. Based on the impurity measure, the algorithm then loops over all possible splits in order to find an attribute and its corresponding cutoff value that give the best split condition. This process is repeated recursively for the new nodes

until a stopping criterion is satisfied. Fig. 4 gives an example of a decision tree in terms of two

classes of customers with bad credit and good credit.

Fig. 4. An example of a decision tree

The impurity measure defines how well the two classes are separated. The leaf nodes are classified according to the most prevalent class in them. There are various impurity measures used in the literature, such as the entropy-based measure in ID3 (Quinlan, 1986) and its successor C4.5 (C5.0 as the latest version) (Quinlan, 1993), computed as equation 2.6, and the Gini measure in


classification and regression trees (CARTs) (Breiman, Friedman, Olshen & Stone, 1984; Loh,

2011) computed as equation 2.7.

I(D) = Entropy(D) = -\sum_{i=1}^{k} p_i \log p_i \qquad (2.6)

I(D) = Gini(D) = 1 - \sum_{i=1}^{k} p_i^2 \qquad (2.7)
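Equations 2.6 and 2.7 can be computed directly from the class proportions at a node; the short Python sketch below, with illustrative class counts, evaluates both impurity measures.

# Sketch of the entropy (eq. 2.6) and Gini (eq. 2.7) impurity measures for a
# node containing counts of good-credit and bad-credit cases (illustrative).
import math

def entropy(counts):
    total = sum(counts)
    ps = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in ps)

def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

node = [70, 30]   # 70 good-credit cases, 30 bad-credit cases at this node
print(f"entropy = {entropy(node):.3f}, gini = {gini(node):.3f}")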

Decision tree models are powerful and flexible classifiers. Their popularity is attributable to the easy interpretation and implementation of the results. Decision trees have been successfully used

in many classification problems, and have been applied to the development of credit scoring

applications (Frydman, Altman & Kao, 1985; Mandala, Nawangpalupi & Praktikto, 2012; Nie et

al., 2011; Paleologo, Elisseeff & Antonini, 2010; Yi, Yan, Zhimin & Xiangjian, 2008; Zhang,

Zhou, Leung & Zheng, 2010; Zibanezhad, Foroghi & Monadjemi, 2011). However, one of the

limitations of decision trees is their instability, because even small fluctuations in the sample

data may lead to large variations in the classifications assigned to the instances (Li & Belford,

2002).

2.2.4 Neural Networks

Neural networks (NNs) are mathematical techniques developed by simulating the working principles of the human brain. NNs are structures of highly interconnected artificial nodes, called neurons or computational units, that form a network which mimics a biological neural network. Fig. 5 gives an example of a neuron.


A neuron has a set of input connections that receive signals from other neurons, a set of weights for the input connections, and a transfer function that transforms the sum of the weighted inputs to an output (Coakley & Brown, 2000).

Fig. 5. An example of a neuron

There are different types of NNs such as feedforward neural network, recurrent neural

network, and self-organizing network. Feedforward neural network is the most widely used

Fig. 6. An example of a neural network

technique, as shown in Fig. 6, and the two most popular feedforward neural network models are the multi-layer perceptron (MLP) and the Radial Basis Function (RBF) networks (Wlodzislaw & Norbert, 2001). They have much in common, and the only fundamental difference is the way in which hidden units combine values coming from preceding layers: MLPs use inner products, while RBFs use Euclidean distances.

A major advantage of neural networks is their ability to provide a flexible mapping between inputs and outputs. This is achieved by adding a hidden layer between the input layer and the output layer. The arrangement of the simple units into a multilayer framework produces a map between inputs and outputs that is consistent with any underlying functional relationship regardless of its “true” functional form. Having a general map between the input and output vectors eliminates the need for the unjustified a priori restrictions that are needed in conventional statistical and econometric modeling. Therefore, a neural network is often considered a “universal approximator” (Yu, Wang & Lai, 2007, p. 28). Cybenko (1989) and Hornik, Stinchcombe, and White (1989) demonstrated that arbitrary decision regions can be arbitrarily well approximated by continuous feedforward neural networks with only a single hidden layer and any continuous sigmoidal nonlinearity.

The back propagation (BP) algorithm is used to train the feedforward neural network. It is a supervised learning method, and the algorithm trains a given feedforward multilayer neural network for a given set of input patterns with known classifications. When each entry of the sample set is presented to the network, the network examines its output response to the sample input pattern. The output response is then compared to the known and desired output, and the error value is calculated. Based on the error, the connection weights are adjusted.
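A minimal numpy sketch of one back propagation update for a single-hidden-layer feedforward network with sigmoid units is given below; the layer sizes, learning rate, and data are illustrative and are not taken from the studies cited in this section.

# Sketch of one back-propagation step for a 1-hidden-layer feedforward network
# with sigmoid units and squared error; sizes and data are illustrative.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(2)
x = rng.normal(size=(5, 1))               # one input pattern with 5 features
t = np.array([[1.0]])                     # desired (known) output
W1 = rng.normal(scale=0.1, size=(3, 5))   # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(1, 3))   # hidden -> output weights
lr = 0.5

# Forward pass: compute the network's output response.
h = sigmoid(W1 @ x)
y = sigmoid(W2 @ h)

# Backward pass: propagate the error and adjust the connection weights.
delta_out = (y - t) * y * (1 - y)
delta_hid = (W2.T @ delta_out) * h * (1 - h)
W2 -= lr * delta_out @ h.T
W1 -= lr * delta_hid @ x.T

print("error after update:", float(abs(sigmoid(W2 @ sigmoid(W1 @ x)) - t)))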


Multilayer feed-forward neural networks have been applied to many credit scoring models (Derelioğlu & Gürgen, 2011; Derelioğlu, Gürgen & Okay, 2009; Dimla & Lister, 2000; Nanni & Lumini, 2009; Tsai, 2009). However, West (2000) investigated the performance of five different neural networks on the credit scoring problem. The results showed that the mixture-of-experts and radial basis function neural networks performed better, whilst the multilayer perceptron (MLP) may not be the most accurate neural network model. Other types of neural networks were developed as well. For example, Piramuthu (1999) used neurofuzzy systems to evaluate credit risk. Ravi and Pramodh (2008) proposed a principal component neural network (PCNN) architecture to predict bankruptcy. Some hybrid neural networks were also developed to combine neural networks with subset selection models when dealing with large numbers of variables (Lee & Chen, 2005; Lee, Chiu, Lu & Chen, 2002; Yim & Mitchell, 2005). Meanwhile, comparisons between

2005; Lee, Chiu, Lu & Chen, 2002; Yim & Mitchell, 2005) Meanwhile, comparisons between

neural networks and traditional statistical approaches have been widely studied (Alam, Booth,

Lee & Thordarson, 2000; Bell, 1997; Jo et al., 1997; Lin, Chang, Li & Chao, 2011; Malhotra &

Malhotra, 2003; Zhang, Hu, Patuwo & Indro, 1999). The majority of these studies reported that

the neural network models have better performance in terms of predictive accuracy rate when

compared with other traditional techniques, such as discriminant analysis and logistic regression,

though the results were very close (Abdou & Pointon, 2011; Crook, Edelman & Thomas, 2007).

2.2.5 Genetic Programming

Genetic programming (GP) was suggested by Koza (1992). It is an evolutionary algorithm inspired by the Darwinian theory of evolution, and it can be viewed as a specialization of genetic algorithms (GA) (Eiben & Smith, 2003). GP can handle more complicated structures in optimization when compared with GA and therefore has been widely applied to a great diversity of problems (Espejo, Ventura & Herrera, 2010; Sette & Boullart, 2001; Zhang & Bhattacharyya,


2004). GP evolves computer programs, which are represented in memory as tree structures composed of the function set and the terminal set. The function set contains the operators or statements, such as arithmetic operators or if-then conditional statements. The terminal set contains constants, inputs, and other zero-argument nodes in the GP tree. For example, the expression $(X \times Y) + (8 - Z/3)$ is represented as the tree in Fig. 7.

Fig. 7. An example of expression of GP

Once a population of rules representing potential solutions to the classification of the GP tree

is initialized, the following procedures are similar to GA. The initial population of rules is

evaluated with a fitness function, and some of these rules are selected to run the mechanism of

Fig. 8. An example of mutation in GP


Fig. 9. An example of crossover in GP

reproduction. Genetic operators, mutation and crossover, are then applied to produce new rules. The mutation operator is used to choose a node randomly in a subtree and replace it with a new randomly generated subtree, as shown in Fig. 8 from MP to MC. The crossover operator is used to swap subtrees between the parents to reproduce the children, as shown in Fig. 9 from CP1 and CP2 to CC1 and CC2. These procedures are repeated until an acceptable classification rule is found for each class in the dataset (Etemadi, Anvary Rostamy & Dehkordi, 2009; Ong et al., 2005).
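To make the tree operations concrete, the Python sketch below represents a GP expression as nested tuples and performs a random subtree mutation in the spirit of the MP-to-MC example in Fig. 8; the function and terminal sets are illustrative.

# Sketch of a GP expression tree as nested tuples, with a random subtree
# mutation; function and terminal sets are illustrative only.
import random

FUNCTIONS = ['+', '-', '*', '/']
TERMINALS = ['X', 'Y', 'Z', 'A', 'B', 3, 8]

def random_tree(depth=2):
    """Grow a random expression tree of limited depth."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    op = random.choice(FUNCTIONS)
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def mutate(tree, p=0.2):
    """Replace a randomly chosen node with a new random subtree."""
    if random.random() < p or not isinstance(tree, tuple):
        return random_tree(depth=2)
    op, left, right = tree
    return (op, mutate(left, p), mutate(right, p))

parent = ('+', ('*', 'X', 'Y'), ('-', 8, ('/', 'Z', 3)))   # the tree in Fig. 7
child = mutate(parent)
print("parent:", parent)
print("child :", child)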

Genetic programming is a rapidly growing area and one of the most recent techniques that have been applied to classification problems. A number of studies have applied GP in the field of credit scoring (Abdou, 2009; Alfaro-Cid, Sharman & Esparcia-Alcazar, 2007; Chen,

Zhang, Wei & Chen, 2007; Huang, Tzeng & Ong, 2006; Jiang & Yuan, 2007; Lensberg, Eilifsen

& McKee, 2006; Liu, Wang & Shuai, 2008; Rampone, Frattolillo & Landolfi, 2013; Zhang, Hifi,

Chen & Ye, 2008).


2.2.6 Support Vector Machines

Support Vector Machines (SVMs) are a supervised machine learning method suggested by Vapnik (1995). An SVM produces a binary classifier, the so-called optimal separating hyper plane, through an extremely non-linear mapping of the input vectors into a high-dimensional feature space. SVM constructs a linear model to estimate the decision function using non-linear class boundaries based on support vectors. If the data are linearly separable, SVM trains linear machines for an optimal hyper plane that separates the data without error and with the maximum distance between the hyper plane and the closest training points. The training points that are closest to the optimal separating hyper plane are called support vectors.

Specifically, the main idea of SVM is to map the linearly inseparable samples in a low dimensional space to linearly separable samples in a high dimensional feature space with some kernel functions, making it easy to analyze the nonlinear characteristics of the samples through a linear algorithm in the high dimensional space. There are several types of kernel functions, including polynomial, radial basis function, and sigmoid kernels, which can be expressed as follows (Prajapati & Patle, 2010):

1. Polynomial kernel: K(x, x_i) = (x^T x_i + 1)^q \qquad (2.8)

2. Radial basis function (RBF) kernel: K(x, x_i) = \exp\left\{-\frac{\|x - x_i\|^2}{2\sigma^2}\right\} \qquad (2.9)

3. Sigmoid kernel: K(x, x_i) = \tanh\left(v (x^T x_i) + c\right) \qquad (2.10)

where the RBF and sigmoid kernels are usually applied in classification problems and regression analysis, respectively.
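The three kernels in equations 2.8-2.10 can be evaluated directly; the Python sketch below computes each of them for a pair of sample vectors, with illustrative values for the parameters q, σ, v, and c.

# Sketch of the polynomial (2.8), RBF (2.9), and sigmoid (2.10) kernels for a
# pair of sample vectors; parameter values are illustrative.
import numpy as np

x  = np.array([1.0, 2.0, -0.5])
xi = np.array([0.5, 1.5,  0.0])

q, sigma, v, c = 3, 1.0, 0.1, -1.0

poly    = (x @ xi + 1) ** q                                 # equation 2.8
rbf     = np.exp(-np.sum((x - xi) ** 2) / (2 * sigma**2))   # equation 2.9
sigmoid = np.tanh(v * (x @ xi) + c)                         # equation 2.10

print(poly, rbf, sigmoid)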


After mapping the samples in the low dimensional space to samples in the high dimensional feature space with one of these kernel functions, the best separating hyper plane, which maximizes the distance between the two classes (or minimizes the number of training errors), is constructed. Generally, the algorithm builds two parallel hyper planes, H1 and H2, in such a way that they separate the data with no points between them, and then tries to maximize their distance. The region bounded by them is called “the margin”. The best separating hyper plane, also known as the optimal hyper plane H, falls between the two split planes, making the distance between the plane H1 (H2) and the plane H as large as possible; see Fig. 10.

Fig. 10. An example of a SVM in the two-dimensional space

Given training vectors in two classes, labeled by the vector { }.

The support vector machine finds an optimal hyper plane with the maximum margin by solving

the following optimization problem

Optimal hyper plane

margin margin

H1

H

H2

Page 38: Feature selection in credit scoring- a quadratic

25

, ,1

1Min

2

subject to: -1 0

0

mT

iw b

i

i i i

i

w w C

y w x b

(2.11)

This powerful tool for classification has been widely applied in practical problems such as

credit scoring (Bellotti & Crook, 2009; Chen & Li, 2010; Danenas et al., 2011; Harikrishna,

Farquad & Shabana, 2012; Huang et al., 2007; Kim & Sohn, 2010; Li, Li, Kuo, Liu & Huang,

2012; Martens, Baesens, Van Gestel & Vanthienen, 2007; Schebesch & Stecking, 2005; Wang,

Guo & Wang, 2010; Wei, Li & Chen, 2007), financial time-series forecasting (Tay & Cao, 2001;

Van Gestel, Suykens, Baestaens, Lambrechts, Lanckriet, Vandaele, De Moor & Vandewalle,

2001), pattern recognition (Asada, Yun, Nakayama & Tanino, 2004; Camastra, 2007), and

disease diagnosis (Akay, 2009; Huang, Liao & Chen, 2008; Lu, Van Gestel, Suykens, Van

Huffel, Vergote & Timmerman, 2003; Su & Yang, 2008).

2.3 Feature Selection

Due to the rapid expansion of credit industry and availability of massive amounts of

information, one of the challenges researchers have to face when using classification algorithms

to build credit scoring models is the selection of features since on one hand, increasing the

number of features increases collinearity and causes greater variance on the prediction of

response variable. On the other hand, the inclusion of high dimensional data with many

irrelevant and redundant features leads to high complexity, intensive computation, instability, or

lack of predictive accuracy for most classification models. Therefore, in the past few years, most

credit scoring models involve feature selection in order to reduce the computational effort for

classifiers and improve the accuracy of credit scoring models (Chen & Li, 2010).

Page 39: Feature selection in credit scoring- a quadratic

26

Feature selection is a problem of finding an optimal subset which is given by a feature subset

selection algorithm that provides the highest possible accuracy. The following definition is given

by Kohavi and John (1997): “Given an inducer and a dataset with features

from a distribution D over the labeled instance space, an optimal feature subset, , is a subset

of the features such that the accuracy of the induced classifier = () is maximal (p. 276).

An optimal feature subset is not necessarily unique because when one feature can be replaced

by another feature since they are perfectly correlated to each other, the accuracy derived by

different combination of features is the same. Reasons for using a subset of variables can be

summarized as: (1) improving the prediction performance by eliminating uninformative

variables; (2) providing faster and cost-effective features and thus saves cost of collecting data

and builds models parsimonious; (3) providing better understanding of underlying process

generated the data, making the model more interpretable (Guyon & Elisseeff, 2003; Miller,

1984).

Fig. 11. Flowchart of filter approaches

Feature selection algorithms can be classified into two categories, namely, filter algorithms

and wrapper algorithms. Filter algorithms are independent of any learning algorithms and use

Input

Features

Training data Testing data

Testing

Accuracy

Particular

Measures

The Selected

Subset

Induction

Algorithm

Testing

Page 40: Feature selection in credit scoring- a quadratic

27

particular measures, such as distance measures, information measures, dependency measures,

and consistency measures to evaluated features. The flowchart of filter approach is shown as Fig.

11.

Distance measures are also known as separability, divergence, or discrimination measures.

Euclidean distance and Chebyshev distance are examples for distance measures. The relief

algorithm proposed by Kira and Rendell (1992) was based on the distance measures. The basic

idea of relief is to draw instances at random, compute their nearest neighbors, and adjust a

feature weighting vector to give more weight to features that discriminate the instance from

neighbors of different classes. Therefore, a useful feature should have the same value for cases

from the same class and different values between cases from different classes. However, the

relief algorithm does not detect redundancy, so the remaining subset still contains redundant

features due to its feature evaluation mechanism that all discriminative features are assigned with

high relevance weight without considering the correlations in between (Yang & Li, 2006).

Mutual information has been used for feature selection as an information measure. Battiti

(1994) introduced mutual information feature selection (MIFS) to investigate the application of

the mutual information criterion to evaluate a set of candidate features and to select an

informative subset with robust estimation. The fast correlation-based filter (FCBF) developed by

Senliol, Gulgezen, Lei, and Cataltepe (2008) is a sequential forward selection algorithm that

creates the feature subset by sequentially adding features in decreasing relevance order while

excluding redundant features. Yu and Liu (2004) defined feature redundancy and proposed to

perform explicit redundancy analysis in feature selection. They developed a correlation-based

method for relevance and redundancy analysis. Peng, Fulmi, and Ding (2005) studied how to

select good features according to the maximal statistical dependency criterion based on mutual

Page 41: Feature selection in credit scoring- a quadratic

28

information. They derived an equivalent form, called minimal-redundancy-maximal-relevance

criterion (mRMR), for first-order incremental feature selection. Fleuret (2004) proposed a fast

feature selection technique based on conditional mutual information. The method ensures the

selection of features, which are both individually informative and two-by-two weakly dependant

by picking features that maximize their mutual information. Kwak and Choi (2002) proposed a

new method of calculating mutual information between input and class variables based on the

Parzen window. However, it is said that in many areas of experimental sciences, it is difficult to

compute mutual information accurately due to the limited or imbalanced sample size (Sakar &

Kursun, 2012).

Dependency measures based on statistical information, such as Pearson correlation

coefficients, Fisher score, t-test, F-Statistic, etc., are designed to quantify how strongly two

features are associated or correlated with each other. Many traditional selection methodologies

are based on these measures, such as forward selection, backward elimination, and stepwise

regression. Wei and Billings (2007) presented a new unsupervised forward orthogonal search

(FOS) algorithm for feature selection and ranking. In this algorithm, features are selected in a

stepwise way, one at a time, by estimating the capability of each specified candidate feature

subset to represent the overall features in the measurement space. A squared correlation function

is employed as the criterion to measure the dependency between features, and this makes the new

algorithm easy to implement. Camps, Mooij, and Scholkopf (2010) introduced a nonlinear

measure of independence between random variables for remote sensing supervised feature

selection, where statistical dependence is evaluated with Hilbert–Schmidt independence criterion

(HSIC).

Page 42: Feature selection in credit scoring- a quadratic

29

Finally, consistency measures try to retain the discriminating power of the data defined by

original features. Dash and Liu (2003) carried out a study of consistency measure with different

search strategies. The study of the consistency measure with other measures shows that it is

monotonic, fast, multivariate, capable of handling some noise, and can be used to remove

redundant and/or irrelevant features.

In short, Filter methods assess the relevance of features by looking at the intrinsic properties

of the data. They are fast and independent of the classifier, and thus can easily scale to very high

dimensional datasets. As a result, feature selection need to be done only once and then different

classifiers can be evaluated. However, these methods totally ignore the effects of the selected

feature subset on the performance of the induction algorithm. Many filters provide a feature

Fig. 12. Flowchart of wrapper approaches

No

Yes

Testing

Accuracy

Training data Testing data

The Selected

Subset

Induction

Algorithm

Testing

Accuracy or

Fitness

Evaluation

Input

Features Stop

Feature Selection

Search

Induction

Algorithm

Page 43: Feature selection in credit scoring- a quadratic

30

ranking rather than an explicit best feature subset. This may lead to worse classification

performance when compared to other types of feature selection techniques. In addition, it is not

clear how to determine the threshold point for rankings to select only the required features and

exclude noise (Kumari & Swarnkar, 2011).

Wrappers use the learning machine of interest as a black box to score subsets of variables

according to their predictive power. The search algorithm for wrappers searches through the

space of possible features and evaluates each subset. The flowchart of wrapper approach is

shown as Fig. 12. The induction algorithm can be the classifier in the problem of classification.

Hsu (2004) employed decision tree for feature selection, and found out a subset of features

with lowest error rate of classification by using the genetic algorithms wrapper approach for

inducing decision trees. Chiang and Pell (2004) incorporated genetic algorithms with Fisher

discriminant analysis (FDA) for key variable identification, and genetic algorithms are used as an

optimization tool to determine variables that maximize the FDA classification success rate for

two given data sets. Guyon, Weston, Barnhill, and Vapnik (2002) proposed a method of gene

selection utilizing SVM methods based on Recursive Feature Elimination (RFE) yielded better

classification performance. Chen and Li (2010) combined SVM classifier with conventional

statistical linear discriminate analysis, decision tree, rough sets, and F-score approaches as

features selection. Chen, Ma, and Ma (2009) proposed hybrid SVM technique based on three

strategies, namely, using classification and regression tree (CART) to select input features, using

multivariate adaptive regression splines (MARS) to select input features, and using grid search to

optimize model parameters. Their results demonstrated that the hybrid SVM provided the best

classification rate. It is seemed that in the past few years these hybrid models yielded better

performance, and gained their popularity in research.

Page 44: Feature selection in credit scoring- a quadratic

31

Wrapper algorithms often achieve better results than filters in that they are tuned to the

specific interaction between an induction algorithm and their training data. However, when a

search algorithm is wrapped around the classification model, the space of feature subsets grows

exponentially with the number of features. This problem is known as NP-hard, and the search

quickly becomes computationally expensive and intractable (Kumari & Swarnkar, 2011; Liu &

Schumann, 2005). Wrapper algorithms also have a risk of over fitting to the model.

In sum, there are advantages and disadvantages for both filter and wrapper methods. This

paper presents a hybrid model by combining advantages of filter and wrapper methods. In this

study, subsets for different sizes are selected with a binary quadratic programming model based

on correlation coefficient. The first phase dramatically brings down the number of subsets to be

evaluated by the classifiers from all possible subsets. These selected subsets are then follow

wrapper approach in which one or more satisfactory subsets of features will be given based on

the accuracy of the prediction and the size of the subset.

Page 45: Feature selection in credit scoring- a quadratic

32

CHAPTER III

METHODOLOGY

The model used to select subsets of variables in this study is a correlation coefficient based

binary quadratic programming model. The aim is to select optimal subsets of variables for

different sizes of subsets. The model is transformed into the unconstrained binary quadratic

programming (UBQP) problem and solved with bisection method based on Tabu search

algorithm (BMTS). For variables where subsets of variables selected from, there will be

( ) possible subsets of variables corresponding to the subset of size , where .

The approach for selecting subsets in this paper, in most cases, will choose at least subsets of

variables for different sizes out of all possible subsets. The selected subsets of variables

associated to the optimal solution of the presented subset selection model are evaluated in terms

of OCAR with 10-fold cross validation SVM. Finally, satisfactory subsets used as a credit

scoring (classification) model are determined based on both the overall classification accuracy

rate (OCAR) and the size of the subset. It should be noted that there can be multiple satisfactory

subsets for a same size due to different value of parameter α.

3.1 Model Construction

The criteria used to build the subset selection model involve two conflicting objectives. On

one hand, we would like the model to include as many variables as possible so that the

information content in these factors can influence the predicted value. On the other hand, we

want the model to include as few variable as possible because the variance of the predicted value

increases as collinearity increases caused by the increase of the number of variables. Therefore,

in a good model, the correlations between variable and the variables ( - correlations)

should be high, and those between the variables ( - correlations) should be low (Eksioglu,

Page 46: Feature selection in credit scoring- a quadratic

33

Demirer & Capar, 2005). Finally, the objective function is to maximize the difference between

the sum of the - correlations and - correlations adjusted by a weight α since there is a

tradeoff between the number of informative factors and the effect of collinearity or in other

wards one must trade off estimation of more parameters (bias reduction) with accurately

estimating these parameters (variance reduction).

Let

be the sample correlation coefficient between and , and (

) be the sample

correlation coefficient between ( ) and ( ). The subset selection method derives a subset

containing variables from the set with all variables such that the sum of correlation

coefficient between and , ∑ | |

, is maximized, and the sum of correlation coefficient

between ( ) and ( ) , ∑ (| | |

|)

, is minimized. A combination of these

correlation terms is generated using a weight, α ( α ), denoting the tradeoff between the

two conflicting objectives. This is shown as objective function in model 3.1. and , are

defined as the decision variables, where if variable is in , 0 otherwise. if

variable is in , 0 otherwise. Therefore indicates both and are in , 0 otherwise.

1

i i j

1 1, ,

i j

Maximize u 1 u u

s.t.

u ,u 0,1

n n

yi ij ji

i ii N i j N i j

(3.1)

Page 47: Feature selection in credit scoring- a quadratic

34

3.2 Algorithm

Model 3.1 can be rewritten as form of , where is an vector of binary variables and

is an symmetric matrix. When we expand , we have following equation 3.2.

1

1 1

'n n

ii i j ji i j

i ii j

u Qu q u qi q u u

(3.2)

where and are binary variables. are the elements on the diagonal of matrix, and and

are symmetric non-diagonal elements of matrix. As we can see model 3.1 exactly matches

with the right hand side of Equation 3.2, and thus model 3.1 can be rewritten to the form of

which is recognized as UBQP problem with α | | as its diagonal elements and – (

α) (| | |

|) as its non-diagonal elements of matrix.

1 12 21 1 11 1

2 21 12 2 2 2 2

1 1 2 2

1 1

1 1

1 1

T y n n

y n n

n nn n n n yn

u u

u u

u u

L

L

M MM M O M

L

According to Kochenberger and Glover (2006), the UBQP is used to solve a wide variety of

combinatorial optimization problems. Since the proposed model for feature selection in this

study is already quadratic function without constraints, no additional transformation is needed in

this case. However, any linear or quadratic discrete problem with linear constraints in bounded

Page 48: Feature selection in credit scoring- a quadratic

35

integer variables can be converted to the form of UQBP by using quadratic infeasibility penalties

as an alternative to imposing constraints. Following is a brief introduction for general constraints

problem:

0 '

s.t. binary

Min x x Qx

Ax b x (3.3)

The constrained quadratic optimization model can be converted into equivalent UQBP models by

imposing a quadratic infeasibility penalty function to the objective function, and thus model 3.3

is converted to

0

^

Min ( ) ( )

=

=

tx xQx P Ax b Ax b

xQx xDx c

xQ x c

(3.4)

where is a positive scalar. The additive constant can be dropped later on to attain the final

unconstrained version, model 3.5, of the constrained model 3.3. Slack variables can be added to

inequality constraints to comply with the form of .

ˆMin ' x Qx x binary (3.5)

On the other hand, the equivalent quadratic penalties for certain types of constraints are well

established, making the conversion from constrained to unconstrained model much easier. For

Page 49: Feature selection in credit scoring- a quadratic

36

example, the corresponding quadratic penalties for constraints is ( ), and Table 2

gives some other well established penalties.

Table 2

Penalty conversion.

Classical Constraint Equivalent Penalty

( )

( )

( )

( )

( )

A variety of procedures to solve UBQP problem have been reported. For example,

exhaustive method based on branch and bound is useful to find out optimal solutions for

problems of small or limited size (Boros, Hammer, Sun & Tavares, 2008). For problems of large

size, Tabu search based algorithms are among the most successful ones in solving UBQP

problems (Wang, Lü, Glover & Hao, 2012) . Tabu search algorithm provides solutions very close

to optimality and are among the most effective, if not the best, at tackling the difficult problems

at hand since classical methods often encounter great difficulty when facing the challenge of

solving hard optimization problem. These successes have made Tabu search extremely popular

among those interested in finding good solutions to large combinatorial problems. A

distinguishing feature of Tabu search is that it is based on the premise that problem solving must

incorporate adaptive memory and responsive exploration, allowing local search methods to

overcome local optima and enhance the performance by using memory structures that describe

the visited solutions or user-provided sets of rules.

Page 50: Feature selection in credit scoring- a quadratic

37

In more details, let X be a feasible solution set. Each has an associated neighborhood

( ) , and each solution ( ) is reached from by an operation called a move. Tabu

search is an iterative method and begins in the same way as ordinary local or neighborhood

search. The general steps of an iterative procedure start with (1) choosing , and then (2)

find ( ) such that ( ) ( ), (3) is the local optimum (minimal), denoted with , if

no such can be found and the method stops, (4) otherwise, designate to be the new and go

to step (2) (Glover & Laguna, 1997). However, algorithms for ordinary local or neighborhood

search often face the risk of being trapped in local optimal instead of global optimal. Tabu

neighborhood search method has been designed to avoid being trapped in a local minimum or

maximum by using memory structure which can be considered as modifying the neighborhood

( ) of the current solution . The modified neighborhood denoted by ( ) may be expanded

to include solutions not ordinarily found in ( ). Therefore, Tabu search can be viewed as a

dynamic neighborhood method where the neighborhood of is not a static set, but rather a set

that can change according to the history of the search (Glover, 1986, 1989, 1990; Glover &

Laguna, 1997). The dynamic neighborhood is achieved via the memory strategy. The selected

attributes that occur in solutions recently visited are labeled ‘Tabu-active’, and solutions that

contain Tabu-active elements are those become Tabu. Current moves are taken in Tabu list

which record the Tabu-active attributes and identify their current status. These Tabu-active

attributes are prohibited in the following number of iteratives that defined as Tabu tenure. This

prevents certain solutions from the recent past from belonging to ( ) and hence from being

revisited, and consequently exploiting solutions in a modified neighborhood.

The first adaptive memory Tabu search algorithm was used to solve UBQP by Glover,

Kochenberger, and Alidaee (1998), and then more Tabu search strategies and algorithm

Page 51: Feature selection in credit scoring- a quadratic

38

improvements have been presented. Palubeckis (2004) proposed five different multistart Tabu

search strategies for the large size of unconstrained binary quadratic optimization problem and

has achieved very good results. Palubeckis (2006) later on further improves these results by

iterated Tabu search algorithm. More recently, Glover, Lü, and Hao (2010) presented a

diversification-driven Tabu search (D2TS) algorithm that alternates between a basic Tabu search

procedure and a memory-based perturbation strategy guided by a long-term memory. Lü, Glover,

and Hao (2010) proposed a hybrid metaheuristic approach (HMA) by incorporating a Tabu

search procedure into the framework of evolutionary algorithms.

Finally, Wang, Lü et al. (2012) proposed an improved algorithm called path relinking which

are composed of a reference set initialization method, an improvement method by Tabu search, a

reference set update method, a relinking method and a path solution selection method. It has

been demonstrated that the algorithm improves both solution quality and computational

efficiency. Therefore, the problem in our paper is then solved iteratively for different

values of α with this path relinking algorithm, an improved Tabu search algorithm. It has been

demonstrated that the algorithm improves both the quality of solution and the efficiency of

computation. Therefore, the problem in our paper is then solved iteratively for different

values of α with this path relinking algorithm. In this study, three parameters need to be set in

path relinking algorithm. The first one is the time limit for the whole path relinking algorithm to

be stopped. For the current problem, 1 second is sufficient. The second parameter is the stop

condition of each Tabu search procedure in the path relinking algorithm. We choose 5000, which

means the Tabu search will stop when the current best known result has not been improved

within the last 5000 iterations. The last parameter is for the Tabu tenure, which is the number of

Tabu-active attributes that are prohibited to be visited in the following iterations. Considering the

Page 52: Feature selection in credit scoring- a quadratic

39

size of problems used in this study, 5 to 7 is a reasonable range for this parameter. The bisection

method and its connection with path relinking algorithm are realized by Python, and 10-fold

cross validation is run in MATLAB.

As mentioned earlier, α is the weight to balance between the number of informative factors

and the degree of collinearity. When α , the first term, α∑ | | , in the objective

function is zero, and the objective function becomes to maximize ( α) ∑ (| |

| |) . The maximal solution for this objective function is 0 which means all

assuming . According to the definition for , no more than one variable can be

selected to form the subset model. In this case the selected subset of variable with size 1 should

be the variable having the largest correlation coefficient with response . On the other extreme,

when α , the second term, ( α) ∑ (| | |

|) , in the objective function is

zero, and the objective function is to maximize ∑ | | assuming

. Apparently, the

maximal can be obtained by setting all which indicates to select all variables. It is found

that the number of variables selected by the model changes from 1 to as α increases from 0 to 1

in this study.

We present a bisection method to identify αs that gives different sizes of optimal subsets of

variables. α [ ] is divided into equally. The number of αs gives solutions with

many of them duplicated since a certain range of α gives the same solution according to our

experiments. Thus these duplicated solutions can be eliminated (see Appendix A for examples of

solutions for Model 3.1 when α [ ] is divided into ). When is extremely large, we can

use parallel computing. α [ ] is first divided into

where each of the interval can be then

Page 53: Feature selection in credit scoring- a quadratic

40

partitioned into where is smaller than , and also each of the interval can be run

simultaneously. For example, if we divided into where (with interval equals

) and set the time limit to 1 second for Tabu search algorithm, then 32768 seconds

(almost four days) is needed to reach the solutions for the UBQP problem. However, if we first

divided into

where , then partition the 10 intervals of 0.1 into where (with

interval equals , and run them at the same time, it will only take 4096 seconds which

is approximate 11 hours with an even smaller interval. In most cases, at least one solution for

each size of subset is retained and these solutions will be evaluated by 10-fold cross validation

SVM to find out the satisfactory subsets. Fig. 13 gives a flowchart of the proposed BMTS+SVM

method.

Fig. 13. Flowchart of the BMTS+SVM method

BMTS

Accuracy

Evaluation Testing

Accuracy

Training data Testing data

Classifier

(SVM)

Classifier

Testing

Input

Feature

s Binary Quadratic

Programming Model

Subsets with

Different Sizes

Satisfactory

Subset(s)

Filter

Wrapper

Page 54: Feature selection in credit scoring- a quadratic

41

The two stages for solving the current problem shown as follow:

Stage 1: Divided Alpha into equally and for each Alpha, do the following

Step 1. Solve model 3.1 with Tabu search algorithm (path relinking).

Step 2. If th solution = ( )th solution, replace ( )th solution with th solution,

otherwise keep both solutions.

Step 3. Evaluate the features of subsets corresponding to the solutions with 10-fold cross

validation SVM.

Stage 2: Choose the satisfactory subsets based on both the size of the subset and the OCAR.

3.3 SVM Classifier

There are a number of classification techniques. Among conventional statistical methods,

logistic regression and discriminant analysis are most widely used (Baesens, Gestel, Viaene,

Stepanova, Suykens & Vanthienen, 2003; Šušteršič et al., 2009). However, due to the possible

complex nonlinear relationship between variables, they are reported to have a lack of accuracy.

There are also more sophisticated methods known as artificial intelligence such as fuzzy systems,

neural networks, genetic Programming (GP) and SVM that achieve better performance than

traditional statistical methods in classification task (Baesens et al., 2003; Desai, Crook &

Overstreet, 1996; Lee & Chen, 2005; Lee et al., 2002; West, 2000). Recently, SVM has received

considerable attention in the machine learning literature, and it is appreciated because of its

strong theoretical foundation adaptive generalization ability, and appealing and stable predictive

performance (Lessmann & Voß, 2009). In addition, compared with other artificial intelligence

techniques, only two free parameters, namely penalty parameter C and the kernel function

parameters such as the gamma ( ) for the radial basis function (RBF) kernel, are needed for

SVM and it guarantees unique, optimal and global solution since the training of an SVM is done

by solving a linearly constrained quadratic problem (Shin, Lee & Kim, 2005). Due to these

advantages and its popularity, SVM is used as the classifier in this study.

Page 55: Feature selection in credit scoring- a quadratic

42

3.4 Cross Validation

Cross-validation is a model validation technique in estimating how well the model,

developed on training data, will perform on a future unknown data set. In -fold cross-validation,

the original data set is randomly partitioned into equally sized subsets of samples (folds).

Subsequently iterations of training and validation are performed such that within each iteration

a different fold of the data is retained as the validation data for testing the model while the

remaining folds are used as training data. The results from the folds can then be

averaged to produce a single estimation.

Fig. 14 gives an example of -fold with . The original data set is randomly divided into

five equally sized subsets, to . The estimation of parameters for the model, in each

experiment, is based on the four unshaded training datasets, while predictive performance such

as classification accuracy is obtained from the shaded validating dataset. Finally, the overall

classification accuracy is computed with the average accuracy rate from the five experiments.

Fig. 14. An example of the cross validation

Experiment 3

Experiment 4

Experiment 5

Experiment 2

Experiment 1

S1 S4 S3 S2 S5

Page 56: Feature selection in credit scoring- a quadratic

43

We use 10-fold cross-validation along with the SVM in this study. In data mining and

machine learning, 10-fold cross-validation is the most common, and it was found by Kohavi

(1995) that 10-fold cross validation was among the best model selection method since it

provided less biased estimation of the accuracy after compared with several other approaches to

estimate accuracy.

Page 57: Feature selection in credit scoring- a quadratic

44

CHAPTER IV

EXPERIMENT RESULTS AND COMPARISON ANALYSIS

4.1 Validation of the Method on Two Benchmark Datasets

The real world datasets, the Australian and German credit datasets, are used to test the

validity of the hybrid model. Both datasets are available from the UCI Repository of Machine

Learning Databases. It consists of 307 ‘good’ applicants and 383 ‘bad’ applicants whose credits

are not creditworthy in the Australian dataset. Each applicant contain a class attributes and 14

features, including 6 nominal and 8 numeric attributes. Despite the fact that all attribute names

and values in this dataset have been changed to symbols to protect confidentiality of the data,

this dataset is interesting and valid for the model testing purpose because there is a good mix of

attributes including continuous, nominal with small numbers of values, and nominal with larger

numbers of values. The original German dataset contain a class attribute and 20

categorical/symbolic attributes including status of existing account, credit history, duration in

month, purpose, credit amount, savings account/bonds, present employment, personal status and

sex, installment rate in percentage of disposable income, other debtors/guarantors, present

residence, property, age in years, other installments plans, housing, number of existing credits at

this bank, job, dependents, telephone, and foreign worker. It contains 700 instances of

creditworthy applicants and 300 bad applicants. An edited German dataset, used in this study, is

also available at UCI machine learning database with 24 numerical attributes. Several indicator

variables have been added, and categorical attributes have been coded as integer. Description

about the datasets is shown in Table 3. These two datasets have been studied by many

researchers and the Australian dataset, especially, contains a good mixture of attributes making it

interesting for research purpose.

Page 58: Feature selection in credit scoring- a quadratic

45

Table 3

Statistic description for Australian and German datasets.

Country No. of

Attributes

Nominal

features

Numeric

features

No. of

classes

Sample

size

Good

credit

Bad

credit

Australia 14 6 8 2 690 307 383

German 24 0 24 2 1000 700 300

The first step for the credit scoring problem in this study is to establish subsets of features

with full dataset by model 3.1. A bisection method is now imposed on . It is divided into

where in this case. can be set to a larger value as needed. The greater value of , the

smaller interval of , and thus more solutions, which is determined by the number of , will be

provided based on model 3.1. The complete results are shown in Table 4 and Table 5 which give

subsets of features in different sizes selected with BMTS for the Australian and German datasets

respectively.

Table 4

Complete subsets of features associated with given for Australian dataset. Subset of Variables Selected No. of Features OCAR

0 1 55.51%

0.060546875 2 85.51%

0.3046875 2 85.80%

0.400390625 3 85.65%

0.419921875 3 85.80%

0.5078125 4 85.94%

0.59765625 5 86.96%

0.703125 5 87.54%

0.71875 5 87.25%

0.75 6 86.52%

0.779296875 7 87.68%

0.802734375 7 86.52%

0.826171875 8 87.97%

0.830078125 8 86.81%

0.837890625 9 86.81%

0.86328125 10 87.39%

0.880859375 11 87.25%

0.8984375 11 86.96%

0.90234375 12 87.39%

0.955078125 13 87.39%

1 14 87.10%

Page 59: Feature selection in credit scoring- a quadratic

Table 5

Complete subsets of features associated with given for German dataset.

Subset of Variables Selected No. of

Variables OCAR

0 1 70.0%

0.120239258 3 70.6%

0.422912598 3 73.5%

0.422973633 4 71.9%

0.531921387 4 73.4%

0.601623535 4 75.4%

0.620391846 5 75.4%

0.658599854 5 75.8%

0.684204102 6 75.1%

0.694030762 7 74.5%

0.711578369 7 75.8%

0.723571777 8 76.5%

0.753265381 9 77.4%

0.839416504 10 76.8%

0.861663818 11 76.8%

0.875 11 77.1%

0.876800537 11 77.1%

0.886505127 12 77.5%

0.894195557 13 77.3%

0.9112854 14 77.2%

0.930633545 15 77.2%

0.946868896 16 77.3%

0.960479736 17 77.6%

0.976745605 18 77.4%

0.985107422 19 77.5%

0.985168457 19 77.3%

0.991271973 20 77.5%

0.994476318 21 77.5%

0.997283936 22 77.6%

0.99822998 23 77.2%

1 24 78.3%

46

Page 60: Feature selection in credit scoring- a quadratic

47

There are 21 subsets of variables selected out of possible subset of variables for the

Australian dataset, and 31 subsets of variables selected out of possible subset of variables for

the case of Germany. As mentioned earlier, determines the size of subset and the number of

variables selected by the model changes from 1 to as increases from 0 to 1. Fig. 15 shows the

relationship between and the size of the two cases.

Fig. 15. Relationship between number of features and

4.2 Results and Comparison Analysis

After establishing subsets of variables, the next step is to find out the subsets with the

satisfactory OCAR via the SVM classifier. An advantage of this subset selection method is to

provide options for different sizes of subsets so that we can select a satisfactory solution by

considering both the number of features in the subset and the accuracy comprehensively. For

example, we may consider sacrificing some degree of accuracy for a smaller size of the subset

1

6

11

16

21

0 0.2 0.4 0.6 0.8 1

Nu

mb

er

of

feat

ure

s

α

German dataset

Australian dataset

Page 61: Feature selection in credit scoring- a quadratic

48

when the cost of collecting the data of variables is high. Table 6 gives the OCAR derived from

10-fold cross validation SVM based on selected subsets for the Australian dataset.

Table 6

OCAR for Australian case and comparison.

Reference Method No. of

Features OCAR

Proposed method

BMTS

+

SVM

5 87.54%

5 87.25%

7 87.68%

8 87.97%

Chen and Li (2010)

LDA+SVM

7

86.52%

DT+SVM 7 86.29%

RST+SVM 7 85.22%

Fscore+SVM 7 85.10%

Huang et al. (2007)

Grid+Fscore+SVM

7.6

84.20%

GA+SVM 7.3 86.90%

The accuracy rates of two size 5, one size 7 and one size 8 subsets are reported as satisfactory

solutions (See Table 4 and Table 5 for all accuracy rates of different sizes). We compare these

accuracy rates with the results from Chen and Li (2010) and Huang et al. (2007) since these two

studies also used 10-fold cross validation SVM to evaluate the accuracy rate based on selected

subsets. As we can see, the accuracy rate for the subset of size 7 from our method is 87.68%, and

it is higher than the rates from all other methods in the two studies. The subset of size 8 gives the

highest accuracy rate, 87.97%. Nevertheless, two subsets of size 5 provide satisfactory accuracy

rates with fewer variables.

Unfortunately, Chen and Li (2010) and Huang et al. (2007) did not provide which variables

are included in their selected subsets, and thus, we can only conclude that our method, overall,

improves the accuracy rate at this stage. In order to provide evidence that the subsets of features

selected by BMTS performs competitively well in accuracy prediction, we compare the accuracy

Page 62: Feature selection in credit scoring- a quadratic

49

rates based on BMTS selected subsets with the rates based on subsets selected by forward

selection (FS), backward elimination (BE), and stepwise selection.

Forward selection starts with the intercept term only and at each step it chooses a variable to

be added if the F-statistic of this variable exceeds a cut-off value (a pre-determined critical F-

value, say ). The first variable, , which has the largest simple correlation with the response

variable is selected to be entered. Now an F-test is carried out for to check whether the

hypothesis that the coefficient of is zero ( ) can be rejected. If the F-statistic exceeds

the , The variable is entered. is usually computed by , where is the confidence

level, e.g. . and are the number of observations and terms, including the variable

and the intercept, in the current subset model. As changes, the cut-off value changes as well.

Unlike the first variable, the second variable chosen for entry is the one that has the largest

partial correlation with response variable given is also in the model. Partial correlation aimed

at finding correlation between two variables while taking away the effects of another variable, or

several other variables, on this relationship. If the F-statistic is greater than associated with the

current step, then the second variable is also included in the model. The procedure repeated until

the partial F-statistic at a particular step does not exceed or when there is no more candidate

variable to be added.

Stepwise regression algorithm is developed by Efroymson and has been widely used for

multiple regression calculations. It is an extension of forward selection. In forward selection,

variables will never be dropped once they enter in the model. In stepwise regression, however,

the variable entered at the previous step is reassessed after inclusion of a new variable.

Comparing with , the cut-off value for dropping the variable is usually

making it relatively more difficult to add a variable than to delete one. Therefore,

Page 63: Feature selection in credit scoring- a quadratic

50

while the procedure for selecting the first variable is the same in forward selection as it is in

stepwise regression, there is one more step when processing the second variable. That is when

the second variable is entered into the model, we need to assess the previous entered variable,

. If the partial F-statistic for is now less than , would be dropped. Once the variable is

dropped, it cannot be used anymore and we will select the next variable from the remaining

candidate variables. Again, the procedure is terminated when the partial F-statistic does not

exceed or when the last candidate variable is added to the model.

The two sequential procedures described above start with no variable in the model and add

one variable at each step. Backward elimination works in the opposite way. It begins with the

full model of all variables and deletes one variable at a time. First, the partial F-statistic for

each variable is computed. The smallest of the partial F-statistic is selected and compared with

where in this first step. If the partial F-statistic for the selected variable

is less than , this variable is dropped. Backward elimination algorithm stops when the smallest

partial F-statistic is greater than the cutoff value . These three feature selection methods are

classic and are available for most commercial packages.

We also compare our results with the results in two other studies, Wang, Hedar et al. (2012)

and Gönen, Gönen, and Gürgen (2012), which provide the features in their selected subsets.

Under this circumstance, each subset can be evaluated in the same experimental condition where

accuracy rates are all derived from a 10-fold cross validation SVM. The results are shown in

Table 7.

The accuracy rate given by the subset of size 7 from the presented method, 87.68%, is

slightly higher than that given by the subset of size 7 in Wang, Hedar et al. (2012), which is

87.39%. The highest accuracy rate from Gönen et al. (2012) is 87.97% given by the subset of

Page 64: Feature selection in credit scoring- a quadratic

51

size 9, while BMTS method gives the same accuracy rate with subset of size 8. The accuracy

rates based on the subsets selected by the classic methods are much lower than the results from

our method.

Table 7

OCAR of selected subsets for Australian case and comparison.

Reference Method Features in selected subsets No. of

Features OCAR

Proposed

method

BMTS

+

SVM

5 87.54%

5 87.25%

7 87.68%

8 87.97%

Wang, Hedar

et al. (2012)

RSFS 7 87.39%

Gönen et al.

(2012)

PNS/PGNS 11 87.39%

PS 9 87.97%

PGS 7 87.10%

MKLNS/MKLGNS 12 87.39%

MKLS/MKLGS 9 86.23%

Classic

FS+SVM

7

78.70%

BE+SVM 9 78.99%

Stepwise+SVM 7 78.99%

BMTS method performs well in the case of the Australian dataset. Not only is the accuracy

rate higher but also the number of variables is fewer when comparing with the results from other

studies and classic methods. We also provide the comparison of accuracy rates, shown in Table

8, between our method and the rates in Chen and Li (2010) and Huang et al. (2007) for the

German dataset. While the accuracy rates in this study are higher than the rates from all other

methods in Chen and Li (2010), there is no significant differences between our results and those

in Huang et al. (2007) except that the number of features selected in the subsets is fewer in the

presented method. In addition, we compare the OCAR based on our selected subset with the rate

based on subset selected by the classic methods, and the results are given in Table 9. The

Page 65: Feature selection in credit scoring- a quadratic

52

accuracy rates are very close to each other except that one of the subsets given by BMTS method

contains 9 variables which is fewer than all the subsets given by classic methods.

Table 8

OCAR for German case and comparison.

Reference Method No. of

Features OCAR

Proposed method BMTS+ SVM

9 77.40%

12

77.50%

Chen and Li (2010)

LDA+SVM 12 76.10%

DT+SVM 12 73.70%

RST+SVM 12 75.60%

Fscore+SVM

12

76.70%

Huang et al. (2007) Grid+Fscore+SVM 20.4 77.50%

GA+SVM 13.3 77.92%

Table 9 OCAR of selected subsets for German case and comparison.

Reference Method Features in selected subsets No. of

Features OCAR

Proposed

method

BMTS+SVM 9 77.40%

12

77.50%

Classic

FS+SVM 11 77.50%

BE+SVM 14 77.40%

Stepwise+SVM 12 76.90%

In sum, BMTS method has the superiority in terms of accuracy rate and the number of

selected features. Moreover, it provides flexibility in that a tradeoff between OCAR and the size

of subset is available. This is very useful when the cost associated with data collection is high.

Take the Australian dataset for example, the highest accuracy rate, 87.97%, is associated with the

subset with size 8 whereas two subsets with size 5 give the accuracy rates of 87.54% and 87.25%

which are only slightly lower than the highest one. When the cost of data collection is high,

selecting subsets with size 5 exceeds the benefit from selecting the one with highest accuracy

Page 66: Feature selection in credit scoring- a quadratic

53

rate because by doing so, it reduces both number of features and the cost of data collection with

no significant difference in OCAR.

Page 67: Feature selection in credit scoring- a quadratic

54

CHAPTER V

APPLICATION OF THE CREDIT SCORING AT CORPORATE LEVEL

5.1 Reviews of Applications of Credit Scoring at Corporate Level

In this chapter, we apply the proposed method, (BMTS+SVM), for credit scoring problem at

corporate level. Credit scoring at this level comprises the assessment of risk, such as the

probability of default, bankruptcy or fraud, associated with lending to an organization (Paleologo

et al., 2010).

A well-known application of corporate credit scoring is bankruptcy prediction or

classification. An early study by Altman (1968) used financial ratios and discriminant analysis

for corporate bankruptcy prediction. The 22 potential financial ratios were grouped into five ratio

categories, namely profitability, liquidity, solvency, activity, and leverage ratios. By following

four criteria including statistical significance of alternative functions, inter-correlations between

the relevant variables, predictive accuracy, and expert opinion, Altman (1968) selected 5 features

(financial ratios) to predict corporate bankruptcy with discriminant analysis. Frydman et al.

(1985) analyzed financial distress of firms with 20 financial ratios, and they introduced recursive

partitioning algorithm (RPA), a nonparametric technique based on pattern recognition, to

improve the classification accuracy.

Based on previous studies and inter-correlation, Leshno and Spector (1996) filtered 29

financial parameters from 70 ratios, and evaluated the prediction capability of various neural

network models which were differed in terms of data span, number of iterations, and neural

network architecture. McKee and Lensberg (2002) used rough sets model to identify variables

that are important for the prediction, and developed a structural model of bankruptcy solved with

genetic programming algorithm. Ryu and Yue (2005) used simple feature reduction techniques

Page 68: Feature selection in credit scoring- a quadratic

55

such as stepwise discriminant analysis, sequential elimination, and mutual information based

feature selection to choose features from 23 financial ratios, and they introduced a linear

programming technique called isotonic separation to separate bankrupt and non-bankrupt firms.

Shin et al. (2005) selected 52 variables out of more than 250 financial ratios using independent-

samples t-test in the first stage. They further selected 10 variables by MDA stepwise method, and

evaluated the predictive performance of bankruptcy with SVM.

Min and Lee (2008) employed Data Envelopment Analysis (DEA) for bankruptcy prediction.

57 features were classified into categories of profitability, growth, productivity, liquidity,

activity, and cost structure, and six final financial ratios were chosen by using factor analysis and

judgment of the experts. DEA score, ranged from 0 to 100, were reported to specify the financial

performance. While the best firms have DEA score of 100, a firm with lower DEA score is

considered to be relatively worse than other firms, and thus has higher probability of bankruptcy.

Etemadi et al. (2009) selected 5 financial ratios out of the 43 candidate ratios with discriminant

stepwise procedure. Prediction of corporate bankruptcy was then conducted by using a genetic

programming model. Min and Jeong (2009) identified 9 variables from 27 financial ratios based

on various feature selection methods such as independent sample t-test, discriminant analysis,

logistic regression, and decision trees. They proposed a binary classification method, solved with

genetic approach, to classify observation firms into bankrupt and non-bankrupt according to the

distance between a representative firm and observation firms.

Olson, Delen, and Meng (2012) illustrated their preference of using decision trees to predict

corporate failure. They argued that decision trees could provide models with transparency and

transportability as well as accurate. Fedorova, Gilenko, and Dovzhenko (2013) first selected 75

financial ratios from 98 ratios with ANOVA test, and then applied different combinations of

Page 69: Feature selection in credit scoring- a quadratic

56

learning algorithm, including multiple discriminant analysis, logit regression, classification and

regression trees, to identify final financial ratios. These ratios were evaluated by two types of

artificial neural networks to derive the classification accuracy rate for the bankruptcy prediction.

Table 10 lists financial ratios in some of the aforementioned studies. A review of bankruptcy

prediction in banks and firms via statistical and intelligent techniques by Kumar and Ravi (2007)

was another good source that provided lists of financial ratios used for bankruptcy studies.

Table 10

Financial ratios in bankruptcy prediction literatures. Authors (Year) Financial Ratios Sample

Ratios

Altman (1968) Working capital/Total assets; retained earnings/Total assets; EBIT/Total

assets; Market value equity/Book value of debt; Sales/Total assets

1:1

Frydman et al.

(1985)

Cash/Total assets; Cash/total sales; Cash flow/total debt; Current

assets/Current Liabilities; Current assets/total assets; Current assets/total

sales; EBIT/total assets; Log (interest Coverage + 15); Log (total assets);

Market value of equity/total capitalization; Net income/total assets; Quick

assets/total assets; Quick assets/current liabilities; Quick assets/sales;

Retained earnings/total assets; Standard deviation of (EBIT/total assets);

Total debt/total assets; Total sales/total assets; Working capital/total assets;

Working capital/total sales

2.5:1

Leshno and Spector

(1996)

Working capital/total sales; Retained earnings/total assets; Earning before

income tax/total assets; Market value/total liabilities; Sales/ total assets;

EBIT per share; Cash flow per share; Cost of goods sold/sales; Capital

expenditures per share; Sales/cash; Receivables turnover; Inventory

turnover; ROE; ROI; Investments/assets/ Long term debt/total liabilities;

Debt/equity; Long term debt/equity; Quick ratio; price/earnings ratio;

Dividend yield; Total debt/total assets; Quick assets/sales; Sales/total

capital; Log (total assets); Interest coverage; Log (interest coverage);

Earning/5 years maturity; Cash flow/total debt; Working capital/long term

debt; Working capital/cash expenses; Book equity/total capital; Market

equity/total capital; Average market equity/total capital; StDv (log

(EBIT/total assets)); Sales/gross fixed assets; Sales/receivables; ROA; Total

debt/invested capital; Current ratio; Worth/total debt; Net income/total debt;

Operating income/sales; EBIT/total tangible assets; Net available for capital

/total capital; Sales/total tangible assets; EBIT/sales; Current liabilities/total

liabilities; Net available for total capital/sales; Fixed charge coverage; Cash

flow/Fixed charges; earning/total debt; retaining earning/tangible assets;

Capital lease/total assets

1:1

McKee and

Lensberg (2002)

General & Administration expense/net sales; Net income/net worth; Current

assets/current liabilities; Liabilities/total assets; Net worth/net fixed assets;

Working capital/net worth; Net income/total assets; Cash/current liabilities;

Investment cash flow/net income

1:1

Page 70: Feature selection in credit scoring- a quadratic

57

Table 10: Financial ratios in bankruptcy prediction literatures (continued)

Authors (Year) Financial Ratios Sample

Ratios

Ryu and Yue

(2005)

Cash flow/total assets; cash/sales; cash flow/total debt; current assets/current

liabilities; current assets/total assets; current assets/sales; EBIT/total assets;

Retained earnings/total assets; Net income/total assets; Total dent/total assets;

Sales/total assets; Working capital/total assets; Working capital/sales; Quick

assets/total assets; Quick assets/current liabilities; Quick assets/sales; Market

value of equity/total capitalization; Cash/current liabilities; Current

liabilities/equity; Inventory/sales; Equity/sales; Market value of equity/total

debt; Net income/total capitalization

1:1

Shin et al. (2005) Total asset growth; Contribution margin; Operating income to total asset; Fixed

asset to sales; Owner’s equity to total asset; Net asset to total asset; Net loan

dependence rate; Operating asset constitute ratio

1:1

Etemadi et al.

(2009)

EBIT/total assets; Long term debt/Shareholders’ equity; Retained

earnings/stock capital; Market value of equity/total liabilities; Market value

equity/shareholders’ equity; Market value equity/total assets; Cash/total assets;

Total liabilities/total assets; Current liabilities/shareholders’ equity; Current

liabilities/total liabilities; (Cash + short term investments)/current liabilities;

(Receivables + inventory)/total assets; Receivables/sales;

Receivables/inventory; Shareholders’ equity/total liabilities; Shareholders’

equity/total assets; Current assets/current liabilities; Quick assets/current

liabilities; Quick assets/total assets; Fixed assets/(shareholders’ equity + long

term debt); Fixed assets/total assets; Current assets/total assets; Cash/current

liabilities; Interest expenses/gross profit; Sales/cash; Sales/total assets; Working

capital/total assets; paid in capital/shareholders’ equity; Sales/working capital;

Retained earnings/total assets; Net income/shareholders’ equity; Net income/

sales; Net income/total assets; Operational income/sales; Operational

income/total assets; EBIT/interest expenses; EBIT/sales; Gross profit/sales;

Sales/shareholders’ equity; Sales/fixed assets; Sales/current assets

1:1

Min and Jeong

(2009)

Gross value added/sales; Gross value added/total assets; Growth rate of total

assets; Ordinary income/sales; Net; Income/sales; Operating income/sales;

Costs of sales/sales; Net interest expenses/sales; Ordinary income/total assets;

Rate of earnings on total capital; Net working capital/total assets; Current

liabilities/total assets; Stockholders’; equity/total assets; Total borrowings and

bonds payable/total assets; Total assets turnover; Ordinary income/total assets;

Net working capital/sales; Stockholders’ equity/sales; Ordinary income/total

assets; Depreciation expenses; Operating assets turnover; Interest expenses/total

expenses; Net interest expenses; Break-even point ratio; Employment costs;

Interest expenses and net income/total assets; Earnings before interest and

taxes/sales

1:1

Fedorova et al.

(2013)

Cash flow/total liabilities; Cash flow/equity; Cash flow/total sales; Cash

flow/total assets; Cash flow/equity; Cash flow/current liabilities; Cash flow/total

assets; Cash flow/total sales; Cash flow/current liabilities; Gross profit/total

sales; Gross profit/total assets; EBT/total liabilities; Profit on sales/total sales;

Profit on sales/total assets; Net income/total liabilities; EBT/total sales;

EBT/total assets; Profit on sales/current liabilities; Gross profit/cost of goods

sold; Profit on sales/equity; Net profit/current liabilities; Profit on sales/cost of

goods sold; Gross profit/total liabilities; Gross profit/current liabilities;

EBT/cost of goods sold; Gross profit/equity; Profit on sales/total liabilities; Net

profit/cost of goods sold; Sales/fixed assets; Sales/equity; (Cost of goods sold -

depreciation)/accounts payable; Sales/current assets; Sales/total liabilities; (Cost

6:1

Page 71: Feature selection in credit scoring- a quadratic

58

Table 10: Financial ratios in bankruptcy prediction literatures (continued)

Authors (Year) Financial Ratios Sample

Ratios

of goods sold - depreciation)/inventories; Sales/(cash + invested funds);

Sales/current liabilities; Sales/(cash + invested funds + accounts receivable);

Sales/accounts receivable; Sales/working capital; Cost of goods sold/finished

goods; Cash/current liabilities; Short-term accounts receivable/accounts

payable; (Cash + invested funds)/(costs/365); (Equity - fixed assets)/current

assets; Quick assets/(costs/365); Quick assets/total assets; Long-term

liabilities/equity; Cash/total assets; Quick assets/current assets; Current

assets/total liabilities; Cash/current assets; Short-term liabilities/total liabilities;

Current assets/total assets; Revenue reserves/equity; Long-term liabilities/fixed

assets; (Cash + invested funds)/total assets; Revenue reserves/total assets; Long-

term liabilities/total liabilities; (Equity + long - term liabilities)/total assets;

Revenue reserves/total liabilities; Current liabilities/total liabilities; Working

capital/inventories; Long-term liabilities/total assets; Accounts payable/total

liabilities; Retained earnings/equity; Fixed assets/total assets; Accounts

payable/accounts receivable; Log (tangible total assets); Debt/total assets; Profit

before tax/current liabilities; Working capital/total debt; Equity/total liabilities;

Working capital/total assets; Log (EBIT)/interest Net profit/costs; Retained

earnings/total assets; EBT/equity Current liabilities/(cash + invested funds);

Sales/total assets; EBIT/total assets; Total assets/sales; Cash flow/total debt;

No-credit interval; Current liabilities/total assets; Net profit/equity

5.2 A Study of Credit Scoring for the U.S. and Chinese Companies

Likewise, this study tries to use the proposed method as a tool to identify key financial

factors from a pool of financial ratios. These selected key factors are considered to have best

discriminating power in classifying companies into two groups. We initiates with 40 features

(financial ratios) for companies from the U.S. and China respectively. The features are grouped

into 7 categories, and all the companies are classified into either creditworthy companies

(CWCs) or less creditworthy companies (LCWCs) according to some criteria. Aside from the

goal of identifying key financial categories and features, the study in this chapter also discuss the

predictive performance and evaluation of the classification models.

The 7 financial categories are in line with the financial ratio categories provided by GTA

database, and are categories with cash flow ratios, profitability ratios, liquidity ratios, solvency

ratio, shareholders’ profitability ratios, operating ratios, and leverage ratios. GTA database is a

Page 72: Feature selection in credit scoring- a quadratic

59

leading global provider of China financial market, industries and economic data. It also provides

financial analytics, financial education and related value-added services to financial institutions,

(e.g. Morgan Stanley Composite Index (MSCI), the China Securities Regulatory Commission,

etc.) business schools, (e.g. Wharton Business School, Harvard Business School, and University

of Chicago, etc.) and individual investors.

Table 11 Financial ratios for the U.S. and Chinese companies.

Category Description of Features

Cash Flow Ratios

(5)

X1: OANCF/LCT

X4: OANCF/NI

X2: OANCF/DT

X5: OANCF/CSHI

X3: CH /SALE

Profitability Ratios

(9)

X6: NI/SALE

X9: NI/AT

X11: TXT/GP

X13: COGS/SALE

X7: EBIT/SALE

X10: NI/TEQ (C)

X12: GP/XT (C)

X14: FEXP/SALE (C)

X8: EBIT/AT

X10: NI/(AT-LT) (U)

X12: GP/(SALE-EBIT) (U)

X14: EXP/SALE (U)

Liquidity Ratios (4)

X15: (ACT-INVT)/LCT

X18: WACP/AT

X16: ACT/LCT

X17: (ACT-LCT)/ACT

Solvency Ratio (8)

X19: LT/AT

X21: LT/CEQ

X24: CEQ/AT

X20 :(NI+TXT+FEXP)/FEXP(C)

X22: MKVALT/LT

X25: LCT/LT

X20: EBIT/EXP(U)

X23: DT/AT

X26: DLTT/LT

Shareholder’s

Profitability Ratios

(5)

X27: PRCC/EPSPX

X29: SALE/CSHI

X28: PRCC/NAVPS(C)

X30: MKVALT/AT

X28: PRCC/(AT-LT)(U)

X31: PRCC*/AT

Operating Ratios (7)

X32: SALE/RECT

X34: (SALE-GP)/LCT(U)

X36: SALE/(AT-ACT)(U)

X33: INVT/SALE

X35: (ACT-INVT)/SALE

X37: SALE/AT

X34: OPC/PAYT(C)

X36: SALE/FA

X38: SALE/CEQ

Leverage Ratios (2)

X39: EBIT/FEXP(C)

X39: EBIT/(EBIT-EXP)(U)

X40: GP/EBIT

ACT: Total Current Assets; AT: Total Assets; CEQ: Total Common/Ordinary Equity; CH: Cash; COGS: Cost of Goods Sold;

CSHI: Common Shares Issued; DLC: Total Debt in Current Liabilities; DLTT: Total Long-Term Debt; DT: Total Debt

(DT=DLC+DLTT); EBIT: Earnings Before Interest and Taxes; EPSPX: Earnings Per Share (Basic) Excluding Extraordinary

Items; EXP: Expense (EXP=EBIT-NI-TXT); FA: Fixed Assets; FEXP: Financial Expense; GP: Gross Profit (Loss); INVT:

Total Inventories; LCT: Total Current Liabilities; LT: Total Liabilities; MKVALT: Total Market Value; NAVPS: Net Asset

Value per Share; NI: Net Income (Loss); OANCF: Operating Activities Net Cash Flow; OPC: Operating Costs; PAYT: Total

Payables; RECT: Total Receivables; PRCC: Price Close; SALE: Sales; TEQ: Stockholders’ Equity; TXT: Income Taxes;

WCAP: Working Capital; XT: Total Expense

Note: Due to availability of the data, the same feature is slightly different between the U.S. and Chinese companies in some

cases. The letters “U” and “C” in the parentheses indicate the ratios for the U.S. and China respectively.

Page 73: Feature selection in credit scoring- a quadratic

60

The 40 financial ratios for Chinese companies are selected from the 7 categories from GTA

database. The 40 financial ratios for the U.S. companies are computed with the financial

indicators collected from COMPUSTAT database to match with the ratios in the Chinese case.

These financial ratios are also grouped into the same 7 categories. A brief introduction of the 7

financial categories is given as follow and the 40 financial ratios are shown in Table 11. A

statistical description about the U.S. and Chinese datasets for each of the 40 financial ratios are

given in Appendix B and Appendix C respectively. The normality for each features are tested

with Skewness and Kurtosis. While Skewness is a measure of symmetry, Kurtosis is a measure

of whether the data are peaked or flat relative to a normal distribution. Both Skewness and

Kurtosis are 0 indicate a normal distribution. The descriptions show that there is no feature with

normal distribution for both datasets.

1. The category with cash flow ratios is used to determine companies’ ability of generating

cash in their operating activity. This category is important because companies can make

themselves look profitable by manipulating with the magic of accounting and non-cash

transactions such as sales on credit, but in fact are at a financial risk if they generate little

cash from these profits. Therefore, ratios in this category give us a better understanding at

the financial health and performance of companies. OANCF/NI ratio, for example,

compares companies’ operating activity net cash flow to their net income giving us an

idea about how much cash they can generate from the net income, and how much amount

of cash they have to cover obligations.

2. The category with profitability ratios explains how well companies employed their

resources in generating profit. Companies with higher gross profit margins or returns on

capital have better chance to survive in the economy downturn than those have razor-thin

Page 74: Feature selection in credit scoring- a quadratic

61

margins or returns on capital. NI/SALE, for instance, measures the return earned on

companies’ capital relative to each dollar of sales. Another widely used ratio is NI/AT

which refers to the return on assets (ROA) ratio. It illustrates how well management

utilized the company's total assets to make profits.

3. The category with liquidity ratios reflects companies’ ability to meet their short-terms

debts obligations. Generally, higher value of these ratios indicate larger margin of safety

to pay for short-term debts. In contrast, a company with low coverage of liquid assets to

short-term debts may have difficulty to run its operations, as well as meet its obligations.

Two common liquidity ratios are ACT/LCT (current ratio) which measures companies’

ability to meet their current liabilities with their current assets such as cash, accounts

receivable and inventories, and (ACT-INVT)/LCT (quick ratio) which measures

companies’ ability to pay their short-term obligations with their most liquid assets.

4. The category with solvency ratios reflects companies’ capacity to meet their long-term

financial commitments. The higher companies’ solvency ratio is, the lower the

probability that they will default on their long term debt obligations. An example of

solvency ratio is DT/AT which measures what percentage of companies’ assets is

financed with debt. A higher ratio indicates a greater financial risk for these companies to

pay off their long term obligations.

5. The category with shareholders’ profitability ratios is considered part of profitability

financial ratios but focus more on companies’ ability in generating profit with

shareholders’ equity. PRCC/NAVPS, the ratio of price close to net asset value per share,

is used to capture this ability.

Page 75: Feature selection in credit scoring- a quadratic

62

6. The category with operating ratios shows the efficiency of management and companies

operations in using their capital. SALE/AT ratio (total asset turnover), for instance,

measures companies’ ability of using their assets in generating sales revenue. Another

example of operating ratio is SALE/RECT (account receivable turnover) which measures

the effectiveness of companies in extending credit and collecting debts. Companies

should reassess their credit policies when this ratio is low in order to ensure the timely

collection of imparted credit that is not earning interest for the companies.

7. The category with leverage financial ratios shows the percentage of a company’s capital

structure that is made up on debt or liabilities owed to external parties. The financial

leverage ratio indicates the extent to which the business relies on debt financing.

EBIT/(EBIT-Interest Expense) is an example of this ratio. In addition, the operating

leverage of a business is the ratio of the change in EBIT to the change in sales. The

computation of this ratio can be expressed as GP/EBIT

The classification of creditworthy and less creditworthy for U.S. and Chinese companies are

based on Standard & Poor’s COMPUSTAT credit rating and ST classification respectively. The

classification standard for the U.S. companies is based on one of the big three credit-rating

agencies, Standard & Poor’s (S&P) credit ratings. The data can also be obtained from

COMPUSTAT. These ratings are the S&P’s opinion about the ability and willingness of issuers,

e.g. corporations, to meet their financial obligations in full and on time. Also their ratings reflect

the credit quality and the probability that the debt may default. Although S&P’s stated that

ratings opinions were not intended as guarantees of credit quality or as exact measures of the

likelihood that a particular issuer or particular debt issue will default, their studies on defaults

indicated a strong correlation between ratings and default frequencies. Generally the higher the

Page 76: Feature selection in credit scoring- a quadratic

63

rating is, the lower the frequency of default, and vice versa. In addition, one of S&P’s studies

have shown that issuers rated ‘B+’ or lower accounted for 61% of defaults, over all 7-year

intervals between 1981 and 2010.

In more specific, the dichotomic classification standard regarding to the U.S. companies in

our study rests on the S&P’s long-term credit ratings which are divided into several categories

ranging from ‘AAA’, indicating the strongest credit quality, to ‘D’ or ‘SD’, indicating the lowest

credit quality. Long-term ratings from ‘AA’ to ‘CCC’ may be modified by the additional sign of

plus or minus to show relative standing within the major rating categories. The definitions of all

the ratings are shown in Appendix D.

This well known long term credit ratings from the S&P are utilized for classification standard

where companies with obligation rated B and above are classified into creditworthy group, It is

believed that the obligors still have the capacity to meet their financial commitment on the

obligation with an obligation rated ‘B’, though is more vulnerable to default than obligations

rated with B above. According to the description of the S&P’s credit rating, companies with

obligation rated ‘CCC’ are vulnerable to nonpayment, and in the event of adverse business,

financial, or economic conditions, the obligors are unlikely to have the capacity to meet their

financial commitment on the obligation. Therefore, it is reasonable to consider the U.S.

companies with obligation rated B and above as creditworthy group, denoted with 0 while

corporations with obligation rated B below, starting from ‘CCC’, are classified into less

creditworthy group, denoted with 1. For the U.S. companies, we use the credit rating, which is

quarterly based, at the last quarter of year , and the financial ratios corresponding to the credit

ratings are obtained from year .

Page 77: Feature selection in credit scoring- a quadratic

64

For Chinese companies, ST classification standard is used where ST stands for Special

Treatment. The original idea behind this ST classification is to warn investors to be cautious

about the companies labeled ST due to their abnormal financial conditions according to the rules

issued by China Securities Regulatory Commission (CSRC), a ministry-level unit directly under

the State Council. ST companies usually face the problem of low profitability and higher default

risk on their debts, and thus can be considered as less creditworthy companies (Lü & Zhao, 2004;

Xiong, 2013). Therefore, it is reasonable to use ST and non-ST as classification standard to

group Chinese companies into creditworthy, denoted with 0, and less creditworthy companies,

denoted with 1. Now suppose a company is announced as ST at the year . The financial ratios

used for classification corresponding to this ST or non-ST status are obtained from year .

The reason is that according to the disclosure policy of Chinese listing companies, the

announcement for a company to be ST at year is mainly based on the financial performance of

year , and thus using financial ratios from year to predict the ST status at year will

raise the problem of overestimating the predictive power of a model. Consequently, we use the

ST status of a company at year while the matching financial ratios are derived from year of

. For example, the financial ratios for a company from the year of 2006 will be used to

predict its status of ST or non-ST at the year of 2008. The status of ST or non-ST for a company

is derived from GTA database as well.

We initiate with 40 features, numeric financial ratios, for both the U.S. and Chinese

companies. All of the companies are from nonfinancial sector. The U.S. dataset includes 238

corporations and 297 observations from 1999 to 2011. Chinese dataset contains 593 corporations

and 900 observations from 1998 to 2010. The descriptions of the U.S. and Chinese datasets are

given in Table 12. The ratio for the number of CWCs to LCWCs is set to 2:1. There is no

Page 78: Feature selection in credit scoring- a quadratic

65

conclusive evidence to show what this ratio should be as we can see from the studies listed in

Table 10. While most studies used 1:1, some other studies used different ratios, such as 6:1 and

2.5:1. In practical, however, the number of companies with obligation rated B or above is more

than that of below B in the United States. Also, the number of the non-ST companies is more

than the number of the ST companies in China. In order to reflect this reality as well as avoiding

the problem of extremely unbalance sample size between the two classes, we set the ratio to 2:1.

Table 12

Description of the U.S. and Chinese datasets.

Country No. of

Features

Features

Property

No. of

Classes

No. of

Companies

Sample

Size CWCs LCWCs

U.S. 40 Numeric 2 238 297 198 99

China 40 Numeric 2 593 900 600 300

In sum, a study on credit scoring problem between the U.S. companies and Chinese

companies is conducted in this section. The classification standard of CWCs and LCWCs is

based on S&P’s credit rating for the U.S. case and ST for Chinese case. Forty financial ratios

which are categorized into seven groups act as the initial features from which satisfactory

features of subsets will be selected by the proposed method. The flow chart in Fig. 16 depicts the

structure of the study in this chapter.

We firstly try to provide some insights about the application of the proposed feature selection

method in identifying the key factors in terms of financial categories and ratios that provide best

discriminating power to distinguish CWCs from LCWCs in both countries. Secondly, the

predictive performance, in terms of OCAR, of different classifiers, namely SVM, logistic

regression, discriminant analysis, decision tree, and neural networks, on the best subsets among

the satisfactory subsets are evaluated

Page 79: Feature selection in credit scoring- a quadratic

66

Fig. 16. Structure of credit scoring study at corporate level

Following the same procedure in the case of Australian and German credit scoring problem,

we obtain three satisfactory subsets of features providing the top three subsets with highest

overall classification accuracy rate (OCAR) as well as satisfying outcomes regarding to the size

of subsets (the complete results for all subsets are given in Appendix E and Appendix F). The

results of features in selected subsets, number of features, and OCAR for the U.S. and China

cases are reported in Table 13 and Table 14 respectively.

The results are encouraging. Not only the predictive performance in terms of OCAR for the

three satisfactory subsets selected by the proposed method is better, but also the sizes of the

subsets are far fewer than full model which includes all 40 features. In the case of the U.S., the

Proposed subset

selection method

Proposed subset

selection method

40 Features (Financial ratios)

Data source: GTA

Classification standard: ST

Satisfactory subsets of

features

Performance of

different classifiers

on the best subset

Key factors for

classification in

China

Satisfactory subsets of

features

Performance of

different classifiers

on the best subset

Key factors for

classification in

U.S.

40 Features (Financial ratios)

Data source: COMPUSTAT

Classification standard:

S&P’s credit rating

China U.S.

Page 80: Feature selection in credit scoring- a quadratic

67

OCARs provided by the three satisfactory subsets with size 3 to 5 are 6.73% to 7.74% higher

than the rates derived from the full model. In the Chinese case, the OCARs provided by the three

satisfactory subsets with size 3 are 3.67% to 4.11% higher than the rates derived from the full

model.

Table 13

OCAR for the U.S. dataset and comparison.

Method Features in selected subsets No. of

Features OCAR

All + SVM 40 69.36%

BMTS +SVM 3 77.10%

4 76.77%

5 76.09%

FS+SVM 6 75.08%

BE+SVM 12 72.39%

STEPWISE+SVM 4 71.38%

Table 14 OCAR for Chinese dataset and comparison.

Method Features in selected subsets No. of

Features OCAR

All + SVM 40 69.00%

BMTS +SVM 3 73.11%

3 72.67%

3 72.67%

FS+SVM 8 71.11%

BE+SVM 38 69.78%

STEPWISE+SVM 11 71.22%

We also compare the OCARs based on our selected subsets with the rates based on subsets

selected by the classic feature selection methods which are forward selection, backward

elimination, and stepwise selection. The results give supportive evidence that the subsets selected

with BMTS method give better predictive performance and smaller size of the subsets in both

cases.

Page 81: Feature selection in credit scoring- a quadratic

68

The best subset selected by proposed method, in the U.S. case has three features which are

from profitability ratios, from solvency ratios, and from leverage ratios. The second

and third best subsets which also provide small sizes of subsets and OCARs very close to the

best one include features and from cash flow ratios, and from profitability ratios,

and and from solvency ratios. Therefore, we conclude that profitability, solvency, cash

flow, and leverage ratios are four key financial categories while features

are the most representative features that provide best

discriminating power to differentiate between CWCs and LCWCs in the United States.

This conclusion can be supported by the S&P’s criteria publications where they provided 8

key industrial financial ratios which are EBIT interest coverage, EBITDA interest coverage, long

term debt to capital, funds from operations (FFO) to total debt, free operating cash flow (FOCF)

to total debt, return on capital, operating income to sales, and total debt to capital. The long term

debt to capital belongs to solvency ratios reflecting companies’ capacity to meet their long-term

financial commitments. FFO to total debt, FOCF to total debt, and EBITDA interest coverage

pertain to cash flow ratios revealing companies’ ability of generating cash in their operating

activity. Return on capital, operating income to sales, and EBIT interest coverage are

profitability ratios explaining how well companies employed their resources in generating profit.

Finally, total debt to capital is considered as leverage ratios showing the percentage of a

company’s capital structure that is made up on debt.

Table 15 list 9 financial ratios selected with the proposed method and their corresponding

categories, and 8 financial ratios and their corresponding categories provided by S&P’s

publication. We conduct the comparison on the level of financial categories rather than

individual financial ratios because different financial ratios are used between this study and S&P.

Page 82: Feature selection in credit scoring- a quadratic

69

The result is supportive in that the 9 selected financial ratios from the proposed method and 8

financial ratios from S&P can be attributed to the same four categories which are cash flow,

profitability, solvency and leverage ratios. Therefore, the consistency in matching all the

financial categories between the categories derived from our method and ones provided by a

widely recognized credit rating agency, S&P, provide evidence that the proposed method can be

applied to identify key factors so that on one hand, financial institutions are able to gain better

understanding about the credit status of their applicants by focusing on these key factors. On the

other hand, companies that attempt to borrow money from financial institutions are able to attain

clear vision on what are the most important factors for being considered a creditworthy

company, and what they need to improve to increase the chance of receiving loans.

Table 15

Comparison of financial ratios between the U.S. dataset and S&P. Categories Financial ratios Categories Financial ratios

BMTS+SVM S&P

Cash flow ratios X4: OANCF/NI Cash flow ratios FFO/Total debt

X5: OANCF/CSHI FOCF/Total debt

EBITDA interest coverage

Profitability ratios X7: EBIT/SALE Profitability ratios Return on capital

X10: NI/(AT-LT) Operating income/Sales

X13: COGS/SALE

EBIT interest coverage

Solvency ratios X19: LT/AT Solvency ratios Long term debt/Capital

X21: LT/CEQ

X24: CEQ/AT

Leverage ratios X39: EBIT/(EBIT-EXP) Leverage ratios Total debt/Capital

The supportive evidence from the case of the U.S. motivates us to apply the proposed method

in identifying the key factors to differentiate between CWCs and LCWCs in China. The seven

financial ratios and their corresponding five categories for Chinese case are listed in Table 16.

Page 83: Feature selection in credit scoring- a quadratic

70

The results show that the four categories of cash flow, profitability, solvency, and leverage

ratios are key financial categories in predicting the classification of CWCs and LCWCs in both

countries. However, category with operating ratios is an additional useful category to separate

CWCs from LCWCs in China. An additional support for the identified categories as the key

financial categories is that if we select one key financial ratio from each of the key financial

ratios, the combination of gives an even higher OCAR, 78.79%, in the U.S. case,

and gives a higher OCAR of 73.56% in Chinese case.

Table 16 Comparison of financial ratios between the U.S. and Chinese dataset.

Categories Financial ratios Categories Financial ratios

U.S. China

Cash flow ratios X4: OANCF/NI Cash flow ratios X1: OANCF/LCT

X5: OANCF/CSHI

Profitability ratios X7: EBIT/SALE Profitability ratios X8: EBIT/AT

X10: NI/(AT-LT) X14: FEXP/SALE

X13: COGS/SALE

Solvency ratios X19: LT/AT Solvency ratios X25: LCT/LT

X21: LT/CEQ X26: DLTT/LT

X24: CEQ/AT

Operating ratios X37: SALE/AT

Leverage ratios X39: EBIT/(EBIT-EXP) Leverage ratios X39: EBIT/FEXP

This result reveals that the gap of operating capacity between CWCs and LCWCs in the U.S.

is not as significant as the gap in China. We further use ANOVA, which is used to determine

whether there are any significant differences between the means of two or more independent

groups, to verify this statement. In this case, we are expecting to see that the difference between

the means of CWCs and LCWCs for features in operating ratios from the U.S. is not significant

whereas it is significant in the case of China. The results are shown in Table 17 and Table 18.

Page 84: Feature selection in credit scoring- a quadratic

71

The result is consistent with our expectation. While none of the ANOVA result for the seven

features in operating ratios is significant in the U.S., the ANOVA results for the five features out

of seven in operating ratios are significant in China. The P value for the feature, , selected by

our proposed method is which is a very significant result.

Table 17

ANOVA for features in operating ratios from the U.S. dataset.

Features Sum of Squares df Mean Square F Sig.

X32 Between Groups

2031.462 1 2031.462 1.597 .207

X33 Between Groups

.009 1 .009 .767 .382

X34 Between Groups

.322 1 .322 .051 .821

X35 Between Groups

.047 1 .047 .335 .563

X36 Between Groups

5.707 1 5.707 .492 .483

X37 Between Groups

1.383 1 1.383 1.863 .173

X38 Between Groups 119202.068 1 119202.068 2.735 .099

Table 18

ANOVA for features in operating ratios from Chinese dataset. Features Sum of Squares df Mean Square F Sig.

X32 Between Groups

26561.069 1 26561.069 19.948 .000

X33 Between Groups

3.968 1 3.968 8.946 .003

X34 Between Groups

1547.700 1 1547.700 .651 .420

X35 Between Groups

35.167 1 35.167 37.316 .000

X36 Between Groups

407.082 1 407.082 3.330 .068

X37 Between Groups

5.177 1 5.177 26.290 .000

X38 Between Groups 12.277 1 12.277 3.939 .047

A possible explanation of this finding that category with operating ratios is a key financial

category in China but not in the U.S. is due to the different conditions and capacity in obtaining

Page 85: Feature selection in credit scoring- a quadratic

72

financial sources to repay their debts. In the category of operating ratios, almost all the financial

ratios are related to sales. Since China is still an emerging country, companies do not have so

much resource and access to commercial finance as the companies in the United States. In China,

sales revenue is the most important source of finance for a company to pay for the debts, whereas

the U.S. companies have more source of finance to raise funds to repay their debts other than

rely merely on the sales revenue. Therefore, category with operating ratios plays more important

role in differentiating between CWCs and LCWCs in China than in the United States.

To summarize, based on the data collected for the U.S. dataset, profitability, solvency, cash

flow, and leverage ratios are four key financial categories, and 9 out of 40 features, namely

OANCF/NI, OANCF/CSHI, EBIT/SALE, NI/(AT-LT), COGS/SALE, LT/AT, LT/CEQ,

CEQ/AT, and EBIT/(EBIT-EXP) are most useful financial ratios in their corresponding financial

categories that can effectively differentiate between CWCs and LCWCs in the U.S. case.

Similarly, Chinese case has the same four categories plus a financial category with operating

ratios as key financial categories, and 7 out of 40 financial ratios which are OANCF/LCT,

EBIT/AT, FEXP/SALE, LCT/LT, DLTT/LT, EBIT/FEXP, and SALE/AT in Chinese case are

the most representative features in their corresponding categories. The application of the findings

is twofold. On one hand, managers of financial institutions can pay more attention to the ratios in

the key financial categories especially the most representative ratios selected with our proposed

method so that they are able to gain better understanding about the credit status of their

applicants before making any further decisions. Managers should also be aware that key financial

categories may vary for different countries On the other hand, companies that attempt to borrow

money from financial institutions are able to attain clear vision on what are the most important

Page 86: Feature selection in credit scoring- a quadratic

73

financial factors for being considered a creditworthy company, and what improvement are

needed immediately to increase the chance of receiving loans.

5.3 Model Predictive Performance and Evaluation

In this section, predictive performance of the models in classifying companies into either

CWCs or LCWCs is measured with overall classification accuracy rate (OCAR) and cost of

misclassification. The OCAR is computed based on the best subsets among the three satisfactory

subsets, namely , and for the U.S. companies, and , and for Chinese

companies. In addition, we compare the performance of five models using different classifiers

including SVM, logistic regression (LR), discriminant analysis (DA), decision trees (DTs), and

neural networks (NN). We use SPSS to run LR, DA, and DT where DT is in a form of

classification and regression trees (CARTs). SVM and NN are performed in MATLAB. Finally,

the impact of cutoff value on classification and the cost of misclassification associated with Type

I and Type II errors are also discussed. Cutoff values are important since on one hand, whether a

company is classified into one class instead of the other relies on this cutoff value in most

classification techniques. On the other hand, in statistics, Type I and Type II errors which are

used to compute misclassification cost depend on the cutoff value as well. Type I error is the

incorrect rejection of a true null hypothesis when it is in fact true, and it is a false positive. Type

II error is the failure to reject a false null hypothesis when in fact the alternate hypothesis is true,

and it is a false negative. In this particular credit scoring problem, Type I error refers to a CWC

is misclassified as a LCWC, and Type II error refers to a LCWC is misclassified as a CWC.

For the purpose of analyzing the predictive performance of the models, we randomly select

training data and testing data for each year from 1998 to 2012 for the two cases with 244

observations of training data and 53 observations of testing data for the U.S. dataset, and 787

Page 87: Feature selection in credit scoring- a quadratic

74

observations of training data and 113 observations of testing data for Chinese dataset. The ratio

of the number of CWCs to LCWCs for both the training and testing data is again set to 2:1. Table

19 gives a description on these two datasets.

Table 19

Description of training and testing data for the U.S. and Chinese datasets.

Country No. of

Features

Features

Property

No. of

Classes

No. of

Companies

Sample

Size

Training

Sample

Testing

Sample

U.S. 3 Numeric 2 238 297 244 53

China 3 Numeric 2 593 900 787 113

In a standard procedure, training data are used to determinate parameters for a model, and

validating data are used to test the performance of the model. Table 20 summarizes the results of

SVM, LR, DA, DT, and NN for which all the cutoff values are set as 0.5 for the two cases. The

model with SVM achieves the highest OCAR for both the U.S. and China cases.

Table 20

OCAR of five classifiers for the U.S. and Chinese datasets.

SVM Logistic LDA DT NN

U.S.

OCAR 73.58% 67.92% 66.04% 67.92% 67.92%

Type 1: 5.88% 5.88% 5.88% 11.77% 5.88%

Type 2: 57.89% 78.95% 84.21% 68.42% 78.95%

China

OCAR 71.68% 67.26% 68.14% 69.91% 67.25%

Type 1 9.59% 10.96% 6.85% 28.77% 13.70%

Type 2 62.50% 72.50% 77.50% 32.50% 67.50% The cut value is 0.5 for both the U.S. and Chinese cases

The results show that the model with SVM achieves the highest OCAR for both the U.S and

Chinese cases, and the OCARs are not significantly different from each other for the other four

models. However, Type II error is extremely high for all models, which is undesired in most of

situations in that the cost of Type II error is usually much higher than Type I error.

Page 88: Feature selection in credit scoring- a quadratic

75

5.4 ROC Curve

A weakness of the above analyses on OCAR is that the selection of cutoff value directly

affects the accuracy of the classification. In this case, fixing the cutoff value to 0.5 can be

arbitrary. Receiver Operating Characteristic (ROC) is introduced to overcome this weakness.

Given a binary classification problem in which the outcomes are labeled either as positive or

negative, there are four combination outcomes. If the actual value is positive and it is classified

as positive, then it is called a true positive; if it is classified as negative, it is called a false

negative. Conversely, if the actual value is negative and it is classified as negative, it is said to be

a true negative; if it is classified as positive, it is called a false positive. The four outcomes is

formulated in a 2×2 contingency table as follows

Positive condition Negative condition

Positive test

outcome True Positive

False Positive

(Type I error)

(1-sensitivity)

Negative test

outcome

False negative

(Type II error)

(1-Specificity)

True negative

ROC graphs are two-dimensional graphs which illustrates the performance of a binary

classifier system by plotting the true positive rate (sensitivity) against the false positive rate (1-

specificity) for the different possible cutoff value points of a diagnostic test (Fawcett, 2006). The

formulae for computing sensitivity and specificity are given below.

Positives correctly classifiedSensitivity

Total positives

Page 89: Feature selection in credit scoring- a quadratic

76

True negativeSpecificity

False positives + True negatives

The empirical method for creating an ROC plot is to plot pairs of sensitivity versus (1-

specificity) at all possible values for the decision threshold. Accuracy is measured by the area

under the ROC curve (referred as AUC). The AUC is an overall summary of diagnostic

accuracy, and the diagonal line is the ROC curve corresponding to random chance. An AUC of 1

represents a perfect test; an AUC of 0.5 represents a worthless test as ROC curve corresponds to

random chance. On rare occasions, the estimated AUC is less than 0.5, indicating that the test

does worse than chance. In other words, the closer the curve follows the left hand border and the

top border of the ROC space, the more accurate the test is (Lasko, Bhagwat, Zou & Ohno-

Machado, 2005; Zou, Resnic, Talos, Goldberg-Zimring, Bhagwat, Haker, Kikinis, Jolesz &

Ohno-Machado, 2005). Values of AUC are reported in Table 21.

Table 21 AUC of different classifiers for the U.S. and Chinese datasets.

SVM LR DA DT NN

U.S. .811

.741 .737 .599 .732

China .766 .758 .760 .741 .768

The test result variable(s): DT, NN has at least one tie between the positive

actual state group and the negative actual state group

The results indicate that the model using SVM provides highest AUC, 0.811, in the U.S.

case, and thus has the best overall summary of diagnostic accuracy. However, there is no

significant difference between the model using SVM and other models using different classifiers

for the Chinese dataset. Fig. 17 and Fig. 18 show the ROC curves for the 5 models using

different classifiers for the U.S. and Chinese cases respectively.

Page 90: Feature selection in credit scoring- a quadratic

77

Fig. 17. ROC for the U.S. dataset

Fig. 18. ROC for Chinese dataset

Page 91: Feature selection in credit scoring- a quadratic

78

Since sensitivity or true positive rate measures the proportion of positives correctly classified

whereas specificity or true negative rate measures the proportion of negatives correctly

classified, a cutoff point corresponding to a maximized sum of sensitivity and specificity gives

the highest OCAR. Lists of pairs of sensitivity and 1-specificity for each classifier are given in

Appendix G and Appendix H.

The results with new cutoff values and their corresponding OCAR are shown in Table 22. All

the OCARs for both case increase when new cutoff values are applied. In the U.S. case, when the

cutoff value changed from 0.5 to 0.3062, the OCAR increases from 73.58% to 81.13% for SVM.

Though the Type I error increases by 5.88%, the Type II error drop by 26.31%. The situation is

similar for Chinese case, when the cutoff value changed from 0.5 to 0.252, the overall

classification accuracy rate increases from 71.68% to 73.45% for SVM. The Type I error

increases by 21.92%, and Type II error drop by 45%. Models using other classifiers exhibit the

similar change. The reason we are interested in reporting Type I and Type II errors is discussed

in the next section of misclassification cost.

Table 22

OCAR in new cutoff values of five classifiers for the U.S. and Chinese datasets.

SVM Logistic LDA DT NN

U.S.

Cutoff 0.3062 0.2925 0.2819 0.4395 0.2992

OCAR 81.13% 73.58% 73.58% 67.92% 73.58%

Type 1 11.76% 23.53% 23.53% 11.77% 23.53%

Type 2 31.58% 31.58% 31.58% 68.42% 31.58%

China

Cutoff 0.2520 0.3611 0.3560 0.3761 0.3963

OCAR 73.45% 74.34% 74.34% 79.80% 75.22%

Type 1 31.51% 26.03% 24.66% 32.88% 27.40%

Type 2 17.50% 25.00% 27.50% 22.50% 20.00%

Page 92: Feature selection in credit scoring- a quadratic

79

5.5 Misclassification Cost

Though the overall classification accuracy rate is an important criterion in evaluating the

predictive performance of a credit scoring model, misclassification cost is an effective and

relatively more comprehensive way to assess a model (West, 2000). Here, we employ the

following equation 5.1 from Lee and Chen (2005) to compute the expected misclassification cost

for the five models in each case

Min (1) (2 1) (2 1) (2) (1 2) (1 2)EC P P C P P C (5.1)

where EC is the expected cost of misclassification. ( ) and ( ) are prior probabilities of

creditworthy and less creditworthy populations. ( | ) and ( | ) indicates the probability of

making Type I error and Type II error. For example, the probabilities of making Type I errors

and Type II errors for the model using SVM in the U.S. case is 0.0588 and 0.5789 as shown in

Table 20 . ( | ) and ( | ) are the corresponding cost of Type I and Type II errors. As we can

see from equation 5.1, on one hand, the cost of making Type I and Type II errors are associated

with the cost of misclassification. This is because that the cost of Type II error is usually much

higher than that of the Type I error. For example, a bank may lose the interest revenue from a

loan since it rejects the loan application from a creditworthy customer (Type I error). However, it

may experience a huge lose from a default or even fraud if the bank accept the application and

provide loan to a bad credit customer who is misclassified as a good credit customer (Type II

error). Consequently, Type II error is usually more undesired than Type I error. On the other

hand, prior probabilities of creditworthy and less creditworthy populations also affect the

misclassification cost. For instance, if ( ) is significantly greater than ( ) but the cost of

Page 93: Feature selection in credit scoring- a quadratic

80

making Type II error is not large enough than making Type I error, higher Type I error is more

undesired under this circumstance in minimizing misclassification cost.

Based on the data collected from Standard & Poor’s COMPUSTAT between 1990 and 2012,

there are 38749 companies with credit rating of B or above while 1352 companies with credit

rating below B, and thus the prior probabilities in the case of the U.S. can be set to ( )

and ( ) . According to the data in 2012, the prior probabilities in the case of Chinese

dataset is ( ) and ( ) as the number of ST companies to non-ST

companies in China is 180 to 2284 at that year.

Table 23

Misclassification cost for the U.S. and Chinese datasets. U.S. China

Model Relative

cost ratio n

Type I

error

Type II

error

EC Type I

error

Type II

error

EC

SVM 1 0.0588 0.5789 0.076483 0.0959 0.625 0.134524

5 0.0588 0.5789 0.155214 0.0959 0.625 0.317024

10

0.0588 0.5789 0.253627 0.0959 0.625 0.545149

LR 1 0.0588 0.7895 0.083644 0.1096 0.725 0.154524

5 0.0588 0.7895 0.191016 0.1096 0.725 0.366224

10

0.0588

0.7895 0.325231 0.1096 0.725 0.630849

DA 1 0.0588 0.8421 0.085432 0.0685 0.775 0.120075

5 0.0588 0.8421 0.199958 0.0685 0.775 0.346375

10

0.0588

0.8421 0.343115 0.0685 0.775 0.62925

DT 1 0.1177 0.6842 0.136961 0.2877 0.325 0.290423

5 0.1177 0.6842 0.230012 0.2877 0.325 0.385323

10

0.1177

0.6842 0.346326 0.2877 0.325 0.503948

NN 1 0.0588 0.7895 0.083644 0.137 0.675 0.176274

5 0.0588 0.7895 0.191016 0.137 0.675 0.373374

10 0.0588 0.7895 0.325231 0.137 0.675 0.619749 The cutoff values are 0.5 for both the U.S. and Chinese cases

Though valid estimates of the costs for Type I and Type II errors is a challenging task and

may not be available in this study, relative cost ratio, between them can be applied to compute

the expected misclassification costs by assuming that misclassification cost of the Type II error is

Page 94: Feature selection in credit scoring- a quadratic

81

times greater than that of the Type I error since it is generally believed that the costs

associated with Type II error are greater than the costs associated with Type I error. Here, is

set to be 1, 5 and 10 respectively. The results are summarized in Table 23.

We can see that SVM has the best performance regarding to the minimum cost of expected

misclassification criterion in comparison with those of logistic regress, discriminant analysis,

decision tree, and neural networks in all three scenarios for the U.S. dataset. In the case of

Chinese dataset, the model using SVM obtains minimum cost of expected misclassification

criterion when , while model with discriminant analysis has the best performance

following by SVM when . Decision tree has the best performance following by SVM when

. Overall, the performance of SVM is stable and slightly better than other classifiers.

However, we cannot conclude that there is a classifier that is significantly better or worse than

the others.

5.6 Identification of Cutoff Value

Finally, let’s discuss how to identify the cutoff value that gives the minimum

misclassification cost. The definitions of the Type I error, II error, sensitivity, and specificity tell

us that Type I error is same as 1-sensitivity, and Type II error equals to 1-specificity. Therefore,

to find out minimal value in equation 5.1 is same as to find out maximal of the following

objective function 5.2.

Max: (1) sensitivity (2 1) (2) specificity (1 2) (1) (2 1) (2) (1 2)P C P C P C P C (5.2)

Since ( ) ( | ) and ( ) ( | ) are constants, the objective function can be reduced

to find out the maximized value for the objective functions 5.3 below.

Page 95: Feature selection in credit scoring- a quadratic

82

Max (1) sensitivity (2 1) (2) specificity (1 2)P C P C (5.3)

Using the U.S. case with SVM classifier as an example, Table 24 reports the results

computed from objective function 5.3, showing in the SS column, and equation 5.1, showing in

column EC. The maximal of the SS column is 1.007598 while the minimal of column EC is

0.128402. They both correspond to the same cutoff value 0.3571 demonstrating that the cutoff

value for the minimized misclassification cost can be found with objective function 5.3 where

sensitivity and specificity can be derived from ROC function in SPSS directly (shown in

Appendix G and Appendix H ). Compared with the cost of misclassification (0.155214) when the

cutoff value is 0.5, the new misclassification cost is 0.128402 when the cutoff value is set to

0.3571.

In sum, if we evaluate the model based on OCAR, and would like to find out cutoff value

that gives best overall classification accuracy empirically, a list of sensitivity and 1-specificity

provided by ROC function in SPSS, listed in Appendix G and Appendix H, can be directly used

by finding out the maximal of the sum of sensitivity and specificity in the list. The corresponding

cutoff value to this maximal gives the highest overall classification accuracy rate among all the

cutoff values in the list. If we evaluate the model with misclassification cost, the cutoff value that

gives the minimum misclassification cost can be attained by substituting the values of sensitivity

and specificity from Table 24 into objective function 5.3, and the corresponding cutoff value to

this maximized objective function gives the minimum cost of misclassification in Table 24 as

shown in bold.

Page 96: Feature selection in credit scoring- a quadratic

83

Table 24

Misclassification cost with new cutoff values for the U.S. dataset. Cutoff

Value

Sensitive Specificity SS EC Cutoff

Value

Sensitive Specificity SS EC

.000 0.0000 1.0000 0.1700 0.9660 .2202 0.6765 0.7895 0.7877 0.3483

.0787 0.0294 1.0000 0.1984 0.9376 .2229 0.7059 0.7895 0.8161 0.3199

.0853 0.0588 1.0000 0.2268 0.9092 .2302 0.7353 0.7895 0.8445 0.2915

.0918 0.0882 1.0000 0.2552 0.8808 .2454 0.7353 0.7368 0.8356 0.3004

.0994 0.1176 1.0000 0.2836 0.8524 .2547 0.7647 0.7368 0.8640 0.2720

.1126 0.1471 1.0000 0.3121 0.8239 .2564 0.7647 0.6842 0.8550 0.2810

.1231 0.1765 1.0000 0.3405 0.7955 .2609 0.7941 0.6842 0.8834 0.2526

.1266 0.2059 1.0000 0.3689 0.7671 .2676 0.8235 0.6842 0.9118 0.2242

.1305 0.2353 1.0000 0.3973 0.7387 .2841 0.8529 0.6842 0.9403 0.1957

.1359 0.2647 1.0000 0.4257 0.7103 .3062 0.8824 0.6842 0.9687 0.1673

.1437 0.2941 1.0000 0.4541 0.6819 .3187 0.8824 0.6316 0.9597 0.1763

.1488 0.3235 1.0000 0.4825 0.6535 .3299 0.8824 0.5789 0.9508 0.1852

.1506 0.3529 1.0000 0.5109 0.6251 .3446 0.9118 0.5789 0.9792 0.1568

.1530 0.3824 1.0000 0.5394 0.5966 .3571 0.9412 0.5789 1.0076 0.1284

.1583 0.3824 0.9474 0.5304 0.6056 .3765 0.9412 0.5263 0.9987 0.1374

.1624 0.3824 0.8947 0.5215 0.6145 .4270 0.9412 0.4737 0.9897 0.1463

.1645 0.4118 0.8947 0.5499 0.5861 .4815 0.9412 0.4211 0.9808 0.1552

.1696 0.4118 0.8421 0.5409 0.5951 .5037 0.9412 0.3684 0.9718 0.1642

.1748 0.4412 0.8421 0.5693 0.5667 .5081 0.9412 0.3158 0.9629 0.1731

.1834 0.4706 0.8421 0.5977 0.5383 .5117 0.9412 0.2632 0.9539 0.1821

.1928 0.5000 0.8421 0.6262 0.5098 .5229 0.9412 0.2105 0.9450 0.1910

.1970 0.5294 0.8421 0.6546 0.4814 .6365 0.9412 0.1579 0.9360 0.2000

.1998 0.5588 0.8421 0.6830 0.4530 .7626 0.9412 0.1053 0.9271 0.2089

.2010 0.5882 0.8421 0.7114 0.4246 .7941 0.9412 0.0526 0.9181 0.2179

.2013 0.5882 0.7895 0.7024 0.4336 .8269 0.9706 0.0526 0.9465 0.1895

.2062 0.6176 0.7895 0.7309 0.4051 .8815 0.9706 0.0000 0.9376 0.1984

.2143 0.6471 0.7895 0.7593 0.3767 1.0000 1.0000 0.0000 0.9660 0.1700

Page 97: Feature selection in credit scoring- a quadratic

84

CHAPTER VI

CONCLUSION AND DISCUSSION

6.1 Summary

Credit risk is one of the most important topics in the risk management. Meanwhile, it is the

major risk of banks and financial institutions encountered as claimed by the Basel capital accord.

As a form of credit risk measurement, credit scoring is an important decision process used in

many business areas. A main stream of building credit scoring models is to develop classification

models so that based on the analysis of the past performance of consumers, future credit

applicants can be classified into one of the predefined classes, according to the features that

describe demographic characteristics, economic or financial conditions of the applicants

However, with the rapid growth in credit industry and facilitation of collecting and storing

information due to the new technologies, a huge amount of information on customer is available

due to increasing number of irrelevant and/or redundant features in building credit scoring

models. How to select a subset of useful features from a pool of candidate features to establish an

effective classification model in credit scoring is a practical and challenging research topic.

Feature selection is therefore essential to handle irrelevant, redundant or misleading features in

order to improve predictive accuracy and reduce high complexity, intensive computation, and

instability for most of classification models.

In this dissertation, a hybrid model is developed to improve predictive accuracy and reduce

high complexity and intensive computation when a pool of candidate features present. It

combines advantages of filter and wrapper methods, and completes feature selection and

classification prediction in two phases. In the first phase, where a filter approach is applied, a

correlation coefficient based binary quadratic programming model is constructed for selecting

Page 98: Feature selection in credit scoring- a quadratic

85

subsets of features. The model is then solved with bisection method based on Tabu search

algorithm (BMTS) and provides optional subsets of features in different sizes. In the second

phase, where a wrapper approach is employed, the selected subsets of features are evaluated in

terms of OCAR with 10-fold cross validation SVM, and finally, satisfactory subsets used to build

credit scoring model are determined based on both the OCAR and the size of the subset.

The validity of the hybrid model is demonstrated by two benchmark datasets, and

experimental results on the Australian and German datasets show the effectiveness of the

proposed BMTS+SVM method which not only performs competitively well on OCAR but also

reduces the computational effort by the classifier and provides alternative options so that a

tradeoff between accuracy and the size of subset is available, bringing flexibility to the decision

making process.

This validated method is then used in an international business context to test the data on the

U.S. and Chinese companies in order to identify key factors in discriminating between CWCs

and LCWCs in these two countries. The most useful financial ratios and their corresponding

financial categories are first identified for the U.S. companies. The four categories are those with

profitability, solvency, cash flow, and leverage ratios, and are consistent with the four financial

categories provided by a widely recognized credit rating agency Standard & Poor. Similarly, we

found the same four financial categories for Chinese companies with an additional category with

operating ratios. Therefore, managers should be aware that key financial categories may vary for

different countries. Moreover, the application of the findings is twofold. On one hand, managers

of financial institutions can pay more attention to the ratios in the key financial categories

especially the most representative ratios selected with our proposed method so that they are able

to gain better understanding about the credit status of their applicants before making any further

Page 99: Feature selection in credit scoring- a quadratic

86

decisions. On the other hand, companies that attempt to borrow money from financial institutions

are able to attain clear vision on what are the most important financial factors for being

considered a creditworthy company, and what improvement are needed immediately to increase

the chance of receiving loans.

The performance of classification models (models using different classifiers) in terms of

OCAR and misclassification cost is evaluated based on the U.S. and Chinese datasets. Cutoff

values which gives highest OCAR and lowest misclassification cost is also discussed. The results

show that SVM has stable and slightly better overall performance. However, there is no strong

evidence showing that a particular classifier significantly outperforms the others.

6.2 Discussion and Future Research

For the proposed method per se, the computational effectiveness can be improved if critical

points of are available. Evidently, the time for finding out different sizes of subsets in phase

one depends on algorithms solving the quadratic programming model and to what extent is

partitioned. While Tabu search algorithms are efficient and among the most successful ones in

solving problems of large size, our future study in improving computational time and effort

based on this study lies on how efficient [ ] can be divided and identified for different

sizes. For example, if we divided into where and set the time limit to 1 second for

Tabu search algorithm, then 1024 seconds is needed to reach the solutions for the UBQP

problem. From the experiment of the two datasets, we know that a certain range of gives the

same solution, which means many of solutions are duplicated. However, if the critical point of

for different sizes can be identified, duplicate solutions will be avoided, thus saving a lot of

computational time.

Page 100: Feature selection in credit scoring- a quadratic

87

In addition, the BMTS method can be extent to meet the requirement if a particular number

of features is specified to be selected from the candidate features. We can set to a number of

different values between 0 and 1 at first step. For example, if subsets with size 5 from 40

candidate features are needed, we can set to 0.1, 0.2, 0.3, 0.4, 0.5, etc., and if the solution of

model 3.1 corresponding to gives a subset of size 3 while gives a subset of size

6. We will know that by adjusting between 0.3 and 0.4, the subsets of size 5 can be identified.

The method will be also tested to deal with real big data in the future. This improvement can

be done in twofold. On one hand, with the increasing number of candidate features, the subsets

of features identified by BMTS method for each size increases as well. To cope with the

increasing computational effort causing by the increasing number of subsets is a challenge task

in the future. On the other hand, SVM is used as the classifier in this study due to its strong

theoretical foundation, adaptive generalization ability, and appealing and stable predictive

performance. However, according to the results from comparing different classifiers in Chapter

5, there is no strong evidence showing that a particular classifier significantly outperforms the

others. Therefore, we can combine different classifiers with BMTS method in different cases.

For example, in big data with extremely large size of samples, SVM might not be the best choice

of classifier since a disadvantage of SVM is that it has high algorithmic complexity and

extensive memory requirements in large scale tasks (Yu, Miche, Sorjamaa, Guillen, Lendasse &

Severin, 2010).

What’s more, the BMTS+SVM method has been so far tested in scenarios that only two

classes presented good credit and bad credit or creditworthy companies and less creditworthy

companies. However, real world credit scoring problems often involve more groups. For

example, in the study of credit scoring at corporate level, companies in the U.S. dataset are

Page 101: Feature selection in credit scoring- a quadratic

88

classified into AAA, AA, A, BBB, C, and so on. Therefore, another improvement that can be

made in the future research is to test the performance of the proposed hybrid model for credit

scoring in a situation when three or more classes or groups presented.

Finally, in the studies of the U.S. and Chinese cases, the features used to predict the classification are all financial ratios. Future studies can include features other than financial ratios, such as the main activity of the business, the borrower’s business expertise, the status of the borrower’s economic sector and its position within that sector, the age of the business, the borrower’s sensitivity to economic and market developments, the business location, and even the structure of a company’s board. Macroeconomic features and industry-level features can also be included.


REFERENCES

Abdou, H. A. (2009). Genetic programming for credit scoring: The case of Egyptian public

sector banks. Expert Systems with Applications, 36(9), 11402-11417.

Abdou, H. A., & Pointon, J. (2011). Credit scoring, statistical techniques and evaluation criteria:

A review of the literature. Intelligent Systems in Accounting, Finance & Management,

18(2/3), 59-88.

Aidi, M. N., & Sari, R. I. (2012). Classification of debtor credit status and determination amount

of credit risk by using linier discriminant function. Paper presented at the AIP Conference

Proceedings.

Akay, M. F. (2009). Support vector machines combined with feature selection for breast cancer

diagnosis. Expert Systems with Applications, 36(2-2), 3240-3247.

Akkoç, S. (2012). An empirical comparison of conventional techniques, neural networks and the

three stage hybrid adaptive neuro fuzzy inference system (ANFIS) model for credit

scoring analysis: The case of Turkish credit card data. European Journal of Operational

Research, 222(1), 168-178.

Alam, P., Booth, D., Lee, K., & Thordarson, T. (2000). The use of fuzzy clustering algorithm

and self-organizing neural networks for identifying potentially failing banks: an

experimental study. Expert Systems with Applications, 18(3), 185-199.

Alfaro-Cid, E., Sharman, K., & Esparcia-Alcazar, A. I. (2007). A genetic programming approach

for bankruptcy prediction using a highly unbalanced database. In M. Giacobini, A.

Brabazon, S. Cagnoni, G. A. Di Caro, R. Drechsler, M. Farooq, A. Fink, E. Lutton, P.

Machado, S. Minner, M. O’ Neill, J. Romero, F. Rothlauf, G. Squillero, H. Takagi, A. S.

Uyar & S. Yang (Eds.), Applications of Evolutionary Computing, EvoWorkshops2007:


EvoCOMNET, EvoFIN, EvoIASP, EvoInteraction, EvoMUSART, EvoSTOC,

EvoTransLog (pp. 169-178). Valencia, Spain: Springer Verlag.

Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate

bankruptcy. The Journal of Finance, 23(4), 589-609.

Asada, T., Yun, Y., Nakayama, H., & Tanino, T. (2004). Pattern classification by goal

programming and support vector machines. Computational Management Science, 1(3-4),

211-230.

Baesens, B., Gestel, T. V., Viaene, S., Stepanova, M., Suykens, J., & Vanthienen, J. (2003).

Benchmarking state-of-the-art classification algorithms for credit scoring. The Journal of

the Operational Research Society, 54(6), 627-635.

Battiti, R. (1994). Using mutual information for selecting features in supervised neural net

learning. IEEE Transactions on Neural Networks, 5(4), 537-550.

Bell, T. B. (1997). Neural nets or the logit model? A comparison of each model’s ability to

predict commercial bank failures. Intelligent Systems in Accounting, Finance &

Management, 6(3), 249-264.

Bellotti, T., & Crook, J. (2009). Support vector machines for credit scoring and discovery of

significant features. Expert Systems with Applications, 36(2-2), 3302-3308.

Boros, E., Hammer, P. L., Sun, R., & Tavares, G. (2008). A max-flow approach to improved

lower bounds for quadratic unconstrained binary optimization (QUBO). Discrete

Optimization, 5(2), 501-529.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression

trees. Belmont, CA: Chapman and Hall.


Brill, J. (1998). The importance of credit scoring models in improving cash flow and collections.

Business Credit, 100(1), 16-17.

Camastra, F. (2007). A SVM-based cursive character recognizer. Pattern Recognition, 40(12),

3721-3727.

Camps, V. G., Mooij, J., & Scholkopf, B. (2010). Remote sensing feature selection by kernel

dependence measures. IEEE Geoscience and Remote Sensing Letters, 7(3), 587-591.

Chen, F. L., & Li, F. C. (2010). Combination of feature selection approaches with SVM in credit

scoring. Expert Systems with Applications, 37(7), 4902-4909.

Chen, W., Ma, C., & Ma, L. (2009). Mining the customer credit using hybrid support vector

machine technique. Expert Systems with Applications, 36(4), 7611-7616.

Chen, Q., Zhang, D., Wei, L., & Chen, H. (2007, March 1-April 5). A modified genetic

programming for behavior scoring problem. Paper presented at the IEEE Symposium on

Computational Intelligence and Data Mining. doi: 10.1109/CIDM.2007.368921

Chiang, L. H., & Pell, R. J. (2004). Genetic algorithms combined with discriminant analysis for

key variable identification. Journal of Process Control, 14(2), 143-155.

Cho, S., Hong, H., & Ha, B. C. (2010). A hybrid approach based on the combination of variable

selection using decision trees and case-based reasoning using the Mahalanobis distance:

For bankruptcy prediction. Expert Systems with Applications, 37(4), 3482-3488.

Coakley, J. R., & Brown, C. E. (2000). Artificial neural networks in accounting and finance:

Modeling issues. International Journal of Intelligent Systems in Accounting Finance &

Management, 9(2), 119-144.

Crook, J. N., Edelman, D. B., & Thomas, L. C. (2007). Recent developments in consumer credit

risk assessment. European Journal of Operational Research, 183(3), 1447-1465.


Crouhy, M., Galai, D., & Mark, R. (2000). A comparative analysis of current credit risk models.

Journal of Banking & Finance, 24(1–2), 59-117.

Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of

Control, Signals and Systems, 2(4), 303-314.

Danenas, P., Garsva, G., & Gudas, S. (2011). Credit risk evaluation model development using

support vector based classifiers. Procedia Computer Science, 4, 1699-1707.

Danenas, P., Garsva, G., & Simutis, R. (2011). Development of discriminant analysis and majority-

voting based credit risk assessment classifier. International Conference on

Computational Science. Retrieved from http://world-comp.org/p2011/ICA3513.pdf

Dash, M., & Liu, H. (2003). Consistency-based search in feature selection. Artificial Intelligence,

151(1–2), 155-176.

Derelioğlu, G., & Gürgen, F. (2011). Knowledge discovery using neural approach for SME's

credit risk analysis problem in Turkey. Expert Systems with Applications, 38(8), 9313-

9318.

Derelioğlu, G., Gürgen, F., & Okay, N. (2009). A Neural Approach for SME’s Credit Risk

Analysis in Turkey. In P. Perner (Ed.), Machine Learning and Data Mining in Pattern

Recognition (pp. 749-759). Berlin, Heidelberg: Springer.

Desai, V. S., Crook, J. N., & Overstreet, G. A., Jr. (1996). A comparison of neural networks and

linear scoring models in the credit union environment. European Journal of Operational

Research, 95(1), 24.

Dimla, D. E., Sr., & Lister, P. M. (2000). On-line metal cutting tool condition monitoring. II:

tool-state classification using multi-layer perceptron neural networks. International

Journal of Machine Tools and Manufacture, 40(5), 769-781.


Eiben, A. E., & Smith, J. E. (2003). Introduction to Evolutionary Computing. Berlin Heidelberg:

Springer.

Eisenbeis, R. A. (1978). Problems in applying discriminant analysis in credit scoring models.

Journal of Banking & Finance, 2(3), 205-219.

Eksioglu, B., Demirer, R., & Capar, I. (2005). Subset selection in multiple linear regression: a

new mathematical programming approach. Computers & Industrial Engineering, 49(1),

155-167.

Espejo, P. G., Ventura, S., & Herrera, F. (2010). A survey on the application of genetic

programming to classification. IEEE Transactions on Systems, Man and Cybernetics Part

C: Applications and Reviews, 40(2), 121-144.

Etemadi, H., Anvary Rostamy, A. A., & Dehkordi, H. F. (2009). A genetic programming model

for bankruptcy prediction: Empirical evidence from Iran. Expert Systems with

Applications, 36(2-2), 3199-3207.

Falangis, K. (2007). The use of MSD model in credit scoring. Operational Research, 7(3), 481-

503.

Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861-

874.

Fedorova, E., Gilenko, E., & Dovzhenko, S. (2013). Bankruptcy prediction for Russian

companies: Application of combined classifiers. Expert Systems with Applications,

40(18), 7285-7293.

Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of

Eugenics, 7(2), 179-188.


Fleuret, F. (2004). Fast binary feature selection with conditional mutual information. Journal of

Machine Learning Research, 5, 1531-1555.

Frydman, H., Altman, E. I., & Kao, D.-L. (1985). Introducing recursive partitioning for financial

classification: The case of financial distress. The Journal of Finance, 40(1), 269-291.

García, V., Marqués, A. I., & Sánchez, J. S. (2012). Improving risk predictions by preprocessing

imbalanced credit data. Neural Information Processing, 7664, 68-75.

Glen, J. J. (2003). An iterative mixed integer programming method for classification accuracy

maximizing discriminant analysis. Computers and Operations Research, 30(2), 181-198.

Glover, F. (1986). Future paths for integer programming and links to artificial intelligence.

Computers & Operations Research, 13(5), 533.

Glover, F. (1989). Tabu search-- Part I. ORSA Journal on Computing, 1(3), 190-206.

Glover, F. (1990). Tabu search-- Part II. ORSA Journal on Computing, 2(1), 4-32.

Glover, F., Kochenberger, G.A., & Alidaee, B. (1998). Adaptive memory Tabu search for binary

quadratic programs. Management Science, 44(3), 336-345.

Glover, F., & Laguna, M. (1997). Tabu search. Boston, MA: Kluwer Academic Publishers.

Glover, F., Lü, Z., & Hao, J.-K. (2010). Diversification-driven Tabu search for unconstrained

binary quadratic problems. 4OR, 8(3), 239-253.

Gönen, B. G., Gönen, M., & Gürgen, F. (2012). Probabilistic and discriminative group-wise

feature selection methods for credit risk analysis. Expert Systems with Applications,

39(14), 11709-11717.

Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of

Machine Learning Research, 3(7), 1157-1182.


Guyon, I., Weston, J., Barnhill, S., & Vapnik, V. (2002). Gene selection for cancer classification

using support vector machines. Machine Learning, 46(1-3), 389-422.

Hair, J. F. H., Black, W. C. B., Babin, B. J. B., & Alderson, R. E. (2010). Multivariate data

analysis (7th ed.). Upper Saddle River, NJ: Prentice Hall.

Hand, D. J., & Henley, W. E. (1997). Statistical classification methods in consumer credit

scoring: A review. Journal of the Royal Statistical Society: Series A (Statistics in Society),

160(3), 523-541.

Harikrishna, S., Farquad, M. A. H., & Shabana. (2012). Credit scoring using support vector

machine: A comparative analysis. Advanced Materials Research, 433/440, 6527-6533.

Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are

universal approximators. Neural Networks, 2(5), 359-366.

Hsieh, N. C., & Hung, L. P. (2010). A data driven ensemble classifier for credit scoring analysis.

Expert Systems with Applications, 37(1), 534-545.

Hsu, W. H. (2004). Genetic wrappers for feature selection in decision tree induction and variable

ordering in Bayesian network structure learning. Information Sciences, 163(1–3), 103-

122.

Huang, C.-L., Chen, M. C., & Wang, C. J. (2007). Credit scoring with a data mining approach

based on support vector machines. Expert Systems with Applications, 33(4), 847-856.

Huang, C.-L., Liao, H.-C., & Chen, M.-C. (2008). Prediction model building and feature

selection with support vector machines in breast cancer diagnosis. Expert Systems with

Applications, 34(1), 578-587.

Huang, J.-J., Tzeng, G.-H., & Ong, C.-S. (2006). Two-stage genetic programming (2SGP) for the

credit scoring model. Applied Mathematics and Computation, 174(2), 1039-1053.


Jiang, M., & Yuan, X. (2007, August 24-27). Personal credit scoring model of non-linear

combining forecast based on GP. Paper presented at the International Conference on

Natural Computation. doi: 10.1109/ICNC.2007.551

Jo, H., Han, I., & Lee, H. (1997). Bankruptcy prediction using case-based reasoning, neural

networks, and discriminant analysis. Expert Systems with Applications, 13(2), 97-108.

Karels, G. V., & Prakash, A. J. (1987). Multivariate normality and forecasting of business

bankruptcy. Journal of Business Finance & Accounting, 14(4), 573-593.

Kim, H., & Sohn, S. (2010). Support vector machines for default prediction of SMEs based on

technology credit. European Journal of Operational Research, 201(3), 838-846.

Kim, Y. S., & Sohn, S. Y. (2004). Managing loan customers using misclassification patterns of

credit scoring model. Expert Systems with Applications, 26(4), 567-573.

Kira, K., & Rendell, L. A. (1992). The feature selection problem: Traditional methods and a new

algorithm. Proceedings of the National Conference on Artificial Intelligence, San Jose,

1992. Menlo Park, CA: The AAAI Press.

Kochenberger, G., & Glover, F. (2006). A Unified Framework for Modeling and Solving

Combinatorial Optimization Problems: A Tutorial. In W. Hager, S.-J. Huang, P. Pardalos

& O. Prokopyev (Eds.), Multiscale Optimization Methods and Applications (pp. 101-124).

New York, NY: Springer.

Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model

selection. Proceedings of the International Joint Conference on Artificial Intelligence,

Montreal, 1995. San Francisco, CA: Morgan Kaufmann Publishers Inc.

Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence,

97(1–2), 273-324.


Koza, J. R. (1992). Genetic programming: On the programming of computers by means of

natural selection. Cambridge, MA: MIT Press.

Kumar, P. R., & Ravi, V. (2007). Bankruptcy prediction in banks and firms via statistical and

intelligent techniques: A review. European Journal of Operational Research, 180(1), 1-

28.

Kumari, B., & Swarnkar, T. (2011). Filter versus wrapper feature subset selection in large

dimensionality micro array: A review. International Journal of Computer Science and

Information Technologies, 2(3), 1048-1053.

Kwak, N., & Choi, C.-H. (2002). Input feature selection by mutual information based on Parzen

window. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12), 1667-

1671.

Lacher, R. C., Coats, P. K., Sharma, S. C., & Fant, L. F. (1995). A neural network for classifying

the financial health of a firm. European Journal of Operational Research, 85(1), 53-65.

Lam, K. F., & Moy, J. W. (2002). Combining discriminant methods in solving classification

problems in two-group discriminant analysis. European Journal of Operational Research,

138(2), 294-301.

Lasko, T. A., Bhagwat, J. G., Zou, K. H., & Ohno-Machado, L. (2005). The use of receiver

operating characteristic curves in biomedical informatics. Journal of Biomedical

Informatics, 38(5), 404-415.

Lawson, J. C. (1995). Knowing the score. U.S. Banker, 105(11), 61.

Lee, T. S., & Chen, I. F. (2005). A two-stage hybrid credit scoring model using artificial neural

networks and multivariate adaptive regression splines. Expert Systems with Applications,

28(4), 743-752.


Lee, T. S., Chiu, C.-C., Lu, C.-J., & Chen, I. F. (2002). Credit scoring using the hybrid neural

discriminant technique. Expert Systems with Applications, 23(3), 245-254.

Lee, T. H., & Jung, S.-C. (1999). Forecasting creditworthiness: Logistic vs. artificial neural net.

Journal of Business Forecasting Methods & Systems, 18(4), 28.

Lensberg, T., Eilifsen, A., & McKee, T. E. (2006). Bankruptcy theory development and

classification via genetic programming. European Journal of Operational Research,

169(2), 677-697.

Leshno, M., & Spector, Y. (1996). Neural network prediction analysis: The bankruptcy case.

Neurocomputing, 10(2), 125-147.

Lessmann, S., & Voß, S. (2009). A reference model for customer-centric data mining with

support vector machines. European Journal of Operational Research, 199(2), 520-530.

Li, C. H., Li, Y. C., Kuo, B. C., Liu, J. F., & Huang, H. Y. (2012). SVM self-contained variable

importance measure for credit scoring. ICIC Express Letters, 6(2), 389-394.

Li, R.-H., & Belford, G. G. (2002). Instability of decision tree classification algorithms.

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery

and Data Mining, Edmonton, 2002. New York, NY: ACM.

Lin, C. C., Chang, C. C., Li, F. C., & Chao, T. C. (2011, December 6-9). Features selection

approaches combined with effective classifiers in credit scoring. Paper presented at the

IEEE International Conference on Industrial Engineering and Engineering Management.

doi: 10.1109/IEEM.2011.6118017

Liu, S., Wang, Q., & Shuai, L. (2008, July 2-4). Application of Genetic Programming in credit

scoring. Paper presented at the Control and Decision Conference. doi:

10.1109/CCDC.2008.4597485


Liu, Y., & Schumann, M. (2005). Data mining feature selection for credit scoring models.

Journal of the Operational Research Society, 56(9), 1099-1108.

Loh, W.-Y. (2011). Classification and regression trees. Wiley Interdisciplinary Reviews: Data

Mining and Knowledge Discovery, 1(1), 14-23.

Lu, C., Van Gestel, T., Suykens, J. A. K., Van Huffel, S., Vergote, I., & Timmerman, D. (2003).

Preoperative prediction of malignancy of ovarian tumors using least squares support

vector machines. Artificial Intelligence in Medicine, 28(3), 281-306.

Lü, C., & Zhao, Y. (2004). Researches on the financial position classification of listed companies.

Accounting Research, 11, 53-61 (in Chinese).

Lü, Z., Glover, F., & Hao, J.-K. (2010). A hybrid metaheuristic approach to solving the UBQP

problem. European Journal of Operational Research, 207(3), 1254.

Malhotra, R., & Malhotra, D. K. (2003). Evaluating consumer loans using neural networks.

Omega, 31(2), 83-96.

Mandala, I. G. N. N., Nawangpalupi, C. B., & Praktikto, F. R. (2012). Assessing credit risk: An

application of data mining in a rural bank. Procedia Economics and Finance, 4, 406-412.

Marinakis, Y., Marinaki, M., Doumpos, M., & Zopounidis, C. (2009). Ant colony and particle

swarm optimization for financial classification problems. Expert Systems with

Applications, 36(7), 10604-10611.

Martens, D., Baesens, B., Van Gestel, T., & Vanthienen, J. (2007). Comprehensible credit

scoring models using rule extraction from support vector machines. European Journal of

Operational Research, 183(3), 1466-1476.

Martin, D. (1977). Early warning of bank failure: A logit regression approach. Journal of

Banking & Finance, 1(3), 249-276.


McKee, T. E., & Lensberg, T. (2002). Genetic programming and rough sets: A hybrid approach

to bankruptcy classification. European Journal of Operational Research, 138(2), 436-451.

Merton, R. C. (1974). On the pricing of corporate debt: The risk structure of interest rates. The

Journal of Finance, 29(2), 449-470.

Miller, A. J. (1984). Selection of subsets of regression variables. Journal of the Royal Statistical

Society: Series A, 147(3), 389-425.

Min, J. H., & Jeong, C. (2009). A binary classification method for bankruptcy prediction. Expert

Systems with Applications, 36(3-1), 5256-5263.

Min, J. H., & Lee, Y.-C. (2008). A practical approach to credit scoring. Expert Systems with

Applications, 35(4), 1762-1770.

Nanni, L., & Lumini, A. (2009). An experimental comparison of ensemble of classifiers for

bankruptcy prediction and credit scoring. Expert Systems with Applications, 36(2-2),

3028-3033.

Nie, G., Rowe, W., Zhang, L., Tian, Y., & Shi, Y. (2011). Credit card churn forecasting by

logistic regression and decision tree. Expert Systems with Applications, 38(12), 15273-

15285.

Njanike, K. (2009). The impact of effective credit risk management on bank survival. Annals of

the University of Petrosani Economics, 9(2), 173-184.

Olson, D. L., Delen, D., & Meng, Y. (2012). Comparative analysis of data mining methods for

bankruptcy prediction. Decision Support Systems, 52(2), 464-473.

Ong, C.-S., Huang, J.-J., & Tzeng, G.-H. (2005). Building credit scoring models using genetic

programming. Expert Systems with Applications, 29(1), 41-47.


Oreski, S., Oreski, D., & Oreski, G. (2012). Hybrid system with genetic algorithm and artificial

neural networks and its application to retail credit risk assessment. Expert Systems with

Applications, 39(16), 12605-12617.

Paisittanand, S., & Olson, D. L. (2006). A simulation study of IT outsourcing in the credit card

business. European Journal of Operational Research, 175(2), 1248-1261.

Paleologo, G., Elisseeff, A., & Antonini, G. (2010). Subagging for credit scoring models.

European Journal of Operational Research, 201(2), 490-499.

Palubeckis, G. (2004). Multistart Tabu search strategies for the unconstrained binary quadratic

optimization problem. Annals of Operations Research, 131(1-4), 259-282.

Palubeckis, G. (2006). Iterated Tabu search for the unconstrained binary quadratic optimization

problem. Informatica, 17(2), 279-296.

Peng, H., Fulmi, L., & Ding, C. (2005). Feature selection based on mutual information criteria of

max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern

Analysis and Machine Intelligence, 27(8), 1226-1238.

Piramuthu, S. (1999). Financial credit-risk evaluation with neural and neurofuzzy systems.

European Journal of Operational Research, 112(2), 310-321.

Prajapati, G. L., & Patle, A. (2010, November 19-21). On performing classification using SVM

with radial basis and polynomial kernel functions. Paper presented at the International

Conference on Emerging Trends in Engineering and Technology. doi:

10.1109/ICETET.2010.134

Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.

Quinlan, J. R. (1993). C4.5: programs for machine learning. San Francisco, CA: Morgan

Kaufmann Publishers Inc.


Rampone, S., Frattolillo, F., & Landolfi, F. (2013). Assessing consumer credit applications by a

genetic programming approach. In Advanced Dynamic Modeling of Economic and Social

Systems (pp. 79-89). Berlin, Heidelberg: Springer.

Ravi, V., & Pramodh, C. (2008). Threshold accepting trained principal component neural

network and feature subset selection: Application to bankruptcy prediction in banks.

Applied Soft Computing, 8(4), 1539-1548.

Ryu, Y. U., & Yue, W. T. (2005). Firm bankruptcy prediction: Experimental comparison of

isotonic separation and other classification approaches. IEEE Transactions on Systems,

Man and Cybernetics, 35(5), 727-737.

Sakar, O. C., & Kursun, O. (2012). A method for combining mutual information and canonical

correlation analysis: Predictive Mutual Information and its use in feature selection.

Expert Systems with Applications, 39(3), 3333-3344.

Salehi, M., & Mansoury, A. (2011). An evaluation of Iranian banking system credit risk: Neural

network and logistic regression approach. International Journal of Physical Sciences,

6(25), 6082-6090.

Schebesch, K. B., & Stecking, R. (2005). Support vector machines for classifying and describing

credit applicants: Detecting typical and critical regions. The Journal of the Operational

Research Society, 56(9), 1082-1088.

Senliol, B., Gulgezen, G., Lei, Y., & Cataltepe, Z. (2008, October 27-29). Fast Correlation

Based Filter (FCBF) with a different search strategy. Paper presented at the International

Symposium on Computer and Information Sciences. doi: 10.1109/ISCIS.2008.4717949

Sette, S., & Boullart, L. (2001). Genetic programming: Principles and applications. Engineering

Applications of Artificial Intelligence, 14(6), 727-736.


Shin, K.-S., Lee, T. S., & Kim, H.-j. (2005). An application of support vector machines in

bankruptcy prediction model. Expert Systems with Applications, 28(1), 127-135.

Srinivasan, V., & Kim, Y. H. (1987). Credit granting: A comparative analysis of classification

procedures. The Journal of Finance, 42(3), 665-681.

Stephanou, C., & Mendoza, J. C. (2005). Credit risk measurement under Basel II: An overview

and implementation issues for developing countries. World Bank Policy Research

Working Paper 3556.

Su, C.-T., & Yang, C.-H. (2008). Feature selection for the SVM: An application to hypertension

diagnosis. Expert Systems with Applications, 34(1), 754-763.

Šušteršič, M., Mramor, D., & Zupan, J. (2009). Consumer credit scoring models with limited

data. Expert Systems with Applications, 36(3), 4736-4744.

Swicegood, P., & Clark, J. A. (2001). Off-site monitoring systems for predicting bank

underperformance: A comparison of neural networks, discriminant analysis, and

professional human judgment. Intelligent Systems in Accounting, Finance &

Management, 10(3), 169-186.

Tay, F. E. H., & Cao, L. (2001). Application of support vector machines in financial time series

forecasting. Omega, 29(4), 309-317.

Tsai, C. F. (2009). Feature selection in bankruptcy prediction. Knowledge-Based Systems, 22(2),

120-127.

Utzig, S. (2010). The financial crisis and the regulation of credit rating agencies: A European

banking perspective. ADBI Working Paper Series (Vol. 188). Asian Development Bank

Institute.


Van Gestel, T., Suykens, J. A. K., Baestaens, D. E., Lambrechts, A., Lanckriet, G., Vandaele, B.,

De Moor, B., & Vandewalle, J. (2001). Financial time series prediction using least

squares support vector machines within the evidence framework. IEEE Transactions on

Neural Networks, 12(4), 809-821.

Vapnik, V. (1995). The nature of statistical learning theory. New York, NY: Springer.

Wang, C. M., & Huang, Y. (2009). Evolutionary-based feature selection approaches with new

criteria for data mining: A case study of credit approval data. Expert Systems with

Applications, 36(3-2), 5900-5908.

Wang, J., Guo, K., & Wang, S. (2010). Rough set and Tabu search based feature selection for

credit scoring. Procedia Computer Science, 1(1), 2425-2432.

Wang, J., Hedar, A. R., Wang, S., & Ma, J. (2012). Rough set and scatter search metaheuristic

based feature selection for credit scoring. Expert Systems with Applications, 39(6), 6123-

6128.

Wang, Y., Lü, Z., Glover, F., & Hao, J.-K. (2012). Path relinking for unconstrained binary

quadratic programming. European Journal of Operational Research, 223(3), 595-604.

Wei, H., & Billings, S.A. (2007). Feature subset selection and ranking for data dimensionality

reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(1), 162-

166.

Wei, L., Li, J., & Chen, Z. (2007, May 27-30). Credit risk evaluation using support vector

machine with mixture of kernel. Paper presented at the International Conference on

Computational Science. doi: 10.1007/978-3-540-72586-2_62

West, D. (2000). Neural network credit scoring models. Computers & Operations Research,

27(11–12), 1131-1152.


Wlodzislaw, D., & Norbert, J. (2001, April 25-27). Transfer functions: hidden possibilities for

better neural networks. Paper presented at the European Symposium on Artificial Neural

Networks. Bruges: De-facto publications.

Wollan, R. (2008). The new rules for customer service: Findings from the Accenture Global

Customer Satisfaction Survey. Accenture Outlook. Retrieved from

http://www.accenture.com/sitecollectiondocuments/pdf/Global20Customer20Satisfaction

20Survey_Outlook_Jan08.pdf

World Bank (2013). Banking crisis. Global Financial Development Report. Retrieved from

http://econ.worldbank.org/WBSITE/EXTERNAL/EXTDEC/EXTGLOBALFINREPORT

/0,,contentMDK:23268770~pagePK:64168182~piPK:64168060~theSitePK:8816097,00.

html

Xiong, Z. (2013). Research on credit evaluation model based on nonlinear principal component

analysis. The Journal of Quantitative & Technical Economics, 30(10), 138-150 (in

Chinese).

Yang, J., & Li, Y.-P. (2006). Orthogonal relief algorithm for feature selection. In D.-S. Huang, K.

Li & G. Irwin (Eds.), Intelligent Computing (pp. 227-234). Berlin, Heidelberg: Springer.

Yap, B. W., Ong, S. H., & Husain, N. H. M. (2011). Using data mining to improve assessment of

credit worthiness via credit scoring models. Expert Systems with Applications, 38(10),

13274-13283.

Ye, H., Li, N., Feng, H., & Wang, Y. (2011). The comparisons of personal credit evaluation

models. Information Technology Journal, 10(11), 2237-2241.

Yi, J., Yan, C., Zhimin, Z., & He, X. (2008, July 8-11). A bank customer credit evaluation based

on the decision tree and the simulated annealing algorithm. Paper presented at the IEEE


International Conference on Computer and Information Technology. doi:

10.1109/CIT.2008.4594674

Yim, J., & Mitchell, H. (2005). Comparison of country risk models: hybrid neural networks,

logit models, discriminant analysis and cluster techniques. Expert Systems with

Applications, 28(1), 137-148.

Yu, L., & Liu, H. (2004). Efficient Feature Selection via Analysis of Relevance and Redundancy.

Journal of Machine Learning Research, 5, 1205-1224.

Yu, L., Wang, S., & Lai, K. K. (2007). Foreign-Exchange-Rate Forecasting With Artificial

Neural Networks. New York, NY: Springer.

Yu, Q., Miche, Y., Sorjamaa, A., Guillen, A., Lendasse, A., & Severin, E. (2010). OP-KNN:

method and applications. Advances in Artificial Neural Systems, 2010, 1-6.

Zhang, D., Hifi, M., Chen, Q., & Ye, W. (2008, October 18-20). A hybrid credit scoring model

based on genetic programming and support vector machines. Paper presented at the

International Conference on Natural Computation. doi: 10.1109/ICNC.2008.205

Zhang, D., Zhou, X., Leung, S. C. H., & Zheng, J. (2010). Vertical bagging decision trees model

for credit scoring. Expert Systems with Applications, 37(12), 7838-7843.

Zhang, G., Hu, Y. M., Patuwo, B. E., & Indro, D. C. (1999). Artificial neural networks in

bankruptcy prediction: General framework and cross-validation analysis. European

Journal of Operational Research, 116(1), 16-32.

Zhang, Y., & Bhattacharyya, S. (2004). Genetic programming in classifying large-scale data: an

ensemble method. Information Sciences, 163(1–3), 85-101.


Zibanezhad, E., Foroghi, D., & Monadjemi, A. (2011, June 10-12). Applying decision tree to

predict bankruptcy. Paper presented at the IEEE International Conference on Computer

Science and Automation Engineering. doi: 10.1109/CSAE.2011.5952826

Zimmermann, H. J., & Zysno, P. (1983). Decisions and evaluations by hierarchical aggregation

of information. Fuzzy Sets and Systems, 10(1-3), 243-260.

Zou, K. H., Resnic, F. S., Talos, I.-F., Goldberg-Zimring, D., Bhagwat, J. G., Haker, S. J.,

Kikinis, R., Jolesz, F. A., & Ohno-Machado, L. (2005). A global goodness-of-fit test for

receiver operating characteristic curve analysis via the bootstrap method. Journal of

Biomedical Informatics, 38(5), 395-403.


APPENDIX A

EXAMPLE OF SOLUTIONS FOR MODEL 3.1

When the interval [0, 1] of α is divided, the solutions of model 3.1 are obtained as follows:

alpha:0.0

best_result = 0

best_t = 0.01

Best Solution is :

0 1 0 0 0 0 0 0 0 0 0 0 0 0

***************************************************

alpha:0.001953125

best_result = 141

best_t = 0.00

Best Solution is :

0 0 0 0 0 0 0 1 0 0 0 0 0 0

***************************************************

alpha:0.00390625

best_result = 281

best_t = 0.00

Best Solution is :

0 0 0 0 0 0 0 1 0 0 0 0 0 0

***************************************************

alpha:0.005859375

best_result = 422

best_t = 0.00

Best Solution is :

0 0 0 0 0 0 0 1 0 0 0 0 0 0

***************************************************

alpha:0.0078125

best_result = 563

best_t = 0.00

Best Solution is :

0 0 0 0 0 0 0 1 0 0 0 0 0 0

***************************************************

alpha:0.060546875

best_result = 4362

best_t = 0.00

Best Solution is :

1 0 0 0 0 0 0 1 0 0 0 0 0 0

***************************************************


alpha:0.0625

best_result = 4506

best_t = 0.00

Best Solution is :

1 0 0 0 0 0 0 1 0 0 0 0 0 0

***************************************************

alpha:0.064453125

best_result = 4649

best_t = 0.00

Best Solution is :

1 0 0 0 0 0 0 1 0 0 0 0 0 0

***************************************************

alpha:0.06640625

best_result = 4792

best_t = 0.00

Best Solution is :

1 0 0 0 0 0 0 1 0 0 0 0 0 0

***************************************************

alpha:0.3046875

best_result = 22332

best_t = 0.00

Best Solution is :

0 0 0 0 0 0 0 1 0 0 0 1 0 0

***************************************************

alpha:0.306640625

best_result = 22503

best_t = 0.00

Best Solution is :

0 0 0 0 0 0 0 1 0 0 0 1 0 0

***************************************************

alpha:0.30859375

best_result = 22674

best_t = 0.00

Best Solution is :

0 0 0 0 0 0 0 1 0 0 0 1 0 0

***************************************************

alpha:0.310546875

best_result = 22847

best_t = 0.00

Best Solution is :

0 0 0 0 0 0 0 1 0 0 0 1 0 0

***************************************************


alpha:0.59375

best_result = 58258

best_t = 0.00

Best Solution is :

1 0 0 0 1 0 0 1 0 0 0 0 0 1

***************************************************

alpha:0.595703125

best_result = 58591

best_t = 0.00

Best Solution is :

1 0 0 0 1 0 0 1 0 0 0 0 0 1

***************************************************

alpha:0.59765625

best_result = 59015

best_t = 0.00

Best Solution is :

1 0 0 0 1 0 0 1 0 1 0 0 0 1

***************************************************

alpha:0.599609375

best_result = 59549

best_t = 0.00

Best Solution is :

1 0 0 0 1 0 0 1 0 1 0 0 0 1

***************************************************

alpha:0.970703125

best_result = 311272

best_t = 0.00

Best Solution is :

0 1 1 1 1 1 1 1 1 1 1 1 1 1

***************************************************

alpha:0.97265625

best_result = 313951

best_t = 0.00

Best Solution is :

0 1 1 1 1 1 1 1 1 1 1 1 1 1

***************************************************

alpha:0.974609375

best_result = 316654

best_t = 0.00

Best Solution is :

1 1 1 1 1 1 1 1 1 1 1 1 1 1

***************************************************


alpha:0.9765625

best_result = 319445

best_t = 0.00

Best Solution is :

1 1 1 1 1 1 1 1 1 1 1 1 1 1

***************************************************

alpha:0.978515625

best_result = 322224

best_t = 0.00

Best Solution is :

1 1 1 1 1 1 1 1 1 1 1 1 1 1

***************************************************

alpha:0.98046875

best_result = 324994

best_t = 0.00

Best Solution is :

1 1 1 1 1 1 1 1 1 1 1 1 1 1

***************************************************

alpha:0.982421875

best_result = 327788

best_t = 0.00

Best Solution is :

1 1 1 1 1 1 1 1 1 1 1 1 1 1

***************************************************

alpha:0.984375

best_result = 330568

best_t = 0.00

Best Solution is :

1 1 1 1 1 1 1 1 1 1 1 1 1 1

***************************************************

alpha:0.986328125

best_result = 333364

best_t = 0.00

Best Solution is :

1 1 1 1 1 1 1 1 1 1 1 1 1 1

***************************************************

alpha:0.98828125

best_result = 336144

best_t = 0.00

Best Solution is :

1 1 1 1 1 1 1 1 1 1 1 1 1 1

***************************************************

alpha:0.990234375


best_result = 338921

best_t = 0.00

Best Solution is :

1 1 1 1 1 1 1 1 1 1 1 1 1 1

***************************************************

alpha:0.9921875

best_result = 341697

best_t = 0.00

Best Solution is :

1 1 1 1 1 1 1 1 1 1 1 1 1 1

***************************************************

alpha:0.994140625

best_result = 344485

best_t = 0.00

Best Solution is :

1 1 1 1 1 1 1 1 1 1 1 1 1 1

***************************************************

alpha:1.0

best_result = 352652

best_t = 0.00

Best Solution is :

1 1 1 1 1 1 1 1 1 1 1 1 1 1


APPENDIX B

STATISTICAL DESCRIPTION OF THE U.S. DATASET

ST Companies Non-ST Companies

Features Mean Std. Dev. Kurtosis Skewness Mean Std. Dev. Kurtosis Skewness

X1 0.361 0.670 56.355 5.848 -0.021 0.316 8.929 -1.325

X2 0.279 1.749 178.205 13.086 -0.011 0.278 25.587 2.764

X3 480.721 4923.939 194.356 13.884 72.959 200.664 47.371 6.367

X4 -0.207 5.858 31.251 -4.038 0.108 1.257 37.584 4.680

X5 1.167 2.039 10.465 1.791 -0.138 2.108 6.111 -1.123

X6 -0.151 0.325 17.717 -3.554 -0.289 0.454 19.506 -3.952

X7 0.020 0.186 27.220 -4.115 -0.130 0.540 33.472 -5.528

X8 0.024 0.099 7.820 -1.450 -0.043 0.115 12.045 -2.615

X9 -0.100 0.221 33.680 -4.646 -0.222 0.270 10.414 -2.766

X10 -0.433 2.958 66.838 -6.557 1.478 11.604 67.432 7.710

X11 0.027 0.276 33.744 3.622 0.005 1.134 33.821 2.990

X12 0.343 0.236 1.918 0.659 0.228 0.220 1.154 1.002

X13 0.686 0.220 6.159 1.083 0.782 0.256 15.761 2.235

X14 0.168 0.256 25.603 4.166 0.164 0.405 43.853 -4.898

X15 1.167 0.845 21.683 3.650 0.985 0.703 3.135 1.561

X16 1.638 1.012 16.083 3.036 1.352 0.904 1.009 1.093

X17 0.105 0.939 50.233 -6.110 -0.219 0.989 3.277 -1.807

X18 0.089 0.189 8.452 -1.818 -0.009 0.329 8.503 -2.209

X19 0.776 0.273 8.058 1.842 1.026 0.391 8.572 2.226

X20 0.227 5.871 74.786 -6.934 0.070 12.887 81.407 8.492

X21 5.193 31.415 96.966 8.798 -42.205 396.217 98.178 -9.889

X22 0.655 0.815 10.021 2.683 0.297 0.670 33.849 5.298

X23 1.092 3.698 126.040 10.429 0.022 2.635 41.884 -2.696

X24 0.207 0.274 7.607 -1.815 -0.056 0.402 11.013 -2.490

X25 0.319 0.195 1.601 1.294 0.366 0.246 0.265 1.041

X26 0.535 0.214 -0.011 -0.652 0.454 0.289 -1.308 -0.263

X27 0.763 35.510 11.953 0.034 0.661 14.155 39.076 5.599

X28 0.046 0.225 88.334 8.302 0.044 0.229 61.550 7.338

X29 27.833 42.975 43.738 5.721 40.190 77.122 59.210 6.947

X30 0.402 0.420 6.127 2.136 0.265 0.624 23.381 4.633

X31 0.008 0.008 1.938 1.497 0.004 0.007 7.175 2.617

X32 15.091 32.471 108.150 9.529 20.639 41.338 24.463 4.787

X33 0.103 0.103 6.600 2.090 0.091 0.124 35.439 4.905

X34 3.390 2.367 4.917 1.629 3.320 2.761 9.843 2.362

X35 0.315 0.303 15.418 3.432 0.342 0.491 26.724 4.950

X36 2.169 3.748 38.466 5.606 2.463 2.576 12.193 2.903

X37 1.034 0.912 23.926 3.837 1.179 0.749 0.249 0.785

X38 5.050 15.581 72.063 7.069 -37.448 361.537 97.698 -9.855

X39 0.135 5.629 49.157 -4.792 0.638 4.497 91.389 9.388

X40 -11.495 282.697 183.185 -13.191 -49.935 462.083 96.414 -9.768


APPENDIX C

STATISTICAL DESCRIPTION OF CHINESE DATASET

ST Companies Non-ST Companies

Features Mean Std. Dev. Kurtosis Skewness Mean Std. Dev. Kurtosis Skewness

X1 0.185 0.295 7.326 1.849 0.060 0.223 23.557 3.084

X2 0.138 0.213 8.916 1.372 0.051 0.198 23.108 2.878

X3 1.072 0.393 333.235 15.610 1.054 0.315 80.835 6.414

X4 4.900 56.749 556.351 23.208 3.772 29.739 77.606 4.020

X5 0.332 0.728 41.215 -3.166 0.149 0.475 3.036 0.359

X6 0.102 0.144 80.319 7.589 0.076 0.123 98.638 8.088

X7 0.151 0.172 71.137 6.906 0.138 0.157 66.033 6.437

X8 0.065 0.035 1.502 1.021 0.046 0.034 13.625 2.830

X9 0.044 0.030 0.810 0.881 0.026 0.029 15.175 3.078

X10 0.085 0.112 374.025 17.279 0.053 0.055 4.797 1.946

X11 0.205 0.177 25.248 -1.349 0.235 0.203 2.007 1.114

X12 0.153 0.221 34.983 5.106 0.097 0.114 8.461 2.510

X13 0.054 0.059 10.346 2.708 0.052 0.051 4.309 1.932

X14 0.026 0.030 33.148 4.326 0.047 0.051 22.747 3.778

X15 1.125 0.817 15.380 3.052 1.029 0.677 17.960 3.180

X16 0.509 0.618 46.304 5.242 0.983 10.826 294.504 17.095

X17 0.148 0.512 15.698 -2.949 0.090 0.486 14.763 -2.705

X18 0.137 0.199 0.116 0.013 0.105 0.205 -0.168 -0.101

X19 0.465 0.152 -0.402 0.041 0.520 0.154 -0.245 -0.363

X20 25.449 117.424 155.687 11.367 10.834 49.928 125.751 10.618

X21 1.086 0.981 76.017 6.453 1.352 1.017 26.121 3.664

X22 0.711 0.619 12.002 2.660 0.828 0.652 4.395 1.803

X23 1.471 1.135 8.308 2.388 1.203 1.131 16.913 3.525

X24 0.535 0.152 -0.402 -0.041 0.480 0.154 -0.245 0.363

X25 0.846 0.171 1.186 -1.324 0.886 0.134 3.241 -1.675

X26 0.154 0.171 1.186 1.324 0.114 0.134 3.241 1.675

X27 85.237 219.937 236.343 13.392 248.966 885.226 236.786 14.621

X28 3.686 8.706 508.459 21.686 4.220 3.731 21.013 3.580

X29 4.820 6.794 74.136 6.899 3.215 4.390 57.606 6.417

X30 2.160 1.289 6.305 2.123 2.251 1.659 21.457 3.751

X31 0.805 0.213 -0.071 -0.298 0.797 0.222 0.808 -0.136

X32 16.668 44.184 43.533 6.006 5.144 9.383 30.065 4.830

X33 0.346 0.583 146.451 9.750 0.487 0.807 38.349 5.528

X34 12.585 34.457 165.356 11.898 15.367 69.005 182.688 12.653

X35 1.368 1.063 9.431 2.322 0.948 0.754 4.011 1.841

X36 4.359 13.160 139.500 10.788 2.932 4.492 24.549 4.521

X37 0.642 0.478 9.958 2.501 0.481 0.365 10.832 2.678

X38 1.460 1.895 64.123 6.655 1.212 1.472 33.752 5.090

X39 1.568 1.382 64.191 7.129 2.506 2.783 24.409 4.459

X40 2.541 2.723 83.724 7.469 2.600 2.051 48.427 5.173


APPENDIX D

DEFINITIONS OF LONG TERM CREDIT RATINGS FROM S&P

AAA An obligation rated ‘AAA’ has the highest rating assigned by Standard & Poor’s. The

obligor’s capacity to meet its financial commitment on the obligation is extremely

strong.

AA An obligation rated ‘AA’ differs from the highest-rated obligations only to a small

degree. The obligor’s capacity to meet its financial commitment on the obligation is

very strong.

A An obligation rated ‘A’ is somewhat more susceptible to the adverse effects of

changes in circumstances and economic conditions than obligations in higher rated

categories. However, the obligor’s capacity to meet its financial commitment on the

obligation is still strong.

BBB An obligation rated ‘BBB’ exhibits adequate protection parameters. However, adverse

economic conditions or changing circumstances are more likely to lead to a weakened

capacity of the obligor to meet its financial commitment on the obligation.

BB An obligation rated ‘BB’ is less vulnerable to nonpayment than other speculative

issues. However, it faces major ongoing uncertainties or exposure to adverse business,

financial, or economic conditions that could lead to the obligor’s inadequate capacity

to meet its financial commitment on the obligation.

B An obligation rated ‘B’ is more vulnerable to nonpayment than obligations rated ‘BB’,

but the obligor currently has the capacity to meet its financial commitment on the

obligation. Adverse business, financial, or economic conditions will likely impair the

obligor’s capacity or willingness to meet its financial commitment on the obligation.

CCC An obligation rated ‘CCC’ is currently vulnerable to nonpayment and is dependent on

favorable business, financial, and economic conditions for the obligor to meet its

financial commitment on the obligation. In the event of adverse business, financial, or

economic conditions, the obligor is not likely to have the capacity to meet its financial

commitment on the obligation.

CC An obligation rated ‘CC’ is currently highly vulnerable to nonpayment.

C The ‘C’ rating may be used when a bankruptcy petition has been filed or similar

action has been taken but payments on this obligation are being continued. ‘C’ is also

used for a preferred stock that is in arrears (as well as for junior debt of issuers rated

‘CCC-’ and ‘CC’).

D/ SD The ‘D’ rating, unlike other ratings, is not prospective; rather, it is used only when a

default has actually occurred—and not when a default is only expected.

The ‘SD’ (selective default) rating is assigned when an issuer can be expected to default

selectively, that is, continue to pay certain issues or classes of obligations while not

paying others.

Note: The ratings from ‘AA’ to ‘CCC’ may be modified by the addition of a plus (+) or minus (-) sign.


APPENDIX E

COMPLETE SELECTED SUBSETS AND OCAR FOR THE U.S. DATASET

Features    # of Features    OCAR

24 1 72.3906%

12, 18 2 75.0842%

13, 19 2 74.7475%

13, 19, 39 3 77.1044%

7, 21, 24 3 74.7475%

13, 19, 21, 39 4 75.7576%

4, 7, 21, 24 4 76.7677%

4, 7, 10, 24 4 74.4108%

7, 24, 26, 38 4 74.0741%

7, 24, 26, 32, 38 5 70.3704%

5, 7, 10, 21, 24 5 76.0943%

5, 7, 10, 21, 24, 32 6 72.0539%

5, 8, 10, 21, 24, 29 6 75.7576%

5, 8, 10, 21, 22, 24 6 74.7475%

5, 8, 10, 21, 22, 24, 32 7 72.0539%

5, 8, 10, 21, 22, 24, 32, 40 8 69.0236%

5, 8, 10, 13, 21, 22, 24, 32 8 72.0539%

5, 8, 10, 13, 21, 22, 24, 32, 40 9 69.0236%

5, 8, 10, 13, 19, 21, 24, 30, 32 9 72.0539%

5, 8, 10, 13, 19, 21, 22, 24, 32 9 72.0539%

5, 8, 10, 13, 19, 21, 22, 24, 32, 40 10 69.0236%

1, 5, 8, 10, 13, 19, 21, 22, 24, 32, 40 11 69.0236%

1, 5, 8, 10, 19, 21, 22, 24, 26, 32, 40 11 69.0236%

1, 5, 7, 8, 10, 19, 21, 22, 24, 26, 32, 40 12 69.0236%

1, 5, 7, 8, 10, 19, 21, 22, 24, 26, 32, 39, 40 13 69.6970%

1, 5, 7, 8, 10, 19, 21, 22, 24, 26, 31, 32, 39, 40 14 69.6970%

1, 5, 7, 8, 10, 13, 19, 21, 22, 24, 26, 31, 32, 39, 40 15 69.6970%

1, 5, 7, 8, 10, 12, 19, 21, 22, 24, 26, 31, 32, 39, 40 15 69.6970%

1, 5, 7, 8, 9, 10, 13, 19, 21, 22, 24, 26, 31, 32, 39, 40 16 69.6970%

1, 5, 7, 8, 9, 10, 12, 19, 21, 22, 24, 26, 31, 32, 39, 40 16 69.6970%

1, 5, 7, 8, 9, 10, 12, 17, 19, 21, 22, 24, 26, 31, 32, 39, 40 17 70.0337%

1, 5, 7, 8, 9, 10, 12, 18, 19, 21, 22, 24, 26, 29, 31, 32, 39, 40 18 71.3805%

1, 5, 7, 8, 9, 10, 12, 18, 19, 21, 22, 24, 26, 29, 31, 32, 38, 39, 40 19 70.3704%

1, 5, 7, 8, 9, 10, 12, 13, 18, 19, 21, 22, 24, 26, 29, 31, 32, 38, 39, 40 20 70.3704%


APPENDIX E (continued)

Features    # of Features    OCAR

1, 5, 7, 8, 9, 10, 12, 13, 17, 18, 19, 21, 22, 24, 26, 29, 31, 32, 38, 39, 40 21 70.7071%

1, 5, 7, 8, 9, 10, 12, 13, 17, 18, 19, 21, 22, 23, 24, 26, 29, 31, 32, 38, 39, 40 22 70.3704%

1, 5, 6, 7, 8, 9, 10, 12, 13, 17, 18, 19, 21, 22, 23, 24, 26, 29, 31, 32, 38, 39, 40 23 70.3704%

1, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 18, 19, 21, 22, 23, 24, 26, 29, 31, 32, 38, 39, 40 24 70.0337%

1, 5, 6, 7, 8, 9, 10, 12, 13, 17, 18, 19, 21, 22, 23, 24, 26, 29, 30, 31, 32, 38, 39, 40 24 70.7071%

1, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 18, 19, 21, 22, 23, 24, 26, 29, 30, 31, 32, 38, 39, 40 25 70.0337%

1, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 22, 23, 24, 26, 29, 30, 31, 32, 38, 39, 40 26 70.0337%

1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 22, 23, 24, 26, 29, 30, 31, 32, 38, 39, 40 27 70.7071%

1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 38, 39, 40 28 70.7071%

1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 38, 39, 40 29 70.3704%

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 38, 39, 40 30 70.3704%

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 37, 38, 39, 40 31 70.3704%

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 33, 37, 38, 39, 40 32 70.3704%

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 33, 37, 38, 39, 40 33 70.3704%

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 33, 36, 37, 38, 39, 40 34 70.7071%

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 33, 35, 36, 37, 38, 39, 40 35 70.7071%

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 29, 30, 31, 32, 33, 35, 36, 37, 38, 39, 40 36 70.0337%

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 33, 35, 36, 37, 38, 39, 40 37 70.0337%

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 38 70.7071%

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 39 70.3704%

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 40 69.3603%

The numbers in this table represent the subscripts of the 40 features, X1 to X40.


APPENDIX F

COMPLETE SELECTED SUBSETS AND OCAR FOR CHINESE DATASET

Features    # of Features    OCAR

9 1 71.5556%

9, 36 2 66.7778%

1, 17, 37 3 68.6667%

24, 27, 32 3 69.2222%

19, 27, 32 3 69.2222%

1, 37, 39 3 73.1111%

8, 14, 26 3 72.6667%

8, 14, 25 3 72.6667%

1, 17, 27, 37 4 71.4444%

10, 14, 26, 27 4 71.1111%

10, 14, 25, 27 4 71.1111%

8, 14, 25, 27 4 71.1111%

8, 14, 26, 27 4 71.1111%

10, 14, 16, 26, 27 5 71.2222%

10, 14, 16, 25, 27 5 71.2222%

8, 14, 16, 25, 27 5 71.2222%

8, 14, 16, 26, 27 5 71.2222%

2, 10, 14, 25, 27 5 71.1111%

2, 10, 14, 26, 27 5 71.1111%

8, 14, 21, 25, 27 5 71.1111%

8, 14, 21, 26, 27 5 71.1111%

2, 10, 14, 16, 25, 27 6 71.5556%

2, 10, 14, 16, 26, 27 6 71.5556%

5, 10, 14, 24, 25, 27 6 71.6667%

5, 10, 14, 19, 26, 27 6 71.6667%

5, 10, 14, 24, 26, 27 6 71.6667%

5, 10, 14, 19, 25, 27 6 71.6667%

8, 14, 21, 25, 27, 32 6 72.0000%

8, 14, 21, 26, 27, 32 6 72.0000%

8, 14, 16, 21, 26, 27, 32 7 72.0000%

8, 14, 16, 21, 25, 27, 32 7 72.0000%

5, 10, 14, 16, 24, 25, 27, 32 8 70.4444%

5, 10, 14, 16, 24, 26, 27, 32 8 70.4444%

5, 10, 14, 16, 19, 26, 27, 32 8 70.4444%


APPENDIX F (continued)

Features    # of Features    OCAR

5, 10, 14, 16, 19, 25, 27, 32 8 70.4444%

5, 8, 14, 16, 19, 26, 27, 32 8 70.4444%

5, 8, 14, 16, 19, 25, 27, 32 8 70.4444%

5, 8, 14, 16, 24, 25, 27, 32 8 70.4444%

5, 8, 14, 16, 24, 26, 27, 32 8 70.4444%

5, 10, 14, 16, 19, 26, 27, 32, 39 9 69.3333%

5, 10, 14, 16, 24, 25, 27, 32, 39 9 69.3333%

5, 10, 14, 16, 24, 26, 27, 32, 39 9 69.3333%

5, 10, 14, 16, 19, 25, 27, 32, 39 9 69.3333%

5, 9, 10, 14, 16, 24, 26, 27, 32 9 70.5556%

5, 9, 10, 14, 16, 24, 25, 27, 32 9 70.5556%

5, 9, 10, 14, 16, 19, 25, 27, 32 9 70.5556%

5, 9, 10, 14, 16, 19, 26, 27, 32 9 70.5556%

5, 8, 14, 16, 19, 26, 27, 32, 39 9 69.3333%

5, 8, 14, 16, 19, 25, 27, 32, 39 9 69.3333%

5, 8, 14, 16, 24, 26, 27, 32, 39 9 69.3333%

5, 8, 14, 16, 24, 25, 27, 32, 39 9 69.3333%

5, 9, 10, 14, 16, 19, 25, 27, 32, 39 10 69.3333%

5, 9, 10, 14, 16, 24, 25, 27, 32, 39 10 69.3333%

5, 9, 10, 14, 16, 24, 26, 27, 32, 39 10 69.3333%

5, 9, 10, 14, 16, 19, 26, 27, 32, 39 10 69.3333%

5, 9, 10, 14, 16, 24, 25, 27, 32, 35, 39 11 69.1111%

5, 9, 10, 14, 16, 24, 26, 27, 32, 35, 39 11 69.1111%

5, 9, 10, 14, 16, 19, 25, 27, 32, 35, 39 11 69.1111%

5, 9, 10, 14, 16, 19, 26, 27, 32, 35, 39 11 69.1111%

2, 5, 9, 10, 14, 16, 24, 26, 27, 32, 37, 39 12 69.5556%

2, 5, 9, 10, 14, 16, 24, 25, 27, 32, 37, 39 12 69.5556%

2, 5, 9, 10, 14, 16, 19, 25, 27, 32, 37, 39 12 69.5556%

2, 5, 9, 10, 14, 16, 19, 26, 27, 32, 37, 39 12 69.5556%

1, 5, 9, 10, 14, 16, 24, 25, 27, 32, 37, 39 12 70.1111%

1, 5, 9, 10, 14, 16, 24, 26, 27, 32, 37, 39 12 70.1111%

1, 5, 9, 10, 14, 16, 19, 25, 27, 32, 37, 39 12 70.1111%

1, 5, 9, 10, 14, 16, 19, 26, 27, 32, 37, 39 12 70.1111%

1, 5, 9, 10, 11, 14, 16, 24, 25, 27, 32, 37, 39 13 70.3333%

1, 5, 9, 10, 11, 14, 16, 24, 26, 27, 32, 37, 39 13 70.3333%

1, 5, 9, 10, 11, 14, 16, 19, 26, 27, 32, 37, 39 13 70.3333%

1, 5, 9, 10, 11, 14, 16, 19, 25, 27, 32, 37, 39 13 70.3333%


APPENDIX F (continued)

Features    # of Features    OCAR

1, 5, 8, 9, 10, 14, 16, 19, 26, 27, 32, 37, 39 13 70.2222%

1, 5, 8, 9, 10, 14, 16, 24, 26, 27, 32, 37, 39 13 70.2222%

1, 5, 8, 9, 10, 14, 16, 24, 25, 27, 32, 37, 39 13 70.2222%

1, 5, 8, 9, 10, 14, 16, 19, 25, 27, 32, 37, 39 13 70.2222%

1, 5, 8, 9, 10, 11, 14, 16, 19, 25, 27, 32, 37, 39 14 70.4444%

1, 5, 8, 9, 10, 11, 14, 16, 19, 26, 27, 32, 37, 39 14 70.4444%

1, 5, 8, 9, 10, 11, 14, 16, 24, 25, 27, 32, 37, 39 14 70.4444%

1, 5, 8, 9, 10, 11, 14, 16, 24, 26, 27, 32, 37, 39 14 70.4444%

2, 5, 8, 9, 10, 11, 14, 16, 24, 26, 27, 29, 32, 35, 39 15 69.1111%

2, 5, 8, 9, 10, 11, 14, 16, 19, 26, 27, 29, 32, 35, 39 15 69.1111%

2, 5, 8, 9, 10, 11, 14, 16, 24, 25, 27, 29, 32, 35, 39 15 69.1111%

2, 5, 8, 9, 10, 11, 14, 16, 19, 25, 27, 29, 32, 35, 39 15 69.1111%

2, 5, 8, 9, 10, 11, 14, 16, 20, 24, 26, 27, 29, 32, 35, 39 16 70.2222%

2, 5, 8, 9, 10, 11, 14, 16, 20, 24, 25, 27, 29, 32, 35, 39 16 70.2222%

1, 5, 8, 9, 10, 11, 14, 16, 19, 20, 25, 27, 29, 32, 35, 39 16 70.2222%

2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 26, 27, 29, 32, 35, 39 16 70.2222%

1, 5, 8, 9, 10, 11, 14, 16, 20, 24, 25, 27, 29, 32, 35, 39 16 70.2222%

1, 5, 8, 9, 10, 11, 14, 16, 20, 24, 26, 27, 29, 32, 35, 39 16 70.2222%

2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 25, 27, 29, 32, 35, 39 16 70.2222%

1, 5, 8, 9, 10, 11, 14, 16, 19, 20, 26, 27, 29, 32, 35, 39 16 70.2222%

2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 25, 26, 27, 29, 32, 35, 39 17 70.3333%

2, 5, 8, 9, 10, 11, 14, 16, 20, 24, 25, 26, 27, 29, 32, 35, 39 17 70.3333%

2, 5, 8, 9, 10, 11, 14, 16, 19, 24, 25, 26, 27, 29, 32, 35, 39 17 69.8889%

2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 25, 26, 27, 29, 32, 33, 35, 39 18 70.2222%

2, 5, 8, 9, 10, 11, 14, 16, 20, 24, 25, 26, 27, 29, 32, 33, 35, 39 18 70.2222%

1, 2, 5, 8, 9, 10, 11, 14, 16, 20, 24, 26, 27, 29, 32, 33, 35, 39 18 70.3333%

2, 5, 8, 9, 10, 11, 14, 16, 19, 24, 25, 26, 27, 29, 32, 33, 35, 39 18 70.0000%

1, 2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 25, 27, 29, 32, 33, 35, 39 18 70.3333%

1, 2, 5, 8, 9, 10, 11, 14, 16, 20, 24, 25, 27, 29, 32, 35, 37, 39 18 70.3333%

1, 2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 26, 27, 29, 32, 35, 37, 39 18 70.3333%

2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 24, 25, 26, 27, 32, 33, 35, 37, 39 19 69.7778%

1, 2, 5, 8, 9, 10, 11, 14, 16, 20, 24, 25, 26, 27, 29, 32, 35, 37, 39 19 70.4444%

1, 2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 25, 26, 27, 29, 32, 35, 37, 39 19 70.4444%

1, 2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 24, 25, 26, 27, 32, 33, 35, 37, 39 20 69.7778%

1, 2, 5, 8, 9, 10, 11, 14, 16, 19, 20, 24, 25, 26, 27, 29, 32, 33, 35, 37, 39 21 70.3333%

1, 2, 5, 8, 9, 10, 11, 12, 14, 16, 19, 20, 24, 25, 26, 27, 29, 32, 33, 35, 37, 39 22 70.3333%

1, 2, 5, 8, 9, 10, 11, 12, 14, 16, 19, 20, 21, 24, 25, 26, 27, 29, 32, 33, 35, 37, 39 23 70.5556%


APPENDIX F (continued)

Features    # of Features    OCAR

1, 2, 5, 8, 9, 10, 11, 12, 14, 16, 19, 20, 21, 23, 24, 25, 26, 27, 29, 32, 33, 35, 37, 39 24 70.5556%

1, 2, 5, 8, 9, 10, 11, 12, 14, 16, 19, 20, 21, 23, 24, 25, 26, 27, 29, 32, 33, 35, 36, 37, 39 25 70.8889%

1, 2, 5, 6, 8, 9, 10, 11, 12, 14, 16, 19, 20, 21, 23, 24, 25, 26, 27, 29, 32, 33, 35, 36, 37, 39 26 70.8889%

1, 2, 5, 6, 8, 9, 10, 11, 12, 14, 16, 18, 19, 20, 21, 23, 24, 25, 26, 27, 29, 32, 33, 35, 36, 37, 39 27 71.0000%

1, 2, 5, 6, 8, 9, 10, 11, 12, 14, 16, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 32, 33, 35, 36, 37, 39 28 71.4444%

1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 14, 16, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 32, 33, 35, 36, 37, 39 29 71.4444%

1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 14, 16, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 39 30 70.1111%

1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 14, 16, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 35, 36, 37, 39 30 71.4444%

1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 14, 16, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 39 31 70.3333%

1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 39 31 70.5556%

1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 39 32 70.5556%

1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 39 33 70.4444%

1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 39 34 70.5556%

1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 39 35 69.0000%

1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 39 36 69.0000%

1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 33, 34, 35, 36, 37, 38, 39 37 68.8889%

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 33, 34, 35, 36, 37, 38, 39 38 69.0000%

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 39 69.0000%

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40 40 69.7778%

The numbers in this table are the subscripts of the 40 features, X1 through X40.
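Each row above pairs a candidate feature subset with its overall classification accuracy rate (OCAR). As a minimal illustration only (not the dissertation's own evaluation code), the following Python sketch shows how the OCAR of one subset could be estimated by cross-validation; the feature matrix X (n samples by 40 columns), the label vector y, and the choice of logistic regression as the base classifier are all assumptions made for the example.

# Minimal sketch (illustrative assumptions, not the dissertation's procedure):
# estimate the overall classification accuracy rate (OCAR) of one feature subset.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def ocar_for_subset(X, y, subset, cv=10):
    """Cross-validated accuracy for the columns named in `subset`.

    `subset` holds 1-based feature indices (X1..X40), matching the table above;
    X is assumed to be a NumPy array of shape (n_samples, 40) and y a 0/1 vector.
    """
    cols = [i - 1 for i in subset]            # convert to 0-based column positions
    clf = LogisticRegression(max_iter=1000)   # any base classifier could stand in here
    scores = cross_val_score(clf, X[:, cols], y, cv=cv, scoring="accuracy")
    return scores.mean()

# Example call for the 15-feature subset in the first row of this page
# (X and y are hypothetical and must be supplied by the reader):
# subset = [2, 5, 8, 9, 10, 11, 14, 16, 19, 25, 27, 29, 32, 35, 39]
# print(f"OCAR = {ocar_for_subset(X, y, subset):.4%}")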


APPENDIX G

SENSITIVITY AND 1-SPECIFICITY FOR THE U.S. DATASET

SVM Logistic Regression Discriminant Analysis Neural Networks Decision Tree

Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe.

.0000 1.000 1.000 .0000 1.000 1.000 .0000 1.000 1.000 .0000 1.000 1.000 .0000 1.000 1.000

.0787 1.000 .971 .0543 1.000 .971 .0568 1.000 .971 .1598 1.000 .971 .4395 .316 .118

.0853 1.000 .941 .0660 1.000 .941 .0699 1.000 .941 .1646 1.000 .941 1.0000 .000 .000

.0918 1.000 .912 .0781 1.000 .912 .0840 1.000 .912 .1691 1.000 .912

.0994 1.000 .882 .0908 1.000 .882 .0969 1.000 .882 .1739 1.000 .882

.1126 1.000 .853 .0971 1.000 .853 .1042 1.000 .853 .1758 1.000 .853

.1231 1.000 .824 .1094 1.000 .824 .1136 1.000 .824 .1851 1.000 .824

.1266 1.000 .794 .1224 1.000 .794 .1236 1.000 .794 .1934 1.000 .794

.1305 1.000 .765 .1260 1.000 .765 .1281 1.000 .765 .1950 1.000 .765

.1359 1.000 .735 .1280 1.000 .735 .1294 1.000 .735 .1978 1.000 .735

.1437 1.000 .706 .1298 1.000 .706 .1309 1.000 .706 .1995 1.000 .706

.1488 1.000 .676 .1306 1.000 .676 .1319 1.000 .676 .1998 .947 .706

.1506 1.000 .647 .1330 1.000 .647 .1350 1.000 .647 .2002 .947 .676

.1530 1.000 .618 .1428 .947 .647 .1447 .947 .647 .2038 .947 .647

.1583 .947 .618 .1540 .947 .618 .1580 .947 .618 .2092 .895 .647

.1624 .895 .618 .1691 .895 .618 .1699 .895 .618 .2186 .895 .618

.1645 .895 .588 .1817 .895 .588 .1781 .842 .618 .2262 .895 .588

.1696 .842 .588 .1861 .895 .559 .1824 .842 .588 .2279 .895 .559

.1748 .842 .559 .1923 .842 .559 .1885 .842 .559 .2297 .842 .559

.1834 .842 .529 .1953 .789 .559 .1963 .842 .529 .2381 .842 .529

.1928 .842 .500 .1988 .789 .529 .1998 .789 .529 .2468 .789 .529

.1970 .842 .471 .2057 .789 .500 .2010 .789 .500 .2480 .789 .500

.1998 .842 .441 .2110 .789 .471 .2023 .789 .471 .2492 .737 .500

.2010 .842 .412 .2156 .737 .471 .2028 .789 .441 .2517 .737 .471

.2013 .789 .412 .2197 .737 .441 .2134 .737 .441 .2566 .684 .471

.2062 .789 .382 .2282 .684 .441 .2289 .684 .441 .2651 .684 .441

.2143 .789 .353 .2436 .684 .412 .2402 .684 .412 .2712 .684 .412

.2202 .789 .324 .2533 .684 .382 .2465 .684 .382 .2752 .684 .382

.2229 .789 .294 .2559 .684 .353 .2472 .684 .353 .2813 .684 .353

.2302 .789 .265 .2644 .684 .324 .2555 .684 .324 .2881 .684 .324

.2454 .737 .265 .2748 .684 .294 .2649 .684 .294 .2923 .684 .294

.2547 .737 .235 .2836 .684 .265 .2738 .684 .265 .2954 .684 .265

.2564 .684 .235 .2925 .684 .235 .2819 .684 .235 .2992 .684 .235

.2609 .684 .206 .2963 .632 .235 .2856 .632 .235 .3024 .632 .235


APPENDIX G (continued)

SVM Logistic Regression Discriminant Analysis Neural Networks Decision Tree

Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe.

.2676 .684 .176 .3015 .579 .235 .2893 .632 .206 .3166 .579 .235

.2841 .684 .147 .3072 .579 .206 .2907 .579 .206 .3291 .526 .235

.3062 .684 .118 .3149 .526 .206 .3005 .526 .206 .3345 .526 .206

.3187 .632 .118 .3290 .474 .206 .3150 .474 .206 .3397 .474 .206

.3299 .579 .118 .3378 .474 .176 .3228 .421 .206 .3434 .474 .176

.3446 .579 .088 .3407 .474 .147 .3256 .368 .206 .3487 .474 .147

.3571 .579 .059 .3443 .421 .147 .3287 .368 .176 .3590 .421 .147

.3765 .526 .059 .3571 .368 .147 .3315 .368 .147 .3763 .368 .147

.4270 .474 .059 .3714 .368 .118 .3416 .368 .118 .3922 .316 .147

.4815 .421 .059 .3793 .316 .118 .3531 .316 .118 .4011 .316 .118

.5037 .368 .059 .3957 .316 .088 .3721 .316 .088 .4041 .316 .088

.5081 .316 .059 .4156 .263 .088 .3973 .263 .088 .4107 .263 .088

.5117 .263 .059 .4332 .263 .059 .4146 .263 .059 .4265 .263 .059

.5229 .211 .059 .4861 .211 .059 .4481 .211 .059 .5103 .211 .059

.6365 .158 .059 .5768 .158 .059 .5271 .158 .059 .6279 .158 .059

.7626 .105 .059 .6329 .105 .059 .5930 .105 .059 .6783 .105 .059

.7941 .053 .059 .7063 .053 .059 .6713 .053 .059 .7575 .053 .059

.8269 .053 .029 .7967 .053 .029 .7565 .053 .029 .8491 .053 .029

.8815 .000 .029 .8550 .053 .000 .8120 .053 .000 .8744 .053 .000

1.0000 .000 .000 1.0000 .000 .000 1.0000 .000 .000 1.0000 .000 .000

Note: Sen. refers to Sensitivity; Spe. refers to Specificity
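Each row in Appendices G and H is an operating point on a ROC curve: at a given score cutoff, sensitivity is the fraction of bad loans classified as bad, and 1-specificity is the fraction of good loans classified as bad. As a minimal sketch under assumed inputs (a vector of predicted default probabilities `scores` and true labels `y`, neither taken from the dissertation), such triples could be recomputed as follows.

# Minimal sketch (assumed inputs, not the dissertation's code): recompute the
# (cutoff, sensitivity, 1-specificity) triples tabulated in this appendix.
import numpy as np

def roc_points(y, scores):
    """Yield (cutoff, sensitivity, 1 - specificity) for each distinct cutoff.

    y is a 0/1 array (1 = bad loan); scores are predicted default probabilities.
    """
    y = np.asarray(y)
    scores = np.asarray(scores)
    positives = (y == 1).sum()
    negatives = (y == 0).sum()
    for cutoff in np.unique(scores):
        pred = scores >= cutoff                       # classify as "bad" at this cutoff
        tp = np.sum(pred & (y == 1))                  # true positives
        fp = np.sum(pred & (y == 0))                  # false positives
        yield cutoff, tp / positives, fp / negatives  # sensitivity, 1 - specificity

# Usage with hypothetical arrays y and scores:
# for cutoff, sen, one_minus_spe in roc_points(y, scores):
#     print(f"{cutoff:.4f}  {sen:.3f}  {one_minus_spe:.3f}")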


APPENDIX H

SENSITIVITY AND 1-SPECIFICITY FOR THE CHINESE DATASET

SVM Logistic Regression Discriminant Analysis Neural Networks Decision Tree

Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe.

0.0000 1.000 1.000 0.0000 1.000 1.000 0.0000 1.000 1.000 0.0000 1.000 1.000 0.0000 1.000 1.000

0.1885 .975 1.000 0.0605 1.000 .986 0.0855 1.000 .986 0.0670 1.000 .986 0.1748 .900 .521

0.1950 .975 .986 0.0647 1.000 .973 0.0901 1.000 .973 0.0678 1.000 .973 0.2598 .825 .438

0.1967 .975 .973 0.0729 .975 .973 0.0990 .975 .973 0.0699 .975 .973 0.3761 .775 .329

0.1993 .950 .973 0.0794 .975 .959 0.1058 .975 .959 0.0719 .975 .959 0.5039 .675 .288

0.2010 .950 .959 0.0988 .975 .945 0.1248 .975 .945 0.0778 .975 .945 1.0000 0.000 0.000

0.2032 .950 .945 0.1277 .975 .932 0.1549 .975 .932 0.0911 .975 .932

0.2081 .950 .932 0.1449 .950 .932 0.1709 .950 .932 0.0999 .975 .918

0.2136 .950 .918 0.1516 .950 .918 0.1761 .950 .918 0.1010 .975 .904

0.2161 .950 .904 0.1528 .950 .904 0.1774 .950 .904 0.1039 .975 .890

0.2166 .950 .890 0.1540 .950 .890 0.1783 .950 .890 0.1068 .975 .877

0.2184 .950 .877 0.1570 .950 .877 0.1809 .950 .877 0.1086 .975 .863

0.2208 .950 .863 0.1609 .950 .863 0.1832 .950 .863 0.1106 .975 .849

0.2219 .950 .849 0.1620 .950 .849 0.1846 .950 .849 0.1204 .975 .836

0.2226 .950 .836 0.1677 .950 .836 0.1896 .950 .836 0.1305 .975 .822

0.2241 .950 .822 0.1741 .950 .822 0.1951 .950 .822 0.1344 .975 .808

0.2252 .950 .808 0.1815 .950 .808 0.2032 .950 .808 0.1429 .975 .795

0.2255 .950 .795 0.1893 .925 .808 0.2124 .950 .795 0.1501 .975 .781

0.2257 .950 .781 0.1993 .925 .795 0.2187 .925 .795 0.1522 .975 .767

0.2261 .950 .767 0.2099 .925 .781 0.2249 .925 .781 0.1530 .975 .753

0.2265 .950 .753 0.2118 .925 .767 0.2274 .925 .767 0.1540 .975 .740

0.2269 .950 .740 0.2132 .925 .753 0.2280 .925 .753 0.1566 .975 .726

0.2275 .950 .726 0.2160 .925 .740 0.2298 .925 .740 0.1592 .975 .712

0.2280 .950 .712 0.2194 .925 .726 0.2329 .925 .726 0.1604 .975 .699

0.2284 .925 .712 0.2216 .925 .712 0.2354 .925 .712 0.1655 .950 .699

0.2294 .925 .699 0.2249 .925 .699 0.2382 .925 .699 0.1714 .950 .685

0.2301 .925 .685 0.2346 .925 .685 0.2454 .925 .685 0.1754 .950 .671

0.2303 .925 .671 0.2422 .925 .671 0.2521 .925 .671 0.1795 .950 .658

0.2305 .925 .658 0.2441 .925 .658 0.2536 .925 .658 0.1815 .950 .644

0.2309 .925 .644 0.2453 .925 .644 0.2561 .925 .644 0.1856 .950 .630

0.2315 .925 .630 0.2471 .925 .630 0.2598 .925 .630 0.2000 .950 .616

0.2320 .900 .630 0.2498 .925 .616 0.2623 .925 .616 0.2128 .950 .603

0.2326 .900 .616 0.2532 .925 .603 0.2639 .925 .603 0.2154 .950 .589

0.2331 .900 .603 0.2562 .900 .603 0.2654 .925 .589 0.2219 .950 .575


APPENDIX H (continued)

SVM Logistic Regression Discriminant Analysis Neural Networks Decision Tree

Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe.

0.2335 .900 .589 0.2596 .900 .589 0.2680 .900 .589 0.2293 .950 .562

0.2337 .900 .575 0.2631 .875 .589 0.2717 .875 .589 0.2319 .950 .548

0.2340 .900 .562 0.2662 .875 .575 0.2749 .875 .575 0.2335 .950 .534

0.2347 .875 .562 0.2701 .875 .562 0.2782 .875 .562 0.2410 .925 .534

0.2356 .875 .548 0.2751 .875 .548 0.2812 .875 .548 0.2488 .925 .521

0.2363 .875 .534 0.2804 .875 .534 0.2837 .875 .534 0.2516 .900 .521

0.2367 .875 .521 0.2843 .875 .521 0.2859 .875 .521 0.2573 .900 .507

0.2367 .875 .507 0.2865 .875 .507 0.2866 .875 .507 0.2642 .900 .493

0.2368 .850 .507 0.2874 .875 .493 0.2899 .875 .493 0.2699 .875 .493

0.2369 .850 .493 0.2884 .875 .479 0.2943 .875 .479 0.2797 .875 .479

0.2382 .850 .479 0.2916 .850 .479 0.2971 .875 .466 0.2909 .850 .479

0.2396 .850 .466 0.2954 .850 .466 0.2996 .875 .452 0.3044 .850 .466

0.2397 .850 .452 0.3040 .850 .452 0.3040 .850 .452 0.3155 .850 .452

0.2399 .850 .438 0.3133 .825 .452 0.3083 .825 .452 0.3187 .850 .438

0.2401 .825 .438 0.3181 .825 .438 0.3121 .825 .438 0.3231 .850 .425

0.2410 .825 .425 0.3216 .825 .425 0.3157 .825 .425 0.3332 .850 .411

0.2421 .825 .411 0.3222 .825 .411 0.3167 .825 .411 0.3450 .850 .397

0.2429 .825 .397 0.3234 .800 .411 0.3189 .825 .397 0.3498 .850 .384

0.2435 .825 .384 0.3263 .800 .397 0.3221 .825 .384 0.3513 .850 .370

0.2443 .825 .370 0.3285 .800 .384 0.3236 .800 .384 0.3551 .825 .370

0.2480 .825 .356 0.3296 .800 .370 0.3242 .800 .370 0.3599 .800 .370

0.2514 .825 .342 0.3326 .800 .356 0.3260 .800 .356 0.3658 .800 .356

0.2518 .825 .329 0.3371 .800 .342 0.3278 .775 .356 0.3706 .800 .342

0.2520 .825 .315 0.3430 .775 .342 0.3340 .775 .342 0.3725 .800 .329

0.2536 .800 .315 0.3489 .775 .329 0.3402 .750 .342 0.3786 .800 .315

0.2577 .775 .315 0.3525 .750 .329 0.3409 .750 .329 0.3835 .800 .301

0.2606 .775 .301 0.3543 .750 .315 0.3422 .750 .315 0.3899 .800 .288

0.2622 .750 .301 0.3571 .750 .301 0.3451 .750 .301 0.3963 .800 .274

0.2652 .750 .288 0.3594 .750 .288 0.3474 .750 .288 0.3980 .775 .274

0.2678 .750 .274 0.3602 .750 .274 0.3490 .750 .274 0.4018 .750 .274

0.2694 .725 .274 0.3611 .750 .260 0.3517 .725 .274 0.4045 .750 .260

0.2707 .700 .274 0.3646 .725 .260 0.3545 .725 .260 0.4061 .750 .247

0.2725 .675 .274 0.3683 .725 .247 0.3561 .725 .247 0.4084 .725 .247

0.2748 .675 .260 0.3700 .700 .247 0.3574 .700 .247 0.4128 .700 .247

0.2767 .675 .247 0.3721 .700 .233 0.3585 .675 .247 0.4207 .675 .247

0.2788 .650 .247 0.3732 .675 .233 0.3589 .675 .233 0.4256 .650 .247

0.2814 .625 .247 0.3738 .650 .233 0.3602 .650 .233 0.4258 .650 .233


APPENDIX H (continued)

SVM Logistic Regression Discriminant Analysis Neural Networks Decision Tree

Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe.

0.2830 .625 .233 0.3757 .625 .233 0.3625 .625 .233 0.4258 .625 .233

0.2905 .625 .219 0.3779 .625 .219 0.3646 .625 .219 0.4258 .625 .219

0.3033 .625 .205 0.3783 .600 .219 0.3664 .625 .205 0.4259 .525 .205

0.3140 .600 .205 0.3803 .600 .205 0.3677 .625 .192 0.4261 .525 .192

0.3196 .600 .192 0.3833 .600 .192 0.3678 .600 .192 0.4278 .500 .192

0.3396 .575 .192 0.3856 .600 .178 0.3681 .575 .192 0.4295 .475 .192

0.3601 .575 .178 0.3872 .575 .178 0.3732 .550 .192 0.4408 .450 .192

0.3613 .550 .178 0.3920 .550 .178 0.3804 .525 .192 0.4525 .425 .192

0.3637 .550 .164 0.3968 .525 .178 0.3835 .525 .178 0.4579 .425 .178

0.3677 .550 .151 0.3991 .500 .178 0.3843 .500 .178 0.4637 .400 .178

0.3704 .525 .151 0.4031 .475 .178 0.3845 .475 .178 0.4682 .375 .178

0.3722 .500 .151 0.4061 .475 .164 0.3849 .475 .164 0.4744 .350 .178

0.3802 .500 .137 0.4190 .475 .151 0.3988 .475 .151 0.4821 .350 .164

0.3873 .475 .137 0.4322 .450 .151 0.4165 .450 .151 0.4905 .350 .151

0.3918 .450 .137 0.4375 .425 .151 0.4214 .425 .151 0.4949 .325 .151

0.3998 .425 .137 0.4419 .400 .151 0.4220 .425 .137 0.4997 .325 .137

0.4077 .400 .137 0.4448 .400 .137 0.4220 .425 .123 0.5074 .325 .123

0.4277 .400 .123 0.4485 .400 .123 0.4269 .400 .123 0.5127 .325 .110

0.4550 .375 .123 0.4582 .400 .110 0.4349 .375 .123 0.5159 .325 .096

0.4701 .375 .110 0.4715 .350 .110 0.4422 .350 .123 0.5179 .300 .096

0.4993 .375 .096 0.4791 .325 .110 0.4465 .350 .110 0.5185 .300 .082

0.5317 .375 .082 0.4836 .300 .110 0.4478 .325 .110 0.5220 .275 .082

0.5394 .350 .082 0.4936 .275 .110 0.4503 .300 .110 0.5275 .250 .082

0.5467 .325 .082 0.5048 .275 .096 0.4671 .275 .110 0.5389 .225 .082

0.5721 .300 .082 0.5083 .250 .096 0.4845 .275 .096 0.5558 .225 .068

0.5925 .275 .082 0.5165 .225 .096 0.4875 .275 .082 0.5687 .200 .068

0.5948 .275 .068 0.5306 .225 .082 0.4893 .250 .082 0.5743 .175 .068

0.5959 .250 .068 0.5432 .225 .068 0.4949 .250 .068 0.5769 .150 .068

0.6020 .250 .055 0.5618 .200 .068 0.5071 .225 .068 0.5805 .150 .055

0.6138 .225 .055 0.5760 .175 .068 0.5160 .225 .055 0.5828 .150 .041

0.6215 .200 .055 0.5853 .175 .055 0.5329 .200 .055 0.5866 .125 .041

0.6246 .175 .055 0.6314 .150 .055 0.5571 .175 .055 0.5989 .100 .041

0.6381 .150 .055 0.6762 .150 .041 0.6007 .150 .055 0.6130 .075 .041

0.6559 .125 .055 0.7051 .150 .027 0.6530 .150 .041 0.6198 .075 .027

0.6618 .100 .055 0.8006 .125 .027 0.6806 .150 .027 0.6225 .050 .027

0.6730 .100 .041 0.8985 .100 .027 0.7677 .125 .027 0.6277 .025 .027

0.6856 .100 .027 0.9441 .100 .014 0.8741 .100 .027 0.6390 0.000 .027


APPENDIX H (continued)

SVM Logistic Regression Discriminant Analysis Neural Networks Decision Tree

Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe. Cutoff Sen. 1-Spe.

0.6919 .100 .014 0.9745 .075 .014 0.9277 .100 .014 0.6502 0.000 .014

0.7094 .075 .014 0.9892 .075 0.000 0.9645 .075 .014 1.0000 0.000 0.000

0.7234 .075 0.000 0.9958 .050 0.000 0.9838 .075 0.000

0.7378 .050 0.000 0.9984 .025 0.000 0.9928 .050 0.000

0.7712 .025 0.000 1.0000 0.000 0.000 0.9970 .025 0.000

1.0000 0.000 0.000 1.0000 0.000 0.000

Note: Sen. refers to Sensitivity; Spe. refers to Specificity



VITA

Name: Jun Huang

Address: Room 201, No. 46, Guanghanzhi Street, Guangzhou, China, 510224

Education: PhD in International Business, A.R. Sanchez, Jr. School of Business, Texas A&M International University, May 2014

MSc in International Management, Business School of Oxford Brookes University, January 2006

Bachelor of Management in Accounting of Foreign Affairs, School of Economics & Management, Guangdong University of Technology, July 2004