Dynamic Bayesian modeling for risk prediction in credit operations

Hanen Borchani 1, Ana M. Martínez 1, Andrés R. Masegosa 2, Helge Langseth 2, Thomas D. Nielsen 1, Antonio Salmerón 3, Antonio Fernández 4, Anders L. Madsen 1,5, Ramón Sáez 4

1 Department of Computer Science, Aalborg University, Denmark
2 Department of Computer and Information Science, The Norwegian University of Science and Technology, Norway
3 Department of Mathematics, University of Almería, Spain
4 Banco de Crédito Cooperativo, Spain
5 Hugin Expert A/S, Aalborg, Denmark

Scandinavian Conference on Artificial Intelligence, Halmstad, November 5–6, 2015


Outline

1 Introduction

2 The financial data set

3 Risk prediction using dynamic Bayesian networks

4 Experimental results

5 Conclusion


Introduction

- Efficient solutions for risk prediction in banks can be crucial for reducing losses due to inefficient business procedures.

- Such solutions can be used as tools for monitoring the evolution of customers in terms of credit operations risk, to increase the solvency of the banking institutions.

- From a machine learning perspective, credit scoring has traditionally been approached as a supervised classification problem.

- More recently, however, the problem presents additional challenging characteristics that separate it from standard supervised classification problems.


Challenges

- Classification in a streaming context: a stream of multiple sequences received over time, each sequence representing a particular client. That is, at every time step $t$, we receive the data $D_t$ containing information about all the clients.

- Delayed class feedback: the class label for each sample/client corresponds to the client’s defaulting behavior in the following twelve months, and this information is therefore only available after a twelve-month delay. Thus, the available data is a mixture of labeled and unlabeled samples.

- Concept drift: the domain exhibits a form of concept drift where the data distribution, as well as the set of feature variables relevant for classification, may vary over time.

Objective: Explore the credit scoring problem based on a real-world data set.
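
The delayed class feedback can be illustrated with a small sketch. It assumes months are indexed by consecutive integers starting at 0. The function name `split_by_label_availability` and its parameters are hypothetical, introduced only for illustration.

```python
def split_by_label_availability(current_month, delay=12, first_month=0):
    """Split month indices first_month..current_month into labeled and unlabeled.

    A month t counts as labeled once its outcome window (the following
    `delay` months) has fully elapsed, i.e. when t + delay <= current_month.
    The integer month indexing is an assumption made for this sketch.
    """
    months = range(first_month, current_month + 1)
    labeled = [t for t in months if t + delay <= current_month]
    unlabeled = [t for t in months if t + delay > current_month]
    return labeled, unlabeled


labeled, unlabeled = split_by_label_availability(current_month=14)
print(labeled)         # months whose class labels are already known
print(len(unlabeled))  # months still waiting for their labels
```

At any current month, the most recent twelve months of data are therefore unlabeled, which is why the available data mixes labeled and unlabeled samples.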


The financial data set

- Provided by a Spanish bank in the Almería region: Banco de Crédito Cooperativo (BCC).

- It contains monthly aggregated information for a set of BCC clients for the period from April 2007 to March 2014.

- Only “active” clients are considered, meaning that we restrict our attention to individuals between 18 and 65 years of age who have at least one automatic bill payment or direct debit in the bank.

- BCC employees are excluded since they have special conditions.

- The resulting data set includes 50 000 clients each month.


The financial data set

- 44 feature variables, denoted $X_t$: 11 variables describe the financial status of a client (VARXX) and 33 are socio-demographic variables (SOCXX).

Variable ID   Description
VAR01         Total credit amount
VAR02         Income
VAR03         Expenses
VAR04         Account balance
VAR05         Risk balance in mortgages
VAR06         Risk balance in consumer loans
VAR07         Unpaid amount in mortgages
VAR08         Unpaid amount in personal loans
VAR09         Unpaid amount in credit cards
VAR10         Unpaid amount in bank account deficit
VAR11         Unpaid amount in other products
SOC01-33      Set of 33 socio-demographic variables

- Each client $u$ has an associated class variable $C_t^{(u)}$ for each time step $t$ that indicates whether that particular client will default during the following 12 months.


Dynamic Bayesian classifiers

- A dynamic probabilistic model for doing prediction in the BCC domain.

- At time $T$ (the current time), we predict the defaulting status ($C_T$) of a particular client based on previous socio-economical observations and the client’s known defaulting status $\lambda = 12$ months earlier.

[...] for a set of BCC clients for the period from April 2007 to March 2014. Since the customer information is received on a monthly basis, we can consider the credit scoring problem as a supervised classification problem within a streaming context. The problem does, however, also have some distinguishing characteristics that separate it from standard streaming problems: Firstly, instead of receiving a single sequence of data over time, we are faced with a stream of multiple sequences, each sequence representing a particular client. That is, at every time step $t$ (which for the BCC data set corresponds to every month), we receive the data $D_t$ containing information about all the clients. Secondly, in a conventional streaming data setting, the classification model would typically be trained with a subset of the observations collected up to time $t$, which would afterwards be used for predicting the class values of new instances received at time $t$. This is, however, not applicable in the BCC setting, since the class label for each sample/client corresponds to the client’s defaulting behavior in the following twelve months and this information is therefore only available after a twelve-month delay. Thus, the available data is a mixture of labeled and unlabeled samples. Thirdly, the domain exhibits a form of concept drift [3], where the set of feature variables relevant for classification may vary from one month to the next. Although these characteristics may at first seem as ad-hoc peculiarities of the BCC data set, they in fact apply to most credit scoring problems as well as many other domains. We will discuss this issue further in Section 5, which also serves to demonstrate the broader relevance of the above mentioned problems.

In this paper we present a first approach to address the BCC credit scoring problem³ based on the use of a simple dynamic probabilistic graphical model [5]. A rough visual description of this model is given in Figure 1. Our preliminary approach is implemented based on the AMIDST Toolbox⁴. This toolbox provides an efficient implementation of approximate inference and learning methods for streaming data using the Bayesian networks modeling framework [5] as well as variational Bayes inference and learning procedures [6].

[Diagram: class nodes $C_{T-12}, C_{T-11}, C_{T-10}, \dots, C_{T-1}, C_T$ and observation nodes $X_{T-11}, X_{T-10}, \dots, X_{T-1}, X_T$.]

Figure 1. A dynamic probabilistic model for doing prediction in the BCC domain. At time $T$ (assumed to be the current time) we wish to predict the defaulting status ($C_T$) of a particular customer based on previous socio-economical observations as well as the customer’s known defaulting status $\lambda = 12$ months earlier. Note that due to the independence assumptions in the model, $X_{T-12}$ and all observations prior to $T-12$ become irrelevant, and are therefore not shown. Square/round boxes indicate data which is available/non-available when predicting the defaulting status of the clients at month $T$.

³ The presented models are not related to the current scoring models implemented in BCC.

⁴ AMIDST is an open source toolbox available at http://amidst.github.io/toolbox/ under the Apache Software License version 2.0.


Dynamic Bayesian classifiers

- A 2-time-slice Dynamic Naïve Bayes classifier

[Diagram: two time slices. Class nodes $C_{t-1}$ and $C_t$; feature nodes $X_{1,t-1}, X_{2,t-1}, \dots, X_{n,t-1}$ and $X_{1,t}, X_{2,t}, \dots, X_{n,t}$.]

- It assumes that only the class variables are connected across time and that all the predictive variables at time step $t$ are conditionally independent given the class variable at time $t$.

- The joint probability factorizes as

$$p(c_{1:T}, x_{1:T}) = \prod_{t=1}^{T} p(c_t \mid c_{t-1}) \prod_{i=1}^{n} p(x_{i,t} \mid c_t).$$
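
As a minimal sketch of this factorization, the function below evaluates the log of the joint probability for one client. It assumes discrete features stored as lookup tables and a separate prior standing in for $p(c_1 \mid c_0)$. The slides cover multinomial and normally distributed variables, so the purely discrete setup, the name `dnb_log_joint`, and all parameter values are illustrative assumptions.

```python
import math


def dnb_log_joint(classes, features, prior, transition, emission):
    """Log of p(c_{1:T}, x_{1:T}) for a dynamic naive Bayes model.

    classes:    class indices c_1..c_T
    features:   feature tuples, features[t][i] = x_{i,t}
    prior:      prior[c] = p(c_1 = c), standing in for p(c_1 | c_0)
    transition: transition[c_prev][c] = p(c_t = c | c_{t-1} = c_prev)
    emission:   emission[i][c][x] = p(x_{i,t} = x | c_t = c)
    """
    log_p = 0.0
    for t, (c, x) in enumerate(zip(classes, features)):
        if t == 0:
            log_p += math.log(prior[c])
        else:
            log_p += math.log(transition[classes[t - 1]][c])
        # Features are conditionally independent given the class at time t.
        for i, x_i in enumerate(x):
            log_p += math.log(emission[i][c][x_i])
    return log_p


# Toy parameters (made up for illustration): 2 classes, 2 binary features.
prior = [0.9, 0.1]
transition = [[0.95, 0.05], [0.30, 0.70]]
emission = [
    [[0.8, 0.2], [0.3, 0.7]],  # feature 1: rows are classes, columns are values
    [[0.6, 0.4], [0.1, 0.9]],  # feature 2
]
log_p = dnb_log_joint([0, 1], [(0, 0), (1, 1)], prior, transition, emission)
print(math.exp(log_p))
```

Working in log space avoids numerical underflow when the product runs over many time steps and features.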


Dynamic Bayesian classifiers

Learning the model

- Bayesian approach for multinomial and normally distributed data.

- $p(x_{i,t} \mid c_t)$ are learned from the labeled data $D_{T-\lambda}$.

- $p(c_t \mid c_{t-1})$ are learned using the class transitions from $D_{T-\lambda-1}$ to $D_{T-\lambda}$.
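
One simple way to picture the Bayesian estimate of the class transitions is a posterior-mean estimate under a symmetric Dirichlet prior, sketched below. The slides do not state the prior or the estimator, so the Dirichlet prior, `alpha = 1.0`, and the function name `transition_posterior_mean` are assumptions made here. This is not the AMIDST implementation.

```python
def transition_posterior_mean(prev_labels, curr_labels, n_classes=2, alpha=1.0):
    """Posterior-mean estimate of p(c_t | c_{t-1}) with a symmetric Dirichlet prior.

    prev_labels[u] and curr_labels[u] hold the class of client u in two
    consecutive labeled months (standing in for D_{T-lambda-1} and D_{T-lambda}).
    """
    counts = [[0] * n_classes for _ in range(n_classes)]
    for c_prev, c_curr in zip(prev_labels, curr_labels):
        counts[c_prev][c_curr] += 1
    table = []
    for row in counts:
        total = sum(row) + alpha * n_classes
        table.append([(n + alpha) / total for n in row])
    return table


# Toy labels for five clients (0 = non-defaulting, 1 = defaulting).
prev = [0, 0, 0, 1, 1]
curr = [0, 0, 1, 1, 0]
table = transition_posterior_mean(prev, curr)
print(table)
```

The prior pseudo-counts keep every transition probability strictly positive, which matters when defaulting clients are rare in a given month.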

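Prediction

- It amounts to calculating the conditional probability for the class label for each client $u$ at time $T$ given all the information collected so far, $D_{1:T}$:

$$p\left(c_t^{(u)} \mid x_{t-\lambda+1:t}^{(u)}, c_{t-\lambda}^{(u)}\right) \propto p\left(x_t^{(u)} \mid c_t^{(u)}\right) \sum_{c_{t-1}^{(u)}} p\left(c_t^{(u)} \mid c_{t-1}^{(u)}\right) p\left(c_{t-1}^{(u)} \mid x_{t-\lambda+1:t-1}^{(u)}, c_{t-\lambda}^{(u)}\right)$$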
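
The recursion above can be sketched as a forward filtering pass that starts from the known label at time $t-\lambda$ and absorbs one month of features at a time. The sketch assumes discrete features and lookup-table parameters. The name `predict_class_posterior` and the toy numbers are hypothetical.

```python
def predict_class_posterior(known_class, feature_seq, transition, emission):
    """Filter forward from a known class label to the current month.

    known_class: observed class at time t - lambda
    feature_seq: feature tuples for times t - lambda + 1, ..., t
    transition:  transition[c_prev][c] = p(c_t = c | c_{t-1} = c_prev)
    emission:    emission[i][c][x] = p(x_{i,t} = x | c_t = c)
    Returns the normalized distribution over the class at time t.
    """
    n_classes = len(transition)
    # Start from a point mass on the known label at time t - lambda.
    belief = [1.0 if c == known_class else 0.0 for c in range(n_classes)]
    for x in feature_seq:
        new_belief = []
        for c in range(n_classes):
            # Sum over c_{t-1}: transition probability times previous posterior.
            predicted = sum(transition[cp][c] * belief[cp] for cp in range(n_classes))
            # Naive Bayes likelihood p(x_t | c_t) as a product over features.
            likelihood = 1.0
            for i, x_i in enumerate(x):
                likelihood *= emission[i][c][x_i]
            new_belief.append(likelihood * predicted)
        norm = sum(new_belief)
        belief = [b / norm for b in new_belief]
    return belief


# Toy parameters (made up): 2 classes, one binary feature, two months of data.
transition = [[0.95, 0.05], [0.30, 0.70]]
emission = [[[0.8, 0.2], [0.3, 0.7]]]
posterior = predict_class_posterior(0, [(1,), (1,)], transition, emission)
print(posterior)
```

With $\lambda = 12$, `feature_seq` would hold the twelve most recent monthly observations for the client.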


Dynamic Bayesian classifiers

Feature subset selection

- The relevance of the variables may vary over time.

- We consider a wrapper feature selection method with the Naïve Bayes model as the base classifier combined with greedy search.

- The area under the curve (AUC) was used as the objective function, because AUC usually performs well even if the data has class imbalance.

- In our case, the feature selection method is performed at each time step to infer which variables are helpful in separating defaulters from non-defaulters.
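
A wrapper of this kind can be sketched as greedy forward selection around a naïve Bayes scorer, with AUC as the objective. The slides do not specify the search direction, the evaluation split, or the feature distributions. Forward search, a held-out validation set, Gaussian features, binary classes, and all function names below are assumptions for illustration.

```python
import math


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_gaussian_nb(X, y, feats):
    """Per-class prior plus mean and variance for each selected feature."""
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        stats = {}
        for j in feats:
            vals = [r[j] for r in rows]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-6
            stats[j] = (mean, var)
        model[c] = (len(rows) / len(X), stats)
    return model


def log_odds(model, x):
    """log p(c=1, x) - log p(c=0, x) under the naive Bayes model."""
    def log_joint(c):
        prior, stats = model[c]
        total = math.log(prior)
        for j, (mean, var) in stats.items():
            total += -0.5 * math.log(2 * math.pi * var) - (x[j] - mean) ** 2 / (2 * var)
        return total
    return log_joint(1) - log_joint(0)


def greedy_forward_selection(X_train, y_train, X_val, y_val):
    """Add the feature that most improves validation AUC until none helps."""
    selected, best_auc = [], 0.5
    remaining = set(range(len(X_train[0])))
    while remaining:
        candidates = []
        for j in sorted(remaining):
            model = fit_gaussian_nb(X_train, y_train, selected + [j])
            scores = [log_odds(model, x) for x in X_val]
            candidates.append((auc(scores, y_val), j))
        top_auc, top_j = max(candidates, key=lambda pair: pair[0])
        if top_auc <= best_auc:
            break
        selected.append(top_j)
        remaining.remove(top_j)
        best_auc = top_auc
    return selected, best_auc


# Toy data (made up): feature 0 separates the classes, feature 1 does not.
X_train = [[0.0, 5.0], [1.0, 3.0], [0.5, 4.0], [0.2, 6.0],
           [3.0, 4.0], [4.0, 5.0], [3.5, 3.0], [2.8, 6.0]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
X_val = [[0.3, 4.5], [0.9, 5.5], [3.2, 3.5], [3.9, 5.0]]
y_val = [0, 0, 1, 1]
selected, best_auc = greedy_forward_selection(X_train, y_train, X_val, y_val)
print(selected, best_auc)
```

In the streaming setting described on the slides, a search like this would be rerun at each time step on the labeled data available at that point.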


AMIDST toolbox

- Open source Java toolbox: http://amidst.github.io/toolbox/


Predictive performance analysis

- The feature subset selection helps to improve the value of the AUC.

- The AUC value increases over time: the problem becomes easier to solve.

[Plot: AUC (y-axis, ticks from 0.65 to 1.00) by month (x-axis, May 2008 to March 2014) for two series: Dynamic NB with FS and Dynamic NB.]

Figure 2: AUC results for the Dynamic Naive Bayes (NB) classifier with and without feature selection (FS).


Analysis of relevant features

- In general, the socio-demographic features play a minor role in terms of predictive performance.

[Plot: features selected in each month. y-axis: VAR01, VAR02, VAR04, VAR05, VAR06, VAR07, VAR08, VAR09, VAR10, VAR11, SOC01, SOC02, SOC03, SOC05, SOC06, SOC07, SOC10, SOC11, SOC12, SOC14, SOC16, SOC17, SOC18, SOC20, SOC22, SOC26, SOC28, SOC31. x-axis: months from May 2007 to March 2013.]

Figure 3: The set of selected features throughout the months.


Analysis of relevant features

- The most frequently selected variables, such as VAR04 and VAR08, consistently separate the two types of clients.

[Plot: two panels, VAR04 (y-axis ticks from -0.3 to 0.1) and VAR08 (y-axis ticks from 0.0 to 2.5). x-axis: months from June 2007 to March 2014. Series in each panel: Non-defaulting and Defaulting.]

Figure 4: Time-dependent averages of variables VAR04 (“Account balance”) and VAR08 (“Unpaid amount in personal loans”) for non-defaulting and defaulting clients.


Conclusion

- A first step towards analyzing risk prediction in credit operations for the bank Banco de Crédito Cooperativo.

- A dynamic Naïve Bayes classifier with a wrapper feature subset selection.

- The feature subset selection helps to improve the results and gives insight into which attributes are most relevant as a function of time.

- The AMIDST toolbox performs inference and learning under a Bayesian framework and provides functionality to improve the presented model:

  - Use of more expressive network structures.

  - Extend the feature subset selection method to take the set of selected features from the previous time steps into account.


Thank you for your attention

Questions?

Acknowledgments: This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 619209.
