
A Financial Approach to Machine Learning with Applications to Credit Risk

Craig Friedman (craig.friedman@cims.nyu.edu)

2

• Joint work with Sven Sandow; thanks to Bob Cangemi, Peter Chang and James Huang

3

• Introduction

• Measuring Probabilistic Model Performance from an investor’s perspective

• Building Probabilistic Models for use by an investor

• The Maximum Expected Utility Ultimate Recovery Model

• Conclusion

4

Introduction: 2 Credit Modeling Problems

1) A Probability of Default Problem:

Find prob(default|x)

5

Introduction: 2 Credit Modeling Problems

2) A Recovery Distribution Problem:

Find pdf(recovery|x)

6

Introduction: Our Main Goal

• Find good models

– To do so, we must have a way to measure model performance

– The models will be used by investors to make investment decisions—performance should be measured accordingly

7

Performance Measures

• Our Paradigm

• Information Theoretic Interpretation

• An Important Class of Utility Functions

8

Performance Measures: Our Paradigm

Our Model Performance Measures are
– Natural Extensions of the Axioms of Utility Theory
– Familiar, in important special cases
– Enterprise-wide
  • PD, late payment, etc.
  • Recovery, dilution, aggregate default rate distribution, etc.
  • Multi-Horizon PD
  • Default correlation modeling
  • others
– Consistent with our approach to Model Formulation

9

Performance Measures: Our Paradigm

• Assumptions:
– Investor with utility function
– Market with odds ratio for each state (AAA bonds cost more than CCC bonds!)
– Investor believes model and invests to maximize expected utility (a consequence of Utility Theory)

• Paradigm: We base our model performance measure on an (out of sample) estimate of expected utility.

• Accurate models allow for effective investment strategies

• Inaccurate models induce over-betting and under-betting

• Our performance measures have financial interpretation

10

Performance Measures: Our Paradigm

• Given a benchmark model, we can construct a relative performance measure based on our paradigm

• The benchmark model can be
– An industry standard model
– The “non-informative model”

• The non-informative model is so simple that we can construct a single relative performance measure, without the effort of building a complex benchmark model.

11

Performance Measures: Our Paradigm

Investor has utility function U(W),

12

Performance Measures: Our Paradigm

13

Performance Measures: Our Paradigm

14

Performance Measures Information Theoretic Interpretation

• Entropy is a measure of the uncertainty of a random variable

• High-entropy measure (uniform over 10 states): H = log(10)

• Low-entropy measure (point mass on one state): H = 0
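The two example measures above can be checked numerically; this is an illustrative sketch, not code from the talk:

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log p_i, in nats."""
    return sum(-pi * math.log(pi) for pi in p if pi > 0.0)

uniform = [0.1] * 10           # high-entropy measure: uniform over 10 states
point = [1.0] + [0.0] * 9      # low-entropy measure: all mass on one state

print(entropy(uniform))  # log(10), about 2.3026
print(entropy(point))    # 0.0
```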

15

Performance Measures Information Theoretic Interpretation

• Kullback-Leibler Relative Entropy is a measure of the discrepancy from one probability measure to another

• Large discrepancy: D(p||q) = log(10)

• Small discrepancy: D(p||q) ≈ 0
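The discrepancy examples above can likewise be checked numerically (illustrative sketch; the measures are assumed, not taken from the slides):

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler relative entropy D(p||q) = sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

q = [0.1] * 10              # reference measure: uniform over 10 states
p_far = [1.0] + [0.0] * 9   # point mass: large discrepancy from q
p_near = list(q)            # identical to q: no discrepancy

print(kl_divergence(p_far, q))   # log(10), about 2.3026
print(kl_divergence(p_near, q))  # 0.0
```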

16

Performance Measures Information Theoretic Interpretation

• Entropy is the difference between fantasy and optimality

• By analogy

17

Performance Measures Information Theoretic Interpretation

• We define the Generalized Relative Entropy (GRE), a measure of discrepancy between probability measures

• GRE is convex in p and non-negative. GRE is zero if and only if p=q
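A sketch of the GRE definition, following the Friedman–Sandow papers (notation assumed: \(\mathcal{O}(y)\) is the market odds ratio for state \(y\), and \(b^*_r\) the allocation that maximizes expected utility under measure \(r\)):

```latex
D_U(p \,\|\, q)
  \;=\; \sum_{y} p(y)\, U\!\left( b^*_p(y)\, \mathcal{O}(y) \right)
  \;-\; \sum_{y} p(y)\, U\!\left( b^*_q(y)\, \mathcal{O}(y) \right)
```

That is, the expected-utility loss, under p, from investing as if q were true rather than p. For U(W) = log(W), where b* coincides with the believed measure, this collapses to Σ p log(p/q), the Kullback-Leibler relative entropy.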

18

Performance Measures Information Theoretic Interpretation

• By putting U(W)=log(W) we recover entropy and Kullback-Leibler relative entropy

• We have the information theoretic interpretations
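To make the log-utility special case concrete: a Kelly (log-utility) investor who believes measure r allocates fraction r(y) to state y at odds 1/m(y), and the wealth growth rate lost by believing a model q rather than the true measure p is exactly D(p||q). A small numerical sketch with made-up numbers:

```python
import math

def growth_rate(true_p, believed, market):
    """Expected log-wealth growth of a Kelly investor who allocates
    fraction believed[y] to state y at odds 1/market[y]."""
    return sum(tp * math.log(believed[y] / market[y])
               for y, tp in enumerate(true_p) if tp > 0.0)

true_p = [0.5, 0.3, 0.2]     # true measure
market = [1/3, 1/3, 1/3]     # homogeneous market odds
model_q = [0.4, 0.4, 0.2]    # misspecified model

pickup = growth_rate(true_p, true_p, market) - growth_rate(true_p, model_q, market)
kl = sum(tp * math.log(tp / mq) for tp, mq in zip(true_p, model_q))
print(pickup, kl)  # equal up to rounding: the growth-rate loss is D(p||q)
```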

19

Performance Measures Important Class of Utility Functions

• Often, we don’t know/trust the odds ratios

• Note that for U(W)=log(W)

20

Performance Measures Important Class of Utility Functions

21

Performance Measures Important Class of Utility Functions

22

Performance Measures Important Class of Utility Functions

• Difference in expected utility

• Estimated wealth growth rate pickup for an investor (of a certain type) who uses model 2 rather than model 1

• Logarithm of likelihood ratio (deviance, Akaike Information Criterion)

• Performance measure that generates an optimal (in the sense of the Neyman-Pearson Lemma) decision surface.

• Difference between
– relative entropy from empirical probs to model 1 probs
– relative entropy from empirical probs to model 2 probs
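In the log-utility case, all of these readings collapse to one out-of-sample statistic: the average log-likelihood ratio on realized outcomes. A minimal sketch with hypothetical model outputs:

```python
import math

def pickup(model2_probs, model1_probs):
    """Average log-likelihood ratio over realized outcomes: the estimated
    wealth growth rate gain of a log-utility investor who uses model 2
    rather than model 1."""
    n = len(model2_probs)
    return sum(math.log(p2 / p1) for p2, p1 in zip(model2_probs, model1_probs)) / n

# probability each model assigned to the outcome that actually occurred
p_model1 = [0.6, 0.2, 0.7, 0.5]
p_model2 = [0.8, 0.4, 0.7, 0.6]
print(pickup(p_model2, p_model1))  # positive: model 2 outperforms model 1
```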

23

Performance Measures Important Class of Utility Functions

• Error term from using the approximation

is two orders of magnitude smaller than the deviation from homogeneous expected returns.

24

Performance Measures: An Important Class of Utilities

• Morningstar uses the power utility with power 2.

• This is how a member of our family approximates Morningstar’s utility function

25

Performance Measures: pdf(y), prob(Y=y|x), pdf(y|x)

• Please see the paper

• There are a few twists, but the results are basically the same.

• For extension of these ideas to measure regression model performance, please see

26

Maximum Expected Utility Models: Introduction

• Maximize Model performance measures relevant for an INVESTOR who relies on the models to make INVESTMENT DECISIONS

• Are flexible enough to accurately reflect the data

• Do Not Over-fit the data

• We learn/build models based on a coherent model learning theory specifically designed for investors

27

Maximum Expected Utility Models: Introduction

• Balance
– Consistency with the Data
– Consistency with Prior Beliefs

• Result: a 1-hyperparameter family of models, each of which is associated with a given level of consistency with the data. Each model
– asymptotically maximizes expected utility over a potentially rich family of models
– is robust: it maximizes outperformance of the benchmark model under the most adverse true measure (more later)

• Choose the optimal hyperparameter value by maximizing expected utility on an out-of-sample data set.

• In this talk, we discuss our approach in the simplest setting: Discrete Probability Models.
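A minimal sketch of the hyperparameter-selection step in the discrete setting. The one-parameter shrinkage family below merely stands in for the actual MEU family, and the data are toy values:

```python
import math
from collections import Counter

def fit(train, alpha, n_states):
    """Shrink empirical frequencies toward the uniform (non-informative)
    measure; alpha stands in for the single consistency hyperparameter."""
    counts = Counter(train)
    n = len(train)
    return [(1 - alpha) * counts[s] / n + alpha / n_states for s in range(n_states)]

def avg_log_likelihood(model, data):
    """Out-of-sample estimate of a log-utility investor's expected utility."""
    return sum(math.log(model[s]) for s in data) / len(data)

train = [0, 0, 0, 1, 1, 2]     # in-sample outcomes over 3 states
held_out = [0, 1, 1, 2, 2, 0]  # out-of-sample outcomes

# choose the hyperparameter that maximizes out-of-sample expected utility
best_score, best_alpha = max(
    (avg_log_likelihood(fit(train, a, 3), held_out), a)
    for a in [0.0, 0.1, 0.3, 0.5, 1.0])
print(best_alpha, best_score)
```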

28

Maximum Expected Utility Models: Formulation

• Model feature means are deterministic quantities

• Sample feature means are observations of a random vector

• Central Limit Theorem: random vector has Gaussian distribution(asymptotically)

• Equally consistent model measures lie on the level sets of this Gaussian

29

Maximum Expected Utility Models: Formulation

30

Maximum Expected Utility Models: Formulation

• We define the notion of Dominance (of one model measure over another).

31

Maximum Expected Utility Models Formulation

32

Maximum Expected Utility Models Formulation

Primal Problem

33

Maximum Expected Utility Models Formulation

Robustness

34

Maximum Expected Utility Models Formulation

35

Maximum Expected Utility Models Formulation

36

Maximum Expected Utility Models Dual Problem

37

Maximum Expected Utility Models Dual Problem

• Which is familiar; see, for example, the references
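For the logarithmic utility, the dual reduces to a familiar maximum-likelihood problem over an exponential family; with a binary outcome and one feature, that is ordinary logistic regression. A self-contained sketch on toy data (plain gradient ascent, no regularization):

```python
import math

def fit_logistic(xs, ys, steps=2000, lr=0.1):
    """Maximum-likelihood fit of p(y=1|x) = 1 / (1 + exp(-(a + b*x)))
    by gradient ascent on the average log-likelihood."""
    a = b = 0.0
    for _ in range(steps):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            ga += y - p        # d(log-likelihood)/da
            gb += (y - p) * x  # d(log-likelihood)/db
        a += lr * ga / len(xs)
        b += lr * gb / len(xs)
    return a, b

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]  # toy feature values
ys = [0, 0, 0, 1, 1, 1]                 # toy outcomes
a, b = fit_logistic(xs, ys)
print(b > 0)  # True: larger x raises the fitted probability of y=1
```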

38

Maximum Expected Utility Models Summary of Approach

39

Maximum Expected Utility Models Losing the O’s

40

Maximum Expected Utility Models More General Context

41

Maximum Expected Utility Models: Applications and Performance

• We can use the same methodology to model conditional
– Default Probabilities (Friedman and Huang, 2003)
– Recovery Rate Distributions (Friedman and Sandow, 2003)
– Aggregate Default Rate Distributions (Sandow et al., 2003)
– Late Payment Probabilities
– Default Time Densities
– Dilution Distributions
– Asset Price Distributions

42

US Public Firm Model Variables (in order of significance)

1. net income (TTM10) / total assets
2. market cap (latest month)
3. debt in current liabilities (latest year)
4. total return (TTM)
5. EBIT (latest quarter)
6. GDP change (year over year)
7. long-term debt (latest quarter)
8. long-term debt (latest year)
9. interest expense
10. total debt (latest quarter)

43

US Public Firm Model Variables (in order of significance)

11. interest coverage before tax
12. total assets (latest quarter)
13. net income (latest quarter)
14. cost of goods sold (TTM)
15. debt in current liabilities (latest quarter)
16. volatility (TTM) / S&P 500 volatility (TTM)
17. volatility (last 5 years’ monthly prices)
18. accounts payable (latest quarter)
19. total liabilities (latest quarter)
20. current assets (latest quarter)

44

Performance Measurement: Exploring the Model

• Does model output distinguish defaulters from non-defaulters, for a cohort with the same rating? (Does the model output add information?)

45

Model Use: Surveillance Alerts

Given a rating category, look for extreme (high or low) PDs

A Surveillance Experiment:

1. On a given date (once per quarter), examine the 1% (about 1 firm per quarter) of the (usually fewer than 150) BBB- firms with the highest PDs, and the 1% (about 1 firm per quarter) of BBB- firms with the lowest PDs

2. Examine the performance of these firms over the following three years

3. Repeat quarterly over 24 quarters, resulting in
• 24 high-PD observations
• 24 low-PD observations
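The selection step of the experiment can be sketched as follows; `surveillance_flags` and the firm data are hypothetical, for illustration only:

```python
def surveillance_flags(pds, frac=0.01):
    """Flag the top and bottom `frac` of a rating cohort by model PD
    (at least one firm each, as in the ~150-firm BBB- cohort)."""
    ranked = sorted(pds.items(), key=lambda kv: kv[1])
    k = max(1, int(len(ranked) * frac))
    low = [name for name, _ in ranked[:k]]    # candidates for upgrade alerts
    high = [name for name, _ in ranked[-k:]]  # candidates for downgrade alerts
    return high, low

cohort = {"firm0": 0.002, "firm1": 0.004, "firm2": 0.010,
          "firm3": 0.003, "firm4": 0.150, "firm5": 0.005}
high, low = surveillance_flags(cohort)
print(high, low)  # ['firm4'] ['firm0']
```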

46

Model Use: Surveillance Alerts

High PD firms

• 50% of the firms with the highest PDs were downgraded within 3 years (vs 22% of all BBB- firms in this study)

• 17% of the firms with the highest PDs defaulted within 3 years (vs 1.46% of all BBB- firms in this study)

• 0% of the firms with the highest PDs were upgraded within 3 years (vs 21% of all BBB- firms in this study)

47

Model Use: Surveillance Alerts

Low PD firms

• 0% of the firms with the lowest PDs were downgraded within 3 years (vs 22% of all BBB- firms in this study)

• 0% of the firms with the lowest PDs defaulted within 3 years (vs 1.46% of all BBB- firms in this study)

• 62.50% of the firms with the lowest PDs were upgraded within 3 years (vs 21% of all BBB- firms in this study)

48

Recovery Model: Motivation

• Two major factors affect credit risk:
– Probability of default (We model prob(default=1 given x).)
– Probability distribution over recoveries given default (RGD)

• In the past: little modeling effort for RGD

• Best known model is Moody’s LossCalc™:
– Modeling of expected recovery one month after default and confidence intervals
– No explicit modeling of full probability distributions
– No modeling of ultimate recovery

49

Recovery Modeling Approach: Conditional Probabilities

• p(r|x) – probability density of recovery rate r conditioned on vector x of explanatory variables, which are:
– Collateral quality
– Debt below class
– Debt above class
– Aggregate default rate
– Others if necessary

• Recoveries are mostly between 0 and 1.2.

• Many defaults have complete or zero recovery.

50

Recovery Model Approach

• Maximum Expected Utility Model– Global features:

– Point features:

51

Recovery Model Performance

Model                      Delta
Simple Beta                0.07
Naïve                      0.25
Generalized Beta           0.47
Maximum Expected Utility   0.63

• Model Performance Measure, Delta: gain in expected logarithmic utility (wealth growth rate) with respect to a non-informative model
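Delta can be estimated out of sample as the average log density ratio between the model and the non-informative model (uniform on the recovery range); the numbers below are made up for illustration:

```python
import math

# hypothetical held-out recoveries: each model's density at the observed outcomes
model_density = [2.0, 1.4, 1.6, 2.5, 1.1]  # fitted model pdf at each recovery
uniform_density = [1 / 1.2] * 5            # non-informative: uniform on [0, 1.2]

delta = sum(math.log(m / u)
            for m, u in zip(model_density, uniform_density)) / len(model_density)
print(delta)  # positive: expected log-utility gain over the non-informative model
```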

52

Recovery Model Results: Probability density versus RGD and collateral

53

Recovery Model Results: Probability density versus RGD and collateral

54

Recovery Model Results: Point probabilities versus collateral

55

Model Results: Probability density versus Spread and Maturity

56

Model Results: Probability density versus Spread and Rating

57

GUI Interface Screen Shots

58

GUI Interface Screen Shots

59

Procedure

• We compute (using MEU methodology, as for the LossStats Model) the 1-year-horizon PD transition matrix, with 100 buckets, conditioned on survival for 1 year.

• We then compute the unconditional (on survival) transition matrix, using the above matrix and the 1-year-horizon PD and basic probability rules

• We compute the t-year transition matrix, under the Markov assumption, by raising the transition matrix to the power t (t not necessarily an integer)

• The t-year cumulative PD can be obtained from this t-year transition matrix
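The fractional matrix power in the third step can be computed by eigendecomposition; a toy 2-bucket sketch (the production model uses 100 buckets):

```python
import numpy as np

def t_year_matrix(P, t):
    """Raise an annual transition matrix to a possibly fractional power t,
    per the Markov assumption: P^t = V diag(w^t) V^(-1)."""
    w, v = np.linalg.eig(P)
    return (v @ np.diag(w.astype(complex) ** t) @ np.linalg.inv(v)).real

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])  # toy annual transition matrix
P_half = t_year_matrix(P, 0.5)
print(np.allclose(P_half @ P_half, P))  # True: two half-year steps make a year
```

For matrices whose eigenvalues are not all positive real, a fractional power may have complex entries; `scipy.linalg.fractional_matrix_power` handles the general case.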

60

Plot of 1-year PD transition matrix conditioned on survival

61

• Next: Screenshots for unconditional (on survival) distribution of 1-year PD after t years
– t=1
– t=5
– t=10
– t=50

62

Screenshot 1

63

Screenshot 2

64

Screenshot 3

65

Screenshot 4

66

Demo Program Interface

67

Model Results: Default Time Density when all explanatory variables are set to the median value

68

Model Results: Probability density, given default within 3 yrs versus Net Income/Total Assets

69

Model Results: Probability density, given default within 3 yrs versus Total Liabilities/Total Assets

70

Model Results: Probability density, given default within 3 yrs versus Relative

71

Model Results: Probability density, given default within 3 yrs versus Excess Return

72

Model Results: Probability density, given default within 3 yrs versus Volatility

73

References

References available on request:

craig_friedman@sandp.com
sven_sandow@sandp.com

• C. Friedman and S. Sandow, ‘Model performance measures for expected utility maximizing investors’, International Journal of Theoretical and Applied Finance, Vol. 5, No. 4, 2003, p. 335

• C. Friedman and S. Sandow, ‘Learning probabilistic models: an expected utility maximization approach’, Journal of Machine Learning Research, Vol. 4, 2003, p. 257

•C. Friedman and S. Sandow, ‘Ultimate recoveries’, Risk, August 2003, p. 69
