
A Financial Approach to Machine Learning with Applications to Credit Risk

Craig Friedman (craig.friedman@cims.nyu.edu)

2

• Joint work with Sven Sandow; thanks to Bob Cangemi, Peter Chang and James Huang

3

• Introduction

• Measuring Probabilistic Model Performance from an investor’s perspective

• Building Probabilistic Models for use by an investor

• The Maximum Expected Utility Ultimate Recovery Model

• Conclusion

4

Introduction: 2 Credit Modeling Problems

1) A Probability of Default Problem:

Find prob(default|x)

5

Introduction: 2 Credit Modeling Problems

2) A Recovery Distribution Problem:

Find pdf(recovery|x)

6

Introduction: Our Main Goal

• Find good models

– To do so, we must have a way to measure model performance

– The models will be used by investors to make investment decisions—performance should be measured accordingly

7

Performance Measures

• Our Paradigm

• Information Theoretic Interpretation

• An Important Class of Utility Functions

8

Performance Measures: Our Paradigm

Our Model Performance Measures are
– Natural Extensions of the Axioms of Utility Theory
– Familiar, in important special cases
– Enterprise-wide
  • PD, late payment, etc.
  • Recovery, dilution, aggregate default rate distribution, etc.
  • Multi-Horizon PD
  • Default correlation modeling
  • others
– Consistent with our approach to Model Formulation

9

Performance Measures: Our Paradigm

• Assumptions:
– Investor with utility function
– Market with odds ratio for each state (AAA bonds cost more than CCC bonds!)
– Investor believes model and invests to maximize expected utility (a consequence of Utility Theory)

• Paradigm: We base our model performance measure on an (out of sample) estimate of expected utility.

• Accurate models allow for effective investment strategies

• Inaccurate models induce over-betting and under-betting

• Our performance measures have financial interpretation

10

Performance Measures: Our Paradigm

• Given a benchmark model, we can construct a relative performance measure based on our paradigm

• The benchmark model can be
– An industry standard model
– The “non-informative model”

• The non-informative model is so simple that we can construct a single relative performance measure, without the effort of building a complex benchmark model.

11

Performance Measures: Our Paradigm

Investor has utility function U(W),

12

Performance Measures: Our Paradigm

13

Performance Measures: Our Paradigm

14

Performance Measures Information Theoretic Interpretation

• Entropy is a measure of the uncertainty of a random variable

• High-entropy measure (uniform over 10 states): H = log(10)

• Low-entropy measure (point mass on one state): H = 0
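The two example measures above can be checked numerically; this is an illustrative sketch, not code from the talk:

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log p_i, in nats."""
    return sum(-pi * math.log(pi) for pi in p if pi > 0.0)

uniform = [0.1] * 10           # high-entropy measure: uniform over 10 states
point = [1.0] + [0.0] * 9      # low-entropy measure: all mass on one state

print(entropy(uniform))  # log(10), about 2.3026
print(entropy(point))    # 0.0
```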

15

Performance Measures Information Theoretic Interpretation

• Kullback-Leibler Relative Entropy is a measure of the discrepancy from one probability measure to another

• Large discrepancy: D(p||q) = log(10)

• Small discrepancy: D(p||q) ≈ 0
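The discrepancy examples above can likewise be checked numerically (illustrative sketch; the measures are assumed, not taken from the slides):

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler relative entropy D(p||q) = sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

q = [0.1] * 10              # reference measure: uniform over 10 states
p_far = [1.0] + [0.0] * 9   # point mass: large discrepancy from q
p_near = list(q)            # identical to q: no discrepancy

print(kl_divergence(p_far, q))   # log(10), about 2.3026
print(kl_divergence(p_near, q))  # 0.0
```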

16

Performance Measures Information Theoretic Interpretation

• Entropy is the difference between fantasy and optimality

• By analogy

17

Performance Measures Information Theoretic Interpretation

• We define the Generalized Relative Entropy (GRE), a measure of discrepancy between probability measures

• GRE is convex in p and non-negative. GRE is zero if and only if p=q
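A sketch of the GRE definition, following the Friedman–Sandow papers (notation assumed: \(\mathcal{O}(y)\) is the market odds ratio for state \(y\), and \(b^*_r\) the allocation that maximizes expected utility under measure \(r\)):

```latex
D_U(p \,\|\, q)
  \;=\; \sum_{y} p(y)\, U\!\left( b^*_p(y)\, \mathcal{O}(y) \right)
  \;-\; \sum_{y} p(y)\, U\!\left( b^*_q(y)\, \mathcal{O}(y) \right)
```

That is, the expected-utility loss, under p, from investing as if q were true rather than p. For U(W) = log(W), where b* coincides with the believed measure, this collapses to Σ p log(p/q), the Kullback-Leibler relative entropy.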

18

Performance Measures Information Theoretic Interpretation

• By putting U(W)=log(W) we recover entropy and Kullback-Leibler relative entropy

• We have the information theoretic interpretations
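To make the log-utility special case concrete: a Kelly (log-utility) investor who believes measure r allocates fraction r(y) to state y at odds 1/m(y), and the wealth growth rate lost by believing a model q rather than the true measure p is exactly D(p||q). A small numerical sketch with made-up numbers:

```python
import math

def growth_rate(true_p, believed, market):
    """Expected log-wealth growth of a Kelly investor who allocates
    fraction believed[y] to state y at odds 1/market[y]."""
    return sum(tp * math.log(believed[y] / market[y])
               for y, tp in enumerate(true_p) if tp > 0.0)

true_p = [0.5, 0.3, 0.2]     # true measure
market = [1/3, 1/3, 1/3]     # homogeneous market odds
model_q = [0.4, 0.4, 0.2]    # misspecified model

pickup = growth_rate(true_p, true_p, market) - growth_rate(true_p, model_q, market)
kl = sum(tp * math.log(tp / mq) for tp, mq in zip(true_p, model_q))
print(pickup, kl)  # equal up to rounding: the growth-rate loss is D(p||q)
```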

19

Performance Measures Important Class of Utility Functions

• Often, we don’t know/trust the odds ratios

• Note that for U(W)=log(W)

20

Performance Measures Important Class of Utility Functions

21

Performance Measures Important Class of Utility Functions

22

Performance Measures Important Class of Utility Functions

• Difference in expected utility

• Estimated wealth growth rate pickup for an investor (of a certain type) who uses model 2 rather than model 1

• Logarithm of likelihood ratio (deviance, Akaike Information Criterion)

• Performance measure that generates an optimal (in the sense of the Neyman-Pearson Lemma) decision surface.

• Difference between
– relative entropy from empirical probs to model 1 probs
– relative entropy from empirical probs to model 2 probs
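In the log-utility case, all of these readings collapse to one out-of-sample statistic: the average log-likelihood ratio on realized outcomes. A minimal sketch with hypothetical model outputs:

```python
import math

def pickup(model2_probs, model1_probs):
    """Average log-likelihood ratio over realized outcomes: the estimated
    wealth growth rate gain of a log-utility investor who uses model 2
    rather than model 1."""
    n = len(model2_probs)
    return sum(math.log(p2 / p1) for p2, p1 in zip(model2_probs, model1_probs)) / n

# probability each model assigned to the outcome that actually occurred
p_model1 = [0.6, 0.2, 0.7, 0.5]
p_model2 = [0.8, 0.4, 0.7, 0.6]
print(pickup(p_model2, p_model1))  # positive: model 2 outperforms model 1
```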

23

Performance Measures Important Class of Utility Functions

• Error term from using the approximation

is two orders of magnitude smaller than the deviation from homogeneous expected returns.

24

Performance Measures: An Important Class of Utilities

• Morningstar uses the power utility with power 2.

• This is how a member of our family approximates Morningstar’s utility function

25

Performance Measures: pdf(y), prob(Y=y|x), pdf(y|x)

• Please see the paper

• There are a few twists, but the results are basically the same.

• For extension of these ideas to measure regression model performance, please see

26

Maximum Expected Utility Models: Introduction

• Maximize Model performance measures relevant for an INVESTOR who relies on the models to make INVESTMENT DECISIONS

• Are flexible enough to accurately reflect the data

• Do Not Over-fit the data

• We learn/build models based on a coherent model learning theory specifically designed for investors

27

Maximum Expected Utility Models: Introduction

• Balance
– Consistency with the Data
– Consistency with Prior Beliefs

• Result: a 1-hyperparameter family of models, each of which is associated with a given level of consistency with the data. Each model
– asymptotically maximizes expected utility over a potentially rich family of models
– is robust: it maximizes outperformance of the benchmark model under the most adverse true measure (more later)

• Choose the optimal hyperparameter value by maximizing expected utility on an out-of-sample data set.

• In this talk, we discuss our approach in the simplest setting: Discrete Probability Models.
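A minimal sketch of the hyperparameter-selection step in the discrete setting. The one-parameter shrinkage family below merely stands in for the actual MEU family, and the data are toy values:

```python
import math
from collections import Counter

def fit(train, alpha, n_states):
    """Shrink empirical frequencies toward the uniform (non-informative)
    measure; alpha stands in for the single consistency hyperparameter."""
    counts = Counter(train)
    n = len(train)
    return [(1 - alpha) * counts[s] / n + alpha / n_states for s in range(n_states)]

def avg_log_likelihood(model, data):
    """Out-of-sample estimate of a log-utility investor's expected utility."""
    return sum(math.log(model[s]) for s in data) / len(data)

train = [0, 0, 0, 1, 1, 2]     # in-sample outcomes over 3 states
held_out = [0, 1, 1, 2, 2, 0]  # out-of-sample outcomes

# choose the hyperparameter that maximizes out-of-sample expected utility
best_score, best_alpha = max(
    (avg_log_likelihood(fit(train, a, 3), held_out), a)
    for a in [0.0, 0.1, 0.3, 0.5, 1.0])
print(best_alpha, best_score)
```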

28

Maximum Expected Utility Models: Formulation

• Model feature means are deterministic quantities

• Sample feature means are observations of a random vector

• Central Limit Theorem: random vector has Gaussian distribution(asymptotically)

• Equally consistent model measures lie on the level sets of this Gaussian

29

Maximum Expected Utility Models: Formulation

30

Maximum Expected Utility Models: Formulation

• We define the notion of Dominance (of one model measure over another).

31

Maximum Expected Utility Models Formulation

32

Maximum Expected Utility Models Formulation

Primal Problem

33

Maximum Expected Utility Models Formulation

Robustness

34

Maximum Expected Utility Models Formulation

35

Maximum Expected Utility Models Formulation

36

Maximum Expected Utility Models Dual Problem

37

Maximum Expected Utility Models Dual Problem

• Which is familiar; see, for example, the references
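For the logarithmic utility, the dual reduces to a familiar maximum-likelihood problem over an exponential family; with a binary outcome and one feature, that is ordinary logistic regression. A self-contained sketch on toy data (plain gradient ascent, no regularization):

```python
import math

def fit_logistic(xs, ys, steps=2000, lr=0.1):
    """Maximum-likelihood fit of p(y=1|x) = 1 / (1 + exp(-(a + b*x)))
    by gradient ascent on the average log-likelihood."""
    a = b = 0.0
    for _ in range(steps):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            ga += y - p        # d(log-likelihood)/da
            gb += (y - p) * x  # d(log-likelihood)/db
        a += lr * ga / len(xs)
        b += lr * gb / len(xs)
    return a, b

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]  # toy feature values
ys = [0, 0, 0, 1, 1, 1]                 # toy outcomes
a, b = fit_logistic(xs, ys)
print(b > 0)  # True: larger x raises the fitted probability of y=1
```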

38

Maximum Expected Utility Models Summary of Approach

39

Maximum Expected Utility Models Losing the O’s

40

Maximum Expected Utility Models More General Context

41

Maximum Expected Utility Models: Applications and Performance

• We can use the same methodology to model conditional
– Default Probabilities (Friedman and Huang, 2003)
– Recovery Rate Distributions (Friedman and Sandow, 2003)
– Aggregate Default Rate Distributions (Sandow et al., 2003)
– Late Payment Probabilities
– Default Time Densities
– Dilution Distributions
– Asset Price Distributions

42

US Public Firm Model Variables (in order of significance)

1. net income (TTM10) / total assets
2. market cap (latest month)
3. debt in current liabilities (latest year)
4. total return (TTM)
5. EBIT (latest quarter)
6. GDP change (year over year)
7. long-term debt (latest quarter)
8. long-term debt (latest year)
9. interest expense
10. total debt (latest quarter)

43

US Public Firm Model Variables (in order of significance)

11. interest coverage before tax
12. total assets (latest quarter)
13. net income (latest quarter)
14. cost of goods sold (TTM)
15. debt in current liabilities (latest quarter)
16. volatility (TTM) / S&P 500 volatility (TTM)
17. volatility (last 5 years’ monthly prices)
18. accounts payable (latest quarter)
19. total liabilities (latest quarter)
20. current assets (latest quarter)

44

Performance Measurement: Exploring the Model

• Does model output distinguish defaulters from non-defaulters, for a cohort with the same rating? (Does the model output add information?)

45

Model Use: Surveillance Alerts

Given a rating category, look for extreme (high or low) PDs

A Surveillance Experiment:

1. On a given date (once per quarter), examine the 1% (about 1 firm per quarter) of the (usually fewer than 150) BBB- firms with the highest PDs, and the 1% (about 1 firm per quarter) of BBB- firms with the lowest PDs

2. Examine the performance of these firms over the following three years

3. Repeat quarterly over 24 quarters, resulting in
• 24 high-PD observations
• 24 low-PD observations
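The selection step of the experiment can be sketched as follows; `surveillance_flags` and the firm data are hypothetical, for illustration only:

```python
def surveillance_flags(pds, frac=0.01):
    """Flag the top and bottom `frac` of a rating cohort by model PD
    (at least one firm each, as in the ~150-firm BBB- cohort)."""
    ranked = sorted(pds.items(), key=lambda kv: kv[1])
    k = max(1, int(len(ranked) * frac))
    low = [name for name, _ in ranked[:k]]    # candidates for upgrade alerts
    high = [name for name, _ in ranked[-k:]]  # candidates for downgrade alerts
    return high, low

cohort = {"firm0": 0.002, "firm1": 0.004, "firm2": 0.010,
          "firm3": 0.003, "firm4": 0.150, "firm5": 0.005}
high, low = surveillance_flags(cohort)
print(high, low)  # ['firm4'] ['firm0']
```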

46

Model Use: Surveillance Alerts

High PD firms

• 50% of the firms with the highest PDs were downgraded within 3 years (vs 22% of all BBB- firms in this study)

• 17% of the firms with the highest PDs defaulted within 3 years (vs 1.46% of all BBB- firms in this study)

• 0% of the firms with the highest PDs were upgraded within 3 years (vs 21% of all BBB- firms in this study)

47

Model Use: Surveillance Alerts

Low PD firms

• 0% of the firms with the lowest PDs were downgraded within 3 years (vs 22% of all BBB- firms in this study)

• 0% of the firms with the lowest PDs defaulted within 3 years (vs 1.46% of all BBB- firms in this study)

• 62.50% of the firms with the lowest PDs were upgraded within 3 years (vs 21% of all BBB- firms in this study)

48

Recovery Model: Motivation

• Two major factors affect credit risk:
– Probability of default (We model prob(default=1 given x).)
– Probability distribution over recoveries given default (RGD)

• In the past: little modeling effort for RGD

• Best known model is Moody’s LossCalc™:
– Modeling of expected recovery one month after default and confidence intervals
– No explicit modeling of full probability distributions
– No modeling of ultimate recovery

49

Recovery Modeling Approach: Conditional Probabilities

• p(r|x) – probability density of recovery rate r conditioned on vector x of explanatory variables, which are:
– Collateral quality
– Debt below class
– Debt above class
– Aggregate default rate
– Others if necessary

• Recoveries are mostly between 0 and 1.2.

• Many defaults have complete or zero recovery.

50

Recovery Model Approach

• Maximum Expected Utility Model– Global features:

– Point features:

51

Recovery Model Performance

Model                      Delta
Simple Beta                0.07
Naïve                      0.25
Generalized Beta           0.47
Maximum Expected Utility   0.63

• Model Performance Measure, Delta: gain in expected logarithmic utility (wealth growth rate) with respect to a non-informative model
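Delta can be estimated out of sample as the average log density ratio between the model and the non-informative model (uniform on the recovery range); the numbers below are made up for illustration:

```python
import math

# hypothetical held-out recoveries: each model's density at the observed outcomes
model_density = [2.0, 1.4, 1.6, 2.5, 1.1]  # fitted model pdf at each recovery
uniform_density = [1 / 1.2] * 5            # non-informative: uniform on [0, 1.2]

delta = sum(math.log(m / u)
            for m, u in zip(model_density, uniform_density)) / len(model_density)
print(delta)  # positive: expected log-utility gain over the non-informative model
```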

52

Recovery Model Results: Probability density versus RGD and collateral

53

Recovery Model Results: Probability density versus RGD and collateral

54

Recovery Model Results: Point probabilities versus collateral

55

Model Results: Probability density versus Spread and Maturity

56

Model Results: Probability density versus Spread and Rating

57

GUI Interface Screen Shots

58

GUI Interface Screen Shots

59

Procedure

• We compute (using MEU methodology, as for the LossStats Model) the 1-year-horizon PD transition matrix, with 100 buckets, conditioned on survival for 1 year.

• We then compute the unconditional (on survival) transition matrix, using the above matrix and the 1-year-horizon PD and basic probability rules

• We compute the t-year transition matrix, under the Markov assumption, by raising the transition matrix to the power t (t not necessarily an integer)

• The t-year cumulative PD can be obtained from this t-year transition matrix
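The fractional matrix power in the third step can be computed by eigendecomposition; a toy 2-bucket sketch (the production model uses 100 buckets):

```python
import numpy as np

def t_year_matrix(P, t):
    """Raise an annual transition matrix to a possibly fractional power t,
    per the Markov assumption: P^t = V diag(w^t) V^(-1)."""
    w, v = np.linalg.eig(P)
    return (v @ np.diag(w.astype(complex) ** t) @ np.linalg.inv(v)).real

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])  # toy annual transition matrix
P_half = t_year_matrix(P, 0.5)
print(np.allclose(P_half @ P_half, P))  # True: two half-year steps make a year
```

For matrices whose eigenvalues are not all positive real, a fractional power may have complex entries; `scipy.linalg.fractional_matrix_power` handles the general case.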

60

Plot of 1-year PD transition matrix conditioned on survival

61

• Next: Screenshots for unconditional (on survival) distribution of 1-year PD after t years
– t=1
– t=5
– t=10
– t=50

62

Screenshot 1

63

Screenshot 2

64

Screenshot 3

65

Screenshot 4

66

Demo Program Interface

67

Model Results: Default Time Density when all explanatory variables are set to the median value

68

Model Results: Probability density, given default within 3 yrs versus Net Income/Total Assets

69

Model Results: Probability density, given default within 3 yrs versus Total Liabilities/Total Assets

70

Model Results: Probability density, given default within 3 yrs versus Relative

71

Model Results: Probability density, given default within 3 yrs versus Excess Return

72

Model Results: Probability density, given default within 3 yrs versus Volatility

73

References

References available on request:

craig_friedman@sandp.com
sven_sandow@sandp.com

• C. Friedman and S. Sandow, ‘Model performance measures for expected utility maximizing investors’, International Journal of Theoretical and Applied Finance, Vol. 5, No. 4, 2003, p. 335

• C. Friedman and S. Sandow, ‘Learning probabilistic models: an expected utility maximization approach’, Journal of Machine Learning Research, Vol. 4, 2003, p. 257

•C. Friedman and S. Sandow, ‘Ultimate recoveries’, Risk, August 2003, p. 69
