Equity-Based Insurance Guarantees Conference
Nov. 5-6, 2018
Chicago, IL

The Use of Artificial Intelligence and Machine Learning for Hedge Program Rebalancing

Peter M. Phillips

SOA Antitrust Compliance Guidelines: https://www.soa.org/legal/antitrust-disclaimer/

SOA Presentation Disclaimer: https://www.soa.org/legal/presentation-disclaimer/

Prepared by PathWise™ Solutions Group. Proprietary & Confidential


November 5, 2018 4:15-5:00 pm


Interesting Landscape

Artificial Intelligence: early artificial intelligence stirs excitement

Machine Learning: machine learning begins to flourish

Deep Learning: deep learning breakthroughs drive AI boom

Timeline: 1950s, 1960s, 1970s, 1980s, 1990s, 2000s, 2010s

Since an early flush of optimism in the 1950s, smaller subsets of artificial intelligence – first machine learning, then deep learning, a subset of machine learning – have created ever larger disruption.


Three Overlapping Concepts

Artificial Intelligence (AI): Techniques that enable computers to mimic human intelligence using if-then rules, decision trees, logic, ML, and DL

Machine Learning (ML): A subset of AI, itself including DL, that uses advanced statistical techniques to enable machines to improve at tasks with experience

Deep Learning (DL): A subset of ML that permits software to train itself to perform tasks, like speech and image recognition, by exposing multilayered neural networks to vast amounts of data

Source: Geospatial World


Practical Examples

Source: Cargroup.org

Artificial Intelligence (AI): IBM Deep Blue chess program; electronic game characters (Sims); early spam filters; industrial robotics

Machine Learning (ML): IBM Watson; Google search algorithm; Amazon/Netflix recommendations; video surveillance

Deep Learning (DL): AlphaGo; natural speech recognition; automated driving systems; voice-activated assistants


Machine Learning Applications Across Industries

Manufacturing
– Predictive maintenance or condition monitoring
– Warranty reserve estimation
– Propensity to buy
– Demand forecasting
– Process optimization
– Telematics

Retail
– Predictive inventory planning
– Recommendation engines
– Upsell and cross-channel marketing
– Market segmentation and targeting
– Customer ROI and lifetime value

Healthcare and Life Science
– Alerts and diagnostics from real-time patient data
– Disease identification and risk stratification
– Patient triage optimization
– Proactive health management
– Healthcare provider sentiment analysis

Travel and Hospitality
– Aircraft scheduling
– Dynamic pricing
– Social media: consumer feedback and interaction analysis
– Customer complaint resolution
– Traffic patterns and congestion management

Financial Services
– Risk analytics and regulation
– Customer segmentation
– Cross-selling and up-selling
– Sales and marketing campaign management
– Credit worthiness evaluation

Energy, Feedstock and Utilities
– Power usage analytics
– Seismic data processing
– Carbon emissions and trading
– Customer-specific pricing
– Smart grid management
– Energy demand and supply optimization

Source: whatsthebigdata.com


Machine Learning Impact: Use Cases vs Industries

Machine learning has great potential across industries and use case types

[Heat map: impact potential (low to high) of machine learning by problem type and industry]

Industries (columns): Automotive, Manufacturing, Consumer, Finance, Agriculture, Energy, Healthcare, Pharmaceuticals, Public/social, Media, Telecom, Transport and logistics

Problem types (rows): Real-time optimization; Strategic optimization; Predictive analytics; Predictive maintenance; Radical personalization; Discovering new trends/anomalies; Forecasting; Processing unstructured data

Source: McKinsey Global Institute analysis


The Master Algorithm by Pedro Domingos

Symbolists
Idea: Use rules, logic, and symbols to represent knowledge and to draw inferences
Sample algorithm: Rules and decision trees, inverse deduction, and production rule systems

Bayesians
Idea: Focus on assessing the likelihood of occurrence using inference
Sample algorithm: Naïve Bayes or Markov models, probabilistic inference

Connectionists
Idea: Generalize and recognize patterns dynamically with probabilistic weighted neurons
Sample algorithm: Neural networks, backpropagation, and DL

Evolutionaries
Idea: Generate variations and then assess the fitness of each against a specific purpose
Sample algorithm: Genetic algorithms and evolutionary programming

Analogizers
Idea: Optimize a function given a set of constraints
Sample algorithm: Support vector and kernel machines

Source: pwc.com/NextinTech


Decision Trees (1)

Description: A hierarchy of decision nodes classifies something based on a series of questions

Use cases: Credit risk assessment, prediction of horse races, etc.

Advantages: Useful when evaluating distinct features, qualities, or characteristics of people, places, or things

Reference: D. T. Larose, Data Mining and Predictive Analytics, 2nd edition, John Wiley and Sons, 2015



Support Vector Machines – SVMs (2)


Description: Support vector machines classify groups of data using hyperplanes to determine boundaries

Use cases: Categorization, handwriting recognition

Advantages: Very good for binary classification, and useful whether the relationships between variables are linear or not

Reference: Matthew Kelly, Computer Science: Source, 2010



Regression (3)


Description: Maps the behavior of a dependent variable relative to one or more independent variables; many different approaches exist

Use cases: Traffic flow analysis, email filtering

Advantages: Useful for identifying continuous relationships between variables

Reference: Giuseppe Bonaccorso, Machine Learning Algorithms, Packt Publishing, 2017



Naïve Bayes Classification (4)


Description: Computes probabilities, given branches of possible outcomes, to obtain the combined conditional probability of multiple attributes, where each feature is assumed to be independent

Use cases: Consumer segmentation, sentiment analysis, spam filtering, document classification

Advantages: Quick classification of relevant items in small data sets that have distinct features

Reference: Rod Pierce, et al., MathIsFun, 2014
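The description above can be made concrete with a minimal sketch in pure Python. The four-message training set, the add-one smoothing, and the function names are illustrative assumptions and do not come from the presentation.

```python
import math
from collections import Counter

# Hypothetical toy training set: (message, label).
TRAIN = [
    ("win cash prize now", "spam"),
    ("cheap prize win win", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]


def train(examples):
    """Count words per class and messages per class."""
    word_counts = {}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def predict(text, model):
    """Return the class with the highest posterior log-probability."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        n_words = sum(word_counts[label].values())
        # Log prior plus a sum of log likelihoods: the sum is where the
        # "each feature is independent" assumption enters.
        score = math.log(class_counts[label] / total)
        for w in text.split():
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict("win a prize", model))
print(predict("agenda for lunch", model))
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once messages are longer than a few words.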



Hidden Markov Models (5)


Description: Computes probabilities of hidden states occurring by analyzing observable data, and then estimates the likely pattern of future observations with the help of hidden state analysis

Use cases: Facial expression analysis, weather prediction, speech recognition, malware detection

Advantages: Tolerates data variability and is good at recognition and prediction

Reference: Leonardo Guizzetti, 2012
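As a sketch of the two computations named in the description (hidden-state probabilities from observations, then a forecast of the next observation), the forward algorithm can be written in a few lines. The two-state weather model and every probability below are made-up illustrative values, not figures from the slides.

```python
# Hypothetical two-state weather HMM; all probabilities are illustrative.
STATES = ("Rainy", "Sunny")
START = {"Rainy": 0.6, "Sunny": 0.4}
TRANS = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
EMIT = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}


def forward(observations):
    """Forward algorithm: P(hidden state | observations so far) and sequence likelihood."""
    alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        alpha = {s: EMIT[s][obs] * sum(alpha[p] * TRANS[p][s] for p in STATES)
                 for s in STATES}
    likelihood = sum(alpha.values())
    posterior = {s: a / likelihood for s, a in alpha.items()}
    return posterior, likelihood


def predict_next_observation(posterior):
    """Distribution of the next observation given the current state estimate."""
    next_state = {s: sum(posterior[p] * TRANS[p][s] for p in STATES) for s in STATES}
    symbols = EMIT[STATES[0]].keys()
    return {o: sum(next_state[s] * EMIT[s][o] for s in STATES) for o in symbols}


posterior, likelihood = forward(["walk", "shop", "clean"])
print(posterior, likelihood)
print(predict_next_observation(posterior))
```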



Random Forest (6)


Description: Improves the accuracy of decision trees by using multiple trees built on randomly selected subsets of the data

Use cases: Customer churn, risk assessment, cancer relapse risk

Advantages: Useful with large data sets and with items that have numerous and sometimes irrelevant features

Reference: Nicolas Spies, Washington University, 2015



Recurrent Neural Networks – RNNs (7)


Description: Each neuron converts many inputs into a single output via one or more hidden layers. RNNs additionally pass values from step to step, creating a form of memory that allows previous outputs to affect subsequent inputs

Use cases: Image classification, captioning, political sentiment analysis

Advantages: RNNs have predictive power when used with large amounts of sequenced information

Reference: Joseph Wilks, 2012



Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) neural networks (8)


Description: Allow for both long-term and short-term memory, thereby avoiding gradient decay of values passed from step to step

Use cases: Natural language processing and translation

Advantages: LSTMs and GRUs have the same advantages as other RNNs and are used more frequently than plain RNNs because of their memory properties

Reference: G. Orr, et al., Willamette University, 1999



Convolutional Neural Networks – CNNs (9)


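Description: Applies convolution filters (shared, learned weights blended over neighbouring inputs) layer by layer; the resulting features are used to label the output layer

Use cases: Image recognition, text to speech, drug discovery

Advantages: Most useful with large data sets, a large number of features, and complex classification requirements

Reference: Algobeans, 2016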



But it Gets Complicated…

Source: scientiscafe.com/2017/07/08/machinelearinggal


ML Laboratory: Delta Hedging

The common assumptions:
– No-arbitrage pricing
– Risk-less hedging
– Risk-neutral option valuation
– Continuous-time limit, where risk disappears
– No transaction costs
– Lognormal stock prices, with interest rates and volatility fixed
– PDEs

Together these produce the celebrated Black-Scholes equation: simple and beautiful



The discrete-time setup: lognormal stock prices

Taylor series expansion for a call option in terms of changes in stock and time

Substituting discrete model of stock prices into the Taylor series expansion yields



Create a portfolio π consisting of a call option held long and a short position of delta shares in the underlying, held over a small time step

Change over small time step

Expression for hedging error (which includes initial investment required to set up the portfolio)

Substituting change over small step into the HE expression leaves

Simplify

HE is proportional to Gamma, to the time step, to the square of the stock price, and to the square of the volatility (the variance) of the stock price

(#1)
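The slide's equations did not survive in this text, so the block below is a reconstruction of the standard one-step argument that the steps above describe, not a copy of the original slide. The symbols are assumptions: Z is a standard normal draw, Δ_C denotes the option delta (to keep it distinct from the change operator Δ), and Θ and Γ are the usual Black-Scholes theta and gamma.

```latex
\begin{align*}
\Delta S &= \mu S\,\Delta t + \sigma S Z\sqrt{\Delta t}, \qquad Z \sim N(0,1) \\
\Delta C &\approx \Theta\,\Delta t + \Delta_C\,\Delta S + \tfrac{1}{2}\Gamma\,(\Delta S)^2
          \approx \Theta\,\Delta t + \Delta_C\,\Delta S + \tfrac{1}{2}\Gamma\sigma^2 S^2 Z^2\,\Delta t \\
\pi &= C - \Delta_C S \\
\Delta\pi &= \Delta C - \Delta_C\,\Delta S \\
HE &= \Delta\pi - r\pi\,\Delta t
    = \Bigl(\Theta + \tfrac{1}{2}\Gamma\sigma^2 S^2 Z^2 - r\,(C - \Delta_C S)\Bigr)\Delta t \\
0 &= \Theta + \tfrac{1}{2}\sigma^2 S^2\Gamma + r S \Delta_C - r C
    \qquad \text{(Black-Scholes equation)} \\
HE &= \tfrac{1}{2}\,\Gamma\,\sigma^2 S^2\,(Z^2 - 1)\,\Delta t \tag{1}
\end{align*}
```

The last line follows by using the Black-Scholes equation to replace Θ − r(C − Δ_C S) with −½σ²S²Γ. Because E[Z²] = 1, the hedging error has zero mean to this order, and all of its randomness comes from the Z² term.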


Delta Hedging Error Over One Time Step

Experimental conditions: stock S = 100, strike X = 100, drift μ = 0.05, volatility σ = 0.40, T = 1/12, r = 0.05, N = 50,000, time step = 1 day

We are comparing simulated results with an analytical approximation from equation #1

Note the squared normal variable in equation #1: over one time step the hedging error follows a (shifted and scaled) chi-squared distribution with one degree of freedom

The P&L is highly skewed

There is a small probability of making a lot of money and a large probability of losing a small amount of money, because the option position is held long

Option writers face the flip side of this risk, with a large probability of making a small profit and a small probability of having a large loss

#1
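The skew described on this slide can be reproduced with a short simulation. This is a sketch under stated assumptions rather than the author's code: it uses 20,000 paths instead of 50,000, takes "1 day" to mean 1/252 of a year, and assumes equation #1 has the standard form HE = ½ Γ S² σ² (Z² − 1) Δt. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

NORMAL = NormalDist()  # standard normal distribution


def d1(S, K, r, sigma, tau):
    return (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))


def bs_call(S, K, r, sigma, tau):
    a = d1(S, K, r, sigma, tau)
    return S * NORMAL.cdf(a) - K * math.exp(-r * tau) * NORMAL.cdf(a - sigma * math.sqrt(tau))


def one_step_errors(n_paths=20000, S0=100.0, K=100.0, mu=0.05, sigma=0.40,
                    T=1 / 12, r=0.05, dt=1 / 252, seed=1):
    """Return (simulated, analytic) one-step hedging errors: long call, short delta shares."""
    rng = random.Random(seed)
    a = d1(S0, K, r, sigma, T)
    c0 = bs_call(S0, K, r, sigma, T)
    delta0 = NORMAL.cdf(a)
    gamma0 = NORMAL.pdf(a) / (S0 * sigma * math.sqrt(T))
    simulated, analytic = [], []
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        s1 = S0 * math.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z)
        c1 = bs_call(s1, K, r, sigma, T - dt)
        # Change in portfolio value minus interest on the initial investment.
        simulated.append((c1 - c0) - delta0 * (s1 - S0) - r * (c0 - delta0 * S0) * dt)
        # Assumed form of equation #1: 0.5 * Gamma * S^2 * sigma^2 * (Z^2 - 1) * dt.
        analytic.append(0.5 * gamma0 * S0**2 * sigma**2 * (z * z - 1.0) * dt)
    return simulated, analytic


def skewness(xs):
    m, s = fmean(xs), pstdev(xs)
    return fmean([((x - m) / s) ** 3 for x in xs])


simulated, analytic = one_step_errors()
for name, xs in (("simulated", simulated), ("analytic", analytic)):
    loss_share = sum(x < 0 for x in xs) / len(xs)
    print(f"{name}: mean={fmean(xs):.4f} skew={skewness(xs):.2f} P(loss)={loss_share:.3f}")
```

Under the analytic form, the hedger loses whenever Z² < 1, which for a standard normal happens about 68% of the time. That is the "large probability of losing a small amount" described above.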


Delta Hedging Final P&L Equations

Now you can extend equation #1 to n time steps where the hedge is rebalanced, as the sum of hedging errors at each update

With more algebra (Kamal and Derman) we arrive at equation #2

With Gaussian integration we arrive at equation #3

The standard deviation of the final P&L is directly proportional to the option’s vega and volatility and inversely proportional to the square root of the number of rebalances

(#2)

(#3)
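The scaling with the number of rebalances can be checked numerically. The sketch below simulates a discretely rebalanced delta hedge and compares the standard deviation of the final P&L with the rule of thumb sqrt(π/4) · vega · σ / sqrt(n) that is commonly attributed to Kamal and Derman. That this closed form is exactly the slide's equation #3 is an assumption, since the equations themselves are not preserved here. The path count (2,000) and function names are also illustrative choices.

```python
import math
import random
from statistics import NormalDist, pstdev

NORMAL = NormalDist()


def d1(S, K, r, sigma, tau):
    return (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))


def bs_call(S, K, r, sigma, tau):
    a = d1(S, K, r, sigma, tau)
    return S * NORMAL.cdf(a) - K * math.exp(-r * tau) * NORMAL.cdf(a - sigma * math.sqrt(tau))


def bs_delta(S, K, r, sigma, tau):
    return NORMAL.cdf(d1(S, K, r, sigma, tau))


def hedge_pnl(n_steps, rng, S0=100.0, K=100.0, mu=0.05, sigma=0.40, T=1 / 12, r=0.05):
    """Final P&L of a long call hedged with a short delta position, rebalanced n_steps times."""
    dt = T / n_steps
    S = S0
    delta = bs_delta(S, K, r, sigma, T)
    cash = -bs_call(S, K, r, sigma, T) + delta * S  # pay for the call, receive short-sale proceeds
    for i in range(1, n_steps + 1):
        S *= math.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
        cash *= math.exp(r * dt)  # cash account accrues interest
        if i < n_steps:
            new_delta = bs_delta(S, K, r, sigma, T - i * dt)
            cash += (new_delta - delta) * S  # shorting more shares brings cash in
            delta = new_delta
    return max(S - K, 0.0) - delta * S + cash  # option payoff, close the short, add cash


def pnl_std(n_steps, n_paths=2000, seed=7):
    rng = random.Random(seed)
    return pstdev([hedge_pnl(n_steps, rng) for _ in range(n_paths)])


def vega_rule_std(n_steps, S0=100.0, K=100.0, sigma=0.40, T=1 / 12, r=0.05):
    vega = S0 * math.sqrt(T) * NORMAL.pdf(d1(S0, K, r, sigma, T))
    return math.sqrt(math.pi / 4.0) * vega * sigma / math.sqrt(n_steps)


for n in (10, 40):
    print(n, round(pnl_std(n), 3), round(vega_rule_std(n), 3))
```

Quadrupling the number of rebalances should roughly halve the standard deviation, which is the inverse-square-root behaviour stated above.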


Standard Deviation of Delta Hedging of the Final P&L

Experimental conditions: stock S = 100, strike X = 100, drift μ = 0.05, volatility σ = 0.40, T = 1/12, r = 0.05, N = 50,000, dt = 1 day

We are comparing simulated results with analytical approximations from equation #2 and #3

We are looking at the standard deviation of the final cumulative P&L

Note approximation #2 underestimates the standard deviation

Note approximation #3 overestimates at strikes close to the initial stock price and underestimates at strikes above the initial stock price

#2#3


Validation of ML Methods for Delta Prediction

Initial list of candidate models:

Non-parametric methods:
– K-Nearest Neighbors (kNN)
– Radius K-Nearest Neighbors

Generalized Linear Models:
– Linear Regression
– Linear least squares with L2 regularization
– Lasso Regression
– Least Angle Regression
– Lasso with Least Angle Regression
– Linear regression with combined L1 and L2 regularization
– Bayesian ridge regression

Kernel methods:
– Epsilon-Support Vector Regression
– Nu Support Vector Regression
– Linear Support Vector Regression

Tree-based models:
– Decision Tree Regression
– Random Forest Regression

Non-linear models (Neural-Networks):
– Multi-layer Perceptron (MLP)


ML Model Training and Evaluation Process

This is not a regular case of ML model validation and training. We generated a synthetic dataset and computed the BS delta as the target variable for each generated sample. On a fairly large dataset (50K samples) we validated several machine learning models.

ML model evaluation and training steps:
1. Selected the best-performing models based on RMSE and MAPE metrics, using a special case of validation dataset.
2. Used Random Forest Regression to select a representative dataset size (see Data validation description).
3. Obtained a minimum representative dataset size, N = 1,000,000.
4. Trained the final models, Random Forest Regression and MLP, on the 1M dataset.
5. Tested on a special-case validation dataset (see Cross validation description and results).


Validation of ML Methods for Delta Prediction

The best performers based on cross-validation:
– K-Nearest Neighbors
– Random Forest Regression
– Multi-layer Perceptron (Neural-Network)

Results of validation (50K dataset):

Name                                       RMSE (std)        MAPE (std)
K-Nearest Neighbors                        0.0305 (0.0020)   0.0162 (0.0001)
Random Forest Regression                   0.0214 (0.0010)   0.0109 (0.0001)
Multi-layer Perceptron (Neural-Network)    0.0307 (0.0020)   0.0223 (0.0030)


Pros and Cons of Each Model

K-Nearest Neighbors
Pros: No training required; simple algorithm; interpretable results
Cons: Computationally expensive; slow on inference

Random Forest Regression
Pros: High accuracy; interpretable results
Cons: Poor interpolation/extrapolation of unseen data; might be slow on inference; large artifact size

Multi-layer Perceptron (Neural-Network)
Pros: Fast forward pass calculation (inference); huge learning capacity
Cons: “Data-hungry”; difficult hyper-parameter optimization; expensive training of deep topologies


Synthetic Dataset Generation Step

1. The price and delta of a European call option in the Black-Scholes world depend on:
– Current price of the underlying share (index price)
– Strike price
– Volatility of the underlying share
– Risk-free interest rate
– Time to maturity of the option

2. To build the ML model, we generate a representative sample dataset that relates these parameters to the call option delta calculated using the Black-Scholes equation.

3. The dataset should cover all possible combinations of the parameters and should not contain gaps or clusters, which would cause large biases in the ML predictions.

4. The straightforward way is to use a fine 5-dimensional uniform grid, but this requires evaluating a very large number of points and is not efficient.

5. A more effective way is to use a quasi-random low-discrepancy sequence such as Sobol, which covers the space evenly with far fewer points than a grid. Therefore, we generated a 5-dimensional Sobol sequence and mapped it to the hyper-rectangular volume described in the table.

We are using the following variable ranges for data simulation:

Variable                   Range
Index price                [0.0, 200.0]
Strike price               [0.0, 200.0]
Volatility                 [0.0, 1.0]
Risk-free interest rate    [0.0, 1.0]
Time to maturity           [0.0, 1.0]
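A minimal sketch of this generation step follows. To stay within the Python standard library it uses a Halton sequence, a different low-discrepancy sequence, as a stand-in for Sobol; the mapping to the hyper-rectangle and the Black-Scholes delta target follow the description above. The function names and the choice of prime bases are assumptions for illustration.

```python
import math
from statistics import NormalDist

PRIMES = (2, 3, 5, 7, 11)  # one Halton base per dimension
RANGES = (                 # ranges from the table above
    (0.0, 200.0),  # index price
    (0.0, 200.0),  # strike price
    (0.0, 1.0),    # volatility
    (0.0, 1.0),    # risk-free interest rate
    (0.0, 1.0),    # time to maturity
)


def radical_inverse(index, base):
    """Mirror the base-b digits of index about the radix point (van der Corput)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_point(index):
    return tuple(radical_inverse(index, base) for base in PRIMES)


def bs_call_delta(S, K, sigma, r, T):
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    return NormalDist().cdf(d1)


def make_dataset(n):
    """Return n rows of (S, K, sigma, r, T, delta)."""
    rows = []
    for i in range(1, n + 1):  # start at 1: index 0 would map to the all-zero corner
        u = halton_point(i)
        S, K, sigma, r, T = (lo + x * (hi - lo) for x, (lo, hi) in zip(u, RANGES))
        rows.append((S, K, sigma, r, T, bs_call_delta(S, K, sigma, r, T)))
    return rows


for row in make_dataset(5):
    print(tuple(round(v, 4) for v in row))
```

Starting the sequence at index 1 keeps every coordinate strictly inside its range, so the zero strike, zero volatility, and zero maturity edges, where the Black-Scholes delta formula is undefined, are never sampled.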


Estimation of the Representative Dataset Size

N – size of the sample dataset
K – number of datasets per iteration

N1: [10000, 25000, 50000, 250000, 1M] – list of N’s to be tested
K = 10

Use Random Forest Regression to select a representative dataset size.
1. Start with the list of N’s (N1) and K = 10.
2. Generate K different datasets of size N.
3. Train the model on the first dataset.
4. Validate on the remaining K-1 datasets (9 datasets), each of size N.
5. Choose the next value of N from the N1 list and repeat steps 2-4.
6. Check prediction error convergence.
7. If the error has converged, assume we have reached model capacity (the model is not able to learn more information).
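The loop in steps 1-7 can be sketched at toy scale. Everything that makes it small enough to run in pure Python is an assumption and differs from the presentation: a 1-nearest-neighbour regressor stands in for Random Forest, only two features (spot and time to maturity) vary, the data are pseudo-random, and the sizes and K are far smaller than N1 and K = 10.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def bs_call_delta(S, K=100.0, sigma=0.40, r=0.05, T=0.5):
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    return NormalDist().cdf(d1)


def make_dataset(n, rng):
    """n samples of ((scaled spot, maturity), delta) with S in [50, 150], T in [0.05, 1]."""
    data = []
    for _ in range(n):
        S, T = rng.uniform(50.0, 150.0), rng.uniform(0.05, 1.0)
        data.append(((S / 100.0, T), bs_call_delta(S, T=T)))
    return data


def rmse_1nn(train, test):
    """RMSE of a 1-nearest-neighbour regressor 'trained' on train, scored on test."""
    squared = 0.0
    for x, y in test:
        nearest = min(train, key=lambda row: (row[0][0] - x[0]) ** 2 + (row[0][1] - x[1]) ** 2)
        squared += (nearest[1] - y) ** 2
    return math.sqrt(squared / len(test))


def size_study(sizes=(50, 100, 200, 400), k=4, seed=0):
    """For each N: build k datasets, train on the first, validate on the other k-1."""
    rng = random.Random(seed)
    results = []
    for n in sizes:
        datasets = [make_dataset(n, rng) for _ in range(k)]
        errors = [rmse_1nn(datasets[0], d) for d in datasets[1:]]
        results.append((n, fmean(errors), pstdev(errors)))
    return results


for n, mean_err, std_err in size_study():
    print(f"N={n:4d}  mean RMSE={mean_err:.4f}  std={std_err:.4f}")
```

Convergence would then be judged by eye or by a tolerance on how much the mean and standard deviation of the error change from one N to the next.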



Baseline model capacity (Random Forest, n_estimators = 100 )

We can observe that the standard deviation of the model prediction error on the K-1 datasets decreases as the dataset size (N) increases. The mean error also decreases, as expected: the model is learning.

We want the standard deviation to converge as the dataset size N increases, so that the sample dataset describes the population variance. On the other hand, we might reach the limit of the estimator's learning capacity first.

That’s why…



N – size of the sample dataset
K – number of datasets per iteration

N2: [250000, 1M, 1.5M, 2M] – list of N’s to be tested
K = 10

…we perform the next steps:
8. Increase model capacity: increase the number of estimators for Random Forest Regression from 100 to 150 (this is similar to increasing the number and/or size of the hidden layers of a Neural-Network).
9. Use bigger dataset sizes: the N2 list.
10. Repeat steps 2-4 until convergence.



Increased model capacity by 50% (Random Forest, n_estimators = 150)

As we can observe from the plots below – the standard deviation of the model error converged near dataset size of 1M samples.


Model Validation

We used different random number generators for data validation and model validation:
– For training: the quasi-random Sobol algorithm.
– For validation: the Mersenne Twister pseudorandom number generator, with a random seed for each validation fold.

The aim of using different random number generators for the training and validation sets is to test the model on unseen data, similar to the training and test sets in a regular ML model training process.

Sobol is a low discrepancy sequence (LDS) sampling strategy, based on the deterministic placement of sample points as uniformly as possible, avoiding large gaps or clusters, which is exactly what is needed for regression model training.
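The "avoiding large gaps" property is easy to see in one dimension. The sketch below compares the largest gap left by a base-2 van der Corput sequence, a simple one-dimensional low-discrepancy sequence used here as a stand-in for Sobol, with the largest gap left by Python's `random` module, which is a Mersenne Twister generator. The point count and seed are arbitrary illustrative choices.

```python
import random


def van_der_corput(index, base=2):
    """Mirror the base-b digits of index about the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def largest_gap(points):
    """Largest empty interval left in [0, 1] by a set of points."""
    edges = [0.0] + sorted(points) + [1.0]
    return max(b - a for a, b in zip(edges, edges[1:]))


n = 256
low_discrepancy = [van_der_corput(i) for i in range(1, n + 1)]
prng = random.Random(42)  # CPython's random module uses the Mersenne Twister
pseudo_random = [prng.random() for _ in range(n)]

print("largest gap, van der Corput:", largest_gap(low_discrepancy))
print("largest gap, Mersenne Twister:", largest_gap(pseudo_random))
```

Pseudo-random points cluster and leave holes by chance, which makes them a reasonable source of unseen validation points but a less even training grid than a low-discrepancy sequence.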


Delta Prediction

Final model (trained on 1M samples) performance on test data:

Name                        RMSE     MAPE
Random Forest Regression    0.0231   0.0122

Prediction sample:

actual value   predicted value   difference
1.0000         1.0000            0.00%
0.8211         0.8266            0.66%
0.7001         0.6983            0.25%
0.9788         0.9725            0.64%
1.0000         0.9999            0.01%
0.9491         0.9534            0.45%
0.9999         0.9977            0.22%
0.1454         0.1487            2.33%
0.5084         0.5142            1.15%
1.0000         1.0000            0.00%
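For reference, the two metrics used throughout can be computed as below. The definitions are the conventional ones and are assumed to match those used in the presentation; the ten input pairs are the prediction sample above, so the results describe only that sample and are not expected to equal the full test-set figures.

```python
import math

# Prediction sample from the slide: (actual delta, predicted delta).
SAMPLE = [
    (1.0000, 1.0000), (0.8211, 0.8266), (0.7001, 0.6983), (0.9788, 0.9725),
    (1.0000, 0.9999), (0.9491, 0.9534), (0.9999, 0.9977), (0.1454, 0.1487),
    (0.5084, 0.5142), (1.0000, 1.0000),
]


def rmse(pairs):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


def mape(pairs):
    """Mean absolute percentage error as a fraction; pairs with actual == 0 are skipped."""
    terms = [abs((a - p) / a) for a, p in pairs if a != 0]
    return sum(terms) / len(terms)


print(f"RMSE = {rmse(SAMPLE):.4f}")
print(f"MAPE = {mape(SAMPLE):.4f}")
```

MAPE weights errors on small deltas heavily: the 0.1454 row contributes far more to MAPE than to RMSE, which is worth keeping in mind for deep out-of-the-money options.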


Delta Prediction

Correlation matrix: [[1. 0.9985212] [0.9985212 1. ]]


Delta Prediction

Final model (trained on 1M samples) performance on test data:

Name   RMSE     MAPE
MLP    0.0188   0.0149

Prediction sample:

actual value   predicted value   difference
1.0000         1.0000            0.00%
0.8211         0.8266            0.66%
0.7001         0.6983            0.25%
0.9788         0.9725            0.64%
1.0000         0.9999            0.01%
0.9491         0.9534            0.45%
0.9999         0.9977            0.22%
0.1454         0.1487            2.33%
0.5084         0.5142            1.15%
1.0000         1.0000            0.00%


Delta Prediction

N=200,000


What Happens to the P&L of Hedging using our MLP Model

                    mean      std       95% CI (lower)   95% CI (upper)   min       max

10 rebalancing steps
Black-Scholes       -0.0017   1.2181    -1.9561          2.0338           -4.0713   8.1401
Machine Learning    -0.0005   1.2673    -2.0973          2.0698           -4.0525   7.7224
Diff                -0.0012   -0.0492   0.1412           -0.036           -0.0188   0.4177

50 rebalancing steps
Black-Scholes       -0.0003   0.5618    -0.9195          0.9295           -2.6618   3.7459
Machine Learning    -0.0001   0.7123    -1.311           1.0704           -2.9193   3.7386
Diff                -0.0002   -0.1505   0.3915           -0.1409          0.2575    0.0073

90 rebalancing steps
Black-Scholes       -0.0039   0.4196    -0.6878          0.681            -2.211    3.3479
Machine Learning    -0.005    0.6114    -1.17            0.8822           -2.472    2.8202
Diff                0.0011    -0.1918   0.4822           -0.2012          0.261     0.5277

Experimental conditions: stock S = 100, strike X = 100, volatility σ = 0.40, T = 1/12, r = 0.05, N = 50,000; rebalancing steps n = [10…100] with step = 10; dt = T / n


What Happens to the P&L of Hedging


What Happens to the P&L of Hedging


Hedging Error Distribution over a Single Time Step


Hedging Error Distribution over Multiple Time Steps


Hedging Error Distributions over a Single Time Step Intersection

Histogram intersection: 97.72%


Hedging Error distributions over Multiple Time Steps Intersection

Histogram intersection: 78.14%
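The slides do not define the intersection measure. A common definition, assumed here, is the sum over bins of the smaller of the two normalized bin heights, so that identical histograms score 100% and disjoint ones score 0%. The sketch below uses hypothetical helper names and toy inputs.

```python
def histogram(values, lo, hi, bins):
    """Normalized histogram of the values that fall in [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def histogram_intersection(p, q):
    """Overlap of two normalized histograms defined on the same bins."""
    return sum(min(a, b) for a, b in zip(p, q))


# Toy example: two small samples binned on a shared grid.
p = histogram([0.1, 0.2, 0.6, 0.9], lo=0.0, hi=1.0, bins=2)
q = histogram([0.1, 0.7, 0.8, 0.9], lo=0.0, hi=1.0, bins=2)
print(p, q, histogram_intersection(p, q))
```

Both histograms must share the same bin edges for the comparison to be meaningful, and the score depends on the bin width chosen.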


Practical Extensions

Relaxing assumptions
– Transaction costs
– Use other ESGs
– Hedging of other risks
– Comparing move-based versus time-based rebalancing strategies
– Add non-linear hedge instruments
– Include back testing and stress testing
– Extend to VAs and to other products
– Toy with different objective functions

Use other and different ML approaches
– Reinforcement learning
– Unsupervised learning


Conclusions

ML is a fascinating subject! I encourage you:
– To think about how AI & ML can help your business
– And to roll up your sleeves and get started

Look at Python and R tools. MATLAB has ML toolboxes. NVIDIA has a lot of resources and materials to get you started.


Acknowledgements

Aguilar, L.A. and Schoutens, W. “The Effect of Transaction Costs on Delta-hedging Strategies”, Wilmott Magazine, p. 48

Boyle, P.P. and Emanuel, D. “Discretely Adjusted Option Hedges”, Journal of Financial Economics 8 (1980): 259

Boyle, P.P. and Vorst, T. “Option Replication in Discrete Time with Transaction Costs”, Journal of Finance 47 (1992): 271

Cox, J.C., Ross, S.A. and Rubinstein, M. “Option Pricing: A Simplified Approach”, Journal of Financial Economics 7 (1979): 229-263

Kamal, M. and Derman, E. “Correcting Black-Scholes”, RISK (1999): 82-85

Leland, H.E. “Option Pricing and Replication with Transaction Costs”, Journal of Finance 40 (1985): 1283

McWalter, T.A. “A Review and Implementation of Option Replication in Discrete Time with Transaction Costs”, Technical Report, University of Cape Town, 2002

Whalley, A.E. and Wilmott, P. “A Hedging Strategy and Option Valuation Model in the Presence of Transaction Costs”, OCIAM Working Paper, Oxford University, 1992

nvidia.com

pwc.com/nextintech
