Equity-Based Insurance Guarantees Conference Nov. 5-6, 2018
Chicago, IL
The Use of Artificial Intelligence and Machine Learning for Hedge Program Rebalancing
Peter M. Phillips
Prepared by PathWise™ Solutions Group. Proprietary & Confidential
November 5, 2018 4:15-5:00 pm
Interesting Landscape
Artificial Intelligence: early artificial intelligence stirs excitement
Machine Learning: machine learning begins to flourish
Deep Learning: deep learning breakthroughs drive AI boom
Timeline: 1950s, 1960s, 1970s, 1980s, 1990s, 2000s, 2010s
Since an early flush of optimism in the 1950s, smaller subsets of artificial intelligence – first machine learning, then deep learning, a subset of machine learning – have created ever larger disruption.
Three Overlapping Concepts
Artificial Intelligence (AI): Techniques that enable computers to mimic human intelligence using if-then rules, decision trees, logic, ML, and DL
Machine Learning (ML): A subset of AI that uses advanced statistical techniques to enable machines to improve at tasks with experience; it includes DL
Deep Learning (DL): A subset of ML that permits software to train itself to perform tasks, like speech and image recognition, by exposing multilayered neural networks to vast amounts of data
Source: Geospatial World
Practical Examples
Source: Cargroup.org
Categories: Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL)
Examples shown:
IBM Deep Blue chess program; IBM Watson; AlphaGo
Electronic game characters (Sims); Google search algorithm; natural speech recognition
Early spam filters; Amazon/Netflix recommendations
Automated driving systems
Industrial robotics; video surveillance; voice-activated assistants
Machine Learning Applications Across Industries
Manufacturing: Predictive maintenance or condition monitoring; Warranty reserve estimation; Propensity to buy; Demand forecasting; Process optimization; Telematics
Retail: Predictive inventory planning; Recommendation engines; Upsell and cross-channel marketing; Market segmentation and targeting; Customer ROI and lifetime value
Healthcare and Life Science: Alerts and diagnostics from real-time patient data; Disease identification and risk stratification; Patient triage optimization; Proactive health management; Healthcare provider sentiment analysis
Travel and Hospitality: Aircraft scheduling; Dynamic pricing; Social media (consumer feedback and interaction analysis); Customer complaint resolution; Traffic patterns and congestion management
Financial Services: Risk analytics and regulation; Customer segmentation; Cross-selling and up-selling; Sales and marketing campaign management; Credit worthiness evaluation
Energy, Feedstock and Utilities: Power usage analytics; Seismic data processing; Carbon emissions and trading; Customer-specific pricing; Smart grid management; Energy demand and supply optimization
Source: whatsthebigdata.com
Machine Learning Impact: Use Cases vs Industries
Machine learning has great potential across industries and use case types
Figure: heat map of impact potential (low to high) by problem type and industry
Problem types (rows): Real-time optimization; Strategic optimization; Predictive analytics; Predictive maintenance; Radical personalization; Discovering new trends/anomalies; Forecasting; Processing unstructured data
Industries (columns): Automotive; Manufacturing; Consumer; Finance; Agriculture; Energy; Healthcare; Pharmaceuticals; Public/social; Media; Telecom; Transport and logistics
Source: McKinsey Global Institute analysis
The Master Algorithm by Pedro Domingos
Symbolists
Idea: Rules, logic, and symbols to represent knowledge and to draw inferences
Sample algorithm: Rules and decision trees, inverse deduction, and production rule systems
Bayesians
Idea: Focuses on assessing the likelihood of occurrence using inference
Sample algorithm: Naïve Bayes or Markov models, probabilistic inference
Connectionists
Idea: Generalize and recognize patterns dynamically with probabilistic weighted neurons
Sample algorithm: Neural networks, backpropagation, and DL
Evolutionaries
Idea: Generate variations and then assess the fitness of each against a specific purpose
Sample algorithm: Genetic algorithms and evolutionary programming
Analogizers
Idea: Optimize a function given a set of constraints
Sample algorithm: Support vector and kernel machines
Source: pwc.com/NextinTech
Decision Trees (1)
Description: A hierarchy of decision nodes classifies something based on a series of questions
Use cases: Credit risk assessment, prediction of horse races, etc.
Advantages: Useful when evaluating distinct features, qualities, or characteristics of people, places, or things
Reference: D. T. Larose, Data Mining and Predictive Analytics, 2nd edition, John Wiley and Sons, 2015
Source: pwc.com/NextinTech
Support Vector Machines – SVMs (2)
Description: Support vector machines classify groups of data using hyperplanes to determine boundaries
Use cases: Categorization, handwriting recognition
Advantages: Very good for binary classification, and useful whether the relationships between variables are linear or not
Reference: Matthew Kelly, Computer Science: Source, 2010
Source: pwc.com/NextinTech
Regression (3)
Description: Maps the behavior of a dependent variable relative to one or more independent variables; many different approaches exist
Use cases: Traffic flow analysis, email filtering
Advantages: Useful for identifying continuous relationships between variables
Reference: Giuseppe Bonaccorso, Machine Learning Algorithms, Packt Publishing, 2017
Source: pwc.com/NextinTech
Naïve Bayes Classification (4)
Description: Computes the combined conditional probability of multiple attributes across branches of possible outcomes, assuming each feature is independent
Use cases: Consumer segmentation, sentiment analysis, spam filtering, document classification
Advantages: Quick classification of relevant items in small data sets that have distinct features
Reference: Rod Pierce, et al., MathIsFun, 2014
Source: pwc.com/NextinTech
Hidden Markov Models (5)
Description: Computes probabilities of hidden states occurring by analyzing observable data, and then estimates the likely pattern of future observations with the help of hidden state analysis
Use cases: Facial expression analysis, weather prediction, speech recognition, malware detection
Advantages: Tolerates data variability and is good at recognition and prediction
Reference: Leonardo Guizzetti, 2012
Source: pwc.com/NextinTech
Random Forest (6)
Description: Improves the accuracy of decision trees by using multiple trees with randomly selected subsets of data
Use cases: Customer churn, risk assessment, cancer relapse risk
Advantages: Useful with large data sets and items that have numerous and sometimes irrelevant features
Reference: Nicolas Spies, Washington University, 2015
Source: pwc.com/NextinTech
Recurrent Neural Networks – RNNs (7)
Description: Each neuron converts many inputs into a single output via one or more hidden layers. RNNs additionally pass values from step to step, creating a form of memory that allows previous outputs to affect subsequent inputs
Use cases: Image classification, captioning, political sentiment analysis
Advantages: RNNs have predictive power when used with large amounts of sequenced information
Reference: Joseph Wilks, 2012
Source: pwc.com/NextinTech
Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) neural networks (8)
Description: Allow for both long-term and short-term memory, thereby avoiding gradient decay of values passed from step to step
Use cases: Natural language processing and translation
Advantages: LSTMs and GRUs have the same advantages as other RNNs and are more frequently used than plain RNNs because of their memory properties
Reference: G. Orr, et al., Willamette University, 1999
Source: pwc.com/NextinTech
Convolutional Neural Networks – CNNs (9)
Description: Convolutions are blends of weights from a subsequent layer that are used to label the output layer
Use cases: Image recognition, text to speech, drug discovery
Advantages: Most useful with large data sets, a large number of features, and complex classification requirements
Reference: Algobeans, 2016
Source: pwc.com/NextinTech
But it Gets Complicated…
Source: scientiscafe.com/2017/07/08/machinelearinggal
ML Laboratory: Delta Hedging
The common assumptions:
– No-arbitrage pricing
– Riskless hedging
– Risk-neutral option valuation
– Continuous-time limit, where risk disappears
– No transaction costs
– Lognormal stock prices, with interest rates and volatility fixed
– PDEs
Together these produce the celebrated Black-Scholes equation: simple and beautiful
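Under exactly these assumptions the price and Greeks of a European call have closed forms. The following is a minimal sketch using only the Python standard library; the function names are illustrative rather than taken from the deck, dividends are assumed to be zero, and the sample inputs are the experimental conditions quoted on later slides.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bs_call(S, K, r, sigma, T):
    """Black-Scholes price, delta, gamma and vega of a European call (no dividends)."""
    sqrt_T = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt_T)
    d2 = d1 - sigma * sqrt_T
    price = S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    delta = norm_cdf(d1)                          # dC/dS
    gamma = norm_pdf(d1) / (S * sigma * sqrt_T)   # d2C/dS2
    vega = S * norm_pdf(d1) * sqrt_T              # dC/dsigma
    return price, delta, gamma, vega

# At-the-money one-month call: S = K = 100, r = 5%, sigma = 40%
price, delta, gamma, vega = bs_call(S=100.0, K=100.0, r=0.05, sigma=0.40, T=1.0 / 12.0)
print(f"price={price:.4f} delta={delta:.4f} gamma={gamma:.4f} vega={vega:.4f}")
```

The delta returned here is the quantity the ML models later in the deck are trained to reproduce.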
ML Laboratory: Delta Hedging
The discrete-time setup: lognormal stock prices
Taylor series expansion for a call option in terms of changes in stock and time
Substituting discrete model of stock prices into the Taylor series expansion yields
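The equations on this slide did not survive extraction. A standard form of the three steps is sketched below as a reconstruction, not the author's exact notation. The symbols are assumptions: μ is the drift, σ the volatility, ε a standard normal draw, Δt the time step, Θ = ∂C/∂t, δ = ∂C/∂S (the option delta) and Γ = ∂²C/∂S². Only terms up to order Δt are kept in the last line.

```latex
\begin{align*}
\Delta S &= \mu S\,\Delta t + \sigma S\,\varepsilon\sqrt{\Delta t},
  \qquad \varepsilon \sim N(0,1) \\
\Delta C &\approx \Theta\,\Delta t + \delta\,\Delta S
  + \tfrac{1}{2}\,\Gamma\,(\Delta S)^2 \\
\Delta C &\approx \Theta\,\Delta t + \delta\,\Delta S
  + \tfrac{1}{2}\,\Gamma\,\sigma^2 S^2 \varepsilon^2\,\Delta t
\end{align*}
```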
ML Laboratory: Delta Hedging
Create a portfolio π consisting of "a call option held long and a short position of delta shares in the underlying", held over a small time step
Change over a small time step
Expression for the hedging error (HE), which includes the initial investment required to set up the portfolio
Substituting the change over a small step into the HE expression leaves
Simplify
HE is proportional to gamma, to the time step, to the square of the stock price, and to the variance (squared volatility) of the stock price
(#1)
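The slide's equations, including equation #1, did not survive extraction. The derivation below is a reconstruction consistent with the bullet text, not the author's exact notation. Assumed symbols: δ = ∂C/∂S is the option delta, Θ = ∂C/∂t, Γ = ∂²C/∂S², ε is a standard normal draw and r the risk-free rate. The last step uses the Black-Scholes PDE, Θ + ½σ²S²Γ = r(C − δS).

```latex
\begin{align*}
\pi &= C - \delta S \\
\Delta\pi &= \Delta C - \delta\,\Delta S
  \approx \Theta\,\Delta t + \tfrac{1}{2}\,\Gamma\,\sigma^2 S^2 \varepsilon^2\,\Delta t \\
HE &= \Delta\pi - r\,\pi\,\Delta t \\
   &\approx \left(\Theta + \tfrac{1}{2}\,\Gamma\,\sigma^2 S^2 \varepsilon^2
      - r\,(C - \delta S)\right)\Delta t \\
HE &\approx \tfrac{1}{2}\,\Gamma\,\sigma^2 S^2\,(\varepsilon^2 - 1)\,\Delta t \tag{\#1}
\end{align*}
```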
Delta Hedging Error Over One Time Step
Experimental conditions: Stock = 100, X = 100, μ = 0.05, σ = 0.40, T = 1/12, r = 0.05, N = 50000, time step = 1 day
We are comparing simulated results with the analytical approximation from equation #1
Note the squared normal variable in equation #1: over one time step the hedging error follows a (shifted and scaled) chi-squared distribution with one degree of freedom
P&L is highly skewed
There is a small probability of making a lot of money and a large probability of losing a small amount of money, because the option position is held long
Option writers face the flip side of this risk: a large probability of making a small profit and a small probability of having a large loss
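The comparison on this slide can be sketched as follows. This is an illustrative simulation, not the author's code. It assumes the one-step hedging error approximation HE ≈ ½ Γ σ² S² (ε² − 1) Δt for equation #1, reads "1 day" as one calendar day (1/365 of a year), and uses Python's built-in generator with an arbitrary seed.

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d1_of(S, K, r, sigma, tau):
    return (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))

def call_price(S, K, r, sigma, tau):
    d1 = d1_of(S, K, r, sigma, tau)
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

# Experimental conditions quoted on the slide
S0, K, mu, sigma, T, r = 100.0, 100.0, 0.05, 0.40, 1.0 / 12.0, 0.05
dt = 1.0 / 365.0   # assumption: "1 day" read as one calendar day
n_paths = 50_000

d1 = d1_of(S0, K, r, sigma, T)
delta0 = norm_cdf(d1)
gamma0 = math.exp(-0.5 * d1 * d1) / (math.sqrt(2.0 * math.pi) * S0 * sigma * math.sqrt(T))
pi0 = call_price(S0, K, r, sigma, T) - delta0 * S0   # long call, short delta shares

rng = random.Random(42)
simulated, analytic = [], []
for _ in range(n_paths):
    eps = rng.gauss(0.0, 1.0)
    s1 = S0 * math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * eps)
    pi1 = call_price(s1, K, r, sigma, T - dt) - delta0 * s1
    simulated.append(pi1 - pi0 * math.exp(r * dt))   # P&L net of financing cost
    analytic.append(0.5 * gamma0 * sigma ** 2 * S0 ** 2 * (eps * eps - 1.0) * dt)

def mean(xs):
    return sum(xs) / len(xs)

def skewness(xs):
    m = mean(xs)
    var = mean([(x - m) ** 2 for x in xs])
    return mean([(x - m) ** 3 for x in xs]) / var ** 1.5

def correlation(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    vx = mean([(x - mx) ** 2 for x in xs])
    vy = mean([(y - my) ** 2 for y in ys])
    return cov / math.sqrt(vx * vy)

loss_share = sum(1 for x in simulated if x < 0.0) / n_paths
print(f"share of losing paths:  {loss_share:.3f}")
print(f"skewness of P&L:        {skewness(simulated):.2f}")
print(f"corr(simulated, #1):    {correlation(simulated, analytic):.4f}")
```

A positive skewness together with a majority of small losses is the pattern described in the bullets for a long option position.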
Delta Hedging Final P&L Equations
Equation #1 extends to n time steps, with the hedge rebalanced at each step: the final P&L is the sum of the hedging errors at each update
With more algebra (Kamal and Derman) we arrive at equation #2
With Gaussian integration we arrive at equation #3
The standard deviation of the final P&L is directly proportional to the option’s vega and to the volatility, and inversely proportional to the square root of the number of rebalances
(#2)
(#3)
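Equations #2 and #3 did not survive extraction. The forms below are a reconstruction consistent with the bullet text and with the Kamal and Derman result cited in the deck; the slide's own notation is not recoverable. Assumed symbols: Γ_i and S_i are gamma and the stock price at rebalance i, ε_i are independent standard normal draws, Δt = T/n, and ∂C/∂σ is the option's vega. The factor ½ in #2 follows from Var(ε² − 1) = 2, and discounting between rebalances is ignored.

```latex
\begin{align*}
HE_{\text{total}} &\approx \sum_{i=1}^{n} \tfrac{1}{2}\,\Gamma_i S_i^2\,\sigma^2\,
  (\varepsilon_i^2 - 1)\,\Delta t \\
\sigma_{HE}^2 &\approx E\!\left[\sum_{i=1}^{n} \tfrac{1}{2}
  \left(\Gamma_i S_i^2\right)^2 \left(\sigma^2\,\Delta t\right)^2\right] \tag{\#2} \\
\sigma_{HE} &\approx \sqrt{\frac{\pi}{4}}\;\frac{\partial C}{\partial \sigma}\;
  \frac{\sigma}{\sqrt{n}} \tag{\#3}
\end{align*}
```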
Standard Deviation of Delta Hedging of the Final P&L
Experimental conditions: Stock = 100, X = 100, μ = 0.05, σ = 0.40, T = 1/12, r = 0.05, N = 50000, dt = 1 day
We are comparing simulated results with the analytical approximations from equations #2 and #3
We are looking at the standard deviation of the final cumulative P&L
Note approximation #2 underestimates the standard deviation
Note approximation #3 overestimates at strikes close to the initial stock price and underestimates at strikes above the initial stock price
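A small experiment along these lines is sketched below. It is not the author's code: it assumes equation #3 has the form sqrt(π/4) · vega · σ / sqrt(n), hedges a long at-the-money call with the Black-Scholes delta, and uses far fewer paths than the deck so that it runs quickly in pure Python.

```python
import math
import random
import statistics

# Conditions from the slide: at-the-money call, one month to expiry
S0, K, MU, SIGMA, T, R = 100.0, 100.0, 0.05, 0.40, 1.0 / 12.0, 0.05

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d1_of(S, tau):
    return (math.log(S / K) + (R + 0.5 * SIGMA ** 2) * tau) / (SIGMA * math.sqrt(tau))

def call_price(S, tau):
    d1 = d1_of(S, tau)
    return S * norm_cdf(d1) - K * math.exp(-R * tau) * norm_cdf(d1 - SIGMA * math.sqrt(tau))

def simulated_pnl_std(n_steps, n_paths=4000, seed=7):
    """Std of the final P&L of a long call hedged with n_steps delta rebalances."""
    rng = random.Random(seed)
    dt = T / n_steps
    growth = math.exp(R * dt)
    drift = (MU - 0.5 * SIGMA ** 2) * dt
    vol = SIGMA * math.sqrt(dt)
    premium = call_price(S0, T)
    pnl = []
    for _ in range(n_paths):
        s, replica = S0, premium          # self-financing hedge funded by the premium
        for i in range(n_steps):
            delta = norm_cdf(d1_of(s, T - i * dt))
            cash = replica - delta * s
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            replica = cash * growth + delta * s
        pnl.append(max(s - K, 0.0) - replica)   # option payoff minus hedge value
    return statistics.pstdev(pnl)

def approx_pnl_std(n_steps):
    """Assumed form of equation #3: sqrt(pi/4) * vega * sigma / sqrt(n)."""
    d1 = d1_of(S0, T)
    vega = S0 * math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi) * math.sqrt(T)
    return math.sqrt(math.pi / 4.0) * vega * SIGMA / math.sqrt(n_steps)

results = {n: (simulated_pnl_std(n), approx_pnl_std(n)) for n in (10, 50)}
for n, (sim, approx) in results.items():
    print(f"n={n:3d}  simulated std={sim:.4f}  approximation #3={approx:.4f}")
```

Repeating the run for other strikes would reproduce the kind of comparison described in the bullets, where the approximation is closest near the money.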
Validation of ML Methods for Delta Prediction
Initial list of candidate models:
Non-parametric methods: K-Nearest Neighbors (kNN); Radius K-Nearest Neighbors
Generalized Linear Models: Linear Regression; Linear least squares with L2 regularization; Lasso Regression; Least Angle Regression; Lasso with Least Angle Regression; Linear regression with combined L1 and L2 regularization; Bayesian ridge regression
Kernel methods: Epsilon-Support Vector Regression; Nu Support Vector Regression; Linear Support Vector Regression
Tree-based models: Decision Tree Regression; Random Forest Regression
Non-linear models (Neural Networks): Multi-layer Perceptron (MLP)
ML Model Training and Evaluation Process
This is not a regular case of ML model training and validation. We generated a synthetic dataset and computed the Black-Scholes delta as the target variable for each generated sample. On a fairly large dataset (50K samples) we validated several machine learning models.
ML model evaluation and training steps:
1. Selected the best performing models based on RMSE and MAPE metrics, using a special-case validation dataset.
2. Used Random Forest Regression to select a representative dataset size (see Data validation description).
3. Obtained a minimum representative dataset size, N = 1000000.
4. Trained the final models, Random Forest Regression and MLP, on the 1M dataset.
5. Tested on a special-case validation dataset (see Cross validation description and results).
Validation of ML Methods for Delta Prediction
The best performers based on cross-validation: K-Nearest Neighbors, Random Forest Regression, Multi-layer Perceptron (Neural Network)
Results of validation (50K dataset):
Name                                      RMSE (std)        MAPE (std)
K-Nearest Neighbors                       0.0305 (0.0020)   0.0162 (0.0001)
Random Forest Regression                  0.0214 (0.0010)   0.0109 (0.0001)
Multi-layer Perceptron (Neural Network)   0.0307 (0.0020)   0.0223 (0.0030)
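For reference, the two metrics in the table can be computed as sketched below. The deck does not say how targets near zero were handled in MAPE, so the small floor in the denominator is an assumption. The ten actual/predicted pairs are the prediction sample quoted later in the deck, used here only to exercise the functions.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted, floor=1e-8):
    """Mean absolute percentage error, returned as a fraction.

    `floor` guards against division by targets at or near zero (an assumption;
    a call delta can be arbitrarily close to zero deep out of the money).
    """
    return sum(abs(a - p) / max(abs(a), floor) for a, p in zip(actual, predicted)) / len(actual)

# Prediction sample quoted in the deck (actual value, predicted value)
actual = [1.0000, 0.8211, 0.7001, 0.9788, 1.0000, 0.9491, 0.9999, 0.1454, 0.5084, 1.0000]
predicted = [1.0000, 0.8266, 0.6983, 0.9725, 0.9999, 0.9534, 0.9977, 0.1487, 0.5142, 1.0000]

sample_rmse = rmse(actual, predicted)
sample_mape = mape(actual, predicted)
print(f"RMSE={sample_rmse:.6f}  MAPE={sample_mape:.6f}")
```

RMSE is in delta units, while MAPE weights errors on small deltas more heavily, which is why the two metrics can rank models differently.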
Pros and Cons of Each Model
Estimator: K-Nearest Neighbors
Pros: No training required; simple algorithm; interpretable results
Cons: Computationally expensive; slow on inference
Estimator: Random Forest Regression
Pros: High accuracy; interpretable results
Cons: Poor interpolation/extrapolation of unseen data; might be slow on inference; large model artifact size
Estimator: Multi-layer Perceptron (Neural Network)
Pros: Fast forward pass calculation (inference); huge learning capacity
Cons: “Data-hungry”; difficult hyper-parameter optimization; expensive training of deep topologies
Synthetic Dataset Generation Step
1. The price and delta of a European call option in the Black-Scholes world depend on: the current price of the underlying share (index price), the strike price, the volatility of the underlying share, the risk-free interest rate, and the time to maturity of the option.
2. To build the ML model, we should generate a representative sample dataset that relates these parameters to the call option delta calculated using the Black-Scholes equation.
3. This dataset should cover all possible combinations of the parameters and should not contain gaps or clusters, to prevent large biases in the ML predictions.
4. The straightforward way is to use a fine 5-dimensional uniform grid, but this requires a very large number of points and is not efficient.
5. A more effective way is to use quasi-random low-discrepancy sequences such as Sobol, which cover the space evenly with far fewer points. Therefore, we generated a 5-dimensional Sobol sequence and mapped it to the hyper-rectangular volume described in the table.
We are using the following variable ranges for data simulation:
Index price               [0.0, 200.0]
Strike price              [0.0, 200.0]
Volatility                [0.0, 1.0]
Risk free interest rate   [0.0, 1.0]
Time to maturity          [0.0, 1.0]
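The generation step can be sketched as follows. To stay within the Python standard library this sketch swaps in a Halton sequence, a different low-discrepancy sequence, in place of the Sobol sequence used in the deck. The function names and the choice of prime bases are illustrative assumptions.

```python
import math

# Hyper-rectangle from the table above: (lower, upper) per input
RANGES = {
    "index_price": (0.0, 200.0),
    "strike": (0.0, 200.0),
    "volatility": (0.0, 1.0),
    "rate": (0.0, 1.0),
    "maturity": (0.0, 1.0),
}
PRIMES = (2, 3, 5, 7, 11)   # one Halton base per dimension (stand-in for Sobol)

def radical_inverse(index, base):
    """Van der Corput radical inverse; lies strictly inside (0, 1) for index >= 1."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def bs_call_delta(S, K, sigma, r, T):
    """Black-Scholes delta of a European call (no dividends)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))

def generate_dataset(n):
    """Return n (inputs, delta) pairs spread evenly over the hyper-rectangle."""
    rows = []
    for i in range(1, n + 1):   # start at 1: index 0 would map to the all-zero corner
        unit = [radical_inverse(i, base) for base in PRIMES]
        x = [lo + u * (hi - lo) for u, (lo, hi) in zip(unit, RANGES.values())]
        rows.append((x, bs_call_delta(*x)))
    return rows

dataset = generate_dataset(1000)
print(dataset[0])
```

Because the sequence never lands exactly on a lower bound, the zero strike, volatility, and maturity edges of the table are avoided, which keeps the Black-Scholes formula defined. Where SciPy is available, scipy.stats.qmc.Sobol provides a Sobol generator that could replace the Halton stand-in.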
Estimation of the Representative Dataset Size
N – size of the sample dataset
K – number of datasets per iteration
N1: [10000, 25000, 50000, 250000, 1M] – list of N’s to be tested
K = 10
Use Random Forest Regression to select a representative dataset size.
1. Start with the list of N’s (N1) and K = 10.
2. Generate K different datasets of size N.
3. Train the model on the first dataset.
4. Validate on the remaining K-1 datasets (9 datasets), each of size N.
5. Choose the next value of N from the N1 list and repeat steps 2-4.
6. Check prediction error convergence.
7. If the error has converged, assume we have reached model capacity (the model is not able to learn more information).
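The loop in steps 1 to 7 can be sketched as below. This is a toy version under stated assumptions, not the deck's experiment: a brute-force k-nearest-neighbors regressor stands in for Random Forest, the sizes and K are tiny so the example runs in pure Python, inputs are drawn with Python's pseudorandom generator rather than Sobol, and the ranges are trimmed slightly away from zero.

```python
import heapq
import math
import random
import statistics

# Assumption: ranges trimmed away from zero so Black-Scholes is always defined
LOWER = (1.0, 1.0, 0.01, 0.0, 0.01)     # index, strike, volatility, rate, maturity
UPPER = (200.0, 200.0, 1.0, 1.0, 1.0)

def bs_call_delta(S, K, sigma, r, T):
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    return 0.5 * (1.0 + math.erf(d1 / math.sqrt(2.0)))

def make_dataset(n, rng):
    """Random inputs scaled to [0, 1] plus their Black-Scholes deltas."""
    X = [[rng.uniform(lo, hi) for lo, hi in zip(LOWER, UPPER)] for _ in range(n)]
    y = [bs_call_delta(*x) for x in X]
    scaled = [[(v - lo) / (hi - lo) for v, lo, hi in zip(x, LOWER, UPPER)] for x in X]
    return scaled, y

def knn_predict(train_X, train_y, x, k=3):
    """Average target of the k nearest training points (stand-in model)."""
    nearest = heapq.nsmallest(k, zip(train_X, train_y), key=lambda row: math.dist(row[0], x))
    return sum(target for _, target in nearest) / k

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def error_vs_size(sizes, k_datasets=4, seed=0):
    """For each N: train on the first dataset, validate on the other K-1."""
    rng = random.Random(seed)
    summary = {}
    for n in sizes:
        datasets = [make_dataset(n, rng) for _ in range(k_datasets)]
        train_X, train_y = datasets[0]
        errors = []
        for X, y in datasets[1:]:
            preds = [knn_predict(train_X, train_y, x) for x in X]
            errors.append(rmse(y, preds))
        summary[n] = (statistics.mean(errors), statistics.pstdev(errors))
    return summary

summary = error_vs_size([100, 200, 400])
for n, (mean_err, std_err) in summary.items():
    print(f"N={n:4d}  mean RMSE={mean_err:.4f}  std of RMSE={std_err:.4f}")
```

The convergence check in step 6 then amounts to watching the mean and the standard deviation of the K-1 validation errors level off as N grows.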
Estimation of the Representative Dataset Size
Baseline model capacity (Random Forest, n_estimators = 100 )
We can observe that standard-deviation of the model prediction error on K-1 datasets is decreasing with the increase of the dataset size (N). The mean error also decreases, as expected – the model is learning.
We want standard-deviation to converge with increase of dataset size N – to describe the population variance from the sample dataset. But, on the other hand we might reach the estimator learning capacity limit earlier.
That’s why…
Estimation of the Representative Dataset Size
N – size of the sample dataset
K – number of datasets per iteration
N2: [250000, 1M, 1.5M, 2M] – list of N’s to be tested
K = 10
…we perform the next steps:
8. Increase model capacity: increase the number of estimators for Random Forest Regression from 100 to 150 (this is similar to increasing the number and/or size of hidden layers of a neural network).
9. Use bigger dataset sizes: the N2 list.
10. Repeat steps 2-4 until convergence.
Estimation of the Representative Dataset Size
Increased model capacity by 50% (Random Forest, n_estimators = 150)
As we can observe from the plots below – the standard deviation of the model error converged near dataset size of 1M samples.
Model Validation
We used different random number generators for data validation and model validation. For training: the quasi-random Sobol algorithm. For validation: the Mersenne Twister pseudorandom number generator, with a random seed for each validation fold.
The aim of using different random number generators for the training and validation sets is to test the model on unseen data, similar to the training and test sets in a regular ML model training process.
Sobol is a low-discrepancy sequence (LDS) sampling strategy, based on the deterministic placement of sample points as uniformly as possible, avoiding large gaps or clusters. This is exactly what is needed for regression model training.
Delta Prediction
Final model (trained on 1M samples) performance on test data:
Name                       RMSE     MAPE
Random Forest Regression   0.0231   0.0122
Prediction sample:
actual value   predicted value   difference
1.0000         1.0000            0.00%
0.8211         0.8266            0.66%
0.7001         0.6983            0.25%
0.9788         0.9725            0.64%
1.0000         0.9999            0.01%
0.9491         0.9534            0.45%
0.9999         0.9977            0.22%
0.1454         0.1487            2.33%
0.5084         0.5142            1.15%
1.0000         1.0000            0.00%
Delta Prediction
Correlation matrix:
[[1.        0.9985212]
 [0.9985212 1.       ]]
Delta Prediction
Final model (trained on 1M samples) performance on test data:
Name   RMSE     MAPE
MLP    0.0188   0.0149
Prediction sample:
actual value   predicted value   difference
1.0000         1.0000            0.00%
0.8211         0.8266            0.66%
0.7001         0.6983            0.25%
0.9788         0.9725            0.64%
1.0000         0.9999            0.01%
0.9491         0.9534            0.45%
0.9999         0.9977            0.22%
0.1454         0.1487            2.33%
0.5084         0.5142            1.15%
1.0000         1.0000            0.00%
Delta Prediction
N=200,000
What Happens to the P&L of Hedging using our MLP Model
Steps   Hedge delta        mean      std       95% CI (lower)   95% CI (upper)   min       max
10      Black-Scholes      -0.0017   1.2181    -1.9561          2.0338           -4.0713   8.1401
10      Machine Learning   -0.0005   1.2673    -2.0973          2.0698           -4.0525   7.7224
10      Diff               -0.0012   -0.0492   0.1412           -0.036           -0.0188   0.4177
50      Black-Scholes      -0.0003   0.5618    -0.9195          0.9295           -2.6618   3.7459
50      Machine Learning   -0.0001   0.7123    -1.311           1.0704           -2.9193   3.7386
50      Diff               -0.0002   -0.1505   0.3915           -0.1409          0.2575    0.0073
90      Black-Scholes      -0.0039   0.4196    -0.6878          0.681            -2.211    3.3479
90      Machine Learning   -0.005    0.6114    -1.17            0.8822           -2.472    2.8202
90      Diff               0.0011    -0.1918   0.4822           -0.2012          0.261     0.5277
Steps = number of rebalancing steps
Experimental conditions: Stock = 100, X = 100, σ = 0.40, T = 1/12, r = 0.05, N = 50000, number of rebalancing steps n = [10…100] in increments of 10, dt = T / n
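The effect of an imperfect delta on the hedged P&L can be illustrated with the sketch below. It does not use the deck's MLP. As a loudly stated assumption, the model error is imitated by adding independent Gaussian noise to the Black-Scholes delta, with standard deviation equal to the MLP test RMSE quoted in the deck (0.0188). The drift of 0.05, the percentile reading of the "95% CI" columns, and the reduced path count are also assumptions.

```python
import math
import random
import statistics

S0, K, MU, SIGMA, T, R = 100.0, 100.0, 0.05, 0.40, 1.0 / 12.0, 0.05
DELTA_RMSE = 0.0188   # MLP test RMSE quoted in the deck, used as the noise level

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d1_of(S, tau):
    return (math.log(S / K) + (R + 0.5 * SIGMA ** 2) * tau) / (SIGMA * math.sqrt(tau))

def bs_delta(S, tau):
    return norm_cdf(d1_of(S, tau))

def call_price(S, tau):
    d1 = d1_of(S, tau)
    return S * norm_cdf(d1) - K * math.exp(-R * tau) * norm_cdf(d1 - SIGMA * math.sqrt(tau))

noise_rng = random.Random(123)

def noisy_delta(S, tau):
    """Hypothetical stand-in for an ML delta: Black-Scholes delta plus noise, clipped to [0, 1]."""
    return min(1.0, max(0.0, bs_delta(S, tau) + noise_rng.gauss(0.0, DELTA_RMSE)))

def hedge_pnl(delta_fn, n_steps, n_paths=3000, seed=11):
    """Final P&L of a long call hedged with delta_fn; the same seed gives the same price paths."""
    rng = random.Random(seed)
    dt = T / n_steps
    growth = math.exp(R * dt)
    drift = (MU - 0.5 * SIGMA ** 2) * dt
    vol = SIGMA * math.sqrt(dt)
    premium = call_price(S0, T)
    pnl = []
    for _ in range(n_paths):
        s, replica = S0, premium
        for i in range(n_steps):
            delta = delta_fn(s, T - i * dt)
            cash = replica - delta * s
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            replica = cash * growth + delta * s
        pnl.append(max(s - K, 0.0) - replica)
    return pnl

def summarize(pnl):
    ordered = sorted(pnl)
    return {
        "mean": statistics.mean(pnl),
        "std": statistics.pstdev(pnl),
        "p2.5": ordered[int(0.025 * len(ordered))],    # assumed reading of "95% CI (lower)"
        "p97.5": ordered[int(0.975 * len(ordered))],   # assumed reading of "95% CI (upper)"
        "min": ordered[0],
        "max": ordered[-1],
    }

bs_stats = summarize(hedge_pnl(bs_delta, n_steps=50))
ml_stats = summarize(hedge_pnl(noisy_delta, n_steps=50))
print("Black-Scholes delta:", {k: round(v, 4) for k, v in bs_stats.items()})
print("Noisy delta:        ", {k: round(v, 4) for k, v in ml_stats.items()})
```

Because the noise is independent of the price path, it adds variance to the P&L without shifting the mean, which is the qualitative pattern in the table: similar means and a wider spread for the ML hedge.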
What Happens to the P&L of Hedging
What Happens to the P&L of Hedging
Hedging Error Distribution over a Single Time Step
Hedging Error Distribution over Multiple Time Steps
Hedging Error Distributions over a Single Time Step Intersection
Histogram intersection: 97.72%
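Histogram intersection measures how much two distributions overlap: both samples are binned on a common grid, each histogram is normalized to sum to one, and the bin-wise minima are summed. The sketch below shows one common definition; the deck does not state the number of bins or the normalization it used, so those choices are assumptions.

```python
import random

def histogram(samples, lo, hi, n_bins):
    """Normalized histogram (bin fractions sum to 1) on the interval [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in samples:
        idx = min(n_bins - 1, max(0, int((x - lo) / width)))   # clamp the top edge
        counts[idx] += 1
    return [c / len(samples) for c in counts]

def histogram_intersection(a, b, n_bins=50):
    """Overlap of two samples in [0, 1]: 1.0 for identical histograms, 0.0 for disjoint ones."""
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    if hi == lo:            # all values identical
        return 1.0
    hist_a = histogram(a, lo, hi, n_bins)
    hist_b = histogram(b, lo, hi, n_bins)
    return sum(min(p, q) for p, q in zip(hist_a, hist_b))

# Two hypothetical P&L samples with slightly different spreads
rng = random.Random(0)
pnl_a = [rng.gauss(0.0, 0.56) for _ in range(5000)]
pnl_b = [rng.gauss(0.0, 0.71) for _ in range(5000)]
overlap = histogram_intersection(pnl_a, pnl_b)
print(f"histogram intersection: {overlap:.2%}")
```

The value depends on the bin count, so intersections are only comparable when computed with the same binning.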
Hedging Error distributions over Multiple Time Steps Intersection
Histogram intersection: 78.14%
Practical Extensions
Relaxing assumptions
– Transaction costs
– Use other ESGs
– Hedging of other risks
– Comparing move-based versus time-based rebalancing strategies
– Add non-linear hedge instruments
– Include back testing and stress testing
– Extend to VAs and to other products
– Toy with different objective functions
Use other and different ML approaches
– Reinforcement learning
– Unsupervised learning
Conclusions
ML is a fascinating subject! I encourage you:
– To think about how AI & ML can help your business
– To roll up your sleeves and get started
Look at Python and R tools
MATLAB has ML toolboxes
NVIDIA has a lot of resources and materials to get you started
Acknowledgements
Aguilar, L.A. and Schoutens, W., “The Effect of Transaction Costs on Delta-Hedging Strategies”, Wilmott Magazine, p. 48
Boyle, P.P. and Emanuel, D., “Discretely Adjusted Option Hedges”, Journal of Financial Economics 8 (1980): 259
Boyle, P.P. and Vorst, T., “Option Replication in Discrete Time with Transaction Costs”, Journal of Finance 47 (1992): 271
Cox, J.C., Ross, S.A. and Rubinstein, M., “Option Pricing: A Simplified Approach”, Journal of Financial Economics 7 (1979): 229-263
Kamal, M. and Derman, E., “Correcting Black-Scholes”, RISK (1999): 82-85
Leland, H.E., “Option Pricing and Replication with Transaction Costs”, Journal of Finance 40 (1985): 1283
McWalter, T.A., “A Review and Implementation of Option Replication in Discrete Time with Transaction Costs”, Technical Report, University of Cape Town, 2002
Whalley, A.E. and Wilmott, P., “A Hedging Strategy and Option Valuation Model in the Presence of Transaction Costs”, OCIAM Working Paper, Oxford University, 1992
nvidia.com
pwc.com/nextintech