UNIVERSITY OF BIRMINGHAM
BIRMINGHAM BUSINESS SCHOOL
2019-2020

MSc DISSERTATION COVER SHEET

I confirm that I have read and understood the regulations on plagiarism* and have acknowledged the work of others that I have included in this dissertation.

Please read the following statement and tick ONE box regarding permission, or denial thereof, to view your dissertation by other students:

- I AGREE to allow my dissertation to be seen by future students. By signing this form, I agree to allow access to students of the Business School, as part of the University of Birmingham, to view my dissertation, or part thereof, for guidance as an example of good practice. For its part, the University will grant access to Birmingham Business School students as it deems appropriate, but in so doing forbids anyone to copy or use my dissertation in any other way or for any other purpose. I understand that my dissertation will be available to view via Canvas and that any personal references will be anonymised. I further understand that the University has no control over the actions of third parties, and should I have any concerns, my permission may be withdrawn, at any time, by advising the Business School in writing.

- I DO NOT AGREE to allow my dissertation to be seen by future students.

Print Name: Chun Kiat ONG
Student ID: 2031611
Date: 9th September 2020

*Plagiarism, in this context, is the reproduction of material from books and articles without acknowledgement. It is the act of passing off another person's work as your own, copying a fellow student's work or reproducing work submitted by a past student. Such actions are seen as a form of cheating and, as such, are penalised by examiners according to their extent and gravity. You should not quote existing work without quotation marks and appropriate referencing. An attempt to present the work of someone else as your own may lead to your dissertation being awarded a mark of zero. You are required to state the full references of all sources that you use. If quotations are made, they must be explicitly and fully referenced, including stating the relevant page number(s). You will be penalised very severely if examiners find that you have presented a section of a book, an article or a paper without appropriate referencing. If you are not sure about how to quote an existing work, please ask for advice from your supervisor.


UNIVERSITY OF BIRMINGHAM

MASTER OF SCIENCE THESIS

The Performance of Artificial Neural Networks on Rough Heston Model

Author: Chun Kiat ONG (ID: 2031611)

Supervisor: Dr Daniel J. DUFFY

A thesis submitted in partial fulfillment of the requirements for the degree of MSc Mathematical Finance in the Birmingham Business School

9th September 2020


Declaration of Authorship

I, Chun Kiat ONG (ID: 2031611), declare that this thesis, titled "The Performance of Artificial Neural Networks on Rough Heston Model", and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a master degree at this University.

• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.

• Where I have consulted the published work of others, this is always clearly attributed.

• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

• I have acknowledged all main sources of help.

• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed: CHUN KIAT ONG

Date: 9th September 2020


UNIVERSITY OF BIRMINGHAM

Abstract

Birmingham Business School
MSc Mathematical Finance

The Performance of Artificial Neural Networks on Rough Heston Model

by Chun Kiat ONG (ID: 2031611)

There has been extensive research on finding various methods to price options more accurately and more quickly. The rise of artificial neural networks could provide an alternative means for this application. Although there exist many methods for approximating option prices under the rough Heston model, our objective is to investigate the performance of artificial neural networks on rough Heston model option pricing, as well as on implied volatility approximation, since both are part of the calibration process. We use simulated data to train the neural network instead of real market data, for regulatory reasons. We also adapt the image-based implicit training method for ANN implied volatility approximation, where the output of the ANN is the entire implied volatility surface. The results show that artificial neural networks can indeed generate accurate option prices and implied volatility approximations, but they are not as efficient as the existing methods. Hence, we conclude that the ANN is a feasible alternative method for pricing options under the rough Heston model, but an inefficient one.


Acknowledgements

I would like to take a moment to thank those who helped me throughout the writing of this dissertation.

First, I wish to thank my supervisor, Dr Daniel Duffy, for his invaluable support and expert guidance. I would also like to thank Dr Colin Rowat for designing such a challenging and rigorous course; I truly enjoyed it.

I would also like to take this opportunity to acknowledge the support and great love of my family, my girlfriend and my friends. They kept me going, and this work would not have been possible without them.


Contents

Declaration of Authorship iii
Abstract v
Acknowledgements vii

0 Project's Overview 1
  0.1 Project's Overview 1

1 Introduction 5
  1.1 Organisation of This Thesis 6

2 Literature Review 7
  2.1 Application of ANN in Finance 7
  2.2 Option Pricing Models 8
  2.3 The Research Gap 8

3 Option Pricing Models 11
  3.1 The Heston Model 11
    3.1.1 Option Pricing Under Heston Model 12
    3.1.2 Heston Model Implied Volatility Approximation 14
  3.2 Volatility is Rough 17
  3.3 The Rough Heston Model 18
    3.3.1 Rough Heston Pricing Equation 19
      3.3.1.1 Rational Approximation of Rough Heston Riccati Solution 20
      3.3.1.2 Option Pricing Using Fast Fourier Transform 22
    3.3.2 Rough Heston Model Implied Volatility 23

4 Artificial Neural Networks 25
  4.1 Dissecting the Artificial Neural Networks 25
    4.1.1 Neurons 26
    4.1.2 Input, Output and Hidden Layers 26
    4.1.3 Connections, Weights and Biases 26
    4.1.4 Forward and Backward Propagation, Loss Functions 27
    4.1.5 Activation Functions 27
    4.1.6 Optimisation Algorithms and Learning Rate 28
    4.1.7 Epochs, Early Stopping and Batch Size 29
  4.2 Feedforward Neural Networks 30
    4.2.1 Deep Neural Networks 30
  4.3 Common Pitfalls and Remedies 30
    4.3.1 Loss Function Stagnation 31
    4.3.2 Loss Function Fluctuation 31
    4.3.3 Over-fitting 31
    4.3.4 Under-fitting 32
  4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation 32

5 Data 35
  5.1 Data Generation and Storage 35
    5.1.1 Heston Call Option Data 36
    5.1.2 Heston Implied Volatility Data 37
    5.1.3 Rough Heston Call Option Data 37
    5.1.4 Rough Heston Implied Volatility Data 38
  5.2 Data Pre-processing 39
    5.2.1 Data Splitting: Train - Validate - Test 39
    5.2.2 Data Standardisation and Normalisation 39
    5.2.3 Data Partitioning 40

6 Neural Networks Architecture and Experimental Setup 41
  6.1 Neural Networks Architecture 41
    6.1.1 ANN Architecture: Option Pricing under Heston Model 43
    6.1.2 ANN Architecture: Implied Volatility Surface of Heston Model 44
    6.1.3 ANN Architecture: Option Pricing under Rough Heston Model 45
    6.1.4 ANN Architecture: Implied Volatility Surface of Rough Heston Model 45
  6.2 Performance Metrics and Validation Methods 45
  6.3 Implementation of ANN in Keras 47
  6.4 Summary of Neural Networks Architecture 48

7 Results and Discussions 49
  7.1 Heston Option Pricing ANN Results 49
  7.2 Heston Implied Volatility ANN Results 51
  7.3 Rough Heston Option Pricing ANN Results 53
  7.4 Rough Heston Implied Volatility ANN Results 55
  7.5 Run-time Performance 59

8 Conclusions and Outlook 61

A pybind11: A step-by-step tutorial with examples 63
  A.1 Software Prerequisites 63
  A.2 Create a Python Project 64
  A.3 Create a C++ Project 66
  A.4 Convert the C++ project to extension for Python 67
  A.5 Make the DLL available to Python 68
  A.6 Call the DLL from Python 70
  A.7 Example: Store C++ Eigen Matrix as NumPy Arrays 71
  A.8 Example: Store Data Generated by Analytical Heston Model as NumPy Arrays 75

B Asymptotic Expansions for General Stochastic Volatility Models 81
  B.1 Asymptotic Expansions 81

C Deep Neural Networks Library in Python: Keras 83

Bibliography 85

List of Figures

1 Implementation Flow Chart 2
2 Data flow diagram for ANN on Rough Heston Pricer 3

1.1 Problem Solving Process 6

3.1 Paths of fBm for different values of H. Adapted from (Shevchenko, 2015) 18
3.2 Implied volatility smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019) 24

4.1 A Simple Neural Network 25
4.2 The pixels of the implied volatility surface are represented by the outputs of the neural network 33

5.1 Data partitioning for 5-fold cross validation. Adapted from (Chapter 2 Modeling Process: k-fold cross validation) 40

6.1 Typical ANN Architecture for Option Pricing Application 42
6.2 Typical ANN Architecture for Implied Volatility Surface Approximation 43

7.1 Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN 49
7.2 Plot of Predicted vs Actual values for Heston Option Pricing ANN 50
7.3 Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN 51
7.4 Heston Implied Volatility Smiles: ANN Predictions vs Actual Values 52
7.5 Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data 53
7.6 Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN 54
7.7 Plot of Predicted vs Actual values for Rough Heston Pricing ANN 55
7.8 Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN 56
7.9 Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values 57
7.10 Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data 58

List of Tables

5.1 Heston Model Call Option Data 36
5.2 Heston Model Implied Volatility Data 37
5.3 Rough Heston Model Call Option Data 38
5.4 Rough Heston Model Implied Volatility Data 38

6.1 Summary of Neural Networks Architecture for Each Application 48

7.1 Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data 50
7.2 5-fold Cross Validation Results of Heston Option Pricing ANN 50
7.3 Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data 51
7.4 5-fold Cross Validation Results of Heston Implied Volatility ANN 53
7.5 Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data 54
7.6 5-fold Cross Validation Results of Rough Heston Option Pricing ANN 55
7.7 Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data 56
7.8 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN 58
7.9 Training Time 59
7.10 Execution Time for Predictions using ANN and Traditional Methods 59

List of Abbreviations

NN    Neural Networks
ANN   Artificial Neural Networks
DNN   Deep Neural Networks
CNN   Convolutional Neural Networks
SGD   Stochastic Gradient Descent
SV    Stochastic Volatility
CIR   Cox, Ingersoll and Ross
fBm   fractional Brownian motion
FSV   Fractional Stochastic Volatility
RFSV  Rough Fractional Stochastic Volatility
FFT   Fast Fourier Transform
ELU   Exponential Linear Unit
ReLU  Rectified Linear Unit

Chapter 0

Project's Overview

This chapter gives a brief overview of the methodology of the project without delving too much into the details.

0.1 Project's Overview

As advised by the supervisor, the problem is divided into smaller, achievable steps, as (Mandara, 2019) did. The four main steps are:

• Define the objective
• Formulate the procedures
• Implement the procedures
• Evaluate the results

The objective is to investigate the performance of artificial neural networks (ANN) on vanilla option pricing and implied volatility approximation under the rough Heston model. The performance metrics in this context are accuracy and run-time performance.

Figure 1 shows the flow chart of the implementation procedures. The procedure starts with synthetic data generation. We then split the synthetic data set into 3 portions: one for training (in-sample), one for validation and the last one for testing (out-of-sample). Before the data is fed into the ANN for training, it needs to be scaled or normalised, as its features have different magnitudes and ranges. In this thesis we use the training method proposed by (Horvath et al., 2019), the image-based implicit training approach, for implied volatility surface approximation. Finally, we test the trained ANN model on the unseen data set to evaluate its accuracy. A sketch of the splitting and scaling step is given below.

The run-time performance of the ANN will be compared with that of the traditional approximation methods. This project requires the use of two programming languages: C++ and Python. The data generation process is done in C++. The wide range of machine learning libraries (Keras, TensorFlow, PyTorch) in Python makes it the logical choice for the neural network computations. The experiment starts with the classical Heston model as a concept validation and then proceeds to the rough Heston model.
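To make the pre-processing step concrete, below is a minimal Python sketch of the three-way split and feature scaling; the array names, split ratios and the use of scikit-learn are illustrative assumptions rather than the thesis code.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # Placeholder synthetic data: rows are samples, columns are model/contract parameters
    X, y = np.random.rand(100_000, 6), np.random.rand(100_000)

    # 80% train (in-sample), 10% validation, 10% test (out-of-sample)
    X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.2, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

    # Fit the scaler on the training set only, then apply it to all three sets
    scaler = StandardScaler().fit(X_train)
    X_train, X_val, X_test = map(scaler.transform, (X_train, X_val, X_test))

Fitting the scaler on the training set alone avoids leaking information from the validation and test sets into training.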


FIGURE 1: Implementation Flow Chart


FIGURE 2: Data flow diagram for ANN on Rough Heston Pricer


Chapter 1

Introduction

Option pricing has always been a major research area in financial mathematics. Various methods and techniques have been developed to price options more accurately and more quickly since the introduction of the well-known Black-Scholes model in 1973 (Black et al., 1973). The Black-Scholes model has an analytical solution and only one parameter, the volatility, that requires calibration. This makes it very easy to implement. Its simplicity, however, comes at the expense of many unrealistic assumptions. One of its biggest flaws is the assumption of a constant volatility term, which simply does not hold in the real market. With this assumption, the Black-Scholes model fails to capture the volatility dynamics exhibited by the market and so is unable to value options accurately.

In 1993, (Heston, 1993) developed a stochastic volatility model in an attempt to better capture the volatility dynamics. The Heston model assumes the volatility of an underlying asset to follow a geometric Brownian motion (Hurst parameter H = 1/2). Like the Black-Scholes model, the Heston model has a closed-form analytical solution, and this makes it a strong alternative.

Whilst the Heston model is more realistic than the Black-Scholes model, the generated option prices do not match the observed vanilla option prices consistently (Gatheral et al., 2014). Recent research shows that more accurate option prices can be estimated if the volatility process is modelled as a fractional Brownian motion with H < 1/2. When H < 1/2, the volatility process path appears to be rougher than when H >= 1/2, and hence the volatility is described as rough.

(Euch et al., 2019) introduce the rough volatility version of the Heston model, which combines the best of both worlds. However, the calibration of the rough Heston model relies on the notoriously slow Monte Carlo method, due to its non-Markovian nature. This obstacle is the reason that the rough Heston model struggles to find its way into industrial applications. Although there exist many approximation methods for the rough Heston model that help to solve this problem, we are interested in the feasibility of machine learning algorithms.

Recent advancements in machine learning could potentially provide an alternative means for this problem. Specifically, artificial neural networks (ANN) offer the best hope, because they are capable of learning complex functional relationships between input and output data and then making predictions instantly.


In this project, our objective is to investigate the accuracy and run-time performance of ANN on European call option price predictions and implied volatility approximations under the rough Heston model. We start off by using the Heston model as a concept validation, then proceed to its rough volatility version. For regulatory purposes, we use data simulated by the option pricing models for training.

1.1 Organisation of This Thesis

This thesis is organised as follows. In Chapter 2, we review some of the related work and provide justifications for our research. In Chapter 3, we discuss the theory of the option pricing models of our interest. We provide a brief introduction to ANN and some techniques to improve their performance in Chapter 4. We then explain the tools we used for the data generation process and some useful techniques to speed up the process in Chapter 5; in the same chapter, we also discuss the important data pre-processing procedures. Chapter 6 presents our network architecture and experimental setup. We present the results and discuss our findings in Chapter 7. Finally, we conclude our findings and discuss potential future work in Chapter 8.

FIGURE 1.1: Problem Solving Process


Chapter 2

Literature Review

In this chapter, we provide a review of the current literature and highlight the research gap. The focus of our review is the application of ANN to option pricing models. We justify our methodology and our choice of option pricing models, and we also provide some critiques.

2.1 Application of ANN in Finance

Option pricing has always been a hot topic in academia and the financial industry, and researchers have been looking for various means to price options more accurately and quickly. As computational technology advances, the application of ANN has gained popularity in solving complex problems in many different industries. In quantitative finance, the idea of utilising ANN for pricing derivatives comes from its ability to make fast and accurate predictions once it is trained.

The application of ANN in option pricing began with (Hutchinson et al., 1994). The authors were the pioneers in applying ANN for pricing and hedging derivatives. Since then, there have been more than 100 research papers on this topic. It is also worth noting that the complexity of the ANN architectures used in these papers has increased over the years. Whilst there has been much research on the applications of ANN in option pricing, only about 10% of the papers focus on implied volatility (Ruf et al., 2020). The feasibility of using ANN for implied volatility approximation is equally as important as for option pricing, since they are both part of the calibration process. Recently, more research papers have focused on implied volatility and calibration, such as (Mostafa et al., 2008), (Liu et al., 2019a), (Liu et al., 2019b) and (Horvath et al., 2019). A key contribution of (Horvath et al., 2019)'s work is the introduction of the image-based implicit training method for implied volatility surface approximation. This method is allegedly robust and fast for approximating the entire implied volatility surface.

A significant portion of the research papers use real market data. Currently, there is no standardised framework for deciding the architecture and parameters of an ANN. Hence, training the ANN directly with market data might pose some issues when it comes to regulatory compliance. (Horvath et al., 2019) is one of the few papers that use simulated data for ANN training; their simulated data is generated from the Heston and rough Bergomi models. Using simulated data removes the regulatory concerns, as


the functional mapping from model parameters to option price is known and tractable.

The application of ANN requires splitting the entire data set into training, validation and testing sets. Most of the papers that use real market data ensure the data splitting process is chronological, to preserve the time series structure of the data. This is not a concern in our case, as we are using simulated data.

The results in (Horvath et al., 2019) show very accurate predictions. However, after examining the source code, we discovered that the ANN is validated and tested on the same data set. Thus, the high accuracy results do not give any information about how well the trained ANN generalises to unseen data. In our experiment, we will use different data sets for validation and testing.

Common error metrics used include mean absolute error (MAE), mean absolute percentage error (MAPE) and mean squared error (MSE). We suggest a new metric, the maximum absolute error, as it is useful from a risk management perspective for quantifying the largest prediction error in the worst case scenario.

2.2 Option Pricing Models

In terms of the choice of option pricing models, over 80% of the papers select the Black-Scholes model or its extensions in their research (Ruf et al., 2020). As mentioned before, there are other models that can give better option prices than Black-Scholes, such as the stochastic volatility models. (Liu et al., 2019b) investigated the application of ANN to the Heston and Bates models.

According to (Gatheral et al., 2014), modelling the volatility as rough volatility can better capture the volatility dynamics in the market. Rough volatility models struggle to find their way into the industry due to their slow calibration process. Hence, it is natural for the direction of ANN research to move towards rough volatility models.

Some of the recent examples include (Stone, 2019), (Bayer et al., 2018) and (Horvath et al., 2019). (Stone, 2019) investigates the calibration of the rough Bergomi model using Convolutional Neural Networks (CNN), whilst (Bayer et al., 2018) study the calibration of the Heston and rough Bergomi models using Deep Neural Networks (DNN). Contrary to these papers, where calibration is performed directly via ANN in one step, (Horvath et al., 2019) split the calibration of the rough Bergomi model into two steps. First, the ANN is trained to learn the pricing equation during a so-called offline session. The trained ANN is then deterministic and stored for the online calibration session later. This approach can speed up the online calibration by a huge margin.

2.3 The Research Gap

In our thesis, we will be using the approach in (Horvath et al., 2019); that is, to train the network to learn the pricing and implied volatility functional mappings separately. We leave the model calibration step as the future outlook of this project. We choose to work with the rough Heston model, another popular rough volatility model that has not been investigated yet. We adapt the image-based implicit training method in our research. To re-iterate, we validate and test our ANN with different data sets to obtain a true measure of the ANN's performance on unseen data.

Chapter 3

Option Pricing Models

In this chapter, we discuss the option pricing models of our interest: the Heston and rough Heston models. The idea of modelling volatility as a rough fractional Brownian motion will also be presented. We then derive their pricing and implied volatility equations and explain their implementations for data generation.

The Black-Scholes model is simple but unrealistic for real world applications, as its assumptions simply do not hold. One of its greatest criticisms is the assumption of constant volatility of the underlying asset price. (Dupire, 1994) extended the Black-Scholes framework to give local volatility models. He modelled the volatility as a deterministic function of the underlying price and time, with the function selected such that its outputs match observed European option prices. This extension is time-inhomogeneous and its dynamics are still unrealistic; thus it is not able to produce future volatility surfaces that emulate our observations.

Stochastic Volatility (SV) models were then introduced, in which the volatility of the underlying is modelled as a stochastic process. Recent research shows that rough volatility can capture the true volatility dynamics better, and this leads to the rough volatility models.

3.1 The Heston Model

One of the most popular SV models is the one proposed by (Heston, 1993); its popularity is due to the fact that its pricing equation for European options is analytically tractable. This property allows calibration procedures to be done efficiently.

The Heston model for a one-dimensional spot price S follows the stochastic process

    dS = \mu S dt + \sqrt{\nu} S dW_1    (3.1)

and its instantaneous variance \nu follows a Cox, Ingersoll and Ross (CIR) process (Cox et al., 1985):

    d\nu = \kappa(\theta - \nu) dt + \xi \sqrt{\nu} dW_2    (3.2)
    \langle dW_1, dW_2 \rangle = \rho dt    (3.3)

where

• \mu is the (deterministic) rate of return of the spot
• \theta is the long term mean volatility
• \xi is the volatility of volatility
• \kappa is the mean reversion rate of volatility
• \rho is the correlation between spot and volatility moves
• W_1 and W_2 are Wiener processes

3.1.1 Option Pricing Under Heston Model

In this section, we follow (Gatheral, 2006) closely. We start by forming a portfolio \Pi containing the option being priced V, a quantity -\Delta of the stock, and a quantity -\Delta_1 of another asset whose value is V_1:

    \Pi = V - \Delta S - \Delta_1 V_1

The change in the value of the portfolio during dt is

    d\Pi = [\partial V/\partial t + (1/2)\nu S^2 \partial^2 V/\partial S^2 + \rho\xi\nu S \partial^2 V/\partial\nu\partial S + (1/2)\xi^2\nu \partial^2 V/\partial\nu^2] dt
         - \Delta_1 [\partial V_1/\partial t + (1/2)\nu S^2 \partial^2 V_1/\partial S^2 + \rho\xi\nu S \partial^2 V_1/\partial\nu\partial S + (1/2)\xi^2\nu \partial^2 V_1/\partial\nu^2] dt
         + [\partial V/\partial S - \Delta_1 \partial V_1/\partial S - \Delta] dS + [\partial V/\partial\nu - \Delta_1 \partial V_1/\partial\nu] d\nu

Note that the explicit dependence of S and \nu on t has been removed for simplicity. We can make the portfolio instantaneously risk-free by eliminating the dS and d\nu terms:

    \partial V/\partial S - \Delta_1 \partial V_1/\partial S - \Delta = 0

and

    \partial V/\partial\nu - \Delta_1 \partial V_1/\partial\nu = 0

So we are left with

    d\Pi = [\partial V/\partial t + (1/2)\nu S^2 \partial^2 V/\partial S^2 + \rho\xi\nu S \partial^2 V/\partial\nu\partial S + (1/2)\xi^2\nu \partial^2 V/\partial\nu^2] dt
         - \Delta_1 [\partial V_1/\partial t + (1/2)\nu S^2 \partial^2 V_1/\partial S^2 + \rho\xi\nu S \partial^2 V_1/\partial\nu\partial S + (1/2)\xi^2\nu \partial^2 V_1/\partial\nu^2] dt
         = r\Pi dt
         = r(V - \Delta S - \Delta_1 V_1) dt


Here we have applied the no-arbitrage principle: the return of a risk-free portfolio must equal the risk-free rate, which we assume to be deterministic for our case.

Rearranging the above equation by collecting V terms on the left-hand side and V_1 terms on the right-hand side gives

    [\partial V/\partial t + (1/2)\nu S^2 \partial^2 V/\partial S^2 + \rho\xi\nu S \partial^2 V/\partial\nu\partial S + (1/2)\xi^2\nu \partial^2 V/\partial\nu^2 + rS \partial V/\partial S - rV] / (\partial V/\partial\nu)
    = [\partial V_1/\partial t + (1/2)\nu S^2 \partial^2 V_1/\partial S^2 + \rho\xi\nu S \partial^2 V_1/\partial\nu\partial S + (1/2)\xi^2\nu \partial^2 V_1/\partial\nu^2 + rS \partial V_1/\partial S - rV_1] / (\partial V_1/\partial\nu)

Since the left-hand side does not depend on V_1 and the right-hand side does not depend on V, both sides must equal some function f(S, \nu, t) of the underlying variables. Thus we have

    \partial V/\partial t + (1/2)\nu S^2 \partial^2 V/\partial S^2 + \rho\xi\nu S \partial^2 V/\partial\nu\partial S + (1/2)\xi^2\nu \partial^2 V/\partial\nu^2 + rS \partial V/\partial S - rV = f \partial V/\partial\nu

In the case of the Heston model, f is chosen as

    f = -(\kappa(\theta - \nu) - \lambda\sqrt{\nu})

where \lambda(S, \nu, t) is called the market price of volatility risk. We do not delve into the choice of f and the concept of \lambda(S, \nu, t) any further here; see (Wilmott, 2006) for his arguments.

(Heston, 1993) assumed in his paper that \lambda(S, \nu, t) is linear in the instantaneous variance \nu_t, to retain the form of the equation under the transformation from the statistical measure to the risk-neutral measure. Here we are only interested in pricing the options, so we can set \lambda(S, \nu, t) to be zero (Gatheral, 2006).

Hence we arrive at

    \partial V/\partial t + (1/2)\nu S^2 \partial^2 V/\partial S^2 + \rho\xi\nu S \partial^2 V/\partial\nu\partial S + (1/2)\xi^2\nu \partial^2 V/\partial\nu^2 + rS \partial V/\partial S - rV = \kappa(\nu - \theta) \partial V/\partial\nu    (3.4)

According to (Heston, 1993), the price of an undiscounted European call option C with strike price K and time to maturity T has a solution of the form

    C(S, K, \nu, T) = (1/2)(F - K) + (1/\pi) \int_0^\infty (F f_1 - K f_2) du    (3.5)


where

    f_1 = \Re[ e^{-iu \ln K} \varphi(u - i) / (iuF) ]
    f_2 = \Re[ e^{-iu \ln K} \varphi(u) / (iu) ]
    \varphi(u) = E(e^{iu \ln S_T})
    F = S e^{\mu T}

The evaluation of (3.5) requires the computation of Fourier inversion integrals, and it is susceptible to numerical instabilities due to the complex logarithms involved. (Kahl et al., 2006) proposed an integration scheme that computes the Fourier inversion integral by applying transformations to the asymptotic structure, so that the solution becomes

    C(S, K, \nu, T) = \int_0^1 y(x) dx,  x \in \mathbb{R}    (3.6)

where

    y(x) = (1/2)(F - K) + [ F f_1(-\ln x / C_\infty) - K f_2(-\ln x / C_\infty) ] / (x \pi C_\infty)

The limits at the boundaries of the integral are

    \lim_{x \to 0} y(x) = (1/2)(F - K)

and

    \lim_{x \to 1} y(x) = (1/2)(F - K) + [ F \lim_{u \to 0} f_1(u) - K \lim_{u \to 0} f_2(u) ] / (\pi C_\infty)

Equation (3.6) provides a more robust pricing method for options with moderate to long maturities or strong mean-reversion. For the full derivation of equation (3.6), see (Kahl et al., 2006). The source code implementing equation (3.6) was provided by the supervisor and is the source code for (Duffy et al., 2012). A simplified sketch of the underlying Fourier pricing formula is given below.
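For illustration, the following Python sketch prices a call by direct numerical quadrature of equation (3.5), given the characteristic function \varphi(u) as a callable. It simply truncates the semi-infinite integral at a finite u_max instead of applying the transformation (3.6), so it is a simplified stand-in for the robust scheme above; all names and the truncation level are our assumptions.

    import numpy as np

    def call_price_fourier(phi, F, K, u_max=200.0, n=4000):
        """Undiscounted Heston call via equation (3.5); phi(u) = E[exp(iu ln S_T)]."""
        u = np.linspace(1e-8, u_max, n)          # avoid the singularity at u = 0
        lnK = np.log(K)
        f1 = np.real(np.exp(-1j * u * lnK) * phi(u - 1j) / (1j * u * F))
        f2 = np.real(np.exp(-1j * u * lnK) * phi(u) / (1j * u))
        return 0.5 * (F - K) + np.trapz(F * f1 - K * f2, u) / np.pi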

3.1.2 Heston Model Implied Volatility Approximation

Before the model is used for pricing options, the model's parameters need to be calibrated so that it can return current market prices, at least to some degree of accuracy. That is, the unmeasurable model parameters are obtained by calibrating to the implied volatilities observed on the market. This motivates the need for fast implied volatility computation.

Unfortunately, there is no closed-form analytical formula for the implied volatility of the Heston model. There are, however, many closed-form approximations for the Heston implied volatility.

In this project, we select the implied volatility approximation method for Heston proposed by (Lorig et al., 2019), which allegedly outperforms other


methods, such as the one by (Fouque et al., 2012). (Lorig et al., 2019) derive a family of asymptotic expansions for European-style option implied volatility. Their method provides an explicit approximation and requires no special functions, not even numerical integration, and so it is faster to compute than its counterparts.

We follow closely the derivation steps outlined in (Lorig et al., 2019). Consider the Heston model in Section 3.1. According to (Andersen et al., 2007), \rho must be set to negative to avoid moment explosion. To circumvent this, we apply the change of variables (X(t), V(t)) = (\ln S(t), e^{\kappa t}\nu(t)). Using the asymptotic expansions for general stochastic volatility models (see Appendix B) and Ito's formula, we have

    dX = -(1/2) e^{-\kappa t} V dt + \sqrt{e^{-\kappa t} V} dW_1,  X(0) = x = \ln s    (3.7)
    dV = \theta\kappa e^{\kappa t} dt + \xi \sqrt{e^{\kappa t} V} dW_2,  V(0) = \nu = z > 0    (3.8)
    \langle dW_1, dW_2 \rangle = \rho dt    (3.9)

The parameters \nu, \kappa, \theta, \xi, W_1, W_2 play the same roles as in Section 3.1. We then apply the second order differential operator A(t) associated with (X, V):

    A(t) = (1/2) e^{-\kappa t} v (\partial_x^2 - \partial_x) + \theta\kappa e^{\kappa t} \partial_v + (1/2) \xi^2 e^{\kappa t} v \partial_v^2 + \xi\rho v \partial_x \partial_v

Define T as the time to maturity and k as the log-strike. Using the time-dependent Taylor series expansion of A(T) about the point (x(T), v(T)) = (X_0, E[V(T)]) = (x, \theta(e^{\kappa T} - 1) + \nu) (see Appendix B), the leading term of the expansion is

    \sigma_0 = \sqrt{ [ \theta\kappa T - \theta + e^{-\kappa T}(\theta - \nu) + \nu ] / (\kappa T) }

which is the square root of the time-averaged expected variance, and the expansion is written in terms of

    z = (x - k - \sigma_0^2 T / 2) / (\sigma_0 \sqrt{2T})

The higher-order terms \sigma_1 and \sigma_2 are explicit but lengthy rational expressions in \xi, \rho, \kappa, \theta, \nu, T and z; we omit them here and refer the reader to (Lorig et al., 2019) for their exact forms.


The explicit expression for \sigma_3 is likewise too long to be included here; see (Lorig et al., 2019) for the full derivation. The explicit expression for the implied volatility approximation is

    \sigma_{implied} = \sigma_0 + \sigma_1 z + \sigma_2 z^2 + \sigma_3 z^3    (3.10)

The C++ source code implementing this is available here: Heston Implied Volatility Approximation. A small sketch of the leading-order term follows.
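As a small, self-contained illustration, the leading-order term \sigma_0 and the variable z translate into Python as follows; this sketch implements only the formulas reconstructed above, not the thesis's C++ code.

    import numpy as np

    def sigma0(theta, kappa, nu0, T):
        # Square root of the time-averaged expected variance under Heston
        return np.sqrt((theta * kappa * T - theta
                        + np.exp(-kappa * T) * (theta - nu0) + nu0) / (kappa * T))

    def z_var(x, k, sig0, T):
        # z as defined above, with x the log-spot and k the log-strike
        return (x - k - 0.5 * sig0**2 * T) / (sig0 * np.sqrt(2.0 * T))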


3.2 Volatility is Rough

We start this section by introducing the fractional Brownian motion (fBm), which is a generalisation of Brownian motion.

Definition 3.2.1. Fractional Brownian Motion (fBm) is a centered Gaussian process B^H_t, t >= 0, with the autocovariance function

    E[B^H_t B^H_s] = (1/2)( |t|^{2H} + |s|^{2H} - |t - s|^{2H} )    (3.11)

where H \in (0, 1) is the Hurst index or Hurst parameter associated with the fractional Brownian motion.

The Hurst parameter is a measure of the long-term memory of a time series. It was first introduced by (Hurst, 1951) for modelling water levels of the Nile river in order to determine the optimum dam size. As it turns out, it is also suitable for modelling time series data from other natural systems; (Peters, 1991) and (Peters, 1994) are some of the earliest examples of applying this concept to financial time series modelling. A time series can be classified into 3 categories based on the Hurst parameter:

1. 0 < H < 1/2: Negative correlation in the increments of the process. A mean reverting process, which implies an increase in value is more likely followed by a decrease in value, and vice versa. The smaller the value, the greater the mean-reverting effect.

2. H = 1/2: No correlation in the increments of the process (geometric Brownian motion or Wiener process).

3. 1/2 < H < 1: Positive correlation in the increments of the process. A trend reinforcing process, which means the direction of the next value is more likely the same as the current value. The strength of the trend is greater when H is closer to 1.

Empirical evidence shows that log-volatility time series behave similarly to a fBm with Hurst parameter of order 0.1 (Gatheral et al., 2014). Essentially all the statistical stylised facts of volatility can be captured when modelling it as a rough fBm.

This motivates the use of the fractional stochastic volatility (FSV) model by (Comte et al., 1998). Our model is called the rough fractional stochastic volatility (RFSV) model to emphasise that H < 1/2. The term 'rough' refers to the roughness of the path of the fBm: from Figure 3.1, one can notice that the fBm path gets 'rougher' as H decreases. A simulation sketch is given below.
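As an aside, fBm paths such as those in Figure 3.1 can be simulated exactly from the covariance (3.11). The following Python sketch (our own illustration, not part of the thesis code base) uses a Cholesky factorisation; the O(n^3) cost is acceptable for plotting purposes.

    import numpy as np

    def fbm_path(H, n_steps, T=1.0, seed=0):
        """Exact fBm sample on a uniform grid via Cholesky of the covariance (3.11)."""
        t = np.linspace(T / n_steps, T, n_steps)
        cov = 0.5 * (t[:, None]**(2*H) + t[None, :]**(2*H)
                     - np.abs(t[:, None] - t[None, :])**(2*H))
        L = np.linalg.cholesky(cov + 1e-12 * np.eye(n_steps))  # jitter for stability
        z = np.random.default_rng(seed).standard_normal(n_steps)
        return t, L @ z

    # Smaller H gives visibly rougher paths, as in Figure 3.1
    t, path = fbm_path(H=0.1, n_steps=500)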


FIGURE 3.1: Paths of fBm for different values of H. Adapted from (Shevchenko, 2015)

3.3 The Rough Heston Model

In this section, we introduce the rough volatility version of the Heston model and derive its pricing and implied volatility equations.

As stated before, rough volatility models can fit the volatility surface notably well with very few parameters. Hence, it is natural to modify the Heston model and consider its rough version.

The rough Heston model for a one-dimensional asset price S with instantaneous variance \nu(t) takes the form in (Euch et al., 2016):

    dS = S \sqrt{\nu} dW_1
    \nu(t) = \nu(0) + \frac{1}{\Gamma(\alpha)} \int_0^t (t - s)^{\alpha - 1} \kappa(\theta - \nu(s)) ds + \frac{\xi}{\Gamma(\alpha)} \int_0^t (t - s)^{\alpha - 1} \sqrt{\nu(s)} dW_2    (3.12)
    \langle dW_1, dW_2 \rangle = \rho dt

where

• \alpha = H + 1/2 \in (1/2, 1) governs the roughness of the volatility sample paths
• \kappa is the mean reversion rate
• \theta is the long term mean volatility
• \nu(0) is the initial volatility
• \xi is the volatility of volatility
• \Gamma is the Gamma function
• W_1 and W_2 are two Brownian motions with correlation \rho

When \alpha = 1 (H = 1/2), we recover the classical Heston model of Section 3.1.

3.3.1 Rough Heston Pricing Equation

The pricing equation of the rough Heston model is inspired by the classical Heston one. In the classical Heston model, the option price can be obtained by applying Fourier inversion integrals to the characteristic function, which is expressed in terms of the solution of a Riccati equation. The rough Heston model displays a similar structure, but with the Riccati equation replaced by a fractional Riccati equation.

(Euch et al., 2016) derived a characteristic function for the rough Heston model. This result is particularly important, as the non-Markovian nature of the fractional Brownian motion makes other numerical pricing methods (e.g. Monte Carlo) difficult to implement.

Definition 3.3.1. The fractional integral of order r \in (0, 1] of a function f is

    I^r f(t) = \frac{1}{\Gamma(r)} \int_0^t (t - s)^{r - 1} f(s) ds    (3.13)

whenever the integral exists, and the fractional derivative of order r \in (0, 1] is

    D^r f(t) = \frac{1}{\Gamma(1 - r)} \frac{d}{dt} \int_0^t (t - s)^{-r} f(s) ds    (3.14)

whenever it exists.

We then quote the result from (Euch et al., 2016) directly; for the full proof, see Section 6 of (Euch et al., 2016).

Theorem 3.3.1. For all t >= 0, the characteristic function of the log-price X_t = \ln(S(t)/S(0)) is

    \varphi_X(a, t) = E[e^{iaX_t}] = \exp[ \kappa\theta I^1 h(a, t) + \nu(0) I^{1-\alpha} h(a, t) ]    (3.15)

where h is the solution of the fractional Riccati equation

    D^\alpha h(a, t) = -(1/2) a(a + i) + \kappa(ia\rho\xi - 1) h(a, t) + (\xi^2/2) h^2(a, t),  I^{1-\alpha} h(a, 0) = 0    (3.16)

which admits a unique continuous solution for a \in \mathcal{C}, a suitable region in the complex plane (see Definition 3.3.2).


Definition 3.3.2. We define the strip in the complex plane relevant to the computation of the characteristic function in Theorem 3.3.1:

    \mathcal{C} = \{ z \in \mathbb{C} : \Re(z) >= 0, -\frac{1}{1 - \rho^2} <= \Im(z) <= 0 \}    (3.17)

where \Re and \Im denote real and imaginary parts, respectively.

There is no explicit solution to equation (3.16), so it has to be solved numerically. (Gatheral et al., 2019) present a simple rational approximation to the solution of the rough Heston Riccati equation, which is considerably faster than the numerical schemes.

3.3.1.1 Rational Approximation of Rough Heston Riccati Solution

First, the short-time series expansion of the solution by (Alos et al., 2017) can be written as

    h_s(a, t) = \sum_{j=0}^{\infty} \frac{\Gamma(1 + j\alpha)}{\Gamma[1 + (j + 1)\alpha]} \beta_j(a) \xi^j t^{(j+1)\alpha}    (3.18)

with

    \beta_0(a) = -(1/2) a(a + i)
    \beta_k(a) = (1/2) \sum_{i,j >= 0, i+j=k-2} \beta_i(a) \beta_j(a) \frac{\Gamma(1 + i\alpha)}{\Gamma[1 + (i + 1)\alpha]} \frac{\Gamma(1 + j\alpha)}{\Gamma[1 + (j + 1)\alpha]} + i\rho a \frac{\Gamma[1 + (k - 1)\alpha]}{\Gamma(1 + k\alpha)} \beta_{k-1}(a)

Again from (Alos et al., 2017), the long-time expansion is

    h_l(a, t) = r_- \sum_{k=0}^{\infty} \frac{\gamma_k}{\Gamma(1 - k\alpha)} (A \xi t^\alpha)^{-k}    (3.19)

where

    \gamma_1 = -\gamma_0 = -1
    \gamma_k = -\gamma_{k-1} + \frac{r_-}{2A} \sum_{i,j >= 1, i+j=k} \gamma_i \gamma_j \frac{\Gamma(1 - k\alpha)}{\Gamma(1 - i\alpha)\Gamma(1 - j\alpha)}
    A = \sqrt{a(a + i) - \rho^2 a^2}
    r_\pm = -i\rho a \pm A

Now we can construct the global approximation. Using the same techniques as in (Atkinson et al., 2011), the rational approximation of h(a, t) is given by

    h^{(m,n)}(a, t) = \frac{\sum_{i=1}^{m} p_i (\xi t^\alpha)^i}{\sum_{j=0}^{n} q_j (\xi t^\alpha)^j},  q_0 = 1    (3.20)

Numerical experiments in (Gatheral et al., 2019) showed that h^{(3,3)} is a good choice for high accuracy and fast computation. Thus we can express it explicitly:

    h^{(3,3)}(a, t) = \frac{p_1 \xi (t^\alpha) + p_2 \xi^2 (t^\alpha)^2 + p_3 \xi^3 (t^\alpha)^3}{1 + q_1 \xi (t^\alpha) + q_2 \xi^2 (t^\alpha)^2 + q_3 \xi^3 (t^\alpha)^3}    (3.21)

Thus, from equations (3.18) and (3.19), we have

    h_s(a, t) = \frac{\Gamma(1)}{\Gamma(1 + \alpha)} \beta_0(a) (t^\alpha) + \frac{\Gamma(1 + \alpha)}{\Gamma(1 + 2\alpha)} \beta_1(a) \xi (t^\alpha)^2 + \frac{\Gamma(1 + 2\alpha)}{\Gamma(1 + 3\alpha)} \beta_2(a) \xi^2 (t^\alpha)^3 + O(t^{4\alpha})
              = b_1 (t^\alpha) + b_2 (t^\alpha)^2 + b_3 (t^\alpha)^3 + O[(t^\alpha)^4]

and

    h_l(a, t) = \frac{r_- \gamma_0}{\Gamma(1)} + \frac{r_- \gamma_1}{A \Gamma(1 - \alpha) \xi} \frac{1}{(t^\alpha)} + \frac{r_- \gamma_2}{A^2 \Gamma(1 - 2\alpha) \xi^2} \frac{1}{(t^\alpha)^2} + O[ \frac{1}{(t^\alpha)^3} ]
              = g_0 + g_1 \frac{1}{(t^\alpha)} + g_2 \frac{1}{(t^\alpha)^2} + O[ \frac{1}{(t^\alpha)^3} ]

Matching the coefficients of equation (3.21) to h_s and h_l forces the coefficients b_i and g_j to satisfy

    p_1 \xi = b_1    (3.22)
    (p_2 - p_1 q_1) \xi^2 = b_2    (3.23)
    (p_1 q_1^2 - p_1 q_2 - p_2 q_1 + p_3) \xi^3 = b_3    (3.24)
    \frac{p_3 \xi^3}{q_3 \xi^3} = g_0    (3.25)
    \frac{(p_2 q_3 - p_3 q_2) \xi^5}{q_3^2 \xi^6} = g_1    (3.26)
    \frac{(p_1 q_3^2 - p_2 q_2 q_3 - p_3 q_1 q_3 + p_3 q_2^2) \xi^7}{q_3^3 \xi^9} = g_2    (3.27)


The solution of this system is

    p_1 = \frac{b_1}{\xi}    (3.28)

    p_2 = \frac{b_1^3 g_1 + b_1^2 g_2 + b_1 b_2 g_0 g_1 - b_1 b_3 g_0 g_2 + b_1 b_3 g_1^2 + b_2^2 g_0 g_2 - b_2^2 g_1^2 + b_2 g_0^3}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3) \xi^2}    (3.29)

    p_3 = g_0 q_3    (3.30)

    q_1 = \frac{b_1^2 g_1 - b_1 b_2 g_2 + b_1 g_0^2 - b_2 g_0 g_1 - b_3 g_0 g_2 + b_3 g_1^2}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3) \xi}    (3.31)

    q_2 = \frac{b_1^2 g_0 - b_1 b_2 g_1 - b_1 b_3 g_2 + b_2^2 g_2 + b_2 g_0^2 - b_3 g_0 g_1}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3) \xi^2}    (3.32)

    q_3 = \frac{b_1^3 + 2 b_1 b_2 g_0 + b_1 b_3 g_1 - b_2^2 g_1 + b_3 g_0^2}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3) \xi^3}    (3.33)
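For concreteness, a direct Python transcription of equations (3.18)-(3.33) might look as follows. The truncation orders, variable names and the use of SciPy's gamma function are our illustrative choices, not the thesis implementation.

    import numpy as np
    from scipy.special import gamma

    def h33(a, t, alpha, rho, xi):
        """The (3,3) rational approximation (3.21) of the fractional Riccati solution."""
        # Short-time coefficients beta_0..beta_2 and b_1..b_3 from (3.18)
        beta0 = -0.5 * a * (a + 1j)
        beta1 = 1j * rho * a * gamma(1.0) / gamma(1.0 + alpha) * beta0
        beta2 = (0.5 * (gamma(1.0) / gamma(1.0 + alpha))**2 * beta0**2
                 + 1j * rho * a * gamma(1.0 + alpha) / gamma(1.0 + 2*alpha) * beta1)
        b1 = gamma(1.0) / gamma(1.0 + alpha) * beta0
        b2 = gamma(1.0 + alpha) / gamma(1.0 + 2*alpha) * beta1 * xi
        b3 = gamma(1.0 + 2*alpha) / gamma(1.0 + 3*alpha) * beta2 * xi**2

        # Long-time coefficients g_0..g_2 from (3.19); gamma_0 = 1, gamma_1 = -1
        A = np.sqrt(a * (a + 1j) - rho**2 * a**2)
        rm = -1j * rho * a - A                   # r_minus
        gam2 = 1.0 + rm / (2*A) * gamma(1.0 - 2*alpha) / gamma(1.0 - alpha)**2
        g0 = rm
        g1 = -rm / (A * gamma(1.0 - alpha) * xi)
        g2 = rm * gam2 / (A**2 * gamma(1.0 - 2*alpha) * xi**2)

        # p's and q's from (3.28)-(3.33); 'den' is the common denominator
        den = b1**2*g2 + 2*b1*g0*g1 + b2*g0*g2 - b2*g1**2 + g0**3
        p1 = b1 / xi
        p2 = (b1**3*g1 + b1**2*g2 + b1*b2*g0*g1 - b1*b3*g0*g2 + b1*b3*g1**2
              + b2**2*g0*g2 - b2**2*g1**2 + b2*g0**3) / (den * xi**2)
        q1 = (b1**2*g1 - b1*b2*g2 + b1*g0**2 - b2*g0*g1 - b3*g0*g2 + b3*g1**2) / (den * xi)
        q2 = (b1**2*g0 - b1*b2*g1 - b1*b3*g2 + b2**2*g2 + b2*g0**2 - b3*g0*g1) / (den * xi**2)
        q3 = (b1**3 + 2*b1*b2*g0 + b1*b3*g1 - b2**2*g1 + b3*g0**2) / (den * xi**3)
        p3 = g0 * q3

        y = xi * t**alpha                        # evaluate equation (3.21)
        return (p1*y + p2*y**2 + p3*y**3) / (1.0 + q1*y + q2*y**2 + q3*y**3)

With h^{(3,3)} in hand, the fractional integrals I^1 h and I^{1-\alpha} h in (3.15) can be evaluated numerically to obtain the characteristic function.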

3.3.1.2 Option Pricing Using Fast Fourier Transform

After solving the Riccati equation using the rational approximation (3.21), we can compute the characteristic function \varphi_X(a, t) in equation (3.15). Once the characteristic function is obtained, many methods are available to compute call option prices; see (Carr and Madan, 1999), (Itkin, 2010), (Lewis, 2001) and (Schmelzle, 2010). In this project, we select the Carr and Madan approach, a fast option pricing method that uses the fast Fourier transform (FFT). Suppose k = \ln K and C(k, T) is the value of a European call option with strike K and maturity T. Unfortunately, the FFT cannot be applied directly to evaluate the call option price, because the call pricing function is not square-integrable. A key contribution of (Carr and Madan, 1999) is the modification of the call pricing function with respect to k. With this modification, a whole range of option prices can be obtained with one inverse Fourier transform.

Consider the modified call price c(k, T):

    c(k, T) = e^{\beta k} C(k, T),  \beta > 0    (3.34)

Its Fourier transform is

    \psi_X(u, T) = \int_{-\infty}^{\infty} e^{iuk} c(k, T) dk,  u \in \mathbb{R}_{>0}    (3.35)

which can be expressed as

    \psi_X(u, T) = \frac{e^{-rT} \varphi_X[u - (\beta + 1)i, T]}{\beta^2 + \beta - u^2 + i(2\beta + 1)u}    (3.36)

where r is the risk-free rate and \varphi_X(\cdot) is the characteristic function defined in equation (3.15).


The purpose of \beta is to assist the integrability of the modified call pricing function over the negative log-strike axis. Hence, a sufficient condition is that \psi_X(0, T) be finite:

    \psi_X(0, T) < \infty    (3.37)
    \implies \varphi_X[-(\beta + 1)i, T] < \infty    (3.38)
    \implies \exp[ \theta\kappa I^1 h(-(\beta + 1)i, T) + \nu(0) I^{1-\alpha} h(-(\beta + 1)i, T) ] < \infty    (3.39)

Using equation (3.39), we can determine the upper bound on \beta such that condition (3.37) is satisfied. (Carr and Madan, 1999) suggest that one quarter of the upper bound serves as a good choice for \beta.

The call price C(k, T) can be recovered by applying the inverse Fourier transform:

    C(k, T) = \frac{e^{-\beta k}}{2\pi} \int_{-\infty}^{\infty} \Re[ e^{-iuk} \psi_X(u, T) ] du    (3.40)
            = \frac{e^{-\beta k}}{\pi} \int_0^{\infty} \Re[ e^{-iuk} \psi_X(u, T) ] du    (3.41)

The semi-infinite integral in the call price equation can be evaluated by numerical methods such as the trapezoidal rule and the FFT, as shown in (Kwok et al., 2012):

    C(k, T) \approx \frac{e^{-\beta k}}{\pi} \sum_{j=1}^{N} \Re[ e^{-iu_j k} \psi_X(u_j, T) ] \Delta u    (3.42)

where u_j = (j - 1)\Delta u, j = 1, 2, ..., N.

We end this section with a summary of the steps required to price a call option under the rough Heston model:

1. Compute the solution h of the fractional Riccati equation (3.16) using the rational approximation (3.21).
2. Compute the characteristic function in equation (3.15).
3. Determine a suitable value for \beta using equation (3.39).
4. Apply the Fourier transform in equation (3.36).
5. Evaluate the call option price by implementing the FFT in equation (3.42).

A sketch of steps 4-5 is given below.
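A minimal Python sketch of steps 4-5 follows, assuming the characteristic function \varphi_X is available as a callable (for example, built from the rational approximation above); the grid parameters are illustrative and, in practice, quadrature weights such as Simpson's rule improve accuracy.

    import numpy as np

    def call_prices_fft(phi, r, T, beta, N=4096, du=0.25):
        """Carr-Madan evaluation of (3.42): returns log-strikes k and prices C(k, T)."""
        u = du * np.arange(N)                        # u_j = (j - 1) du, j = 1..N
        psi = np.exp(-r * T) * phi(u - (beta + 1) * 1j) \
              / (beta**2 + beta - u**2 + 1j * (2*beta + 1) * u)   # equation (3.36)
        dk = 2.0 * np.pi / (N * du)                  # FFT-imposed log-strike spacing
        k = dk * np.arange(N) - 0.5 * N * dk         # log-strikes centred at k = 0
        # Fold the e^{i u_j (N dk / 2)} shift into the input so np.fft.fft gives (3.42)
        x = psi * np.exp(1j * u * 0.5 * N * dk) * du
        return k, np.exp(-beta * k) / np.pi * np.real(np.fft.fft(x))

One FFT call therefore yields prices on a whole grid of log-strikes, which can then be interpolated at the strikes of interest.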

3.3.2 Rough Heston Model Implied Volatility

(Euch et al., 2019) present an almost instantaneous and accurate approximation method to compute the rough Heston implied volatility. This method allows us to approximate the rough Heston implied volatility by simply scaling the volatility of volatility parameter \nu and feeding this scaled parameter as input into the classical Heston implied volatility approximation of Section 3.1.2. For a given maturity T, the scaled volatility of volatility parameter \tilde{\nu} is

    \tilde{\nu} = \sqrt{\frac{3}{2H + 2}} \frac{\nu}{\Gamma(H + 3/2)} \frac{1}{T^{1/2 - H}}    (3.43)


FIGURE 3.2: Implied volatility smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019)

Chapter 4

Artificial Neural Networks

This chapter starts with an introduction to Artificial Neural Networks (ANN). We introduce the terminology and basic elements of ANN and explain the methods that we used to increase training speed and predictive accuracy. We also discuss the training process for ANN, common pitfalls and their remedies. Finally, we discuss the ANN type of interest and extend the discussion to Deep Neural Networks (DNN).

The concept of ANN has come a long way. It started with the modelling of a simple neural network after electrical circuits, proposed by Warren McCulloch, a neurologist, and a young mathematician, Walter Pitts, in 1943 (McCulloch et al., 1943). This idea was further strengthened by Donald Hebb (Hebb, 1949).

The ANN's ability to learn complex, non-linear functional mappings by deciphering the pattern between independent and dependent variables has found applications in many fields. With computational resources becoming more accessible and big data sets available, the ANN's popularity has grown exponentially in recent years.

4.1 Dissecting the Artificial Neural Networks

FIGURE 4.1: A Simple Neural Network

Before we delve deeper into the world of ANN, let us explain what parameters and hyperparameters mean.


Parameters refer to the variables that are updated throughout the training process; they include weights and biases. Hyperparameters are the configuration variables that define the architecture of the ANN and stay constant throughout the training process. Hyperparameters include the number of neurons, hidden layers, activation functions and number of epochs, etc.

4.1.1 Neurons

Like a biological neuron, an artificial neuron is the fundamental working unit in an ANN. It is essentially a mathematical function that receives multiple inputs and generates an output. More precisely, it takes the weighted sum of the inputs and applies an activation function to that sum to generate an output. In the case of a simple ANN (as shown in Figure 4.1), this will be the network output. In more sophisticated ANNs, the output of a neuron can be the input of other neurons.

4.1.2 Input, Output and Hidden Layers

An input layer is the first layer of an ANN. It receives and sends the training data into the network for further processing. The number of input neurons is equal to the number of input features of the training data.

The output layer is the final layer of the network. It outputs the predictions for a given set of input features, and it can have more than one output neuron.

A hidden layer is a layer between the input and output layers. It contains neuron(s) and receives weighted inputs from the input layer. Whilst an ANN contains only one input layer and one output layer, the number of hidden layers can be more than one. One hidden layer can only be used for solving linearly separable problems; for more complex problems, multiple hidden layers are needed.

Systematic experimentation is required to identify the appropriate number of hidden layers to prevent under-fitting and over-fitting, and thus to increase predictive accuracy.

4.1.3 Connections, Weights and Biases

From the input layer to the output layer, each connection is assigned a weight that denotes its relative importance. Random weights are initialised when training begins; as training continues, these weights are updated.

Sometimes a bias (different from the bias term in statistics) is added to a neuron. It is merely a constant input with weight 1. Intuitively speaking, the purpose of a bias is to shift the activation function away from the origin, thereby allowing the neuron to give a non-zero output when the inputs are 0.


4.1.4 Forward and Backward Propagation, Loss Functions

Forward propagation is the passing of input data in the forward direction through the network to generate output. The difference between the output and the target output is then estimated by a loss function, which is a measure of how well an ANN performs with the current set of weights. Typical loss functions for regression problems include the mean squared error (MSE) and the mean absolute error (MAE). They are defined as follows.

Definition 4.1.1.

    MSE = \frac{1}{n} \sum_{i=1}^{n} (y_{i,predict} - y_{i,actual})^2    (4.1)

Definition 4.1.2.

    MAE = \frac{1}{n} \sum_{i=1}^{n} |y_{i,predict} - y_{i,actual}|    (4.2)

where y_{i,predict} is the i-th predicted output and y_{i,actual} is the i-th actual output.

Subsequently, this information is propagated backward from the output layer to the input layer, and the gradients of the loss function with respect to the weights are computed. The algorithm then adjusts the weights accordingly to minimise the loss function. An iteration is one complete cycle of forward and backward propagation.

ANN training is an unconstrained optimisation problem with the loss function as the objective function. The algorithm navigates through the search space to find optimal weights that minimise the loss function (e.g. MSE). It can be expressed as

    \min_{w} \frac{1}{n} \sum_{i=1}^{n} (y_{i,predict} - y_{i,actual})^2    (4.3)

where w is the vector that contains the weights of all the connections within the ANN. A small sketch of these error metrics follows.
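The error metrics above are straightforward to implement; a NumPy sketch (ours, for illustration) is given below, including the maximum absolute error suggested in Chapter 2.

    import numpy as np

    def mse(y_pred, y_true):
        return np.mean((y_pred - y_true)**2)         # Definition 4.1.1

    def mae(y_pred, y_true):
        return np.mean(np.abs(y_pred - y_true))      # Definition 4.1.2

    def max_abs_error(y_pred, y_true):
        return np.max(np.abs(y_pred - y_true))       # worst-case metric from Chapter 2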

4.1.5 Activation Functions

An activation function is a mathematical function applied to the output of a neuron to determine whether it should be activated (fired) or not, based on the relevance of the neuron's inputs to the prediction. This is analogous to the neuronal firing activity in the human brain.

Activation functions can be linear or non-linear. A linear activation function is only suitable when the functional relationship is linear; non-linear activation functions are used for most applications. In our case, non-linear activation functions are more suitable, as the pricing and implied volatility functional mappings are complex.

Common activation functions include the following.


Definition 4.1.3. Rectified Linear Unit (ReLU)

    f_{ReLU}(x) = 0 if x <= 0, x if x > 0;  f_{ReLU}(x) \in [0, \infty)    (4.4)

Definition 4.1.4. Exponential Linear Unit (ELU)

    f_{ELU}(x) = \alpha(e^x - 1) if x <= 0, x if x > 0;  f_{ELU}(x) \in (-\alpha, \infty)    (4.5)

where \alpha > 0. The value of \alpha is usually chosen to be 1 for most applications.

Definition 4.1.5. Linear or identity

    f_{linear}(x) = x,  f_{linear}(x) \in (-\infty, \infty)    (4.6)
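A NumPy sketch of the three definitions (our illustration):

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)                    # Definition 4.1.3

    def elu(x, alpha=1.0):
        return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))  # Definition 4.1.4

    def linear(x):
        return x                                     # Definition 4.1.5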

4.1.6 Optimisation Algorithms and Learning Rate

The learning rate refers to the amount of change applied to the weights in each iteration of the training process. A high learning rate results in fast training speed, but there is a risk of divergence, whereas a low learning rate may slow down the training.

Common optimisation algorithms for training an ANN are Stochastic Gradient Descent (SGD) and its extensions, such as RMSProp and Adam. The SGD pseudocode is the following:

Algorithm 1 Stochastic Gradient Descent

    Initialise a vector of random weights w and learning rate lr
    while Loss > error do
        Randomly shuffle the training data of size n
        for i = 1, 2, ..., n do
            w_new = w_current - lr * dLoss/dw_current
        end for
    end while

SGD is popular because it is simple to implement and can provide good results for most applications. In SGD, the learning rate lr is fixed throughout the process. This can be an issue because the appropriate learning rate is difficult to determine: selecting a high learning rate is prone to divergence, whereas using a low learning rate can result in a slow training process. Hence, improvements have been made to SGD to make the learning rate adaptive. Root Mean Square Propagation (RMSProp) is one of the extensions


that allow the learning rate to vary throughout the training process (Hinton et al., 2015). It is a method that divides the learning rate by the square root of an exponential moving average of past squared gradients. The author suggests the exponential decay rate b = 0.9, and its pseudocode is presented below.

Algorithm 2 Root Mean Square Propagation (RMSProp)

    Initialise a vector of random weights w, learning rate lr, v = 0 and b = 0.9
    while Loss > error do
        Randomly shuffle the training data of size n
        for i = 1, 2, ..., n do
            v_current = b * v_previous + (1 - b) * [dLoss/dw_current]^2
            w_new = w_current - lr / sqrt(v_current + eps) * dLoss/dw_current
        end for
    end while

where ε is a small number to prevent division by zero

A further improved version, called Adaptive Moment Estimation (Adam), was proposed by (Kingma et al., 2015). In RMSProp, the algorithm adapts the learning rate based on the second moment (the squared gradients) only; Adam improves on this by also using an exponential moving average of the gradients themselves (the first moment) in the update. An advantage of Adam is that it can handle sparse gradients on noisy data. The pseudocode of Adam is given below.

Algorithm 3 Adaptive Moment Estimation (Adam)

    Initialise a vector of random weights w, learning rate lr, m = 0, v = 0, b = 0.9 and b1 = 0.999
    while Loss > error do
        Randomly shuffle the training data of size n
        for i = 1, 2, ..., n do
            m_current = b * m_previous + (1 - b) * dLoss/dw_current
            v_current = b1 * v_previous + (1 - b1) * [dLoss/dw_current]^2
            m_hat = m_current / (1 - b^t)
            v_hat = v_current / (1 - b1^t)
            w_new = w_current - lr * m_hat / (sqrt(v_hat) + eps)
        end for
    end while

where b is the exponential decay rate for the first moment, b1 is the exponential decay rate for the second moment, and t counts the update steps for the bias corrections. A NumPy sketch of one Adam update follows.
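The following NumPy sketch (ours) implements one Adam update as in the pseudocode above; all names are illustrative.

    import numpy as np

    def adam_step(w, grad, m, v, t, lr=1e-3, b=0.9, b1=0.999, eps=1e-8):
        """One Adam parameter update (Algorithm 3), vectorised over the weights w."""
        m = b * m + (1.0 - b) * grad                 # first-moment estimate
        v = b1 * v + (1.0 - b1) * grad**2            # second-moment estimate
        m_hat = m / (1.0 - b**t)                     # bias corrections, t = step count
        v_hat = v / (1.0 - b1**t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v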

4.1.7 Epochs, Early Stopping and Batch Size

An epoch is one pass of the entire training data set through the ANN. Generally we want the number of epochs to be as high as possible; however, this may cause the ANN to over-fit. Early stopping is useful here to stop the training before the ANN becomes over-fit: it works by stopping the training process when the loss function shows no improvement. Batch size is the number of input samples processed before the model parameters are updated.
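In Keras, early stopping of this kind can be attached as a callback that monitors the validation loss. The sketch below uses synthetic data and illustrative layer sizes; the patience value of 25 anticipates the setting used in Chapter 6.

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.callbacks import EarlyStopping

    # illustrative data: y is the sum of the inputs plus noise
    x = np.random.rand(10000, 6)
    y = x.sum(axis=1) + 0.01 * np.random.randn(10000)

    model = Sequential([Dense(30, activation="elu", input_shape=(6,)),
                        Dense(1, activation="linear")])
    model.compile(optimizer="adam", loss="mse")

    # stop when validation loss has not improved for 25 consecutive epochs
    early_stop = EarlyStopping(monitor="val_loss", patience=25)
    model.fit(x, y, validation_split=0.15, epochs=200, batch_size=20,
              callbacks=[early_stop], verbose=0)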

4.2 Feedforward Neural Networks

Feedforward Neural Networks are the most common type of ANN in practical applications. They are called feedforward because the training data only flows from the input layer to the output layer; there is no feedback loop within the network.

4.2.1 Deep Neural Networks

Deep Neural Networks (DNNs) are Feedforward Neural Networks with more than one hidden layer. The depth here refers to the number of hidden layers. This is the type of ANN of interest to us, and the following theorems provide justifications.

Theorem 4.2.1 ((Hornik et al., 1989), Universal Approximation Theorem). Let NN^a_{d_in,d_out} be the set of neural networks with activation function f_a : ℝ → ℝ, input dimension d_in ∈ ℕ and output dimension d_out ∈ ℕ. Then, if f_a is continuous and non-constant, NN^a_{d_in,d_out} is dense in the Lebesgue space L^p(µ) for all finite measures µ.

Theorem 4.2.2 ((Hornik et al., 1990), Universal Approximation Theorem for Derivatives). Let the function to be approximated be F* ∈ C^n, the mapping of the neural network be F : ℝ^{d_0} → ℝ, and NN^a_{d_in,d_out} be the set of single-layer neural networks with activation function f_a : ℝ → ℝ, input dimension d_in ∈ ℕ and output dimension 1. Then, if the (non-constant) activation function satisfies f_a ∈ C^n(ℝ), NN^a_{d_in,1} arbitrarily approximates F* and all its derivatives up to order n.

Theorem 4.2.3 ((Eldan et al., 2016), The Power of Depth of Neural Networks). There exists a simple function on ℝ^d, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network to more than a certain constant accuracy unless its width is exponential in the dimension.

According to Theorem 4.2.2, it is crucial to have an activation function that is n-differentiable if the approximation of the nth derivative of the function F* is required. Theorem 4.2.3 motivates the choice of a deep network: it states that the depth of a network can improve its predictive capabilities. However, it is worth noting that networks deeper than 4 hidden layers do not provide much performance gain (Ioffe et al., 2015).

4.3 Common Pitfalls and Remedies

Here we discuss some common pitfalls in ANN training and the corresponding remedies.


4.3.1 Loss Function Stagnation

During training, the loss function may display only infinitesimal improvements after each epoch.

To resolve this problem:

• Increase the batch size. This could potentially fix the issue, as it helps the ANN model learn the relationships between data samples and data labels better before updating the parameters.

• Increase the learning rate. Too small a learning rate can cause the ANN model to become stuck in local minima.

• Use a deeper network. According to the power-of-depth theorem, deeper networks tend to perform better than shallower ones. However, if the model already has 4 hidden layers then this method may not be suitable, as there is not much performance gain beyond that (see Chapter 4.2.1).

• Use a different activation function for the output layer. Make sure the range of the output activation function is appropriate for the problem.

4.3.2 Loss Function Fluctuation

During training, the loss function may fluctuate noticeably from epoch to epoch rather than declining steadily. Potential fixes include:

• Increase the batch size. The ANN model might not be able to infer the true relationships between input and output data before each update when the batch size is too small.

• Lower the initialised learning rate.

4.3.3 Over-fitting

Over-fitting occurs when the ANN model performs well on training data but fails to generalise on unseen test data.

Potential fixes include:

• Reduce the complexity of the ANN model. Limit the number of hidden layers to 4, as (Eldan et al., 2016) show that not much advantage can be achieved beyond 4 hidden layers. Furthermore, experiment with narrower (fewer neurons) hidden layers.

• Introduce early stopping. Reduce the number of epochs if additional epochs show only little improvement. It is important to note that the validation loss, not the training loss, should be used as the stopping criterion (see Chapter 6.3).

• Regularisation. This can reduce the complexity of the loss function and speed up convergence.


• K-fold cross-validation. This works by splitting the training data into K equal-sized portions: K−1 portions are used for training and 1 portion for validation. This is repeated K times, with a different portion used for validation each time.

4.3.4 Under-fitting

Under-fitting occurs when the ANN model fails to capture the relationships between data samples and labels.

Potential fixes include:

• Increase the size of the training data. Complex functional mappings require the ANN model to train on more data before it can learn the relationships well.

• Increase the depth of the model. It is very likely that the model is not deep enough for the problem. As a rule of thumb, limit the number of hidden layers to 3-4 for complex problems.

4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation

For the ANN's application to implied volatility prediction, we apply the image-based implicit method proposed by (Horvath et al., 2019). It is allegedly a powerful and robust method for approximating implied volatility. Rather than using one implied volatility value as the network output, the image-based implicit method uses the entire implied volatility surface as the output. The implied volatility surface is represented by 10 × 10 pixels, with each pixel corresponding to one maturity and one strike price. The input data corresponding to one implied volatility surface output are all the model parameters except for the strike price and maturity, which are implicitly learned by the network. This method offers the following benefits (a small sketch of the resulting input/output layout follows the list):

• More information is learned by the network with less input data, as the information of neighbouring volatility points is incorporated in the training process.

• The network learns the mapping between the model parameters and the implied volatility surface directly. The volatility smile and term structure are available simultaneously, so this is more efficient than having only one single implied volatility value as output.

• The neighbouring volatility points can be numerically approximated almost instantly if needed.
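To make the input/output layout concrete, the sketch below builds one training pair for the image-based method: six Heston parameters map to a flattened 10 × 10 grid of implied volatilities. The surface function here is a hypothetical stand-in for the actual Heston implied volatility computation of Chapter 3.

    import numpy as np

    strikes = np.linspace(0.5, 1.4, 10)      # 10 strike prices
    maturities = np.linspace(0.2, 2.0, 10)   # 10 maturities

    def surface(params, k, t):
        # placeholder for the true Heston implied volatility function
        return 0.2 + 0.1 * params[0] * np.exp(-k * t)

    params = np.array([1.0, 0.3, 0.4, 0.5, 0.3, -0.5])  # S, nu, theta, kappa, xi, rho
    K, T = np.meshgrid(strikes, maturities)
    y = surface(params, K, T).flatten()      # one output row: 100 pixels

    print(params.shape, y.shape)             # (6,) (100,)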


[Diagram: a neural network output layer with 100 nodes, O1-O100, mapped onto a 10 × 10 implied volatility surface grid.]

FIGURE 4.2: The pixels of the implied volatility surface are represented by the outputs of the neural network.


Chapter 5

Data

In this chapter we discuss the data for our applications. We explain how the data is generated, along with some essential data pre-processing procedures.

Simulated data is chosen to train the ANN model mainly for regulatory reasons. One might argue that training the ANN with market data directly would yield better predictions. However, evidence for the stability and robustness of that approach is still lacking. Furthermore, there is no straightforward interpretation between the inputs and outputs of an ANN model trained in this way. The main advantage of training the ANN with simulated data is that the functional relationship is known, and so the model is tractable. This property is important, as tractable models tend to be more transparent, which is required by regulation.

For neural network computation, Python is the logical language choice as it has many powerful machine learning libraries, such as Keras. To obtain good results we require 5,000,000 rows of data for the ANN experiment. Hence we choose C++ instead of Python for the data generation process, as it is computationally faster. We also use the lightweight, header-only Python library pybind11, which can harmoniously integrate C++ and Python in one project (see Appendix A for a tutorial on pybind11).

5.1 Data Generation and Storage

The data generation process is done entirely in C++, mainly because of its execution speed. Initially we considered applying GPU parallel computing to speed up the process even further. After consulting with the supervisor, we decided to use the std::future class template instead. std::future belongs to the standard C++11 library and is very simple to implement compared to parallel computing. It allows a C++ program to run multiple functions asynchronously on different threads (see page 942 in (Duffy, 2018) for an example). Our machine contains 2 physical cores with 2 threads per physical core. This allows us to run 4 operations asynchronously without overhead costs, and therefore in theory improves the data generation speed four-fold. Note that asynchronous programming is not meant to replace parallel computing completely; in cases where execution speed is absolutely critical, parallel computing should be used. Without asynchronous programming, the data generation process for 5,000,000 rows of Heston option price data would take more than 8 hours. Now it takes less than 2 hours to complete.
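The generator itself is written in C++, but the same four-way asynchronous pattern can be illustrated in Python with concurrent.futures. This is only an analogue of the std::future usage described above (and, because of Python's global interpreter lock, it would not speed up CPU-bound pricing code); generate_batch is a hypothetical stand-in for one thread's share of the work.

    import concurrent.futures
    import time

    def generate_batch(n):
        # stand-in for one thread's share of the pricing work
        time.sleep(1)
        return [i * 0.5 for i in range(n)]

    if __name__ == "__main__":
        t0 = time.time()
        # four workers, matching the 2 cores x 2 threads of our machine
        with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
            futures = [pool.submit(generate_batch, 1000) for _ in range(4)]
            rows = [row for f in futures for row in f.result()]
        print(len(rows), "rows in", round(time.time() - t0, 2), "s")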

We follow the steps outlined in Appendix A to wrap the C++ program using the pybind11 Python library. The wrapped C++ function can be called in Python just like any other Python library. The pybind11 library is conceptually very similar to the Boost library. The reason we choose pybind11 is that it is a lightweight, header-only library that is easy to use. This enables us to enjoy both the execution speed of C++ and access to the machine learning libraries of Python at the same time.

After the data is generated, it is stored in MySQL so that we do not need to re-generate it each time we restart the computer or when the Python IDE crashes unexpectedly. We use the mysql.connector Python library to communicate with MySQL from Python.
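A sketch of the storage step using pandas and sqlalchemy; the connection string, credentials and table name are placeholders, and the full project code is given in Appendix A.

    import pandas as pd
    import sqlalchemy as db

    # placeholder credentials and database name
    engine = db.create_engine("mysql+mysqlconnector://user:password@localhost/options_db")

    # suppose cpp is the NumPy array returned by the wrapped C++ module;
    # here a one-row frame stands in for it
    df = pd.DataFrame({"spot_price": [50.0], "option_price": [1.23]})
    df.to_sql("heston_call_data", con=engine, if_exists="append", index=False)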

5.1.1 Heston Call Option Data

We follow the method and theory described in Chapter 3 to compute the call option price under the Heston model. We only consider call options in this project. To train an ANN that can also predict put option prices, one can simply add an extra input feature that takes only two values, 1 or -1, with 1 representing a call option and -1 a put option.

For this application we generated 5,000,000 rows of data, uniformly distributed over a specified range. In this context, one row of data contains all the input features and the corresponding output. We use 10 model parameters as the input features: spot price (S), strike price (K), risk-free rate (r), dividend yield (D), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ). In this case there is only one output, the call option price. See Table 5.1 for their respective ranges.

TABLE 5.1: Heston Model Call Option Data

            Model Parameters                Range
    Input   Spot Price (S)                  ∈ [20, 80]
            Strike Price (K)                ∈ [40, 55]
            Risk-free Rate (r)              ∈ [0.03, 0.045]
            Dividend Yield (D)              ∈ [0, 0.1]
            Instantaneous Volatility (ν)    ∈ [0.2, 0.6]
            Maturity (T)                    ∈ [0.03, 2.0]
            Long-term Volatility (θ)        ∈ [0.2, 1.0]
            Mean-reversion Rate (κ)         ∈ [0.1, 1.0]
            Volatility of Volatility (ξ)    ∈ [0.2, 0.5]
            Correlation (ρ)                 ∈ [−0.95, 0.95]
    Output  Call Option Price (C)           ∈ ℝ_{>0}


5.1.2 Heston Implied Volatility Data

We generate the Heston implied volatility data with the theory explained in Chapter 3. For the ANN application in implied volatility approximation, we decided to use the image-based implicit method, in which the ANN output layer dimension is 100, with each output representing one pixel on the implied volatility surface.

Using this method, we need 100 times more data in order to have the same number of distinct rows as the normal point-wise method (e.g. Heston call option pricing). However, due to hardware constraints, we decided to generate 10,000,000 rows of uniformly distributed data. That translates to only 100,000 distinct rows of data.

We use 6 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ).

Note that the strike price and maturity are not used as input features, as mentioned in Chapter 4.4. This is because the shape of one full implied volatility surface depends on the 6 model parameters above; the strike price and maturity correspond to one specific point on the surface. Nevertheless, we still require these two parameters to generate the implied volatility data. We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

In this case there are 10 × 10 = 100 outputs, i.e. the implied volatility surface. See Table 5.2 for the respective ranges.

TABLE 5.2: Heston Model Implied Volatility Data

            Model Parameters                    Range
    Input   Spot Price (S)                      ∈ [0.2, 2.5]
            Instantaneous Volatility (ν)        ∈ [0.2, 0.6]
            Long-term Volatility (θ)            ∈ [0.2, 1.0]
            Mean-reversion Rate (κ)             ∈ [0.1, 1.0]
            Volatility of Volatility (ξ)        ∈ [0.2, 0.5]
            Correlation (ρ)                     ∈ [−0.95, 0.95]
    Output  Implied Volatility Surface (σ_imp)  ∈ ℝ^{10×10}_{>0}

5.1.3 Rough Heston Call Option Data

Again, we follow the theory in Chapter 3 and implement it in C++ to generate call option prices under the rough Heston model.

Similarly, we only consider call options here and generate 5,000,000 rows of data. We use 9 model parameters as the input features: strike price (K), risk-free rate (r), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). In this case there is only one output, the call option price. See Table 5.3 for their respective ranges.

TABLE 5.3: Rough Heston Model Call Option Data

            Model Parameters                Range
    Input   Strike Price (K)                ∈ [0.5, 5]
            Risk-free Rate (r)              ∈ [0.2, 0.5]
            Instantaneous Volatility (ν)    ∈ [0, 1.0]
            Maturity (T)                    ∈ [0.1, 3.0]
            Long-term Volatility (θ)        ∈ [0.01, 0.1]
            Mean-reversion Rate (κ)         ∈ [1.0, 4.0]
            Volatility of Volatility (ξ)    ∈ [0.01, 0.5]
            Correlation (ρ)                 ∈ [−0.95, 0.95]
            Hurst Parameter (H)             ∈ [0.05, 0.1]
    Output  Call Option Price (C)           ∈ ℝ_{>0}

5.1.4 Rough Heston Implied Volatility Data

Finally, we generate the rough Heston implied volatility data with the theory explained in Chapter 3. As before, we use the image-based implicit method.

We use 10,000,000 rows of data, which translates to 100,000 distinct rows. We use 7 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

As before, there are 10 × 10 = 100 outputs, i.e. the rough Heston implied volatility surface. See Table 5.4 for the respective ranges.

TABLE 5.4: Rough Heston Model Implied Volatility Data

            Model Parameters                    Range
    Input   Spot Price (S)                      ∈ [0.2, 2.5]
            Instantaneous Volatility (ν)        ∈ [0.2, 0.6]
            Long-term Volatility (θ)            ∈ [0.2, 1.0]
            Mean-reversion Rate (κ)             ∈ [0.1, 1.0]
            Volatility of Volatility (ξ)        ∈ [0.2, 0.5]
            Correlation (ρ)                     ∈ [−0.95, 0.95]
            Hurst Parameter (H)                 ∈ [0.05, 0.1]
    Output  Implied Volatility Surface (σ_imp)  ∈ ℝ^{10×10}_{>0}


5.2 Data Pre-processing

In this section we discuss some necessary data pre-processing procedures: data splitting for validation and evaluation (testing), data standardisation and normalisation, and data partitioning for K-fold cross-validation.

5.2.1 Data Splitting: Train - Validate - Test

Generally, not 100% of the available data is used for training; we reserve some of the data for validation and for performance evaluation.

We split the data into three sets: 70% (train), 15% (validate) and 15% (test). The first 70% of the data is used for training, whilst 15% of the data is used for validation during training. The last 15% is for evaluating the trained ANN's predictive accuracy.

The validation data set is used for tuning the hyperparameters of the ANN. This is important to help prevent over-fitting of the network. The test data set is not used during training; it is the unseen (out-of-sample) data used for evaluating the trained ANN's predictions.
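A minimal NumPy sketch of the 70/15/15 split, shuffling before slicing; the index arithmetic is ours.

    import numpy as np

    def train_validate_test_split(data, train=0.70, validate=0.15):
        data = np.random.permutation(data)       # shuffle rows
        n = len(data)
        i, j = int(train * n), int((train + validate) * n)
        return data[:i], data[i:j], data[j:]     # 70% / 15% / 15%

    data = np.random.rand(1000, 11)              # e.g. 10 inputs + 1 output
    train_set, val_set, test_set = train_validate_test_split(data)
    print(len(train_set), len(val_set), len(test_set))  # 700 150 150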

5.2.2 Data Standardisation and Normalisation

Before the data is fed into the ANN for training, it is essential to scale the data to speed up convergence and improve performance.

The magnitude of each data feature is different. It is essential to re-scale the data so that the ANN model does not misinterpret their relative importance based on their magnitudes. Typical scaling techniques include standardisation and normalisation.

Data standardisation is defined as

    x_{scaled} = \frac{x - x_{mean}}{x_{std}} \qquad (5.1)

where

• x is the data,

• x_mean is the mean of that data feature,

• x_std is the standard deviation of that data feature.

(Horvath et al., 2019) claim that this works especially well for quantities that are positively unbounded, such as implied volatility.

For the other model parameters we apply data normalisation, which is defined as

    x_{scaled} = \frac{2x - (x_{max} + x_{min})}{x_{max} - x_{min}} \in [-1, 1] \qquad (5.2)

where

• x is the data,

• x_max is the maximum value of that data feature,

• x_min is the minimum value of that data feature.

After data scaling (standardisation or normalisation), the ANN is able to learn the true relationships between inputs and outputs without bias from differences in their magnitudes.
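Both transformations in a few NumPy lines, following equations (5.1) and (5.2); the sample feature range is illustrative, and fitting the statistics on training data only is our own cautionary note.

    import numpy as np

    def standardise(x, mean, std):
        # equation (5.1): zero mean, unit variance
        return (x - mean) / std

    def normalise(x, xmin, xmax):
        # equation (5.2): maps x into [-1, 1]
        return (2 * x - (xmax + xmin)) / (xmax - xmin)

    x_train = np.random.uniform(0.2, 0.6, size=1000)   # e.g. instantaneous volatility
    print(standardise(x_train, x_train.mean(), x_train.std())[:3])
    print(normalise(x_train, x_train.min(), x_train.max())[:3])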

5.2.3 Data Partitioning

In this section we explain how the data is partitioned for K-fold cross-validation.

First, we shuffle the entire training data set. Then the data set is divided into K equal-sized portions. Each portion in turn is used as the test data set while the ANN is trained on the other K−1 portions. This is repeated K times, with each of the K portions used exactly once as the test data set. The average of the K results is the performance measure of the model.
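The partitioning scheme is available off the shelf in scikit-learn; the sketch below shows the K = 5 loop, with a placeholder where the ANN would be trained and evaluated on each fold.

    import numpy as np
    from sklearn.model_selection import KFold

    data = np.random.rand(1000, 11)
    kf = KFold(n_splits=5, shuffle=True)

    scores = []
    for train_idx, test_idx in kf.split(data):
        train_fold, test_fold = data[train_idx], data[test_idx]
        # train the ANN on train_fold, evaluate on test_fold here
        scores.append(len(test_fold))       # placeholder for the fold's error metric
    print(np.mean(scores))                  # average of the K results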

FIGURE 5.1: How the data is partitioned for 5-fold cross-validation. Adapted from (Chapter 2 Modeling Process: k-fold cross validation).


Chapter 6

Neural Networks Architecture and Experimental Setup

We start this chapter by discussing the appropriate ANN architecture and experimental setup for our applications. We apply the ANN to four different applications: option pricing under the Heston model, Heston implied volatility surface approximation, option pricing under the rough Heston model, and rough Heston implied volatility surface approximation. We first apply the ANN to the Heston model as a concept validation and then proceed to the rough Heston model.

Subsequently, we discuss the relevant evaluation metrics and validation methods for our ANN models. Finally, we end this chapter with the implementation steps for an ANN in Keras.

6.1 Neural Networks Architecture

Configuring an ANN's architecture is a combination of art and science. It requires experience and systematic experimentation to decide the right parameters and architecture for a specific problem, and for a particular problem there might be multiple appropriate designs. To begin with, we use the network architecture of (Horvath et al., 2019) and adjust the hyperparameters where it performs poorly for our applications.


[Diagram: input layer I1-I10, three hidden layers of 30 neurons each (H1-H30), and a single output neuron O1.]

FIGURE 6.1: Typical ANN Architecture for Option Pricing Application.


[Diagram: input layer I1-I10, three hidden layers of 30 neurons each (H1-H30), and an output layer O1-O100.]

FIGURE 6.2: Typical ANN Architecture for Implied Volatility Surface Approximation.

6.1.1 ANN Architecture: Option Pricing under the Heston Model

An ANN with 3 hidden layers is appropriate for this application. The number of neurons per hidden layer is 50, with the exponential linear unit (ELU) as the activation function for each hidden layer.

Initially, the rectified linear unit (ReLU) activation function was considered. It is the choice for many applications, as it can overcome the vanishing gradient problem whilst being less computationally expensive than other activation functions. However, it has some major drawbacks:

• The nth derivative of ReLU is not continuous for all n > 0.

• ReLU cannot produce negative outputs.

• For activation x < 0, ReLU has zero gradient.

From Theorem 4.2.2 we know that choosing an n-differentiable activation function is necessary for computing the nth derivatives of the functional mapping. This is crucial for our application because we would be interested in computing option Greeks with the same ANN if it shows promising results for option pricing. For risk management purposes, selecting an activation function from which option Greeks can be calculated is crucial.

The obvious alternative is the ELU (defined in Chapter 4.1.4). For x < 0 it has a smooth output that approaches −α. For x > 0, the output of ELU is positively unbounded, which is suitable for our application, as the values of option price and implied volatility are also, in theory, positively unbounded.

Since our problem is a regression problem, a linear activation function (defined in Chapter 4.1.5) is appropriate for the output layer, as its output is unbounded. We find that a batch size of 200 and 30 epochs yield good accuracy.

To summarise, the ANN model for Heston option pricing is:

• Input layer: 10 neurons

• Hidden layer 1: 50 neurons, ELU activation function

• Hidden layer 2: 50 neurons, ELU activation function

• Hidden layer 3: 50 neurons, ELU activation function

• Output layer: 1 neuron, linear activation function
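A sketch of this architecture in Keras, with the layer sizes and activations listed above; the optimiser and loss choices anticipate Chapter 6.3, and the commented fit call shows the stated batch size and epoch count.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    model = Sequential([
        Dense(50, activation="elu", input_shape=(10,)),  # hidden layer 1
        Dense(50, activation="elu"),                     # hidden layer 2
        Dense(50, activation="elu"),                     # hidden layer 3
        Dense(1, activation="linear"),                   # option price output
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    model.summary()

    # model.fit(x_train, y_train, batch_size=200, epochs=30, validation_split=0.15)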

6.1.2 ANN Architecture: Implied Volatility Surface of the Heston Model

For approximating the full implied volatility surface of the Heston model we apply the image-based implicit method (see Chapter 4.4). This implies the output dimension is 100.

We use 3 hidden layers with 80 neurons each. The activation function for the hidden layers is the exponential linear unit (ELU). As before, we use a linear activation function for the output layer. We find that a batch size of 20 and 200 epochs yield good accuracy. Since the number of epochs is relatively high, we introduce an early stopping feature that halts training when the validation loss has not improved for 25 epochs.

To summarise, the ANN model for Heston implied volatility surface approximation is:

• Input layer: 6 neurons

• Hidden layer 1: 80 neurons, ELU activation function

• Hidden layer 2: 80 neurons, ELU activation function

• Hidden layer 3: 80 neurons, ELU activation function

• Output layer: 100 neurons, linear activation function


6.1.3 ANN Architecture: Option Pricing under the Rough Heston Model

For option pricing under the rough Heston model, the ANN model is:

• Input layer: 9 neurons

• Hidden layer 1: 50 neurons, ELU activation function

• Hidden layer 2: 50 neurons, ELU activation function

• Hidden layer 3: 50 neurons, ELU activation function

• Output layer: 1 neuron, linear activation function

As before, we use a batch size of 200 and 30 epochs.

6.1.4 ANN Architecture: Implied Volatility Surface of the Rough Heston Model

The ANN model for rough Heston implied volatility surface approximation is:

• Input layer: 7 neurons

• Hidden layer 1: 80 neurons, ELU activation function

• Hidden layer 2: 80 neurons, ELU activation function

• Hidden layer 3: 80 neurons, ELU activation function

• Output layer: 100 neurons, linear activation function

We use a batch size of 20 and 200 epochs, with an early stopping feature that halts training when the validation loss has not improved for 25 epochs.

6.2 Performance Metrics and Validation Methods

We require metrics to evaluate the performance of the ANN in terms of accuracy and speed.

For accuracy, we introduced the MSE and MAE in Chapter 4.1.4. It is also common to use the coefficient of determination (R²) to quantify the performance of a regression model. It is a statistical measure of the proportion of the variance in the dependent variable that is explained by the independent variables. The formula is

    R^2 = 1 - \frac{\sum_{i=1}^{n} (y_{i,actual} - y_{i,predict})^2}{\sum_{i=1}^{n} (y_{i,actual} - \bar{y}_{actual})^2} \qquad (6.1)


We also suggest a user-defined accuracy metric, the maximum absolute error, defined as

    \text{Max Abs Error} = \max_{i} |y_{i,actual} - y_{i,predict}| \qquad (6.2)

This metric provides a measure of how large the error could be in the worst-case scenario. It is especially important from a risk management perspective.
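Keras does not ship a maximum-absolute-error metric, so equation (6.2) can be supplied as a custom function, and R² in the same style; this is a sketch using the Keras backend API.

    import tensorflow.keras.backend as K

    def max_abs_error(y_true, y_pred):
        # equation (6.2): worst-case absolute deviation in the batch
        return K.max(K.abs(y_true - y_pred))

    def r_squared(y_true, y_pred):
        # equation (6.1): coefficient of determination
        ss_res = K.sum(K.square(y_true - y_pred))
        ss_tot = K.sum(K.square(y_true - K.mean(y_true)))
        return 1 - ss_res / (ss_tot + K.epsilon())

    # model.compile(optimizer="adam", loss="mse",
    #               metrics=["mae", max_abs_error, r_squared])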

As for the run-time performance of the ANN, we simply use Python's built-in timer function to quantify it.

In Chapter 5.2.3 we explained the data partitioning process. The motivation for partitioning the data is K-fold cross-validation. Cross-validation is a statistical technique for assessing how well a predictive model generalises to an independent data set. It is a good tool for identifying over-fitting.

As explained earlier, the process involves training and testing different permutations of the partitioned data sets multiple times. The indication of a well-generalised model is that the error magnitudes in each round are consistent with each other and with their average. We select K = 5 for our applications.


6.3 Implementation of ANN in Keras

In this section we outline the standard ANN computation process using Keras:

1. Construct the ANN model by specifying the number of input neurons, the number of hidden neurons, the number of hidden layers, the number of output neurons and the activation function for each layer. Print the model summary to check for errors.

2. Configure the settings of the model using the compile function. We select the MSE as the loss function, and we specify the MAE, R² and our user-defined maximum absolute error as the metrics. For all of our applications we choose Adam as the optimisation algorithm. To prevent over-fitting, we use the early stopping function, especially when the number of epochs is high. It is important to note that the validation loss (not the training loss) should be used as the early stopping criterion; this means training stops early if there is no reduction in validation loss.

3. Start training the ANN model. It is important to specify the validation split argument here for splitting the training data into a train set and a validation set. Note that the test set should not be used as the validation set; it should be reserved for evaluating the ANN's predictions after training.

4. When training is complete, generate the plot of MSE (loss function) versus the number of epochs to visualise the training process. The MSE should decline gradually and then level off as the number of epochs increases. See Chapter 4.3 for potential training issues and remedies.

5. Apply 5-fold cross-validation to check for over-fitting. Evaluate the trained model using the evaluate function.

6. Generate predictions. Visualise the predictions by plotting predicted values versus actual values on the same graph.

See Appendix C for a Python code example.
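A condensed sketch of steps 1-6 on synthetic stand-in data (in the project the data comes from MySQL; all sizes here are illustrative). The full code is in Appendix C.

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.callbacks import EarlyStopping

    # step 0: stand-in data with 10 inputs and 1 output
    x, y = np.random.rand(10000, 10), np.random.rand(10000)

    # steps 1-2: build and configure the model
    model = Sequential([Dense(50, activation="elu", input_shape=(10,)),
                        Dense(50, activation="elu"),
                        Dense(50, activation="elu"),
                        Dense(1, activation="linear")])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])

    # step 3: train with a validation split and early stopping on validation loss
    history = model.fit(x, y, epochs=30, batch_size=200, validation_split=0.15,
                        callbacks=[EarlyStopping(monitor="val_loss", patience=25)],
                        verbose=0)

    # steps 4-6: inspect the learning curve, evaluate, predict
    print(history.history["loss"][-1], history.history["val_loss"][-1])
    print(model.evaluate(x, y, verbose=0))
    preds = model.predict(x[:5])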


6.4 Summary of Neural Networks Architecture

TABLE 6.1: Summary of Neural Networks Architecture for Each Application

    ANN Model                            Application                               Input Layer   Hidden Layers           Output Layer   Batch Size   Epochs
    Heston Option Pricing ANN            Heston Call Option Pricing                10 neurons    3 layers × 50 neurons   1 neuron       200          30
    Heston Implied Volatility ANN        Heston Implied Volatility Surface         6 neurons     3 layers × 80 neurons   100 neurons    20           200
    Rough Heston Option Pricing ANN      Rough Heston Call Option Pricing          9 neurons     3 layers × 50 neurons   1 neuron       200          30
    Rough Heston Implied Volatility ANN  Rough Heston Implied Volatility Surface   7 neurons     3 layers × 80 neurons   100 neurons    20           200


Chapter 7

Results and Discussions

In this chapter we present our results and discuss our findings.

7.1 Heston Option Pricing ANN Results

Figure 7.1 shows the learning curves of the Heston option pricing ANN. The loss function used here is the MSE. As the ANN is trained, both the training loss and validation loss curves decline until they converge. It took fewer than 30 epochs for the ANN to reach a satisfactory level of accuracy. There is only a small gap between the training loss and the validation loss, indicating the ANN model is well trained.

FIGURE 7.1: Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN.

Table 7.1 summarises the error metrics of the Heston option pricing ANN's predictions on unseen test data. The MSE, MAE, maximum absolute error and R² are 2.00 × 10^-6, 9.58 × 10^-4, 6.50 × 10^-3 and 0.999 respectively. The low error metrics and high R² value imply that the ANN model generalises well and gives highly accurate predictions for the option price under the Heston model. Figure 7.2 attests to its high accuracy: almost all of the predictions lie on the perfect-prediction line (red line).

TABLE 7.1: Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data

    MSE           MAE           Max Abs Error   R²
    2.00 × 10^-6  9.58 × 10^-4  6.50 × 10^-3    0.999

FIGURE 7.2: Plot of Predicted vs Actual Values for Heston Option Pricing ANN.

TABLE 7.2: 5-fold Cross-Validation Results of Heston Option Pricing ANN

             MSE           MAE           Max Abs Error
    Fold 1   5.78 × 10^-7  5.45 × 10^-4  2.04 × 10^-3
    Fold 2   9.90 × 10^-7  7.69 × 10^-4  2.40 × 10^-3
    Fold 3   7.15 × 10^-7  6.21 × 10^-4  2.20 × 10^-3
    Fold 4   1.24 × 10^-6  8.67 × 10^-4  2.53 × 10^-3
    Fold 5   6.54 × 10^-7  5.79 × 10^-4  2.21 × 10^-3
    Average  8.36 × 10^-7  6.76 × 10^-4  2.27 × 10^-3

Table 7.2 shows the results of the 5-fold cross-validation, which again confirm that the ANN model gives accurate predictions and generalises well to unseen data. The values for each fold are consistent with each other and with their averages.


7.2 Heston Implied Volatility ANN Results

Figure 7.3 shows the learning curves of the Heston implied volatility ANN. Again, we use the MSE as the loss function. The validation loss curve experiences some fluctuations during training; we experimented with various ANN setups and this is the least fluctuating learning curve we could achieve. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level, and the gap between validation loss and training loss is negligible. Due to the comparatively more complex ANN architecture, it took about 200 epochs for the ANN to reach a satisfactory level of accuracy.

FIGURE 7.3: Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN.

Table 7.3 summarises the error metrics of the Heston implied volatility ANN's predictions on unseen test data. The MSE, MAE, maximum absolute error and R² are 6.24 × 10^-4, 1.27 × 10^-2, 3.71 × 10^-1 and 0.999 respectively. The low error metrics and high R² value imply that the ANN model generalises well and gives accurate predictions for the implied volatility surface under the Heston model. Figure 7.4 shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.3: Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data

    MSE           MAE           Max Abs Error   R²
    6.24 × 10^-4  1.27 × 10^-2  3.71 × 10^-1    0.999

FIGURE 7.4: Heston Implied Volatility Smiles: ANN Predictions vs Actual Values.


FIGURE 7.5: Mean Absolute Error of the Full Heston Implied Volatility Surface: ANN Predictions on Unseen Data.

Figure 7.5 shows the MAE values of the Heston implied volatility ANN predictions across the entire implied volatility surface. The largest MAE value is 0.0045, and a large area of the surface has MAE values lower than 0.0030. It is worth noting that the accuracy is relatively poor when the strike price is at the extremes and the maturity is small. However, the ANN gives good predictions overall for the whole surface.

The 5-fold cross-validation results for the Heston implied volatility ANN model show consistent values; they are summarised in Table 7.4. The consistency indicates that the ANN model is well trained and generalises to unseen test data.

TABLE 7.4: 5-fold Cross-Validation Results of Heston Implied Volatility ANN

             MSE           MAE           Max Abs Error
    Fold 1   8.15 × 10^-4  1.13 × 10^-2  3.88 × 10^-1
    Fold 2   4.74 × 10^-4  1.08 × 10^-2  3.13 × 10^-1
    Fold 3   1.37 × 10^-3  1.74 × 10^-2  4.46 × 10^-1
    Fold 4   3.38 × 10^-4  8.61 × 10^-3  2.19 × 10^-1
    Fold 5   4.60 × 10^-4  1.02 × 10^-2  2.67 × 10^-1
    Average  6.92 × 10^-4  1.12 × 10^-2  3.27 × 10^-1

7.3 Rough Heston Option Pricing ANN Results

Figure 7.6 shows the learning curves of the rough Heston option pricing ANN. As before, the MSE is used as the loss function. As training progresses, both the training loss and validation loss curves decline until they converge. Since this is a relatively simple model, it took fewer than 30 epochs of training to reach a good level of accuracy. The small discrepancy between the training loss and the validation loss indicates the ANN model is well trained.

FIGURE 7.6: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN.

Table 7.5 summarises the error metrics of the rough Heston option pricing ANN's predictions on unseen test data. The MSE, MAE, maximum absolute error and R² are 9.00 × 10^-6, 2.28 × 10^-3, 1.76 × 10^-1 and 0.999 respectively. These metrics imply that the ANN model generalises well and gives highly accurate predictions for option prices under the rough Heston model. Figure 7.7 attests to its high accuracy again: most of the predictions lie on the perfect-prediction line (red line).

TABLE 7.5: Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data

    MSE           MAE           Max Abs Error   R²
    9.00 × 10^-6  2.28 × 10^-3  1.76 × 10^-1    0.999


FIGURE 7.7: Plot of Predicted vs Actual Values for Rough Heston Option Pricing ANN.

TABLE 7.6: 5-fold Cross-Validation Results of Rough Heston Option Pricing ANN

             MSE           MAE           Max Abs Error
    Fold 1   3.79 × 10^-6  1.35 × 10^-3  5.99 × 10^-3
    Fold 2   2.33 × 10^-6  9.96 × 10^-4  4.89 × 10^-3
    Fold 3   2.06 × 10^-6  9.88 × 10^-4  4.41 × 10^-3
    Fold 4   2.16 × 10^-6  1.08 × 10^-3  4.07 × 10^-3
    Fold 5   1.84 × 10^-6  1.01 × 10^-3  3.74 × 10^-3
    Average  2.44 × 10^-6  1.08 × 10^-3  4.62 × 10^-3

The highly consistent values from the 5-fold cross-validation again confirm that the ANN model generalises to unseen data.

7.4 Rough Heston Implied Volatility ANN Results

Figure 7.8 shows the learning curves of the rough Heston implied volatility ANN. The validation loss curve experiences some fluctuations during training; again, we select the least fluctuating setup. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level, and the discrepancy between validation loss and training loss is negligible. Initially the training was set to 200 epochs, but it stopped early at around 85 epochs since the validation loss had not improved for 25 epochs. It achieves good accuracy nonetheless.


FIGURE 7.8: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN.

Table 7.7 summarises the error metrics of the rough Heston implied volatility ANN's predictions on unseen test data. The MSE, MAE, maximum absolute error and R² are 5.66 × 10^-4, 1.08 × 10^-2, 3.49 × 10^-1 and 0.999 respectively. These metrics show that the ANN model produces accurate predictions on unseen data. This is again confirmed by Figure 7.9, which shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.7: Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data

    MSE           MAE           Max Abs Error   R²
    5.66 × 10^-4  1.08 × 10^-2  3.49 × 10^-1    0.999

FIGURE 7.9: Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values.


FIGURE 7.10: Mean Absolute Error of the Full Rough Heston Implied Volatility Surface: ANN Predictions on Unseen Data.

Figure 7.10 shows the MAE values of the rough Heston implied volatility ANN predictions across the entire implied volatility surface. The largest MAE value is below 0.0030, and a large area of the surface has MAE values lower than 0.0015. It is worth noting that the accuracy is relatively poor when the strike price and maturity values are small. However, the ANN gives good predictions overall for the whole surface. In fact, it is more accurate than the Heston implied volatility ANN model, despite having one extra input and a similar architecture. We attribute this entirely to the stochastic nature of the optimiser.

The 5-fold cross-validation results for the rough Heston implied volatility ANN model show consistent values, just like the previous three cases; they are summarised in Table 7.8. The consistency in values indicates that the ANN model is well trained and generalises to unseen test data.

TABLE 7.8: 5-fold Cross-Validation Results of Rough Heston Implied Volatility ANN

             MSE           MAE           Max Abs Error
    Fold 1   8.15 × 10^-4  1.13 × 10^-2  3.88 × 10^-1
    Fold 2   4.74 × 10^-4  1.08 × 10^-2  3.13 × 10^-1
    Fold 3   1.37 × 10^-3  1.74 × 10^-2  4.46 × 10^-1
    Fold 4   3.38 × 10^-4  8.61 × 10^-3  2.19 × 10^-1
    Fold 5   4.60 × 10^-4  1.02 × 10^-2  2.67 × 10^-1
    Average  6.92 × 10^-4  1.12 × 10^-2  3.27 × 10^-1


7.5 Run-time Performance

In this section we present the training time and the execution speed of the ANN versus the traditional methods. The speed tests are run on a machine with an Intel i5-7200U chip (up to 3.10 GHz) and 8 GB of RAM.

Table 7.9 summarises the training time required for each application. Although the training time per epoch for the option pricing applications is higher than that for the implied volatility applications, the resulting total training time is similar for both types of application, because the number of epochs for training the implied volatility ANN models is higher.

We also emphasise that fewer distinct rows of data are available for the implied volatility ANN training. Thus the training time is similar to that of the option pricing ANN training, despite the more complex ANN architecture.

TABLE 7.9: Training Time

    Application                   Training Time per Epoch   Total Training Time
    Heston Option Pricing         30 s                      900 s
    Heston Implied Volatility     5 s                       1000 s
    rHeston Option Pricing        30 s                      900 s
    rHeston Implied Volatility    5 s                       1000 s

TABLE 7.10: Execution Time for Predictions using ANN and Traditional Methods

    Application                   Traditional Methods   ANN
    Heston Option Pricing         8.84 ms               57.9 ms
    Heston Implied Volatility     2.31 ms               43.7 ms
    rHeston Option Pricing        6.52 ms               52.6 ms
    rHeston Implied Volatility    2.87 ms               48.3 ms

Table 7.10 summarises the execution time for generating predictions using the ANN and the traditional methods. Before the experiment we expected the ANN to generate predictions faster. However, the results show that the ANN is actually up to 7 times slower than the traditional methods for option pricing (both Heston and rough Heston), and up to 18 times slower for implied volatility surface approximation (both Heston and rough Heston). This shows the traditional approximation methods are more efficient.


Chapter 8

Conclusions and Outlook

To summarise, the results show that the ANN has the ability to approximate option prices and implied volatilities under the Heston and rough Heston models to a high degree of accuracy. However, the efficiency of the ANN is not as good as that of the traditional methods, and hence we conclude that such an approach may be considered an ineffective alternative. There are more efficient methods, such as numerical integration of the FFT, for option price computation, and the traditional method we used for approximating implied volatility does not even require any numerical schemes, hence its efficiency.

As for future projects, it would be interesting to investigate applications of the rough Bergomi model to exotic options such as lookback and digital options. Moreover, it would also be worthwhile to apply gradient-free optimisers to ANN training; interesting examples include differential evolution, dual annealing and simplicial homology global optimisers.


Appendix A

pybind11: A step-by-step tutorial with examples

This tutorial assumes basic knowledge of C++ and Python.

Sources: Create a C++ extension for Python; Using the C++ eigen library to calculate matrix inverse and determinant; pybind11 Documentation; Eigen: The Matrix Class.

A.1 Software Prerequisites

• Visual Studio 2017 or later, with both the Desktop Development with C++ and Python Development workloads installed with default options.

• In the Python Development workload, also select the box on the right for Python native development tools. This option sets up most of the configuration required. (This option also includes the C++ workload automatically.)

Source: Create a C++ extension for Python.


A.2 Create a Python Project

1. To create a Python project in Visual Studio, select File > New > Project. Search for Python, select the Python Application template, give it a suitable name and location, and select OK.

2. pybind11 requires that you use a 32-bit Python interpreter (Python 3.6 or above recommended). In the Solution Explorer window of Visual Studio, expand the project node, then expand the Python Environments node. If you don't see a 32-bit environment as the default (either in bold or labeled with "global default"), then right-click the Python Environments node and select Add Environment, or select Add Environment from the environment drop-down in the Python toolbar.

Once in the Add Environment dialog box, select the Existing environment tab, then select the 32-bit interpreter from the Environment drop-down list.

If you don't have a 32-bit interpreter installed, you can install standard Python interpreters from the Add Environment dialog: select the Add Environment command in the Python Environments window or the Python toolbar, select the Python installation tab, indicate which interpreters to install, and select Install.


If you already added an environment other than the global default to a project, you may need to activate a newly added environment: right-click that environment under the Python Environments node and select Activate Environment. To remove an environment from the project, select Remove.


A.3 Create a C++ Project

1. A Visual Studio solution can contain both Python and C++ projects together. To do so, right-click the solution in Solution Explorer and select Add > New Project.

2. Search on C++, select Empty project, specify the name "test", and select OK.

3. Create a C++ file in the new project by right-clicking the Source Files node, then select Add > New Item, select C++ File, name it main.cpp, and select OK.

4. Right-click the C++ project in Solution Explorer and select Properties.

5. At the top of the Property Pages dialog that appears, set Configuration to All Configurations and Platform to Win32.


6. Set the specific properties as described in the following table, then select OK.

7. Right-click the C++ project and select Build to test your configurations (both Debug and Release). The .pyd files are located in the solution folder under Debug and Release, not the C++ project folder itself.

8. Add the following code to the C++ project's main.cpp file:

    #include <iostream>

    int main() {
        std::cout << "Hello World from C++" << std::endl;
        return 0;
    }

9. Build the C++ project again to confirm the code is working.

A.4 Convert the C++ project to an extension for Python

To make the C++ DLL into an extension for Python, first modify the exported methods to interact with Python types. Then add a function that exports the module (the main function).

1. Install pybind11 using pip:

    pip install pybind11

or

    py -m pip install pybind11

2. At the top of main.cpp, include pybind11.h:

    #include <pybind11/pybind11.h>


3. At the bottom of main.cpp, use the PYBIND11_MODULE macro to define the entry point to the C++ function:

    namespace py = pybind11;

    PYBIND11_MODULE(test, m) {
        m.def("main_func", &main, R"pbdoc(
            pybind11 simple example
        )pbdoc");

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

4. Set the target configuration to Release and build the C++ project to verify your code.

The C++ module may fail to compile for the following reasons:

• Unable to locate Python.h (E1696: cannot open source file "Python.h" and/or C1083: Cannot open include file: "Python.h": No such file or directory): verify that the path in C/C++ > General > Additional Include Directories in the project properties points to your Python installation's include folder. See the table in Step 6.

• Unable to locate Python libraries: verify that the path in Linker > General > Additional Library Directories in the project properties points to the Python installation's libs folder. See the table in Step 6.

• Linker errors related to target architecture: change the C++ target's project architecture to match that of your Python installation. For example, if you're targeting x64 with the C++ project but your Python installation is x86, change the C++ project to target x86.

A.5 Make the DLL available to Python

1. In Solution Explorer, right-click the References node in the Python project, and then select Add Reference. In the dialog that appears, select the Projects tab, select the test project, and then select OK.


2. Run the Visual Studio installer, select Modify, then select Individual Components > Compilers, build tools, and runtimes > Visual C++ 2015.3 v140 toolset. This step is necessary because Python (for Windows) is itself built with Visual Studio 2015 (version 14.0) and expects those tools to be available when building an extension through the method described here. (Note that you may need to install a 32-bit version of Python and target the DLL to Win32, not x64.)


3. Create a file named setup.py in the C++ project by right-clicking the project and selecting Add > New Item. Then select C++ File (.cpp), name the file setup.py, and select OK (naming the file with the .py extension makes Visual Studio recognize it as Python despite using the C++ file template).

Then paste the following code into the setup.py file:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add pybind11 folder and Python include folder
        include_dirs=[r'pybind11/include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

4. The setup.py code instructs Python to build the extension using the Visual Studio 2015 C++ toolset when used from the command line.

Open an elevated command prompt, navigate to the folder that contains setup.py, and enter the following command:

    python setup.py install

A.6 Call the DLL from Python

After the DLL is made available to Python, we can now call test.main_func() from Python code.

1. Add the following lines in your .py file to call methods exported from the DLL:


    import test

    if __name__ == "__main__":
        test.main_func()

2. Run the Python program (Debug > Start without Debugging). The output should look like the following.

A.7 Example: Store C++ Eigen Matrix as NumPy Arrays

This example requires the use of Eigen, a C++ template library. Due to its popularity, pybind11 provides transparent conversion and limited mapping support between Eigen and scientific Python linear algebra data types.

1. If you have not already, visit http://eigen.tuxfamily.org/index.php?title=Main_Page and click on "zip" to download the zip file.

Once the download is complete, decompress the zip file in a suitable location. Note down the file path containing the folders as shown below.


2. In the Solution Explorer, right-click the C++ project and select Properties. Navigate to the C/C++ > General tab, add the Eigen path to the Additional Include Directories property, then select OK.

3. Right-click the C++ project and select Build to test your configurations (both Debug and Release).

4. Replace the code in the main.cpp file with the following code:

    #include <iostream>

    // pybind11
    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>

    // Eigen
    #include <Eigen/Dense>

    namespace py = pybind11;

    Eigen::Matrix3f example_mat(const int a, const int b, const int c) {
        Eigen::Matrix3f mat;
        mat << 5 * a, 1 * b, 2 * c,
               2 * a, 4 * b, 2 * c,
               1 * a, 3 * b, 3 * c;
        return mat;
    }

    Eigen::MatrixXd inv(Eigen::MatrixXd xs) {
        return xs.inverse();
    }

    double det(Eigen::MatrixXd xs) {
        return xs.determinant();
    }

    PYBIND11_MODULE(test, m) {
        m.def("example_mat", &example_mat);
        m.def("inv", &inv);
        m.def("det", &det);

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

Note that Eigen arrays are automatically converted to NumPy arrays simply by including the pybind11/eigen.h header. A C++ function with an Eigen return type can be used directly as a NumPy data type in Python. Build the C++ project to check the code is working correctly.

5. Include the Eigen library's directory. Replace the code in setup.py with the following code:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add pybind11 folder, Python include folder and Eigen folder
        include_dirs=[r'pybind11/include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                      r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

As before, in an elevated command prompt, navigate to the folder that contains setup.py and enter the following command:

    python setup.py install

6. Then paste the following code in the Python file:

    import test
    import numpy as np

    if __name__ == "__main__":
        A = test.example_mat(1, 3, 1)
        print("Matrix A is:\n", A)
        print("The inverse of A is:\n", test.inv(A))
        print("The determinant of A is:", test.det(A))


The output should be

A.8 Example: Store Data Generated by the Analytical Heston Model as NumPy Arrays

Only the relevant parts of the code are displayed here. To request the code for the entire project, contact the author at cko22@bath.edu.

This example demonstrates how to use pybind11 for:

• a C++ project with multiple files;

• storing a C++ Eigen matrix as NumPy arrays.

1. For a multiple-file C++ project, we need to include all the source files in setup.py. This example also uses the Eigen library, so its directory is included as well. Paste the following code in setup.py:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'Data_Generation',
        sources=[r'module.cpp', r'Data.cpp', r'Heston.cpp', r'IntegrationScheme.cpp',
                 r'IntegratorImp.cpp', r'NumIntegrator.cpp', r'pdetfunction.cpp',
                 r'pfunction.cpp', r'Range.cpp'],
        include_dirs=[r'pybind11/include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                      r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='Data_Generation',
        version='1.0',
        description='Python package with Data_Generation C++ extension (PyBind11)',
        license='MIT',
        ext_modules=[sfc_module],
    )

Then, in a command prompt, go to the folder that contains setup.py and enter the following command:

    python setup.py install

2. The code in module.cpp is:

    // For the analytic solution in the case of the Heston model
    #include "Heston.hpp"
    #include <ctime>
    #include "Data.hpp"
    #include <future>

    // Inputting and Outputting
    #include <iostream>
    #include <vector>

    // pybind11
    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>
    #include <pybind11/stl_bind.h>

    // Eigen
    #include <Eigen/Dense>

    namespace py = pybind11;

    struct CPPData {
        // Input Data
        std::vector<double> spot_price;
        std::vector<double> strike_price;
        std::vector<double> risk_free_rate;
        std::vector<double> dividend_yield;
        std::vector<double> initial_vol;
        std::vector<double> maturity_time;
        std::vector<double> long_term_vol;
        std::vector<double> mean_reversion_rate;
        std::vector<double> vol_vol;
        std::vector<double> price_vol_corr;

        // Output Data
        std::vector<double> option_price;
    };

    // ... do something ...

    Eigen::MatrixXd module() {
        CPPData cppdata;
        simulation(cppdata);

        // Convert each std::vector to an Eigen vector first
        // Input Data
        Map<Eigen::VectorXd> eigen_spot_price(cppdata.spot_price.data(), cppdata.spot_price.size());
        Map<Eigen::VectorXd> eigen_strike_price(cppdata.strike_price.data(), cppdata.strike_price.size());
        Map<Eigen::VectorXd> eigen_risk_free_rate(cppdata.risk_free_rate.data(), cppdata.risk_free_rate.size());
        Map<Eigen::VectorXd> eigen_dividend_yield(cppdata.dividend_yield.data(), cppdata.dividend_yield.size());
        Map<Eigen::VectorXd> eigen_initial_vol(cppdata.initial_vol.data(), cppdata.initial_vol.size());
        Map<Eigen::VectorXd> eigen_maturity_time(cppdata.maturity_time.data(), cppdata.maturity_time.size());
        Map<Eigen::VectorXd> eigen_long_term_vol(cppdata.long_term_vol.data(), cppdata.long_term_vol.size());
        Map<Eigen::VectorXd> eigen_mean_reversion_rate(cppdata.mean_reversion_rate.data(), cppdata.mean_reversion_rate.size());
        Map<Eigen::VectorXd> eigen_vol_vol(cppdata.vol_vol.data(), cppdata.vol_vol.size());
        Map<Eigen::VectorXd> eigen_price_vol_corr(cppdata.price_vol_corr.data(), cppdata.price_vol_corr.size());
        // Output Data
        Map<Eigen::VectorXd> eigen_option_price(cppdata.option_price.data(), cppdata.option_price.size());

        const int cols{11};         // No. of columns
        const int rows = SIZE / 4;
        // Define a matrix that stores the training data to be used in Python
        MatrixXd training(rows, cols);
        // Initialise each column using the vectors
        training.col(0) = eigen_spot_price;
        training.col(1) = eigen_strike_price;
        training.col(2) = eigen_risk_free_rate;
        training.col(3) = eigen_dividend_yield;
        training.col(4) = eigen_initial_vol;
        training.col(5) = eigen_maturity_time;
        training.col(6) = eigen_long_term_vol;
        training.col(7) = eigen_mean_reversion_rate;
        training.col(8) = eigen_vol_vol;
        training.col(9) = eigen_price_vol_corr;
        training.col(10) = eigen_option_price;

        std::cout << training << std::endl;

        return training;
    }

    PYBIND11_MODULE(Data_Generation, m) {
        m.def("module", &module);

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

Note that the module function return type is Eigen This can then beused directly as NumPy data type in Python

3 The C++ function can then be called from Python as shown

1 import Data_Generation as dg2 import mysqlconnector3 import pandas as pd4 import sqlalchemy as db5 import time6

7 This function calls analytical Hestonoption pricer in C++ to simulation 5000000option prices and store them in MySQL database

8

9 10 do something 11

12 13

14 if __name__ == __main__15 t0 = timetime()16

17 A new table (optional)18 SQL_clean_table ()19

20 target_size = 5000000 Final table size21 batch_size = 500000 Size generated each

iteration22

23 Get the number of rows existed in thetable

24 r = SQL_setup(batch_size)25

26 Get the number of iterations27 iter = int(target_sizebatch_size) - int(r

batch_size)28 print(Iteration(s) required iter)29 count =030

31 print(Now Run C++ code )

80 Appendix A pybind11 A step-by-step tutorial with examples

32 for i in range(iter)33 count = count +134 print(

----------------------------------------------------)

35 print(Current iteration count)36 cpp = dgmodule () store data from C

++37

38 39 do something 40

41 42

43 r1=SQL_setup(batch_size)44 print(No of rows in SQL Classic Heston

Table Now r1)45 print(n)46 t1 = timetime()47 Calculate total time for entire program48 total = t1 - t049 print(THE ENTIRE PROGRAM TAKES round(

total) s TO COMPLETE)50

81

Appendix B

Asymptotic Expansions for GeneralStochastic Volatility Models

B1 Asymptotic Expansions

We present the asymptotic expansions for general Stochastic Volatility Mod-els by (Lorig et al 2019) Consider a strictly positive spot price S whoserisk-neutral dynamics are given by

S = eX (B1)

dX = minus12

σ2(t X Y)dt + σ(t X Y)dW1 X(0) = x isin R (B2)

dY = f (t X Y)dt + β(t X Y)dW2 Y(0) = y isin R (B3)〈dW1 dW2〉 = ρ(t X Y)dt |ρ| lt 1 (B4)

The drift of X is selected to be minus12 σ2 so that S = eX is martingale

Let C(t) be the time t value of a European call option matures at time Tgt t with payoff function P(X(T)) To value a European-style option usingrisk-neutrla pricing we must compute functions of the form

u(t x y) = E[P(X(T))|X(t) = x Y(t) = y]

It is well-known that the function u satisfised the Kolmogorov Backwardequation (Lorig et al 2019)

(partt +A(t))u(t x y) = 0 u(T x y) = P(x) (B5)

where the operator A(t) is given explicitly by

A(t) = a(t x y)(part2x minus partx) + f (t x y)party + b(t x y)part2

y + c(t x y)partxparty (B6)

and where the functions a b and c are defined as

a(t x y) =12

σ2(t x y)

b(t x y) =12

β2(t x y)

c(t x y) = ρ(t x y)σ(t x y)β(t x y)

83

Appendix C

Deep Neural Networks Library inPython Keras

We show the code snippet for computing ANN in Keras The Python code fordefining ANN architecture for rough Heston implied volatility approxima-tion

1 Define the neural network architecture2 import keras3 from keras import backend as K4 kerasbackendset_floatx(rsquofloat64 rsquo)5 from kerascallbacks import EarlyStopping6

7 Construct the model8 input_layer = keraslayersInput(shape =(n))9

10 hidden_layer1 = keraslayersDense(80 activation=rsquoelursquo)(input_layer)

11 hidden_layer2 = keraslayersDense(80 activation=rsquoelursquo)(hidden_layer1)

12 hidden_layer3 = keraslayersDense(80 activation=rsquoelursquo)(hidden_layer2)

13

14 output_layer = keraslayersDense (100 activation=rsquolinear rsquo)(hidden_layer3)

15

16 model = kerasmodelsModel(inputs=input_layer outputs=output_layer)

17

18 summarize layers19 modelsummary ()20

21 User -defined metrics R2 and Max Abs Error22 R223 def coeff_determination(y_true y_pred)24 SS_res = Ksum(Ksquare( y_true -y_pred ))25 SS_tot = Ksum(Ksquare( y_true - Kmean(y_true) )

)

84 Appendix C Deep Neural Networks Library in Python Keras

26 return (1 - SS_res ( SS_tot + Kepsilon ()))27 Max Error28 def max_error(y_true y_pred)29 return (Kmax(abs(y_true -y_pred)))30

31 earlystop = EarlyStopping(monitor=val_loss32 min_delta=033 mode=min34 verbose=135 patience= 25)36

37 Prepares model for training38 opt = kerasoptimizersAdam(learning_rate =0005)39 modelcompile(loss=rsquomsersquo optimizer=rsquoadamrsquo metrics =[rsquo

maersquo coeff_determination max_error ])

85

Bibliography

Alos Elisa et al (June 2017) ldquoExponentiation of Conditional ExpectationsUnder Stochastic Volatilityrdquo In URL httpspapersssrncomsol3paperscfmabstract_id=2983180

Andersen Leif et al (Jan 2007) ldquoMoment Explosions in Stochastic VolatilityModelsrdquo In Finance and Stochastics 11 pp 29ndash50 DOI 101007s00780-006-0011-7

Atkinson Colin et al (Jan 2011) ldquoRational Solutions for the Time-FractionalDiffusion Equationrdquo In URL httpswwwresearchgatenetpublication220222966_Rational_Solutions_for_the_Time-Fractional_Diffusion_Equation

Bayer Christian et al (Oct 2018) ldquoDeep calibration of rough stochastic volatil-ity modelsrdquo In URL httpsarxivorgabs181003399

Black Fischer et al (May 1973) ldquoThe Pricing of Options and Corporate Lia-bilitiesrdquo In URL httpswwwcsprincetoneducoursesarchivefall09cos323papersblack_scholes73pdf

Carr Peter and Dilip B Madan (Mar 1999) ldquoOption valuation using thefast Fourier transformrdquo In Journal of Computational Finance URL httpswwwresearchgatenetpublication2519144_Option_Valuation_Using_the_Fast_Fourier_Transform

Chapter 2 Modeling Process k-fold cross validation httpsbradleyboehmkegithubioHOMLprocesshtml Accessed 2020-08-22

Comte Fabienne et al (Oct 1998) ldquoLong Memory In Continuous-Time Stochas-tic Volatility Modelsrdquo In Mathematical Finance 8 pp 291ndash323 URL httpszulfahmedfileswordpresscom201510comterenault19981pdf

Cox JOHN C et al (Jan 1985) ldquoA THEORY OF THE TERM STRUCTURE OFINTEREST RATESrdquo In URL httppagessternnyuedu~dbackusBCZdiscrete_timeCIR_Econometrica_85pdf

Create a C++ extension for Python httpsdocsmicrosoftcomen- usvisualstudio python working - with - c - cpp - python - in - visual -studioview=vs-2019 Accessed 2020-07-21

Duffy Daniel J (Jan 2018) Financial Instrument Pricing Using C++ SecondEdition John Wiley Sons ISBN 978-0-470-97119-2

Duffy Daniel J et al (2012) Monte Carlo Frameworks Building customisablehigh-performance C++ applications John Wiley Sons Inc ISBN 9780470060698

Dupire Bruno (1994) ldquoPricing with a Smilerdquo In Risk Magazine pp 18ndash20Eigen The Matrix Class httpseigentuxfamilyorgdoxgroup__TutorialMatrixClass

html Accessed 2020-07-22Eldan Ronen et al (May 2016) ldquoThe Power of Depth for Feedforward Neural

Networksrdquo In URL httpsarxivorgabs151203965

86 Bibliography

Euch Omar El et al (Sept 2016) ldquoThe characteristic function of rough Hestonmodelsrdquo In URL httpsarxivorgpdf160902108pdf

mdash (Mar 2019) ldquoRoughening Hestonrdquo In URL httpsssrncomabstract=3116887

Fouque Jean-Pierre et al (Aug 2012) ldquoSecond Order Multiscale StochasticVolatility Asymptotics Stochastic Terminal Layer Analysis CalibrationrdquoIn URL httpsarxivorgabs12085802

Gatheral Jim (2006) The volatility surface a practitionerrsquos guide John WileySons Inc ISBN 978-0-471-79251-2

Gatheral Jim et al (Oct 2014) ldquoVolatility is roughrdquo In URL httpsarxivorgabs14103394

mdash (Jan 2019) ldquoRational approximation of the rough Heston solutionrdquo InURL https papers ssrn com sol3 papers cfm abstract _ id =3191578

Hebb Donald O (1949) The Organization of Behavior A NeuropsychologicalTheory Psychology Press 1 edition (12 Jun 2002) ISBN 978-0805843002

Heston Steven (Jan 1993) ldquoA Closed-Form Solution for Options with Stochas-tic Volatility with Applications to Bond and Currency Optionsrdquo In URLhttpciteseerxistpsueduviewdocdownloaddoi=10111393204amprep=rep1amptype=pdf

Hinton Geoffrey et al (Feb 2015) ldquoOverview of mini-batch gradient de-scentrdquo In URL httpswwwcstorontoedu~tijmencsc321slideslecture_slides_lec6pdf

Hornik et al (Mar 1989) ldquoMultilayer feedforward networks are universalapproximatorsrdquo In URL https deeplearning cs cmu edu F20 documentreadingsHornik_Stinchcombe_Whitepdf

mdash (Jan 1990) ldquoUniversal approximation of an unknown mapping and itsderivatives using multilayer feedforward networksrdquo In URL httpwwwinfufrgsbr~engeldatamediafilecmp121univ_approxpdf

Horvath Blanka et al (Jan 2019) ldquoDeep Learning Volatilityrdquo In URL httpsssrncomabstract=3322085

Hurst Harold E (Jan 1951) ldquoLong-term storage of reservoirs an experimen-tal studyrdquo In URL httpswwwjstororgstable2982267metadata_info_tab_contents

Hutchinson James M et al (July 1994) ldquoA Nonparametric Approach to Pric-ing and Hedging Derivative Securities Via Learning Networksrdquo In URLhttpsalomiteduwp-contentuploads201506A-Nonparametric-Approach- to- Pricing- and- Hedging- Derivative- Securities- via-Learning-Networkspdf

Ioffe Sergey et al (Feb 2015) ldquoUBatch Normalization Accelerating DeepNetwork Training by Reducing Internal Covariate Shiftrdquo In URL httpsarxivorgabs150203167

Itkin Andrey (Jan 2010) ldquoPricing options with VG model using FFTrdquo InURL httpsarxivorgabsphysics0503137

Kahl Christian et al (Jan 2006) ldquoNot-so-complex Logarithms in the Hes-ton modelrdquo In URL httpwww2mathuni- wuppertalde~kahlpublicationsNotSoComplexLogarithmsInTheHestonModelpdf

Bibliography 87

Kingma Diederik P et al (Feb 2015) ldquoAdam A Method for Stochastic Opti-mizationrdquo In URL httpsarxivorgabs14126980

Kwok YueKuen et al (2012) Handbook of Computational Finance Chapter 21Springer-Verlag Berlin Heidelberg ISBN 978-3-642-17254-0

Lewis Alan (Sept 2001) ldquoA Simple Option Formula for General Jump-Diffusionand Other Exponential Levy Processesrdquo In URL httpspapersssrncomsol3paperscfmabstract_id=282110

Liu Shuaiqiang et al (Apr 2019a) ldquoA neural network-based framework forfinancial model calibrationrdquo In URL httpsarxivorgabs190410523

mdash (Apr 2019b) ldquoPricing options and computing implied volatilities usingneural networksrdquo In URL httpsarxivorgabs190108943

Lorig Matthew et al (Jan 2019) ldquoExplicit implied vols for multifactor local-stochastic vol modelsrdquo In URL httpsarxivorgabs13065447v3

Mandara Dalvir (Sept 2019) ldquoArtificial Neural Networks for Black-ScholesOption Pricing and Prediction of Implied Volatility for the SABR Stochas-tic Volatility Modelrdquo In URL httpswwwdatasimnlapplicationfiles8115704549291423101pdf

McCulloch Warren et al (May 1943) ldquoA Logical Calculus of The Ideas Im-manent in Nervous Activityrdquo In URL httpwwwcsechalmersse~coquandAUTOMATAmcppdf

Mostafa F et al (May 2008) ldquoA neural network approach to option pricingrdquoIn URL httpswwwwitpresscomelibrarywit-transactions-on-information-and-communication-technologies4118904

Peters Edgar E (Jan 1991) Chaos and Order in the Capital Markets A NewView of Cycles Prices and Market Volatility John Wiley Sons ISBN 978-0-471-13938-6

mdash (Jan 1994) Fractal Market Analysis Applying Chaos Theory to Investment andEconomics John Wiley Sons ISBN 978-0-471-58524-4 URL httpswwwamazoncoukFractal-Market-Analysis-Investment-Economicsdp0471585246

pybind11 Documentation httpspybind11readthedocsioenstableadvancedcasteigenhtml Accessed 2020-07-22

Ruf Johannes et al (May 2020) ldquoNeural networks for option pricing andhedging a literature reviewrdquo In URL httpsarxivorgabs191105620

Schmelzle Martin (Apr 2010) ldquoOption Pricing Formulae using Fourier Trans-form Theory and Applicationrdquo In URL httpspfadintegralcomdocsSchmelzle201020Fourier20Pricingpdf

Shevchenko Georgiy (Jan 2015) ldquoFractional Brownian motion in a nutshellrdquoIn International Journal of Modern Physics Conference Series 36 URL httpswwwworldscientificcomdoiabs101142S2010194515600022

Stone Henry (July 2019) ldquoCalibrating Rough Volatility Models A Convolu-tional Neural Network Approachrdquo In URL httpsarxivorgabs181205315

Using the C++ eigen library to calculate matrix inverse and determinant httppeopledukeedu~ccc14sta-663-201618G_C++_Python_pybind11

88 Bibliography

htmlUsing-the-C++-eigen-library-to-calculate-matrix-inverse-and-determinant Accessed 2020-07-22

Wilmott Paul (2006) Paul Wilmott on Quantitative Finance Vol 3 John WileySons Inc 2nd Edition ISBN 978-0-470-01870-5

  • Declaration of Authorship
  • Abstract
  • Acknowledgements
  • Projects Overview
    • Projects Overview
      • Introduction
        • Organisation of This Thesis
          • Literature Review
            • Application of ANN in Finance
            • Option Pricing Models
            • The Research Gap
              • Option Pricing Models
                • The Heston Model
                  • Option Pricing Under Heston Model
                  • Heston Model Implied Volatility Approximation
                    • Volatility is Rough
                    • The Rough Heston Model
                      • Rough Heston Pricing Equation
                        • Rational Approximation of Rough Heston Riccati Solution
                        • Option Pricing Using Fast Fourier Transform
                          • Rough Heston Model Implied Volatility
                              • Artificial Neural Networks
                                • Dissecting the Artificial Neural Networks
                                  • Neurons
                                  • Input Output and Hidden Layers
                                  • Connections Weights and Biases
                                  • Forward and Backward Propagation Loss Functions
                                  • Activation Functions
                                  • Optimisation Algorithms and Learning Rate
                                  • Epochs Early Stopping and Batch Size
                                    • Feedforward Neural Networks
                                      • Deep Neural Networks
                                        • Common Pitfalls and Remedies
                                          • Loss Function Stagnation
                                          • Loss Function Fluctuation
                                          • Over-fitting
                                          • Under-fitting
                                            • Application in Finance ANN Image-based Implicit Method for Implied Volatility Approximation
                                              • Data
                                                • Data Generation and Storage
                                                  • Heston Call Option Data
                                                  • Heston Implied Volatility Data
                                                  • Rough Heston Call Option Data
                                                  • Rough Heston Implied Volatility Data
                                                    • Data Pre-processing
                                                      • Data Splitting Train - Validate - Test
                                                      • Data Standardisation and Normalisation
                                                      • Data Partitioning
                                                          • Neural Networks Architecture and Experimental Setup
                                                            • Neural Networks Architecture
                                                              • ANN Architecture Option Pricing under Heston Model
                                                              • ANN Architecture Implied Volatility Surface of Heston Model
                                                              • ANN Architecture Option Pricing under Rough Heston Model
                                                              • ANN Architecture Implied Volatility Surface of Rough Heston Model
                                                                • Performance Metrics and Validation Methods
                                                                • Implementation of ANN in Keras
                                                                • Summary of Neural Networks Architecture
                                                                  • Results and Discussions
                                                                    • Heston Option Pricing ANN Results
                                                                    • Heston Implied Volatility ANN Results
                                                                    • Rough Heston Option Pricing ANN Results
                                                                    • Rough Heston Implied Volatility ANN Results
                                                                    • Run-time Performance
                                                                      • Conclusions and Outlook
                                                                      • pybind11 A step-by-step tutorial with examples
                                                                        • Software Prerequisites
                                                                        • Create a Python Project
                                                                        • Create a C++ Project
                                                                        • Convert the C++ project to extension for Python
                                                                        • Make the DLL available to Python
                                                                        • Call the DLL from Python
                                                                        • Example Store C++ Eigen Matrix as NumPy Arrays
                                                                        • Example Store Data Generated by Analytical Heston Model as NumPy Arrays
                                                                          • Asymptotic Expansions for General Stochastic Volatility Models
                                                                            • Asymptotic Expansions
                                                                              • Deep Neural Networks Library in Python Keras
                                                                              • Bibliography
Page 2: UNIVERSITY OF BIRMINGHAM BIRMINGHAM BUSINESS …

UNIVERSITY OF BIRMINGHAM

MASTER OF SCIENCE THESIS

The Performance of ArtificialNeural Networks on Rough Heston

Model

AuthorChun Kiat ONG

(ID2031611)

SupervisorDr Daniel JDUFFY

A thesis submitted in partial fulfillment of the requirementsfor the degree of MSc Mathematical Finance

in theBirmingham Business School

9th September 2020

iii

Declaration of AuthorshipI Chun Kiat ONG (ID2031611) declare that this thesis titled ldquoThe Perfor-mance of Artificial Neural Networks on Rough Heston Modelrdquo and the workpresented in it are my own I confirm that

bull This work was done wholly or mainly while in candidature for a masterdegree at this University

bull Where any part of this thesis has previously been submitted for a de-gree or any other qualification at this University or any other institu-tion this has been clearly stated

bull Where I have consulted the published work of others this is alwaysclearly attributed

bull Where I have quoted from the work of others the source is alwaysgiven With the exception of such quotations this thesis is entirely myown work

bull I have acknowledged all main sources of help

bull Where the thesis is based on work done by myself jointly with othersI have made clear exactly what was done by others and what I havecontributed myself

Signed

Date

Owner
Typewritten Text
9th September 2020
Owner
Typewritten Text
CHUN KIAT ONG

v

UNIVERSITY OF BIRMINGHAM

AbstractBirmingham Business School

MSc Mathematical Finance

The Performance of Artificial Neural Networks on Rough Heston Model

by Chun Kiat ONG (ID2031611)

There has been extensive research on finding various methods to price op-tion more accurately and more quickly The rise of artificial neural networkscould provide an alternative means for this application Although there ex-ist many methods for approximating option prices under the rough Hestonmodel our objective is to investigate the performance of artificial neural net-works on rough Heston model option pricing as well as implied volatilityapproximations since it is part of the calibration process We use simulateddata to train the neural network instead of real market data for regulatoryreason We also adapt the image-based implicit training method for ANNimplied volatility approximations where the output of the ANN is the entireimplied volatility surface The results shows that artificial neural networkscan indeed generate accurate option prices and implied volatility approxi-mations but it is not as efficient as the existing methods Hence we concludethat ANN is a feasible alternative method for pricing option under the roughHeston model but an ineffective one

vii

AcknowledgementsI would like to take a moment to thank those who helped me throughout thewriting of this dissertation

First I wish to thank my supervisor Dr Daniel Duffy for his invaluablesupport and expertise guidance I would also like to thank Dr Colin Rowatfor designing such a challenging and rigorous course I truly enjoyed it

I would also like to take this opportunity to acknowledge the support andgreat love of my family my girlfriend and my friends They kept me goingon and this work would not have been possible without them

ix

Contents

Declaration of Authorship iii

Abstract v

Acknowledgements vii

0 Projectrsquos Overview 101 Projectrsquos Overview 1

1 Introduction 511 Organisation of This Thesis 6

2 Literature Review 721 Application of ANN in Finance 722 Option Pricing Models 823 The Research Gap 8

3 Option Pricing Models 1131 The Heston Model 11

311 Option Pricing Under Heston Model 12312 Heston Model Implied Volatility Approximation 14

32 Volatility is Rough 1733 The Rough Heston Model 18

331 Rough Heston Pricing Equation 193311 Rational Approximation of Rough Heston Ric-

cati Solution 203312 Option Pricing Using Fast Fourier Transform 22

332 Rough Heston Model Implied Volatility 23

4 Artificial Neural Networks 2541 Dissecting the Artificial Neural Networks 25

411 Neurons 26412 Input Output and Hidden Layers 26413 Connections Weights and Biases 26414 Forward and Backward Propagation Loss Functions 27415 Activation Functions 27416 Optimisation Algorithms and Learning Rate 28417 Epochs Early Stopping and Batch Size 29

42 Feedforward Neural Networks 30421 Deep Neural Networks 30

x

43 Common Pitfalls and Remedies 30431 Loss Function Stagnation 31432 Loss Function Fluctuation 31433 Over-fitting 31434 Under-fitting 32

44 Application in Finance ANN Image-based Implicit Methodfor Implied Volatility Approximation 32

5 Data 3551 Data Generation and Storage 35

511 Heston Call Option Data 36512 Heston Implied Volatility Data 37513 Rough Heston Call Option Data 37514 Rough Heston Implied Volatility Data 38

52 Data Pre-processing 39521 Data Splitting Train - Validate - Test 39522 Data Standardisation and Normalisation 39523 Data Partitioning 40

6 Neural Networks Architecture and Experimental Setup 4161 Neural Networks Architecture 41

611 ANN Architecture Option Pricing under Heston Model 43612 ANN Architecture Implied Volatility Surface of Hes-

ton Model 44613 ANN Architecture Option Pricing under Rough Hes-

ton Model 45614 ANN Architecture Implied Volatility Surface of Rough

Heston Model 4562 Performance Metrics and Validation Methods 4563 Implementation of ANN in Keras 4764 Summary of Neural Networks Architecture 48

7 Results and Discussions 4971 Heston Option Pricing ANN Results 4972 Heston Implied Volatility ANN Results 5173 Rough Heston Option Pricing ANN Results 5374 Rough Heston Implied Volatility ANN Results 5575 Run-time Performance 59

8 Conclusions and Outlook 61

A pybind11 A step-by-step tutorial with examples 63A1 Software Prerequisites 63A2 Create a Python Project 64A3 Create a C++ Project 66A4 Convert the C++ project to extension for Python 67A5 Make the DLL available to Python 68A6 Call the DLL from Python 70

xi

A7 Example Store C++ Eigen Matrix as NumPy Arrays 71A8 Example Store Data Generated by Analytical Heston Model

as NumPy Arrays 75

B Asymptotic Expansions for General Stochastic Volatility Models 81B1 Asymptotic Expansions 81

C Deep Neural Networks Library in Python Keras 83

Bibliography 85

xiii

List of Figures

1 Implementation Flow Chart 22 Data flow diagram for ANN on Rough Heston Pricer 3

11 Problem Solving Process 6

31 Paths of fBm for different values of H Adapted from (Shevchenko2015) 18

32 Implied Volatility Smiles of SPX as of 14th August 2013 withmaturity T in years Red and blue dots represent bid and askSPX implied volatilities green plots are from the calibratedrough Heston model dashed orange lines are from the clas-sical Heston model calibrated to these smiles Adapted from(Euch et al 2019) 24

41 A Simple Neural Network 2542 The pixels of implied volatility surface are represented by the

outputs of neural network 33

51 shows how the data is partitioned for 5-fold cross validationAdapted from (Chapter 2 Modeling Process k-fold cross validation) 40

61 Typical ANN Architecture for Option Pricing Application 4262 Typical ANN Architecture for Implied Volatility Surface Ap-

proximation 43

71 Plot of MSE Loss Function vs No of Epochs for Heston OptionPricing ANN 49

72 Plot of Predicted vs Actual values for Heston Option PricingANN 50

73 Plot of MSE Loss Function vs No of Epochs for Heston Im-plied Volatility ANN 51

74 Heston Implied Volatility Smiles ANN Predictions vs ActualValues 52

75 Mean Absolute Error of Full Heston Implied Volatility SurfaceANN Predictions on Unseen Data 53

76 Plot of MSE Loss Function vs No of Epochs for Rough HestonOption Pricing ANN 54

77 Plot of Predicted vs Actual values for Rough Heston PricingANN 55

78 Plot of MSE Loss Function vs No of Epochs for Rough HestonImplied Volatility ANN 56

xiv

79 Rough Heston Implied Volatility Smiles ANN Predictions vsActual Values 57

710 Mean Absolute Error of Full Rough Heston Implied VolatilitySurface ANN Predictions on Unseen Data 58

xv

List of Tables

51 Heston Model Call Option Data 3652 Heston Model Implied Volatility Data 3753 Rough Heston Model Call Option Data 3854 Heston Model Implied Volatility Data 38

61 Summary of Neural Networks Architecture for Each Application 48

71 Error Metrics of Heston Option Pricing ANN Predictions onUnseen Data 50

72 5-fold Cross Validation Results of Heston Option Pricing ANN 5073 Accuracy Metrics of Heston Implied Volatility ANN Predic-

tions on Unseen Data 5174 5-fold Cross Validation Results of Heston Implied Volatility

ANN 5375 Accuracy Metrics of Rough Heston Option Pricing ANN Pre-

dictions on Unseen Data 5476 5-fold Cross Validation Results of Rough Heston Option Pric-

ing ANN 5577 Accuracy Metrics of Rough Heston Implied Volatility ANN

Predictions on Unseen Data 5678 5-fold Cross Validation Results of Rough Heston Implied Volatil-

ity ANN 5879 Training Time 59710 Execution Time for Predictions using ANN and Traditional

Methods 59

xvii

List of Abbreviations

NN Neural NetworksANN Artificial Neural NetworksDNN Deep Neural NetworksCNN Convolutional Neural NetworksSGD Stochastic Gradient DescentSV Stochastic VolatilityCIR Cox Ingersoll and RossfBm fractional Brownian motionFSV Fractional Stochastic VolatilityRFSV Rough Fractional Stochastic VolatilityFFT Fast Fourier TransformELU Exponential Linear UnitReLU Rectified Linear Unit

1

Chapter 0

Projectrsquos Overview

This chapter intends to give a brief overview of the methodology of theproject without delving too much into the details

01 Projectrsquos Overview

It was advised by the supervisor to divide the problem into smaller achiev-able steps like (Mandara 2019) did The 4 main steps are

bull Define the objective

bull Formulate the procedures

bull Implement the procedures

bull Evaluate the results

The objective is to investigate the performance of artificial neural net-works (ANN) on vanilla option pricing and implied volatility approximationunder the rough Heston model The performance metrics in this context areaccuracy and run-time performance

Figure 1 shows the flow chart of implementation procedures The proce-dure starts with synthetic data generation Then we split the synthetic dataset into 3 portions - one for training (in-sample) one for validation and thelast one for testing (out-of-sample) Before the data is fed into the ANN fortraining it requires to be scaled or normalised as it has different magnitudeand range In this thesis we use the training method proposed by (Horvathet al 2019) the image-based implicit training approach for implied volatilitysurface approximation Then we test the trained ANN model with unseendata set to evaluate its accuracy

The run-time performance of ANN will be compared with that from thetraditional approximation methods This project requires the use of two pro-gramming languages C++ and Python The data generation process is donein C++ The wide range of machine learning libraries (Keras TensorFlowPyTorch) in Python makes it the logical choice for the neural networks com-putations The experiment starts with the classical Heston model as a conceptvalidation first and then proceed to the rough Heston model

2 Chapter 0 Projectrsquos Overview

Option Pricing Models Heston rough Heston

Data Generation

Data Pre-processing

Strike Price Maturity ampVolatility etc

Scaled Normalised Data(Training Set)

ANN ArchitectureDesign ANN Training

ANN ResultsEvaluation

Conclusions

ANN Prediction

Trained ANN

Accuracy (Option Price amp ImpliedVol) and Runtime Performance

FIGURE 1 Implementation Flow Chart

01 Projectrsquos Overview 3

Input

RoughHestonPricer

KerasANNTrainer

ANNPredictions

AccuracyEvaluation

RoughHestonData

ANNModelWeights

modelparameters

modelparametersoptionprice

traindata

ANNModelHyperparameters

trainedmodelweights

trainedmodelweights

predictedoptionprice

testdata(optionprice)

Output

errormetrics

FIGURE 2 Data flow diagram for ANN on Rough HestonPricer

5

Chapter 1

Introduction

Option pricing has always been a major research area in financial mathemat-ics Various methods and techniques have been developed to price optionsmore accurately and more quickly since the introduction of the well-knownBlack-Scholes model in 1973 (Black et al 1973) The Black-Scholes model hasan analytical solution and has only one parameter volatility required to becalibrated This makes it very easy to implement Its simplicity comes at theexpense of many unrealistic assumptions One of its biggest flaws is the as-sumption of constant volatility term This assumption simply does not holdin the real market With this assumption the Black-Scholes model fails tocapture the volatility dynamics exhibited by the market and so is unable tovalue option accurately

In 1993 (Heston 1993) developed a stochastic volatility model in an at-tempt to better capture the volatility dynamics The Heston model assumesthe volatility of an underlying asset to follow a geometric Brownian motion(Hurst parameter H = 12) Similarly the Heston model also has a closed-form analytical solution and this makes it a strong alternative to the Black-Scholes model

Whilst the Heston model is more realistic than the Black-Scholes modelthe generated option prices do not match the observed vanilla option pricesconsistently (Gatheral et al 2014) Recent research shows that more accurateoption prices can be estimated if the volatility process is modelled as a frac-tional Brownian motion with H lt 12 When H lt 12 the volatility processpath appears to be rougher than when H ge 12 and hence the volatility isdescribe as rough

(Euch et al 2019) introduce the rough volatility version of Heston modelwhich combines the best of both worlds However the calibration of roughHeston model relies on the notoriously slow Monte Carlo method due to itsnon-Markovian nature This obstacle is the reason that rough Heston strug-gles to find its way into industrial applications Although there exist manyapproximation methods for rough Heston model to help to solve this prob-lem we are interested in the feasibility of machine learning algorithms

Recent advancements of machine learning algorithms could potentiallyprovide an alternative means for this problem Specifically the artificial neu-ral networks (ANN) offers the best hope because it is capable of learningcomplex functional relationships between input and output data and thenmaking predictions instantly

6 Chapter 1 Introduction

In this project our objective is to investigate the accuracy and run-timeperformance of ANN on European call option price predictions and impliedvolatility approximations under the rough Heston model We start off byusing the Heston model as a concept validation then proceed to its roughvolatility version For regulatory purposes we use data simulated by optionpricing models for training

11 Organisation of This Thesis

This thesis is organised as follows In Chapter 2 we review some of the re-lated work and provide justifications for our research In Chapter 3 we dis-cuss the theory of option pricing models of our interest We provide a briefintroduction of ANN and some techniques to improve their performance inChapter 4 Then we explain the tools we used for the data generation pro-cess and some useful techniques to speed up the process in Chapter 5 In thesame chapter we also discuss about the important data pre-processing proce-dures Chapter 6 shows our network architecture and experimental setupsWe present the results and discuss our findings in Chapter 7 Finally weconclude our findings and discuss about potential future work in Chapter 8

(START)Problem

RoughHestonModelLowIndustrialApplicationDespiteGivingAccurate

OptionPrices

RootCauseSlowCalibration(PricingampImpliedVol

Approximation)Process

PotentialSolutionApplyANNtoPricingandImplied

VolatilityApproximation

ImplementationANNComputationUsingKerasin

Python

EvaluationAccuracyandSpeedofANNPredictions

(END)Conclusions

ANNIsFeasibleorNot

FIGURE 11 Problem Solving Process

7

Chapter 2

Literature Review

In this chapter we intend to provide a review of the current literature andhighlight the research gap The focus of our review is the application of ANNin option pricing models We justify our methodology and our choice ofoption pricing models We would also provide some critiques

21 Application of ANN in Finance

Option pricing has always been a hot topic in academia and the financialindustry Researchers have been looking for various means to price optionmore accurately and quickly As the computational technology advancesthe application of ANN has gained popularity in solving complex problemsin many different industries In quantitative finance the idea of utilisingANN for pricing derivatives comes from its ability to make fast and accuratepredictions once it is trained

The application of ANN in option pricing begun with (Hutchinson et al1994) The authors were the pioneers in applying ANN for pricing and hedg-ing derivatives Since then there are more than 100 research papers on thistopic It is also worth noting that complexity of the ANNrsquos architecture usedin the research papers increases over the years Whilst there has been muchresearch on the applications of ANN in option pricing there is only about10 of papers focus on implied volatility (Ruf et al 2020) The feasibility ofusing ANN for implied volatility approximation is equally as important asfor option pricing since they are both part of the calibration process Recentlythere are more research papers focus on implied volatility and calibrationsuch as (Mostafa et al 2008) (Liu et al 2019a) (Liu et al 2019b) and (Hor-vath et al 2019) A key contribution of (Horvath et al 2019)rsquos work is theintroduction of imaged-based implicit training method for implied volatilitysurface approximation This method is allegedly robust and fast for approx-imating the entire implied volatility surface

A significant portion of the research papers uses real market data in theirresearch Currently there is no standardised framework for deciding the ar-chitecture and parameters of ANN Hence training the ANN directly withmarket data might pose some issues when it comes to regulatory compli-ance (Horvath et al 2019) is one of the few papers that uses simulated datafor ANN training The simulated data is generated from Heston and roughBergomi models Using simulated data removes the regulatory concerns as

8 Chapter 2 Literature Review

the functional mapping from model parameters to option price is known andtractable

The application of ANN requires splitting the entire data set for trainingvalidation and testing Most of the papers that use real market data ensurethe data splitting process is chronological to preserve the time series structureof the data This is not a concern in our case we as are using simulated data

The results in (Horvath et al 2019) shows very accurate predictions How-ever after examining the source code we discover that the ANN is validatedand tested on the same data set Thus the high accuracy results does notgive any information about how well the trained ANN generalises to unseendata In our experiment we will use different data sets for validation andtesting

Common error metrics used include mean absolute error (MAE) meanabsolute percentage error (MAPE) and mean squared error (MSE) We sug-gest a new metric maximum absolute error as it is useful in a risk manage-ment perspective for quantifying the largest prediction error in the worst casescenario

22 Option Pricing Models

In terms of the choice of option pricing models over 80 of the papers selectBlack-Scholes model or its extensions in their research (Ruf et al 2020) Asmentioned before there are other models that can give better option pricesthan Black-Scholes such as the Stochastic Volatility models (Liu et al 2019b)investigated the application of ANN in Heston and Bates models

According to (Gatheral et al 2014) modelling the volatility as roughvolatility can capture the volatility dynamics in the market better Roughvolatility models struggle to find their ways into the industry due to theirslow calibration process Hence it is natural to for the direction of ANNresearch to move towards rough volatility models

Some of the recent examples include (Stone 2019) (Bayer et al 2018)and (Horvath et al 2019) (Stone 2019) investigates the calibration of roughBergomi model using Convolutional Neural Networks (CNN) whilst (Bayeret al 2018) study the calibration of Heston and rough Bergomi models us-ing Deep Neural Networks (DNN) Contrary to these papers where theyperform direct calibration via ANN in one step (Horvath et al 2019) splitthe calibration of rough Bergomi model into two steps Firstly the ANN istrained to learn the pricing equation during a so-called offline session Thenthe trained ANN is deterministic and stored for online calibration sessionlater This approach can speed up the online calibration by a huge margin

23 The Research Gap

In our thesis we will be using the the approach in (Horvath et al 2019) Thatis to train the network to learn the pricing and implied volatility functional

23 The Research Gap 9

mappings separately We would leave the model calibration step as the fu-ture outlook of this project We choose to work with rough Heston modelanother popular rough volatility model that has not been investigated yetWe would adapt the image-based implicit training method in our researchTo re-iterate we would validate and test our ANN with different data set toobtain a true measure of ANNrsquos performance on unseen data

11

Chapter 3

Option Pricing Models

In this chapter we discuss about the option pricing models of our interestHeston and rough Heston models The idea of modelling volatility as roughfractional Brownian motion will also be presented Then we derive theirpricing and implied volatility equations and also explain their implementa-tions for data generation

The Black-Scholes model is simple but unrealistic for real world applica-tions as its assumptions simply do not hold One of its greatest criticisms isthe assumption of constant volatility of the underlying asset price (Dupire1994) extended the Black-Scholes framework to give local volatility modelsHe modelled the volatility as a deterministic function of the underlying priceand time The function is selected such that its outputs match observed Euro-pean option prices This extension is time-inhomogeneous and the dynamicsit exhibits is still unrealistic Thus it is not able to produce future volatilitysurfaces that emulate our observations

Stochastic Volatility (SV) models were introduced in which the volatilityof the underlying is modelled as a geometric Brownian motion Recent re-search shows that rough volatility can capture the true volatility dynamicsbetter and so this leads to the rough volatility models

31 The Heston Model

One of the most popular SV models is the one proposed by (Heston 1993) itspopularity is due to the fact that its pricing equation for European options isanalytically tractable The property allows calibration procedures to be doneefficiently

The Heston model for a one-dimensional spot price S follows a stochasticprocess

dS = microSdt +radic

νSdW1 (31)

And its instantaneous variance ν follows a Cox Ingersoll and Ross (CIR)process (Cox et al 1985)

dν = κ(θ minus ν)dt + ξradic

νdW2 (32)〈dW1 dW2〉 = ρdt (33)

where

12 Chapter 3 Option Pricing Models

bull micro is the (deterministic) rate of return of the spot

bull θ is the long term mean volatility

bull ξ is the volatility of volatility

bull κ is the mean reversion rate of volatility

bull ρ is the correlation between spot and volatility moves

bull W1 and W2 are Wiener processes

311 Option Pricing Under Heston Model

In this section we follow (Gatheral 2006) closely We start by forming aportfolio Π containing option being priced V number of stocks minus∆ and aquantity of minus∆1 of another asset whose value is V1

Π = V minus ∆Sminus ∆1V1

The change in portfolio during dt is

dΠ =

[partVpartt

+12

νS2 part2VpartS2 + ρξνS

part2Vpartν2partS

+12

ξ2νpart2Vpartν2

]dt

minus ∆1

[partV1

partt+

12

νS2 part2V1

partS2 + ρξνSpart2V1

partνpartS+

12

ξ2νpart2V1

partν2

]dt

+

[partVpartSminus ∆1

partV1

partSminus ∆

]dS +

[partVpartνminus ∆1

partV1

partν

]dν

Note the explicit dependence on t of S and ν has been removed for simplicityWe can make the portfolio instantaneously risk-free by eliminating dS and

dν termspartVpartSminus ∆1

partV1

partSminus ∆ = 0

AndpartVpartνminus ∆1

partV1

partν= 0

So we are left with

dΠ =

[partVpartt

+12

νS2 part2VpartS2 + ρξνS

part2VpartνpartS

+12

ξ2νpart2Vpartν2

]dt

minus ∆1

[partV1

partt+

12

νS2 part2V1

partS2 + ρξνSpart2V1

partνpartS+

12

ξ2νpart2V1

partν2

]dt

=rΠdt=r(V minus Sminus ∆1V1)dt

31 The Heston Model 13

Then we apply the no arbitrage principle the return of a risk-free portfoliomust be equal to the risk-free rate And we assume the risk-free rate to bedeterministic for our case

Rearranging the above equation by collecting V terms on the left-handside and V1 terms on the right-hand side

partVpartt + 1

2 νS2 part2VpartS2 + ρξνS part2V

partνpartS + 12 ξ2ν part2V

partν2 + rS partVpartS minus rV

partVpartν

=partV1partt + 1

2 νS2 part2V1partS2 + ρξνS part2V1

partνpartS + 12 ξ2ν part2V1

partν2 + rS partV1partS minus rV1

partV1partν

It is obvious that the ratio is equal to some function of S ν and t f (S ν t)Thus we have

partVpartt

+12

νS2 part2VpartS2 + ρξνS

part2VpartνpartS

+12

ξ2νpart2Vpartν2 + rS

partVpartSminus rV = f

partVpartν

In the case of Heston model f is chosen as

f = minus(κ(θ minus ν)minus λradic

ν)

where λ(S ν t) is called the market price of volatility risk We do not delveinto the choice of f and the concept of λ(S ν t) any further here See (Wilmott2006) for his arguments

(Heston 1993) assumed in his paper that λ(S ν t) is linear in the instanta-neous variance νt to retain the form of the equation under the transformationfrom the statistical measure to the risk-neutral measure Here we are onlyinterested in pricing the options so we can set λ(S ν t) to be zero (Gatheral2006)

Hence we arrived at

partVpartt

+12

νS2 part2VpartS2 + ρξνS

part2VpartνpartS

+12

ξ2νpart2Vpartν2 + rS

partVpartSminus rV = κ(νminus θ)

partVpartν

(34)

According to (Heston 1993) the price of an undiscounted European calloption price C with strike price K and time to maturity T has solution in theform

C(S K ν T) =12(Fminus K) +

int infin

0(F lowast f1 minus K lowast f2)du (35)

14 Chapter 3 Option Pricing Models

where

f1 = Re

[eminusiu ln K ϕ(uminus i)

iuF

]

f2 = Re

[eminusiu ln K ϕ(u)

iu

]ϕ(u) = E(eiu ln ST)

F = SemicroT

The evaluation of 35 requires the computation of Fourier inversion in-tegrals And it is susceptible to numerical instabilities due to the involvedcomplex logarithms (Kahl et al 2006) proposed an integration scheme tocompute the Fourier inversion integral by applying some transformations tothe asymptotic structure Thus the solution becomes

C(S K ν T) =int 1

0y(x)dx x isin R (36)

where

y(x) =12(Fminus K) +

F lowast f1

(minus ln x

Cinfin

)minus K lowast f2

(minus ln x

Cinfin

)x lowast π lowast Cinfin

The limits at the boundaries of the integral are

limxrarr0

y(x) =12(Fminus K)

and

limxrarr1

y(x) =12(Fminus K) +

F lowast limurarr0 f1(u)minus K lowast limurarr0 f2(u)π lowast Cinfin

Equation 36 provides a more robust pricing method for moderate to longmaturities or strong mean-reversion options For the full derivation of equa-tion 36 see (Kahl et al 2006) The source code to implement equation 36was provided by the supervisor and is the source code for (Duffy et al 2012)

312 Heston Model Implied Volatility Approximation

Before the model is used for pricing options the modelrsquos parameters need tobe calibrated so that it can return current market prices (at least to some de-gree of accuracy) That is the unmeasurable model parameters are obtainedby calibrating to implied volatility that are observed on the market Hencethis motivates the need for fast implied volatility computation

Unfortunately there is no closed-form analytical formula for calculatingimplied volatility Heston model There are however many closed-form ap-proximations for Heston implied volatility

In this project we select the implied volatility approximation methodfor Heston proposed by (Lorig et al 2019) that allegedly outperforms other

31 The Heston Model 15

methods such as the one by (Fouque et al 2012) (Lorig et al 2019) derivea family of asymptotic expansions for European-style option implied volatil-ity Their method provides an explicit approximation and requires no specialfunctions not even numerical integration and so it is faster to compute thanthe counterparts

We follow closely the derivation steps outlined in (Lorig et al 2019) Con-sider the Heston model in Section (31) According to (Andersen et al 2007)ρ must be set to negative to avoid moment explosion To circumvent this weapply the following change of variables (X(t) V(t)) = (ln S(t) eκtν) Usingthe asymptotic expansions for general stochastic models (see Appendix B)and Itorsquos formula we have

dX = minus12

eminusκtVdt +radic

eminusκtVdW1 X(0) = x = ln s (37)

dV = θκeκtdt + ξradic

eκtVdW2 V(0) = ν = z gt 0 (38)〈dW1 dW2〉 = ρdt (39)

The parameters ν κ θ ξ W1 W2 play the same role as in Section (31)Then we apply the second order differential operator A(t) to (X V)

A(t) = 12

eminusκtv(part2x minus partx) + θκeκtpartv +

12

ξ2ξeκtvpart2v + ξρνpartxpartv

Define T as time to maturity k as log-strike Using the time-dependentTaylor series expansion ofA(T) with (x(T) v(T)) = (X0 E[V(T)]) = (x θ(eκTminus1)) to give (see Appendix B)

σ0 =

radicminusθ + θκT + eminusκT(θ minus v) + v

κT

σ1 =ξρzeminusκT(minus2vminus vκT minus eκT(v(κT minus 2) + v) + κTv + v)

radic2κ2σ2

0 T32

σ2 =ξ2eminus2κT

32κ4σ50 T3

[minus2radic

2κσ30 T

32 z(minusvminus 4eκT(v + κT(vminus v)) + e2κT(v(5minus 2κT)minus 2v) + 2v)

+ κσ20 T(4z2 minus 2)(v + e2κT(minus5v + 2vκT + 8ρ2(v(κT minus 3) + v) + 2v))

+ κσ20 T(4z2 minus 2)(4eκT(v + vκT + ρ2(v(κT(κT + 4) + 6)minus v(κT(κT + 2) + 2))

minus κTv)minus 2v)

+ 4radic

2ρ2σ0radic

Tz(2z2 minus 3)(minus2minus vκT minus eκT(v(κT minus 2) + v) + κTv + v)2

+ 4ρ2(4(z2 minus 3)z2 + 3)(minus2vminus vκT minus eκT(v(κT minus 2) + v) + κTv + v)2]

minusσ2

1 (4(xminus k)2 minus σ40 T2)

8σ30 T

where

z =xminus kminus σ2

0 T2

σ0radic

2T

16 Chapter 3 Option Pricing Models

The explicit expression for σ3 is too long to be included here See (Loriget al 2019) for the full derivation The explicit expression for the impliedvolatility approximation is

σimplied = σ0 + σ1z + σ2z2 + σ3z3 (310)

The C++ source code to implement this is available here Heston ImpliedVolatility Approximation

32 Volatility is Rough 17

32 Volatility is Rough

We start this section by introducing the fractional Brownian motion (fBm)which is a generalisation of Brownian motion

Definition 321 Fractional Brownian Motion (fBm) is a centered Gaussian processBH

t t ge 0 with the autocovariance function

E[BHt BH

s ] =12(|t|2H + |s|2H minus |tminus s|2H) (311)

where H isin (0 1) is the Hurst index or Hurst parameter associated with the frac-tional Brownian motion

The Hurst parameter is a measure of the long-term memory of a timeseries It was first introduced by (Hurst 1951) for modelling water levels ofthe Nile river in order to determine the optimum dam size As it turns outit is also suitable for modelling time series data from other natural systems(Peters 1991) and (Peters 1994) are some of the earliest examples of applyingthis concept in financial time series modelling A time series can be classifiedinto 3 categories based on the Hurst parameter

1 0 lt H lt 12 Negative correlation in the incrementdecrement of the

process A mean reverting process which implies an increase in valueis more likely followed by a decrease in value and vice versa Thesmaller the value the greater the mean-reverting effect

2 H = 12 No correlation in the incrementdecrement of the process (geo-

metric Brownian motion or Wiener process)

3 12 lt H lt 1 Positive correlation in the incrementdecrement of theprocess A trend reinforcing process which means the direction of thenext value is more likely the same as current value The strength of thetrend is greater when H is closer to 1

Empirical evidence shows that log-volatility time series behaves similarto a fBm with Hurst parameter of order 01 (Gatheral et al 2014) Essentiallyall the statistical stylised facts of volatility can be captured when modellingit as a rough fBm

This motivates the use of the fractional stochastic volatility (FSV) modelby (Comte et al 1998) Our model is called rough fractional stochastic volatil-ity (RFSV) model to emphasise that H lt 12 The term rsquoroughrsquo refers to theroughness of the path of fBm From Figure 31 one can notice that the fBmpath gets rsquorougherrsquo as H decreases

18 Chapter 3 Option Pricing Models

FIGURE 31 Paths of fBm for different values of H Adaptedfrom (Shevchenko 2015)

33 The Rough Heston Model

In this section we introduce the rough volatility version of Heston model andderive its pricing and implied volatility equations

As stated before rough volatility models can fit the volatility surface no-tably well with very few parameters Hence it is natural to modify the Hes-ton model and consider its rough version

The rough Heston model for a one-dimensional asset price S with instan-taneous variance ν(t) takes the form as in (Euch et al 2016)

dS = Sradic

νdW1

ν(t) = ν(0) +1

Γ(α)

int t

0(tminus s)αminus1κ(θminus ν(s))ds +

ξ

Γ(α)

int t

0(tminus s)αminus1

radicν(s)dW2

(312)〈dW1 dW2〉 = ρdt

where

bull α = H + 12 isin (12 1) governs the roughness of the volatility sample

paths

bull κ is the mean reversion rate

bull θ is the long term mean volatility

bull ν(0) is the initial volatility

33 The Rough Heston Model 19

bull ξ is the volatility of volatility

bull Γ is the Gamma function

bull W1 and W2 are two Brownian motions with correlation ρ

When α = 1 (H = 05) we recover the classical Heston model in Chapter(31)

331 Rough Heston Pricing Equation

The pricing equation of rough Heston model is inspired by the classical Hes-ton one In classical Heston model option price can be obtained by applyingFourier inversion integrals to the characteristic function that is expressed interms of the solution of a Riccati equation The rough Heston model dis-plays a similar structure but with the Riccati equation being replaced by afractional Riccati equation

(Euch et al 2016) derived a characteristic function for the rough Hestonmodel this result is particularly important as the non-Markovian nature ofthe fractional Brownian motion makes other numerical pricing methods (egMonte Carlo) difficult to implement

Definition 331 The fractional integral of order r isin (0 1] of a function f is

Ir f (t) =1

Γ(r)

int t

0(tminus s)rminus1 f (s)ds (313)

whenever the integral exists and the fractional derivative of order r isin (0 1] is

Dr f (t) =1

Γ(1minus r)ddt

int t

0(tminus s)minusr f (s)ds (314)

whenever it exists

Then we quote the results from (Euch et al 2016) directly For the fullproof see Section 6 of (Euch et al 2016)

Theorem 331 For all t ge 0 the characteristic function of the log-price Xt =

ln S(t)S(0) is

φX(a t) = E[eiaXt ] = exp[κθ I1h(a t) + ν(0)I1minusαh(a t)] (315)

where h is solution of the fractional Riccati equation

Dαh(a t) = minus12

a(a + i) + κ(iaρξ minus 1)h(a t) +(ξ)2

2h2(a t) I1minusαh(a 0) = 0

(316)which admits a unique continuous solution a isin C a suitable region in the complexplane (see Definition 332)

20 Chapter 3 Option Pricing Models

Definition 332 We define the strip in the complex plane relevant to the computa-tion of option prices characteristic function in Theorem (331)

C =

z isin C lt(z) ge 0minus 11minus ρ2 le =(z) le 0

(317)

where lt and = denote real and imaginary parts respectively

There is no explicit solution to equation (316) so it has to be solved nu-merically (Gatheral et al 2019) present a simple rational approximation tothe solution of the rough Heston Riccati equation which is inevitably fasterthan the numerical schemes

3311 Rational Approximation of Rough Heston Riccati Solution

Firstly the short-time series expansion of the solution by (Alos et al 2017)can be written as

hs(a t) =infin

sumj=0

Γ(1 + jα)Γ[1 + (j + 1)α]

β j(a)ξ jt(j+1)α (318)

with

β0(a) =minus 12

a(a + i)

βk(a) =12

kminus2

sumij=0

1i+j=kminus2 βi(a)β j(a)Γ(1 + iα)

Γ[1 + (i + 1)α]Γ(1 + jα)

Γ[1 + (j + 1)α]

+ iρaΓ[1 + (kminus 1)α]

Γ(1 + kα)βkminus1(a)

Again from (Alos et al 2017) the long-time expansion is

hl(a t) = rminusinfin

sumk=0

γkξtminuskα

AkΓ(1minus kα)(319)

where

γ1 = minusγ0 = minus1

γk = minusγkminus1 +rminus2A

infin

sumij=1

1i+j=k γiγjΓ(1minus kα)

Γ(1minus iα)Γ(1minus jα)

A =radic

a(a + i)minus ρ2a2

rplusmn = minusiρaplusmn A

Now we can construct the global approximation. Using the same techniques as in (Atkinson et al 2011), the rational approximation of h(a, t) is given by
$$h^{(m,n)}(a, t) = \frac{\sum_{i=1}^{m} p_i\, (\xi t^{\alpha})^i}{\sum_{j=0}^{n} q_j\, (\xi t^{\alpha})^j}, \qquad q_0 = 1 \tag{3.20}$$
Numerical experiments in (Gatheral et al 2019) showed that h^{(3,3)} is a good choice for high accuracy and fast computation. Thus we can express it explicitly:
$$h^{(3,3)}(a, t) = \frac{p_1 \xi t^{\alpha} + p_2 (\xi t^{\alpha})^2 + p_3 (\xi t^{\alpha})^3}{1 + q_1 \xi t^{\alpha} + q_2 (\xi t^{\alpha})^2 + q_3 (\xi t^{\alpha})^3} \tag{3.21}$$

Thus from equations (3.18) and (3.19) we have
$$h_s(a, t) = \frac{\Gamma(1)}{\Gamma(1 + \alpha)}\, \beta_0(a)\, t^{\alpha} + \frac{\Gamma(1 + \alpha)}{\Gamma(1 + 2\alpha)}\, \beta_1(a)\, \xi (t^{\alpha})^2 + \frac{\Gamma(1 + 2\alpha)}{\Gamma(1 + 3\alpha)}\, \beta_2(a)\, \xi^2 (t^{\alpha})^3 + O(t^{4\alpha})$$
$$h_s(a, t) = b_1 t^{\alpha} + b_2 (t^{\alpha})^2 + b_3 (t^{\alpha})^3 + O[(t^{\alpha})^4]$$
and
$$h_l(a, t) = \frac{r_- \gamma_0}{\Gamma(1)} + \frac{r_- \gamma_1}{A\, \Gamma(1 - \alpha)\, \xi}\, \frac{1}{t^{\alpha}} + \frac{r_- \gamma_2}{A^2\, \Gamma(1 - 2\alpha)\, \xi^2}\, \frac{1}{(t^{\alpha})^2} + O\!\left[\frac{1}{(t^{\alpha})^3}\right]$$
$$h_l(a, t) = g_0 + g_1\, \frac{1}{t^{\alpha}} + g_2\, \frac{1}{(t^{\alpha})^2} + O\!\left[\frac{1}{(t^{\alpha})^3}\right]$$
Matching the coefficients of equation (3.21) to h_s and h_l forces the coefficients b_i and g_j to satisfy

$$p_1 \xi = b_1 \tag{3.22}$$
$$(p_2 - p_1 q_1)\,\xi^2 = b_2 \tag{3.23}$$
$$(p_1 q_1^2 - p_1 q_2 - p_2 q_1 + p_3)\,\xi^3 = b_3 \tag{3.24}$$
$$\frac{p_3\, \xi^3}{q_3\, \xi^3} = g_0 \tag{3.25}$$
$$\frac{(p_2 q_3 - p_3 q_2)\,\xi^5}{q_3^2\, \xi^6} = g_1 \tag{3.26}$$
$$\frac{(p_1 q_3^2 - p_2 q_2 q_3 - p_3 q_1 q_3 + p_3 q_2^2)\,\xi^7}{q_3^3\, \xi^9} = g_2 \tag{3.27}$$

The solution of this system is
$$p_1 = \frac{b_1}{\xi} \tag{3.28}$$
$$p_2 = \frac{b_1^3 g_1 + b_1^2 g_0^2 + b_1 b_2 g_0 g_1 - b_1 b_3 g_0 g_2 + b_1 b_3 g_1^2 + b_2^2 g_0 g_2 - b_2^2 g_1^2 + b_2 g_0^3}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^2} \tag{3.29}$$
$$p_3 = g_0 q_3 \tag{3.30}$$
$$q_1 = \frac{b_1^2 g_1 - b_1 b_2 g_2 + b_1 g_0^2 - b_2 g_0 g_1 - b_3 g_0 g_2 + b_3 g_1^2}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi} \tag{3.31}$$
$$q_2 = \frac{b_1^2 g_0 - b_1 b_2 g_1 - b_1 b_3 g_2 + b_2^2 g_2 + b_2 g_0^2 - b_3 g_0 g_1}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^2} \tag{3.32}$$
$$q_3 = \frac{b_1^3 + 2 b_1 b_2 g_0 + b_1 b_3 g_1 - b_2^2 g_1 + b_3 g_0^2}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^3} \tag{3.33}$$

3.3.1.2 Option Pricing Using the Fast Fourier Transform

After solving the Riccati equation using the rational approximation (3.21), we can compute the characteristic function φ_X(a, t) in equation (3.15). Once the characteristic function is obtained, many methods are available to compute call option prices; see (Carr and Madan 1999), (Itkin 2010), (Lewis 2001) and (Schmelzle 2010). In this project we select the Carr and Madan approach, a fast option pricing method that uses the fast Fourier transform (FFT). Suppose k = ln K and C(k, T) is the value of a European call option with strike K and maturity T. Unfortunately, the FFT cannot be applied directly to evaluate the call option price, because the call pricing function is not square-integrable. A key contribution of (Carr and Madan 1999) is the modification of the call pricing function with respect to k; with this modification, a whole range of option prices can be obtained with one inverse Fourier transform.

Consider the modified call price c(k, T):
$$c(k, T) = e^{\beta k}\, C(k, T), \qquad \beta > 0 \tag{3.34}$$
Its Fourier transform is
$$\psi_X(u, T) = \int_{-\infty}^{\infty} e^{iuk}\, c(k, T)\, dk, \qquad u \in \mathbb{R}_{\ge 0} \tag{3.35}$$
which can be expressed as
$$\psi_X(u, T) = \frac{e^{-rT}\, \phi_X[u - (\beta + 1)i,\, T]}{\beta^2 + \beta - u^2 + i(2\beta + 1)u} \tag{3.36}$$
where r is the risk-free rate and φ_X(·) is the characteristic function defined in equation (3.15).


The purpose of β is to assist the integrability of the modified call pricing function over the negative log-strike axis. Hence a sufficient condition is that ψ_X(0, T) is finite:
$$\psi_X(0, T) < \infty \tag{3.37}$$
$$\implies \phi_X[-(\beta + 1)i,\, T] < \infty \tag{3.38}$$
$$\implies \exp\left[\theta\kappa\, I^1 h(-(\beta + 1)i,\, T) + \nu(0)\, I^{1-\alpha} h(-(\beta + 1)i,\, T)\right] < \infty \tag{3.39}$$
Using equation (3.39) we can determine the upper bound value of β such that condition (3.37) is satisfied. (Carr and Madan 1999) suggest that one quarter of the upper bound serves as a good choice for β.

The call price C(k, T) can be recovered by applying the inverse Fourier transform:
$$C(k, T) = \frac{e^{-\beta k}}{2\pi} \int_{-\infty}^{\infty} \Re\left[e^{-iuk}\, \psi_X(u, T)\right] du \tag{3.40}$$
$$\phantom{C(k, T)} = \frac{e^{-\beta k}}{\pi} \int_{0}^{\infty} \Re\left[e^{-iuk}\, \psi_X(u, T)\right] du \tag{3.41}$$

The semi-infinite integral in the call price equation can be evaluated by numerical methods such as the trapezoidal rule and the FFT, as shown in (Kwok et al 2012):
$$C(k, T) \approx \frac{e^{-\beta k}}{\pi} \sum_{j=1}^{N} \Re\left[e^{-iu_j k}\, \psi_X(u_j, T)\right] \Delta u \tag{3.42}$$

where u_j = (j − 1)Δu, j = 1, 2, ..., N.

We end this section with a summary of the steps required to price a call option under the rough Heston model (a code sketch follows the list):

1. Compute the solution h of the fractional Riccati equation (3.16) using equation (3.21).

2. Compute the characteristic function in equation (3.15).

3. Determine a suitable value for β using equation (3.39).

4. Apply the Fourier transform in equation (3.36).

5. Evaluate the call option price by implementing the FFT in equation (3.42).
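To make steps 3-5 concrete, the following is a minimal Python sketch that evaluates the sum in equation (3.42) for a generic characteristic function of X_T = ln(S_T/S_0) (so that k = ln(K/S_0)). A Black-Scholes characteristic function is used as a stand-in so the snippet is self-contained; the function names and numerical settings (N, Δu, β) are illustrative, not those of our implementation. For the rough Heston model, phi would instead be computed from the rational approximation h^{(3,3)} via equation (3.15).

import numpy as np

def carr_madan_call(phi, k, T, r, beta=0.75, N=4096, du=0.05):
    """Approximate C(k, T) via equation (3.42), given phi = phi_X(., T)."""
    u = np.arange(N) * du                                  # u_j = (j - 1) du
    # Damped call transform psi_X(u, T), equation (3.36)
    psi = np.exp(-r * T) * phi(u - (beta + 1) * 1j) \
        / (beta**2 + beta - u**2 + 1j * (2 * beta + 1) * u)
    # Rectangle-rule sum of equation (3.42)
    integrand = np.real(np.exp(-1j * u * k) * psi)
    return np.exp(-beta * k) / np.pi * integrand.sum() * du

# Stand-in: Black-Scholes characteristic function of ln(S_T / S_0)
sigma, r, T = 0.2, 0.03, 1.0
phi_bs = lambda u: np.exp(1j * u * (r - 0.5 * sigma**2) * T
                          - 0.5 * sigma**2 * u**2 * T)
print(carr_madan_call(phi_bs, k=0.0, T=T, r=r))  # ~0.094, the ATM BS price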

3.3.2 Rough Heston Model Implied Volatility

(Euch et al 2019) present an almost instantaneous and accurate approximation method to compute rough Heston implied volatility. This method allows us to approximate rough Heston implied volatility by simply scaling the volatility of volatility parameter ν and feeding this scaled parameter into the classical Heston implied volatility approximation of Chapter 3.1.2. For a given maturity T, the scaled volatility of volatility parameter ν̃ is
$$\tilde{\nu} = \sqrt{\frac{3}{2H + 2}}\; \frac{\nu}{\Gamma(H + 3/2)}\; \frac{1}{T^{1/2 - H}} \tag{3.43}$$
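Equation (3.43) translates directly into a one-line Python helper (a minimal sketch; the function name is ours):

from math import gamma, sqrt

def scaled_vol_of_vol(nu, H, T):
    """Scaled vol-of-vol of equation (3.43) for a given maturity T."""
    return sqrt(3.0 / (2.0 * H + 2.0)) * nu / (gamma(H + 1.5) * T ** (0.5 - H))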


FIGURE 3.2: Implied volatility smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al 2019).


Chapter 4

Artificial Neural Networks

This chapter starts with an introduction to Artificial Neural Networks (ANN). We introduce the terminologies and basic elements of ANN and explain the methods that we used to increase training speed and predictive accuracy. We also discuss the training process for ANN, common pitfalls and their remedies. Finally, we discuss the ANN type of interest and extend the discussion to Deep Neural Networks (DNN).

The concept of ANN has come a long way. It started off with the modelling of a simple neural network after electrical circuits, which was proposed by Warren McCulloch, a neurologist, and a young mathematician, Walter Pitts, in 1943 (McCulloch et al 1943). This idea was further strengthened by Donald Hebb (Hebb 1949).

ANN's ability to learn complex non-linear functional mappings, by deciphering the pattern between independent and dependent variables, has found applications in many fields. With computational resources becoming more accessible and the availability of big data sets, ANN's popularity has grown exponentially in recent years.

4.1 Dissecting the Artificial Neural Networks

FIGURE 4.1: A Simple Neural Network

Before we delve deeper into the world of ANN, let us explain what parameters and hyperparameters mean.


Parameters refer to the variables that are updated throughout the training process; they include weights and biases. Hyperparameters are the configuration variables that define the architecture of the ANN and stay constant throughout the training process; they include the number of neurons, the number of hidden layers, the activation functions, the number of epochs, etc.

4.1.1 Neurons

Like a biological neuron, an artificial neuron is the fundamental working unit in an ANN. It is essentially a mathematical function that receives multiple inputs and generates an output: it takes the weighted sum of the inputs and applies the activation function to that sum. In the case of a simple ANN (as shown in Figure 4.1), this will be the network output; in more sophisticated ANNs, the output of a neuron can be an input to other neurons.

4.1.2 Input, Output and Hidden Layers

An input layer is the first layer of an ANN. It receives and sends the training data into the network for further processing. The number of input neurons is equal to the number of input features of the training data.

The output layer is the final layer of the network. It outputs the predictions for a given set of input features, and it can have more than one output neuron.

A hidden layer is a layer between the input and output layers. It contains neuron(s) and receives weighted inputs from the input layer. Whilst an ANN contains only one input layer and one output layer, the number of hidden layers can be more than one. A network without a hidden layer can only solve linearly separable problems; for more complex problems, one or more hidden layers are needed.

Systematic experimentation is required to identify the appropriate number of hidden layers to prevent under-fitting and over-fitting, and thus increase predictive accuracy.

4.1.3 Connections, Weights and Biases

From the input layer to the output layer, each connection is assigned a weight that denotes its relative importance. Random weights are initialised when training begins; as training continues, these weights are updated.

Sometimes a bias (different from the bias term in statistics) is added to a neuron. It is merely a constant input with weight 1. Intuitively speaking, the purpose of a bias is to shift the activation function away from the origin, thereby allowing the neuron to give a non-zero output when the inputs are 0.


4.1.4 Forward and Backward Propagation, Loss Functions

Forward propagation is the passing of input data in the forward direction through the network to generate output. Then the difference between the output and the target output is estimated by a loss function, which is a measure of how well an ANN performs with the current set of weights. Typical loss functions for regression problems include the mean squared error (MSE) and the mean absolute error (MAE). They are defined as

Definition 4.1.1.
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i^{\mathrm{predict}} - y_i^{\mathrm{actual}}\right)^2 \tag{4.1}$$

Definition 4.1.2.
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left|y_i^{\mathrm{predict}} - y_i^{\mathrm{actual}}\right| \tag{4.2}$$

where y_i^predict is the i-th predicted output and y_i^actual is the i-th actual output.

Subsequently, this information is propagated backward from the output layer to the input layer, and the gradients of the loss function with respect to the weights are computed. The algorithm then adjusts the weights accordingly to minimise the loss function. An iteration is one complete cycle of forward and backward propagation.

ANN training is an unconstrained optimisation problem with the loss function as the objective function. The algorithm navigates through the search space to find optimal weights that minimise the loss function (e.g. MSE). It can be expressed as
$$\min_{w}\; \frac{1}{n} \sum_{i=1}^{n} \left(y_i^{\mathrm{predict}} - y_i^{\mathrm{actual}}\right)^2 \tag{4.3}$$
where w is the vector that contains the weights of all the connections within the ANN.
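As a quick numerical illustration of the two loss metrics defined above (a minimal sketch with made-up values):

import numpy as np

y_pred = np.array([1.02, 0.98, 1.10])
y_true = np.array([1.00, 1.00, 1.00])

mse = np.mean((y_pred - y_true) ** 2)    # equation (4.1): 0.0036
mae = np.mean(np.abs(y_pred - y_true))   # equation (4.2): ~0.0467
print(mse, mae)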

4.1.5 Activation Functions

An activation function is a mathematical function applied to the output of a neuron to determine whether it should be activated (fired) or not, based on the relevance of the neuron's inputs to the prediction. This is analogous to the neuronal firing activity in the human brain.

Activation functions can be linear or non-linear. A linear activation function is only suitable when the functional relationship is linear; non-linear activation functions are used for most applications. In our case, non-linear activation functions are more suitable, as the pricing and implied volatility functional mappings are complex.

The common activation functions include:


Definition 4.1.3. Rectified Linear Unit (ReLU)
$$f_{\mathrm{ReLU}}(x) = \begin{cases} 0 & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases} \;\in [0, \infty) \tag{4.4}$$

Definition 4.1.4. Exponential Linear Unit (ELU)
$$f_{\mathrm{ELU}}(x) = \begin{cases} \alpha(e^x - 1) & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases} \;\in (-\alpha, \infty) \tag{4.5}$$
where α > 0. The value of α is usually chosen to be 1 for most applications.

Definition 4.1.5. Linear (identity)
$$f_{\mathrm{linear}}(x) = x \;\in (-\infty, \infty) \tag{4.6}$$

4.1.6 Optimisation Algorithms and Learning Rate

The learning rate refers to the amount of change applied to the weights in each iteration of the training process. A high learning rate results in fast training speed but carries a risk of divergence, whereas a low learning rate may reduce the training speed.

Common optimisation algorithms for training ANNs are Stochastic Gradient Descent (SGD) and its extensions, such as RMSProp and Adam. The SGD pseudocode is the following:

Algorithm 1 Stochastic Gradient Descent
  Initialise a vector of random weights w and learning rate lr
  while Loss > error do
      Randomly shuffle the training data of size n
      for i = 1, 2, ..., n do
          w_new = w_current − lr · ∂Loss/∂w_current
      end for
  end while

SGD is popular because it is simple to implement and can provide good results for most applications. In SGD the learning rate lr is fixed throughout the process. This is found to be an issue because the appropriate learning rate is difficult to determine: selecting a high learning rate is prone to divergence, whereas using a low learning rate can result in a slow training process. Hence, improvements have been made to SGD to allow the learning rate to be adaptive. Root Mean Square Propagation (RMSProp) is one of the extensions that allows the learning rate to vary throughout the training process (Hinton et al 2015). It is a method that divides the learning rate by the (square root of the) exponential moving average of past squared gradients. The authors suggest the exponential decay rate to be b = 0.9, and its pseudocode is presented below.

Algorithm 2 Root Mean Square Propagation (RMSProp)
  Initialise a vector of random weights w, learning rate lr, v = 0 and b = 0.9
  while Loss > error do
      Randomly shuffle the training data of size n
      for i = 1, 2, ..., n do
          v_current = b · v_previous + (1 − b) · [∂Loss/∂w_current]²
          w_new = w_current − lr/(√(v_current) + ε) · ∂Loss/∂w_current
      end for
  end while

where ε is a small number to prevent division by zero.

A further improved version was proposed by (Kingma et al 2015), called Adaptive Moment Estimation (Adam). In RMSProp the algorithm adapts the learning rate based on the second moment (squared gradients) only; Adam improves this further by also using the exponential moving average of the first moment (the gradients themselves) in the update. An advantage of Adam is that it can handle sparse gradients on noisy data. The pseudocode of Adam is given below.

Algorithm 3 Adaptive Moment Estimation (Adam)
  Initialise a vector of random weights w, learning rate lr, v = 0, m = 0, b = 0.9 and b1 = 0.999
  while Loss > error do
      Randomly shuffle the training data of size n
      for i = 1, 2, ..., n do
          m_current = b · m_previous + (1 − b) · ∂Loss/∂w_current
          v_current = b1 · v_previous + (1 − b1) · [∂Loss/∂w_current]²
          m̂_current = m_current / (1 − bᵗ)
          v̂_current = v_current / (1 − b1ᵗ)
          w_new = w_current − lr/(√(v̂_current) + ε) · m̂_current
      end for
  end while

where b is the exponential decay rate for the first moment, b1 is the exponential decay rate for the second moment, and t is the iteration counter.
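For concreteness, the following is a minimal NumPy sketch of Algorithm 3 applied to a toy quadratic loss (the function names and the toy objective are ours, not part of our pricing implementation):

import numpy as np

def adam(grad, w, lr=0.001, b=0.9, b1=0.999, eps=1e-8, steps=5000):
    """Minimise a loss given its gradient function, following Algorithm 3."""
    m = np.zeros_like(w)                 # first-moment estimate
    v = np.zeros_like(w)                 # second-moment estimate
    for t in range(1, steps + 1):
        g = grad(w)
        m = b * m + (1 - b) * g          # moving average of gradients
        v = b1 * v + (1 - b1) * g**2     # moving average of squared gradients
        m_hat = m / (1 - b**t)           # bias corrections
        v_hat = v / (1 - b1**t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Toy example: minimise Loss(w) = ||w - 3||^2, whose gradient is 2(w - 3)
w_opt = adam(lambda w: 2.0 * (w - 3.0), w=np.zeros(4))
print(w_opt)  # approaches [3. 3. 3. 3.]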

4.1.7 Epochs, Early Stopping and Batch Size

An epoch is one pass of the entire training data set through the ANN. Generally we want the number of epochs to be as high as possible; however, this may cause the ANN to over-fit. Early stopping is useful here to stop the training before the ANN becomes over-fit: it works by stopping the training process when the loss function shows no improvement. The batch size is the number of input samples processed before the model parameters are updated.
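In Keras, for instance, early stopping on the validation loss takes one line (a sketch; the patience of 25 epochs anticipates the setting used in Chapter 6):

from tensorflow.keras.callbacks import EarlyStopping

# Stop when the validation loss has not improved for 25 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=25,
                           restore_best_weights=True)
# Later passed to training as: model.fit(..., callbacks=[early_stop])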

4.2 Feedforward Neural Networks

Feedforward Neural Networks are the most common type of ANN in practical applications. They are called feedforward because the training data only flows from the input layer to the output layer; there is no feedback loop within the network.

4.2.1 Deep Neural Networks

Deep Neural Networks (DNN) are Feedforward Neural Networks with more than one hidden layer; the depth here refers to the number of hidden layers. This is the type of ANN of our interest, and the following theorems provide justifications.

Theorem 4.2.1. (Hornik et al 1989) Universal Approximation Theorem. Let NN^a_{d_in, d_out} be the set of neural networks with activation function f_a : ℝ → ℝ, input dimension d_in ∈ ℕ and output dimension d_out ∈ ℕ. Then, if f_a is continuous and non-constant, NN^a_{d_in, d_out} is dense in the Lebesgue space L^p(μ) for all finite measures μ.

Theorem 4.2.2. (Hornik et al 1990) Universal Approximation Theorem for Derivatives. Let the function to be approximated be F* ∈ C^n, the mapping of the neural network be F : ℝ^{d_0} → ℝ, and NN^a_{d_in, d_out} be the set of single-layer neural networks with activation function f_a : ℝ → ℝ, input dimension d_in ∈ ℕ and output dimension 1. Then, if the (non-constant) activation function satisfies f_a ∈ C^n(ℝ), NN^a_{d_in, 1} arbitrarily approximates F* and all its derivatives up to order n.

Theorem 4.2.3. (Eldan et al 2016) The Power of Depth of Neural Networks. There exists a simple function on ℝ^d, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network to more than a certain constant accuracy, unless its width is exponential in the dimension.

According to Theorem 4.2.2, it is crucial to have an activation function that is n-differentiable if the approximation of the nth derivative of the function F* is required. Theorem 4.2.3 motivates the choice of a deep network: it states that the depth of a network can improve its predictive capabilities. However, it is worth noting that networks deeper than 4 hidden layers do not provide much performance gain (Ioffe et al 2015).

4.3 Common Pitfalls and Remedies

Here we discuss some common pitfalls in ANN training and the corresponding remedies.

4.3.1 Loss Function Stagnation

During training, the loss function may display only infinitesimal improvements after each epoch.

To resolve this problem:

• Increase the batch size. This could potentially fix the issue, as it helps the ANN model to better learn the relationships between data samples and data labels before updating the parameters.

• Increase the learning rate. Too small a learning rate can cause the ANN model to get stuck in a local minimum.

• Use a deeper network. According to the power-of-depth theorem, deeper networks tend to perform better than shallower ones. However, if the ANN model already has 4 hidden layers, this method may not be suitable, as there is not much performance gain beyond that (see Chapter 4.2.1).

• Use a different activation function for the output layer. Make sure the range of the output activation function is appropriate for the problem.

4.3.2 Loss Function Fluctuation

During training, the loss function may fluctuate significantly from epoch to epoch instead of declining steadily. Potential fixes include:

• Increase the batch size. The ANN model might not be able to interpret the true relationships between input and output data before each update when the batch size is too small.

• Lower the initialised learning rate.

4.3.3 Over-fitting

Over-fitting occurs when the ANN model performs well on training data but fails to generalise on unseen test data.

Potential fixes include:

• Reduce the complexity of the ANN model. Limit the number of hidden layers to 4, as (Eldan et al 2016) shows that not much advantage can be achieved beyond 4 hidden layers. Furthermore, experiment with narrower (fewer neurons) hidden layers.

• Introduce early stopping. Reduce the number of epochs if more epochs show only little improvement. It is important to note that the stopping criterion should be based on the validation loss rather than the training loss.

• Regularisation. This can reduce the complexity of the loss function and speed up convergence.

• K-fold cross-validation. This works by splitting the training data into K equal-size portions: K − 1 portions are used for training and 1 portion is used for validation, repeated K times with a different portion used for validation each time.

4.3.4 Under-fitting

Under-fitting occurs when the ANN model fails to make sense of the relationships between data samples and labels.

Potential fixes include:

• Increase the size of the training data. Complex functional mappings require the ANN model to train on more data before it can learn the relationships well.

• Increase the depth of the model. It is very likely that the model is not deep enough for the problem. As a rule of thumb, limit the number of hidden layers to 3-4 for complex problems.

4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation

For ANN's application to implied volatility prediction, we apply the image-based implicit method proposed by (Horvath et al 2019). It is allegedly a powerful and robust method for approximating implied volatility. Rather than using one implied volatility value as the network output, the image-based implicit method uses the entire implied volatility surface as the output. The implied volatility surface is represented by 10 × 10 pixels, with each pixel corresponding to one maturity and one strike price. The input data corresponding to one implied volatility surface output consists of all the model parameters except for strike price and maturity; these are implicitly learned by the network. This method offers the following benefits:

• More information is learned by the network with less input data, as the information of neighbouring volatility points is incorporated in the training process.

• The network learns the mapping from model parameters to the implied volatility surface directly. The volatility smile and term structure are available simultaneously, so it is more efficient than having only one single implied volatility value as output.

• The neighbouring volatility points can be numerically approximated almost instantly if needed.


FIGURE 4.2: The pixels of the implied volatility surface are represented by the 100 outputs of the neural network.

Chapter 5

Data

In this chapter we discuss the data for our applications. We explain how the data is generated and some essential data pre-processing procedures.

Simulated data is chosen to train the ANN model mainly for regulatory purposes. One might argue that training the ANN with market data directly would yield better predictions; however, evidence for the stability and robustness of this method is yet to be found, and there is no straightforward interpretation between the inputs and outputs of an ANN model trained this way. The main advantage of training the ANN with simulated data is that the functional relationship is known, and so the model is tractable. This property is important, as tractable models tend to be more transparent, which is required by regulations.

For neural network computation, Python is the logical language choice, as it has many powerful machine learning libraries such as Keras. To obtain good results, we require 5,000,000 rows of data for the ANN experiment; hence we choose C++ instead of Python for the data generation process, as it is computationally faster. We also use the lightweight header-only Python library pybind11, which can harmoniously integrate C++ and Python in one project (see Appendix A for a tutorial on pybind11).

5.1 Data Generation and Storage

The data generation process is done entirely in C++, mainly because of its execution speed. Initially we considered applying GPU parallel computing to speed up the process even further. After consulting with the supervisor, we decided to use the std::future class template instead. std::future belongs to the standard C++11 library and is very simple to implement compared to parallel computing; it allows the C++ program to run multiple functions asynchronously on different threads (see page 942 in (Duffy 2018) for an example). Our machine contains 2 physical cores with 2 threads per physical core. This allows us to run 4 operations asynchronously without overhead costs and therefore, in theory, improves the data generation speed four-fold. Note that asynchronous programming is not meant to replace parallel computing completely; in cases where execution speed is absolutely critical, parallel computing should be used. Without asynchronous programming, the data generation process for 5,000,000 rows of Heston option price data would take more than 8 hours; now it takes less than 2 hours to complete.

We follow the steps outlined in Appendix A to wrap the C++ program using the pybind11 Python library. The wrapped C++ function can be called in Python just like any other Python library. The pybind11 library is conceptually very similar to the Boost library; the reason we choose pybind11 is that it is a lightweight header-only library that is easy to use. This enables us to enjoy both the execution speed of C++ and access to the machine learning libraries of Python at the same time.

After the data is generated, it is stored in MySQL, so that we do not need to re-generate the data each time we restart the computer or when the Python IDE crashes unexpectedly. We use the mysql-connector Python library to communicate with MySQL from Python.
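As an illustrative sketch of this storage step (the connection details, table and column names below are hypothetical placeholders):

import mysql.connector

# Connect to a local MySQL instance (credentials are placeholders)
cnx = mysql.connector.connect(user='user', password='password',
                              host='localhost', database='option_data')
cursor = cnx.cursor()

# Insert one generated row; 'heston_prices' is a hypothetical table
row = (50.0, 45.0, 0.03, 0.0, 0.3, 1.0, 0.5, 0.8, 0.3, -0.5, 7.21)
cursor.execute(
    "INSERT INTO heston_prices "
    "(S, K, r, D, v0, T, theta, kappa, xi, rho, price) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", row)

cnx.commit()
cursor.close()
cnx.close()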

5.1.1 Heston Call Option Data

We follow the method and theory described in Chapter 3 to compute the call option price under the Heston model. We only consider call options in this project. To train an ANN that can also predict put option prices, one can simply add an extra input feature that takes only two values, 1 or −1, with 1 representing a call option and −1 a put option.

For this application we generated 5,000,000 rows of data, uniformly distributed over a specified range. In this context, one row of data contains all the input features and the corresponding output. We use 10 model parameters as the input features: spot price (S), strike price (K), risk-free rate (r), dividend yield (D), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ). In this case there is only one output, the call option price. See Table 5.1 for their respective ranges.

TABLE 5.1: Heston Model Call Option Data

         Model Parameters                Range
Input    Spot Price (S)                  ∈ [20, 80]
         Strike Price (K)                ∈ [40, 55]
         Risk-free Rate (r)              ∈ [0.03, 0.045]
         Dividend Yield (D)              ∈ [0, 0.1]
         Instantaneous Volatility (ν)    ∈ [0.2, 0.6]
         Maturity (T)                    ∈ [0.03, 2.0]
         Long-term Volatility (θ)        ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)         ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)    ∈ [0.2, 0.5]
         Correlation (ρ)                 ∈ [−0.95, 0.95]
Output   Call Option Price (C)           ∈ ℝ>0


5.1.2 Heston Implied Volatility Data

We generate the Heston implied volatility data with the theory explained in Chapter 3. For the ANN application in implied volatility approximation, we decided to use the image-based implicit method, in which the ANN output layer dimension is 100, with each output representing one pixel on the implied volatility surface.

Using this method, we need 100 times more data in order to have the same number of distinct rows of data as the normal point-wise method (e.g. Heston call option pricing). However, due to hardware constraints, we decided to generate 10,000,000 rows of uniformly distributed data; that translates to only 100,000 distinct rows of data.

We use 6 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ).

Note that strike price and maturity are not used as input features, as mentioned in Chapter 4.4. This is because the shape of one full implied volatility surface depends on the 6 model parameters mentioned above; the strike price and maturity correspond to one specific point on the full implied volatility surface. Nevertheless, we still require these two parameters to generate the implied volatility data. We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

In this case there are 10 × 10 = 100 outputs, i.e. the implied volatility surface. See Table 5.2 for their respective ranges.

TABLE 5.2: Heston Model Implied Volatility Data

         Model Parameters                 Range
Input    Spot Price (S)                   ∈ [0.2, 2.5]
         Instantaneous Volatility (ν)     ∈ [0.2, 0.6]
         Long-term Volatility (θ)         ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)          ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)     ∈ [0.2, 0.5]
         Correlation (ρ)                  ∈ [−0.95, 0.95]
Output   Implied Volatility Surface (σimp) ∈ ℝ^{10×10}_{>0}

5.1.3 Rough Heston Call Option Data

Again, we follow the theory in Chapter 3 and implement it in C++ to generate call option prices under the rough Heston model.

Similarly, we only consider call options here and generate 5,000,000 rows of data. We use 9 model parameters as the input features: strike price (K), risk-free rate (r), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). In this case there is only one output, the call option price. See Table 5.3 for their respective ranges.

TABLE 5.3: Rough Heston Model Call Option Data

         Model Parameters                 Range
Input    Strike Price (K)                 ∈ [0.5, 5]
         Risk-free Rate (r)               ∈ [0.02, 0.05]
         Instantaneous Volatility (ν)     ∈ [0, 1.0]
         Maturity (T)                     ∈ [0.1, 3.0]
         Long-term Volatility (θ)         ∈ [0.01, 0.1]
         Mean Reversion Rate (κ)          ∈ [1.0, 4.0]
         Volatility of Volatility (ξ)     ∈ [0.01, 0.5]
         Correlation (ρ)                  ∈ [−0.95, 0.95]
         Hurst Parameter (H)              ∈ [0.05, 0.1]
Output   Call Option Price (C)            ∈ ℝ>0

5.1.4 Rough Heston Implied Volatility Data

Finally, we generate the rough Heston implied volatility data with the theory explained in Chapter 3. As before, we use the image-based implicit method.

We use 10,000,000 rows of data, which translates to 100,000 distinct rows of data. We use 7 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

As before, there are 10 × 10 = 100 outputs, i.e. the rough Heston implied volatility surface. See Table 5.4 for their respective ranges.

TABLE 5.4: Rough Heston Model Implied Volatility Data

         Model Parameters                 Range
Input    Spot Price (S)                   ∈ [0.2, 2.5]
         Instantaneous Volatility (ν)     ∈ [0.2, 0.6]
         Long-term Volatility (θ)         ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)          ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)     ∈ [0.2, 0.5]
         Correlation (ρ)                  ∈ [−0.95, 0.95]
         Hurst Parameter (H)              ∈ [0.05, 0.1]
Output   Implied Volatility Surface (σimp) ∈ ℝ^{10×10}_{>0}


5.2 Data Pre-processing

In this section we discuss some necessary data pre-processing procedures: data splitting for validation and evaluation (testing), data standardisation and normalisation, and data partitioning for K-fold cross-validation.

5.2.1 Data Splitting: Train - Validate - Test

Generally, not 100% of the available data is used for training; we reserve some of the data for validation and for performance evaluation.

We split the data into three sets: 70% (train), 15% (validate) and 15% (test). The first 70% of the data is used for training, whilst 15% of the data is used for validation during training. The last 15% is for evaluating the trained ANN's predictive accuracy.

The validation data set is used for tuning the hyperparameters of the ANN; this is important to help prevent over-fitting of the network. The test data set is not used during training: it is the unseen (out-of-sample) data used for evaluating the trained ANN's predictions.

5.2.2 Data Standardisation and Normalisation

Before the data is fed into the ANN for training, it is essential to scale the data to speed up convergence and improve performance.

The magnitude of each data feature is different. It is essential to re-scale the data so that the ANN model will not misinterpret their relative importance based on their magnitudes. Typical scaling techniques include standardisation and normalisation.

Data standardisation is defined as
$$x_{\mathrm{scaled}} = \frac{x - x_{\mathrm{mean}}}{x_{\mathrm{std}}} \tag{5.1}$$
where

• x is the data

• x_mean is the mean of that data feature

• x_std is the standard deviation of that data feature

(Horvath et al 2019) claim that this works especially well for quantities that are positively unbounded, such as implied volatility.

For the other model parameters we apply data normalisation, which is defined as
$$x_{\mathrm{scaled}} = \frac{2x - (x_{\max} + x_{\min})}{x_{\max} - x_{\min}} \in [-1, 1] \tag{5.2}$$
where

• x is the data

• x_max is the maximum value of that data feature

• x_min is the minimum value of that data feature

After data scaling (standardisation or normalisation), the ANN is able to learn the true relationships between inputs and outputs without bias from the difference in their magnitudes.
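A minimal sketch of both scalings on a NumPy feature column (function names are ours; in practice the statistics computed on the training set must be reused to scale the validation and test sets):

import numpy as np

def standardise(x, mean, std):
    """Equation (5.1): zero mean, unit variance."""
    return (x - mean) / std

def normalise(x, xmin, xmax):
    """Equation (5.2): maps [xmin, xmax] onto [-1, 1]."""
    return (2.0 * x - (xmax + xmin)) / (xmax - xmin)

kappa = np.array([0.1, 0.55, 1.0])    # e.g. the mean-reversion rate column
print(normalise(kappa, 0.1, 1.0))     # [-1.  0.  1.]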

5.2.3 Data Partitioning

In this section we explain how the data is partitioned for K-fold cross-validation.

First we shuffle the entire training data set. Then the entire data set is divided into K equal-sized portions. Each portion in turn is used as the test data set while the ANN is trained on the other K − 1 portions. This is repeated K times, with each of the K portions used exactly once as the test data set. The average of the K results is the performance measure of the model.
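This partitioning scheme is available off the shelf, for example in scikit-learn (a sketch with dummy data; n_splits = 5 matches the K = 5 chosen in Chapter 6):

import numpy as np
from sklearn.model_selection import KFold

X = np.random.rand(1000, 10)    # dummy features
y = np.random.rand(1000)        # dummy labels

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # train on (X_train, y_train), evaluate on (X_test, y_test) ...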

FIGURE 5.1: Partitioning of the data for 5-fold cross-validation. Adapted from (Chapter 2 Modeling Process: k-fold cross validation).

Chapter 6

Neural Networks Architecture and Experimental Setup

We start this chapter by discussing the appropriate ANN architecture and experimental setup for our applications. We apply ANN to four different applications: option pricing under the Heston model, Heston implied volatility surface approximation, option pricing under the rough Heston model, and rough Heston implied volatility surface approximation. We first apply ANN to the Heston model as a concept validation and then proceed to the rough Heston model.

Subsequently, we discuss the relevant evaluation metrics and validation methods for our ANN models. Finally, we end this chapter with the implementation steps for ANN in Keras.

6.1 Neural Networks Architecture

Configuring an ANN's architecture is a combination of art and science. It requires experience and systematic experiments to decide the right parameters and architecture for a specific problem, and for a particular problem there might be multiple appropriate designs. To begin with, we use the network architecture of (Horvath et al 2019) and adjust the hyperparameters if it performs poorly for our applications.

FIGURE 6.1: Typical ANN Architecture for the Option Pricing Application

FIGURE 6.2: Typical ANN Architecture for Implied Volatility Surface Approximation

6.1.1 ANN Architecture: Option Pricing under the Heston Model

An ANN with 3 hidden layers is appropriate for this application. Each hidden layer has 50 neurons, with the exponential linear unit (ELU) as the activation function.

Initially, the rectified linear unit (ReLU) activation function was considered. It is the choice for many applications, as it can overcome the vanishing gradient problem whilst being less computationally expensive than other activation functions. However, it has some major drawbacks:

• The nth derivative of ReLU is not continuous for all n > 0.

• ReLU cannot produce negative outputs.

• For activation x < 0, ReLU has zero gradient.

From Theorem 4.2.2, we know that choosing an n-differentiable activation function is necessary for computing the nth derivatives of the functional mapping. This is crucial for our application, because we would be interested in option Greeks computation using the same ANN if it shows promising results for option pricing. For risk management purposes, selecting an activation function for which option Greeks can be calculated is crucial.

The obvious alternative is the ELU (defined in Chapter 4.1.4). For x < 0 it has a smooth output that approaches −α, and for x > 0 its output is positively unbounded, which is suitable for our application, as the values of option price and implied volatility are also, in theory, positively unbounded.

Since our problem is a regression problem, a linear activation function (defined in Chapter 4.1.5) is appropriate for the output layer, as its output is unbounded. We discover that a batch size of 200 and 30 epochs yield good accuracy.

To summarise, the ANN model for Heston option pricing is as follows (a Keras sketch follows the list):

• Input layer: 10 neurons

• Hidden layer 1: 50 neurons, ELU as activation function

• Hidden layer 2: 50 neurons, ELU as activation function

• Hidden layer 3: 50 neurons, ELU as activation function

• Output layer: 1 neuron, linear function as activation function
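In Keras, this architecture takes only a few lines (a sketch of the layer stack above; everything apart from the layer sizes and activations is illustrative):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(50, activation='elu', input_shape=(10,)),  # hidden layer 1
    Dense(50, activation='elu'),                     # hidden layer 2
    Dense(50, activation='elu'),                     # hidden layer 3
    Dense(1, activation='linear'),                   # call option price
])
model.summary()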

6.1.2 ANN Architecture: Implied Volatility Surface of the Heston Model

For approximating the full implied volatility surface of the Heston model, we apply the image-based implicit method (see Chapter 4.4). This implies the output dimension is 100.

We use 3 hidden layers with 80 neurons each. The activation function for the hidden layers is the exponential linear unit (ELU); as before, we use a linear activation function for the output layer. We discover that a batch size of 20 and 200 epochs yield good accuracy. Since the number of epochs is relatively high, we introduce an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

To summarise, the ANN model for Heston implied volatility surface approximation is:

• Input layer: 6 neurons

• Hidden layer 1: 80 neurons, ELU as activation function

• Hidden layer 2: 80 neurons, ELU as activation function

• Hidden layer 3: 80 neurons, ELU as activation function

• Output layer: 100 neurons, linear function as activation function


6.1.3 ANN Architecture: Option Pricing under the Rough Heston Model

For option pricing under the rough Heston model, the ANN model is:

• Input layer: 9 neurons

• Hidden layer 1: 50 neurons, ELU as activation function

• Hidden layer 2: 50 neurons, ELU as activation function

• Hidden layer 3: 50 neurons, ELU as activation function

• Output layer: 1 neuron, linear function as activation function

As before, we use a batch size of 200 and 30 epochs.

6.1.4 ANN Architecture: Implied Volatility Surface of the Rough Heston Model

The ANN model for rough Heston implied volatility surface approximation is:

• Input layer: 7 neurons

• Hidden layer 1: 80 neurons, ELU as activation function

• Hidden layer 2: 80 neurons, ELU as activation function

• Hidden layer 3: 80 neurons, ELU as activation function

• Output layer: 100 neurons, linear function as activation function

We use a batch size of 20 and 200 epochs, with an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

6.2 Performance Metrics and Validation Methods

We require some metrics to evaluate the performance of the ANN in terms of accuracy and speed.

For accuracy, we have introduced the MSE and MAE in Chapter 4.1.4. It is also common to use the coefficient of determination (R²) to quantify the performance of a regression model; it is a statistical measure of the proportion of the variance in the dependent variable that is explained by the independent variables. The formula is
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i^{\mathrm{actual}} - y_i^{\mathrm{predict}}\right)^2}{\sum_{i=1}^{n} \left(y_i^{\mathrm{actual}} - \bar{y}^{\mathrm{actual}}\right)^2} \tag{6.1}$$


We also use a user-defined accuracy metric, the maximum absolute error, defined as
$$\text{Max Abs Error} = \max_i\left(\left|y_i^{\mathrm{actual}} - y_i^{\mathrm{predict}}\right|\right) \tag{6.2}$$
This metric provides a measure of how large the error could be in the worst-case scenario. It is especially important from a risk management perspective.
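In Keras, such a metric can be supplied as a custom function at compile time (a sketch using the Keras backend; the function name is ours, and note that Keras evaluates metrics batch-wise before averaging):

import tensorflow.keras.backend as K

def max_abs_error(y_true, y_pred):
    """Worst-case absolute error, as in equation (6.2)."""
    return K.max(K.abs(y_true - y_pred))

# Used later as: model.compile(..., metrics=['mae', max_abs_error])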

As for the run-time performance of the ANN, we simply use the Python built-in timer function to quantify this property.

In Chapter 5.2.3 we explained the data partitioning process. The motivation for partitioning the data is to perform K-fold cross-validation. Cross-validation is a statistical technique to assess how well a predictive model generalises to an independent data set, and it is a good tool for identifying over-fitting.

As explained earlier, the process involves training and testing different permutations of the partitioned data sets multiple times. The indication of a well-generalised model is that the error magnitudes in each round are consistent with each other and with their average. We select K = 5 for our applications.


6.3 Implementation of ANN in Keras

In this section we outline the standard ANN computation process using Keras:

1. Construct the ANN model by specifying the number of input neurons, the number of hidden neurons, the number of hidden layers, the number of output neurons and the activation functions for each layer. Print the model summary to check for errors.

2. Configure the settings of the model using the compile function. We select the MSE as the loss function and specify MAE, R² and our user-defined maximum absolute error as the metrics. For all of our applications we choose Adam as the optimisation algorithm. To prevent over-fitting, we use the early stopping function, especially when the number of epochs is high. It is important to note that the validation loss (not the training loss) should be used as the early stopping criterion; this means the training stops early if there is no reduction in validation loss.

3. Start training the ANN model. It is important to specify the validation split argument here for splitting the training data into a train set and a validation set. Note that the test set should not be used as the validation set; it should be reserved for evaluating the ANN's predictions after training. (Steps 2 and 3 are sketched in code after this list.)

4. When the training is complete, generate the plot of the MSE (loss function) versus the number of epochs to visualise the training process. The MSE should decline gradually and then level off as the number of epochs increases. See Chapter 4.3 for potential training issues and remedies.

5. Apply 5-fold cross-validation to check for over-fitting. Evaluate the trained model using the evaluate function.

6. Generate predictions. Visualise the predictions by plotting predicted values versus actual values on the same graph.
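A condensed sketch of steps 2 and 3 follows (hyperparameters as for the implied volatility networks in Table 6.1; model, X_train and y_train are assumed to exist, e.g. from the sketch in Chapter 6.1.1):

from tensorflow.keras.callbacks import EarlyStopping

model.compile(optimizer='adam', loss='mse', metrics=['mae'])

early_stop = EarlyStopping(monitor='val_loss', patience=25,
                           restore_best_weights=True)

history = model.fit(X_train, y_train,
                    validation_split=0.15,   # hold out part of the training data
                    batch_size=20, epochs=200,
                    callbacks=[early_stop])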

See Appendix C for a full Python code example.


6.4 Summary of Neural Networks Architecture

TABLE 6.1: Summary of Neural Networks Architecture for Each Application

ANN Model                             Application                                Input Layer   Hidden Layers     Output Layer   Batch Size   Epochs
Heston Option Pricing ANN             Heston Call Option Pricing                 10 neurons    3 layers × 50     1 neuron       200          30
Heston Implied Volatility ANN         Heston Implied Volatility Surface          6 neurons     3 layers × 80     100 neurons    20           200
Rough Heston Option Pricing ANN       Rough Heston Call Option Pricing           9 neurons     3 layers × 50     1 neuron       200          30
Rough Heston Implied Volatility ANN   Rough Heston Implied Volatility Surface    7 neurons     3 layers × 80     100 neurons    20           200

Chapter 7

Results and Discussions

In this chapter we present our results and discuss our findings.

7.1 Heston Option Pricing ANN Results

Figure 7.1 shows the learning curves of the Heston option pricing ANN; the loss function used here is the MSE. As the ANN is trained, both the training loss and the validation loss decline until they converge. It took fewer than 30 epochs for the ANN to reach a satisfactory level of accuracy, and there is only a small gap between the training loss and the validation loss. This indicates the ANN model is well trained.

FIGURE 7.1: Plot of MSE Loss Function vs. Number of Epochs for the Heston Option Pricing ANN

Table 7.1 summarises the error metrics of the Heston option pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 2.00 × 10⁻⁶, 9.58 × 10⁻⁴, 6.50 × 10⁻³ and 0.999 respectively. The low error metrics and the high R² value imply that the ANN model generalises well and gives highly accurate predictions for option prices under the Heston model. Figure 7.2 attests to its high accuracy: almost all of the predictions lie on the perfect-prediction line (red line).

TABLE 7.1: Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
2.00 × 10⁻⁶   9.58 × 10⁻⁴   6.50 × 10⁻³     0.999

FIGURE 7.2: Plot of Predicted vs. Actual Values for the Heston Option Pricing ANN

TABLE 7.2: 5-fold Cross-Validation Results of Heston Option Pricing ANN

          MSE           MAE           Max Abs Error
Fold 1    5.78 × 10⁻⁷   5.45 × 10⁻⁴   2.04 × 10⁻³
Fold 2    9.90 × 10⁻⁷   7.69 × 10⁻⁴   2.40 × 10⁻³
Fold 3    7.15 × 10⁻⁷   6.21 × 10⁻⁴   2.20 × 10⁻³
Fold 4    1.24 × 10⁻⁶   8.67 × 10⁻⁴   2.53 × 10⁻³
Fold 5    6.54 × 10⁻⁷   5.79 × 10⁻⁴   2.21 × 10⁻³
Average   8.36 × 10⁻⁷   6.76 × 10⁻⁴   2.27 × 10⁻³

Table 7.2 shows the results from 5-fold cross-validation, which again confirm that the ANN model gives accurate predictions and generalises well to unseen data: the values of each fold are consistent with each other and with their average.


7.2 Heston Implied Volatility ANN Results

Figure 7.3 shows the learning curves of the Heston implied volatility ANN; again, we use the MSE as the loss function. The validation loss curve experiences some fluctuations during training; we experimented with various ANN setups, and this is the least fluctuating learning curve we could achieve. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level, and the gap between validation loss and training loss is negligible. Due to the comparatively more complex ANN architecture, it took about 200 epochs for the ANN to reach a satisfactory level of accuracy.

FIGURE 7.3: Plot of MSE Loss Function vs. Number of Epochs for the Heston Implied Volatility ANN

Table 7.3 summarises the error metrics of the Heston implied volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 6.24 × 10⁻⁴, 1.27 × 10⁻², 3.71 × 10⁻¹ and 0.999 respectively. The low error metrics and the high R² value imply that the ANN model generalises well and gives accurate predictions for the implied volatility surface under the Heston model. Figure 7.4 shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.3: Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
6.24 × 10⁻⁴   1.27 × 10⁻²   3.71 × 10⁻¹     0.999

FIGURE 7.4: Heston Implied Volatility Smiles: ANN Predictions vs. Actual Values


FIGURE 7.5: Mean Absolute Error of the Full Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.5 shows the MAE values of the Heston implied volatility ANN predictions across the entire implied volatility surface. The largest MAE value is 0.0045, and a large area of the surface has MAE values lower than 0.0030. It is worth noting that the accuracy is relatively poor when the strike price is at the extremes and the maturity is small; however, the ANN gives good predictions overall for the whole surface.

The 5-fold cross-validation results for the Heston implied volatility ANN model show consistent values; the results are summarised in Table 7.4. The consistent values indicate that the ANN model is well trained and able to generalise to unseen test data.

TABLE 7.4: 5-fold Cross-Validation Results of Heston Implied Volatility ANN

          MSE           MAE           Max Abs Error
Fold 1    8.15 × 10⁻⁴   1.13 × 10⁻²   3.88 × 10⁻¹
Fold 2    4.74 × 10⁻⁴   1.08 × 10⁻²   3.13 × 10⁻¹
Fold 3    1.37 × 10⁻³   1.74 × 10⁻²   4.46 × 10⁻¹
Fold 4    3.38 × 10⁻⁴   8.61 × 10⁻³   2.19 × 10⁻¹
Fold 5    4.60 × 10⁻⁴   1.02 × 10⁻²   2.67 × 10⁻¹
Average   6.92 × 10⁻⁴   1.12 × 10⁻²   3.27 × 10⁻¹

7.3 Rough Heston Option Pricing ANN Results

Figure 7.6 shows the learning curves of the rough Heston option pricing ANN; as before, the MSE is used as the loss function. As the training progresses, both the training loss and the validation loss decline until they converge. Since this is a relatively simple model, it took fewer than 30 epochs of training to reach a good level of accuracy. The small discrepancy between the training loss and the validation loss indicates the ANN model is well trained.

FIGURE 7.6: Plot of MSE Loss Function vs. Number of Epochs for the Rough Heston Option Pricing ANN

Table 7.5 summarises the error metrics of the rough Heston option pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 9.00 × 10⁻⁶, 2.28 × 10⁻³, 1.76 × 10⁻¹ and 0.999 respectively. These metrics imply that the ANN model generalises well and gives highly accurate predictions for option prices under the rough Heston model. Figure 7.7 attests to its high accuracy again: most of the predictions lie on the perfect-prediction line (red line).

TABLE 7.5: Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
9.00 × 10⁻⁶   2.28 × 10⁻³   1.76 × 10⁻¹     0.999


FIGURE 7.7: Plot of Predicted vs. Actual Values for the Rough Heston Option Pricing ANN

TABLE 7.6: 5-fold Cross-Validation Results of Rough Heston Option Pricing ANN

          MSE           MAE           Max Abs Error
Fold 1    3.79 × 10⁻⁶   1.35 × 10⁻³   5.99 × 10⁻³
Fold 2    2.33 × 10⁻⁶   9.96 × 10⁻⁴   4.89 × 10⁻³
Fold 3    2.06 × 10⁻⁶   9.88 × 10⁻⁴   4.41 × 10⁻³
Fold 4    2.16 × 10⁻⁶   1.08 × 10⁻³   4.07 × 10⁻³
Fold 5    1.84 × 10⁻⁶   1.01 × 10⁻³   3.74 × 10⁻³
Average   2.44 × 10⁻⁶   1.08 × 10⁻³   4.62 × 10⁻³

The highly consistent values from the 5-fold cross-validation again confirm that the ANN model is able to generalise to unseen data.

7.4 Rough Heston Implied Volatility ANN Results

Figure 7.8 shows the learning curves of the rough Heston implied volatility ANN. The validation loss curve experiences some fluctuations during training; again, we select the least fluctuating setup. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level, and the discrepancy between validation loss and training loss is negligible. The training was initially set to 200 epochs, but it stopped early at around 85 epochs, since the validation loss had not improved for 25 epochs; it achieves good accuracy nonetheless.


FIGURE 7.8: Plot of MSE Loss Function vs. Number of Epochs for the Rough Heston Implied Volatility ANN

Table 7.7 summarises the error metrics of the rough Heston implied volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 5.66 × 10⁻⁴, 1.08 × 10⁻², 3.49 × 10⁻¹ and 0.999 respectively. These metrics show that the ANN model produces accurate predictions on unseen data. This is again confirmed by Figure 7.9, which shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.7: Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
5.66 × 10⁻⁴   1.08 × 10⁻²   3.49 × 10⁻¹     0.999

FIGURE 7.9: Rough Heston Implied Volatility Smiles: ANN Predictions vs. Actual Values


FIGURE 7.10: Mean Absolute Error of the Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.10 shows the MAE values of the rough Heston implied volatility ANN predictions across the entire implied volatility surface. The largest MAE value is below 0.0030, and a large area of the surface has MAE values lower than 0.0015. It is worth noting that the accuracy is relatively poor when the strike price and maturity values are small; however, the ANN gives good predictions overall for the whole surface. In fact, it is more accurate than the Heston implied volatility ANN model despite having one extra input and a similar architecture; we attribute this entirely to the stochastic nature of the optimiser.

The 5-fold cross-validation results for the rough Heston implied volatility ANN model show consistent values, just like the previous three cases. The results are summarised in Table 7.8. The consistency of the values indicates that the ANN model is well trained and able to generalise to unseen test data.

TABLE 7.8: 5-fold Cross-Validation Results of Rough Heston Implied Volatility ANN

          MSE           MAE           Max Abs Error
Fold 1    8.15 × 10⁻⁴   1.13 × 10⁻²   3.88 × 10⁻¹
Fold 2    4.74 × 10⁻⁴   1.08 × 10⁻²   3.13 × 10⁻¹
Fold 3    1.37 × 10⁻³   1.74 × 10⁻²   4.46 × 10⁻¹
Fold 4    3.38 × 10⁻⁴   8.61 × 10⁻³   2.19 × 10⁻¹
Fold 5    4.60 × 10⁻⁴   1.02 × 10⁻²   2.67 × 10⁻¹
Average   6.92 × 10⁻⁴   1.12 × 10⁻²   3.27 × 10⁻¹


7.5 Run-time Performance

In this section we present the training time and the execution speed of the ANN versus the traditional methods. The speed tests are run on a machine with an Intel i5-7200U chip (up to 3.10 GHz) and 8 GB of RAM.

Table 7.9 summarises the training time required for each application. Although the training time per epoch for the option pricing applications is higher than that for the implied volatility applications, the resulting total training time is similar for both types of applications, because the number of epochs for training the implied volatility ANN models is higher.

We also emphasise that the number of distinct rows of data available for implied volatility ANN training is smaller; thus the training time is similar to that of the option pricing ANN training, despite the more complex ANN architecture.

TABLE 7.9: Training Time

Application                  Training Time per Epoch   Total Training Time
Heston Option Pricing        30 s                      900 s
Heston Implied Volatility    5 s                       1000 s
rHeston Option Pricing       30 s                      900 s
rHeston Implied Volatility   5 s                       1000 s

TABLE 7.10: Execution Time for Predictions using ANN and Traditional Methods

Application                  Traditional Methods   ANN
Heston Option Pricing        8.84 ms               57.9 ms
Heston Implied Volatility    2.31 ms               43.7 ms
rHeston Option Pricing       6.52 ms               52.6 ms
rHeston Implied Volatility   2.87 ms               48.3 ms

Table 7.10 summarises the execution times for generating predictions using the ANN and the traditional methods. Before the experiment, we expected the ANN to generate predictions faster. However, the results show that the ANN is actually up to 7 times slower than the traditional methods for option pricing (both Heston and rough Heston), and up to 18 times slower than the traditional methods for implied volatility surface approximation (both Heston and rough Heston). This shows the traditional approximation methods are more efficient.

Chapter 8

Conclusions and Outlook

To summarise, the results show that the ANN has the ability to approximate option prices and implied volatilities under the Heston and rough Heston models to a high degree of accuracy. However, the efficiency of the ANN is not as good as that of the traditional methods; hence we conclude that such an approach may be considered an ineffective alternative. There are more efficient methods, such as the numerical integration of the FFT for option price computation, and the traditional method we used for approximating implied volatility does not even require any numerical schemes, hence its efficiency.

As for future projects, it would be interesting to investigate the rough Bergomi model and applications with exotic options such as lookback and digital options. Moreover, it would also be worthwhile to apply gradient-free optimisers for ANN training; some interesting examples include differential evolution, dual annealing and simplicial homology global optimisers.

Appendix A

pybind11: A step-by-step tutorial with examples

This tutorial assumes basic knowledge of C++ and Python.

Sources: Create a C++ extension for Python; Using the C++ eigen library to calculate matrix inverse and determinant; pybind11 Documentation; Eigen: The Matrix Class.

A.1 Software Prerequisites

• Visual Studio 2017 or later, with both the Desktop Development with C++ and Python Development workloads installed with default options.

• In the Python Development workload, also select the box on the right for Python native development tools. This option sets up most of the configuration required. (This option also includes the C++ workload automatically.)

Source: Create a C++ extension for Python.


A.2 Create a Python Project

1. To create a Python project in Visual Studio, select File > New > Project. Search for "Python", select the Python Application template, give it a suitable name and location, and select OK.

2. pybind11 requires that you use a 32-bit Python interpreter (Python 3.6 or above recommended). In the Solution Explorer window of Visual Studio, expand the project node, then expand the Python Environments node. If you don't see a 32-bit environment as the default (either in bold, or labeled with "global default"), then right-click the Python Environments node and select Add Environment, or select Add Environment from the environment drop-down in the Python toolbar.

Once in the Add Environment dialog box, select the Existing environment tab, then select the 32-bit interpreter from the Environment drop-down list.

If you don't have a 32-bit interpreter installed, you can install standard Python interpreters from the Add Environment dialog. Select the Add Environment command in the Python Environments window or the Python toolbar, select the Python installation tab, indicate which interpreters to install, and select Install.


If you already added an environment other than the global default to a project, you may need to activate a newly added environment. Right-click that environment under the Python Environments node and select Activate Environment. To remove an environment from the project, select Remove.


A.3 Create a C++ Project

1. A Visual Studio solution can contain both Python and C++ projects together. To do so, right-click the solution in Solution Explorer and select Add > New Project.

2. Search on "C++", select Empty project, specify the name "test", and select OK.

3. Create a C++ file in the new project by right-clicking the Source Files node, then select Add > New Item, select C++ File, name it main.cpp, and select OK.

4. Right-click the C++ project in Solution Explorer and select Properties.

5. At the top of the Property Pages dialog that appears, set Configuration to All Configurations and Platform to Win32.


6. Set the specific properties as described in the following table, then select OK.

7. Right-click the C++ project and select Build to test your configurations (both Debug and Release). The .pyd files are located in the solution folder under Debug and Release, not the C++ project folder itself.

8. Add the following code to the C++ project's main.cpp file:

    #include <iostream>

    int main() {
        std::cout << "Hello World from C++" << std::endl;

        return 0;
    }

9. Build the C++ project again to confirm that the code is working.

A.4 Convert the C++ project to extension for Python

To make the C++ DLL into an extension for Python, first modify the exported methods to interact with Python types. Then add a function that exports the module (the main function).

1. Install pybind11 using pip:

    pip install pybind11

or

    py -m pip install pybind11

2. At the top of main.cpp, include pybind11.h:

    #include <pybind11/pybind11.h>


3. At the bottom of main.cpp, use the PYBIND11_MODULE macro to define the entry point to the C++ function:

    namespace py = pybind11;

    PYBIND11_MODULE(test, m) {
        m.def("main_func", &main, R"pbdoc(
            pybind11 simple example
        )pbdoc");

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

4. Set the target configuration to Release and build the C++ project to verify your code.

The C++ module may fail to compile for the following reasons:

• Unable to locate Python.h (E1696: cannot open source file "Python.h" and/or C1083: Cannot open include file: "Python.h": No such file or directory): verify that the path in C/C++ > General > Additional Include Directories in the project properties points to your Python installation's include folder. See the table in Step 6.

• Unable to locate Python libraries: verify that the path in Linker > General > Additional Library Directories in the project properties points to the Python installation's libs folder. See the table in Step 6.

• Linker errors related to target architecture: change the C++ target's project architecture to match that of your Python installation. For example, if you're targeting x64 with the C++ project but your Python installation is x86, change the C++ project to target x86.

A.5 Make the DLL available to Python

1. In Solution Explorer, right-click the References node in the Python project, and then select Add Reference. In the dialog that appears, select the Projects tab, select the test project, and then select OK.


2. Run the Visual Studio installer, select Modify, then select Individual Components > Compilers, build tools, and runtimes > Visual C++ 2015.3 v140 toolset. This step is necessary because Python (for Windows) is itself built with Visual Studio 2015 (version 14.0) and expects that those tools are available when building an extension through the method described here. (Note that you may need to install a 32-bit version of Python and target the DLL to Win32 and not x64.)


3. Create a file named setup.py in the C++ project by right-clicking the project and selecting Add > New Item. Then select C++ File (.cpp), name the file setup.py, and select OK (naming the file with the .py extension makes Visual Studio recognise it as Python despite using the C++ file template).

Then paste the following code into the setup.py file:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add pybind11 folder and Python include folder
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

4. The setup.py code instructs Python to build the extension using the Visual Studio 2015 C++ toolset when used from the command line.

Open an elevated command prompt, navigate to the folder that contains setup.py, and enter the following command:

    python setup.py install

A.6 Call the DLL from Python

After the DLL is made available to Python, we can now call test.main_func() from Python code.

1. Add the following lines in your .py file to call methods exported from the DLL:


    import test

    if __name__ == "__main__":
        test.main_func()

2. Run the Python program (Debug > Start without Debugging). The output should look like the following.
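Given the main function defined in Section A.3, the expected console output is simply:

    Hello World from C++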

A.7 Example: Store C++ Eigen Matrix as NumPy Arrays

This example requires the use of Eigen, a C++ template library. Due to its popularity, pybind11 provides transparent conversion and limited mapping support between Eigen and scientific Python linear algebra data types.

1. If you have not already, visit http://eigen.tuxfamily.org/index.php?title=Main_Page and click on "zip" to download the zip file.

Once the download is complete, decompress the zip file in a suitable location and note down the file path of the extracted folder.


2. In the Solution Explorer, right-click the C++ project and select Properties. Navigate to the C/C++ > General tab, add the Eigen path to the Additional Include Directories property, then select OK.

3. Right-click the C++ project and select Build to test your configurations (both Debug and Release).

4. Replace the code in the main.cpp file with the following code:

    #include <iostream>

    // pybind11
    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>

    // Eigen
    #include <Eigen/Dense>

    namespace py = pybind11;

    Eigen::Matrix3f example_mat(const int a,
        const int b, const int c) {
        Eigen::Matrix3f mat;
        mat << 5 * a, 1 * b, 2 * c,
               2 * a, 4 * b, 2 * c,
               1 * a, 3 * b, 3 * c;

        return mat;
    }

    Eigen::MatrixXd inv(Eigen::MatrixXd xs) {
        return xs.inverse();
    }

    double det(Eigen::MatrixXd xs) {
        return xs.determinant();
    }

    PYBIND11_MODULE(test, m) {
        m.def("example_mat", &example_mat);
        m.def("inv", &inv);
        m.def("det", &det);

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

Note that Eigen arrays are automatically converted to NumPy arrays simply by including the pybind11/eigen.h header. A C++ function that has an Eigen return type can be used directly as a NumPy data type in Python. Build the C++ project to check that the code is working correctly.

5. Include the Eigen library's directory. Replace the code in setup.py with the following code:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add pybind11 folder, Python include folder and Eigen folder
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                      r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

As before, in an elevated command prompt, navigate to the folder that contains setup.py and enter the following command:

    python setup.py install

6. Then paste the following code in the Python file:

    import test
    import numpy as np

    if __name__ == "__main__":
        A = test.example_mat(1, 3, 1)
        print("Matrix A is\n", A)
        print("The inverse of A is\n", test.inv(A))
        print("The determinant of A is\n", test.det(A))


The output should be:
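The exact console formatting depends on NumPy's print settings, but the values (computed by hand from example_mat(1, 3, 1)) should be:

    Matrix A is
     [[ 5.  3.  2.]
      [ 2. 12.  2.]
      [ 1.  9.  3.]]
    The inverse of A is
     [[ 0.2     0.1    -0.2   ]
      [-0.0444  0.1444 -0.0667]
      [ 0.0667 -0.4667  0.6   ]]
    The determinant of A is
     90.0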

A.8 Example: Store Data Generated by Analytical Heston Model as NumPy Arrays

Only the relevant parts of the code are displayed here. To request the code for the entire project, contact the author at cko22@bath.edu.

This example demonstrates how to use pybind11 for:

• a C++ project with multiple files;

• storing a C++ Eigen matrix as NumPy arrays.

1. For a multiple-file C++ project, we need to include the source files in setup.py. This example also uses the Eigen library, so its directory will also be included. Paste the following code in setup.py:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'Data_Generation',
        sources=[r'module.cpp', r'Data.cpp', r'Heston.cpp',
                 r'IntegrationScheme.cpp', r'IntegratorImp.cpp',
                 r'NumIntegrator.cpp', r'pdetfunction.cpp',
                 r'pfunction.cpp', r'Range.cpp'],
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                      r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='Data_Generation',
        version='1.0',
        description='Python package with Data_Generation C++ extension (PyBind11)',
        license='MIT',
        ext_modules=[sfc_module],
    )

Then, in a command prompt, go to the folder that contains setup.py and enter the following command:

    python setup.py install

2. The code in module.cpp is:

    // For analytic solution in case of Heston model
    #include "Heston.hpp"
    #include <ctime>
    #include "Data.hpp"
    #include <future>

    // Inputting and Outputting
    #include <iostream>
    #include <vector>

    // pybind11
    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>
    #include <pybind11/stl_bind.h>

    // Eigen
    #include <C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7\Eigen\Dense>
    #include <Eigen/Dense>

    namespace py = pybind11;

    struct CPPData {
        // Input Data
        std::vector<double> spot_price;
        std::vector<double> strike_price;
        std::vector<double> risk_free_rate;
        std::vector<double> dividend_yield;
        std::vector<double> initial_vol;
        std::vector<double> maturity_time;
        std::vector<double> long_term_vol;
        std::vector<double> mean_reversion_rate;
        std::vector<double> vol_vol;
        std::vector<double> price_vol_corr;

        // Output Data
        std::vector<double> option_price;
    };

    // ...
    // do something
    // ...

    Eigen::MatrixXd module() {
        CPPData cppdata;
        simulation(cppdata);

        // Convert each std::vector to an Eigen vector first
        // (Map and MatrixXd are from the Eigen namespace)
        // Input Data
        Map<Eigen::VectorXd> eigen_spot_price(cppdata.spot_price.data(), cppdata.spot_price.size());
        Map<Eigen::VectorXd> eigen_strike_price(cppdata.strike_price.data(), cppdata.strike_price.size());
        Map<Eigen::VectorXd> eigen_risk_free_rate(cppdata.risk_free_rate.data(), cppdata.risk_free_rate.size());
        Map<Eigen::VectorXd> eigen_dividend_yield(cppdata.dividend_yield.data(), cppdata.dividend_yield.size());
        Map<Eigen::VectorXd> eigen_initial_vol(cppdata.initial_vol.data(), cppdata.initial_vol.size());
        Map<Eigen::VectorXd> eigen_maturity_time(cppdata.maturity_time.data(), cppdata.maturity_time.size());
        Map<Eigen::VectorXd> eigen_long_term_vol(cppdata.long_term_vol.data(), cppdata.long_term_vol.size());
        Map<Eigen::VectorXd> eigen_mean_reversion_rate(cppdata.mean_reversion_rate.data(), cppdata.mean_reversion_rate.size());
        Map<Eigen::VectorXd> eigen_vol_vol(cppdata.vol_vol.data(), cppdata.vol_vol.size());
        Map<Eigen::VectorXd> eigen_price_vol_corr(cppdata.price_vol_corr.data(), cppdata.price_vol_corr.size());
        // Output Data
        Map<Eigen::VectorXd> eigen_option_price(cppdata.option_price.data(), cppdata.option_price.size());

        const int cols{ 11 };      // No. of columns
        const int rows = SIZE / 4;
        // Define a matrix that stores training data to be used in Python
        MatrixXd training(rows, cols);
        // Initialise each column using the vectors
        training.col(0) = eigen_spot_price;
        training.col(1) = eigen_strike_price;
        training.col(2) = eigen_risk_free_rate;
        training.col(3) = eigen_dividend_yield;
        training.col(4) = eigen_initial_vol;
        training.col(5) = eigen_maturity_time;
        training.col(6) = eigen_long_term_vol;
        training.col(7) = eigen_mean_reversion_rate;
        training.col(8) = eigen_vol_vol;
        training.col(9) = eigen_price_vol_corr;
        training.col(10) = eigen_option_price;

        std::cout << training << std::endl;

        return training;
    }

    PYBIND11_MODULE(Data_Generation, m) {

        m.def("module", &module);

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

Note that the module function's return type is an Eigen matrix. This can then be used directly as a NumPy data type in Python.

3. The C++ function can then be called from Python as shown:

    import Data_Generation as dg
    import mysql.connector
    import pandas as pd
    import sqlalchemy as db
    import time

    # This function calls the analytical Heston option pricer in C++
    # to simulate 5,000,000 option prices and store them in a MySQL database

    # ...
    # do something
    # ...

    if __name__ == "__main__":
        t0 = time.time()

        # A new table (optional)
        SQL_clean_table()

        target_size = 5000000   # Final table size
        batch_size = 500000     # Size generated each iteration

        # Get the number of rows existing in the table
        r = SQL_setup(batch_size)

        # Get the number of iterations
        iter = int(target_size / batch_size) - int(r / batch_size)
        print("Iteration(s) required: ", iter)
        count = 0

        print("Now Run C++ code...")
        for i in range(iter):
            count = count + 1
            print("----------------------------------------------------")
            print("Current iteration: ", count)
            cpp = dg.module()   # store data from C++

            # ...
            # do something
            # ...

            r1 = SQL_setup(batch_size)
            print("No. of rows in SQL Classic Heston Table Now: ", r1)
            print("\n")
        t1 = time.time()
        # Calculate total time for entire program
        total = t1 - t0
        print("THE ENTIRE PROGRAM TAKES", round(total), "s TO COMPLETE")


Appendix B

Asymptotic Expansions for General Stochastic Volatility Models

B.1 Asymptotic Expansions

We present the asymptotic expansions for general stochastic volatility models by (Lorig et al., 2019). Consider a strictly positive spot price S whose risk-neutral dynamics are given by

    S = e^X    (B.1)
    dX = −½ σ²(t, X, Y) dt + σ(t, X, Y) dW₁,    X(0) = x ∈ ℝ    (B.2)
    dY = f(t, X, Y) dt + β(t, X, Y) dW₂,    Y(0) = y ∈ ℝ    (B.3)
    ⟨dW₁, dW₂⟩ = ρ(t, X, Y) dt,    |ρ| < 1    (B.4)

The drift of X is selected to be −½σ² so that S = e^X is a martingale.

Let C(t) be the time-t value of a European call option maturing at time T > t, with payoff function P(X(T)). To value a European-style option using risk-neutral pricing, we must compute functions of the form

    u(t, x, y) = E[P(X(T)) | X(t) = x, Y(t) = y]

It is well known that the function u satisfies the Kolmogorov backward equation (Lorig et al., 2019)

    (∂t + A(t)) u(t, x, y) = 0,    u(T, x, y) = P(x)    (B.5)

where the operator A(t) is given explicitly by

    A(t) = a(t, x, y)(∂²x − ∂x) + f(t, x, y) ∂y + b(t, x, y) ∂²y + c(t, x, y) ∂x∂y    (B.6)

and where the functions a, b and c are defined as

    a(t, x, y) = ½ σ²(t, x, y)
    b(t, x, y) = ½ β²(t, x, y)
    c(t, x, y) = ρ(t, x, y) σ(t, x, y) β(t, x, y)
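As a quick consistency check, the operator in (B.6) is exactly the generator of (X, Y): applying Itô's formula to u(t, X, Y) under the dynamics (B.2)-(B.4) gives

    du = (∂t u − ½σ² ∂x u + ½σ² ∂²x u + f ∂y u + ½β² ∂²y u + ρσβ ∂x∂y u) dt + σ ∂x u dW₁ + β ∂y u dW₂

With a = ½σ², b = ½β² and c = ρσβ, the drift bracket is (∂t + A(t))u. Since u(t, X(t), Y(t)) is a conditional expectation and hence a martingale, the drift must vanish, which yields (B.5).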


Appendix C

Deep Neural Networks Library in Python: Keras

We show the code snippet for computing the ANN in Keras. The following Python code defines the ANN architecture for the rough Heston implied volatility approximation:

    # Define the neural network architecture
    import keras
    from keras import backend as K
    keras.backend.set_floatx('float64')
    from keras.callbacks import EarlyStopping

    # Construct the model; n is the number of input features (defined elsewhere)
    input_layer = keras.layers.Input(shape=(n,))

    hidden_layer1 = keras.layers.Dense(80, activation='elu')(input_layer)
    hidden_layer2 = keras.layers.Dense(80, activation='elu')(hidden_layer1)
    hidden_layer3 = keras.layers.Dense(80, activation='elu')(hidden_layer2)

    output_layer = keras.layers.Dense(100, activation='linear')(hidden_layer3)

    model = keras.models.Model(inputs=input_layer, outputs=output_layer)

    # Summarise layers
    model.summary()

    # User-defined metrics: R2 and Max Abs Error
    # R2
    def coeff_determination(y_true, y_pred):
        SS_res = K.sum(K.square(y_true - y_pred))
        SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
        return (1 - SS_res / (SS_tot + K.epsilon()))

    # Max Error
    def max_error(y_true, y_pred):
        return (K.max(abs(y_true - y_pred)))

    earlystop = EarlyStopping(monitor="val_loss",
                              min_delta=0,
                              mode="min",
                              verbose=1,
                              patience=25)

    # Prepare the model for training; pass opt (rather than the string 'adam')
    # so that the custom learning rate of 0.005 is actually used
    opt = keras.optimizers.Adam(learning_rate=0.005)
    model.compile(loss='mse', optimizer=opt,
                  metrics=['mae', coeff_determination, max_error])
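A hedged sketch of how this model would then be trained follows; the array names and the epoch and batch-size values below are illustrative assumptions, not values taken from the snippet above.

    # X_train, y_train, X_val, y_val: scaled training/validation sets (hypothetical names)
    history = model.fit(X_train, y_train,
                        validation_data=(X_val, y_val),
                        batch_size=1024,   # assumed value
                        epochs=500,        # upper bound; EarlyStopping may stop sooner
                        callbacks=[earlystop])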


Bibliography

Alos, Elisa et al. (June 2017). "Exponentiation of Conditional Expectations Under Stochastic Volatility". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2983180

Andersen, Leif et al. (Jan. 2007). "Moment Explosions in Stochastic Volatility Models". In: Finance and Stochastics 11, pp. 29-50. DOI: 10.1007/s00780-006-0011-7

Atkinson, Colin et al. (Jan. 2011). "Rational Solutions for the Time-Fractional Diffusion Equation". URL: https://www.researchgate.net/publication/220222966_Rational_Solutions_for_the_Time-Fractional_Diffusion_Equation

Bayer, Christian et al. (Oct. 2018). "Deep calibration of rough stochastic volatility models". URL: https://arxiv.org/abs/1810.03399

Black, Fischer et al. (May 1973). "The Pricing of Options and Corporate Liabilities". URL: https://www.cs.princeton.edu/courses/archive/fall09/cos323/papers/black_scholes73.pdf

Carr, Peter and Dilip B. Madan (Mar. 1999). "Option valuation using the fast Fourier transform". In: Journal of Computational Finance. URL: https://www.researchgate.net/publication/2519144_Option_Valuation_Using_the_Fast_Fourier_Transform

Chapter 2 Modeling Process: k-fold cross validation. https://bradleyboehmke.github.io/HOML/process.html. Accessed: 2020-08-22

Comte, Fabienne et al. (Oct. 1998). "Long Memory In Continuous-Time Stochastic Volatility Models". In: Mathematical Finance 8, pp. 291-323. URL: https://zulfahmed.files.wordpress.com/2015/10/comterenault19981.pdf

Cox, John C. et al. (Jan. 1985). "A Theory of the Term Structure of Interest Rates". URL: http://pages.stern.nyu.edu/~dbackus/BCZ/discrete_time/CIR_Econometrica_85.pdf

Create a C++ extension for Python. https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019. Accessed: 2020-07-21

Duffy, Daniel J. (Jan. 2018). Financial Instrument Pricing Using C++, Second Edition. John Wiley & Sons. ISBN: 978-0-470-97119-2

Duffy, Daniel J. et al. (2012). Monte Carlo Frameworks: Building customisable high-performance C++ applications. John Wiley & Sons, Inc. ISBN: 9780470060698

Dupire, Bruno (1994). "Pricing with a Smile". In: Risk Magazine, pp. 18-20

Eigen: The Matrix Class. https://eigen.tuxfamily.org/dox/group__TutorialMatrixClass.html. Accessed: 2020-07-22

Eldan, Ronen et al. (May 2016). "The Power of Depth for Feedforward Neural Networks". URL: https://arxiv.org/abs/1512.03965

Euch, Omar El et al. (Sept. 2016). "The characteristic function of rough Heston models". URL: https://arxiv.org/pdf/1609.02108.pdf

Euch, Omar El et al. (Mar. 2019). "Roughening Heston". URL: https://ssrn.com/abstract=3116887

Fouque, Jean-Pierre et al. (Aug. 2012). "Second Order Multiscale Stochastic Volatility Asymptotics: Stochastic Terminal Layer Analysis & Calibration". URL: https://arxiv.org/abs/1208.5802

Gatheral, Jim (2006). The Volatility Surface: A Practitioner's Guide. John Wiley & Sons, Inc. ISBN: 978-0-471-79251-2

Gatheral, Jim et al. (Oct. 2014). "Volatility is rough". URL: https://arxiv.org/abs/1410.3394

Gatheral, Jim et al. (Jan. 2019). "Rational approximation of the rough Heston solution". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3191578

Hebb, Donald O. (1949). The Organization of Behavior: A Neuropsychological Theory. Psychology Press, 1st edition (12 Jun. 2002). ISBN: 978-0805843002

Heston, Steven (Jan. 1993). "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options". URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.3204&rep=rep1&type=pdf

Hinton, Geoffrey et al. (Feb. 2015). "Overview of mini-batch gradient descent". URL: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

Hornik et al. (Mar. 1989). "Multilayer feedforward networks are universal approximators". URL: https://deeplearning.cs.cmu.edu/F20/document/readings/Hornik_Stinchcombe_White.pdf

Hornik et al. (Jan. 1990). "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks". URL: http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/univ_approx.pdf

Horvath, Blanka et al. (Jan. 2019). "Deep Learning Volatility". URL: https://ssrn.com/abstract=3322085

Hurst, Harold E. (Jan. 1951). "Long-term storage of reservoirs: an experimental study". URL: https://www.jstor.org/stable/2982267

Hutchinson, James M. et al. (July 1994). "A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks". URL: https://alo.mit.edu/wp-content/uploads/2015/06/A-Nonparametric-Approach-to-Pricing-and-Hedging-Derivative-Securities-via-Learning-Networks.pdf

Ioffe, Sergey et al. (Feb. 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". URL: https://arxiv.org/abs/1502.03167

Itkin, Andrey (Jan. 2010). "Pricing options with VG model using FFT". URL: https://arxiv.org/abs/physics/0503137

Kahl, Christian et al. (Jan. 2006). "Not-so-complex Logarithms in the Heston model". URL: http://www2.math.uni-wuppertal.de/~kahl/publications/NotSoComplexLogarithmsInTheHestonModel.pdf

Kingma, Diederik P. et al. (Feb. 2015). "Adam: A Method for Stochastic Optimization". URL: https://arxiv.org/abs/1412.6980

Kwok, Yue-Kuen et al. (2012). Handbook of Computational Finance, Chapter 21. Springer-Verlag Berlin Heidelberg. ISBN: 978-3-642-17254-0

Lewis, Alan (Sept. 2001). "A Simple Option Formula for General Jump-Diffusion and Other Exponential Levy Processes". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=282110

Liu, Shuaiqiang et al. (Apr. 2019a). "A neural network-based framework for financial model calibration". URL: https://arxiv.org/abs/1904.10523

Liu, Shuaiqiang et al. (Apr. 2019b). "Pricing options and computing implied volatilities using neural networks". URL: https://arxiv.org/abs/1901.08943

Lorig, Matthew et al. (Jan. 2019). "Explicit implied vols for multifactor local-stochastic vol models". URL: https://arxiv.org/abs/1306.5447v3

Mandara, Dalvir (Sept. 2019). "Artificial Neural Networks for Black-Scholes Option Pricing and Prediction of Implied Volatility for the SABR Stochastic Volatility Model". URL: https://www.datasim.nl/application/files/8115/7045/4929/1423101.pdf

McCulloch, Warren et al. (May 1943). "A Logical Calculus of The Ideas Immanent in Nervous Activity". URL: http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf

Mostafa, F. et al. (May 2008). "A neural network approach to option pricing". URL: https://www.witpress.com/elibrary/wit-transactions-on-information-and-communication-technologies/41/18904

Peters, Edgar E. (Jan. 1991). Chaos and Order in the Capital Markets: A New View of Cycles, Prices, and Market Volatility. John Wiley & Sons. ISBN: 978-0-471-13938-6

Peters, Edgar E. (Jan. 1994). Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons. ISBN: 978-0-471-58524-4. URL: https://www.amazon.co.uk/Fractal-Market-Analysis-Investment-Economics/dp/0471585246

pybind11 Documentation. https://pybind11.readthedocs.io/en/stable/advanced/cast/eigen.html. Accessed: 2020-07-22

Ruf, Johannes et al. (May 2020). "Neural networks for option pricing and hedging: a literature review". URL: https://arxiv.org/abs/1911.05620

Schmelzle, Martin (Apr. 2010). "Option Pricing Formulae using Fourier Transform: Theory and Application". URL: https://pfadintegral.com/docs/Schmelzle2010%20Fourier%20Pricing.pdf

Shevchenko, Georgiy (Jan. 2015). "Fractional Brownian motion in a nutshell". In: International Journal of Modern Physics: Conference Series 36. URL: https://www.worldscientific.com/doi/abs/10.1142/S2010194515600022

Stone, Henry (July 2019). "Calibrating Rough Volatility Models: A Convolutional Neural Network Approach". URL: https://arxiv.org/abs/1812.05315

Using the C++ eigen library to calculate matrix inverse and determinant. http://people.duke.edu/~ccc14/sta-663-2016/18G_C++_Python_pybind11.html#Using-the-C++-eigen-library-to-calculate-matrix-inverse-and-determinant. Accessed: 2020-07-22

Wilmott, Paul (2006). Paul Wilmott on Quantitative Finance, Vol. 3. John Wiley & Sons, Inc., 2nd Edition. ISBN: 978-0-470-01870-5



Declaration of Authorship

I, Chun Kiat ONG (ID: 2031611), declare that this thesis, titled "The Performance of Artificial Neural Networks on Rough Heston Model", and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a master degree at this University.

• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.

• Where I have consulted the published work of others, this is always clearly attributed.

• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

• I have acknowledged all main sources of help.

• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed: CHUN KIAT ONG

Date: 9th September 2020


UNIVERSITY OF BIRMINGHAM

Abstract

Birmingham Business School

MSc Mathematical Finance

The Performance of Artificial Neural Networks on Rough Heston Model

by Chun Kiat ONG (ID: 2031611)

There has been extensive research on finding various methods to price options more accurately and more quickly. The rise of artificial neural networks could provide an alternative means for this application. Although there exist many methods for approximating option prices under the rough Heston model, our objective is to investigate the performance of artificial neural networks on rough Heston model option pricing, as well as on implied volatility approximation, since it is part of the calibration process. We use simulated data to train the neural network instead of real market data, for regulatory reasons. We also adapt the image-based implicit training method for the ANN implied volatility approximation, where the output of the ANN is the entire implied volatility surface. The results show that artificial neural networks can indeed generate accurate option prices and implied volatility approximations, but they are not as efficient as the existing methods. Hence, we conclude that the ANN is a feasible, but ineffective, alternative method for pricing options under the rough Heston model.


Acknowledgements

I would like to take a moment to thank those who helped me throughout the writing of this dissertation.

First, I wish to thank my supervisor, Dr Daniel Duffy, for his invaluable support and expert guidance. I would also like to thank Dr Colin Rowat for designing such a challenging and rigorous course; I truly enjoyed it.

I would also like to take this opportunity to acknowledge the support and great love of my family, my girlfriend and my friends. They kept me going, and this work would not have been possible without them.

ix

Contents

Declaration of Authorship

Abstract

Acknowledgements

0 Project's Overview
  0.1 Project's Overview

1 Introduction
  1.1 Organisation of This Thesis

2 Literature Review
  2.1 Application of ANN in Finance
  2.2 Option Pricing Models
  2.3 The Research Gap

3 Option Pricing Models
  3.1 The Heston Model
    3.1.1 Option Pricing Under Heston Model
    3.1.2 Heston Model Implied Volatility Approximation
  3.2 Volatility is Rough
  3.3 The Rough Heston Model
    3.3.1 Rough Heston Pricing Equation
      3.3.1.1 Rational Approximation of Rough Heston Riccati Solution
      3.3.1.2 Option Pricing Using Fast Fourier Transform
    3.3.2 Rough Heston Model Implied Volatility

4 Artificial Neural Networks
  4.1 Dissecting the Artificial Neural Networks
    4.1.1 Neurons
    4.1.2 Input, Output and Hidden Layers
    4.1.3 Connections, Weights and Biases
    4.1.4 Forward and Backward Propagation, Loss Functions
    4.1.5 Activation Functions
    4.1.6 Optimisation Algorithms and Learning Rate
    4.1.7 Epochs, Early Stopping and Batch Size
  4.2 Feedforward Neural Networks
    4.2.1 Deep Neural Networks
  4.3 Common Pitfalls and Remedies
    4.3.1 Loss Function Stagnation
    4.3.2 Loss Function Fluctuation
    4.3.3 Over-fitting
    4.3.4 Under-fitting
  4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation

5 Data
  5.1 Data Generation and Storage
    5.1.1 Heston Call Option Data
    5.1.2 Heston Implied Volatility Data
    5.1.3 Rough Heston Call Option Data
    5.1.4 Rough Heston Implied Volatility Data
  5.2 Data Pre-processing
    5.2.1 Data Splitting: Train - Validate - Test
    5.2.2 Data Standardisation and Normalisation
    5.2.3 Data Partitioning

6 Neural Networks Architecture and Experimental Setup
  6.1 Neural Networks Architecture
    6.1.1 ANN Architecture: Option Pricing under Heston Model
    6.1.2 ANN Architecture: Implied Volatility Surface of Heston Model
    6.1.3 ANN Architecture: Option Pricing under Rough Heston Model
    6.1.4 ANN Architecture: Implied Volatility Surface of Rough Heston Model
  6.2 Performance Metrics and Validation Methods
  6.3 Implementation of ANN in Keras
  6.4 Summary of Neural Networks Architecture

7 Results and Discussions
  7.1 Heston Option Pricing ANN Results
  7.2 Heston Implied Volatility ANN Results
  7.3 Rough Heston Option Pricing ANN Results
  7.4 Rough Heston Implied Volatility ANN Results
  7.5 Run-time Performance

8 Conclusions and Outlook

A pybind11: A step-by-step tutorial with examples
  A.1 Software Prerequisites
  A.2 Create a Python Project
  A.3 Create a C++ Project
  A.4 Convert the C++ project to extension for Python
  A.5 Make the DLL available to Python
  A.6 Call the DLL from Python
  A.7 Example: Store C++ Eigen Matrix as NumPy Arrays
  A.8 Example: Store Data Generated by Analytical Heston Model as NumPy Arrays

B Asymptotic Expansions for General Stochastic Volatility Models
  B.1 Asymptotic Expansions

C Deep Neural Networks Library in Python: Keras

Bibliography


List of Figures

1    Implementation Flow Chart
2    Data flow diagram for ANN on Rough Heston Pricer

1.1  Problem Solving Process

3.1  Paths of fBm for different values of H. Adapted from (Shevchenko, 2015)
3.2  Implied volatility smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019)

4.1  A Simple Neural Network
4.2  The pixels of the implied volatility surface are represented by the outputs of the neural network

5.1  How the data is partitioned for 5-fold cross validation. Adapted from (Chapter 2 Modeling Process: k-fold cross validation)

6.1  Typical ANN Architecture for Option Pricing Application
6.2  Typical ANN Architecture for Implied Volatility Surface Approximation

7.1  Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN
7.2  Plot of Predicted vs Actual Values for Heston Option Pricing ANN
7.3  Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN
7.4  Heston Implied Volatility Smiles: ANN Predictions vs Actual Values
7.5  Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data
7.6  Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN
7.7  Plot of Predicted vs Actual Values for Rough Heston Pricing ANN
7.8  Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN
7.9  Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values
7.10 Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data


List of Tables

5.1  Heston Model Call Option Data
5.2  Heston Model Implied Volatility Data
5.3  Rough Heston Model Call Option Data
5.4  Rough Heston Model Implied Volatility Data

6.1  Summary of Neural Networks Architecture for Each Application

7.1  Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data
7.2  5-fold Cross Validation Results of Heston Option Pricing ANN
7.3  Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data
7.4  5-fold Cross Validation Results of Heston Implied Volatility ANN
7.5  Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data
7.6  5-fold Cross Validation Results of Rough Heston Option Pricing ANN
7.7  Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data
7.8  5-fold Cross Validation Results of Rough Heston Implied Volatility ANN
7.9  Training Time
7.10 Execution Time for Predictions using ANN and Traditional Methods


List of Abbreviations

NN    Neural Networks
ANN   Artificial Neural Networks
DNN   Deep Neural Networks
CNN   Convolutional Neural Networks
SGD   Stochastic Gradient Descent
SV    Stochastic Volatility
CIR   Cox, Ingersoll and Ross
fBm   fractional Brownian motion
FSV   Fractional Stochastic Volatility
RFSV  Rough Fractional Stochastic Volatility
FFT   Fast Fourier Transform
ELU   Exponential Linear Unit
ReLU  Rectified Linear Unit


Chapter 0

Project's Overview

This chapter intends to give a brief overview of the methodology of the project, without delving too much into the details.

0.1 Project's Overview

It was advised by the supervisor to divide the problem into smaller, achievable steps, as (Mandara, 2019) did. The 4 main steps are:

• Define the objective

• Formulate the procedures

• Implement the procedures

• Evaluate the results

The objective is to investigate the performance of artificial neural networks (ANN) on vanilla option pricing and implied volatility approximation under the rough Heston model. The performance metrics in this context are accuracy and run-time performance.

Figure 1 shows the flow chart of the implementation procedures. The procedure starts with synthetic data generation. Then we split the synthetic data set into 3 portions: one for training (in-sample), one for validation, and one for testing (out-of-sample). Before the data is fed into the ANN for training, it has to be scaled or normalised, as it has different magnitudes and ranges; a minimal sketch of this step is given below. In this thesis we use the training method proposed by (Horvath et al., 2019), the image-based implicit training approach, for implied volatility surface approximation. Then we test the trained ANN model on an unseen data set to evaluate its accuracy.
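The following sketch of the splitting and scaling step assumes the simulated inputs X and targets y are already NumPy arrays; the split proportions are illustrative, not values taken from the thesis.

    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # hold out 30% of the data, then split it equally into validation and test sets
    X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

    # fit the scaler on the training set only, then apply it to all three sets
    scaler = StandardScaler().fit(X_train)
    X_train, X_val, X_test = (scaler.transform(a) for a in (X_train, X_val, X_test))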

The run-time performance of the ANN will be compared with that of the traditional approximation methods. This project requires the use of two programming languages: C++ and Python. The data generation process is done in C++. The wide range of machine learning libraries (Keras, TensorFlow, PyTorch) in Python makes it the logical choice for the neural network computations. The experiment starts with the classical Heston model as a concept validation, and then proceeds to the rough Heston model.


[Flow chart: Option Pricing Models (Heston, rough Heston) → Data Generation (strike price, maturity, volatility, etc.) → Data Pre-processing (scaled/normalised data, training set) → ANN Architecture Design → ANN Training → Trained ANN → ANN Prediction → ANN Results Evaluation (accuracy of option price and implied vol, run-time performance) → Conclusions]

FIGURE 1: Implementation Flow Chart


[Data flow diagram: model parameters are fed as input to the Rough Heston Pricer, which outputs option prices (rough Heston data); the training data and the ANN model hyperparameters feed the Keras ANN Trainer, which produces trained model weights; the ANN Predictions step uses the trained weights and model parameters to output predicted option prices; the Accuracy Evaluation step compares the predictions against the test data (option prices) and outputs error metrics]

FIGURE 2: Data flow diagram for ANN on Rough Heston Pricer


Chapter 1

Introduction

Option pricing has always been a major research area in financial mathematics. Various methods and techniques have been developed to price options more accurately and more quickly since the introduction of the well-known Black-Scholes model in 1973 (Black et al., 1973). The Black-Scholes model has an analytical solution and only one parameter, the volatility, required to be calibrated. This makes it very easy to implement. Its simplicity comes at the expense of many unrealistic assumptions. One of its biggest flaws is the assumption of a constant volatility term. This assumption simply does not hold in the real market. With this assumption, the Black-Scholes model fails to capture the volatility dynamics exhibited by the market and so is unable to value options accurately.

In 1993, (Heston, 1993) developed a stochastic volatility model in an attempt to better capture the volatility dynamics. The Heston model assumes the volatility of an underlying asset to follow a geometric Brownian motion (Hurst parameter H = 1/2). Similarly, the Heston model also has a closed-form analytical solution, and this makes it a strong alternative to the Black-Scholes model.

Whilst the Heston model is more realistic than the Black-Scholes model, the generated option prices do not match observed vanilla option prices consistently (Gatheral et al., 2014). Recent research shows that more accurate option prices can be estimated if the volatility process is modelled as a fractional Brownian motion with H < 1/2. When H < 1/2, the volatility process path appears to be rougher than when H ≥ 1/2, and hence the volatility is described as rough.

(Euch et al., 2019) introduced the rough volatility version of the Heston model, which combines the best of both worlds. However, the calibration of the rough Heston model relies on the notoriously slow Monte Carlo method, due to its non-Markovian nature. This obstacle is the reason that the rough Heston model struggles to find its way into industrial applications. Although there exist many approximation methods for the rough Heston model to help solve this problem, we are interested in the feasibility of machine learning algorithms.

Recent advancements in machine learning algorithms could potentially provide an alternative means for this problem. Specifically, artificial neural networks (ANN) offer the best hope, because they are capable of learning complex functional relationships between input and output data and then making predictions instantly.


In this project, our objective is to investigate the accuracy and run-time performance of the ANN on European call option price predictions and implied volatility approximations under the rough Heston model. We start off by using the Heston model as a concept validation, then proceed to its rough volatility version. For regulatory purposes, we use data simulated by the option pricing models for training.

1.1 Organisation of This Thesis

This thesis is organised as follows. In Chapter 2 we review some of the related work and provide justifications for our research. In Chapter 3 we discuss the theory of the option pricing models of our interest. We provide a brief introduction to ANN and some techniques to improve their performance in Chapter 4. Then we explain the tools we used for the data generation process and some useful techniques to speed up the process in Chapter 5; in the same chapter we also discuss the important data pre-processing procedures. Chapter 6 shows our network architecture and experimental setup. We present the results and discuss our findings in Chapter 7. Finally, we conclude our findings and discuss potential future work in Chapter 8.

[Flow chart: (START) Problem: rough Heston model has low industrial application despite giving accurate option prices → Root cause: slow calibration (pricing and implied vol approximation) process → Potential solution: apply ANN to pricing and implied volatility approximation → Implementation: ANN computation using Keras in Python → Evaluation: accuracy and speed of ANN predictions → (END) Conclusions: ANN is feasible or not]

FIGURE 1.1: Problem Solving Process


Chapter 2

Literature Review

In this chapter we intend to provide a review of the current literature and highlight the research gap. The focus of our review is the application of ANN in option pricing models. We justify our methodology and our choice of option pricing models, and we also provide some critiques.

2.1 Application of ANN in Finance

Option pricing has always been a hot topic in academia and the financial industry. Researchers have been looking for various means to price options more accurately and quickly. As computational technology advances, the application of ANN has gained popularity in solving complex problems in many different industries. In quantitative finance, the idea of utilising ANN for pricing derivatives comes from its ability to make fast and accurate predictions once it is trained.

The application of ANN in option pricing began with (Hutchinson et al., 1994). The authors were the pioneers in applying ANN for pricing and hedging derivatives. Since then, there have been more than 100 research papers on this topic. It is also worth noting that the complexity of the ANN architectures used in these research papers increases over the years. Whilst there has been much research on the applications of ANN in option pricing, only about 10% of the papers focus on implied volatility (Ruf et al., 2020). The feasibility of using ANN for implied volatility approximation is equally as important as for option pricing, since they are both part of the calibration process. Recently, more research papers focus on implied volatility and calibration, such as (Mostafa et al., 2008), (Liu et al., 2019a), (Liu et al., 2019b) and (Horvath et al., 2019). A key contribution of (Horvath et al., 2019)'s work is the introduction of the image-based implicit training method for implied volatility surface approximation. This method is allegedly robust and fast for approximating the entire implied volatility surface.

A significant portion of the research papers use real market data in their research. Currently there is no standardised framework for deciding the architecture and parameters of an ANN. Hence, training the ANN directly with market data might pose some issues when it comes to regulatory compliance. (Horvath et al., 2019) is one of the few papers that uses simulated data for ANN training. The simulated data is generated from the Heston and rough Bergomi models. Using simulated data removes the regulatory concerns, as the functional mapping from model parameters to option price is known and tractable.

The application of ANN requires splitting the entire data set for training, validation and testing. Most of the papers that use real market data ensure the data splitting process is chronological, to preserve the time series structure of the data. This is not a concern in our case, as we are using simulated data.

The results in (Horvath et al., 2019) show very accurate predictions. However, after examining the source code, we discovered that the ANN is validated and tested on the same data set. Thus, the high accuracy results do not give any information about how well the trained ANN generalises to unseen data. In our experiment, we will use different data sets for validation and testing.

Common error metrics used include the mean absolute error (MAE), mean absolute percentage error (MAPE) and mean squared error (MSE). We suggest a new metric, the maximum absolute error, as it is useful from a risk management perspective for quantifying the largest prediction error in the worst-case scenario.

2.2 Option Pricing Models

In terms of the choice of option pricing models, over 80% of the papers select the Black-Scholes model or its extensions in their research (Ruf et al., 2020). As mentioned before, there are other models that can give better option prices than Black-Scholes, such as the stochastic volatility models. (Liu et al., 2019b) investigated the application of ANN in the Heston and Bates models.

According to (Gatheral et al., 2014), modelling the volatility as rough volatility can capture the volatility dynamics in the market better. Rough volatility models struggle to find their way into the industry due to their slow calibration process. Hence, it is natural for the direction of ANN research to move towards rough volatility models.

Some of the recent examples include (Stone, 2019), (Bayer et al., 2018) and (Horvath et al., 2019). (Stone, 2019) investigates the calibration of the rough Bergomi model using Convolutional Neural Networks (CNN), whilst (Bayer et al., 2018) study the calibration of the Heston and rough Bergomi models using Deep Neural Networks (DNN). Contrary to these papers, where direct calibration is performed via ANN in one step, (Horvath et al., 2019) split the calibration of the rough Bergomi model into two steps. Firstly, the ANN is trained to learn the pricing equation during a so-called offline session. The trained ANN is then deterministic and stored for the online calibration session later. This approach can speed up the online calibration by a huge margin.

2.3 The Research Gap

In our thesis we will be using the approach in (Horvath et al., 2019). That is, to train the network to learn the pricing and implied volatility functional mappings separately. We leave the model calibration step as the future outlook of this project. We choose to work with the rough Heston model, another popular rough volatility model that has not been investigated yet. We adapt the image-based implicit training method in our research. To re-iterate, we will validate and test our ANN with different data sets, to obtain a true measure of the ANN's performance on unseen data.


Chapter 3

Option Pricing Models

In this chapter we discuss the option pricing models of our interest: the Heston and rough Heston models. The idea of modelling volatility as a rough fractional Brownian motion will also be presented. We then derive their pricing and implied volatility equations, and also explain their implementations for data generation.

The Black-Scholes model is simple but unrealistic for real-world applications, as its assumptions simply do not hold. One of its greatest criticisms is the assumption of constant volatility of the underlying asset price. (Dupire, 1994) extended the Black-Scholes framework to give local volatility models. He modelled the volatility as a deterministic function of the underlying price and time, where the function is selected such that its outputs match observed European option prices. This extension is time-inhomogeneous and the dynamics it exhibits are still unrealistic. Thus, it is not able to produce future volatility surfaces that emulate our observations.

Stochastic Volatility (SV) models were introduced, in which the volatility of the underlying is modelled as a geometric Brownian motion. Recent research shows that rough volatility can capture the true volatility dynamics better, and this leads to the rough volatility models.

3.1 The Heston Model

One of the most popular SV models is the one proposed by (Heston, 1993); its popularity is due to the fact that its pricing equation for European options is analytically tractable. This property allows calibration procedures to be done efficiently.

The Heston model for a one-dimensional spot price S follows the stochastic process

    dS = µS dt + √ν S dW₁    (3.1)

and its instantaneous variance ν follows a Cox, Ingersoll and Ross (CIR) process (Cox et al., 1985):

    dν = κ(θ − ν) dt + ξ√ν dW₂    (3.2)
    ⟨dW₁, dW₂⟩ = ρ dt    (3.3)

where


• $\mu$ is the (deterministic) rate of return of the spot

• $\theta$ is the long-term mean volatility

• $\xi$ is the volatility of volatility

• $\kappa$ is the mean reversion rate of volatility

• $\rho$ is the correlation between spot and volatility moves

• $W_1$ and $W_2$ are Wiener processes
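To make the dynamics (3.1)-(3.3) concrete, the following is a minimal Monte Carlo sketch in Python. It is not part of the pricing pipeline used later (which is based on Fourier inversion); the full-truncation Euler scheme and all parameter values here are illustrative assumptions.

import numpy as np

def heston_paths(S0=50.0, v0=0.04, mu=0.03, kappa=2.0, theta=0.09,
                 xi=0.3, rho=-0.7, T=1.0, n_steps=252, n_paths=10000, seed=0):
    """Euler-Maruyama simulation of (3.1)-(3.3) with full truncation
    (max(v, 0) inside the square root) to keep the variance non-negative."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, S0)
    v = np.full(n_paths, v0)
    for _ in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        # correlate the two Brownian increments with correlation rho
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        vp = np.maximum(v, 0.0)
        S = S + mu * S * dt + np.sqrt(vp * dt) * S * z1
        v = v + kappa * (theta - vp) * dt + xi * np.sqrt(vp * dt) * z2
    return S, v

S_T, _ = heston_paths()
print(S_T.mean())   # should be approximately S0 * exp(mu * T)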

3.1.1 Option Pricing Under Heston Model

In this section we follow (Gatheral, 2006) closely. We start by forming a portfolio $\Pi$ containing the option being priced $V$, a quantity $-\Delta$ of the stock, and a quantity $-\Delta_1$ of another asset whose value is $V_1$:

$$\Pi = V - \Delta S - \Delta_1 V_1$$

The change in the portfolio during $dt$ is

$$d\Pi = \left[\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho \xi \nu S \frac{\partial^2 V}{\partial \nu \partial S} + \frac{1}{2}\xi^2 \nu \frac{\partial^2 V}{\partial \nu^2}\right] dt$$
$$- \Delta_1 \left[\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V_1}{\partial S^2} + \rho \xi \nu S \frac{\partial^2 V_1}{\partial \nu \partial S} + \frac{1}{2}\xi^2 \nu \frac{\partial^2 V_1}{\partial \nu^2}\right] dt$$
$$+ \left[\frac{\partial V}{\partial S} - \Delta_1 \frac{\partial V_1}{\partial S} - \Delta\right] dS + \left[\frac{\partial V}{\partial \nu} - \Delta_1 \frac{\partial V_1}{\partial \nu}\right] d\nu$$

Note the explicit dependence on $t$ of $S$ and $\nu$ has been removed for simplicity. We can make the portfolio instantaneously risk-free by eliminating the $dS$ and $d\nu$ terms:

$$\frac{\partial V}{\partial S} - \Delta_1 \frac{\partial V_1}{\partial S} - \Delta = 0$$

and

$$\frac{\partial V}{\partial \nu} - \Delta_1 \frac{\partial V_1}{\partial \nu} = 0$$

So we are left with

$$d\Pi = \left[\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho \xi \nu S \frac{\partial^2 V}{\partial \nu \partial S} + \frac{1}{2}\xi^2 \nu \frac{\partial^2 V}{\partial \nu^2}\right] dt$$
$$- \Delta_1 \left[\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V_1}{\partial S^2} + \rho \xi \nu S \frac{\partial^2 V_1}{\partial \nu \partial S} + \frac{1}{2}\xi^2 \nu \frac{\partial^2 V_1}{\partial \nu^2}\right] dt$$
$$= r\Pi\,dt = r(V - \Delta S - \Delta_1 V_1)\,dt$$


Here we apply the no-arbitrage principle: the return of a risk-free portfolio must be equal to the risk-free rate, and we assume the risk-free rate to be deterministic for our case.

Rearranging the above equation by collecting $V$ terms on the left-hand side and $V_1$ terms on the right-hand side gives

$$\frac{\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho \xi \nu S \frac{\partial^2 V}{\partial \nu \partial S} + \frac{1}{2}\xi^2 \nu \frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV}{\frac{\partial V}{\partial \nu}} = \frac{\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V_1}{\partial S^2} + \rho \xi \nu S \frac{\partial^2 V_1}{\partial \nu \partial S} + \frac{1}{2}\xi^2 \nu \frac{\partial^2 V_1}{\partial \nu^2} + rS\frac{\partial V_1}{\partial S} - rV_1}{\frac{\partial V_1}{\partial \nu}}$$

The left-hand side is a function of $V$ only and the right-hand side of $V_1$ only, so both sides must be equal to some function $f(S, \nu, t)$ of the independent variables. Thus we have

$$\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho \xi \nu S \frac{\partial^2 V}{\partial \nu \partial S} + \frac{1}{2}\xi^2 \nu \frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV = f\,\frac{\partial V}{\partial \nu}$$

In the case of the Heston model, $f$ is chosen as

$$f = -\left(\kappa(\theta - \nu) - \lambda\sqrt{\nu}\right)$$

where $\lambda(S, \nu, t)$ is called the market price of volatility risk. We do not delve into the choice of $f$ and the concept of $\lambda(S, \nu, t)$ any further here; see (Wilmott, 2006) for the relevant arguments.

(Heston, 1993) assumed in his paper that $\lambda(S, \nu, t)$ is linear in the instantaneous variance $\nu_t$ in order to retain the form of the equation under the transformation from the statistical measure to the risk-neutral measure. Here we are only interested in pricing the options, so we can set $\lambda(S, \nu, t)$ to be zero (Gatheral, 2006).

Hence we arrive at

$$\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho \xi \nu S \frac{\partial^2 V}{\partial \nu \partial S} + \frac{1}{2}\xi^2 \nu \frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV = \kappa(\nu - \theta)\frac{\partial V}{\partial \nu} \tag{3.4}$$

According to (Heston, 1993), the price of an undiscounted European call option $C$ with strike price $K$ and time to maturity $T$ has a solution of the form

$$C(S, K, \nu, T) = \frac{1}{2}(F - K) + \int_0^\infty (F f_1 - K f_2)\,du \tag{3.5}$$


where

$$f_1 = \operatorname{Re}\left[\frac{e^{-iu\ln K}\,\varphi(u - i)}{iuF}\right], \qquad f_2 = \operatorname{Re}\left[\frac{e^{-iu\ln K}\,\varphi(u)}{iu}\right],$$
$$\varphi(u) = E\left(e^{iu\ln S_T}\right), \qquad F = Se^{\mu T}$$

The evaluation of (3.5) requires the computation of Fourier inversion integrals, and it is susceptible to numerical instabilities due to the complex logarithms involved. (Kahl et al., 2006) proposed an integration scheme that computes the Fourier inversion integral by applying some transformations to the asymptotic structure. Thus the solution becomes

$$C(S, K, \nu, T) = \int_0^1 y(x)\,dx, \quad x \in \mathbb{R} \tag{3.6}$$

where

$$y(x) = \frac{1}{2}(F - K) + \frac{F f_1\!\left(-\frac{\ln x}{C_\infty}\right) - K f_2\!\left(-\frac{\ln x}{C_\infty}\right)}{x\,\pi\,C_\infty}$$

The limits at the boundaries of the integral are

$$\lim_{x \to 0} y(x) = \frac{1}{2}(F - K)$$

and

$$\lim_{x \to 1} y(x) = \frac{1}{2}(F - K) + \frac{F\,\lim_{u \to 0} f_1(u) - K\,\lim_{u \to 0} f_2(u)}{\pi\,C_\infty}$$

Equation (3.6) provides a more robust pricing method for options with moderate to long maturities or strong mean reversion. For the full derivation of equation (3.6), see (Kahl et al., 2006). The source code to implement equation (3.6) was provided by the supervisor and is the source code for (Duffy et al., 2012).
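As an illustration of the inversion formula (3.5), the sketch below prices a Heston call in Python with numerical quadrature. It uses a standard closed form of the characteristic function (the so-called 'little Heston trap' parameterisation) rather than the Kahl-Jäckel transformed integral (3.6) actually used in this project; the finite integration limit and parameter values are assumptions.

import numpy as np
from scipy.integrate import quad

def heston_cf(u, S, r, q, T, nu0, theta, kappa, xi, rho):
    """phi(u) = E[exp(i*u*ln S_T)] under Heston dynamics."""
    iu = 1j * u
    d = np.sqrt((kappa - rho * xi * iu) ** 2 + xi ** 2 * (iu + u ** 2))
    g = (kappa - rho * xi * iu - d) / (kappa - rho * xi * iu + d)
    C = (kappa * theta / xi ** 2) * (
        (kappa - rho * xi * iu - d) * T
        - 2.0 * np.log((1.0 - g * np.exp(-d * T)) / (1.0 - g)))
    D = ((kappa - rho * xi * iu - d) / xi ** 2) * (
        (1.0 - np.exp(-d * T)) / (1.0 - g * np.exp(-d * T)))
    return np.exp(iu * (np.log(S) + (r - q) * T) + C + D * nu0)

def heston_call(S, K, r, q, T, nu0, theta, kappa, xi, rho):
    """European call via the two Fourier inversion integrals, cf. (3.5)."""
    cf = lambda u: heston_cf(u, S, r, q, T, nu0, theta, kappa, xi, rho)
    f1 = lambda u: (np.exp(-1j*u*np.log(K)) * cf(u - 1j) / (1j*u*cf(-1j))).real
    f2 = lambda u: (np.exp(-1j*u*np.log(K)) * cf(u) / (1j*u)).real
    P1 = 0.5 + quad(f1, 1e-8, 200)[0] / np.pi
    P2 = 0.5 + quad(f2, 1e-8, 200)[0] / np.pi
    return S*np.exp(-q*T)*P1 - K*np.exp(-r*T)*P2

print(heston_call(S=50, K=50, r=0.03, q=0.0, T=1.0,
                  nu0=0.09, theta=0.09, kappa=2.0, xi=0.3, rho=-0.7))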

3.1.2 Heston Model Implied Volatility Approximation

Before the model is used for pricing options, the model's parameters need to be calibrated so that it can return current market prices (at least to some degree of accuracy). That is, the unmeasurable model parameters are obtained by calibrating to implied volatilities observed on the market. Hence this motivates the need for fast implied volatility computation.

Unfortunately, there is no closed-form analytical formula for calculating implied volatility under the Heston model. There are, however, many closed-form approximations for Heston implied volatility.

In this project we select the implied volatility approximation method for Heston proposed by (Lorig et al., 2019), which allegedly outperforms other methods such as the one by (Fouque et al., 2012). (Lorig et al., 2019) derive a family of asymptotic expansions for European-style option implied volatility. Their method provides an explicit approximation and requires no special functions, not even numerical integration, and so it is faster to compute than its counterparts.

We follow closely the derivation steps outlined in (Lorig et al., 2019). Consider the Heston model in Section 3.1. According to (Andersen et al., 2007), $\rho$ must be negative to avoid moment explosion. To circumvent this, we apply the change of variables $(X(t), V(t)) = (\ln S(t), e^{\kappa t}\nu)$. Using the asymptotic expansions for general stochastic models (see Appendix B) and Itô's formula, we have

$$dX = -\frac{1}{2}e^{-\kappa t}V\,dt + \sqrt{e^{-\kappa t}V}\,dW_1, \quad X(0) = x = \ln s \tag{3.7}$$
$$dV = \theta\kappa e^{\kappa t}\,dt + \xi\sqrt{e^{\kappa t}V}\,dW_2, \quad V(0) = \nu = z > 0 \tag{3.8}$$
$$\langle dW_1, dW_2 \rangle = \rho\,dt \tag{3.9}$$

The parameters $\nu, \kappa, \theta, \xi, W_1, W_2$ play the same roles as in Section 3.1. Then we apply the second order differential operator $\mathcal{A}(t)$ to $(X, V)$:

$$\mathcal{A}(t) = \frac{1}{2}e^{-\kappa t}v\left(\partial_x^2 - \partial_x\right) + \theta\kappa e^{\kappa t}\partial_v + \frac{1}{2}\xi^2 e^{\kappa t}v\,\partial_v^2 + \xi\rho v\,\partial_x\partial_v$$

Define $T$ as time to maturity and $k$ as log-strike. Using the time-dependent Taylor series expansion of $\mathcal{A}(T)$ with $(x(T), v(T)) = (X_0, E[V(T)]) = (x, \theta(e^{\kappa T} - 1))$, we obtain (see Appendix B)

$$\sigma_0 = \sqrt{\frac{-\theta + \theta\kappa T + e^{-\kappa T}(\theta - v) + v}{\kappa T}}$$

$$\sigma_1 = \frac{\xi\rho z\,e^{-\kappa T}\left(-2v - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\right)}{\sqrt{2}\,\kappa^2\sigma_0^2\,T^{3/2}}$$

where

$$z = \frac{x - k - \frac{\sigma_0^2 T}{2}}{\sigma_0\sqrt{2T}}$$

The explicit expressions for $\sigma_2$ and $\sigma_3$ are too long to be included here; see (Lorig et al., 2019) for the full derivation. The explicit expression for the implied volatility approximation is

$$\sigma_{implied} = \sigma_0 + \sigma_1 z + \sigma_2 z^2 + \sigma_3 z^3 \tag{3.10}$$

The C++ source code to implement this is available here: Heston Implied Volatility Approximation.
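As a small illustration, the leading-order term $\sigma_0$ above is the square root of the time-averaged expected variance under the CIR dynamics, and can be computed directly. The sketch below is a direct transcription of the $\sigma_0$ formula; parameter values are arbitrary.

import numpy as np

def sigma0(T, v0, theta, kappa):
    """Leading-order Heston implied volatility term sigma_0 of
    (Lorig et al., 2019): sqrt of the time-averaged expected variance."""
    return np.sqrt((theta * kappa * T - theta
                    + np.exp(-kappa * T) * (theta - v0) + v0) / (kappa * T))

# Short-maturity check: sigma_0 -> sqrt(v0) as T -> 0.
print(sigma0(T=1e-6, v0=0.04, theta=0.09, kappa=1.5))  # ~0.2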


3.2 Volatility is Rough

We start this section by introducing the fractional Brownian motion (fBm), which is a generalisation of Brownian motion.

Definition 3.2.1. Fractional Brownian Motion (fBm) is a centered Gaussian process $B_t^H$, $t \geq 0$, with the autocovariance function

$$E\left[B_t^H B_s^H\right] = \frac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\right) \tag{3.11}$$

where $H \in (0, 1)$ is the Hurst index or Hurst parameter associated with the fractional Brownian motion.

The Hurst parameter is a measure of the long-term memory of a time series. It was first introduced by (Hurst, 1951) for modelling water levels of the Nile river in order to determine the optimum dam size. As it turns out, it is also suitable for modelling time series data from other natural systems. (Peters, 1991) and (Peters, 1994) are some of the earliest examples of applying this concept to financial time series modelling. A time series can be classified into three categories based on the Hurst parameter:

1. $0 < H < \frac{1}{2}$: Negative correlation in the increments/decrements of the process. A mean-reverting process, which implies an increase in value is more likely followed by a decrease in value, and vice versa. The smaller the value, the greater the mean-reverting effect.

2. $H = \frac{1}{2}$: No correlation in the increments/decrements of the process (geometric Brownian motion or Wiener process).

3. $\frac{1}{2} < H < 1$: Positive correlation in the increments/decrements of the process. A trend-reinforcing process, which means the direction of the next value is more likely the same as the current value. The strength of the trend is greater when $H$ is closer to 1.

Empirical evidence shows that log-volatility time series behave similarly to an fBm with Hurst parameter of order 0.1 (Gatheral et al., 2014). Essentially all the statistical stylised facts of volatility can be captured when modelling it as a rough fBm.

This motivates the use of the fractional stochastic volatility (FSV) model by (Comte et al., 1998). Our model is called the rough fractional stochastic volatility (RFSV) model to emphasise that $H < \frac{1}{2}$. The term 'rough' refers to the roughness of the path of the fBm. From Figure 3.1, one can notice that the fBm path gets 'rougher' as $H$ decreases.


FIGURE 3.1: Paths of fBm for different values of H. Adapted from (Shevchenko, 2015).
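Paths such as those in Figure 3.1 can be generated directly from Definition 3.2.1: since fBm is Gaussian with known covariance (3.11), a path on a finite grid is obtained by Cholesky factorisation of the covariance matrix. A minimal sketch follows; the grid size and the jitter term are assumptions.

import numpy as np

def fbm_path(H, T=1.0, n=500, seed=0):
    """One fBm path on (0, T] via Cholesky factorisation of the
    exact covariance (3.11); B_0 = 0 is excluded from the grid."""
    t = np.linspace(T / n, T, n)
    u, s = np.meshgrid(t, t, indexing="ij")
    cov = 0.5 * (u**(2*H) + s**(2*H) - np.abs(u - s)**(2*H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))   # jitter for stability
    return t, L @ np.random.default_rng(seed).standard_normal(n)

# Compare path roughness for rough, Brownian and smooth regimes.
for H in (0.1, 0.5, 0.9):
    _, path = fbm_path(H)
    print(H, np.abs(np.diff(path)).mean())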

3.3 The Rough Heston Model

In this section we introduce the rough volatility version of the Heston model and derive its pricing and implied volatility equations.

As stated before, rough volatility models can fit the volatility surface notably well with very few parameters. Hence it is natural to modify the Heston model and consider its rough version.

The rough Heston model for a one-dimensional asset price $S$ with instantaneous variance $\nu(t)$ takes the form in (Euch et al., 2016):

$$dS = S\sqrt{\nu}\,dW_1$$
$$\nu(t) = \nu(0) + \frac{1}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\kappa(\theta - \nu(s))\,ds + \frac{\xi}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\sqrt{\nu(s)}\,dW_2 \tag{3.12}$$
$$\langle dW_1, dW_2 \rangle = \rho\,dt$$

where

• $\alpha = H + \frac{1}{2} \in (\frac{1}{2}, 1)$ governs the roughness of the volatility sample paths

• $\kappa$ is the mean reversion rate

• $\theta$ is the long-term mean volatility

• $\nu(0)$ is the initial volatility

• $\xi$ is the volatility of volatility

• $\Gamma$ is the Gamma function

• $W_1$ and $W_2$ are two Brownian motions with correlation $\rho$

When $\alpha = 1$ ($H = 0.5$), we recover the classical Heston model of Section 3.1.

3.3.1 Rough Heston Pricing Equation

The pricing equation of the rough Heston model is inspired by the classical Heston one. In the classical Heston model, the option price can be obtained by applying Fourier inversion integrals to the characteristic function, which is expressed in terms of the solution of a Riccati equation. The rough Heston model displays a similar structure, but with the Riccati equation replaced by a fractional Riccati equation.

(Euch et al., 2016) derived a characteristic function for the rough Heston model. This result is particularly important, as the non-Markovian nature of the fractional Brownian motion makes other numerical pricing methods (e.g. Monte Carlo) difficult to implement.

Definition 3.3.1. The fractional integral of order $r \in (0, 1]$ of a function $f$ is

$$I^r f(t) = \frac{1}{\Gamma(r)}\int_0^t (t - s)^{r - 1} f(s)\,ds \tag{3.13}$$

whenever the integral exists, and the fractional derivative of order $r \in (0, 1]$ is

$$D^r f(t) = \frac{1}{\Gamma(1 - r)}\frac{d}{dt}\int_0^t (t - s)^{-r} f(s)\,ds \tag{3.14}$$

whenever it exists.

We now quote the results from (Euch et al., 2016) directly; for the full proof, see Section 6 of (Euch et al., 2016).

Theorem 3.3.1. For all $t \geq 0$, the characteristic function of the log-price $X_t = \ln\frac{S(t)}{S(0)}$ is

$$\varphi_X(a, t) = E\left[e^{iaX_t}\right] = \exp\left[\kappa\theta\,I^1 h(a, t) + \nu(0)\,I^{1-\alpha}h(a, t)\right] \tag{3.15}$$

where $h$ is the solution of the fractional Riccati equation

$$D^\alpha h(a, t) = -\frac{1}{2}a(a + i) + \kappa(ia\rho\xi - 1)h(a, t) + \frac{\xi^2}{2}h^2(a, t), \quad I^{1-\alpha}h(a, 0) = 0 \tag{3.16}$$

which admits a unique continuous solution for $a \in \mathcal{C}$, a suitable region in the complex plane (see Definition 3.3.2).


Definition 3.3.2. We define the strip in the complex plane relevant to the computation of the characteristic function in Theorem 3.3.1:

$$\mathcal{C} = \left\{z \in \mathbb{C} : \Re(z) \geq 0,\; -\frac{1}{1 - \rho^2} \leq \Im(z) \leq 0\right\} \tag{3.17}$$

where $\Re$ and $\Im$ denote real and imaginary parts respectively.

There is no explicit solution to equation (3.16), so it has to be solved numerically. (Gatheral et al., 2019) present a simple rational approximation to the solution of the rough Heston Riccati equation, which is considerably faster than the numerical schemes.

3.3.1.1 Rational Approximation of Rough Heston Riccati Solution

First, the short-time series expansion of the solution by (Alos et al., 2017) can be written as

$$h_s(a, t) = \sum_{j=0}^{\infty} \frac{\Gamma(1 + j\alpha)}{\Gamma(1 + (j+1)\alpha)}\,\beta_j(a)\,\xi^j\,t^{(j+1)\alpha} \tag{3.18}$$

with

$$\beta_0(a) = -\frac{1}{2}a(a + i)$$
$$\beta_k(a) = \frac{1}{2}\sum_{\substack{i,j \geq 0 \\ i+j=k-2}} \beta_i(a)\beta_j(a)\,\frac{\Gamma(1 + i\alpha)}{\Gamma(1 + (i+1)\alpha)}\,\frac{\Gamma(1 + j\alpha)}{\Gamma(1 + (j+1)\alpha)} + i\rho a\,\frac{\Gamma(1 + (k-1)\alpha)}{\Gamma(1 + k\alpha)}\,\beta_{k-1}(a)$$

Again from (Alos et al., 2017), the long-time expansion is

$$h_l(a, t) = r_- \sum_{k=0}^{\infty} \frac{\gamma_k}{\xi^k A^k\,\Gamma(1 - k\alpha)}\,t^{-k\alpha} \tag{3.19}$$

where

$$\gamma_1 = -\gamma_0 = -1$$
$$\gamma_k = -\gamma_{k-1} + \frac{r_-}{2A}\sum_{\substack{i,j \geq 1 \\ i+j=k}} \gamma_i\gamma_j\,\frac{\Gamma(1 - k\alpha)}{\Gamma(1 - i\alpha)\Gamma(1 - j\alpha)}$$
$$A = \sqrt{a(a + i) - \rho^2 a^2}$$
$$r_\pm = -i\rho a \pm A$$

Now we can construct the global approximation. Using the same techniques as in (Atkinson et al., 2011), the rational approximation of $h(a, t)$ is given by

$$h^{(m,n)}(a, t) = \frac{\sum_{i=1}^{m} p_i\,(\xi t^\alpha)^i}{\sum_{j=0}^{n} q_j\,(\xi t^\alpha)^j}, \quad q_0 = 1 \tag{3.20}$$

Numerical experiments in (Gatheral et al., 2019) showed $h^{(3,3)}$ is a good choice for high accuracy and fast computation. Thus we can write explicitly

$$h^{(3,3)}(a, t) = \frac{p_1\xi t^\alpha + p_2\xi^2(t^\alpha)^2 + p_3\xi^3(t^\alpha)^3}{1 + q_1\xi t^\alpha + q_2\xi^2(t^\alpha)^2 + q_3\xi^3(t^\alpha)^3} \tag{3.21}$$

Thus from equations (3.18) and (3.19) we have

$$h_s(a, t) = \frac{\Gamma(1)}{\Gamma(1 + \alpha)}\beta_0(a)\,t^\alpha + \frac{\Gamma(1 + \alpha)}{\Gamma(1 + 2\alpha)}\beta_1(a)\,\xi(t^\alpha)^2 + \frac{\Gamma(1 + 2\alpha)}{\Gamma(1 + 3\alpha)}\beta_2(a)\,\xi^2(t^\alpha)^3 + \mathcal{O}\left((t^\alpha)^4\right)$$
$$= b_1 t^\alpha + b_2(t^\alpha)^2 + b_3(t^\alpha)^3 + \mathcal{O}\left((t^\alpha)^4\right)$$

and

$$h_l(a, t) = \frac{r_-\gamma_0}{\Gamma(1)} + \frac{r_-\gamma_1}{A\,\Gamma(1 - \alpha)\,\xi}\frac{1}{t^\alpha} + \frac{r_-\gamma_2}{A^2\,\Gamma(1 - 2\alpha)\,\xi^2}\frac{1}{(t^\alpha)^2} + \mathcal{O}\left(\frac{1}{(t^\alpha)^3}\right)$$
$$= g_0 + g_1\frac{1}{t^\alpha} + g_2\frac{1}{(t^\alpha)^2} + \mathcal{O}\left(\frac{1}{(t^\alpha)^3}\right)$$

Matching the coefficients of equation (3.21) to $h_s$ and $h_l$ forces the coefficients $b_i$ and $g_j$ to satisfy

$$p_1\xi = b_1 \tag{3.22}$$
$$(p_2 - p_1 q_1)\xi^2 = b_2 \tag{3.23}$$
$$(p_1 q_1^2 - p_1 q_2 - p_2 q_1 + p_3)\xi^3 = b_3 \tag{3.24}$$
$$\frac{p_3\xi^3}{q_3\xi^3} = g_0 \tag{3.25}$$
$$\frac{(p_2 q_3 - p_3 q_2)\xi^5}{q_3^2\,\xi^6} = g_1 \tag{3.26}$$
$$\frac{(p_1 q_3^2 - p_2 q_2 q_3 - p_3 q_1 q_3 + p_3 q_2^2)\xi^7}{q_3^3\,\xi^9} = g_2 \tag{3.27}$$


The solution of this system is

$$p_1 = \frac{b_1}{\xi} \tag{3.28}$$

$$p_2 = \frac{b_1^3 g_1 + b_1^2 g_0 g_2 + b_1 b_2 g_0 g_1 - b_1 b_3 g_0 g_2 + b_1 b_3 g_1^2 + b_2^2 g_0 g_2 - b_2^2 g_1^2 + b_2 g_0^3}{\left(b_1^2 g_2 + 2b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3\right)\xi^2} \tag{3.29}$$

$$p_3 = g_0 q_3 \tag{3.30}$$

$$q_1 = \frac{b_1^2 g_1 - b_1 b_2 g_2 + b_1 g_0^2 - b_2 g_0 g_1 - b_3 g_0 g_2 + b_3 g_1^2}{\left(b_1^2 g_2 + 2b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3\right)\xi} \tag{3.31}$$

$$q_2 = \frac{b_1^2 g_0 - b_1 b_2 g_1 - b_1 b_3 g_2 + b_2^2 g_2 + b_2 g_0^2 - b_3 g_0 g_1}{\left(b_1^2 g_2 + 2b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3\right)\xi^2} \tag{3.32}$$

$$q_3 = \frac{b_1^3 + 2b_1 b_2 g_0 + b_1 b_3 g_1 - b_2^2 g_1 + b_3 g_0^2}{\left(b_1^2 g_2 + 2b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3\right)\xi^3} \tag{3.33}$$
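A direct transcription of equations (3.21) and (3.28)-(3.33) gives the following Python sketch for evaluating the approximant, assuming the series coefficients $b_1, b_2, b_3$ and $g_0, g_1, g_2$ have already been computed from (3.18) and (3.19); the function and argument names are our own.

def h_33(b1, b2, b3, g0, g1, g2, xi, t_alpha):
    """Evaluate the (3,3) rational approximant (3.21) with coefficients
    (3.28)-(3.33); all arguments may be complex, t_alpha = t**alpha."""
    D = b1**2*g2 + 2*b1*g0*g1 + b2*g0*g2 - b2*g1**2 + g0**3
    q1 = (b1**2*g1 - b1*b2*g2 + b1*g0**2 - b2*g0*g1
          - b3*g0*g2 + b3*g1**2) / (D * xi)
    q2 = (b1**2*g0 - b1*b2*g1 - b1*b3*g2 + b2**2*g2
          + b2*g0**2 - b3*g0*g1) / (D * xi**2)
    q3 = (b1**3 + 2*b1*b2*g0 + b1*b3*g1 - b2**2*g1 + b3*g0**2) / (D * xi**3)
    p1 = b1 / xi
    p2 = (b1**3*g1 + b1**2*g0*g2 + b1*b2*g0*g1 - b1*b3*g0*g2 + b1*b3*g1**2
          + b2**2*g0*g2 - b2**2*g1**2 + b2*g0**3) / (D * xi**2)
    p3 = g0 * q3
    x = xi * t_alpha
    return (p1*x + p2*x**2 + p3*x**3) / (1 + q1*x + q2*x**2 + q3*x**3)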

3.3.1.2 Option Pricing Using Fast Fourier Transform

After solving the Riccati equation using the rational approximation (3.21), we can compute the characteristic function $\varphi_X(a, t)$ in equation (3.15). Once the characteristic function is obtained, many methods are available to compute call option prices; see (Carr and Madan, 1999), (Itkin, 2010), (Lewis, 2001) and (Schmelzle, 2010). In this project we select the Carr and Madan approach, which is a fast option pricing method that uses the fast Fourier transform (FFT). Suppose $k = \ln K$ and $C(k, T)$ is the value of a European call option with strike $K$ and maturity $T$. Unfortunately, the FFT cannot be applied directly to evaluate the call option price because the call pricing function is not square-integrable. A key contribution of (Carr and Madan, 1999) is the modification of the call pricing function with respect to $k$. With this modification, a whole range of option prices can be obtained with one inverse Fourier transform.

Consider the modified call price $c(k, T)$:

$$c(k, T) = e^{\beta k}C(k, T), \quad \beta > 0 \tag{3.34}$$

Its Fourier transform is

$$\psi_X(u, T) = \int_{-\infty}^{\infty} e^{iuk}c(k, T)\,dk, \quad u \in \mathbb{R}_{>0} \tag{3.35}$$

which can be expressed as

$$\psi_X(u, T) = \frac{e^{-rT}\varphi_X[u - (\beta + 1)i,\,T]}{\beta^2 + \beta - u^2 + i(2\beta + 1)u} \tag{3.36}$$

where $r$ is the risk-free rate and $\varphi_X(\cdot)$ is the characteristic function defined in equation (3.15).


The purpose of $\beta$ is to assist the integrability of the modified call pricing function over the negative log-strike axis. Hence a sufficient condition is that $\psi_X(0)$ is finite:

$$\psi_X(0, T) < \infty \tag{3.37}$$
$$\implies \varphi_X[-(\beta + 1)i,\,T] < \infty \tag{3.38}$$
$$\implies \exp\left[\theta\kappa\,I^1 h(-(\beta + 1)i,\,T) + \nu(0)\,I^{1-\alpha}h(-(\beta + 1)i,\,T)\right] < \infty \tag{3.39}$$

Using equation (3.39), we can determine the upper bound on $\beta$ such that condition (3.37) is satisfied. (Carr and Madan, 1999) suggest that one quarter of the upper bound serves as a good choice for $\beta$.

The call price $C(k, T)$ can be recovered by applying the inverse Fourier transform:

$$C(k, T) = \frac{e^{-\beta k}}{2\pi}\int_{-\infty}^{\infty} \Re\left[e^{-iuk}\psi_X(u, T)\right] du \tag{3.40}$$
$$= \frac{e^{-\beta k}}{\pi}\int_{0}^{\infty} \Re\left[e^{-iuk}\psi_X(u, T)\right] du \tag{3.41}$$

The semi-infinite integral in the call price equation can be evaluated by numerical methods such as the trapezoidal rule and FFT, as shown in (Kwok et al., 2012):

$$C(k, T) \approx \frac{e^{-\beta k}}{\pi}\sum_{j=1}^{N} \Re\left[e^{-iu_j k}\psi_X(u_j, T)\right]\Delta u \tag{3.42}$$

where $u_j = (j - 1)\Delta u$, $j = 1, 2, \ldots, N$.

We end this section with a summary of the steps required to price a call option under the rough Heston model:

1. Compute the solution $h$ of the fractional Riccati equation (3.16) using equation (3.21).

2. Compute the characteristic function in equation (3.15).

3. Determine a suitable value for $\beta$ using equation (3.39).

4. Apply the Fourier transform in equation (3.36).

5. Evaluate the call option price by implementing the FFT in equation (3.42).
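The discretised sum (3.42) is straightforward to implement once a characteristic function is available. The sketch below uses the Black-Scholes characteristic function as a stand-in for $\varphi_X$ so that the result can be checked against the known Black-Scholes price; the grid parameters N, Δu and β = 1 are assumptions.

import numpy as np

def carr_madan_call(phi, K, r, T, beta=1.0, N=4096, du=0.05):
    """Discretised Carr-Madan sum (3.42) for a call of log-strike k = ln K,
    given a characteristic function phi(u) of ln S_T."""
    k = np.log(K)
    u = np.arange(N) * du                      # u_j = (j - 1) * du
    psi = np.exp(-r*T) * phi(u - (beta + 1)*1j) \
          / (beta**2 + beta - u**2 + 1j*(2*beta + 1)*u)      # equation (3.36)
    integrand = np.exp(-1j*u*k) * psi
    integrand[0] *= 0.5                        # trapezoidal end-point weight
    return np.exp(-beta*k) / np.pi * integrand.real.sum() * du

# Sanity check with the Black-Scholes characteristic function as stand-in.
S0, r, sigma, T = 100.0, 0.05, 0.2, 1.0
phi_bs = lambda u: np.exp(1j*u*(np.log(S0) + (r - 0.5*sigma**2)*T)
                          - 0.5*sigma**2*u**2*T)
print(carr_madan_call(phi_bs, K=100.0, r=r, T=T))   # ~10.45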

3.3.2 Rough Heston Model Implied Volatility

(Euch et al., 2019) present an almost instantaneous and accurate approximation method to compute rough Heston implied volatility. This method allows us to approximate rough Heston implied volatility by simply scaling the volatility of volatility parameter $\nu$ and feeding this scaled parameter as input into the classical Heston implied volatility approximation equation in Section 3.1.2. For a given maturity $T$, the scaled volatility of volatility parameter $\tilde{\nu}$ is

$$\tilde{\nu} = \sqrt{\frac{3}{2H + 2}}\,\frac{\nu}{\Gamma(H + \frac{3}{2})}\,\frac{1}{T^{\frac{1}{2} - H}} \tag{3.43}$$
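Equation (3.43) amounts to a one-line computation; a minimal sketch follows (the example inputs are arbitrary).

from math import gamma, sqrt

def scaled_vol_of_vol(nu, H, T):
    """Scaled vol-of-vol of equation (3.43), to be fed into the classical
    Heston implied volatility approximation of Section 3.1.2."""
    return sqrt(3.0 / (2.0*H + 2.0)) * nu / (gamma(H + 1.5) * T**(0.5 - H))

print(scaled_vol_of_vol(nu=0.3, H=0.1, T=1.0))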


FIGURE 3.2: Implied volatility smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019).


Chapter 4

Artificial Neural Networks

This chapter starts with an introduction to Artificial Neural Networks (ANN). We introduce the terminologies and basic elements of ANN and explain the methods that we used to increase training speed and predictive accuracy. We also discuss the training process for ANN, common pitfalls and their remedies. Finally, we discuss the ANN type of interest and extend the discussion to Deep Neural Networks (DNN).

The concept of ANN has come a long way. It started off with modelling a simple neural network after electrical circuits, as proposed by Warren McCulloch, a neurologist, and a young mathematician, Walter Pitts, in 1943 (McCulloch et al., 1943). This idea was further strengthened by Donald Hebb (Hebb, 1949).

ANN's ability to learn complex non-linear functional mappings, by deciphering the pattern between independent and dependent variables, has found applications in many fields. With computational resources becoming more accessible and big data sets becoming available, ANN's popularity has grown exponentially in recent years.

4.1 Dissecting the Artificial Neural Networks

FIGURE 4.1: A Simple Neural Network

Before we delve deeper into the world of ANN, let us explain what parameters and hyperparameters mean.


Parameters refer to the variables that are updated throughout the training process; they include weights and biases. Hyperparameters are the configuration variables that define the architecture of the ANN and stay constant throughout the training process. Hyperparameters include the number of neurons, hidden layers, activation functions and number of epochs, etc.

4.1.1 Neurons

Like a biological neuron, an artificial neuron is the fundamental working unit in an ANN. It is essentially a mathematical function that receives multiple inputs and generates an output. More precisely, it takes the weighted sum of inputs and applies the activation function to the sum to generate an output. In the case of a simple ANN (as shown in Figure 4.1) this will be the network output. In more sophisticated ANNs, the output of a neuron can be the input of other neurons.

4.1.2 Input, Output and Hidden Layers

An input layer is the first layer of an ANN. It receives and sends the training data into the network for further processing. The number of input neurons is equal to the number of input features of the training data.

The output layer is the final layer of the network. It outputs the predictions for a given set of input features, and it can have more than one output neuron.

A hidden layer is a layer between the input and output layers. It contains neuron(s) and receives weighted inputs from the input layer. Whilst an ANN contains only one input layer and one output layer, the number of hidden layers can be more than one. A single hidden layer suffices only for simple, linearly separable problems; for more complex problems, multiple hidden layers are needed in practice.

Systematic experimentation is required to identify the appropriate number of hidden layers to prevent under-fitting and over-fitting, and thus increase predictive accuracy.

4.1.3 Connections, Weights and Biases

From the input layer to the output layer, each connection is assigned a weight that denotes its relative importance. Random weights are initialised when training begins; as training continues, these weights are updated.

Sometimes a bias (different from the bias term in statistics) is added to the neuron. It is merely a constant input with weight 1. Intuitively speaking, the purpose of a bias is to shift the activation function away from the origin, thereby allowing the neuron to give non-zero output when the inputs are 0.


4.1.4 Forward and Backward Propagation, Loss Functions

Forward propagation is the passing of input data in the forward direction through the network to generate output. The difference between the output and the target output is then estimated by a loss function, which is a measure of how well an ANN performs with the current set of weights. Typical loss functions for regression problems include mean squared error (MSE) and mean absolute error (MAE). They are defined as follows.

Definition 4.1.1.

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i,predict} - y_{i,actual}\right)^2 \tag{4.1}$$

Definition 4.1.2.

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_{i,predict} - y_{i,actual}\right| \tag{4.2}$$

where $y_{i,predict}$ is the $i$-th predicted output and $y_{i,actual}$ is the $i$-th actual output. Subsequently, this information is propagated backward from the output layer to the input layer, and the gradients of the loss function with respect to the weights are computed. The algorithm then adjusts the weights accordingly to minimise the loss function. An iteration is one complete cycle of forward and backward propagation.

ANN training is an unconstrained optimisation problem with the loss function as the objective function. The algorithm navigates through the search space to find optimal weights that minimise the loss function (e.g. MSE). It can be expressed as

$$\min_{w}\frac{1}{n}\sum_{i=1}^{n}\left(y_{i,predict} - y_{i,actual}\right)^2 \tag{4.3}$$

where $w$ is the vector that contains the weights of all the connections within the ANN.

4.1.5 Activation Functions

An activation function is a mathematical function applied to the output of a neuron to determine whether it should be activated (fired) or not, based on the relevance of the neuron's inputs to the prediction. This is analogous to the neuronal firing activity in the human brain.

Activation functions can be linear or non-linear. A linear activation function is only suitable when the functional relationship is linear; non-linear activation functions are used for most applications. In our case, non-linear activation functions are more suitable, as the pricing and implied volatility functional mappings are complex.

Common activation functions include:


Definition 4.1.3. Rectified Linear Unit (ReLU)

$$f_{ReLU}(x) = \begin{cases} 0 & \text{if } x \leq 0 \\ x & \text{if } x > 0 \end{cases} \;\in [0, \infty) \tag{4.4}$$

Definition 4.1.4. Exponential Linear Unit (ELU)

$$f_{ELU}(x) = \begin{cases} \alpha(e^x - 1) & \text{if } x \leq 0 \\ x & \text{if } x > 0 \end{cases} \;\in (-\alpha, \infty) \tag{4.5}$$

where $\alpha > 0$. The value of $\alpha$ is usually chosen to be 1 for most applications.

Definition 4.1.5. Linear or identity

$$f_{linear}(x) = x \;\in (-\infty, \infty) \tag{4.6}$$

4.1.6 Optimisation Algorithms and Learning Rate

The learning rate refers to the amount of change applied to the weights in each iteration of the training process. A high learning rate results in fast training speed, but there is a risk of divergence, whereas a low learning rate may slow down the training.

Common optimisation algorithms for training ANNs are Stochastic Gradient Descent (SGD) and its extensions, such as RMSProp and Adam. The SGD pseudocode is the following:

Algorithm 1 Stochastic Gradient Descent

Initialise a vector of random weights w and learning rate lr
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        w_new = w_current - lr * ∂Loss/∂w_current
    end for
end while

SGD is popular because it is simple to implement and can provide good results for most applications. In SGD, the learning rate lr is fixed throughout the process. This is found to be an issue because the appropriate learning rate is difficult to determine: selecting a high learning rate is prone to divergence, whereas using a low learning rate can result in a slow training process. Hence, improvements have been made to SGD to make the learning rate adaptive. Root Mean Square Propagation (RMSProp) is one of the extensions that allows the learning rate to vary throughout the training process (Hinton et al., 2015). It is a method that divides the learning rate by an exponential moving average of past squared gradients. The author suggests the exponential decay rate b = 0.9, and its pseudocode is presented below.


that the learning rate to be variable throughout the training process (Hintonet al 2015) It is a method that divides the learning rate by the exponen-tial moving average of past gradients The author suggests the exponentialdecay rate to be b = 09 and its pseudocode is presented below

Algorithm 2 Root Mean Square Propagation (RMSProp)

Initialise a vector of random weights w, learning rate lr, v = 0 and b = 0.9
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        v_current = b * v_previous + (1 - b) * [∂Loss/∂w_current]²
        w_new = w_current - lr / sqrt(v_current + ε) * ∂Loss/∂w_current
    end for
end while

where ε is a small number to prevent division by zero.

A further improved version, called Adaptive Moment Estimation (Adam), was proposed by (Kingma et al., 2015). In RMSProp, the algorithm adapts the learning rate using the second moment of the gradients only; Adam improves on this by also using an exponential moving average of the first moment of the gradients. An advantage of Adam is that it can handle sparse gradients on noisy data. The pseudocode of Adam is given below.

Algorithm 3 Adaptive Moment Estimation (Adam)

Initialise a vector of random weights w, learning rate lr, v = 0, m = 0, b = 0.9 and b1 = 0.999
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        m_current = b * m_previous + (1 - b) * ∂Loss/∂w_current
        v_current = b1 * v_previous + (1 - b1) * [∂Loss/∂w_current]²
        m̂_current = m_current / (1 - b^i)
        v̂_current = v_current / (1 - b1^i)
        w_new = w_current - lr / (sqrt(v̂_current) + ε) * m̂_current
    end for
end while

where b is the exponential decay rate for the first moment and b1 is the exponential decay rate for the second moment.
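As a concrete (non-Keras) illustration of Algorithm 3, the sketch below applies one Adam update per step to minimise a simple quadratic loss with a known gradient; the hyperparameter values follow the defaults quoted above, and the toy objective is our own choice.

import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b=0.9, b1=0.999, eps=1e-8):
    """One Adam update for weight vector w given gradient grad
    (b: first-moment decay, b1: second-moment decay, t: step count)."""
    m = b * m + (1 - b) * grad            # first moment (mean of gradients)
    v = b1 * v + (1 - b1) * grad**2       # second moment (squared gradients)
    m_hat = m / (1 - b**t)                # bias corrections
    v_hat = v / (1 - b1**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimise f(w) = ||w||^2, whose gradient is 2w.
rng = np.random.default_rng(0)
w = rng.standard_normal(5)
m = np.zeros(5)
v = np.zeros(5)
for t in range(1, 2001):
    w, m, v = adam_step(w, 2*w, m, v, t, lr=0.05)
print(np.abs(w).max())   # close to 0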

4.1.7 Epochs, Early Stopping and Batch Size

An epoch is one pass of the entire training data set through the ANN. Generally we want the number of epochs to be as high as possible; however, this may cause the ANN to over-fit. Early stopping is useful here to stop the training before the ANN becomes over-fit: it works by stopping the training process when the loss function shows no improvement. The batch size is the number of input samples processed before the model parameters are updated.

4.2 Feedforward Neural Networks

Feedforward Neural Networks are the most common type of ANN in practical applications. They are called feedforward because the training data only flows from the input layer to the output layer; there is no feedback loop within the network.

4.2.1 Deep Neural Networks

Deep Neural Networks (DNN) are Feedforward Neural Networks with more than one hidden layer; the depth here refers to the number of hidden layers. This is the type of ANN of our interest, and the following theorems provide justifications.

Theorem 4.2.1. (Hornik et al., 1989) Universal Approximation Theorem. Let $\mathcal{NN}^a_{d_{in},d_{out}}$ be the set of neural networks with activation function $f_a: \mathbb{R} \to \mathbb{R}$, input dimension $d_{in} \in \mathbb{N}$ and output dimension $d_{out} \in \mathbb{N}$. Then, if $f_a$ is continuous and non-constant, $\mathcal{NN}^a_{d_{in},d_{out}}$ is dense in the Lebesgue space $L^p(\mu)$ for all finite measures $\mu$.

Theorem 4.2.2. (Hornik et al., 1990) Universal Approximation Theorem for Derivatives. Let the function to be approximated be $F^* \in C^n$, the mapping of the neural network be $F: \mathbb{R}^{d_0} \to \mathbb{R}$, and $\mathcal{NN}^a_{d_{in},1}$ be the set of single-layer neural networks with activation function $f_a: \mathbb{R} \to \mathbb{R}$, input dimension $d_{in} \in \mathbb{N}$ and output dimension 1. Then, if the (non-constant) activation function is $f_a \in C^n(\mathbb{R})$, $\mathcal{NN}^a_{d_{in},1}$ arbitrarily approximates $F^*$ and all its derivatives up to order $n$.

Theorem 4.2.3. (Eldan et al., 2016) The Power of Depth of Neural Networks. There exists a simple function on $\mathbb{R}^d$ expressible by a small 3-layer feedforward neural network which cannot be approximated by any 2-layer network to more than a certain constant accuracy, unless its width is exponential in the dimension.

According to Theorem 4.2.2, it is crucial to have an activation function that is $n$-times differentiable if approximation of the $n$th derivative of the function $F^*$ is required. Theorem 4.2.3 motivates the choice of a deep network: it states that depth can improve the predictive capability of a network. However, it is worth noting that networks deeper than 4 hidden layers do not provide much performance gain (Ioffe et al., 2015).

4.3 Common Pitfalls and Remedies

Here we discuss some common pitfalls in ANN training and the corresponding remedies.


4.3.1 Loss Function Stagnation

During training, the loss function may display only infinitesimal improvements after each epoch.

To resolve this problem:

• Increase the batch size. This could potentially fix the issue, as it helps the ANN model learn the relationships between data samples and data labels better before updating the parameters.

• Increase the learning rate. Too small a learning rate can cause the ANN model to get stuck in local minima.

• Use a deeper network. According to the power of depth theorem, deeper networks tend to perform better than shallower ones. However, if the ANN model already has 4 hidden layers, then this method may not be suitable, as there is not much further performance gain (see Section 4.2.1).

• Use a different activation function for the output layer. Make sure the range of the output activation function is appropriate for the problem.

4.3.2 Loss Function Fluctuation

During training, the loss function may fluctuate from epoch to epoch instead of declining steadily. Potential fixes include:

• Increase the batch size. The ANN model might not be able to interpret the true relationships between input and output data before each update when the batch size is too small.

• Lower the initial learning rate.

4.3.3 Over-fitting

Over-fitting occurs when the ANN model performs well on training data but fails to generalise on unseen test data.

Potential fixes include:

• Reduce the complexity of the ANN model. Limit the number of hidden layers to 4, as (Eldan et al., 2016) shows not much advantage can be achieved beyond 4 hidden layers. Furthermore, experiment with narrower (fewer neurons) hidden layers.

• Introduce early stopping. Reduce the number of epochs if additional epochs show only little improvement.

• Regularisation. This can reduce the complexity of the loss function and speed up convergence.


• K-fold cross-validation. This works by splitting the training data into K equally sized portions: K-1 portions are used for training and 1 portion is used for validation. This is repeated K times, with a different portion used for validation each time.

4.3.4 Under-fitting

Under-fitting occurs when the ANN model fails to make sense of the relationships between data samples and labels.

Potential fixes include:

• Increase the size of the training data. Complex functional mappings require the ANN model to train on more data before it can learn the relationships well.

• Increase the depth of the model. It is very likely that the model is not deep enough for the problem. As a rule of thumb, limit the number of hidden layers to 3-4 for complex problems.

4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation

For ANN's application to implied volatility prediction, we apply the image-based implicit method proposed by (Horvath et al., 2019). It is allegedly a powerful and robust method for approximating implied volatility. Rather than using one implied volatility value as the network output, the image-based implicit method uses the entire implied volatility surface as the output. The implied volatility surface is represented by 10 × 10 pixels, with each pixel corresponding to one maturity and one strike price. The input data corresponding to one implied volatility surface output is all the model parameters except for strike price and maturity; these are implicitly learned by the network. This method offers the following benefits:

• More information is learned by the network with less input data. The information of neighbouring volatility points is incorporated in the training process.

• The network learns the mapping between model parameters and the implied volatility surface directly. The volatility smile and term structure are available simultaneously, and so it is more efficient than having only one single implied volatility value as output.

• The neighbouring volatility points can be numerically approximated almost instantly if required.
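In practice, the 100 network outputs O1, ..., O100 are simply reshaped onto the (maturity, strike) grid, as illustrated in Figure 4.2 below. A minimal sketch using the grid values adopted later in Chapter 5; the row-major ordering of the outputs is an assumption.

import numpy as np

strikes = np.array([0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4])
maturities = np.array([0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0])

# Hypothetical flat 100-dim network output, one row per maturity.
output = np.random.rand(100)
surface = output.reshape(len(maturities), len(strikes))
print(surface[3, 5])   # implied vol pixel at T = 0.8, K = 1.0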


FIGURE 4.2: The pixels of the implied volatility surface are represented by the outputs of the neural network.


Chapter 5

Data

In this chapter we discuss the data for our applications. We explain how the data is generated, along with some essential data pre-processing procedures.

Simulated data is chosen to train the ANN model mainly for regulatory reasons. One might argue that training the ANN with market data directly would yield better predictions; however, evidence for the stability and robustness of that approach is still lacking. Furthermore, there is no straightforward interpretation between inputs and outputs of an ANN model trained this way. The main advantage of training the ANN with simulated data is that the functional relationship is known, and so the model is tractable. This property is important, as tractable models tend to be more transparent, which is required by regulation.

For neural network computation, Python is the logical language choice as it has many powerful machine learning libraries such as Keras. To obtain good results, we require 5,000,000 rows of data for the ANN experiment. Hence we choose C++ instead of Python for the data generation process, as it is computationally faster. We also use the lightweight, header-only Python library pybind11, which can harmoniously integrate C++ and Python in one project (see Appendix A for a tutorial on pybind11).

5.1 Data Generation and Storage

The data generation process is done entirely in C++, mainly because of its execution speed. Initially we considered applying GPU parallel computing to speed up the process even further. After consulting with the supervisor, we decided to use the std::future class template instead. std::future belongs to the standard C++11 library and is very simple to implement compared to parallel computing. It allows the C++ program to run multiple functions asynchronously on different threads (see page 942 in (Duffy, 2018) for an example). Our machine contains 2 physical cores with 2 threads per physical core. This allows us to run 4 operations asynchronously without overhead costs and therefore, in theory, improves the data generation speed four-fold. Note that asynchronous programming is not meant to replace parallel computing completely; in cases where execution speed is absolutely critical, parallel computing should be used. Without asynchronous programming, the data generation process for 5,000,000 rows of Heston option price


data would take more than 8 hours; now it takes less than 2 hours to complete. We follow the steps outlined in Appendix A to wrap the C++ program using the pybind11 Python library. The wrapped C++ function can be called in Python just like any other Python library. The pybind11 library is conceptually very similar to the Boost library; the reason we choose pybind11 is that it is a lightweight, header-only library that is easy to use. This enables us to enjoy both the execution speed of C++ and access to the machine learning libraries of Python at the same time.

We follow the steps outline in Appendix A to wrap the C++ programusing the pybind11 Python library The wrapped C++ function can be calledin Python just like any other Python libraries The pybind11 library is verysimilar to the Boost library conceptually The reason we choose pybind11 isthat it is a lightweight header-only library that is easy to use This enables usto enjoy both the execution speed of C++ and the access to machine learninglibraries of Python at the same time

After the data is generated, it is stored in MySQL so that we do not need to re-generate the data each time we restart the computer or when the Python IDE crashes unexpectedly. We use the mysql-connector Python library to communicate with MySQL from Python.

5.1.1 Heston Call Option Data

We follow the method and theory described in Chapter 3 to compute the call option price under the Heston model. We only consider call options in this project. To train an ANN that can also predict put option prices, one can simply add an extra input feature that takes two values, 1 or -1, where 1 represents a call option and -1 represents a put option.

For this application we generated 5,000,000 rows of data, uniformly distributed over a specified range. In this context, one row of data contains all the input features and the corresponding output. We use 10 model parameters as the input features: spot price (S), strike price (K), risk-free rate (r), dividend yield (D), current (instantaneous) volatility (ν), maturity (T), long-term volatility (θ), mean reversion rate (κ), volatility of volatility (ξ) and correlation (ρ). In this case there is only one output, the call option price. See Table 5.1 for their respective ranges.

TABLE 5.1: Heston Model Call Option Data

Model Parameters                         Range
Input:
  Spot Price (S)                         ∈ [20, 80]
  Strike Price (K)                       ∈ [40, 55]
  Risk-free Rate (r)                     ∈ [0.03, 0.045]
  Dividend Yield (D)                     ∈ [0, 0.1]
  Instantaneous Volatility (ν)           ∈ [0.2, 0.6]
  Maturity (T)                           ∈ [0.03, 2.0]
  Long-term Volatility (θ)               ∈ [0.2, 1.0]
  Mean Reversion Rate (κ)                ∈ [0.1, 1.0]
  Volatility of Volatility (ξ)           ∈ [0.2, 0.5]
  Correlation (ρ)                        ∈ [-0.95, 0.95]
Output:
  Call Option Price (C)                  ∈ ℝ>0


5.1.2 Heston Implied Volatility Data

We generate the Heston implied volatility data with the theory explained in Chapter 3. For the ANN application in implied volatility approximation, we decided to use the image-based implicit method, in which the ANN output layer dimension is 100, with each output representing one pixel on the implied volatility surface.

Using this method, we need 100 times more data in order to have the same number of distinct rows as the normal point-wise method (e.g. Heston call option pricing) does. However, due to hardware constraints, we decided to generate 10,000,000 rows of uniformly distributed data; that translates to only 100,000 distinct rows of data.

We use 6 model parameters as the input features: spot price (S), current (instantaneous) volatility (ν), long-term volatility (θ), mean reversion rate (κ), volatility of volatility (ξ) and correlation (ρ).

Note that strike price and maturity are not used as input features, as mentioned in Chapter 4.4. This is because the shape of one full implied volatility surface depends on the 6 model parameters mentioned above; the strike price and maturity correspond to one specific point on the full implied volatility surface. Nevertheless, we still require these two parameters to generate the implied volatility data. We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

In this case there are 10 × 10 = 100 outputs, i.e. the implied volatility surface. See Table 5.2 for their respective ranges.

TABLE 5.2: Heston Model Implied Volatility Data

Model Parameters                         Range
Input:
  Spot Price (S)                         ∈ [0.2, 2.5]
  Instantaneous Volatility (ν)           ∈ [0.2, 0.6]
  Long-term Volatility (θ)               ∈ [0.2, 1.0]
  Mean Reversion Rate (κ)                ∈ [0.1, 1.0]
  Volatility of Volatility (ξ)           ∈ [0.2, 0.5]
  Correlation (ρ)                        ∈ [-0.95, 0.95]
Output:
  Implied Volatility Surface (σimp)      ∈ ℝ^(10×10)>0

5.1.3 Rough Heston Call Option Data

Again, we follow the theory in Chapter 3 and implement it in C++ to generate call option prices under the rough Heston model.

Similarly, we only consider call options here and generate 5,000,000 rows of data. We use 9 model parameters as the input features: strike price (K), risk-free rate (r), current (instantaneous) volatility (ν), maturity (T), long-term volatility (θ), mean reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). In this case there is only one output, the call option price. See Table 5.3 for their respective ranges.

TABLE 5.3: Rough Heston Model Call Option Data

Model Parameters                         Range
Input:
  Strike Price (K)                       ∈ [0.5, 5]
  Risk-free Rate (r)                     ∈ [0.2, 0.5]
  Instantaneous Volatility (ν)           ∈ [0, 1.0]
  Maturity (T)                           ∈ [0.1, 3.0]
  Long-term Volatility (θ)               ∈ [0.01, 0.1]
  Mean Reversion Rate (κ)                ∈ [1.0, 4.0]
  Volatility of Volatility (ξ)           ∈ [0.01, 0.5]
  Correlation (ρ)                        ∈ [-0.95, 0.95]
  Hurst Parameter (H)                    ∈ [0.05, 0.1]
Output:
  Call Option Price (C)                  ∈ ℝ>0

5.1.4 Rough Heston Implied Volatility Data

Finally, we generate the rough Heston implied volatility data with the theory explained in Chapter 3. As before, we use the image-based implicit method. The data size is 10,000,000 rows, which translates to 100,000 distinct rows of data. We use 7 model parameters as the input features: spot price (S), current (instantaneous) volatility (ν), long-term volatility (θ), mean reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

As before, there are 10 × 10 = 100 outputs, i.e. the rough Heston implied volatility surface. See Table 5.4 for their respective ranges.

TABLE 5.4: Rough Heston Model Implied Volatility Data

Model Parameters                         Range
Input:
  Spot Price (S)                         ∈ [0.2, 2.5]
  Instantaneous Volatility (ν)           ∈ [0.2, 0.6]
  Long-term Volatility (θ)               ∈ [0.2, 1.0]
  Mean Reversion Rate (κ)                ∈ [0.1, 1.0]
  Volatility of Volatility (ξ)           ∈ [0.2, 0.5]
  Correlation (ρ)                        ∈ [-0.95, 0.95]
  Hurst Parameter (H)                    ∈ [0.05, 0.1]
Output:
  Implied Volatility Surface (σimp)      ∈ ℝ^(10×10)>0


5.2 Data Pre-processing

In this section we discuss some necessary data pre-processing procedures: data splitting for validation and evaluation (testing), data standardisation and normalisation, and data partitioning for K-fold cross validation.

5.2.1 Data Splitting: Train - Validate - Test

Generally, not 100% of the available data is used for training; we reserve some of the data for validation and for performance evaluation.

We split the data into three sets: 70% (train), 15% (validate) and 15% (test). The first 70% of the data is used for training, whilst 15% of the data is used for validation during training. The last 15% is for evaluating the trained ANN's predictive accuracy.

The validation data set is used for tuning the hyperparameters of the ANN. This is important to help prevent over-fitting of the network. The test data set is not used during training; it is the unseen (out-of-sample) data used for evaluating the trained ANN's predictions.

5.2.2 Data Standardisation and Normalisation

Before the data is fed into the ANN for training, it is absolutely essential to scale the data to speed up convergence and improve performance.

The magnitude of each data feature is different. It is essential to re-scale the data so that the ANN model will not misinterpret the relative importance of features based on their magnitudes. Typical scaling techniques include standardisation and normalisation.

Data standardisation is defined as

$$x_{scaled} = \frac{x - x_{mean}}{x_{std}} \tag{5.1}$$

where

• $x$ is the data

• $x_{mean}$ is the mean of that data feature

• $x_{std}$ is the standard deviation of that data feature

(Horvath et al., 2019) claim that this works especially well for quantities that are positively unbounded, such as implied volatility.

For the other model parameters, we apply data normalisation, which is defined as

$$x_{scaled} = \frac{2x - (x_{max} + x_{min})}{x_{max} - x_{min}} \in [-1, 1] \tag{5.2}$$

where

• $x$ is the data

• $x_{max}$ is the maximum value of that data feature

• $x_{min}$ is the minimum value of that data feature

After data scaling (standardisation or normalisation), the ANN is able to learn the true relationships between inputs and outputs without bias from the differences in their magnitudes.
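A minimal sketch of equations (5.1) and (5.2) applied column-wise to a feature matrix follows; the random input matrix is only for demonstration.

import numpy as np

def standardise(x):
    """Equation (5.1): zero mean, unit variance per feature (column)."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def normalise(x):
    """Equation (5.2): maps each feature to [-1, 1]."""
    xmax, xmin = x.max(axis=0), x.min(axis=0)
    return (2*x - (xmax + xmin)) / (xmax - xmin)

X = np.random.default_rng(0).uniform(20, 80, size=(1000, 3))
print(normalise(X).min(), normalise(X).max())   # -1.0, 1.0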

5.2.3 Data Partitioning

In this section we explain how the data is partitioned for K-fold cross validation.

First, we shuffle the entire training data set. Then the data set is divided into K equally sized portions. For each portion, we use it as the test data set and train the ANN on the other K-1 portions. This is repeated K times, with each of the K portions used exactly once as the test data set. The average of the K results is the performance measure of the model.
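This partitioning scheme can be reproduced with scikit-learn's KFold; a minimal sketch on a toy feature matrix (the data here is only for demonstration):

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)   # toy feature matrix

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    # each row index appears exactly once in a test fold across the 5 splits
    print(f"Fold {fold}: train={train_idx}, test={test_idx}")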

FIGURE 5.1: How the data is partitioned for 5-fold cross validation. Adapted from (Chapter 2 Modeling Process: k-fold cross validation).


Chapter 6

Neural Networks Architecture and Experimental Setup

We start this chapter by discussing the appropriate ANN architecture and experimental setup for our applications. We apply ANN to four different applications: option pricing under the Heston model, Heston implied volatility surface approximation, option pricing under the rough Heston model, and rough Heston implied volatility surface approximation. We first apply ANN to the Heston model as a concept validation and then proceed to the rough Heston model.

Subsequently, we discuss the relevant evaluation metrics and validation methods for our ANN models. Finally, we end this chapter with the implementation steps for ANN in Keras.

6.1 Neural Networks Architecture

Configuring an ANN's architecture is a combination of art and science. It requires experience and systematic experimentation to decide the right parameters and architecture for a specific problem; for a particular problem there might be multiple designs that are appropriate. To begin with, we use the network architecture of (Horvath et al., 2019) and adjust the hyperparameters where it performs poorly for our applications.


FIGURE 6.1: Typical ANN Architecture for the Option Pricing Application


FIGURE 6.2: Typical ANN Architecture for Implied Volatility Surface Approximation

6.1.1 ANN Architecture: Option Pricing under Heston Model

An ANN with 3 hidden layers is appropriate for this application. Each hidden layer has 50 neurons with the exponential linear unit (ELU) as the activation function.

Initially, the rectified linear unit (ReLU) activation function was considered. It is the choice for many applications as it can overcome the vanishing gradient problem whilst being less computationally expensive than other activation functions. However, it has some major drawbacks:

• The nth derivative of ReLU is not continuous for all n > 0.

• ReLU cannot produce negative outputs.

• For activations x < 0, ReLU has zero gradient.

From Theorem 4.2.2, we know that choosing an n-differentiable activation function is necessary for computing the nth derivatives of the functional mapping. This is crucial for our application because we would be interested in option Greeks computation using the same ANN if it shows promising results for option pricing. For risk management purposes, selecting an activation function for which option Greeks can be calculated is essential.

The obvious alternative is the ELU (defined in Chapter 4.1.5). For x < 0 it has a smooth output that approaches -α, and for x > 0 the output of ELU is positively unbounded, which is suitable for our application, as the values of option price and implied volatility are, in theory, also positively unbounded.

Since our problem is a regression problem, a linear activation function (defined in Chapter 4.1.5) is appropriate for the output layer as its output is unbounded. We discover that a batch size of 200 and 30 epochs yield good accuracy.

To summarise, the ANN model for Heston option pricing is:

• Input Layer: 10 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function

6.1.2 ANN Architecture: Implied Volatility Surface of Heston Model

For approximating the full implied volatility surface of the Heston model, we apply the image-based implicit method (see Chapter 4.4). This implies the output dimension is 100.

We use 3 hidden layers with 80 neurons each. The activation function for the hidden layers is the exponential linear unit (ELU); as before, we use a linear activation function for the output layer. We discover that a batch size of 20 and 200 epochs yield good accuracy. Since the number of epochs is relatively high, we introduce an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

To summarise, the ANN model for Heston implied volatility surface approximation is:

• Input Layer: 6 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function


6.1.3 ANN Architecture: Option Pricing under Rough Heston Model

For option pricing under the rough Heston model, the ANN model is:

• Input Layer: 9 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function

As before, we use a batch size of 200 and 30 epochs.

6.1.4 ANN Architecture: Implied Volatility Surface of Rough Heston Model

The ANN model for rough Heston implied volatility surface approximation is:

• Input Layer: 7 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function

We use a batch size of 20 and 200 epochs, with an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

6.2 Performance Metrics and Validation Methods

We require some metrics to evaluate the performance of the ANN in terms of accuracy and speed.

For accuracy, we have introduced the MSE and MAE in Chapter 4.1.4. It is also common to use the coefficient of determination ($R^2$) to quantify the performance of a regression model. It is a statistical measure of the proportion of the variance in the dependent variable that is explained by the independent variables. The formula is

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_{i,actual} - y_{i,predict}\right)^2}{\sum_{i=1}^{n}\left(y_{i,actual} - \bar{y}_{actual}\right)^2} \tag{6.1}$$


We also suggest a user-defined accuracy metric, the Maximum Absolute Error. It is defined as

$$\text{Max Abs Error} = \max_i\left(|y_{i,actual} - y_{i,predict}|\right) \tag{6.2}$$

This metric provides a measure of how large the error could be in the worst-case scenario. It is especially important from a risk management perspective.

As for the run-time performance of the ANN, we simply use the Python built-in timer function to quantify this property.

In Chapter 5.2.3 we explained the data partitioning process. The motivation for partitioning the data is K-fold cross validation. Cross validation is a statistical technique to assess how well a predictive model generalises to an independent data set, and it is a good tool for identifying over-fitting.

As explained earlier, the process involves training and testing different permutations of the partitioned data sets multiple times. The indication of a well-generalised model is that the error magnitudes in each round are consistent with each other and with their average. We select K = 5 for our applications.


6.3 Implementation of ANN in Keras

In this section we outline the standard ANN computation process using Keras.

1. Construct the ANN model by specifying the number of input neurons, number of hidden neurons, number of hidden layers, number of output neurons and the activation functions for each layer. Print the model summary to check for errors.

2. Configure the settings of the model using the compile function. We select the MSE as the loss function, and we specify MAE, R2 and our user-defined maximum absolute error as the metrics. For all of our applications we choose Adam as the optimisation algorithm. To prevent over-fitting, we use the early stopping function, especially when the number of epochs is high. It is important to note that the validation loss (not the training loss) should be used as the early stopping criterion; this means the training will stop early if there is no reduction in validation loss.

3. Start training the ANN model. It is important to specify the validation split argument here for splitting the training data into a train set and a validation set. Note that the test set should not be used as the validation set; it should be reserved for evaluating the ANN's predictions after training.

4. When the training is complete, generate the plot of MSE (loss function) versus number of epochs to visualise the training process. The MSE should decline gradually and then level off as the number of epochs increases. See Chapter 4.3 for potential training issues and remedies.

5. Apply 5-fold cross validation to check for over-fitting. Evaluate the trained model using the evaluate function.

6. Generate predictions. Visualise the predictions by plotting predicted values versus actual values on the same graph.

See Appendix C for Python code example
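As a condensed illustration of steps 1-3 only, the following is a minimal sketch for the Heston option pricing network of Chapter 6.1.1; the random arrays stand in for the generated data set, and the metric list is abbreviated (the R2 and maximum absolute error metrics are added analogously as custom functions).

import numpy as np
from tensorflow import keras

# Stand-in data: 10 scaled Heston inputs -> 1 call price.
X = np.random.rand(10000, 10)
y = np.random.rand(10000, 1)

# Step 1: construct the model and print its summary.
model = keras.Sequential([
    keras.layers.Dense(50, activation="elu", input_shape=(10,)),
    keras.layers.Dense(50, activation="elu"),
    keras.layers.Dense(50, activation="elu"),
    keras.layers.Dense(1, activation="linear"),
])
model.summary()

# Step 2: compile with MSE loss and the Adam optimiser.
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=25)

# Step 3: train with a validation split; the test set is kept separate.
history = model.fit(X, y, batch_size=200, epochs=30,
                    validation_split=0.15, callbacks=[early_stop])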


6.4 Summary of Neural Networks Architecture

TABLE 6.1: Summary of Neural Networks Architecture for Each Application

ANN Model                            | Application                             | Input Layer | Hidden Layers         | Output Layer | Batch Size | Epochs
Heston Option Pricing ANN            | Heston Call Option Pricing              | 10 neurons  | 3 layers × 50 neurons | 1 neuron     | 200        | 30
Heston Implied Volatility ANN        | Heston Implied Volatility Surface       | 6 neurons   | 3 layers × 80 neurons | 100 neurons  | 20         | 200
Rough Heston Option Pricing ANN      | Rough Heston Call Option Pricing        | 9 neurons   | 3 layers × 50 neurons | 1 neuron     | 200        | 30
Rough Heston Implied Volatility ANN  | Rough Heston Implied Volatility Surface | 7 neurons   | 3 layers × 80 neurons | 100 neurons  | 20         | 200

Chapter 7

Results and Discussions

In this chapter we present our results and discuss our findings.

7.1 Heston Option Pricing ANN Results

Figure 7.1 shows the learning curves of the Heston Option Pricing ANN; the loss function used here is the MSE. As the ANN is trained, both the training loss and the validation loss decline until they converge. It took fewer than 30 epochs for the ANN to reach a satisfactory level of accuracy. There is only a small gap between the training loss and the validation loss, which indicates the ANN model is well trained.

FIGURE 7.1: Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN

Table 7.1 summarises the error metrics of the Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 2.00 × 10⁻⁶, 9.58 × 10⁻⁴, 6.50 × 10⁻³ and 0.999 respectively. The low error metrics and high R² value imply that the ANN model generalises well and is able to give highly accurate predictions for option prices under the Heston model. Figure 7.2 attests to its high accuracy: almost all of the predictions lie on the perfect-prediction line (red line).

TABLE 7.1: Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data

MSE         | MAE         | Max Abs Error | R²
2.00 × 10⁻⁶ | 9.58 × 10⁻⁴ | 6.50 × 10⁻³   | 0.999

FIGURE 7.2: Plot of Predicted vs Actual Values for Heston Option Pricing ANN

TABLE 7.2: 5-fold Cross Validation Results of Heston Option Pricing ANN

        | MSE         | MAE         | Max Abs Error
Fold 1  | 5.78 × 10⁻⁷ | 5.45 × 10⁻⁴ | 2.04 × 10⁻³
Fold 2  | 9.90 × 10⁻⁷ | 7.69 × 10⁻⁴ | 2.40 × 10⁻³
Fold 3  | 7.15 × 10⁻⁷ | 6.21 × 10⁻⁴ | 2.20 × 10⁻³
Fold 4  | 1.24 × 10⁻⁶ | 8.67 × 10⁻⁴ | 2.53 × 10⁻³
Fold 5  | 6.54 × 10⁻⁷ | 5.79 × 10⁻⁴ | 2.21 × 10⁻³
Average | 8.36 × 10⁻⁷ | 6.76 × 10⁻⁴ | 2.27 × 10⁻³

Table 7.2 shows the results from 5-fold cross validation, which again confirm that the ANN model can give accurate predictions and generalises well to unseen data. The values of each fold are consistent with each other and with their average values.


7.2 Heston Implied Volatility ANN Results

Figure 7.3 shows the learning curves of the Heston Implied Volatility ANN; again, we use the MSE as the loss function. We can see that the validation loss curve experiences some fluctuations during training. We experimented with various ANN setups and this is the least fluctuating learning curve we could achieve. Despite the fluctuations, the overall value of the validation loss declines to a satisfactorily low level, and the gap between validation loss and training loss is negligible. Due to the comparatively more complex ANN architecture, it took about 200 epochs for the ANN to reach a satisfactory level of accuracy.

FIGURE 7.3: Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN

Table 7.3 summarises the error metrics of the Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 6.24 × 10⁻⁴, 1.27 × 10⁻², 3.71 × 10⁻¹ and 0.999 respectively. The low error metrics and high R² value imply that the ANN model generalises well and is able to give accurate predictions for the implied volatility surface under the Heston model. Figure 7.4 shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities. We can see that almost all of the predicted values are consistent with the actual values.

TABLE 7.3: Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data

MSE         | MAE         | Max Abs Error | R²
6.24 × 10⁻⁴ | 1.27 × 10⁻² | 3.71 × 10⁻¹   | 0.999


FIGURE 7.4: Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.5: Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.5 shows the MAE values of the Heston Implied Volatility ANN predictions across the entire implied volatility surface. The largest MAE value is 0.0045, and a large area of the surface has MAE values lower than 0.0030. It is worth noting that the accuracy is relatively poor when the strike price is at the extremes and the maturity is small. However, the ANN gives good predictions overall for the whole surface.

The 5-fold cross validation results for the Heston Implied Volatility ANN model show consistent values. The results are summarised in Table 7.4. The consistent values indicate that the ANN model is well-trained and is able to generalise to unseen test data.

TABLE 7.4: 5-fold Cross Validation Results of Heston Implied Volatility ANN

        | MSE         | MAE         | Max Abs Error
Fold 1  | 8.15 × 10⁻⁴ | 1.13 × 10⁻² | 3.88 × 10⁻¹
Fold 2  | 4.74 × 10⁻⁴ | 1.08 × 10⁻² | 3.13 × 10⁻¹
Fold 3  | 1.37 × 10⁻³ | 1.74 × 10⁻² | 4.46 × 10⁻¹
Fold 4  | 3.38 × 10⁻⁴ | 8.61 × 10⁻³ | 2.19 × 10⁻¹
Fold 5  | 4.60 × 10⁻⁴ | 1.02 × 10⁻² | 2.67 × 10⁻¹
Average | 6.92 × 10⁻⁴ | 1.12 × 10⁻² | 3.27 × 10⁻¹

7.3 Rough Heston Option Pricing ANN Results

Figure 7.6 shows the learning curves of the Rough Heston Option Pricing ANN. As before, the MSE is used as the loss function. As the training progresses, both the training loss and the validation loss decline until they converge. Since this is a relatively simple model, it took fewer than 30 epochs of training to reach a good level of accuracy. The small discrepancy between the training loss and the validation loss indicates the ANN model is well-trained.

FIGURE 7.6: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN

Table 7.5 summarises the error metrics of the Rough Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 9.00 × 10⁻⁶, 2.28 × 10⁻³, 1.76 × 10⁻¹ and 0.999 respectively. These metrics imply that the ANN model generalises well and is able to give highly accurate predictions for option prices under the rough Heston model. Figure 7.7 attests to its high accuracy again: most of the predictions lie on the perfect-prediction line (red line).

TABLE 7.5: Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data

MSE         | MAE         | Max Abs Error | R²
9.00 × 10⁻⁶ | 2.28 × 10⁻³ | 1.76 × 10⁻¹   | 0.999


FIGURE 7.7: Plot of Predicted vs Actual Values for Rough Heston Pricing ANN

TABLE 7.6: 5-fold Cross Validation Results of Rough Heston Option Pricing ANN

        | MSE         | MAE         | Max Abs Error
Fold 1  | 3.79 × 10⁻⁶ | 1.35 × 10⁻³ | 5.99 × 10⁻³
Fold 2  | 2.33 × 10⁻⁶ | 9.96 × 10⁻⁴ | 4.89 × 10⁻³
Fold 3  | 2.06 × 10⁻⁶ | 9.88 × 10⁻⁴ | 4.41 × 10⁻³
Fold 4  | 2.16 × 10⁻⁶ | 1.08 × 10⁻³ | 4.07 × 10⁻³
Fold 5  | 1.84 × 10⁻⁶ | 1.01 × 10⁻³ | 3.74 × 10⁻³
Average | 2.44 × 10⁻⁶ | 1.08 × 10⁻³ | 4.62 × 10⁻³

The highly consistent values from 5-fold cross validation again confirm that the ANN model is able to generalise to unseen data.

7.4 Rough Heston Implied Volatility ANN Results

Figure 7.8 shows the learning curves of the Rough Heston Implied Volatility ANN. We can see that the validation loss curve experiences some fluctuations during training; again, we select the least fluctuating setup. Despite the fluctuations, the overall value of the validation loss declines to a satisfactorily low level, and the discrepancy between validation loss and training loss is negligible. Initially the number of training epochs was 200, but training was stopped early at around 85 epochs since the validation loss had not improved for 25 epochs. It achieves a good accuracy nonetheless.


FIGURE 7.8: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN

Table 7.7 summarises the error metrics of the Rough Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 5.66 × 10⁻⁴, 1.08 × 10⁻², 3.49 × 10⁻¹ and 0.999 respectively. These metrics show that the ANN model can produce accurate predictions on unseen data. This is again confirmed by Figure 7.9, which shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities. We can see that almost all of the predicted values are consistent with the actual values.

TABLE 7.7: Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data

MSE         | MAE         | Max Abs Error | R²
5.66 × 10⁻⁴ | 1.08 × 10⁻² | 3.49 × 10⁻¹   | 0.999


FIGURE 7.9: Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.10: Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.10 shows the MAE values of the Rough Heston Implied Volatility ANN predictions across the entire implied volatility surface. The largest MAE value is below 0.0030, and a large area of the surface has MAE values lower than 0.0015. It is worth noting that the accuracy is relatively poor when the strike price and maturity values are small. However, the ANN gives good predictions overall for the whole surface. In fact, it is more accurate than the Heston Implied Volatility ANN model despite having one extra input and a similar architecture; we think this is entirely due to the stochastic nature of the optimiser.

The 5-fold cross validation results for the Rough Heston Implied Volatility ANN model show consistent values, just like the previous three cases. The results are summarised in Table 7.8. The consistency in values indicates that the ANN model is well-trained and is able to generalise to unseen test data.

TABLE 7.8: 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN

        | MSE         | MAE         | Max Abs Error
Fold 1  | 8.15 × 10⁻⁴ | 1.13 × 10⁻² | 3.88 × 10⁻¹
Fold 2  | 4.74 × 10⁻⁴ | 1.08 × 10⁻² | 3.13 × 10⁻¹
Fold 3  | 1.37 × 10⁻³ | 1.74 × 10⁻² | 4.46 × 10⁻¹
Fold 4  | 3.38 × 10⁻⁴ | 8.61 × 10⁻³ | 2.19 × 10⁻¹
Fold 5  | 4.60 × 10⁻⁴ | 1.02 × 10⁻² | 2.67 × 10⁻¹
Average | 6.92 × 10⁻⁴ | 1.12 × 10⁻² | 3.27 × 10⁻¹


7.5 Run-time Performance

In this section we present the training time and the execution speed of the ANN versus the traditional methods. The speed tests are run on a machine with an Intel i5-7200U chip (up to 3.10 GHz) and 8 GB of RAM.

Table 7.9 summarises the training time required for each application. Although the training time per epoch for the option pricing applications is higher than that for the implied volatility applications, the resulting total training time is similar for both types of application, because the number of epochs for training the implied volatility ANN models is higher (30 s per epoch × 30 epochs = 900 s for option pricing, versus 5 s per epoch × 200 epochs = 1000 s for implied volatility).

We also emphasise that the number of distinct rows of data available for implied volatility ANN training is smaller. Thus, the training time is similar to that of the option pricing ANN training despite the more complex ANN architecture.

TABLE 7.9: Training Time

Application                 | Training Time per Epoch | Total Training Time
Heston Option Pricing       | 30 s                    | 900 s
Heston Implied Volatility   | 5 s                     | 1000 s
rHeston Option Pricing      | 30 s                    | 900 s
rHeston Implied Volatility  | 5 s                     | 1000 s

TABLE 7.10: Execution Time for Predictions using ANN and Traditional Methods

Application                 | Traditional Methods | ANN
Heston Option Pricing       | 8.84 ms             | 57.9 ms
Heston Implied Volatility   | 2.31 ms             | 43.7 ms
rHeston Option Pricing      | 6.52 ms             | 52.6 ms
rHeston Implied Volatility  | 2.87 ms             | 48.3 ms

Table 7.10 summarises the execution time for generating predictions using the ANN and the traditional methods. Before the experiment we expected the ANN to generate predictions faster. However, the results show that the ANN is actually up to 7 times slower than the traditional methods for option pricing (both Heston and rough Heston), and up to 18 times slower than the traditional methods for implied volatility surface approximation (both Heston and rough Heston). This shows the traditional approximation methods are more efficient.

Chapter 8

Conclusions and Outlook

To summarise, the results show that the ANN has the ability to approximate option prices and implied volatilities under the Heston and rough Heston models to a high degree of accuracy. However, the efficiency of the ANN is not as good as that of the traditional methods; hence we conclude that such an approach may be considered an ineffective alternative. There are more efficient methods, such as the numerical integration of the FFT for option price computation, and the traditional method we used for approximating implied volatility does not even require any numerical schemes, hence its efficiency.

As for future projects, it would be interesting to investigate applications of the proven rough Bergomi model with exotic options such as lookback and digital options. Moreover, it would also be worthwhile to apply gradient-free optimisers for ANN training; some interesting examples include the differential evolution, dual annealing and simplicial homology global optimisers.

Appendix A

pybind11: A step-by-step tutorial with examples

This tutorial assumes basic knowledge of C++ and Python.

Sources: Create a C++ extension for Python; Using the C++ eigen library to calculate matrix inverse and determinant; pybind11 Documentation; Eigen: The Matrix Class.

A.1 Software Prerequisites

• Visual Studio 2017 or later, with both the Desktop Development with C++ and Python Development workloads installed with default options.

• In the Python Development workload, also select the box on the right for Python native development tools. This option sets up most of the configuration required. (This option also includes the C++ workload automatically.)

Source: Create a C++ extension for Python


A.2 Create a Python Project

1. To create a Python project in Visual Studio, select File > New > Project. Search for Python, select the Python Application template, give it a suitable name and location, and select OK.

2. pybind11 requires that you use a 32-bit Python interpreter (Python 3.6 or above recommended). In the Solution Explorer window of Visual Studio, expand the project node, then expand the Python Environments node. If you don't see a 32-bit environment as the default (either in bold or labeled with "global default"), then right-click the Python Environments node and select Add Environment, or select Add Environment from the environment drop-down in the Python toolbar.

Once in the Add Environment dialog box, select the Existing environment tab, then select the 32-bit interpreter from the Environment drop-down list.

If you don't have a 32-bit interpreter installed, you can install standard Python interpreters from the Add Environment dialog: select the Add Environment command in the Python Environments window or the Python toolbar, select the Python installation tab, indicate which interpreters to install, and select Install.


If you already added an environment other than the global default to a project, you may need to activate a newly added environment: right-click that environment under the Python Environments node and select Activate Environment. To remove an environment from the project, select Remove.


A.3 Create a C++ Project

1. A Visual Studio solution can contain both Python and C++ projects together. To do so, right-click the solution in Solution Explorer and select Add > New Project.

2. Search on "C++", select Empty project, specify the name "test", and select OK.

3. Create a C++ file in the new project by right-clicking the Source Files node, then select Add > New Item, select C++ File, name it main.cpp, and select OK.

4. Right-click the C++ project in Solution Explorer and select Properties.

5. At the top of the Property Pages dialog that appears, set Configuration to All Configurations and Platform to Win32.


6. Set the specific properties as described in the following table, then select OK.

7. Right-click the C++ project and select Build to test your configurations (both Debug and Release). The .pyd files are located in the solution folder under Debug and Release, not the C++ project folder itself.

8. Add the following code to the C++ project's main.cpp file:

    #include <iostream>

    int main() {
        std::cout << "Hello World from C++" << std::endl;
        return 0;
    }

9. Build the C++ project again to confirm the code is working.

A.4 Convert the C++ project to an extension for Python

To make the C++ DLL into an extension for Python, first modify the exported methods to interact with Python types. Then add a function that exports the module (the main function).

1. Install pybind11 using pip:

    pip install pybind11

or

    py -m pip install pybind11

2. At the top of main.cpp, include pybind11.h:

    #include <pybind11/pybind11.h>


3. At the bottom of main.cpp, use the PYBIND11_MODULE macro to define the entry point to the C++ function:

    namespace py = pybind11;

    PYBIND11_MODULE(test, m) {
        m.def("main_func", &main, R"pbdoc(
            pybind11 simple example
        )pbdoc");

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

4. Set the target configuration to Release and build the C++ project to verify your code.

The C++ module may fail to compile for the following reasons:

• Unable to locate Python.h (E1696: cannot open source file "Python.h" and/or C1083: Cannot open include file: "Python.h": No such file or directory): verify that the path in C/C++ > General > Additional Include Directories in the project properties points to your Python installation's include folder. See the table in Step 6.

• Unable to locate Python libraries: verify that the path in Linker > General > Additional Library Directories in the project properties points to the Python installation's libs folder. See the table in Step 6.

• Linker errors related to target architecture: change the C++ target's project architecture to match that of your Python installation. For example, if you're targeting x64 with the C++ project but your Python installation is x86, change the C++ project to target x86.

A.5 Make the DLL available to Python

1. In Solution Explorer, right-click the References node in the Python project, and then select Add Reference. In the dialog that appears, select the Projects tab, select the test project, and then select OK.


2. Run the Visual Studio installer, select Modify, then select Individual Components > Compilers, build tools, and runtimes > Visual C++ 2015.3 v140 toolset. This step is necessary because Python (for Windows) is itself built with Visual Studio 2015 (version 14.0) and expects that those tools are available when building an extension through the method described here. (Note that you may need to install a 32-bit version of Python and target the DLL to Win32, not x64.)


3. Create a file named setup.py in the C++ project by right-clicking the project and selecting Add > New Item. Then select C++ File (.cpp), name the file setup.py, and select OK (naming the file with the .py extension makes Visual Studio recognize it as Python despite using the C++ file template).

Then paste the following code into the setup.py file:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add pybind11 folder and Python include folder
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

4. The setup.py code instructs Python to build the extension using the Visual Studio 2015 C++ toolset when used from the command line.

Open an elevated command prompt, navigate to the folder that contains setup.py, and enter the following command:

    python setup.py install

A.6 Call the DLL from Python

After the DLL is made available to Python, we can now call test.main_func() from Python code.

1. Add the following lines in your .py file to call methods exported from the DLL:


    import test

    if __name__ == "__main__":
        test.main_func()

2. Run the Python program (Debug > Start without Debugging). The output should look like the following:

A.7 Example: Store C++ Eigen Matrix as NumPy Arrays

This example requires the use of Eigen, a C++ template library. Due to its popularity, pybind11 provides transparent conversion and limited mapping support between Eigen and scientific Python linear algebra data types.

1. If you have not already, visit http://eigen.tuxfamily.org/index.php?title=Main_Page and click on "zip" to download the zip file. Once the download is complete, decompress the zip file in a suitable location and note down the file path containing the Eigen folders.


2. In the Solution Explorer, right-click the C++ project and select Properties. Navigate to the C/C++ > General tab, add the Eigen path to the Additional Include Directories property, then select OK.

3. Right-click the C++ project and select Build to test your configurations (both Debug and Release).

4. Replace the code in the main.cpp file with the following code:

    #include <iostream>

    // pybind11
    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>

    // Eigen
    #include <Eigen/Dense>

    namespace py = pybind11;

    Eigen::Matrix3f example_mat(const int a, const int b, const int c) {
        Eigen::Matrix3f mat;
        mat << 5 * a, 1 * b, 2 * c,
               2 * a, 4 * b, 2 * c,
               1 * a, 3 * b, 3 * c;
        return mat;
    }

    Eigen::MatrixXd inv(Eigen::MatrixXd xs) {
        return xs.inverse();
    }

    double det(Eigen::MatrixXd xs) {
        return xs.determinant();
    }

    PYBIND11_MODULE(test, m) {
        m.def("example_mat", &example_mat);
        m.def("inv", &inv);
        m.def("det", &det);

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

Note that Eigen arrays are automatically converted to NumPy arrays simply by including the pybind11/eigen.h header (see the includes at the top of the listing in Step 4). A C++ function that has an Eigen return type can be used directly as a NumPy data type in Python. Build the C++ project to check the code is working correctly.

5. Include the Eigen library's directory. Replace the code in setup.py with the following code:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add pybind11 folder, Python include folder and Eigen folder
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                      r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

As before, in an elevated command prompt, navigate to the folder that contains setup.py and enter the following command:

    python setup.py install

6. Then paste the following code in the Python file:

    import test
    import numpy as np

    if __name__ == "__main__":
        A = test.example_mat(1, 3, 1)
        print("Matrix A is\n", A)
        print("The inverse of A is\n", test.inv(A))
        print("The determinant of A is\n", test.det(A))


The output should be:

A.8 Example: Store Data Generated by Analytical Heston Model as NumPy Arrays

Only the relevant parts of the code are displayed here. To request the code for the entire project, contact the author at cko22@bath.edu.

This example demonstrates how to use pybind11 for:

• a C++ project with multiple files;

• storing a C++ Eigen matrix as NumPy arrays.

1. For a multi-file C++ project, we need to include all the source files in setup.py. This example also uses the Eigen library, so its directory will also be included. Paste the following code in setup.py:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'Data_Generation',
        sources=[r'module.cpp', r'Data.cpp', r'Heston.cpp',
                 r'IntegrationScheme.cpp', r'IntegratorImp.cpp',
                 r'NumIntegrator.cpp', r'pdetfunction.cpp',
                 r'pfunction.cpp', r'Range.cpp'],
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                      r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='Data_Generation',
        version='1.0',
        description='Python package with Data_Generation C++ extension (PyBind11)',
        license='MIT',
        ext_modules=[sfc_module],
    )

Then, in a command prompt, go to the folder that contains setup.py and enter the following command:

    python setup.py install

2. The code in module.cpp is:

    // For analytic solution in case of Heston model
    #include "Heston.hpp"
    #include <ctime>
    #include "Data.hpp"
    #include <future>

    // Inputting and Outputting
    #include <iostream>
    #include <vector>

    // pybind11
    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>
    #include <pybind11/stl_bind.h>

    // Eigen
    #include <C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7\Eigen\Dense>
    #include <Eigen/Dense>

    namespace py = pybind11;

    struct CPPData {
        // Input Data
        std::vector<double> spot_price;
        std::vector<double> strike_price;
        std::vector<double> risk_free_rate;
        std::vector<double> dividend_yield;
        std::vector<double> initial_vol;
        std::vector<double> maturity_time;
        std::vector<double> long_term_vol;
        std::vector<double> mean_reversion_rate;
        std::vector<double> vol_vol;
        std::vector<double> price_vol_corr;

        // Output Data
        std::vector<double> option_price;
    };

    // ... do something ...

    Eigen::MatrixXd module() {
        CPPData cppdata;
        simulation(cppdata);

        // Convert each vector to an Eigen vector first
        // Input Data
        Map<Eigen::VectorXd> eigen_spot_price(cppdata.spot_price.data(), cppdata.spot_price.size());
        Map<Eigen::VectorXd> eigen_strike_price(cppdata.strike_price.data(), cppdata.strike_price.size());
        Map<Eigen::VectorXd> eigen_risk_free_rate(cppdata.risk_free_rate.data(), cppdata.risk_free_rate.size());
        Map<Eigen::VectorXd> eigen_dividend_yield(cppdata.dividend_yield.data(), cppdata.dividend_yield.size());
        Map<Eigen::VectorXd> eigen_initial_vol(cppdata.initial_vol.data(), cppdata.initial_vol.size());
        Map<Eigen::VectorXd> eigen_maturity_time(cppdata.maturity_time.data(), cppdata.maturity_time.size());
        Map<Eigen::VectorXd> eigen_long_term_vol(cppdata.long_term_vol.data(), cppdata.long_term_vol.size());
        Map<Eigen::VectorXd> eigen_mean_reversion_rate(cppdata.mean_reversion_rate.data(), cppdata.mean_reversion_rate.size());
        Map<Eigen::VectorXd> eigen_vol_vol(cppdata.vol_vol.data(), cppdata.vol_vol.size());
        Map<Eigen::VectorXd> eigen_price_vol_corr(cppdata.price_vol_corr.data(), cppdata.price_vol_corr.size());
        // Output Data
        Map<Eigen::VectorXd> eigen_option_price(cppdata.option_price.data(), cppdata.option_price.size());

        const int cols{11};         // No. of columns
        const int rows = SIZE * 4;
        // Define a matrix that stores training data to be used in Python
        MatrixXd training(rows, cols);
        // Initialise each column using the vectors
        training.col(0) = eigen_spot_price;
        training.col(1) = eigen_strike_price;
        training.col(2) = eigen_risk_free_rate;
        training.col(3) = eigen_dividend_yield;
        training.col(4) = eigen_initial_vol;
        training.col(5) = eigen_maturity_time;
        training.col(6) = eigen_long_term_vol;
        training.col(7) = eigen_mean_reversion_rate;
        training.col(8) = eigen_vol_vol;
        training.col(9) = eigen_price_vol_corr;
        training.col(10) = eigen_option_price;

        std::cout << training << std::endl;

        return training;
    }

    PYBIND11_MODULE(Data_Generation, m) {

        m.def("module", &module);

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

Note that the module function's return type is an Eigen matrix. This can then be used directly as a NumPy data type in Python.

3. The C++ function can then be called from Python as shown:

    import Data_Generation as dg
    import mysql.connector
    import pandas as pd
    import sqlalchemy as db
    import time

    # This function calls the analytical Heston option pricer in C++
    # to simulate 5,000,000 option prices and store them in a MySQL database

    # ... do something ...

    if __name__ == "__main__":
        t0 = time.time()

        # A new table (optional)
        SQL_clean_table()

        target_size = 5000000  # Final table size
        batch_size = 500000    # Size generated each iteration

        # Get the number of rows existing in the table
        r = SQL_setup(batch_size)

        # Get the number of iterations
        iter = int(target_size / batch_size) - int(r / batch_size)
        print("Iteration(s) required:", iter)
        count = 0

        print("Now Run C++ code...")
        for i in range(iter):
            count = count + 1
            print("----------------------------------------------------")
            print("Current iteration:", count)
            cpp = dg.module()  # store data from C++

            # ... do something ...

            r1 = SQL_setup(batch_size)
            print("No. of rows in SQL Classic Heston Table Now:", r1)
            print("\n")
        t1 = time.time()
        # Calculate total time for entire program
        total = t1 - t0
        print("THE ENTIRE PROGRAM TAKES", round(total), "s TO COMPLETE")

Appendix B

Asymptotic Expansions for General Stochastic Volatility Models

B.1 Asymptotic Expansions

We present the asymptotic expansions for general stochastic volatility models by (Lorig et al., 2019). Consider a strictly positive spot price S whose risk-neutral dynamics are given by

$$S = e^{X}, \tag{B.1}$$

$$dX = -\tfrac{1}{2}\sigma^2(t, X, Y)\,dt + \sigma(t, X, Y)\,dW_1, \qquad X(0) = x \in \mathbb{R}, \tag{B.2}$$

$$dY = f(t, X, Y)\,dt + \beta(t, X, Y)\,dW_2, \qquad Y(0) = y \in \mathbb{R}, \tag{B.3}$$

$$\langle dW_1, dW_2 \rangle = \rho(t, X, Y)\,dt, \qquad |\rho| < 1. \tag{B.4}$$

The drift of X is selected to be $-\frac{1}{2}\sigma^2$ so that $S = e^{X}$ is a martingale.

Let C(t) be the time-t value of a European call option maturing at time T > t with payoff function P(X(T)). To value a European-style option using risk-neutral pricing, we must compute functions of the form

$$u(t, x, y) = \mathbb{E}\left[P(X(T)) \mid X(t) = x,\, Y(t) = y\right].$$

It is well known that the function u satisfies the Kolmogorov backward equation (Lorig et al., 2019)

$$\left(\partial_t + \mathcal{A}(t)\right)u(t, x, y) = 0, \qquad u(T, x, y) = P(x), \tag{B.5}$$

where the operator $\mathcal{A}(t)$ is given explicitly by

$$\mathcal{A}(t) = a(t, x, y)\left(\partial_x^2 - \partial_x\right) + f(t, x, y)\,\partial_y + b(t, x, y)\,\partial_y^2 + c(t, x, y)\,\partial_x \partial_y, \tag{B.6}$$

and where the functions a, b and c are defined as

$$a(t, x, y) = \tfrac{1}{2}\sigma^2(t, x, y), \qquad b(t, x, y) = \tfrac{1}{2}\beta^2(t, x, y), \qquad c(t, x, y) = \rho(t, x, y)\,\sigma(t, x, y)\,\beta(t, x, y).$$
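As a concrete instance (our own worked example, not part of the statement in (Lorig et al., 2019)), the classical Heston model of Chapter 3 fits this framework with $\sigma(t,x,y) = \sqrt{y}$, $f(t,x,y) = \kappa(\theta - y)$, $\beta(t,x,y) = \xi\sqrt{y}$ and constant correlation $\rho$, which gives

$$a(t,x,y) = \tfrac{1}{2}y, \qquad b(t,x,y) = \tfrac{1}{2}\xi^2 y, \qquad c(t,x,y) = \rho\,\xi\,y.$$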

Appendix C

Deep Neural Networks Library in Python: Keras

We show the code snippet for computing the ANN in Keras: the Python code for defining the ANN architecture for the rough Heston implied volatility approximation.

    # Define the neural network architecture
    import keras
    from keras import backend as K
    keras.backend.set_floatx('float64')
    from keras.callbacks import EarlyStopping

    # Construct the model
    input_layer = keras.layers.Input(shape=(n,))

    hidden_layer1 = keras.layers.Dense(80, activation='elu')(input_layer)
    hidden_layer2 = keras.layers.Dense(80, activation='elu')(hidden_layer1)
    hidden_layer3 = keras.layers.Dense(80, activation='elu')(hidden_layer2)

    output_layer = keras.layers.Dense(100, activation='linear')(hidden_layer3)

    model = keras.models.Model(inputs=input_layer, outputs=output_layer)

    # Summarize layers
    model.summary()

    # User-defined metrics: R2 and Max Abs Error
    # R2
    def coeff_determination(y_true, y_pred):
        SS_res = K.sum(K.square(y_true - y_pred))
        SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
        return 1 - SS_res / (SS_tot + K.epsilon())

    # Max Error
    def max_error(y_true, y_pred):
        return K.max(abs(y_true - y_pred))

    earlystop = EarlyStopping(monitor='val_loss',
                              min_delta=0,
                              mode='min',
                              verbose=1,
                              patience=25)

    # Prepare the model for training; note that 'opt' is defined here, but the
    # compile call passes the string 'adam', i.e. the default Adam settings
    opt = keras.optimizers.Adam(learning_rate=0.005)
    model.compile(loss='mse', optimizer='adam',
                  metrics=['mae', coeff_determination, max_error])
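The snippet above defines the architecture, metrics and early stopping callback but stops short of the training call. A possible fit call consistent with Table 6.1 (batch size 20, 200 epochs) might look as follows, where x_train and y_train are assumed to be the pre-processed arrays from Chapter 5 and the validation split value is illustrative:

    # Train the model (sketch; x_train / y_train assumed from Chapter 5)
    history = model.fit(x_train, y_train,
                        batch_size=20,
                        epochs=200,
                        validation_split=0.2,  # illustrative split
                        callbacks=[earlystop])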

Bibliography

Alos, Elisa et al. (June 2017). "Exponentiation of Conditional Expectations Under Stochastic Volatility". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2983180

Andersen, Leif et al. (Jan. 2007). "Moment Explosions in Stochastic Volatility Models". In: Finance and Stochastics 11, pp. 29–50. DOI: 10.1007/s00780-006-0011-7

Atkinson, Colin et al. (Jan. 2011). "Rational Solutions for the Time-Fractional Diffusion Equation". URL: https://www.researchgate.net/publication/220222966_Rational_Solutions_for_the_Time-Fractional_Diffusion_Equation

Bayer, Christian et al. (Oct. 2018). "Deep calibration of rough stochastic volatility models". URL: https://arxiv.org/abs/1810.03399

Black, Fischer et al. (May 1973). "The Pricing of Options and Corporate Liabilities". URL: https://www.cs.princeton.edu/courses/archive/fall09/cos323/papers/black_scholes73.pdf

Carr, Peter and Dilip B. Madan (Mar. 1999). "Option valuation using the fast Fourier transform". In: Journal of Computational Finance. URL: https://www.researchgate.net/publication/2519144_Option_Valuation_Using_the_Fast_Fourier_Transform

Chapter 2 Modeling Process: k-fold cross validation. https://bradleyboehmke.github.io/HOML/process.html. Accessed: 2020-08-22

Comte, Fabienne et al. (Oct. 1998). "Long Memory in Continuous-Time Stochastic Volatility Models". In: Mathematical Finance 8, pp. 291–323. URL: https://zulfahmed.files.wordpress.com/2015/10/comterenault19981.pdf

Cox, John C. et al. (Jan. 1985). "A Theory of the Term Structure of Interest Rates". URL: http://pages.stern.nyu.edu/~dbackus/BCZ/discrete_time/CIR_Econometrica_85.pdf

Create a C++ extension for Python. https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019. Accessed: 2020-07-21

Duffy, Daniel J. (Jan. 2018). Financial Instrument Pricing Using C++, Second Edition. John Wiley & Sons. ISBN: 978-0-470-97119-2

Duffy, Daniel J. et al. (2012). Monte Carlo Frameworks: Building Customisable High-performance C++ Applications. John Wiley & Sons, Inc. ISBN: 9780470060698

Dupire, Bruno (1994). "Pricing with a Smile". In: Risk Magazine, pp. 18–20

Eigen: The Matrix Class. https://eigen.tuxfamily.org/dox/group__TutorialMatrixClass.html. Accessed: 2020-07-22

Eldan, Ronen et al. (May 2016). "The Power of Depth for Feedforward Neural Networks". URL: https://arxiv.org/abs/1512.03965

Euch, Omar El et al. (Sept. 2016). "The characteristic function of rough Heston models". URL: https://arxiv.org/pdf/1609.02108.pdf

— (Mar. 2019). "Roughening Heston". URL: https://ssrn.com/abstract=3116887

Fouque, Jean-Pierre et al. (Aug. 2012). "Second Order Multiscale Stochastic Volatility Asymptotics: Stochastic Terminal Layer Analysis & Calibration". URL: https://arxiv.org/abs/1208.5802

Gatheral, Jim (2006). The Volatility Surface: A Practitioner's Guide. John Wiley & Sons, Inc. ISBN: 978-0-471-79251-2

Gatheral, Jim et al. (Oct. 2014). "Volatility is rough". URL: https://arxiv.org/abs/1410.3394

— (Jan. 2019). "Rational approximation of the rough Heston solution". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3191578

Hebb, Donald O. (1949). The Organization of Behavior: A Neuropsychological Theory. Psychology Press, 1st edition (12 Jun. 2002). ISBN: 978-0805843002

Heston, Steven (Jan. 1993). "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options". URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.3204&rep=rep1&type=pdf

Hinton, Geoffrey et al. (Feb. 2015). "Overview of mini-batch gradient descent". URL: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

Hornik et al. (Mar. 1989). "Multilayer feedforward networks are universal approximators". URL: https://deeplearning.cs.cmu.edu/F20/document/readings/Hornik_Stinchcombe_White.pdf

— (Jan. 1990). "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks". URL: http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/univ_approx.pdf

Horvath, Blanka et al. (Jan. 2019). "Deep Learning Volatility". URL: https://ssrn.com/abstract=3322085

Hurst, Harold E. (Jan. 1951). "Long-term storage of reservoirs: an experimental study". URL: https://www.jstor.org/stable/2982267

Hutchinson, James M. et al. (July 1994). "A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks". URL: https://alo.mit.edu/wp-content/uploads/2015/06/A-Nonparametric-Approach-to-Pricing-and-Hedging-Derivative-Securities-via-Learning-Networks.pdf

Ioffe, Sergey et al. (Feb. 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". URL: https://arxiv.org/abs/1502.03167

Itkin, Andrey (Jan. 2010). "Pricing options with VG model using FFT". URL: https://arxiv.org/abs/physics/0503137

Kahl, Christian et al. (Jan. 2006). "Not-so-complex Logarithms in the Heston model". URL: http://www2.math.uni-wuppertal.de/~kahl/publications/NotSoComplexLogarithmsInTheHestonModel.pdf

Kingma, Diederik P. et al. (Feb. 2015). "Adam: A Method for Stochastic Optimization". URL: https://arxiv.org/abs/1412.6980

Kwok, YueKuen et al. (2012). Handbook of Computational Finance, Chapter 21. Springer-Verlag Berlin Heidelberg. ISBN: 978-3-642-17254-0

Lewis, Alan (Sept. 2001). "A Simple Option Formula for General Jump-Diffusion and Other Exponential Levy Processes". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=282110

Liu, Shuaiqiang et al. (Apr. 2019a). "A neural network-based framework for financial model calibration". URL: https://arxiv.org/abs/1904.10523

— (Apr. 2019b). "Pricing options and computing implied volatilities using neural networks". URL: https://arxiv.org/abs/1901.08943

Lorig, Matthew et al. (Jan. 2019). "Explicit implied vols for multifactor local-stochastic vol models". URL: https://arxiv.org/abs/1306.5447v3

Mandara, Dalvir (Sept. 2019). "Artificial Neural Networks for Black-Scholes Option Pricing and Prediction of Implied Volatility for the SABR Stochastic Volatility Model". URL: https://www.datasim.nl/application/files/8115/7045/4929/1423101.pdf

McCulloch, Warren et al. (May 1943). "A Logical Calculus of the Ideas Immanent in Nervous Activity". URL: http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf

Mostafa, F. et al. (May 2008). "A neural network approach to option pricing". URL: https://www.witpress.com/elibrary/wit-transactions-on-information-and-communication-technologies/41/18904

Peters, Edgar E. (Jan. 1991). Chaos and Order in the Capital Markets: A New View of Cycles, Prices, and Market Volatility. John Wiley & Sons. ISBN: 978-0-471-13938-6

— (Jan. 1994). Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons. ISBN: 978-0-471-58524-4. URL: https://www.amazon.co.uk/Fractal-Market-Analysis-Investment-Economics/dp/0471585246

pybind11 Documentation. https://pybind11.readthedocs.io/en/stable/advanced/cast/eigen.html. Accessed: 2020-07-22

Ruf, Johannes et al. (May 2020). "Neural networks for option pricing and hedging: a literature review". URL: https://arxiv.org/abs/1911.05620

Schmelzle, Martin (Apr. 2010). "Option Pricing Formulae using Fourier Transform: Theory and Application". URL: https://pfadintegral.com/docs/Schmelzle2010%20Fourier%20Pricing.pdf

Shevchenko, Georgiy (Jan. 2015). "Fractional Brownian motion in a nutshell". In: International Journal of Modern Physics: Conference Series 36. URL: https://www.worldscientific.com/doi/abs/10.1142/S2010194515600022

Stone, Henry (July 2019). "Calibrating Rough Volatility Models: A Convolutional Neural Network Approach". URL: https://arxiv.org/abs/1812.05315

Using the C++ eigen library to calculate matrix inverse and determinant. http://people.duke.edu/~ccc14/sta-663-2016/18G_C++_Python_pybind11.html#Using-the-C++-eigen-library-to-calculate-matrix-inverse-and-determinant. Accessed: 2020-07-22

Wilmott, Paul (2006). Paul Wilmott on Quantitative Finance, Vol. 3. John Wiley & Sons, Inc., 2nd Edition. ISBN: 978-0-470-01870-5

Page 4: UNIVERSITY OF BIRMINGHAM BIRMINGHAM BUSINESS …

v

UNIVERSITY OF BIRMINGHAM

AbstractBirmingham Business School

MSc Mathematical Finance

The Performance of Artificial Neural Networks on Rough Heston Model

by Chun Kiat ONG (ID2031611)

There has been extensive research on finding various methods to price op-tion more accurately and more quickly The rise of artificial neural networkscould provide an alternative means for this application Although there ex-ist many methods for approximating option prices under the rough Hestonmodel our objective is to investigate the performance of artificial neural net-works on rough Heston model option pricing as well as implied volatilityapproximations since it is part of the calibration process We use simulateddata to train the neural network instead of real market data for regulatoryreason We also adapt the image-based implicit training method for ANNimplied volatility approximations where the output of the ANN is the entireimplied volatility surface The results shows that artificial neural networkscan indeed generate accurate option prices and implied volatility approxi-mations but it is not as efficient as the existing methods Hence we concludethat ANN is a feasible alternative method for pricing option under the roughHeston model but an ineffective one

vii

AcknowledgementsI would like to take a moment to thank those who helped me throughout thewriting of this dissertation

First I wish to thank my supervisor Dr Daniel Duffy for his invaluablesupport and expertise guidance I would also like to thank Dr Colin Rowatfor designing such a challenging and rigorous course I truly enjoyed it

I would also like to take this opportunity to acknowledge the support andgreat love of my family my girlfriend and my friends They kept me goingon and this work would not have been possible without them

ix

Contents

Declaration of Authorship iii

Abstract v

Acknowledgements vii

0 Projectrsquos Overview 101 Projectrsquos Overview 1

1 Introduction 511 Organisation of This Thesis 6

2 Literature Review 721 Application of ANN in Finance 722 Option Pricing Models 823 The Research Gap 8

3 Option Pricing Models 1131 The Heston Model 11

311 Option Pricing Under Heston Model 12312 Heston Model Implied Volatility Approximation 14

32 Volatility is Rough 1733 The Rough Heston Model 18

331 Rough Heston Pricing Equation 193311 Rational Approximation of Rough Heston Ric-

cati Solution 203312 Option Pricing Using Fast Fourier Transform 22

332 Rough Heston Model Implied Volatility 23

4 Artificial Neural Networks 2541 Dissecting the Artificial Neural Networks 25

411 Neurons 26412 Input Output and Hidden Layers 26413 Connections Weights and Biases 26414 Forward and Backward Propagation Loss Functions 27415 Activation Functions 27416 Optimisation Algorithms and Learning Rate 28417 Epochs Early Stopping and Batch Size 29

42 Feedforward Neural Networks 30421 Deep Neural Networks 30

x

43 Common Pitfalls and Remedies 30431 Loss Function Stagnation 31432 Loss Function Fluctuation 31433 Over-fitting 31434 Under-fitting 32

44 Application in Finance ANN Image-based Implicit Methodfor Implied Volatility Approximation 32

5 Data 3551 Data Generation and Storage 35

511 Heston Call Option Data 36512 Heston Implied Volatility Data 37513 Rough Heston Call Option Data 37514 Rough Heston Implied Volatility Data 38

52 Data Pre-processing 39521 Data Splitting Train - Validate - Test 39522 Data Standardisation and Normalisation 39523 Data Partitioning 40

6 Neural Networks Architecture and Experimental Setup 4161 Neural Networks Architecture 41

611 ANN Architecture Option Pricing under Heston Model 43612 ANN Architecture Implied Volatility Surface of Hes-

ton Model 44613 ANN Architecture Option Pricing under Rough Hes-

ton Model 45614 ANN Architecture Implied Volatility Surface of Rough

Heston Model 4562 Performance Metrics and Validation Methods 4563 Implementation of ANN in Keras 4764 Summary of Neural Networks Architecture 48

7 Results and Discussions 4971 Heston Option Pricing ANN Results 4972 Heston Implied Volatility ANN Results 5173 Rough Heston Option Pricing ANN Results 5374 Rough Heston Implied Volatility ANN Results 5575 Run-time Performance 59

8 Conclusions and Outlook 61

A pybind11 A step-by-step tutorial with examples 63A1 Software Prerequisites 63A2 Create a Python Project 64A3 Create a C++ Project 66A4 Convert the C++ project to extension for Python 67A5 Make the DLL available to Python 68A6 Call the DLL from Python 70

xi

A7 Example Store C++ Eigen Matrix as NumPy Arrays 71A8 Example Store Data Generated by Analytical Heston Model

as NumPy Arrays 75

B Asymptotic Expansions for General Stochastic Volatility Models 81B1 Asymptotic Expansions 81

C Deep Neural Networks Library in Python Keras 83

Bibliography 85

xiii

List of Figures

1 Implementation Flow Chart 22 Data flow diagram for ANN on Rough Heston Pricer 3

11 Problem Solving Process 6

31 Paths of fBm for different values of H Adapted from (Shevchenko2015) 18

32 Implied Volatility Smiles of SPX as of 14th August 2013 withmaturity T in years Red and blue dots represent bid and askSPX implied volatilities green plots are from the calibratedrough Heston model dashed orange lines are from the clas-sical Heston model calibrated to these smiles Adapted from(Euch et al 2019) 24

41 A Simple Neural Network 2542 The pixels of implied volatility surface are represented by the

outputs of neural network 33

51 shows how the data is partitioned for 5-fold cross validationAdapted from (Chapter 2 Modeling Process k-fold cross validation) 40

61 Typical ANN Architecture for Option Pricing Application 4262 Typical ANN Architecture for Implied Volatility Surface Ap-

proximation 43

71 Plot of MSE Loss Function vs No of Epochs for Heston OptionPricing ANN 49

72 Plot of Predicted vs Actual values for Heston Option PricingANN 50

73 Plot of MSE Loss Function vs No of Epochs for Heston Im-plied Volatility ANN 51

74 Heston Implied Volatility Smiles ANN Predictions vs ActualValues 52

75 Mean Absolute Error of Full Heston Implied Volatility SurfaceANN Predictions on Unseen Data 53

76 Plot of MSE Loss Function vs No of Epochs for Rough HestonOption Pricing ANN 54

77 Plot of Predicted vs Actual values for Rough Heston PricingANN 55

78 Plot of MSE Loss Function vs No of Epochs for Rough HestonImplied Volatility ANN 56

xiv

79 Rough Heston Implied Volatility Smiles ANN Predictions vsActual Values 57

710 Mean Absolute Error of Full Rough Heston Implied VolatilitySurface ANN Predictions on Unseen Data 58

xv

List of Tables

51 Heston Model Call Option Data 3652 Heston Model Implied Volatility Data 3753 Rough Heston Model Call Option Data 3854 Heston Model Implied Volatility Data 38

61 Summary of Neural Networks Architecture for Each Application 48

71 Error Metrics of Heston Option Pricing ANN Predictions onUnseen Data 50

72 5-fold Cross Validation Results of Heston Option Pricing ANN 5073 Accuracy Metrics of Heston Implied Volatility ANN Predic-

tions on Unseen Data 5174 5-fold Cross Validation Results of Heston Implied Volatility

ANN 5375 Accuracy Metrics of Rough Heston Option Pricing ANN Pre-

dictions on Unseen Data 5476 5-fold Cross Validation Results of Rough Heston Option Pric-

ing ANN 5577 Accuracy Metrics of Rough Heston Implied Volatility ANN

Predictions on Unseen Data 5678 5-fold Cross Validation Results of Rough Heston Implied Volatil-

ity ANN 5879 Training Time 59710 Execution Time for Predictions using ANN and Traditional

Methods 59

xvii

List of Abbreviations

NN Neural NetworksANN Artificial Neural NetworksDNN Deep Neural NetworksCNN Convolutional Neural NetworksSGD Stochastic Gradient DescentSV Stochastic VolatilityCIR Cox Ingersoll and RossfBm fractional Brownian motionFSV Fractional Stochastic VolatilityRFSV Rough Fractional Stochastic VolatilityFFT Fast Fourier TransformELU Exponential Linear UnitReLU Rectified Linear Unit

1

Chapter 0

Projectrsquos Overview

This chapter intends to give a brief overview of the methodology of theproject without delving too much into the details

01 Projectrsquos Overview

It was advised by the supervisor to divide the problem into smaller achiev-able steps like (Mandara 2019) did The 4 main steps are

bull Define the objective

bull Formulate the procedures

bull Implement the procedures

bull Evaluate the results

The objective is to investigate the performance of artificial neural networks (ANN) on vanilla option pricing and implied volatility approximation under the rough Heston model. The performance metrics in this context are accuracy and run-time performance.

Figure 1 shows the flow chart of the implementation procedures. The procedure starts with synthetic data generation. We then split the synthetic data set into three portions: one for training (in-sample), one for validation, and one for testing (out-of-sample). Before the data is fed into the ANN for training, it needs to be scaled or normalised, as the features have different magnitudes and ranges. In this thesis we use the training method proposed by (Horvath et al 2019), the image-based implicit training approach, for implied volatility surface approximation. Then we test the trained ANN model on an unseen data set to evaluate its accuracy.

The run-time performance of the ANN will be compared with that of the traditional approximation methods. This project requires the use of two programming languages: C++ and Python. The data generation process is done in C++. The wide range of machine learning libraries (Keras, TensorFlow, PyTorch) in Python makes it the logical choice for the neural network computations. The experiment starts with the classical Heston model as a concept validation and then proceeds to the rough Heston model.


[Flow chart omitted: the implementation proceeds from the option pricing models (Heston, rough Heston) to data generation (strike price, maturity, volatility, etc.), data pre-processing (scaled/normalised training set), ANN architecture design and ANN training, then ANN prediction and results evaluation (accuracy of option price and implied volatility, and runtime performance), ending with conclusions.]

FIGURE 1: Implementation Flow Chart


[Data flow diagram omitted: model parameters are fed as input to the RoughHestonPricer, which outputs model parameters and option prices as RoughHestonData; the training data and the ANN model hyperparameters feed the KerasANNTrainer, which produces trained model weights (ANNModelWeights); the trained weights drive the ANNPredictions, whose predicted option prices are compared against the test data (option prices) in the AccuracyEvaluation step, producing error metrics as the final output.]

FIGURE 2: Data Flow Diagram for ANN on the Rough Heston Pricer


Chapter 1

Introduction

Option pricing has always been a major research area in financial mathematics. Various methods and techniques have been developed to price options more accurately and more quickly since the introduction of the well-known Black-Scholes model in 1973 (Black et al 1973). The Black-Scholes model has an analytical solution and only one parameter, the volatility, that requires calibration. This makes it very easy to implement. Its simplicity, however, comes at the expense of many unrealistic assumptions. One of its biggest flaws is the assumption of a constant volatility term. This assumption simply does not hold in the real market. With this assumption, the Black-Scholes model fails to capture the volatility dynamics exhibited by the market and so is unable to value options accurately.

In 1993, (Heston 1993) developed a stochastic volatility model in an attempt to better capture the volatility dynamics. The Heston model assumes the variance of the underlying asset to follow a CIR process driven by a standard Brownian motion (Hurst parameter H = 1/2). Like the Black-Scholes model, the Heston model has a closed-form analytical solution, and this makes it a strong alternative.

Whilst the Heston model is more realistic than the Black-Scholes model, its generated option prices do not match the observed vanilla option prices consistently (Gatheral et al 2014). Recent research shows that more accurate option prices can be estimated if the volatility process is modelled as a fractional Brownian motion with H < 1/2. When H < 1/2, the volatility process path appears rougher than when H >= 1/2, and hence the volatility is described as rough.

(Euch et al 2019) introduced the rough volatility version of the Heston model, which combines the best of both worlds. However, the calibration of the rough Heston model relies on the notoriously slow Monte Carlo method due to its non-Markovian nature. This obstacle is the reason that the rough Heston model struggles to find its way into industrial applications. Although many approximation methods for the rough Heston model exist to help solve this problem, we are interested in the feasibility of machine learning algorithms.

Recent advancements in machine learning algorithms could potentially provide an alternative means for this problem. Specifically, artificial neural networks (ANN) offer the best hope because they are capable of learning complex functional relationships between input and output data and then making predictions instantly.


In this project, our objective is to investigate the accuracy and run-time performance of ANN for European call option price predictions and implied volatility approximations under the rough Heston model. We start off by using the Heston model as a concept validation, then proceed to its rough volatility version. For regulatory purposes, we use data simulated by the option pricing models for training.

1.1 Organisation of This Thesis

This thesis is organised as follows. In Chapter 2 we review some of the related work and provide justifications for our research. In Chapter 3 we discuss the theory of the option pricing models of our interest. We provide a brief introduction to ANN and some techniques to improve their performance in Chapter 4. Then we explain the tools we used for the data generation process and some useful techniques to speed up the process in Chapter 5; in the same chapter we also discuss the important data pre-processing procedures. Chapter 6 shows our network architecture and experimental setups. We present the results and discuss our findings in Chapter 7. Finally, we conclude our findings and discuss potential future work in Chapter 8.

[Flow chart omitted: (start) problem: the rough Heston model sees low industrial application despite giving accurate option prices; root cause: the slow calibration (pricing and implied volatility approximation) process; potential solution: apply ANN to pricing and implied volatility approximation; implementation: ANN computation using Keras in Python; evaluation: accuracy and speed of ANN predictions; (end) conclusions: whether ANN is feasible or not.]

FIGURE 1.1: Problem Solving Process


Chapter 2

Literature Review

In this chapter we provide a review of the current literature and highlight the research gap. The focus of our review is the application of ANN to option pricing models. We justify our methodology and our choice of option pricing models, and we also provide some critiques.

2.1 Application of ANN in Finance

Option pricing has always been a hot topic in academia and the financial industry. Researchers have been looking for various means to price options more accurately and quickly. As computational technology advances, the application of ANN has gained popularity for solving complex problems in many different industries. In quantitative finance, the idea of utilising ANN for pricing derivatives comes from its ability to make fast and accurate predictions once it is trained.

The application of ANN in option pricing began with (Hutchinson et al 1994). The authors were the pioneers in applying ANN to pricing and hedging derivatives. Since then, more than 100 research papers have been published on this topic. It is also worth noting that the complexity of the ANN architectures used in these papers has increased over the years. Whilst there has been much research on the applications of ANN in option pricing, only about 10% of the papers focus on implied volatility (Ruf et al 2020). The feasibility of using ANN for implied volatility approximation is equally as important as for option pricing, since they are both part of the calibration process. Recently, more research papers have focused on implied volatility and calibration, such as (Mostafa et al 2008), (Liu et al 2019a), (Liu et al 2019b) and (Horvath et al 2019). A key contribution of (Horvath et al 2019) is the introduction of the image-based implicit training method for implied volatility surface approximation. This method is allegedly robust and fast for approximating the entire implied volatility surface.

A significant portion of the research papers use real market data in their research. Currently there is no standardised framework for deciding the architecture and parameters of an ANN. Hence, training the ANN directly with market data might pose some issues when it comes to regulatory compliance. (Horvath et al 2019) is one of the few papers that uses simulated data for ANN training. The simulated data is generated from the Heston and rough Bergomi models. Using simulated data removes the regulatory concerns, as the functional mapping from model parameters to option price is known and tractable.

The application of ANN requires splitting the entire data set for training, validation and testing. Most of the papers that use real market data ensure the data splitting process is chronological to preserve the time series structure of the data. This is not a concern in our case, as we are using simulated data.

The results in (Horvath et al 2019) show very accurate predictions. However, after examining the source code, we discovered that the ANN is validated and tested on the same data set. Thus, the high accuracy results do not give any information about how well the trained ANN generalises to unseen data. In our experiment we will use different data sets for validation and testing.

Common error metrics used include mean absolute error (MAE), mean absolute percentage error (MAPE) and mean squared error (MSE). We suggest a new metric, the maximum absolute error, as it is useful from a risk management perspective for quantifying the largest prediction error in the worst case scenario.

2.2 Option Pricing Models

In terms of the choice of option pricing models, over 80% of the papers select the Black-Scholes model or its extensions in their research (Ruf et al 2020). As mentioned before, there are other models that can give better option prices than Black-Scholes, such as the stochastic volatility models. (Liu et al 2019b) investigated the application of ANN in the Heston and Bates models.

According to (Gatheral et al 2014), modelling the volatility as rough volatility can better capture the volatility dynamics of the market. Rough volatility models struggle to find their way into the industry due to their slow calibration process. Hence, it is natural for the direction of ANN research to move towards rough volatility models.

Some of the recent examples include (Stone 2019), (Bayer et al 2018) and (Horvath et al 2019). (Stone 2019) investigates the calibration of the rough Bergomi model using Convolutional Neural Networks (CNN), whilst (Bayer et al 2018) study the calibration of the Heston and rough Bergomi models using Deep Neural Networks (DNN). Contrary to these papers, where direct calibration is performed via ANN in one step, (Horvath et al 2019) split the calibration of the rough Bergomi model into two steps. First, the ANN is trained to learn the pricing equation during a so-called offline session. The trained ANN is then deterministic and stored for the online calibration session later. This approach can speed up the online calibration by a huge margin.

2.3 The Research Gap

In this thesis we will be using the approach in (Horvath et al 2019), that is, training the network to learn the pricing and implied volatility functional mappings separately. We leave the model calibration step as the future outlook of this project. We choose to work with the rough Heston model, another popular rough volatility model that has not been investigated yet. We adapt the image-based implicit training method in our research. To re-iterate, we will validate and test our ANN with different data sets to obtain a true measure of the ANN's performance on unseen data.


Chapter 3

Option Pricing Models

In this chapter we discuss the option pricing models of our interest: the Heston and rough Heston models. The idea of modelling volatility as a rough fractional Brownian motion will also be presented. Then we derive their pricing and implied volatility equations and explain their implementations for data generation.

The Black-Scholes model is simple but unrealistic for real world applications, as its assumptions simply do not hold. One of its greatest criticisms is the assumption of constant volatility of the underlying asset price. (Dupire 1994) extended the Black-Scholes framework to give local volatility models. He modelled the volatility as a deterministic function of the underlying price and time, with the function selected such that its outputs match observed European option prices. This extension is time-inhomogeneous and the dynamics it exhibits are still unrealistic. Thus, it is not able to produce future volatility surfaces that emulate our observations.

Stochastic volatility (SV) models were then introduced, in which the volatility of the underlying is itself modelled as a stochastic process. Recent research shows that rough volatility can capture the true volatility dynamics better, and this leads to the rough volatility models.

3.1 The Heston Model

One of the most popular SV models is the one proposed by (Heston 1993); its popularity is due to the fact that its pricing equation for European options is analytically tractable. This property allows calibration procedures to be done efficiently.

The Heston model for a one-dimensional spot price S follows the stochastic process

$$ dS = \mu S\,dt + \sqrt{\nu}\,S\,dW_1 \tag{3.1} $$

and its instantaneous variance ν follows a Cox, Ingersoll and Ross (CIR) process (Cox et al 1985):

$$ d\nu = \kappa(\theta - \nu)\,dt + \xi\sqrt{\nu}\,dW_2 \tag{3.2} $$

$$ \langle dW_1, dW_2 \rangle = \rho\,dt \tag{3.3} $$

where

• μ is the (deterministic) rate of return of the spot
• θ is the long term mean volatility
• ξ is the volatility of volatility
• κ is the mean reversion rate of volatility
• ρ is the correlation between spot and volatility moves
• W1 and W2 are Wiener processes
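Although this project prices Heston options with the semi-analytical formula derived below, a plain Monte Carlo pricer is a convenient cross-check of equations (3.1)-(3.3). The following is a minimal sketch, not part of the dissertation's implementation: it assumes a risk-neutral drift r in place of μ, uses a full-truncation Euler scheme, and all parameter values are illustrative only.

```python
# Illustrative Euler-Maruyama Monte Carlo pricer for the Heston model
# (full-truncation scheme). Names follow equations (3.1)-(3.3).
import numpy as np

def heston_call_mc(S0, K, r, T, v0, kappa, theta, xi, rho,
                   n_paths=100_000, n_steps=200, seed=42):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, float(S0))
    v = np.full(n_paths, float(v0))
    for _ in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        v_pos = np.maximum(v, 0.0)  # full truncation: use max(v, 0) everywhere
        S *= np.exp((r - 0.5 * v_pos) * dt + np.sqrt(v_pos * dt) * z1)
        v += kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos * dt) * z2
    payoff = np.maximum(S - K, 0.0)
    return np.exp(-r * T) * payoff.mean()

if __name__ == "__main__":
    print(heston_call_mc(S0=50, K=50, r=0.04, T=1.0, v0=0.3,
                         kappa=2.0, theta=0.3, xi=0.3, rho=-0.7))
```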

3.1.1 Option Pricing Under the Heston Model

In this section we follow (Gatheral 2006) closely. We start by forming a portfolio Π containing the option being priced V, a quantity -Δ of the stock and a quantity -Δ1 of another asset whose value is V1:

$$ \Pi = V - \Delta S - \Delta_1 V_1 $$

The change in the portfolio during dt is

$$ d\Pi = \left[\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial\nu^2}\right]dt $$
$$ \qquad - \Delta_1\left[\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V_1}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V_1}{\partial\nu^2}\right]dt $$
$$ \qquad + \left[\frac{\partial V}{\partial S} - \Delta_1\frac{\partial V_1}{\partial S} - \Delta\right]dS + \left[\frac{\partial V}{\partial\nu} - \Delta_1\frac{\partial V_1}{\partial\nu}\right]d\nu $$

Note that the explicit dependence of S and ν on t has been removed for simplicity. We can make the portfolio instantaneously risk-free by eliminating the dS and dν terms:

$$ \frac{\partial V}{\partial S} - \Delta_1\frac{\partial V_1}{\partial S} - \Delta = 0 $$

and

$$ \frac{\partial V}{\partial\nu} - \Delta_1\frac{\partial V_1}{\partial\nu} = 0 $$

So we are left with

$$ d\Pi = \left[\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial\nu^2}\right]dt - \Delta_1\left[\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V_1}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V_1}{\partial\nu^2}\right]dt $$
$$ = r\Pi\,dt = r(V - \Delta S - \Delta_1 V_1)\,dt $$


Here we have applied the no-arbitrage principle: the return of a risk-free portfolio must equal the risk-free rate, and we assume the risk-free rate to be deterministic for our case. Rearranging the above equation by collecting V terms on the left-hand side and V1 terms on the right-hand side gives

$$ \frac{\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial\nu^2} + rS\frac{\partial V}{\partial S} - rV}{\frac{\partial V}{\partial\nu}} = \frac{\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V_1}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V_1}{\partial\nu^2} + rS\frac{\partial V_1}{\partial S} - rV_1}{\frac{\partial V_1}{\partial\nu}} $$

Since the two sides involve different instruments, the common ratio must be equal to some function f(S, ν, t) of S, ν and t. Thus we have

$$ \frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial\nu^2} + rS\frac{\partial V}{\partial S} - rV = f\,\frac{\partial V}{\partial\nu} $$

In the case of the Heston model, f is chosen as

$$ f = -\left(\kappa(\theta - \nu) - \lambda\sqrt{\nu}\right) $$

where λ(S, ν, t) is called the market price of volatility risk. We do not delve into the choice of f or the concept of λ(S, ν, t) any further here; see (Wilmott 2006) for his arguments.

(Heston 1993) assumed in his paper that λ(S, ν, t) is linear in the instantaneous variance ν_t, in order to retain the form of the equation under the transformation from the statistical measure to the risk-neutral measure. Here we are only interested in pricing the options, so we can set λ(S, ν, t) to be zero (Gatheral 2006).

Hence we arrive at

$$ \frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial\nu^2} + rS\frac{\partial V}{\partial S} - rV = \kappa(\nu - \theta)\frac{\partial V}{\partial\nu} \tag{3.4} $$

According to (Heston 1993), the price of an undiscounted European call option C with strike price K and time to maturity T has a solution of the form

$$ C(S, K, \nu, T) = \frac{1}{2}(F - K) + \frac{1}{\pi}\int_0^\infty \left(F f_1 - K f_2\right)du \tag{3.5} $$

where

$$ f_1 = \Re\!\left[\frac{e^{-iu\ln K}\,\varphi(u - i)}{iuF}\right], \qquad f_2 = \Re\!\left[\frac{e^{-iu\ln K}\,\varphi(u)}{iu}\right], $$
$$ \varphi(u) = E\!\left(e^{iu\ln S_T}\right), \qquad F = S e^{\mu T} $$

The evaluation of (3.5) requires the computation of Fourier inversion integrals, and it is susceptible to numerical instabilities due to the complex logarithms involved. (Kahl et al 2006) proposed an integration scheme that computes the Fourier inversion integral after applying some transformations to the asymptotic structure. Thus the solution becomes

$$ C(S, K, \nu, T) = \int_0^1 y(x)\,dx, \qquad x \in \mathbb{R} \tag{3.6} $$

where

$$ y(x) = \frac{1}{2}(F - K) + \frac{F f_1\!\left(-\frac{\ln x}{C_\infty}\right) - K f_2\!\left(-\frac{\ln x}{C_\infty}\right)}{x\,\pi\,C_\infty} $$

The limits at the boundaries of the integral are

$$ \lim_{x\to 0} y(x) = \frac{1}{2}(F - K) $$

and

$$ \lim_{x\to 1} y(x) = \frac{1}{2}(F - K) + \frac{F\lim_{u\to 0} f_1(u) - K\lim_{u\to 0} f_2(u)}{\pi\,C_\infty} $$

Equation (3.6) provides a more robust pricing method for options with moderate to long maturities or strong mean-reversion. For the full derivation of equation (3.6), see (Kahl et al 2006). The source code to implement equation (3.6) was provided by the supervisor and is the source code for (Duffy et al 2012).

3.1.2 Heston Model Implied Volatility Approximation

Before the model is used for pricing options, the model's parameters need to be calibrated so that it returns current market prices (at least to some degree of accuracy). That is, the unmeasurable model parameters are obtained by calibrating to implied volatilities observed in the market. This motivates the need for fast implied volatility computation.

Unfortunately, there is no closed-form analytical formula for the implied volatility of the Heston model. There are, however, many closed-form approximations for Heston implied volatility.

In this project we select the implied volatility approximation method for Heston proposed by (Lorig et al 2019), which allegedly outperforms other methods such as the one by (Fouque et al 2012). (Lorig et al 2019) derive a family of asymptotic expansions for European-style option implied volatility. Their method provides an explicit approximation and requires no special functions, not even numerical integration, and so it is faster to compute than its counterparts.

We follow closely the derivation steps outlined in (Lorig et al 2019). Consider the Heston model in Section 3.1. According to (Andersen et al 2007), ρ must be negative to avoid moment explosion. To circumvent this, we apply the change of variables (X(t), V(t)) = (ln S(t), e^{κt}ν). Using the asymptotic expansions for general stochastic models (see Appendix B) and Itô's formula, we have

$$ dX = -\frac{1}{2}e^{-\kappa t}V\,dt + \sqrt{e^{-\kappa t}V}\,dW_1, \qquad X(0) = x = \ln s \tag{3.7} $$
$$ dV = \theta\kappa e^{\kappa t}\,dt + \xi\sqrt{e^{\kappa t}V}\,dW_2, \qquad V(0) = \nu = z > 0 \tag{3.8} $$
$$ \langle dW_1, dW_2\rangle = \rho\,dt \tag{3.9} $$

The parameters ν, κ, θ, ξ, W1, W2 play the same roles as in Section 3.1. The second order differential operator A(t) associated with (X, V) is

$$ \mathcal{A}(t) = \frac{1}{2}e^{-\kappa t}v\,(\partial_x^2 - \partial_x) + \theta\kappa e^{\kappa t}\,\partial_v + \frac{1}{2}\xi^2 e^{\kappa t}v\,\partial_v^2 + \xi\rho v\,\partial_x\partial_v $$

Define T as the time to maturity and k as the log-strike. Using the time-dependent Taylor series expansion of A(T) around (x(T), v(T)) = (X_0, E[V(T)]) = (x, ν + θ(e^{κT} - 1)) gives (see Appendix B) the zeroth-order term

$$ \sigma_0 = \sqrt{\frac{-\theta + \theta\kappa T + e^{-\kappa T}(\theta - \nu) + \nu}{\kappa T}} $$

which is the square root of the time-averaged expected variance, where

$$ z = \frac{x - k - \frac{1}{2}\sigma_0^2 T}{\sigma_0\sqrt{2T}} $$

The higher-order corrections σ1 and σ2 are explicit but lengthy expressions in ξ, ρ, κ, θ, ν, T and z; we refer to (Lorig et al 2019) for their full form.


The explicit expression for σ3 is likewise too long to be included here; see (Lorig et al 2019) for the full derivation. The explicit expression for the implied volatility approximation is

$$ \sigma_{\text{implied}} = \sigma_0 + \sigma_1 z + \sigma_2 z^2 + \sigma_3 z^3 \tag{3.10} $$

The C++ source code to implement this is available here: Heston Implied Volatility Approximation.

3.2 Volatility is Rough

We start this section by introducing fractional Brownian motion (fBm), which is a generalisation of Brownian motion.

Definition 3.2.1. Fractional Brownian motion (fBm) is a centered Gaussian process B_t^H, t >= 0, with the autocovariance function

$$ E[B_t^H B_s^H] = \frac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\right) \tag{3.11} $$

where H ∈ (0, 1) is the Hurst index or Hurst parameter associated with the fractional Brownian motion.

The Hurst parameter is a measure of the long-term memory of a time series. It was first introduced by (Hurst 1951) for modelling water levels of the Nile river in order to determine the optimum dam size. As it turns out, it is also suitable for modelling time series data from other natural systems. (Peters 1991) and (Peters 1994) are some of the earliest examples of applying this concept to financial time series modelling. A time series can be classified into three categories based on the Hurst parameter (a short simulation sketch follows the list):

1. 0 < H < 1/2: negative correlation in the increments of the process. A mean-reverting process, which implies an increase in value is more likely followed by a decrease in value, and vice versa. The smaller the value, the greater the mean-reverting effect.

2. H = 1/2: no correlation in the increments of the process (geometric Brownian motion or Wiener process).

3. 1/2 < H < 1: positive correlation in the increments of the process. A trend-reinforcing process, which means the direction of the next value is more likely the same as the current value. The strength of the trend is greater when H is closer to 1.
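To make the classification concrete, fBm paths can be simulated directly from the autocovariance (3.11) by Cholesky factorisation. This is a small illustrative sketch only (the factorisation costs O(n^3), so it suits small n); it is not the data generation method used in this project.

```python
# Illustrative simulation of fBm on (0, 1] via Cholesky factorisation of the
# autocovariance matrix in equation (3.11).
import numpy as np
import matplotlib.pyplot as plt  # optional, for visual comparison of paths

def fbm_path(H, n=300, T=1.0, seed=0):
    t = np.linspace(T / n, T, n)                 # grid excluding t = 0
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (u**(2 * H) + s**(2 * H) - np.abs(u - s)**(2 * H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))   # jitter for stability
    return t, L @ np.random.default_rng(seed).standard_normal(n)

for H in (0.1, 0.5, 0.9):                        # rough, Brownian, smooth
    t, path = fbm_path(H)
    plt.plot(t, path, label=f"H = {H}")
plt.legend()
plt.show()
```

Plotting the three paths reproduces the qualitative picture of Figure 3.1: the smaller H is, the rougher the path.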

Empirical evidence shows that log-volatility time series behave similarly to fBm with a Hurst parameter of order 0.1 (Gatheral et al 2014). Essentially all the statistical stylised facts of volatility can be captured when modelling it as a rough fBm.

This motivates the use of the fractional stochastic volatility (FSV) model of (Comte et al 1998). Our model is called the rough fractional stochastic volatility (RFSV) model to emphasise that H < 1/2. The term 'rough' refers to the roughness of the path of the fBm. From Figure 3.1, one can notice that the fBm path gets 'rougher' as H decreases.

FIGURE 3.1: Paths of fBm for different values of H. Adapted from (Shevchenko 2015).

3.3 The Rough Heston Model

In this section we introduce the rough volatility version of the Heston model and derive its pricing and implied volatility equations.

As stated before, rough volatility models can fit the volatility surface notably well with very few parameters. Hence, it is natural to modify the Heston model and consider its rough version.

The rough Heston model for a one-dimensional asset price S with instantaneous variance ν(t) takes the form in (Euch et al 2016):

$$ dS = S\sqrt{\nu}\,dW_1 $$
$$ \nu(t) = \nu(0) + \frac{1}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\kappa(\theta - \nu(s))\,ds + \frac{\xi}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\sqrt{\nu(s)}\,dW_2 \tag{3.12} $$
$$ \langle dW_1, dW_2\rangle = \rho\,dt $$

where

• α = H + 1/2 ∈ (1/2, 1) governs the roughness of the volatility sample paths
• κ is the mean reversion rate
• θ is the long term mean volatility
• ν(0) is the initial volatility
• ξ is the volatility of volatility
• Γ is the Gamma function
• W1 and W2 are two Brownian motions with correlation ρ

When α = 1 (H = 1/2), we recover the classical Heston model of Chapter 3.1.

3.3.1 Rough Heston Pricing Equation

The pricing equation of the rough Heston model is inspired by the classical Heston one. In the classical Heston model, the option price can be obtained by applying Fourier inversion integrals to the characteristic function, which is expressed in terms of the solution of a Riccati equation. The rough Heston model displays a similar structure, but with the Riccati equation replaced by a fractional Riccati equation.

(Euch et al 2016) derived a characteristic function for the rough Heston model. This result is particularly important, as the non-Markovian nature of the fractional Brownian motion makes other numerical pricing methods (e.g. Monte Carlo) difficult to implement.

Definition 3.3.1. The fractional integral of order r ∈ (0, 1] of a function f is

$$ I^r f(t) = \frac{1}{\Gamma(r)}\int_0^t (t - s)^{r - 1} f(s)\,ds \tag{3.13} $$

whenever the integral exists, and the fractional derivative of order r ∈ (0, 1] is

$$ D^r f(t) = \frac{1}{\Gamma(1 - r)}\frac{d}{dt}\int_0^t (t - s)^{-r} f(s)\,ds \tag{3.14} $$

whenever it exists.

We now quote the results from (Euch et al 2016) directly; for the full proof, see Section 6 of (Euch et al 2016).

Theorem 3.3.1. For all t >= 0, the characteristic function of the log-price X_t = ln(S(t)/S(0)) is

$$ \varphi_X(a, t) = E\left[e^{iaX_t}\right] = \exp\left[\kappa\theta\, I^1 h(a, t) + \nu(0)\, I^{1-\alpha} h(a, t)\right] \tag{3.15} $$

where h is the solution of the fractional Riccati equation

$$ D^\alpha h(a, t) = -\frac{1}{2}a(a + i) + (ia\rho\xi - \kappa)\,h(a, t) + \frac{\xi^2}{2}h^2(a, t), \qquad I^{1-\alpha}h(a, 0) = 0 \tag{3.16} $$

which admits a unique continuous solution for a in C, a suitable region of the complex plane (see Definition 3.3.2).

Definition 3.3.2. We define the strip in the complex plane relevant to the computation of the characteristic function in Theorem 3.3.1:

$$ \mathcal{C} = \left\{ z \in \mathbb{C} : \Re(z) \ge 0,\ -\frac{1}{1 - \rho^2} \le \Im(z) \le 0 \right\} \tag{3.17} $$

where ℜ and ℑ denote the real and imaginary parts, respectively.

There is no explicit solution to equation (3.16), so it has to be solved numerically. (Gatheral et al 2019) present a simple rational approximation to the solution of the rough Heston Riccati equation, which is considerably faster than the numerical schemes.

3.3.1.1 Rational Approximation of the Rough Heston Riccati Solution

First, the short-time series expansion of the solution by (Alos et al 2017) can be written as

$$ h_s(a, t) = \sum_{j=0}^{\infty} \frac{\Gamma(1 + j\alpha)}{\Gamma(1 + (j + 1)\alpha)}\,\beta_j(a)\,\xi^j\,t^{(j+1)\alpha} \tag{3.18} $$

with

$$ \beta_0(a) = -\frac{1}{2}a(a + i) $$
$$ \beta_k(a) = \frac{1}{2}\sum_{i,j=0}^{k-2}\mathbf{1}_{i+j=k-2}\,\beta_i(a)\,\beta_j(a)\,\frac{\Gamma(1 + i\alpha)}{\Gamma(1 + (i + 1)\alpha)}\,\frac{\Gamma(1 + j\alpha)}{\Gamma(1 + (j + 1)\alpha)} + i\rho a\,\frac{\Gamma(1 + (k - 1)\alpha)}{\Gamma(1 + k\alpha)}\,\beta_{k-1}(a) $$

Again from (Alos et al 2017), the long-time expansion is

$$ h_l(a, t) = r_- \sum_{k=0}^{\infty} \frac{\gamma_k\,\xi^{-k}\,t^{-k\alpha}}{A^k\,\Gamma(1 - k\alpha)} \tag{3.19} $$

where

$$ \gamma_1 = -\gamma_0 = -1 $$
$$ \gamma_k = -\gamma_{k-1} + \frac{r_-}{2A}\sum_{i,j=1}^{\infty}\mathbf{1}_{i+j=k}\,\gamma_i\gamma_j\,\frac{\Gamma(1 - k\alpha)}{\Gamma(1 - i\alpha)\,\Gamma(1 - j\alpha)} $$
$$ A = \sqrt{a(a + i) - \rho^2 a^2} $$
$$ r_{\pm} = -i\rho a \pm A $$

Now we can construct the global approximation. Using the same techniques as in (Atkinson et al 2011), the rational approximation of h(a, t) is given by

$$ h^{(m,n)}(a, t) = \frac{\sum_{i=1}^{m} p_i\,(\xi t^\alpha)^i}{\sum_{j=0}^{n} q_j\,(\xi t^\alpha)^j}, \qquad q_0 = 1 \tag{3.20} $$

Numerical experiments in (Gatheral et al 2019) showed h^{(3,3)} is a good choice for high accuracy and fast computation. Thus we can write explicitly

$$ h^{(3,3)}(a, t) = \frac{p_1\xi(t^\alpha) + p_2\xi^2(t^\alpha)^2 + p_3\xi^3(t^\alpha)^3}{1 + q_1\xi(t^\alpha) + q_2\xi^2(t^\alpha)^2 + q_3\xi^3(t^\alpha)^3} \tag{3.21} $$

Thus, from equations (3.18) and (3.19) we have

$$ h_s(a, t) = \frac{\Gamma(1)}{\Gamma(1 + \alpha)}\beta_0(a)(t^\alpha) + \frac{\Gamma(1 + \alpha)}{\Gamma(1 + 2\alpha)}\beta_1(a)\,\xi(t^\alpha)^2 + \frac{\Gamma(1 + 2\alpha)}{\Gamma(1 + 3\alpha)}\beta_2(a)\,\xi^2(t^\alpha)^3 + \mathcal{O}(t^{4\alpha}) $$
$$ \phantom{h_s(a, t)} = b_1(t^\alpha) + b_2(t^\alpha)^2 + b_3(t^\alpha)^3 + \mathcal{O}\left[(t^\alpha)^4\right] $$

and

$$ h_l(a, t) = \frac{r_-\gamma_0}{\Gamma(1)} + \frac{r_-\gamma_1}{A\,\Gamma(1 - \alpha)\,\xi}\frac{1}{(t^\alpha)} + \frac{r_-\gamma_2}{A^2\,\Gamma(1 - 2\alpha)\,\xi^2}\frac{1}{(t^\alpha)^2} + \mathcal{O}\left[\frac{1}{(t^\alpha)^3}\right] $$
$$ \phantom{h_l(a, t)} = g_0 + g_1\frac{1}{(t^\alpha)} + g_2\frac{1}{(t^\alpha)^2} + \mathcal{O}\left[\frac{1}{(t^\alpha)^3}\right] $$

Matching the coefficients of equation (3.21) to h_s and h_l forces the coefficients b_i and g_j to satisfy

$$ p_1\xi = b_1 \tag{3.22} $$
$$ (p_2 - p_1 q_1)\xi^2 = b_2 \tag{3.23} $$
$$ (p_1 q_1^2 - p_1 q_2 - p_2 q_1 + p_3)\xi^3 = b_3 \tag{3.24} $$
$$ \frac{p_3\xi^3}{q_3\xi^3} = g_0 \tag{3.25} $$
$$ \frac{(p_2 q_3 - p_3 q_2)\xi^5}{q_3^2\,\xi^6} = g_1 \tag{3.26} $$
$$ \frac{(p_1 q_3^2 - p_2 q_2 q_3 - p_3 q_1 q_3 + p_3 q_2^2)\xi^7}{q_3^3\,\xi^9} = g_2 \tag{3.27} $$

The solution of this system is

$$ p_1 = \frac{b_1}{\xi} \tag{3.28} $$
$$ p_2 = \frac{b_1^3 g_1 + b_1^2 g_0^2 + b_1 b_2 g_0 g_1 - b_1 b_3 g_0 g_2 + b_1 b_3 g_1^2 + b_2^2 g_0 g_2 - b_2^2 g_1^2 + b_2 g_0^3}{\left(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3\right)\xi^2} \tag{3.29} $$
$$ p_3 = g_0\, q_3 \tag{3.30} $$
$$ q_1 = \frac{b_1^2 g_1 - b_1 b_2 g_2 + b_1 g_0^2 - b_2 g_0 g_1 - b_3 g_0 g_2 + b_3 g_1^2}{\left(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3\right)\xi} \tag{3.31} $$
$$ q_2 = \frac{b_1^2 g_0 - b_1 b_2 g_1 - b_1 b_3 g_2 + b_2^2 g_2 + b_2 g_0^2 - b_3 g_0 g_1}{\left(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3\right)\xi^2} \tag{3.32} $$
$$ q_3 = \frac{b_1^3 + 2 b_1 b_2 g_0 + b_1 b_3 g_1 - b_2^2 g_1 + b_3 g_0^2}{\left(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3\right)\xi^3} \tag{3.33} $$
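To illustrate how compact this approximation is in practice, the coefficient formulas (3.28)-(3.33) and the rational function (3.21) translate directly into a few lines of Python. The sketch below assumes the (complex) expansion coefficients b1..b3 and g0..g2 have already been computed for a fixed Fourier argument a, as defined above.

```python
# Direct transcription of equations (3.21) and (3.28)-(3.33): build the (3,3)
# rational approximation of the fractional Riccati solution h(a, t).
def h33(b1, b2, b3, g0, g1, g2, xi, t, alpha):
    den = b1**2 * g2 + 2 * b1 * g0 * g1 + b2 * g0 * g2 - b2 * g1**2 + g0**3
    p1 = b1 / xi
    p2 = (b1**3 * g1 + b1**2 * g0**2 + b1 * b2 * g0 * g1 - b1 * b3 * g0 * g2
          + b1 * b3 * g1**2 + b2**2 * g0 * g2 - b2**2 * g1**2
          + b2 * g0**3) / (den * xi**2)
    q1 = (b1**2 * g1 - b1 * b2 * g2 + b1 * g0**2 - b2 * g0 * g1
          - b3 * g0 * g2 + b3 * g1**2) / (den * xi)
    q2 = (b1**2 * g0 - b1 * b2 * g1 - b1 * b3 * g2 + b2**2 * g2
          + b2 * g0**2 - b3 * g0 * g1) / (den * xi**2)
    q3 = (b1**3 + 2 * b1 * b2 * g0 + b1 * b3 * g1 - b2**2 * g1
          + b3 * g0**2) / (den * xi**3)
    p3 = g0 * q3                      # equation (3.30)
    y = xi * t**alpha
    return (p1 * y + p2 * y**2 + p3 * y**3) / \
           (1 + q1 * y + q2 * y**2 + q3 * y**3)
```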

3.3.1.2 Option Pricing Using the Fast Fourier Transform

After solving the Riccati equation using the rational approximation (3.21), we can compute the characteristic function φX(a, t) in equation (3.15). Once the characteristic function is obtained, many methods are available to compute call option prices; see (Carr and Madan 1999), (Itkin 2010), (Lewis 2001) and (Schmelzle 2010). In this project we select the Carr and Madan approach, a fast option pricing method that uses the fast Fourier transform (FFT). Suppose k = ln K and C(k, T) is the value of a European call option with strike K and maturity T. Unfortunately, the FFT cannot be applied directly to evaluate the call option price, because the call pricing function is not square-integrable. A key contribution of (Carr and Madan 1999) is the modification of the call pricing function with respect to k. With this modification, a whole range of option prices can be obtained with one inverse Fourier transform.

Consider the modified call price c(k, T):

$$ c(k, T) = e^{\beta k}\,C(k, T), \qquad \beta > 0 \tag{3.34} $$

Its Fourier transform is

$$ \psi_X(u, T) = \int_{-\infty}^{\infty} e^{iuk}\,c(k, T)\,dk, \qquad u \in \mathbb{R}_{>0} \tag{3.35} $$

which can be expressed as

$$ \psi_X(u, T) = \frac{e^{-rT}\,\varphi_X\left[u - (\beta + 1)i,\, T\right]}{\beta^2 + \beta - u^2 + i(2\beta + 1)u} \tag{3.36} $$

where r is the risk-free rate and φX(.) is the characteristic function defined in equation (3.15).

The purpose of β is to assist the integrability of the modified call pricing function over the negative log-strike axis. A sufficient condition is that ψX(0) be finite:

$$ \psi_X(0, T) < \infty \tag{3.37} $$
$$ \implies \varphi_X\left[-(\beta + 1)i,\, T\right] < \infty \tag{3.38} $$
$$ \implies \exp\left[\theta\kappa\, I^1 h(-(\beta + 1)i, T) + \nu(0)\, I^{1-\alpha} h(-(\beta + 1)i, T)\right] < \infty \tag{3.39} $$

Using equation (3.39), we can determine the upper bound on β such that condition (3.37) is satisfied. (Carr and Madan 1999) suggest that one quarter of the upper bound serves as a good choice for β.

The call price C(k, T) can be recovered by applying the inverse Fourier transform:

$$ C(k, T) = \frac{e^{-\beta k}}{2\pi}\int_{-\infty}^{\infty}\Re\left[e^{-iuk}\,\psi_X(u, T)\right]du \tag{3.40} $$
$$ \phantom{C(k, T)} = \frac{e^{-\beta k}}{\pi}\int_{0}^{\infty}\Re\left[e^{-iuk}\,\psi_X(u, T)\right]du \tag{3.41} $$

The semi-infinite integral in the call price equation can be evaluated by numerical methods such as the trapezoidal rule and the FFT, as shown in (Kwok et al 2012):

$$ C(k, T) \approx \frac{e^{-\beta k}}{\pi}\sum_{j=1}^{N} e^{-iu_j k}\,\psi_X(u_j, T)\,\Delta u \tag{3.42} $$

where u_j = (j - 1)Δu, j = 1, 2, ..., N.

We end this section with a summary of the steps required to price a call option under the rough Heston model (a Python sketch of steps 3-5 follows the list):

1. Compute the solution h of the fractional Riccati equation (3.16) using equation (3.21).

2. Compute the characteristic function in equation (3.15).

3. Determine a suitable value for β using equation (3.39).

4. Apply the Fourier transform in equation (3.36).

5. Evaluate the call option price by implementing the FFT sum in equation (3.42).
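The following is a minimal sketch of steps 3-5. Since the rough Heston characteristic function requires the Riccati machinery above, a Black-Scholes characteristic function stands in for φX here, so the sketch is self-contained and its output can be checked against the known Black-Scholes price; β, N and Δu are illustrative choices, not the project's settings.

```python
# Carr-Madan pricing from a characteristic function of the log-price,
# following equations (3.34)-(3.42).
import numpy as np

def bs_cf(u, S0, r, sigma, T):
    # Black-Scholes characteristic function of ln S_T (stand-in for phi_X)
    mu = np.log(S0) + (r - 0.5 * sigma**2) * T
    return np.exp(1j * u * mu - 0.5 * sigma**2 * u**2 * T)

def carr_madan_call(k, cf, r, T, beta=1.0, N=2**12, du=0.05):
    u = np.arange(N) * du                       # u_j = (j - 1) du, j = 1..N
    psi = np.exp(-r * T) * cf(u - (beta + 1) * 1j) / (
        beta**2 + beta - u**2 + 1j * (2 * beta + 1) * u)   # equation (3.36)
    total = (np.exp(-1j * u * k) * psi * du).sum()          # sum in (3.42)
    return np.exp(-beta * k) / np.pi * total.real

S0, K, r, sigma, T = 50.0, 50.0, 0.04, 0.3, 1.0
price = carr_madan_call(np.log(K), lambda u: bs_cf(u, S0, r, sigma, T), r, T)
print(price)   # close to the Black-Scholes value (about 6.88)
```

Swapping bs_cf for the rough Heston φX of equation (3.15) gives the pricer used in this project.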

3.3.2 Rough Heston Model Implied Volatility

(Euch et al 2019) present an almost instantaneous and accurate approximation method to compute rough Heston implied volatilities. This method allows us to approximate the rough Heston implied volatility by simply scaling the volatility of volatility parameter ξ and feeding the scaled parameter into the classical Heston implied volatility approximation of Chapter 3.1.2. For a given maturity T, the scaled volatility of volatility parameter is

$$ \tilde{\xi} = \sqrt{\frac{3}{2H + 2}}\,\frac{\xi}{\Gamma\!\left(H + \frac{3}{2}\right)}\,\frac{1}{T^{\frac{1}{2} - H}} \tag{3.43} $$
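Equation (3.43) is a one-line computation. A sketch using Python's standard-library gamma function is given below; the parameter values are illustrative only.

```python
# Equation (3.43): scale the rough Heston vol-of-vol so the classical Heston
# implied volatility approximation of Chapter 3.1.2 can be reused.
from math import gamma, sqrt

def scaled_vol_of_vol(xi, H, T):
    return sqrt(3.0 / (2.0 * H + 2.0)) * xi / (gamma(H + 1.5) * T**(0.5 - H))

print(scaled_vol_of_vol(xi=0.3, H=0.1, T=1.0))
```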

FIGURE 3.2: Implied volatility smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al 2019).

Chapter 4

Artificial Neural Networks

This chapter starts with an introduction to artificial neural networks (ANN). We introduce the terminologies and basic elements of ANN and explain the methods that we use to increase training speed and predictive accuracy. We also discuss the training process for ANN, common pitfalls and their remedies. Finally, we discuss the ANN type of interest and extend the discussion to deep neural networks (DNN).

The concept of ANN has come a long way. It started off with the modelling of a simple neural network after electrical circuits, which was proposed by Warren McCulloch, a neurologist, and a young mathematician, Walter Pitts, in 1943 (McCulloch et al 1943). This idea was further strengthened by Donald Hebb (Hebb 1949).

ANN's ability to learn complex, non-linear functional mappings by deciphering the pattern between independent and dependent variables has found applications in many fields. With computational resources becoming more accessible and big data sets becoming available, ANN's popularity has grown exponentially in recent years.

4.1 Dissecting the Artificial Neural Networks

FIGURE 4.1: A Simple Neural Network

Before we delve deeper into the world of ANN, let us explain what parameters and hyperparameters mean.

Parameters refer to the variables that are updated throughout the training process; they include weights and biases. Hyperparameters are the configuration variables that define the architecture of the ANN and stay constant throughout the training process. Hyperparameters include the number of neurons, the number of hidden layers, the activation functions, the number of epochs, etc.

4.1.1 Neurons

Like a biological neuron, an artificial neuron is the fundamental working unit in an ANN. It is essentially a mathematical function that receives multiple inputs and generates an output. More precisely, it takes the weighted sum of the inputs and applies an activation function to the sum to generate an output. In the case of a simple ANN (as shown in Figure 4.1), this will be the network output. In more sophisticated ANN, the output of a neuron can be the input of other neurons.

4.1.2 Input, Output and Hidden Layers

An input layer is the first layer of an ANN. It receives and sends the training data into the network for further processing. The number of input neurons is equal to the number of input features of the training data.

The output layer is the final layer of the network. It outputs the predictions for a given set of input features, and it can have more than one output neuron.

A hidden layer is a layer between the input and output layers. It contains neurons and receives weighted inputs from the input layer. Whilst an ANN contains only one input layer and one output layer, the number of hidden layers can be more than one. One hidden layer can only solve linearly separable problems; for more complex problems, multiple hidden layers are needed.

Systematic experimentation is required to identify the appropriate number of hidden layers to prevent under-fitting and over-fitting, and thus increase predictive accuracy.

4.1.3 Connections, Weights and Biases

From the input layer to the output layer, each connection is assigned a weight that denotes its relative importance. Random weights are initialised when training begins, and these weights are updated as training continues.

Sometimes a bias (different from the bias term in statistics) is added to a neuron. It is merely a constant input with weight 1. Intuitively speaking, the purpose of a bias is to shift the activation function away from the origin, thereby allowing the neuron to give a non-zero output when the inputs are 0.

4.1.4 Forward and Backward Propagation, Loss Functions

Forward propagation is the passing of input data in the forward direction through the network to generate output. The difference between the output and the target output is then estimated by a loss function, which is a measure of how well an ANN performs with the current set of weights. Typical loss functions for regression problems include the mean squared error (MSE) and the mean absolute error (MAE). They are defined as

Definition 4.1.1.

$$ MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i,\text{predict}} - y_{i,\text{actual}}\right)^2 \tag{4.1} $$

Definition 4.1.2.

$$ MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_{i,\text{predict}} - y_{i,\text{actual}}\right| \tag{4.2} $$

where y_{i,predict} is the i-th predicted output and y_{i,actual} is the i-th actual output.

Subsequently, this information is propagated backward from the output layer to the input layer, and the gradients of the loss function with respect to the weights are computed. The algorithm then adjusts the weights accordingly to minimise the loss function. An iteration is one complete cycle of forward and backward propagation.

to input layer and the gradients of the loss function wrt the weights is com-puted The algorithm then makes adjustments to the weights accordingly tominimise the loss function An iteration is one complete cycle of forward andbackward propagation

ANN training is an unconstrained optimisation problem with the loss function as the objective function. The algorithm navigates the search space to find optimal weights that minimise the loss function (e.g. the MSE). It can be expressed as

$$ \min_{w}\ \frac{1}{n}\sum_{i=1}^{n}\left(y_{i,\text{predict}} - y_{i,\text{actual}}\right)^2 \tag{4.3} $$

where w is the vector that contains the weights of all the connections within the ANN.

4.1.5 Activation Functions

An activation function is a mathematical function applied to the output of a neuron to determine whether it should be activated (fired) or not, based on the relevance of the neuron's inputs to the prediction. This is analogous to the neuronal firing activity in the human brain.

Activation functions can be linear or non-linear. A linear activation function is only suitable when the functional relationship is linear; non-linear activation functions are used for most applications. In our case, non-linear activation functions are more suitable, as the pricing and implied volatility functional mappings are complex.

Common activation functions include:

Definition 4.1.3. Rectified Linear Unit (ReLU)

$$ f_{ReLU}(x) = \begin{cases} 0 & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases} \quad \in [0, \infty) \tag{4.4} $$

Definition 4.1.4. Exponential Linear Unit (ELU)

$$ f_{ELU}(x) = \begin{cases} \alpha(e^x - 1) & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases} \quad \in (-\alpha, \infty) \tag{4.5} $$

where α > 0. The value of α is usually chosen to be 1 for most applications.

Definition 4.1.5. Linear or identity

$$ f_{linear}(x) = x \quad \in (-\infty, \infty) \tag{4.6} $$

4.1.6 Optimisation Algorithms and Learning Rate

The learning rate refers to the amount of change applied to the weights in each iteration of the training process. A high learning rate results in fast training speed, but there is a risk of divergence, whereas a low learning rate may slow down the training.

Common optimisation algorithms for training ANN are stochastic gradient descent (SGD) and its extensions, such as RMSProp and Adam. The SGD pseudocode is the following:

Algorithm 1 Stochastic Gradient Descent

Initialise a vector of random weights w and learning rate lr
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        w_new = w_current - lr * (∂Loss/∂w_current)
    end for
end while

SGD is popular because it is simple to implement and can provide good results for most applications. In SGD, the learning rate lr is fixed throughout the process. This is found to be an issue, because the appropriate learning rate is difficult to determine: selecting a high learning rate is prone to divergence, whereas using a low learning rate can result in a slow training process. Hence, improvements have been made to SGD to allow the learning rate to be adaptive. Root Mean Square Propagation (RMSProp) is one of the extensions that allows the learning rate to vary throughout the training process (Hinton et al 2015). It is a method that divides the learning rate by an exponential moving average of past squared gradients. The author suggests the exponential decay rate to be b = 0.9; its pseudocode is presented below.

Algorithm 2 Root Mean Square Propagation (RMSProp)

Initialise a vector of random weights w, learning rate lr, v = 0 and b = 0.9
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        v_current = b * v_previous + (1 - b) * (∂Loss/∂w_current)^2
        w_new = w_current - lr / sqrt(v_current + ε) * (∂Loss/∂w_current)
    end for
end while

where ε is a small number to prevent division by zero

A further improved version was proposed by (Kingma et al 2015), called Adaptive Moment Estimation (Adam). In RMSProp, the algorithm adapts the learning rate based on the second moment of the gradients only; Adam improves this further by also using an average of the first moments of the gradients in the update. The advantage of Adam is that it can handle sparse gradients on noisy data. The pseudocode of Adam is given below.

Algorithm 3 Adaptive Moment Estimation (Adam)

Initialise a vector of random weights w, learning rate lr, v = 0, m = 0, b = 0.9 and b1 = 0.999
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        m_current = b * m_previous + (1 - b) * (∂Loss/∂w_current)
        v_current = b1 * v_previous + (1 - b1) * (∂Loss/∂w_current)^2
        m_hat = m_current / (1 - b^i)
        v_hat = v_current / (1 - b1^i)
        w_new = w_current - lr / (sqrt(v_hat) + ε) * m_hat
    end for
end while

where b is the exponential decay rate for the first moment and b1 is the exponential decay rate for the second moment.
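For concreteness, a single Adam update from Algorithm 3 can be written in NumPy as below; grad denotes ∂Loss/∂w at the current weights, and i is the iteration counter (starting at 1) used for bias correction.

```python
# A minimal NumPy sketch of one Adam parameter update (Algorithm 3).
import numpy as np

def adam_step(w, grad, m, v, i, lr=1e-3, b=0.9, b1=0.999, eps=1e-8):
    m = b * m + (1 - b) * grad             # first-moment estimate
    v = b1 * v + (1 - b1) * grad**2        # second-moment estimate
    m_hat = m / (1 - b**i)                 # bias-corrected moments
    v_hat = v / (1 - b1**i)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```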

4.1.7 Epochs, Early Stopping and Batch Size

An epoch is one complete pass of the entire training data set through the ANN. Generally we want the number of epochs to be as high as possible; however, this may cause the ANN to over-fit. Early stopping is useful here to stop the training before the ANN becomes over-fit: it works by stopping the training process when the loss function shows no improvement. The batch size is the number of input samples processed before the parameters are updated.
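In Keras, the library used later in this project, early stopping is available as a callback. The sketch below assumes TensorFlow 2.x; the layer widths, patience and batch size are illustrative placeholders, not the settings used in our experiments.

```python
# Sketch: a small regression network with early stopping on validation loss.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(30, activation="elu", input_shape=(10,)),
    keras.layers.Dense(30, activation="elu"),
    keras.layers.Dense(1, activation="linear"),
])
model.compile(optimizer="adam", loss="mse")

early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                           restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=500, batch_size=1024, callbacks=[early_stop])
```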

4.2 Feedforward Neural Networks

Feedforward neural networks are the most common type of ANN in practical applications. They are called feedforward because the training data only flows from the input layer to the output layer; there is no feedback loop within the network.

4.2.1 Deep Neural Networks

Deep neural networks (DNN) are feedforward neural networks with more than one hidden layer. The depth here refers to the number of hidden layers. This is the type of ANN of our interest, and the following theorems provide justifications.

Theorem 4.2.1 ((Hornik et al 1989), Universal Approximation Theorem). Let NN^a_{d_in,d_out} be the set of neural networks with activation function f_a : R -> R, input dimension d_in ∈ N and output dimension d_out ∈ N. Then, if f_a is continuous and non-constant, NN^a_{d_in,d_out} is dense in the Lebesgue space L^p(μ) for all finite measures μ.

Theorem 4.2.2 ((Hornik et al 1990), Universal Approximation Theorem for Derivatives). Let the function to be approximated be F* ∈ C^n, the mapping of the neural network be F : R^{d_0} -> R, and let NN^a_{d_in,d_out} be the set of single-layer neural networks with activation function f_a : R -> R, input dimension d_in ∈ N and output dimension 1. Then, if the (non-constant) activation function satisfies f_a ∈ C^n(R), NN^a_{d_in,1} arbitrarily approximates F* and all its derivatives up to order n.

Theorem 4.2.3 ((Eldan et al 2016), The Power of Depth of Neural Networks). There exists a simple function on R^d, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network to more than a certain constant accuracy unless its width is exponential in the dimension.

According to Theorem 4.2.2, it is crucial to have an activation function that is n times differentiable if the approximation of the n-th derivative of the function F* is required. Theorem 4.2.3 motivates the choice of a deep network; it states that the depth of a network can improve its predictive capabilities. However, it is worth noting that networks deeper than 4 hidden layers do not provide much performance gain (Ioffe et al 2015).

4.3 Common Pitfalls and Remedies

Here we discuss some common pitfalls in ANN training and the corresponding remedies.

4.3.1 Loss Function Stagnation

During training, the loss function may display only infinitesimal improvements after each epoch.

To resolve this problem:

• Increase the batch size. This could potentially fix the issue, as it helps the ANN model better learn the relationships between data samples and data labels before updating the parameters.

• Increase the learning rate. Too small a learning rate can cause the ANN model to get stuck in local minima.

• Use a deeper network. According to the power of depth theorem, deeper networks tend to perform better than shallower ones. However, if the model already has 4 hidden layers, this method may not be suitable, as there is not much performance gain beyond that (see Chapter 4.2.1).

• Use a different activation function for the output layer. Make sure the range of the output activation function is appropriate for the problem.

4.3.2 Loss Function Fluctuation

During training, the loss function may fluctuate heavily from epoch to epoch instead of decreasing steadily. Potential fixes include:

• Increase the batch size. The ANN model might not be able to learn the true relationships between input and output data before each update when the batch size is too small.

• Lower the initial learning rate.

4.3.3 Over-fitting

Over-fitting occurs when the ANN model performs well on training data but fails to generalise on unseen test data.

Potential fixes include:

• Reduce the complexity of the ANN model. Limit the number of hidden layers to 4, as (Eldan et al 2016) show that not much advantage can be achieved beyond 4 hidden layers. Furthermore, experiment with narrower (fewer neurons) hidden layers.

• Introduce early stopping. Reduce the number of epochs if more epochs show only little improvement.

• Regularisation. This can reduce the complexity of the loss function and speed up convergence.

• K-fold cross-validation. This works by splitting the training data into K equal-sized portions: K - 1 portions are used for training and 1 portion is used for validation, repeated K times with a different portion used for validation each time.

4.3.4 Under-fitting

Under-fitting occurs when the ANN model fails to make sense of the relationships between data samples and labels.

Potential fixes include:

• Increase the size of the training data. Complex functional mappings require the ANN model to train on more data before it can learn the relationships well.

• Increase the depth of the model. It is very likely that the model is not deep enough for the problem. As a rule of thumb, limit the number of hidden layers to 3-4 for complex problems.

4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation

For the ANN application to implied volatility predictions, we apply the image-based implicit method proposed by (Horvath et al 2019). It is allegedly a powerful and robust method for approximating implied volatility. Rather than using one implied volatility value as the network output, the image-based implicit method uses the entire implied volatility surface as the output. The implied volatility surface is represented by 10 x 10 pixels, with each pixel corresponding to one maturity and one strike price. The input data corresponding to one implied volatility surface output are all the model parameters except for the strike price and maturity; these are implicitly learned by the network. This method offers the following benefits (a network sketch follows the list):

• More information is learned by the network with less input data. The information of neighbouring volatility points is incorporated in the training process.

• The network learns the mapping from model parameters to the implied volatility surface directly. The volatility smile and term structure are available simultaneously, and so it is more efficient than having only one single implied volatility value as output.

• The neighbouring volatility points can be numerically approximated almost instantly if intended.
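A minimal Keras sketch of this setup is given below, assuming TensorFlow 2.x: the network maps a parameter vector to 100 output neurons that are reshaped into the 10 x 10 implied volatility grid. The input dimension of 6 matches the six Heston model parameters used later in this project; the layer widths are illustrative, not the architecture of Chapter 6.

```python
# Sketch of the image-based implicit method: 6 model parameters in,
# 100 surface pixels (10 x 10 grid) out.
from tensorflow import keras

surface_model = keras.Sequential([
    keras.layers.Dense(64, activation="elu", input_shape=(6,)),
    keras.layers.Dense(64, activation="elu"),
    keras.layers.Dense(64, activation="elu"),
    keras.layers.Dense(100, activation="linear"),  # one neuron per pixel
])
surface_model.compile(optimizer="adam", loss="mse")
# predicted_surfaces = surface_model.predict(params).reshape(-1, 10, 10)
```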

[Diagram omitted: the 100 neural network outputs O1, O2, ..., O100 are arranged row by row as a 10 x 10 grid of pixels forming the implied volatility surface.]

FIGURE 4.2: The pixels of the implied volatility surface are represented by the outputs of the neural network.

Chapter 5

Data

In this chapter we discuss the data for our applications. We explain how the data is generated, along with some essential data pre-processing procedures.

Simulated data is chosen to train the ANN model mainly for regulatory purposes. One might argue that training the ANN with market data directly would yield better predictions; however, evidence for the stability and robustness of this method is yet to be found, and there is no straightforward interpretation between the inputs and outputs of an ANN model trained in this way. The main advantage of training an ANN with simulated data is that the functional relationship is known, and so the model is tractable. This property is important, as tractable models tend to be more transparent, which is required by regulations.

For the neural network computation, Python is the logical language choice, as it has many powerful machine learning libraries such as Keras. To obtain good results, we require 5,000,000 rows of data for the ANN experiment. Hence, we choose C++ instead of Python for the data generation process, as it is computationally faster. We also use the lightweight, header-only Python library pybind11, which can harmoniously integrate C++ and Python in one project (see Appendix A for a tutorial on pybind11).

5.1 Data Generation and Storage

The data generation process is done entirely in C++, mainly because of its execution speed. Initially, we considered applying GPU parallel computing to speed up the process even further; after consulting with the supervisor, we decided to use the std::future class template instead. std::future belongs to the standard C++11 library and is very simple to implement compared to parallel computing. It allows the C++ program to run multiple functions asynchronously on different threads (see page 942 in (Duffy 2018) for an example). Our machine contains 2 physical cores with 2 threads per physical core. This allows us to run 4 operations asynchronously without overhead costs and therefore, in theory, improves the data generation speed four-fold. Note that asynchronous programming is not meant to replace parallel computing completely; in cases where execution speed is absolutely critical, parallel computing should be used. Without asynchronous programming, the data generation process for 5,000,000 rows of Heston option price data would take more than 8 hours; now it takes less than 2 hours to complete.

We follow the steps outlined in Appendix A to wrap the C++ program using the pybind11 Python library. The wrapped C++ function can be called in Python just like any other Python library. The pybind11 library is conceptually very similar to the Boost library; the reason we choose pybind11 is that it is a lightweight, header-only library that is easy to use. This enables us to enjoy both the execution speed of C++ and the access to the machine learning libraries of Python at the same time.

After the data is generated, it is stored in MySQL so that we do not need to re-generate the data each time we restart the computer or when the Python IDE crashes unexpectedly. We use the mysql-connector Python library to communicate with MySQL from Python, as sketched below.
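The storage step looks roughly as follows. This is a hedged sketch only: the connection credentials, database name and table schema are placeholders, not the ones used in the project.

```python
# Sketch: persisting generated rows with mysql-connector-python.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="user",
                               password="password", database="thesis_data")
cursor = conn.cursor()
rows = [(50.0, 50.0, 0.04, 6.88)]  # e.g. (S, K, r, price) tuples from C++
cursor.executemany(
    "INSERT INTO heston_call (S, K, r, price) VALUES (%s, %s, %s, %s)", rows)
conn.commit()
conn.close()
```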

5.1.1 Heston Call Option Data

We follow the method and theory described in Chapter 3 to compute call option prices under the Heston model. We only consider call options in this project; to train an ANN that can also predict put option prices, one can simply add an extra input feature that takes only two values, 1 or -1, where 1 represents a call option and -1 represents a put option.

For this application we generated 5,000,000 rows of data, uniformly distributed over a specified range. In this context, one row of data contains all the input features and the corresponding output. We use 10 model parameters as the input features: spot price (S), strike price (K), risk-free rate (r), dividend yield (D), current/instantaneous volatility (ν), maturity (T), long term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ). In this case there is only one output: the call option price. See Table 5.1 for their respective ranges.

TABLE 5.1: Heston Model Call Option Data

         Model Parameters                   Range
Input    Spot Price (S)                     ∈ [20, 80]
         Strike Price (K)                   ∈ [40, 55]
         Risk-free rate (r)                 ∈ [0.03, 0.045]
         Dividend Yield (D)                 ∈ [0, 0.1]
         Instantaneous Volatility (ν)       ∈ [0.2, 0.6]
         Maturity (T)                       ∈ [0.03, 2.0]
         Long Term Volatility (θ)           ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)            ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)       ∈ [0.2, 0.5]
         Correlation (ρ)                    ∈ [-0.95, 0.95]
Output   Call Option Price (C)              ∈ R_{>0}

5.1.2 Heston Implied Volatility Data

We generate the Heston implied volatility data with the theory explained in Chapter 3. For the ANN application to implied volatility approximation, we use the image-based implicit method, in which the ANN output layer dimension is 100, with each output representing one pixel on the implied volatility surface.

Using this method, we need 100 times more data in order to have the same number of distinct rows as the normal point-wise method (e.g. Heston call option pricing) does. However, due to hardware constraints, we generate 10,000,000 rows of uniformly distributed data; that translates to only 100,000 distinct rows of data.

We use 6 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ).

Note that the strike price and maturity are not used as input features, as mentioned in Chapter 4.4. This is because the shape of one full implied volatility surface depends on the 6 model parameters above; the strike price and maturity correspond to one specific point on the full implied volatility surface. Nevertheless, we still require these two parameters to generate the implied volatility data. We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

In this case there are 10 x 10 = 100 outputs, namely the implied volatility surface. See Table 5.2 for their respective ranges.

TABLE 5.2: Heston Model Implied Volatility Data

         Model Parameters                   Range
Input    Spot Price (S)                     ∈ [0.2, 2.5]
         Instantaneous Volatility (ν)       ∈ [0.2, 0.6]
         Long Term Volatility (θ)           ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)            ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)       ∈ [0.2, 0.5]
         Correlation (ρ)                    ∈ [-0.95, 0.95]
Output   Implied Volatility Surface (σ_imp) ∈ R^{10x10}_{>0}

5.1.3 Rough Heston Call Option Data

Again, we follow the theory in Chapter 3 and implement it in C++ to generate call option prices under the rough Heston model.

Similarly, we only consider call options here and generate 5,000,000 rows of data. We use 9 model parameters as the input features: strike price (K), risk-free rate (r), current/instantaneous volatility (ν), maturity (T), long term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). In this case there is only one output: the call option price. See Table 5.3 for their respective ranges.

TABLE 5.3: Rough Heston Model Call Option Data

         Model Parameters                   Range
Input    Strike Price (K)                   ∈ [0.5, 5]
         Risk-free rate (r)                 ∈ [0.2, 0.5]
         Instantaneous Volatility (ν)       ∈ [0, 1.0]
         Maturity (T)                       ∈ [0.1, 3.0]
         Long Term Volatility (θ)           ∈ [0.01, 0.1]
         Mean Reversion Rate (κ)            ∈ [1.0, 4.0]
         Volatility of Volatility (ξ)       ∈ [0.01, 0.5]
         Correlation (ρ)                    ∈ [-0.95, 0.95]
         Hurst Parameter (H)                ∈ [0.05, 0.1]
Output   Call Option Price (C)              ∈ R_{>0}

5.1.4 Rough Heston Implied Volatility Data

Finally, we generate the rough Heston implied volatility data with the theory explained in Chapter 3. As before, we use the image-based implicit method.

The data size we use is 10,000,000 rows, which translates to 100,000 distinct rows of data. We use 7 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

As before, there are 10 x 10 = 100 outputs, namely the rough Heston implied volatility surface. See Table 5.4 for their respective ranges.

TABLE 5.4: Rough Heston Model Implied Volatility Data

Model Parameters                          Range
Input:
  Spot Price (S)                          ∈ [0.2, 2.5]
  Instantaneous Volatility (ν)            ∈ [0.2, 0.6]
  Long Term Volatility (θ)                ∈ [0.2, 1.0]
  Mean Reversion Rate (κ)                 ∈ [0.1, 1.0]
  Volatility of Volatility (ξ)            ∈ [0.2, 0.5]
  Correlation (ρ)                         ∈ [−0.95, 0.95]
  Hurst Parameter (H)                     ∈ [0.05, 0.1]
Output:
  Implied Volatility Surface (σ_imp)      ∈ ℝ^{10×10}_{>0}


5.2 Data Pre-processing

In this section we discuss some necessary data pre-processing procedures: data splitting for validation and evaluation (testing), data standardisation and normalisation, and data partitioning for K-fold cross validation.

5.2.1 Data Splitting: Train - Validate - Test

Generally, not 100% of the available data is used for training. We reserve some of the data for validation and for performance evaluation.

We split the data into three sets: 70% (train), 15% (validate) and 15% (test). The first 70% of the data is used for training, whilst 15% of the data is used for validation during training. The last 15% is for evaluating the trained ANN's predictive accuracy.

The validation data set is used for tuning the hyperparameters of the ANN. This is important to help prevent over-fitting of the network. The test data set is not used during training; it is the unseen (out-of-sample) data used for evaluating the trained ANN's predictions.
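A minimal sketch of this 70/15/15 split, assuming the generated data is already shuffled and held in NumPy arrays X (features) and y (labels):

import numpy as np

n = len(X)
i_train, i_val = int(0.70 * n), int(0.85 * n)

# First 70% for training, next 15% for validation, last 15% for testing
X_train, y_train = X[:i_train], y[:i_train]
X_val, y_val = X[i_train:i_val], y[i_train:i_val]
X_test, y_test = X[i_val:], y[i_val:]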

5.2.2 Data Standardisation and Normalisation

Before the data is fed into the ANN for training, it is essential to scale the data to speed up convergence and improve performance.

The magnitude of each data feature is different. It is essential to re-scale the data so that the ANN model will not misinterpret their relative importance based on their magnitudes. Typical scaling techniques include standardisation and normalisation.

Data standardisation is defined as

\[ x_{\mathrm{scaled}} = \frac{x - x_{\mathrm{mean}}}{x_{\mathrm{std}}} \tag{5.1} \]

where

• x is the data
• x_mean is the mean of that data feature
• x_std is the standard deviation of that data feature

(Horvath et al., 2019) claim that this works especially well for quantities that are positively unbounded, such as implied volatility.

For other model parameters, we apply data normalisation, which is defined as

\[ x_{\mathrm{scaled}} = \frac{2x - (x_{\mathrm{max}} + x_{\mathrm{min}})}{x_{\mathrm{max}} - x_{\mathrm{min}}} \in [-1, 1] \tag{5.2} \]

where

• x is the data


• x_max is the maximum value of that data feature
• x_min is the minimum value of that data feature

After data scaling (standardisation or normalisation), the ANN is able to learn the true relationships between inputs and outputs without bias from the difference in their magnitudes.
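A minimal NumPy sketch of the two scalings defined in Equations (5.1) and (5.2), applied column-wise to a feature matrix x:

import numpy as np

def standardise(x):
    # Equation (5.1): zero mean and unit standard deviation per feature
    return (x - x.mean(axis=0)) / x.std(axis=0)

def normalise(x):
    # Equation (5.2): rescale each feature to the interval [-1, 1]
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (2 * x - (x_max + x_min)) / (x_max - x_min)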

5.2.3 Data Partitioning

In this section we explain how the data is partitioned for K-fold cross validation.

First, we shuffle the entire training data set. Then the entire data set is divided into K equal-sized portions. For each portion, we use it as the test data set and train the ANN on the other K − 1 portions. This is repeated K times, with each of the K portions used exactly once as the test data set. The average of the K results is the performance measure of the model.

FIGURE 5.1: How the data is partitioned for 5-fold cross validation. Adapted from (Chapter 2: Modeling Process, k-fold cross validation).
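A minimal sketch of this partitioning using scikit-learn's KFold, where build_model() is a hypothetical factory returning a freshly compiled Keras model:

import numpy as np
from sklearn.model_selection import KFold

# Shuffle, then split into K = 5 equal portions; each portion serves
# exactly once as the test set while the other 4 are used for training
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    model = build_model()
    model.fit(X[train_idx], y[train_idx], epochs=30, batch_size=200, verbose=0)
    scores.append(model.evaluate(X[test_idx], y[test_idx], verbose=0))

# The average of the K results is the performance measure of the model
print(np.mean(scores))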


Chapter 6

Neural Networks Architecture and Experimental Setup

We start this chapter by discussing the appropriate ANN architecture and experimental setup for our applications. We apply the ANN to four different applications: option pricing under the Heston model, Heston implied volatility surface approximation, option pricing under the rough Heston model, and rough Heston implied volatility surface approximation. We first apply the ANN to the Heston model as a concept validation and then proceed to the rough Heston model.

Subsequently, we discuss the relevant evaluation metrics and validation methods for our ANN models. Finally, we end this chapter with the implementation steps for the ANN in Keras.

6.1 Neural Networks Architecture

Configuring an ANN's architecture is a combination of art and science. It requires experience and systematic experimentation to decide the right parameters and architecture for a specific problem, and for a particular problem there might be multiple designs that are appropriate. To begin with, we used the network architecture of (Horvath et al., 2019) and adjusted the hyperparameters where it performed poorly for our applications.


FIGURE 6.1: Typical ANN Architecture for Option Pricing Application (input layer, three hidden layers, single-neuron output layer).


FIGURE 6.2: Typical ANN Architecture for Implied Volatility Surface Approximation (input layer, three hidden layers, 100-neuron output layer).

6.1.1 ANN Architecture: Option Pricing under Heston Model

An ANN with 3 hidden layers is appropriate for this application. Each hidden layer has 50 neurons, with the exponential linear unit (ELU) as the activation function.

Initially, the rectified linear activation function (ReLU) was considered. It is the choice for many applications, as it can overcome the vanishing gradient problem whilst being less computationally expensive than other activation functions. However, it has some major drawbacks:

• The nth derivative of ReLU is not continuous for all n > 0.
• ReLU cannot produce negative outputs.
• For activation x < 0, ReLU has zero gradient.

From Theorem 4.2.2, we know that choosing an n-times differentiable activation function is necessary for computing the nth derivatives of the functional mapping. This is crucial for our application because we would be interested in


option Greeks computation using the same ANN if it shows promising results for option pricing. For risk management purposes, selecting an activation function from which option Greeks can be calculated is crucial.

The obvious alternative is the ELU (defined in Chapter 4.1.4). For x < 0 it has a smooth output that approaches −α, and for x > 0 the output of ELU is positively unbounded, which is suitable for our application as the values of option price and implied volatility are, in theory, also positively unbounded.

Since our problem is a regression problem, a linear activation function (defined in Chapter 4.1.5) is appropriate for the output layer, as its output is unbounded. We find that a batch size of 200 and 30 epochs yield good accuracy.

To summarise, the ANN model for Heston option pricing is:

• Input Layer: 10 neurons
• Hidden Layer 1: 50 neurons, ELU as activation function
• Hidden Layer 2: 50 neurons, ELU as activation function
• Hidden Layer 3: 50 neurons, ELU as activation function
• Output Layer: 1 neuron, linear function as activation function
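Mirroring the Keras snippet shown in Appendix C, a minimal sketch of this architecture (the commented training call is indicative only):

import keras

# Heston option pricing network: 10 -> 50 -> 50 -> 50 -> 1
input_layer = keras.layers.Input(shape=(10,))
h = keras.layers.Dense(50, activation='elu')(input_layer)
h = keras.layers.Dense(50, activation='elu')(h)
h = keras.layers.Dense(50, activation='elu')(h)
output_layer = keras.layers.Dense(1, activation='linear')(h)

model = keras.models.Model(inputs=input_layer, outputs=output_layer)
model.compile(loss='mse', optimizer='adam')
# model.fit(X_train, y_train, batch_size=200, epochs=30)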

6.1.2 ANN Architecture: Implied Volatility Surface of Heston Model

For approximating the full implied volatility surface of the Heston model, we apply the image-based implicit method (see Chapter 4.4). This implies the output dimension is 100.

We use 3 hidden layers with 80 neurons each. The activation function for the hidden layers is the exponential linear unit (ELU). As before, we use a linear activation function for the output layer. We find that a batch size of 20 and 200 epochs yield good accuracy. Since the number of epochs is relatively high, we introduce an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

To summarise, the ANN model for Heston implied volatility surface approximation is:

• Input Layer: 6 neurons
• Hidden Layer 1: 80 neurons, ELU as activation function
• Hidden Layer 2: 80 neurons, ELU as activation function
• Hidden Layer 3: 80 neurons, ELU as activation function
• Output Layer: 100 neurons, linear function as activation function
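The early stopping rule mentioned above corresponds to the Keras callback used in Appendix C; a minimal sketch:

from keras.callbacks import EarlyStopping

# Stop training once the validation loss has not improved for 25 epochs
earlystop = EarlyStopping(monitor='val_loss', mode='min', patience=25, verbose=1)
# model.fit(..., batch_size=20, epochs=200, callbacks=[earlystop])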


6.1.3 ANN Architecture: Option Pricing under Rough Heston Model

For option pricing under the rough Heston model, the ANN model is:

• Input Layer: 9 neurons
• Hidden Layer 1: 50 neurons, ELU as activation function
• Hidden Layer 2: 50 neurons, ELU as activation function
• Hidden Layer 3: 50 neurons, ELU as activation function
• Output Layer: 1 neuron, linear function as activation function

As before, we use a batch size of 200 and 30 epochs.

6.1.4 ANN Architecture: Implied Volatility Surface of Rough Heston Model

The ANN model for rough Heston implied volatility surface approximation is:

• Input Layer: 7 neurons
• Hidden Layer 1: 80 neurons, ELU as activation function
• Hidden Layer 2: 80 neurons, ELU as activation function
• Hidden Layer 3: 80 neurons, ELU as activation function
• Output Layer: 100 neurons, linear function as activation function

We use a batch size of 20 and 200 epochs, with an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

6.2 Performance Metrics and Validation Methods

We require some metrics to evaluate the performance of the ANN in terms of accuracy and speed.

For accuracy, we have introduced the MSE and MAE in Chapter 4.1.4. It is also common to use the coefficient of determination (R²) to quantify the performance of a regression model; it is a statistical measure that quantifies the proportion of the variance in the dependent variable that is explained by the independent variables. The formula is

\[ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_{i,\mathrm{actual}} - y_{i,\mathrm{predict}})^2}{\sum_{i=1}^{n} (y_{i,\mathrm{actual}} - \bar{y}_{\mathrm{actual}})^2} \tag{6.1} \]


We also suggest a user-defined accuracy metric, the Maximum Absolute Error, defined as

\[ \text{Max Abs Error} = \max_i \left| y_{i,\mathrm{actual}} - y_{i,\mathrm{predict}} \right| \tag{6.2} \]

This metric provides a measure of how large the error could be in the worst-case scenario. It is especially important from a risk management perspective.
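A minimal NumPy sketch of these two accuracy metrics (the Keras versions used during training are shown in Appendix C):

import numpy as np

def r_squared(y_actual, y_predict):
    # Equation (6.1): coefficient of determination
    ss_res = np.sum((y_actual - y_predict) ** 2)
    ss_tot = np.sum((y_actual - np.mean(y_actual)) ** 2)
    return 1 - ss_res / ss_tot

def max_abs_error(y_actual, y_predict):
    # Equation (6.2): worst-case error over all predictions
    return np.max(np.abs(y_actual - y_predict))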

As for the run-time performance of the ANN, we simply use the Python built-in timer function to quantify this property.
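For example, assuming a trained Keras model and test inputs X_test, the timing sketch is simply:

import time

t0 = time.time()
y_pred = model.predict(X_test)
print('Prediction took', time.time() - t0, 's')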

In Chapter 5.2.3 we explained the data partitioning process. The motivation for partitioning the data is to perform K-fold cross validation. Cross validation is a statistical technique to assess how well a predictive model generalises to an independent data set, and it is a good tool for identifying over-fitting.

As explained earlier, the process involves training and testing different permutations of the partitioned data sets multiple times. The indication of a well-generalised model is that the error magnitudes in each round are consistent with each other and with their average. We select K = 5 for our applications.


6.3 Implementation of ANN in Keras

In this section we outline the standard ANN computation process using Keras.

1. Construct the ANN model by specifying the number of input neurons, the number of hidden neurons, the number of hidden layers, the number of output neurons, and the activation functions for each layer. Print the model summary to check for errors.

2. Configure the settings of the model using the compile function. We select the MSE as the loss function and specify MAE, R² and our user-defined maximum absolute error as the metrics. For all of our applications we choose ADAM as the optimisation algorithm. To prevent over-fitting, we use the early stopping function, especially when the number of epochs is high. It is important to note that the validation loss (not the training loss) should be used as the early stopping criterion; this means the training will stop early if there is no reduction in validation loss.

3. Start training the ANN model. It is important to specify the validation split argument here for splitting the training data into a train set and a validation set. Note that the test set should not be used as the validation set; it should be reserved for evaluating the ANN's predictions after training.

4. When the training is complete, generate the plot of MSE (loss function) versus number of epochs to visualise the training process. The MSE should decline gradually and then level off as the number of epochs increases. See Chapter 4.3 for potential training issues and remedies.

5. Apply 5-fold cross validation to check for over-fitting. Evaluate the trained model using the evaluate function.

6. Generate predictions. Visualise the predictions by plotting predicted values versus actual values on the same graph.

See Appendix C for a Python code example.
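Appendix C defines the model, metrics and early stopping callback; a minimal sketch of steps 3 to 6, assuming prepared NumPy arrays X, y (training data) and X_test, y_test (reserved test set), might look like:

import matplotlib.pyplot as plt

# Step 3: train, holding out part of the training data for validation
history = model.fit(X, y, batch_size=20, epochs=200,
                    validation_split=0.15, callbacks=[earlystop])

# Step 4: visualise the training process
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('MSE')
plt.legend()
plt.show()

# Steps 5-6: evaluate on the unseen test set, then generate predictions
print(model.evaluate(X_test, y_test))
y_pred = model.predict(X_test)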


6.4 Summary of Neural Networks Architecture

TABLE 6.1: Summary of Neural Networks Architecture for Each Application

ANN Model                            Application                               Input Layer  Hidden Layers          Output Layer  Batch Size  Epochs
Heston Option Pricing ANN            Heston Call Option Pricing                10 neurons   3 layers × 50 neurons  1 neuron      200         30
Heston Implied Volatility ANN        Heston Implied Volatility Surface         6 neurons    3 layers × 80 neurons  100 neurons   20          200
Rough Heston Option Pricing ANN      Rough Heston Call Option Pricing          9 neurons    3 layers × 50 neurons  1 neuron      200         30
Rough Heston Implied Volatility ANN  Rough Heston Implied Volatility Surface   7 neurons    3 layers × 80 neurons  100 neurons   20          200


Chapter 7

Results and Discussions

In this chapter we present our results and discuss our findings

7.1 Heston Option Pricing ANN Results

Figure 7.1 shows the loss function plot of the Heston Option Pricing ANN learning curves. The loss function used here is the MSE. As the ANN is trained, both the training loss curve and the validation loss curve decline until they converge. It took fewer than 30 epochs for the ANN to reach a satisfactory level of accuracy. There is only a small gap between the training loss and the validation loss, which indicates the ANN model is well trained.

FIGURE 7.1: Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN

Table 7.1 summarises the error metrics of the Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 2.00 × 10⁻⁶, 9.58 × 10⁻⁴, 6.50 × 10⁻³ and 0.999, respectively. The low error metrics and high R² value imply the ANN model generalises well and is able to give highly accurate predictions for option prices under the Heston model. Figure 7.2 attests to its high accuracy: almost all of the predictions lie on the perfect predictions line (red line).

TABLE 7.1: Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
2.00 × 10⁻⁶   9.58 × 10⁻⁴   6.50 × 10⁻³     0.999

FIGURE 7.2: Plot of Predicted vs Actual Values for Heston Option Pricing ANN

TABLE 7.2: 5-fold Cross Validation Results of Heston Option Pricing ANN

          MSE           MAE           Max Abs Error
Fold 1    5.78 × 10⁻⁷   5.45 × 10⁻⁴   2.04 × 10⁻³
Fold 2    9.90 × 10⁻⁷   7.69 × 10⁻⁴   2.40 × 10⁻³
Fold 3    7.15 × 10⁻⁷   6.21 × 10⁻⁴   2.20 × 10⁻³
Fold 4    1.24 × 10⁻⁶   8.67 × 10⁻⁴   2.53 × 10⁻³
Fold 5    6.54 × 10⁻⁷   5.79 × 10⁻⁴   2.21 × 10⁻³
Average   8.36 × 10⁻⁷   6.76 × 10⁻⁴   2.27 × 10⁻³

Table 7.2 shows the results from 5-fold cross validation, which again confirms that the ANN model can give accurate predictions and generalises well to unseen data. The values of each fold are consistent with each other and with their average values.


7.2 Heston Implied Volatility ANN Results

Figure 7.3 shows the loss function plot of the Heston Implied Volatility ANN learning curves. Again, we use the MSE as the loss function. We can see that the validation loss curve experiences some fluctuations during training. We experimented with various ANN setups, and this is the least fluctuating learning curve we could achieve. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level, and the gap between validation loss and training loss is negligible. Due to the comparatively more complex ANN architecture, it took about 200 epochs for the ANN to reach a satisfactory level of accuracy.

FIGURE 7.3: Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN

Table 7.3 summarises the error metrics of the Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 6.24 × 10⁻⁴, 1.27 × 10⁻², 3.71 × 10⁻¹ and 0.999, respectively. The low error metrics and high R² value imply the ANN model generalises well and is able to give accurate predictions for the implied volatility surface under the Heston model. Figure 7.4 shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.3: Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
6.24 × 10⁻⁴   1.27 × 10⁻²   3.71 × 10⁻¹     0.999


FIGURE 7.4: Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.5: Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.5 shows the MAE values of the Heston Implied Volatility ANN predictions across the entire implied volatility surface. The largest MAE value is 0.0045, and a large area of the surface has MAE values lower than 0.0030. It is worth noting that the accuracy is relatively poor when the strike price is at the extremes and maturity is small. However, the ANN gives good predictions overall for the whole surface.

The 5-fold cross validation results for the Heston Implied Volatility ANN model show consistent values. The results are summarised in Table 7.4. The consistent values indicate the ANN model is well-trained and is able to generalise to unseen test data.

TABLE 7.4: 5-fold Cross Validation Results of Heston Implied Volatility ANN

          MSE           MAE           Max Abs Error
Fold 1    8.15 × 10⁻⁴   1.13 × 10⁻²   3.88 × 10⁻¹
Fold 2    4.74 × 10⁻⁴   1.08 × 10⁻²   3.13 × 10⁻¹
Fold 3    1.37 × 10⁻³   1.74 × 10⁻²   4.46 × 10⁻¹
Fold 4    3.38 × 10⁻⁴   8.61 × 10⁻³   2.19 × 10⁻¹
Fold 5    4.60 × 10⁻⁴   1.02 × 10⁻²   2.67 × 10⁻¹
Average   6.92 × 10⁻⁴   1.12 × 10⁻²   3.27 × 10⁻¹

7.3 Rough Heston Option Pricing ANN Results

Figure 7.6 shows the loss function plot of the Rough Heston Option Pricing ANN learning curves. As before, MSE is used as the loss function. As the training progresses, both the training loss curve and the validation loss curve decline until they converge. Since this is a relatively simple model, it took fewer than 30 epochs of training to reach a good level of accuracy. The small discrepancy between the training loss and the validation loss indicates the ANN model is well-trained.

FIGURE 7.6: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN

Table 7.5 summarises the error metrics of the Rough Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 9.00 × 10⁻⁶, 2.28 × 10⁻³, 1.76 × 10⁻¹ and 0.999, respectively. These metrics imply the ANN model generalises well and is able to give highly accurate predictions for option prices under the rough Heston model. Figure 7.7 attests to its high accuracy again: most of the predictions lie on the perfect predictions line (red line).

TABLE 7.5: Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
9.00 × 10⁻⁶   2.28 × 10⁻³   1.76 × 10⁻¹     0.999


FIGURE 7.7: Plot of Predicted vs Actual Values for Rough Heston Option Pricing ANN

TABLE 7.6: 5-fold Cross Validation Results of Rough Heston Option Pricing ANN

          MSE           MAE           Max Abs Error
Fold 1    3.79 × 10⁻⁶   1.35 × 10⁻³   5.99 × 10⁻³
Fold 2    2.33 × 10⁻⁶   9.96 × 10⁻⁴   4.89 × 10⁻³
Fold 3    2.06 × 10⁻⁶   9.88 × 10⁻⁴   4.41 × 10⁻³
Fold 4    2.16 × 10⁻⁶   1.08 × 10⁻³   4.07 × 10⁻³
Fold 5    1.84 × 10⁻⁶   1.01 × 10⁻³   3.74 × 10⁻³
Average   2.44 × 10⁻⁶   1.08 × 10⁻³   4.62 × 10⁻³

The highly consistent values from 5-fold cross validation again confirm that the ANN model is able to generalise to unseen data.

7.4 Rough Heston Implied Volatility ANN Results

Figure 7.8 shows the loss function plot of the Rough Heston Implied Volatility ANN learning curves. We can see that the validation loss curve experiences some fluctuations during training; again, we select the least fluctuating setup. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level, and the discrepancy between validation loss and training loss is negligible. Initially the training was set to 200 epochs, but it stopped early at around epoch 85 since the validation loss had not improved for 25 epochs. It nonetheless achieves good accuracy.


FIGURE 7.8: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN

Table 7.7 summarises the error metrics of the Rough Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 5.66 × 10⁻⁴, 1.08 × 10⁻², 3.49 × 10⁻¹ and 0.999, respectively. These metrics show the ANN model can produce accurate predictions on unseen data. This is again confirmed by Figure 7.9, which shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.7: Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
5.66 × 10⁻⁴   1.08 × 10⁻²   3.49 × 10⁻¹     0.999


FIGURE 7.9: Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.10: Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.10 shows the MAE values of the Rough Heston Implied Volatility ANN predictions across the entire implied volatility surface. The largest MAE value is below 0.0030, and a large area of the surface has MAE values lower than 0.0015. It is worth noting that the accuracy is relatively poor when the strike price and maturity values are small. However, the ANN gives good predictions overall for the whole surface. In fact, it is more accurate than the Heston Implied Volatility ANN model, despite having one extra input and a similar architecture; we think this is likely due to the stochastic nature of the optimiser.

The 5-fold cross validation results for the Rough Heston Implied Volatility ANN model show consistent values, just like the previous three cases. The results are summarised in Table 7.8. The consistency in values indicates the ANN model is well-trained and is able to generalise to unseen test data.

TABLE 7.8: 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN

          MSE           MAE           Max Abs Error
Fold 1    8.15 × 10⁻⁴   1.13 × 10⁻²   3.88 × 10⁻¹
Fold 2    4.74 × 10⁻⁴   1.08 × 10⁻²   3.13 × 10⁻¹
Fold 3    1.37 × 10⁻³   1.74 × 10⁻²   4.46 × 10⁻¹
Fold 4    3.38 × 10⁻⁴   8.61 × 10⁻³   2.19 × 10⁻¹
Fold 5    4.60 × 10⁻⁴   1.02 × 10⁻²   2.67 × 10⁻¹
Average   6.92 × 10⁻⁴   1.12 × 10⁻²   3.27 × 10⁻¹


7.5 Run-time Performance

In this section we present the training time and the execution speed of the ANN versus traditional methods. The speed tests are run on a machine with an Intel i5-7200U chip (up to 3.10 GHz) and 8 GB of RAM.

Table 7.9 summarises the training time required for each application. Although the training time per epoch for the option pricing applications is higher than that for the implied volatility applications, the resulting total training time is similar for both types of applications, because the number of epochs for training the implied volatility ANN models is higher.

We also emphasise that fewer distinct rows of data are available for implied volatility ANN training. Thus, the training time is similar to that of option pricing ANN training despite the more complex ANN architecture.

TABLE 7.9: Training Time

Application                  Training Time per Epoch   Total Training Time
Heston Option Pricing        30 s                      900 s
Heston Implied Volatility    5 s                       1000 s
rHeston Option Pricing       30 s                      900 s
rHeston Implied Volatility   5 s                       1000 s

TABLE 7.10: Execution Time for Predictions using ANN and Traditional Methods

Application                  Traditional Methods   ANN
Heston Option Pricing        88.4 ms               579 ms
Heston Implied Volatility    23.1 ms               437 ms
rHeston Option Pricing       65.2 ms               526 ms
rHeston Implied Volatility   28.7 ms               483 ms

Table 7.10 summarises the execution time for generating predictions using the ANN and traditional methods. Before the experiment, we expected the ANN to generate predictions faster. However, the results show that the ANN is actually up to 7 times slower than the traditional methods for option pricing (both Heston and rough Heston), and up to 18 times slower than the traditional methods for implied volatility surface approximation (both Heston and rough Heston). This shows the traditional approximation methods are more efficient.


Chapter 8

Conclusions and Outlook

To summarise, the results show that the ANN has the ability to approximate option prices and implied volatilities under the Heston and rough Heston models to a high degree of accuracy. However, the efficiency of the ANN is not as good as that of the traditional methods, so we conclude that such an approach may be considered an ineffective alternative. There are more efficient methods, such as the numerical integration of the FFT, for option price computation, and the traditional method we used for approximating implied volatility does not even require any numerical schemes, hence its efficiency.

As for future projects, it would be interesting to investigate the rough Bergomi model and applications with exotic options such as lookback and digital options. Moreover, it would also be worthwhile to apply gradient-free optimisers for ANN training; some interesting examples include differential evolution, dual annealing, and simplicial homology global optimisers.


Appendix A

pybind11: A step-by-step tutorial with examples

This tutorial assumes basic knowledge of C++ and Python.

Sources: Create a C++ extension for Python; Using the C++ eigen library to calculate matrix inverse and determinant; pybind11 Documentation; Eigen: The Matrix Class.

A.1 Software Prerequisites

• Visual Studio 2017 or later, with both the Desktop Development with C++ and Python Development workloads installed with default options.

• In the Python Development workload, also select the box on the right for Python native development tools. This option sets up most of the configuration required. (This option also includes the C++ workload automatically.)

Source Create a C++ extension for Python


A.2 Create a Python Project

1. To create a Python project in Visual Studio, select File > New > Project. Search for "Python", select the Python Application template, give it a suitable name and location, and select OK.

2. pybind11 requires that you use a 32-bit Python interpreter (Python 3.6 or above recommended). In the Solution Explorer window of Visual Studio, expand the project node, then expand the Python Environments node. If you don't see a 32-bit environment as the default (either in bold or labeled with "global default"), then right-click the Python Environments node and select Add Environment, or select Add Environment from the environment drop-down in the Python toolbar.

Once in the Add Environment dialog box, select the Existing environment tab, then select the 32-bit interpreter from the Environment drop-down list.

If you don't have a 32-bit interpreter installed, you can install standard Python interpreters from the Add Environment dialog. Select the Add Environment command in the Python Environments window or the Python toolbar, select the Python installation tab, indicate which interpreters to install, and select Install.


If you already added an environment other than the global default to a project, you may need to activate a newly added environment. Right-click that environment under the Python Environments node and select Activate Environment. To remove an environment from the project, select Remove.


A.3 Create a C++ Project

1. A Visual Studio solution can contain both Python and C++ projects together. To do so, right-click the solution in Solution Explorer and select Add > New Project.

2. Search on "C++", select Empty project, specify the name "test", and select OK.

3. Create a C++ file in the new project by right-clicking the Source Files node, then select Add > New Item, select C++ File, name it main.cpp, and select OK.

4. Right-click the C++ project in Solution Explorer and select Properties.

5. At the top of the Property Pages dialog that appears, set Configuration to All Configurations and Platform to Win32.

6. Set the specific properties as described in the following table, then select OK.

7. Right-click the C++ project and select Build to test your configurations (both Debug and Release). The .pyd files are located in the solution folder under Debug and Release, not the C++ project folder itself.

8. Add the following code to the C++ project's main.cpp file:

#include <iostream>

int main()
{
    std::cout << "Hello World from C++" << std::endl;
    return 0;
}

9. Build the C++ project again to confirm the code is working.

A.4 Convert the C++ Project to an Extension for Python

To make the C++ DLL into an extension for Python, first modify the exported methods to interact with Python types. Then add a function that exports the module (the main function).

1. Install pybind11 using pip:

pip install pybind11

or

py -m pip install pybind11

2. At the top of main.cpp, include pybind11.h:

#include <pybind11/pybind11.h>


3. At the bottom of main.cpp, use the PYBIND11_MODULE macro to define the entry point to the C++ function:

namespace py = pybind11;

PYBIND11_MODULE(test, m) {
    m.def("main_func", &main, R"pbdoc(
        pybind11 simple example
    )pbdoc");

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

4. Set the target configuration to Release and build the C++ project to verify your code.

The C++ module may fail to compile for the following reasons:

• Unable to locate Python.h (E1696: cannot open source file "Python.h" and/or C1083: Cannot open include file: "Python.h": No such file or directory): verify that the path in C/C++ > General > Additional Include Directories in the project properties points to your Python installation's include folder. See the table in Step 6.

• Unable to locate Python libraries: verify that the path in Linker > General > Additional Library Directories in the project properties points to the Python installation's libs folder. See the table in Step 6.

• Linker errors related to target architecture: change the C++ target's project architecture to match that of your Python installation. For example, if you're targeting x64 with the C++ project but your Python installation is x86, change the C++ project to target x86.

A.5 Make the DLL Available to Python

1. In Solution Explorer, right-click the References node in the Python project and then select Add Reference. In the dialog that appears, select the Projects tab, select the test project, and then select OK.


2. Run the Visual Studio installer, select Modify, then select Individual Components > Compilers, build tools, and runtimes > Visual C++ 2015.3 v140 toolset. This step is necessary because Python (for Windows) is itself built with Visual Studio 2015 (version 14.0) and expects those tools to be available when building an extension through the method described here. (Note that you may need to install a 32-bit version of Python and target the DLL to Win32, not x64.)


3. Create a file named setup.py in the C++ project by right-clicking the project and selecting Add > New Item. Then select C++ File (.cpp), name the file setup.py, and select OK (naming the file with the .py extension makes Visual Studio recognize it as Python despite using the C++ file template).

Then paste the following code into the setup.py file:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'pybind11_test',
    sources=['main.cpp'],
    # add pybind11 folder and Python include folder
    include_dirs=[r'pybind11\include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='pybind11_test',
    version='1.0',
    description='Simple pybind11 example',
    license='MIT',
    ext_modules=[sfc_module],
)

4. The setup.py code instructs Python to build the extension using the Visual Studio 2015 C++ toolset when used from the command line.

Open an elevated command prompt, navigate to the folder that contains setup.py, and enter the following command:

python setup.py install

A.6 Call the DLL from Python

After the DLL is made available to Python, we can now call test.main_func() from Python code.

1. Add the following lines in your .py file to call methods exported from the DLL:


import test

if __name__ == "__main__":
    test.main_func()

2. Run the Python program (Debug > Start without Debugging). The output should look like the following:

A.7 Example: Store C++ Eigen Matrix as NumPy Arrays

This example requires the use of Eigen, a C++ template library. Due to its popularity, pybind11 provides transparent conversion and limited mapping support between Eigen and scientific Python linear algebra data types.

1. If you have not already, visit http://eigen.tuxfamily.org/index.php?title=Main_Page and click on "zip" to download the zip file.

Once the download is complete, decompress the zip file in a suitable location. Note down the file path containing the folders as shown below.


2. In the Solution Explorer, right-click the C++ project and select Properties. Navigate to the C/C++ > General tab, add the Eigen path to the Additional Include Directories property, then select OK.

3. Right-click the C++ project and select Build to test your configurations (both Debug and Release).

4. Replace the code in the main.cpp file with the following code:

#include <iostream>

// pybind11
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>

// Eigen
#include <Eigen/Dense>

namespace py = pybind11;

Eigen::Matrix3f example_mat(const int a, const int b, const int c) {
    Eigen::Matrix3f mat;
    mat << 5 * a, 1 * b, 2 * c,
           2 * a, 4 * b, 2 * c,
           1 * a, 3 * b, 3 * c;
    return mat;
}

Eigen::MatrixXd inv(Eigen::MatrixXd xs) {
    return xs.inverse();
}

double det(Eigen::MatrixXd xs) {
    return xs.determinant();
}

PYBIND11_MODULE(test, m) {
    m.def("example_mat", &example_mat);
    m.def("inv", &inv);
    m.def("det", &det);

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

Note that Eigen arrays are automatically converted to NumPy arrays simply by including the <pybind11/eigen.h> header. A C++ function that has an Eigen return type can be used directly as a NumPy data type in Python. Build the C++ project to check the code is working correctly.

5. Include the Eigen library's directory. Replace the code in setup.py with the following code:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'pybind11_test',
    sources=['main.cpp'],
    # add pybind11 folder, Python include folder and Eigen folder
    include_dirs=[r'pybind11\include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                  r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='pybind11_test',
    version='1.0',
    description='Simple pybind11 example',
    license='MIT',
    ext_modules=[sfc_module],
)

As before, in an elevated command prompt, navigate to the folder that contains setup.py and enter the following command:

python setup.py install

6. Then paste the following code in the Python file:

import test
import numpy as np

if __name__ == "__main__":
    A = test.example_mat(1, 3, 1)
    print("Matrix A is:\n", A)
    print("The inverse of A is:\n", test.inv(A))
    print("The determinant of A is:", test.det(A))


The output should be

A.8 Example: Store Data Generated by Analytical Heston Model as NumPy Arrays

Only the relevant parts of the code are displayed here. To request the code for the entire project, contact the author at cko22@bath.edu.

This example demonstrates how to use pybind11 for:

• a C++ project with multiple files;
• storing a C++ Eigen matrix as NumPy arrays.

1. For a multiple-file C++ project, we need to include the source files in setup.py. This example also uses the Eigen library, so its directory will also be included. Paste the following code in setup.py:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'Data_Generation',
    sources=[r'module.cpp', r'Data.cpp', r'Heston.cpp', r'IntegrationScheme.cpp',
             r'IntegratorImp.cpp', r'NumIntegrator.cpp', r'pdetfunction.cpp',
             r'pfunction.cpp', r'Range.cpp'],
    include_dirs=[r'pybind11\include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                  r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='Data_Generation',
    version='1.0',
    description='Python package with Data_Generation C++ extension (PyBind11)',
    license='MIT',
    ext_modules=[sfc_module],
)

Then go to the folder that contains setup.py in the command prompt and enter the following command:

python setup.py install

2. The code in module.cpp is:

// For the analytic solution in the case of the Heston model
#include "Heston.hpp"
#include <ctime>
#include "Data.hpp"
#include <future>

// Inputting and outputting
#include <iostream>
#include <vector>

// pybind11
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>
#include <pybind11/stl_bind.h>

// Eigen
#include <Eigen/Dense>

namespace py = pybind11;
using namespace Eigen;

struct CPPData {
    // Input Data
    std::vector<double> spot_price;
    std::vector<double> strike_price;
    std::vector<double> risk_free_rate;
    std::vector<double> dividend_yield;
    std::vector<double> initial_vol;
    std::vector<double> maturity_time;
    std::vector<double> long_term_vol;
    std::vector<double> mean_reversion_rate;
    std::vector<double> vol_vol;
    std::vector<double> price_vol_corr;

    // Output Data
    std::vector<double> option_price;
};

// ... do something ...

Eigen::MatrixXd module() {
    CPPData cppdata;
    simulation(cppdata);

    // Convert each std::vector to an Eigen vector first
    // Input Data
    Map<Eigen::VectorXd> eigen_spot_price(cppdata.spot_price.data(), cppdata.spot_price.size());
    Map<Eigen::VectorXd> eigen_strike_price(cppdata.strike_price.data(), cppdata.strike_price.size());
    Map<Eigen::VectorXd> eigen_risk_free_rate(cppdata.risk_free_rate.data(), cppdata.risk_free_rate.size());
    Map<Eigen::VectorXd> eigen_dividend_yield(cppdata.dividend_yield.data(), cppdata.dividend_yield.size());
    Map<Eigen::VectorXd> eigen_initial_vol(cppdata.initial_vol.data(), cppdata.initial_vol.size());
    Map<Eigen::VectorXd> eigen_maturity_time(cppdata.maturity_time.data(), cppdata.maturity_time.size());
    Map<Eigen::VectorXd> eigen_long_term_vol(cppdata.long_term_vol.data(), cppdata.long_term_vol.size());
    Map<Eigen::VectorXd> eigen_mean_reversion_rate(cppdata.mean_reversion_rate.data(), cppdata.mean_reversion_rate.size());
    Map<Eigen::VectorXd> eigen_vol_vol(cppdata.vol_vol.data(), cppdata.vol_vol.size());
    Map<Eigen::VectorXd> eigen_price_vol_corr(cppdata.price_vol_corr.data(), cppdata.price_vol_corr.size());
    // Output Data
    Map<Eigen::VectorXd> eigen_option_price(cppdata.option_price.data(), cppdata.option_price.size());

    const int cols{ 11 };        // No. of columns
    const int rows = SIZE / 4;
    // Define a matrix that stores training data to be used in Python
    MatrixXd training(rows, cols);
    // Initialise each column using the vectors
    training.col(0) = eigen_spot_price;
    training.col(1) = eigen_strike_price;
    training.col(2) = eigen_risk_free_rate;
    training.col(3) = eigen_dividend_yield;
    training.col(4) = eigen_initial_vol;
    training.col(5) = eigen_maturity_time;
    training.col(6) = eigen_long_term_vol;
    training.col(7) = eigen_mean_reversion_rate;
    training.col(8) = eigen_vol_vol;
    training.col(9) = eigen_price_vol_corr;
    training.col(10) = eigen_option_price;

    std::cout << training << std::endl;

    return training;
}

PYBIND11_MODULE(Data_Generation, m) {
    m.def("module", &module);

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

Note that the module function's return type is an Eigen matrix. This can then be used directly as a NumPy data type in Python.

3. The C++ function can then be called from Python as shown:

import Data_Generation as dg
import mysql.connector
import pandas as pd
import sqlalchemy as db
import time

# This program calls the analytical Heston option pricer in C++ to simulate
# 5,000,000 option prices and store them in a MySQL database

# ... do something ...

if __name__ == "__main__":
    t0 = time.time()

    # A new table (optional)
    SQL_clean_table()

    target_size = 5000000   # Final table size
    batch_size = 500000     # Size generated in each iteration

    # Get the number of rows that already exist in the table
    r = SQL_setup(batch_size)

    # Get the number of iterations
    iter = int(target_size / batch_size) - int(r / batch_size)
    print("Iteration(s) required: ", iter)
    count = 0

    print("Now Run C++ code ...")
    for i in range(iter):
        count = count + 1
        print("----------------------------------------------------")
        print("Current iteration: ", count)
        cpp = dg.module()   # store data from C++

        # ... do something ...

        r1 = SQL_setup(batch_size)
        print("No. of rows in SQL Classic Heston Table Now: ", r1)
        print("\n")
    t1 = time.time()
    # Calculate the total time for the entire program
    total = t1 - t0
    print("THE ENTIRE PROGRAM TAKES", round(total), "s TO COMPLETE")


Appendix B

Asymptotic Expansions for General Stochastic Volatility Models

B.1 Asymptotic Expansions

We present the asymptotic expansions for general stochastic volatility models by (Lorig et al., 2019). Consider a strictly positive spot price S whose risk-neutral dynamics are given by

\[ S = e^X \tag{B.1} \]
\[ dX = -\tfrac{1}{2}\sigma^2(t, X, Y)\,dt + \sigma(t, X, Y)\,dW_1, \qquad X(0) = x \in \mathbb{R} \tag{B.2} \]
\[ dY = f(t, X, Y)\,dt + \beta(t, X, Y)\,dW_2, \qquad Y(0) = y \in \mathbb{R} \tag{B.3} \]
\[ \langle dW_1, dW_2 \rangle = \rho(t, X, Y)\,dt, \qquad |\rho| < 1 \tag{B.4} \]

The drift of X is selected to be \(-\tfrac{1}{2}\sigma^2\) so that \(S = e^X\) is a martingale.

Let C(t) be the time-t value of a European call option maturing at time T > t with payoff function P(X(T)). To value a European-style option using risk-neutral pricing, we must compute functions of the form

\[ u(t, x, y) = \mathbb{E}\left[ P(X(T)) \mid X(t) = x,\, Y(t) = y \right] \]

It is well known that the function u satisfies the Kolmogorov backward equation (Lorig et al., 2019)

\[ (\partial_t + \mathcal{A}(t))\, u(t, x, y) = 0, \qquad u(T, x, y) = P(x) \tag{B.5} \]

where the operator \(\mathcal{A}(t)\) is given explicitly by

\[ \mathcal{A}(t) = a(t, x, y)(\partial_x^2 - \partial_x) + f(t, x, y)\,\partial_y + b(t, x, y)\,\partial_y^2 + c(t, x, y)\,\partial_x \partial_y \tag{B.6} \]

y + c(t x y)partxparty (B6)

and where the functions a b and c are defined as

a(t x y) =12

σ2(t x y)

b(t x y) =12

β2(t x y)

c(t x y) = ρ(t x y)σ(t x y)β(t x y)


Appendix C

Deep Neural Networks Library in Python: Keras

We show the code snippet for computing an ANN in Keras: the Python code for defining the ANN architecture for rough Heston implied volatility approximation.

# Define the neural network architecture
import keras
from keras import backend as K
keras.backend.set_floatx('float64')
from keras.callbacks import EarlyStopping

# Construct the model
input_layer = keras.layers.Input(shape=(n,))

hidden_layer1 = keras.layers.Dense(80, activation='elu')(input_layer)
hidden_layer2 = keras.layers.Dense(80, activation='elu')(hidden_layer1)
hidden_layer3 = keras.layers.Dense(80, activation='elu')(hidden_layer2)

output_layer = keras.layers.Dense(100, activation='linear')(hidden_layer3)

model = keras.models.Model(inputs=input_layer, outputs=output_layer)

# Summarise layers
model.summary()

# User-defined metrics: R2 and Max Abs Error
# R2
def coeff_determination(y_true, y_pred):
    SS_res = K.sum(K.square(y_true - y_pred))
    SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return 1 - SS_res / (SS_tot + K.epsilon())

# Max Error
def max_error(y_true, y_pred):
    return K.max(K.abs(y_true - y_pred))

earlystop = EarlyStopping(monitor='val_loss',
                          min_delta=0,
                          mode='min',
                          verbose=1,
                          patience=25)

# Prepare the model for training
opt = keras.optimizers.Adam(learning_rate=0.005)
model.compile(loss='mse', optimizer=opt,  # use the Adam optimiser defined above
              metrics=['mae', coeff_determination, max_error])


Bibliography

Alos, Elisa et al. (June 2017). "Exponentiation of Conditional Expectations Under Stochastic Volatility". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2983180.

Andersen, Leif et al. (Jan 2007). "Moment Explosions in Stochastic Volatility Models". In: Finance and Stochastics 11, pp. 29-50. DOI: 10.1007/s00780-006-0011-7.

Atkinson, Colin et al. (Jan 2011). "Rational Solutions for the Time-Fractional Diffusion Equation". URL: https://www.researchgate.net/publication/220222966_Rational_Solutions_for_the_Time-Fractional_Diffusion_Equation.

Bayer, Christian et al. (Oct 2018). "Deep calibration of rough stochastic volatility models". URL: https://arxiv.org/abs/1810.03399.

Black, Fischer et al. (May 1973). "The Pricing of Options and Corporate Liabilities". URL: https://www.cs.princeton.edu/courses/archive/fall09/cos323/papers/black_scholes73.pdf.

Carr, Peter and Dilip B. Madan (Mar 1999). "Option valuation using the fast Fourier transform". In: Journal of Computational Finance. URL: https://www.researchgate.net/publication/2519144_Option_Valuation_Using_the_Fast_Fourier_Transform.

Chapter 2: Modeling Process: k-fold cross validation. https://bradleyboehmke.github.io/HOML/process.html. Accessed 2020-08-22.

Comte, Fabienne et al. (Oct 1998). "Long Memory In Continuous-Time Stochastic Volatility Models". In: Mathematical Finance 8, pp. 291-323. URL: https://zulfahmed.files.wordpress.com/2015/10/comterenault19981.pdf.

Cox, John C. et al. (Jan 1985). "A Theory of the Term Structure of Interest Rates". URL: http://pages.stern.nyu.edu/~dbackus/BCZ/discrete_time/CIR_Econometrica_85.pdf.

Create a C++ extension for Python. https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019. Accessed 2020-07-21.

Duffy, Daniel J. (Jan 2018). Financial Instrument Pricing Using C++, Second Edition. John Wiley & Sons. ISBN: 978-0-470-97119-2.

Duffy, Daniel J. et al. (2012). Monte Carlo Frameworks: Building customisable high-performance C++ applications. John Wiley & Sons, Inc. ISBN: 9780470060698.

Dupire, Bruno (1994). "Pricing with a Smile". In: Risk Magazine, pp. 18-20.

Eigen: The Matrix Class. https://eigen.tuxfamily.org/dox/group__TutorialMatrixClass.html. Accessed 2020-07-22.

Eldan, Ronen et al. (May 2016). "The Power of Depth for Feedforward Neural Networks". URL: https://arxiv.org/abs/1512.03965.

Euch, Omar El et al. (Sept 2016). "The characteristic function of rough Heston models". URL: https://arxiv.org/pdf/1609.02108.pdf.

— (Mar 2019). "Roughening Heston". URL: https://ssrn.com/abstract=3116887.

Fouque, Jean-Pierre et al. (Aug 2012). "Second Order Multiscale Stochastic Volatility Asymptotics: Stochastic Terminal Layer Analysis & Calibration". URL: https://arxiv.org/abs/1208.5802.

Gatheral, Jim (2006). The volatility surface: a practitioner's guide. John Wiley & Sons, Inc. ISBN: 978-0-471-79251-2.

Gatheral, Jim et al. (Oct 2014). "Volatility is rough". URL: https://arxiv.org/abs/1410.3394.

— (Jan 2019). "Rational approximation of the rough Heston solution". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3191578.

Hebb, Donald O. (1949). The Organization of Behavior: A Neuropsychological Theory. Psychology Press, 1st edition (12 Jun. 2002). ISBN: 978-0805843002.

Heston, Steven (Jan 1993). "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options". URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.3204&rep=rep1&type=pdf.

Hinton, Geoffrey et al. (Feb 2015). "Overview of mini-batch gradient descent". URL: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf.

Hornik et al. (Mar 1989). "Multilayer feedforward networks are universal approximators". URL: https://deeplearning.cs.cmu.edu/F20/document/readings/Hornik_Stinchcombe_White.pdf.

— (Jan 1990). "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks". URL: http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/univ_approx.pdf.

Horvath, Blanka et al. (Jan 2019). "Deep Learning Volatility". URL: https://ssrn.com/abstract=3322085.

Hurst, Harold E. (Jan 1951). "Long-term storage of reservoirs: an experimental study". URL: https://www.jstor.org/stable/2982267#metadata_info_tab_contents.

Hutchinson, James M. et al. (July 1994). "A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks". URL: https://alo.mit.edu/wp-content/uploads/2015/06/A-Nonparametric-Approach-to-Pricing-and-Hedging-Derivative-Securities-via-Learning-Networks.pdf.

Ioffe, Sergey et al. (Feb 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". URL: https://arxiv.org/abs/1502.03167.

Itkin, Andrey (Jan 2010). "Pricing options with VG model using FFT". URL: https://arxiv.org/abs/physics/0503137.

Kahl, Christian et al. (Jan 2006). "Not-so-complex Logarithms in the Heston model". URL: http://www2.math.uni-wuppertal.de/~kahl/publications/NotSoComplexLogarithmsInTheHestonModel.pdf.

Kingma, Diederik P. et al. (Feb 2015). "Adam: A Method for Stochastic Optimization". URL: https://arxiv.org/abs/1412.6980.

Kwok, YueKuen et al. (2012). Handbook of Computational Finance, Chapter 21. Springer-Verlag Berlin Heidelberg. ISBN: 978-3-642-17254-0.

Lewis, Alan (Sept 2001). "A Simple Option Formula for General Jump-Diffusion and Other Exponential Levy Processes". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=282110.

Liu, Shuaiqiang et al. (Apr 2019a). "A neural network-based framework for financial model calibration". URL: https://arxiv.org/abs/1904.10523.

— (Apr 2019b). "Pricing options and computing implied volatilities using neural networks". URL: https://arxiv.org/abs/1901.08943.

Lorig, Matthew et al. (Jan 2019). "Explicit implied vols for multifactor local-stochastic vol models". URL: https://arxiv.org/abs/1306.5447v3.

Mandara, Dalvir (Sept 2019). "Artificial Neural Networks for Black-Scholes Option Pricing and Prediction of Implied Volatility for the SABR Stochastic Volatility Model". URL: https://www.datasim.nl/application/files/8115/7045/4929/1423101.pdf.

McCulloch, Warren et al. (May 1943). "A Logical Calculus of The Ideas Immanent in Nervous Activity". URL: http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf.

Mostafa, F. et al. (May 2008). "A neural network approach to option pricing". URL: https://www.witpress.com/elibrary/wit-transactions-on-information-and-communication-technologies/41/18904.

Peters, Edgar E. (Jan 1991). Chaos and Order in the Capital Markets: A New View of Cycles, Prices and Market Volatility. John Wiley & Sons. ISBN: 978-0-471-13938-6.

— (Jan 1994). Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons. ISBN: 978-0-471-58524-4. URL: https://www.amazon.co.uk/Fractal-Market-Analysis-Investment-Economics/dp/0471585246.

pybind11 Documentation. https://pybind11.readthedocs.io/en/stable/advanced/cast/eigen.html. Accessed 2020-07-22.

Ruf, Johannes et al. (May 2020). "Neural networks for option pricing and hedging: a literature review". URL: https://arxiv.org/abs/1911.05620.

Schmelzle, Martin (Apr 2010). "Option Pricing Formulae using Fourier Transform: Theory and Application". URL: https://pfadintegral.com/docs/Schmelzle2010%20Fourier%20Pricing.pdf.

Shevchenko, Georgiy (Jan 2015). "Fractional Brownian motion in a nutshell". In: International Journal of Modern Physics: Conference Series 36. URL: https://www.worldscientific.com/doi/abs/10.1142/S2010194515600022.

Stone, Henry (July 2019). "Calibrating Rough Volatility Models: A Convolutional Neural Network Approach". URL: https://arxiv.org/abs/1812.05315.

Using the C++ eigen library to calculate matrix inverse and determinant. http://people.duke.edu/~ccc14/sta-663-2016/18G_C++_Python_pybind11.html#Using-the-C++-eigen-library-to-calculate-matrix-inverse-and-determinant. Accessed 2020-07-22.

Wilmott, Paul (2006). Paul Wilmott on Quantitative Finance, Vol. 3. John Wiley & Sons, Inc., 2nd Edition. ISBN: 978-0-470-01870-5.


Acknowledgements

I would like to take a moment to thank those who helped me throughout the writing of this dissertation.

First, I wish to thank my supervisor Dr. Daniel Duffy for his invaluable support and expert guidance. I would also like to thank Dr. Colin Rowat for designing such a challenging and rigorous course; I truly enjoyed it.

I would also like to take this opportunity to acknowledge the support and great love of my family, my girlfriend and my friends. They kept me going, and this work would not have been possible without them.

Contents

Declaration of Authorship
Abstract
Acknowledgements

0 Project's Overview
   0.1 Project's Overview
1 Introduction
   1.1 Organisation of This Thesis
2 Literature Review
   2.1 Application of ANN in Finance
   2.2 Option Pricing Models
   2.3 The Research Gap
3 Option Pricing Models
   3.1 The Heston Model
       3.1.1 Option Pricing Under Heston Model
       3.1.2 Heston Model Implied Volatility Approximation
   3.2 Volatility is Rough
   3.3 The Rough Heston Model
       3.3.1 Rough Heston Pricing Equation
             3.3.1.1 Rational Approximation of Rough Heston Riccati Solution
             3.3.1.2 Option Pricing Using Fast Fourier Transform
       3.3.2 Rough Heston Model Implied Volatility
4 Artificial Neural Networks
   4.1 Dissecting the Artificial Neural Networks
       4.1.1 Neurons
       4.1.2 Input, Output and Hidden Layers
       4.1.3 Connections, Weights and Biases
       4.1.4 Forward and Backward Propagation, Loss Functions
       4.1.5 Activation Functions
       4.1.6 Optimisation Algorithms and Learning Rate
       4.1.7 Epochs, Early Stopping and Batch Size
   4.2 Feedforward Neural Networks
       4.2.1 Deep Neural Networks
   4.3 Common Pitfalls and Remedies
       4.3.1 Loss Function Stagnation
       4.3.2 Loss Function Fluctuation
       4.3.3 Over-fitting
       4.3.4 Under-fitting
   4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation
5 Data
   5.1 Data Generation and Storage
       5.1.1 Heston Call Option Data
       5.1.2 Heston Implied Volatility Data
       5.1.3 Rough Heston Call Option Data
       5.1.4 Rough Heston Implied Volatility Data
   5.2 Data Pre-processing
       5.2.1 Data Splitting: Train - Validate - Test
       5.2.2 Data Standardisation and Normalisation
       5.2.3 Data Partitioning
6 Neural Networks Architecture and Experimental Setup
   6.1 Neural Networks Architecture
       6.1.1 ANN Architecture: Option Pricing under Heston Model
       6.1.2 ANN Architecture: Implied Volatility Surface of Heston Model
       6.1.3 ANN Architecture: Option Pricing under Rough Heston Model
       6.1.4 ANN Architecture: Implied Volatility Surface of Rough Heston Model
   6.2 Performance Metrics and Validation Methods
   6.3 Implementation of ANN in Keras
   6.4 Summary of Neural Networks Architecture
7 Results and Discussions
   7.1 Heston Option Pricing ANN Results
   7.2 Heston Implied Volatility ANN Results
   7.3 Rough Heston Option Pricing ANN Results
   7.4 Rough Heston Implied Volatility ANN Results
   7.5 Run-time Performance
8 Conclusions and Outlook

A pybind11: A step-by-step tutorial with examples
   A.1 Software Prerequisites
   A.2 Create a Python Project
   A.3 Create a C++ Project
   A.4 Convert the C++ project to extension for Python
   A.5 Make the DLL available to Python
   A.6 Call the DLL from Python
   A.7 Example: Store C++ Eigen Matrix as NumPy Arrays
   A.8 Example: Store Data Generated by Analytical Heston Model as NumPy Arrays

B Asymptotic Expansions for General Stochastic Volatility Models
   B.1 Asymptotic Expansions

C Deep Neural Networks Library in Python: Keras

Bibliography

List of Figures

1    Implementation Flow Chart
2    Data flow diagram for ANN on Rough Heston Pricer
1.1  Problem Solving Process
3.1  Paths of fBm for different values of H. Adapted from (Shevchenko, 2015)
3.2  Implied Volatility Smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019)
4.1  A Simple Neural Network
4.2  The pixels of the implied volatility surface are represented by the outputs of the neural network
5.1  Data partitioning for 5-fold cross validation. Adapted from (Chapter 2 Modeling Process: k-fold cross validation)
6.1  Typical ANN Architecture for Option Pricing Application
6.2  Typical ANN Architecture for Implied Volatility Surface Approximation
7.1  Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN
7.2  Plot of Predicted vs Actual values for Heston Option Pricing ANN
7.3  Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN
7.4  Heston Implied Volatility Smiles: ANN Predictions vs Actual Values
7.5  Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data
7.6  Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN
7.7  Plot of Predicted vs Actual values for Rough Heston Pricing ANN
7.8  Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN
7.9  Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values
7.10 Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data

List of Tables

5.1  Heston Model Call Option Data
5.2  Heston Model Implied Volatility Data
5.3  Rough Heston Model Call Option Data
5.4  Rough Heston Model Implied Volatility Data
6.1  Summary of Neural Networks Architecture for Each Application
7.1  Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data
7.2  5-fold Cross Validation Results of Heston Option Pricing ANN
7.3  Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data
7.4  5-fold Cross Validation Results of Heston Implied Volatility ANN
7.5  Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data
7.6  5-fold Cross Validation Results of Rough Heston Option Pricing ANN
7.7  Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data
7.8  5-fold Cross Validation Results of Rough Heston Implied Volatility ANN
7.9  Training Time
7.10 Execution Time for Predictions using ANN and Traditional Methods

List of Abbreviations

NN    Neural Networks
ANN   Artificial Neural Networks
DNN   Deep Neural Networks
CNN   Convolutional Neural Networks
SGD   Stochastic Gradient Descent
SV    Stochastic Volatility
CIR   Cox, Ingersoll and Ross
fBm   fractional Brownian motion
FSV   Fractional Stochastic Volatility
RFSV  Rough Fractional Stochastic Volatility
FFT   Fast Fourier Transform
ELU   Exponential Linear Unit
ReLU  Rectified Linear Unit

Chapter 0

Project's Overview

This chapter intends to give a brief overview of the methodology of the project without delving too much into the details.

0.1 Project's Overview

It was advised by the supervisor to divide the problem into smaller, achievable steps, as (Mandara, 2019) did. The 4 main steps are:

• Define the objective
• Formulate the procedures
• Implement the procedures
• Evaluate the results

The objective is to investigate the performance of artificial neural networks (ANN) on vanilla option pricing and implied volatility approximation under the rough Heston model. The performance metrics in this context are accuracy and run-time performance.

Figure 1 shows the flow chart of the implementation procedures. The procedure starts with synthetic data generation. Then we split the synthetic data set into 3 portions: one for training (in-sample), one for validation and the last one for testing (out-of-sample). Before the data is fed into the ANN for training, it needs to be scaled or normalised, as it has different magnitudes and ranges. In this thesis we use the training method proposed by (Horvath et al., 2019), the image-based implicit training approach, for implied volatility surface approximation. Then we test the trained ANN model with an unseen data set to evaluate its accuracy.

The run-time performance of the ANN will be compared with that of the traditional approximation methods. This project requires the use of two programming languages: C++ and Python. The data generation process is done in C++. The wide range of machine learning libraries (Keras, TensorFlow, PyTorch) in Python makes it the logical choice for the neural network computations. The experiment starts with the classical Heston model as a concept validation first, and then proceeds to the rough Heston model.

FIGURE 1: Implementation Flow Chart

FIGURE 2: Data flow diagram for ANN on Rough Heston Pricer


Chapter 1

Introduction

Option pricing has always been a major research area in financial mathematics. Various methods and techniques have been developed to price options more accurately and more quickly since the introduction of the well-known Black-Scholes model in 1973 (Black et al., 1973). The Black-Scholes model has an analytical solution and has only one parameter, volatility, that requires calibration. This makes it very easy to implement. Its simplicity, however, comes at the expense of many unrealistic assumptions. One of its biggest flaws is the assumption of a constant volatility term. This assumption simply does not hold in the real market. With it, the Black-Scholes model fails to capture the volatility dynamics exhibited by the market and so is unable to value options accurately.

In 1993, (Heston, 1993) developed a stochastic volatility model in an attempt to better capture the volatility dynamics. The Heston model assumes the volatility of an underlying asset to follow a geometric Brownian motion (Hurst parameter H = 1/2). Like Black-Scholes, the Heston model has a closed-form analytical solution, and this makes it a strong alternative to the Black-Scholes model.

Whilst the Heston model is more realistic than the Black-Scholes model, the generated option prices do not match the observed vanilla option prices consistently (Gatheral et al., 2014). Recent research shows that more accurate option prices can be estimated if the volatility process is modelled as a fractional Brownian motion with H < 1/2. When H < 1/2, the volatility process path appears to be rougher than when H ≥ 1/2, and hence the volatility is described as rough.

(Euch et al., 2019) introduce the rough volatility version of the Heston model, which combines the best of both worlds. However, the calibration of the rough Heston model relies on the notoriously slow Monte Carlo method, due to its non-Markovian nature. This obstacle is the reason that rough Heston struggles to find its way into industrial applications. Although there exist many approximation methods for the rough Heston model that help to solve this problem, we are interested in the feasibility of machine learning algorithms.

Recent advancements in machine learning algorithms could potentially provide an alternative means for this problem. Specifically, artificial neural networks (ANN) offer the best hope, because they are capable of learning complex functional relationships between input and output data and then making predictions instantly.

In this project, our objective is to investigate the accuracy and run-time performance of ANN for European call option price predictions and implied volatility approximations under the rough Heston model. We start off by using the Heston model as a concept validation, then proceed to its rough volatility version. For regulatory purposes, we use data simulated by the option pricing models for training.

1.1 Organisation of This Thesis

This thesis is organised as follows. In Chapter 2, we review some of the related work and provide justifications for our research. In Chapter 3, we discuss the theory of the option pricing models of our interest. We provide a brief introduction to ANN and some techniques to improve their performance in Chapter 4. Then we explain the tools we used for the data generation process and some useful techniques to speed up the process in Chapter 5; in the same chapter, we also discuss the important data pre-processing procedures. Chapter 6 shows our network architecture and experimental setups. We present the results and discuss our findings in Chapter 7. Finally, we conclude our findings and discuss potential future work in Chapter 8.

FIGURE 1.1: Problem Solving Process

Chapter 2

Literature Review

In this chapter we intend to provide a review of the current literature and highlight the research gap. The focus of our review is the application of ANN to option pricing models. We justify our methodology and our choice of option pricing models, and we also provide some critiques.

2.1 Application of ANN in Finance

Option pricing has always been a hot topic in academia and the financial industry. Researchers have been looking for various means to price options more accurately and quickly. As computational technology advances, the application of ANN has gained popularity in solving complex problems in many different industries. In quantitative finance, the idea of utilising ANN for pricing derivatives comes from its ability to make fast and accurate predictions once it is trained.

The application of ANN in option pricing began with (Hutchinson et al., 1994). The authors were the pioneers in applying ANN for pricing and hedging derivatives. Since then, there have been more than 100 research papers on this topic. It is also worth noting that the complexity of the ANN architectures used in these papers has increased over the years. Whilst there has been much research on the applications of ANN in option pricing, only about 10% of the papers focus on implied volatility (Ruf et al., 2020). The feasibility of using ANN for implied volatility approximation is equally as important as for option pricing, since they are both part of the calibration process. Recently, more research papers focus on implied volatility and calibration, such as (Mostafa et al., 2008), (Liu et al., 2019a), (Liu et al., 2019b) and (Horvath et al., 2019). A key contribution of (Horvath et al., 2019)'s work is the introduction of the image-based implicit training method for implied volatility surface approximation. This method is allegedly robust and fast for approximating the entire implied volatility surface.

A significant portion of the research papers use real market data. Currently, there is no standardised framework for deciding the architecture and parameters of an ANN; hence, training the ANN directly with market data might pose some issues when it comes to regulatory compliance. (Horvath et al., 2019) is one of the few papers that uses simulated data for ANN training. The simulated data is generated from the Heston and rough Bergomi models. Using simulated data removes the regulatory concerns, as the functional mapping from model parameters to option price is known and tractable.

The application of ANN requires splitting the entire data set for training, validation and testing. Most of the papers that use real market data ensure the data splitting process is chronological, to preserve the time series structure of the data. This is not a concern in our case, as we are using simulated data.

The results in (Horvath et al., 2019) show very accurate predictions. However, after examining the source code, we discovered that the ANN is validated and tested on the same data set. Thus, the high accuracy results do not give any information about how well the trained ANN generalises to unseen data. In our experiment, we will use different data sets for validation and testing.

Common error metrics used include mean absolute error (MAE), mean absolute percentage error (MAPE) and mean squared error (MSE). We suggest a new metric, maximum absolute error, as it is useful from a risk management perspective for quantifying the largest prediction error in the worst case scenario.

2.2 Option Pricing Models

In terms of the choice of option pricing models, over 80% of the papers select the Black-Scholes model or its extensions in their research (Ruf et al., 2020). As mentioned before, there are other models that can give better option prices than Black-Scholes, such as the stochastic volatility models. (Liu et al., 2019b) investigated the application of ANN in the Heston and Bates models.

According to (Gatheral et al., 2014), modelling the volatility as rough volatility can capture the volatility dynamics in the market better. Rough volatility models struggle to find their way into the industry due to their slow calibration process. Hence, it is natural for the direction of ANN research to move towards rough volatility models.

Some of the recent examples include (Stone, 2019), (Bayer et al., 2018) and (Horvath et al., 2019). (Stone, 2019) investigates the calibration of the rough Bergomi model using Convolutional Neural Networks (CNN), whilst (Bayer et al., 2018) study the calibration of the Heston and rough Bergomi models using Deep Neural Networks (DNN). Contrary to these papers, where direct calibration is performed via ANN in one step, (Horvath et al., 2019) split the calibration of the rough Bergomi model into two steps. Firstly, the ANN is trained to learn the pricing equation during a so-called offline session. Then the trained ANN is deterministic and is stored for the online calibration session later. This approach can speed up the online calibration by a huge margin.

2.3 The Research Gap

In our thesis we will be using the approach in (Horvath et al., 2019), that is, to train the network to learn the pricing and implied volatility functional mappings separately. We leave the model calibration step as the future outlook of this project. We choose to work with the rough Heston model, another popular rough volatility model that has not been investigated yet. We adapt the image-based implicit training method in our research. To re-iterate, we validate and test our ANN with different data sets to obtain a true measure of the ANN's performance on unseen data.

Chapter 3

Option Pricing Models

In this chapter we discuss the option pricing models of our interest: the Heston and rough Heston models. The idea of modelling volatility as a rough fractional Brownian motion will also be presented. Then we derive their pricing and implied volatility equations and also explain their implementations for data generation.

The Black-Scholes model is simple but unrealistic for real world applications, as its assumptions simply do not hold. One of its greatest criticisms is the assumption of constant volatility of the underlying asset price. (Dupire, 1994) extended the Black-Scholes framework to give local volatility models. He modelled the volatility as a deterministic function of the underlying price and time, with the function selected such that its outputs match observed European option prices. This extension is time-inhomogeneous and the dynamics it exhibits is still unrealistic; thus, it is not able to produce future volatility surfaces that emulate our observations.

Stochastic Volatility (SV) models were introduced, in which the volatility of the underlying is modelled as a geometric Brownian motion. Recent research shows that rough volatility can capture the true volatility dynamics better, and so this leads to the rough volatility models.

3.1 The Heston Model

One of the most popular SV models is the one proposed by (Heston, 1993); its popularity is due to the fact that its pricing equation for European options is analytically tractable. This property allows calibration procedures to be done efficiently.

The Heston model for a one-dimensional spot price S follows a stochastic process:

dS = \mu S\,dt + \sqrt{\nu}\,S\,dW_1 \quad (3.1)

And its instantaneous variance ν follows a Cox, Ingersoll and Ross (CIR) process (Cox et al., 1985):

d\nu = \kappa(\theta - \nu)\,dt + \xi\sqrt{\nu}\,dW_2 \quad (3.2)
\langle dW_1, dW_2 \rangle = \rho\,dt \quad (3.3)

where

• μ is the (deterministic) rate of return of the spot
• θ is the long term mean volatility
• ξ is the volatility of volatility
• κ is the mean reversion rate of volatility
• ρ is the correlation between spot and volatility moves
• W₁ and W₂ are Wiener processes

A minimal simulation sketch of these dynamics is given below.
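To make equations (3.1)-(3.3) concrete, here is a minimal Euler-Maruyama simulation sketch in Python with NumPy. The full-truncation treatment of negative variance is our illustrative choice, not a scheme prescribed by this thesis:

import numpy as np

def simulate_heston(S0, v0, mu, kappa, theta, xi, rho, T, n_steps, n_paths, seed=0):
    """Euler-Maruyama simulation of the Heston dynamics (3.1)-(3.3)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.full(n_paths, S0, dtype=float)
    v = np.full(n_paths, v0, dtype=float)
    for _ in range(n_steps):
        z1 = rng.standard_normal(n_paths)
        # correlate the two Brownian increments with correlation rho
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        v_pos = np.maximum(v, 0.0)  # full truncation: keep variance non-negative
        S += mu * S * dt + np.sqrt(v_pos * dt) * S * z1
        v += kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos * dt) * z2
    return S, v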

3.1.1 Option Pricing Under Heston Model

In this section we follow (Gatheral, 2006) closely. We start by forming a portfolio Π containing the option being priced V, a quantity −Δ of stock and a quantity −Δ₁ of another asset whose value is V₁:

\Pi = V - \Delta S - \Delta_1 V_1

The change in the portfolio during dt is

d\Pi = \left[ \frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V}{\partial\nu^2} \right] dt
- \Delta_1 \left[ \frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V_1}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V_1}{\partial\nu^2} \right] dt
+ \left[ \frac{\partial V}{\partial S} - \Delta_1 \frac{\partial V_1}{\partial S} - \Delta \right] dS
+ \left[ \frac{\partial V}{\partial\nu} - \Delta_1 \frac{\partial V_1}{\partial\nu} \right] d\nu

Note the explicit dependence of S and ν on t has been removed for simplicity. We can make the portfolio instantaneously risk-free by eliminating the dS and dν terms:

\frac{\partial V}{\partial S} - \Delta_1 \frac{\partial V_1}{\partial S} - \Delta = 0

and

\frac{\partial V}{\partial\nu} - \Delta_1 \frac{\partial V_1}{\partial\nu} = 0

So we are left with

d\Pi = \left[ \frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V}{\partial\nu^2} \right] dt
- \Delta_1 \left[ \frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V_1}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V_1}{\partial\nu^2} \right] dt
= r\Pi\,dt = r(V - \Delta S - \Delta_1 V_1)\,dt

Here we have applied the no-arbitrage principle: the return of a risk-free portfolio must equal the risk-free rate. We assume the risk-free rate to be deterministic for our case.

Rearranging the above equation by collecting V terms on the left-hand side and V₁ terms on the right-hand side:

\frac{\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V}{\partial\nu^2} + rS\frac{\partial V}{\partial S} - rV}{\frac{\partial V}{\partial\nu}}
=
\frac{\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V_1}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V_1}{\partial\nu^2} + rS\frac{\partial V_1}{\partial S} - rV_1}{\frac{\partial V_1}{\partial\nu}}

Since the left-hand side depends only on V and the right-hand side only on V₁, each side must equal some function f(S, ν, t). Thus we have

\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V}{\partial\nu^2} + rS\frac{\partial V}{\partial S} - rV = f\,\frac{\partial V}{\partial\nu}

In the case of the Heston model, f is chosen as

f = -\left(\kappa(\theta - \nu) - \lambda\sqrt{\nu}\right)

where λ(S, ν, t) is called the market price of volatility risk. We do not delve into the choice of f and the concept of λ(S, ν, t) any further here; see (Wilmott, 2006) for his arguments.

(Heston, 1993) assumed in his paper that λ(S, ν, t) is linear in the instantaneous variance ν_t, in order to retain the form of the equation under the transformation from the statistical measure to the risk-neutral measure. Here we are only interested in pricing the options, so we can set λ(S, ν, t) to zero (Gatheral, 2006).

Hence we arrive at

\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V}{\partial\nu^2} + rS\frac{\partial V}{\partial S} - rV = \kappa(\nu - \theta)\frac{\partial V}{\partial\nu} \quad (3.4)

According to (Heston, 1993), the price of an undiscounted European call option C with strike price K and time to maturity T has a solution of the form

C(S, K, \nu, T) = \frac{1}{2}(F - K) + \frac{1}{\pi}\int_0^{\infty} \left(F f_1 - K f_2\right) du \quad (3.5)

where

f_1 = \Re\left[\frac{e^{-iu\ln K}\,\varphi(u - i)}{iuF}\right], \quad
f_2 = \Re\left[\frac{e^{-iu\ln K}\,\varphi(u)}{iu}\right], \quad
\varphi(u) = \mathbb{E}\left(e^{iu\ln S_T}\right), \quad
F = S e^{\mu T}

The evaluation of (3.5) requires the computation of Fourier inversion integrals, and it is susceptible to numerical instabilities due to the complex logarithms involved. (Kahl et al., 2006) proposed an integration scheme that computes the Fourier inversion integral by applying some transformations to the asymptotic structure. Thus the solution becomes

C(S, K, \nu, T) = \int_0^1 y(x)\,dx, \quad x \in \mathbb{R} \quad (3.6)

where

y(x) = \frac{1}{2}(F - K) + \frac{F\,f_1\!\left(-\frac{\ln x}{C_\infty}\right) - K\,f_2\!\left(-\frac{\ln x}{C_\infty}\right)}{x\,\pi\,C_\infty}

The limits at the boundaries of the integral are

\lim_{x \to 0} y(x) = \frac{1}{2}(F - K)

and

\lim_{x \to 1} y(x) = \frac{1}{2}(F - K) + \frac{F\lim_{u \to 0} f_1(u) - K\lim_{u \to 0} f_2(u)}{\pi\,C_\infty}

Equation (3.6) provides a more robust pricing method for options with moderate to long maturities or strong mean-reversion. For the full derivation of equation (3.6), see (Kahl et al., 2006). The source code implementing equation (3.6) was provided by the supervisor and is the source code for (Duffy et al., 2012).

3.1.2 Heston Model Implied Volatility Approximation

Before the model is used for pricing options, the model's parameters need to be calibrated so that it can return current market prices (at least to some degree of accuracy). That is, the unmeasurable model parameters are obtained by calibrating to implied volatilities that are observed on the market. Hence, this motivates the need for fast implied volatility computation.

Unfortunately, there is no closed-form analytical formula for calculating implied volatility under the Heston model. There are, however, many closed-form approximations for Heston implied volatility.

In this project, we select the implied volatility approximation method for Heston proposed by (Lorig et al., 2019), which allegedly outperforms other methods, such as the one by (Fouque et al., 2012). (Lorig et al., 2019) derive a family of asymptotic expansions for European-style option implied volatility. Their method provides an explicit approximation and requires no special functions, not even numerical integration, and so it is faster to compute than its counterparts.

We follow closely the derivation steps outlined in (Lorig et al., 2019). Consider the Heston model in Section 3.1. According to (Andersen et al., 2007), ρ must be set negative to avoid moment explosion. To circumvent this, we apply the following change of variables: (X(t), V(t)) = (ln S(t), e^{κt}ν). Using the asymptotic expansions for general stochastic volatility models (see Appendix B) and Itô's formula, we have

dX = -\frac{1}{2}e^{-\kappa t}V\,dt + \sqrt{e^{-\kappa t}V}\,dW_1, \quad X(0) = x = \ln s \quad (3.7)
dV = \theta\kappa e^{\kappa t}\,dt + \xi\sqrt{e^{\kappa t}V}\,dW_2, \quad V(0) = \nu = z > 0 \quad (3.8)
\langle dW_1, dW_2 \rangle = \rho\,dt \quad (3.9)

The parameters ν, κ, θ, ξ, W₁, W₂ play the same roles as in Section 3.1. Then we apply the second order differential operator A(t) to (X, V):

\mathcal{A}(t) = \frac{1}{2}e^{-\kappa t}v\left(\partial_x^2 - \partial_x\right) + \theta\kappa e^{\kappa t}\partial_v + \frac{1}{2}\xi^2 e^{\kappa t}v\,\partial_v^2 + \xi\rho v\,\partial_x\partial_v

Define T as time to maturity and k as log-strike. Using the time-dependent Taylor series expansion of A(T) with (x(T), v(T)) = (X_0, E[V(T)]) = (x, \theta(e^{\kappa T} - 1)), this gives (see Appendix B)

\sigma_0 = \sqrt{\frac{-\theta + \theta\kappa T + e^{-\kappa T}(\theta - v) + v}{\kappa T}}

\sigma_1 = \frac{\xi\rho z\,e^{-\kappa T}\left(-2v - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\right)}{\sqrt{2}\,\kappa^2\,\sigma_0^2\,T^{3/2}}

\sigma_2 = \frac{\xi^2 e^{-2\kappa T}}{32\,\kappa^4\,\sigma_0^5\,T^3}\Big[
-2\sqrt{2}\,\kappa\sigma_0^3 T^{3/2} z\left(-v - 4e^{\kappa T}(v + \kappa T(v - v)) + e^{2\kappa T}(v(5 - 2\kappa T) - 2v) + 2v\right)
+ \kappa\sigma_0^2 T(4z^2 - 2)\left(v + e^{2\kappa T}(-5v + 2v\kappa T + 8\rho^2(v(\kappa T - 3) + v) + 2v)\right)
+ \kappa\sigma_0^2 T(4z^2 - 2)\left(4e^{\kappa T}(v + v\kappa T + \rho^2(v(\kappa T(\kappa T + 4) + 6) - v(\kappa T(\kappa T + 2) + 2)) - \kappa T v) - 2v\right)
+ 4\sqrt{2}\,\rho^2\sigma_0\sqrt{T}\,z(2z^2 - 3)\left(-2 - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\right)^2
+ 4\rho^2\left(4(z^2 - 3)z^2 + 3\right)\left(-2v - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\right)^2
\Big]
- \frac{\sigma_1^2\left(4(x - k)^2 - \sigma_0^4 T^2\right)}{8\,\sigma_0^3\,T}

where

z = \frac{x - k - \frac{\sigma_0^2 T}{2}}{\sigma_0\sqrt{2T}}

The explicit expression for σ₃ is too long to be included here; see (Lorig et al., 2019) for the full derivation. The explicit expression for the implied volatility approximation is

\sigma_{\mathrm{implied}} = \sigma_0 + \sigma_1 z + \sigma_2 z^2 + \sigma_3 z^3 \quad (3.10)

The C++ source code implementing this is available here: Heston Implied Volatility Approximation.
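As an illustration of how (3.10) is assembled, a minimal Python sketch of the zeroth-order term σ₀ and the variable z; the higher-order coefficients sig1, sig2, sig3 are assumed to be supplied by the user's implementation of the full (Lorig et al., 2019) expressions above:

import numpy as np

def sigma0(theta, kappa, v, T):
    """Zeroth-order term: square root of the time-averaged expected variance."""
    return np.sqrt((-theta + theta * kappa * T
                    + np.exp(-kappa * T) * (theta - v) + v) / (kappa * T))

def lorig_implied_vol(x, k, theta, kappa, v, T, sig1, sig2, sig3):
    """Third-order implied volatility expansion, eq. (3.10)."""
    s0 = sigma0(theta, kappa, v, T)
    z = (x - k - 0.5 * s0**2 * T) / (s0 * np.sqrt(2.0 * T))
    return s0 + sig1 * z + sig2 * z**2 + sig3 * z**3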

3.2 Volatility is Rough

We start this section by introducing the fractional Brownian motion (fBm), which is a generalisation of Brownian motion.

Definition 3.2.1. Fractional Brownian motion (fBm) is a centered Gaussian process B_t^H, t ≥ 0, with the autocovariance function

\mathbb{E}\left[B_t^H B_s^H\right] = \frac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\right) \quad (3.11)

where H ∈ (0, 1) is the Hurst index or Hurst parameter associated with the fractional Brownian motion.

The Hurst parameter is a measure of the long-term memory of a time series. It was first introduced by (Hurst, 1951) for modelling water levels of the Nile river in order to determine the optimum dam size. As it turns out, it is also suitable for modelling time series data from other natural systems. (Peters, 1991) and (Peters, 1994) are some of the earliest examples of applying this concept to financial time series modelling. A time series can be classified into 3 categories based on the Hurst parameter:

1. 0 < H < 1/2: Negative correlation in the increments of the process. A mean-reverting process, which implies an increase in value is more likely followed by a decrease in value, and vice versa. The smaller the value, the greater the mean-reverting effect.

2. H = 1/2: No correlation in the increments of the process (geometric Brownian motion or Wiener process).

3. 1/2 < H < 1: Positive correlation in the increments of the process. A trend-reinforcing process, which means the direction of the next value is more likely the same as the current value. The strength of the trend is greater when H is closer to 1.

Empirical evidence shows that log-volatility time series behave similarly to a fBm with Hurst parameter of order 0.1 (Gatheral et al., 2014). Essentially all the statistical stylised facts of volatility can be captured when modelling it as a rough fBm.

This motivates the use of the fractional stochastic volatility (FSV) model by (Comte et al., 1998). Our model is called the rough fractional stochastic volatility (RFSV) model to emphasise that H < 1/2. The term 'rough' refers to the roughness of the path of the fBm. From Figure 3.1, one can notice that the fBm path gets 'rougher' as H decreases. A minimal simulation sketch of fBm from its covariance (3.11) is given below.
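For intuition (and to reproduce plots like Figure 3.1), here is a sketch of exact fBm simulation directly from the covariance (3.11) using a Cholesky factorisation; this is our illustrative choice of method, not one prescribed in the thesis:

import numpy as np

def simulate_fbm(H, T, n, seed=0):
    """Exact simulation of fBm on [0, T] via Cholesky factorisation of the
    covariance (3.11). O(n^3) cost, which is fine for path visualisation."""
    rng = np.random.default_rng(seed)
    t = np.linspace(T / n, T, n)  # strictly positive grid times
    tt, ss = np.meshgrid(t, t, indexing="ij")
    cov = 0.5 * (tt**(2 * H) + ss**(2 * H) - np.abs(tt - ss)**(2 * H))
    L = np.linalg.cholesky(cov)
    # prepend B_0^H = 0 to the correlated Gaussian path
    return np.concatenate([[0.0], L @ rng.standard_normal(n)])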

FIGURE 3.1: Paths of fBm for different values of H. Adapted from (Shevchenko, 2015)

3.3 The Rough Heston Model

In this section we introduce the rough volatility version of the Heston model and derive its pricing and implied volatility equations.

As stated before, rough volatility models can fit the volatility surface notably well with very few parameters. Hence, it is natural to modify the Heston model and consider its rough version.

The rough Heston model for a one-dimensional asset price S with instantaneous variance ν(t) takes the form as in (Euch et al., 2016):

dS = S\sqrt{\nu}\,dW_1
\nu(t) = \nu(0) + \frac{1}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\kappa(\theta - \nu(s))\,ds + \frac{\xi}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\sqrt{\nu(s)}\,dW_2 \quad (3.12)
\langle dW_1, dW_2 \rangle = \rho\,dt

where

• α = H + 1/2 ∈ (1/2, 1) governs the roughness of the volatility sample paths
• κ is the mean reversion rate
• θ is the long term mean volatility
• ν(0) is the initial volatility
• ξ is the volatility of volatility
• Γ is the Gamma function
• W₁ and W₂ are two Brownian motions with correlation ρ

When α = 1 (H = 0.5), we recover the classical Heston model of Section 3.1.

3.3.1 Rough Heston Pricing Equation

The pricing equation of the rough Heston model is inspired by the classical Heston one. In the classical Heston model, the option price can be obtained by applying Fourier inversion integrals to the characteristic function, which is expressed in terms of the solution of a Riccati equation. The rough Heston model displays a similar structure, but with the Riccati equation replaced by a fractional Riccati equation.

(Euch et al., 2016) derived a characteristic function for the rough Heston model. This result is particularly important, as the non-Markovian nature of the fractional Brownian motion makes other numerical pricing methods (e.g. Monte Carlo) difficult to implement.

Definition 3.3.1. The fractional integral of order r ∈ (0, 1] of a function f is

I^r f(t) = \frac{1}{\Gamma(r)}\int_0^t (t - s)^{r - 1} f(s)\,ds \quad (3.13)

whenever the integral exists, and the fractional derivative of order r ∈ (0, 1] is

D^r f(t) = \frac{1}{\Gamma(1 - r)}\frac{d}{dt}\int_0^t (t - s)^{-r} f(s)\,ds \quad (3.14)

whenever it exists.

Then we quote the results from (Euch et al., 2016) directly; for the full proof, see Section 6 of (Euch et al., 2016).

Theorem 3.3.1. For all t ≥ 0, the characteristic function of the log-price X_t = ln(S(t)/S(0)) is

\varphi_X(a, t) = \mathbb{E}\left[e^{iaX_t}\right] = \exp\left[\kappa\theta\,I^1 h(a, t) + \nu(0)\,I^{1-\alpha} h(a, t)\right] \quad (3.15)

where h is the solution of the fractional Riccati equation

D^\alpha h(a, t) = -\frac{1}{2}a(a + i) + \kappa(ia\rho\xi - 1)h(a, t) + \frac{\xi^2}{2}h^2(a, t), \quad I^{1-\alpha}h(a, 0) = 0 \quad (3.16)

which admits a unique continuous solution for a ∈ 𝒞, a suitable region in the complex plane (see Definition 3.3.2).

Definition 3.3.2. We define the strip in the complex plane relevant to the computation of the option price characteristic function in Theorem 3.3.1:

\mathcal{C} = \left\{ z \in \mathbb{C} : \Re(z) \ge 0,\ -\frac{1}{1 - \rho^2} \le \Im(z) \le 0 \right\} \quad (3.17)

where ℜ and ℑ denote real and imaginary parts, respectively.

There is no explicit solution to equation (3.16), so it has to be solved numerically. (Gatheral et al., 2019) present a simple rational approximation to the solution of the rough Heston Riccati equation, which is inevitably faster than the numerical schemes.

3.3.1.1 Rational Approximation of Rough Heston Riccati Solution

Firstly, the short-time series expansion of the solution by (Alos et al., 2017) can be written as

h^s(a, t) = \sum_{j=0}^{\infty} \frac{\Gamma(1 + j\alpha)}{\Gamma(1 + (j + 1)\alpha)}\,\beta_j(a)\,\xi^j\,t^{(j+1)\alpha} \quad (3.18)

with

\beta_0(a) = -\frac{1}{2}a(a + i)
\beta_k(a) = \frac{1}{2}\sum_{i,j=0}^{k-2} \mathbb{1}_{i+j=k-2}\,\beta_i(a)\beta_j(a)\,\frac{\Gamma(1 + i\alpha)}{\Gamma(1 + (i + 1)\alpha)}\,\frac{\Gamma(1 + j\alpha)}{\Gamma(1 + (j + 1)\alpha)} + i\rho a\,\frac{\Gamma(1 + (k - 1)\alpha)}{\Gamma(1 + k\alpha)}\,\beta_{k-1}(a)

Again from (Alos et al., 2017), the long-time expansion is

h^l(a, t) = r_- \sum_{k=0}^{\infty} \frac{\gamma_k\,\xi^{-k}\,t^{-k\alpha}}{A^k\,\Gamma(1 - k\alpha)} \quad (3.19)

where

\gamma_1 = -\gamma_0 = -1
\gamma_k = -\gamma_{k-1} + \frac{r_-}{2A}\sum_{i,j=1}^{\infty} \mathbb{1}_{i+j=k}\,\gamma_i\gamma_j\,\frac{\Gamma(1 - k\alpha)}{\Gamma(1 - i\alpha)\,\Gamma(1 - j\alpha)}
A = \sqrt{a(a + i) - \rho^2 a^2}
r_\pm = -i\rho a \pm A

Now we can construct the global approximation. Using the same techniques as in (Atkinson et al., 2011), the rational approximation of h(a, t) is given by

h^{(m,n)}(a, t) = \frac{\sum_{i=1}^{m} p_i\,(\xi t^\alpha)^i}{\sum_{j=0}^{n} q_j\,(\xi t^\alpha)^j}, \quad q_0 = 1 \quad (3.20)

Numerical experiments in (Gatheral et al., 2019) showed that h^{(3,3)} is a good choice for high accuracy and fast computation. Thus we can express explicitly

h^{(3,3)}(a, t) = \frac{p_1\xi t^\alpha + p_2\xi^2 t^{2\alpha} + p_3\xi^3 t^{3\alpha}}{1 + q_1\xi t^\alpha + q_2\xi^2 t^{2\alpha} + q_3\xi^3 t^{3\alpha}} \quad (3.21)

Thus, from equations (3.18) and (3.19), we have

h^s(a, t) = \frac{\Gamma(1)}{\Gamma(1 + \alpha)}\beta_0(a)\,t^\alpha + \frac{\Gamma(1 + \alpha)}{\Gamma(1 + 2\alpha)}\beta_1(a)\,\xi\,(t^\alpha)^2 + \frac{\Gamma(1 + 2\alpha)}{\Gamma(1 + 3\alpha)}\beta_2(a)\,\xi^2\,(t^\alpha)^3 + \mathcal{O}(t^{4\alpha})
         = b_1 t^\alpha + b_2 (t^\alpha)^2 + b_3 (t^\alpha)^3 + \mathcal{O}\left[(t^\alpha)^4\right]

and

h^l(a, t) = \frac{r_-\gamma_0}{\Gamma(1)} + \frac{r_-\gamma_1}{A\,\Gamma(1 - \alpha)\,\xi}\,\frac{1}{t^\alpha} + \frac{r_-\gamma_2}{A^2\,\Gamma(1 - 2\alpha)\,\xi^2}\,\frac{1}{(t^\alpha)^2} + \mathcal{O}\left[\frac{1}{(t^\alpha)^3}\right]
         = g_0 + g_1\frac{1}{t^\alpha} + g_2\frac{1}{(t^\alpha)^2} + \mathcal{O}\left[\frac{1}{(t^\alpha)^3}\right]

Matching the coefficients of equation (3.21) to h^s and h^l forces the coefficients b_i and g_j to satisfy

p_1\xi = b_1 \quad (3.22)
(p_2 - p_1 q_1)\xi^2 = b_2 \quad (3.23)
(p_1 q_1^2 - p_1 q_2 - p_2 q_1 + p_3)\xi^3 = b_3 \quad (3.24)
\frac{p_3\xi^3}{q_3\xi^3} = g_0 \quad (3.25)
\frac{(p_2 q_3 - p_3 q_2)\xi^5}{q_3^2\,\xi^6} = g_1 \quad (3.26)
\frac{(p_1 q_3^2 - p_2 q_2 q_3 - p_3 q_1 q_3 + p_3 q_2^2)\xi^7}{q_3^3\,\xi^9} = g_2 \quad (3.27)

The solution of this system is

p_1 = \frac{b_1}{\xi} \quad (3.28)

p_2 = \frac{b_1^3 g_1 + b_1^2 g_0^2 + b_1 b_2 g_0 g_1 - b_1 b_3 g_0 g_2 + b_1 b_3 g_1^2 + b_2^2 g_0 g_2 - b_2^2 g_1^2 + b_2 g_0^3}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^2} \quad (3.29)

p_3 = g_0 q_3 \quad (3.30)

q_1 = \frac{b_1^2 g_1 - b_1 b_2 g_2 + b_1 g_0^2 - b_2 g_0 g_1 - b_3 g_0 g_2 + b_3 g_1^2}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi} \quad (3.31)

q_2 = \frac{b_1^2 g_0 - b_1 b_2 g_1 - b_1 b_3 g_2 + b_2^2 g_2 + b_2 g_0^2 - b_3 g_0 g_1}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^2} \quad (3.32)

q_3 = \frac{b_1^3 + 2 b_1 b_2 g_0 + b_1 b_3 g_1 - b_2^2 g_1 + b_3 g_0^2}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^3} \quad (3.33)
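A direct Python transcription of equations (3.28)-(3.33) and of the rational form (3.21); b = (b1, b2, b3) and g = (g0, g1, g2) are the short- and long-time coefficients defined above, complex-valued in general:

def h33_coefficients(b, g, xi):
    """Solve the matching system (3.22)-(3.27); returns the Pade coefficients
    of h^(3,3), following eqs (3.28)-(3.33)."""
    b1, b2, b3 = b
    g0, g1, g2 = g
    den = b1**2 * g2 + 2 * b1 * g0 * g1 + b2 * g0 * g2 - b2 * g1**2 + g0**3
    p1 = b1 / xi
    p2 = (b1**3 * g1 + b1**2 * g0**2 + b1 * b2 * g0 * g1 - b1 * b3 * g0 * g2
          + b1 * b3 * g1**2 + b2**2 * g0 * g2 - b2**2 * g1**2 + b2 * g0**3) / (den * xi**2)
    q1 = (b1**2 * g1 - b1 * b2 * g2 + b1 * g0**2 - b2 * g0 * g1
          - b3 * g0 * g2 + b3 * g1**2) / (den * xi)
    q2 = (b1**2 * g0 - b1 * b2 * g1 - b1 * b3 * g2 + b2**2 * g2
          + b2 * g0**2 - b3 * g0 * g1) / (den * xi**2)
    q3 = (b1**3 + 2 * b1 * b2 * g0 + b1 * b3 * g1 - b2**2 * g1
          + b3 * g0**2) / (den * xi**3)
    p3 = g0 * q3
    return (p1, p2, p3), (q1, q2, q3)

def h33(t_alpha, p, q, xi):
    """Evaluate the rational approximation (3.21) at t^alpha."""
    x = xi * t_alpha
    return (p[0] * x + p[1] * x**2 + p[2] * x**3) / \
           (1 + q[0] * x + q[1] * x**2 + q[2] * x**3)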

3.3.1.2 Option Pricing Using Fast Fourier Transform

After solving the Riccati equation using the rational approximation (3.21), we can compute the characteristic function φ_X(a, t) in equation (3.15). Once the characteristic function is obtained, many methods are available to compute call option prices; see (Carr and Madan, 1999), (Itkin, 2010), (Lewis, 2001) and (Schmelzle, 2010). In this project we select the Carr and Madan approach, a fast option pricing method that uses the fast Fourier transform (FFT). Suppose k = ln K and C(k, T) is the value of a European call option with strike K and maturity T. Unfortunately, the FFT cannot be applied directly to evaluate the call option price, because the call pricing function is not square-integrable. A key contribution of (Carr and Madan, 1999) is the modification of the call pricing function with respect to k. With this modification, a whole range of option prices can be obtained with one inverse Fourier transform.

Consider the modified call price c(k, T):

c(k, T) = e^{\beta k}\,C(k, T), \quad \beta > 0 \quad (3.34)

And its Fourier transform is

\psi_X(u, T) = \int_{-\infty}^{\infty} e^{iuk}\,c(k, T)\,dk, \quad u \in \mathbb{R}_{>0} \quad (3.35)

which can be expressed as

\psi_X(u, T) = \frac{e^{-rT}\,\varphi_X\left[u - (\beta + 1)i,\,T\right]}{\beta^2 + \beta - u^2 + i(2\beta + 1)u} \quad (3.36)

where r is the risk-free rate and φ_X(·) is the characteristic function defined in equation (3.15).

The purpose of β is to assist the integrability of the modified call pricing function over the negative log-strike axis. Hence a sufficient condition is that ψ_X(0) is finite:

\psi_X(0, T) < \infty \quad (3.37)
\implies \varphi_X\left[-(\beta + 1)i,\,T\right] < \infty \quad (3.38)
\implies \exp\left[\theta\kappa\,I^1 h(-(\beta + 1)i, T) + \nu(0)\,I^{1-\alpha} h(-(\beta + 1)i, T)\right] < \infty \quad (3.39)

Using equation (3.39), we can determine the upper bound value of β such that condition (3.37) is satisfied. (Carr and Madan, 1999) suggest that one quarter of the upper bound serves as a good choice for β.

The call price C(k, T) can be recovered by applying the inverse Fourier transform:

C(k, T) = \frac{e^{-\beta k}}{2\pi}\int_{-\infty}^{\infty} \Re\left[e^{-iuk}\,\psi_X(u, T)\right]du \quad (3.40)
        = \frac{e^{-\beta k}}{\pi}\int_0^{\infty} \Re\left[e^{-iuk}\,\psi_X(u, T)\right]du \quad (3.41)

The semi-infinite integral in the call price equation can be evaluated by numerical methods, such as the trapezoidal rule, and the FFT, as shown in (Kwok et al., 2012):

C(k, T) \approx \frac{e^{-\beta k}}{\pi}\sum_{j=1}^{N} \Re\left[e^{-iu_j k}\,\psi_X(u_j, T)\right]\Delta u \quad (3.42)

where u_j = (j − 1)Δu, j = 1, 2, ..., N.

We end this section with a summary of the steps required to price a call option under the rough Heston model (a Python sketch of steps 2-5 follows the list):

1. Compute the solution h of the fractional Riccati equation (3.16) using equation (3.21).
2. Compute the characteristic function in equation (3.15).
3. Determine a suitable value for β using equation (3.39).
4. Apply the Fourier transform in equation (3.36).
5. Evaluate the call option price by implementing the FFT in equation (3.42).
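A compact sketch of the damping-plus-quadrature step, assuming a callable phi_X implementing the characteristic function (3.15) and accepting NumPy arrays of complex arguments; the uniform u-grid sum mirrors equation (3.42):

import numpy as np

def carr_madan_call(phi_X, k, T, r, beta, N=4096, du=0.05):
    """European call price via eqs (3.36) and (3.42).

    phi_X : characteristic function of the log-price (eq. 3.15), assumed given.
    k     : log-strike; beta : damping parameter; du : grid spacing.
    """
    u = np.arange(N) * du  # u_j = (j - 1) * du for j = 1..N
    psi = np.exp(-r * T) * phi_X(u - (beta + 1) * 1j, T) \
          / (beta**2 + beta - u**2 + 1j * (2 * beta + 1) * u)
    integrand = np.real(np.exp(-1j * u * k) * psi)
    return np.exp(-beta * k) / np.pi * np.sum(integrand) * du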

3.3.2 Rough Heston Model Implied Volatility

(Euch et al., 2019) present an almost instantaneous and accurate approximation method to compute rough Heston implied volatility. This method allows us to approximate rough Heston implied volatility by simply scaling the volatility of volatility parameter ν and feeding this scaled parameter as input into the classical Heston implied volatility approximation of Section 3.1.2. For a given maturity T, the scaled volatility of volatility parameter ν̃ is

\tilde{\nu} = \sqrt{\frac{3}{2H + 2}}\,\frac{\nu}{\Gamma(H + 3/2)}\,\frac{1}{T^{1/2 - H}} \quad (3.43)
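Equation (3.43) is a one-liner in practice; a sketch using SciPy's gamma function:

from scipy.special import gamma

def scaled_vol_of_vol(nu, H, T):
    """Scale the vol-of-vol (eq. 3.43) so the classical Heston implied
    volatility approximation can be reused for the rough Heston model."""
    return (3.0 / (2.0 * H + 2.0))**0.5 * nu / gamma(H + 1.5) / T**(0.5 - H)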

FIGURE 3.2: Implied Volatility Smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019)

Chapter 4

Artificial Neural Networks

This chapter starts with an introduction to Artificial Neural Networks (ANN). We introduce the terminologies and basic elements of ANN and explain the methods that we used to increase training speed and predictive accuracy. We also discuss the training process for ANN, common pitfalls and their remedies. Finally, we discuss the ANN type of interest and extend the discussion to Deep Neural Networks (DNN).

The concept of ANN has come a long way. It started off with modelling a simple neural network after electrical circuits, as proposed by Warren McCulloch, a neurologist, and a young mathematician, Walter Pitts, in 1943 (McCulloch et al., 1943). This idea was further strengthened by Donald Hebb (Hebb, 1949).

ANN's ability to learn complex, non-linear functional mappings, by deciphering the pattern between independent and dependent variables, has found applications in many fields. With computational resources becoming more accessible and big data sets becoming available, ANN's popularity has grown exponentially in recent years.

4.1 Dissecting the Artificial Neural Networks

FIGURE 4.1: A Simple Neural Network

Before we delve deeper into the world of ANN, let us explain what parameters and hyperparameters mean.

Parameters refer to the variables that are updated throughout the training process; they include weights and biases. Hyperparameters are the configuration variables that define the architecture of the ANN and stay constant throughout the training process. Hyperparameters include the number of neurons, hidden layers, activation functions, number of epochs, etc.

4.1.1 Neurons

Like a biological neuron, an artificial neuron is the fundamental working unit in an ANN. It is essentially a mathematical function that receives multiple inputs and generates an output. More precisely, it takes the weighted sum of the inputs and applies an activation function to the sum to generate an output. In the case of a simple ANN (as shown in Figure 4.1), this will be the network output. In more sophisticated ANNs, the output of a neuron can be the input of other neurons.

4.1.2 Input, Output and Hidden Layers

An input layer is the first layer of an ANN. It receives and sends the training data into the network for further processing. The number of input neurons is equal to the number of input features of the training data.

The output layer is the final layer of the network. It outputs the predictions for a given set of input features. It can have more than one output neuron.

A hidden layer is a layer between the input and output layers. It contains neuron(s) and receives weighted inputs from the input layer. Whilst an ANN contains only one input layer and one output layer, the number of hidden layers can be more than one. One hidden layer can only be used for solving linearly separable problems. For more complex problems, multiple hidden layers are needed.

Systematic experimentation is required to identify the appropriate number of hidden layers to prevent under-fitting and over-fitting, and thus increase predictive accuracy.

4.1.3 Connections, Weights and Biases

From the input layer to the output layer, each connection is assigned a weight that denotes its relative importance. Random weights are initialised when training begins; as training continues, these weights are updated.

Sometimes a bias (different from the bias term in statistics) is added to the neuron. It is merely a constant input with weight 1. Intuitively speaking, the purpose of a bias is to shift the activation function away from the origin, thereby allowing the neuron to give a non-zero output when the inputs are 0.

4.1.4 Forward and Backward Propagation, Loss Functions

Forward propagation is the passing of input data in the forward direction through the network to generate output. Then the difference between the output and the target output is estimated by a loss function; it is a measure of how well an ANN performs with the current set of weights. Typical loss functions for regression problems include the mean squared error (MSE) and the mean absolute error (MAE). They are defined as

Definition 4.1.1.

\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i,\mathrm{predict}} - y_{i,\mathrm{actual}}\right)^2 \quad (4.1)

Definition 4.1.2.

\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_{i,\mathrm{predict}} - y_{i,\mathrm{actual}}\right| \quad (4.2)

where y_{i,predict} is the i-th predicted output and y_{i,actual} is the i-th actual output.

Subsequently, this information is propagated backward from the output layer to the input layer, and the gradients of the loss function w.r.t. the weights are computed. The algorithm then adjusts the weights accordingly to minimise the loss function. An iteration is one complete cycle of forward and backward propagation.

ANN training is an unconstrained optimisation problem with the loss function as the objective function. The algorithm navigates through the search space to find optimal weights that minimise the loss function (e.g. MSE). It can be expressed like so:

\min_{w}\ \frac{1}{n}\sum_{i=1}^{n}\left(y_{i,\mathrm{predict}} - y_{i,\mathrm{actual}}\right)^2 \quad (4.3)

where w is the vector that contains the weights of all the connections within the ANN.
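A minimal NumPy sketch of the two loss functions (4.1) and (4.2):

import numpy as np

def mse(y_pred, y_true):
    """Mean squared error, eq. (4.1)."""
    return np.mean((y_pred - y_true)**2)

def mae(y_pred, y_true):
    """Mean absolute error, eq. (4.2)."""
    return np.mean(np.abs(y_pred - y_true))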

4.1.5 Activation Functions

An activation function is a mathematical function applied to the output of a neuron to determine whether it should be activated (fired) or not, based on the relevance of the neuron's inputs to the prediction. This is analogous to the neuronal firing activity in the human brain.

Activation functions can be linear or non-linear. A linear activation function is only suitable when the functional relationship is linear. Non-linear activation functions are used for most applications. In our case, non-linear activation functions are more suitable, as the pricing and implied volatility functional mappings are complex.

Common activation functions include:

Definition 4.1.3. Rectified Linear Unit (ReLU)

f_{\mathrm{ReLU}}(x) = \begin{cases} 0 & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases} \in [0, \infty) \quad (4.4)

Definition 4.1.4. Exponential Linear Unit (ELU)

f_{\mathrm{ELU}}(x) = \begin{cases} \alpha(e^x - 1) & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases} \in (-\alpha, \infty) \quad (4.5)

where α > 0. The value of α is usually chosen to be 1 for most applications.

Definition 4.1.5. Linear or identity

f_{\mathrm{linear}}(x) = x \in (-\infty, \infty) \quad (4.6)
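The same three functions in NumPy, vectorised over an input array (a sketch; α defaults to 1 as noted above):

import numpy as np

def relu(x):
    """Rectified Linear Unit, eq. (4.4)."""
    return np.maximum(x, 0.0)

def elu(x, alpha=1.0):
    """Exponential Linear Unit, eq. (4.5)."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def linear(x):
    """Identity activation, eq. (4.6)."""
    return x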

4.1.6 Optimisation Algorithms and Learning Rate

The learning rate refers to the amount of change applied to the weights in each iteration of the training process. A high learning rate results in fast training, but with a risk of divergence, whereas a low learning rate may slow down training.

Common optimisation algorithms for training ANN are Stochastic Gradient Descent (SGD) and its extensions, such as RMSProp and Adam. The SGD pseudocode is the following:

Algorithm 1 Stochastic Gradient Descent
  Initialise a vector of random weights w and learning rate lr
  while Loss > error do
      Randomly shuffle the training data of size n
      for i = 1, 2, ..., n do
          w_new = w_current − lr · ∂Loss/∂w_current
      end for
  end while

SGD is popular because it is simple to implement and can provide good results for most applications. In SGD, the learning rate lr is fixed throughout the process. This is found to be an issue, because the appropriate learning rate is difficult to determine: selecting a high learning rate is prone to divergence, whereas using a low learning rate can result in a slow training process. Hence, improvements have been made to SGD to make the learning rate adaptive. Root Mean Square Propagation (RMSProp) is one of the extensions that allows the learning rate to vary throughout the training process (Hinton et al., 2015). It is a method that divides the learning rate by an exponential moving average of past squared gradients. The author suggests the exponential decay rate b = 0.9, and its pseudocode is presented below.

Algorithm 2 Root Mean Square Propagation (RMSProp)
  Initialise a vector of random weights w, learning rate lr, v = 0 and b = 0.9
  while Loss > error do
      Randomly shuffle the training data of size n
      for i = 1, 2, ..., n do
          v_current = b · v_previous + (1 − b) · [∂Loss/∂w_current]²
          w_new = w_current − (lr / √(v_current + ε)) · ∂Loss/∂w_current
      end for
  end while

where ε is a small number to prevent division by zero.

A further improved version was proposed by (Kingma et al., 2015), called Adaptive Moment Estimation (Adam). RMSProp adapts the learning rate using only a moving average of the squared gradients (the second moment); Adam improves on this by additionally maintaining a moving average of the gradients themselves (the first moment). The advantage of Adam is that it can handle sparse gradients on noisy data. The pseudocode of Adam is given below.

Algorithm 3 Adaptive Moment Estimation (Adam)
  Initialise a vector of random weights w, learning rate lr, v = 0, m = 0, b = 0.9 and b1 = 0.999
  while Loss > error do
      Randomly shuffle the training data of size n
      for i = 1, 2, ..., n do
          m_current = b · m_previous + (1 − b) · ∂Loss/∂w_current
          v_current = b1 · v_previous + (1 − b1) · [∂Loss/∂w_current]²
          m̂_current = m_current / (1 − b^i)
          v̂_current = v_current / (1 − b1^i)
          w_new = w_current − (lr / (√v̂_current + ε)) · m̂_current
      end for
  end while

where b is the exponential decay rate for the first moment and b1 is the exponential decay rate for the second moment.
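A single Adam update in NumPy, mirroring Algorithm 3 (a sketch; the gradient is assumed to be computed by the caller, and t is the 1-based iteration counter):

import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b=0.9, b1=0.999, eps=1e-8):
    """One Adam parameter update (Algorithm 3). Returns new (w, m, v)."""
    m = b * m + (1 - b) * grad          # first-moment moving average
    v = b1 * v + (1 - b1) * grad**2     # second-moment moving average
    m_hat = m / (1 - b**t)              # bias corrections
    v_hat = v / (1 - b1**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v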

4.1.7 Epochs, Early Stopping and Batch Size

An epoch is one pass of the entire training data set through the ANN. Generally, we want the number of epochs to be as high as possible; however, this may cause the ANN to over-fit. Early stopping is useful here to stop the training before the ANN becomes over-fit: it works by stopping the training process when the loss function shows no improvement. The batch size is the number of input samples processed before the parameters are updated.
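In Keras, the library used in this project, early stopping is available as a callback; a typical configuration might look like the following (the patience value here is an illustrative choice):

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss",        # watch validation loss
                           patience=10,               # epochs with no improvement allowed
                           restore_best_weights=True)

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=200, batch_size=64, callbacks=[early_stop])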

4.2 Feedforward Neural Networks

Feedforward Neural Networks are the most common type of ANN in practical applications. They are called feedforward because the training data only flows from the input layer to the output layer; there is no feedback loop within the network.

4.2.1 Deep Neural Networks

Deep Neural Networks (DNN) are Feedforward Neural Networks with more than one hidden layer; the depth here refers to the number of hidden layers. This is the type of ANN of our interest, and the following theorems provide justifications.

Theorem 4.2.1 ((Hornik et al., 1989), Universal Approximation Theorem). Let NN^a_{d_in,d_out} be the set of neural networks with activation function f_a: ℝ → ℝ, input dimension d_in ∈ ℕ and output dimension d_out ∈ ℕ. Then, if f_a is continuous and non-constant, NN^a_{d_in,d_out} is dense in the Lebesgue space L^p(μ) for all finite measures μ.

Theorem 4.2.2 ((Hornik et al., 1990), Universal Approximation Theorem for Derivatives). Let the function to be approximated be F* ∈ C^n, the mapping of the neural network be F: ℝ^{d_0} → ℝ, and let NN^a_{d_in,d_out} be the set of single-layer neural networks with activation function f_a: ℝ → ℝ, input dimension d_in ∈ ℕ and output dimension 1. Then, if the (non-constant) activation function satisfies f_a ∈ C^n(ℝ), NN^a_{d_in,1} arbitrarily approximates F* and all its derivatives up to order n.

Theorem 4.2.3 ((Eldan et al., 2016), The Power of Depth of Neural Networks). There exists a simple function on ℝ^d, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network to more than a certain constant accuracy, unless its width is exponential in the dimension.

According to Theorem 4.2.2, it is crucial to have an activation function that is n-differentiable if the approximation of the n-th derivative of the function F* is required. Theorem 4.2.3 motivates the choice of a deep network: it states that the depth of a network can improve its predictive capabilities. However, it is worth noting that networks deeper than 4 hidden layers do not provide much performance gain (Ioffe et al., 2015).

43 Common Pitfalls and Remedies

Here we discuss some common pitfalls in ANN training and the correspond-ing remedies

43 Common Pitfalls and Remedies 31

431 Loss Function Stagnation

During training the loss function may display infinitesimal improvementsafter each epoch

To resolve this problem

bull Increase the batch size could potentially fix this issue as this helps theANN model to learn better the relationships between data samples anddata labels before updating the parameters

bull Increase the learning rate Too small of learning rate can cause theANN model to stuck in local minima

bull Use a deeper network According to the power of depth theoremdeeper network tends to perform better than shallower ones How-ever if the NN model already have 4 hidden layers then this methodsmay not be suitable as there is not much performance gain (see Chapter421)

bull Use a different activation function for the output layer Make sure therange of the output activation function is appropriate for the problem

432 Loss Function Fluctuation

During training the loss function may display infinitesimal improvementsafter each epoch Potential fix includes

bull Increase the batch size The ANN model might not be able to inter-pret the true relationships between input and output data before eachupdate when the batch size is too small

bull Lower the initialised learning rate

433 Over-fitting

Over-fitting occurs when the ANN model performs well on training data but fails to generalise to unseen test data.

Potential fixes include:

• Reduce the complexity of the ANN model. Limit the number of hidden layers to 4, as (Eldan et al., 2016) shows not much advantage can be achieved beyond 4 hidden layers. Furthermore, experiment with narrower (fewer neurons) hidden layers.

• Introduce early stopping. Reduce the number of epochs if additional epochs show only little improvement. Note that the stopping criterion should be based on the validation loss rather than the training loss (see Chapter 6.3).

• Regularisation. This penalises model complexity, which can mitigate over-fitting and speed up convergence.


• K-fold Cross-validation. It works by splitting the training data into K equal-sized portions: K−1 portions are used for training and 1 portion is used for validation. This is repeated K times, with a different portion used for validation each time.

4.3.4 Under-fitting

Under-fitting occurs when the ANN model fails to capture the relationships between data samples and labels.

Potential fixes include:

• Increase the size of the training data. Complex functional mappings require the ANN model to train on more data before it can learn the relationships well.

• Increase the depth of the model. It is very likely that the model is not deep enough for the problem. As a rule of thumb, use 3-4 hidden layers for complex problems.

4.4 Application in Finance: ANN Image-Based Implicit Method for Implied Volatility Approximation

For the ANN's application to implied volatility prediction, we apply the image-based implicit method proposed by (Horvath et al., 2019), a reportedly powerful and robust method for approximating implied volatility. Rather than using one implied volatility value as the network output, the image-based implicit method uses the entire implied volatility surface as the output. The implied volatility surface is represented by 10 × 10 pixels, with each pixel corresponding to one maturity and one strike price. The input data corresponding to one implied volatility surface output consists of all the model parameters except for strike price and maturity; these are implicitly learned by the network. This method offers the following benefits (a short sketch of the output layout follows the list):

• More information is learned by the network with less input data. The information of neighbouring volatility points is incorporated in the training process.

• The network learns the mapping between model parameters and the implied volatility surface directly. The volatility smile and term structure are available simultaneously, so this is more efficient than having only one single implied volatility value as output.

• The neighbouring volatility points can be numerically approximated almost instantly if required.
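A minimal sketch of the output layout under this method (assuming a trained Keras model `model` and one input sample `x` of model parameters; the grid orientation is assumed):

import numpy as np

pred = model.predict(x)           # shape (1, 100): one value per pixel
surface = pred.reshape(10, 10)    # rows: 10 maturities, columns: 10 strike prices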


[Figure: the 100 neurons of the network output layer (O1-O100) arranged as a 10 × 10 grid representing the implied volatility surface.]

FIGURE 4.2: The pixels of the implied volatility surface are represented by the outputs of the neural network.


Chapter 5

Data

In this chapter we discuss the data for our applications. We explain how the data is generated and describe some essential data pre-processing procedures.

Simulated data is chosen to train the ANN model mainly for regulatory reasons. One might argue that training the ANN with market data directly would yield better predictions. However, evidence for the stability and robustness of that approach is still lacking. Furthermore, there is no straightforward interpretation of the relationship between inputs and outputs of an ANN model trained that way. The main advantage of training the ANN with simulated data is that the functional relationship is known, so the model is tractable. This property is important because tractable models tend to be more transparent, which is required by regulations.

For neural network computation, Python is the logical language choice, as it has many powerful machine learning libraries such as Keras. To obtain good results, we require 5,000,000 rows of data for the ANN experiment. Hence we choose C++ instead of Python for the data generation process, as it is computationally faster. We also use pybind11, a lightweight header-only library that can harmoniously integrate C++ and Python in one project (see Appendix A for a tutorial on pybind11).

5.1 Data Generation and Storage

The data generation process is done entirely in C++, mainly because of its execution speed. Initially we considered applying GPU parallel computing to speed up the process even further. After consulting with the supervisor, we decided to use the std::future class template instead. std::future belongs to the standard C++11 library, and it is very simple to implement compared to parallel computing. It allows the C++ program to run multiple functions asynchronously on different threads (see page 942 in (Duffy, 2018) for an example). Our machine contains 2 physical cores with 2 threads per physical core. This allows us to run 4 operations asynchronously without overhead costs and therefore, in theory, improves the data generation speed four-fold. Note that asynchronous programming is not meant to replace parallel computing completely; in cases where execution speed is absolutely critical, parallel computing should be used. Without asynchronous programming, the data generation process for 5,000,000 rows of Heston option price


data would take more than 8 hours; now it takes less than 2 hours to complete.
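For illustration only, the same fan-out pattern expressed in Python with concurrent.futures (the actual implementation uses C++ std::future; `generate_batch` is a hypothetical stand-in for the Heston data generator):

from concurrent.futures import ThreadPoolExecutor

def generate_batch(n_rows):
    # hypothetical placeholder for the C++ Heston option price generator
    return [0.0] * n_rows

# four workers mirror the machine's 4 hardware threads
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(generate_batch, 1_250_000) for _ in range(4)]
    batches = [f.result() for f in futures]   # result() blocks until each batch is ready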

We follow the steps outlined in Appendix A to wrap the C++ program using the pybind11 library. The wrapped C++ function can be called in Python just like any other Python library. pybind11 is conceptually very similar to the Boost library; the reason we choose pybind11 is that it is a lightweight header-only library that is easy to use. This enables us to enjoy both the execution speed of C++ and access to the machine learning libraries of Python at the same time.

After the data is generated, it is stored in MySQL so that we do not need to re-generate it each time we restart the computer or when the Python IDE crashes unexpectedly. We use the mysql-connector Python library to communicate with MySQL from Python.
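A minimal sketch of the storage step (assuming a pandas DataFrame `df` of generated rows; the credentials, database and table names are placeholders):

import pandas as pd
import sqlalchemy as db

engine = db.create_engine("mysql+mysqlconnector://user:password@localhost/heston_db")
df.to_sql("heston_call_options", con=engine, if_exists="append", index=False)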

5.1.1 Heston Call Option Data

We follow the method and theory described in Chapter 3 to compute call option prices under the Heston model. We only consider call options in this project. To train an ANN that can also predict put option prices, one can simply add an extra input feature that takes only two values, +1 or −1, where +1 represents a call option and −1 represents a put option.

For this application, we generated 5,000,000 rows of data uniformly distributed over a specified range. In this context, one row of data contains all the input features and the corresponding output. We use 10 model parameters as the input features: spot price (S), strike price (K), risk-free rate (r), dividend yield (D), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ). There is only one output: the call option price. See Table 5.1 for their respective ranges.

TABLE 5.1: Heston Model Call Option Data

         Model Parameters                Range
Input:   Spot Price (S)                  ∈ [20, 80]
         Strike Price (K)                ∈ [40, 55]
         Risk-free Rate (r)              ∈ [0.03, 0.045]
         Dividend Yield (D)              ∈ [0, 0.1]
         Instantaneous Volatility (ν)    ∈ [0.2, 0.6]
         Maturity (T)                    ∈ [0.03, 2.0]
         Long-Term Volatility (θ)        ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)         ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)    ∈ [0.2, 0.5]
         Correlation (ρ)                 ∈ [−0.95, 0.95]
Output:  Call Option Price (C)           ∈ ℝ_{>0}
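A minimal NumPy sketch of drawing parameters uniformly over the ranges in Table 5.1 (not the project's C++ generator; only a few parameters shown):

import numpy as np

rng = np.random.default_rng(seed=42)
n = 5_000_000
S   = rng.uniform(20.0, 80.0, n)      # spot price
K   = rng.uniform(40.0, 55.0, n)      # strike price
r   = rng.uniform(0.03, 0.045, n)     # risk-free rate
rho = rng.uniform(-0.95, 0.95, n)     # correlation
# ... the remaining parameters follow the same pattern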


5.1.2 Heston Implied Volatility Data

We generate the Heston implied volatility data with the theory explained in Chapter 3. For the ANN application in implied volatility approximation, we use the image-based implicit method, in which the ANN output layer dimension is 100, with each output representing one pixel on the implied volatility surface.

Using this method, we need 100 times more data in order to have the same number of distinct rows as the normal point-wise method (e.g. Heston call option pricing). Due to hardware constraints, we decided to generate 10,000,000 rows of uniformly distributed data, which translates to only 100,000 distinct rows.

We use 6 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ).

Note that strike price and maturity are not used as input features, as mentioned in Chapter 4.4. This is because the shape of one full implied volatility surface depends on the 6 model parameters above; the strike price and maturity correspond to one specific point on the surface. Nevertheless, we still require these two parameters to generate the implied volatility data. We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

In this case there will be 10 × 10 = 100 outputs, i.e. the implied volatility surface. See Table 5.2 for their respective ranges.

TABLE 5.2: Heston Model Implied Volatility Data

         Model Parameters                    Range
Input:   Spot Price (S)                      ∈ [0.2, 2.5]
         Instantaneous Volatility (ν)        ∈ [0.2, 0.6]
         Long-Term Volatility (θ)            ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)             ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)        ∈ [0.2, 0.5]
         Correlation (ρ)                     ∈ [−0.95, 0.95]
Output:  Implied Volatility Surface (σimp)   ∈ ℝ^{10×10}_{>0}

5.1.3 Rough Heston Call Option Data

Again, we follow the theory in Chapter 3 and implement it in C++ to generate call option prices under the rough Heston model.

Similarly, we only consider call options here and generate 5,000,000 rows of data. We use 9 model parameters as the input features: strike price (K), risk-free rate (r), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ),


correlation (ρ) and the Hurst parameter (H). There is only one output: the call option price. See Table 5.3 for their respective ranges.

TABLE 5.3: Rough Heston Model Call Option Data

         Model Parameters                Range
Input:   Strike Price (K)                ∈ [0.5, 5]
         Risk-free Rate (r)              ∈ [0.2, 0.5]
         Instantaneous Volatility (ν)    ∈ [0, 1.0]
         Maturity (T)                    ∈ [0.1, 3.0]
         Long-Term Volatility (θ)        ∈ [0.01, 0.1]
         Mean Reversion Rate (κ)         ∈ [1.0, 4.0]
         Volatility of Volatility (ξ)    ∈ [0.01, 0.5]
         Correlation (ρ)                 ∈ [−0.95, 0.95]
         Hurst Parameter (H)             ∈ [0.05, 0.1]
Output:  Call Option Price (C)           ∈ ℝ_{>0}

5.1.4 Rough Heston Implied Volatility Data

Finally, we generate the rough Heston implied volatility data with the theory explained in Chapter 3. As before, we use the image-based implicit method.

The data size is again 10,000,000 rows, which translates to 100,000 distinct rows. We use 7 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and the Hurst parameter (H). We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

As before, there will be 10 × 10 = 100 outputs, i.e. the rough Heston implied volatility surface. See Table 5.4 for their respective ranges.

TABLE 5.4: Rough Heston Model Implied Volatility Data

         Model Parameters                    Range
Input:   Spot Price (S)                      ∈ [0.2, 2.5]
         Instantaneous Volatility (ν)        ∈ [0.2, 0.6]
         Long-Term Volatility (θ)            ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)             ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)        ∈ [0.2, 0.5]
         Correlation (ρ)                     ∈ [−0.95, 0.95]
         Hurst Parameter (H)                 ∈ [0.05, 0.1]
Output:  Implied Volatility Surface (σimp)   ∈ ℝ^{10×10}_{>0}


5.2 Data Pre-processing

In this section we discuss some necessary data pre-processing procedures: data splitting for validation and evaluation (testing), data standardisation and normalisation, and data partitioning for K-fold cross validation.

5.2.1 Data Splitting: Train - Validate - Test

Generally, not 100% of the available data is used for training. We reserve some of the data for validation and for performance evaluation.

We split the data into three sets: 70% (train), 15% (validate) and 15% (test). The first 70% of the data is used for training, whilst 15% is used for validation during training. The last 15% is for evaluating the trained ANN's predictive accuracy.

The validation data set is used for tuning the hyperparameters of the ANN; this is important to help prevent over-fitting of the network. The test data set is not used during training. It is the unseen (out-of-sample) data used for evaluating the trained ANN's predictions.
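A minimal sketch of the 70/15/15 split with NumPy slicing (assuming `data` is an already-shuffled array of generated rows):

import numpy as np

n = len(data)
train = data[: int(0.70 * n)]               # 70% used for training
val   = data[int(0.70 * n): int(0.85 * n)]  # 15% used for validation during training
test  = data[int(0.85 * n):]                # 15% held out for final evaluation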

5.2.2 Data Standardisation and Normalisation

Before the data is fed into the ANN for training, it is essential to scale it to speed up convergence and improve performance.

The magnitude of each data feature is different, so it is essential to re-scale the data so that the ANN model will not misinterpret their relative importance based on magnitude. Typical scaling techniques include standardisation and normalisation.

Data standardisation is defined as

$$x_{scaled} = \frac{x - x_{mean}}{x_{std}} \qquad (5.1)$$

where

• $x$ is the data,

• $x_{mean}$ is the mean of that data feature,

• $x_{std}$ is the standard deviation of that data feature.

(Horvath et al., 2019) claims that this works especially well for quantities that are positively unbounded, such as implied volatility.

For the other model parameters, we apply data normalisation, which is defined as

$$x_{scaled} = \frac{2x - (x_{max} + x_{min})}{x_{max} - x_{min}} \in [-1, 1] \qquad (5.2)$$

where

• $x$ is the data,

• $x_{max}$ is the maximum value of that data feature,

• $x_{min}$ is the minimum value of that data feature.

After data scaling (standardisation or normalisation), the ANN is able to learn the true relationships between inputs and outputs without bias from differences in magnitude.
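A minimal NumPy sketch of Equations (5.1) and (5.2), applied column-wise to a feature array `X` (the statistics should be computed on the training set only):

import numpy as np

def standardise(x):
    # Equation (5.1)
    return (x - x.mean(axis=0)) / x.std(axis=0)

def normalise(x):
    # Equation (5.2), mapping each feature into [-1, 1]
    x_max, x_min = x.max(axis=0), x.min(axis=0)
    return (2 * x - (x_max + x_min)) / (x_max - x_min)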

5.2.3 Data Partitioning

In this section we explain how the data is partitioned for K-fold cross validation.

First, we shuffle the entire training data set. Then the data set is divided into K equal-sized portions. Each portion is used once as the test data set while the ANN is trained on the other K−1 portions; this is repeated K times, with each of the K portions used exactly once as the test data set. The average of the K results is the performance measure of the model.

FIGURE 5.1: How the data is partitioned for 5-fold cross validation. Adapted from (Chapter 2 Modeling Process: k-fold cross validation).
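A minimal sketch of this partitioning using scikit-learn's KFold (assuming NumPy arrays `X` and `y`; shuffling is done by the splitter itself):

from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # train on the K-1 folds, evaluate on the held-out fold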


Chapter 6

Neural Networks Architecture and Experimental Setup

We start this chapter by discussing the appropriate ANN architecture and experimental setup for our applications. We apply the ANN to four different applications: option pricing under the Heston model, Heston implied volatility surface approximation, option pricing under the rough Heston model, and rough Heston implied volatility surface approximation. We first apply the ANN to the Heston model as a concept validation and then proceed to the rough Heston model.

Subsequently, we discuss the relevant evaluation metrics and validation methods for our ANN models. Finally, we end this chapter with the implementation steps for the ANN in Keras.

6.1 Neural Networks Architecture

Configuring an ANN's architecture is a combination of art and science. It requires experience and systematic experimentation to decide the right parameters and architecture for a specific problem, and for a particular problem there might be multiple appropriate designs. To begin with, we use the network architecture of (Horvath et al., 2019) and adjust the hyperparameters where it performs poorly for our applications.


[Figure: network diagram with input neurons I1-I10, three hidden layers (neurons H1-H30 shown per layer) and a single output neuron O1.]

FIGURE 6.1: Typical ANN Architecture for Option Pricing Application


[Figure: network diagram with input neurons I1-I10, three hidden layers (neurons H1-H30 shown per layer) and output neurons O1-O100.]

FIGURE 6.2: Typical ANN Architecture for Implied Volatility Surface Approximation

6.1.1 ANN Architecture: Option Pricing under Heston Model

An ANN with 3 hidden layers is appropriate for this application. Each hidden layer has 50 neurons, with the exponential linear unit (ELU) as the activation function.

Initially, the rectified linear unit (ReLU) activation function was considered. It is the choice for many applications, as it can overcome the vanishing gradient problem whilst being less computationally expensive than other activation functions. However, it has some major drawbacks:

• The nth derivative of ReLU is not continuous for all n > 0.

• ReLU cannot produce negative outputs.

• For activation x < 0, ReLU has zero gradient.

From Theorem 4.2.2, we know that choosing an n-times differentiable activation function is necessary for computing the nth derivatives of the functional mapping. This is crucial for our application because we would be interested in


computing option Greeks using the same ANN if it shows promising results for option pricing. For risk management purposes, selecting an activation function for which option Greeks can be calculated is crucial.

The obvious alternative is the ELU (defined in Chapter 4.1.4). For x < 0 it has a smooth output that approaches −α, and for x > 0 its output is positively unbounded, which is suitable for our application, as the values of option price and implied volatility are also, in theory, positively unbounded.

Since ours is a regression problem, a linear activation function (defined in Chapter 4.1.5) is appropriate for the output layer, as its output is unbounded. We find that a batch size of 200 and 30 epochs yield good accuracy.

To summarise, the ANN model for Heston option pricing is (a Keras sketch follows the list):

• Input Layer: 10 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function
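A minimal Keras sketch of this architecture, in the same style as the Appendix C code:

import keras

input_layer  = keras.layers.Input(shape=(10,))                 # 10 model parameters
h1 = keras.layers.Dense(50, activation='elu')(input_layer)
h2 = keras.layers.Dense(50, activation='elu')(h1)
h3 = keras.layers.Dense(50, activation='elu')(h2)
output_layer = keras.layers.Dense(1, activation='linear')(h3)  # call option price

model = keras.models.Model(inputs=input_layer, outputs=output_layer)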

6.1.2 ANN Architecture: Implied Volatility Surface of Heston Model

For approximating the full implied volatility surface of the Heston model, we apply the image-based implicit method (see Chapter 4.4). This implies the output dimension is 100.

We use 3 hidden layers with 80 neurons each; the activation function for the hidden layers is the exponential linear unit (ELU). As before, we use a linear activation function for the output layer. We find that a batch size of 20 and 200 epochs yield good accuracy. Since the number of epochs is relatively high, we introduce an early stopping feature that halts training when the validation loss stops improving for 25 epochs.

To summarise, the ANN model for Heston implied volatility surface approximation is:

• Input Layer: 6 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function


6.1.3 ANN Architecture: Option Pricing under Rough Heston Model

The ANN model for rough Heston option pricing is:

• Input Layer: 9 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function

As before, we use a batch size of 200 and 30 epochs.

6.1.4 ANN Architecture: Implied Volatility Surface of Rough Heston Model

The ANN model for rough Heston implied volatility surface approximation is:

• Input Layer: 7 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function

We use a batch size of 20 and 200 epochs, with an early stopping feature that halts training when the validation loss stops improving for 25 epochs.

6.2 Performance Metrics and Validation Methods

We require some metrics to evaluate the performance of the ANN in terms of accuracy and speed.

For accuracy, we introduced the MSE and MAE in Chapter 4.1.4. It is also common to use the coefficient of determination (R²) to quantify the performance of a regression model: a statistical measure of the proportion of the variance in the dependent variable that is explained by the independent variables. The formula is

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_{i,actual} - y_{i,predict})^2}{\sum_{i=1}^{n}(y_{i,actual} - \bar{y}_{actual})^2} \qquad (6.1)$$


We also suggest a user-defined accuracy metric, the Maximum Absolute Error, defined as

$$\text{Max Abs Error} = \max_{i}\left(|y_{i,actual} - y_{i,predict}|\right) \qquad (6.2)$$

This metric measures how large the error could be in the worst-case scenario, which is especially important from a risk management perspective.
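A minimal NumPy sketch of Equations (6.1) and (6.2) (assuming 1-D arrays `y_actual` and `y_predict`):

import numpy as np

def r_squared(y_actual, y_predict):
    # Equation (6.1)
    ss_res = np.sum((y_actual - y_predict) ** 2)
    ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def max_abs_error(y_actual, y_predict):
    # Equation (6.2): worst-case prediction error
    return np.max(np.abs(y_actual - y_predict))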

As for the run-time performance of the ANN, we simply use Python's built-in timer to quantify this property.
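A minimal sketch of the timing approach (assuming a trained `model` and test inputs `X_test`):

import time

t0 = time.time()
predictions = model.predict(X_test)
print("Prediction time:", time.time() - t0, "s")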

In Chapter 5.2.3 we explained the data partitioning process. The motivation for partitioning the data is K-fold cross validation. Cross validation is a statistical technique to assess how well a predictive model generalises to an independent data set, and it is a good tool for identifying over-fitting.

As explained earlier, the process involves training and testing different permutations of the partitioned data sets multiple times. The indication of a well-generalised model is that the error magnitudes in each round are consistent with each other and with their average. We select K = 5 for our applications.


6.3 Implementation of ANN in Keras

In this section we outline the standard ANN computation process using Keras.

1. Construct the ANN model by specifying the number of input neurons, the number of hidden neurons, the number of hidden layers, the number of output neurons, and the activation functions for each layer. Print the model summary to check for errors.

2. Configure the settings of the model using the compile function. We select the MSE as the loss function and specify MAE, R² and our user-defined maximum absolute error as the metrics. For all of our applications we choose ADAM as the optimisation algorithm. To prevent over-fitting, we use the early stopping function, especially when the number of epochs is high. It is important to note that the validation loss (not the training loss) should be used as the early stopping criterion: training stops early if there is no reduction in validation loss.

3. Start training the ANN model. It is important to specify the validation split argument here for splitting the training data into a train set and a validation set. Note that the test set should not be used as the validation set; it should be reserved for evaluating the ANN's predictions after training.

4. When the training is complete, generate the plot of MSE (loss function) versus the number of epochs to visualise the training process. The MSE should decline gradually and then level off as the number of epochs increases. See Chapter 4.3 for potential training issues and remedies.

5. Apply 5-fold cross validation to check for over-fitting. Evaluate the trained model using the evaluate function.

6. Generate predictions. Visualise them by plotting predicted values versus actual values on the same graph.

See Appendix C for a Python code example.


6.4 Summary of Neural Networks Architecture

TABLE 6.1: Summary of Neural Networks Architecture for Each Application

ANN Model                            | Application                              | Input Layer | Hidden Layers         | Output Layer | Batch Size | Epochs
Heston Option Pricing ANN            | Heston Call Option Pricing               | 10 neurons  | 3 layers × 50 neurons | 1 neuron     | 200        | 30
Heston Implied Volatility ANN        | Heston Implied Volatility Surface        | 6 neurons   | 3 layers × 80 neurons | 100 neurons  | 20         | 200
Rough Heston Option Pricing ANN      | Rough Heston Call Option Pricing         | 9 neurons   | 3 layers × 50 neurons | 1 neuron     | 200        | 30
Rough Heston Implied Volatility ANN  | Rough Heston Implied Volatility Surface  | 7 neurons   | 3 layers × 80 neurons | 100 neurons  | 20         | 200


Chapter 7

Results and Discussions

In this chapter we present our results and discuss our findings.

7.1 Heston Option Pricing ANN Results

Figure 7.1 shows the learning curves (MSE loss) of the Heston Option Pricing ANN. As the ANN is trained, both the training loss and validation loss curves decline until they converge. It took less than 30 epochs for the ANN to reach a satisfactory level of accuracy, and the small gap between the training and validation loss curves indicates the model is well trained.

FIGURE 7.1: Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN

Table 7.1 summarises the error metrics of the Heston Option Pricing ANN's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 2.00 × 10⁻⁶, 9.58 × 10⁻⁴, 6.50 × 10⁻³ and 0.999, respectively. The low error values and high R² imply the ANN model generalises well and gives highly accurate predictions for option prices


under the Heston model. Figure 7.2 attests to its high accuracy: almost all of the predictions lie on the perfect-prediction line (red line).

TABLE 7.1: Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
2.00 × 10⁻⁶   9.58 × 10⁻⁴   6.50 × 10⁻³     0.999

FIGURE 7.2: Plot of Predicted vs Actual Values for Heston Option Pricing ANN

TABLE 7.2: 5-fold Cross Validation Results of Heston Option Pricing ANN

          MSE           MAE           Max Abs Error
Fold 1    5.78 × 10⁻⁷   5.45 × 10⁻⁴   2.04 × 10⁻³
Fold 2    9.90 × 10⁻⁷   7.69 × 10⁻⁴   2.40 × 10⁻³
Fold 3    7.15 × 10⁻⁷   6.21 × 10⁻⁴   2.20 × 10⁻³
Fold 4    1.24 × 10⁻⁶   8.67 × 10⁻⁴   2.53 × 10⁻³
Fold 5    6.54 × 10⁻⁷   5.79 × 10⁻⁴   2.21 × 10⁻³
Average   8.36 × 10⁻⁷   6.76 × 10⁻⁴   2.27 × 10⁻³

Table 7.2 shows the results from 5-fold cross validation, which again confirm that the ANN model gives accurate predictions and generalises well to unseen data: the values of each fold are consistent with each other and with their average.


7.2 Heston Implied Volatility ANN Results

Figure 7.3 shows the learning curves of the Heston Implied Volatility ANN; again we use the MSE as the loss function. The validation loss curve experiences some fluctuations during training. We experimented with various ANN setups, and this is the least fluctuating learning curve we could achieve. Despite the fluctuations, the overall value of the validation loss declines to a satisfactorily low level and the gap between validation loss and training loss is negligible. Due to the comparatively more complex ANN architecture, it took about 200 epochs for the ANN to reach a satisfactory level of accuracy.

FIGURE 7.3: Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN

Table 7.3 summarises the error metrics of the Heston Implied Volatility ANN's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 6.24 × 10⁻⁴, 1.27 × 10⁻², 3.71 × 10⁻¹ and 0.999, respectively. The low error values and high R² imply the ANN model generalises well and gives accurate predictions for the implied volatility surface under the Heston model. Figure 7.4 shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.3: Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
6.24 × 10⁻⁴   1.27 × 10⁻²   3.71 × 10⁻¹     0.999


FIGURE 7.4: Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.5: Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.5 shows the MAE values of the Heston implied volatility ANN predictions across the entire implied volatility surface. The largest MAE value is 0.0045, and a large area of the surface has MAE values lower than 0.0030. It is worth noting that the accuracy is relatively poor when the strike price is at the extremes and the maturity is small; however, the ANN gives good predictions overall for the whole surface.

The 5-fold cross validation results for the Heston implied volatility ANN model, summarised in Table 7.4, show consistent values, indicating the model is well-trained and able to generalise to unseen test data.

TABLE 7.4: 5-fold Cross Validation Results of Heston Implied Volatility ANN

          MSE           MAE           Max Abs Error
Fold 1    8.15 × 10⁻⁴   1.13 × 10⁻²   3.88 × 10⁻¹
Fold 2    4.74 × 10⁻⁴   1.08 × 10⁻²   3.13 × 10⁻¹
Fold 3    1.37 × 10⁻³   1.74 × 10⁻²   4.46 × 10⁻¹
Fold 4    3.38 × 10⁻⁴   8.61 × 10⁻³   2.19 × 10⁻¹
Fold 5    4.60 × 10⁻⁴   1.02 × 10⁻²   2.67 × 10⁻¹
Average   6.92 × 10⁻⁴   1.12 × 10⁻²   3.27 × 10⁻¹

7.3 Rough Heston Option Pricing ANN Results

Figure 7.6 shows the learning curves of the Rough Heston Option Pricing ANN; as before, MSE is used as the loss function. As training progresses, both the training and validation loss curves decline until they converge. Since this is a relatively simple


model, it took less than 30 epochs of training to reach a good level of accuracy. The small discrepancy between the training loss and validation loss indicates the ANN model is well-trained.

FIGURE 7.6: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN

Table 7.5 summarises the error metrics of the Rough Heston Option Pricing ANN's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 9.00 × 10⁻⁶, 2.28 × 10⁻³, 1.76 × 10⁻¹ and 0.999, respectively. These metrics imply the ANN model generalises well and gives highly accurate predictions for option prices under the rough Heston model. Figure 7.7 again attests to its high accuracy: most of the predictions lie on the perfect-prediction line (red line).

TABLE 7.5: Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
9.00 × 10⁻⁶   2.28 × 10⁻³   1.76 × 10⁻¹     0.999


FIGURE 7.7: Plot of Predicted vs Actual Values for Rough Heston Option Pricing ANN

TABLE 7.6: 5-fold Cross Validation Results of Rough Heston Option Pricing ANN

          MSE           MAE           Max Abs Error
Fold 1    3.79 × 10⁻⁶   1.35 × 10⁻³   5.99 × 10⁻³
Fold 2    2.33 × 10⁻⁶   9.96 × 10⁻⁴   4.89 × 10⁻³
Fold 3    2.06 × 10⁻⁶   9.88 × 10⁻⁴   4.41 × 10⁻³
Fold 4    2.16 × 10⁻⁶   1.08 × 10⁻³   4.07 × 10⁻³
Fold 5    1.84 × 10⁻⁶   1.01 × 10⁻³   3.74 × 10⁻³
Average   2.44 × 10⁻⁶   1.08 × 10⁻³   4.62 × 10⁻³

The highly consistent values from 5-fold cross validation again confirm that the ANN model is able to generalise to unseen data.

7.4 Rough Heston Implied Volatility ANN Results

Figure 7.8 shows the learning curves of the Rough Heston Implied Volatility ANN. The validation loss curve experiences some fluctuations during training; again, we select the least fluctuating setup. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level and the discrepancy between validation and training loss is negligible. Training was initially set to 200 epochs, but it stopped early at around 85 epochs since the validation loss had not improved for 25 epochs. It achieves good accuracy nonetheless.


FIGURE 7.8: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN

Table 7.7 summarises the error metrics of the Rough Heston Implied Volatility ANN's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 5.66 × 10⁻⁴, 1.08 × 10⁻², 3.49 × 10⁻¹ and 0.999, respectively. These metrics show the model produces accurate predictions on unseen data. This is again confirmed by Figure 7.9, which shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.7: Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
5.66 × 10⁻⁴   1.08 × 10⁻²   3.49 × 10⁻¹     0.999


FIGURE 7.9: Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.10: Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.10 shows the MAE values of the Rough Heston Implied Volatility ANN predictions across the entire implied volatility surface. The largest MAE value is below 0.0030, and a large area of the surface has MAE values lower than 0.0015. It is worth noting that the accuracy is relatively poor when the strike price and maturity values are small; however, the ANN gives good predictions overall for the whole surface. In fact, it is more accurate than the Heston implied volatility ANN model despite having one extra input and a similar architecture. We attribute this entirely to the stochastic nature of the optimiser.

The 5-fold cross validation results for the rough Heston implied volatility ANN model, summarised in Table 7.8, show consistent values, just like the previous three cases, indicating the model is well-trained and able to generalise to unseen test data.

TABLE 7.8: 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN

          MSE           MAE           Max Abs Error
Fold 1    8.15 × 10⁻⁴   1.13 × 10⁻²   3.88 × 10⁻¹
Fold 2    4.74 × 10⁻⁴   1.08 × 10⁻²   3.13 × 10⁻¹
Fold 3    1.37 × 10⁻³   1.74 × 10⁻²   4.46 × 10⁻¹
Fold 4    3.38 × 10⁻⁴   8.61 × 10⁻³   2.19 × 10⁻¹
Fold 5    4.60 × 10⁻⁴   1.02 × 10⁻²   2.67 × 10⁻¹
Average   6.92 × 10⁻⁴   1.12 × 10⁻²   3.27 × 10⁻¹


7.5 Run-time Performance

In this section we present the training time and the execution speed of the ANN versus the traditional methods. The speed tests are run on a machine with an Intel i5-7200U chip (up to 3.10 GHz) and 8 GB of RAM.

Table 7.9 summarises the training time required for each application. Although the training time per epoch for the option pricing applications is higher than that for the implied volatility applications, the resulting total training time is similar for both, because the number of epochs for training the implied volatility ANN models is higher.

We also emphasise that the number of distinct rows of data available for implied volatility ANN training is smaller; thus the training time is similar to that of option pricing ANN training despite the more complex ANN architecture.

TABLE 7.9: Training Time

Application                  Training Time per Epoch   Total Training Time
Heston Option Pricing        30 s                      900 s
Heston Implied Volatility    5 s                       1000 s
rHeston Option Pricing       30 s                      900 s
rHeston Implied Volatility   5 s                       1000 s

TABLE 7.10: Execution Time for Predictions using ANN and Traditional Methods

Application                  Traditional Methods   ANN
Heston Option Pricing        8.84 ms               57.9 ms
Heston Implied Volatility    2.31 ms               43.7 ms
rHeston Option Pricing       6.52 ms               52.6 ms
rHeston Implied Volatility   2.87 ms               48.3 ms

Table 7.10 summarises the execution time for generating predictions using the ANN and the traditional methods. Before the experiment, we expected the ANN to generate predictions faster. However, the results show that the ANN is actually up to 7 times slower than the traditional methods for option pricing (both Heston and rough Heston) and up to 18 times slower for implied volatility surface approximation (both Heston and rough Heston). This shows the traditional approximation methods are more efficient.


Chapter 8

Conclusions and Outlook

To summarise, the results show that the ANN has the ability to approximate option prices and implied volatility under the Heston and rough Heston models to a high degree of accuracy. However, the efficiency of the ANN is not as good as that of the traditional methods, and hence we conclude that such an approach may be considered an ineffective alternative. There are more efficient methods, such as the numerical integration of the FFT, for option price computation. The traditional method we used for approximating implied volatility does not even require any numerical schemes, hence its efficiency.

As for future projects, it would be interesting to investigate the rough Bergomi model and applications with exotic options such as lookback and digital options. Moreover, it would also be worthwhile to apply gradient-free optimisers for ANN training; some interesting examples include differential evolution, dual annealing and simplicial homology global optimisers.


Appendix A

pybind11: A step-by-step tutorial with examples

This tutorial assumes basic knowledge of C++ and Python.

Sources: Create a C++ extension for Python; Using the C++ Eigen library to calculate matrix inverse and determinant; pybind11 Documentation; Eigen: The Matrix Class.

A.1 Software Prerequisites

• Visual Studio 2017 or later, with both the Desktop Development with C++ and Python Development workloads installed with default options.

• In the Python Development workload, also select the box on the right for Python native development tools. This option sets up most of the configuration required. (This option also includes the C++ workload automatically.)

Source: Create a C++ extension for Python


A.2 Create a Python Project

1. To create a Python project in Visual Studio, select File > New > Project. Search for Python, select the Python Application template, give it a suitable name and location, and select OK.

2. pybind11 requires that you use a 32-bit Python interpreter (Python 3.6 or above recommended). In the Solution Explorer window of Visual Studio, expand the project node, then expand the Python Environments node. If you don't see a 32-bit environment as the default (either in bold or labeled with global default), then right-click the Python Environments node and select Add Environment, or select Add Environment from the environment drop-down in the Python toolbar.

Once in the Add Environment dialog box, select the Existing environment tab, then select the 32-bit interpreter from the Environment drop-down list.

If you don't have a 32-bit interpreter installed, you can install standard Python interpreters from the Add Environment dialog. Select the Add Environment command in the Python Environments window or the Python toolbar, select the Python installation tab, indicate which interpreters to install, and select Install.


If you already added an environment other than the global default to a project, you may need to activate a newly added environment. Right-click that environment under the Python Environments node and select Activate Environment. To remove an environment from the project, select Remove.


A.3 Create a C++ Project

1. A Visual Studio solution can contain both Python and C++ projects together. To do so, right-click the solution in Solution Explorer and select Add > New Project.

2. Search on C++, select Empty project, specify the name test, and select OK.

3. Create a C++ file in the new project by right-clicking the Source Files node, then select Add > New Item, select C++ File, name it main.cpp, and select OK.

4. Right-click the C++ project in Solution Explorer and select Properties.

5. At the top of the Property Pages dialog that appears, set Configuration to All Configurations and Platform to Win32.

6. Set the specific properties as described in the following table, then select OK.

7. Right-click the C++ project and select Build to test your configurations (both Debug and Release). The .pyd files are located in the solution folder under Debug and Release, not the C++ project folder itself.

8. Add the following code to the C++ project's main.cpp file:

#include <iostream>

int main() {
    std::cout << "Hello World from C++" << std::endl;
    return 0;
}

9. Build the C++ project again to confirm the code is working.

A.4 Convert the C++ Project to an Extension for Python

To make the C++ DLL into an extension for Python, first modify the exported methods to interact with Python types. Then add a function that exports the module (the main function).

1. Install pybind11 using pip:

pip install pybind11

or

py -m pip install pybind11

2. At the top of main.cpp, include pybind11.h:

#include <pybind11/pybind11.h>


3. At the bottom of main.cpp, use the PYBIND11_MODULE macro to define the entry point to the C++ function:

namespace py = pybind11;

PYBIND11_MODULE(test, m) {
    m.def("main_func", &main, R"pbdoc(
        pybind11 simple example
    )pbdoc");

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

4. Set the target configuration to Release and build the C++ project to verify your code.

The C++ module may fail to compile for the following reasons:

• Unable to locate Python.h (E1696: cannot open source file "Python.h" and/or C1083: Cannot open include file: "Python.h": No such file or directory): verify that the path in C/C++ > General > Additional Include Directories in the project properties points to your Python installation's include folder. See the table in Step 6.

• Unable to locate Python libraries: verify that the path in Linker > General > Additional Library Directories in the project properties points to the Python installation's libs folder. See the table in Step 6.

• Linker errors related to target architecture: change the C++ target's project architecture to match that of your Python installation. For example, if you're targeting x64 with the C++ project but your Python installation is x86, change the C++ project to target x86.

A.5 Make the DLL Available to Python

1. In Solution Explorer, right-click the References node in the Python project, and then select Add Reference. In the dialog that appears, select the Projects tab, select the test project, and then select OK.


2. Run the Visual Studio installer, select Modify, then select Individual Components > Compilers, build tools, and runtimes > Visual C++ 2015.3 v140 toolset. This step is necessary because Python (for Windows) is itself built with Visual Studio 2015 (version 14.0) and expects those tools to be available when building an extension through the method described here. (Note that you may need to install a 32-bit version of Python and target the DLL to Win32, not x64.)


3. Create a file named setup.py in the C++ project by right-clicking the project and selecting Add > New Item. Then select C++ File (.cpp), name the file setup.py, and select OK (naming the file with the .py extension makes Visual Studio recognize it as Python despite using the C++ file template).

Then paste the following code into the setup.py file:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'pybind11_test',
    sources=['main.cpp'],
    # add pybind11 folder and Python include folder
    include_dirs=[r'pybind11\include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='pybind11_test',
    version='1.0',
    description='Simple pybind11 example',
    license='MIT',
    ext_modules=[sfc_module],
)

4. The setup.py code instructs Python to build the extension using the Visual Studio 2015 C++ toolset when used from the command line.

Open an elevated command prompt, navigate to the folder that contains setup.py, and enter the following command:

python setup.py install

A.6 Call the DLL from Python

After the DLL is made available to Python, we can call test.main_func() from Python code.

1. Add the following lines in your .py file to call methods exported from the DLL:


import test

if __name__ == "__main__":
    test.main_func()

2. Run the Python program (Debug > Start without Debugging). The output should show the Hello World message printed from C++.

A.7 Example: Store C++ Eigen Matrix as NumPy Arrays

This example requires the use of Eigen, a C++ template library. Due to its popularity, pybind11 provides transparent conversion and limited mapping support between Eigen and scientific Python linear algebra data types.

1. If you have not already, visit http://eigen.tuxfamily.org/index.php?title=Main_Page and click on zip to download the zip file.

Once the download is complete, decompress the zip file in a suitable location and note down the file path containing the Eigen folders.


2. In the Solution Explorer, right-click the C++ project and select Properties. Navigate to the C/C++ > General tab, add the Eigen path to the Additional Include Directories property, then select OK.

3. Right-click the C++ project and select Build to test your configurations (both Debug and Release).

4. Replace the code in the main.cpp file with the following code:

#include <iostream>

// pybind11
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>

// Eigen
#include <Eigen/Dense>

namespace py = pybind11;

Eigen::Matrix3f example_mat(const int a, const int b, const int c) {
    Eigen::Matrix3f mat;
    mat << 5 * a, 1 * b, 2 * c,
           2 * a, 4 * b, 2 * c,
           1 * a, 3 * b, 3 * c;
    return mat;
}

Eigen::MatrixXd inv(Eigen::MatrixXd xs) {
    return xs.inverse();
}

double det(Eigen::MatrixXd xs) {
    return xs.determinant();
}

PYBIND11_MODULE(test, m) {
    m.def("example_mat", &example_mat);
    m.def("inv", &inv);
    m.def("det", &det);

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

Note that Eigen arrays are automatically converted to NumPy arrays simply by including the pybind11/eigen.h header (see the includes at the top of the listing). A C++ function that has an Eigen return type can be used directly as a NumPy data type in Python. Build the C++ project to check that the code works correctly.

5. Include the Eigen library's directory. Replace the code in setup.py with the following code:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'pybind11_test',
    sources=['main.cpp'],
    # add pybind11 folder, Python include folder and Eigen folder
    include_dirs=[r'pybind11\include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                  r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='pybind11_test',
    version='1.0',
    description='Simple pybind11 example',
    license='MIT',
    ext_modules=[sfc_module],
)

As before, in an elevated command prompt, navigate to the folder that contains setup.py and enter the following command:

python setup.py install

6. Then paste the following code in the Python file:

import test
import numpy as np

if __name__ == "__main__":
    A = test.example_mat(1, 3, 1)
    print("Matrix A is:\n", A)
    print("The inverse of A is:\n", test.inv(A))
    print("The determinant of A is:", test.det(A))


The output should display matrix A, its inverse and its determinant.

A.8 Example: Store Data Generated by Analytical Heston Model as NumPy Arrays

Only the relevant parts of the code are displayed here. To request the code for the entire project, contact the author at cko22@bath.edu.

This example demonstrates how to use pybind11 for:

• a C++ project with multiple files;

• storing a C++ Eigen matrix as NumPy arrays.

1. For a multi-file C++ project, we need to include all the source files in setup.py. This example also uses the Eigen library, so its directory is included as well. Paste the following code in setup.py:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'Data_Generation',
    sources=[r'module.cpp', r'Data.cpp', r'Heston.cpp', r'IntegrationScheme.cpp',
             r'IntegratorImp.cpp', r'NumIntegrator.cpp', r'pdetfunction.cpp',
             r'pfunction.cpp', r'Range.cpp'],
    include_dirs=[r'pybind11\include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                  r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='Data_Generation',
    version='1.0',
    description='Python package with Data_Generation C++ extension (PyBind11)',
    license='MIT',
    ext_modules=[sfc_module],
)

Then go to the folder that contains setup.py in a command prompt and enter the following command:

python setup.py install

2. The code in module.cpp is:

// For the analytic solution in the case of the Heston model
#include "Heston.hpp"
#include <ctime>
#include "Data.hpp"
#include <future>

// Inputting and Outputting
#include <iostream>
#include <vector>

// pybind11
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>
#include <pybind11/stl_bind.h>

// Eigen
#include <C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7\Eigen\Dense>
#include <Eigen/Dense>

namespace py = pybind11;

struct CPPData {
    // Input Data
    std::vector<double> spot_price;
    std::vector<double> strike_price;
    std::vector<double> risk_free_rate;
    std::vector<double> dividend_yield;
    std::vector<double> initial_vol;
    std::vector<double> maturity_time;
    std::vector<double> long_term_vol;
    std::vector<double> mean_reversion_rate;
    std::vector<double> vol_vol;
    std::vector<double> price_vol_corr;

    // Output Data
    std::vector<double> option_price;
};

// ... do something ...

Eigen::MatrixXd module() {
    CPPData cppdata;
    simulation(cppdata);

    // Convert each std::vector to an Eigen vector first
    // Input Data
    Map<Eigen::VectorXd> eigen_spot_price(cppdata.spot_price.data(), cppdata.spot_price.size());
    Map<Eigen::VectorXd> eigen_strike_price(cppdata.strike_price.data(), cppdata.strike_price.size());
    Map<Eigen::VectorXd> eigen_risk_free_rate(cppdata.risk_free_rate.data(), cppdata.risk_free_rate.size());
    Map<Eigen::VectorXd> eigen_dividend_yield(cppdata.dividend_yield.data(), cppdata.dividend_yield.size());
    Map<Eigen::VectorXd> eigen_initial_vol(cppdata.initial_vol.data(), cppdata.initial_vol.size());
    Map<Eigen::VectorXd> eigen_maturity_time(cppdata.maturity_time.data(), cppdata.maturity_time.size());
    Map<Eigen::VectorXd> eigen_long_term_vol(cppdata.long_term_vol.data(), cppdata.long_term_vol.size());
    Map<Eigen::VectorXd> eigen_mean_reversion_rate(cppdata.mean_reversion_rate.data(), cppdata.mean_reversion_rate.size());
    Map<Eigen::VectorXd> eigen_vol_vol(cppdata.vol_vol.data(), cppdata.vol_vol.size());
    Map<Eigen::VectorXd> eigen_price_vol_corr(cppdata.price_vol_corr.data(), cppdata.price_vol_corr.size());
    // Output Data
    Map<Eigen::VectorXd> eigen_option_price(cppdata.option_price.data(), cppdata.option_price.size());

    const int cols = 11;       // No. of columns
    const int rows = SIZE * 4; // 4 asynchronous batches
    // Define a matrix that stores the training data to be used in Python
    MatrixXd training(rows, cols);
    // Initialise each column using the vectors
    training.col(0) = eigen_spot_price;
    training.col(1) = eigen_strike_price;
    training.col(2) = eigen_risk_free_rate;
    training.col(3) = eigen_dividend_yield;
    training.col(4) = eigen_initial_vol;
    training.col(5) = eigen_maturity_time;
    training.col(6) = eigen_long_term_vol;
    training.col(7) = eigen_mean_reversion_rate;
    training.col(8) = eigen_vol_vol;
    training.col(9) = eigen_price_vol_corr;
    training.col(10) = eigen_option_price;

    std::cout << training << std::endl;

    return training;
}

PYBIND11_MODULE(Data_Generation, m) {

    m.def("module", &module);

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

Note that the module function's return type is an Eigen matrix; this can then be used directly as a NumPy data type in Python.

3. The C++ function can then be called from Python as shown:

import Data_Generation as dg
import mysql.connector
import pandas as pd
import sqlalchemy as db
import time

# This script calls the analytical Heston option pricer in C++ to
# simulate 5,000,000 option prices and store them in a MySQL database

# ... do something ...

if __name__ == "__main__":
    t0 = time.time()

    # A new table (optional)
    SQL_clean_table()

    target_size = 5000000   # Final table size
    batch_size = 500000     # Size generated each iteration

    # Get the number of rows existing in the table
    r = SQL_setup(batch_size)

    # Get the number of iterations
    iter = int(target_size / batch_size) - int(r / batch_size)
    print("Iteration(s) required: ", iter)
    count = 0

    print("Now Run C++ code: ")
    for i in range(iter):
        count = count + 1
        print("----------------------------------------------------")
        print("Current iteration: ", count)
        cpp = dg.module()   # store data from C++

        # ... do something ...

        r1 = SQL_setup(batch_size)
        print("No. of rows in SQL Classic Heston Table Now: ", r1)
        print("\n")

    t1 = time.time()
    # Calculate the total time for the entire program
    total = t1 - t0
    print("THE ENTIRE PROGRAM TAKES", round(total), "s TO COMPLETE")


Appendix B

Asymptotic Expansions for General Stochastic Volatility Models

B.1 Asymptotic Expansions

We present the asymptotic expansions for general stochastic volatility models by (Lorig et al., 2019). Consider a strictly positive spot price S whose risk-neutral dynamics are given by

$$S = e^X \qquad (B.1)$$
$$dX = -\tfrac{1}{2}\sigma^2(t, X, Y)\,dt + \sigma(t, X, Y)\,dW_1, \qquad X(0) = x \in \mathbb{R} \qquad (B.2)$$
$$dY = f(t, X, Y)\,dt + \beta(t, X, Y)\,dW_2, \qquad Y(0) = y \in \mathbb{R} \qquad (B.3)$$
$$\langle dW_1, dW_2 \rangle = \rho(t, X, Y)\,dt, \qquad |\rho| < 1 \qquad (B.4)$$

The drift of X is selected to be $-\tfrac{1}{2}\sigma^2$ so that $S = e^X$ is a martingale.

Let C(t) be the time-t value of a European call option maturing at time T > t with payoff function P(X(T)). To value a European-style option using risk-neutral pricing, we must compute functions of the form

$$u(t, x, y) = \mathbb{E}\left[P(X(T)) \mid X(t) = x, Y(t) = y\right]$$

It is well known that the function u satisfies the Kolmogorov backward equation (Lorig et al., 2019)

$$(\partial_t + \mathcal{A}(t))u(t, x, y) = 0, \qquad u(T, x, y) = P(x) \qquad (B.5)$$

where the operator A(t) is given explicitly by

$$\mathcal{A}(t) = a(t, x, y)(\partial_x^2 - \partial_x) + f(t, x, y)\partial_y + b(t, x, y)\partial_y^2 + c(t, x, y)\partial_x\partial_y \qquad (B.6)$$

and where the functions a, b and c are defined as

$$a(t, x, y) = \frac{1}{2}\sigma^2(t, x, y), \qquad b(t, x, y) = \frac{1}{2}\beta^2(t, x, y), \qquad c(t, x, y) = \rho(t, x, y)\,\sigma(t, x, y)\,\beta(t, x, y)$$


Appendix C

Deep Neural Networks Library in Python: Keras

We show the code snippet for computing the ANN in Keras: the Python code for defining the ANN architecture for the rough Heston implied volatility approximation.

# Define the neural network architecture
import keras
from keras import backend as K
keras.backend.set_floatx('float64')
from keras.callbacks import EarlyStopping

# Construct the model
input_layer = keras.layers.Input(shape=(n,))

hidden_layer1 = keras.layers.Dense(80, activation='elu')(input_layer)
hidden_layer2 = keras.layers.Dense(80, activation='elu')(hidden_layer1)
hidden_layer3 = keras.layers.Dense(80, activation='elu')(hidden_layer2)

output_layer = keras.layers.Dense(100, activation='linear')(hidden_layer3)

model = keras.models.Model(inputs=input_layer, outputs=output_layer)

# Summarise layers
model.summary()

# User-defined metrics: R2 and Max Abs Error
# R2
def coeff_determination(y_true, y_pred):
    SS_res = K.sum(K.square(y_true - y_pred))
    SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return (1 - SS_res / (SS_tot + K.epsilon()))

# Max Error
def max_error(y_true, y_pred):
    return (K.max(abs(y_true - y_pred)))

earlystop = EarlyStopping(monitor="val_loss",
                          min_delta=0,
                          mode="min",
                          verbose=1,
                          patience=25)

# Prepare model for training
opt = keras.optimizers.Adam(learning_rate=0.005)
# NOTE: compile() is passed the 'adam' string, so Keras uses a default Adam
# instance rather than the 'opt' object defined above
model.compile(loss='mse', optimizer='adam',
              metrics=['mae', coeff_determination, max_error])
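The snippet above compiles the model but stops short of the training call. A minimal sketch of how training might be launched with the early-stopping callback is shown below; the array names (X_train, y_train, X_val, y_val), epoch count and batch size are illustrative assumptions, not the exact settings used in the thesis.

# Minimal training sketch (array names, epochs and batch size are illustrative)
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=200,
                    batch_size=1024,
                    callbacks=[earlystop],
                    verbose=1)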


Bibliography

Alos, Elisa et al. (June 2017). "Exponentiation of Conditional Expectations Under Stochastic Volatility". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2983180

Andersen, Leif et al. (Jan 2007). "Moment Explosions in Stochastic Volatility Models". In: Finance and Stochastics 11, pp. 29-50. DOI: 10.1007/s00780-006-0011-7

Atkinson, Colin et al. (Jan 2011). "Rational Solutions for the Time-Fractional Diffusion Equation". URL: https://www.researchgate.net/publication/220222966_Rational_Solutions_for_the_Time-Fractional_Diffusion_Equation

Bayer, Christian et al. (Oct 2018). "Deep calibration of rough stochastic volatility models". URL: https://arxiv.org/abs/1810.03399

Black, Fischer et al. (May 1973). "The Pricing of Options and Corporate Liabilities". URL: https://www.cs.princeton.edu/courses/archive/fall09/cos323/papers/black_scholes73.pdf

Carr, Peter and Dilip B. Madan (Mar 1999). "Option valuation using the fast Fourier transform". In: Journal of Computational Finance. URL: https://www.researchgate.net/publication/2519144_Option_Valuation_Using_the_Fast_Fourier_Transform

Chapter 2 Modeling Process: k-fold cross validation. https://bradleyboehmke.github.io/HOML/process.html. Accessed: 2020-08-22

Comte, Fabienne et al. (Oct 1998). "Long Memory In Continuous-Time Stochastic Volatility Models". In: Mathematical Finance 8, pp. 291-323. URL: https://zulfahmed.files.wordpress.com/2015/10/comterenault19981.pdf

Cox, John C. et al. (Jan 1985). "A Theory of the Term Structure of Interest Rates". URL: http://pages.stern.nyu.edu/~dbackus/BCZ/discrete_time/CIR_Econometrica_85.pdf

Create a C++ extension for Python. https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019. Accessed: 2020-07-21

Duffy, Daniel J. (Jan 2018). Financial Instrument Pricing Using C++, Second Edition. John Wiley & Sons. ISBN: 978-0-470-97119-2

Duffy, Daniel J. et al. (2012). Monte Carlo Frameworks: Building customisable high-performance C++ applications. John Wiley & Sons, Inc. ISBN: 9780470060698

Dupire, Bruno (1994). "Pricing with a Smile". In: Risk Magazine, pp. 18-20

Eigen: The Matrix Class. https://eigen.tuxfamily.org/dox/group__TutorialMatrixClass.html. Accessed: 2020-07-22

Eldan, Ronen et al. (May 2016). "The Power of Depth for Feedforward Neural Networks". URL: https://arxiv.org/abs/1512.03965

Euch, Omar El et al. (Sept 2016). "The characteristic function of rough Heston models". URL: https://arxiv.org/pdf/1609.02108.pdf

— (Mar 2019). "Roughening Heston". URL: https://ssrn.com/abstract=3116887

Fouque, Jean-Pierre et al. (Aug 2012). "Second Order Multiscale Stochastic Volatility Asymptotics: Stochastic Terminal Layer Analysis & Calibration". URL: https://arxiv.org/abs/1208.5802

Gatheral, Jim (2006). The volatility surface: a practitioner's guide. John Wiley & Sons, Inc. ISBN: 978-0-471-79251-2

Gatheral, Jim et al. (Oct 2014). "Volatility is rough". URL: https://arxiv.org/abs/1410.3394

— (Jan 2019). "Rational approximation of the rough Heston solution". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3191578

Hebb, Donald O. (1949). The Organization of Behavior: A Neuropsychological Theory. Psychology Press, 1st edition (12 Jun 2002). ISBN: 978-0805843002

Heston, Steven (Jan 1993). "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options". URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.3204&rep=rep1&type=pdf

Hinton, Geoffrey et al. (Feb 2015). "Overview of mini-batch gradient descent". URL: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

Hornik et al. (Mar 1989). "Multilayer feedforward networks are universal approximators". URL: https://deeplearning.cs.cmu.edu/F20/document/readings/Hornik_Stinchcombe_White.pdf

— (Jan 1990). "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks". URL: http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/univ_approx.pdf

Horvath, Blanka et al. (Jan 2019). "Deep Learning Volatility". URL: https://ssrn.com/abstract=3322085

Hurst, Harold E. (Jan 1951). "Long-term storage of reservoirs: an experimental study". URL: https://www.jstor.org/stable/2982267#metadata_info_tab_contents

Hutchinson, James M. et al. (July 1994). "A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks". URL: https://alo.mit.edu/wp-content/uploads/2015/06/A-Nonparametric-Approach-to-Pricing-and-Hedging-Derivative-Securities-via-Learning-Networks.pdf

Ioffe, Sergey et al. (Feb 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". URL: https://arxiv.org/abs/1502.03167

Itkin, Andrey (Jan 2010). "Pricing options with VG model using FFT". URL: https://arxiv.org/abs/physics/0503137

Kahl, Christian et al. (Jan 2006). "Not-so-complex Logarithms in the Heston model". URL: http://www2.math.uni-wuppertal.de/~kahl/publications/NotSoComplexLogarithmsInTheHestonModel.pdf

Kingma, Diederik P. et al. (Feb 2015). "Adam: A Method for Stochastic Optimization". URL: https://arxiv.org/abs/1412.6980

Kwok, Yue-Kuen et al. (2012). Handbook of Computational Finance, Chapter 21. Springer-Verlag Berlin Heidelberg. ISBN: 978-3-642-17254-0

Lewis, Alan (Sept 2001). "A Simple Option Formula for General Jump-Diffusion and Other Exponential Levy Processes". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=282110

Liu, Shuaiqiang et al. (Apr 2019a). "A neural network-based framework for financial model calibration". URL: https://arxiv.org/abs/1904.10523

— (Apr 2019b). "Pricing options and computing implied volatilities using neural networks". URL: https://arxiv.org/abs/1901.08943

Lorig, Matthew et al. (Jan 2019). "Explicit implied vols for multifactor local-stochastic vol models". URL: https://arxiv.org/abs/1306.5447v3

Mandara, Dalvir (Sept 2019). "Artificial Neural Networks for Black-Scholes Option Pricing and Prediction of Implied Volatility for the SABR Stochastic Volatility Model". URL: https://www.datasim.nl/application/files/8115/7045/4929/1423101.pdf

McCulloch, Warren et al. (May 1943). "A Logical Calculus of The Ideas Immanent in Nervous Activity". URL: http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf

Mostafa, F. et al. (May 2008). "A neural network approach to option pricing". URL: https://www.witpress.com/elibrary/wit-transactions-on-information-and-communication-technologies/41/18904

Peters, Edgar E. (Jan 1991). Chaos and Order in the Capital Markets: A New View of Cycles, Prices and Market Volatility. John Wiley & Sons. ISBN: 978-0-471-13938-6

— (Jan 1994). Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons. ISBN: 978-0-471-58524-4. URL: https://www.amazon.co.uk/Fractal-Market-Analysis-Investment-Economics/dp/0471585246

pybind11 Documentation. https://pybind11.readthedocs.io/en/stable/advanced/cast/eigen.html. Accessed: 2020-07-22

Ruf, Johannes et al. (May 2020). "Neural networks for option pricing and hedging: a literature review". URL: https://arxiv.org/abs/1911.05620

Schmelzle, Martin (Apr 2010). "Option Pricing Formulae using Fourier Transform: Theory and Application". URL: https://pfadintegral.com/docs/Schmelzle2010%20Fourier%20Pricing.pdf

Shevchenko, Georgiy (Jan 2015). "Fractional Brownian motion in a nutshell". In: International Journal of Modern Physics: Conference Series 36. URL: https://www.worldscientific.com/doi/abs/10.1142/S2010194515600022

Stone, Henry (July 2019). "Calibrating Rough Volatility Models: A Convolutional Neural Network Approach". URL: https://arxiv.org/abs/1812.05315

Using the C++ eigen library to calculate matrix inverse and determinant. http://people.duke.edu/~ccc14/sta-663-2016/18G_C++_Python_pybind11.html#Using-the-C++-eigen-library-to-calculate-matrix-inverse-and-determinant. Accessed: 2020-07-22

Wilmott, Paul (2006). Paul Wilmott on Quantitative Finance, Vol. 3. John Wiley & Sons, Inc., 2nd Edition. ISBN: 978-0-470-01870-5



Contents

Declaration of Authorship
Abstract
Acknowledgements

0 Project's Overview
    0.1 Project's Overview

1 Introduction
    1.1 Organisation of This Thesis

2 Literature Review
    2.1 Application of ANN in Finance
    2.2 Option Pricing Models
    2.3 The Research Gap

3 Option Pricing Models
    3.1 The Heston Model
        3.1.1 Option Pricing Under Heston Model
        3.1.2 Heston Model Implied Volatility Approximation
    3.2 Volatility is Rough
    3.3 The Rough Heston Model
        3.3.1 Rough Heston Pricing Equation
            3.3.1.1 Rational Approximation of Rough Heston Riccati Solution
            3.3.1.2 Option Pricing Using Fast Fourier Transform
        3.3.2 Rough Heston Model Implied Volatility

4 Artificial Neural Networks
    4.1 Dissecting the Artificial Neural Networks
        4.1.1 Neurons
        4.1.2 Input, Output and Hidden Layers
        4.1.3 Connections, Weights and Biases
        4.1.4 Forward and Backward Propagation, Loss Functions
        4.1.5 Activation Functions
        4.1.6 Optimisation Algorithms and Learning Rate
        4.1.7 Epochs, Early Stopping and Batch Size
    4.2 Feedforward Neural Networks
        4.2.1 Deep Neural Networks
    4.3 Common Pitfalls and Remedies
        4.3.1 Loss Function Stagnation
        4.3.2 Loss Function Fluctuation
        4.3.3 Over-fitting
        4.3.4 Under-fitting
    4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation

5 Data
    5.1 Data Generation and Storage
        5.1.1 Heston Call Option Data
        5.1.2 Heston Implied Volatility Data
        5.1.3 Rough Heston Call Option Data
        5.1.4 Rough Heston Implied Volatility Data
    5.2 Data Pre-processing
        5.2.1 Data Splitting: Train - Validate - Test
        5.2.2 Data Standardisation and Normalisation
        5.2.3 Data Partitioning

6 Neural Networks Architecture and Experimental Setup
    6.1 Neural Networks Architecture
        6.1.1 ANN Architecture: Option Pricing under Heston Model
        6.1.2 ANN Architecture: Implied Volatility Surface of Heston Model
        6.1.3 ANN Architecture: Option Pricing under Rough Heston Model
        6.1.4 ANN Architecture: Implied Volatility Surface of Rough Heston Model
    6.2 Performance Metrics and Validation Methods
    6.3 Implementation of ANN in Keras
    6.4 Summary of Neural Networks Architecture

7 Results and Discussions
    7.1 Heston Option Pricing ANN Results
    7.2 Heston Implied Volatility ANN Results
    7.3 Rough Heston Option Pricing ANN Results
    7.4 Rough Heston Implied Volatility ANN Results
    7.5 Run-time Performance

8 Conclusions and Outlook

A pybind11: A step-by-step tutorial with examples
    A.1 Software Prerequisites
    A.2 Create a Python Project
    A.3 Create a C++ Project
    A.4 Convert the C++ project to extension for Python
    A.5 Make the DLL available to Python
    A.6 Call the DLL from Python
    A.7 Example: Store C++ Eigen Matrix as NumPy Arrays
    A.8 Example: Store Data Generated by Analytical Heston Model as NumPy Arrays

B Asymptotic Expansions for General Stochastic Volatility Models
    B.1 Asymptotic Expansions

C Deep Neural Networks Library in Python: Keras

Bibliography


List of Figures

1 Implementation Flow Chart
2 Data flow diagram for ANN on Rough Heston Pricer
1.1 Problem Solving Process
3.1 Paths of fBm for different values of H. Adapted from (Shevchenko, 2015)
3.2 Implied Volatility Smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019)
4.1 A Simple Neural Network
4.2 The pixels of the implied volatility surface are represented by the outputs of the neural network
5.1 How the data is partitioned for 5-fold cross validation. Adapted from (Chapter 2 Modeling Process: k-fold cross validation)
6.1 Typical ANN Architecture for Option Pricing Application
6.2 Typical ANN Architecture for Implied Volatility Surface Approximation
7.1 Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN
7.2 Plot of Predicted vs Actual values for Heston Option Pricing ANN
7.3 Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN
7.4 Heston Implied Volatility Smiles: ANN Predictions vs Actual Values
7.5 Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data
7.6 Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN
7.7 Plot of Predicted vs Actual values for Rough Heston Pricing ANN
7.8 Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN
7.9 Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values
7.10 Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data


List of Tables

5.1 Heston Model Call Option Data
5.2 Heston Model Implied Volatility Data
5.3 Rough Heston Model Call Option Data
5.4 Rough Heston Model Implied Volatility Data
6.1 Summary of Neural Networks Architecture for Each Application
7.1 Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data
7.2 5-fold Cross Validation Results of Heston Option Pricing ANN
7.3 Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data
7.4 5-fold Cross Validation Results of Heston Implied Volatility ANN
7.5 Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data
7.6 5-fold Cross Validation Results of Rough Heston Option Pricing ANN
7.7 Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data
7.8 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN
7.9 Training Time
7.10 Execution Time for Predictions using ANN and Traditional Methods


List of Abbreviations

NN    Neural Networks
ANN   Artificial Neural Networks
DNN   Deep Neural Networks
CNN   Convolutional Neural Networks
SGD   Stochastic Gradient Descent
SV    Stochastic Volatility
CIR   Cox, Ingersoll and Ross
fBm   fractional Brownian motion
FSV   Fractional Stochastic Volatility
RFSV  Rough Fractional Stochastic Volatility
FFT   Fast Fourier Transform
ELU   Exponential Linear Unit
ReLU  Rectified Linear Unit


Chapter 0

Project's Overview

This chapter intends to give a brief overview of the methodology of the project without delving too much into the details.

0.1 Project's Overview

As advised by the supervisor, the problem is divided into smaller, achievable steps, as (Mandara, 2019) did. The four main steps are:

• Define the objective

• Formulate the procedures

• Implement the procedures

• Evaluate the results

The objective is to investigate the performance of artificial neural networks (ANN) on vanilla option pricing and implied volatility approximation under the rough Heston model. The performance metrics in this context are accuracy and run-time performance.

Figure 1 shows the flow chart of the implementation procedures. The procedure starts with synthetic data generation. Then we split the synthetic data set into three portions: one for training (in-sample), one for validation and the last one for testing (out-of-sample). Before the data is fed into the ANN for training, it needs to be scaled or normalised, as it has different magnitudes and ranges. In this thesis we use the training method proposed by (Horvath et al., 2019), the image-based implicit training approach, for implied volatility surface approximation. Then we test the trained ANN model with an unseen data set to evaluate its accuracy. A minimal sketch of the splitting and scaling step follows.
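For illustration, a minimal sketch of this splitting and scaling step is given below; the 70/15/15 split, the array names and the use of scikit-learn are assumptions for the sketch, not necessarily the exact choices made later in the thesis.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Dummy data standing in for (model parameters X, option prices y)
X = np.random.rand(1000, 10)
y = np.random.rand(1000)

# 70% train, 15% validation, 15% test (illustrative ratios)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

# Fit the scaler on the training set only, then apply it to all three sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)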

The run-time performance of the ANN will be compared with that of the traditional approximation methods. This project requires the use of two programming languages: C++ and Python. The data generation process is done in C++. The wide range of machine learning libraries (Keras, TensorFlow, PyTorch) in Python makes it the logical choice for the neural network computations. The experiment starts with the classical Heston model as a concept validation first, and then proceeds to the rough Heston model.


FIGURE 1: Implementation Flow Chart (option pricing models, Heston and rough Heston → data generation (strike price, maturity, volatility, etc.) → data pre-processing (scaled/normalised training data) → ANN architecture design → ANN training → trained ANN → ANN prediction → results evaluation (accuracy of option price and implied vol, runtime performance) → conclusions)


FIGURE 2: Data flow diagram for ANN on Rough Heston Pricer (input model parameters → rough Heston pricer → rough Heston data (train data, test option prices); ANN model hyperparameters → Keras ANN trainer → trained model weights → ANN predictions of option price → accuracy evaluation against test data → output error metrics)


Chapter 1

Introduction

Option pricing has always been a major research area in financial mathematics. Various methods and techniques have been developed to price options more accurately and more quickly since the introduction of the well-known Black-Scholes model in 1973 (Black et al., 1973). The Black-Scholes model has an analytical solution and has only one parameter, volatility, required to be calibrated. This makes it very easy to implement. Its simplicity comes at the expense of many unrealistic assumptions. One of its biggest flaws is the assumption of a constant volatility term. This assumption simply does not hold in the real market. With this assumption, the Black-Scholes model fails to capture the volatility dynamics exhibited by the market and so is unable to value options accurately.

In 1993, (Heston, 1993) developed a stochastic volatility model in an attempt to better capture the volatility dynamics. The Heston model assumes the volatility of an underlying asset to follow a geometric Brownian motion (Hurst parameter H = 1/2). Similarly, the Heston model also has a closed-form analytical solution, and this makes it a strong alternative to the Black-Scholes model.

Whilst the Heston model is more realistic than the Black-Scholes model, the generated option prices do not match the observed vanilla option prices consistently (Gatheral et al., 2014). Recent research shows that more accurate option prices can be estimated if the volatility process is modelled as a fractional Brownian motion with H < 1/2. When H < 1/2, the volatility process path appears rougher than when H ≥ 1/2, and hence the volatility is described as rough.

(Euch et al., 2019) introduce the rough volatility version of the Heston model, which combines the best of both worlds. However, the calibration of the rough Heston model relies on the notoriously slow Monte Carlo method due to its non-Markovian nature. This obstacle is the reason that rough Heston struggles to find its way into industrial applications. Although there exist many approximation methods for the rough Heston model to help solve this problem, we are interested in the feasibility of machine learning algorithms.

Recent advancements in machine learning algorithms could potentially provide an alternative means for this problem. Specifically, artificial neural networks (ANN) offer the best hope, because they are capable of learning complex functional relationships between input and output data and then making predictions instantly.


In this project, our objective is to investigate the accuracy and run-time performance of ANN on European call option price predictions and implied volatility approximations under the rough Heston model. We start off by using the Heston model as a concept validation, then proceed to its rough volatility version. For regulatory purposes, we use data simulated by option pricing models for training.

1.1 Organisation of This Thesis

This thesis is organised as follows. In Chapter 2 we review some of the related work and provide justifications for our research. In Chapter 3 we discuss the theory of the option pricing models of our interest. We provide a brief introduction to ANN and some techniques to improve their performance in Chapter 4. Then we explain the tools we used for the data generation process and some useful techniques to speed up the process in Chapter 5; in the same chapter we also discuss the important data pre-processing procedures. Chapter 6 shows our network architecture and experimental setup. We present the results and discuss our findings in Chapter 7. Finally, we conclude our findings and discuss potential future work in Chapter 8.

FIGURE 1.1: Problem Solving Process (START: rough Heston model has low industrial application despite giving accurate option prices → root cause: slow calibration (pricing and implied vol approximation) process → potential solution: apply ANN to pricing and implied volatility approximation → implementation: ANN computation using Keras in Python → evaluation: accuracy and speed of ANN predictions → END: conclusions, ANN is feasible or not)


Chapter 2

Literature Review

In this chapter we intend to provide a review of the current literature and highlight the research gap. The focus of our review is the application of ANN to option pricing models. We justify our methodology and our choice of option pricing models, and also provide some critiques.

2.1 Application of ANN in Finance

Option pricing has always been a hot topic in academia and the financial industry. Researchers have been looking for various means to price options more accurately and more quickly. As computational technology advances, the application of ANN has gained popularity in solving complex problems in many different industries. In quantitative finance, the idea of utilising ANN for pricing derivatives comes from its ability to make fast and accurate predictions once it is trained.

The application of ANN to option pricing began with (Hutchinson et al., 1994). The authors were the pioneers in applying ANN for pricing and hedging derivatives. Since then, there have been more than 100 research papers on this topic. It is also worth noting that the complexity of the ANN architectures used in the research papers increases over the years. Whilst there has been much research on the applications of ANN to option pricing, only about 10% of papers focus on implied volatility (Ruf et al., 2020). The feasibility of using ANN for implied volatility approximation is equally as important as for option pricing, since they are both part of the calibration process. Recently, more research papers focus on implied volatility and calibration, such as (Mostafa et al., 2008), (Liu et al., 2019a), (Liu et al., 2019b) and (Horvath et al., 2019). A key contribution of (Horvath et al., 2019)'s work is the introduction of the image-based implicit training method for implied volatility surface approximation. This method is allegedly robust and fast for approximating the entire implied volatility surface.

A significant portion of the research papers use real market data in their research. Currently there is no standardised framework for deciding the architecture and parameters of an ANN; hence, training the ANN directly with market data might pose some issues when it comes to regulatory compliance. (Horvath et al., 2019) is one of the few papers that uses simulated data for ANN training; the simulated data is generated from the Heston and rough Bergomi models. Using simulated data removes the regulatory concerns, as the functional mapping from model parameters to option price is known and tractable.

The application of ANN requires splitting the entire data set for training, validation and testing. Most of the papers that use real market data ensure the data splitting process is chronological, to preserve the time series structure of the data. This is not a concern in our case, as we are using simulated data.

The results in (Horvath et al., 2019) show very accurate predictions. However, after examining the source code, we discovered that the ANN is validated and tested on the same data set. Thus, the high accuracy results do not give any information about how well the trained ANN generalises to unseen data. In our experiment we will use different data sets for validation and testing.

Common error metrics used include mean absolute error (MAE), mean absolute percentage error (MAPE) and mean squared error (MSE). We suggest a new metric, maximum absolute error, as it is useful from a risk management perspective for quantifying the largest prediction error in the worst-case scenario. A minimal sketch of these metrics follows.
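A minimal NumPy sketch of these error metrics, including the suggested maximum absolute error, is given below; the function names are ours, introduced for illustration.

import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true))

def max_abs_error(y_true, y_pred):
    # Largest single prediction error: the worst-case-scenario measure
    return np.max(np.abs(y_true - y_pred))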

2.2 Option Pricing Models

In terms of the choice of option pricing models, over 80% of the papers select the Black-Scholes model or its extensions in their research (Ruf et al., 2020). As mentioned before, there are other models that can give better option prices than Black-Scholes, such as the stochastic volatility models. (Liu et al., 2019b) investigated the application of ANN in the Heston and Bates models.

According to (Gatheral et al., 2014), modelling the volatility as rough volatility can capture the volatility dynamics of the market better. Rough volatility models struggle to find their way into the industry due to their slow calibration process. Hence, it is natural for the direction of ANN research to move towards rough volatility models.

Some of the recent examples include (Stone, 2019), (Bayer et al., 2018) and (Horvath et al., 2019). (Stone, 2019) investigates the calibration of the rough Bergomi model using Convolutional Neural Networks (CNN), whilst (Bayer et al., 2018) study the calibration of the Heston and rough Bergomi models using Deep Neural Networks (DNN). Contrary to these papers, where direct calibration is performed via ANN in one step, (Horvath et al., 2019) split the calibration of the rough Bergomi model into two steps. Firstly, the ANN is trained to learn the pricing equation during a so-called offline session. Then the trained ANN is deterministic and stored for the online calibration session later. This approach can speed up the online calibration by a huge margin.

2.3 The Research Gap

In this thesis we will be using the approach in (Horvath et al., 2019); that is, to train the network to learn the pricing and implied volatility functional mappings separately. We leave the model calibration step as the future outlook of this project. We choose to work with the rough Heston model, another popular rough volatility model that has not been investigated yet. We adapt the image-based implicit training method in our research. To re-iterate, we validate and test our ANN with different data sets to obtain a true measure of the ANN's performance on unseen data.


Chapter 3

Option Pricing Models

In this chapter we discuss the option pricing models of our interest: the Heston and rough Heston models. The idea of modelling volatility as a rough fractional Brownian motion will also be presented. Then we derive their pricing and implied volatility equations, and explain their implementations for data generation.

The Black-Scholes model is simple but unrealistic for real-world applications, as its assumptions simply do not hold. One of its greatest criticisms is the assumption of constant volatility of the underlying asset price. (Dupire, 1994) extended the Black-Scholes framework to give local volatility models. He modelled the volatility as a deterministic function of the underlying price and time. The function is selected such that its outputs match observed European option prices. This extension is time-inhomogeneous, and the dynamics it exhibits are still unrealistic. Thus, it is not able to produce future volatility surfaces that emulate our observations.

Stochastic Volatility (SV) models were introduced, in which the volatility of the underlying is modelled as a geometric Brownian motion. Recent research shows that rough volatility can capture the true volatility dynamics better, and this leads to the rough volatility models.

3.1 The Heston Model

One of the most popular SV models is the one proposed by (Heston, 1993); its popularity is due to the fact that its pricing equation for European options is analytically tractable. This property allows calibration procedures to be done efficiently.

The Heston model for a one-dimensional spot price S follows a stochastic process

dS = \mu S\,dt + \sqrt{\nu}\,S\,dW_1  (3.1)

And its instantaneous variance \nu follows a Cox, Ingersoll and Ross (CIR) process (Cox et al., 1985):

d\nu = \kappa(\theta - \nu)\,dt + \xi\sqrt{\nu}\,dW_2  (3.2)

\langle dW_1, dW_2 \rangle = \rho\,dt  (3.3)

where


• \mu is the (deterministic) rate of return of the spot

• \theta is the long-term mean volatility

• \xi is the volatility of volatility

• \kappa is the mean reversion rate of volatility

• \rho is the correlation between spot and volatility moves

• W_1 and W_2 are Wiener processes

3.1.1 Option Pricing Under Heston Model

In this section we follow (Gatheral, 2006) closely. We start by forming a portfolio \Pi containing the option being priced V, a quantity -\Delta of stock, and a quantity -\Delta_1 of another asset whose value is V_1:

\Pi = V - \Delta S - \Delta_1 V_1

The change in the portfolio value during dt is

d\Pi = \left[\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\,\frac{\partial^2 V}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\,\frac{\partial^2 V}{\partial \nu^2}\right] dt
\quad - \Delta_1 \left[\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S\,\frac{\partial^2 V_1}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\,\frac{\partial^2 V_1}{\partial \nu^2}\right] dt
\quad + \left[\frac{\partial V}{\partial S} - \Delta_1 \frac{\partial V_1}{\partial S} - \Delta\right] dS + \left[\frac{\partial V}{\partial \nu} - \Delta_1 \frac{\partial V_1}{\partial \nu}\right] d\nu

Note the explicit dependence of S and \nu on t has been removed for simplicity. We can make the portfolio instantaneously risk-free by eliminating the dS and d\nu terms:

\frac{\partial V}{\partial S} - \Delta_1 \frac{\partial V_1}{\partial S} - \Delta = 0

and

\frac{\partial V}{\partial \nu} - \Delta_1 \frac{\partial V_1}{\partial \nu} = 0

So we are left with

d\Pi = \left[\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\,\frac{\partial^2 V}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\,\frac{\partial^2 V}{\partial \nu^2}\right] dt - \Delta_1 \left[\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S\,\frac{\partial^2 V_1}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\,\frac{\partial^2 V_1}{\partial \nu^2}\right] dt
= r\Pi\,dt
= r(V - \Delta S - \Delta_1 V_1)\,dt


Then we apply the no-arbitrage principle: the return of a risk-free portfolio must be equal to the risk-free rate. And we assume the risk-free rate to be deterministic for our case.

Rearranging the above equation by collecting V terms on the left-hand side and V_1 terms on the right-hand side gives

\frac{\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV}{\frac{\partial V}{\partial \nu}} = \frac{\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V_1}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V_1}{\partial \nu^2} + rS\frac{\partial V_1}{\partial S} - rV_1}{\frac{\partial V_1}{\partial \nu}}

Since the left-hand side depends only on V and the right-hand side only on V_1, both sides must equal some function f(S, \nu, t) of the variables S, \nu and t. Thus we have

\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\,\frac{\partial^2 V}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\,\frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV = f\,\frac{\partial V}{\partial \nu}

In the case of the Heston model, f is chosen as

f = -(\kappa(\theta - \nu) - \lambda\sqrt{\nu})

where \lambda(S, \nu, t) is called the market price of volatility risk. We do not delve into the choice of f and the concept of \lambda(S, \nu, t) any further here; see (Wilmott, 2006) for his arguments.

(Heston, 1993) assumed in his paper that \lambda(S, \nu, t) is linear in the instantaneous variance \nu_t, to retain the form of the equation under the transformation from the statistical measure to the risk-neutral measure. Here we are only interested in pricing the options, so we can set \lambda(S, \nu, t) to be zero (Gatheral, 2006).

Hence we arrive at

\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\,\frac{\partial^2 V}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\,\frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV = \kappa(\nu - \theta)\frac{\partial V}{\partial \nu}  (3.4)

According to (Heston, 1993), the price of an undiscounted European call option C with strike price K and time to maturity T has a solution of the form

C(S, K, \nu, T) = \frac{1}{2}(F - K) + \int_0^{\infty} (F \cdot f_1 - K \cdot f_2)\,du  (3.5)


where

f_1 = \Re\left[\frac{e^{-iu\ln K}\,\varphi(u - i)}{iuF}\right]

f_2 = \Re\left[\frac{e^{-iu\ln K}\,\varphi(u)}{iu}\right]

\varphi(u) = \mathbb{E}(e^{iu\ln S_T})

F = S e^{\mu T}

The evaluation of (3.5) requires the computation of Fourier inversion integrals, and it is susceptible to numerical instabilities due to the complex logarithms involved. (Kahl et al., 2006) proposed an integration scheme that computes the Fourier inversion integral by applying some transformations to the asymptotic structure. Thus the solution becomes

C(S, K, \nu, T) = \int_0^1 y(x)\,dx, \quad x \in \mathbb{R}  (3.6)

where

y(x) = \frac{1}{2}(F - K) + \frac{F \cdot f_1\!\left(-\frac{\ln x}{C_\infty}\right) - K \cdot f_2\!\left(-\frac{\ln x}{C_\infty}\right)}{x\,\pi\,C_\infty}

The limits at the boundaries of the integral are

\lim_{x \to 0} y(x) = \frac{1}{2}(F - K)

and

\lim_{x \to 1} y(x) = \frac{1}{2}(F - K) + \frac{F \cdot \lim_{u \to 0} f_1(u) - K \cdot \lim_{u \to 0} f_2(u)}{\pi\,C_\infty}

Equation (3.6) provides a more robust pricing method for moderate-to-long maturities or strong mean-reversion options. For the full derivation of equation (3.6), see (Kahl et al., 2006). The source code implementing equation (3.6) was provided by the supervisor and is the source code for (Duffy et al., 2012).

3.1.2 Heston Model Implied Volatility Approximation

Before the model is used for pricing options, the model's parameters need to be calibrated so that it can return current market prices (at least to some degree of accuracy). That is, the unmeasurable model parameters are obtained by calibrating to the implied volatilities observed on the market. This motivates the need for fast implied volatility computation.

Unfortunately, there is no closed-form analytical formula for calculating implied volatility under the Heston model. There are, however, many closed-form approximations for Heston implied volatility.

In this project we select the implied volatility approximation method for Heston proposed by (Lorig et al., 2019), which allegedly outperforms other methods such as the one by (Fouque et al., 2012). (Lorig et al., 2019) derive a family of asymptotic expansions for European-style option implied volatility. Their method provides an explicit approximation and requires no special functions, not even numerical integration, and so it is faster to compute than its counterparts.

We follow closely the derivation steps outlined in (Lorig et al., 2019). Consider the Heston model in Section 3.1. According to (Andersen et al., 2007), \rho must be negative to avoid moment explosions. To circumvent this, we apply the following change of variables: (X(t), V(t)) = (\ln S(t), e^{\kappa t}\nu). Using the asymptotic expansions for general stochastic volatility models (see Appendix B) and Ito's formula, we have

dX = -\frac{1}{2}e^{-\kappa t}V\,dt + \sqrt{e^{-\kappa t}V}\,dW_1, \quad X(0) = x = \ln s  (3.7)

dV = \theta\kappa e^{\kappa t}\,dt + \xi\sqrt{e^{\kappa t}V}\,dW_2, \quad V(0) = \nu = z > 0  (3.8)

\langle dW_1, dW_2 \rangle = \rho\,dt  (3.9)

The parameters \nu, \kappa, \theta, \xi, W_1, W_2 play the same roles as in Section 3.1. The second-order differential operator \mathcal{A}(t) associated with (X, V) is

\mathcal{A}(t) = \frac{1}{2}e^{-\kappa t}v\,(\partial_x^2 - \partial_x) + \theta\kappa e^{\kappa t}\,\partial_v + \frac{1}{2}\xi^2 e^{\kappa t}v\,\partial_v^2 + \xi\rho v\,\partial_x\partial_v

Define T as time to maturity and k as log-strike. Using the time-dependent Taylor series expansion of \mathcal{A}(t) about (x(t), v(t)) = (X_0, \mathbb{E}[V(t)]) = (x, \theta(e^{\kappa t} - 1)) gives (see Appendix B)

\sigma_0 = \sqrt{\frac{-\theta + \theta\kappa T + e^{-\kappa T}(\theta - v) + v}{\kappa T}}

\sigma_1 = \frac{\xi\rho z\,e^{-\kappa T}\left(-2v - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\right)}{\sqrt{2}\,\kappa^2\sigma_0^2\,T^{3/2}}

\sigma_2 = \frac{\xi^2 e^{-2\kappa T}}{32\kappa^4\sigma_0^5 T^3}\Big[-2\sqrt{2}\,\kappa\sigma_0^3 T^{3/2} z\big(-v - 4e^{\kappa T}(v + \kappa T(v - v)) + e^{2\kappa T}(v(5 - 2\kappa T) - 2v) + 2v\big)
\quad + \kappa\sigma_0^2 T(4z^2 - 2)\big(v + e^{2\kappa T}(-5v + 2v\kappa T + 8\rho^2(v(\kappa T - 3) + v) + 2v)\big)
\quad + \kappa\sigma_0^2 T(4z^2 - 2)\big(4e^{\kappa T}(v + v\kappa T + \rho^2(v(\kappa T(\kappa T + 4) + 6) - v(\kappa T(\kappa T + 2) + 2)) - \kappa T v) - 2v\big)
\quad + 4\sqrt{2}\,\rho^2\sigma_0\sqrt{T}\,z(2z^2 - 3)\big(-2 - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\big)^2
\quad + 4\rho^2(4(z^2 - 3)z^2 + 3)\big(-2v - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\big)^2\Big]
\quad - \frac{\sigma_1^2\big(4(x - k)^2 - \sigma_0^4 T^2\big)}{8\sigma_0^3 T}

where

z = \frac{x - k - \sigma_0^2 T/2}{\sigma_0\sqrt{2T}}


The explicit expression for \sigma_3 is too long to be included here; see (Lorig et al., 2019) for the full derivation. The explicit expression for the implied volatility approximation is

\sigma_{\text{implied}} = \sigma_0 + \sigma_1 z + \sigma_2 z^2 + \sigma_3 z^3  (3.10)

The C++ source code implementing this is available here: Heston Implied Volatility Approximation.
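To make the structure of (3.10) concrete, a small sketch of its evaluation is given below; it computes only \sigma_0 and z explicitly and assumes \sigma_1, \sigma_2 and \sigma_3 have been precomputed from the expressions above.

import numpy as np

def lorig_implied_vol(x, k, T, kappa, theta, v, sigma1, sigma2, sigma3):
    """Evaluate the expansion (3.10): sigma0 + sigma1*z + sigma2*z**2 + sigma3*z**3.

    x = log-spot, k = log-strike, T = time to maturity;
    sigma1, sigma2 and sigma3 are assumed precomputed."""
    sigma0 = np.sqrt((-theta + theta * kappa * T
                      + np.exp(-kappa * T) * (theta - v) + v) / (kappa * T))
    z = (x - k - 0.5 * sigma0 ** 2 * T) / (sigma0 * np.sqrt(2.0 * T))
    return sigma0 + sigma1 * z + sigma2 * z ** 2 + sigma3 * z ** 3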


3.2 Volatility is Rough

We start this section by introducing the fractional Brownian motion (fBm), which is a generalisation of Brownian motion.

Definition 3.2.1. Fractional Brownian motion (fBm) is a centered Gaussian process B^H_t, t \ge 0, with the autocovariance function

\mathbb{E}[B^H_t B^H_s] = \frac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\right)  (3.11)

where H \in (0, 1) is the Hurst index, or Hurst parameter, associated with the fractional Brownian motion.

The Hurst parameter is a measure of the long-term memory of a time series. It was first introduced by (Hurst, 1951) for modelling water levels of the Nile river in order to determine the optimum dam size. As it turns out, it is also suitable for modelling time series data from other natural systems; (Peters, 1991) and (Peters, 1994) are some of the earliest examples of applying this concept to financial time series modelling. A time series can be classified into three categories based on the Hurst parameter:

1. 0 < H < 1/2: Negative correlation in the increments/decrements of the process. A mean-reverting process, which implies an increase in value is more likely followed by a decrease in value, and vice versa. The smaller the value, the greater the mean-reverting effect.

2. H = 1/2: No correlation in the increments/decrements of the process (geometric Brownian motion or Wiener process).

3. 1/2 < H < 1: Positive correlation in the increments/decrements of the process. A trend-reinforcing process, which means the direction of the next value is more likely the same as the current value. The strength of the trend is greater when H is closer to 1.

Empirical evidence shows that log-volatility time series behave similarly to a fBm with Hurst parameter of order 0.1 (Gatheral et al., 2014). Essentially all the statistical stylised facts of volatility can be captured when modelling it as a rough fBm.

This motivates the use of the fractional stochastic volatility (FSV) model by (Comte et al., 1998). Our model is called the rough fractional stochastic volatility (RFSV) model, to emphasise that H < 1/2. The term 'rough' refers to the roughness of the path of fBm: from Figure 3.1 one can notice that the fBm path gets 'rougher' as H decreases.
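To illustrate this roughness, the sketch below simulates fBm paths by Cholesky factorisation of the covariance (3.11); this exact but O(n^3) method is chosen for clarity and is not the simulation scheme used elsewhere in this thesis.

import numpy as np

def fbm_path(H, n_steps, T=1.0, seed=None):
    """Simulate one fBm path on [0, T] via Cholesky factorisation of (3.11)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(T / n_steps, T, n_steps)
    # Covariance matrix of fBm: 0.5 * (t^2H + s^2H - |t - s|^2H)
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (u ** (2 * H) + s ** (2 * H) - np.abs(u - s) ** (2 * H))
    L = np.linalg.cholesky(cov)
    path = L @ rng.standard_normal(n_steps)
    return np.concatenate([[0.0], path])  # B_0^H = 0

# Visibly rougher path for smaller H, e.g. fbm_path(0.1, 500) vs fbm_path(0.7, 500)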


FIGURE 3.1: Paths of fBm for different values of H. Adapted from (Shevchenko, 2015)

3.3 The Rough Heston Model

In this section we introduce the rough volatility version of the Heston model and derive its pricing and implied volatility equations.

As stated before, rough volatility models can fit the volatility surface notably well with very few parameters. Hence, it is natural to modify the Heston model and consider its rough version.

The rough Heston model for a one-dimensional asset price S with instantaneous variance \nu(t) takes the form as in (Euch et al., 2016):

dS = S\sqrt{\nu}\,dW_1

\nu(t) = \nu(0) + \frac{1}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\kappa(\theta - \nu(s))\,ds + \frac{\xi}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\sqrt{\nu(s)}\,dW_2  (3.12)

\langle dW_1, dW_2 \rangle = \rho\,dt

where

• \alpha = H + 1/2 \in (1/2, 1) governs the roughness of the volatility sample paths

• \kappa is the mean reversion rate

• \theta is the long-term mean volatility

• \nu(0) is the initial volatility

• \xi is the volatility of volatility

• \Gamma is the Gamma function

• W_1 and W_2 are two Brownian motions with correlation \rho

When \alpha = 1 (H = 1/2), we recover the classical Heston model of Section 3.1.

3.3.1 Rough Heston Pricing Equation

The pricing equation of the rough Heston model is inspired by the classical Heston one. In the classical Heston model, the option price can be obtained by applying Fourier inversion integrals to the characteristic function, which is expressed in terms of the solution of a Riccati equation. The rough Heston model displays a similar structure, but with the Riccati equation replaced by a fractional Riccati equation.

(Euch et al., 2016) derived a characteristic function for the rough Heston model. This result is particularly important, as the non-Markovian nature of the fractional Brownian motion makes other numerical pricing methods (e.g. Monte Carlo) difficult to implement.

Definition 3.3.1. The fractional integral of order r \in (0, 1] of a function f is

I^r f(t) = \frac{1}{\Gamma(r)}\int_0^t (t - s)^{r - 1} f(s)\,ds  (3.13)

whenever the integral exists, and the fractional derivative of order r \in (0, 1] is

D^r f(t) = \frac{1}{\Gamma(1 - r)}\frac{d}{dt}\int_0^t (t - s)^{-r} f(s)\,ds  (3.14)

whenever it exists.

Then we quote the result from (Euch et al., 2016) directly; for the full proof, see Section 6 of (Euch et al., 2016).

Theorem 3.3.1. For all t \ge 0, the characteristic function of the log-price X_t = \ln(S(t)/S(0)) is

\varphi_X(a, t) = \mathbb{E}[e^{iaX_t}] = \exp\left[\kappa\theta\,I^1 h(a, t) + \nu(0)\,I^{1-\alpha} h(a, t)\right]  (3.15)

where h is the solution of the fractional Riccati equation

D^\alpha h(a, t) = -\frac{1}{2}a(a + i) + \kappa(ia\rho\xi - 1)\,h(a, t) + \frac{\xi^2}{2}h^2(a, t), \quad I^{1-\alpha}h(a, 0) = 0  (3.16)

which admits a unique continuous solution for a \in \mathcal{C}, a suitable region in the complex plane (see Definition 3.3.2).


Definition 3.3.2. We define the strip in the complex plane relevant to the computation of the characteristic function in Theorem 3.3.1:

\mathcal{C} = \left\{ z \in \mathbb{C} : \Re(z) \ge 0, \ -\frac{1}{1 - \rho^2} \le \Im(z) \le 0 \right\}  (3.17)

where \Re and \Im denote real and imaginary parts, respectively.

There is no explicit solution to equation (3.16), so it has to be solved numerically. (Gatheral et al., 2019) present a simple rational approximation to the solution of the rough Heston Riccati equation, which is inevitably faster than the numerical schemes.

3.3.1.1 Rational Approximation of Rough Heston Riccati Solution

Firstly, the short-time series expansion of the solution by (Alos et al., 2017) can be written as

h_s(a, t) = \sum_{j=0}^{\infty} \frac{\Gamma(1 + j\alpha)}{\Gamma[1 + (j + 1)\alpha]}\,\beta_j(a)\,\xi^j\,t^{(j+1)\alpha}  (3.18)

with

\beta_0(a) = -\frac{1}{2}a(a + i)

\beta_k(a) = \frac{1}{2}\sum_{i,j=0}^{k-2} 1_{i+j=k-2}\,\beta_i(a)\beta_j(a)\,\frac{\Gamma(1 + i\alpha)}{\Gamma[1 + (i + 1)\alpha]}\,\frac{\Gamma(1 + j\alpha)}{\Gamma[1 + (j + 1)\alpha]} + ia\rho\,\frac{\Gamma[1 + (k - 1)\alpha]}{\Gamma(1 + k\alpha)}\,\beta_{k-1}(a)

Again from (Alos et al., 2017), the long-time expansion is

h_l(a, t) = r_- \sum_{k=0}^{\infty} \frac{\gamma_k\,(\xi t^{\alpha})^{-k}}{A^k\,\Gamma(1 - k\alpha)}  (3.19)

where

\gamma_1 = -\gamma_0 = -1

\gamma_k = -\gamma_{k-1} + \frac{r_-}{2A}\sum_{i,j=1}^{\infty} 1_{i+j=k}\,\gamma_i\gamma_j\,\frac{\Gamma(1 - k\alpha)}{\Gamma(1 - i\alpha)\,\Gamma(1 - j\alpha)}

A = \sqrt{a(a + i) - \rho^2 a^2}

r_{\pm} = -ia\rho \pm A

Now we can construct the global approximation. Using the same techniques as in (Atkinson et al., 2011), the rational approximation of h(a, t) is given by

h^{(m,n)}(a, t) = \frac{\sum_{i=1}^{m} p_i\,(\xi t^{\alpha})^i}{\sum_{j=0}^{n} q_j\,(\xi t^{\alpha})^j}, \quad q_0 = 1  (3.20)

Numerical experiments in (Gatheral et al., 2019) showed h^{(3,3)} is a good choice for high accuracy and fast computation. Thus we can write explicitly

h^{(3,3)}(a, t) = \frac{p_1\xi(t^{\alpha}) + p_2\xi^2(t^{\alpha})^2 + p_3\xi^3(t^{\alpha})^3}{1 + q_1\xi(t^{\alpha}) + q_2\xi^2(t^{\alpha})^2 + q_3\xi^3(t^{\alpha})^3}  (3.21)

Thus, from equations (3.18) and (3.19) we have

h_s(a, t) = \frac{\Gamma(1)}{\Gamma(1 + \alpha)}\beta_0(a)(t^{\alpha}) + \frac{\Gamma(1 + \alpha)}{\Gamma(1 + 2\alpha)}\beta_1(a)\,\xi\,(t^{\alpha})^2 + \frac{\Gamma(1 + 2\alpha)}{\Gamma(1 + 3\alpha)}\beta_2(a)\,\xi^2\,(t^{\alpha})^3 + \mathcal{O}(t^{4\alpha})
= b_1(t^{\alpha}) + b_2(t^{\alpha})^2 + b_3(t^{\alpha})^3 + \mathcal{O}[(t^{\alpha})^4]

and

h_l(a, t) = \frac{r_-\gamma_0}{\Gamma(1)} + \frac{r_-\gamma_1}{A\,\Gamma(1 - \alpha)\,\xi}\frac{1}{t^{\alpha}} + \frac{r_-\gamma_2}{A^2\,\Gamma(1 - 2\alpha)\,\xi^2}\frac{1}{(t^{\alpha})^2} + \mathcal{O}\left[\frac{1}{(t^{\alpha})^3}\right]
= g_0 + g_1\frac{1}{t^{\alpha}} + g_2\frac{1}{(t^{\alpha})^2} + \mathcal{O}\left[\frac{1}{(t^{\alpha})^3}\right]

Matching the coefficients of equation (3.21) to h_s and h_l forces the coefficients b_i and g_j to satisfy

p_1\xi = b_1  (3.22)

(p_2 - p_1 q_1)\xi^2 = b_2  (3.23)

(p_1 q_1^2 - p_1 q_2 - p_2 q_1 + p_3)\xi^3 = b_3  (3.24)

\frac{p_3\xi^3}{q_3\xi^3} = g_0  (3.25)

\frac{(p_2 q_3 - p_3 q_2)\xi^5}{q_3^2\,\xi^6} = g_1  (3.26)

\frac{(p_1 q_3^2 - p_2 q_2 q_3 - p_3 q_1 q_3 + p_3 q_2^2)\xi^7}{q_3^3\,\xi^9} = g_2  (3.27)


The solution of this system is

p_1 = \frac{b_1}{\xi}  (3.28)

p_2 = \frac{b_1^3 g_1 + b_1^2 g_2 g_0 + b_1 b_2 g_0 g_1 - b_1 b_3 g_0 g_2 + b_1 b_3 g_1^2 + b_2^2 g_0 g_2 - b_2^2 g_1^2 + b_2 g_0^3}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^2}  (3.29)

p_3 = g_0 q_3  (3.30)

q_1 = \frac{b_1^2 g_1 - b_1 b_2 g_2 + b_1 g_0^2 - b_2 g_0 g_1 - b_3 g_0 g_2 + b_3 g_1^2}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi}  (3.31)

q_2 = \frac{b_1^2 g_0 - b_1 b_2 g_1 - b_1 b_3 g_2 + b_2^2 g_2 + b_2 g_0^2 - b_3 g_0 g_1}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^2}  (3.32)

q_3 = \frac{b_1^3 + 2 b_1 b_2 g_0 + b_1 b_3 g_1 - b_2^2 g_1 + b_3 g_0^2}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^3}  (3.33)

3.3.1.2 Option Pricing Using Fast Fourier Transform

After solving the Riccati equation using the rational approximation (3.21), we can compute the characteristic function \varphi_X(a, t) in equation (3.15). Once the characteristic function is obtained, many methods are available to compute call option prices; see (Carr and Madan, 1999), (Itkin, 2010), (Lewis, 2001) and (Schmelzle, 2010). In this project we select the Carr and Madan approach, a fast option pricing method that uses the fast Fourier transform (FFT). Suppose k = \ln K and C(k, T) is the value of a European call option with strike K and maturity T. Unfortunately, the FFT cannot be applied directly to evaluate the call option price, because the call pricing function is not square-integrable. A key contribution of (Carr and Madan, 1999) is the modification of the call pricing function with respect to k. With this modification, a whole range of option prices can be obtained with one inverse Fourier transform.

Consider the modified call price c(k, T):

c(k, T) = e^{\beta k}\,C(k, T), \quad \beta > 0  (3.34)

And its Fourier transform is

\psi_X(u, T) = \int_{-\infty}^{\infty} e^{iuk}\,c(k, T)\,dk, \quad u \in \mathbb{R}_{>0}  (3.35)

which can be expressed as

\psi_X(u, T) = \frac{e^{-rT}\,\varphi_X[u - (\beta + 1)i, T]}{\beta^2 + \beta - u^2 + i(2\beta + 1)u}  (3.36)

where r is the risk-free rate and \varphi_X(\cdot) is the characteristic function defined in equation (3.15).


The purpose of \beta is to assist the integrability of the modified call pricing function over the negative log-strike axis. Hence, a sufficient condition is that \psi_X(0, T) be finite:

\psi_X(0, T) < \infty  (3.37)

\implies \varphi_X[-(\beta + 1)i, T] < \infty  (3.38)

\implies \exp\left[\theta\kappa\,I^1 h(-(\beta + 1)i, T) + \nu(0)\,I^{1-\alpha} h(-(\beta + 1)i, T)\right] < \infty  (3.39)

Using equation (3.39), we can determine the upper bound on \beta such that condition (3.37) is satisfied. (Carr and Madan, 1999) suggest that one quarter of the upper bound serves as a good choice for \beta.

The call price C(k, T) can be recovered by applying the inverse Fourier transform:

C(k, T) = \frac{e^{-\beta k}}{2\pi}\int_{-\infty}^{\infty} \Re[e^{-iuk}\,\psi_X(u, T)]\,du  (3.40)
= \frac{e^{-\beta k}}{\pi}\int_{0}^{\infty} \Re[e^{-iuk}\,\psi_X(u, T)]\,du  (3.41)

The semi-infinite integral in the call price equation can be evaluated by numerical methods such as the trapezoidal rule and FFT, as shown in (Kwok et al., 2012):

C(k, T) \approx \frac{e^{-\beta k}}{\pi}\sum_{j=1}^{N} e^{-iu_j k}\,\psi_X(u_j, T)\,\Delta u  (3.42)

where u_j = (j - 1)\Delta u, j = 1, 2, \ldots, N.

We end this section with a summary of the steps required to price a call option under the rough Heston model; a Python sketch of this pipeline follows the list.

1. Compute the solution h of the fractional Riccati equation (3.16) using the rational approximation (3.21).

2. Compute the characteristic function in equation (3.15).

3. Determine a suitable value for \beta using equation (3.39).

4. Apply the Fourier transform in equation (3.36).

5. Evaluate the call option price by implementing the FFT in equation (3.42).
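The sketch below wires these steps together, assuming a function h(a, t) implementing the rational approximation h^{(3,3)} of (3.21), with coefficients from (3.28)-(3.33), is available and accepts array-valued t. The quadrature for the fractional integrals, the grid size N and the spacing du are illustrative choices, and the sum (3.42) is evaluated directly with uniform weights rather than a radix-2 FFT.

import numpy as np
from scipy.special import gamma

def frac_integral(h_a, r, T, n=200):
    # I^r h(a, T) of (3.13) by midpoint quadrature (illustrative accuracy only)
    s = (np.arange(n) + 0.5) * (T / n)
    w = (T - s) ** (r - 1.0)
    return np.sum(w * h_a(s)) * (T / n) / gamma(r)

def char_func(a, T, kappa, theta, v0, alpha, h):
    # Step 2: phi_X(a, T) of (3.15)
    h_a = lambda s: h(a, s)
    return np.exp(kappa * theta * frac_integral(h_a, 1.0, T)
                  + v0 * frac_integral(h_a, 1.0 - alpha, T))

def call_price(k, T, r, beta, kappa, theta, v0, alpha, h, N=512, du=0.25):
    # Steps 4-5: Carr-Madan transform (3.36) and direct sum (3.42)
    u = np.arange(N) * du  # u_j = (j - 1) * du
    phi = np.array([char_func(ui - (beta + 1.0) * 1j, T, kappa, theta, v0, alpha, h)
                    for ui in u])
    psi = np.exp(-r * T) * phi / (beta**2 + beta - u**2 + 1j * (2.0 * beta + 1.0) * u)
    return np.exp(-beta * k) / np.pi * np.real(np.exp(-1j * u * k) * psi).sum() * du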

3.3.2 Rough Heston Model Implied Volatility

(Euch et al., 2019) present an almost instantaneous and accurate approximation method to compute rough Heston implied volatility. This method allows us to approximate rough Heston implied volatility by simply scaling the volatility of volatility parameter \nu and feeding this scaled parameter as input into the classical Heston implied volatility approximation of Section 3.1.2. For a given maturity T, the scaled volatility of volatility parameter \tilde{\nu} is

\tilde{\nu} = \sqrt{\frac{3}{2H + 2}}\,\frac{\nu}{\Gamma(H + 3/2)}\,\frac{1}{T^{1/2 - H}}  (3.43)
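A direct transcription of (3.43) as a minimal sketch:

from math import gamma, sqrt

def scaled_vol_of_vol(nu, H, T):
    """Scaled vol-of-vol of (3.43), to be fed into the classical Heston
    implied volatility approximation of Section 3.1.2."""
    return sqrt(3.0 / (2.0 * H + 2.0)) * nu / (gamma(H + 1.5) * T ** (0.5 - H))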


FIGURE 3.2: Implied Volatility Smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019)


Chapter 4

Artificial Neural Networks

This chapter starts with an introduction to Artificial Neural Networks (ANN). We introduce the terminology and basic elements of ANN, and explain the methods that we used to increase training speed and predictive accuracy. We also discuss the training process for ANN, common pitfalls and their remedies. Finally, we discuss the ANN type of interest and extend the discussion to Deep Neural Networks (DNN).

The concept of ANN has come a long way. It started with modelling a simple neural network after electrical circuits, which was proposed by Warren McCulloch, a neurologist, and a young mathematician, Walter Pitts, in 1943 (McCulloch et al., 1943). This idea was further strengthened by Donald Hebb (Hebb, 1949).

ANN's ability to learn complex non-linear functional mappings, by deciphering the pattern between independent and dependent variables, has found applications in many fields. With computational resources becoming more accessible and the availability of big data sets, ANN's popularity has grown exponentially in recent years.

4.1 Dissecting the Artificial Neural Networks

FIGURE 4.1: A Simple Neural Network

Before we delve deeper into the world of ANN, let us explain what parameters and hyperparameters mean.


Parameters refer to the variables that are updated throughout the training process; they include weights and biases. Hyperparameters are the configuration variables that define the architecture of the ANN and stay constant throughout the training process. Hyperparameters include the number of neurons, hidden layers, activation functions and number of epochs, etc.

4.1.1 Neurons

Like a biological neuron, an artificial neuron is the fundamental working unit in an ANN. It is essentially a mathematical function that receives multiple inputs and generates an output. More precisely, it takes the weighted sum of inputs and applies the activation function to the sum to generate an output. In the case of a simple ANN (as shown in Figure 4.1), this will be the network output. In more sophisticated ANNs, the output of a neuron can be the input of other neurons.

4.1.2 Input, Output and Hidden Layers

An input layer is the first layer of an ANN. It receives and sends the training data into the network for further processing. The number of input neurons is equal to the number of input features of the training data.

The output layer is the final layer of the network. It outputs the predictions for a given set of input features, and it can have more than one output neuron.

A hidden layer is a layer between the input and output layers. It contains neuron(s) and receives weighted inputs from the input layer. Whilst an ANN contains only one input layer and one output layer, the number of hidden layers can be more than one. A network without hidden layers can only solve linearly separable problems; for more complex problems, one or more hidden layers are needed.

Systematic experimentation is required to identify the appropriate number of hidden layers to prevent under-fitting and over-fitting, and thus increase predictive accuracy.

4.1.3 Connections, Weights and Biases

From the input layer to the output layer, each connection is assigned a weight that denotes its relative importance. Random weights are initialised when training begins, and these weights are updated as training continues.

Sometimes a bias (different from the bias term in statistics) is added to the neuron. It is merely a constant input with weight 1. Intuitively speaking, the purpose of a bias is to shift the activation function away from the origin, thereby allowing the neuron to give a non-zero output when the inputs are 0.


4.1.4 Forward and Backward Propagation, Loss Functions

Forward propagation is the passing of input data in the forward direction through the network to generate output. The difference between the output and the target output is then estimated by a loss function, which is a measure of how well an ANN performs with the current set of weights. Typical loss functions for regression problems include the mean squared error (MSE) and the mean absolute error (MAE). They are defined as

Definition 4.1.1.

MSE = (1/n) ∑_{i=1}^{n} (y_i^predict − y_i^actual)²    (4.1)

Definition 4.1.2.

MAE = (1/n) ∑_{i=1}^{n} |y_i^predict − y_i^actual|    (4.2)

where y_i^predict is the i-th predicted output and y_i^actual is the i-th actual output. Subsequently, this information is propagated backward from the output layer to the input layer, and the gradients of the loss function with respect to the weights are computed. The algorithm then adjusts the weights accordingly to minimise the loss function. An iteration is one complete cycle of forward and backward propagation.
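In code, the two loss functions are one-liners; a minimal NumPy sketch:

import numpy as np

def mse(y_actual, y_predict):
    # Definition 4.1.1, equation (4.1)
    return np.mean((y_predict - y_actual) ** 2)

def mae(y_actual, y_predict):
    # Definition 4.1.2, equation (4.2)
    return np.mean(np.abs(y_predict - y_actual))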

ANN training is an unconstrained optimisation problem with the loss function as the objective function. The algorithm navigates the search space to find optimal weights that minimise the loss function (e.g. the MSE). It can be expressed as

min_w (1/n) ∑_{i=1}^{n} (y_i^predict − y_i^actual)²    (4.3)

where w is the vector that contains the weights of all the connections within the ANN.

4.1.5 Activation Functions

An activation function is a mathematical function applied to the output of a neuron to determine whether it should be activated (fired) or not, based on the relevance of the neuron's inputs to the prediction. This is analogous to neuronal firing activity in the human brain.

Activation functions can be linear or non-linear. A linear activation function is only suitable when the functional relationship is linear; non-linear activation functions are used for most applications. In our case, non-linear activation functions are more suitable, as the pricing and implied volatility functional mappings are complex.

Common activation functions include:


Definition 4.1.3. Rectified Linear Unit (ReLU)

f_ReLU(x) = { 0 if x ≤ 0;  x if x > 0 } ∈ [0, ∞)    (4.4)

Definition 4.1.4. Exponential Linear Unit (ELU)

f_ELU(x) = { α(eˣ − 1) if x ≤ 0;  x if x > 0 } ∈ (−α, ∞)    (4.5)

where α > 0. The value of α is usually chosen to be 1 for most applications.

Definition 4.1.5. Linear (identity)

f_linear(x) = x ∈ (−∞, ∞)    (4.6)

4.1.6 Optimisation Algorithms and Learning Rate

The learning rate refers to the amount of change applied to the weights in each iteration of the training process. A high learning rate results in fast training, but there is a risk of divergence, whereas a low learning rate may slow the training down.

Common optimisation algorithms for training an ANN are Stochastic Gradient Descent (SGD) and its extensions, such as RMSProp and Adam. The SGD pseudocode is the following:

Algorithm 1 Stochastic Gradient Descent

Initialise a vector of random weights w and a learning rate lr
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        w_new = w_current − lr · ∂Loss/∂w_current
    end for
end while

SGD is popular because it is simple to implement and can provide good results for most applications. In SGD the learning rate lr is fixed throughout the process. This is an issue, because the appropriate learning rate is difficult to determine: a high learning rate is prone to divergence, whereas a low learning rate can result in a slow training process. Hence, improvements have been made to SGD to make the learning rate adaptive. Root Mean Square Propagation (RMSProp) is one of the extensions that allows the learning rate to vary throughout the training process (Hinton et al., 2015). It divides the learning rate by an exponential moving average of past squared gradients. The author suggests an exponential decay rate of b = 0.9; the pseudocode is presented below.

Algorithm 2 Root Mean Square Propagation (RMSProp)

Initialise a vector of random weights w, a learning rate lr, v = 0 and b = 0.9
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        v_current = b · v_previous + (1 − b) [∂Loss/∂w_current]²
        w_new = w_current − lr/√(v_current + ε) · ∂Loss/∂w_current
    end for
end while

where ε is a small number to prevent division by zero.

A further improved version, called Adaptive Moment Estimation (Adam), was proposed by (Kingma et al., 2015). RMSProp adapts the learning rate using the second moment of the gradients only; Adam improves on this by additionally using an exponential moving average of the first moment (the mean) of the gradients. An advantage of Adam is that it can handle sparse gradients on noisy data. The pseudocode of Adam is given below.

Algorithm 3 Adaptive Moment Estimation (Adam)

Initialise a vector of random weights w, a learning rate lr, v = 0, m = 0, b = 0.9 and b1 = 0.999
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        m_current = b · m_previous + (1 − b) ∂Loss/∂w_current
        v_current = b1 · v_previous + (1 − b1) [∂Loss/∂w_current]²
        m̂_current = m_current / (1 − b^i)
        v̂_current = v_current / (1 − b1^i)
        w_new = w_current − lr/(√(v̂_current) + ε) · m̂_current
    end for
end while

where b is the exponential decay rate for the first moment and b1 is the exponential decay rate for the second moment.
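In practice one rarely implements these update rules by hand. Assuming TensorFlow 2's Keras, the three optimisers discussed above are available directly, with decay rates matching the suggested b = 0.9 and b1 = 0.999:

from tensorflow import keras

sgd = keras.optimizers.SGD(learning_rate=0.01)                      # Algorithm 1
rmsprop = keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9)    # Algorithm 2
adam = keras.optimizers.Adam(learning_rate=0.001,
                             beta_1=0.9, beta_2=0.999)              # Algorithm 3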

4.1.7 Epochs, Early Stopping and Batch Size

An epoch is one complete pass of the entire training data set through the ANN. Generally, we want the number of epochs to be as high as possible; however, this may cause the ANN to over-fit. Early stopping is useful here to stop the training before the ANN becomes over-fit: it works by stopping the training process when the loss function shows no improvement. The batch size is the number of input samples processed before the parameters are updated.
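In Keras, early stopping is a callback; a minimal sketch using the 25-epoch patience adopted later in Chapter 6:

from tensorflow.keras.callbacks import EarlyStopping

# Stop when the validation loss has not improved for 25 consecutive epochs
early_stop = EarlyStopping(monitor="val_loss", patience=25,
                           restore_best_weights=True)
# model.fit(..., epochs=200, callbacks=[early_stop])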

4.2 Feedforward Neural Networks

Feedforward Neural Networks are the most common type of ANN in practical applications. They are called feedforward because the training data flows only from the input layer to the output layer; there is no feedback loop within the network.

4.2.1 Deep Neural Networks

Deep Neural Networks (DNN) are Feedforward Neural Networks with more than one hidden layer, where the depth refers to the number of hidden layers. This is the type of ANN of interest to us, and the following theorems provide justification.

Theorem 4.2.1 (Universal Approximation Theorem, Hornik et al., 1989). Let NN^a_{d_in,d_out} be the set of neural networks with activation function f_a : R → R, input dimension d_in ∈ N and output dimension d_out ∈ N. Then, if f_a is continuous and non-constant, NN^a_{d_in,d_out} is dense in the Lebesgue space L^p(µ) for all finite measures µ.

Theorem 4.2.2 (Universal Approximation Theorem for Derivatives, Hornik et al., 1990). Let the function to be approximated be F* ∈ C^n, let F : R^{d_0} → R be the mapping of the neural network, and let NN^a_{d_in,1} be the set of single-layer neural networks with activation function f_a : R → R, input dimension d_in ∈ N and output dimension 1. Then, if the (non-constant) activation function satisfies f_a ∈ C^n(R), NN^a_{d_in,1} arbitrarily approximates F* and all its derivatives up to order n.

Theorem 4.2.3 (The Power of Depth of Neural Networks, Eldan et al., 2016). There exists a simple function on R^d, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network to more than a certain constant accuracy unless its width is exponential in the dimension.

According to Theorem 4.2.2, it is crucial to have an activation function that is n-times differentiable if approximation of the n-th derivative of the function F* is required. Theorem 4.2.3 motivates the choice of a deep network: it states that depth can improve the predictive capabilities of a network. However, it is worth noting that networks deeper than 4 hidden layers do not provide much performance gain (Ioffe et al., 2015).

4.3 Common Pitfalls and Remedies

Here we discuss some common pitfalls in ANN training and the corresponding remedies.


4.3.1 Loss Function Stagnation

During training, the loss function may display only infinitesimal improvements after each epoch.

To resolve this problem:

• Increase the batch size. This could potentially fix the issue, as it helps the ANN model learn the relationships between data samples and data labels better before updating the parameters.

• Increase the learning rate. Too small a learning rate can cause the ANN model to become stuck in local minima.

• Use a deeper network. According to the power of depth theorem, deeper networks tend to perform better than shallower ones. However, if the ANN model already has 4 hidden layers, this method may not be suitable, as there is not much performance gain beyond that (see Chapter 4.2.1).

• Use a different activation function for the output layer. Make sure the range of the output activation function is appropriate for the problem.

4.3.2 Loss Function Fluctuation

During training, the loss function may fluctuate noticeably from epoch to epoch. Potential fixes include:

• Increase the batch size. The ANN model might not be able to learn the true relationships between input and output data before each update when the batch size is too small.

• Lower the initial learning rate.

4.3.3 Over-fitting

Over-fitting occurs when the ANN model performs well on training data but fails to generalise on unseen test data.

Potential fixes include:

• Reduce the complexity of the ANN model. Limit the number of hidden layers to 4, as (Eldan et al., 2016) show that not much advantage can be achieved beyond 4 hidden layers. Furthermore, experiment with narrower (fewer neurons) hidden layers.

• Introduce early stopping. Reduce the number of epochs if more epochs show only little improvement.

• Regularisation. This can reduce the complexity of the loss function and speed up convergence.


• K-fold cross-validation. This works by splitting the training data into K equal-sized portions: K − 1 portions are used for training and 1 portion is used for validation. This is repeated K times, with a different portion used for validation each time.

4.3.4 Under-fitting

Under-fitting occurs when the ANN model fails to make sense of the relationships between data samples and labels.

Potential fixes include:

• Increase the size of the training data. Complex functional mappings require the ANN model to train on more data before it can learn the relationships well.

• Increase the depth of the model. It is very likely that the model is not deep enough for the problem. As a rule of thumb, limit the number of hidden layers to 3-4 for complex problems.

4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation

For ANN's application to implied volatility prediction, we apply the image-based implicit method proposed by (Horvath et al., 2019), allegedly a powerful and robust method for approximating implied volatility. Rather than using one implied volatility value as the network output, the image-based implicit method uses the entire implied volatility surface as the output. The implied volatility surface is represented by 10 × 10 pixels, with each pixel corresponding to one maturity and one strike price. The input data corresponding to one implied volatility surface output comprises all the model parameters except the strike price and maturity, which are implicitly learned by the network. This method offers the following benefits:

• More information is learned by the network with less input data, since the information of neighbouring volatility points is incorporated in the training process.

• The network learns the mapping between the model parameters and the implied volatility surface directly. The volatility smile and term structure are available simultaneously, so it is more efficient than having only one single implied volatility value as output.

• The neighbouring volatility points can be numerically approximated almost instantly if needed. (A small illustration of the output layout follows this list.)
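As an illustration (hypothetical names; the strike and maturity grids are those used later in Chapter 5), the 100 network outputs are simply reshaped into the surface:

import numpy as np

strikes = np.round(np.linspace(0.5, 1.4, 10), 1)
maturities = np.round(np.linspace(0.2, 2.0, 10), 1)

y_pred = np.random.rand(100)    # stand-in for one row of model.predict(...)
surface = y_pred.reshape(len(maturities), len(strikes))
smile_at_first_maturity = surface[0]    # the 10 strikes at T = 0.2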


FIGURE 4.2: The pixels of the implied volatility surface are represented by the outputs of the neural network.


Chapter 5

Data

In this chapter we discuss the data for our applications. We explain how the data is generated and describe some essential data pre-processing procedures.

Simulated data is chosen to train the ANN model mainly for regulatory reasons. One might argue that training the ANN with market data directly would yield better predictions; however, evidence for the stability and robustness of this approach is yet to be found, and there is no straightforward interpretation between the inputs and outputs of an ANN model trained this way. The main advantage of training an ANN with simulated data is that the functional relationship is known, so the model is tractable. This property is important, as tractable models tend to be more transparent, which is required by regulators.

For neural network computation, Python is the logical language choice, as it has many powerful machine learning libraries such as Keras. To obtain good results, we require 5,000,000 rows of data for the ANN experiment; hence we choose C++ instead of Python for the data generation process, as it is computationally faster. We also use the lightweight, header-only Python library pybind11, which can harmoniously integrate C++ and Python in one project (see Appendix A for a tutorial on pybind11).

5.1 Data Generation and Storage

The data generation process is done entirely in C++, mainly because of its execution speed. Initially we considered applying GPU parallel computing to speed up the process even further; after consulting with the supervisor, we decided to use the std::future class template instead. std::future belongs to the standard C++11 library and is very simple to implement compared to parallel computing: it allows the C++ program to run multiple functions asynchronously on different threads (see page 942 in (Duffy, 2018) for an example). Our machine contains 2 physical cores with 2 threads per physical core. This allows us to run 4 operations asynchronously without overhead costs and therefore, in theory, improves the data generation speed four-fold. Note that asynchronous programming is not meant to replace parallel computing completely; in cases where execution speed is absolutely critical, parallel computing should be used. Without asynchronous programming, the data generation process for 5,000,000 rows of Heston option price data would take more than 8 hours; now it takes less than 2 hours to complete.

We follow the steps outlined in Appendix A to wrap the C++ program using the pybind11 Python library. The wrapped C++ function can then be called in Python just like any other Python library. The pybind11 library is conceptually very similar to the Boost library; we choose pybind11 because it is a lightweight, header-only library that is easy to use. This enables us to enjoy both the execution speed of C++ and access to the machine learning libraries of Python at the same time.

After the data is generated, it is stored in MySQL so that we do not need to re-generate it each time we restart the computer or when the Python IDE crashes unexpectedly. We use the mysql-connector Python library to communicate with MySQL from Python.

5.1.1 Heston Call Option Data

We follow the method and theory described in Chapter 3 to compute the call option price under the Heston model. We only consider call options in this project; to train an ANN that can also predict put option prices, one can simply add an extra input feature that takes only two values, 1 or -1, with 1 representing a call option and -1 a put option.

For this application we generated 5,000,000 rows of data, uniformly distributed over a specified range. In this context, one row of data contains all the input features and the corresponding output. We use 10 model parameters as the input features: spot price (S), strike price (K), risk-free rate (r), dividend yield (D), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ). In this case there is only one output, the call option price. See Table 5.1 for the respective ranges.

TABLE 5.1: Heston Model Call Option Data

         Model Parameter                 Range
Input    Spot Price (S)                  ∈ [20, 80]
         Strike Price (K)                ∈ [40, 55]
         Risk-free Rate (r)              ∈ [0.03, 0.045]
         Dividend Yield (D)              ∈ [0, 0.1]
         Instantaneous Volatility (ν)    ∈ [0.2, 0.6]
         Maturity (T)                    ∈ [0.03, 2.0]
         Long-term Volatility (θ)        ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)         ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)    ∈ [0.2, 0.5]
         Correlation (ρ)                 ∈ [−0.95, 0.95]
Output   Call Option Price (C)           ∈ R_{>0}


5.1.2 Heston Implied Volatility Data

We generate the Heston implied volatility data with the theory explained in Chapter 3. For the ANN application in implied volatility approximation, we use the image-based implicit method, in which the ANN output layer dimension is 100, with each output representing one pixel on the implied volatility surface.

Using this method, we need 100 times more data in order to have the same number of distinct rows as the normal point-wise method (e.g. Heston call option pricing). However, due to hardware constraints, we generate 10,000,000 rows of uniformly distributed data, which translates to only 100,000 distinct rows.

We use 6 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ).

Note that strike price and maturity are not used as input features, as mentioned in Chapter 4.4: the shape of one full implied volatility surface depends on the 6 model parameters above, while the strike price and maturity correspond to one specific point on that surface. Nevertheless, we still require these two parameters to generate implied volatility data. We use the following strike price and maturity values:

• strike price ∈ {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity ∈ {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

In this case there are 10 × 10 = 100 outputs, i.e. the implied volatility surface. See Table 5.2 for the respective ranges.

TABLE 5.2: Heston Model Implied Volatility Data

         Model Parameter                       Range
Input    Spot Price (S)                        ∈ [0.2, 2.5]
         Instantaneous Volatility (ν)          ∈ [0.2, 0.6]
         Long-term Volatility (θ)              ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)               ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)          ∈ [0.2, 0.5]
         Correlation (ρ)                       ∈ [−0.95, 0.95]
Output   Implied Volatility Surface (σ_imp)    ∈ R^{10×10}_{>0}

5.1.3 Rough Heston Call Option Data

Again, we follow the theory in Chapter 3 and implement it in C++ to generate call option prices under the rough Heston model.

Similarly, we only consider call options here and generate 5,000,000 rows of data. We use 9 model parameters as the input features: strike price (K), risk-free rate (r), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). In this case there is only one output, the call option price. See Table 5.3 for the respective ranges.

TABLE 5.3: Rough Heston Model Call Option Data

         Model Parameter                 Range
Input    Strike Price (K)                ∈ [0.5, 5]
         Risk-free Rate (r)              ∈ [0.2, 0.5]
         Instantaneous Volatility (ν)    ∈ [0, 1.0]
         Maturity (T)                    ∈ [0.1, 3.0]
         Long-term Volatility (θ)        ∈ [0.01, 0.1]
         Mean Reversion Rate (κ)         ∈ [1.0, 4.0]
         Volatility of Volatility (ξ)    ∈ [0.01, 0.5]
         Correlation (ρ)                 ∈ [−0.95, 0.95]
         Hurst Parameter (H)             ∈ [0.05, 0.1]
Output   Call Option Price (C)           ∈ R_{>0}

5.1.4 Rough Heston Implied Volatility Data

Finally, we generate the rough Heston implied volatility data with the theory explained in Chapter 3. As before, we use the image-based implicit method.

The data size we use is 10,000,000 rows, which translates to 100,000 distinct rows of data. We use 7 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). We use the following strike price and maturity values:

• strike price ∈ {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity ∈ {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

As before, there are 10 × 10 = 100 outputs, i.e. the rough Heston implied volatility surface. See Table 5.4 for the respective ranges.

TABLE 5.4: Rough Heston Model Implied Volatility Data

         Model Parameter                       Range
Input    Spot Price (S)                        ∈ [0.2, 2.5]
         Instantaneous Volatility (ν)          ∈ [0.2, 0.6]
         Long-term Volatility (θ)              ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)               ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)          ∈ [0.2, 0.5]
         Correlation (ρ)                       ∈ [−0.95, 0.95]
         Hurst Parameter (H)                   ∈ [0.05, 0.1]
Output   Implied Volatility Surface (σ_imp)    ∈ R^{10×10}_{>0}


5.2 Data Pre-processing

In this section we discuss some necessary data pre-processing procedures: data splitting for validation and evaluation (testing), data standardisation and normalisation, and data partitioning for K-fold cross-validation.

5.2.1 Data Splitting: Train - Validate - Test

Generally, not 100% of the available data is used for training; we reserve some of the data for validation and for performance evaluation.

We split the data into three sets: 70% (train), 15% (validate) and 15% (test). The first 70% of the data is used for training, whilst 15% of the data is used for validation during training. The last 15% is for evaluating the trained ANN's predictive accuracy.

The validation data set is used for tuning the hyperparameters of the ANN; this is important to help prevent over-fitting of the network. The test data set is not used during training: it is the unseen (out-of-sample) data used for evaluating the trained ANN's predictions. A minimal splitting sketch is given below.
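The following illustrative helper (a sketch; the function name is hypothetical) implements the 70/15/15 split, assuming the rows have already been shuffled:

def train_validate_test_split(data, train_frac=0.70, val_frac=0.15):
    # Split a sequence of rows into train / validate / test portions
    n = len(data)
    i = int(train_frac * n)
    j = int((train_frac + val_frac) * n)
    return data[:i], data[i:j], data[j:]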

5.2.2 Data Standardisation and Normalisation

Before the data is fed into the ANN for training, it is essential to scale the data to speed up convergence and improve performance.

The magnitude of each data feature is different. It is essential to re-scale the data so that the ANN model does not misinterpret the features' relative importance based on their magnitudes. Typical scaling techniques include standardisation and normalisation.

Data standardisation is defined as

x_scaled = (x − x_mean) / x_std    (5.1)

where

• x is the data,

• x_mean is the mean of that data feature,

• x_std is the standard deviation of that data feature.

(Horvath et al., 2019) claim that this works especially well for quantities that are positively unbounded, such as implied volatility.

For the other model parameters we apply data normalisation, which is defined as

x_scaled = (2x − (x_max + x_min)) / (x_max − x_min) ∈ [−1, 1]    (5.2)

where

• x is the data,

• x_max is the maximum value of that data feature,

• x_min is the minimum value of that data feature.

After data scaling (standardisation or normalisation), the ANN is able to learn the true relationships between inputs and outputs without bias from the differences in their magnitudes.
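Both scalings operate per feature column; a minimal NumPy sketch:

import numpy as np

def standardise(x):
    # Equation (5.1): zero mean, unit variance per feature (column)
    return (x - x.mean(axis=0)) / x.std(axis=0)

def normalise(x):
    # Equation (5.2): rescale each feature to the interval [-1, 1]
    x_max, x_min = x.max(axis=0), x.min(axis=0)
    return (2.0 * x - (x_max + x_min)) / (x_max - x_min)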

5.2.3 Data Partitioning

In this section we explain how the data is partitioned for K-fold cross-validation.

First, we shuffle the entire training data set. Then the entire data set is divided into K equal-sized portions. Each portion in turn is used as the test data set while the ANN is trained on the other K − 1 portions; this is repeated K times, with each of the K portions used exactly once as the test data set. The average of the K results is the performance measure of the model.
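A sketch of this partitioning using scikit-learn (X and y are hypothetical placeholders for the feature matrix and targets):

import numpy as np
from sklearn.model_selection import KFold

X = np.random.rand(1000, 10)    # placeholder feature matrix
y = np.random.rand(1000)        # placeholder targets

kf = KFold(n_splits=5, shuffle=True)
for train_idx, test_idx in kf.split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # train on the K-1 portions, evaluate on the held-out portion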

FIGURE 5.1: How the data is partitioned for 5-fold cross-validation. Adapted from (Chapter 2 Modeling Process, k-fold cross validation).


Chapter 6

Neural Networks Architecture and Experimental Setup

We start this chapter by discussing the appropriate ANN architecture and experimental setup for our applications. We apply ANN to four different applications: option pricing under the Heston model, Heston implied volatility surface approximation, option pricing under the rough Heston model, and rough Heston implied volatility surface approximation. We first apply ANN to the Heston model as a concept validation and then proceed to the rough Heston model.

Subsequently, we discuss the relevant evaluation metrics and validation methods for our ANN models. Finally, we end this chapter with the implementation steps for ANN in Keras.

6.1 Neural Networks Architecture

Configuring an ANN's architecture is a combination of art and science. It requires experience and systematic experimentation to decide the right parameters and architecture for a specific problem, and for a particular problem there might be multiple appropriate designs. To begin with, we use the network architecture of (Horvath et al., 2019) and adjust the hyperparameters if it performs poorly for our applications.


FIGURE 6.1: Typical ANN Architecture for Option Pricing Application


FIGURE 6.2: Typical ANN Architecture for Implied Volatility Surface Approximation

6.1.1 ANN Architecture: Option Pricing under the Heston Model

An ANN with 3 hidden layers is appropriate for this application. Each hidden layer has 50 neurons, with the exponential linear unit (ELU) as the activation function.

Initially, the rectified linear unit (ReLU) activation function was considered. It is the choice for many applications, as it can overcome the vanishing gradient problem whilst being less computationally expensive than other activation functions. However, it has some major drawbacks:

• The n-th derivative of ReLU is not continuous for all n > 0.

• ReLU cannot produce negative outputs.

• For activation x < 0, ReLU has zero gradient.

From Theorem 4.2.2 we know that choosing an n-times differentiable activation function is necessary for computing the n-th derivatives of the functional mapping. This is crucial for our application because we would be interested in option Greeks computation using the same ANN if it shows promising results for option pricing. For risk management purposes, selecting an activation function from which option Greeks can be calculated is crucial.

The obvious alternative is the ELU (defined in Chapter 4.1.4). For x < 0 it has a smooth output that approaches −α; for x > 0 the output of ELU is positively unbounded, which suits our application, as the values of option price and implied volatility are, in theory, also positively unbounded.

Since our problem is a regression problem, a linear activation function (defined in Chapter 4.1.5) is appropriate for the output layer, as its output is unbounded. We discovered that a batch size of 200 and 30 epochs yield good accuracy.

To summarise, the ANN model for Heston option pricing is as follows (a Keras sketch is given after the list):

• Input Layer: 10 neurons

• Hidden Layer 1: 50 neurons, ELU activation function

• Hidden Layer 2: 50 neurons, ELU activation function

• Hidden Layer 3: 50 neurons, ELU activation function

• Output Layer: 1 neuron, linear activation function
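A minimal Keras sketch of this architecture (an illustration under TensorFlow 2, not the exact project code; see Appendix C for the project's code):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(50, activation="elu", input_shape=(10,)),
    keras.layers.Dense(50, activation="elu"),
    keras.layers.Dense(50, activation="elu"),
    keras.layers.Dense(1, activation="linear"),
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# model.fit(x_train, y_train, batch_size=200, epochs=30, validation_split=0.15)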

6.1.2 ANN Architecture: Implied Volatility Surface of the Heston Model

For approximating the full implied volatility surface of the Heston model, we apply the image-based implicit method (see Chapter 4.4). This implies the output dimension is 100.

We use 3 hidden layers with 80 neurons each. The activation function for the hidden layers is the ELU; as before, we use a linear activation function for the output layer. We discovered that a batch size of 20 and 200 epochs yield good accuracy. Since the number of epochs is relatively high, we introduce an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

To summarise, the ANN model for Heston implied volatility surface approximation is:

• Input Layer: 6 neurons

• Hidden Layer 1: 80 neurons, ELU activation function

• Hidden Layer 2: 80 neurons, ELU activation function

• Hidden Layer 3: 80 neurons, ELU activation function

• Output Layer: 100 neurons, linear activation function


6.1.3 ANN Architecture: Option Pricing under the Rough Heston Model

For option pricing under the rough Heston model, the ANN model is:

• Input Layer: 9 neurons

• Hidden Layer 1: 50 neurons, ELU activation function

• Hidden Layer 2: 50 neurons, ELU activation function

• Hidden Layer 3: 50 neurons, ELU activation function

• Output Layer: 1 neuron, linear activation function

As before, we use a batch size of 200 and 30 epochs.

6.1.4 ANN Architecture: Implied Volatility Surface of the Rough Heston Model

The ANN model for rough Heston implied volatility surface approximation is:

• Input Layer: 7 neurons

• Hidden Layer 1: 80 neurons, ELU activation function

• Hidden Layer 2: 80 neurons, ELU activation function

• Hidden Layer 3: 80 neurons, ELU activation function

• Output Layer: 100 neurons, linear activation function

We use a batch size of 20 and 200 epochs, with an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

6.2 Performance Metrics and Validation Methods

We require metrics to evaluate the performance of the ANN in terms of accuracy and speed.

For accuracy, we introduced the MSE and MAE in Chapter 4.1.4. It is also common to use the coefficient of determination (R²) to quantify the performance of a regression model; it is a statistical measure of the proportion of the variance in the dependent variable that is explained by the independent variables. The formula is

R² = 1 − [∑_{i=1}^{n} (y_i^actual − y_i^predict)²] / [∑_{i=1}^{n} (y_i^actual − ȳ^actual)²]    (6.1)


We also suggest a user-defined accuracy metric, the Maximum Absolute Error, defined as

Max Abs Error = max_i |y_i^actual − y_i^predict|    (6.2)

This metric provides a measure of how large the error could be in the worst-case scenario; it is especially important from a risk management perspective.
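Both user-defined metrics can be written as Keras-compatible functions; a minimal sketch (the function names are illustrative, assuming TensorFlow 2):

import tensorflow as tf

def max_abs_error(y_true, y_pred):
    # Equation (6.2): worst-case absolute error over a batch
    return tf.reduce_max(tf.abs(y_true - y_pred))

def r_squared(y_true, y_pred):
    # Equation (6.1): coefficient of determination
    ss_res = tf.reduce_sum(tf.square(y_true - y_pred))
    ss_tot = tf.reduce_sum(tf.square(y_true - tf.reduce_mean(y_true)))
    return 1.0 - ss_res / ss_tot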

As for the run-time performance of the ANN, we simply use the Python built-in timer function to quantify this property.

In Chapter 5.2.3 we explained the data partitioning process. The motivation for partitioning data is K-fold cross-validation. Cross-validation is a statistical technique to assess how well a predictive model generalises to an independent data set; it is a good tool for identifying over-fitting.

As explained earlier, the process involves training and testing different permutations of the partitioned data sets multiple times. The indication of a well-generalised model is that the error magnitudes in each round are consistent with each other and with their average. We select K = 5 for our applications.


6.3 Implementation of ANN in Keras

In this section we outline the standard ANN computation process using Keras:

1. Construct the ANN model by specifying the number of input neurons, the number of hidden neurons, the number of hidden layers, the number of output neurons and the activation functions for each layer. Print the model summary to check for errors.

2. Configure the settings of the model using the compile function. We select the MSE as the loss function, and we specify the MAE, R² and our user-defined maximum absolute error as the metrics. For all of our applications we choose Adam as the optimisation algorithm. To prevent over-fitting, we use the early stopping function, especially when the number of epochs is high. It is important to note that the validation loss (not the training loss) should be used as the early stopping criterion; this means the training stops early if there is no reduction in validation loss.

3. Start training the ANN model. It is important to specify the validation split argument here for splitting the training data into a train set and a validation set. Note that the test set should not be used as the validation set; it should be reserved for evaluating the ANN's predictions after training.

4. When the training is complete, generate the plot of MSE (loss function) versus the number of epochs to visualise the training process. The MSE should decline gradually and then level off as the number of epochs increases. See Chapter 4.3 for potential training issues and remedies.

5. Apply 5-fold cross-validation to check for over-fitting. Evaluate the trained model using the evaluate function.

6. Generate predictions. Visualise the predictions by plotting predicted values versus actual values on the same graph.

See Appendix C for a full Python code example; a compact sketch of steps 2-3 follows.
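This sketch reuses the illustrative model of Chapter 6.1.1 and the metric functions of Chapter 6.2 (max_abs_error, r_squared); all names are placeholders, not the project's exact code:

from tensorflow.keras.callbacks import EarlyStopping

model.compile(optimizer="adam", loss="mse",
              metrics=["mae", r_squared, max_abs_error])

early_stop = EarlyStopping(monitor="val_loss", patience=25)
history = model.fit(x_train, y_train,
                    batch_size=200, epochs=30,
                    validation_split=0.15,   # carve validation set from training data
                    callbacks=[early_stop])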


6.4 Summary of Neural Networks Architecture

TABLE 6.1: Summary of Neural Networks Architecture for Each Application

ANN Model                             | Application                              | Input Layer | Hidden Layers          | Output Layer | Batch Size | Epochs
Heston Option Pricing ANN             | Heston Call Option Pricing               | 10 neurons  | 3 layers × 50 neurons  | 1 neuron     | 200        | 30
Heston Implied Volatility ANN         | Heston Implied Volatility Surface        | 6 neurons   | 3 layers × 80 neurons  | 100 neurons  | 20         | 200
Rough Heston Option Pricing ANN       | Rough Heston Call Option Pricing         | 9 neurons   | 3 layers × 50 neurons  | 1 neuron     | 200        | 30
Rough Heston Implied Volatility ANN   | Rough Heston Implied Volatility Surface  | 7 neurons   | 3 layers × 80 neurons  | 100 neurons  | 20         | 200


Chapter 7

Results and Discussions

In this chapter we present our results and discuss our findings.

7.1 Heston Option Pricing ANN Results

Figure 7.1 shows the learning curves of the Heston option pricing ANN; the loss function used here is the MSE. As the ANN is trained, both the training and validation loss curves decline until they converge. It took fewer than 30 epochs for the ANN to reach a satisfactory level of accuracy, and there is only a small gap between the training and validation loss curves, which indicates the ANN model is well trained.

FIGURE 7.1: Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN

Table 7.1 summarises the error metrics of the Heston option pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 2.00 × 10⁻⁶, 9.58 × 10⁻⁴, 6.50 × 10⁻³ and 0.999, respectively. The low error metrics and the high R² value imply the ANN model generalises well and gives highly accurate predictions for option prices under the Heston model. Figure 7.2 attests to its high accuracy: almost all of the predictions lie on the perfect-prediction line (red line).

TABLE 7.1: Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
2.00 × 10⁻⁶   9.58 × 10⁻⁴   6.50 × 10⁻³     0.999

FIGURE 7.2: Plot of Predicted vs Actual Values for Heston Option Pricing ANN

TABLE 7.2: 5-fold Cross-Validation Results of Heston Option Pricing ANN

          MSE           MAE           Max Abs Error
Fold 1    5.78 × 10⁻⁷   5.45 × 10⁻⁴   2.04 × 10⁻³
Fold 2    9.90 × 10⁻⁷   7.69 × 10⁻⁴   2.40 × 10⁻³
Fold 3    7.15 × 10⁻⁷   6.21 × 10⁻⁴   2.20 × 10⁻³
Fold 4    1.24 × 10⁻⁶   8.67 × 10⁻⁴   2.53 × 10⁻³
Fold 5    6.54 × 10⁻⁷   5.79 × 10⁻⁴   2.21 × 10⁻³
Average   8.36 × 10⁻⁷   6.76 × 10⁻⁴   2.27 × 10⁻³

Table 7.2 shows the results from 5-fold cross-validation, which again confirm that the ANN model gives accurate predictions and generalises well to unseen data. The values of each fold are consistent with each other and with their average values.


7.2 Heston Implied Volatility ANN Results

Figure 7.3 shows the learning curves of the Heston implied volatility ANN; again, we use the MSE as the loss function. The validation loss curve experiences some fluctuations during training; we experimented with various ANN setups, and this is the least fluctuating learning curve we could achieve. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level, and the gap between the validation and training losses is negligible. Due to the comparatively more complex ANN architecture, it took about 200 epochs for the ANN to reach a satisfactory level of accuracy.

FIGURE 7.3: Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN

Table 7.3 summarises the error metrics of the Heston implied volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 6.24 × 10⁻⁴, 1.27 × 10⁻², 3.71 × 10⁻¹ and 0.999, respectively. The low error metrics and the high R² value imply the ANN model generalises well and gives accurate predictions for the implied volatility surface under the Heston model. Figure 7.4 shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities: almost all of the predicted values are consistent with the actual values.

TABLE 7.3: Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
6.24 × 10⁻⁴   1.27 × 10⁻²   3.71 × 10⁻¹     0.999


FIGURE 7.4: Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.5: Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.5 shows the MAE values of the Heston implied volatility ANN predictions across the entire implied volatility surface. The largest MAE value is 0.0045, and a large area of the surface has MAE values lower than 0.0030. It is worth noting that the accuracy is relatively poor when the strike price is at the extremes and the maturity is small; however, the ANN gives good predictions overall for the whole surface.

The 5-fold cross-validation results for the Heston implied volatility ANN model, summarised in Table 7.4, show consistent values, indicating the model is well trained and able to generalise to unseen test data.

TABLE 7.4: 5-fold Cross-Validation Results of Heston Implied Volatility ANN

          MSE           MAE           Max Abs Error
Fold 1    8.15 × 10⁻⁴   1.13 × 10⁻²   3.88 × 10⁻¹
Fold 2    4.74 × 10⁻⁴   1.08 × 10⁻²   3.13 × 10⁻¹
Fold 3    1.37 × 10⁻³   1.74 × 10⁻²   4.46 × 10⁻¹
Fold 4    3.38 × 10⁻⁴   8.61 × 10⁻³   2.19 × 10⁻¹
Fold 5    4.60 × 10⁻⁴   1.02 × 10⁻²   2.67 × 10⁻¹
Average   6.92 × 10⁻⁴   1.12 × 10⁻²   3.27 × 10⁻¹

7.3 Rough Heston Option Pricing ANN Results

Figure 7.6 shows the learning curves of the rough Heston option pricing ANN; as before, the MSE is used as the loss function. As training progresses, both the training and validation loss curves decline until they converge. Since this is a relatively simple model, it took fewer than 30 epochs of training to reach a good level of accuracy, and the small discrepancy between the training and validation losses indicates the ANN model is well trained.

FIGURE 7.6: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN

Table 7.5 summarises the error metrics of the rough Heston option pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 9.00 × 10⁻⁶, 2.28 × 10⁻³, 1.76 × 10⁻¹ and 0.999, respectively. These metrics imply the ANN model generalises well and gives highly accurate predictions for option prices under the rough Heston model. Figure 7.7 attests to its high accuracy again: most of the predictions lie on the perfect-prediction line (red line).

TABLE 7.5: Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
9.00 × 10⁻⁶   2.28 × 10⁻³   1.76 × 10⁻¹     0.999


FIGURE 7.7: Plot of Predicted vs Actual Values for Rough Heston Option Pricing ANN

TABLE 7.6: 5-fold Cross-Validation Results of Rough Heston Option Pricing ANN

          MSE           MAE           Max Abs Error
Fold 1    3.79 × 10⁻⁶   1.35 × 10⁻³   5.99 × 10⁻³
Fold 2    2.33 × 10⁻⁶   9.96 × 10⁻⁴   4.89 × 10⁻³
Fold 3    2.06 × 10⁻⁶   9.88 × 10⁻⁴   4.41 × 10⁻³
Fold 4    2.16 × 10⁻⁶   1.08 × 10⁻³   4.07 × 10⁻³
Fold 5    1.84 × 10⁻⁶   1.01 × 10⁻³   3.74 × 10⁻³
Average   2.44 × 10⁻⁶   1.08 × 10⁻³   4.62 × 10⁻³

The highly consistent values from 5-fold cross-validation again confirm that the ANN model is able to generalise to unseen data.

7.4 Rough Heston Implied Volatility ANN Results

Figure 7.8 shows the learning curves of the rough Heston implied volatility ANN. The validation loss curve experiences some fluctuations during training; again, we selected the least fluctuating setup. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level, and the discrepancy between the validation and training losses is negligible. The training was initially set to 200 epochs, but it stopped early at around 85, since the validation loss had not improved for 25 epochs; it achieves good accuracy nonetheless.


FIGURE 7.8: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN

Table 7.7 summarises the error metrics of the rough Heston implied volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 5.66 × 10⁻⁴, 1.08 × 10⁻², 3.49 × 10⁻¹ and 0.999, respectively. These metrics show the ANN model produces accurate predictions on unseen data. This is again confirmed by Figure 7.9, which shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities: almost all of the predicted values are consistent with the actual values.

TABLE 7.7: Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
5.66 × 10⁻⁴   1.08 × 10⁻²   3.49 × 10⁻¹     0.999


FIGURE 7.9: Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.10: Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.10 shows the MAE values of the rough Heston implied volatility ANN predictions across the entire implied volatility surface. The largest MAE value is below 0.0030, and a large area of the surface has MAE values lower than 0.0015. It is worth noting that the accuracy is relatively poor when the strike price and maturity values are small; however, the ANN gives good predictions overall for the whole surface. In fact, it is more accurate than the Heston implied volatility ANN model, despite having one extra input and a similar architecture; we attribute this entirely to the stochastic nature of the optimiser.

The 5-fold cross-validation results for the rough Heston implied volatility ANN model, summarised in Table 7.8, show consistent values, just like the previous three cases, indicating the model is well trained and able to generalise to unseen test data.

TABLE 7.8: 5-fold Cross-Validation Results of Rough Heston Implied Volatility ANN

          MSE           MAE           Max Abs Error
Fold 1    8.15 × 10⁻⁴   1.13 × 10⁻²   3.88 × 10⁻¹
Fold 2    4.74 × 10⁻⁴   1.08 × 10⁻²   3.13 × 10⁻¹
Fold 3    1.37 × 10⁻³   1.74 × 10⁻²   4.46 × 10⁻¹
Fold 4    3.38 × 10⁻⁴   8.61 × 10⁻³   2.19 × 10⁻¹
Fold 5    4.60 × 10⁻⁴   1.02 × 10⁻²   2.67 × 10⁻¹
Average   6.92 × 10⁻⁴   1.12 × 10⁻²   3.27 × 10⁻¹


7.5 Run-time Performance

In this section we present the training time and the execution speed of the ANN versus the traditional methods. The speed tests are run on a machine with an Intel i5-7200U chip (up to 3.10 GHz) and 8 GB of RAM.

Table 7.9 summarises the training time required for each application. Although the training time per epoch for the option pricing applications is higher than that for the implied volatility applications, the resulting total training time is similar for both types of applications, because the number of epochs for training the implied volatility ANN models is higher.

We also emphasise that fewer distinct rows of data are available for implied volatility ANN training; thus the training time is similar to that of option pricing ANN training despite the more complex ANN architecture.

TABLE 7.9: Training Time

Application                   Training Time per Epoch   Total Training Time
Heston Option Pricing         30 s                      900 s
Heston Implied Volatility     5 s                       1000 s
rHeston Option Pricing        30 s                      900 s
rHeston Implied Volatility    5 s                       1000 s

TABLE 7.10: Execution Time for Predictions using ANN and Traditional Methods

Application                   Traditional Methods   ANN
Heston Option Pricing         8.84 ms               57.9 ms
Heston Implied Volatility     2.31 ms               43.7 ms
rHeston Option Pricing        6.52 ms               52.6 ms
rHeston Implied Volatility    2.87 ms               48.3 ms

Table 7.10 summarises the execution time for generating predictions using the ANN and the traditional methods. Before the experiment, we expected the ANN to generate predictions faster. However, the results show that the ANN is actually up to 7 times slower than the traditional methods for option pricing (both Heston and rough Heston) and up to 18 times slower for implied volatility surface approximation (both Heston and rough Heston). This shows the traditional approximation methods are more efficient.


Chapter 8

Conclusions and Outlook

To summarise, the results show that an ANN can approximate option prices and implied volatilities under the Heston and rough Heston models to a high degree of accuracy. However, the efficiency of the ANN is not as good as that of the traditional methods; hence we conclude that such an approach may be considered an ineffective alternative. There are more efficient methods, such as the numerical integration of the FFT for option price computation, and the traditional method we used for approximating implied volatility does not even require any numerical scheme, hence its efficiency.

As for future projects, it would be interesting to investigate the proven rough Bergomi model and applications with exotic options, such as lookback and digital options. Moreover, it would also be worthwhile to apply gradient-free optimisers for ANN training; interesting examples include the differential evolution, dual annealing and simplicial homology global optimisers.


Appendix A

pybind11: A Step-by-step Tutorial with Examples

This tutorial assumes basic knowledge of C++ and Python.

Sources: Create a C++ extension for Python; Using the C++ eigen library to calculate matrix inverse and determinant; pybind11 Documentation; Eigen: The Matrix Class.

A.1 Software Prerequisites

• Visual Studio 2017 or later, with both the Desktop Development with C++ and Python Development workloads installed with default options.

• In the Python Development workload, also select the box on the right for Python native development tools. This option sets up most of the configuration required. (This option also includes the C++ workload automatically.)

Source: Create a C++ extension for Python


A.2 Create a Python Project

1. To create a Python project in Visual Studio, select File > New > Project. Search for Python, select the Python Application template, give it a suitable name and location, and select OK.

2. pybind11 requires that you use a 32-bit Python interpreter (Python 3.6 or above recommended). In the Solution Explorer window of Visual Studio, expand the project node, then expand the Python Environments node. If you don't see a 32-bit environment as the default (either in bold or labeled with "global default"), right-click the Python Environments node and select Add Environment, or select Add Environment from the environment drop-down in the Python toolbar.

Once in the Add Environment dialog box, select the Existing environment tab, then select the 32-bit interpreter from the Environment drop-down list.

If you don't have a 32-bit interpreter installed, you can install standard Python interpreters from the Add Environment dialog: select the Add Environment command in the Python Environments window or the Python toolbar, select the Python installation tab, indicate which interpreters to install, and select Install.


If you already added an environment other than the global default to a project, you may need to activate a newly added environment: right-click that environment under the Python Environments node and select Activate Environment. To remove an environment from the project, select Remove.


A.3 Create a C++ Project

1. A Visual Studio solution can contain both Python and C++ projects together. To do so, right-click the solution in Solution Explorer and select Add > New Project.

2. Search on C++, select Empty project, specify the name test, and select OK.

3. Create a C++ file in the new project by right-clicking the Source Files node, then selecting Add > New Item; select C++ File, name it main.cpp, and select OK.

4. Right-click the C++ project in Solution Explorer and select Properties.

5. At the top of the Property Pages dialog that appears, set Configuration to All Configurations and Platform to Win32.


6. Set the specific properties as described in the following table, then select OK.

7. Right-click the C++ project and select Build to test your configurations (both Debug and Release). The .pyd files are located in the solution folder under Debug and Release, not in the C++ project folder itself.

8. Add the following code to the C++ project's main.cpp file:

#include <iostream>

int main() {
    std::cout << "Hello World from C++" << std::endl;
    return 0;
}

9. Build the C++ project again to confirm the code is working.

A.4 Convert the C++ Project to an Extension for Python

To make the C++ DLL into an extension for Python, first modify the exported methods to interact with Python types, then add a function that exports the module (the main function).

1. Install pybind11 using pip:

pip install pybind11

or

py -m pip install pybind11

2. At the top of main.cpp, include pybind11.h:

#include <pybind11/pybind11.h>


3. At the bottom of main.cpp, use the PYBIND11_MODULE macro to define the entry point to the C++ function:

namespace py = pybind11;

PYBIND11_MODULE(test, m) {
    m.def("main_func", &main, R"pbdoc(
        pybind11 simple example
    )pbdoc");

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

4. Set the target configuration to Release and build the C++ project to verify your code.

The C++ module may fail to compile for the following reasons:

• Unable to locate Python.h (E1696: cannot open source file "Python.h" and/or C1083: Cannot open include file: "Python.h": No such file or directory): verify that the path in C/C++ > General > Additional Include Directories in the project properties points to your Python installation's include folder. See the table in Step 6.

• Unable to locate Python libraries: verify that the path in Linker > General > Additional Library Directories in the project properties points to the Python installation's libs folder. See the table in Step 6.

• Linker errors related to target architecture: change the C++ target's project architecture to match that of your Python installation. For example, if you're targeting x64 with the C++ project but your Python installation is x86, change the C++ project to target x86.

A.5 Make the DLL Available to Python

1. In Solution Explorer, right-click the References node in the Python project, then select Add Reference. In the dialog that appears, select the Projects tab, select the test project, and then select OK.


2. Run the Visual Studio installer, select Modify, then select Individual Components > Compilers, build tools, and runtimes > Visual C++ 2015.3 v140 toolset. This step is necessary because Python (for Windows) is itself built with Visual Studio 2015 (version 14.0) and expects those tools to be available when building an extension through the method described here. (Note that you may need to install a 32-bit version of Python and target the DLL to Win32, not x64.)


3. Create a file named setup.py in the C++ project by right-clicking the project and selecting Add > New Item. Then select C++ File (.cpp), name the file setup.py, and select OK (naming the file with the .py extension makes Visual Studio recognize it as Python despite using the C++ file template).

Then paste the following code into the setup.py file:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'pybind11_test',
    sources=['main.cpp'],
    # add pybind11 folder and Python include folder
    include_dirs=[r'pybind11/include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='pybind11_test',
    version='1.0',
    description='Simple pybind11 example',
    license='MIT',
    ext_modules=[sfc_module],
)

4. The setup.py code instructs Python to build the extension using the Visual Studio 2015 C++ toolset when used from the command line.

Open an elevated command prompt, navigate to the folder that contains setup.py, and enter the following command:

python setup.py install

A.6 Call the DLL from Python

After the DLL is made available to Python, we can call test.main_func() from Python code.

1. Add the following lines in your .py file to call methods exported from the DLL:


    import test

    if __name__ == "__main__":
        test.main_func()

2. Run the Python program (Debug > Start without Debugging). The output should look like the following:

A.7 Example: Store C++ Eigen Matrix as NumPy Arrays

This example requires the use of Eigen, a C++ template library. Due to its popularity, pybind11 provides transparent conversion and limited mapping support between Eigen and scientific Python linear algebra data types.

1. If you have not already, visit http://eigen.tuxfamily.org/index.php?title=Main_Page and click on "zip" to download the zip file.

Once the download is complete, decompress the zip file in a suitable location. Note down the file path containing the folders as shown below.


2. In the Solution Explorer, right-click the C++ project and select Properties. Navigate to the C/C++ > General tab, add the Eigen path to the Additional Include Directories property, then select OK.

3. Right-click the C++ project and select Build to test your configurations (both Debug and Release).

4. Replace the code in the main.cpp file with the following code:

    #include <iostream>

    // pybind11
    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>

    // Eigen
    #include <Eigen/Dense>

    namespace py = pybind11;

    Eigen::Matrix3f example_mat(const int a, const int b, const int c) {
        Eigen::Matrix3f mat;
        mat << 5 * a, 1 * b, 2 * c,
               2 * a, 4 * b, 2 * c,
               1 * a, 3 * b, 3 * c;
        return mat;
    }

    Eigen::MatrixXd inv(Eigen::MatrixXd xs) {
        return xs.inverse();
    }

    double det(Eigen::MatrixXd xs) {
        return xs.determinant();
    }

    PYBIND11_MODULE(test, m) {
        m.def("example_mat", &example_mat);
        m.def("inv", &inv);
        m.def("det", &det);

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

Note that Eigen matrices are automatically converted to NumPy arrays simply by including the pybind11/eigen.h header (see the #include <pybind11/eigen.h> line in the listing above). A C++ function that has an Eigen return type can be used directly as a NumPy data type in Python. Build the C++ project to check that the code is working correctly.
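Because of this transparent conversion, the bound functions can be exercised from Python with NumPy arrays directly. A minimal usage sketch (an added illustration; it assumes the module built above is importable as test):

    import numpy as np
    import test

    B = np.array([[1.0, 2.0], [3.0, 4.0]])
    print(test.det(B))  # the NumPy array is converted to Eigen::MatrixXd automatically
    print(test.inv(B))  # the Eigen::MatrixXd result comes back as a NumPy array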

5. Include the Eigen library's directory: replace the code in setup.py with the following code:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add the pybind11 folder, the Python include folder and the Eigen folder
        include_dirs=[r'pybind11/include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                      r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

As before, in an elevated command prompt, navigate to the folder that contains setup.py and enter the following command:

    python setup.py install

6. Then paste the following code in the Python file:

    import test
    import numpy as np

    if __name__ == "__main__":
        A = test.example_mat(1, 3, 1)
        print("Matrix A is\n", A)
        print("The inverse of A is\n", test.inv(A))
        print("The determinant of A is\n", test.det(A))


The output should be:

A.8 Example: Store Data Generated by Analytical Heston Model as NumPy Arrays

Only the relevant parts of the code are displayed here. To request the code for the entire project, contact the author at cko22@bath.edu.

This example demonstrates how to use pybind11 for:

• a C++ project with multiple files;

• storing a C++ Eigen matrix as NumPy arrays.

1. For a C++ project with multiple files, we need to include all the source files in setup.py. This example also uses the Eigen library, so its directory will also be included. Paste the following code in setup.py:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'Data_Generation',
        sources=[r'module.cpp', r'Data.cpp', r'Heston.cpp',
                 r'IntegrationScheme.cpp', r'IntegratorImp.cpp',
                 r'NumIntegrator.cpp', r'pdetfunction.cpp',
                 r'pfunction.cpp', r'Range.cpp'],
        include_dirs=[r'pybind11/include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                      r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='Data_Generation',
        version='1.0',
        description='Python package with Data_Generation C++ extension (PyBind11)',
        license='MIT',
        ext_modules=[sfc_module],
    )

Then go to the folder that contains setup.py in a command prompt and enter the following command:

    python setup.py install

2. The code in module.cpp is:

    // For the analytic solution in the case of the Heston model
    #include "Heston.hpp"
    #include <ctime>
    #include "Data.hpp"
    #include <future>

    // Inputting and Outputting
    #include <iostream>
    #include <vector>

    // pybind11
    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>
    #include <pybind11/stl_bind.h>

    // Eigen (include path supplied via setup.py include_dirs)
    #include <Eigen/Dense>

    namespace py = pybind11;
    using namespace Eigen;  // assumed in the original source: Map/MatrixXd are used unqualified below

    struct CPPData {
        // Input Data
        std::vector<double> spot_price;
        std::vector<double> strike_price;
        std::vector<double> risk_free_rate;
        std::vector<double> dividend_yield;
        std::vector<double> initial_vol;
        std::vector<double> maturity_time;
        std::vector<double> long_term_vol;
        std::vector<double> mean_reversion_rate;
        std::vector<double> vol_vol;
        std::vector<double> price_vol_corr;

        // Output Data
        std::vector<double> option_price;
    };

    // ...
    // do something (simulation routine elided in the original)
    // ...

    Eigen::MatrixXd module() {
        CPPData cppdata;
        simulation(cppdata);

        // Convert each std::vector to an Eigen vector first
        // Input Data
        Map<Eigen::VectorXd> eigen_spot_price(cppdata.spot_price.data(), cppdata.spot_price.size());
        Map<Eigen::VectorXd> eigen_strike_price(cppdata.strike_price.data(), cppdata.strike_price.size());
        Map<Eigen::VectorXd> eigen_risk_free_rate(cppdata.risk_free_rate.data(), cppdata.risk_free_rate.size());
        Map<Eigen::VectorXd> eigen_dividend_yield(cppdata.dividend_yield.data(), cppdata.dividend_yield.size());
        Map<Eigen::VectorXd> eigen_initial_vol(cppdata.initial_vol.data(), cppdata.initial_vol.size());
        Map<Eigen::VectorXd> eigen_maturity_time(cppdata.maturity_time.data(), cppdata.maturity_time.size());
        Map<Eigen::VectorXd> eigen_long_term_vol(cppdata.long_term_vol.data(), cppdata.long_term_vol.size());
        Map<Eigen::VectorXd> eigen_mean_reversion_rate(cppdata.mean_reversion_rate.data(), cppdata.mean_reversion_rate.size());
        Map<Eigen::VectorXd> eigen_vol_vol(cppdata.vol_vol.data(), cppdata.vol_vol.size());
        Map<Eigen::VectorXd> eigen_price_vol_corr(cppdata.price_vol_corr.data(), cppdata.price_vol_corr.size());
        // Output Data
        Map<Eigen::VectorXd> eigen_option_price(cppdata.option_price.data(), cppdata.option_price.size());

        const int cols = 11;        // number of columns
        const int rows = SIZE * 4;  // SIZE is defined elsewhere in the project
        // Define a matrix that stores the training data to be used in Python
        MatrixXd training(rows, cols);
        // Initialise each column using the vectors
        training.col(0) = eigen_spot_price;
        training.col(1) = eigen_strike_price;
        training.col(2) = eigen_risk_free_rate;
        training.col(3) = eigen_dividend_yield;
        training.col(4) = eigen_initial_vol;
        training.col(5) = eigen_maturity_time;
        training.col(6) = eigen_long_term_vol;
        training.col(7) = eigen_mean_reversion_rate;
        training.col(8) = eigen_vol_vol;
        training.col(9) = eigen_price_vol_corr;
        training.col(10) = eigen_option_price;

        std::cout << training << std::endl;

        return training;
    }

    PYBIND11_MODULE(Data_Generation, m) {
        m.def("module", &module);

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

Note that the module function's return type is an Eigen matrix; this can then be used directly as a NumPy data type in Python.

3. The C++ function can then be called from Python as shown:

    import Data_Generation as dg
    import mysql.connector
    import pandas as pd
    import sqlalchemy as db
    import time

    # This script calls the analytical Heston option pricer in C++ to simulate
    # 5,000,000 option prices and store them in a MySQL database

    # ...
    # do something
    # ...

    if __name__ == "__main__":
        t0 = time.time()

        # A new table (optional)
        SQL_clean_table()

        target_size = 5000000   # final table size
        batch_size = 500000     # size generated each iteration

        # Get the number of rows that already exist in the table
        r = SQL_setup(batch_size)

        # Get the number of iterations
        iter = int(target_size / batch_size) - int(r / batch_size)
        print("Iteration(s) required: ", iter)
        count = 0

        print("Now Run C++ code !!!")
        for i in range(iter):
            count = count + 1
            print("----------------------------------------------------")
            print("Current iteration: ", count)
            cpp = dg.module()   # store data from C++

            # ...
            # do something
            # ...

        r1 = SQL_setup(batch_size)
        print("No. of rows in SQL Classic Heston Table Now: ", r1)
        print("\n")
        t1 = time.time()
        # Calculate total time for the entire program
        total = t1 - t0
        print("THE ENTIRE PROGRAM TAKES", round(total), "s TO COMPLETE")
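The "do something" blocks above are elided in the original project. For illustration only, a hypothetical sketch (the table name, connection URL and helper below are assumptions, not the author's code) of how one batch returned by dg.module() could be stored via pandas and SQLAlchemy:

    import pandas as pd
    import sqlalchemy as db

    COLS = ["spot_price", "strike_price", "risk_free_rate", "dividend_yield",
            "initial_vol", "maturity_time", "long_term_vol",
            "mean_reversion_rate", "vol_vol", "price_vol_corr", "option_price"]

    def store_batch(cpp, url="mysql+mysqlconnector://user:password@localhost/heston"):
        # cpp is the NumPy array returned by the C++ module (one row per option)
        engine = db.create_engine(url)  # hypothetical connection URL
        df = pd.DataFrame(cpp, columns=COLS)
        df.to_sql("classic_heston_table", engine, if_exists="append", index=False)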


Appendix B

Asymptotic Expansions for General Stochastic Volatility Models

B.1 Asymptotic Expansions

We present the asymptotic expansions for general stochastic volatility models by (Lorig et al., 2019). Consider a strictly positive spot price S whose risk-neutral dynamics are given by

$$S = e^X \tag{B.1}$$
$$dX = -\tfrac{1}{2}\sigma^2(t, X, Y)\,dt + \sigma(t, X, Y)\,dW_1, \qquad X(0) = x \in \mathbb{R} \tag{B.2}$$
$$dY = f(t, X, Y)\,dt + \beta(t, X, Y)\,dW_2, \qquad Y(0) = y \in \mathbb{R} \tag{B.3}$$
$$\langle dW_1, dW_2 \rangle = \rho(t, X, Y)\,dt, \qquad |\rho| < 1 \tag{B.4}$$

The drift of $X$ is selected to be $-\tfrac{1}{2}\sigma^2$ so that $S = e^X$ is a martingale.

Let C(t) be the time-t value of a European call option maturing at time T > t with payoff function P(X(T)). To value a European-style option using risk-neutral pricing, we must compute functions of the form

$$u(t, x, y) = \mathbb{E}\left[P(X(T)) \mid X(t) = x,\ Y(t) = y\right]$$

It is well known that the function u satisfies the Kolmogorov backward equation (Lorig et al., 2019)

$$(\partial_t + \mathcal{A}(t))\,u(t, x, y) = 0, \qquad u(T, x, y) = P(x) \tag{B.5}$$

where the operator A(t) is given explicitly by

$$\mathcal{A}(t) = a(t, x, y)\left(\partial_x^2 - \partial_x\right) + f(t, x, y)\,\partial_y + b(t, x, y)\,\partial_y^2 + c(t, x, y)\,\partial_x\partial_y \tag{B.6}$$

and where the functions a, b and c are defined as

$$a(t, x, y) = \frac{1}{2}\sigma^2(t, x, y), \qquad b(t, x, y) = \frac{1}{2}\beta^2(t, x, y), \qquad c(t, x, y) = \rho(t, x, y)\,\sigma(t, x, y)\,\beta(t, x, y)$$
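As a concrete illustration (an added example, not part of (Lorig et al., 2019)'s text): specialising these general dynamics to the classical Heston model of Section 3.1, where $\sigma(t,x,y) = \sqrt{y}$, $f(t,x,y) = \kappa(\theta - y)$, $\beta(t,x,y) = \xi\sqrt{y}$ and $\rho$ is constant, the coefficient functions reduce to

$$a(t, x, y) = \frac{y}{2}, \qquad b(t, x, y) = \frac{\xi^2 y}{2}, \qquad c(t, x, y) = \rho\,\xi\,y$$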


Appendix C

Deep Neural Networks Library in Python: Keras

We show the code snippet for computing the ANN in Keras: the Python code for defining the ANN architecture for the rough Heston implied volatility approximation.

    # Define the neural network architecture
    import keras
    from keras import backend as K
    keras.backend.set_floatx('float64')
    from keras.callbacks import EarlyStopping

    # Construct the model
    input_layer = keras.layers.Input(shape=(n,))  # n = number of input features (defined elsewhere)

    hidden_layer1 = keras.layers.Dense(80, activation='elu')(input_layer)
    hidden_layer2 = keras.layers.Dense(80, activation='elu')(hidden_layer1)
    hidden_layer3 = keras.layers.Dense(80, activation='elu')(hidden_layer2)

    output_layer = keras.layers.Dense(100, activation='linear')(hidden_layer3)

    model = keras.models.Model(inputs=input_layer, outputs=output_layer)

    # Summarise layers
    model.summary()

    # User-defined metrics: R2 and Max Abs Error
    # R2
    def coeff_determination(y_true, y_pred):
        SS_res = K.sum(K.square(y_true - y_pred))
        SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
        return (1 - SS_res / (SS_tot + K.epsilon()))

    # Max Error
    def max_error(y_true, y_pred):
        return K.max(abs(y_true - y_pred))

    earlystop = EarlyStopping(monitor="val_loss",
                              min_delta=0,
                              mode="min",
                              verbose=1,
                              patience=25)

    # Prepare the model for training
    opt = keras.optimizers.Adam(learning_rate=0.005)
    model.compile(loss='mse', optimizer=opt,  # use the Adam optimizer defined above
                  metrics=['mae', coeff_determination, max_error])
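The snippet defines the earlystop callback but does not show the training call. A minimal sketch of how the pieces combine (an added illustration; X_train, y_train, X_val and y_val are assumed to be prepared as in Chapter 5, and the epoch and batch-size values are placeholders):

    history = model.fit(X_train, y_train,
                        validation_data=(X_val, y_val),
                        epochs=200,
                        batch_size=1024,
                        callbacks=[earlystop])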


Bibliography

Alos, Elisa et al. (June 2017). "Exponentiation of Conditional Expectations Under Stochastic Volatility". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2983180

Andersen, Leif et al. (Jan 2007). "Moment Explosions in Stochastic Volatility Models". In: Finance and Stochastics 11, pp. 29-50. DOI: 10.1007/s00780-006-0011-7

Atkinson, Colin et al. (Jan 2011). "Rational Solutions for the Time-Fractional Diffusion Equation". URL: https://www.researchgate.net/publication/220222966_Rational_Solutions_for_the_Time-Fractional_Diffusion_Equation

Bayer, Christian et al. (Oct 2018). "Deep calibration of rough stochastic volatility models". URL: https://arxiv.org/abs/1810.03399

Black, Fischer et al. (May 1973). "The Pricing of Options and Corporate Liabilities". URL: https://www.cs.princeton.edu/courses/archive/fall09/cos323/papers/black_scholes73.pdf

Carr, Peter and Dilip B. Madan (Mar 1999). "Option valuation using the fast Fourier transform". In: Journal of Computational Finance. URL: https://www.researchgate.net/publication/2519144_Option_Valuation_Using_the_Fast_Fourier_Transform

Chapter 2 Modeling Process: k-fold cross validation. https://bradleyboehmke.github.io/HOML/process.html. Accessed: 2020-08-22

Comte, Fabienne et al. (Oct 1998). "Long Memory In Continuous-Time Stochastic Volatility Models". In: Mathematical Finance 8, pp. 291-323. URL: https://zulfahmed.files.wordpress.com/2015/10/comterenault19981.pdf

Cox, John C. et al. (Jan 1985). "A Theory of the Term Structure of Interest Rates". URL: http://pages.stern.nyu.edu/~dbackus/BCZ/discrete_time/CIR_Econometrica_85.pdf

Create a C++ extension for Python. https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019. Accessed: 2020-07-21

Duffy, Daniel J. (Jan 2018). Financial Instrument Pricing Using C++, Second Edition. John Wiley & Sons. ISBN: 978-0-470-97119-2

Duffy, Daniel J. et al. (2012). Monte Carlo Frameworks: Building customisable high-performance C++ applications. John Wiley & Sons, Inc. ISBN: 9780470060698

Dupire, Bruno (1994). "Pricing with a Smile". In: Risk Magazine, pp. 18-20

Eigen: The Matrix Class. https://eigen.tuxfamily.org/dox/group__TutorialMatrixClass.html. Accessed: 2020-07-22

Eldan, Ronen et al. (May 2016). "The Power of Depth for Feedforward Neural Networks". URL: https://arxiv.org/abs/1512.03965

Euch, Omar El et al. (Sept 2016). "The characteristic function of rough Heston models". URL: https://arxiv.org/pdf/1609.02108.pdf

— (Mar 2019). "Roughening Heston". URL: https://ssrn.com/abstract=3116887

Fouque, Jean-Pierre et al. (Aug 2012). "Second Order Multiscale Stochastic Volatility Asymptotics: Stochastic Terminal Layer Analysis & Calibration". URL: https://arxiv.org/abs/1208.5802

Gatheral, Jim (2006). The volatility surface: a practitioner's guide. John Wiley & Sons, Inc. ISBN: 978-0-471-79251-2

Gatheral, Jim et al. (Oct 2014). "Volatility is rough". URL: https://arxiv.org/abs/1410.3394

— (Jan 2019). "Rational approximation of the rough Heston solution". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3191578

Hebb, Donald O. (1949). The Organization of Behavior: A Neuropsychological Theory. Psychology Press, 1 edition (12 Jun 2002). ISBN: 978-0805843002

Heston, Steven (Jan 1993). "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options". URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.3204&rep=rep1&type=pdf

Hinton, Geoffrey et al. (Feb 2015). "Overview of mini-batch gradient descent". URL: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

Hornik et al. (Mar 1989). "Multilayer feedforward networks are universal approximators". URL: https://deeplearning.cs.cmu.edu/F20/document/readings/Hornik_Stinchcombe_White.pdf

— (Jan 1990). "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks". URL: http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/univ_approx.pdf

Horvath, Blanka et al. (Jan 2019). "Deep Learning Volatility". URL: https://ssrn.com/abstract=3322085

Hurst, Harold E. (Jan 1951). "Long-term storage of reservoirs: an experimental study". URL: https://www.jstor.org/stable/2982267#metadata_info_tab_contents

Hutchinson, James M. et al. (July 1994). "A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks". URL: https://alo.mit.edu/wp-content/uploads/2015/06/A-Nonparametric-Approach-to-Pricing-and-Hedging-Derivative-Securities-via-Learning-Networks.pdf

Ioffe, Sergey et al. (Feb 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". URL: https://arxiv.org/abs/1502.03167

Itkin, Andrey (Jan 2010). "Pricing options with VG model using FFT". URL: https://arxiv.org/abs/physics/0503137

Kahl, Christian et al. (Jan 2006). "Not-so-complex Logarithms in the Heston model". URL: http://www2.math.uni-wuppertal.de/~kahl/publications/NotSoComplexLogarithmsInTheHestonModel.pdf

Kingma, Diederik P. et al. (Feb 2015). "Adam: A Method for Stochastic Optimization". URL: https://arxiv.org/abs/1412.6980

Kwok, Yue-Kuen et al. (2012). Handbook of Computational Finance, Chapter 21. Springer-Verlag Berlin Heidelberg. ISBN: 978-3-642-17254-0

Lewis, Alan (Sept 2001). "A Simple Option Formula for General Jump-Diffusion and Other Exponential Levy Processes". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=282110

Liu, Shuaiqiang et al. (Apr 2019a). "A neural network-based framework for financial model calibration". URL: https://arxiv.org/abs/1904.10523

— (Apr 2019b). "Pricing options and computing implied volatilities using neural networks". URL: https://arxiv.org/abs/1901.08943

Lorig, Matthew et al. (Jan 2019). "Explicit implied vols for multifactor local-stochastic vol models". URL: https://arxiv.org/abs/1306.5447v3

Mandara, Dalvir (Sept 2019). "Artificial Neural Networks for Black-Scholes Option Pricing and Prediction of Implied Volatility for the SABR Stochastic Volatility Model". URL: https://www.datasim.nl/application/files/8115/7045/4929/1423101.pdf

McCulloch, Warren et al. (May 1943). "A Logical Calculus of The Ideas Immanent in Nervous Activity". URL: http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf

Mostafa, F. et al. (May 2008). "A neural network approach to option pricing". URL: https://www.witpress.com/elibrary/wit-transactions-on-information-and-communication-technologies/41/18904

Peters, Edgar E. (Jan 1991). Chaos and Order in the Capital Markets: A New View of Cycles, Prices and Market Volatility. John Wiley & Sons. ISBN: 978-0-471-13938-6

— (Jan 1994). Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons. ISBN: 978-0-471-58524-4. URL: https://www.amazon.co.uk/Fractal-Market-Analysis-Investment-Economics/dp/0471585246

pybind11 Documentation. https://pybind11.readthedocs.io/en/stable/advanced/cast/eigen.html. Accessed: 2020-07-22

Ruf, Johannes et al. (May 2020). "Neural networks for option pricing and hedging: a literature review". URL: https://arxiv.org/abs/1911.05620

Schmelzle, Martin (Apr 2010). "Option Pricing Formulae using Fourier Transform: Theory and Application". URL: https://pfadintegral.com/docs/Schmelzle2010 Fourier Pricing.pdf

Shevchenko, Georgiy (Jan 2015). "Fractional Brownian motion in a nutshell". In: International Journal of Modern Physics Conference Series 36. URL: https://www.worldscientific.com/doi/abs/10.1142/S2010194515600022

Stone, Henry (July 2019). "Calibrating Rough Volatility Models: A Convolutional Neural Network Approach". URL: https://arxiv.org/abs/1812.05315

Using the C++ eigen library to calculate matrix inverse and determinant. http://people.duke.edu/~ccc14/sta-663-2016/18G_C++_Python_pybind11.html#Using-the-C++-eigen-library-to-calculate-matrix-inverse-and-determinant. Accessed: 2020-07-22

Wilmott, Paul (2006). Paul Wilmott on Quantitative Finance, Vol. 3. John Wiley & Sons, Inc., 2nd Edition. ISBN: 978-0-470-01870-5



4.3 Common Pitfalls and Remedies 30
    4.3.1 Loss Function Stagnation 31
    4.3.2 Loss Function Fluctuation 31
    4.3.3 Over-fitting 31
    4.3.4 Under-fitting 32
4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation 32

5 Data 35
    5.1 Data Generation and Storage 35
        5.1.1 Heston Call Option Data 36
        5.1.2 Heston Implied Volatility Data 37
        5.1.3 Rough Heston Call Option Data 37
        5.1.4 Rough Heston Implied Volatility Data 38
    5.2 Data Pre-processing 39
        5.2.1 Data Splitting: Train - Validate - Test 39
        5.2.2 Data Standardisation and Normalisation 39
        5.2.3 Data Partitioning 40

6 Neural Networks Architecture and Experimental Setup 41
    6.1 Neural Networks Architecture 41
        6.1.1 ANN Architecture: Option Pricing under Heston Model 43
        6.1.2 ANN Architecture: Implied Volatility Surface of Heston Model 44
        6.1.3 ANN Architecture: Option Pricing under Rough Heston Model 45
        6.1.4 ANN Architecture: Implied Volatility Surface of Rough Heston Model 45
    6.2 Performance Metrics and Validation Methods 45
    6.3 Implementation of ANN in Keras 47
    6.4 Summary of Neural Networks Architecture 48

7 Results and Discussions 49
    7.1 Heston Option Pricing ANN Results 49
    7.2 Heston Implied Volatility ANN Results 51
    7.3 Rough Heston Option Pricing ANN Results 53
    7.4 Rough Heston Implied Volatility ANN Results 55
    7.5 Run-time Performance 59

8 Conclusions and Outlook 61

A pybind11: A step-by-step tutorial with examples 63
    A.1 Software Prerequisites 63
    A.2 Create a Python Project 64
    A.3 Create a C++ Project 66
    A.4 Convert the C++ project to extension for Python 67
    A.5 Make the DLL available to Python 68
    A.6 Call the DLL from Python 70
    A.7 Example: Store C++ Eigen Matrix as NumPy Arrays 71
    A.8 Example: Store Data Generated by Analytical Heston Model as NumPy Arrays 75

B Asymptotic Expansions for General Stochastic Volatility Models 81
    B.1 Asymptotic Expansions 81

C Deep Neural Networks Library in Python: Keras 83

Bibliography 85


List of Figures

1 Implementation Flow Chart 2
2 Data flow diagram for ANN on Rough Heston Pricer 3

1.1 Problem Solving Process 6

3.1 Paths of fBm for different values of H. Adapted from (Shevchenko, 2015) 18
3.2 Implied Volatility Smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019) 24

4.1 A Simple Neural Network 25
4.2 The pixels of the implied volatility surface are represented by the outputs of the neural network 33

5.1 Shows how the data is partitioned for 5-fold cross validation. Adapted from (Chapter 2 Modeling Process: k-fold cross validation) 40

6.1 Typical ANN Architecture for Option Pricing Application 42
6.2 Typical ANN Architecture for Implied Volatility Surface Approximation 43

7.1 Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN 49
7.2 Plot of Predicted vs Actual values for Heston Option Pricing ANN 50
7.3 Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN 51
7.4 Heston Implied Volatility Smiles: ANN Predictions vs Actual Values 52
7.5 Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data 53
7.6 Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN 54
7.7 Plot of Predicted vs Actual values for Rough Heston Pricing ANN 55
7.8 Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN 56
7.9 Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values 57
7.10 Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data 58


List of Tables

5.1 Heston Model Call Option Data 36
5.2 Heston Model Implied Volatility Data 37
5.3 Rough Heston Model Call Option Data 38
5.4 Rough Heston Model Implied Volatility Data 38

6.1 Summary of Neural Networks Architecture for Each Application 48

7.1 Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data 50
7.2 5-fold Cross Validation Results of Heston Option Pricing ANN 50
7.3 Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data 51
7.4 5-fold Cross Validation Results of Heston Implied Volatility ANN 53
7.5 Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data 54
7.6 5-fold Cross Validation Results of Rough Heston Option Pricing ANN 55
7.7 Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data 56
7.8 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN 58
7.9 Training Time 59
7.10 Execution Time for Predictions using ANN and Traditional Methods 59


List of Abbreviations

NN    Neural Networks
ANN   Artificial Neural Networks
DNN   Deep Neural Networks
CNN   Convolutional Neural Networks
SGD   Stochastic Gradient Descent
SV    Stochastic Volatility
CIR   Cox, Ingersoll and Ross
fBm   fractional Brownian motion
FSV   Fractional Stochastic Volatility
RFSV  Rough Fractional Stochastic Volatility
FFT   Fast Fourier Transform
ELU   Exponential Linear Unit
ReLU  Rectified Linear Unit


Chapter 0

Project's Overview

This chapter intends to give a brief overview of the methodology of the project without delving too much into the details.

0.1 Project's Overview

It was advised by the supervisor to divide the problem into smaller, achievable steps, as (Mandara, 2019) did. The 4 main steps are:

• Define the objective

• Formulate the procedures

• Implement the procedures

• Evaluate the results

The objective is to investigate the performance of artificial neural networks (ANN) on vanilla option pricing and implied volatility approximation under the rough Heston model. The performance metrics in this context are accuracy and run-time performance.

Figure 1 shows the flow chart of the implementation procedure. The procedure starts with synthetic data generation. Then we split the synthetic data set into 3 portions: one for training (in-sample), one for validation, and the last one for testing (out-of-sample). Before the data is fed into the ANN for training, it needs to be scaled or normalised, as it has different magnitudes and ranges. In this thesis we use the training method proposed by (Horvath et al., 2019), the image-based implicit training approach, for implied volatility surface approximation. Then we test the trained ANN model with an unseen data set to evaluate its accuracy.

The run-time performance of the ANN will be compared with that of the traditional approximation methods. This project requires the use of two programming languages: C++ and Python. The data generation process is done in C++. The wide range of machine learning libraries (Keras, TensorFlow, PyTorch) in Python makes it the logical choice for the neural network computations. The experiment starts with the classical Heston model as a concept validation and then proceeds to the rough Heston model.


[FIGURE 1: Implementation Flow Chart. Flow: Option Pricing Models (Heston, rough Heston) → Data Generation (strike price, maturity, volatility, etc.) → Data Pre-processing (scaled/normalised data, training set) → ANN Architecture Design → ANN Training → Trained ANN → ANN Prediction → ANN Results Evaluation (accuracy of option price and implied vol, runtime performance) → Conclusions]


[FIGURE 2: Data flow diagram for ANN on Rough Heston Pricer. Input (model parameters) → RoughHestonPricer (model parameters, option price) → RoughHestonData (train data, test data) → KerasANNTrainer (ANN model hyperparameters, trained model weights → ANNModelWeights) → ANNPredictions (predicted option price) → AccuracyEvaluation (against test data option price) → Output (error metrics)]


Chapter 1

Introduction

Option pricing has always been a major research area in financial mathematics. Various methods and techniques have been developed to price options more accurately and more quickly since the introduction of the well-known Black-Scholes model in 1973 (Black et al., 1973). The Black-Scholes model has an analytical solution and has only one parameter, volatility, that requires calibration; this makes it very easy to implement. Its simplicity comes at the expense of many unrealistic assumptions. One of its biggest flaws is the assumption of a constant volatility term. This assumption simply does not hold in the real market. With this assumption, the Black-Scholes model fails to capture the volatility dynamics exhibited by the market and so is unable to value options accurately.

In 1993, (Heston, 1993) developed a stochastic volatility model in an attempt to better capture the volatility dynamics. The Heston model assumes the volatility of an underlying asset to follow a geometric Brownian motion (Hurst parameter H = 1/2). Similarly, the Heston model also has a closed-form analytical solution, and this makes it a strong alternative to the Black-Scholes model.

Whilst the Heston model is more realistic than the Black-Scholes model, the generated option prices do not match the observed vanilla option prices consistently (Gatheral et al., 2014). Recent research shows that more accurate option prices can be estimated if the volatility process is modelled as a fractional Brownian motion with H < 1/2. When H < 1/2, the volatility process path appears rougher than when H ≥ 1/2, and hence the volatility is described as rough.

(Euch et al., 2019) introduce the rough volatility version of the Heston model, which combines the best of both worlds. However, the calibration of the rough Heston model relies on the notoriously slow Monte Carlo method, due to its non-Markovian nature. This obstacle is the reason that rough Heston struggles to find its way into industrial applications. Although there exist many approximation methods for the rough Heston model to help solve this problem, we are interested in the feasibility of machine learning algorithms.

Recent advancements in machine learning algorithms could potentially provide an alternative means for this problem. Specifically, artificial neural networks (ANN) offer the best hope because they are capable of learning complex functional relationships between input and output data and then making predictions instantly.


In this project, our objective is to investigate the accuracy and run-time performance of ANN on European call option price predictions and implied volatility approximations under the rough Heston model. We start off by using the Heston model as a concept validation, then proceed to its rough volatility version. For regulatory purposes, we use data simulated by option pricing models for training.

1.1 Organisation of This Thesis

This thesis is organised as follows. In Chapter 2 we review some of the related work and provide justifications for our research. In Chapter 3 we discuss the theory of the option pricing models of our interest. We provide a brief introduction to ANN and some techniques to improve their performance in Chapter 4. Then we explain the tools we used for the data generation process and some useful techniques to speed up the process in Chapter 5; in the same chapter we also discuss the important data pre-processing procedures. Chapter 6 shows our network architecture and experimental setups. We present the results and discuss our findings in Chapter 7. Finally, we conclude our findings and discuss potential future work in Chapter 8.

[FIGURE 1.1: Problem Solving Process. (START) Problem: rough Heston model sees low industrial application despite giving accurate option prices → Root cause: slow calibration (pricing and implied vol approximation) process → Potential solution: apply ANN to pricing and implied volatility approximation → Implementation: ANN computation using Keras in Python → Evaluation: accuracy and speed of ANN predictions → (END) Conclusions: ANN is feasible or not]


Chapter 2

Literature Review

In this chapter we review the current literature and highlight the research gap. The focus of our review is the application of ANN in option pricing models. We justify our methodology and our choice of option pricing models, and also provide some critiques.

2.1 Application of ANN in Finance

Option pricing has always been a hot topic in academia and the financial industry. Researchers have been looking for various means to price options more accurately and quickly. As computational technology advances, the application of ANN has gained popularity in solving complex problems in many different industries. In quantitative finance, the idea of utilising ANN for pricing derivatives comes from its ability to make fast and accurate predictions once it is trained.

The application of ANN in option pricing began with (Hutchinson et al., 1994). The authors were the pioneers in applying ANN for pricing and hedging derivatives. Since then, there have been more than 100 research papers on this topic. It is also worth noting that the complexity of the ANN architectures used in these papers has increased over the years. Whilst there has been much research on the applications of ANN in option pricing, only about 10% of the papers focus on implied volatility (Ruf et al., 2020). The feasibility of using ANN for implied volatility approximation is equally as important as for option pricing, since they are both part of the calibration process. Recently, more research papers focus on implied volatility and calibration, such as (Mostafa et al., 2008), (Liu et al., 2019a), (Liu et al., 2019b) and (Horvath et al., 2019). A key contribution of (Horvath et al., 2019)'s work is the introduction of the image-based implicit training method for implied volatility surface approximation. This method is allegedly robust and fast for approximating the entire implied volatility surface.

A significant portion of the research papers use real market data in their research. Currently, there is no standardised framework for deciding the architecture and parameters of an ANN. Hence, training the ANN directly with market data might pose some issues when it comes to regulatory compliance. (Horvath et al., 2019) is one of the few papers that uses simulated data for ANN training. The simulated data is generated from the Heston and rough Bergomi models. Using simulated data removes the regulatory concerns, as the functional mapping from model parameters to option price is known and tractable.

The application of ANN requires splitting the entire data set for training, validation and testing. Most of the papers that use real market data ensure the data splitting process is chronological, to preserve the time series structure of the data. This is not a concern in our case, as we are using simulated data.

The results in (Horvath et al., 2019) show very accurate predictions. However, after examining the source code, we discovered that the ANN was validated and tested on the same data set. Thus, the high accuracy results do not give any information about how well the trained ANN generalises to unseen data. In our experiment, we will use different data sets for validation and testing.

Common error metrics used include mean absolute error (MAE), mean absolute percentage error (MAPE) and mean squared error (MSE). We suggest a new metric, maximum absolute error, as it is useful from a risk management perspective for quantifying the largest prediction error in the worst case scenario.

2.2 Option Pricing Models

In terms of the choice of option pricing models, over 80% of the papers select the Black-Scholes model or its extensions in their research (Ruf et al., 2020). As mentioned before, there are other models that can give better option prices than Black-Scholes, such as the stochastic volatility models. (Liu et al., 2019b) investigated the application of ANN in the Heston and Bates models.

According to (Gatheral et al., 2014), modelling the volatility as rough volatility can better capture the volatility dynamics in the market. Rough volatility models struggle to find their way into the industry due to their slow calibration process. Hence, it is natural for the direction of ANN research to move towards rough volatility models.

Some of the recent examples include (Stone, 2019), (Bayer et al., 2018) and (Horvath et al., 2019). (Stone, 2019) investigates the calibration of the rough Bergomi model using Convolutional Neural Networks (CNN), whilst (Bayer et al., 2018) study the calibration of the Heston and rough Bergomi models using Deep Neural Networks (DNN). Contrary to these papers, where direct calibration is performed via ANN in one step, (Horvath et al., 2019) split the calibration of the rough Bergomi model into two steps. Firstly, the ANN is trained to learn the pricing equation during a so-called offline session. The trained ANN is then deterministic and stored for an online calibration session later. This approach can speed up the online calibration by a huge margin.

2.3 The Research Gap

In our thesis, we will be using the approach in (Horvath et al., 2019); that is, to train the network to learn the pricing and implied volatility functional mappings separately. We leave the model calibration step as the future outlook of this project. We choose to work with the rough Heston model, another popular rough volatility model that has not been investigated yet. We adapt the image-based implicit training method in our research. To re-iterate, we validate and test our ANN with different data sets to obtain a true measure of the ANN's performance on unseen data.


Chapter 3

Option Pricing Models

In this chapter we discuss the option pricing models of our interest: the Heston and rough Heston models. The idea of modelling volatility as a rough fractional Brownian motion will also be presented. We then derive their pricing and implied volatility equations and explain their implementations for data generation.

The Black-Scholes model is simple but unrealistic for real-world applications, as its assumptions simply do not hold. One of its greatest criticisms is the assumption of constant volatility of the underlying asset price. (Dupire, 1994) extended the Black-Scholes framework to give local volatility models. He modelled the volatility as a deterministic function of the underlying price and time; the function is selected such that its outputs match observed European option prices. This extension is time-inhomogeneous, and the dynamics it exhibits are still unrealistic. Thus, it is not able to produce future volatility surfaces that emulate our observations.

Stochastic volatility (SV) models were then introduced, in which the volatility of the underlying is modelled as a geometric Brownian motion. Recent research shows that rough volatility can capture the true volatility dynamics better, and this leads to the rough volatility models.

3.1 The Heston Model

One of the most popular SV models is the one proposed by (Heston, 1993); its popularity is due to the fact that its pricing equation for European options is analytically tractable. This property allows calibration procedures to be done efficiently.

The Heston model for a one-dimensional spot price S follows the stochastic process
$$dS = \mu S\,dt + \sqrt{\nu}\,S\,dW_1 \tag{3.1}$$

and its instantaneous variance ν follows a Cox, Ingersoll and Ross (CIR) process (Cox et al., 1985):
$$d\nu = \kappa(\theta - \nu)\,dt + \xi\sqrt{\nu}\,dW_2 \tag{3.2}$$
$$\langle dW_1, dW_2 \rangle = \rho\,dt \tag{3.3}$$

where


• µ is the (deterministic) rate of return of the spot

• θ is the long term mean volatility

• ξ is the volatility of volatility

• κ is the mean reversion rate of volatility

• ρ is the correlation between spot and volatility moves

• W1 and W2 are Wiener processes

3.1.1 Option Pricing Under Heston Model

In this section we follow (Gatheral, 2006) closely. We start by forming a portfolio Π containing the option being priced V, a quantity −Δ of the stock, and a quantity −Δ₁ of another asset whose value is V₁:
$$\Pi = V - \Delta S - \Delta_1 V_1$$

The change in this portfolio during dt is

$$\begin{aligned}
d\Pi = {}& \left[\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial \nu^2}\right]dt \\
&- \Delta_1\left[\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V_1}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V_1}{\partial \nu^2}\right]dt \\
&+ \left[\frac{\partial V}{\partial S} - \Delta_1\frac{\partial V_1}{\partial S} - \Delta\right]dS + \left[\frac{\partial V}{\partial \nu} - \Delta_1\frac{\partial V_1}{\partial \nu}\right]d\nu
\end{aligned}$$

Note that the explicit dependence of S and ν on t has been removed for simplicity. We can make the portfolio instantaneously risk-free by eliminating the dS and dν terms:
$$\frac{\partial V}{\partial S} - \Delta_1\frac{\partial V_1}{\partial S} - \Delta = 0$$
and
$$\frac{\partial V}{\partial \nu} - \Delta_1\frac{\partial V_1}{\partial \nu} = 0$$

So we are left with

$$\begin{aligned}
d\Pi &= \left[\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial \nu^2}\right]dt \\
&\quad - \Delta_1\left[\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V_1}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V_1}{\partial \nu^2}\right]dt \\
&= r\Pi\,dt \\
&= r(V - \Delta S - \Delta_1 V_1)\,dt
\end{aligned}$$


Then we apply the no-arbitrage principle: the return of a risk-free portfolio must equal the risk-free rate, and we assume the risk-free rate to be deterministic for our case.

Rearranging the above equation by collecting V terms on the left-hand side and V₁ terms on the right-hand side gives

$$\frac{\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV}{\frac{\partial V}{\partial \nu}} = \frac{\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V_1}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V_1}{\partial \nu^2} + rS\frac{\partial V_1}{\partial S} - rV_1}{\frac{\partial V_1}{\partial \nu}}$$

It is obvious that the ratio is equal to some function f(S, ν, t) of S, ν and t. Thus we have

$$\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV = f\,\frac{\partial V}{\partial \nu}$$

In the case of the Heston model, f is chosen as

$$f = -\left(\kappa(\theta - \nu) - \lambda\sqrt{\nu}\right)$$

where λ(S, ν, t) is called the market price of volatility risk. We do not delve into the choice of f or the concept of λ(S, ν, t) any further here; see (Wilmott, 2006) for his arguments.

(Heston, 1993) assumed in his paper that λ(S, ν, t) is linear in the instantaneous variance ν_t, so as to retain the form of the equation under the transformation from the statistical measure to the risk-neutral measure. Here we are only interested in pricing the options, so we can set λ(S, ν, t) to be zero (Gatheral, 2006).

Hence we arrive at

$$\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV = \kappa(\nu - \theta)\frac{\partial V}{\partial \nu} \tag{3.4}$$

According to (Heston, 1993), the price of an undiscounted European call option C with strike price K and time to maturity T has a solution of the form

$$C(S, K, \nu, T) = \frac{1}{2}(F - K) + \int_0^\infty \left(F\,f_1 - K\,f_2\right)du \tag{3.5}$$


where
$$f_1 = \Re\left[\frac{e^{-iu\ln K}\,\varphi(u - i)}{iuF}\right], \qquad f_2 = \Re\left[\frac{e^{-iu\ln K}\,\varphi(u)}{iu}\right], \qquad \varphi(u) = \mathbb{E}\left(e^{iu\ln S_T}\right), \qquad F = S e^{\mu T}$$

The evaluation of (3.5) requires the computation of Fourier inversion integrals, and it is susceptible to numerical instabilities due to the complex logarithms involved. (Kahl et al., 2006) proposed an integration scheme that computes the Fourier inversion integral by applying some transformations to the asymptotic structure. Thus the solution becomes

$$C(S, K, \nu, T) = \int_0^1 y(x)\,dx, \qquad x \in \mathbb{R} \tag{3.6}$$

where
$$y(x) = \frac{1}{2}(F - K) + \frac{F\,f_1\!\left(-\frac{\ln x}{C_\infty}\right) - K\,f_2\!\left(-\frac{\ln x}{C_\infty}\right)}{x\,\pi\,C_\infty}$$

The limits at the boundaries of the integral are

$$\lim_{x\to 0} y(x) = \frac{1}{2}(F - K)$$
and
$$\lim_{x\to 1} y(x) = \frac{1}{2}(F - K) + \frac{F\,\lim_{u\to 0} f_1(u) - K\,\lim_{u\to 0} f_2(u)}{\pi\,C_\infty}$$

Equation (3.6) provides a more robust pricing method for options with moderate to long maturities or strong mean reversion. For the full derivation of equation (3.6), see (Kahl et al., 2006). The source code implementing equation (3.6) was provided by the supervisor and is the source code for (Duffy et al., 2012).

3.1.2 Heston Model Implied Volatility Approximation

Before the model is used for pricing options, the model's parameters need to be calibrated so that it can return current market prices (at least to some degree of accuracy). That is, the unmeasurable model parameters are obtained by calibrating to implied volatilities that are observed in the market. Hence, this motivates the need for fast implied volatility computation.

Unfortunately, there is no closed-form analytical formula for calculating implied volatility in the Heston model. There are, however, many closed-form approximations for Heston implied volatility.

In this project, we select the implied volatility approximation method for Heston proposed by (Lorig et al., 2019), which allegedly outperforms other methods such as the one by (Fouque et al., 2012). (Lorig et al., 2019) derive a family of asymptotic expansions for European-style option implied volatility. Their method provides an explicit approximation and requires no special functions, not even numerical integration, and so it is faster to compute than its counterparts.

We follow closely the derivation steps outlined in (Lorig et al., 2019). Consider the Heston model in Section 3.1. According to (Andersen et al., 2007), ρ must be set negative to avoid moment explosion. To circumvent this, we apply the change of variables (X(t), V(t)) = (ln S(t), e^{κt}ν). Using the asymptotic expansions for general stochastic models (see Appendix B) and Itô's formula, we have

$$dX = -\frac{1}{2}e^{-\kappa t}V\,dt + \sqrt{e^{-\kappa t}V}\,dW_1, \qquad X(0) = x = \ln s \tag{3.7}$$
$$dV = \theta\kappa e^{\kappa t}\,dt + \xi\sqrt{e^{\kappa t}V}\,dW_2, \qquad V(0) = \nu = z > 0 \tag{3.8}$$
$$\langle dW_1, dW_2 \rangle = \rho\,dt \tag{3.9}$$

The parameters ν, κ, θ, ξ, W1, W2 play the same roles as in Section 3.1. We then apply the second-order differential operator A(t) to (X, V):

$$\mathcal{A}(t) = \frac{1}{2}e^{-\kappa t}v\left(\partial_x^2 - \partial_x\right) + \theta\kappa e^{\kappa t}\,\partial_v + \frac{1}{2}\xi^2 e^{\kappa t}v\,\partial_v^2 + \xi\rho v\,\partial_x\partial_v$$

Define T as the time to maturity and k as the log-strike. Using the time-dependent Taylor series expansion of A(T) with (x(T), v(T)) = (X₀, E[V(T)]) = (x, θ(e^{κT} − 1) + ν) gives (see Appendix B)

$$\sigma_0 = \sqrt{\frac{-\theta + \theta\kappa T + e^{-\kappa T}(\theta - v) + v}{\kappa T}}$$

$$\sigma_1 = \frac{\xi\rho z\, e^{-\kappa T}\left(-2v - v\kappa T - e^{\kappa T}\left(v(\kappa T - 2) + v\right) + \kappa T v + v\right)}{\sqrt{2}\,\kappa^2\sigma_0^2\, T^{3/2}}$$

$$\begin{aligned}
\sigma_2 = {}&\frac{\xi^2 e^{-2\kappa T}}{32\kappa^4\sigma_0^5 T^3}\Big[-2\sqrt{2}\,\kappa\sigma_0^3 T^{3/2} z\left(-v - 4e^{\kappa T}\left(v + \kappa T(v - v)\right) + e^{2\kappa T}\left(v(5 - 2\kappa T) - 2v\right) + 2v\right)\\
&+ \kappa\sigma_0^2 T\left(4z^2 - 2\right)\left(v + e^{2\kappa T}\left(-5v + 2v\kappa T + 8\rho^2\left(v(\kappa T - 3) + v\right) + 2v\right)\right)\\
&+ \kappa\sigma_0^2 T\left(4z^2 - 2\right)\left(4e^{\kappa T}\left(v + v\kappa T + \rho^2\left(v(\kappa T(\kappa T + 4) + 6) - v(\kappa T(\kappa T + 2) + 2)\right) - \kappa T v\right) - 2v\right)\\
&+ 4\sqrt{2}\,\rho^2\sigma_0\sqrt{T}\, z\left(2z^2 - 3\right)\left(-2 - v\kappa T - e^{\kappa T}\left(v(\kappa T - 2) + v\right) + \kappa T v + v\right)^2\\
&+ 4\rho^2\left(4(z^2 - 3)z^2 + 3\right)\left(-2v - v\kappa T - e^{\kappa T}\left(v(\kappa T - 2) + v\right) + \kappa T v + v\right)^2\Big]\\
&- \frac{\sigma_1^2\left(4(x - k)^2 - \sigma_0^4 T^2\right)}{8\sigma_0^3 T}
\end{aligned}$$

where
$$z = \frac{x - k - \frac{1}{2}\sigma_0^2 T}{\sigma_0\sqrt{2T}}$$


The explicit expression for σ3 is too long to be included here; see (Lorig et al., 2019) for the full derivation. The explicit expression for the implied volatility approximation is

$$\sigma_{\mathrm{implied}} = \sigma_0 + \sigma_1 z + \sigma_2 z^2 + \sigma_3 z^3 \tag{3.10}$$

The C++ source code implementing this is available here: Heston Implied Volatility Approximation.
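A minimal sketch (an added illustration, assuming NumPy) of assembling the approximation (3.10): σ0 and z follow the closed forms above, while σ1, σ2 and σ3 are assumed to be precomputed from their lengthy explicit expressions in (Lorig et al., 2019):

    import numpy as np

    def sigma0(theta, v, kappa, T):
        # closed form for the zeroth-order term given above
        return np.sqrt((-theta + theta*kappa*T + np.exp(-kappa*T)*(theta - v) + v)
                       / (kappa*T))

    def implied_vol(x, k, T, theta, v, kappa, s1, s2, s3):
        # s1, s2, s3: the expansion terms sigma_1..sigma_3 (assumed precomputed)
        s0 = sigma0(theta, v, kappa, T)
        z = (x - k - 0.5 * s0**2 * T) / (s0 * np.sqrt(2.0 * T))
        return s0 + s1*z + s2*z**2 + s3*z**3   # equation (3.10)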


3.2 Volatility is Rough

We start this section by introducing the fractional Brownian motion (fBm), which is a generalisation of Brownian motion.

Definition 3.2.1. Fractional Brownian Motion (fBm) is a centered Gaussian process $B^H_t$, $t \ge 0$, with the autocovariance function
$$\mathbb{E}\left[B^H_t B^H_s\right] = \frac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\right) \tag{3.11}$$
where $H \in (0, 1)$ is the Hurst index or Hurst parameter associated with the fractional Brownian motion.

The Hurst parameter is a measure of the long-term memory of a time series. It was first introduced by (Hurst, 1951) for modelling water levels of the Nile river in order to determine the optimum dam size. As it turns out, it is also suitable for modelling time series data from other natural systems. (Peters, 1991) and (Peters, 1994) are some of the earliest examples of applying this concept in financial time series modelling. A time series can be classified into 3 categories based on the Hurst parameter:

1. 0 < H < 1/2: negative correlation in the increments/decrements of the process. A mean-reverting process, which implies an increase in value is more likely followed by a decrease in value, and vice versa. The smaller the value, the greater the mean-reverting effect.

2. H = 1/2: no correlation in the increments/decrements of the process (geometric Brownian motion or Wiener process).

3. 1/2 < H < 1: positive correlation in the increments/decrements of the process. A trend-reinforcing process, which means the direction of the next value is more likely the same as the current value. The strength of the trend is greater when H is closer to 1.

Empirical evidence shows that log-volatility time series behave similarly to an fBm with Hurst parameter of order 0.1 (Gatheral et al., 2014). Essentially all the statistical stylised facts of volatility can be captured when modelling it as a rough fBm.

This motivates the use of the fractional stochastic volatility (FSV) model by (Comte et al., 1998). Our model is called the rough fractional stochastic volatility (RFSV) model to emphasise that H < 1/2. The term 'rough' refers to the roughness of the path of the fBm: from Figure 3.1, one can notice that the fBm path gets 'rougher' as H decreases.
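To make the notion of roughness concrete, here is a minimal simulation sketch (an added illustration, assuming NumPy; grid size and seed are arbitrary) that draws fBm paths directly from the autocovariance (3.11) via a Cholesky factorisation, reproducing the qualitative behaviour in Figure 3.1:

    import numpy as np

    def fbm_path(H, n=300, T=1.0, seed=0):
        t = np.linspace(T/n, T, n)
        tt, ss = np.meshgrid(t, t)
        # autocovariance from equation (3.11)
        cov = 0.5 * (tt**(2*H) + ss**(2*H) - np.abs(tt - ss)**(2*H))
        L = np.linalg.cholesky(cov + 1e-12*np.eye(n))  # small jitter for stability
        rng = np.random.default_rng(seed)
        return t, L @ rng.standard_normal(n)

    for H in (0.1, 0.5, 0.9):   # rough, Brownian, smooth
        t, B = fbm_path(H)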


FIGURE 3.1: Paths of fBm for different values of H. Adapted from (Shevchenko, 2015).

3.3 The Rough Heston Model

In this section we introduce the rough volatility version of the Heston model and derive its pricing and implied volatility equations.

As stated before, rough volatility models can fit the volatility surface notably well with very few parameters. Hence it is natural to modify the Heston model and consider its rough version.

The rough Heston model for a one-dimensional asset price S with instantaneous variance ν(t) takes the form as in (Euch et al., 2016):

$$dS = S\sqrt{\nu}\,dW_1$$
$$\nu(t) = \nu(0) + \frac{1}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\kappa\left(\theta - \nu(s)\right)ds + \frac{\xi}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\sqrt{\nu(s)}\,dW_2 \tag{3.12}$$
$$\langle dW_1, dW_2 \rangle = \rho\,dt$$

where

• α = H + 1/2 ∈ (1/2, 1) governs the roughness of the volatility sample paths

• κ is the mean reversion rate

• θ is the long term mean volatility

• ν(0) is the initial volatility

• ξ is the volatility of volatility

• Γ is the Gamma function

• W1 and W2 are two Brownian motions with correlation ρ

When α = 1 (H = 1/2), we recover the classical Heston model of Section 3.1.

3.3.1 Rough Heston Pricing Equation

The pricing equation of the rough Heston model is inspired by the classical Heston one. In the classical Heston model, the option price can be obtained by applying Fourier inversion integrals to the characteristic function, which is expressed in terms of the solution of a Riccati equation. The rough Heston model displays a similar structure, but with the Riccati equation replaced by a fractional Riccati equation.

(Euch et al., 2016) derived a characteristic function for the rough Heston model. This result is particularly important, as the non-Markovian nature of the fractional Brownian motion makes other numerical pricing methods (e.g. Monte Carlo) difficult to implement.

Definition 3.3.1. The fractional integral of order $r \in (0,1]$ of a function $f$ is

$$I^r f(t) = \frac{1}{\Gamma(r)}\int_0^t (t-s)^{r-1} f(s)\,ds \qquad (3.13)$$

whenever the integral exists, and the fractional derivative of order $r \in (0,1]$ is

$$D^r f(t) = \frac{1}{\Gamma(1-r)}\frac{d}{dt}\int_0^t (t-s)^{-r} f(s)\,ds \qquad (3.14)$$

whenever it exists.

We now quote the results from (Euch et al., 2016) directly; for the full proof, see Section 6 of (Euch et al., 2016).

Theorem 3.3.1. For all $t \ge 0$, the characteristic function of the log-price $X_t = \ln(S(t)/S(0))$ is

$$\phi_X(a, t) = E[e^{iaX_t}] = \exp\left[\kappa\theta I^1 h(a,t) + \nu(0) I^{1-\alpha} h(a,t)\right] \qquad (3.15)$$

where $h$ is the solution of the fractional Riccati equation

$$D^\alpha h(a,t) = -\frac{1}{2}a(a+i) + \kappa(ia\rho\xi - 1)h(a,t) + \frac{\xi^2}{2}h^2(a,t), \qquad I^{1-\alpha}h(a,0) = 0 \qquad (3.16)$$

which admits a unique continuous solution for $a \in \mathcal{C}$, a suitable region in the complex plane (see Definition 3.3.2).


Definition 3.3.2. We define the strip in the complex plane relevant to the computation of the option price characteristic function in Theorem 3.3.1:

$$\mathcal{C} = \left\{ z \in \mathbb{C} : \Re(z) \ge 0, \; -\frac{1}{1-\rho^2} \le \Im(z) \le 0 \right\} \qquad (3.17)$$

where $\Re$ and $\Im$ denote the real and imaginary parts respectively.

There is no explicit solution to equation (3.16), so it has to be solved numerically. (Gatheral et al., 2019) present a simple rational approximation to the solution of the rough Heston Riccati equation, which is substantially faster than the numerical schemes.

3.3.1.1 Rational Approximation of Rough Heston Riccati Solution

Firstly, the short-time series expansion of the solution by (Alòs et al., 2017) can be written as

$$h_s(a,t) = \sum_{j=0}^{\infty} \frac{\Gamma(1+j\alpha)}{\Gamma(1+(j+1)\alpha)}\,\beta_j(a)\,\xi^j\,t^{(j+1)\alpha} \qquad (3.18)$$

with

$$\beta_0(a) = -\frac{1}{2}a(a+i)$$
$$\beta_k(a) = \frac{1}{2}\sum_{i,j=0}^{k-2} \mathbb{1}_{i+j=k-2}\,\beta_i(a)\beta_j(a)\,\frac{\Gamma(1+i\alpha)}{\Gamma(1+(i+1)\alpha)}\,\frac{\Gamma(1+j\alpha)}{\Gamma(1+(j+1)\alpha)} + i\rho a\,\frac{\Gamma(1+(k-1)\alpha)}{\Gamma(1+k\alpha)}\,\beta_{k-1}(a)$$

Again from (Alòs et al., 2017), the long-time expansion is

$$h_l(a,t) = r_- \sum_{k=0}^{\infty} \frac{\gamma_k}{A^k\,\Gamma(1-k\alpha)}\,(\xi t^{\alpha})^{-k} \qquad (3.19)$$

where

$$\gamma_1 = -\gamma_0 = -1$$
$$\gamma_k = -\gamma_{k-1} + \frac{r_-}{2A}\sum_{i,j=1}^{\infty} \mathbb{1}_{i+j=k}\,\gamma_i\gamma_j\,\frac{\Gamma(1-k\alpha)}{\Gamma(1-i\alpha)\Gamma(1-j\alpha)}$$
$$A = \sqrt{a(a+i) - \rho^2 a^2}$$
$$r_{\pm} = -i\rho a \pm A$$

Now we can construct the global approximation. Using the same techniques as in (Atkinson et al., 2011), the rational approximation of $h(a,t)$ is given by

$$h^{(m,n)}(a,t) = \frac{\sum_{i=1}^{m} p_i (\xi t^{\alpha})^i}{\sum_{j=0}^{n} q_j (\xi t^{\alpha})^j}, \qquad q_0 = 1 \qquad (3.20)$$

Numerical experiments in (Gatheral et al., 2019) showed that $h^{(3,3)}$ is a good choice for high accuracy and fast computation. Thus we can express it explicitly:

$$h^{(3,3)}(a,t) = \frac{p_1 \xi t^{\alpha} + p_2 (\xi t^{\alpha})^2 + p_3 (\xi t^{\alpha})^3}{1 + q_1 \xi t^{\alpha} + q_2 (\xi t^{\alpha})^2 + q_3 (\xi t^{\alpha})^3} \qquad (3.21)$$

Thus from equations (3.18) and (3.19) we have

$$h_s(a,t) = \frac{\Gamma(1)}{\Gamma(1+\alpha)}\beta_0(a)\,t^{\alpha} + \frac{\Gamma(1+\alpha)}{\Gamma(1+2\alpha)}\beta_1(a)\,\xi (t^{\alpha})^2 + \frac{\Gamma(1+2\alpha)}{\Gamma(1+3\alpha)}\beta_2(a)\,\xi^2 (t^{\alpha})^3 + O(t^{4\alpha})$$
$$h_s(a,t) = b_1 t^{\alpha} + b_2 (t^{\alpha})^2 + b_3 (t^{\alpha})^3 + O[(t^{\alpha})^4]$$

and

$$h_l(a,t) = \frac{r_-\gamma_0}{\Gamma(1)} + \frac{r_-\gamma_1}{A\,\Gamma(1-\alpha)\,\xi}\,\frac{1}{t^{\alpha}} + \frac{r_-\gamma_2}{A^2\,\Gamma(1-2\alpha)\,\xi^2}\,\frac{1}{(t^{\alpha})^2} + O\left[\frac{1}{(t^{\alpha})^3}\right]$$
$$h_l(a,t) = g_0 + g_1\,\frac{1}{t^{\alpha}} + g_2\,\frac{1}{(t^{\alpha})^2} + O\left[\frac{1}{(t^{\alpha})^3}\right]$$

Matching the coefficients of equation (3.21) to $h_s$ and $h_l$ forces the coefficients $b_i$ and $g_j$ to satisfy

$$p_1\xi = b_1 \qquad (3.22)$$
$$(p_2 - p_1 q_1)\xi^2 = b_2 \qquad (3.23)$$
$$(p_1 q_1^2 - p_1 q_2 - p_2 q_1 + p_3)\xi^3 = b_3 \qquad (3.24)$$
$$\frac{p_3 \xi^3}{q_3 \xi^3} = g_0 \qquad (3.25)$$
$$\frac{(p_2 q_3 - p_3 q_2)\xi^5}{q_3^2 \xi^6} = g_1 \qquad (3.26)$$
$$\frac{(p_1 q_3^2 - p_2 q_2 q_3 - p_3 q_1 q_3 + p_3 q_2^2)\xi^7}{q_3^3 \xi^9} = g_2 \qquad (3.27)$$


The solution of this system is

$$p_1 = \frac{b_1}{\xi} \qquad (3.28)$$
$$p_2 = \frac{b_1^3 g_1 + b_1^2 g_0^2 + b_1 b_2 g_0 g_1 - b_1 b_3 g_0 g_2 + b_1 b_3 g_1^2 + b_2^2 g_0 g_2 - b_2^2 g_1^2 + b_2 g_0^3}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^2} \qquad (3.29)$$
$$p_3 = g_0 q_3 \qquad (3.30)$$
$$q_1 = \frac{b_1^2 g_1 - b_1 b_2 g_2 + b_1 g_0^2 - b_2 g_0 g_1 - b_3 g_0 g_2 + b_3 g_1^2}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi} \qquad (3.31)$$
$$q_2 = \frac{b_1^2 g_0 - b_1 b_2 g_1 - b_1 b_3 g_2 + b_2^2 g_2 + b_2 g_0^2 - b_3 g_0 g_1}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^2} \qquad (3.32)$$
$$q_3 = \frac{b_1^3 + 2 b_1 b_2 g_0 + b_1 b_3 g_1 - b_2^2 g_1 + b_3 g_0^2}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^3} \qquad (3.33)$$

3.3.1.2 Option Pricing Using Fast Fourier Transform

After solving the Riccati equation using the rational approximation equation (3.21), we can compute the characteristic function $\phi_X(a,t)$ in equation (3.15). Once the characteristic function is obtained, many methods are available to compute call option prices; see (Carr and Madan, 1999), (Itkin, 2010), (Lewis, 2001) and (Schmelzle, 2010). In this project we select the Carr and Madan approach, a fast option pricing method that uses the fast Fourier transform (FFT). Suppose $k = \ln K$ and $C(k,T)$ is the value of a European call option with strike $K$ and maturity $T$. Unfortunately, the FFT cannot be applied directly to evaluate the call option price, because the call pricing function is not square-integrable. A key contribution of (Carr and Madan, 1999) is the modification of the call pricing function with respect to $k$; with this modification, a whole range of option prices can be obtained with one inverse Fourier transform.

Consider the modified call price $c(k,T)$:

$$c(k,T) = e^{\beta k} C(k,T), \qquad \beta > 0 \qquad (3.34)$$

Its Fourier transform is

$$\psi_X(u,T) = \int_{-\infty}^{\infty} e^{iuk} c(k,T)\,dk, \qquad u \in \mathbb{R}_{>0} \qquad (3.35)$$

which can be expressed as

$$\psi_X(u,T) = \frac{e^{-rT}\,\phi_X[u - (\beta+1)i,\,T]}{\beta^2 + \beta - u^2 + i(2\beta+1)u} \qquad (3.36)$$

where $r$ is the risk-free rate and $\phi_X(\cdot)$ is the characteristic function defined in equation (3.15).


The purpose of $\beta$ is to assist the integrability of the modified call pricing function over the negative log-strike axis. Hence a sufficient condition is that $\psi_X(0)$ is finite:

$$\psi_X(0, T) < \infty \qquad (3.37)$$
$$\implies \phi_X[-(\beta+1)i,\,T] < \infty \qquad (3.38)$$
$$\implies \exp\left[\theta\kappa\, I^1 h(-(\beta+1)i,\,T) + \nu(0)\, I^{1-\alpha} h(-(\beta+1)i,\,T)\right] < \infty \qquad (3.39)$$

Using equation (3.39), we can determine the upper bound value of $\beta$ such that condition (3.37) is satisfied. (Carr and Madan, 1999) suggest that one quarter of the upper bound serves as a good choice for $\beta$.

The call price $C(k,T)$ can be recovered by applying the inverse Fourier transform:

$$C(k,T) = \frac{e^{-\beta k}}{2\pi} \int_{-\infty}^{\infty} \Re\left[e^{-iuk}\psi_X(u,T)\right] du \qquad (3.40)$$
$$= \frac{e^{-\beta k}}{\pi} \int_{0}^{\infty} \Re\left[e^{-iuk}\psi_X(u,T)\right] du \qquad (3.41)$$

The semi-infinite integral in the call price equation can be evaluated by numerical methods such as the trapezoidal rule and FFT, as shown in (Kwok et al., 2012):

$$C(k,T) \approx \frac{e^{-\beta k}}{\pi} \sum_{j=1}^{N} e^{-iu_j k}\,\psi_X(u_j, T)\,\Delta u \qquad (3.42)$$

where $u_j = (j-1)\Delta u$, $j = 1, 2, \dots, N$.

We end this section with a summary of the steps required to price a call option under the rough Heston model:

1. Compute the solution $h$ of the fractional Riccati equation (3.16) using the rational approximation (3.21).

2. Compute the characteristic function in equation (3.15).

3. Determine a suitable value for $\beta$ using equation (3.39).

4. Apply the Fourier transform in equation (3.36).

5. Evaluate the call option price by implementing the FFT in equation (3.42).
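As an illustration of the last steps, here is a minimal Python sketch (assuming NumPy) that evaluates the sum in equation (3.42) directly. Here phi_X is a hypothetical callable implementing the characteristic function (3.15), and the direct summation stands in for a full FFT sweep across many strikes.

import numpy as np

def call_price(phi_X, k, T, r, beta=0.25, N=4096, du=0.05):
    # u_j = (j - 1) * du for j = 1, ..., N
    u = np.arange(N) * du
    # Fourier transform of the modified call price, equation (3.36)
    psi = np.exp(-r*T) * phi_X(u - (beta + 1)*1j, T) \
          / (beta**2 + beta - u**2 + 1j*(2*beta + 1)*u)
    # Discretised inversion, equation (3.42); k = ln(strike)
    return np.exp(-beta*k) / np.pi * np.real(np.sum(np.exp(-1j*u*k) * psi)) * du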

3.3.2 Rough Heston Model Implied Volatility

(Euch et al., 2019) present an almost instantaneous and accurate approximation method to compute rough Heston implied volatility. This method allows us to approximate rough Heston implied volatility by simply scaling the volatility of volatility parameter $\nu$ and feeding this scaled parameter as input into the classical Heston implied volatility approximation equation of Chapter 3.1.2. For a given maturity $T$, the scaled volatility of volatility parameter $\hat{\nu}$ is

$$\hat{\nu} = \sqrt{\frac{3}{2H+2}}\,\frac{\nu}{\Gamma(H + 3/2)}\,\frac{1}{T^{1/2 - H}} \qquad (3.43)$$
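In code the scaling is a one-liner; the following sketch (using Python's standard math module; the function name is ours) mirrors equation (3.43).

from math import sqrt, gamma

def scaled_vol_of_vol(nu, H, T):
    # Equation (3.43): scale the vol-of-vol, then feed the result into the
    # classical Heston implied volatility approximation
    return sqrt(3.0 / (2.0*H + 2.0)) * nu / (gamma(H + 1.5) * T**(0.5 - H))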


FIGURE 3.2: Implied volatility smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019).


Chapter 4

Artificial Neural Networks

This chapter starts with an introduction to Artificial Neural Networks (ANN). We introduce the terminologies and basic elements of ANN and explain the methods that we use to increase training speed and predictive accuracy. We also discuss the training process for ANN, common pitfalls and their remedies. Finally, we discuss the ANN type of interest and extend the discussion to Deep Neural Networks (DNN).

The concept of ANN has come a long way. It started off with modelling a simple neural network after electrical circuits, as proposed by Warren McCulloch, a neurologist, and a young mathematician, Walter Pitts, in 1943 (McCulloch et al., 1943). This idea was further strengthened by Donald Hebb (Hebb, 1949).

ANN's ability to learn complex non-linear functional mappings, by deciphering the pattern between independent and dependent variables, has found applications in many fields. With computational resources becoming more accessible and big data sets becoming available, ANN's popularity has grown exponentially in recent years.

4.1 Dissecting the Artificial Neural Networks

FIGURE 4.1: A Simple Neural Network

Before we delve deeper into the world of ANN, let us explain what parameters and hyperparameters mean.


Parameters refer to the variables that are updated throughout the training process; they include weights and biases. Hyperparameters are the configuration variables that define the architecture of the ANN and stay constant throughout the training process. Hyperparameters include the number of neurons, the number of hidden layers, the activation functions and the number of epochs, etc.

4.1.1 Neurons

Like a biological neuron, an artificial neuron is the fundamental working unit in an ANN. It is essentially a mathematical function that receives multiple inputs and generates an output. More precisely, it takes the weighted sum of the inputs and applies the activation function to the sum to generate an output. In the case of a simple ANN (as shown in Figure 4.1), this will be the network output. In more sophisticated ANNs, the output of a neuron can be the input of other neurons.

4.1.2 Input, Output and Hidden Layers

An input layer is the first layer of an ANN. It receives and sends the training data into the network for further processing. The number of input neurons is equal to the number of input features of the training data.

The output layer is the final layer of the network. It outputs the predictions for a given set of input features, and it can have more than one output neuron.

A hidden layer is a layer between the input and output layers. It contains neuron(s) and receives weighted inputs from the input layer. Whilst an ANN contains only one input layer and one output layer, the number of hidden layers can be more than one. One hidden layer can only be used for solving linearly separable problems; for more complex problems, multiple hidden layers are needed.

Systematic experimentation is required to identify the appropriate number of hidden layers to prevent under-fitting and over-fitting and thus to increase predictive accuracy.

4.1.3 Connections, Weights and Biases

From the input layer to the output layer, each connection is assigned a weight that denotes its relative importance. Random weights are initialised when training begins; as training continues, these weights are updated.

Sometimes a bias (different from the bias term in statistics) is added to a neuron. It is merely a constant input with weight 1. Intuitively speaking, the purpose of a bias is to shift the activation function away from the origin, thereby allowing the neuron to give non-zero output when the inputs are 0.


4.1.4 Forward and Backward Propagation, Loss Functions

Forward propagation is the passing of input data in the forward direction through the network to generate output. The difference between the output and the target output is then estimated by a loss function, which is a measure of how well an ANN performs with the current set of weights. Typical loss functions for regression problems include the mean squared error (MSE) and the mean absolute error (MAE). They are defined as follows.

Definition 4.1.1.

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_{i,\mathrm{predict}} - y_{i,\mathrm{actual}})^2 \qquad (4.1)$$

Definition 4.1.2.

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_{i,\mathrm{predict}} - y_{i,\mathrm{actual}}| \qquad (4.2)$$

where $y_{i,\mathrm{predict}}$ is the i-th predicted output and $y_{i,\mathrm{actual}}$ is the i-th actual output.

Subsequently, this information is propagated backward from the output layer to the input layer, and the gradients of the loss function with respect to the weights are computed. The algorithm then adjusts the weights accordingly to minimise the loss function. An iteration is one complete cycle of forward and backward propagation.
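As a small illustration, the two loss functions in equations (4.1) and (4.2) can be written in NumPy as follows (a sketch; the function names are ours).

import numpy as np

def mse(y_predict, y_actual):
    return np.mean((y_predict - y_actual)**2)      # equation (4.1)

def mae(y_predict, y_actual):
    return np.mean(np.abs(y_predict - y_actual))   # equation (4.2)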

ANN training is an unconstrained optimisation problem with the loss function as the objective function. The algorithm navigates through the search space to find optimal weights that minimise the loss function (e.g. MSE). It can be expressed as

$$\min_{w} \frac{1}{n}\sum_{i=1}^{n}(y_{i,\mathrm{predict}} - y_{i,\mathrm{actual}})^2 \qquad (4.3)$$

where $w$ is the vector that contains the weights of all the connections within the ANN.

4.1.5 Activation Functions

An activation function is a mathematical function applied to the output of a neuron to determine whether it should be activated (or fired) or not, based on the relevance of the neuron's inputs to the prediction. This is analogous to the neuronal firing activity in the human brain.

Activation functions can be linear or non-linear. A linear activation function is only suitable when the functional relationship is linear; non-linear activation functions are used for most applications. In our case, non-linear activation functions are more suitable, as the pricing and implied volatility functional mappings are complex.

The common activation functions include:


Definition 4.1.3 (Rectified Linear Unit (ReLU)).

$$f_{\mathrm{ReLU}}(x) = \begin{cases} 0 & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases} \;\in [0, \infty) \qquad (4.4)$$

Definition 4.1.4 (Exponential Linear Unit (ELU)).

$$f_{\mathrm{ELU}}(x) = \begin{cases} \alpha(e^x - 1) & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases} \;\in (-\alpha, \infty) \qquad (4.5)$$

where $\alpha > 0$. The value of $\alpha$ is usually chosen to be 1 for most applications.

Definition 4.1.5 (Linear or identity).

$$f_{\mathrm{linear}}(x) = x \;\in (-\infty, \infty) \qquad (4.6)$$
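A direct NumPy transcription of these three activations, as a small illustrative sketch:

import numpy as np

def relu(x):
    return np.maximum(x, 0.0)                             # equation (4.4)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha*(np.exp(x) - 1.0))    # equation (4.5)

def linear(x):
    return x                                              # equation (4.6)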

4.1.6 Optimisation Algorithms and Learning Rate

The learning rate refers to the amount of change applied to the weights in each iteration of the training process. A high learning rate results in fast training speed but carries a risk of divergence, whereas a low learning rate may slow down training.

Common optimisation algorithms for training ANN are Stochastic Gradient Descent (SGD) and its extensions, such as RMSProp and Adam. The SGD pseudocode is the following:

Algorithm 1 Stochastic Gradient Descent

Initialise a vector of random weights w and learning rate lr
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        w_new = w_current − lr · ∂Loss/∂w_current
    end for
end while

SGD is popular because it is simple to implement and can provide good results for most applications. In SGD the learning rate lr is fixed throughout the process. This is found to be an issue because the appropriate learning rate is difficult to determine: selecting a high learning rate is prone to divergence, whereas using a low learning rate can result in a slow training process. Hence improvements have been made to SGD to allow the learning rate to be adaptive. Root Mean Square Propagation (RMSProp) is one of the extensions that allow the learning rate to vary throughout the training process (Hinton et al., 2015). It is a method that divides the learning rate by the exponential moving average of past gradients. The author suggests the exponential decay rate to be b = 0.9, and its pseudocode is presented below.

Algorithm 2 Root Mean Square Propagation (RMSProp)

Initialise a vector of random weights w, learning rate lr, v = 0 and b = 0.9
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        v_current = b · v_previous + (1 − b)[∂Loss/∂w_current]²
        w_new = w_current − (lr/√(v_current + ε)) · ∂Loss/∂w_current
    end for
end while

where ε is a small number to prevent division by zero.

A further improved version was proposed by (Kingma et al., 2015), called Adaptive Moment Estimation (Adam). In RMSProp the algorithm adapts the learning rate based on the second moments of the gradients only; Adam improves this further by also using the first moments of the gradients in the adaptation. The advantage of Adam is that it can handle sparse gradients on noisy data. The pseudocode of Adam is given below:

Algorithm 3 Adaptive Moment Estimation (Adam)

Initialise a vector of random weights w, learning rate lr, v = 0, m = 0, b = 0.9 and b1 = 0.999
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        m_current = b · m_previous + (1 − b) · ∂Loss/∂w_current
        v_current = b1 · v_previous + (1 − b1)[∂Loss/∂w_current]²
        m̂_current = m_current/(1 − bⁱ)
        v̂_current = v_current/(1 − b1ⁱ)
        w_new = w_current − (lr/(√v̂_current + ε)) · m̂_current
    end for
end while

where b is the exponential decay rate for the first moment and b1 is the exponential decay rate for the second moment; the hats denote bias correction, with the decay rates raised to the power of the iteration count i.

4.1.7 Epochs, Early Stopping and Batch Size

An epoch is one pass of the entire training data set through the ANN. Generally we want the number of epochs to be as high as possible; however, this may cause the ANN to over-fit. Early stopping is useful here to stop the training before the ANN becomes over-fitted: it works by stopping the training process when the loss function shows no improvement. The batch size is the number of input samples processed before the parameters are updated.

4.2 Feedforward Neural Networks

Feedforward Neural Networks are the most common type of ANN in practical applications. They are called feedforward because the data only flows from the input layer to the output layer; there is no feedback loop within the network.

4.2.1 Deep Neural Networks

Deep Neural Networks (DNN) are Feedforward Neural Networks with more than one hidden layer; the depth here refers to the number of hidden layers. This is the type of ANN of our interest, and the following theorems provide justifications.

Theorem 4.2.1 ((Hornik et al., 1989) Universal Approximation Theorem). Let $NN^a_{d_{in},d_{out}}$ be the set of neural networks with activation function $f_a: \mathbb{R} \to \mathbb{R}$, input dimension $d_{in} \in \mathbb{N}$ and output dimension $d_{out} \in \mathbb{N}$. Then, if $f_a$ is continuous and non-constant, $NN^a_{d_{in},d_{out}}$ is dense in the Lebesgue space $L^p(\mu)$ for all finite measures $\mu$.

Theorem 4.2.2 ((Hornik et al., 1990) Universal Approximation Theorem for Derivatives). Let the function to be approximated be $F^* \in C^n$, the mapping of the neural network be $F: \mathbb{R}^{d_0} \to \mathbb{R}$, and $NN^a_{d_{in},d_{out}}$ be the set of single-layer neural networks with activation function $f_a: \mathbb{R} \to \mathbb{R}$, input dimension $d_{in} \in \mathbb{N}$ and output dimension 1. Then, if the (non-constant) activation function is $f_a \in C^n(\mathbb{R})$, then $NN^a_{d_{in},1}$ arbitrarily approximates $F^*$ and all its derivatives up to order $n$.

Theorem 4.2.3 ((Eldan et al., 2016) The Power of Depth of Neural Networks). There exists a simple function on $\mathbb{R}^d$ expressible by a small 3-layer feedforward neural network which cannot be approximated by any 2-layer network to more than a certain constant accuracy, unless its width is exponential in the dimension.

According to Theorem 4.2.2, it is crucial to have an activation function that is n-differentiable if the approximation of the n-th derivative of the function $F^*$ is required. Theorem 4.2.3 motivates the choice of a deep network: it states that the depth of a network can improve its predictive capabilities. However, it is worth noting that networks deeper than 4 hidden layers do not provide much performance gain (Ioffe et al., 2015).

4.3 Common Pitfalls and Remedies

Here we discuss some common pitfalls in ANN training and the corresponding remedies.


4.3.1 Loss Function Stagnation

During training, the loss function may display only infinitesimal improvements after each epoch.

To resolve this problem:

• Increase the batch size. This could potentially fix the issue, as it helps the ANN model learn the relationships between data samples and data labels better before updating the parameters.

• Increase the learning rate. Too small a learning rate can cause the ANN model to get stuck in local minima.

• Use a deeper network. According to the power of depth theorem, deeper networks tend to perform better than shallower ones. However, if the ANN model already has 4 hidden layers, this method may not be suitable, as there is not much performance gain beyond that (see Chapter 4.2.1).

• Use a different activation function for the output layer. Make sure the range of the output activation function is appropriate for the problem.

4.3.2 Loss Function Fluctuation

During training, the loss function may fluctuate noticeably from epoch to epoch. Potential fixes include:

• Increase the batch size. The ANN model might not be able to interpret the true relationships between input and output data before each update when the batch size is too small.

• Lower the initialised learning rate.

4.3.3 Over-fitting

Over-fitting occurs when the ANN model performs well on training data but fails to generalise on unseen test data.

Potential fixes include:

• Reduce the complexity of the ANN model. Limit the number of hidden layers to 4, as (Eldan et al., 2016) show that not much advantage can be achieved beyond 4 hidden layers. Furthermore, experiment with narrower (fewer neurons) hidden layers.

• Introduce early stopping. Reduce the number of epochs if more epochs show only little improvement. It is important to note that the stopping criterion should be based on the validation loss rather than the training loss (see Chapter 6.3).

• Regularisation. This can reduce the complexity of the loss function and speed up convergence.


• K-fold Cross-validation. It works by splitting the training data into K equal-sized portions: K−1 portions are used for training and 1 portion is used for validation. This is repeated K times, with a different portion used for validation each time.

4.3.4 Under-fitting

Under-fitting occurs when the ANN model fails to make sense of the relationships between data samples and labels.

Potential fixes include:

• Increase the size of the training data. Complex functional mappings require the ANN model to train on more data before it can learn the relationships well.

• Increase the depth of the model. It is very likely that the model is not deep enough for the problem. As a rule of thumb, limit the number of hidden layers to 3-4 for complex problems.

4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation

For the ANN application in implied volatility prediction, we apply the image-based implicit method proposed by (Horvath et al., 2019), reportedly a powerful and robust method for approximating implied volatility. Rather than using one implied volatility value as the network output, the image-based implicit method uses the entire implied volatility surface as the output. The implied volatility surface is represented by 10 × 10 pixels, with each pixel corresponding to one maturity and one strike price. The input data corresponding to one implied volatility surface output is all the model parameters except for strike price and maturity; these are implicitly learned by the network. This method offers the following benefits:

• More information is learned by the network with less input data. The information of neighbouring volatility points is incorporated in the training process.

• The network learns the mapping between the model parameters and the implied volatility surface directly. The volatility smile and term structure are available simultaneously, so it is more efficient than having only one single implied volatility value as output.

• The neighbouring volatility points can be numerically approximated almost instantly if needed.


FIGURE 4.2: The pixels of the implied volatility surface are represented by the outputs of the neural network (the 100 network outputs O1 to O100 are arranged as a 10 × 10 grid).


Chapter 5

Data

In this chapter we discuss the data for our applications. We explain how the data is generated and describe some essential data pre-processing procedures.

Simulated data is chosen to train the ANN model mainly for regulatory reasons. One might argue that training the ANN with market data directly would yield better predictions; however, evidence for the stability and robustness of that approach is still lacking. Furthermore, there is no straightforward interpretation between the inputs and outputs of an ANN model trained in that way. The main advantage of training the ANN with simulated data is that the functional relationship is known, and so the model is tractable. This property is important, as tractable models tend to be more transparent, which is required by regulations.

For neural network computation, Python is the logical language choice, as it has many powerful machine learning libraries such as Keras. To obtain good results, we require 5,000,000 rows of data for the ANN experiment. Hence we choose C++ instead of Python for the data generation process, as it is computationally faster. We also use the lightweight, header-only library pybind11, which can harmoniously integrate C++ and Python in one project (see Appendix A for a tutorial on pybind11).

5.1 Data Generation and Storage

The data generation process is done entirely in C++, mainly because of its execution speed. Initially we considered applying GPU parallel computing to speed up the process even further. After consulting with the supervisor, we decided to use the std::future class template instead. The std::future belongs to the standard C++11 library and is very simple to implement compared to parallel computing. It allows the C++ program to run multiple functions asynchronously on different threads (see page 942 in (Duffy, 2018) for an example). Our machine contains 2 physical cores with 2 threads per physical core. This allows us to run 4 operations asynchronously without overhead costs and therefore, in theory, improves the data generation speed four-fold. Note that asynchronous programming is not meant to replace parallel computing completely; in cases where execution speed is absolutely critical, parallel computing should be used. Without asynchronous programming, the data generation process for 5,000,000 rows of Heston option price data would take more than 8 hours. Now it takes less than 2 hours to complete the process.

We follow the steps outlined in Appendix A to wrap the C++ program using the pybind11 library. The wrapped C++ function can be called in Python just like any other Python library. The pybind11 library is conceptually very similar to the Boost library; the reason we choose pybind11 is that it is a lightweight, header-only library that is easy to use. This enables us to enjoy both the execution speed of C++ and access to the machine learning libraries of Python at the same time.

After the data is generated, it is stored in MySQL so that we do not need to re-generate the data each time we restart the computer or when the Python IDE crashes unexpectedly. We use the mysql-connector Python library to communicate with MySQL from Python.
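For illustration, a minimal sketch of this storage step (assuming the mysql-connector-python package and a local MySQL server; the database, table and column names here are hypothetical):

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="user",
                               password="password", database="option_data")
cursor = conn.cursor()
# One example row: the 10 Heston inputs of Table 5.1 plus the call price
rows = [(50.0, 45.0, 0.03, 0.0, 0.3, 1.0, 0.4, 0.5, 0.3, -0.7, 7.89)]
cursor.executemany(
    "INSERT INTO heston_call (S, K, r, D, nu, T, theta, kappa, xi, rho, price) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", rows)
conn.commit()
cursor.close()
conn.close()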

5.1.1 Heston Call Option Data

We follow the method and theory described in Chapter 3 to compute the call option price under the Heston model. We only consider call options in this project. To train an ANN that can also predict put option prices, one can simply add an extra input feature that takes only two values, 1 or -1, where 1 represents a call option and -1 represents a put option.

For this application we generated 5,000,000 rows of data, uniformly distributed over a specified range. In this context, one row of data contains all the input features and the corresponding output. We use 10 model parameters as the input features: spot price (S), strike price (K), risk-free rate (r), dividend yield (D), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ). In this case there is only one output, the call option price. See Table 5.1 for their respective ranges.

TABLE 5.1: Heston Model Call Option Data

Model Parameters                      Range
Input:
  Spot Price (S)                      ∈ [20, 80]
  Strike Price (K)                    ∈ [40, 55]
  Risk-free rate (r)                  ∈ [0.03, 0.045]
  Dividend Yield (D)                  ∈ [0, 0.1]
  Instantaneous Volatility (ν)        ∈ [0.2, 0.6]
  Maturity (T)                        ∈ [0.03, 2.0]
  Long Term Volatility (θ)            ∈ [0.2, 1.0]
  Mean Reversion Rate (κ)             ∈ [0.1, 1.0]
  Volatility of Volatility (ξ)        ∈ [0.2, 0.5]
  Correlation (ρ)                     ∈ [−0.95, 0.95]
Output:
  Call Option Price (C)               ∈ ℝ>0


5.1.2 Heston Implied Volatility Data

We generate the Heston implied volatility data with the theory explained in Chapter 3. For the ANN application in implied volatility approximation, we decided to use the image-based implicit method, in which the ANN output layer dimension is 100, with each output representing one pixel on the implied volatility surface.

Using this method, we need 100 times more data in order to have the same number of distinct rows as the normal point-wise method (e.g. Heston call option pricing). However, due to hardware constraints, we decided to generate 10,000,000 rows of uniformly distributed data, which translates to only 100,000 distinct rows of data.

We use 6 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ).

Note that strike price and maturity are not used as input features, as mentioned in Chapter 4.4. This is because the shape of one full implied volatility surface depends on the 6 model parameters mentioned above; the strike price and maturity correspond to one specific point on the full implied volatility surface. Nevertheless, we still require these two parameters to generate implied volatility data. We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

In this case there are 10 × 10 = 100 outputs, which together form the implied volatility surface. See Table 5.2 for their respective ranges.

TABLE 5.2: Heston Model Implied Volatility Data

Model Parameters                      Range
Input:
  Spot Price (S)                      ∈ [0.2, 2.5]
  Instantaneous Volatility (ν)        ∈ [0.2, 0.6]
  Long Term Volatility (θ)            ∈ [0.2, 1.0]
  Mean Reversion Rate (κ)             ∈ [0.1, 1.0]
  Volatility of Volatility (ξ)        ∈ [0.2, 0.5]
  Correlation (ρ)                     ∈ [−0.95, 0.95]
Output:
  Implied Volatility Surface (σ_imp)  ∈ ℝ^{10×10}_{>0}

5.1.3 Rough Heston Call Option Data

Again we follow the theory in Chapter 3 and implement it in C++ to generate call option prices under the rough Heston model.

Similarly, we only consider call options here and generate 5,000,000 rows of data. We use 9 model parameters as the input features: strike price (K), risk-free rate (r), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). In this case there is only one output, the call option price. See Table 5.3 for their respective ranges.

TABLE 5.3: Rough Heston Model Call Option Data

Model Parameters                      Range
Input:
  Strike Price (K)                    ∈ [0.5, 5]
  Risk-free rate (r)                  ∈ [0.2, 0.5]
  Instantaneous Volatility (ν)        ∈ [0, 1.0]
  Maturity (T)                        ∈ [0.1, 3.0]
  Long Term Volatility (θ)            ∈ [0.01, 0.1]
  Mean Reversion Rate (κ)             ∈ [1.0, 4.0]
  Volatility of Volatility (ξ)        ∈ [0.01, 0.5]
  Correlation (ρ)                     ∈ [−0.95, 0.95]
  Hurst Parameter (H)                 ∈ [0.05, 0.1]
Output:
  Call Option Price (C)               ∈ ℝ>0

5.1.4 Rough Heston Implied Volatility Data

Finally, we generate the rough Heston implied volatility data with the theory explained in Chapter 3. As before, we use the image-based implicit method.

The data size we use is 10,000,000 rows, which translates to 100,000 distinct rows of data. We use 7 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

As before, there are 10 × 10 = 100 outputs, which together form the rough Heston implied volatility surface. See Table 5.4 for their respective ranges.

TABLE 5.4: Rough Heston Model Implied Volatility Data

Model Parameters                      Range
Input:
  Spot Price (S)                      ∈ [0.2, 2.5]
  Instantaneous Volatility (ν)        ∈ [0.2, 0.6]
  Long Term Volatility (θ)            ∈ [0.2, 1.0]
  Mean Reversion Rate (κ)             ∈ [0.1, 1.0]
  Volatility of Volatility (ξ)        ∈ [0.2, 0.5]
  Correlation (ρ)                     ∈ [−0.95, 0.95]
  Hurst Parameter (H)                 ∈ [0.05, 0.1]
Output:
  Implied Volatility Surface (σ_imp)  ∈ ℝ^{10×10}_{>0}


5.2 Data Pre-processing

In this section we discuss some necessary data pre-processing procedures: data splitting for validation and evaluation (testing), data standardisation and normalisation, and data partitioning for K-fold cross validation.

5.2.1 Data Splitting: Train - Validate - Test

Generally, not 100% of the available data is used for training; we reserve some of the data for validation and for performance evaluation.

We split the data into three sets: 70% (train), 15% (validate) and 15% (test). The first 70% of the data is used for training, whilst 15% of the data is used for validation during training. The last 15% is for evaluating the trained ANN's predictive accuracy.

The validation data set is used for tuning the hyperparameters of the ANN; this is important to help prevent over-fitting of the network. The test data set is not used during training: it is the unseen (out-of-sample) data used for evaluating the trained ANN's predictions.
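A minimal sketch of this 70/15/15 split (assuming NumPy arrays X and y holding the already shuffled inputs and outputs; the function name is ours):

import numpy as np

def split_train_val_test(X, y, train=0.70, val=0.15):
    n = len(X)
    n_train, n_val = int(train * n), int(val * n)
    return ((X[:n_train], y[:n_train]),                             # training set
            (X[n_train:n_train+n_val], y[n_train:n_train+n_val]),   # validation set
            (X[n_train+n_val:], y[n_train+n_val:]))                 # test set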

5.2.2 Data Standardisation and Normalisation

Before the data is fed into the ANN for training, it is absolutely essential to scale the data to speed up convergence and improve performance.

The magnitude of each data feature is different. It is essential to re-scale the data so that the ANN model will not misinterpret the features' relative importance based on their magnitudes. Typical scaling techniques include standardisation and normalisation.

Data standardisation is defined as

$$x_{\mathrm{scaled}} = \frac{x - x_{\mathrm{mean}}}{x_{\mathrm{std}}} \qquad (5.1)$$

where

• x is the data

• x_mean is the mean of that data feature

• x_std is the standard deviation of that data feature

(Horvath et al., 2019) claim that this works especially well for quantities that are positively unbounded, such as implied volatility.

For the other model parameters we apply data normalisation, which is defined as

$$x_{\mathrm{scaled}} = \frac{2x - (x_{\mathrm{max}} + x_{\mathrm{min}})}{x_{\mathrm{max}} - x_{\mathrm{min}}} \in [-1, 1] \qquad (5.2)$$

where

• x is the data

• x_max is the maximum value of that data feature

• x_min is the minimum value of that data feature

After data scaling (standardisation or normalisation), the ANN is able to learn the true relationships between inputs and outputs without bias from differences in magnitude.
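A minimal NumPy sketch of equations (5.1) and (5.2) (the function names are ours; the scaling statistics are taken from the data being scaled):

import numpy as np

def standardise(x, x_mean, x_std):
    return (x - x_mean) / x_std                        # equation (5.1)

def normalise(x, x_min, x_max):
    return (2*x - (x_max + x_min)) / (x_max - x_min)   # equation (5.2), in [-1, 1]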

5.2.3 Data Partitioning

In this section we explain how the data is partitioned for K-fold cross validation.

First we shuffle the entire training data set. Then the entire data set is divided into K equal-sized portions. Each portion is used once as the test data set while the ANN is trained on the other K−1 portions; this is repeated K times, with each of the K portions used exactly once as the test data set. The average of the K results is the performance measure of the model.
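A minimal sketch of this procedure with scikit-learn (build_model is a hypothetical function returning a freshly compiled Keras model; X and y are the training arrays):

import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    model = build_model()                        # fresh model for each fold
    model.fit(X[train_idx], y[train_idx], epochs=30, batch_size=200, verbose=0)
    scores.append(model.evaluate(X[test_idx], y[test_idx], verbose=0))
print(np.mean(scores, axis=0))                   # average of the K results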

FIGURE 5.1: How the data is partitioned for 5-fold cross validation. Adapted from (Chapter 2 Modeling Process, k-fold cross validation).


Chapter 6

Neural Networks Architecture and Experimental Setup

We start this chapter by discussing the appropriate ANN architecture and experimental setup for our applications. We apply ANN to four different applications: option pricing under the Heston model, Heston implied volatility surface approximation, option pricing under the rough Heston model, and rough Heston implied volatility surface approximation. We first apply ANN to the Heston model as a concept validation and then proceed to the rough Heston model.

Subsequently, we discuss the relevant evaluation metrics and validation methods for our ANN models. Finally, we end this chapter with the implementation steps for ANN in Keras.

6.1 Neural Networks Architecture

Configuring an ANN's architecture is a combination of art and science. It requires experience and systematic experimentation to decide the right parameters and architecture for a specific problem, and for a particular problem there might be multiple appropriate designs. To begin with, we use the network architecture of (Horvath et al., 2019) and adjust the hyperparameters where it performs poorly for our applications.


FIGURE 6.1: Typical ANN Architecture for Option Pricing Application


FIGURE 6.2: Typical ANN Architecture for Implied Volatility Surface Approximation

6.1.1 ANN Architecture: Option Pricing under Heston Model

An ANN with 3 hidden layers is appropriate for this application. Each hidden layer has 50 neurons, with the exponential linear unit (ELU) as the activation function.

Initially, the rectified linear activation function (ReLU) was considered. It is the choice for many applications, as it can overcome the vanishing gradient problem whilst being less computationally expensive compared to other activation functions. However, it has some major drawbacks:

• The n-th derivative of ReLU is not continuous for all n > 0.

• The ReLU cannot produce negative outputs.

• For activation x < 0, ReLU has zero gradient.

From Theorem 4.2.2 we know that choosing an n-differentiable activation function is necessary for computing the n-th derivatives of the functional mapping. This is crucial for our application because we would be interested in option Greeks computation using the same ANN if it shows promising results for option pricing. For risk management purposes, selecting an activation function for which option Greeks can be calculated is crucial.

The obvious alternative is the ELU (defined in Chapter 4.1.4). For x < 0 it has a smooth output until it approaches −α; for x > 0 the output of ELU is positively unbounded, which is suitable for our application, as the values of option price and implied volatility are also, in theory, positively unbounded.

Since our problem is a regression problem, the linear activation function (defined in Chapter 4.1.5) is appropriate for the output layer, as its output is unbounded. We find that a batch size of 200 and 30 epochs yield good accuracy.

To summarise, the ANN model for Heston option pricing is (a Keras sketch of this configuration is given below):

• Input Layer: 10 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function
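The following is a minimal Keras (TensorFlow 2.x assumed) sketch of this architecture, illustrative rather than the exact project code:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(10,)),             # 10 input features (Table 5.1)
    layers.Dense(50, activation="elu"),   # hidden layer 1
    layers.Dense(50, activation="elu"),   # hidden layer 2
    layers.Dense(50, activation="elu"),   # hidden layer 3
    layers.Dense(1, activation="linear")  # call option price
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# model.fit(X_train, y_train, batch_size=200, epochs=30,
#           validation_data=(X_val, y_val))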

6.1.2 ANN Architecture: Implied Volatility Surface of Heston Model

For approximating the full implied volatility surface of the Heston model, we apply the image-based implicit method (see Chapter 4.4). This implies that the output dimension is 100.

We use 3 hidden layers with 80 neurons each. The activation function for the hidden layers is the exponential linear unit (ELU); as before, we use the linear activation function for the output layer. We find that a batch size of 20 and 200 epochs yield good accuracy. Since the number of epochs is relatively high, we introduce an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

To summarise, the ANN model for Heston implied volatility surface approximation is:

• Input Layer: 6 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function


6.1.3 ANN Architecture: Option Pricing under Rough Heston Model

For option pricing under the rough Heston model, the ANN model is:

• Input Layer: 9 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function

As before, we use a batch size of 200 and 30 epochs.

6.1.4 ANN Architecture: Implied Volatility Surface of Rough Heston Model

The ANN model for rough Heston implied volatility surface approximation is:

• Input Layer: 7 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function

We use a batch size of 20 and 200 epochs, with an early stopping feature to stop the training when the validation loss stops improving for 25 epochs; a sketch of this early stopping setup is given below.
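In Keras, the early stopping behaviour described above can be configured with the built-in callback, as in this minimal sketch (TensorFlow 2.x assumed):

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss",  # watch the validation loss
                           patience=25)          # stop after 25 idle epochs
# model.fit(X_train, y_train, batch_size=20, epochs=200,
#           validation_data=(X_val, y_val), callbacks=[early_stop])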

6.2 Performance Metrics and Validation Methods

We require some metrics to evaluate the performance of the ANN in terms of accuracy and speed.

For accuracy, we have introduced the MSE and MAE in Chapter 4.1.4. It is also common to use the coefficient of determination (R²) to quantify the performance of a regression model. It is a statistical measure that quantifies the proportion of the variance of the dependent variable that is explained by the independent variables. The formula is

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_{i,\mathrm{actual}} - y_{i,\mathrm{predict}})^2}{\sum_{i=1}^{n}(y_{i,\mathrm{actual}} - \bar{y}_{\mathrm{actual}})^2} \qquad (6.1)$$


We also suggest a user-defined accuracy metric, the Maximum Absolute Error, defined as

$$\mathrm{Max\ Abs\ Error} = \max_i\left(|y_{i,\mathrm{actual}} - y_{i,\mathrm{predict}}|\right) \qquad (6.2)$$

This metric provides a measure of how large the error could be in the worst-case scenario, which is especially important from a risk management perspective.
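In Keras, such a user-defined metric can be passed to compile alongside the built-in ones; a minimal sketch (TensorFlow 2.x assumed):

import tensorflow as tf

def max_abs_error(y_true, y_pred):
    # Equation (6.2): the largest absolute error over the evaluated samples
    return tf.reduce_max(tf.abs(y_true - y_pred))

# model.compile(optimizer="adam", loss="mse", metrics=["mae", max_abs_error])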

As for the run-time performance of the ANN, we simply use the Python built-in timer function to quantify this property.

In Chapter 5.2.3 we explained the data partitioning process. The motivation for partitioning the data is to perform K-fold cross validation. Cross validation is a statistical technique to assess how well a predictive model generalises to an independent data set, and it is a good tool for identifying over-fitting.

As explained earlier, the process involves training and testing different permutations of the partitioned data sets multiple times. The indication of a well generalised model is that the error magnitudes in each round are consistent with each other and with their average. We select K = 5 for our applications.


6.3 Implementation of ANN in Keras

In this section we outline the standard ANN computation process using Keras:

1. Construct the ANN model by specifying the number of input neurons, the number of hidden neurons, the number of hidden layers, the number of output neurons and the activation functions for each layer. Print the model summary to check for errors.

2. Configure the settings of the model using the compile function. We select the MSE as the loss function, and we specify MAE, R² and our user-defined maximum absolute error as the metrics. For all of our applications we choose Adam as the optimisation algorithm. To prevent over-fitting, we use the early stopping function, especially when the number of epochs is high. It is important to note that the validation loss (not the training loss) should be used as the early stopping criterion; this means the training will stop early if there is no reduction in validation loss.

3. Start training the ANN model. It is important to specify the validation split argument here for splitting the training data into a train set and a validation set. Note that the test set should not be used as the validation set; it should be reserved for evaluating the ANN's predictions after training.

4. When the training is complete, generate the plot of MSE (loss function) versus the number of epochs to visualise the training process. The MSE should decline gradually and then level off as the number of epochs increases. See Chapter 4.3 for potential training issues and remedies.

5. Apply 5-fold cross validation to check for over-fitting. Evaluate the trained model using the evaluate function.

6. Generate predictions. Visualise the predictions by plotting predicted values versus actual values on the same graph.

See Appendix C for a Python code example.


6.4 Summary of Neural Networks Architecture

TABLE 6.1: Summary of Neural Networks Architecture for Each Application

ANN Model                            | Application                             | Input Layer | Hidden Layers         | Output Layer | Batch Size | Epochs
Heston Option Pricing ANN            | Heston Call Option Pricing              | 10 neurons  | 3 layers × 50 neurons | 1 neuron     | 200        | 30
Heston Implied Volatility ANN        | Heston Implied Volatility Surface       | 6 neurons   | 3 layers × 80 neurons | 100 neurons  | 20         | 200
Rough Heston Option Pricing ANN      | Rough Heston Call Option Pricing        | 9 neurons   | 3 layers × 50 neurons | 1 neuron     | 200        | 30
Rough Heston Implied Volatility ANN  | Rough Heston Implied Volatility Surface | 7 neurons   | 3 layers × 80 neurons | 100 neurons  | 20         | 200


Chapter 7

Results and Discussions

In this chapter we present our results and discuss our findings

7.1 Heston Option Pricing ANN Results

Figure 7.1 shows the learning curves of the Heston Option Pricing ANN; the loss function used here is the MSE. As the ANN is trained, both the training loss and validation loss curves decline until they converge. It took fewer than 30 epochs for the ANN to reach a satisfactory level of accuracy. There is only a small gap between the training loss and the validation loss, which indicates the ANN model is well trained.

FIGURE 7.1: Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN

Table 7.1 summarises the error metrics of the Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 2.00 × 10⁻⁶, 9.58 × 10⁻⁴, 6.50 × 10⁻³ and 0.999 respectively. The low error metrics and the high R² value imply that the ANN model generalises well and is able to give highly accurate predictions for option prices under the Heston model. Figure 7.2 attests to this high accuracy: almost all of the predictions lie on the perfect predictions line (red line).

TABLE 7.1: Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data

MSE         | MAE         | Max Abs Error | R²
2.00 × 10⁻⁶ | 9.58 × 10⁻⁴ | 6.50 × 10⁻³   | 0.999

FIGURE 7.2: Plot of Predicted vs Actual Values for Heston Option Pricing ANN

TABLE 7.2: 5-fold Cross Validation Results of Heston Option Pricing ANN

        | MSE         | MAE         | Max Abs Error
Fold 1  | 5.78 × 10⁻⁷ | 5.45 × 10⁻⁴ | 2.04 × 10⁻³
Fold 2  | 9.90 × 10⁻⁷ | 7.69 × 10⁻⁴ | 2.40 × 10⁻³
Fold 3  | 7.15 × 10⁻⁷ | 6.21 × 10⁻⁴ | 2.20 × 10⁻³
Fold 4  | 1.24 × 10⁻⁶ | 8.67 × 10⁻⁴ | 2.53 × 10⁻³
Fold 5  | 6.54 × 10⁻⁷ | 5.79 × 10⁻⁴ | 2.21 × 10⁻³
Average | 8.36 × 10⁻⁷ | 6.76 × 10⁻⁴ | 2.27 × 10⁻³

Table 7.2 shows the results from the 5-fold cross validation, which again confirm that the ANN model can give accurate predictions and generalises well to unseen data; the values of each fold are consistent with each other and with their average.


7.2 Heston Implied Volatility ANN Results

Figure 7.3 shows the learning curves of the Heston Implied Volatility ANN; again, we use the MSE as the loss function. We can see that the validation loss curve experiences some fluctuations during training. We experimented with various ANN setups, and this is the least fluctuating learning curve we could achieve. Despite the fluctuations, the overall value of the validation loss declines to a satisfactorily low level, and the gap between validation loss and training loss is negligible. Due to the comparatively more complex ANN architecture, it took about 200 epochs for the ANN to reach a satisfactory level of accuracy.

FIGURE 7.3: Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN

Table 7.3 summarises the error metrics of the Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 6.24 × 10⁻⁴, 1.27 × 10⁻², 3.71 × 10⁻¹ and 0.999 respectively. The low error metrics and the high R² value imply that the ANN model generalises well and is able to give accurate predictions for the implied volatility surface under the Heston model. Figure 7.4 shows the ANN predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.3: Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data

MSE         | MAE         | Max Abs Error | R²
6.24 × 10⁻⁴ | 1.27 × 10⁻² | 3.71 × 10⁻¹   | 0.999

FIGURE 7.4: Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.5: Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.5 shows the MAE values of the Heston Implied Volatility ANN predictions across the entire implied volatility surface. The largest MAE value is 0.0045, and a large area of the surface has MAE values lower than 0.0030. It is worth noting that the accuracy is relatively poor when the strike price is at the extremes and the maturity is small; however, the ANN gives good predictions overall for the whole surface.

The 5-fold cross validation results for the Heston Implied Volatility ANN model, summarised in Table 7.4, show consistent values. This consistency indicates the ANN model is well-trained and is able to generalise to unseen test data.

TABLE 7.4: 5-fold Cross Validation Results of Heston Implied Volatility ANN

        | MSE         | MAE         | Max Abs Error
Fold 1  | 8.15 × 10⁻⁴ | 1.13 × 10⁻² | 3.88 × 10⁻¹
Fold 2  | 4.74 × 10⁻⁴ | 1.08 × 10⁻² | 3.13 × 10⁻¹
Fold 3  | 1.37 × 10⁻³ | 1.74 × 10⁻² | 4.46 × 10⁻¹
Fold 4  | 3.38 × 10⁻⁴ | 8.61 × 10⁻³ | 2.19 × 10⁻¹
Fold 5  | 4.60 × 10⁻⁴ | 1.02 × 10⁻² | 2.67 × 10⁻¹
Average | 6.92 × 10⁻⁴ | 1.12 × 10⁻² | 3.27 × 10⁻¹

7.3 Rough Heston Option Pricing ANN Results

Figure 7.6 shows the learning curves of the Rough Heston Option Pricing ANN; as before, the MSE is used as the loss function. As training progresses, both the training loss and validation loss curves decline until they converge. Since this is a relatively simple model, it took fewer than 30 epochs of training to reach a good level of accuracy. The small discrepancy between the training loss and the validation loss indicates the ANN model is well-trained.

FIGURE 7.6: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN

Table 7.5 summarises the error metrics of the Rough Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 9.00 × 10⁻⁶, 2.28 × 10⁻³, 1.76 × 10⁻¹ and 0.999 respectively. These metrics imply that the ANN model generalises well and is able to give highly accurate predictions for option prices under the rough Heston model. Figure 7.7 again attests to this high accuracy: most of the predictions lie on the perfect predictions line (red line).

TABLE 7.5: Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data

MSE         | MAE         | Max Abs Error | R²
9.00 × 10⁻⁶ | 2.28 × 10⁻³ | 1.76 × 10⁻¹   | 0.999


FIGURE 7.7: Plot of Predicted vs Actual Values for Rough Heston Option Pricing ANN

TABLE 7.6: 5-fold Cross Validation Results of Rough Heston Option Pricing ANN

        | MSE         | MAE         | Max Abs Error
Fold 1  | 3.79 × 10⁻⁶ | 1.35 × 10⁻³ | 5.99 × 10⁻³
Fold 2  | 2.33 × 10⁻⁶ | 9.96 × 10⁻⁴ | 4.89 × 10⁻³
Fold 3  | 2.06 × 10⁻⁶ | 9.88 × 10⁻⁴ | 4.41 × 10⁻³
Fold 4  | 2.16 × 10⁻⁶ | 1.08 × 10⁻³ | 4.07 × 10⁻³
Fold 5  | 1.84 × 10⁻⁶ | 1.01 × 10⁻³ | 3.74 × 10⁻³
Average | 2.44 × 10⁻⁶ | 1.08 × 10⁻³ | 4.62 × 10⁻³

The highly consistent values from the 5-fold cross validation again confirm that the ANN model is able to generalise to unseen data.

7.4 Rough Heston Implied Volatility ANN Results

Figure 7.8 shows the learning curves of the Rough Heston Implied Volatility ANN. We can see that the validation loss curve experiences some fluctuations during training; again, we select the least fluctuating setup. Despite the fluctuations, the overall value of the validation loss declines to a satisfactorily low level, and the discrepancy between validation loss and training loss is negligible. Initially the number of training epochs was set to 200, but training stopped early at around 85 epochs, since the validation loss had not improved for 25 epochs. It achieves good accuracy nonetheless.


FIGURE 7.8: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN

Table 7.7 summarises the error metrics of the Rough Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 5.66 × 10⁻⁴, 1.08 × 10⁻², 3.49 × 10⁻¹ and 0.999 respectively. These metrics show that the ANN model can produce accurate predictions on unseen data. This is again confirmed by Figure 7.9, which shows the ANN predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.7: Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data

MSE         | MAE         | Max Abs Error | R²
5.66 × 10⁻⁴ | 1.08 × 10⁻² | 3.49 × 10⁻¹   | 0.999

FIGURE 7.9: Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.10: Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.10 shows the MAE values of the Rough Heston Implied Volatility ANN predictions across the entire implied volatility surface. The largest MAE value is below 0.0030, and a large area of the surface has MAE values lower than 0.0015. It is worth noting that the accuracy is relatively poor when the strike price and maturity values are small; however, the ANN gives good predictions overall for the whole surface. In fact, it is more accurate than the Heston Implied Volatility ANN model, despite having one extra input and a similar architecture. We attribute this entirely to the stochastic nature of the optimiser.

The 5-fold cross validation results for the Rough Heston Implied Volatility ANN model, summarised in Table 7.8, show consistent values, just like the previous three cases. This consistency indicates the ANN model is well-trained and is able to generalise to unseen test data.

TABLE 7.8: 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN

        | MSE         | MAE         | Max Abs Error
Fold 1  | 8.15 × 10⁻⁴ | 1.13 × 10⁻² | 3.88 × 10⁻¹
Fold 2  | 4.74 × 10⁻⁴ | 1.08 × 10⁻² | 3.13 × 10⁻¹
Fold 3  | 1.37 × 10⁻³ | 1.74 × 10⁻² | 4.46 × 10⁻¹
Fold 4  | 3.38 × 10⁻⁴ | 8.61 × 10⁻³ | 2.19 × 10⁻¹
Fold 5  | 4.60 × 10⁻⁴ | 1.02 × 10⁻² | 2.67 × 10⁻¹
Average | 6.92 × 10⁻⁴ | 1.12 × 10⁻² | 3.27 × 10⁻¹


7.5 Run-time Performance

In this section we present the training time and the execution speed of the ANN versus the traditional methods. The speed tests are run on a machine with an Intel i5-7200U chip (up to 3.10 GHz) and 8GB of RAM.

Table 7.9 summarises the training time required for each application. Although the training time per epoch for the option pricing applications is higher than that for the implied volatility applications, the resulting total training time is similar for both types of applications, because the number of epochs for training the implied volatility ANN models is higher.

We also emphasise that the number of distinct rows of data available for implied volatility ANN training is smaller. Thus the training time is similar to that of option pricing ANN training despite the more complex ANN architecture.

TABLE 7.9: Training Time

Applications                 Training Time per Epoch   Total Training Time
Heston Option Pricing        30 s                      900 s
Heston Implied Volatility    5 s                       1000 s
rHeston Option Pricing       30 s                      900 s
rHeston Implied Volatility   5 s                       1000 s

TABLE 7.10: Execution Time for Predictions using ANN and Traditional Methods

Applications                 Traditional Methods   ANN
Heston Option Pricing        8.84 ms               57.9 ms
Heston Implied Volatility    2.31 ms               43.7 ms
rHeston Option Pricing       6.52 ms               52.6 ms
rHeston Implied Volatility   2.87 ms               48.3 ms

Table 7.10 summarises the execution time for generating predictions using the ANN and the traditional methods. Before the experiment we expected the ANN to generate predictions faster. However, the results show that the ANN is actually up to 7 times slower than the traditional methods for option pricing (both Heston and rough Heston), and up to 18 times slower than the traditional methods for implied volatility surface approximation (both Heston and rough Heston). This shows the traditional approximation methods are more efficient.
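Execution times of this kind can be measured with a simple timing harness. The sketch below is illustrative only: ann_model stands for the trained Keras model and traditional_pricer for a hypothetical wrapper around the traditional method, both applied to the same test inputs:

    # Minimal timing sketch for the comparison above; ann_model,
    # traditional_pricer, x_test and params are hypothetical names.
    import time

    def time_it(fn, n_runs=100):
        t0 = time.perf_counter()
        for _ in range(n_runs):
            fn()
        return (time.perf_counter() - t0) / n_runs  # average seconds per call

    ann_ms = 1000 * time_it(lambda: ann_model.predict(x_test))
    trad_ms = 1000 * time_it(lambda: traditional_pricer(params))
    print(f"ANN: {ann_ms:.2f} ms, traditional: {trad_ms:.2f} ms")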


Chapter 8

Conclusions and Outlook

To summarise, the results show that the ANN is able to approximate option prices and implied volatilities under the Heston and rough Heston models to a high degree of accuracy. However, the efficiency of the ANN is not as good as that of the traditional methods, and hence we conclude that such an approach may be considered an ineffective alternative. There are more efficient methods, such as the numerical integration of the FFT for option price computation. The traditional method we used for approximating implied volatility does not even require any numerical scheme, hence its efficiency.

As for future projects, it would be interesting to investigate applications of the rough Bergomi model with exotic options such as lookback and digital options. Moreover, it would also be worthwhile to apply gradient-free optimisers for ANN training; some interesting examples include the differential evolution, dual annealing and simplicial homology global optimisers.
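All three optimisers mentioned are available in scipy.optimize (differential_evolution, dual_annealing and shgo). As an illustration of the interface only, the sketch below minimises SciPy's built-in Rosenbrock test function rather than an actual ANN training loss:

    # Sketch of a gradient-free optimiser of the kind suggested above;
    # rosen is a stand-in objective, not the ANN loss function.
    from scipy.optimize import differential_evolution, rosen

    bounds = [(-5, 5)] * 4          # box constraints on each parameter
    result = differential_evolution(rosen, bounds, seed=0)
    print(result.x, result.fun)     # best parameters and objective value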


Appendix A

pybind11: A step-by-step tutorial with examples

This tutorial assumes basic knowledge of C++ and Python.

Sources: Create a C++ extension for Python; Using the C++ eigen library to calculate matrix inverse and determinant; pybind11 Documentation; Eigen: The Matrix Class.

A.1 Software Prerequisites

• Visual Studio 2017 or later, with both the Desktop Development with C++ and Python Development workloads installed with default options.

• In the Python Development workload, also select the box on the right for Python native development tools. This option sets up most of the configuration required. (This option also includes the C++ workload automatically.)

Source: Create a C++ extension for Python


A.2 Create a Python Project

1. To create a Python project in Visual Studio, select File > New > Project. Search for "Python", select the Python Application template, give it a suitable name and location, and select OK.

2. pybind11 requires that you use a 32-bit Python interpreter (Python 3.6 or above recommended). In the Solution Explorer window of Visual Studio, expand the project node, then expand the Python Environments node. If you don't see a 32-bit environment as the default (either in bold or labeled with "global default"), then right-click the Python Environments node and select Add Environment, or select Add Environment from the environment drop-down in the Python toolbar.

Once in the Add Environment dialog box, select the Existing environment tab, then select the 32-bit interpreter from the Environment drop-down list.

If you don't have a 32-bit interpreter installed, you can install standard Python interpreters from the Add Environment dialog: select the Add Environment command in the Python Environments window or the Python toolbar, select the Python installation tab, indicate which interpreters to install, and select Install.


If you already added an environment other than the global default to a project, you may need to activate a newly added environment: right-click that environment under the Python Environments node and select Activate Environment. To remove an environment from the project, select Remove.


A.3 Create a C++ Project

1. A Visual Studio solution can contain both Python and C++ projects together. To do so, right-click the solution in Solution Explorer and select Add > New Project.

2. Search on "C++", select Empty project, specify the name "test", and select OK.

3. Create a C++ file in the new project by right-clicking the Source Files node, then select Add > New Item, select C++ File, name it main.cpp, and select OK.

4. Right-click the C++ project in Solution Explorer and select Properties.

5. At the top of the Property Pages dialog that appears, set Configuration to All Configurations and Platform to Win32.


6. Set the specific properties as described in the following table, then select OK.

7. Right-click the C++ project and select Build to test your configurations (both Debug and Release). The .pyd files are located in the solution folder under Debug and Release, not the C++ project folder itself.

8. Add the following code to the C++ project's main.cpp file:

#include <iostream>

int main() {
    std::cout << "Hello World from C++" << std::endl;
    return 0;
}

9. Build the C++ project again to confirm that the code works.

A.4 Convert the C++ project to extension for Python

To make the C++ DLL into an extension for Python, first modify the exported methods to interact with Python types. Then add a function that exports the module (here, the main function).

1. Install pybind11 using pip:

    pip install pybind11

or

    py -m pip install pybind11

2. At the top of main.cpp, include pybind11.h:

    #include <pybind11/pybind11.h>


3. At the bottom of main.cpp, use the PYBIND11_MODULE macro to define the entry point to the C++ function:

namespace py = pybind11;

PYBIND11_MODULE(test, m) {
    m.def("main_func", &main, R"pbdoc(
        pybind11 simple example
    )pbdoc");

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

4. Set the target configuration to Release and build the C++ project to verify your code.

The C++ module may fail to compile for the following reasons:

• Unable to locate Python.h (E1696: cannot open source file "Python.h" and/or C1083: Cannot open include file: "Python.h": No such file or directory): verify that the path in C/C++ > General > Additional Include Directories in the project properties points to your Python installation's include folder. See the table in Step 6.

• Unable to locate Python libraries: verify that the path in Linker > General > Additional Library Directories in the project properties points to the Python installation's libs folder. See the table in Step 6.

• Linker errors related to target architecture: change the C++ target's project architecture to match that of your Python installation. For example, if you're targeting x64 with the C++ project but your Python installation is x86, change the C++ project to target x86.

A.5 Make the DLL available to Python

1. In Solution Explorer, right-click the References node in the Python project, and then select Add Reference. In the dialog that appears, select the Projects tab, select the test project, and then select OK.


2. Run the Visual Studio installer, select Modify, then select Individual Components > Compilers, build tools, and runtimes > Visual C++ 2015.3 v140 toolset. This step is necessary because Python (for Windows) is itself built with Visual Studio 2015 (version 14.0) and expects those tools to be available when building an extension through the method described here. (Note that you may need to install a 32-bit version of Python and target the DLL to Win32, not x64.)


3. Create a file named setup.py in the C++ project by right-clicking the project and selecting Add > New Item. Then select C++ File (.cpp), name the file setup.py, and select OK (naming the file with the .py extension makes Visual Studio recognize it as Python despite using the C++ file template).

Then paste the following code into the setup.py file:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'pybind11_test',
    sources=['main.cpp'],
    # add pybind11 folder and Python include folder
    include_dirs=[r'pybind11/include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='pybind11_test',
    version='1.0',
    description='Simple pybind11 example',
    license='MIT',
    ext_modules=[sfc_module],
)

4. The setup.py code instructs Python to build the extension using the Visual Studio 2015 C++ toolset when used from the command line.

Open an elevated command prompt, navigate to the folder that contains setup.py, and enter the following command:

    python setup.py install

A.6 Call the DLL from Python

After the DLL is made available to Python, we can now call test.main_func() from Python code.

1. Add the following lines in your .py file to call methods exported from the DLL:


import test

if __name__ == "__main__":
    test.main_func()

2. Run the Python program (Debug > Start without Debugging). The output should look like the following:

A.7 Example: Store C++ Eigen Matrix as NumPy Arrays

This example requires the use of Eigen, a C++ template library. Due to its popularity, pybind11 provides transparent conversion and limited mapping support between Eigen and scientific Python linear algebra data types.

1. If you have not already, visit http://eigen.tuxfamily.org/index.php?title=Main_Page and click on "zip" to download the zip file.

Once the download is complete, decompress the zip file in a suitable location. Note down the file path containing the folders as shown below.


2. In the Solution Explorer, right-click the C++ project and select Properties. Navigate to the C/C++ > General tab, add the Eigen path to the Additional Include Directories property, then select OK.

3. Right-click the C++ project and select Build to test your configurations (both Debug and Release).

4. Replace the code in the main.cpp file with the following code:

#include <iostream>

// pybind11
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>

// Eigen
#include <Eigen/Dense>

namespace py = pybind11;

Eigen::Matrix3f example_mat(const int a, const int b, const int c) {
    Eigen::Matrix3f mat;
    mat << 5 * a, 1 * b, 2 * c,
           2 * a, 4 * b, 2 * c,
           1 * a, 3 * b, 3 * c;
    return mat;
}

Eigen::MatrixXd inv(Eigen::MatrixXd xs) {
    return xs.inverse();
}

double det(Eigen::MatrixXd xs) {
    return xs.determinant();
}

PYBIND11_MODULE(test, m) {
    m.def("example_mat", &example_mat);
    m.def("inv", &inv);
    m.def("det", &det);

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

Note that Eigen matrices are automatically converted to NumPy arrays simply by including the pybind11/eigen.h header (see the includes at the top of the code in step 4). A C++ function with an Eigen return type can be used directly as a NumPy data type in Python. Build the C++ project to check that the code works correctly.
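As a quick check of this automatic conversion, the returned matrix can be inspected from Python (assuming the test module from this section has been built and installed):

    # Quick check of the automatic Eigen -> NumPy conversion described above.
    import test
    A = test.example_mat(1, 3, 1)
    print(type(A))   # <class 'numpy.ndarray'>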

5. Include the Eigen library's directory: replace the code in setup.py with the following code:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'pybind11_test',
    sources=['main.cpp'],
    # add pybind11 folder, Python include folder and Eigen folder
    include_dirs=[r'pybind11/include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                  r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='pybind11_test',
    version='1.0',
    description='Simple pybind11 example',
    license='MIT',
    ext_modules=[sfc_module],
)

As before, in an elevated command prompt, navigate to the folder that contains setup.py and enter the following command:

    python setup.py install

6. Then paste the following code in the Python file:

import test
import numpy as np

if __name__ == "__main__":
    A = test.example_mat(1, 3, 1)
    print("Matrix A is\n", A)
    print("The inverse of A is\n", test.inv(A))
    print("The determinant of A is", test.det(A))


The output should be:

A.8 Example: Store Data Generated by Analytical Heston Model as NumPy Arrays

Only the relevant parts of the code are displayed here. To request the code for the entire project, contact the author at cko22@bath.edu.

This example demonstrates how to use pybind11 for:

• a C++ project with multiple files;

• storing a C++ Eigen matrix as NumPy arrays.

1. For a multi-file C++ project, we need to include all the source files in setup.py. This example also uses the Eigen library, so its directory will also be included. Paste the following code in setup.py:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'Data_Generation',
    sources=[r'module.cpp', r'Data.cpp', r'Heston.cpp', r'IntegrationScheme.cpp',
             r'IntegratorImp.cpp', r'NumIntegrator.cpp', r'pdetfunction.cpp',
             r'pfunction.cpp', r'Range.cpp'],
    include_dirs=[r'pybind11/include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                  r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='Data_Generation',
    version='1.0',
    description='Python package with Data_Generation C++ extension (PyBind11)',
    license='MIT',
    ext_modules=[sfc_module],
)

Then, in a command prompt, go to the folder that contains setup.py and enter the following command:

    python setup.py install

2. The code in module.cpp is:

// For the analytic solution in the case of the Heston model
#include "Heston.hpp"
#include <ctime>
#include "Data.hpp"
#include <future>

// Inputting and outputting
#include <iostream>
#include <vector>

// pybind11
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>
#include <pybind11/stl_bind.h>

// Eigen (its include directory is configured in setup.py)
#include <Eigen/Dense>

namespace py = pybind11;
using namespace Eigen;

struct CPPData {
    // Input data
    std::vector<double> spot_price;
    std::vector<double> strike_price;
    std::vector<double> risk_free_rate;
    std::vector<double> dividend_yield;
    std::vector<double> initial_vol;
    std::vector<double> maturity_time;
    std::vector<double> long_term_vol;
    std::vector<double> mean_reversion_rate;
    std::vector<double> vol_vol;
    std::vector<double> price_vol_corr;

    // Output data
    std::vector<double> option_price;
};

// ... do something ...

Eigen::MatrixXd module() {
    CPPData cppdata;
    simulation(cppdata);

    // Convert each std::vector to an Eigen vector first
    // Input data
    Map<Eigen::VectorXd> eigen_spot_price(cppdata.spot_price.data(), cppdata.spot_price.size());
    Map<Eigen::VectorXd> eigen_strike_price(cppdata.strike_price.data(), cppdata.strike_price.size());
    Map<Eigen::VectorXd> eigen_risk_free_rate(cppdata.risk_free_rate.data(), cppdata.risk_free_rate.size());
    Map<Eigen::VectorXd> eigen_dividend_yield(cppdata.dividend_yield.data(), cppdata.dividend_yield.size());
    Map<Eigen::VectorXd> eigen_initial_vol(cppdata.initial_vol.data(), cppdata.initial_vol.size());
    Map<Eigen::VectorXd> eigen_maturity_time(cppdata.maturity_time.data(), cppdata.maturity_time.size());
    Map<Eigen::VectorXd> eigen_long_term_vol(cppdata.long_term_vol.data(), cppdata.long_term_vol.size());
    Map<Eigen::VectorXd> eigen_mean_reversion_rate(cppdata.mean_reversion_rate.data(), cppdata.mean_reversion_rate.size());
    Map<Eigen::VectorXd> eigen_vol_vol(cppdata.vol_vol.data(), cppdata.vol_vol.size());
    Map<Eigen::VectorXd> eigen_price_vol_corr(cppdata.price_vol_corr.data(), cppdata.price_vol_corr.size());
    // Output data
    Map<Eigen::VectorXd> eigen_option_price(cppdata.option_price.data(), cppdata.option_price.size());

    const int cols = 11;       // No. of columns
    const int rows = SIZE / 4;
    // Define a matrix that stores the training data to be used in Python
    MatrixXd training(rows, cols);
    // Initialise each column using the vectors
    training.col(0) = eigen_spot_price;
    training.col(1) = eigen_strike_price;
    training.col(2) = eigen_risk_free_rate;
    training.col(3) = eigen_dividend_yield;
    training.col(4) = eigen_initial_vol;
    training.col(5) = eigen_maturity_time;
    training.col(6) = eigen_long_term_vol;
    training.col(7) = eigen_mean_reversion_rate;
    training.col(8) = eigen_vol_vol;
    training.col(9) = eigen_price_vol_corr;
    training.col(10) = eigen_option_price;

    std::cout << training << std::endl;

    return training;
}

PYBIND11_MODULE(Data_Generation, m) {
    m.def("module", &module);

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

Note that the module function's return type is an Eigen matrix. This can then be used directly as a NumPy data type in Python.

3. The C++ function can then be called from Python as shown:

import Data_Generation as dg
import mysql.connector
import pandas as pd
import sqlalchemy as db
import time

# This program calls the analytical Heston option pricer in C++ to simulate
# 5,000,000 option prices and store them in a MySQL database.

# ... do something ...

if __name__ == "__main__":
    t0 = time.time()

    # A new table (optional)
    SQL_clean_table()

    target_size = 5000000  # Final table size
    batch_size = 500000    # Size generated each iteration

    # Get the number of rows already in the table
    r = SQL_setup(batch_size)

    # Get the number of iterations
    iter = int(target_size / batch_size) - int(r / batch_size)
    print("Iteration(s) required:", iter)
    count = 0

    print("Now Run C++ code...")
    for i in range(iter):
        count = count + 1
        print("----------------------------------------------------")
        print("Current iteration:", count)
        cpp = dg.module()  # store data from C++

        # ... do something ...

    r1 = SQL_setup(batch_size)
    print("No. of rows in SQL Classic Heston Table Now:", r1)
    print("\n")
    t1 = time.time()
    # Calculate the total time for the entire program
    total = t1 - t0
    print("THE ENTIRE PROGRAM TAKES", round(total), "s TO COMPLETE")


Appendix B

Asymptotic Expansions for General Stochastic Volatility Models

B.1 Asymptotic Expansions

We present the asymptotic expansions for general stochastic volatility models by (Lorig et al., 2019). Consider a strictly positive spot price S whose risk-neutral dynamics are given by

S = e^X,   (B.1)

dX = −(1/2) σ²(t, X, Y) dt + σ(t, X, Y) dW₁,   X(0) = x ∈ ℝ,   (B.2)

dY = f(t, X, Y) dt + β(t, X, Y) dW₂,   Y(0) = y ∈ ℝ,   (B.3)

⟨dW₁, dW₂⟩ = ρ(t, X, Y) dt,   |ρ| < 1.   (B.4)

The drift of X is selected to be −(1/2)σ² so that S = e^X is a martingale.

Let C(t) be the time-t value of a European call option maturing at time T > t with payoff function P(X(T)). To value a European-style option using risk-neutral pricing, we must compute functions of the form

u(t, x, y) = E[P(X(T)) | X(t) = x, Y(t) = y].

It is well known that the function u satisfies the Kolmogorov backward equation (Lorig et al., 2019)

(∂ₜ + A(t)) u(t, x, y) = 0,   u(T, x, y) = P(x),   (B.5)

where the operator A(t) is given explicitly by

A(t) = a(t, x, y)(∂x² − ∂x) + f(t, x, y) ∂y + b(t, x, y) ∂y² + c(t, x, y) ∂x∂y,   (B.6)

and where the functions a, b and c are defined as

a(t, x, y) = (1/2) σ²(t, x, y),

b(t, x, y) = (1/2) β²(t, x, y),

c(t, x, y) = ρ(t, x, y) σ(t, x, y) β(t, x, y).


Appendix C

Deep Neural Networks Library in Python: Keras

We show the code snippet for computing the ANN in Keras: the Python code for defining the ANN architecture for rough Heston implied volatility approximation.

# Define the neural network architecture
import keras
from keras import backend as K
keras.backend.set_floatx('float64')
from keras.callbacks import EarlyStopping

# Construct the model (n = number of input parameters)
input_layer = keras.layers.Input(shape=(n,))

hidden_layer1 = keras.layers.Dense(80, activation='elu')(input_layer)
hidden_layer2 = keras.layers.Dense(80, activation='elu')(hidden_layer1)
hidden_layer3 = keras.layers.Dense(80, activation='elu')(hidden_layer2)

output_layer = keras.layers.Dense(100, activation='linear')(hidden_layer3)

model = keras.models.Model(inputs=input_layer, outputs=output_layer)

# Summarise layers
model.summary()

# User-defined metrics: R2 and Max Abs Error
# R2
def coeff_determination(y_true, y_pred):
    SS_res = K.sum(K.square(y_true - y_pred))
    SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return 1 - SS_res / (SS_tot + K.epsilon())

# Max Error
def max_error(y_true, y_pred):
    return K.max(K.abs(y_true - y_pred))

earlystop = EarlyStopping(monitor='val_loss',
                          min_delta=0,
                          mode='min',
                          verbose=1,
                          patience=25)

# Prepare the model for training (note: pass the Adam optimiser object
# defined here so that the chosen learning rate is actually used)
opt = keras.optimizers.Adam(learning_rate=0.005)
model.compile(loss='mse', optimizer=opt,
              metrics=['mae', coeff_determination, max_error])
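For completeness, a minimal sketch of the corresponding training call using the model and earlystop callback defined above; x_train, y_train, x_val, y_val and the batch size are placeholders for the pre-processed data described earlier:

# Sketch of the training call; data arrays and batch size are placeholders.
history = model.fit(x_train, y_train,
                    batch_size=1024,                # placeholder batch size
                    epochs=200,                     # upper bound; early stopping may end sooner
                    validation_data=(x_val, y_val),
                    callbacks=[earlystop])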


Bibliography

Alòs, Elisa et al. (June 2017). "Exponentiation of Conditional Expectations Under Stochastic Volatility". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2983180

Andersen, Leif et al. (Jan 2007). "Moment Explosions in Stochastic Volatility Models". In: Finance and Stochastics 11, pp. 29-50. DOI: 10.1007/s00780-006-0011-7

Atkinson, Colin et al. (Jan 2011). "Rational Solutions for the Time-Fractional Diffusion Equation". URL: https://www.researchgate.net/publication/220222966_Rational_Solutions_for_the_Time-Fractional_Diffusion_Equation

Bayer, Christian et al. (Oct 2018). "Deep calibration of rough stochastic volatility models". URL: https://arxiv.org/abs/1810.03399

Black, Fischer et al. (May 1973). "The Pricing of Options and Corporate Liabilities". URL: https://www.cs.princeton.edu/courses/archive/fall09/cos323/papers/black_scholes73.pdf

Carr, Peter and Dilip B. Madan (Mar 1999). "Option valuation using the fast Fourier transform". In: Journal of Computational Finance. URL: https://www.researchgate.net/publication/2519144_Option_Valuation_Using_the_Fast_Fourier_Transform

Chapter 2 Modeling Process: k-fold cross validation. https://bradleyboehmke.github.io/HOML/process.html. Accessed: 2020-08-22

Comte, Fabienne et al. (Oct 1998). "Long Memory in Continuous-Time Stochastic Volatility Models". In: Mathematical Finance 8, pp. 291-323. URL: https://zulfahmed.files.wordpress.com/2015/10/comterenault19981.pdf

Cox, John C. et al. (Jan 1985). "A Theory of the Term Structure of Interest Rates". URL: http://pages.stern.nyu.edu/~dbackus/BCZ/discrete_time/CIR_Econometrica_85.pdf

Create a C++ extension for Python. https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019. Accessed: 2020-07-21

Duffy, Daniel J. (Jan 2018). Financial Instrument Pricing Using C++, Second Edition. John Wiley & Sons. ISBN: 978-0-470-97119-2

Duffy, Daniel J. et al. (2012). Monte Carlo Frameworks: Building customisable high-performance C++ applications. John Wiley & Sons, Inc. ISBN: 9780470060698

Dupire, Bruno (1994). "Pricing with a Smile". In: Risk Magazine, pp. 18-20

Eigen: The Matrix Class. https://eigen.tuxfamily.org/dox/group__TutorialMatrixClass.html. Accessed: 2020-07-22

Eldan, Ronen et al. (May 2016). "The Power of Depth for Feedforward Neural Networks". URL: https://arxiv.org/abs/1512.03965

Euch, Omar El et al. (Sept 2016). "The characteristic function of rough Heston models". URL: https://arxiv.org/pdf/1609.02108.pdf

— (Mar 2019). "Roughening Heston". URL: https://ssrn.com/abstract=3116887

Fouque, Jean-Pierre et al. (Aug 2012). "Second Order Multiscale Stochastic Volatility Asymptotics: Stochastic Terminal Layer Analysis & Calibration". URL: https://arxiv.org/abs/1208.5802

Gatheral, Jim (2006). The volatility surface: a practitioner's guide. John Wiley & Sons, Inc. ISBN: 978-0-471-79251-2

Gatheral, Jim et al. (Oct 2014). "Volatility is rough". URL: https://arxiv.org/abs/1410.3394

— (Jan 2019). "Rational approximation of the rough Heston solution". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3191578

Hebb, Donald O. (1949). The Organization of Behavior: A Neuropsychological Theory. Psychology Press, 1st edition (12 Jun. 2002). ISBN: 978-0805843002

Heston, Steven (Jan 1993). "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options". URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.3204&rep=rep1&type=pdf

Hinton, Geoffrey et al. (Feb 2015). "Overview of mini-batch gradient descent". URL: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

Hornik et al. (Mar 1989). "Multilayer feedforward networks are universal approximators". URL: https://deeplearning.cs.cmu.edu/F20/document/readings/Hornik_Stinchcombe_White.pdf

— (Jan 1990). "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks". URL: http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/univ_approx.pdf

Horvath, Blanka et al. (Jan 2019). "Deep Learning Volatility". URL: https://ssrn.com/abstract=3322085

Hurst, Harold E. (Jan 1951). "Long-term storage of reservoirs: an experimental study". URL: https://www.jstor.org/stable/2982267

Hutchinson, James M. et al. (July 1994). "A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks". URL: https://alo.mit.edu/wp-content/uploads/2015/06/A-Nonparametric-Approach-to-Pricing-and-Hedging-Derivative-Securities-via-Learning-Networks.pdf

Ioffe, Sergey et al. (Feb 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". URL: https://arxiv.org/abs/1502.03167

Itkin, Andrey (Jan 2010). "Pricing options with VG model using FFT". URL: https://arxiv.org/abs/physics/0503137

Kahl, Christian et al. (Jan 2006). "Not-so-complex Logarithms in the Heston model". URL: http://www2.math.uni-wuppertal.de/~kahl/publications/NotSoComplexLogarithmsInTheHestonModel.pdf

Kingma, Diederik P. et al. (Feb 2015). "Adam: A Method for Stochastic Optimization". URL: https://arxiv.org/abs/1412.6980

Kwok, Yue-Kuen et al. (2012). Handbook of Computational Finance, Chapter 21. Springer-Verlag Berlin Heidelberg. ISBN: 978-3-642-17254-0

Lewis, Alan (Sept 2001). "A Simple Option Formula for General Jump-Diffusion and Other Exponential Lévy Processes". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=282110

Liu, Shuaiqiang et al. (Apr 2019a). "A neural network-based framework for financial model calibration". URL: https://arxiv.org/abs/1904.10523

— (Apr 2019b). "Pricing options and computing implied volatilities using neural networks". URL: https://arxiv.org/abs/1901.08943

Lorig, Matthew et al. (Jan 2019). "Explicit implied vols for multifactor local-stochastic vol models". URL: https://arxiv.org/abs/1306.5447v3

Mandara, Dalvir (Sept 2019). "Artificial Neural Networks for Black-Scholes Option Pricing and Prediction of Implied Volatility for the SABR Stochastic Volatility Model". URL: https://www.datasim.nl/application/files/8115/7045/4929/1423101.pdf

McCulloch, Warren et al. (May 1943). "A Logical Calculus of the Ideas Immanent in Nervous Activity". URL: http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf

Mostafa, F. et al. (May 2008). "A neural network approach to option pricing". URL: https://www.witpress.com/elibrary/wit-transactions-on-information-and-communication-technologies/41/18904

Peters, Edgar E. (Jan 1991). Chaos and Order in the Capital Markets: A New View of Cycles, Prices, and Market Volatility. John Wiley & Sons. ISBN: 978-0-471-13938-6

— (Jan 1994). Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons. ISBN: 978-0-471-58524-4. URL: https://www.amazon.co.uk/Fractal-Market-Analysis-Investment-Economics/dp/0471585246

pybind11 Documentation. https://pybind11.readthedocs.io/en/stable/advanced/cast/eigen.html. Accessed: 2020-07-22

Ruf, Johannes et al. (May 2020). "Neural networks for option pricing and hedging: a literature review". URL: https://arxiv.org/abs/1911.05620

Schmelzle, Martin (Apr 2010). "Option Pricing Formulae using Fourier Transform: Theory and Application". URL: https://pfadintegral.com/docs/Schmelzle2010%20Fourier%20Pricing.pdf

Shevchenko, Georgiy (Jan 2015). "Fractional Brownian motion in a nutshell". In: International Journal of Modern Physics Conference Series 36. URL: https://www.worldscientific.com/doi/abs/10.1142/S2010194515600022

Stone, Henry (July 2019). "Calibrating Rough Volatility Models: A Convolutional Neural Network Approach". URL: https://arxiv.org/abs/1812.05315

Using the C++ eigen library to calculate matrix inverse and determinant. http://people.duke.edu/~ccc14/sta-663-2016/18G_C++_Python_pybind11.html#Using-the-C++-eigen-library-to-calculate-matrix-inverse-and-determinant. Accessed: 2020-07-22

Wilmott, Paul (2006). Paul Wilmott on Quantitative Finance, Vol. 3. John Wiley & Sons, Inc., 2nd Edition. ISBN: 978-0-470-01870-5


xi

A7 Example Store C++ Eigen Matrix as NumPy Arrays 71A8 Example Store Data Generated by Analytical Heston Model

as NumPy Arrays 75

B Asymptotic Expansions for General Stochastic Volatility Models 81B1 Asymptotic Expansions 81

C Deep Neural Networks Library in Python Keras 83

Bibliography 85

xiii

List of Figures

1 Implementation Flow Chart 22 Data flow diagram for ANN on Rough Heston Pricer 3

11 Problem Solving Process 6

31 Paths of fBm for different values of H Adapted from (Shevchenko2015) 18

32 Implied Volatility Smiles of SPX as of 14th August 2013 withmaturity T in years Red and blue dots represent bid and askSPX implied volatilities green plots are from the calibratedrough Heston model dashed orange lines are from the clas-sical Heston model calibrated to these smiles Adapted from(Euch et al 2019) 24

41 A Simple Neural Network 2542 The pixels of implied volatility surface are represented by the

outputs of neural network 33

51 shows how the data is partitioned for 5-fold cross validationAdapted from (Chapter 2 Modeling Process k-fold cross validation) 40

61 Typical ANN Architecture for Option Pricing Application 4262 Typical ANN Architecture for Implied Volatility Surface Ap-

proximation 43

71 Plot of MSE Loss Function vs No of Epochs for Heston OptionPricing ANN 49

72 Plot of Predicted vs Actual values for Heston Option PricingANN 50

73 Plot of MSE Loss Function vs No of Epochs for Heston Im-plied Volatility ANN 51

74 Heston Implied Volatility Smiles ANN Predictions vs ActualValues 52

75 Mean Absolute Error of Full Heston Implied Volatility SurfaceANN Predictions on Unseen Data 53

76 Plot of MSE Loss Function vs No of Epochs for Rough HestonOption Pricing ANN 54

77 Plot of Predicted vs Actual values for Rough Heston PricingANN 55

78 Plot of MSE Loss Function vs No of Epochs for Rough HestonImplied Volatility ANN 56

xiv

79 Rough Heston Implied Volatility Smiles ANN Predictions vsActual Values 57

710 Mean Absolute Error of Full Rough Heston Implied VolatilitySurface ANN Predictions on Unseen Data 58

xv

List of Tables

51 Heston Model Call Option Data 3652 Heston Model Implied Volatility Data 3753 Rough Heston Model Call Option Data 3854 Heston Model Implied Volatility Data 38

61 Summary of Neural Networks Architecture for Each Application 48

71 Error Metrics of Heston Option Pricing ANN Predictions onUnseen Data 50

72 5-fold Cross Validation Results of Heston Option Pricing ANN 5073 Accuracy Metrics of Heston Implied Volatility ANN Predic-

tions on Unseen Data 5174 5-fold Cross Validation Results of Heston Implied Volatility

ANN 5375 Accuracy Metrics of Rough Heston Option Pricing ANN Pre-

dictions on Unseen Data 5476 5-fold Cross Validation Results of Rough Heston Option Pric-

ing ANN 5577 Accuracy Metrics of Rough Heston Implied Volatility ANN

Predictions on Unseen Data 5678 5-fold Cross Validation Results of Rough Heston Implied Volatil-

ity ANN 5879 Training Time 59710 Execution Time for Predictions using ANN and Traditional

Methods 59

xvii

List of Abbreviations

NN Neural NetworksANN Artificial Neural NetworksDNN Deep Neural NetworksCNN Convolutional Neural NetworksSGD Stochastic Gradient DescentSV Stochastic VolatilityCIR Cox Ingersoll and RossfBm fractional Brownian motionFSV Fractional Stochastic VolatilityRFSV Rough Fractional Stochastic VolatilityFFT Fast Fourier TransformELU Exponential Linear UnitReLU Rectified Linear Unit

1

Chapter 0

Projectrsquos Overview

This chapter intends to give a brief overview of the methodology of theproject without delving too much into the details

01 Projectrsquos Overview

It was advised by the supervisor to divide the problem into smaller achiev-able steps like (Mandara 2019) did The 4 main steps are

bull Define the objective

bull Formulate the procedures

bull Implement the procedures

bull Evaluate the results

The objective is to investigate the performance of artificial neural net-works (ANN) on vanilla option pricing and implied volatility approximationunder the rough Heston model The performance metrics in this context areaccuracy and run-time performance

Figure 1 shows the flow chart of implementation procedures The proce-dure starts with synthetic data generation Then we split the synthetic dataset into 3 portions - one for training (in-sample) one for validation and thelast one for testing (out-of-sample) Before the data is fed into the ANN fortraining it requires to be scaled or normalised as it has different magnitudeand range In this thesis we use the training method proposed by (Horvathet al 2019) the image-based implicit training approach for implied volatilitysurface approximation Then we test the trained ANN model with unseendata set to evaluate its accuracy

The run-time performance of ANN will be compared with that from thetraditional approximation methods This project requires the use of two pro-gramming languages C++ and Python The data generation process is donein C++ The wide range of machine learning libraries (Keras TensorFlowPyTorch) in Python makes it the logical choice for the neural networks com-putations The experiment starts with the classical Heston model as a conceptvalidation first and then proceed to the rough Heston model

2 Chapter 0 Projectrsquos Overview

Option Pricing Models Heston rough Heston

Data Generation

Data Pre-processing

Strike Price Maturity ampVolatility etc

Scaled Normalised Data(Training Set)

ANN ArchitectureDesign ANN Training

ANN ResultsEvaluation

Conclusions

ANN Prediction

Trained ANN

Accuracy (Option Price amp ImpliedVol) and Runtime Performance

FIGURE 1 Implementation Flow Chart

01 Projectrsquos Overview 3

Input

RoughHestonPricer

KerasANNTrainer

ANNPredictions

AccuracyEvaluation

RoughHestonData

ANNModelWeights

modelparameters

modelparametersoptionprice

traindata

ANNModelHyperparameters

trainedmodelweights

trainedmodelweights

predictedoptionprice

testdata(optionprice)

Output

errormetrics

FIGURE 2 Data flow diagram for ANN on Rough HestonPricer

5

Chapter 1

Introduction

Option pricing has always been a major research area in financial mathemat-ics Various methods and techniques have been developed to price optionsmore accurately and more quickly since the introduction of the well-knownBlack-Scholes model in 1973 (Black et al 1973) The Black-Scholes model hasan analytical solution and has only one parameter volatility required to becalibrated This makes it very easy to implement Its simplicity comes at theexpense of many unrealistic assumptions One of its biggest flaws is the as-sumption of constant volatility term This assumption simply does not holdin the real market With this assumption the Black-Scholes model fails tocapture the volatility dynamics exhibited by the market and so is unable tovalue option accurately

In 1993 (Heston 1993) developed a stochastic volatility model in an at-tempt to better capture the volatility dynamics The Heston model assumesthe volatility of an underlying asset to follow a geometric Brownian motion(Hurst parameter H = 12) Similarly the Heston model also has a closed-form analytical solution and this makes it a strong alternative to the Black-Scholes model

Whilst the Heston model is more realistic than the Black-Scholes modelthe generated option prices do not match the observed vanilla option pricesconsistently (Gatheral et al 2014) Recent research shows that more accurateoption prices can be estimated if the volatility process is modelled as a frac-tional Brownian motion with H lt 12 When H lt 12 the volatility processpath appears to be rougher than when H ge 12 and hence the volatility isdescribe as rough

(Euch et al 2019) introduce the rough volatility version of Heston modelwhich combines the best of both worlds However the calibration of roughHeston model relies on the notoriously slow Monte Carlo method due to itsnon-Markovian nature This obstacle is the reason that rough Heston strug-gles to find its way into industrial applications Although there exist manyapproximation methods for rough Heston model to help to solve this prob-lem we are interested in the feasibility of machine learning algorithms

Recent advancements of machine learning algorithms could potentiallyprovide an alternative means for this problem Specifically the artificial neu-ral networks (ANN) offers the best hope because it is capable of learningcomplex functional relationships between input and output data and thenmaking predictions instantly

6 Chapter 1 Introduction

In this project our objective is to investigate the accuracy and run-timeperformance of ANN on European call option price predictions and impliedvolatility approximations under the rough Heston model We start off byusing the Heston model as a concept validation then proceed to its roughvolatility version For regulatory purposes we use data simulated by optionpricing models for training

11 Organisation of This Thesis

This thesis is organised as follows In Chapter 2 we review some of the re-lated work and provide justifications for our research In Chapter 3 we dis-cuss the theory of option pricing models of our interest We provide a briefintroduction of ANN and some techniques to improve their performance inChapter 4 Then we explain the tools we used for the data generation pro-cess and some useful techniques to speed up the process in Chapter 5 In thesame chapter we also discuss about the important data pre-processing proce-dures Chapter 6 shows our network architecture and experimental setupsWe present the results and discuss our findings in Chapter 7 Finally weconclude our findings and discuss about potential future work in Chapter 8

(START)Problem

RoughHestonModelLowIndustrialApplicationDespiteGivingAccurate

OptionPrices

RootCauseSlowCalibration(PricingampImpliedVol

Approximation)Process

PotentialSolutionApplyANNtoPricingandImplied

VolatilityApproximation

ImplementationANNComputationUsingKerasin

Python

EvaluationAccuracyandSpeedofANNPredictions

(END)Conclusions

ANNIsFeasibleorNot

FIGURE 11 Problem Solving Process

7

Chapter 2

Literature Review

In this chapter we intend to provide a review of the current literature andhighlight the research gap The focus of our review is the application of ANNin option pricing models We justify our methodology and our choice ofoption pricing models We would also provide some critiques

21 Application of ANN in Finance

Option pricing has always been a hot topic in academia and the financialindustry Researchers have been looking for various means to price optionmore accurately and quickly As the computational technology advancesthe application of ANN has gained popularity in solving complex problemsin many different industries In quantitative finance the idea of utilisingANN for pricing derivatives comes from its ability to make fast and accuratepredictions once it is trained

The application of ANN in option pricing begun with (Hutchinson et al1994) The authors were the pioneers in applying ANN for pricing and hedg-ing derivatives Since then there are more than 100 research papers on thistopic It is also worth noting that complexity of the ANNrsquos architecture usedin the research papers increases over the years Whilst there has been muchresearch on the applications of ANN in option pricing there is only about10 of papers focus on implied volatility (Ruf et al 2020) The feasibility ofusing ANN for implied volatility approximation is equally as important asfor option pricing since they are both part of the calibration process Recentlythere are more research papers focus on implied volatility and calibrationsuch as (Mostafa et al 2008) (Liu et al 2019a) (Liu et al 2019b) and (Hor-vath et al 2019) A key contribution of (Horvath et al 2019)rsquos work is theintroduction of imaged-based implicit training method for implied volatilitysurface approximation This method is allegedly robust and fast for approx-imating the entire implied volatility surface

A significant portion of the research papers uses real market data in theirresearch Currently there is no standardised framework for deciding the ar-chitecture and parameters of ANN Hence training the ANN directly withmarket data might pose some issues when it comes to regulatory compli-ance (Horvath et al 2019) is one of the few papers that uses simulated datafor ANN training The simulated data is generated from Heston and roughBergomi models Using simulated data removes the regulatory concerns as

8 Chapter 2 Literature Review

the functional mapping from model parameters to option price is known andtractable

The application of ANN requires splitting the entire data set for trainingvalidation and testing Most of the papers that use real market data ensurethe data splitting process is chronological to preserve the time series structureof the data This is not a concern in our case we as are using simulated data

The results in (Horvath et al 2019) shows very accurate predictions How-ever after examining the source code we discover that the ANN is validatedand tested on the same data set Thus the high accuracy results does notgive any information about how well the trained ANN generalises to unseendata In our experiment we will use different data sets for validation andtesting

Common error metrics used include mean absolute error (MAE) meanabsolute percentage error (MAPE) and mean squared error (MSE) We sug-gest a new metric maximum absolute error as it is useful in a risk manage-ment perspective for quantifying the largest prediction error in the worst casescenario

22 Option Pricing Models

In terms of the choice of option pricing models over 80 of the papers selectBlack-Scholes model or its extensions in their research (Ruf et al 2020) Asmentioned before there are other models that can give better option pricesthan Black-Scholes such as the Stochastic Volatility models (Liu et al 2019b)investigated the application of ANN in Heston and Bates models

According to (Gatheral et al 2014) modelling the volatility as roughvolatility can capture the volatility dynamics in the market better Roughvolatility models struggle to find their ways into the industry due to theirslow calibration process Hence it is natural to for the direction of ANNresearch to move towards rough volatility models

Some of the recent examples include (Stone 2019) (Bayer et al 2018)and (Horvath et al 2019) (Stone 2019) investigates the calibration of roughBergomi model using Convolutional Neural Networks (CNN) whilst (Bayeret al 2018) study the calibration of Heston and rough Bergomi models us-ing Deep Neural Networks (DNN) Contrary to these papers where theyperform direct calibration via ANN in one step (Horvath et al 2019) splitthe calibration of rough Bergomi model into two steps Firstly the ANN istrained to learn the pricing equation during a so-called offline session Thenthe trained ANN is deterministic and stored for online calibration sessionlater This approach can speed up the online calibration by a huge margin

23 The Research Gap

In our thesis we will be using the the approach in (Horvath et al 2019) Thatis to train the network to learn the pricing and implied volatility functional

23 The Research Gap 9

mappings separately We would leave the model calibration step as the fu-ture outlook of this project We choose to work with rough Heston modelanother popular rough volatility model that has not been investigated yetWe would adapt the image-based implicit training method in our researchTo re-iterate we would validate and test our ANN with different data set toobtain a true measure of ANNrsquos performance on unseen data

11

Chapter 3

Option Pricing Models

In this chapter we discuss about the option pricing models of our interestHeston and rough Heston models The idea of modelling volatility as roughfractional Brownian motion will also be presented Then we derive theirpricing and implied volatility equations and also explain their implementa-tions for data generation

The Black-Scholes model is simple but unrealistic for real world applica-tions as its assumptions simply do not hold One of its greatest criticisms isthe assumption of constant volatility of the underlying asset price (Dupire1994) extended the Black-Scholes framework to give local volatility modelsHe modelled the volatility as a deterministic function of the underlying priceand time The function is selected such that its outputs match observed Euro-pean option prices This extension is time-inhomogeneous and the dynamicsit exhibits is still unrealistic Thus it is not able to produce future volatilitysurfaces that emulate our observations

Stochastic Volatility (SV) models were introduced in which the volatilityof the underlying is modelled as a geometric Brownian motion Recent re-search shows that rough volatility can capture the true volatility dynamicsbetter and so this leads to the rough volatility models

31 The Heston Model

One of the most popular SV models is the one proposed by (Heston 1993) itspopularity is due to the fact that its pricing equation for European options isanalytically tractable The property allows calibration procedures to be doneefficiently

The Heston model for a one-dimensional spot price S follows a stochasticprocess

dS = microSdt +radic

νSdW1 (31)

And its instantaneous variance ν follows a Cox Ingersoll and Ross (CIR)process (Cox et al 1985)

dν = κ(θ minus ν)dt + ξradic

νdW2 (32)〈dW1 dW2〉 = ρdt (33)

where

12 Chapter 3 Option Pricing Models

bull micro is the (deterministic) rate of return of the spot

bull θ is the long term mean volatility

bull ξ is the volatility of volatility

bull κ is the mean reversion rate of volatility

bull ρ is the correlation between spot and volatility moves

bull W1 and W2 are Wiener processes

311 Option Pricing Under Heston Model

In this section we follow (Gatheral 2006) closely We start by forming aportfolio Π containing option being priced V number of stocks minus∆ and aquantity of minus∆1 of another asset whose value is V1

Π = V minus ∆Sminus ∆1V1

The change in portfolio during dt is

dΠ =

[partVpartt

+12

νS2 part2VpartS2 + ρξνS

part2Vpartν2partS

+12

ξ2νpart2Vpartν2

]dt

minus ∆1

[partV1

partt+

12

νS2 part2V1

partS2 + ρξνSpart2V1

partνpartS+

12

ξ2νpart2V1

partν2

]dt

+

[partVpartSminus ∆1

partV1

partSminus ∆

]dS +

[partVpartνminus ∆1

partV1

partν

]dν

Note the explicit dependence on t of S and ν has been removed for simplicityWe can make the portfolio instantaneously risk-free by eliminating dS and

dν termspartVpartSminus ∆1

partV1

partSminus ∆ = 0

AndpartVpartνminus ∆1

partV1

partν= 0

So we are left with

dΠ =

[partVpartt

+12

νS2 part2VpartS2 + ρξνS

part2VpartνpartS

+12

ξ2νpart2Vpartν2

]dt

minus ∆1

[partV1

partt+

12

νS2 part2V1

partS2 + ρξνSpart2V1

partνpartS+

12

ξ2νpart2V1

partν2

]dt

=rΠdt=r(V minus Sminus ∆1V1)dt

31 The Heston Model 13

Then we apply the no arbitrage principle the return of a risk-free portfoliomust be equal to the risk-free rate And we assume the risk-free rate to bedeterministic for our case

Rearranging the above equation by collecting V terms on the left-handside and V1 terms on the right-hand side

partVpartt + 1

2 νS2 part2VpartS2 + ρξνS part2V

partνpartS + 12 ξ2ν part2V

partν2 + rS partVpartS minus rV

partVpartν

=partV1partt + 1

2 νS2 part2V1partS2 + ρξνS part2V1

partνpartS + 12 ξ2ν part2V1

partν2 + rS partV1partS minus rV1

partV1partν

It is obvious that the ratio is equal to some function of S ν and t f (S ν t)Thus we have

partVpartt

+12

νS2 part2VpartS2 + ρξνS

part2VpartνpartS

+12

ξ2νpart2Vpartν2 + rS

partVpartSminus rV = f

partVpartν

In the case of Heston model f is chosen as

f = minus(κ(θ minus ν)minus λradic

ν)

where λ(S ν t) is called the market price of volatility risk We do not delveinto the choice of f and the concept of λ(S ν t) any further here See (Wilmott2006) for his arguments

(Heston 1993) assumed in his paper that λ(S ν t) is linear in the instanta-neous variance νt to retain the form of the equation under the transformationfrom the statistical measure to the risk-neutral measure Here we are onlyinterested in pricing the options so we can set λ(S ν t) to be zero (Gatheral2006)

Hence we arrived at

∂V/∂t + ½νS²∂²V/∂S² + ρξνS ∂²V/∂ν∂S + ½ξ²ν ∂²V/∂ν² + rS ∂V/∂S − rV = κ(ν − θ) ∂V/∂ν    (3.4)

According to (Heston, 1993), the price of an undiscounted European call option C with strike price K and time to maturity T has a solution of the form

C(S, K, ν, T) = ½(F − K) + (1/π) ∫₀^∞ (F·f₁ − K·f₂) du    (3.5)


where

f₁ = Re[e^(−iu ln K) φ(u − i) / (iuF)]

f₂ = Re[e^(−iu ln K) φ(u) / (iu)]

φ(u) = E(e^(iu ln S_T))

F = S e^(μT)

The evaluation of (3.5) requires the computation of Fourier inversion integrals, and it is susceptible to numerical instabilities due to the complex logarithms involved. (Kahl et al., 2006) proposed an integration scheme to compute the Fourier inversion integral by applying some transformations to the asymptotic structure. Thus the solution becomes

C(S, K, ν, T) = ∫₀¹ y(x) dx,  x ∈ ℝ    (3.6)

where

y(x) = ½(F − K) + [F·f₁(−ln x / C∞) − K·f₂(−ln x / C∞)] / (x·π·C∞)

The limits at the boundaries of the integral are

lim_{x→0} y(x) = ½(F − K)

and

lim_{x→1} y(x) = ½(F − K) + [F·lim_{u→0} f₁(u) − K·lim_{u→0} f₂(u)] / (π·C∞)

Equation (3.6) provides a more robust pricing method for options with moderate to long maturities or strong mean reversion. For the full derivation of equation (3.6), see (Kahl et al., 2006). The source code to implement equation (3.6) was provided by the supervisor and is the source code for (Duffy et al., 2012).

3.1.2 Heston Model Implied Volatility Approximation

Before the model is used for pricing options, the model's parameters need to be calibrated so that it can return current market prices (at least to some degree of accuracy). That is, the unmeasurable model parameters are obtained by calibrating to implied volatilities that are observed on the market. Hence this motivates the need for fast implied volatility computation.

Unfortunately, there is no closed-form analytical formula for calculating implied volatility under the Heston model. There are, however, many closed-form approximations for Heston implied volatility.

In this project we select the implied volatility approximation method for Heston proposed by (Lorig et al., 2019), which allegedly outperforms other methods such as the one by (Fouque et al., 2012). (Lorig et al., 2019) derive a family of asymptotic expansions for European-style option implied volatility. Their method provides an explicit approximation and requires no special functions, not even numerical integration, and so it is faster to compute than its counterparts.

We follow closely the derivation steps outlined in (Lorig et al., 2019). Consider the Heston model in Section 3.1. According to (Andersen et al., 2007), ρ must be set to negative to avoid moment explosion. To circumvent this, we apply the following change of variables: (X(t), V(t)) = (ln S(t), e^(κt)ν). Using the asymptotic expansions for general stochastic models (see Appendix B) and Itô's formula, we have

dX = −½e^(−κt)V dt + √(e^(−κt)V) dW₁,  X(0) = x = ln s    (3.7)

dV = θκe^(κt) dt + ξ√(e^(κt)V) dW₂,  V(0) = ν = z > 0    (3.8)

⟨dW₁, dW₂⟩ = ρ dt    (3.9)

The parameters ν, κ, θ, ξ, W₁, W₂ play the same roles as in Section 3.1. Then we apply the second order differential operator A(t) to (X, V):

A(t) = ½e^(−κt)v(∂²ₓ − ∂ₓ) + θκe^(κt)∂ᵥ + ½ξ²e^(κt)v ∂²ᵥ + ξρv ∂ₓ∂ᵥ

Define T as the time to maturity and k as the log-strike. Using the time-dependent Taylor series expansion of A(T) with (x(T), v(T)) = (X₀, E[V(T)]) = (x, θ(e^(κT) − 1)) gives (see Appendix B)

σ₀ = √[(−θ + θκT + e^(−κT)(θ − v) + v) / (κT)]

σ₁ = ξρz e^(−κT)(−2v − vκT − e^(κT)(v(κT − 2) + v) + κTv + v) / (√2 κ²σ₀²T^(3/2))

σ₂ = [ξ²e^(−2κT) / (32κ⁴σ₀⁵T³)] · [ −2√2 κσ₀³T^(3/2) z(−v − 4e^(κT)(v + κT(v − v)) + e^(2κT)(v(5 − 2κT) − 2v) + 2v)
   + κσ₀²T(4z² − 2)(v + e^(2κT)(−5v + 2vκT + 8ρ²(v(κT − 3) + v) + 2v))
   + κσ₀²T(4z² − 2)(4e^(κT)(v + vκT + ρ²(v(κT(κT + 4) + 6) − v(κT(κT + 2) + 2)) − κTv) − 2v)
   + 4√2 ρ²σ₀√T z(2z² − 3)(−2v − vκT − e^(κT)(v(κT − 2) + v) + κTv + v)²
   + 4ρ²(4(z² − 3)z² + 3)(−2v − vκT − e^(κT)(v(κT − 2) + v) + κTv + v)² ]
   − σ₁²(4(x − k)² − σ₀⁴T²) / (8σ₀³T)

where

z = (x − k − ½σ₀²T) / (σ₀√(2T))


The explicit expression for σ₃ is too long to be included here; see (Lorig et al., 2019) for the full derivation. The explicit expression for the implied volatility approximation is

σ_implied = σ₀ + σ₁z + σ₂z² + σ₃z³    (3.10)

The C++ source code to implement this is available here: Heston Implied Volatility Approximation.
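For illustration, a small Python sketch of the leading term σ₀ and the assembly of equation (3.10); the higher-order coefficients σ₁, σ₂, σ₃ are assumed to have been computed separately from the (much longer) expressions above.

import numpy as np

def sigma0(kappa, theta, v, T):
    # Leading-order term of the Lorig et al. expansion, as reconstructed above
    return np.sqrt((-theta + theta*kappa*T + np.exp(-kappa*T)*(theta - v) + v)
                   / (kappa * T))

def implied_vol(x, k, T, kappa, theta, v, sig1, sig2, sig3):
    # Equation (3.10); sig1..sig3 are the precomputed expansion coefficients
    s0 = sigma0(kappa, theta, v, T)
    z = (x - k - 0.5 * s0**2 * T) / (s0 * np.sqrt(2.0 * T))
    return s0 + sig1*z + sig2*z**2 + sig3*z**3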


3.2 Volatility is Rough

We start this section by introducing the fractional Brownian motion (fBm), which is a generalisation of Brownian motion.

Definition 3.2.1. Fractional Brownian Motion (fBm) is a centered Gaussian process B^H_t, t ≥ 0, with the autocovariance function

E[B^H_t B^H_s] = ½(|t|^(2H) + |s|^(2H) − |t − s|^(2H))    (3.11)

where H ∈ (0, 1) is the Hurst index, or Hurst parameter, associated with the fractional Brownian motion.

The Hurst parameter is a measure of the long-term memory of a time series. It was first introduced by (Hurst, 1951) for modelling water levels of the Nile river in order to determine the optimum dam size. As it turns out, it is also suitable for modelling time series data from other natural systems; (Peters, 1991) and (Peters, 1994) are some of the earliest examples of applying this concept in financial time series modelling. A time series can be classified into 3 categories based on the Hurst parameter:

1. 0 < H < 1/2: Negative correlation in the increments of the process. A mean-reverting process, which implies an increase in value is more likely to be followed by a decrease in value, and vice versa. The smaller the value, the greater the mean-reverting effect.

2. H = 1/2: No correlation in the increments of the process (standard Brownian motion, or Wiener process).

3. 1/2 < H < 1: Positive correlation in the increments of the process. A trend-reinforcing process, which means the direction of the next value is more likely to be the same as the current value. The strength of the trend is greater when H is closer to 1.

Empirical evidence shows that log-volatility time series behave similarly to a fBm with Hurst parameter of order 0.1 (Gatheral et al., 2014). Essentially all the statistical stylised facts of volatility can be captured when modelling it as a rough fBm.
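The roughness for small H can be visualised directly by sampling fBm from the covariance (3.11); below is a minimal Cholesky-based Python sketch (exact but O(n³), which is fine for illustration; grid and jitter choices are assumptions).

import numpy as np

def fbm_path(H, n, T=1.0, seed=0):
    # Sample B^H on an n-point grid via the Cholesky factor of (3.11)
    t = np.linspace(T / n, T, n)                      # exclude t = 0 (B_0 = 0)
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (u**(2*H) + s**(2*H) - np.abs(u - s)**(2*H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))   # small jitter for stability
    z = np.random.default_rng(seed).standard_normal(n)
    return np.concatenate([[0.0], L @ z])

# Smaller H gives a visibly rougher path, as in Figure 3.1
rough, smooth = fbm_path(0.1, 500), fbm_path(0.7, 500)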

This motivates the use of the fractional stochastic volatility (FSV) model by (Comte et al., 1998). Our model is called the rough fractional stochastic volatility (RFSV) model to emphasise that H < 1/2. The term 'rough' refers to the roughness of the path of fBm. From Figure 3.1, one can notice that the fBm path gets 'rougher' as H decreases.


FIGURE 3.1: Paths of fBm for different values of H. Adapted from (Shevchenko, 2015).

3.3 The Rough Heston Model

In this section we introduce the rough volatility version of the Heston model and derive its pricing and implied volatility equations.

As stated before, rough volatility models can fit the volatility surface notably well with very few parameters. Hence it is natural to modify the Heston model and consider its rough version.

The rough Heston model for a one-dimensional asset price S with instantaneous variance ν(t) takes the form in (Euch et al., 2016):

dS = S√ν dW₁

ν(t) = ν(0) + (1/Γ(α)) ∫₀ᵗ (t − s)^(α−1) κ(θ − ν(s)) ds + (ξ/Γ(α)) ∫₀ᵗ (t − s)^(α−1) √(ν(s)) dW₂    (3.12)

⟨dW₁, dW₂⟩ = ρ dt

where

- α = H + 1/2 ∈ (1/2, 1) governs the roughness of the volatility sample paths
- κ is the mean reversion rate
- θ is the long term mean volatility
- ν(0) is the initial volatility
- ξ is the volatility of volatility
- Γ is the Gamma function
- W₁ and W₂ are two Brownian motions with correlation ρ

When α = 1 (H = 1/2), we recover the classical Heston model of Section 3.1.

3.3.1 Rough Heston Pricing Equation

The pricing equation of the rough Heston model is inspired by the classical Heston one. In the classical Heston model, the option price can be obtained by applying Fourier inversion integrals to the characteristic function, which is expressed in terms of the solution of a Riccati equation. The rough Heston model displays a similar structure, but with the Riccati equation replaced by a fractional Riccati equation.

(Euch et al., 2016) derived a characteristic function for the rough Heston model. This result is particularly important, as the non-Markovian nature of the fractional Brownian motion makes other numerical pricing methods (e.g. Monte Carlo) difficult to implement.

Definition 3.3.1. The fractional integral of order r ∈ (0, 1] of a function f is

I^r f(t) = (1/Γ(r)) ∫₀ᵗ (t − s)^(r−1) f(s) ds    (3.13)

whenever the integral exists, and the fractional derivative of order r ∈ (0, 1] is

D^r f(t) = (1/Γ(1 − r)) (d/dt) ∫₀ᵗ (t − s)^(−r) f(s) ds    (3.14)

whenever it exists.

We now quote the results from (Euch et al., 2016) directly; for the full proof, see Section 6 of (Euch et al., 2016).

Theorem 3.3.1. For all t ≥ 0, the characteristic function of the log-price X_t = ln(S(t)/S(0)) is

φ_X(a, t) = E[e^(iaX_t)] = exp[κθ I¹h(a, t) + ν(0) I^(1−α)h(a, t)]    (3.15)

where h is the solution of the fractional Riccati equation

D^α h(a, t) = −½a(a + i) + κ(iaρξ − 1)h(a, t) + (ξ²/2)h²(a, t),  I^(1−α)h(a, 0) = 0    (3.16)

which admits a unique continuous solution for a ∈ C, a suitable region in the complex plane (see Definition 3.3.2).


Definition 3.3.2. We define the strip in the complex plane relevant to the computation of the characteristic function in Theorem 3.3.1:

C = { z ∈ ℂ : Re(z) ≥ 0, −1/(1 − ρ²) ≤ Im(z) ≤ 0 }    (3.17)

where Re and Im denote real and imaginary parts respectively.

There is no explicit solution to equation (3.16), so it has to be solved numerically. (Gatheral et al., 2019) present a simple rational approximation to the solution of the rough Heston Riccati equation, which is much faster to evaluate than the numerical schemes.

3.3.1.1 Rational Approximation of the Rough Heston Riccati Solution

Firstly, the short-time series expansion of the solution by (Alos et al., 2017) can be written as

h_s(a, t) = Σ_{j=0}^{∞} [Γ(1 + jα) / Γ(1 + (j + 1)α)] β_j(a) ξ^j t^((j+1)α)    (3.18)

with

β₀(a) = −½a(a + i)

β_k(a) = ½ Σ_{i,j=0}^{k−2} 1_{i+j=k−2} β_i(a)β_j(a) [Γ(1 + iα) / Γ(1 + (i + 1)α)] [Γ(1 + jα) / Γ(1 + (j + 1)α)]
       + iρa [Γ(1 + (k − 1)α) / Γ(1 + kα)] β_{k−1}(a)

Again from (Alos et al., 2017), the long-time expansion is

h_l(a, t) = r₋ Σ_{k=0}^{∞} γ_k t^(−kα) / (A^k ξ^k Γ(1 − kα))    (3.19)

where

γ₁ = −γ₀ = −1

γ_k = −γ_{k−1} + (r₋ / 2A) Σ_{i,j=1}^{∞} 1_{i+j=k} γ_i γ_j Γ(1 − kα) / (Γ(1 − iα)Γ(1 − jα))

A = √(a(a + i) − ρ²a²)

r± = −iρa ± A

Now we can construct the global approximation. Using the same techniques as in (Atkinson et al., 2011), the rational approximation of h(a, t) is given by

h^(m,n)(a, t) = [Σ_{i=1}^{m} p_i(ξt^α)^i] / [Σ_{j=0}^{n} q_j(ξt^α)^j],  q₀ = 1    (3.20)

Numerical experiments in (Gatheral et al., 2019) showed that h^(3,3) is a good choice for high accuracy and fast computation. Thus we can express it explicitly:

h^(3,3)(a, t) = [p₁ξ(t^α) + p₂ξ²(t^α)² + p₃ξ³(t^α)³] / [1 + q₁ξ(t^α) + q₂ξ²(t^α)² + q₃ξ³(t^α)³]    (3.21)

Thus from equations (3.18) and (3.19) we have

h_s(a, t) = [Γ(1)/Γ(1 + α)] β₀(a)(t^α) + [Γ(1 + α)/Γ(1 + 2α)] β₁(a)ξ(t^α)² + [Γ(1 + 2α)/Γ(1 + 3α)] β₂(a)ξ²(t^α)³ + O(t^(4α))
         = b₁(t^α) + b₂(t^α)² + b₃(t^α)³ + O[(t^α)⁴]

and

h_l(a, t) = r₋γ₀/Γ(1) + [r₋γ₁/(AΓ(1 − α)ξ)] (1/t^α) + [r₋γ₂/(A²Γ(1 − 2α)ξ²)] (1/(t^α)²) + O[1/(t^α)³]
         = g₀ + g₁(1/t^α) + g₂(1/(t^α)²) + O[1/(t^α)³]

Matching the coefficients of equation (3.21) to h_s and h_l forces the coefficients b_i and g_j to satisfy

p₁ξ = b₁    (3.22)

(p₂ − p₁q₁)ξ² = b₂    (3.23)

(p₁q₁² − p₁q₂ − p₂q₁ + p₃)ξ³ = b₃    (3.24)

p₃ξ³ / (q₃ξ³) = g₀    (3.25)

(p₂q₃ − p₃q₂)ξ⁵ / (q₃²ξ⁶) = g₁    (3.26)

(p₁q₃² − p₂q₂q₃ − p₃q₁q₃ + p₃q₂²)ξ⁷ / (q₃³ξ⁹) = g₂    (3.27)


The solution of this system is

p₁ = b₁/ξ    (3.28)

p₂ = (b₁³g₁ + b₁²g₀² + b₁b₂g₀g₁ − b₁b₃g₀g₂ + b₁b₃g₁² + b₂²g₀g₂ − b₂²g₁² + b₂g₀³) / [(b₁²g₂ + 2b₁g₀g₁ + b₂g₀g₂ − b₂g₁² + g₀³)ξ²]    (3.29)

p₃ = g₀q₃    (3.30)

q₁ = (b₁²g₁ − b₁b₂g₂ + b₁g₀² − b₂g₀g₁ − b₃g₀g₂ + b₃g₁²) / [(b₁²g₂ + 2b₁g₀g₁ + b₂g₀g₂ − b₂g₁² + g₀³)ξ]    (3.31)

q₂ = (b₁²g₀ − b₁b₂g₁ − b₁b₃g₂ + b₂²g₂ + b₂g₀² − b₃g₀g₁) / [(b₁²g₂ + 2b₁g₀g₁ + b₂g₀g₂ − b₂g₁² + g₀³)ξ²]    (3.32)

q₃ = (b₁³ + 2b₁b₂g₀ + b₁b₃g₁ − b₂²g₁ + b₃g₀²) / [(b₁²g₂ + 2b₁g₀g₁ + b₂g₀g₂ − b₂g₁² + g₀³)ξ³]    (3.33)
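Putting equations (3.28)-(3.33) into code, the following Python sketch builds the coefficients and evaluates h^(3,3) from equation (3.21); it assumes b = (b₁, b₂, b₃) and g = (g₀, g₁, g₂) have already been computed from the short- and long-time expansions for a given a.

def h33(b, g, xi, t, alpha):
    # Rational approximation (3.21) with p_i, q_j from (3.28)-(3.33)
    b1, b2, b3 = b
    g0, g1, g2 = g
    den = b1**2*g2 + 2*b1*g0*g1 + b2*g0*g2 - b2*g1**2 + g0**3
    p1 = b1 / xi
    p2 = (b1**3*g1 + b1**2*g0**2 + b1*b2*g0*g1 - b1*b3*g0*g2 + b1*b3*g1**2
          + b2**2*g0*g2 - b2**2*g1**2 + b2*g0**3) / (den * xi**2)
    q1 = (b1**2*g1 - b1*b2*g2 + b1*g0**2 - b2*g0*g1 - b3*g0*g2 + b3*g1**2) / (den * xi)
    q2 = (b1**2*g0 - b1*b2*g1 - b1*b3*g2 + b2**2*g2 + b2*g0**2 - b3*g0*g1) / (den * xi**2)
    q3 = (b1**3 + 2*b1*b2*g0 + b1*b3*g1 - b2**2*g1 + b3*g0**2) / (den * xi**3)
    p3 = g0 * q3
    y = xi * t**alpha
    return (p1*y + p2*y**2 + p3*y**3) / (1.0 + q1*y + q2*y**2 + q3*y**3)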

3.3.1.2 Option Pricing Using Fast Fourier Transform

After solving the Riccati equation using the rational approximation (3.21), we can compute the characteristic function φ_X(a, t) in equation (3.15). Once the characteristic function is obtained, many methods are available to compute call option prices; see (Carr and Madan, 1999), (Itkin, 2010), (Lewis, 2001) and (Schmelzle, 2010). In this project we select the Carr and Madan approach, which is a fast option pricing method that uses the fast Fourier transform (FFT). Suppose k = ln K and C(k, T) is the value of a European call option with strike K and maturity T. Unfortunately, the FFT cannot be applied directly to evaluate the call option price, because the call pricing function is not square-integrable. A key contribution of (Carr and Madan, 1999) is the modification of the call pricing function with respect to k; with this modification, a whole range of option prices can be obtained with one inverse Fourier transform.

Consider the modified call price c(k, T):

c(k, T) = e^(βk) C(k, T),  β > 0    (3.34)

Its Fourier transform is

ψ_X(u, T) = ∫_{−∞}^{∞} e^(iuk) c(k, T) dk,  u ∈ ℝ    (3.35)

which can be expressed as

ψ_X(u, T) = e^(−rT) φ_X[u − (β + 1)i, T] / (β² + β − u² + i(2β + 1)u)    (3.36)

where r is the risk-free rate and φ_X(·) is the characteristic function defined in equation (3.15).


The purpose of β is to assist the integrability of the modified call pricing function over the negative log-strike axis. Hence a sufficient condition is that ψ_X(0) is finite:

ψ_X(0, T) < ∞    (3.37)

⟹ φ_X[−(β + 1)i, T] < ∞    (3.38)

⟹ exp[θκ I¹h(−(β + 1)i, T) + ν(0) I^(1−α)h(−(β + 1)i, T)] < ∞    (3.39)

Using equation (3.39), we can determine the upper bound value of β such that condition (3.37) is satisfied. (Carr and Madan, 1999) suggest that one quarter of the upper bound serves as a good choice for β.

The call price C(k, T) can be recovered by applying the inverse Fourier transform:

C(k, T) = [e^(−βk) / 2π] ∫_{−∞}^{∞} Re[e^(−iuk) ψ_X(u, T)] du    (3.40)

        = [e^(−βk) / π] ∫₀^{∞} Re[e^(−iuk) ψ_X(u, T)] du    (3.41)

The semi-infinite integral in the call price equation can be evaluated by numerical methods, such as the trapezoidal rule and FFT, as shown in (Kwok et al., 2012):

C(k, T) ≈ [e^(−βk) / π] Σ_{j=1}^{N} e^(−iu_j k) ψ_X(u_j, T) Δu    (3.42)

where u_j = (j − 1)Δu, j = 1, 2, ..., N.

We end this section with a summary of the steps required to price a call option under the rough Heston model (a sketch of the final two steps in code follows the list):

1. Compute the solution h of the fractional Riccati equation (3.16) using equation (3.21).

2. Compute the characteristic function in equation (3.15).

3. Determine a suitable value for β using equation (3.39).

4. Apply the Fourier transform in equation (3.36).

5. Evaluate the call option price by implementing the FFT in equation (3.42).
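A minimal Python sketch of steps 4 and 5, assuming a callable phi implementing the characteristic function (3.15) is available; it uses the plain Riemann sum (3.42) rather than an optimised FFT grid, and the grid parameters N and du are illustrative choices.

import numpy as np

def call_price(phi, k, T, r, beta, N=4096, du=0.01):
    # Steps 4-5: damped transform (3.36) followed by the sum (3.42)
    u = np.arange(N) * du + 1e-10                 # u_j = (j-1)*du, nudged off 0
    psi = np.exp(-r * T) * phi(u - (beta + 1) * 1j) \
          / (beta**2 + beta - u**2 + 1j * (2 * beta + 1) * u)
    integrand = np.exp(-1j * u * k) * psi
    return np.exp(-beta * k) / np.pi * np.real(integrand.sum() * du)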

3.3.2 Rough Heston Model Implied Volatility

(Euch et al., 2019) present an almost instantaneous and accurate approximation method to compute rough Heston implied volatility. This method allows us to approximate rough Heston implied volatility by simply scaling the volatility of volatility parameter ν and feeding this scaled parameter as input into the classical Heston implied volatility approximation of Section 3.1.2. For a given maturity T, the scaled volatility of volatility parameter ν̃ is

ν̃ = √(3 / (2H + 2)) · ν / Γ(H + 3/2) · 1 / T^(1/2 − H)    (3.43)
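In code, the scaling (3.43) is essentially a one-liner; a sketch:

import math

def scaled_vol_of_vol(nu, H, T):
    # Equation (3.43): scaled vol-of-vol for a maturity-T smile
    return math.sqrt(3.0 / (2.0 * H + 2.0)) * nu / math.gamma(H + 1.5) / T**(0.5 - H)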


FIGURE 3.2: Implied volatility smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019).


Chapter 4

Artificial Neural Networks

This chapter starts with an introduction to Artificial Neural Networks (ANN). We introduce the terminologies and basic elements of ANN and explain the methods that we use to increase training speed and predictive accuracy. We also discuss the training process for ANN, common pitfalls and their remedies. Finally, we discuss the ANN type of interest and extend the discussion to Deep Neural Networks (DNN).

The concept of ANN has come a long way. It started off with modelling a simple neural network after electrical circuits, which was proposed by Warren McCulloch, a neurologist, and a young mathematician, Walter Pitts, in 1943 (McCulloch et al., 1943). This idea was further strengthened by Donald Hebb (Hebb, 1949).

ANN's ability to learn complex non-linear functional mappings, by deciphering the pattern between independent and dependent variables, has found applications in many fields. With computational resources becoming more accessible and big data sets becoming available, ANN's popularity has grown exponentially in recent years.

4.1 Dissecting the Artificial Neural Networks

FIGURE 4.1: A Simple Neural Network

Before we delve deeper into the world of ANN, let us explain what parameters and hyperparameters mean.


Parameters refer to the variables that are updated throughout the training process; they include weights and biases. Hyperparameters are the configuration variables that define the architecture of the ANN and stay constant throughout the training process. Hyperparameters include the number of neurons, the number of hidden layers, the activation functions and the number of epochs, etc.

4.1.1 Neurons

Like a biological neuron, an artificial neuron is the fundamental working unit in an ANN. It is essentially a mathematical function that receives multiple inputs and generates an output. More precisely, it takes the weighted sum of the inputs and applies the activation function to the sum to generate an output. In the case of a simple ANN (as shown in Figure 4.1) this will be the network output; in more sophisticated ANNs, the output of a neuron can be the input of other neurons.

4.1.2 Input, Output and Hidden Layers

An input layer is the first layer of an ANN. It receives and sends the training data into the network for further processing. The number of input neurons is equal to the number of input features of the training data.

The output layer is the final layer of the network. It outputs the predictions for a given set of input features, and it can have more than one output neuron.

A hidden layer is a layer between the input and output layers. It contains neuron(s) and receives weighted inputs from the input layer. Whilst an ANN contains only one input layer and one output layer, the number of hidden layers can be more than one. One hidden layer can only be used for solving linearly separable problems; for more complex problems, multiple hidden layers are needed.

Systematic experimentation is required to identify the appropriate number of hidden layers to prevent under-fitting and over-fitting, and thus increase predictive accuracy.

4.1.3 Connections, Weights and Biases

From the input layer to the output layer, each connection is assigned a weight that denotes its relative importance. Random weights are initialised when training begins; as training continues, these weights are updated.

Sometimes a bias (different from the bias term in statistics) is added to the neuron. It is merely a constant input with weight 1. Intuitively speaking, the purpose of a bias is to shift the activation function away from the origin, thereby allowing the neuron to give a non-zero output when the inputs are 0.


4.1.4 Forward and Backward Propagation, Loss Functions

Forward propagation is the passing of input data in the forward direction through the network to generate output. Then the difference between the output and the target output is estimated by a loss function; it is a measure of how well an ANN performs with the current set of weights. Typical loss functions for regression problems include the mean squared error (MSE) and the mean absolute error (MAE). They are defined as

Definition 4.1.1.

MSE = (1/n) Σ_{i=1}^{n} (y_{i,predict} − y_{i,actual})²    (4.1)

Definition 4.1.2.

MAE = (1/n) Σ_{i=1}^{n} |y_{i,predict} − y_{i,actual}|    (4.2)

where y_{i,predict} is the i-th predicted output and y_{i,actual} is the i-th actual output. Subsequently this information is propagated backward from the output layer to the input layer, and the gradients of the loss function with respect to the weights are computed. The algorithm then adjusts the weights accordingly to minimise the loss function. An iteration is one complete cycle of forward and backward propagation.

ANN training is an unconstrained optimisation problem with the loss function as the objective function. The algorithm navigates through the search space to find optimal weights that minimise the loss function (e.g. the MSE). It can be expressed as

min_w (1/n) Σ_{i=1}^{n} (y_{i,predict} − y_{i,actual})²    (4.3)

where w is the vector that contains the weights of all the connections within the ANN.

4.1.5 Activation Functions

An activation function is a mathematical function applied to the output of a neuron to determine whether it should be activated (or fired) or not, based on the relevance of the neuron's inputs to the prediction. This is analogous to the neuronal firing activity in the human brain.

Activation functions can be linear or non-linear. A linear activation function is only suitable when the functional relationship is linear; non-linear activation functions are used for most applications. In our case, non-linear activation functions are more suitable, as the pricing and implied volatility functional mappings are complex.

The common activation functions include:


Definition 4.1.3. Rectified Linear Unit (ReLU)

f_ReLU(x) = { 0 if x ≤ 0;  x if x > 0 } ∈ [0, ∞)    (4.4)

Definition 4.1.4. Exponential Linear Unit (ELU)

f_ELU(x) = { α(e^x − 1) if x ≤ 0;  x if x > 0 } ∈ (−α, ∞)    (4.5)

where α > 0. The value of α is usually chosen to be 1 for most applications.

Definition 4.1.5. Linear or identity

f_linear(x) = x ∈ (−∞, ∞)    (4.6)
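For reference, the three activation functions can be written in numpy as follows (a sketch, not the Keras implementations used later):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)                       # equation (4.4)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * np.expm1(x))  # equation (4.5)

def linear(x):
    return x                                        # equation (4.6)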

4.1.6 Optimisation Algorithms and Learning Rate

The learning rate refers to the amount of change applied to the weights in each iteration of the training process. A high learning rate results in fast training speed but risks divergence, whereas a low learning rate may slow down training.

Common optimisation algorithms for training ANNs are Stochastic Gradient Descent (SGD) and its extensions, such as RMSProp and Adam. The SGD pseudocode is the following:

Algorithm 1 Stochastic Gradient Descent

Initialise a vector of random weights w and learning rate lr
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        w_new = w_current − lr · ∂Loss/∂w_current
    end for
end while

SGD is popular because it is simple to implement and can provide good results for most applications. In SGD the learning rate lr is fixed throughout the process. This is found to be an issue, because the appropriate learning rate is difficult to determine: selecting a high learning rate is prone to divergence, whereas using a low learning rate can result in a slow training process. Hence improvements have been made to SGD to allow the learning rate to be adaptive. Root Mean Square Propagation (RMSProp) is one of the extensions that allows the learning rate to vary throughout the training process (Hinton et al., 2015). It is a method that divides the learning rate by an exponential moving average of past squared gradients. The author suggests an exponential decay rate of b = 0.9; its pseudocode is presented below.

Algorithm 2 Root Mean Square Propagation (RMSProp)

Initialise a vector of random weights w, learning rate lr, v = 0 and b = 0.9
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        v_current = b · v_previous + (1 − b)[∂Loss/∂w_current]²
        w_new = w_current − (lr / √(v_current + ε)) · ∂Loss/∂w_current
    end for
end while

where ε is a small number to prevent division by zero.

A further improved version, called Adaptive Moment Estimation (Adam), was proposed by (Kingma et al., 2015). RMSProp adapts the learning rate using second-moment information only; Adam improves this further by also using an exponential moving average of the gradients themselves (the first moment) when making the update. The advantage of Adam is that it can handle sparse gradients on noisy data. The pseudocode of Adam is given below.

Algorithm 3 Adaptive Moment Estimation (Adam)

Initialise a vector of random weights w, learning rate lr, v = 0, m = 0, b = 0.9 and b1 = 0.999
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        m_current = b · m_previous + (1 − b)[∂Loss/∂w_current]
        v_current = b1 · v_previous + (1 − b1)[∂Loss/∂w_current]²
        m̂_current = m_current / (1 − b^t)
        v̂_current = v_current / (1 − b1^t)
        w_new = w_current − (lr / (√v̂_current + ε)) · m̂_current
    end for
end while

where t is the iteration counter, b is the exponential decay rate for the first moment and b1 is the exponential decay rate for the second moment.
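A single Adam update step in numpy, as a sketch of the pseudocode above; grad stands for the current gradient ∂Loss/∂w and t for the iteration counter (both assumptions of this illustration):

import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b=0.9, b1=0.999, eps=1e-8):
    # One Adam update: moment estimates with bias correction (t starts at 1)
    m = b * m + (1 - b) * grad            # first moment (mean of gradients)
    v = b1 * v + (1 - b1) * grad**2       # second moment (mean of squares)
    m_hat = m / (1 - b**t)                # bias-corrected estimates
    v_hat = v / (1 - b1**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v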

4.1.7 Epochs, Early Stopping and Batch Size

An epoch is one complete pass of the entire training data set through the ANN. Generally we want the number of epochs to be as high as possible; however, this may cause the ANN to over-fit. Early stopping is useful here to stop the training before the ANN becomes over-fit: it works by stopping the training process when the loss function shows no improvement. The batch size is the number of input samples processed before the parameters are updated.

4.2 Feedforward Neural Networks

Feedforward Neural Networks are the most common type of ANN in practical applications. They are called feedforward because the data flows only from the input layer to the output layer; there is no feedback loop within the network.

4.2.1 Deep Neural Networks

Deep Neural Networks (DNN) are Feedforward Neural Networks with more than one hidden layer; the depth here refers to the number of hidden layers. This is the type of ANN of our interest, and the following theorems provide justifications.

Theorem 4.2.1 (Hornik et al., 1989, Universal Approximation Theorem). Let NN^a_{din,dout} be the set of neural networks with activation function f_a : ℝ → ℝ, input dimension d_in ∈ ℕ and output dimension d_out ∈ ℕ. Then, if f_a is continuous and non-constant, NN^a_{din,dout} is dense in the Lebesgue space L^p(μ) for all finite measures μ.

Theorem 4.2.2 (Hornik et al., 1990, Universal Approximation Theorem for Derivatives). Let the function to be approximated be F* ∈ C^n, the mapping of the neural network F : ℝ^{d0} → ℝ, and NN^a_{din,dout} be the set of single-layer neural networks with activation function f_a : ℝ → ℝ, input dimension d_in ∈ ℕ and output dimension 1. Then, if the (non-constant) activation function satisfies f_a ∈ C^n(ℝ), NN^a_{din,1} arbitrarily approximates F* and all its derivatives up to order n.

Theorem 4.2.3 (Eldan et al., 2016, The Power of Depth of Neural Networks). There exists a simple function on ℝ^d expressible by a small 3-layer feedforward neural network which cannot be approximated by any 2-layer network to more than a certain constant accuracy, unless its width is exponential in the dimension.

According to Theorem 4.2.2, it is crucial to have an activation function that is n-differentiable if the approximation of the n-th derivative of the function F* is required. Theorem 4.2.3 motivates the choice of a deep network: the depth of a network can improve its predictive capabilities. However, it is worth noting that networks deeper than 4 hidden layers do not provide much performance gain (Ioffe et al., 2015).

4.3 Common Pitfalls and Remedies

Here we discuss some common pitfalls in ANN training and the corresponding remedies.


4.3.1 Loss Function Stagnation

During training, the loss function may display only infinitesimal improvements after each epoch.

To resolve this problem:

- Increase the batch size. This could potentially fix the issue, as it helps the ANN model learn the relationships between data samples and data labels better before updating the parameters.

- Increase the learning rate. Too small a learning rate can cause the ANN model to get stuck in local minima.

- Use a deeper network. According to the power-of-depth theorem, deeper networks tend to perform better than shallower ones. However, if the ANN model already has 4 hidden layers, this method may not be suitable, as there is not much performance gain beyond that (see Section 4.2.1).

- Use a different activation function for the output layer. Make sure the range of the output activation function is appropriate for the problem.

4.3.2 Loss Function Fluctuation

During training, the loss function may fluctuate significantly from epoch to epoch. Potential fixes include:

- Increase the batch size. The ANN model might not be able to interpret the true relationships between input and output data before each update when the batch size is too small.

- Lower the initial learning rate.

4.3.3 Over-fitting

Over-fitting occurs when the ANN model performs well on training data but fails to generalise on unseen test data. Potential fixes include:

- Reduce the complexity of the ANN model. Limit the number of hidden layers to 4, as (Eldan et al., 2016) show not much advantage can be achieved beyond 4 hidden layers. Furthermore, experiment with narrower (fewer neurons) hidden layers.

- Introduce early stopping. Reduce the number of epochs if more epochs show only little improvement.

- Regularisation. This can reduce the complexity of the loss function and speed up convergence.

- K-fold cross-validation. This works by splitting the training data into K equal-sized portions: K−1 portions are used for training and 1 portion is used for validation, repeated K times with a different portion used for validation each time.

4.3.4 Under-fitting

Under-fitting occurs when the ANN model fails to make sense of the relationships between data samples and labels. Potential fixes include:

- Increase the size of the training data. Complex functional mappings require the ANN model to train on more data before it can learn the relationships well.

- Increase the depth of the model. It is very likely that the model is not deep enough for the problem. As a rule of thumb, limit the number of hidden layers to 3-4 for complex problems.

4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation

For ANN's application in implied volatility prediction, we apply the image-based implicit method proposed by (Horvath et al., 2019). It is allegedly a powerful and robust method for approximating implied volatility. Rather than using one implied volatility value as the network output, the image-based implicit method uses the entire implied volatility surface as the output. The implied volatility surface is represented by 10 × 10 pixels, with each pixel corresponding to one maturity and one strike price. The input data corresponding to one implied volatility surface output is all the model parameters except for the strike price and maturity; these are implicitly learned by the network. This method offers the following benefits:

- More information is learned by the network with less input data, since the information of neighbouring volatility points is incorporated in the training process.

- The network learns the mapping between model parameters and the implied volatility surface directly. The volatility smile and term structure are available simultaneously, so it is more efficient than having only one single implied volatility value as output.

- The neighbouring volatility points can be numerically approximated almost instantly if intended.

FIGURE 4.2: The pixels of the implied volatility surface are represented by the outputs of the neural network.


Chapter 5

Data

In this chapter we discuss the data for our applications. We explain how the data is generated and some essential data pre-processing procedures.

Simulated data is chosen to train the ANN model mainly for regulatory purposes. One might argue that training the ANN with market data directly would yield better predictions; however, evidence for the stability and robustness of that approach is yet to be found, and there is no straightforward interpretation between the inputs and outputs of an ANN model trained that way. The main advantage of training the ANN with simulated data is that the functional relationship is known, and so the model is tractable. This property is important, as tractable models tend to be more transparent, which is required by regulations.

For neural network computation, Python is the logical language choice, as it has many powerful machine learning libraries such as Keras. To obtain good results, we require 5,000,000 rows of data for the ANN experiment. Hence we choose C++ instead of Python for the data generation process, as it is computationally faster. We also use the lightweight, header-only Python library pybind11, which can harmoniously integrate C++ and Python in one project (see Appendix A for a tutorial on pybind11).

5.1 Data Generation and Storage

The data generation process is done entirely in C++, mainly because of its execution speed. Initially we considered applying GPU parallel computing to speed up the process even further. After consulting with the supervisor, we decided to use the std::future class template instead. std::future belongs to the standard C++11 library and is very simple to implement compared to parallel computing: it allows the C++ program to run multiple functions asynchronously on different threads (see page 942 in (Duffy, 2018) for an example). Our machine contains 2 physical cores with 2 threads per physical core. This allows us to run 4 operations asynchronously without overhead costs, and therefore in theory improves the data generation speed four-fold. Note that asynchronous programming is not meant to replace parallel computing completely; in cases where execution speed is absolutely critical, parallel computing should be used. Without asynchronous programming, the data generation process for 5,000,000 rows of Heston option price data would take more than 8 hours; now it takes less than 2 hours to complete.

We follow the steps outlined in Appendix A to wrap the C++ program using the pybind11 Python library. The wrapped C++ function can be called in Python just like any other Python library. The pybind11 library is conceptually very similar to the Boost library; the reason we choose pybind11 is that it is a lightweight, header-only library that is easy to use. This enables us to enjoy both the execution speed of C++ and the access to the machine learning libraries of Python at the same time.

After the data is generated, it is stored in MySQL so that we do not need to re-generate the data each time we restart the computer or when the Python IDE crashes unexpectedly. We use the mysql-connector Python library to communicate with MySQL from Python.

5.1.1 Heston Call Option Data

We follow the method and theory described in Chapter 3 to compute the call option price under the Heston model. We only consider call options in this project. To train an ANN that can also predict put option prices, one can simply add an extra input feature that takes only two values, 1 or −1, with 1 representing a call option and −1 representing a put option.

For this application we generated 5,000,000 rows of data that are uniformly distributed over a specified range. In this context, one row of data contains all the input features and the corresponding output. We use 10 model parameters as the input features: spot price (S), strike price (K), risk-free rate (r), dividend yield (D), current/instantaneous volatility (ν), maturity (T), long term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ). In this case there will only be one output, the call option price. See Table 5.1 for their respective ranges.

TABLE 5.1: Heston Model Call Option Data

Model Parameters | Range
Input:
  Spot Price (S) | ∈ [20, 80]
  Strike Price (K) | ∈ [40, 55]
  Risk-free rate (r) | ∈ [0.03, 0.045]
  Dividend Yield (D) | ∈ [0, 0.1]
  Instantaneous Volatility (ν) | ∈ [0.2, 0.6]
  Maturity (T) | ∈ [0.03, 2.0]
  Long Term Volatility (θ) | ∈ [0.2, 1.0]
  Mean Reversion Rate (κ) | ∈ [0.1, 1.0]
  Volatility of Volatility (ξ) | ∈ [0.2, 0.5]
  Correlation (ρ) | ∈ [−0.95, 0.95]
Output:
  Call Option Price (C) | ∈ ℝ>0


5.1.2 Heston Implied Volatility Data

We generate the Heston implied volatility data with the theory explained in Chapter 3. For the ANN application in implied volatility approximation, we decided to use the image-based implicit method, in which the ANN output layer dimension is 100, with each output representing one pixel on the implied volatility surface.

Using this method, we need 100 times more data in order to have the same number of distinct rows of data as the normal point-wise method (e.g. Heston call option pricing) does. However, due to hardware constraints, we decided to generate 10,000,000 rows of uniformly distributed data; that translates to only 100,000 distinct rows of data.

We use 6 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ).

Note that the strike price and maturity are not used as input features, as mentioned in Section 4.4. This is because the shape of one full implied volatility surface is dependent on the 6 model parameters mentioned above; the strike price and maturity correspond to one specific point on the full implied volatility surface. Nevertheless, we still require these two parameters to generate the implied volatility data. We use the following strike price and maturity values:

- strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

- maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

In this case there will be 10 × 10 = 100 outputs, that is, the implied volatility surface. See Table 5.2 for their respective ranges.

TABLE 5.2: Heston Model Implied Volatility Data

Model Parameters | Range
Input:
  Spot Price (S) | ∈ [0.2, 2.5]
  Instantaneous Volatility (ν) | ∈ [0.2, 0.6]
  Long Term Volatility (θ) | ∈ [0.2, 1.0]
  Mean Reversion Rate (κ) | ∈ [0.1, 1.0]
  Volatility of Volatility (ξ) | ∈ [0.2, 0.5]
  Correlation (ρ) | ∈ [−0.95, 0.95]
Output:
  Implied Volatility Surface (σ_imp) | ∈ ℝ^(10×10)_(>0)

5.1.3 Rough Heston Call Option Data

Again, we follow the theory in Chapter 3 and implement it in C++ to generate call option prices under the rough Heston model.

Similarly, we only consider call options here and generate 5,000,000 rows of data. We use 9 model parameters as the input features: strike price (K), risk-free rate (r), current/instantaneous volatility (ν), maturity (T), long term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). In this case there will only be one output, the call option price. See Table 5.3 for their respective ranges.

TABLE 5.3: Rough Heston Model Call Option Data

Model Parameters | Range
Input:
  Strike Price (K) | ∈ [0.5, 5]
  Risk-free rate (r) | ∈ [0.2, 0.5]
  Instantaneous Volatility (ν) | ∈ [0, 1.0]
  Maturity (T) | ∈ [0.1, 3.0]
  Long Term Volatility (θ) | ∈ [0.01, 0.1]
  Mean Reversion Rate (κ) | ∈ [1.0, 4.0]
  Volatility of Volatility (ξ) | ∈ [0.01, 0.5]
  Correlation (ρ) | ∈ [−0.95, 0.95]
  Hurst Parameter (H) | ∈ [0.05, 0.1]
Output:
  Call Option Price (C) | ∈ ℝ>0

5.1.4 Rough Heston Implied Volatility Data

Finally, we generate the rough Heston implied volatility data with the theory explained in Chapter 3. As before, we use the image-based implicit method.

The data size we use is 10,000,000 rows, which translates to 100,000 distinct rows of data. We use 7 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). We use the following strike price and maturity values:

- strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

- maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

As before, there will be 10 × 10 = 100 outputs, that is, the rough Heston implied volatility surface. See Table 5.4 for their respective ranges.

TABLE 5.4: Rough Heston Model Implied Volatility Data

Model Parameters | Range
Input:
  Spot Price (S) | ∈ [0.2, 2.5]
  Instantaneous Volatility (ν) | ∈ [0.2, 0.6]
  Long Term Volatility (θ) | ∈ [0.2, 1.0]
  Mean Reversion Rate (κ) | ∈ [0.1, 1.0]
  Volatility of Volatility (ξ) | ∈ [0.2, 0.5]
  Correlation (ρ) | ∈ [−0.95, 0.95]
  Hurst Parameter (H) | ∈ [0.05, 0.1]
Output:
  Implied Volatility Surface (σ_imp) | ∈ ℝ^(10×10)_(>0)


5.2 Data Pre-processing

In this section we discuss some necessary data pre-processing procedures: data splitting for validation and evaluation (testing), data standardisation and normalisation, and data partitioning for K-fold cross validation.

5.2.1 Data Splitting: Train - Validate - Test

Generally, not 100% of the available data is used for training; we reserve some of the data for validation and for performance evaluation.

We split the data into three sets: 70% (train), 15% (validate) and 15% (test). The first 70% of the data is used for training, whilst 15% of the data is used for validation during training. The last 15% is for evaluating the trained ANN's predictive accuracy.

The validation data set is used for tuning the hyperparameters of the ANN; this is important to help prevent over-fitting of the network. The test data set is not used during training: it is the unseen (out-of-sample) data used for evaluating the trained ANN's predictions.

5.2.2 Data Standardisation and Normalisation

Before the data is fed into the ANN for training, it is absolutely essential to scale the data to speed up convergence and improve performance.

The magnitude of each data feature is different. It is essential to re-scale the data so that the ANN model will not misinterpret their relative importance based on their magnitudes. Typical scaling techniques include standardisation and normalisation.

Data standardisation is defined as

x_scaled = (x − x_mean) / x_std    (5.1)

where

- x is the data
- x_mean is the mean of that data feature
- x_std is the standard deviation of that data feature

(Horvath et al., 2019) claim that this works especially well for quantities that are positively unbounded, such as implied volatility.

For the other model parameters we apply data normalisation, which is defined as

x_scaled = (2x − (x_max + x_min)) / (x_max − x_min) ∈ [−1, 1]    (5.2)

where

- x is the data
- x_max is the maximum value of that data feature
- x_min is the minimum value of that data feature

After data scaling (standardisation or normalisation), the ANN is able to learn the true relationships between inputs and outputs without the bias from the differences in their magnitudes.
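A numpy sketch of equations (5.1) and (5.2), applied column-wise; taking the statistics from the training set only is our assumption, consistent with standard practice:

import numpy as np

def standardise(X, X_train):
    # Equation (5.1): zero mean, unit standard deviation per feature
    return (X - X_train.mean(axis=0)) / X_train.std(axis=0)

def normalise(X, X_train):
    # Equation (5.2): map each feature to [-1, 1] based on the training range
    x_max, x_min = X_train.max(axis=0), X_train.min(axis=0)
    return (2 * X - (x_max + x_min)) / (x_max - x_min)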

5.2.3 Data Partitioning

In this section we explain how the data is partitioned for K-fold cross validation.

First we shuffle the entire training data set. Then the entire data set is divided into K equal-sized portions. For each portion, we use it as the test data set and train the ANN on the other K−1 portions. This is repeated K times, with each of the K portions used exactly once as the test data set. The average of the K results is the performance measure of the model.
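A sketch of this procedure using scikit-learn's KFold; build_model is a hypothetical helper that returns a freshly compiled Keras model:

import numpy as np
from sklearn.model_selection import KFold

def cross_validate(build_model, X, y, k=5):
    # K-fold cross validation: train on K-1 portions, test on the held-out one
    scores = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True).split(X):
        model = build_model()                  # fresh, compiled Keras model
        model.fit(X[train_idx], y[train_idx],
                  epochs=30, batch_size=200, verbose=0)
        scores.append(model.evaluate(X[test_idx], y[test_idx], verbose=0))
    return np.mean(scores, axis=0)             # average of the K results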

FIGURE 5.1: How the data is partitioned for 5-fold cross validation. Adapted from (Chapter 2 Modeling Process: k-fold cross validation).


Chapter 6

Neural Networks Architecture and Experimental Setup

We start this chapter by discussing the appropriate ANN architecture and experimental setup for our applications. We apply ANN to four different applications: option pricing under the Heston model, Heston implied volatility surface approximation, option pricing under the rough Heston model, and rough Heston implied volatility surface approximation. We first apply ANN to the Heston model as a concept validation and then proceed to the rough Heston model.

Subsequently, we discuss the relevant evaluation metrics and validation methods for our ANN models. Finally, we end this chapter with the implementation steps for ANN in Keras.

6.1 Neural Networks Architecture

Configuring an ANN's architecture is a combination of art and science. It requires experience and systematic experimentation to decide the right parameters and architecture for a specific problem, and for a particular problem there might be multiple designs that are appropriate. To begin with, we use the network architecture of (Horvath et al., 2019) and adjust the hyperparameters if it performs poorly for our applications.

FIGURE 6.1: Typical ANN Architecture for Option Pricing Application

FIGURE 6.2: Typical ANN Architecture for Implied Volatility Surface Approximation

6.1.1 ANN Architecture: Option Pricing under Heston Model

An ANN with 3 hidden layers is appropriate for this application. The number of neurons per hidden layer is 50, with the exponential linear unit (ELU) as the activation function for each hidden layer.

Initially, the rectified linear unit (ReLU) activation function was considered. It is the choice for many applications, as it can overcome the vanishing gradient problem whilst being less computationally expensive than other activation functions. However, it has some major drawbacks:

- The n-th derivative of ReLU is not continuous for all n > 0.

- ReLU cannot produce negative outputs.

- For activation x < 0, ReLU has zero gradient.

From Theorem 4.2.2 we know that choosing an n-differentiable activation function is necessary for computing the n-th derivatives of the functional mapping. This is crucial for our application, because we would be interested in option Greeks computation using the same ANN if it shows promising results for option pricing. For risk management purposes, selecting an activation function for which option Greeks can be calculated is crucial.

The obvious alternative is the ELU (defined in Section 4.1.5). For x < 0 it has a smooth output that approaches −α, and for x > 0 the output of ELU is positively unbounded, which is suitable for our application, as the values of option price and implied volatility are also, in theory, positively unbounded.

Since our problem is a regression problem, a linear activation function (defined in Section 4.1.5) is appropriate for the output layer, as its output is unbounded. We discovered that a batch size of 200 and 30 epochs yield good accuracy.

To summarise, the ANN model for Heston option pricing is (a minimal Keras sketch of this architecture follows the list):

- Input Layer: 10 neurons
- Hidden Layer 1: 50 neurons, ELU as activation function
- Hidden Layer 2: 50 neurons, ELU as activation function
- Hidden Layer 3: 50 neurons, ELU as activation function
- Output Layer: 1 neuron, linear function as activation function
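A minimal Keras sketch of this architecture (assuming TensorFlow 2.x Keras; the function name is illustrative):

from tensorflow import keras
from tensorflow.keras import layers

def build_heston_pricing_ann():
    # 10 inputs -> 3 hidden layers x 50 ELU neurons -> 1 linear output
    model = keras.Sequential([
        layers.Dense(50, activation="elu", input_shape=(10,)),
        layers.Dense(50, activation="elu"),
        layers.Dense(50, activation="elu"),
        layers.Dense(1, activation="linear"),
    ])
    return model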

6.1.2 ANN Architecture: Implied Volatility Surface of Heston Model

For approximating the full implied volatility surface of the Heston model, we apply the image-based implicit method (see Section 4.4). This implies the output dimension is 100.

We use 3 hidden layers with 80 neurons each. The activation function for the hidden layers is the exponential linear unit (ELU); as before, we use a linear activation function for the output layer. We discovered that a batch size of 20 and 200 epochs yield good accuracy. Since the number of epochs is relatively high, we introduce an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

To summarise, the ANN model for Heston implied volatility surface approximation is:

- Input Layer: 6 neurons
- Hidden Layer 1: 80 neurons, ELU as activation function
- Hidden Layer 2: 80 neurons, ELU as activation function
- Hidden Layer 3: 80 neurons, ELU as activation function
- Output Layer: 100 neurons, linear function as activation function


6.1.3 ANN Architecture: Option Pricing under Rough Heston Model

For option pricing under the rough Heston model, the ANN model is:

- Input Layer: 9 neurons
- Hidden Layer 1: 50 neurons, ELU as activation function
- Hidden Layer 2: 50 neurons, ELU as activation function
- Hidden Layer 3: 50 neurons, ELU as activation function
- Output Layer: 1 neuron, linear function as activation function

As before, we use a batch size of 200 and 30 epochs.

6.1.4 ANN Architecture: Implied Volatility Surface of Rough Heston Model

The ANN model for rough Heston implied volatility surface approximation is:

- Input Layer: 7 neurons
- Hidden Layer 1: 80 neurons, ELU as activation function
- Hidden Layer 2: 80 neurons, ELU as activation function
- Hidden Layer 3: 80 neurons, ELU as activation function
- Output Layer: 100 neurons, linear function as activation function

We use a batch size of 20 and 200 epochs, with an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

6.2 Performance Metrics and Validation Methods

We require some metrics to evaluate the performance of the ANN in terms of accuracy and speed.

For accuracy, we introduced the MSE and MAE in Section 4.1.4. It is also common to use the coefficient of determination (R²) to quantify the performance of a regression model; it is a statistical measure of the proportion of the variance in the dependent variable that is explained by the independent variables. The formula is

R² = 1 − [Σ_{i=1}^{n} (y_{i,actual} − y_{i,predict})²] / [Σ_{i=1}^{n} (y_{i,actual} − ȳ_actual)²]    (6.1)


We also suggest a user-defined accuracy metric, the Maximum Absolute Error, defined as

Max Abs Error = max_i (|y_{i,actual} − y_{i,predict}|)    (6.2)

This metric provides a measure of how large the error could be in the worst-case scenario. It is especially important from a risk management perspective.
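Both metrics are straightforward to compute on the test predictions; a numpy sketch:

import numpy as np

def r2_score(y_actual, y_pred):
    # Coefficient of determination, equation (6.1)
    ss_res = np.sum((y_actual - y_pred) ** 2)
    ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def max_abs_error(y_actual, y_pred):
    # Equation (6.2): worst-case absolute error
    return np.max(np.abs(y_actual - y_pred))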

As for the run-time performance of the ANN, we simply use the Python built-in timer function to quantify this property.

In Section 5.2.3 we explained the data partitioning process. The motivation for partitioning the data is to do K-fold cross validation. Cross validation is a statistical technique to assess how well a predictive model generalises to an independent data set; it is a good tool for identifying over-fitting.

As explained earlier, the process involves training and testing different permutations of the partitioned data sets multiple times. The indication of a well-generalised model is that the error magnitudes in each round are consistent with each other and with their average. We select K = 5 for our applications.


6.3 Implementation of ANN in Keras

In this section we outline the standard ANN computation process using Keras:

1. Construct the ANN model by specifying the number of input neurons, the number of hidden neurons, the number of hidden layers, the number of output neurons and the activation functions for each layer. Print the model summary to check for errors.

2. Configure the settings of the model using the compile function. We select the MSE as the loss function and specify the MAE, R² and our user-defined maximum absolute error as the metrics. For all of our applications we choose Adam as the optimisation algorithm. To prevent over-fitting, we use the early stopping function, especially when the number of epochs is high. It is important to note that the validation loss (not the training loss) should be used as the early stopping criterion; this means the training will stop early if there is no reduction in the validation loss.

3. Start training the ANN model. It is important to specify the validation split argument here for splitting the training data into a train set and a validation set. Note that the test set should not be used as the validation set; it should be reserved for evaluating the ANN's predictions after training.

4. When the training is complete, generate the plot of the MSE (loss function) versus the number of epochs to visualise the training process. The MSE should decline gradually and then level off as the number of epochs increases. See Section 4.3 for potential training issues and remedies.

5. Apply 5-fold cross validation to check for over-fitting. Evaluate the trained model using the evaluate function.

6. Generate predictions. Visualise the predictions by plotting predicted values versus actual values on the same graph.

See Appendix C for a Python code example; a condensed sketch of steps 1-3 is also given below.
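The following is a condensed, self-contained sketch of steps 1-3 (the data arrays are random placeholders standing in for the scaled training data of Chapter 5, and the layer sizes match the Heston option pricing architecture):

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder data standing in for the scaled Heston training set (Chapter 5)
X_train = np.random.rand(1000, 10).astype("float32")
y_train = np.random.rand(1000, 1).astype("float32")

def max_abs_error(y_true, y_pred):
    # User-defined metric, equation (6.2), evaluated per batch
    return tf.reduce_max(tf.abs(y_true - y_pred))

# Step 1: construct the model and inspect it
model = keras.Sequential([
    layers.Dense(50, activation="elu", input_shape=(10,)),
    layers.Dense(50, activation="elu"),
    layers.Dense(50, activation="elu"),
    layers.Dense(1, activation="linear"),
])
model.summary()

# Step 2: compile with MSE loss, MAE and custom metric, Adam optimiser
model.compile(optimizer="adam", loss="mse", metrics=["mae", max_abs_error])
stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=25,
                                     restore_best_weights=True)

# Step 3: train with a validation split (test set kept aside separately)
history = model.fit(X_train, y_train, validation_split=0.15,
                    epochs=30, batch_size=200, callbacks=[stop], verbose=0)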


6.4 Summary of Neural Networks Architecture

TABLE 6.1: Summary of Neural Networks Architecture for Each Application

ANN Model | Application | Input Layer | Hidden Layers | Output Layer | Batch Size | Epochs
Heston Option Pricing ANN | Heston Call Option Pricing | 10 neurons | 3 layers × 50 neurons | 1 neuron | 200 | 30
Heston Implied Volatility ANN | Heston Implied Volatility Surface | 6 neurons | 3 layers × 80 neurons | 100 neurons | 20 | 200
Rough Heston Option Pricing ANN | Rough Heston Call Option Pricing | 9 neurons | 3 layers × 50 neurons | 1 neuron | 200 | 30
Rough Heston Implied Volatility ANN | Rough Heston Implied Volatility Surface | 7 neurons | 3 layers × 80 neurons | 100 neurons | 20 | 200


Chapter 7

Results and Discussions

In this chapter we present our results and discuss our findings.

7.1 Heston Option Pricing ANN Results

Figure 7.1 shows the learning curves of the Heston Option Pricing ANN; the loss function used here is the MSE. As the ANN is trained, both the training loss and the validation loss decline until they converge. It took fewer than 30 epochs for the ANN to reach a satisfactory level of accuracy, and there is only a small gap between the training loss and the validation loss, which indicates the ANN model is well trained.

FIGURE 7.1: Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN

Table 7.1 summarises the error metrics of the Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 2.00 × 10⁻⁶, 9.58 × 10⁻⁴, 6.50 × 10⁻³ and 0.999 respectively. The low error metrics and high R² value imply the ANN model generalises well and is able to give high-accuracy predictions for option prices under the Heston model. Figure 7.2 attests to its high accuracy: almost all of the predictions lie on the perfect-predictions line (red line).

TABLE 7.1: Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data

MSE | MAE | Max Abs Error | R²
2.00 × 10⁻⁶ | 9.58 × 10⁻⁴ | 6.50 × 10⁻³ | 0.999

FIGURE 7.2: Plot of Predicted vs Actual Values for Heston Option Pricing ANN

TABLE 7.2: 5-fold Cross Validation Results of Heston Option Pricing ANN

 | MSE | MAE | Max Abs Error
Fold 1 | 5.78 × 10⁻⁷ | 5.45 × 10⁻⁴ | 2.04 × 10⁻³
Fold 2 | 9.90 × 10⁻⁷ | 7.69 × 10⁻⁴ | 2.40 × 10⁻³
Fold 3 | 7.15 × 10⁻⁷ | 6.21 × 10⁻⁴ | 2.20 × 10⁻³
Fold 4 | 1.24 × 10⁻⁶ | 8.67 × 10⁻⁴ | 2.53 × 10⁻³
Fold 5 | 6.54 × 10⁻⁷ | 5.79 × 10⁻⁴ | 2.21 × 10⁻³
Average | 8.36 × 10⁻⁷ | 6.76 × 10⁻⁴ | 2.27 × 10⁻³

Table 7.2 shows the results from 5-fold cross validation, which again confirm that the ANN model can give accurate predictions and generalises well to unseen data. The values of each fold are consistent with each other and with their average values.


7.2 Heston Implied Volatility ANN Results

Figure 7.3 shows the learning curves of the Heston Implied Volatility ANN; again, we use the MSE as the loss function. We can see that the validation loss curve experiences some fluctuations during training. We experimented with various ANN setups, and this is the least fluctuating learning curve we could achieve. Despite the fluctuations, the overall value of the validation loss declines to a satisfactorily low level, and the gap between the validation loss and the training loss is negligible. Due to the comparatively more complex ANN architecture, it took about 200 epochs for the ANN to reach a satisfactory level of accuracy.

FIGURE 7.3: Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN

Table 7.3 summarises the error metrics of the Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 6.24 × 10^-4, 1.27 × 10^-2, 3.71 × 10^-1 and 0.999 respectively. The low error metrics and high R² value imply that the ANN model generalises well and is able to give accurate predictions for the implied volatility surface under the Heston model. Figure 7.4 shows the ANN predicted volatility smiles and the actual smiles for 10 different maturities. We can see that almost all of the predicted values are consistent with the actual values.

TABLE 7.3: Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data

MSE          | MAE          | Max Abs Error | R²
6.24 × 10^-4 | 1.27 × 10^-2 | 3.71 × 10^-1  | 0.999


FIGURE 7.4: Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.5: Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.5 shows the MAE values of the Heston Implied Volatility ANN predictions across the entire implied volatility surface. The largest MAE value is 0.0045, and a large area of the surface has MAE values lower than 0.0030. It is worth noting that the accuracy is relatively poor when the strike price is at the extremes and the maturity is short. Overall, however, the ANN gives good predictions across the whole surface.

The 5-fold cross validation results for the Heston Implied Volatility ANN model show consistent values. The results are summarised in Table 7.4. The consistency in values indicates the ANN model is well trained and is able to generalise to unseen test data.

TABLE 7.4: 5-fold Cross Validation Results of Heston Implied Volatility ANN

        | MSE          | MAE          | Max Abs Error
Fold 1  | 8.15 × 10^-4 | 1.13 × 10^-2 | 3.88 × 10^-1
Fold 2  | 4.74 × 10^-4 | 1.08 × 10^-2 | 3.13 × 10^-1
Fold 3  | 1.37 × 10^-3 | 1.74 × 10^-2 | 4.46 × 10^-1
Fold 4  | 3.38 × 10^-4 | 8.61 × 10^-3 | 2.19 × 10^-1
Fold 5  | 4.60 × 10^-4 | 1.02 × 10^-2 | 2.67 × 10^-1
Average | 6.92 × 10^-4 | 1.12 × 10^-2 | 3.27 × 10^-1

7.3 Rough Heston Option Pricing ANN Results

Figure 7.6 shows the learning curves of the Rough Heston Option Pricing ANN. As before, MSE is used as the loss function. As training progresses, both the training loss and the validation loss decline until they converge. Since this is a relatively simple model, it took fewer than 30 epochs of training to reach a good level of accuracy. The small discrepancy between the training loss and the validation loss indicates the ANN model is well trained.

FIGURE 7.6: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN

Table 7.5 summarises the error metrics of the Rough Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 9.00 × 10^-6, 2.28 × 10^-3, 1.76 × 10^-1 and 0.999 respectively. These metrics imply that the ANN model generalises well and is able to give highly accurate predictions for option prices under the rough Heston model. Figure 7.7 attests to its high accuracy again: most of the predictions lie on the perfect-prediction line (red line).

TABLE 7.5: Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data

MSE          | MAE          | Max Abs Error | R²
9.00 × 10^-6 | 2.28 × 10^-3 | 1.76 × 10^-1  | 0.999


FIGURE 7.7: Plot of Predicted vs Actual Values for Rough Heston Pricing ANN

TABLE 7.6: 5-fold Cross Validation Results of Rough Heston Option Pricing ANN

        | MSE          | MAE          | Max Abs Error
Fold 1  | 3.79 × 10^-6 | 1.35 × 10^-3 | 5.99 × 10^-3
Fold 2  | 2.33 × 10^-6 | 9.96 × 10^-4 | 4.89 × 10^-3
Fold 3  | 2.06 × 10^-6 | 9.88 × 10^-4 | 4.41 × 10^-3
Fold 4  | 2.16 × 10^-6 | 1.08 × 10^-3 | 4.07 × 10^-3
Fold 5  | 1.84 × 10^-6 | 1.01 × 10^-3 | 3.74 × 10^-3
Average | 2.44 × 10^-6 | 1.08 × 10^-3 | 4.62 × 10^-3

The highly consistent values from 5-fold cross validation again confirm that the ANN model is able to generalise to unseen data.

7.4 Rough Heston Implied Volatility ANN Results

Figure 7.8 shows the learning curves of the Rough Heston Implied Volatility ANN. We can see that the validation loss curve experiences some fluctuations during training; again, we select the least fluctuating setup. Despite the fluctuations, the overall value of the validation loss declines to a satisfactorily low level, and the discrepancy between the validation loss and the training loss is negligible. Initially the training was set to 200 epochs, but it stopped early at around epoch 85 since the validation loss had not improved for 25 epochs. It achieves good accuracy nonetheless.


FIGURE 7.8: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN

Table 7.7 summarises the error metrics of the Rough Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 5.66 × 10^-4, 1.08 × 10^-2, 3.49 × 10^-1 and 0.999 respectively. These metrics show that the ANN model can produce accurate predictions on unseen data. This is again confirmed by Figure 7.9, which shows the ANN predicted volatility smiles and the actual smiles for 10 different maturities. We can see that almost all of the predicted values are consistent with the actual values.

TABLE 7.7: Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data

MSE          | MAE          | Max Abs Error | R²
5.66 × 10^-4 | 1.08 × 10^-2 | 3.49 × 10^-1  | 0.999


FIGURE 7.9: Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.10: Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.10 shows the MAE values of the Rough Heston Implied Volatility ANN predictions across the entire implied volatility surface. The largest MAE value is below 0.0030, and a large area of the surface has MAE values lower than 0.0015. It is worth noting that the accuracy is relatively poor when the strike price and maturity values are small. Overall, however, the ANN gives good predictions across the whole surface. In fact, it is more accurate than the Heston Implied Volatility ANN model, despite having one extra input and a similar architecture. We think this is due to the stochastic nature of the optimiser.

The 5-fold cross validation results for the Rough Heston Implied Volatility ANN model show consistent values, just like the previous three cases. The results are summarised in Table 7.8. The consistency in values indicates the ANN model is well trained and is able to generalise to unseen test data.

TABLE 7.8: 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN

        | MSE          | MAE          | Max Abs Error
Fold 1  | 8.15 × 10^-4 | 1.13 × 10^-2 | 3.88 × 10^-1
Fold 2  | 4.74 × 10^-4 | 1.08 × 10^-2 | 3.13 × 10^-1
Fold 3  | 1.37 × 10^-3 | 1.74 × 10^-2 | 4.46 × 10^-1
Fold 4  | 3.38 × 10^-4 | 8.61 × 10^-3 | 2.19 × 10^-1
Fold 5  | 4.60 × 10^-4 | 1.02 × 10^-2 | 2.67 × 10^-1
Average | 6.92 × 10^-4 | 1.12 × 10^-2 | 3.27 × 10^-1


7.5 Run-time Performance

In this section, we present the training time and the execution speed of the ANN versus traditional methods. The speed tests are run on a machine with an Intel i5-7200U chip (up to 3.10 GHz) and 8 GB of RAM.

Table 7.9 summarises the training time required for each application. Although the training time per epoch for the option pricing applications is higher than that for the implied volatility applications, the resulting total training time is similar for both types of applications, because the implied volatility ANN models are trained for a larger number of epochs.

We also emphasise that fewer distinct rows of data are available for implied volatility ANN training. Thus the training time is similar to that of the option pricing ANNs despite the more complex ANN architecture.

TABLE 7.9: Training Time

Application                | Training Time per Epoch | Total Training Time
Heston Option Pricing      | 30 s                    | 900 s
Heston Implied Volatility  | 5 s                     | 1000 s
rHeston Option Pricing     | 30 s                    | 900 s
rHeston Implied Volatility | 5 s                     | 1000 s

TABLE 7.10: Execution Time for Predictions using ANN and Traditional Methods

Application                | Traditional Methods | ANN
Heston Option Pricing      | 8.84 ms             | 57.9 ms
Heston Implied Volatility  | 2.31 ms             | 43.7 ms
rHeston Option Pricing     | 6.52 ms             | 52.6 ms
rHeston Implied Volatility | 2.87 ms             | 48.3 ms

Table 7.10 summarises the execution time for generating predictions using the ANN and the traditional methods. Before the experiment, we expected the ANN to generate predictions faster. However, the results show that the ANN is actually up to 7 times slower than the traditional methods for option pricing (both Heston and rough Heston), and up to 18 times slower than the traditional methods for implied volatility surface approximation (both Heston and rough Heston). This shows the traditional approximation methods are more efficient.
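The execution times in Table 7.10 can be measured with a simple timing harness of the following kind; this is a minimal sketch in which heston_price_fft and model.predict stand in for the traditional pricer and the trained ANN respectively, and both names are placeholders rather than the exact functions used in this project.

    import time

    def time_call(fn, *args, n_repeats=100):
        """Return the average wall-clock time of fn(*args) over n_repeats calls."""
        start = time.perf_counter()
        for _ in range(n_repeats):
            fn(*args)
        return (time.perf_counter() - start) / n_repeats

    # Hypothetical usage: compare the traditional method with the trained ANN
    # t_trad = time_call(heston_price_fft, params)
    # t_ann  = time_call(model.predict, x_test)
    # print(f"traditional: {1e3 * t_trad:.2f} ms, ANN: {1e3 * t_ann:.2f} ms")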


Chapter 8

Conclusions and Outlook

To summarise, the results show that an ANN is able to approximate option prices and implied volatilities under the Heston and rough Heston models to a high degree of accuracy. However, the efficiency of the ANN is not as good as that of the traditional methods. Hence we conclude that such an approach may be considered an ineffective alternative. There are more efficient methods, such as the numerical integration of the FFT, for option price computation. The traditional method we used for approximating implied volatility does not even require any numerical scheme, hence its efficiency.

As for future projects, it would be interesting to investigate the proven rough Bergomi model and its applications to exotic options such as lookback and digital options. Moreover, it would also be worthwhile to apply gradient-free optimisers for ANN training; some interesting examples include the differential evolution, dual annealing and simplicial homology global optimisers.


Appendix A

pybind11: A step-by-step tutorial with examples

This tutorial assumes basic knowledge of C++ and Python.

Sources: Create a C++ extension for Python; Using the C++ eigen library to calculate matrix inverse and determinant; pybind11 Documentation; Eigen: The Matrix Class.

A.1 Software Prerequisites

• Visual Studio 2017 or later, with both the Desktop Development with C++ and Python Development workloads installed with default options.

• In the Python Development workload, also select the box on the right for Python native development tools. This option sets up most of the configuration required. (This option also includes the C++ workload automatically.)

Source: Create a C++ extension for Python


A.2 Create a Python Project

1. To create a Python project in Visual Studio, select File > New > Project. Search for Python, select the Python Application template, give it a suitable name and location, and select OK.

2. pybind11 requires that you use a 32-bit Python interpreter (Python 3.6 or above recommended). In the Solution Explorer window of Visual Studio, expand the project node, then expand the Python Environments node. If you don't see a 32-bit environment as the default (either in bold or labeled with "global default"), then right-click the Python Environments node and select Add Environment, or select Add Environment from the environment drop-down in the Python toolbar.

Once in the Add Environment dialog box, select the Existing environment tab, then select the 32-bit interpreter from the Environment drop-down list.

If you don't have a 32-bit interpreter installed, you can install standard Python interpreters from the Add Environment dialog: select the Add Environment command in the Python Environments window or the Python toolbar, select the Python installation tab, indicate which interpreters to install, and select Install.


If you already added an environment other than the global default to a project, you may need to activate a newly added environment. Right-click that environment under the Python Environments node and select Activate Environment. To remove an environment from the project, select Remove.


A.3 Create a C++ Project

1. A Visual Studio solution can contain both Python and C++ projects together. To do so, right-click the solution in Solution Explorer and select Add > New Project.

2. Search on "C++", select Empty project, specify the name "test", and select OK.

3. Create a C++ file in the new project by right-clicking the Source Files node, then select Add > New Item, select C++ File, name it main.cpp, and select OK.

4. Right-click the C++ project in Solution Explorer and select Properties.

5. At the top of the Property Pages dialog that appears, set Configuration to All Configurations and Platform to Win32.


6. Set the specific properties as described in the following table, then select OK.

7. Right-click the C++ project and select Build to test your configurations (both Debug and Release). The .pyd files are located in the solution folder under Debug and Release, not the C++ project folder itself.

8. Add the following code to the C++ project's main.cpp file:

    #include <iostream>

    int main() {
        std::cout << "Hello World from C++" << std::endl;
        return 0;
    }
9. Build the C++ project again to confirm the code is working.

A.4 Convert the C++ project to an extension for Python

To make the C++ DLL into an extension for Python, first modify the exported methods to interact with Python types. Then add a function that exports the module (the main function).

1. Install pybind11 using pip:

    pip install pybind11

or

    py -m pip install pybind11

2. At the top of main.cpp, include pybind11.h:

    #include <pybind11/pybind11.h>


3. At the bottom of main.cpp, use the PYBIND11_MODULE macro to define the entry point to the C++ function:

    namespace py = pybind11;

    PYBIND11_MODULE(test, m) {
        m.def("main_func", &main, R"pbdoc(
            pybind11 simple example
        )pbdoc");

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

4. Set the target configuration to Release and build the C++ project to verify your code.

The C++ module may fail to compile for the following reasons:

• Unable to locate Python.h (E1696: cannot open source file "Python.h" and/or C1083: Cannot open include file: "Python.h": No such file or directory): verify that the path in C/C++ > General > Additional Include Directories in the project properties points to your Python installation's include folder. See the table in Step 6.

• Unable to locate Python libraries: verify that the path in Linker > General > Additional Library Directories in the project properties points to the Python installation's libs folder. See the table in Step 6.

• Linker errors related to target architecture: change the C++ target's project architecture to match that of your Python installation. For example, if you're targeting x64 with the C++ project but your Python installation is x86, change the C++ project to target x86.

A.5 Make the DLL available to Python

1. In Solution Explorer, right-click the References node in the Python project, and then select Add Reference. In the dialog that appears, select the Projects tab, select the test project, and then select OK.


2. Run the Visual Studio installer, select Modify, then select Individual Components > Compilers, build tools, and runtimes > Visual C++ 2015.3 v140 toolset. This step is necessary because Python (for Windows) is itself built with Visual Studio 2015 (version 14.0) and expects that those tools are available when building an extension through the method described here. (Note that you may need to install a 32-bit version of Python and target the DLL to Win32 and not x64.)


3. Create a file named setup.py in the C++ project by right-clicking the project and selecting Add > New Item. Then select C++ File (.cpp), name the file setup.py, and select OK (naming the file with the .py extension makes Visual Studio recognize it as Python despite using the C++ file template).

Then paste the following code into the setup.py file:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add pybind11 folder and Python include folder
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

4. The setup.py code instructs Python to build the extension using the Visual Studio 2015 C++ toolset when used from the command line.

Open an elevated command prompt, navigate to the folder that contains setup.py, and enter the following command:

    python setup.py install

A.6 Call the DLL from Python

After the DLL is made available to Python, we can now call test.main_func() from Python code.

1. Add the following lines in your .py file to call methods exported from the DLL:


    import test

    if __name__ == "__main__":
        test.main_func()

2. Run the Python program (Debug > Start without Debugging). The output should look like the following:

A.7 Example: Store C++ Eigen Matrix as NumPy Arrays

This example requires the use of Eigen, a C++ template library. Due to its popularity, pybind11 provides transparent conversion and limited mapping support between Eigen and scientific Python linear algebra data types.

1. If you have not already, visit http://eigen.tuxfamily.org/index.php?title=Main_Page and click on "zip" to download the zip file.

Once the download is complete, decompress the zip file in a suitable location, and note down the file path containing the Eigen folders.


2. In the Solution Explorer, right-click the C++ project and select Properties. Navigate to the C/C++ > General tab, add the Eigen path to the Additional Include Directories property, then select OK.

3. Right-click the C++ project and select Build to test your configurations (both Debug and Release).

4. Replace the code in the main.cpp file with the following code:

    #include <iostream>

    // pybind11
    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>

    // Eigen
    #include <Eigen/Dense>

    namespace py = pybind11;

    Eigen::Matrix3f example_mat(const int a, const int b, const int c) {
        Eigen::Matrix3f mat;
        mat << 5 * a, 1 * b, 2 * c,
               2 * a, 4 * b, 2 * c,
               1 * a, 3 * b, 3 * c;
        return mat;
    }

    Eigen::MatrixXd inv(Eigen::MatrixXd xs) {
        return xs.inverse();
    }

    double det(Eigen::MatrixXd xs) {
        return xs.determinant();
    }

    PYBIND11_MODULE(test, m) {
        m.def("example_mat", &example_mat);
        m.def("inv", &inv);
        m.def("det", &det);

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

Note that Eigen arrays are automatically converted to NumPy arrays simply by including the pybind11/eigen.h header. A C++ function that has an Eigen return type can be used directly as a NumPy data type in Python. Build the C++ project to check that the code is working correctly.

5. Include the Eigen library's directory. Replace the code in setup.py with the following code:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add pybind11 folder, Python include folder and Eigen folder
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                      r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

As before, in an elevated command prompt, navigate to the folder that contains setup.py and enter the following command:

    python setup.py install

6. Then paste the following code in the Python file:

    import test
    import numpy as np

    if __name__ == "__main__":
        A = test.example_mat(1, 3, 1)
        print("Matrix A is:\n", A)
        print("The determinant of A is:\n", test.det(A))
        print("The inverse of A is:\n", test.inv(A))


The output should be:

A.8 Example: Store Data Generated by the Analytical Heston Model as NumPy Arrays

Only the relevant part of the code is displayed here. To request the code for the entire project, contact the author at cko22@bath.edu.

This example demonstrates how to use pybind11 for:

• a C++ project with multiple files;

• storing a C++ Eigen matrix as NumPy arrays.

1. For a multiple-file C++ project, we need to include the source files in setup.py. This example also uses the Eigen library, so its directory will also be included. Paste the following code in setup.py:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'Data_Generation',
        sources=[r'module.cpp', r'Data.cpp', r'Heston.cpp',
                 r'IntegrationScheme.cpp', r'IntegratorImp.cpp',
                 r'NumIntegrator.cpp', r'pdetfunction.cpp',
                 r'pfunction.cpp', r'Range.cpp'],
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                      r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='Data_Generation',
        version='1.0',
        description='Python package with Data_Generation C++ extension (PyBind11)',
        license='MIT',
        ext_modules=[sfc_module],
    )

Then go to the folder that contains setup.py in a command prompt and enter the following command:

    python setup.py install

2. The code in module.cpp is:

    // For the analytic solution in the case of the Heston model
    #include "Heston.hpp"
    #include <ctime>
    #include "Data.hpp"
    #include <future>

    // Inputting and Outputting
    #include <iostream>
    #include <vector>

    // pybind11
    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>
    #include <pybind11/stl_bind.h>

    // Eigen
    #include <Eigen/Dense>

    namespace py = pybind11;

    struct CPPData {
        // Input Data
        std::vector<double> spot_price;
        std::vector<double> strike_price;
        std::vector<double> risk_free_rate;
        std::vector<double> dividend_yield;
        std::vector<double> initial_vol;
        std::vector<double> maturity_time;
        std::vector<double> long_term_vol;
        std::vector<double> mean_reversion_rate;
        std::vector<double> vol_vol;
        std::vector<double> price_vol_corr;

        // Output Data
        std::vector<double> option_price;
    };

    // ... do something ...

    // simulation() and SIZE are provided by the project's other source files
    Eigen::MatrixXd module() {
        CPPData cppdata;
        simulation(cppdata);

        // Convert each std::vector to an Eigen vector first
        // Input Data
        Eigen::Map<Eigen::VectorXd> eigen_spot_price(cppdata.spot_price.data(), cppdata.spot_price.size());
        Eigen::Map<Eigen::VectorXd> eigen_strike_price(cppdata.strike_price.data(), cppdata.strike_price.size());
        Eigen::Map<Eigen::VectorXd> eigen_risk_free_rate(cppdata.risk_free_rate.data(), cppdata.risk_free_rate.size());
        Eigen::Map<Eigen::VectorXd> eigen_dividend_yield(cppdata.dividend_yield.data(), cppdata.dividend_yield.size());
        Eigen::Map<Eigen::VectorXd> eigen_initial_vol(cppdata.initial_vol.data(), cppdata.initial_vol.size());
        Eigen::Map<Eigen::VectorXd> eigen_maturity_time(cppdata.maturity_time.data(), cppdata.maturity_time.size());
        Eigen::Map<Eigen::VectorXd> eigen_long_term_vol(cppdata.long_term_vol.data(), cppdata.long_term_vol.size());
        Eigen::Map<Eigen::VectorXd> eigen_mean_reversion_rate(cppdata.mean_reversion_rate.data(), cppdata.mean_reversion_rate.size());
        Eigen::Map<Eigen::VectorXd> eigen_vol_vol(cppdata.vol_vol.data(), cppdata.vol_vol.size());
        Eigen::Map<Eigen::VectorXd> eigen_price_vol_corr(cppdata.price_vol_corr.data(), cppdata.price_vol_corr.size());
        // Output Data
        Eigen::Map<Eigen::VectorXd> eigen_option_price(cppdata.option_price.data(), cppdata.option_price.size());

        const int cols{11};           // No. of columns
        const int rows = SIZE / 4;
        // Define a matrix that stores the training data to be used in Python
        Eigen::MatrixXd training(rows, cols);
        // Initialise each column using the vectors
        training.col(0) = eigen_spot_price;
        training.col(1) = eigen_strike_price;
        training.col(2) = eigen_risk_free_rate;
        training.col(3) = eigen_dividend_yield;
        training.col(4) = eigen_initial_vol;
        training.col(5) = eigen_maturity_time;
        training.col(6) = eigen_long_term_vol;
        training.col(7) = eigen_mean_reversion_rate;
        training.col(8) = eigen_vol_vol;
        training.col(9) = eigen_price_vol_corr;
        training.col(10) = eigen_option_price;

        std::cout << training << std::endl;

        return training;
    }

    PYBIND11_MODULE(Data_Generation, m) {
        m.def("module", &module);

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

Note that the module function's return type is an Eigen matrix. This can then be used directly as a NumPy data type in Python.

3. The C++ function can then be called from Python as shown:

    import Data_Generation as dg
    import mysql.connector
    import pandas as pd
    import sqlalchemy as db
    import time

    # This script calls the analytical Heston option pricer in C++ to simulate
    # 5,000,000 option prices and store them in a MySQL database

    # ... do something ...

    if __name__ == "__main__":
        t0 = time.time()

        # A new table (optional)
        SQL_clean_table()

        target_size = 5000000   # Final table size
        batch_size = 500000     # Size generated in each iteration

        # Get the number of rows that already exist in the table
        r = SQL_setup(batch_size)

        # Get the number of iterations
        iter = int(target_size / batch_size) - int(r / batch_size)
        print("Iteration(s) required:", iter)
        count = 0

        print("Now Run C++ code ...")
        for i in range(iter):
            count = count + 1
            print("----------------------------------------------------")
            print("Current iteration:", count)
            cpp = dg.module()  # store data from C++

            # ... do something ...

            r1 = SQL_setup(batch_size)
            print("No. of rows in SQL Classic Heston Table Now:", r1)
            print("\n")
        t1 = time.time()
        # Calculate the total time for the entire program
        total = t1 - t0
        print("THE ENTIRE PROGRAM TAKES", round(total), "s TO COMPLETE")


Appendix B

Asymptotic Expansions for General Stochastic Volatility Models

B.1 Asymptotic Expansions

We present the asymptotic expansions for general stochastic volatility models by (Lorig et al., 2019). Consider a strictly positive spot price S whose risk-neutral dynamics are given by

S = e^X,  (B.1)

dX = -\frac{1}{2}\sigma^2(t, X, Y)\,dt + \sigma(t, X, Y)\,dW_1, \quad X(0) = x \in \mathbb{R},  (B.2)

dY = f(t, X, Y)\,dt + \beta(t, X, Y)\,dW_2, \quad Y(0) = y \in \mathbb{R},  (B.3)

d\langle W_1, W_2 \rangle = \rho(t, X, Y)\,dt, \quad |\rho| < 1.  (B.4)

The drift of X is selected to be -\frac{1}{2}\sigma^2 so that S = e^X is a martingale.
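As a quick sanity check (a standard computation, not part of the original exposition in (Lorig et al., 2019)), applying Itô's formula to S = e^X shows why this drift makes the dt terms cancel:

d(e^X) = e^X\,dX + \tfrac{1}{2} e^X\,d\langle X \rangle = e^X\left(-\tfrac{1}{2}\sigma^2\,dt + \sigma\,dW_1\right) + \tfrac{1}{2} e^X \sigma^2\,dt = \sigma e^X\,dW_1,

so S is driftless and hence a (local) martingale.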

Let C(t) be the time-t value of a European call option maturing at time T > t with payoff function P(X(T)). To value a European-style option using risk-neutral pricing, we must compute functions of the form

u(t, x, y) = \mathbb{E}[P(X(T)) \mid X(t) = x, Y(t) = y].

It is well known that the function u satisfies the Kolmogorov backward equation (Lorig et al., 2019)

(\partial_t + \mathcal{A}(t))\,u(t, x, y) = 0, \quad u(T, x, y) = P(x),  (B.5)

where the operator \mathcal{A}(t) is given explicitly by

\mathcal{A}(t) = a(t, x, y)(\partial_x^2 - \partial_x) + f(t, x, y)\,\partial_y + b(t, x, y)\,\partial_y^2 + c(t, x, y)\,\partial_x \partial_y,  (B.6)

and where the functions a, b and c are defined as

a(t, x, y) = \frac{1}{2}\sigma^2(t, x, y),

b(t, x, y) = \frac{1}{2}\beta^2(t, x, y),

c(t, x, y) = \rho(t, x, y)\,\sigma(t, x, y)\,\beta(t, x, y).


Appendix C

Deep Neural Networks Library in Python: Keras

We show the code snippet for computing the ANN in Keras: the Python code for defining the ANN architecture for rough Heston implied volatility approximation.

    # Define the neural network architecture
    import keras
    from keras import backend as K
    keras.backend.set_floatx('float64')
    from keras.callbacks import EarlyStopping

    # Construct the model
    input_layer = keras.layers.Input(shape=(n,))

    hidden_layer1 = keras.layers.Dense(80, activation='elu')(input_layer)
    hidden_layer2 = keras.layers.Dense(80, activation='elu')(hidden_layer1)
    hidden_layer3 = keras.layers.Dense(80, activation='elu')(hidden_layer2)

    output_layer = keras.layers.Dense(100, activation='linear')(hidden_layer3)

    model = keras.models.Model(inputs=input_layer, outputs=output_layer)

    # Summarize layers
    model.summary()

    # User-defined metrics: R2 and Max Abs Error
    # R2
    def coeff_determination(y_true, y_pred):
        SS_res = K.sum(K.square(y_true - y_pred))
        SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
        return (1 - SS_res / (SS_tot + K.epsilon()))

    # Max Error
    def max_error(y_true, y_pred):
        return (K.max(abs(y_true - y_pred)))

    earlystop = EarlyStopping(monitor="val_loss",
                              min_delta=0,
                              mode="min",
                              verbose=1,
                              patience=25)

    # Prepare the model for training
    opt = keras.optimizers.Adam(learning_rate=0.005)
    model.compile(loss='mse', optimizer='adam',
                  metrics=['mae', coeff_determination, max_error])
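Once the model is compiled, training proceeds by calling fit with the early-stopping callback defined above. The following is a minimal sketch; x_train, y_train, x_val and y_val are placeholders for the scaled training and validation sets described in Chapter 5, and the batch size and epoch count match Table 6.1 for the implied volatility ANNs.

    # Train the model with early stopping on the validation loss
    history = model.fit(x_train, y_train,
                        batch_size=20,
                        epochs=200,
                        validation_data=(x_val, y_val),
                        callbacks=[earlystop],
                        verbose=1)

    # The recorded losses can then be plotted as in Figures 7.3 and 7.8
    train_loss = history.history['loss']
    val_loss = history.history['val_loss']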


Bibliography

Alòs, Elisa et al. (June 2017). "Exponentiation of Conditional Expectations Under Stochastic Volatility". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2983180

Andersen, Leif et al. (Jan 2007). "Moment Explosions in Stochastic Volatility Models". In: Finance and Stochastics 11, pp. 29-50. DOI: 10.1007/s00780-006-0011-7

Atkinson, Colin et al. (Jan 2011). "Rational Solutions for the Time-Fractional Diffusion Equation". URL: https://www.researchgate.net/publication/220222966_Rational_Solutions_for_the_Time-Fractional_Diffusion_Equation

Bayer, Christian et al. (Oct 2018). "Deep calibration of rough stochastic volatility models". URL: https://arxiv.org/abs/1810.03399

Black, Fischer et al. (May 1973). "The Pricing of Options and Corporate Liabilities". URL: https://www.cs.princeton.edu/courses/archive/fall09/cos323/papers/black_scholes73.pdf

Carr, Peter and Dilip B. Madan (Mar 1999). "Option valuation using the fast Fourier transform". In: Journal of Computational Finance. URL: https://www.researchgate.net/publication/2519144_Option_Valuation_Using_the_Fast_Fourier_Transform

Chapter 2 Modeling Process: k-fold cross validation. https://bradleyboehmke.github.io/HOML/process.html. Accessed 2020-08-22

Comte, Fabienne et al. (Oct 1998). "Long Memory in Continuous-Time Stochastic Volatility Models". In: Mathematical Finance 8, pp. 291-323. URL: https://zulfahmed.files.wordpress.com/2015/10/comterenault19981.pdf

Cox, John C. et al. (Jan 1985). "A Theory of the Term Structure of Interest Rates". URL: http://pages.stern.nyu.edu/~dbackus/BCZ/discrete_time/CIR_Econometrica_85.pdf

Create a C++ extension for Python. https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019. Accessed 2020-07-21

Duffy, Daniel J. (Jan 2018). Financial Instrument Pricing Using C++, Second Edition. John Wiley & Sons. ISBN: 978-0-470-97119-2

Duffy, Daniel J. et al. (2012). Monte Carlo Frameworks: Building customisable high-performance C++ applications. John Wiley & Sons, Inc. ISBN: 978-0470060698

Dupire, Bruno (1994). "Pricing with a Smile". In: Risk Magazine, pp. 18-20

Eigen: The Matrix Class. https://eigen.tuxfamily.org/dox/group__TutorialMatrixClass.html. Accessed 2020-07-22

Eldan, Ronen et al. (May 2016). "The Power of Depth for Feedforward Neural Networks". URL: https://arxiv.org/abs/1512.03965

Euch, Omar El et al. (Sept 2016). "The characteristic function of rough Heston models". URL: https://arxiv.org/pdf/1609.02108.pdf

— (Mar 2019). "Roughening Heston". URL: https://ssrn.com/abstract=3116887

Fouque, Jean-Pierre et al. (Aug 2012). "Second Order Multiscale Stochastic Volatility Asymptotics: Stochastic Terminal Layer Analysis & Calibration". URL: https://arxiv.org/abs/1208.5802

Gatheral, Jim (2006). The Volatility Surface: A Practitioner's Guide. John Wiley & Sons, Inc. ISBN: 978-0-471-79251-2

Gatheral, Jim et al. (Oct 2014). "Volatility is rough". URL: https://arxiv.org/abs/1410.3394

— (Jan 2019). "Rational approximation of the rough Heston solution". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3191578

Hebb, Donald O. (1949). The Organization of Behavior: A Neuropsychological Theory. Psychology Press, 1st edition (12 Jun 2002). ISBN: 978-0805843002

Heston, Steven (Jan 1993). "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options". URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.3204&rep=rep1&type=pdf

Hinton, Geoffrey et al. (Feb 2015). "Overview of mini-batch gradient descent". URL: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

Hornik et al. (Mar 1989). "Multilayer feedforward networks are universal approximators". URL: https://deeplearning.cs.cmu.edu/F20/document/readings/Hornik_Stinchcombe_White.pdf

— (Jan 1990). "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks". URL: http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/univ_approx.pdf

Horvath, Blanka et al. (Jan 2019). "Deep Learning Volatility". URL: https://ssrn.com/abstract=3322085

Hurst, Harold E. (Jan 1951). "Long-term storage of reservoirs: an experimental study". URL: https://www.jstor.org/stable/2982267

Hutchinson, James M. et al. (July 1994). "A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks". URL: https://alo.mit.edu/wp-content/uploads/2015/06/A-Nonparametric-Approach-to-Pricing-and-Hedging-Derivative-Securities-via-Learning-Networks.pdf

Ioffe, Sergey et al. (Feb 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". URL: https://arxiv.org/abs/1502.03167

Itkin, Andrey (Jan 2010). "Pricing options with VG model using FFT". URL: https://arxiv.org/abs/physics/0503137

Kahl, Christian et al. (Jan 2006). "Not-so-complex Logarithms in the Heston model". URL: http://www2.math.uni-wuppertal.de/~kahl/publications/NotSoComplexLogarithmsInTheHestonModel.pdf

Kingma, Diederik P. et al. (Feb 2015). "Adam: A Method for Stochastic Optimization". URL: https://arxiv.org/abs/1412.6980

Kwok, Yue-Kuen et al. (2012). Handbook of Computational Finance, Chapter 21. Springer-Verlag Berlin Heidelberg. ISBN: 978-3-642-17254-0

Lewis, Alan (Sept 2001). "A Simple Option Formula for General Jump-Diffusion and Other Exponential Levy Processes". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=282110

Liu, Shuaiqiang et al. (Apr 2019a). "A neural network-based framework for financial model calibration". URL: https://arxiv.org/abs/1904.10523

— (Apr 2019b). "Pricing options and computing implied volatilities using neural networks". URL: https://arxiv.org/abs/1901.08943

Lorig, Matthew et al. (Jan 2019). "Explicit implied vols for multifactor local-stochastic vol models". URL: https://arxiv.org/abs/1306.5447v3

Mandara, Dalvir (Sept 2019). "Artificial Neural Networks for Black-Scholes Option Pricing and Prediction of Implied Volatility for the SABR Stochastic Volatility Model". URL: https://www.datasim.nl/application/files/8115/7045/4929/1423101.pdf

McCulloch, Warren et al. (May 1943). "A Logical Calculus of the Ideas Immanent in Nervous Activity". URL: http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf

Mostafa, F. et al. (May 2008). "A neural network approach to option pricing". URL: https://www.witpress.com/elibrary/wit-transactions-on-information-and-communication-technologies/41/18904

Peters, Edgar E. (Jan 1991). Chaos and Order in the Capital Markets: A New View of Cycles, Prices, and Market Volatility. John Wiley & Sons. ISBN: 978-0-471-13938-6

— (Jan 1994). Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons. ISBN: 978-0-471-58524-4. URL: https://www.amazon.co.uk/Fractal-Market-Analysis-Investment-Economics/dp/0471585246

pybind11 Documentation. https://pybind11.readthedocs.io/en/stable/advanced/cast/eigen.html. Accessed 2020-07-22

Ruf, Johannes et al. (May 2020). "Neural networks for option pricing and hedging: a literature review". URL: https://arxiv.org/abs/1911.05620

Schmelzle, Martin (Apr 2010). "Option Pricing Formulae using Fourier Transform: Theory and Application". URL: https://pfadintegral.com/docs/Schmelzle2010%20Fourier%20Pricing.pdf

Shevchenko, Georgiy (Jan 2015). "Fractional Brownian motion in a nutshell". In: International Journal of Modern Physics: Conference Series 36. URL: https://www.worldscientific.com/doi/abs/10.1142/S2010194515600022

Stone, Henry (July 2019). "Calibrating Rough Volatility Models: A Convolutional Neural Network Approach". URL: https://arxiv.org/abs/1812.05315

Using the C++ eigen library to calculate matrix inverse and determinant. http://people.duke.edu/~ccc14/sta-663-2016/18G_C++_Python_pybind11.html. Accessed 2020-07-22

Wilmott, Paul (2006). Paul Wilmott on Quantitative Finance, Vol. 3. John Wiley & Sons, Inc., 2nd Edition. ISBN: 978-0-470-01870-5


List of Figures

1 Implementation Flow Chart
2 Data flow diagram for ANN on Rough Heston Pricer
1.1 Problem Solving Process
3.1 Paths of fBm for different values of H. Adapted from (Shevchenko, 2015)
3.2 Implied Volatility Smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019)
4.1 A Simple Neural Network
4.2 The pixels of the implied volatility surface are represented by the outputs of the neural network
5.1 How the data is partitioned for 5-fold cross validation. Adapted from (Chapter 2 Modeling Process: k-fold cross validation)
6.1 Typical ANN Architecture for Option Pricing Application
6.2 Typical ANN Architecture for Implied Volatility Surface Approximation
7.1 Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN
7.2 Plot of Predicted vs Actual Values for Heston Option Pricing ANN
7.3 Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN
7.4 Heston Implied Volatility Smiles: ANN Predictions vs Actual Values
7.5 Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data
7.6 Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN
7.7 Plot of Predicted vs Actual Values for Rough Heston Pricing ANN
7.8 Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN
7.9 Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values
7.10 Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data

List of Tables

5.1 Heston Model Call Option Data
5.2 Heston Model Implied Volatility Data
5.3 Rough Heston Model Call Option Data
5.4 Rough Heston Model Implied Volatility Data
6.1 Summary of Neural Networks Architecture for Each Application
7.1 Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data
7.2 5-fold Cross Validation Results of Heston Option Pricing ANN
7.3 Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data
7.4 5-fold Cross Validation Results of Heston Implied Volatility ANN
7.5 Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data
7.6 5-fold Cross Validation Results of Rough Heston Option Pricing ANN
7.7 Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data
7.8 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN
7.9 Training Time
7.10 Execution Time for Predictions using ANN and Traditional Methods

List of Abbreviations

NN    Neural Networks
ANN   Artificial Neural Networks
DNN   Deep Neural Networks
CNN   Convolutional Neural Networks
SGD   Stochastic Gradient Descent
SV    Stochastic Volatility
CIR   Cox, Ingersoll and Ross
fBm   fractional Brownian motion
FSV   Fractional Stochastic Volatility
RFSV  Rough Fractional Stochastic Volatility
FFT   Fast Fourier Transform
ELU   Exponential Linear Unit
ReLU  Rectified Linear Unit


Chapter 0

Project's Overview

This chapter gives a brief overview of the methodology of the project without delving too much into the details.

0.1 Project's Overview

As advised by the supervisor, the problem is divided into smaller, achievable steps, as (Mandara 2019) did. The 4 main steps are:

• Define the objective

• Formulate the procedures

• Implement the procedures

• Evaluate the results

The objective is to investigate the performance of artificial neural networks (ANN) on vanilla option pricing and implied volatility approximation under the rough Heston model. The performance metrics in this context are accuracy and run-time performance.

Figure 1 shows the flow chart of the implementation procedures. The procedure starts with synthetic data generation. Then we split the synthetic data set into 3 portions: one for training (in-sample), one for validation, and the last one for testing (out-of-sample). Before the data is fed into the ANN for training, it needs to be scaled or normalised, as its features have different magnitudes and ranges. In this thesis, we use the training method proposed by (Horvath et al. 2019), the image-based implicit training approach, for implied volatility surface approximation. Then we test the trained ANN model on an unseen data set to evaluate its accuracy.

The run-time performance of the ANN will be compared with that of the traditional approximation methods. This project requires the use of two programming languages, C++ and Python. The data generation process is done in C++. The wide range of machine learning libraries (Keras, TensorFlow, PyTorch) in Python makes it the logical choice for the neural network computations. The experiment starts with the classical Heston model as a concept validation first and then proceeds to the rough Heston model.
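As an illustration of the scaling step, the following is a minimal sketch using scikit-learn's MinMaxScaler; the arrays x_train, x_val and x_test are hypothetical placeholders for the generated model parameters, and this is not necessarily the exact scaling used in the project.

    from sklearn.preprocessing import MinMaxScaler

    # Fit the scaler on the training set only, then apply the same
    # transformation to the validation and test sets to avoid look-ahead bias
    scaler = MinMaxScaler(feature_range=(0, 1))
    x_train_scaled = scaler.fit_transform(x_train)
    x_val_scaled = scaler.transform(x_val)
    x_test_scaled = scaler.transform(x_test)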


FIGURE 1: Implementation Flow Chart


FIGURE 2: Data flow diagram for ANN on Rough Heston Pricer


Chapter 1

Introduction

Option pricing has always been a major research area in financial mathematics. Various methods and techniques have been developed to price options more accurately and more quickly since the introduction of the well-known Black-Scholes model in 1973 (Black et al. 1973). The Black-Scholes model has an analytical solution and only one parameter, the volatility, that requires calibration. This makes it very easy to implement. Its simplicity comes at the expense of many unrealistic assumptions. One of its biggest flaws is the assumption of a constant volatility term. This assumption simply does not hold in the real market. With this assumption, the Black-Scholes model fails to capture the volatility dynamics exhibited by the market and so is unable to value options accurately.

In 1993, (Heston 1993) developed a stochastic volatility model in an attempt to better capture the volatility dynamics. The Heston model assumes the volatility of an underlying asset to follow a geometric Brownian motion (Hurst parameter H = 1/2). Like the Black-Scholes model, the Heston model has a closed-form analytical solution, and this makes it a strong alternative.

Whilst the Heston model is more realistic than the Black-Scholes model, the generated option prices do not match the observed vanilla option prices consistently (Gatheral et al. 2014). Recent research shows that more accurate option prices can be estimated if the volatility process is modelled as a fractional Brownian motion with H < 1/2. When H < 1/2, the volatility process path appears rougher than when H ≥ 1/2, and hence the volatility is described as rough.

(Euch et al. 2019) introduce the rough volatility version of the Heston model, which combines the best of both worlds. However, the calibration of the rough Heston model relies on the notoriously slow Monte Carlo method due to its non-Markovian nature. This obstacle is the reason that rough Heston struggles to find its way into industrial applications. Although there exist many approximation methods for the rough Heston model to help solve this problem, we are interested in the feasibility of machine learning algorithms.

Recent advancements in machine learning algorithms could potentially provide an alternative means for this problem. Specifically, artificial neural networks (ANN) offer the best hope because they are capable of learning complex functional relationships between input and output data and then making predictions instantly.


In this project, our objective is to investigate the accuracy and run-time performance of ANN for European call option price predictions and implied volatility approximations under the rough Heston model. We start off by using the Heston model as a concept validation, then proceed to its rough volatility version. For regulatory purposes, we use data simulated by the option pricing models for training.

1.1 Organisation of This Thesis

This thesis is organised as follows. In Chapter 2, we review some of the related work and provide justifications for our research. In Chapter 3, we discuss the theory of the option pricing models of our interest. We provide a brief introduction to ANN and some techniques to improve their performance in Chapter 4. Then we explain the tools we used for the data generation process and some useful techniques to speed it up in Chapter 5; in the same chapter, we also discuss the important data pre-processing procedures. Chapter 6 shows our network architecture and experimental setups. We present the results and discuss our findings in Chapter 7. Finally, we conclude our findings and discuss potential future work in Chapter 8.

FIGURE 1.1: Problem Solving Process


Chapter 2

Literature Review

In this chapter, we provide a review of the current literature and highlight the research gap. The focus of our review is the application of ANN to option pricing models. We justify our methodology and our choice of option pricing models, and we also provide some critiques.

2.1 Application of ANN in Finance

Option pricing has always been a hot topic in academia and the financial industry. Researchers have been looking for various means to price options more accurately and quickly. As computational technology advances, the application of ANN has gained popularity in solving complex problems in many different industries. In quantitative finance, the idea of utilising ANN for pricing derivatives comes from its ability to make fast and accurate predictions once it is trained.

The application of ANN in option pricing began with (Hutchinson et al. 1994). The authors were the pioneers in applying ANN for pricing and hedging derivatives. Since then, there have been more than 100 research papers on this topic. It is also worth noting that the complexity of the ANN architectures used in these papers has increased over the years. Whilst there has been much research on the applications of ANN in option pricing, only about 10% of papers focus on implied volatility (Ruf et al. 2020). The feasibility of using ANN for implied volatility approximation is equally as important as for option pricing, since they are both part of the calibration process. Recently, more research papers focus on implied volatility and calibration, such as (Mostafa et al. 2008), (Liu et al. 2019a), (Liu et al. 2019b) and (Horvath et al. 2019). A key contribution of (Horvath et al. 2019)'s work is the introduction of the image-based implicit training method for implied volatility surface approximation. This method is allegedly robust and fast for approximating the entire implied volatility surface.

A significant portion of the research papers use real market data in their research. Currently, there is no standardised framework for deciding the architecture and parameters of an ANN. Hence, training the ANN directly with market data might pose some issues when it comes to regulatory compliance. (Horvath et al. 2019) is one of the few papers that uses simulated data for ANN training. The simulated data is generated from the Heston and rough Bergomi models. Using simulated data removes the regulatory concerns, as the functional mapping from model parameters to option price is known and tractable.

The application of ANN requires splitting the entire data set for training, validation and testing. Most of the papers that use real market data ensure the data splitting process is chronological to preserve the time series structure of the data. This is not a concern in our case, as we are using simulated data.

The results in (Horvath et al., 2019) show very accurate predictions. However, after examining the source code, we discover that the ANN is validated and tested on the same data set. Thus the high accuracy results do not give any information about how well the trained ANN generalises to unseen data. In our experiment we will use different data sets for validation and testing.

Common error metrics used include the mean absolute error (MAE), mean absolute percentage error (MAPE) and mean squared error (MSE). We suggest a new metric, the maximum absolute error, as it is useful from a risk management perspective for quantifying the largest prediction error in the worst case scenario.

2.2 Option Pricing Models

In terms of the choice of option pricing models, over 80% of the papers select the Black-Scholes model or its extensions in their research (Ruf et al., 2020). As mentioned before, there are other models that can give better option prices than Black-Scholes, such as the Stochastic Volatility models. (Liu et al., 2019b) investigated the application of ANN in the Heston and Bates models.

According to (Gatheral et al., 2014), modelling the volatility as rough volatility can capture the volatility dynamics in the market better. Rough volatility models struggle to find their way into the industry due to their slow calibration process. Hence it is natural for the direction of ANN research to move towards rough volatility models.

Some of the recent examples include (Stone, 2019), (Bayer et al., 2018) and (Horvath et al., 2019). (Stone, 2019) investigates the calibration of the rough Bergomi model using Convolutional Neural Networks (CNN), whilst (Bayer et al., 2018) study the calibration of the Heston and rough Bergomi models using Deep Neural Networks (DNN). Contrary to these papers, where calibration is performed directly via ANN in one step, (Horvath et al., 2019) split the calibration of the rough Bergomi model into two steps. Firstly, the ANN is trained to learn the pricing equation during a so-called offline session. The trained ANN is then deterministic and stored for the online calibration session later. This approach can speed up the online calibration by a huge margin.

2.3 The Research Gap

In our thesis we will be using the approach in (Horvath et al., 2019), that is, to train the network to learn the pricing and implied volatility functional mappings separately. We leave the model calibration step as the future outlook of this project. We choose to work with the rough Heston model, another popular rough volatility model that has not been investigated yet. We adapt the image-based implicit training method in our research. To re-iterate, we validate and test our ANN with different data sets to obtain a true measure of the ANN's performance on unseen data.


Chapter 3

Option Pricing Models

In this chapter we discuss the option pricing models of our interest: the Heston and rough Heston models. The idea of modelling volatility as a rough fractional Brownian motion will also be presented. Then we derive their pricing and implied volatility equations and also explain their implementations for data generation.

The Black-Scholes model is simple but unrealistic for real world applications as its assumptions simply do not hold. One of its greatest criticisms is the assumption of constant volatility of the underlying asset price. (Dupire, 1994) extended the Black-Scholes framework to give local volatility models. He modelled the volatility as a deterministic function of the underlying price and time, the function being selected such that its outputs match observed European option prices. This extension is time-inhomogeneous and the dynamics it exhibits is still unrealistic. Thus it is not able to produce future volatility surfaces that emulate our observations.

Stochastic Volatility (SV) models were then introduced, in which the volatility of the underlying is itself modelled as a stochastic process. Recent research shows that rough volatility can capture the true volatility dynamics better, and this leads to the rough volatility models.

3.1 The Heston Model

One of the most popular SV models is the one proposed by (Heston, 1993); its popularity is due to the fact that its pricing equation for European options is analytically tractable. This property allows calibration procedures to be done efficiently.

The Heston model for a one-dimensional spot price S follows the stochastic process

\[ dS = \mu S\, dt + \sqrt{\nu}\, S\, dW_1 \tag{3.1} \]

and its instantaneous variance ν follows a Cox, Ingersoll and Ross (CIR) process (Cox et al., 1985):

\[ d\nu = \kappa(\theta - \nu)\, dt + \xi\sqrt{\nu}\, dW_2 \tag{3.2} \]
\[ \langle dW_1, dW_2 \rangle = \rho\, dt \tag{3.3} \]

where


- μ is the (deterministic) rate of return of the spot
- θ is the long term mean volatility
- ξ is the volatility of volatility
- κ is the mean reversion rate of volatility
- ρ is the correlation between spot and volatility moves
- W1 and W2 are Wiener processes

3.1.1 Option Pricing Under Heston Model

In this section we follow (Gatheral, 2006) closely. We start by forming a portfolio Π containing the option being priced V, a quantity −Δ of stocks, and a quantity −Δ1 of another asset whose value is V1:

\[ \Pi = V - \Delta S - \Delta_1 V_1 \]

The change in the portfolio during dt is

\[
\begin{aligned}
d\Pi = {} & \left[\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V}{\partial \nu \partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V}{\partial \nu^2}\right] dt \\
& - \Delta_1 \left[\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V_1}{\partial \nu \partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V_1}{\partial \nu^2}\right] dt \\
& + \left[\frac{\partial V}{\partial S} - \Delta_1 \frac{\partial V_1}{\partial S} - \Delta\right] dS + \left[\frac{\partial V}{\partial \nu} - \Delta_1 \frac{\partial V_1}{\partial \nu}\right] d\nu
\end{aligned}
\]

Note the explicit dependence of S and ν on t has been removed for simplicity. We can make the portfolio instantaneously risk-free by eliminating the dS and dν terms:

\[ \frac{\partial V}{\partial S} - \Delta_1 \frac{\partial V_1}{\partial S} - \Delta = 0 \]

and

\[ \frac{\partial V}{\partial \nu} - \Delta_1 \frac{\partial V_1}{\partial \nu} = 0 \]

So we are left with

\[
\begin{aligned}
d\Pi = {} & \left[\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V}{\partial \nu \partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V}{\partial \nu^2}\right] dt \\
& - \Delta_1 \left[\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V_1}{\partial \nu \partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V_1}{\partial \nu^2}\right] dt \\
= {} & r\Pi\, dt = r(V - \Delta S - \Delta_1 V_1)\, dt
\end{aligned}
\]


Here we have applied the no arbitrage principle: the return of a risk-free portfolio must be equal to the risk-free rate. We assume the risk-free rate to be deterministic for our case.

Rearranging the above equation by collecting V terms on the left-hand side and V1 terms on the right-hand side:

\[
\frac{\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V}{\partial \nu \partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV}{\frac{\partial V}{\partial \nu}}
= \frac{\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V_1}{\partial \nu \partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V_1}{\partial \nu^2} + rS\frac{\partial V_1}{\partial S} - rV_1}{\frac{\partial V_1}{\partial \nu}}
\]

The left-hand side involves only V and the right-hand side only V1, yet they are equal; hence both ratios must equal some function f(S, ν, t) of the variables S, ν and t alone. Thus we have

\[
\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V}{\partial \nu \partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV = f\,\frac{\partial V}{\partial \nu}
\]

In the case of the Heston model, f is chosen as

\[ f = -\big(\kappa(\theta - \nu) - \lambda\sqrt{\nu}\big) \]

where λ(S, ν, t) is called the market price of volatility risk. We do not delve into the choice of f and the concept of λ(S, ν, t) any further here; see (Wilmott, 2006) for his arguments.

(Heston, 1993) assumed in his paper that λ(S, ν, t) is linear in the instantaneous variance νt, to retain the form of the equation under the transformation from the statistical measure to the risk-neutral measure. Here we are only interested in pricing the options, so we can set λ(S, ν, t) to be zero (Gatheral, 2006).

Hence we arrive at

\[
\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V}{\partial \nu \partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV = \kappa(\nu - \theta)\frac{\partial V}{\partial \nu} \tag{3.4}
\]

According to (Heston, 1993), the price of an undiscounted European call option C with strike price K and time to maturity T has a solution of the form

\[ C(S, K, \nu, T) = \frac{1}{2}(F - K) + \int_0^\infty (F \cdot f_1 - K \cdot f_2)\, du \tag{3.5} \]


where

\[
f_1 = \mathrm{Re}\left[\frac{e^{-iu\ln K}\,\varphi(u - i)}{iuF}\right], \quad
f_2 = \mathrm{Re}\left[\frac{e^{-iu\ln K}\,\varphi(u)}{iu}\right], \quad
\varphi(u) = \mathbb{E}\big(e^{iu\ln S_T}\big), \quad
F = S e^{\mu T}
\]

The evaluation of (3.5) requires the computation of Fourier inversion integrals, and it is susceptible to numerical instabilities due to the complex logarithms involved. (Kahl et al., 2006) proposed an integration scheme to compute the Fourier inversion integral by applying some transformations to the asymptotic structure. Thus the solution becomes

\[ C(S, K, \nu, T) = \int_0^1 y(x)\, dx, \quad x \in \mathbb{R} \tag{3.6} \]

where

\[
y(x) = \frac{1}{2}(F - K) + \frac{F \cdot f_1\!\left(\frac{-\ln x}{C_\infty}\right) - K \cdot f_2\!\left(\frac{-\ln x}{C_\infty}\right)}{x \cdot \pi \cdot C_\infty}
\]

The limits at the boundaries of the integral are

\[ \lim_{x \to 0} y(x) = \frac{1}{2}(F - K) \]

and

\[ \lim_{x \to 1} y(x) = \frac{1}{2}(F - K) + \frac{F \cdot \lim_{u \to 0} f_1(u) - K \cdot \lim_{u \to 0} f_2(u)}{\pi \cdot C_\infty} \]

Equation (3.6) provides a more robust pricing method for options with moderate to long maturities or strong mean-reversion. For the full derivation of equation (3.6), see (Kahl et al., 2006). The source code implementing equation (3.6) was provided by the supervisor and is the source code for (Duffy et al., 2012).

3.1.2 Heston Model Implied Volatility Approximation

Before the model is used for pricing options, the model's parameters need to be calibrated so that it can return current market prices (at least to some degree of accuracy). That is, the unmeasurable model parameters are obtained by calibrating to implied volatilities observed on the market. This motivates the need for fast implied volatility computation.

Unfortunately, there is no closed-form analytical formula for calculating implied volatility under the Heston model. There are, however, many closed-form approximations for Heston implied volatility.

In this project we select the implied volatility approximation method for Heston proposed by (Lorig et al., 2019), which allegedly outperforms other methods such as the one by (Fouque et al., 2012). (Lorig et al., 2019) derive a family of asymptotic expansions for European-style option implied volatility. Their method provides an explicit approximation and requires no special functions, not even numerical integration, and so it is faster to compute than its counterparts.

We follow closely the derivation steps outlined in (Lorig et al., 2019). Consider the Heston model in Section 3.1. According to (Andersen et al., 2007), ρ must be set to negative to avoid moment explosion. To circumvent this, we apply the following change of variables: $(X(t), V(t)) = (\ln S(t),\, e^{\kappa t}\nu)$. Using the asymptotic expansions for general stochastic models (see Appendix B) and Itô's formula, we have

\[ dX = -\frac{1}{2}e^{-\kappa t}V\, dt + \sqrt{e^{-\kappa t}V}\, dW_1, \quad X(0) = x = \ln s \tag{3.7} \]
\[ dV = \theta\kappa e^{\kappa t}\, dt + \xi\sqrt{e^{\kappa t}V}\, dW_2, \quad V(0) = \nu = z > 0 \tag{3.8} \]
\[ \langle dW_1, dW_2 \rangle = \rho\, dt \tag{3.9} \]

The parameters ν, κ, θ, ξ, W1 and W2 play the same roles as in Section 3.1. Then we apply the second order differential operator A(t) to (X, V):

\[
\mathcal{A}(t) = \frac{1}{2}e^{-\kappa t}v\,(\partial_x^2 - \partial_x) + \theta\kappa e^{\kappa t}\partial_v + \frac{1}{2}\xi^2 e^{\kappa t}v\,\partial_v^2 + \xi\rho v\,\partial_x\partial_v
\]

Define T as time to maturity and k as log-strike. Using the time-dependent Taylor series expansion of $\mathcal{A}(T)$ with $(x(T), v(T)) = (X_0, \mathbb{E}[V(T)]) = (x, \theta(e^{\kappa T} - 1))$ gives (see Appendix B):

\[ \sigma_0 = \sqrt{\frac{-\theta + \theta\kappa T + e^{-\kappa T}(\theta - v) + v}{\kappa T}} \]

\[ \sigma_1 = \frac{\xi\rho z\, e^{-\kappa T}\big(-2v - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\big)}{\sqrt{2}\,\kappa^2\sigma_0^2\, T^{3/2}} \]

\[
\begin{aligned}
\sigma_2 = {} & \frac{\xi^2 e^{-2\kappa T}}{32\kappa^4\sigma_0^5 T^3}\Big[-2\sqrt{2}\,\kappa\sigma_0^3 T^{3/2} z\big(-v - 4e^{\kappa T}(v + \kappa T(v - v)) + e^{2\kappa T}(v(5 - 2\kappa T) - 2v) + 2v\big) \\
& + \kappa\sigma_0^2 T(4z^2 - 2)\big(v + e^{2\kappa T}(-5v + 2v\kappa T + 8\rho^2(v(\kappa T - 3) + v) + 2v)\big) \\
& + \kappa\sigma_0^2 T(4z^2 - 2)\big(4e^{\kappa T}(v + v\kappa T + \rho^2(v(\kappa T(\kappa T + 4) + 6) - v(\kappa T(\kappa T + 2) + 2)) - \kappa T v) - 2v\big) \\
& + 4\sqrt{2}\,\rho^2\sigma_0\sqrt{T}\,z(2z^2 - 3)\big(-2 - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\big)^2 \\
& + 4\rho^2\big(4(z^2 - 3)z^2 + 3\big)\big(-2v - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\big)^2\Big] \\
& - \frac{\sigma_1^2\big(4(x - k)^2 - \sigma_0^4 T^2\big)}{8\sigma_0^3 T}
\end{aligned}
\]

where

\[ z = \frac{x - k - \sigma_0^2 T/2}{\sigma_0\sqrt{2T}} \]


The explicit expression for σ3 is too long to be included here; see (Lorig et al., 2019) for the full derivation. The explicit expression for the implied volatility approximation is

\[ \sigma_{\text{implied}} = \sigma_0 + \sigma_1 z + \sigma_2 z^2 + \sigma_3 z^3 \tag{3.10} \]

The C++ source code implementing this is available here: Heston Implied Volatility Approximation.


3.2 Volatility is Rough

We start this section by introducing the fractional Brownian motion (fBm), which is a generalisation of Brownian motion.

Definition 3.2.1. Fractional Brownian Motion (fBm) is a centered Gaussian process $B^H_t$, $t \ge 0$, with the autocovariance function

\[ \mathbb{E}\big[B^H_t B^H_s\big] = \frac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\right) \tag{3.11} \]

where $H \in (0, 1)$ is the Hurst index or Hurst parameter associated with the fractional Brownian motion.

The Hurst parameter is a measure of the long-term memory of a time series. It was first introduced by (Hurst, 1951) for modelling water levels of the Nile river in order to determine the optimum dam size. As it turns out, it is also suitable for modelling time series data from other natural systems. (Peters, 1991) and (Peters, 1994) are some of the earliest examples of applying this concept in financial time series modelling. A time series can be classified into 3 categories based on the Hurst parameter:

1. 0 < H < 1/2: Negative correlation in the increments of the process. A mean reverting process, which implies an increase in value is more likely followed by a decrease in value, and vice versa. The smaller the value, the greater the mean-reverting effect.

2. H = 1/2: No correlation in the increments of the process (geometric Brownian motion or Wiener process).

3. 1/2 < H < 1: Positive correlation in the increments of the process. A trend reinforcing process, which means the direction of the next value is more likely the same as the current value. The strength of the trend is greater when H is closer to 1.

Empirical evidence shows that log-volatility time series behave similarly to a fBm with Hurst parameter of order 0.1 (Gatheral et al., 2014). Essentially all the statistical stylised facts of volatility can be captured when modelling it as a rough fBm.

This motivates the use of the fractional stochastic volatility (FSV) model by (Comte et al., 1998). Our model is called the rough fractional stochastic volatility (RFSV) model to emphasise that H < 1/2. The term 'rough' refers to the roughness of the path of fBm. From Figure 3.1, one can notice that the fBm path gets 'rougher' as H decreases.


FIGURE 3.1: Paths of fBm for different values of H. Adapted from (Shevchenko, 2015).

3.3 The Rough Heston Model

In this section we introduce the rough volatility version of the Heston model and derive its pricing and implied volatility equations.

As stated before, rough volatility models can fit the volatility surface notably well with very few parameters. Hence it is natural to modify the Heston model and consider its rough version.

The rough Heston model for a one-dimensional asset price S with instantaneous variance ν(t) takes the form as in (Euch et al., 2016):

\[
\begin{aligned}
dS &= S\sqrt{\nu}\, dW_1 \\
\nu(t) &= \nu(0) + \frac{1}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\kappa(\theta - \nu(s))\, ds + \frac{\xi}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\sqrt{\nu(s)}\, dW_2 \\
\langle dW_1, dW_2 \rangle &= \rho\, dt
\end{aligned}
\tag{3.12}
\]

where

- α = H + 1/2 ∈ (1/2, 1) governs the roughness of the volatility sample paths
- κ is the mean reversion rate
- θ is the long term mean volatility
- ν(0) is the initial volatility
- ξ is the volatility of volatility
- Γ is the Gamma function
- W1 and W2 are two Brownian motions with correlation ρ

When α = 1 (H = 0.5), we recover the classical Heston model of Section 3.1.

3.3.1 Rough Heston Pricing Equation

The pricing equation of the rough Heston model is inspired by the classical Heston one. In the classical Heston model, the option price can be obtained by applying Fourier inversion integrals to the characteristic function, which is expressed in terms of the solution of a Riccati equation. The rough Heston model displays a similar structure, but with the Riccati equation replaced by a fractional Riccati equation.

(Euch et al., 2016) derived a characteristic function for the rough Heston model. This result is particularly important as the non-Markovian nature of the fractional Brownian motion makes other numerical pricing methods (e.g. Monte Carlo) difficult to implement.

Definition 3.3.1. The fractional integral of order $r \in (0, 1]$ of a function f is

\[ I^r f(t) = \frac{1}{\Gamma(r)}\int_0^t (t - s)^{r - 1} f(s)\, ds \tag{3.13} \]

whenever the integral exists, and the fractional derivative of order $r \in (0, 1]$ is

\[ D^r f(t) = \frac{1}{\Gamma(1 - r)}\frac{d}{dt}\int_0^t (t - s)^{-r} f(s)\, ds \tag{3.14} \]

whenever it exists.

Then we quote the results from (Euch et al., 2016) directly. For the full proof, see Section 6 of (Euch et al., 2016).

Theorem 3.3.1. For all $t \ge 0$, the characteristic function of the log-price $X_t = \ln(S(t)/S(0))$ is

\[ \varphi_X(a, t) = \mathbb{E}\big[e^{iaX_t}\big] = \exp\big[\kappa\theta\, I^1 h(a, t) + \nu(0)\, I^{1-\alpha} h(a, t)\big] \tag{3.15} \]

where h is the solution of the fractional Riccati equation

\[ D^\alpha h(a, t) = -\frac{1}{2}a(a + i) + \kappa(ia\rho\xi - 1)h(a, t) + \frac{\xi^2}{2}h^2(a, t), \quad I^{1-\alpha}h(a, 0) = 0 \tag{3.16} \]

which admits a unique continuous solution for $a \in \mathcal{C}$, a suitable region in the complex plane (see Definition 3.3.2).


Definition 3.3.2. We define the strip in the complex plane relevant to the computation of the characteristic function in Theorem 3.3.1:

\[ \mathcal{C} = \left\{ z \in \mathbb{C} : \Re(z) \ge 0,\; -\frac{1}{1 - \rho^2} \le \Im(z) \le 0 \right\} \tag{3.17} \]

where ℜ and ℑ denote real and imaginary parts respectively.

There is no explicit solution to equation (3.16), so it has to be solved numerically. (Gatheral et al., 2019) present a simple rational approximation to the solution of the rough Heston Riccati equation, which is substantially faster to evaluate than the numerical schemes.

3.3.1.1 Rational Approximation of Rough Heston Riccati Solution

Firstly, the short-time series expansion of the solution by (Alos et al., 2017) can be written as

\[ h_s(a, t) = \sum_{j=0}^{\infty} \frac{\Gamma(1 + j\alpha)}{\Gamma(1 + (j+1)\alpha)}\,\beta_j(a)\,\xi^j\, t^{(j+1)\alpha} \tag{3.18} \]

with

\[
\begin{aligned}
\beta_0(a) &= -\frac{1}{2}a(a + i) \\
\beta_k(a) &= \frac{1}{2}\sum_{i,j=0}^{k-2}\mathbb{1}_{i+j=k-2}\,\beta_i(a)\beta_j(a)\,\frac{\Gamma(1 + i\alpha)}{\Gamma(1 + (i+1)\alpha)}\,\frac{\Gamma(1 + j\alpha)}{\Gamma(1 + (j+1)\alpha)} + ia\rho\,\frac{\Gamma(1 + (k-1)\alpha)}{\Gamma(1 + k\alpha)}\,\beta_{k-1}(a)
\end{aligned}
\]

Again from (Alos et al., 2017), the long-time expansion is

\[ h_l(a, t) = r_- \sum_{k=0}^{\infty} \frac{\gamma_k\, \xi^{-k}\, t^{-k\alpha}}{A^k\, \Gamma(1 - k\alpha)} \tag{3.19} \]

where

\[
\begin{aligned}
\gamma_1 &= -\gamma_0 = -1 \\
\gamma_k &= -\gamma_{k-1} + \frac{r_-}{2A}\sum_{i,j=1}^{\infty}\mathbb{1}_{i+j=k}\,\gamma_i\gamma_j\,\frac{\Gamma(1 - k\alpha)}{\Gamma(1 - i\alpha)\,\Gamma(1 - j\alpha)} \\
A &= \sqrt{a(a + i) - \rho^2 a^2} \\
r_\pm &= -i\rho a \pm A
\end{aligned}
\]

Now we can construct the global approximation. Using the same techniques as in (Atkinson et al., 2011), the rational approximation of h(a, t) is given by

\[ h^{(m,n)}(a, t) = \frac{\sum_{i=1}^{m} p_i\, (\xi t^\alpha)^i}{\sum_{j=0}^{n} q_j\, (\xi t^\alpha)^j}, \quad q_0 = 1 \tag{3.20} \]

Numerical experiments in (Gatheral et al., 2019) showed that $h^{(3,3)}$ is a good choice for high accuracy and fast computation. Thus we can write explicitly

\[ h^{(3,3)}(a, t) = \frac{p_1\xi t^\alpha + p_2\xi^2 (t^\alpha)^2 + p_3\xi^3 (t^\alpha)^3}{1 + q_1\xi t^\alpha + q_2\xi^2 (t^\alpha)^2 + q_3\xi^3 (t^\alpha)^3} \tag{3.21} \]
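To make the structure of equation (3.21) concrete, the following is a minimal Python sketch that evaluates $h^{(3,3)}$ for given coefficients; it assumes the (complex) coefficients $p_i$ and $q_j$ have already been computed from equations (3.28)-(3.33) below.

    def h_33(p, q, xi, t, alpha):
        """Evaluate the (3,3) rational approximation of equation (3.21).

        p: coefficients (p1, p2, p3); q: coefficients (q1, q2, q3),
        both possibly complex and precomputed from eqs. (3.28)-(3.33).
        """
        y = xi * t ** alpha                       # the expansion variable xi * t^alpha
        num = p[0] * y + p[1] * y ** 2 + p[2] * y ** 3
        den = 1.0 + q[0] * y + q[1] * y ** 2 + q[2] * y ** 3
        return num / den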

Thus from equations (3.18) and (3.19) we have

\[
\begin{aligned}
h_s(a, t) &= \frac{\Gamma(1)}{\Gamma(1 + \alpha)}\beta_0(a)\,t^\alpha + \frac{\Gamma(1 + \alpha)}{\Gamma(1 + 2\alpha)}\beta_1(a)\,\xi\,(t^\alpha)^2 + \frac{\Gamma(1 + 2\alpha)}{\Gamma(1 + 3\alpha)}\beta_2(a)\,\xi^2\,(t^\alpha)^3 + \mathcal{O}\big((t^\alpha)^4\big) \\
&= b_1 t^\alpha + b_2 (t^\alpha)^2 + b_3 (t^\alpha)^3 + \mathcal{O}\big((t^\alpha)^4\big)
\end{aligned}
\]

and

\[
\begin{aligned}
h_l(a, t) &= \frac{r_-\gamma_0}{\Gamma(1)} + \frac{r_-\gamma_1}{A\,\Gamma(1 - \alpha)\,\xi}\,\frac{1}{t^\alpha} + \frac{r_-\gamma_2}{A^2\,\Gamma(1 - 2\alpha)\,\xi^2}\,\frac{1}{(t^\alpha)^2} + \mathcal{O}\left[\frac{1}{(t^\alpha)^3}\right] \\
&= g_0 + g_1\,\frac{1}{t^\alpha} + g_2\,\frac{1}{(t^\alpha)^2} + \mathcal{O}\left[\frac{1}{(t^\alpha)^3}\right]
\end{aligned}
\]

Matching the coefficients of equation (3.21) to $h_s$ and $h_l$ forces the coefficients $b_i$ and $g_j$ to satisfy

\[
\begin{aligned}
p_1\xi &= b_1 & (3.22)\\
(p_2 - p_1 q_1)\xi^2 &= b_2 & (3.23)\\
(p_1 q_1^2 - p_1 q_2 - p_2 q_1 + p_3)\xi^3 &= b_3 & (3.24)\\
\frac{p_3\xi^3}{q_3\xi^3} &= g_0 & (3.25)\\
\frac{(p_2 q_3 - p_3 q_2)\xi^5}{q_3^2\,\xi^6} &= g_1 & (3.26)\\
\frac{(p_1 q_3^2 - p_2 q_2 q_3 - p_3 q_1 q_3 + p_3 q_2^2)\xi^7}{q_3^3\,\xi^9} &= g_2 & (3.27)
\end{aligned}
\]


The solution of this system is

\[
\begin{aligned}
p_1 &= \frac{b_1}{\xi} & (3.28)\\
p_2 &= \frac{b_1^3 g_1 + b_1^2 g_0^2 + b_1 b_2 g_0 g_1 - b_1 b_3 g_0 g_2 + b_1 b_3 g_1^2 + b_2^2 g_0 g_2 - b_2^2 g_1^2 + b_2 g_0^3}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^2} & (3.29)\\
p_3 &= g_0 q_3 & (3.30)\\
q_1 &= \frac{b_1^2 g_1 - b_1 b_2 g_2 + b_1 g_0^2 - b_2 g_0 g_1 - b_3 g_0 g_2 + b_3 g_1^2}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi} & (3.31)\\
q_2 &= \frac{b_1^2 g_0 - b_1 b_2 g_1 - b_1 b_3 g_2 + b_2^2 g_2 + b_2 g_0^2 - b_3 g_0 g_1}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^2} & (3.32)\\
q_3 &= \frac{b_1^3 + 2 b_1 b_2 g_0 + b_1 b_3 g_1 - b_2^2 g_1 + b_3 g_0^2}{(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^3} & (3.33)
\end{aligned}
\]

3.3.1.2 Option Pricing Using Fast Fourier Transform

After solving the Riccati equation using the rational approximation (3.21), we can compute the characteristic function $\varphi_X(a, t)$ in equation (3.15). Once the characteristic function is obtained, many methods are available to compute call option prices; see (Carr and Madan, 1999), (Itkin, 2010), (Lewis, 2001) and (Schmelzle, 2010). In this project we select the Carr and Madan approach, a fast option pricing method that uses the fast Fourier transform (FFT). Suppose k = ln K and C(k, T) is the value of a European call option with strike K and maturity T. Unfortunately, the FFT cannot be applied directly to evaluate the call option price because the call pricing function is not square-integrable. A key contribution of (Carr and Madan, 1999) is the modification of the call pricing function with respect to k. With this modification, a whole range of option prices can be obtained with one inverse Fourier transform.

Consider the modified call price c(k, T):

\[ c(k, T) = e^{\beta k}\,C(k, T), \quad \beta > 0 \tag{3.34} \]

Its Fourier transform is

\[ \psi_X(u, T) = \int_{-\infty}^{\infty} e^{iuk}\, c(k, T)\, dk, \quad u \in \mathbb{R}_{>0} \tag{3.35} \]

which can be expressed as

\[ \psi_X(u, T) = \frac{e^{-rT}\,\varphi_X[u - (\beta + 1)i,\, T]}{\beta^2 + \beta - u^2 + i(2\beta + 1)u} \tag{3.36} \]

where r is the risk-free rate and $\varphi_X(\cdot)$ is the characteristic function defined in equation (3.15).


The purpose of β is to assist the integrability of the modified call pricing function over the negative log-strike axis. A sufficient condition is that $\psi_X(0)$ be finite:

\[ \psi_X(0, T) < \infty \tag{3.37} \]
\[ \implies \varphi_X[-(\beta + 1)i,\, T] < \infty \tag{3.38} \]
\[ \implies \exp\big[\theta\kappa\, I^1 h(-(\beta + 1)i,\, T) + \nu(0)\, I^{1-\alpha}h(-(\beta + 1)i,\, T)\big] < \infty \tag{3.39} \]

Using equation (3.39), we can determine the upper bound value of β such that condition (3.37) is satisfied. (Carr and Madan, 1999) suggest that one quarter of the upper bound serves as a good choice for β.

The call price C(k, T) can be recovered by applying the inverse Fourier transform:

\[
\begin{aligned}
C(k, T) &= \frac{e^{-\beta k}}{2\pi}\int_{-\infty}^{\infty} \Re\big[e^{-iuk}\,\psi_X(u, T)\big]\, du & (3.40)\\
&= \frac{e^{-\beta k}}{\pi}\int_{0}^{\infty} \Re\big[e^{-iuk}\,\psi_X(u, T)\big]\, du & (3.41)
\end{aligned}
\]

The semi-infinite integral in the call price equation can be evaluated by numerical methods such as the trapezoidal rule and FFT, as shown in (Kwok et al., 2012):

\[ C(k, T) \approx \frac{e^{-\beta k}}{\pi}\sum_{j=1}^{N} e^{-iu_j k}\,\psi_X(u_j, T)\,\Delta u \tag{3.42} \]

where $u_j = (j - 1)\Delta u$, $j = 1, 2, \ldots, N$.

We end this section with a summary of the steps required to price a call option under the rough Heston model:

1. Compute the solution h of the fractional Riccati equation (3.16) using equation (3.21).
2. Compute the characteristic function in equation (3.15).
3. Determine a suitable value for β using equation (3.39).
4. Apply the Fourier transform in equation (3.36).
5. Evaluate the call option price by implementing the FFT in equation (3.42), as sketched below.
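Purely as an illustration of steps 4 and 5, the following minimal Python sketch evaluates the damped Fourier integral for a single log-strike. The callable phi is assumed to implement the characteristic function from steps 1 and 2, and beta, N and du are placeholder values rather than the thesis's calibrated choices.

    import numpy as np

    def carr_madan_call(phi, k, T, r, beta=0.75, N=4096, du=0.25):
        """Approximate C(k, T) via the damped inverse Fourier integral,
        discretised as in equation (3.42)."""
        u = np.arange(N) * du                      # u_j = (j - 1) * du
        # Fourier transform of the damped call price, equation (3.36)
        psi = np.exp(-r * T) * phi(u - (beta + 1) * 1j, T) \
              / (beta**2 + beta - u**2 + 1j * (2 * beta + 1) * u)
        # rectangle-rule sum of equation (3.42)
        integrand = np.real(np.exp(-1j * u * k) * psi)
        return np.exp(-beta * k) / np.pi * np.sum(integrand) * du

A production implementation would evaluate all strikes simultaneously with numpy.fft; the direct sum above reproduces the same quadrature for one log-strike k.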

3.3.2 Rough Heston Model Implied Volatility

(Euch et al., 2019) present an almost instantaneous and accurate approximation method to compute rough Heston implied volatility. This method allows us to approximate rough Heston implied volatility by simply scaling the volatility of volatility parameter ξ and feeding this scaled parameter as input into the classical Heston implied volatility approximation equation of Section 3.1.2. For a given maturity T, the scaled volatility of volatility parameter $\tilde{\xi}$ is

\[ \tilde{\xi} = \sqrt{\frac{3}{2H + 2}}\;\frac{\xi}{\Gamma(H + \frac{3}{2})}\;\frac{1}{T^{\frac{1}{2} - H}} \tag{3.43} \]
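Equation (3.43) is a one-liner in code; a minimal Python sketch:

    from math import gamma, sqrt

    def scaled_vol_of_vol(xi, H, T):
        """Scale the rough Heston vol-of-vol for use in the classical
        Heston implied volatility approximation, equation (3.43)."""
        return sqrt(3.0 / (2.0 * H + 2.0)) * xi / gamma(H + 1.5) / T ** (0.5 - H)

    # example: H = 0.1, T = 1.0 gives a maturity-dependent effective vol-of-vol
    print(scaled_vol_of_vol(0.3, 0.1, 1.0))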


FIGURE 3.2: Implied volatility smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019).


Chapter 4

Artificial Neural Networks

This chapter starts with an introduction to Artificial Neural Networks (ANN). We introduce the terminologies and basic elements of ANN and explain the methods that we used to increase training speed and predictive accuracy. We also discuss the training process for ANN, common pitfalls and their remedies. Finally, we discuss the ANN type of interest and extend the discussion to Deep Neural Networks (DNN).

The concept of ANN has come a long way. It started off with modelling a simple neural network after electrical circuits, as proposed by Warren McCulloch, a neurologist, and a young mathematician, Walter Pitts, in 1943 (McCulloch et al., 1943). This idea was further strengthened by Donald Hebb (Hebb, 1949).

ANN's ability to learn complex non-linear functional mappings, by deciphering the pattern between independent and dependent variables, has found applications in many fields. With computational resources becoming more accessible and the availability of big data sets, ANN's popularity has grown exponentially in recent years.

4.1 Dissecting the Artificial Neural Networks

FIGURE 4.1: A Simple Neural Network

Before we delve deeper into the world of ANN, let us explain what parameters and hyperparameters mean.


Parameters refer to the variables that are updated throughout the training process; they include weights and biases. Hyperparameters are the configuration variables that define the architecture of the ANN and stay constant throughout the training process. Hyperparameters include the number of neurons, the number of hidden layers, the activation functions and the number of epochs, etc.

4.1.1 Neurons

Like a biological neuron, an artificial neuron is the fundamental working unit in an ANN. It is essentially a mathematical function that receives multiple inputs and generates an output. More precisely, it takes the weighted sum of the inputs and applies the activation function to the sum to generate an output. In the case of a simple ANN (as shown in Figure 4.1) this will be the network output. In more sophisticated ANNs, the output of a neuron can be the input of other neurons.

4.1.2 Input, Output and Hidden Layers

An input layer is the first layer of an ANN. It receives and sends the training data into the network for further processing. The number of input neurons is equal to the number of input features of the training data.

The output layer is the final layer of the network. It outputs the predictions for a given set of input features. It can have more than one output neuron.

A hidden layer is a layer between the input and output layers. It contains neuron(s) and receives weighted inputs from the input layer. Whilst an ANN contains only one input layer and one output layer, the number of hidden layers can be more than one. One hidden layer can only be used for solving linearly separable problems. For more complex problems, multiple hidden layers are needed.

Systematic experimentation is required to identify the appropriate number of hidden layers to prevent under-fitting and over-fitting, and thus increase predictive accuracy.

4.1.3 Connections, Weights and Biases

From the input layer to the output layer, each connection is assigned a weight that denotes its relative importance. Random weights are initialised when training begins; as training continues, these weights are updated.

Sometimes a bias (different from the bias term in statistics) is added to the neuron. It is merely a constant input with weight 1. Intuitively speaking, the purpose of a bias is to shift the activation function away from the origin, thereby allowing the neuron to give non-zero output when the inputs are 0.


4.1.4 Forward and Backward Propagation, Loss Functions

Forward propagation is the passing of input data in the forward direction through the network to generate output. The difference between the output and the target output is then estimated by a loss function; it is a measure of how well an ANN performs with the current set of weights. Typical loss functions for regression problems include the mean squared error (MSE) and the mean absolute error (MAE). They are defined as follows.

Definition 4.1.1.

\[ \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\big(y_{i,\mathrm{predict}} - y_{i,\mathrm{actual}}\big)^2 \tag{4.1} \]

Definition 4.1.2.

\[ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\big|y_{i,\mathrm{predict}} - y_{i,\mathrm{actual}}\big| \tag{4.2} \]

where $y_{i,\mathrm{predict}}$ is the i-th predicted output and $y_{i,\mathrm{actual}}$ is the i-th actual output.

Subsequently, this information is propagated backward from the output layer to the input layer, and the gradients of the loss function w.r.t. the weights are computed. The algorithm then makes adjustments to the weights accordingly to minimise the loss function. An iteration is one complete cycle of forward and backward propagation.

ANN training is an unconstrained optimisation problem with the loss function as the objective function. The algorithm navigates through the search space to find optimal weights that minimise the loss function (e.g. MSE). It can be expressed as

\[ \min_{w}\;\frac{1}{n}\sum_{i=1}^{n}\big(y_{i,\mathrm{predict}} - y_{i,\mathrm{actual}}\big)^2 \tag{4.3} \]

where w is the vector that contains the weights of all the connections within the ANN.

4.1.5 Activation Functions

An activation function is a mathematical function applied to the output of a neuron to determine whether it should be activated (or fired) or not, based on the relevance of the neuron's inputs to the prediction. This is analogous to the neuronal firing activity in the human brain.

Activation functions can be linear or non-linear. A linear activation function is only suitable when the functional relationship is linear; non-linear activation functions are used for most applications. In our case, non-linear activation functions are more suitable as the pricing and implied volatility functional mappings are complex.

Common activation functions include:


Definition 4.1.3. Rectified Linear Unit (ReLU)

\[ f_{\mathrm{ReLU}}(x) = \begin{cases} 0 & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases} \;\in [0, \infty) \tag{4.4} \]

Definition 4.1.4. Exponential Linear Unit (ELU)

\[ f_{\mathrm{ELU}}(x) = \begin{cases} \alpha(e^x - 1) & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases} \;\in (-\alpha, \infty) \tag{4.5} \]

where α > 0. The value of α is usually chosen to be 1 for most applications.

Definition 4.1.5. Linear or identity

\[ f_{\mathrm{linear}}(x) = x \;\in (-\infty, \infty) \tag{4.6} \]
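For concreteness, the three activations can be written in a few lines of plain numpy (deep learning frameworks such as Keras ship their own implementations):

    import numpy as np

    def relu(x):
        """Equation (4.4): zero for non-positive inputs, identity otherwise."""
        return np.maximum(0.0, x)

    def elu(x, alpha=1.0):
        """Equation (4.5): smooth saturation towards -alpha for negative inputs."""
        return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

    def linear(x):
        """Equation (4.6): identity, used in our output layers."""
        return x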

4.1.6 Optimisation Algorithms and Learning Rate

The learning rate refers to the amount of change applied to the weights in each iteration of the training process. A high learning rate results in fast training speed, but there is a risk of divergence, whereas a low learning rate may slow down the training.

Common optimisation algorithms for training ANN are Stochastic Gradient Descent (SGD) and its extensions such as RMSProp and Adam. The SGD pseudocode is the following:

Algorithm 1 Stochastic Gradient Descent
  Initialise a vector of random weights w and learning rate lr
  while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
      w_new = w_current − lr · ∂Loss/∂w_current
    end for
  end while

SGD is popular because it is simple to implement and can provide good results for most applications. In SGD, the learning rate lr is fixed throughout the process. This is found to be an issue because the appropriate learning rate is difficult to determine: selecting a high learning rate is prone to divergence, whereas using a low learning rate can result in a slow training process. Hence, improvements have been made to SGD to allow the learning rate to be adaptive. Root Mean Square Propagation (RMSProp) is one of the extensions that allows the learning rate to vary throughout the training process (Hinton et al., 2015). It is a method that divides the learning rate by the exponential moving average of past squared gradients. The author suggests the exponential decay rate to be b = 0.9, and its pseudocode is presented below.

Algorithm 2 Root Mean Square Propagation (RMSProp)
  Initialise a vector of random weights w, learning rate lr, v = 0 and b = 0.9
  while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
      v_current = b · v_previous + (1 − b) [∂Loss/∂w_current]²
      w_new = w_current − (lr / √(v_current + ε)) · ∂Loss/∂w_current
    end for
  end while

where ε is a small number to prevent division by zero.

A further improved version, called Adaptive Moment Estimation (Adam), was proposed by (Kingma et al., 2015). In RMSProp, the algorithm adapts the learning rate based on the second moment of the gradients only. Adam improves on this by also using the average of the first moments of the gradients in the update. The advantage of Adam is that it can handle sparse gradients on noisy data. The pseudocode of Adam is given below.

Algorithm 3 Adaptive Moment Estimation (Adam)
  Initialise a vector of random weights w, learning rate lr, v = 0, m = 0, b = 0.9 and b1 = 0.999
  while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
      m_current = b · m_previous + (1 − b) ∂Loss/∂w_current
      v_current = b1 · v_previous + (1 − b1) [∂Loss/∂w_current]²
      m̂_current = m_current / (1 − b^i)
      v̂_current = v_current / (1 − b1^i)
      w_new = w_current − (lr / (√(v̂_current) + ε)) · m̂_current
    end for
  end while

where b is the exponential decay rate for the first moment and b1 is the exponential decay rate for the second moment.
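As a sanity check on the update rule, the following is a minimal numpy sketch of a single Adam step; the grad argument is a placeholder for the gradient computed by backpropagation.

    import numpy as np

    def adam_step(w, grad, m, v, t, lr=0.001, b=0.9, b1=0.999, eps=1e-8):
        """One Adam update (Algorithm 3). grad: gradient of the loss at w;
        m, v: running first and second moment estimates; t: step count >= 1."""
        g = grad(w)
        m = b * m + (1 - b) * g          # first moment (mean of gradients)
        v = b1 * v + (1 - b1) * g**2     # second moment (mean of squared gradients)
        m_hat = m / (1 - b**t)           # bias corrections for the zero initialisation
        v_hat = v / (1 - b1**t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v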

4.1.7 Epochs, Early Stopping and Batch Size

An epoch is one pass of the entire training data set through the ANN. Generally we want the number of epochs to be as high as possible; however, this may cause the ANN to over-fit. Early stopping is useful here to stop the training before the ANN becomes over-fit: it works by stopping the training process when the loss function shows no improvement. Batch size is the number of input samples processed before the parameters (weights and biases) are updated.

4.2 Feedforward Neural Networks

Feedforward Neural Networks are the most common type of ANN in practical applications. They are called feedforward because the training data only flows from the input layer to the output layer; there is no feedback loop within the network.

4.2.1 Deep Neural Networks

Deep Neural Networks (DNN) are Feedforward Neural Networks with more than one hidden layer; the depth here refers to the number of hidden layers. This is the type of ANN of our interest, and the following theorems provide justifications.

Theorem 4.2.1 (Hornik et al., 1989, Universal Approximation Theorem). Let $\mathcal{NN}^a_{d_{in}, d_{out}}$ be the set of neural networks with activation function $f_a : \mathbb{R} \to \mathbb{R}$, input dimension $d_{in} \in \mathbb{N}$ and output dimension $d_{out} \in \mathbb{N}$. Then, if $f_a$ is continuous and non-constant, $\mathcal{NN}^a_{d_{in}, d_{out}}$ is dense in the Lebesgue space $L^p(\mu)$ for all finite measures μ.

Theorem 4.2.2 (Hornik et al., 1990, Universal Approximation Theorem for Derivatives). Let the function to be approximated be $F^* \in C^n$, the mapping of the neural network $F : \mathbb{R}^{d_0} \to \mathbb{R}$, and $\mathcal{NN}^a_{d_{in}, d_{out}}$ be the set of single-layer neural networks with activation function $f_a : \mathbb{R} \to \mathbb{R}$, input dimension $d_{in} \in \mathbb{N}$ and output dimension 1. Then, if the (non-constant) activation function is $f_a \in C^n(\mathbb{R})$, $\mathcal{NN}^a_{d_{in}, 1}$ arbitrarily approximates $F^*$ and all its derivatives up to order n.

Theorem 4.2.3 (Eldan et al., 2016, The Power of Depth of Neural Networks). There exists a simple function on $\mathbb{R}^d$ expressible by a small 3-layer feedforward neural network which cannot be approximated by any 2-layer network to more than a certain constant accuracy, unless its width is exponential in the dimension.

According to Theorem 4.2.2, it is crucial to have an activation function that is n-times differentiable if the approximation of the n-th derivative of the function $F^*$ is required. Theorem 4.2.3 motivates the choice of a deep network: it states that the depth of a network can improve its predictive capabilities. However, it is worth noting that networks deeper than 4 hidden layers do not provide much performance gain (Ioffe et al., 2015).

4.3 Common Pitfalls and Remedies

Here we discuss some common pitfalls in ANN training and the corresponding remedies.


4.3.1 Loss Function Stagnation

During training, the loss function may display only infinitesimal improvements after each epoch.

To resolve this problem:

- Increase the batch size. This could potentially fix the issue, as it helps the ANN model learn the relationships between data samples and data labels better before updating the parameters.
- Increase the learning rate. Too small a learning rate can cause the ANN model to get stuck in local minima.
- Use a deeper network. According to the power of depth theorem, deeper networks tend to perform better than shallower ones. However, if the ANN model already has 4 hidden layers, this method may not be suitable, as there is not much performance gain beyond that (see Chapter 4.2.1).
- Use a different activation function for the output layer. Make sure the range of the output activation function is appropriate for the problem.

4.3.2 Loss Function Fluctuation

During training, the loss function may fluctuate from epoch to epoch instead of decreasing steadily. Potential fixes include:

- Increase the batch size. The ANN model might not be able to interpret the true relationships between input and output data before each update when the batch size is too small.
- Lower the initialised learning rate.

4.3.3 Over-fitting

Over-fitting occurs when the ANN model is able to perform well on training data but fails to generalise on unseen test data.

Potential fixes include:

- Reduce the complexity of the ANN model. Limit the number of hidden layers to 4, as (Eldan et al., 2016) show that not much advantage can be achieved beyond 4 hidden layers. Furthermore, experiment with narrower (fewer neurons) hidden layers.
- Introduce early stopping. Reduce the number of epochs if more epochs show only little improvement. It is important to note that the stopping criterion should be based on the validation loss rather than the training loss.
- Regularisation. This can reduce the complexity of the loss function and speed up convergence.
- K-fold cross-validation. This works by splitting the training data into K equal-size portions: K−1 portions are used for training and 1 portion is used for validation, repeated K times with a different portion used for validation each time.

4.3.4 Under-fitting

Under-fitting occurs when the ANN model fails to make sense of the relationships between data samples and labels.

Potential fixes include:

- Increase the size of the training data. Complex functional mappings require the ANN model to train on more data before it can learn the relationships well.
- Increase the depth of the model. It is very likely that the model is not deep enough for the problem. As a rule of thumb, use 3-4 hidden layers for complex problems.

4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation

For ANN's application in implied volatility prediction, we apply the image-based implicit method proposed by (Horvath et al., 2019). It is allegedly a powerful and robust method for approximating implied volatility. Rather than using one implied volatility value as the network output, the image-based implicit method uses the entire implied volatility surface as the output. The implied volatility surface is represented by 10 × 10 pixels, with each pixel corresponding to one maturity and one strike price. The input data corresponding to one implied volatility surface output is all the model parameters except for strike price and maturity; these are implicitly learned by the network. This method offers the following benefits:

- More information is learned by the network with less input data. The information of neighbouring volatility points is incorporated in the training process.
- The network learns the mapping from model parameters to the implied volatility surface directly. Volatility smile and term structure are available simultaneously, and so it is more efficient than having only one single implied volatility value as output.
- The neighbouring volatility points can be numerically approximated almost instantly if required.


FIGURE 4.2: The pixels of the implied volatility surface are represented by the outputs of the neural network (the 100 output neurons O1-O100 map onto the 10 × 10 implied volatility surface grid).


Chapter 5

Data

In this chapter we discuss the data for our applications. We explain how the data is generated and some essential data pre-processing procedures.

Simulated data is chosen to train the ANN model mainly for regulatory reasons. One might argue that training the ANN with market data directly would yield better predictions; however, evidence for the stability and robustness of that method is still lacking. Furthermore, there is no straightforward interpretation between the inputs and outputs of an ANN model trained in that way. The main advantage of training the ANN with simulated data is that the functional relationship is known, and so the model is tractable. This property is important as tractable models tend to be more transparent, and this is required by regulations.

For neural network computation, Python is the logical language choice as it has many powerful machine learning libraries such as Keras. To obtain good results, we require 5,000,000 rows of data for the ANN experiment. Hence we choose C++ instead of Python for the data generation process, as it is computationally faster. We also use the lightweight, header-only Python library pybind11, which can harmoniously integrate C++ and Python in one project (see Appendix A for a tutorial on pybind11).

5.1 Data Generation and Storage

The data generation process is done entirely in C++, mainly because of its execution speed. Initially we considered applying GPU parallel computing to speed up the process even further. After consulting with the supervisor, we decided to use the std::future class template instead. std::future belongs to the standard C++11 library and is very simple to implement compared to parallel computing. It allows the C++ program to run multiple functions asynchronously on different threads (see page 942 in (Duffy, 2018) for an example). Our machine contains 2 physical cores with 2 threads per physical core. This allows us to run 4 operations asynchronously without overhead costs, and therefore, in theory, improves the data generation speed by four-fold. Note that asynchronous programming is not meant to replace parallel computing completely; in cases where execution speed is absolutely critical, parallel computing should be used. Without asynchronous programming, the data generation process for 5,000,000 rows of Heston option price data would take more than 8 hours. Now it takes less than 2 hours to complete.
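The thesis implementation uses C++ std::future for this fan-out; purely for illustration, the analogous pattern in Python with concurrent.futures looks as follows, with a hypothetical generate_chunk worker standing in for the pybind11-wrapped pricer.

    from concurrent.futures import ProcessPoolExecutor
    import random

    def generate_chunk(n_rows, seed):
        # placeholder worker: in the project this would call the
        # pybind11-wrapped C++ pricer on randomly drawn parameters
        rng = random.Random(seed)
        return [rng.random() for _ in range(n_rows)]

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=4) as pool:   # 4 hardware threads
            futures = [pool.submit(generate_chunk, 1_250_000, s) for s in range(4)]
            data = [f.result() for f in futures]           # blocks until all finish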

We follow the steps outline in Appendix A to wrap the C++ programusing the pybind11 Python library The wrapped C++ function can be calledin Python just like any other Python libraries The pybind11 library is verysimilar to the Boost library conceptually The reason we choose pybind11 isthat it is a lightweight header-only library that is easy to use This enables usto enjoy both the execution speed of C++ and the access to machine learninglibraries of Python at the same time

After the data is generated, it is stored in MySQL so that we do not need to re-generate the data each time we restart the computer or when the Python IDE crashes unexpectedly. We use the mysql-connector Python library to communicate with MySQL from Python.
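For reference, a minimal sketch of the storage step with mysql-connector; the credentials and the heston_call table are placeholders rather than the project's actual schema.

    import mysql.connector

    conn = mysql.connector.connect(host="localhost", user="user",
                                   password="password", database="thesis")
    cursor = conn.cursor()
    # each row: (S, K, r, D, v, T, theta, kappa, xi, rho, price)
    rows = [(50.0, 45.0, 0.03, 0.0, 0.3, 1.0, 0.4, 0.5, 0.3, -0.5, 8.21)]
    cursor.executemany(
        "INSERT INTO heston_call VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)",
        rows)
    conn.commit()
    conn.close()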

5.1.1 Heston Call Option Data

We follow the method and theory described in Chapter 3 to compute the call option price under the Heston model. We only consider call options in this project. To train an ANN that can also predict put option prices, one can simply add an extra input feature that takes only two values, 1 or -1, with 1 representing a call option and -1 representing a put option.

For this application we generated 5,000,000 rows of data that are uniformly distributed over a specified range. In this context, one row of data contains all the input features and the corresponding output. We use 10 model parameters as the input features: spot price (S), strike price (K), risk-free rate (r), dividend yield (D), current/instantaneous volatility (ν), maturity (T), long term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ). In this case there will only be one output, the call option price. See Table 5.1 for the respective ranges.

TABLE 5.1: Heston Model Call Option Data

         Model Parameters                Range
Input    Spot Price (S)                  ∈ [20, 80]
         Strike Price (K)                ∈ [40, 55]
         Risk-free rate (r)              ∈ [0.03, 0.045]
         Dividend Yield (D)              ∈ [0, 0.1]
         Instantaneous Volatility (ν)    ∈ [0.2, 0.6]
         Maturity (T)                    ∈ [0.03, 2.0]
         Long Term Volatility (θ)        ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)         ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)    ∈ [0.2, 0.5]
         Correlation (ρ)                 ∈ [−0.95, 0.95]
Output   Call Option Price (C)           ∈ ℝ>0


5.1.2 Heston Implied Volatility Data

We generate the Heston implied volatility data with the theory explained in Chapter 3. For the ANN application in implied volatility approximation, we decided to use the image-based implicit method, in which the ANN output layer dimension is 100, with each output representing one pixel on the implied volatility surface.

Using this method, we need 100 times more data in order to have the same number of distinct rows of data as the normal point-wise method (e.g. Heston call option pricing). However, due to hardware constraints, we decided to generate 10,000,000 rows of uniformly distributed data. That translates to only 100,000 distinct rows of data.

We use 6 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ).

Note that strike price and maturity are not used as input features, as mentioned in Chapter 4.4. This is because the shape of one full implied volatility surface depends on the 6 model parameters mentioned above; the strike price and maturity correspond to one specific point on the full implied volatility surface. Nevertheless, we still require these two parameters to generate implied volatility data. We use the following strike price and maturity values:

- strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}
- maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

In this case there will be 10 × 10 = 100 outputs, which together form the implied volatility surface. See Table 5.2 for the respective ranges.

TABLE 5.2: Heston Model Implied Volatility Data

         Model Parameters                Range
Input    Spot Price (S)                  ∈ [0.2, 2.5]
         Instantaneous Volatility (ν)    ∈ [0.2, 0.6]
         Long Term Volatility (θ)        ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)         ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)    ∈ [0.2, 0.5]
         Correlation (ρ)                 ∈ [−0.95, 0.95]
Output   Implied Volatility Surface (σ_imp)  ∈ ℝ>0^(10×10)

5.1.3 Rough Heston Call Option Data

Again, we follow the theory in Chapter 3 and implement it in C++ to generate call option prices under the rough Heston model.

Similarly, we only consider call options here and generate 5,000,000 rows of data. We use 9 model parameters as the input features: strike price (K), risk-free rate (r), current/instantaneous volatility (ν), maturity (T), long term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and the Hurst parameter (H). In this case there will only be one output, the call option price. See Table 5.3 for the respective ranges.

TABLE 5.3: Rough Heston Model Call Option Data

         Model Parameters                Range
Input    Strike Price (K)                ∈ [0.5, 5]
         Risk-free rate (r)              ∈ [0.2, 0.5]
         Instantaneous Volatility (ν)    ∈ [0, 1.0]
         Maturity (T)                    ∈ [0.1, 3.0]
         Long Term Volatility (θ)        ∈ [0.01, 0.1]
         Mean Reversion Rate (κ)         ∈ [1.0, 4.0]
         Volatility of Volatility (ξ)    ∈ [0.01, 0.5]
         Correlation (ρ)                 ∈ [−0.95, 0.95]
         Hurst Parameter (H)             ∈ [0.05, 0.1]
Output   Call Option Price (C)           ∈ ℝ>0

5.1.4 Rough Heston Implied Volatility Data

Finally, we generate the rough Heston implied volatility data with the theory explained in Chapter 3. As before, we also use the image-based implicit method.

The data size we use is 10,000,000 rows, which translates to 100,000 distinct rows of data. We use 7 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and the Hurst parameter (H). We use the following strike price and maturity values:

- strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}
- maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

As before, there will be 10 × 10 = 100 outputs, which together form the rough Heston implied volatility surface. See Table 5.4 for the respective ranges.

TABLE 5.4: Rough Heston Model Implied Volatility Data

         Model Parameters                Range
Input    Spot Price (S)                  ∈ [0.2, 2.5]
         Instantaneous Volatility (ν)    ∈ [0.2, 0.6]
         Long Term Volatility (θ)        ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)         ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)    ∈ [0.2, 0.5]
         Correlation (ρ)                 ∈ [−0.95, 0.95]
         Hurst Parameter (H)             ∈ [0.05, 0.1]
Output   Implied Volatility Surface (σ_imp)  ∈ ℝ>0^(10×10)


5.2 Data Pre-processing

In this section we discuss some necessary data pre-processing procedures: data splitting for validation and evaluation (testing), data standardisation and normalisation, and data partitioning for K-fold cross validation.

5.2.1 Data Splitting: Train - Validate - Test

Generally, not 100% of the available data is used for training; we reserve some of the data for validation and for performance evaluation.

We split the data into three sets: 70% (train), 15% (validate) and 15% (test). The first 70% of the data is used for training, whilst 15% of the data is used for validation during training. The last 15% is for evaluating the trained ANN's predictive accuracy.

The validation data set is used for tuning the hyperparameters of the ANN; this is important to help prevent over-fitting of the network. The test data set is not used during training. It is the unseen (out-of-sample) data that is used for evaluating the trained ANN's predictions.

5.2.2 Data Standardisation and Normalisation

Before the data is fed into the ANN for training, it is essential to scale the data to speed up convergence and improve performance.

The magnitude of each data feature is different. It is essential to re-scale the data so that the ANN model will not misinterpret their relative importance based on their magnitudes. Typical scaling techniques include standardisation and normalisation.

Data standardisation is defined as

\[ x_{\text{scaled}} = \frac{x - x_{\text{mean}}}{x_{\text{std}}} \tag{5.1} \]

where

- x is the data
- x_mean is the mean of that data feature
- x_std is the standard deviation of that data feature

(Horvath et al., 2019) claim that this works especially well for quantities that are positively unbounded, such as implied volatility.

For the other model parameters, we apply data normalisation, which is defined as

\[ x_{\text{scaled}} = \frac{2x - (x_{\max} + x_{\min})}{x_{\max} - x_{\min}} \in [-1, 1] \tag{5.2} \]

where

- x is the data
- x_max is the maximum value of that data feature
- x_min is the minimum value of that data feature

After data scaling (standardisation or normalisation), the ANN is able to learn the true relationships between inputs and outputs without bias from the difference in their magnitudes.
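A minimal sketch of both scalings in Python; in practice the statistics would be fitted on the training set only and reused for the validation and test sets.

    import numpy as np

    def standardise(x, mean, std):
        """Equation (5.1): zero-mean, unit-variance scaling."""
        return (x - mean) / std

    def normalise(x, xmin, xmax):
        """Equation (5.2): min-max scaling onto [-1, 1]."""
        return (2.0 * x - (xmax + xmin)) / (xmax - xmin)

    train = np.array([[20.0, 0.2], [80.0, 0.6]])   # toy feature matrix
    lo, hi = train.min(axis=0), train.max(axis=0)
    print(normalise(train, lo, hi))                # maps each column onto [-1, 1]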

5.2.3 Data Partitioning

In this section we explain how the data is partitioned for K-fold cross validation.

First, we shuffle the entire training data set. Then the entire data set is divided into K equal-sized portions. Each portion in turn is used as the test data set while the ANN is trained on the other K−1 portions. This is repeated K times, with each of the K portions used exactly once as the test data set. The average of the K results is the performance measure of the model.

FIGURE 5.1: How the data is partitioned for 5-fold cross validation. Adapted from (Chapter 2 Modeling Process: k-fold cross validation).
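For illustration, the partitioning can be carried out with scikit-learn's KFold; the mean-predictor baseline below is a stand-in for the actual ANN training.

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.random.rand(1000, 10)   # toy inputs: 10 model parameters per row
    y = np.random.rand(1000)       # toy outputs: call prices

    scores = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True).split(X):
        baseline = y[train_idx].mean()                  # stand-in for model fitting
        mse = np.mean((y[test_idx] - baseline) ** 2)    # evaluate on held-out fold
        scores.append(mse)
    print(np.mean(scores))   # the average of the K results is the performance measure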


Chapter 6

Neural Networks Architecture and Experimental Setup

We start this chapter by discussing the appropriate ANN architecture and experimental setup for our applications. We apply ANN to four different applications: option pricing under the Heston model, Heston implied volatility surface approximation, option pricing under the rough Heston model, and rough Heston implied volatility surface approximation. We first apply ANN to the Heston model as a concept validation and then proceed to the rough Heston model.

Subsequently, we discuss the relevant evaluation metrics and validation methods for our ANN models. Finally, we end this chapter with the implementation steps for ANN in Keras.

6.1 Neural Networks Architecture

Configuring an ANN's architecture is a combination of art and science. It requires experience and systematic experiments to decide the right parameters and architecture for a specific problem, and for a particular problem there might be multiple appropriate designs. To begin with, we use the network architecture of (Horvath et al., 2019) and adjust the hyperparameters where it performs poorly for our applications.


FIGURE 6.1: Typical ANN Architecture for Option Pricing Application (10 input neurons, three hidden layers, one output neuron).


FIGURE 6.2: Typical ANN Architecture for Implied Volatility Surface Approximation (input layer, three hidden layers, 100 output neurons).

6.1.1 ANN Architecture: Option Pricing under Heston Model

An ANN with 3 hidden layers is appropriate for this application. Each hidden layer has 50 neurons, with the exponential linear unit (ELU) as the activation function.

Initially, the rectified linear unit (ReLU) activation function was considered. It is the choice for many applications as it can overcome the vanishing gradient problem whilst being less computationally expensive compared to other activation functions. However, it has some major drawbacks:

- The n-th derivative of ReLU is not continuous for all n > 0.
- ReLU cannot produce negative outputs.
- For activation x < 0, ReLU has zero gradient.

From Theorem 4.2.2, we know that choosing an n-differentiable activation function is necessary for computing the n-th derivatives of the functional mapping. This is crucial for our application because we would be interested in option Greeks computation using the same ANN if it shows promising results for option pricing. For risk management purposes, selecting an activation function for which option Greeks can be calculated is crucial.

The obvious alternative is the ELU (defined in Chapter 4.1.4). For x < 0 it has a smooth output that approaches −α, and for x > 0 the output of ELU is positively unbounded, which suits our application as the values of option price and implied volatility are also, in theory, positively unbounded.

Since our problem is a regression problem, a linear activation function (defined in Chapter 4.1.5) is appropriate for the output layer as its output is unbounded. We discover that a batch size of 200 and 30 epochs yield good accuracy.

To summarise, the ANN model for Heston option pricing is:

- Input Layer: 10 neurons
- Hidden Layer 1: 50 neurons, ELU as activation function
- Hidden Layer 2: 50 neurons, ELU as activation function
- Hidden Layer 3: 50 neurons, ELU as activation function
- Output Layer: 1 neuron, linear function as activation function
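In Keras this architecture takes only a few lines; a minimal sketch using the tensorflow.keras API, with settings other than those listed above left at their defaults:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    model = Sequential([
        Dense(50, activation="elu", input_shape=(10,)),  # hidden layer 1
        Dense(50, activation="elu"),                     # hidden layer 2
        Dense(50, activation="elu"),                     # hidden layer 3
        Dense(1, activation="linear"),                   # call price output
    ])
    model.summary()   # print the model summary to check for errors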

6.1.2 ANN Architecture: Implied Volatility Surface of Heston Model

For approximating the full implied volatility surface of the Heston model, we apply the image-based implicit method (see Chapter 4.4). This implies the output dimension is 100.

We use hidden layers with 80 neurons each. The activation function for the hidden layers is the exponential linear unit (ELU). As before, we use a linear activation function for the output layer. We discover that a batch size of 20 and 200 epochs yield good accuracy. Since the number of epochs is relatively high, we introduce an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

To summarise, the ANN model for Heston implied volatility surface approximation is:

- Input Layer: 6 neurons
- Hidden Layer 1: 80 neurons, ELU as activation function
- Hidden Layer 2: 80 neurons, ELU as activation function
- Hidden Layer 3: 80 neurons, ELU as activation function
- Hidden Layer 4: 80 neurons, ELU as activation function
- Output Layer: 100 neurons, linear function as activation function


6.1.3 ANN Architecture: Option Pricing under Rough Heston Model

For option pricing under the rough Heston model, the ANN model is:

• Input Layer: 9 neurons
• Hidden Layer 1: 50 neurons, ELU as activation function
• Hidden Layer 2: 50 neurons, ELU as activation function
• Hidden Layer 3: 50 neurons, ELU as activation function
• Output Layer: 1 neuron, linear function as activation function

As before, we use a batch size of 200 and 30 epochs.

6.1.4 ANN Architecture: Implied Volatility Surface of Rough Heston Model

The ANN model for rough Heston implied volatility surface approximation is:

• Input Layer: 7 neurons
• Hidden Layer 1: 80 neurons, ELU as activation function
• Hidden Layer 2: 80 neurons, ELU as activation function
• Hidden Layer 3: 80 neurons, ELU as activation function
• Output Layer: 100 neurons, linear function as activation function

We use a batch size of 20 and 200 epochs, with an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

6.2 Performance Metrics and Validation Methods

We require some metrics to evaluate the performance of the ANN in terms of accuracy and speed.

For accuracy, we introduced the MSE and MAE in Chapter 4.1.4. It is also common to use the coefficient of determination (R²) to quantify the performance of a regression model. It is a statistical measure of the proportion of the variance in the dependent variable that is explained by the independent variables. The formula is

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_{i,\text{actual}} - y_{i,\text{predict}})^2}{\sum_{i=1}^{n} (y_{i,\text{actual}} - \bar{y}_{\text{actual}})^2}    (6.1)


We also suggest a user-defined accuracy metric, the Maximum Absolute Error, defined as

\text{Max Abs Error} = \max_i \left| y_{i,\text{actual}} - y_{i,\text{predict}} \right|    (6.2)

This metric provides a measure of how large the error could be in the worst-case scenario. It is especially important from a risk management perspective.
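As an illustration, all four accuracy metrics can be computed directly with NumPy (a minimal sketch; y_actual and y_predict stand for arrays of test targets and ANN predictions):

import numpy as np

def error_metrics(y_actual, y_predict):
    # MSE and MAE as in Chapter 4.1.4
    mse = np.mean((y_actual - y_predict) ** 2)
    mae = np.mean(np.abs(y_actual - y_predict))
    # Maximum Absolute Error, Equation (6.2)
    max_abs_error = np.max(np.abs(y_actual - y_predict))
    # Coefficient of determination, Equation (6.1)
    r2 = 1.0 - np.sum((y_actual - y_predict) ** 2) / np.sum((y_actual - np.mean(y_actual)) ** 2)
    return mse, mae, max_abs_error, r2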

As for the run-time performance of the ANN, we simply use the Python built-in timer function to quantify this property.
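A minimal sketch of such a timing measurement (model and x_test are hypothetical placeholders for a trained network and its test inputs):

import time

start = time.time()
y_pred = model.predict(x_test)  # hypothetical trained model and test inputs
print("Prediction run-time:", time.time() - start, "s")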

In Chapter 5.2.3 we explained the data partitioning process. The motivation for partitioning the data is to perform K-fold cross validation. Cross validation is a statistical technique to assess how well a predictive model generalises to an independent data set. It is a good tool for identifying over-fitting.

As explained earlier, the process involves training and testing different permutations of the partitioned data sets multiple times. The indication of a well-generalised model is that the error magnitudes in each round are consistent with each other and with their average. We select K = 5 for our applications.
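A minimal sketch of this procedure using scikit-learn (X, y and build_model are hypothetical placeholders for the full data set and a function returning a freshly compiled Keras model):

import numpy as np
from sklearn.model_selection import KFold

kfold = KFold(n_splits=5, shuffle=True)
fold_mse = []
for train_idx, test_idx in kfold.split(X):
    model = build_model()  # hypothetical: returns a freshly compiled model each fold
    model.fit(X[train_idx], y[train_idx], batch_size=200, epochs=30, verbose=0)
    fold_mse.append(model.evaluate(X[test_idx], y[test_idx], verbose=0))
print("Per-fold MSE:", fold_mse, "Average:", np.mean(fold_mse))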


6.3 Implementation of ANN in Keras

In this section we outline the standard ANN computation process using Keras:

1. Construct the ANN model by specifying the number of input neurons, the number of hidden neurons, the number of hidden layers, the number of output neurons, and the activation function for each layer. Print the model summary to check for errors.

2. Configure the settings of the model using the compile function. We select the MSE as the loss function, and we specify the MAE, R² and our user-defined maximum absolute error as the metrics. For all of our applications we choose ADAM as the optimisation algorithm. To prevent over-fitting, we use the early stopping function, especially when the number of epochs is high. It is important to note that the validation loss (not the training loss) should be used as the early stopping criterion; this means the training will stop early if there is no reduction in validation loss.

3. Start training the ANN model. It is important to specify the validation split argument here for splitting the training data into a train set and a validation set. Note that the test set should not be used as the validation set; it should be reserved for evaluating the ANN's predictions after training.

4. When the training is complete, generate the plot of the MSE (loss function) versus the number of epochs to visualise the training process. The MSE should decline gradually and then level off as the number of epochs increases. See Chapter 4.3 for potential training issues and remedies.

5. Apply 5-fold cross validation to check for over-fitting. Evaluate the trained model using the evaluate function.

6. Generate predictions. Visualise the predictions by plotting predicted values versus actual values on the same graph.

See Appendix C for a full Python code example; a short sketch of steps 3 to 6 follows.
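Appendix C covers the construction and compilation steps (1 and 2). A minimal sketch of steps 3 to 6, assuming model and earlystop are defined as in Appendix C and x_train, y_train, x_test, y_test are the pre-processed data sets:

# Step 3: train with a validation split; the test set stays untouched
history = model.fit(x_train, y_train,
                    batch_size=20, epochs=200,
                    validation_split=0.2,
                    callbacks=[earlystop])

# Step 4: plot the learning curves
import matplotlib.pyplot as plt
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch'); plt.ylabel('MSE'); plt.legend(); plt.show()

# Steps 5-6: evaluate on unseen data and generate predictions
print(model.evaluate(x_test, y_test, verbose=0))
y_pred = model.predict(x_test)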


6.4 Summary of Neural Networks Architecture

TABLE 6.1 Summary of Neural Networks Architecture for Each Application

ANN Model | Application | Input Layer | Hidden Layers | Output Layer | Batch Size | Epochs
Heston Option Pricing ANN | Heston Call Option Pricing | 10 neurons | 3 layers × 50 neurons | 1 neuron | 200 | 30
Heston Implied Volatility ANN | Heston Implied Volatility Surface | 6 neurons | 3 layers × 80 neurons | 100 neurons | 20 | 200
Rough Heston Option Pricing ANN | Rough Heston Call Option Pricing | 9 neurons | 3 layers × 50 neurons | 1 neuron | 200 | 30
Rough Heston Implied Volatility ANN | Rough Heston Implied Volatility Surface | 7 neurons | 3 layers × 80 neurons | 100 neurons | 20 | 200


Chapter 7

Results and Discussions

In this chapter we present our results and discuss our findings.

7.1 Heston Option Pricing ANN Results

Figure 7.1 shows the learning curves of the Heston Option Pricing ANN. The loss function used here is the MSE. As the ANN is trained, both the training loss and validation loss curves decline until they converge. It took fewer than 30 epochs for the ANN to reach a satisfactory level of accuracy. There is only a small gap between the training loss and the validation loss, which indicates the ANN model is well trained.

FIGURE 7.1 Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN

Table 7.1 summarises the error metrics of the Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 2.00 × 10⁻⁶, 9.58 × 10⁻⁴, 6.50 × 10⁻³ and 0.999 respectively. The low error metrics and high R² value imply the ANN model generalises well and gives highly accurate predictions for option prices


under the Heston model. Figure 7.2 attests to its high accuracy: almost all of the predictions lie on the perfect-prediction line (red line).

TABLE 7.1 Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data

MSE | MAE | Max Abs Error | R²
2.00 × 10⁻⁶ | 9.58 × 10⁻⁴ | 6.50 × 10⁻³ | 0.999

FIGURE 7.2 Plot of Predicted vs Actual Values for Heston Option Pricing ANN

TABLE 7.2 5-fold Cross Validation Results of Heston Option Pricing ANN

        | MSE         | MAE         | Max Abs Error
Fold 1  | 5.78 × 10⁻⁷ | 5.45 × 10⁻⁴ | 2.04 × 10⁻³
Fold 2  | 9.90 × 10⁻⁷ | 7.69 × 10⁻⁴ | 2.40 × 10⁻³
Fold 3  | 7.15 × 10⁻⁷ | 6.21 × 10⁻⁴ | 2.20 × 10⁻³
Fold 4  | 1.24 × 10⁻⁶ | 8.67 × 10⁻⁴ | 2.53 × 10⁻³
Fold 5  | 6.54 × 10⁻⁷ | 5.79 × 10⁻⁴ | 2.21 × 10⁻³
Average | 8.36 × 10⁻⁷ | 6.76 × 10⁻⁴ | 2.27 × 10⁻³

Table 7.2 shows the results from 5-fold cross validation, which again confirms that the ANN model gives accurate predictions and generalises well to unseen data. The values of each fold are consistent with each other and with their averages.


7.2 Heston Implied Volatility ANN Results

Figure 7.3 shows the learning curves of the Heston Implied Volatility ANN. Again, we use the MSE as the loss function. We can see that the validation loss curve experiences some fluctuations during training; we experimented with various ANN setups, and this is the least fluctuating learning curve we could achieve. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level, and the gap between validation loss and training loss is negligible. Due to the comparatively more complex ANN architecture, it took about 200 epochs for the ANN to reach a satisfactory level of accuracy.

FIGURE 7.3 Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN

Table 7.3 summarises the error metrics of the Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 6.24 × 10⁻⁴, 1.27 × 10⁻², 3.71 × 10⁻¹ and 0.999 respectively. The low error metrics and high R² value imply the ANN model generalises well and gives accurate predictions for the implied volatility surface under the Heston model. Figure 7.4 shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.3 Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data

MSE | MAE | Max Abs Error | R²
6.24 × 10⁻⁴ | 1.27 × 10⁻² | 3.71 × 10⁻¹ | 0.999


FIGURE 7.4 Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.5 Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.5 shows the MAE values of the Heston implied volatility ANN predictions across the entire implied volatility surface. The largest MAE value is 0.0045, and a large area of the surface has MAE values lower than 0.0030. It is worth noting that the accuracy is relatively poor when the strike price is at the extremes and the maturity is small. However, the ANN gives good predictions overall for the whole surface.

The 5-fold cross validation results for the Heston implied volatility ANN model show consistent values. The results are summarised in Table 7.4. The consistent values indicate the ANN model is well trained and is able to generalise to unseen test data.

TABLE 7.4 5-fold Cross Validation Results of Heston Implied Volatility ANN

        | MSE         | MAE         | Max Abs Error
Fold 1  | 8.15 × 10⁻⁴ | 1.13 × 10⁻² | 3.88 × 10⁻¹
Fold 2  | 4.74 × 10⁻⁴ | 1.08 × 10⁻² | 3.13 × 10⁻¹
Fold 3  | 1.37 × 10⁻³ | 1.74 × 10⁻² | 4.46 × 10⁻¹
Fold 4  | 3.38 × 10⁻⁴ | 8.61 × 10⁻³ | 2.19 × 10⁻¹
Fold 5  | 4.60 × 10⁻⁴ | 1.02 × 10⁻² | 2.67 × 10⁻¹
Average | 6.92 × 10⁻⁴ | 1.12 × 10⁻² | 3.27 × 10⁻¹

7.3 Rough Heston Option Pricing ANN Results

Figure 7.6 shows the learning curves of the Rough Heston Option Pricing ANN. As before, the MSE is used as the loss function. As training progresses, both the training loss and validation loss curves decline until they converge. Since this is a relatively simple


model, it took fewer than 30 epochs of training to reach a good level of accuracy. The small discrepancy between the training loss and the validation loss indicates the ANN model is well trained.

FIGURE 7.6 Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN

Table 7.5 summarises the error metrics of the Rough Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 9.00 × 10⁻⁶, 2.28 × 10⁻³, 1.76 × 10⁻¹ and 0.999 respectively. These metrics imply the ANN model generalises well and gives highly accurate predictions for option prices under the rough Heston model. Figure 7.7 again attests to its high accuracy: most of the predictions lie on the perfect-prediction line (red line).

TABLE 7.5 Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data

MSE | MAE | Max Abs Error | R²
9.00 × 10⁻⁶ | 2.28 × 10⁻³ | 1.76 × 10⁻¹ | 0.999


FIGURE 7.7 Plot of Predicted vs Actual Values for Rough Heston Option Pricing ANN

TABLE 7.6 5-fold Cross Validation Results of Rough Heston Option Pricing ANN

        | MSE         | MAE         | Max Abs Error
Fold 1  | 3.79 × 10⁻⁶ | 1.35 × 10⁻³ | 5.99 × 10⁻³
Fold 2  | 2.33 × 10⁻⁶ | 9.96 × 10⁻⁴ | 4.89 × 10⁻³
Fold 3  | 2.06 × 10⁻⁶ | 9.88 × 10⁻⁴ | 4.41 × 10⁻³
Fold 4  | 2.16 × 10⁻⁶ | 1.08 × 10⁻³ | 4.07 × 10⁻³
Fold 5  | 1.84 × 10⁻⁶ | 1.01 × 10⁻³ | 3.74 × 10⁻³
Average | 2.44 × 10⁻⁶ | 1.08 × 10⁻³ | 4.62 × 10⁻³

The highly consistent values from 5-fold cross validation again confirm that the ANN model is able to generalise to unseen data.

7.4 Rough Heston Implied Volatility ANN Results

Figure 7.8 shows the learning curves of the Rough Heston Implied Volatility ANN. We can see that the validation loss curve experiences some fluctuations during training; again, we select the least fluctuating setup. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level, and the discrepancy between validation loss and training loss is negligible. Initially the training was set to 200 epochs, but it stopped early at around 85 epochs since the validation loss had not improved for 25 epochs. It achieves good accuracy nonetheless.


FIGURE 7.8 Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN

Table 7.7 summarises the error metrics of the Rough Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 5.66 × 10⁻⁴, 1.08 × 10⁻², 3.49 × 10⁻¹ and 0.999 respectively. These metrics show the ANN model can produce accurate predictions on unseen data. This is again confirmed by Figure 7.9, which shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.7 Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data

MSE | MAE | Max Abs Error | R²
5.66 × 10⁻⁴ | 1.08 × 10⁻² | 3.49 × 10⁻¹ | 0.999


FIGURE 7.9 Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.10 Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.10 shows the MAE values of the Rough Heston Implied Volatility ANN predictions across the entire implied volatility surface. The largest MAE value is below 0.0030, and a large area of the surface has MAE values lower than 0.0015. It is worth noting that the accuracy is relatively poor when the strike price and maturity values are small. However, the ANN gives good predictions overall for the whole surface. In fact, it is more accurate than the Heston implied volatility ANN model despite having one extra input and a similar architecture. We attribute this entirely to the stochastic nature of the optimiser.

The 5-fold cross validation results for the rough Heston implied volatility ANN model show consistent values, just like the previous three cases. The results are summarised in Table 7.8. The consistency of the values indicates the ANN model is well trained and is able to generalise to unseen test data.

TABLE 7.8 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN

        | MSE         | MAE         | Max Abs Error
Fold 1  | 8.15 × 10⁻⁴ | 1.13 × 10⁻² | 3.88 × 10⁻¹
Fold 2  | 4.74 × 10⁻⁴ | 1.08 × 10⁻² | 3.13 × 10⁻¹
Fold 3  | 1.37 × 10⁻³ | 1.74 × 10⁻² | 4.46 × 10⁻¹
Fold 4  | 3.38 × 10⁻⁴ | 8.61 × 10⁻³ | 2.19 × 10⁻¹
Fold 5  | 4.60 × 10⁻⁴ | 1.02 × 10⁻² | 2.67 × 10⁻¹
Average | 6.92 × 10⁻⁴ | 1.12 × 10⁻² | 3.27 × 10⁻¹


7.5 Run-time Performance

In this section we present the training time and the execution speed of the ANN versus traditional methods. The speed tests are run on a machine with an Intel i5-7200U chip (up to 3.10 GHz) and 8 GB of RAM.

Table 7.9 summarises the training time required for each application. Although the training time per epoch for the option pricing applications is higher than that for the implied volatility applications, the resulting total training time is similar for both types of application, because the number of epochs for training the implied volatility ANN models is higher.

We also emphasise that fewer distinct rows of data are available for implied volatility ANN training. Thus the training time is similar to that of the option pricing ANN training despite the more complex ANN architecture.

TABLE 7.9 Training Time

Application | Training Time per Epoch | Total Training Time
Heston Option Pricing | 30 s | 900 s
Heston Implied Volatility | 5 s | 1000 s
rHeston Option Pricing | 30 s | 900 s
rHeston Implied Volatility | 5 s | 1000 s

TABLE 7.10 Execution Time for Predictions using ANN and Traditional Methods

Application | Traditional Methods | ANN
Heston Option Pricing | 8.84 ms | 57.9 ms
Heston Implied Volatility | 2.31 ms | 43.7 ms
rHeston Option Pricing | 6.52 ms | 52.6 ms
rHeston Implied Volatility | 2.87 ms | 48.3 ms

Table 7.10 summarises the execution time for generating predictions using the ANN and traditional methods. Before the experiment we expected the ANN to generate predictions faster. However, the results show that the ANN is actually up to 7 times slower than the traditional methods for option pricing (both Heston and rough Heston), and up to 18 times slower than the traditional methods for implied volatility surface approximation (both Heston and rough Heston). This shows the traditional approximation methods are more efficient.


Chapter 8

Conclusions and Outlook

To summarise, the results show that the ANN has the ability to approximate option prices and implied volatilities under the Heston and rough Heston models to a high degree of accuracy. However, the efficiency of the ANN is not as good as that of the traditional methods. Hence we conclude that such an approach may be considered an ineffective alternative: there are more efficient methods, such as the numerical integration of the FFT, for option price computation, and the traditional method we used for approximating implied volatility does not even require any numerical schemes, hence its efficiency.

As for future projects, it would be interesting to investigate the proven rough Bergomi model and applications with exotic options such as lookback and digital options. Moreover, it would also be worthwhile to apply gradient-free optimisers for ANN training; some interesting examples include the differential evolution, dual annealing and simplicial homology global optimisers.


Appendix A

pybind11: A step-by-step tutorial with examples

This tutorial assumes basic knowledge of C++ and Python.

Sources: Create a C++ extension for Python; Using the C++ eigen library to calculate matrix inverse and determinant; pybind11 Documentation; Eigen: The Matrix Class.

A.1 Software Prerequisites

• Visual Studio 2017 or later, with both the Desktop Development with C++ and Python Development workloads installed with default options.

• In the Python Development workload, also select the box on the right for Python native development tools. This option sets up most of the configuration required. (This option also includes the C++ workload automatically.)

Source: Create a C++ extension for Python.


A.2 Create a Python Project

1. To create a Python project in Visual Studio, select File > New > Project. Search for Python, select the Python Application template, give it a suitable name and location, and select OK.

2. pybind11 requires that you use a 32-bit Python interpreter (Python 3.6 or above recommended). In the Solution Explorer window of Visual Studio, expand the project node, then expand the Python Environments node. If you don't see a 32-bit environment as the default (either in bold or labeled with global default), then right-click the Python Environments node and select Add Environment, or select Add Environment from the environment drop-down in the Python toolbar.

Once in the Add Environment dialog box, select the Existing environment tab, then select the 32-bit interpreter from the Environment drop-down list.

If you don't have a 32-bit interpreter installed, you can install standard Python interpreters from the Add Environment dialog. Select the Add Environment command in the Python Environments window or the Python toolbar, select the Python installation tab, indicate which interpreters to install, and select Install.


If you already added an environment other than the global default to a project, you may need to activate a newly added environment. Right-click that environment under the Python Environments node and select Activate Environment. To remove an environment from the project, select Remove.


A.3 Create a C++ Project

1. A Visual Studio solution can contain both Python and C++ projects together. To do so, right-click the solution in Solution Explorer and select Add > New Project.

2. Search on C++, select Empty project, specify the name test, and select OK.

3. Create a C++ file in the new project by right-clicking the Source Files node, then select Add > New Item, select C++ File, name it main.cpp, and select OK.

4. Right-click the C++ project in Solution Explorer and select Properties.

5. At the top of the Property Pages dialog that appears, set Configuration to All Configurations and Platform to Win32.


6. Set the specific properties as described in the following table, then select OK.

7. Right-click the C++ project and select Build to test your configurations (both Debug and Release). The .pyd files are located in the solution folder under Debug and Release, not the C++ project folder itself.

8. Add the following code to the C++ project's main.cpp file:

#include <iostream>

int main() {
    std::cout << "Hello World from C++" << std::endl;
    return 0;
}

9. Build the C++ project again to confirm the code is working.

A.4 Convert the C++ project to an extension for Python

To make the C++ DLL into an extension for Python, first modify the exported methods to interact with Python types. Then add a function that exports the module (the main function).

1. Install pybind11 using pip:

pip install pybind11

or

py -m pip install pybind11

2. At the top of main.cpp, include pybind11.h:

#include <pybind11/pybind11.h>


3. At the bottom of main.cpp, use the PYBIND11_MODULE macro to define the entry point to the C++ function:

namespace py = pybind11;

PYBIND11_MODULE(test, m) {
    m.def("main_func", &main, R"pbdoc(
        pybind11 simple example
    )pbdoc");

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

4. Set the target configuration to Release and build the C++ project to verify your code.

The C++ module may fail to compile for the following reasons:

• Unable to locate Python.h (E1696: cannot open source file "Python.h" and/or C1083: Cannot open include file: "Python.h": No such file or directory): verify that the path in C/C++ > General > Additional Include Directories in the project properties points to your Python installation's include folder. See the table in Step 6.

• Unable to locate Python libraries: verify that the path in Linker > General > Additional Library Directories in the project properties points to the Python installation's libs folder. See the table in Step 6.

• Linker errors related to target architecture: change the C++ target's project architecture to match that of your Python installation. For example, if you're targeting x64 with the C++ project but your Python installation is x86, change the C++ project to target x86.

A.5 Make the DLL available to Python

1. In Solution Explorer, right-click the References node in the Python project, and then select Add Reference. In the dialog that appears, select the Projects tab, select the test project, and then select OK.


2. Run the Visual Studio installer, select Modify, then select Individual Components > Compilers, build tools, and runtimes > Visual C++ 2015.3 v140 toolset. This step is necessary because Python (for Windows) is itself built with Visual Studio 2015 (version 14.0) and expects those tools to be available when building an extension through the method described here. (Note that you may need to install a 32-bit version of Python and target the DLL to Win32, not x64.)


3. Create a file named setup.py in the C++ project by right-clicking the project and selecting Add > New Item. Then select C++ File (.cpp), name the file setup.py, and select OK (naming the file with the .py extension makes Visual Studio recognize it as Python despite using the C++ file template).

Then paste the following code into the setup.py file:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'pybind11_test',
    sources=['main.cpp'],
    # add pybind11 folder and Python include folder
    include_dirs=[r'pybind11\include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='pybind11_test',
    version='1.0',
    description='Simple pybind11 example',
    license='MIT',
    ext_modules=[sfc_module],
)

4. The setup.py code instructs Python to build the extension using the Visual Studio 2015 C++ toolset when used from the command line.

Open an elevated command prompt, navigate to the folder that contains setup.py, and enter the following command:

python setup.py install

A.6 Call the DLL from Python

After the DLL is made available to Python, we can now call test.main_func() from Python code.

1. Add the following lines in your .py file to call methods exported from the DLL:


import test

if __name__ == "__main__":
    test.main_func()

2. Run the Python program (Debug > Start without Debugging). The output should look like the following:

A.7 Example: Store C++ Eigen Matrix as NumPy Arrays

This example requires the use of Eigen, a C++ template library. Due to its popularity, pybind11 provides transparent conversion and limited mapping support between Eigen and scientific Python linear algebra data types.

1. If you have not already, visit http://eigen.tuxfamily.org/index.php?title=Main_Page and click on zip to download the zip file.

Once the download is complete, decompress the zip file in a suitable location. Note down the file path containing the folders as shown below.


2. In the Solution Explorer, right-click the C++ project and select Properties. Navigate to the C/C++ > General tab, add the Eigen path to the Additional Include Directories property, then select OK.

3. Right-click the C++ project and select Build to test your configurations (both Debug and Release).

4. Replace the code in the main.cpp file with the following code:

#include <iostream>

// pybind11
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>

// Eigen
#include <Eigen/Dense>

namespace py = pybind11;

Eigen::Matrix3f example_mat(const int a, const int b, const int c) {
    Eigen::Matrix3f mat;
    mat << 5 * a, 1 * b, 2 * c,
           2 * a, 4 * b, 2 * c,
           1 * a, 3 * b, 3 * c;
    return mat;
}

Eigen::MatrixXd inv(Eigen::MatrixXd xs) {
    return xs.inverse();
}

double det(Eigen::MatrixXd xs) {
    return xs.determinant();
}

PYBIND11_MODULE(test, m) {
    m.def("example_mat", &example_mat);
    m.def("inv", &inv);
    m.def("det", &det);

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

Note that Eigen arrays are automatically converted to NumPy arrays simply by including the pybind11/eigen.h header (see the include near the top of the code in step 4). A C++ function with an Eigen return type can be used directly as a NumPy data type in Python. Build the C++ project to check the code is working correctly.

5. Include the Eigen library's directory. Replace the code in setup.py with the following code:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'pybind11_test',
    sources=['main.cpp'],
    # add pybind11 folder, Python include folder and Eigen folder
    include_dirs=[r'pybind11\include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                  r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='pybind11_test',
    version='1.0',
    description='Simple pybind11 example',
    license='MIT',
    ext_modules=[sfc_module],
)

As before, in an elevated command prompt, navigate to the folder that contains setup.py and enter the following command:

python setup.py install

6. Then paste the following code in the Python file:

import test
import numpy as np

if __name__ == "__main__":
    A = test.example_mat(1, 3, 1)
    print("Matrix A is\n", A)
    print("The determinant of A is\n", test.det(A))
    print("The inverse of A is\n", test.inv(A))


The output should be:

A.8 Example: Store Data Generated by Analytical Heston Model as NumPy Arrays

Only the relevant part of the code is displayed here. To request the code for the entire project, contact the author at cko22@bath.edu.

This example demonstrates how to use pybind11 for:

• a C++ project with multiple files;

• storing a C++ Eigen matrix as NumPy arrays.

1. For a multiple-file C++ project, we need to include the source files in setup.py. This example also uses the Eigen library, so its directory will also be included. Paste the following code in setup.py:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'Data_Generation',
    sources=[r'module.cpp', r'Data.cpp', r'Heston.cpp', r'IntegrationScheme.cpp',
             r'IntegratorImp.cpp', r'NumIntegrator.cpp', r'pdetfunction.cpp',
             r'pfunction.cpp', r'Range.cpp'],
    include_dirs=[r'pybind11\include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                  r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='Data_Generation',
    version='1.0',
    description='Python package with Data_Generation C++ extension (PyBind11)',
    license='MIT',
    ext_modules=[sfc_module],
)

Then, go to the folder that contains setup.py in a command prompt and enter the following command:

python setup.py install

2. The code in module.cpp is:

// For analytic solution in case of Heston model
#include "Heston.hpp"
#include <ctime>
#include "Data.hpp"
#include <future>

// Inputting and Outputting
#include <iostream>
#include <vector>

// pybind11
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>
#include <pybind11/stl_bind.h>

// Eigen
#include <Eigen/Dense>

namespace py = pybind11;

struct CPPData {
    // Input Data
    std::vector<double> spot_price;
    std::vector<double> strike_price;
    std::vector<double> risk_free_rate;
    std::vector<double> dividend_yield;
    std::vector<double> initial_vol;
    std::vector<double> maturity_time;
    std::vector<double> long_term_vol;
    std::vector<double> mean_reversion_rate;
    std::vector<double> vol_vol;
    std::vector<double> price_vol_corr;

    // Output Data
    std::vector<double> option_price;
};

// ... do something ...

Eigen::MatrixXd module() {
    CPPData cppdata;
    simulation(cppdata);

    // Convert each std::vector to an Eigen vector first
    // Input Data
    Eigen::Map<Eigen::VectorXd> eigen_spot_price(cppdata.spot_price.data(), cppdata.spot_price.size());
    Eigen::Map<Eigen::VectorXd> eigen_strike_price(cppdata.strike_price.data(), cppdata.strike_price.size());
    Eigen::Map<Eigen::VectorXd> eigen_risk_free_rate(cppdata.risk_free_rate.data(), cppdata.risk_free_rate.size());
    Eigen::Map<Eigen::VectorXd> eigen_dividend_yield(cppdata.dividend_yield.data(), cppdata.dividend_yield.size());
    Eigen::Map<Eigen::VectorXd> eigen_initial_vol(cppdata.initial_vol.data(), cppdata.initial_vol.size());
    Eigen::Map<Eigen::VectorXd> eigen_maturity_time(cppdata.maturity_time.data(), cppdata.maturity_time.size());
    Eigen::Map<Eigen::VectorXd> eigen_long_term_vol(cppdata.long_term_vol.data(), cppdata.long_term_vol.size());
    Eigen::Map<Eigen::VectorXd> eigen_mean_reversion_rate(cppdata.mean_reversion_rate.data(), cppdata.mean_reversion_rate.size());
    Eigen::Map<Eigen::VectorXd> eigen_vol_vol(cppdata.vol_vol.data(), cppdata.vol_vol.size());
    Eigen::Map<Eigen::VectorXd> eigen_price_vol_corr(cppdata.price_vol_corr.data(), cppdata.price_vol_corr.size());
    // Output Data
    Eigen::Map<Eigen::VectorXd> eigen_option_price(cppdata.option_price.data(), cppdata.option_price.size());

    const int cols{ 11 };        // No. of columns
    const int rows = SIZE / 4;
    // Define a matrix that stores training data to be used in Python
    Eigen::MatrixXd training(rows, cols);
    // Initialise each column using the vectors
    training.col(0) = eigen_spot_price;
    training.col(1) = eigen_strike_price;
    training.col(2) = eigen_risk_free_rate;
    training.col(3) = eigen_dividend_yield;
    training.col(4) = eigen_initial_vol;
    training.col(5) = eigen_maturity_time;
    training.col(6) = eigen_long_term_vol;
    training.col(7) = eigen_mean_reversion_rate;
    training.col(8) = eigen_vol_vol;
    training.col(9) = eigen_price_vol_corr;
    training.col(10) = eigen_option_price;

    std::cout << training << std::endl;

    return training;
}

PYBIND11_MODULE(Data_Generation, m) {
    m.def("module", &module);

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

Note that the module function's return type is an Eigen matrix. This can then be used directly as a NumPy data type in Python.

3. The C++ function can then be called from Python as shown:

import Data_Generation as dg
import mysql.connector
import pandas as pd
import sqlalchemy as db
import time

# This script calls the analytical Heston option pricer in C++ to simulate
# 5,000,000 option prices and store them in a MySQL database

# ... do something ...

if __name__ == "__main__":
    t0 = time.time()

    # A new table (optional)
    SQL_clean_table()

    target_size = 5000000   # Final table size
    batch_size = 500000     # Size generated each iteration

    # Get the number of rows already in the table
    r = SQL_setup(batch_size)

    # Get the number of iterations
    iter = int(target_size / batch_size) - int(r / batch_size)
    print("Iteration(s) required: ", iter)
    count = 0

    print("Now Run C++ code ...")
    for i in range(iter):
        count = count + 1
        print("----------------------------------------------------")
        print("Current iteration: ", count)
        cpp = dg.module()   # store data from C++

        # ... do something ...

        r1 = SQL_setup(batch_size)
        print("No. of rows in SQL Classic Heston Table Now: ", r1)
        print("\n")
    t1 = time.time()
    # Calculate the total time for the entire program
    total = t1 - t0
    print("THE ENTIRE PROGRAM TAKES", round(total), "s TO COMPLETE")


Appendix B

Asymptotic Expansions for General Stochastic Volatility Models

B.1 Asymptotic Expansions

We present the asymptotic expansions for general stochastic volatility models by (Lorig et al., 2019). Consider a strictly positive spot price S whose risk-neutral dynamics are given by

S = e^X,    (B.1)

dX = -\tfrac{1}{2}\sigma^2(t, X, Y)\,dt + \sigma(t, X, Y)\,dW_1, \qquad X(0) = x \in \mathbb{R},    (B.2)

dY = f(t, X, Y)\,dt + \beta(t, X, Y)\,dW_2, \qquad Y(0) = y \in \mathbb{R},    (B.3)

\langle dW_1, dW_2 \rangle = \rho(t, X, Y)\,dt, \qquad |\rho| < 1.    (B.4)

The drift of X is selected to be -\tfrac{1}{2}\sigma^2 so that S = e^X is a martingale.

Let C(t) be the time-t value of a European call option maturing at time T > t with payoff function P(X(T)). To value a European-style option using risk-neutral pricing, we must compute functions of the form

u(t, x, y) = \mathbb{E}[P(X(T)) \mid X(t) = x, Y(t) = y].

It is well known that the function u satisfies the Kolmogorov backward equation (Lorig et al., 2019)

(\partial_t + \mathcal{A}(t))\,u(t, x, y) = 0, \qquad u(T, x, y) = P(x),    (B.5)

where the operator A(t) is given explicitly by

\mathcal{A}(t) = a(t, x, y)(\partial_x^2 - \partial_x) + f(t, x, y)\,\partial_y + b(t, x, y)\,\partial_y^2 + c(t, x, y)\,\partial_x \partial_y,    (B.6)

and where the functions a, b and c are defined as

a(t, x, y) = \tfrac{1}{2}\sigma^2(t, x, y),

b(t, x, y) = \tfrac{1}{2}\beta^2(t, x, y),

c(t, x, y) = \rho(t, x, y)\,\sigma(t, x, y)\,\beta(t, x, y).


Appendix C

Deep Neural Networks Library in Python: Keras

We show the code snippet for computing the ANN in Keras: the Python code for defining the ANN architecture for rough Heston implied volatility approximation.

# Define the neural network architecture
import keras
from keras import backend as K
keras.backend.set_floatx('float64')
from keras.callbacks import EarlyStopping

# Construct the model
# n = number of input features (7 for the rough Heston implied volatility ANN)
input_layer = keras.layers.Input(shape=(n,))

hidden_layer1 = keras.layers.Dense(80, activation='elu')(input_layer)
hidden_layer2 = keras.layers.Dense(80, activation='elu')(hidden_layer1)
hidden_layer3 = keras.layers.Dense(80, activation='elu')(hidden_layer2)

output_layer = keras.layers.Dense(100, activation='linear')(hidden_layer3)

model = keras.models.Model(inputs=input_layer, outputs=output_layer)

# Summarise layers
model.summary()

# User-defined metrics: R2 and Max Abs Error
# R2
def coeff_determination(y_true, y_pred):
    SS_res = K.sum(K.square(y_true - y_pred))
    SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return 1 - SS_res / (SS_tot + K.epsilon())

# Max Error
def max_error(y_true, y_pred):
    return K.max(abs(y_true - y_pred))

earlystop = EarlyStopping(monitor='val_loss',
                          min_delta=0,
                          mode='min',
                          verbose=1,
                          patience=25)

# Prepare the model for training
opt = keras.optimizers.Adam(learning_rate=0.005)
model.compile(loss='mse', optimizer=opt,
              metrics=['mae', coeff_determination, max_error])


Bibliography

Alos, Elisa et al. (June 2017). "Exponentiation of Conditional Expectations Under Stochastic Volatility". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2983180.

Andersen, Leif et al. (Jan. 2007). "Moment Explosions in Stochastic Volatility Models". In: Finance and Stochastics 11, pp. 29-50. DOI: 10.1007/s00780-006-0011-7.

Atkinson, Colin et al. (Jan. 2011). "Rational Solutions for the Time-Fractional Diffusion Equation". URL: https://www.researchgate.net/publication/220222966_Rational_Solutions_for_the_Time-Fractional_Diffusion_Equation.

Bayer, Christian et al. (Oct. 2018). "Deep calibration of rough stochastic volatility models". URL: https://arxiv.org/abs/1810.03399.

Black, Fischer et al. (May 1973). "The Pricing of Options and Corporate Liabilities". URL: https://www.cs.princeton.edu/courses/archive/fall09/cos323/papers/black_scholes73.pdf.

Carr, Peter and Dilip B. Madan (Mar. 1999). "Option valuation using the fast Fourier transform". In: Journal of Computational Finance. URL: https://www.researchgate.net/publication/2519144_Option_Valuation_Using_the_Fast_Fourier_Transform.

Chapter 2 Modeling Process: k-fold cross validation. https://bradleyboehmke.github.io/HOML/process.html. Accessed: 2020-08-22.

Comte, Fabienne et al. (Oct. 1998). "Long Memory In Continuous-Time Stochastic Volatility Models". In: Mathematical Finance 8, pp. 291-323. URL: https://zulfahmed.files.wordpress.com/2015/10/comterenault19981.pdf.

Cox, John C. et al. (Jan. 1985). "A Theory of the Term Structure of Interest Rates". URL: http://pages.stern.nyu.edu/~dbackus/BCZ/discrete_time/CIR_Econometrica_85.pdf.

Create a C++ extension for Python. https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019. Accessed: 2020-07-21.

Duffy, Daniel J. (Jan. 2018). Financial Instrument Pricing Using C++, Second Edition. John Wiley & Sons. ISBN: 978-0-470-97119-2.

Duffy, Daniel J. et al. (2012). Monte Carlo Frameworks: Building customisable high-performance C++ applications. John Wiley & Sons, Inc. ISBN: 9780470060698.

Dupire, Bruno (1994). "Pricing with a Smile". In: Risk Magazine, pp. 18-20.

Eigen: The Matrix Class. https://eigen.tuxfamily.org/dox/group__TutorialMatrixClass.html. Accessed: 2020-07-22.

Eldan, Ronen et al. (May 2016). "The Power of Depth for Feedforward Neural Networks". URL: https://arxiv.org/abs/1512.03965.

Euch, Omar El et al. (Sept. 2016). "The characteristic function of rough Heston models". URL: https://arxiv.org/pdf/1609.02108.pdf.

— (Mar. 2019). "Roughening Heston". URL: https://ssrn.com/abstract=3116887.

Fouque, Jean-Pierre et al. (Aug. 2012). "Second Order Multiscale Stochastic Volatility Asymptotics: Stochastic Terminal Layer Analysis & Calibration". URL: https://arxiv.org/abs/1208.5802.

Gatheral, Jim (2006). The volatility surface: a practitioner's guide. John Wiley & Sons, Inc. ISBN: 978-0-471-79251-2.

Gatheral, Jim et al. (Oct. 2014). "Volatility is rough". URL: https://arxiv.org/abs/1410.3394.

— (Jan. 2019). "Rational approximation of the rough Heston solution". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3191578.

Hebb, Donald O. (1949). The Organization of Behavior: A Neuropsychological Theory. Psychology Press, 1st edition (12 Jun. 2002). ISBN: 978-0805843002.

Heston, Steven (Jan. 1993). "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options". URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.3204&rep=rep1&type=pdf.

Hinton, Geoffrey et al. (Feb. 2015). "Overview of mini-batch gradient descent". URL: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf.

Hornik et al. (Mar. 1989). "Multilayer feedforward networks are universal approximators". URL: https://deeplearning.cs.cmu.edu/F20/document/readings/Hornik_Stinchcombe_White.pdf.

— (Jan. 1990). "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks". URL: http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/univ_approx.pdf.

Horvath, Blanka et al. (Jan. 2019). "Deep Learning Volatility". URL: https://ssrn.com/abstract=3322085.

Hurst, Harold E. (Jan. 1951). "Long-term storage of reservoirs: an experimental study". URL: https://www.jstor.org/stable/2982267#metadata_info_tab_contents.

Hutchinson, James M. et al. (July 1994). "A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks". URL: https://alo.mit.edu/wp-content/uploads/2015/06/A-Nonparametric-Approach-to-Pricing-and-Hedging-Derivative-Securities-via-Learning-Networks.pdf.

Ioffe, Sergey et al. (Feb. 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". URL: https://arxiv.org/abs/1502.03167.

Itkin, Andrey (Jan. 2010). "Pricing options with VG model using FFT". URL: https://arxiv.org/abs/physics/0503137.

Kahl, Christian et al. (Jan. 2006). "Not-so-complex Logarithms in the Heston model". URL: http://www2.math.uni-wuppertal.de/~kahl/publications/NotSoComplexLogarithmsInTheHestonModel.pdf.

Kingma, Diederik P. et al. (Feb. 2015). "Adam: A Method for Stochastic Optimization". URL: https://arxiv.org/abs/1412.6980.

Kwok, YueKuen et al. (2012). Handbook of Computational Finance, Chapter 21. Springer-Verlag Berlin Heidelberg. ISBN: 978-3-642-17254-0.

Lewis, Alan (Sept. 2001). "A Simple Option Formula for General Jump-Diffusion and Other Exponential Levy Processes". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=282110.

Liu, Shuaiqiang et al. (Apr. 2019a). "A neural network-based framework for financial model calibration". URL: https://arxiv.org/abs/1904.10523.

— (Apr. 2019b). "Pricing options and computing implied volatilities using neural networks". URL: https://arxiv.org/abs/1901.08943.

Lorig, Matthew et al. (Jan. 2019). "Explicit implied vols for multifactor local-stochastic vol models". URL: https://arxiv.org/abs/1306.5447v3.

Mandara, Dalvir (Sept. 2019). "Artificial Neural Networks for Black-Scholes Option Pricing and Prediction of Implied Volatility for the SABR Stochastic Volatility Model". URL: https://www.datasim.nl/application/files/8115/7045/4929/1423101.pdf.

McCulloch, Warren et al. (May 1943). "A Logical Calculus of The Ideas Immanent in Nervous Activity". URL: http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf.

Mostafa, F. et al. (May 2008). "A neural network approach to option pricing". URL: https://www.witpress.com/elibrary/wit-transactions-on-information-and-communication-technologies/41/18904.

Peters, Edgar E. (Jan. 1991). Chaos and Order in the Capital Markets: A New View of Cycles, Prices, and Market Volatility. John Wiley & Sons. ISBN: 978-0-471-13938-6.

— (Jan. 1994). Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons. ISBN: 978-0-471-58524-4. URL: https://www.amazon.co.uk/Fractal-Market-Analysis-Investment-Economics/dp/0471585246.

pybind11 Documentation. https://pybind11.readthedocs.io/en/stable/advanced/cast/eigen.html. Accessed: 2020-07-22.

Ruf, Johannes et al. (May 2020). "Neural networks for option pricing and hedging: a literature review". URL: https://arxiv.org/abs/1911.05620.

Schmelzle, Martin (Apr. 2010). "Option Pricing Formulae using Fourier Transform: Theory and Application". URL: https://pfadintegral.com/docs/Schmelzle2010%20Fourier%20Pricing.pdf.

Shevchenko, Georgiy (Jan. 2015). "Fractional Brownian motion in a nutshell". In: International Journal of Modern Physics Conference Series 36. URL: https://www.worldscientific.com/doi/abs/10.1142/S2010194515600022.

Stone, Henry (July 2019). "Calibrating Rough Volatility Models: A Convolutional Neural Network Approach". URL: https://arxiv.org/abs/1812.05315.

Using the C++ eigen library to calculate matrix inverse and determinant. http://people.duke.edu/~ccc14/sta-663-2016/18G_C++_Python_pybind11.html#Using-the-C++-eigen-library-to-calculate-matrix-inverse-and-determinant. Accessed: 2020-07-22.

Wilmott, Paul (2006). Paul Wilmott on Quantitative Finance, Vol. 3. John Wiley & Sons, Inc., 2nd Edition. ISBN: 978-0-470-01870-5.



7.9 Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values
7.10 Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data


List of Tables

5.1 Heston Model Call Option Data
5.2 Heston Model Implied Volatility Data
5.3 Rough Heston Model Call Option Data
5.4 Rough Heston Model Implied Volatility Data
6.1 Summary of Neural Networks Architecture for Each Application
7.1 Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data
7.2 5-fold Cross Validation Results of Heston Option Pricing ANN
7.3 Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data
7.4 5-fold Cross Validation Results of Heston Implied Volatility ANN
7.5 Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data
7.6 5-fold Cross Validation Results of Rough Heston Option Pricing ANN
7.7 Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data
7.8 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN
7.9 Training Time
7.10 Execution Time for Predictions using ANN and Traditional Methods


List of Abbreviations

NN    Neural Networks
ANN   Artificial Neural Networks
DNN   Deep Neural Networks
CNN   Convolutional Neural Networks
SGD   Stochastic Gradient Descent
SV    Stochastic Volatility
CIR   Cox, Ingersoll and Ross
fBm   fractional Brownian motion
FSV   Fractional Stochastic Volatility
RFSV  Rough Fractional Stochastic Volatility
FFT   Fast Fourier Transform
ELU   Exponential Linear Unit
ReLU  Rectified Linear Unit


Chapter 0

Project's Overview

This chapter gives a brief overview of the methodology of the project without delving too much into the details.

0.1 Project's Overview

The supervisor advised dividing the problem into smaller, achievable steps, as (Mandara, 2019) did. The 4 main steps are:

• Define the objective
• Formulate the procedures
• Implement the procedures
• Evaluate the results

The objective is to investigate the performance of artificial neural networks (ANN) on vanilla option pricing and implied volatility approximation under the rough Heston model. The performance metrics in this context are accuracy and run-time performance.

Figure 1 shows the flow chart of the implementation procedures. The procedure starts with synthetic data generation. Then we split the synthetic data set into 3 portions: one for training (in-sample), one for validation, and the last one for testing (out-of-sample). Before the data is fed into the ANN for training, it must be scaled or normalised, as it has different magnitudes and ranges. In this thesis we use the training method proposed by (Horvath et al., 2019), the image-based implicit training approach, for implied volatility surface approximation. Then we test the trained ANN model with an unseen data set to evaluate its accuracy.

The run-time performance of the ANN will be compared with that of the traditional approximation methods. This project requires the use of two programming languages: C++ and Python. The data generation process is done in C++. The wide range of machine learning libraries (Keras, TensorFlow, PyTorch) in Python makes it the logical choice for the neural network computations. The experiment starts with the classical Heston model as a concept validation first and then proceeds to the rough Heston model.



FIGURE 1 Implementation Flow Chart



FIGURE 2 Data flow diagram for ANN on Rough Heston Pricer


Chapter 1

Introduction

Option pricing has always been a major research area in financial mathematics. Various methods and techniques have been developed to price options more accurately and more quickly since the introduction of the well-known Black-Scholes model in 1973 (Black et al., 1973). The Black-Scholes model has an analytical solution and only one parameter, volatility, that requires calibration. This makes it very easy to implement. Its simplicity, however, comes at the expense of many unrealistic assumptions. One of its biggest flaws is the assumption of a constant volatility term, which simply does not hold in the real market. With this assumption, the Black-Scholes model fails to capture the volatility dynamics exhibited by the market and so is unable to value options accurately.

In 1993, (Heston, 1993) developed a stochastic volatility model in an attempt to better capture the volatility dynamics. The Heston model lets the variance of the underlying asset follow a CIR process driven by a standard Brownian motion (Hurst parameter H = 1/2). The Heston model also has a closed-form analytical solution, and this makes it a strong alternative to the Black-Scholes model.

Whilst the Heston model is more realistic than the Black-Scholes model, the option prices it generates do not match observed vanilla option prices consistently (Gatheral et al., 2014). Recent research shows that more accurate option prices can be estimated if the volatility process is modelled as a fractional Brownian motion with H < 1/2. When H < 1/2, the volatility process path appears rougher than when H ≥ 1/2, and hence the volatility is described as rough.

(Euch et al., 2019) introduce the rough volatility version of the Heston model, which combines the best of both worlds. However, the calibration of the rough Heston model relies on the notoriously slow Monte Carlo method due to its non-Markovian nature. This obstacle is the reason that the rough Heston model struggles to find its way into industrial applications. Although many approximation methods exist for the rough Heston model to help solve this problem, we are interested in the feasibility of machine learning algorithms.

Recent advancements in machine learning algorithms could potentially provide an alternative means to solve this problem. Specifically, artificial neural networks (ANN) offer the best hope, because they are capable of learning complex functional relationships between input and output data and then making predictions almost instantly.


In this project, our objective is to investigate the accuracy and run-time performance of ANN for European call option price prediction and implied volatility approximation under the rough Heston model. We start by using the Heston model as a concept validation, then proceed to its rough volatility version. For regulatory purposes, we use data simulated by the option pricing models for training.

1.1 Organisation of This Thesis

This thesis is organised as follows. In Chapter 2 we review related work and provide justifications for our research. In Chapter 3 we discuss the theory of the option pricing models of our interest. We provide a brief introduction to ANN, and some techniques to improve their performance, in Chapter 4. We then explain the tools we used for the data generation process, and some useful techniques to speed up the process, in Chapter 5; in the same chapter we also discuss the important data pre-processing procedures. Chapter 6 presents our network architecture and experimental setups. We present the results and discuss our findings in Chapter 7. Finally, we conclude our findings and discuss potential future work in Chapter 8.

[Flow chart omitted: (Start) Problem: the rough Heston model sees low industrial application despite giving accurate option prices → Root cause: slow calibration (pricing and implied volatility approximation) process → Potential solution: apply ANN to pricing and implied volatility approximation → Implementation: ANN computation using Keras in Python → Evaluation: accuracy and speed of ANN predictions → (End) Conclusions: ANN is feasible or not]

FIGURE 1.1: Problem Solving Process


Chapter 2

Literature Review

In this chapter we review the current literature and highlight the research gap. The focus of our review is the application of ANN to option pricing models. We justify our methodology and our choice of option pricing models, and we also provide some critiques.

2.1 Application of ANN in Finance

Option pricing has always been a hot topic in academia and the financial industry, and researchers have been looking for various means to price options more accurately and quickly. As computational technology advances, the application of ANN has gained popularity for solving complex problems in many different industries. In quantitative finance, the idea of utilising ANN for pricing derivatives comes from their ability to make fast and accurate predictions once trained.

The application of ANN in option pricing began with (Hutchinson et al., 1994). The authors were the pioneers in applying ANN for pricing and hedging derivatives; since then, more than 100 research papers have appeared on this topic. It is also worth noting that the complexity of the ANN architectures used in these papers has increased over the years. Whilst there has been much research on the application of ANN to option pricing, only about 10% of the papers focus on implied volatility (Ruf et al., 2020). The feasibility of using ANN for implied volatility approximation is equally as important as for option pricing, since both are part of the calibration process. Recently, more research papers focus on implied volatility and calibration, such as (Mostafa et al., 2008), (Liu et al., 2019a), (Liu et al., 2019b) and (Horvath et al., 2019). A key contribution of (Horvath et al., 2019) is the introduction of the image-based implicit training method for implied volatility surface approximation. This method is allegedly robust and fast for approximating the entire implied volatility surface.

A significant portion of the research papers use real market data. Currently, there is no standardised framework for deciding the architecture and parameters of an ANN; hence, training the ANN directly with market data might pose issues for regulatory compliance. (Horvath et al., 2019) is one of the few papers that use simulated data for ANN training. The simulated data are generated from the Heston and rough Bergomi models. Using simulated data removes the regulatory concerns, as the functional mapping from model parameters to option price is known and tractable.

The application of ANN requires splitting the entire data set into portions for training, validation and testing. Most papers that use real market data ensure the data splitting process is chronological, to preserve the time series structure of the data. This is not a concern in our case, as we are using simulated data.

The results in (Horvath et al., 2019) show very accurate predictions. However, after examining the source code, we discovered that the ANN is validated and tested on the same data set. Thus, the high accuracy results do not give any information about how well the trained ANN generalises to unseen data. In our experiment we will use different data sets for validation and testing.

Common error metrics include the mean absolute error (MAE), the mean absolute percentage error (MAPE) and the mean squared error (MSE). We suggest an additional metric, the maximum absolute error, as it is useful from a risk management perspective for quantifying the largest prediction error in the worst-case scenario.

2.2 Option Pricing Models

In terms of the choice of option pricing models, over 80% of the papers select the Black-Scholes model or its extensions (Ruf et al., 2020). As mentioned before, other models can give better option prices than Black-Scholes, such as the stochastic volatility models. (Liu et al., 2019b) investigated the application of ANN to the Heston and Bates models.

According to (Gatheral et al., 2014), modelling volatility as rough volatility captures the volatility dynamics of the market better. Rough volatility models struggle to find their way into the industry due to their slow calibration process. Hence it is natural for the direction of ANN research to move towards rough volatility models.

Some of the recent examples include (Stone, 2019), (Bayer et al., 2018) and (Horvath et al., 2019). (Stone, 2019) investigates the calibration of the rough Bergomi model using Convolutional Neural Networks (CNN), whilst (Bayer et al., 2018) study the calibration of the Heston and rough Bergomi models using Deep Neural Networks (DNN). Contrary to these papers, where calibration is performed directly via ANN in one step, (Horvath et al., 2019) split the calibration of the rough Bergomi model into two steps. First, the ANN is trained to learn the pricing equation during a so-called offline session. The trained ANN is then deterministic and stored for the online calibration session later. This approach can speed up the online calibration by a huge margin.

2.3 The Research Gap

In our thesis we will be using the approach in (Horvath et al., 2019); that is, to train the network to learn the pricing and implied volatility functional mappings separately. We leave the model calibration step as the future outlook of this project. We choose to work with the rough Heston model, another popular rough volatility model that has not been investigated yet. We adapt the image-based implicit training method in our research. To re-iterate, we validate and test our ANN with different data sets to obtain a true measure of the ANN's performance on unseen data.


Chapter 3

Option Pricing Models

In this chapter we discuss the option pricing models of our interest: the Heston and rough Heston models. The idea of modelling volatility as a rough fractional Brownian motion will also be presented. We then derive their pricing and implied volatility equations and explain their implementations for data generation.

The Black-Scholes model is simple but unrealistic for real-world applications, as its assumptions simply do not hold. One of its greatest criticisms is the assumption of constant volatility of the underlying asset price. (Dupire, 1994) extended the Black-Scholes framework to give local volatility models. He modelled the volatility as a deterministic function of the underlying price and time, where the function is selected such that its outputs match observed European option prices. This extension is time-inhomogeneous, and the dynamics it exhibits are still unrealistic; thus it is not able to produce future volatility surfaces that emulate our observations.

Stochastic Volatility (SV) models were then introduced, in which the volatility of the underlying is itself modelled as a stochastic process. Recent research shows that rough volatility can capture the true volatility dynamics better, and this leads to the rough volatility models.

3.1 The Heston Model

One of the most popular SV models is the one proposed by (Heston, 1993); its popularity is due to the fact that its pricing equation for European options is analytically tractable. This property allows calibration procedures to be done efficiently.

The Heston model for a one-dimensional spot price S follows the stochastic process

dS = \mu S\,dt + \sqrt{\nu}\,S\,dW_1 \qquad (3.1)

and its instantaneous variance \nu follows a Cox, Ingersoll and Ross (CIR) process (Cox et al., 1985):

d\nu = \kappa(\theta - \nu)\,dt + \xi\sqrt{\nu}\,dW_2 \qquad (3.2)

\langle dW_1, dW_2 \rangle = \rho\,dt \qquad (3.3)

where

• \mu is the (deterministic) rate of return of the spot

• \theta is the long-term mean volatility

• \xi is the volatility of volatility

• \kappa is the mean-reversion rate of the volatility

• \rho is the correlation between spot and volatility moves

• W_1 and W_2 are Wiener processes

3.1.1 Option Pricing Under the Heston Model

In this section we follow (Gatheral, 2006) closely. We start by forming a portfolio \Pi containing the option being priced V, a quantity -\Delta of stock, and a quantity -\Delta_1 of another asset whose value is V_1:

\Pi = V - \Delta S - \Delta_1 V_1

The change in the portfolio during dt is

d\Pi = \left[\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial\nu^2}\right]dt
\quad - \Delta_1\left[\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V_1}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V_1}{\partial\nu^2}\right]dt
\quad + \left[\frac{\partial V}{\partial S} - \Delta_1\frac{\partial V_1}{\partial S} - \Delta\right]dS + \left[\frac{\partial V}{\partial\nu} - \Delta_1\frac{\partial V_1}{\partial\nu}\right]d\nu

Note the explicit dependence of S and \nu on t has been removed for simplicity. We can make the portfolio instantaneously risk-free by eliminating the dS and d\nu terms:

\frac{\partial V}{\partial S} - \Delta_1\frac{\partial V_1}{\partial S} - \Delta = 0

and

\frac{\partial V}{\partial\nu} - \Delta_1\frac{\partial V_1}{\partial\nu} = 0

So we are left with

d\Pi = \left[\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial\nu^2}\right]dt - \Delta_1\left[\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V_1}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V_1}{\partial\nu^2}\right]dt
= r\Pi\,dt = r(V - \Delta S - \Delta_1 V_1)\,dt


Here we have applied the no-arbitrage principle: the return of a risk-free portfolio must equal the risk-free rate, and we assume the risk-free rate to be deterministic for our case.

Rearranging the above equation, collecting V terms on the left-hand side and V_1 terms on the right-hand side:

\frac{\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial\nu^2} + rS\frac{\partial V}{\partial S} - rV}{\frac{\partial V}{\partial\nu}}
= \frac{\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V_1}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V_1}{\partial\nu^2} + rS\frac{\partial V_1}{\partial S} - rV_1}{\frac{\partial V_1}{\partial\nu}}

The ratio must therefore be equal to some function f(S, \nu, t) of S, \nu and t. Thus we have

\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial\nu^2} + rS\frac{\partial V}{\partial S} - rV = f\,\frac{\partial V}{\partial\nu}

In the case of the Heston model, f is chosen as

f = -\left(\kappa(\theta - \nu) - \lambda\sqrt{\nu}\right)

where \lambda(S, \nu, t) is called the market price of volatility risk. We do not delve into the choice of f and the concept of \lambda(S, \nu, t) any further here; see (Wilmott, 2006) for his arguments.

(Heston, 1993) assumed in his paper that \lambda(S, \nu, t) is linear in the instantaneous variance \nu_t, to retain the form of the equation under the transformation from the statistical measure to the risk-neutral measure. Here we are only interested in pricing the options, so we can set \lambda(S, \nu, t) to zero (Gatheral, 2006).

Hence we arrive at

\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial\nu^2} + rS\frac{\partial V}{\partial S} - rV = \kappa(\nu - \theta)\frac{\partial V}{\partial\nu} \qquad (3.4)

According to (Heston, 1993), the price of an undiscounted European call option C with strike price K and time to maturity T has a solution of the form

C(S, K, \nu, T) = \frac{1}{2}(F - K) + \int_0^{\infty} (F f_1 - K f_2)\,du \qquad (3.5)


where

f_1 = \mathrm{Re}\left[\frac{e^{-iu\ln K}\,\varphi(u - i)}{iuF}\right]

f_2 = \mathrm{Re}\left[\frac{e^{-iu\ln K}\,\varphi(u)}{iu}\right]

\varphi(u) = \mathbb{E}\left(e^{iu\ln S_T}\right)

F = S e^{\mu T}

The evaluation of (3.5) requires the computation of Fourier inversion integrals, and it is susceptible to numerical instabilities due to the complex logarithms involved. (Kahl et al., 2006) proposed an integration scheme to compute the Fourier inversion integral by applying some transformations to the asymptotic structure. Thus the solution becomes

C(S, K, \nu, T) = \int_0^1 y(x)\,dx, \qquad x \in \mathbb{R} \qquad (3.6)

where

y(x) = \frac{1}{2}(F - K) + \frac{F\, f_1\!\left(-\frac{\ln x}{C_\infty}\right) - K\, f_2\!\left(-\frac{\ln x}{C_\infty}\right)}{x\,\pi\,C_\infty}

The limits at the boundaries of the integral are

\lim_{x \to 0} y(x) = \frac{1}{2}(F - K)

and

\lim_{x \to 1} y(x) = \frac{1}{2}(F - K) + \frac{F\,\lim_{u \to 0} f_1(u) - K\,\lim_{u \to 0} f_2(u)}{\pi\,C_\infty}

Equation (3.6) provides a more robust pricing method for options with moderate to long maturities or strong mean reversion. For the full derivation of equation (3.6), see (Kahl et al., 2006). The source code implementing equation (3.6) was provided by the supervisor and is the source code for (Duffy et al., 2012).

3.1.2 Heston Model Implied Volatility Approximation

Before the model is used for pricing options, its parameters need to be calibrated so that it can return current market prices (at least to some degree of accuracy). That is, the unobservable model parameters are obtained by calibrating to implied volatilities observed in the market. This motivates the need for fast implied volatility computation.

Unfortunately, there is no closed-form analytical formula for calculating implied volatility under the Heston model. There are, however, many closed-form approximations for Heston implied volatility.

In this project we select the implied volatility approximation method for Heston proposed by (Lorig et al., 2019), which allegedly outperforms other methods such as the one by (Fouque et al., 2012). (Lorig et al., 2019) derive a family of asymptotic expansions for European-style option implied volatility. Their method provides an explicit approximation and requires no special functions, not even numerical integration, and so it is faster to compute than its counterparts.

We follow closely the derivation steps outlined in (Lorig et al., 2019). Consider the Heston model in Section 3.1. According to (Andersen et al., 2007), \rho must be set negative to avoid moment explosion. To circumvent this, we apply the change of variables (X(t), V(t)) = (\ln S(t), e^{\kappa t}\nu). Using the asymptotic expansions for general stochastic models (see Appendix B) and Ito's formula, we have

dX = -\frac{1}{2}e^{-\kappa t}V\,dt + \sqrt{e^{-\kappa t}V}\,dW_1, \qquad X(0) = x = \ln s \qquad (3.7)

dV = \theta\kappa e^{\kappa t}\,dt + \xi\sqrt{e^{\kappa t}V}\,dW_2, \qquad V(0) = \nu = z > 0 \qquad (3.8)

\langle dW_1, dW_2 \rangle = \rho\,dt \qquad (3.9)

The parameters \nu, \kappa, \theta, \xi, W_1, W_2 play the same roles as in Section 3.1. We then apply the second-order differential operator \mathcal{A}(t) associated with (X, V):

\mathcal{A}(t) = \frac{1}{2}e^{-\kappa t}v\left(\partial_x^2 - \partial_x\right) + \theta\kappa e^{\kappa t}\partial_v + \frac{1}{2}\xi^2 e^{\kappa t}v\,\partial_v^2 + \xi\rho v\,\partial_x\partial_v

Define T as time to maturity and k as log-strike. Using the time-dependent Taylor series expansion of \mathcal{A}(T) with (x(T), v(T)) = (X_0, \mathbb{E}[V(T)]) = (x, \theta(e^{\kappa T} - 1)) gives (see Appendix B)

\sigma_0 = \sqrt{\frac{-\theta + \theta\kappa T + e^{-\kappa T}(\theta - v) + v}{\kappa T}}

\sigma_1 = \frac{\xi\rho z\,e^{-\kappa T}\left(-2v - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\right)}{\sqrt{2}\,\kappa^2\sigma_0^2\,T^{3/2}}

\sigma_2 = \frac{\xi^2 e^{-2\kappa T}}{32\kappa^4\sigma_0^5 T^3}\Big[-2\sqrt{2}\,\kappa\sigma_0^3 T^{3/2} z\left(-v - 4e^{\kappa T}(v + \kappa T(v - v)) + e^{2\kappa T}(v(5 - 2\kappa T) - 2v) + 2v\right)
\quad + \kappa\sigma_0^2 T(4z^2 - 2)\left(v + e^{2\kappa T}(-5v + 2v\kappa T + 8\rho^2(v(\kappa T - 3) + v) + 2v)\right)
\quad + \kappa\sigma_0^2 T(4z^2 - 2)\left(4e^{\kappa T}(v + v\kappa T + \rho^2(v(\kappa T(\kappa T + 4) + 6) - v(\kappa T(\kappa T + 2) + 2)) - \kappa T v) - 2v\right)
\quad + 4\sqrt{2}\,\rho^2\sigma_0\sqrt{T}\,z(2z^2 - 3)\left(-2v - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\right)^2
\quad + 4\rho^2(4(z^2 - 3)z^2 + 3)\left(-2v - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\right)^2\Big]
\quad - \frac{\sigma_1^2\left(4(x - k)^2 - \sigma_0^4 T^2\right)}{8\sigma_0^3 T}

where

z = \frac{x - k - \sigma_0^2 T/2}{\sigma_0\sqrt{2T}}


The explicit expression for \sigma_3 is too long to be included here; see (Lorig et al., 2019) for the full derivation. The explicit expression for the implied volatility approximation is

\sigma_{\mathrm{implied}} = \sigma_0 + \sigma_1 z + \sigma_2 z^2 + \sigma_3 z^3 \qquad (3.10)

The C++ source code implementing this is available here: Heston Implied Volatility Approximation.


3.2 Volatility is Rough

We start this section by introducing the fractional Brownian motion (fBm), which is a generalisation of Brownian motion.

Definition 3.2.1. Fractional Brownian Motion (fBm) is a centered Gaussian process B_t^H, t \ge 0, with the autocovariance function

\mathbb{E}\left[B_t^H B_s^H\right] = \frac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\right) \qquad (3.11)

where H \in (0, 1) is the Hurst index or Hurst parameter associated with the fractional Brownian motion.

The Hurst parameter is a measure of the long-term memory of a time series. It was first introduced by (Hurst, 1951) for modelling water levels of the Nile river in order to determine the optimum dam size. As it turns out, it is also suitable for modelling time series data from other natural systems; (Peters, 1991) and (Peters, 1994) are some of the earliest examples of applying this concept to financial time series modelling. A time series can be classified into 3 categories based on the Hurst parameter:

1. 0 < H < 1/2: negative correlation in the increments/decrements of the process. This is a mean-reverting process, which implies that an increase in value is more likely to be followed by a decrease, and vice versa. The smaller the value of H, the greater the mean-reverting effect.

2. H = 1/2: no correlation in the increments/decrements of the process (geometric Brownian motion or Wiener process).

3. 1/2 < H < 1: positive correlation in the increments/decrements of the process. This is a trend-reinforcing process, which means the direction of the next value is more likely to be the same as that of the current value. The strength of the trend is greater when H is closer to 1.

Empirical evidence shows that log-volatility time series behave similarly to fBm with a Hurst parameter of order 0.1 (Gatheral et al., 2014). Essentially all the statistical stylised facts of volatility can be captured when modelling it as a rough fBm.

This motivates the use of the fractional stochastic volatility (FSV) model of (Comte et al., 1998). Our model is called the rough fractional stochastic volatility (RFSV) model to emphasise that H < 1/2. The term 'rough' refers to the roughness of the path of the fBm: from Figure 3.1, one can notice that the fBm path gets 'rougher' as H decreases.
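Sample fBm paths such as those in Figure 3.1 can be simulated exactly from the autocovariance function (3.11). The following Python sketch is our own illustration (it is not part of the thesis pipeline) and uses a Cholesky factorisation of the covariance matrix built from equation (3.11):

    import numpy as np

    def fbm_path(H, n=500, T=1.0, seed=0):
        # Time grid excluding t = 0 (where the covariance matrix is singular)
        t = np.linspace(T / n, T, n)
        s, u = np.meshgrid(t, t)
        # Autocovariance of fBm, equation (3.11)
        cov = 0.5 * (np.abs(s) ** (2 * H) + np.abs(u) ** (2 * H)
                     - np.abs(s - u) ** (2 * H))
        # The Cholesky factor maps i.i.d. normals to a correlated Gaussian path
        L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))
        rng = np.random.default_rng(seed)
        return t, L @ rng.standard_normal(n)

Plotting paths generated with H = 0.1, 0.5 and 0.9 reproduces the qualitative behaviour of Figure 3.1: the smaller H, the rougher the path.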


FIGURE 3.1: Paths of fBm for different values of H. Adapted from (Shevchenko, 2015).

3.3 The Rough Heston Model

In this section we introduce the rough volatility version of the Heston model and derive its pricing and implied volatility equations.

As stated before, rough volatility models can fit the volatility surface notably well with very few parameters. Hence it is natural to modify the Heston model and consider its rough version.

The rough Heston model for a one-dimensional asset price S with instantaneous variance \nu(t) takes the form in (Euch et al., 2016):

dS = S\sqrt{\nu}\,dW_1

\nu(t) = \nu(0) + \frac{1}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\kappa(\theta - \nu(s))\,ds + \frac{\xi}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\sqrt{\nu(s)}\,dW_2 \qquad (3.12)

\langle dW_1, dW_2 \rangle = \rho\,dt

where

• \alpha = H + 1/2 \in (1/2, 1) governs the roughness of the volatility sample paths

• \kappa is the mean-reversion rate

• \theta is the long-term mean volatility

• \nu(0) is the initial volatility

• \xi is the volatility of volatility

• \Gamma is the Gamma function

• W_1 and W_2 are two Brownian motions with correlation \rho

When \alpha = 1 (H = 1/2) we recover the classical Heston model of Section 3.1.

3.3.1 Rough Heston Pricing Equation

The pricing equation of the rough Heston model is inspired by the classical Heston one. In the classical Heston model, the option price can be obtained by applying Fourier inversion integrals to the characteristic function, which is expressed in terms of the solution of a Riccati equation. The rough Heston model displays a similar structure, but with the Riccati equation replaced by a fractional Riccati equation.

(Euch et al., 2016) derived a characteristic function for the rough Heston model. This result is particularly important, as the non-Markovian nature of the fractional Brownian motion makes other numerical pricing methods (e.g. Monte Carlo) difficult to implement.

Definition 3.3.1. The fractional integral of order r \in (0, 1] of a function f is

I^r f(t) = \frac{1}{\Gamma(r)}\int_0^t (t - s)^{r - 1} f(s)\,ds \qquad (3.13)

whenever the integral exists, and the fractional derivative of order r \in (0, 1] is

D^r f(t) = \frac{1}{\Gamma(1 - r)}\frac{d}{dt}\int_0^t (t - s)^{-r} f(s)\,ds \qquad (3.14)

whenever it exists.

We now quote the results from (Euch et al., 2016) directly; for the full proof, see Section 6 of (Euch et al., 2016).

Theorem 3.3.1. For all t \ge 0, the characteristic function of the log-price X_t = \ln(S(t)/S(0)) is

\varphi_X(a, t) = \mathbb{E}\left[e^{iaX_t}\right] = \exp\left[\kappa\theta\,I^1 h(a, t) + \nu(0)\,I^{1-\alpha} h(a, t)\right] \qquad (3.15)

where h is the solution of the fractional Riccati equation

D^\alpha h(a, t) = -\frac{1}{2}a(a + i) + \kappa(ia\rho\xi - 1)h(a, t) + \frac{\xi^2}{2}h^2(a, t), \qquad I^{1-\alpha}h(a, 0) = 0 \qquad (3.16)

which admits a unique continuous solution for a \in \mathcal{C}, a suitable region in the complex plane (see Definition 3.3.2).


Definition 3.3.2. We define the strip in the complex plane relevant to the computation of the characteristic function in Theorem 3.3.1:

\mathcal{C} = \left\{z \in \mathbb{C} : \Re(z) \ge 0,\ -\frac{1}{1 - \rho^2} \le \Im(z) \le 0\right\} \qquad (3.17)

where \Re and \Im denote real and imaginary parts respectively.

There is no explicit solution to equation (3.16), so it has to be solved numerically. (Gatheral et al., 2019) present a simple rational approximation to the solution of the rough Heston Riccati equation, which is inevitably faster than the numerical schemes.

3.3.1.1 Rational Approximation of the Rough Heston Riccati Solution

Firstly, the short-time series expansion of the solution by (Alos et al., 2017) can be written as

h_s(a, t) = \sum_{j=0}^{\infty} \frac{\Gamma(1 + j\alpha)}{\Gamma(1 + (j + 1)\alpha)}\,\beta_j(a)\,\xi^j\,t^{(j+1)\alpha} \qquad (3.18)

with

\beta_0(a) = -\frac{1}{2}a(a + i)

\beta_k(a) = \frac{1}{2}\sum_{i,j=0}^{k-2} \mathbb{1}_{i+j=k-2}\,\beta_i(a)\beta_j(a)\,\frac{\Gamma(1 + i\alpha)}{\Gamma(1 + (i + 1)\alpha)}\,\frac{\Gamma(1 + j\alpha)}{\Gamma(1 + (j + 1)\alpha)} + i\rho a\,\frac{\Gamma(1 + (k - 1)\alpha)}{\Gamma(1 + k\alpha)}\,\beta_{k-1}(a)

Again from (Alos et al., 2017), the long-time expansion is

h_l(a, t) = r_- \sum_{k=0}^{\infty} \frac{\gamma_k\,(\xi t^\alpha)^{-k}}{A^k\,\Gamma(1 - k\alpha)} \qquad (3.19)

where

\gamma_1 = -\gamma_0 = -1

\gamma_k = -\gamma_{k-1} + \frac{r_-}{2A}\sum_{i,j=1}^{\infty} \mathbb{1}_{i+j=k}\,\gamma_i\gamma_j\,\frac{\Gamma(1 - k\alpha)}{\Gamma(1 - i\alpha)\Gamma(1 - j\alpha)}

A = \sqrt{a(a + i) - \rho^2 a^2}

r_\pm = -i\rho a \pm A

Now we can construct the global approximation. Using the same techniques as in (Atkinson et al., 2011), the rational approximation of h(a, t) is given by

h^{(m,n)}(a, t) = \frac{\sum_{i=1}^{m} p_i\,(\xi t^\alpha)^i}{\sum_{j=0}^{n} q_j\,(\xi t^\alpha)^j}, \qquad q_0 = 1 \qquad (3.20)

Numerical experiments in (Gatheral et al., 2019) showed that h^{(3,3)} is a good choice for high accuracy and fast computation. Thus we can write explicitly

h^{(3,3)}(a, t) = \frac{p_1\xi t^\alpha + p_2\xi^2(t^\alpha)^2 + p_3\xi^3(t^\alpha)^3}{1 + q_1\xi t^\alpha + q_2\xi^2(t^\alpha)^2 + q_3\xi^3(t^\alpha)^3} \qquad (3.21)

Thus from equations (3.18) and (3.19) we have

h_s(a, t) = \frac{\Gamma(1)}{\Gamma(1 + \alpha)}\beta_0(a)\,t^\alpha + \frac{\Gamma(1 + \alpha)}{\Gamma(1 + 2\alpha)}\beta_1(a)\,\xi\,(t^\alpha)^2 + \frac{\Gamma(1 + 2\alpha)}{\Gamma(1 + 3\alpha)}\beta_2(a)\,\xi^2\,(t^\alpha)^3 + \mathcal{O}(t^{4\alpha})
= b_1(t^\alpha) + b_2(t^\alpha)^2 + b_3(t^\alpha)^3 + \mathcal{O}\left[(t^\alpha)^4\right]

and

h_l(a, t) = \frac{r_-\gamma_0}{\Gamma(1)} + \frac{r_-\gamma_1}{A\,\Gamma(1 - \alpha)\,\xi}\frac{1}{t^\alpha} + \frac{r_-\gamma_2}{A^2\,\Gamma(1 - 2\alpha)\,\xi^2}\frac{1}{(t^\alpha)^2} + \mathcal{O}\left[\frac{1}{(t^\alpha)^3}\right]
= g_0 + g_1\frac{1}{t^\alpha} + g_2\frac{1}{(t^\alpha)^2} + \mathcal{O}\left[\frac{1}{(t^\alpha)^3}\right]

Matching the coefficients of equation (3.21) to h_s and h_l forces the coefficients b_i and g_j to satisfy

cients bi and gj to satisfy

p1ξ = b1 (322)

(p2 minus p1q1)ξ2 = b2 (323)

(p1q21 minus p1q2 minus p2q1 + p3)ξ

3 = b3 (324)

p3ξ3

q3ξ3 = g0 (325)

(p2q3 minus p3q2)ξ5

(q23)ξ

6= g1 (326)

(p1q23 minus p2q2q3 minus p3q1q3 + p3q2

2)ξ7

(q33)ξ

9= g2 (327)


The solution of this system is

p_1 = \frac{b_1}{\xi} \qquad (3.28)

p_2 = \frac{b_1^3 g_1 + b_1^2 g_0^2 + b_1 b_2 g_0 g_1 - b_1 b_3 g_0 g_2 + b_1 b_3 g_1^2 + b_2^2 g_0 g_2 - b_2^2 g_1^2 + b_2 g_0^3}{\left(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3\right)\xi^2} \qquad (3.29)

p_3 = g_0 q_3 \qquad (3.30)

q_1 = \frac{b_1^2 g_1 - b_1 b_2 g_2 + b_1 g_0^2 - b_2 g_0 g_1 - b_3 g_0 g_2 + b_3 g_1^2}{\left(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3\right)\xi} \qquad (3.31)

q_2 = \frac{b_1^2 g_0 - b_1 b_2 g_1 - b_1 b_3 g_2 + b_2^2 g_2 + b_2 g_0^2 - b_3 g_0 g_1}{\left(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3\right)\xi^2} \qquad (3.32)

q_3 = \frac{b_1^3 + 2 b_1 b_2 g_0 + b_1 b_3 g_1 - b_2^2 g_1 + b_3 g_0^2}{\left(b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3\right)\xi^3} \qquad (3.33)

3.3.1.2 Option Pricing Using the Fast Fourier Transform

After solving the Riccati equation using the rational approximation (3.21), we can compute the characteristic function \varphi_X(a, t) in equation (3.15). Once the characteristic function is obtained, many methods are available to compute call option prices; see (Carr and Madan, 1999), (Itkin, 2010), (Lewis, 2001) and (Schmelzle, 2010). In this project we select the Carr and Madan approach, a fast option pricing method that uses the fast Fourier transform (FFT). Let k = \ln K and let C(k, T) be the value of a European call option with strike K and maturity T. Unfortunately, the FFT cannot be applied directly to evaluate the call option price, because the call pricing function is not square-integrable. A key contribution of (Carr and Madan, 1999) is the modification of the call pricing function with respect to k. With this modification, a whole range of option prices can be obtained with one inverse Fourier transform.

Consider the modified call price c(k, T):

c(k, T) = e^{\beta k} C(k, T), \qquad \beta > 0 \qquad (3.34)

Its Fourier transform is

\psi_X(u, T) = \int_{-\infty}^{\infty} e^{iuk} c(k, T)\,dk, \qquad u \in \mathbb{R} \qquad (3.35)

which can be expressed as

\psi_X(u, T) = \frac{e^{-rT}\,\varphi_X[u - (\beta + 1)i, T]}{\beta^2 + \beta - u^2 + i(2\beta + 1)u} \qquad (3.36)

where r is the risk-free rate and \varphi_X(\cdot) is the characteristic function defined in equation (3.15).


The purpose of \beta is to assist the integrability of the modified call pricing function over the negative log-strike axis. Hence a sufficient condition is that \psi_X(0, T) be finite:

\psi_X(0, T) < \infty \qquad (3.37)

\implies \varphi_X[-(\beta + 1)i, T] < \infty \qquad (3.38)

\implies \exp\left[\theta\kappa\,I^1 h(-(\beta + 1)i, T) + \nu(0)\,I^{1-\alpha} h(-(\beta + 1)i, T)\right] < \infty \qquad (3.39)

Using equation (3.39) we can determine the upper bound on \beta such that condition (3.37) is satisfied. (Carr and Madan, 1999) suggest that one quarter of the upper bound serves as a good choice for \beta.

The call price C(k, T) can be recovered by applying the inverse Fourier transform:

C(k, T) = \frac{e^{-\beta k}}{2\pi}\int_{-\infty}^{\infty} \Re\left[e^{-iuk}\,\psi_X(u, T)\right]du \qquad (3.40)
= \frac{e^{-\beta k}}{\pi}\int_0^{\infty} \Re\left[e^{-iuk}\,\psi_X(u, T)\right]du \qquad (3.41)

The semi-infinite integral in the call price equation can be evaluated by numerical methods such as the trapezoidal rule and the FFT, as shown in (Kwok et al., 2012):

C(k, T) \approx \frac{e^{-\beta k}}{\pi}\sum_{j=1}^{N} \Re\left[e^{-iu_j k}\,\psi_X(u_j, T)\right]\Delta u \qquad (3.42)

where u_j = (j - 1)\Delta u, j = 1, 2, \ldots, N.

We end this section with a summary of the steps required to price a call option under the rough Heston model; a Python sketch of the final step follows the list.

1. Compute the solution h of the fractional Riccati equation (3.16) using equation (3.21).

2. Compute the characteristic function in equation (3.15).

3. Determine a suitable value for \beta using equation (3.39).

4. Apply the Fourier transform in equation (3.36).

5. Evaluate the call option price by implementing the FFT in equation (3.42).
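The following Python sketch is our own illustration of steps 4-5 (the thesis implementation is in C++): given any characteristic function phi(u, T) of the log-price, it evaluates equations (3.36) and (3.42) on a truncated grid. The damping parameter beta and the grid choices N and du are assumed inputs.

    import numpy as np

    def carr_madan_call(phi, k, T, r, beta, N=4096, du=0.25):
        # Integration grid u_j = (j - 1) * du, j = 1, ..., N
        u = np.arange(N) * du
        # Fourier transform of the damped call price, equation (3.36)
        psi = np.exp(-r * T) * phi(u - (beta + 1) * 1j, T) / (
            beta ** 2 + beta - u ** 2 + 1j * (2 * beta + 1) * u)
        # Discretised inverse transform, equation (3.42)
        integrand = np.real(np.exp(-1j * u * k) * psi)
        return np.exp(-beta * k) / np.pi * np.sum(integrand) * du

In practice one would replace the plain sum by an FFT over a grid of log-strikes, which prices a whole strike range in one pass; the sum above is the single-strike special case.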

3.3.2 Rough Heston Model Implied Volatility

(Euch et al., 2019) present an almost instantaneous and accurate approximation method to compute rough Heston implied volatilities. This method allows us to approximate the rough Heston implied volatility by simply scaling the volatility of volatility parameter \xi and feeding the scaled parameter into the classical Heston implied volatility approximation of Section 3.1.2. For a given maturity T, the scaled volatility of volatility parameter \tilde{\xi} is

\tilde{\xi} = \sqrt{\frac{3}{2H + 2}}\,\frac{\xi}{\Gamma(H + 3/2)}\,\frac{1}{T^{1/2 - H}} \qquad (3.43)
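The scaling itself is a one-line computation; the sketch below is our own illustration of equation (3.43):

    from math import gamma, sqrt

    def scaled_vol_of_vol(xi, H, T):
        # Equation (3.43): effective (classical Heston) vol-of-vol at maturity T
        return sqrt(3.0 / (2.0 * H + 2.0)) * xi / gamma(H + 1.5) / T ** (0.5 - H)

The output is then passed, in place of the original volatility of volatility, to the classical Heston implied volatility approximation of Section 3.1.2.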


FIGURE 3.2: Implied volatility smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019).


Chapter 4

Artificial Neural Networks

This chapter starts with an introduction to Artificial Neural Networks (ANN). We introduce the terminologies and basic elements of ANN and explain the methods we used to increase training speed and predictive accuracy. We also discuss the training process for ANN, common pitfalls and their remedies. Finally, we discuss the ANN type of interest and extend the discussion to Deep Neural Networks (DNN).

The concept of ANN has come a long way. It started with modelling a simple neural network after electrical circuits, proposed by Warren McCulloch, a neurologist, and Walter Pitts, a young mathematician, in 1943 (McCulloch et al., 1943). This idea was further strengthened by Donald Hebb (Hebb, 1949).

The ability of ANN to learn complex non-linear functional mappings, by deciphering the pattern between independent and dependent variables, has found applications in many fields. With computational resources becoming more accessible and big data sets becoming available, the popularity of ANN has grown exponentially in recent years.

4.1 Dissecting the Artificial Neural Networks

FIGURE 4.1: A Simple Neural Network

Before we delve deeper into the world of ANN, let us explain what parameters and hyperparameters mean.


Parameters refer to the variables that are updated throughout the training process; they include weights and biases. Hyperparameters are the configuration variables that define the architecture of the ANN and stay constant throughout the training process. Hyperparameters include the number of neurons, hidden layers, activation functions, the number of epochs, etc.

4.1.1 Neurons

Like a biological neuron, an artificial neuron is the fundamental working unit in an ANN. It is essentially a mathematical function that receives multiple inputs and generates an output. More precisely, it takes the weighted sum of the inputs and applies an activation function to the sum to generate an output. In the case of a simple ANN (as shown in Figure 4.1), this will be the network output. In more sophisticated ANN, the output of a neuron can be the input of other neurons.

4.1.2 Input, Output and Hidden Layers

An input layer is the first layer of an ANN. It receives and sends the training data into the network for further processing. The number of input neurons is equal to the number of input features of the training data.

The output layer is the final layer of the network. It outputs the predictions for a given set of input features, and it can have more than one output neuron.

A hidden layer is a layer between the input and output layers. It contains neurons and receives weighted inputs from the input layer. Whilst an ANN contains only one input layer and one output layer, the number of hidden layers can be more than one. One hidden layer can only be used for solving linearly separable problems; for more complex problems, multiple hidden layers are needed.

Systematic experimentation is required to identify the appropriate number of hidden layers to prevent under-fitting and over-fitting, and thus to increase predictive accuracy.

4.1.3 Connections, Weights and Biases

From the input layer to the output layer, each connection is assigned a weight that denotes its relative importance. Random weights are initialised when training begins; as training continues, these weights are updated.

Sometimes a bias (different from the bias term in statistics) is added to a neuron. It is merely a constant input with weight 1. Intuitively speaking, the purpose of a bias is to shift the activation function away from the origin, thereby allowing the neuron to give a non-zero output when the inputs are 0.


4.1.4 Forward and Backward Propagation, Loss Functions

Forward propagation is the passing of input data in the forward direction through the network to generate output. The difference between the output and the target output is then estimated by a loss function, a measure of how well the ANN performs with the current set of weights. Typical loss functions for regression problems include the mean squared error (MSE) and the mean absolute error (MAE). They are defined as follows.

Definition 4.1.1.

MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i,\mathrm{predict}} - y_{i,\mathrm{actual}}\right)^2 \qquad (4.1)

Definition 4.1.2.

MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_{i,\mathrm{predict}} - y_{i,\mathrm{actual}}\right| \qquad (4.2)

where y_{i,predict} is the i-th predicted output and y_{i,actual} is the i-th actual output.

Subsequently, this information is propagated backward from the output layer to the input layer, and the gradients of the loss function with respect to the weights are computed. The algorithm then adjusts the weights accordingly to minimise the loss function. An iteration is one complete cycle of forward and backward propagation.

ANN training is an unconstrained optimisation problem with the loss function as the objective function. The algorithm navigates the search space to find optimal weights that minimise the loss function (e.g. MSE). It can be expressed as

\min_{w} \frac{1}{n}\sum_{i=1}^{n}\left(y_{i,\mathrm{predict}} - y_{i,\mathrm{actual}}\right)^2 \qquad (4.3)

where w is the vector containing the weights of all the connections within the ANN.
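As a minimal illustration (our own sketch, assuming numpy), the loss functions (4.1) and (4.2) can be computed as follows:

    import numpy as np

    def mse(y_predict, y_actual):
        # Mean squared error, equation (4.1)
        return np.mean((y_predict - y_actual) ** 2)

    def mae(y_predict, y_actual):
        # Mean absolute error, equation (4.2)
        return np.mean(np.abs(y_predict - y_actual))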

4.1.5 Activation Functions

An activation function is a mathematical function applied to the output of a neuron to determine whether it should be activated (fired) or not, based on the relevance of the neuron's inputs to the prediction. This is analogous to the neuronal firing activity in the human brain.

Activation functions can be linear or non-linear. A linear activation function is only suitable when the functional relationship is linear; non-linear activation functions are used for most applications. In our case, non-linear activation functions are more suitable, as the pricing and implied volatility functional mappings are complex.

Common activation functions include:


Definition 4.1.3. Rectified Linear Unit (ReLU)

f_{\mathrm{ReLU}}(x) = \begin{cases} 0 & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases} \quad \in [0, \infty) \qquad (4.4)

Definition 4.1.4. Exponential Linear Unit (ELU)

f_{\mathrm{ELU}}(x) = \begin{cases} \alpha(e^x - 1) & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases} \quad \in (-\alpha, \infty) \qquad (4.5)

where \alpha > 0. The value of \alpha is usually chosen to be 1 for most applications.

Definition 4.1.5. Linear or identity

f_{\mathrm{linear}}(x) = x \quad \in (-\infty, \infty) \qquad (4.6)
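A minimal Python sketch of these three activation functions (our own illustration, assuming numpy) is:

    import numpy as np

    def relu(x):
        # Equation (4.4)
        return np.maximum(x, 0.0)

    def elu(x, alpha=1.0):
        # Equation (4.5); expm1(x) = e^x - 1, input clipped to avoid overflow
        return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

    def linear(x):
        # Equation (4.6)
        return x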

4.1.6 Optimisation Algorithms and Learning Rate

The learning rate refers to the amount of change applied to the weights in each iteration of the training process. A high learning rate results in fast training but carries a risk of divergence, whereas a low learning rate may reduce the training speed.

Common optimisation algorithms for training ANN are Stochastic Gradient Descent (SGD) and its extensions, such as RMSProp and Adam. The SGD pseudocode is the following:

Algorithm 1: Stochastic Gradient Descent

    Initialise a vector of random weights w and learning rate lr
    while Loss > error do
        Randomly shuffle the training data of size n
        for i = 1, 2, ..., n do
            w_new = w_current - lr * (dLoss/dw_current)
        end for
    end while

SGD is popular because it is simple to implement and can provide good results for most applications. In SGD, the learning rate lr is fixed throughout the process. This is an issue because the appropriate learning rate is difficult to determine: selecting a high learning rate is prone to divergence, whereas using a low learning rate can result in a slow training process. Hence, improvements have been made to SGD to allow the learning rate to be adaptive. Root Mean Square Propagation (RMSProp) is one of the extensions


that allow the learning rate to vary throughout the training process (Hinton et al., 2015). It divides the learning rate by an exponential moving average of past squared gradients. The author suggests an exponential decay rate of b = 0.9, and the pseudocode is presented below.

Algorithm 2: Root Mean Square Propagation (RMSProp)

    Initialise a vector of random weights w, learning rate lr, v = 0 and b = 0.9
    while Loss > error do
        Randomly shuffle the training data of size n
        for i = 1, 2, ..., n do
            v_current = b * v_previous + (1 - b) * (dLoss/dw_current)^2
            w_new = w_current - lr / sqrt(v_current + eps) * (dLoss/dw_current)
        end for
    end while

where eps is a small number to prevent division by zero.

A further improved version, called Adaptive Moment Estimation (Adam), was proposed by (Kingma et al., 2015). RMSProp adapts the learning rate using the second moment of the gradients only; Adam improves on this by also using an exponential moving average of the first moment. The advantage of Adam is that it can handle sparse gradients on noisy data. The pseudocode of Adam is given below.

Algorithm 3: Adaptive Moment Estimation (Adam)

    Initialise a vector of random weights w, learning rate lr, v = 0, m = 0,
    b = 0.9 and b1 = 0.999
    while Loss > error do
        Randomly shuffle the training data of size n
        for i = 1, 2, ..., n do
            m_current = b * m_previous + (1 - b) * (dLoss/dw_current)
            v_current = b1 * v_previous + (1 - b1) * (dLoss/dw_current)^2
            m_hat = m_current / (1 - b^i)
            v_hat = v_current / (1 - b1^i)
            w_new = w_current - lr / (sqrt(v_hat) + eps) * m_hat
        end for
    end while

where b is the exponential decay rate for the first moment and b1 is the exponential decay rate for the second moment.

4.1.7 Epochs, Early Stopping and Batch Size

An epoch is one complete pass of the entire training data set through the ANN. Generally, we want the number of epochs to be as high as possible; however, this may cause the ANN to over-fit. Early stopping is useful here to stop the training before the ANN becomes over-fit: it works by stopping the training process when the loss function shows no improvement. The batch size is the number of input samples processed before the parameters are updated.
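In Keras, which we use for our experiments, early stopping is available as a built-in callback. The snippet below is a sketch with our own illustrative choices of patience, epochs and batch size, not the exact settings used in the thesis:

    from tensorflow.keras.callbacks import EarlyStopping

    # Stop when the validation loss has not improved for 10 consecutive
    # epochs and roll back to the best weights seen so far.
    early_stop = EarlyStopping(monitor="val_loss", patience=10,
                               restore_best_weights=True)

    # model.fit(x_train, y_train, validation_data=(x_val, y_val),
    #           epochs=200, batch_size=1024, callbacks=[early_stop])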

4.2 Feedforward Neural Networks

Feedforward neural networks are the most common type of ANN in practical applications. They are called feedforward because the training data flows only from the input layer to the output layer; there is no feedback loop within the network.

4.2.1 Deep Neural Networks

Deep Neural Networks (DNN) are feedforward neural networks with more than one hidden layer; the depth here refers to the number of hidden layers. This is the type of ANN of our interest, and the following theorems provide justification.

Theorem 4.2.1 (Hornik et al., 1989, Universal Approximation Theorem). Let NN^a_{d_in, d_out} be the set of neural networks with activation function f_a : R → R, input dimension d_in ∈ N and output dimension d_out ∈ N. Then, if f_a is continuous and non-constant, NN^a_{d_in, d_out} is dense in the Lebesgue space L^p(μ) for all finite measures μ.

Theorem 4.2.2 (Hornik et al., 1990, Universal Approximation Theorem for Derivatives). Let the function to be approximated be F* ∈ C^n, the mapping of the neural network be F : R^{d_0} → R, and let NN^a_{d_in, d_out} be the set of single-layer neural networks with activation function f_a : R → R, input dimension d_in ∈ N and output dimension 1. Then, if the (non-constant) activation function satisfies f_a ∈ C^n(R), NN^a_{d_in, 1} arbitrarily approximates F* and all its derivatives up to order n.

Theorem 4.2.3 (Eldan et al., 2016, The Power of Depth of Neural Networks). There exists a simple function on R^d, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network to more than a certain constant accuracy unless its width is exponential in the dimension.

According to Theorem 4.2.2, it is crucial to have an activation function that is n-differentiable if the approximation of the n-th derivative of the function F* is required. Theorem 4.2.3 motivates the choice of a deep network: the depth of a network can improve its predictive capabilities. However, it is worth noting that networks deeper than 4 hidden layers do not provide much performance gain (Ioffe et al., 2015).

4.3 Common Pitfalls and Remedies

Here we discuss some common pitfalls in ANN training and the corresponding remedies.


4.3.1 Loss Function Stagnation

During training, the loss function may display only infinitesimal improvements after each epoch. To resolve this problem:

• Increase the batch size. This could potentially fix the issue, as it helps the ANN model learn the relationships between data samples and data labels better before updating the parameters.

• Increase the learning rate. Too small a learning rate can cause the ANN model to become stuck in local minima.

• Use a deeper network. According to the power-of-depth theorem, deeper networks tend to perform better than shallower ones. However, if the ANN model already has 4 hidden layers, this method may not be suitable, as there is not much performance gain beyond that (see Section 4.2.1).

• Use a different activation function for the output layer. Make sure the range of the output activation function is appropriate for the problem.

4.3.2 Loss Function Fluctuation

During training, the loss function may fluctuate strongly from epoch to epoch. Potential fixes include:

• Increase the batch size. The ANN model might not be able to interpret the true relationships between input and output data before each update when the batch size is too small.

• Lower the initialised learning rate.

4.3.3 Over-fitting

Over-fitting occurs when the ANN model performs well on training data but fails to generalise to unseen test data. Potential fixes include:

• Reduce the complexity of the ANN model. Limit the number of hidden layers to 4, as (Eldan et al., 2016) show that not much advantage is achieved beyond 4 hidden layers. Furthermore, experiment with narrower (fewer neurons) hidden layers.

• Introduce early stopping. Reduce the number of epochs if additional epochs show only little improvement.

• Regularisation. This can reduce the complexity of the loss function and speed up convergence.

• K-fold cross-validation. This works by splitting the training data into K equal-sized portions: K-1 portions are used for training and 1 portion for validation, repeated K times with a different portion used for validation each time.

4.3.4 Under-fitting

Under-fitting occurs when the ANN model fails to make sense of the relationships between data samples and labels. Potential fixes include:

• Increase the size of the training data. Complex functional mappings require the ANN model to train on more data before it can learn the relationships well.

• Increase the depth of the model. It is very likely that the model is not deep enough for the problem. As a rule of thumb, use 3-4 hidden layers for complex problems.

4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation

For the ANN application to implied volatility prediction, we apply the image-based implicit method proposed by (Horvath et al., 2019). It is allegedly a powerful and robust method for approximating implied volatility. Rather than using one implied volatility value as the network output, the image-based implicit method uses the entire implied volatility surface as the output. The implied volatility surface is represented by 10 × 10 pixels, with each pixel corresponding to one maturity and one strike price. The input data corresponding to one implied volatility surface output are all the model parameters except strike price and maturity; these are implicitly learned by the network. This method offers the following benefits:

• More information is learned by the network with less input data: the information of neighbouring volatility points is incorporated in the training process.

• The network learns the mapping from model parameters to the implied volatility surface directly. The volatility smile and term structure are available simultaneously, so it is more efficient than having only one single implied volatility value as output.

• The neighbouring volatility points can be numerically approximated almost instantly if needed.


[Diagram omitted: the 100 neurons of the ANN output layer (O1, ..., O100) arranged as the 10 × 10 pixel grid of the implied volatility surface]

FIGURE 4.2: The pixels of the implied volatility surface are represented by the outputs of the neural network.


Chapter 5

Data

In this chapter we discuss the data for our applications. We explain how the data is generated and some essential data pre-processing procedures.

Simulated data is chosen to train the ANN model mainly for regulatory purposes. One might argue that training the ANN with market data directly would yield better predictions; however, evidence for the stability and robustness of this method is yet to be found. Furthermore, there is no straightforward interpretation between the inputs and outputs of an ANN model trained this way. The main advantage of training the ANN with simulated data is that the functional relationship is known, and so the model is tractable. This property is important, as tractable models tend to be more transparent, which is required by regulations.

For the neural network computations, Python is the logical language choice, as it has many powerful machine learning libraries such as Keras. To obtain good results, we require 5,000,000 rows of data for the ANN experiment. Hence we choose C++ instead of Python for the data generation process, as it is computationally faster. We also use the lightweight, header-only Python library pybind11, which can harmoniously integrate C++ and Python in one project (see Appendix A for a tutorial on pybind11).

5.1 Data Generation and Storage

The data generation process is done entirely in C++, mainly because of its execution speed. Initially we considered applying GPU parallel computing to speed up the process even further. After consulting with the supervisor, we decided to use the std::future class template instead. std::future belongs to the standard C++11 library and is very simple to implement compared to parallel computing. It allows a C++ program to run multiple functions asynchronously on different threads (see page 942 in (Duffy, 2018) for an example). Our machine contains 2 physical cores with 2 threads per physical core. This allows us to run 4 operations asynchronously without overhead costs and therefore, in theory, improves the data generation speed four-fold. Note that asynchronous programming is not meant to replace parallel computing completely; in cases where execution speed is absolutely critical, parallel computing should be used. Without asynchronous programming, the data generation process for 5,000,000 rows of Heston option price


data would take more than 8 hours; now it takes less than 2 hours to complete.

We follow the steps outlined in Appendix A to wrap the C++ program using the pybind11 Python library. The wrapped C++ function can be called in Python just like any other Python library. The pybind11 library is conceptually very similar to the Boost library; we choose pybind11 because it is a lightweight, header-only library that is easy to use. This enables us to enjoy both the execution speed of C++ and the access to the machine learning libraries of Python at the same time.

After the data is generated, it is stored in MySQL so that we do not need to re-generate it each time we restart the computer or when the Python IDE crashes unexpectedly. We use the mysql-connector Python library to communicate with MySQL from Python.

5.1.1 Heston Call Option Data

We follow the method and theory described in Chapter 3 to compute call option prices under the Heston model. We only consider call options in this project. To train an ANN that can also predict put option prices, one can simply add an extra input feature that takes only two values, 1 or -1, with 1 representing a call option and -1 representing a put option.

For this application we generated 5,000,000 rows of data, uniformly distributed over a specified range. In this context, one row of data contains all the input features and the corresponding output. We use 10 model parameters as the input features: spot price (S), strike price (K), risk-free rate (r), dividend yield (D), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ). In this case there is only one output, the call option price. See Table 5.1 for the respective ranges; a Python sketch of the sampling follows the table.

TABLE 5.1: Heston Model Call Option Data

        Model Parameters                     Range
Input:  Spot Price (S)                       ∈ [20, 80]
        Strike Price (K)                     ∈ [40, 55]
        Risk-free Rate (r)                   ∈ [0.03, 0.045]
        Dividend Yield (D)                   ∈ [0, 0.1]
        Instantaneous Volatility (ν)         ∈ [0.2, 0.6]
        Maturity (T)                         ∈ [0.03, 2.0]
        Long-term Volatility (θ)             ∈ [0.2, 1.0]
        Mean-Reversion Rate (κ)              ∈ [0.1, 1.0]
        Volatility of Volatility (ξ)         ∈ [0.2, 0.5]
        Correlation (ρ)                      ∈ [-0.95, 0.95]
Output: Call Option Price (C)                ∈ R_{>0}
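The sketch below is our own Python illustration of this uniform sampling over the ranges of Table 5.1 (the thesis generates the data in C++ for speed):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5_000_000

    data = {
        "S":     rng.uniform(20.0,  80.0,  n),
        "K":     rng.uniform(40.0,  55.0,  n),
        "r":     rng.uniform(0.03,  0.045, n),
        "D":     rng.uniform(0.0,   0.1,   n),
        "nu":    rng.uniform(0.2,   0.6,   n),
        "T":     rng.uniform(0.03,  2.0,   n),
        "theta": rng.uniform(0.2,   1.0,   n),
        "kappa": rng.uniform(0.1,   1.0,   n),
        "xi":    rng.uniform(0.2,   0.5,   n),
        "rho":   rng.uniform(-0.95, 0.95,  n),
    }
    # Each sampled parameter vector is then fed to the Heston pricer
    # to produce the corresponding call option price label.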


5.1.2 Heston Implied Volatility Data

We generate the Heston implied volatility data with the theory explained in Chapter 3. For the ANN application to implied volatility approximation, we decided to use the image-based implicit method, in which the ANN output layer dimension is 100, with each output representing one pixel on the implied volatility surface.

Using this method, we need 100 times more data in order to have the same number of distinct rows as the normal point-wise method (e.g. Heston call option pricing). However, due to hardware constraints, we decided to generate 10,000,000 rows of uniformly distributed data. That translates to only 100,000 distinct rows of data.

We use 6 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ).

Note that the strike price and maturity are not used as input features, as mentioned in Section 4.4. This is because the shape of one full implied volatility surface depends on the 6 model parameters mentioned above; the strike price and maturity correspond to one specific point on the full implied volatility surface. Nevertheless, we still require these two parameters to generate the implied volatility data. We use the following strike price and maturity values:

• strike price = 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4

• maturity = 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0

In this case there will be 10 × 10 = 100 outputs, namely the implied volatility surface. See Table 5.2 for the respective ranges.

TABLE 5.2: Heston Model Implied Volatility Data

        Model Parameters                     Range
Input:  Spot Price (S)                       ∈ [0.2, 2.5]
        Instantaneous Volatility (ν)         ∈ [0.2, 0.6]
        Long-term Volatility (θ)             ∈ [0.2, 1.0]
        Mean-Reversion Rate (κ)              ∈ [0.1, 1.0]
        Volatility of Volatility (ξ)         ∈ [0.2, 0.5]
        Correlation (ρ)                      ∈ [-0.95, 0.95]
Output: Implied Volatility Surface (σ_imp)   ∈ R^{10×10}_{>0}

5.1.3 Rough Heston Call Option Data

Again, we follow the theory in Chapter 3 and implement it in C++ to generate call option prices under the rough Heston model.

Similarly, we only consider call options here and generate 5,000,000 rows of data. We use 9 model parameters as the input features: strike price (K), risk-free rate (r), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). In this case there is only one output, the call option price. See Table 5.3 for the respective ranges.

TABLE 5.3: Rough Heston Model Call Option Data

        Model Parameters                     Range
Input:  Strike Price (K)                     ∈ [0.5, 5]
        Risk-free Rate (r)                   ∈ [0.2, 0.5]
        Instantaneous Volatility (ν)         ∈ [0, 1.0]
        Maturity (T)                         ∈ [0.1, 3.0]
        Long-term Volatility (θ)             ∈ [0.01, 0.1]
        Mean-Reversion Rate (κ)              ∈ [1.0, 4.0]
        Volatility of Volatility (ξ)         ∈ [0.01, 0.5]
        Correlation (ρ)                      ∈ [-0.95, 0.95]
        Hurst Parameter (H)                  ∈ [0.05, 0.1]
Output: Call Option Price (C)                ∈ R_{>0}

5.1.4 Rough Heston Implied Volatility Data

Finally, we generate the rough Heston implied volatility data with the theory explained in Chapter 3. As before, we use the image-based implicit method.

We use 10,000,000 rows of data, which translates to 100,000 distinct rows of data. We use 7 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). We use the following strike price and maturity values:

• strike price = 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4

• maturity = 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0

As before, there will be 10 × 10 = 100 outputs, namely the rough Heston implied volatility surface. See Table 5.4 for the respective ranges.

TABLE 5.4: Rough Heston Model Implied Volatility Data

        Model Parameters                     Range
Input:  Spot Price (S)                       ∈ [0.2, 2.5]
        Instantaneous Volatility (ν)         ∈ [0.2, 0.6]
        Long-term Volatility (θ)             ∈ [0.2, 1.0]
        Mean-Reversion Rate (κ)              ∈ [0.1, 1.0]
        Volatility of Volatility (ξ)         ∈ [0.2, 0.5]
        Correlation (ρ)                      ∈ [-0.95, 0.95]
        Hurst Parameter (H)                  ∈ [0.05, 0.1]
Output: Implied Volatility Surface (σ_imp)   ∈ R^{10×10}_{>0}


5.2 Data Pre-processing

In this section we discuss some necessary data pre-processing procedures: data splitting for validation and evaluation (testing), data standardisation and normalisation, and data partitioning for K-fold cross-validation.

5.2.1 Data Splitting: Train - Validate - Test

Generally, not 100% of the available data is used for training; we reserve some of the data for validation and for performance evaluation.

We split the data into three sets: 70% (train), 15% (validate) and 15% (test). The first 70% of the data is used for training, whilst 15% of the data is used for validation during training. The last 15% is for evaluating the trained ANN's predictive accuracy.

The validation data set is used for tuning the hyperparameters of the ANN; this is important to help prevent over-fitting of the network. The test data set is not used during training: it is the unseen (out-of-sample) data used for evaluating the trained ANN's predictions.
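A sketch of this 70/15/15 split using scikit-learn (our own illustration; X and y denote the feature matrix and the targets) is:

    from sklearn.model_selection import train_test_split

    # First carve off 70% for training, then split the remaining 30%
    # equally into validation and test sets.
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.30, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.50, random_state=0)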

5.2.2 Data Standardisation and Normalisation

Before the data is fed into the ANN for training, it is absolutely essential to scale the data to speed up convergence and improve performance.

The magnitude of each data feature is different. It is essential to re-scale the data so that the ANN model will not misinterpret the features' relative importance based on their magnitudes. Typical scaling techniques include standardisation and normalisation.

Data standardisation is defined as

x_{\mathrm{scaled}} = \frac{x - x_{\mathrm{mean}}}{x_{\mathrm{std}}} \qquad (5.1)

where

• x is the data

• x_mean is the mean of that data feature

• x_std is the standard deviation of that data feature

(Horvath et al., 2019) claim that this works especially well for quantities that are positively unbounded, such as implied volatility.

For other model parameters we apply data normalisation, which is defined as

    x_scaled = (2x − (x_max + x_min)) / (x_max − x_min) ∈ [−1, 1]    (5.2)

where

• x is the data

• x_max is the maximum value of that data feature

• x_min is the minimum value of that data feature

After data scaling (standardisation or normalisation), the ANN is able to learn the true relationships between inputs and outputs without bias from the differences in their magnitudes.
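Both scalings are one-liners in NumPy. The sketch below (illustrative, not the author's code) applies them column-wise to a feature matrix X:

    import numpy as np

    X = np.random.rand(1000, 7)  # placeholder feature matrix

    # Standardisation, Eq. (5.1): zero mean, unit variance per feature
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    # Normalisation, Eq. (5.2): rescale each feature to [-1, 1]
    x_max, x_min = X.max(axis=0), X.min(axis=0)
    X_norm = (2 * X - (x_max + x_min)) / (x_max - x_min)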

5.2.3 Data Partitioning

In this section we explain how the data is partitioned for K-fold cross validation.

First, we shuffle the entire training data set. Then the entire data set is divided into K equal-sized portions. Each portion in turn is used as the test data set while the ANN is trained on the other K − 1 portions. This is repeated K times, with each of the K portions used exactly once as the test data set. The average of the K results is the performance measure of the model.

FIGURE 5.1: How the data is partitioned for 5-fold cross validation. Adapted from (Chapter 2: Modeling Process, k-fold cross validation).
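A minimal sketch of the partitioning loop (assuming scikit-learn's KFold; the trivial stand-in model is purely to keep the example runnable):

    import numpy as np
    from sklearn.model_selection import KFold

    X, y = np.random.rand(100, 7), np.random.rand(100)  # placeholder data

    mse_per_fold = []
    kf = KFold(n_splits=5, shuffle=True)
    for train_idx, test_idx in kf.split(X):
        # Train on the K-1 portions; here a trivial "model" (predict the
        # training mean) stands in for the ANN to keep the sketch runnable.
        prediction = y[train_idx].mean()
        mse_per_fold.append(np.mean((y[test_idx] - prediction) ** 2))

    print(np.mean(mse_per_fold))  # the average of the K results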


Chapter 6

Neural Networks Architecture and Experimental Setup

We start this chapter by discussing the appropriate ANN architecture and experimental setup for our applications. We apply ANN to four different applications: option pricing under the Heston model, Heston implied volatility surface approximation, option pricing under the rough Heston model, and rough Heston implied volatility surface approximation. We first apply ANN to the Heston model as a concept validation and then proceed to the rough Heston model.

Subsequently, we discuss the relevant evaluation metrics and validation methods for our ANN models. Finally, we end this chapter with the implementation steps for ANN in Keras.

6.1 Neural Networks Architecture

Configuring an ANN's architecture is a combination of art and science. It requires experience and systematic experiments to decide the right parameters and architecture for a specific problem, and for a particular problem there might be multiple appropriate designs. To begin with, we use the network architecture of (Horvath et al., 2019) and adjust the hyperparameters if it performs poorly for our applications.

FIGURE 6.1: Typical ANN Architecture for Option Pricing Application (input layer I1-I10, three hidden layers H1-H30 each, single output neuron O1).

FIGURE 6.2: Typical ANN Architecture for Implied Volatility Surface Approximation (input layer I1-I10, three hidden layers H1-H30 each, output neurons O1-O100).

6.1.1 ANN Architecture: Option Pricing under Heston Model

An ANN with 3 hidden layers is appropriate for this application. Each hidden layer has 50 neurons, with the exponential linear unit (ELU) as the activation function.

Initially, the rectified linear activation function (ReLU) was considered. It is the choice for many applications as it can overcome the vanishing gradient problem whilst being less computationally expensive than other activation functions. However, it has some major drawbacks:

• The nth derivative of ReLU is not continuous for all n > 0.

• ReLU cannot produce negative outputs.

• For activation x < 0, ReLU has zero gradient.

From Theorem 4.2.2 we know that choosing an n-times differentiable activation function is necessary for computing the nth derivatives of the functional mapping. This is crucial for our application because we would be interested in option Greeks computation using the same ANN if it shows promising results for option pricing. For risk management purposes, selecting an activation function from which option Greeks can be calculated is crucial.

The obvious alternative is the ELU (defined in Chapter 4.1.4). For x < 0 it has a smooth output that approaches −α; for x > 0 the output of ELU is positively unbounded, which is suitable for our application as the values of option price and implied volatility are also, in theory, positively unbounded.

Since our problem is a regression problem, the linear activation function (defined in Chapter 4.1.5) is appropriate for the output layer as its output is unbounded. We discover that a batch size of 200 and 30 epochs yield good accuracy.

To summarise, the ANN model for Heston option pricing is as follows (a Keras sketch is given after the list):

• Input Layer: 10 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function
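In Keras, this architecture can be sketched as below, following the conventions of the listing in Appendix C (illustrative; variable names are not from the author's code):

    import keras

    # 10 input features -> 3 x 50 ELU hidden layers -> 1 linear output (price)
    input_layer = keras.layers.Input(shape=(10,))
    h1 = keras.layers.Dense(50, activation='elu')(input_layer)
    h2 = keras.layers.Dense(50, activation='elu')(h1)
    h3 = keras.layers.Dense(50, activation='elu')(h2)
    output_layer = keras.layers.Dense(1, activation='linear')(h3)

    model = keras.models.Model(inputs=input_layer, outputs=output_layer)
    model.compile(loss='mse', optimizer='adam', metrics=['mae'])
    # model.fit(X_train, y_train, batch_size=200, epochs=30, validation_split=0.15)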

6.1.2 ANN Architecture: Implied Volatility Surface of Heston Model

For approximating the full implied volatility surface of the Heston model we apply the image-based implicit method (see Chapter 4.4). This implies the output dimension is 100.

We use 3 hidden layers with 80 neurons each. The activation function for the hidden layers is the exponential linear unit (ELU). As before, we use the linear activation function for the output layer. We discover that a batch size of 20 and 200 epochs yield good accuracy. Since the number of epochs is relatively high, we introduce an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

To summarise, the ANN model for Heston implied volatility surface approximation is:

• Input Layer: 6 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function


6.1.3 ANN Architecture: Option Pricing under Rough Heston Model

For option pricing under the rough Heston model, the ANN model is:

• Input Layer: 9 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function

As before, we use a batch size of 200 and 30 epochs.

6.1.4 ANN Architecture: Implied Volatility Surface of Rough Heston Model

The ANN model for rough Heston implied volatility surface approximation is:

• Input Layer: 7 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function

We use a batch size of 20 and 200 epochs, with an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

6.2 Performance Metrics and Validation Methods

We require some metrics to evaluate the performance of the ANN in terms of accuracy and speed.

For accuracy, we introduced the MSE and MAE in Chapter 4.1.4. It is also common to use the coefficient of determination (R²) to quantify the performance of a regression model; it is a statistical measure of the proportion of the variance in the dependent variables that is explained by the independent variables. The formula is

    R² = 1 − [ Σⁿᵢ₌₁ (y_{i,actual} − y_{i,predict})² ] / [ Σⁿᵢ₌₁ (y_{i,actual} − ȳ_actual)² ]    (6.1)


We also suggest a user-defined accuracy metric, the Maximum Absolute Error, defined as

    Max Abs Error = maxᵢ |y_{i,actual} − y_{i,predict}|    (6.2)

This metric provides a measure of how large the error could be in the worst-case scenario. It is especially important from a risk management perspective.

As for the run-time performance of the ANN, we simply use the Python built-in timer function to quantify this property.
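For instance, a sketch of this kind of timing (not the author's exact script):

    import time

    t0 = time.time()
    # predictions = model.predict(X_test)   # the operation being timed
    t1 = time.time()
    print("Execution time: {:.1f} ms".format((t1 - t0) * 1000))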

In Chapter 5.2.3 we explained the data partitioning process. The motivation for partitioning the data is to perform K-fold cross validation. Cross validation is a statistical technique for assessing how well a predictive model generalises to an independent data set, and it is a good tool for identifying over-fitting.

As explained earlier, the process involves training and testing different permutations of the partitioned data sets multiple times. The indication of a well-generalised model is that the error magnitudes in each round are consistent with each other and with their average. We select K = 5 for our applications.


6.3 Implementation of ANN in Keras

In this section we outline the standard ANN computation process using Keras.

1. Construct the ANN model by specifying the number of input neurons, number of hidden neurons, number of hidden layers, number of output neurons and the activation functions for each layer. Print the model summary to check for errors.

2. Configure the settings of the model using the compile function. We select the MSE as the loss function, and we specify MAE, R² and our user-defined maximum absolute error as the metrics. For all of our applications we choose ADAM as the optimisation algorithm. To prevent over-fitting, we use the early stopping function, especially when the number of epochs is high. It is important to note that the validation loss (not the training loss) should be used as the early stopping criterion; this means the training will stop early if there is no reduction in validation loss.

3. Start training the ANN model. It is important to specify the validation split argument here for splitting the training data into a train set and a validation set. Note that the test set should not be used as the validation set; it should be reserved for evaluating the ANN's predictions after training.

4. When the training is complete, generate the plot of MSE (loss function) versus the number of epochs to visualise the training process. The MSE should decline gradually and then level off as the number of epochs increases. See Chapter 4.3 for potential training issues and remedies.

5. Apply 5-fold cross validation to check for over-fitting. Evaluate the trained model using the evaluate function.

6. Generate predictions. Visualise the predictions by plotting predicted values versus actual values on the same graph.

See Appendix C for a Python code example; a condensed sketch of steps 2 to 4 follows.
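The snippet below continues from the Appendix C listing (model, earlystop, X_train and y_train are assumed to be defined as in that listing; it is an illustration, not the author's exact script):

    # Step 2: configure the model (MSE loss, ADAM optimiser; early stopping
    # monitors the validation loss, as in Appendix C)
    model.compile(loss='mse', optimizer='adam', metrics=['mae'])

    # Step 3: train, holding out part of the training data for validation
    history = model.fit(X_train, y_train,
                        batch_size=20, epochs=200,
                        validation_split=0.15,
                        callbacks=[earlystop])

    # Step 4: visualise the training process
    import matplotlib.pyplot as plt
    plt.plot(history.history['loss'], label='training loss')
    plt.plot(history.history['val_loss'], label='validation loss')
    plt.xlabel('epoch'); plt.ylabel('MSE'); plt.legend(); plt.show()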


6.4 Summary of Neural Networks Architecture

TABLE 6.1: Summary of Neural Networks Architecture for Each Application

ANN Model                            Application                               Input Layer  Hidden Layers          Output Layer  Batch Size  Epochs
Heston Option Pricing ANN            Heston Call Option Pricing                10 neurons   3 layers × 50 neurons  1 neuron      200         30
Heston Implied Volatility ANN        Heston Implied Volatility Surface         6 neurons    3 layers × 80 neurons  100 neurons   20          200
Rough Heston Option Pricing ANN      Rough Heston Call Option Pricing          9 neurons    3 layers × 50 neurons  1 neuron      200         30
Rough Heston Implied Volatility ANN  Rough Heston Implied Volatility Surface   7 neurons    3 layers × 80 neurons  100 neurons   20          200


Chapter 7

Results and Discussions

In this chapter we present our results and discuss our findings

7.1 Heston Option Pricing ANN Results

Figure 7.1 shows the learning curves of the Heston Option Pricing ANN; the loss function used here is the MSE. As the ANN is trained, both the training loss and validation loss curves decline until they converge to a point. It took fewer than 30 epochs for the ANN to reach a satisfactory level of accuracy. There is only a small gap between the training loss and the validation loss, which indicates the ANN model is well trained.

FIGURE 7.1: Plot of MSE Loss Function vs. No. of Epochs for Heston Option Pricing ANN

Table 7.1 summarises the error metrics of the Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 2.00 × 10⁻⁶, 9.58 × 10⁻⁴, 6.50 × 10⁻³ and 0.999 respectively. The low error metrics and high R² value imply the ANN model generalises well and is able to give highly accurate predictions for option prices under the Heston model. Figure 7.2 attests to its high accuracy: almost all of the predictions lie on the perfect-predictions line (red line).

TABLE 7.1: Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data

MSE          MAE          Max Abs Error  R²
2.00 × 10⁻⁶  9.58 × 10⁻⁴  6.50 × 10⁻³    0.999

FIGURE 7.2: Plot of Predicted vs. Actual Values for Heston Option Pricing ANN

TABLE 7.2: 5-fold Cross Validation Results of Heston Option Pricing ANN

         MSE          MAE          Max Abs Error
Fold 1   5.78 × 10⁻⁷  5.45 × 10⁻⁴  2.04 × 10⁻³
Fold 2   9.90 × 10⁻⁷  7.69 × 10⁻⁴  2.40 × 10⁻³
Fold 3   7.15 × 10⁻⁷  6.21 × 10⁻⁴  2.20 × 10⁻³
Fold 4   1.24 × 10⁻⁶  8.67 × 10⁻⁴  2.53 × 10⁻³
Fold 5   6.54 × 10⁻⁷  5.79 × 10⁻⁴  2.21 × 10⁻³
Average  8.36 × 10⁻⁷  6.76 × 10⁻⁴  2.27 × 10⁻³

Table 7.2 shows the results from 5-fold cross validation, which again confirm that the ANN model gives accurate predictions and generalises well to unseen data: the values of each fold are consistent with each other and with their average.


7.2 Heston Implied Volatility ANN Results

Figure 7.3 shows the learning curves of the Heston Implied Volatility ANN; again we use the MSE as the loss function. The validation loss curve experiences some fluctuations during training. We experimented with various ANN setups and this is the least fluctuating learning curve we could achieve. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level and the gap between validation loss and training loss is negligible. Due to the comparatively more complex ANN architecture, it took about 200 epochs for the ANN to reach a satisfactory level of accuracy.

FIGURE 7.3: Plot of MSE Loss Function vs. No. of Epochs for Heston Implied Volatility ANN

Table 7.3 summarises the error metrics of the Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 6.24 × 10⁻⁴, 1.27 × 10⁻², 3.71 × 10⁻¹ and 0.999 respectively. The low error metrics and high R² value imply the ANN model generalises well and is able to give accurate predictions for the implied volatility surface under the Heston model. Figure 7.4 shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.3: Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data

MSE          MAE          Max Abs Error  R²
6.24 × 10⁻⁴  1.27 × 10⁻²  3.71 × 10⁻¹    0.999

FIGURE 7.4: Heston Implied Volatility Smiles: ANN Predictions vs. Actual Values


FIGURE 7.5: Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.5 shows the MAE values of the Heston implied volatility ANN predictions across the entire implied volatility surface. The largest MAE value is 0.0045, and a large area of the surface has MAE values lower than 0.0030. It is worth noting that the accuracy is relatively poor when the strike price is at the extremes and the maturity is small. However, the ANN gives good predictions overall for the whole surface.

The 5-fold cross validation results for the Heston implied volatility ANN model show consistent values. The results are summarised in Table 7.4. The consistent values indicate the ANN model is well trained and is able to generalise to unseen test data.

TABLE 7.4: 5-fold Cross Validation Results of Heston Implied Volatility ANN

         MSE          MAE          Max Abs Error
Fold 1   8.15 × 10⁻⁴  1.13 × 10⁻²  3.88 × 10⁻¹
Fold 2   4.74 × 10⁻⁴  1.08 × 10⁻²  3.13 × 10⁻¹
Fold 3   1.37 × 10⁻³  1.74 × 10⁻²  4.46 × 10⁻¹
Fold 4   3.38 × 10⁻⁴  8.61 × 10⁻³  2.19 × 10⁻¹
Fold 5   4.60 × 10⁻⁴  1.02 × 10⁻²  2.67 × 10⁻¹
Average  6.92 × 10⁻⁴  1.12 × 10⁻²  3.27 × 10⁻¹

7.3 Rough Heston Option Pricing ANN Results

Figure 7.6 shows the learning curves of the Rough Heston Option Pricing ANN. As before, MSE is used as the loss function. As the training progresses, both the training loss and validation loss curves decline until they converge to a point. Since this is a relatively simple model, it took fewer than 30 epochs of training to reach a good level of accuracy. The small discrepancy between the training loss and the validation loss indicates the ANN model is well trained.

FIGURE 7.6: Plot of MSE Loss Function vs. No. of Epochs for Rough Heston Option Pricing ANN

Table 7.5 summarises the error metrics of the Rough Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 9.00 × 10⁻⁶, 2.28 × 10⁻³, 1.76 × 10⁻¹ and 0.999 respectively. These metrics imply the ANN model generalises well and is able to give highly accurate predictions for option prices under the rough Heston model. Figure 7.7 attests to its high accuracy again: most of the predictions lie on the perfect-predictions line (red line).

TABLE 7.5: Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data

MSE          MAE          Max Abs Error  R²
9.00 × 10⁻⁶  2.28 × 10⁻³  1.76 × 10⁻¹    0.999


FIGURE 7.7: Plot of Predicted vs. Actual Values for Rough Heston Option Pricing ANN

TABLE 7.6: 5-fold Cross Validation Results of Rough Heston Option Pricing ANN

         MSE          MAE          Max Abs Error
Fold 1   3.79 × 10⁻⁶  1.35 × 10⁻³  5.99 × 10⁻³
Fold 2   2.33 × 10⁻⁶  9.96 × 10⁻⁴  4.89 × 10⁻³
Fold 3   2.06 × 10⁻⁶  9.88 × 10⁻⁴  4.41 × 10⁻³
Fold 4   2.16 × 10⁻⁶  1.08 × 10⁻³  4.07 × 10⁻³
Fold 5   1.84 × 10⁻⁶  1.01 × 10⁻³  3.74 × 10⁻³
Average  2.44 × 10⁻⁶  1.08 × 10⁻³  4.62 × 10⁻³

The highly consistent values from 5-fold cross validation again confirm that the ANN model is able to generalise to unseen data.

7.4 Rough Heston Implied Volatility ANN Results

Figure 7.8 shows the learning curves of the Rough Heston Implied Volatility ANN. The validation loss curve experiences some fluctuations during training; again we select the least fluctuating setup. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level and the discrepancy between validation loss and training loss is negligible. Initially the training was set to 200 epochs, but it stopped early at around 85 epochs since the validation loss had not improved for 25 epochs. It achieves good accuracy nonetheless.


FIGURE 7.8: Plot of MSE Loss Function vs. No. of Epochs for Rough Heston Implied Volatility ANN

Table 7.7 summarises the error metrics of the Rough Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 5.66 × 10⁻⁴, 1.08 × 10⁻², 3.49 × 10⁻¹ and 0.999 respectively. These metrics show the ANN model can produce accurate predictions on unseen data. This is again confirmed by Figure 7.9, which shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.7: Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data

MSE          MAE          Max Abs Error  R²
5.66 × 10⁻⁴  1.08 × 10⁻²  3.49 × 10⁻¹    0.999

FIGURE 7.9: Rough Heston Implied Volatility Smiles: ANN Predictions vs. Actual Values


FIGURE 7.10: Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.10 shows the MAE values of the Rough Heston Implied Volatility ANN predictions across the entire implied volatility surface. The largest MAE value is below 0.0030, and a large area of the surface has MAE values lower than 0.0015. It is worth noting that the accuracy is relatively poor when the strike price and maturity values are small. However, the ANN gives good predictions overall for the whole surface. In fact, it is more accurate than the Heston implied volatility ANN model despite having one extra input and a similar architecture; we think this is entirely due to the stochastic nature of the optimiser.

The 5-fold cross validation results for the rough Heston implied volatility ANN model show consistent values, just like the previous three cases. The results are summarised in Table 7.8. The consistency of the values indicates the ANN model is well trained and is able to generalise to unseen test data.

TABLE 7.8: 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN

         MSE          MAE          Max Abs Error
Fold 1   8.15 × 10⁻⁴  1.13 × 10⁻²  3.88 × 10⁻¹
Fold 2   4.74 × 10⁻⁴  1.08 × 10⁻²  3.13 × 10⁻¹
Fold 3   1.37 × 10⁻³  1.74 × 10⁻²  4.46 × 10⁻¹
Fold 4   3.38 × 10⁻⁴  8.61 × 10⁻³  2.19 × 10⁻¹
Fold 5   4.60 × 10⁻⁴  1.02 × 10⁻²  2.67 × 10⁻¹
Average  6.92 × 10⁻⁴  1.12 × 10⁻²  3.27 × 10⁻¹


7.5 Run-time Performance

In this section we present the training time and the execution speed of the ANN versus the traditional methods. The speed tests are run on a machine with an Intel i5-7200U chip (up to 3.10 GHz) and 8 GB of RAM.

Table 7.9 summarises the training time required for each application. Although the training time per epoch for the option pricing applications is higher than that for the implied volatility applications, the resulting total training time is similar for both types of application, because the number of epochs for training the implied volatility ANN models is higher.

We also emphasise that fewer distinct rows of data are available for implied volatility ANN training. Thus the training time is similar to that of the option pricing ANN training despite the more complex ANN architecture.

TABLE 7.9: Training Time

Application                 Training Time per Epoch  Total Training Time
Heston Option Pricing       30 s                     900 s
Heston Implied Volatility   5 s                      1000 s
rHeston Option Pricing      30 s                     900 s
rHeston Implied Volatility  5 s                      1000 s

TABLE 7.10: Execution Time for Predictions using ANN and Traditional Methods

Application                 Traditional Methods  ANN
Heston Option Pricing       8.84 ms              57.9 ms
Heston Implied Volatility   2.31 ms              43.7 ms
rHeston Option Pricing      6.52 ms              52.6 ms
rHeston Implied Volatility  2.87 ms              48.3 ms

Table 7.10 summarises the execution time for generating predictions using the ANN and the traditional methods. Before the experiment we expected the ANN to generate predictions faster. However, the results show that the ANN is actually up to 7 times slower than the traditional methods for option pricing (both Heston and rough Heston), and up to 18 times slower than the traditional methods for implied volatility surface approximation (both Heston and rough Heston). This shows the traditional approximation methods are more efficient.


Chapter 8

Conclusions and Outlook

To summarise, our results show that the ANN has the ability to approximate option prices and implied volatilities under the Heston and rough Heston models to a high degree of accuracy. However, the efficiency of the ANN is not as good as that of the traditional methods, and hence we conclude that such an approach may be considered an ineffective alternative. There are more efficient methods, such as the numerical integration of the FFT for option price computation; the traditional method we used for approximating implied volatility does not even require any numerical schemes, hence its efficiency.

As for future projects, it would be interesting to investigate applications of the rough Bergomi model to exotic options such as lookback and digital options. Moreover, it would also be worthwhile to apply gradient-free optimisers for ANN training; some interesting examples include differential evolution, dual annealing and simplicial homology global optimisers.


Appendix A

pybind11: A step-by-step tutorial with examples

This tutorial assumes basic knowledge of C++ and Python.

Sources: Create a C++ extension for Python; Using the C++ eigen library to calculate matrix inverse and determinant; pybind11 Documentation; Eigen: The Matrix Class.

A.1 Software Prerequisites

• Visual Studio 2017 or later, with both the Desktop Development with C++ and Python Development workloads installed with default options.

• In the Python Development workload, also select the box on the right for Python native development tools. This option sets up most of the configuration required. (It also includes the C++ workload automatically.)

Source: Create a C++ extension for Python.


A.2 Create a Python Project

1. To create a Python project in Visual Studio, select File > New > Project. Search for "Python", select the Python Application template, give it a suitable name and location, and select OK.

2. pybind11 requires that you use a 32-bit Python interpreter (Python 3.6 or above recommended). In the Solution Explorer window of Visual Studio, expand the project node, then expand the Python Environments node. If you don't see a 32-bit environment as the default (either in bold or labeled with "global default"), then right-click the Python Environments node and select Add Environment, or select Add Environment from the environment drop-down in the Python toolbar.

Once in the Add Environment dialog box, select the Existing environment tab, then select the 32-bit interpreter from the Environment drop-down list.

If you don't have a 32-bit interpreter installed, you can install standard Python interpreters from the Add Environment dialog: select the Add Environment command in the Python Environments window or the Python toolbar, select the Python installation tab, indicate which interpreters to install, and select Install.

If you already added an environment other than the global default to a project, you may need to activate a newly added environment: right-click that environment under the Python Environments node and select Activate Environment. To remove an environment from the project, select Remove.


A.3 Create a C++ Project

1. A Visual Studio solution can contain both Python and C++ projects together. To do so, right-click the solution in Solution Explorer and select Add > New Project.

2. Search on "C++", select Empty project, specify the name "test", and select OK.

3. Create a C++ file in the new project by right-clicking the Source Files node, then select Add > New Item, select C++ File, name it main.cpp, and select OK.

4. Right-click the C++ project in Solution Explorer and select Properties.

5. At the top of the Property Pages dialog that appears, set Configuration to All Configurations and Platform to Win32.

6. Set the specific properties as described in the following table, then select OK.

7. Right-click the C++ project and select Build to test your configurations (both Debug and Release). The .pyd files are located in the solution folder under Debug and Release, not the C++ project folder itself.

8. Add the following code to the C++ project's main.cpp file:

    #include <iostream>

    int main() {
        std::cout << "Hello World from C++" << std::endl;
        return 0;
    }

9. Build the C++ project again to confirm the code is working.

A.4 Convert the C++ project to extension for Python

To make the C++ DLL into an extension for Python, first modify the exported methods to interact with Python types. Then add a function that exports the module (main function).

1. Install pybind11 using pip:

    pip install pybind11

or

    py -m pip install pybind11

2. At the top of main.cpp, include pybind11.h:

    #include <pybind11/pybind11.h>


3. At the bottom of main.cpp, use the PYBIND11_MODULE macro to define the entry point to the C++ function:

    namespace py = pybind11;

    PYBIND11_MODULE(test, m) {
        m.def("main_func", &main, R"pbdoc(
            pybind11 simple example
        )pbdoc");

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

4. Set the target configuration to Release and build the C++ project to verify your code.

The C++ module may fail to compile for the following reasons:

• Unable to locate Python.h (E1696: cannot open source file "Python.h" and/or C1083: Cannot open include file: "Python.h": No such file or directory): verify that the path in C/C++ > General > Additional Include Directories in the project properties points to your Python installation's include folder. See the table in Step 6.

• Unable to locate Python libraries: verify that the path in Linker > General > Additional Library Directories in the project properties points to the Python installation's libs folder. See the table in Step 6.

• Linker errors related to target architecture: change the C++ target's project architecture to match that of your Python installation. For example, if you're targeting x64 with the C++ project but your Python installation is x86, change the C++ project to target x86.

A.5 Make the DLL available to Python

1. In Solution Explorer, right-click the References node in the Python project, and then select Add Reference. In the dialog that appears, select the Projects tab, select the test project, and then select OK.

2. Run the Visual Studio installer, select Modify, then select Individual Components > Compilers, build tools, and runtimes > Visual C++ 2015.3 v140 toolset. This step is necessary because Python (for Windows) is itself built with Visual Studio 2015 (version 14.0) and expects that those tools are available when building an extension through the method described here. (Note that you may need to install a 32-bit version of Python and target the DLL to Win32 and not x64.)


3. Create a file named setup.py in the C++ project by right-clicking the project and selecting Add > New Item. Then select C++ File (.cpp), name the file setup.py, and select OK (naming the file with the .py extension makes Visual Studio recognise it as Python despite using the C++ file template).

Then paste the following code into the setup.py file:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add pybind11 folder and Python include folder
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

4. The setup.py code instructs Python to build the extension using the Visual Studio 2015 C++ toolset when used from the command line.

Open an elevated command prompt, navigate to the folder that contains setup.py, and enter the following command:

    python setup.py install

A.6 Call the DLL from Python

After the DLL is made available to Python, we can now call test.main_func() from Python code.

1. Add the following lines in your .py file to call the method exported from the DLL:

    import test

    if __name__ == "__main__":
        test.main_func()

2. Run the Python program (Debug > Start without Debugging). The output should look like the following:
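Given the main function from Section A.3, the expected console output would be along the following lines (reconstructed here since the original screenshot is not reproduced):

    Hello World from C++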

A.7 Example: Store C++ Eigen Matrix as NumPy Arrays

This example requires the use of Eigen, a C++ template library. Due to its popularity, pybind11 provides transparent conversion and limited mapping support between Eigen and scientific Python linear algebra data types.

1. If you have not already, visit http://eigen.tuxfamily.org/index.php?title=Main_Page and click on "zip" to download the zip file.

Once the download is complete, decompress the zip file in a suitable location and note down the file path containing the Eigen folders.


2. In the Solution Explorer, right-click the C++ project and select Properties. Navigate to the C/C++ > General tab, add the Eigen path to the Additional Include Directories property, then select OK.

3. Right-click the C++ project and select Build to test your configurations (both Debug and Release).

4. Replace the code in the main.cpp file with the following code:

    #include <iostream>

    // pybind11
    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>

    // Eigen
    #include <Eigen/Dense>

    namespace py = pybind11;

    Eigen::Matrix3f example_mat(const int a, const int b, const int c) {
        Eigen::Matrix3f mat;
        // entry operators reconstructed; they were garbled in the extracted text
        mat << 5 * a, 1 * b, 2 * c,
               2 * a, 4 * b, 2 * c,
               1 * a, 3 * b, 3 * c;
        return mat;
    }

    Eigen::MatrixXd inv(Eigen::MatrixXd xs) {
        return xs.inverse();
    }

    double det(Eigen::MatrixXd xs) {
        return xs.determinant();
    }

    PYBIND11_MODULE(test, m) {
        m.def("example_mat", &example_mat);
        m.def("inv", &inv);
        m.def("det", &det);

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

Note that Eigen matrices are automatically converted to NumPy arrays simply by including the pybind11/eigen.h header (the second include above): a C++ function with an Eigen return type can be used directly as a NumPy data type in Python. Build the C++ project to check the code is working correctly.

5. Include the Eigen library's directory. Replace the code in setup.py with the following code:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add pybind11 folder, Python include folder and Eigen folder
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                      r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

As before, in an elevated command prompt, navigate to the folder that contains setup.py and enter the following command:

    python setup.py install

6. Then paste the following code in the Python file:

    import test
    import numpy as np

    if __name__ == "__main__":
        A = test.example_mat(1, 3, 1)
        print("Matrix A is:\n", A)
        print("The inverse of A is:\n", test.inv(A))
        print("The determinant of A is:", test.det(A))

The output should show matrix A followed by its inverse and determinant.

A.8 Example: Store Data Generated by Analytical Heston Model as NumPy Arrays

Only the relevant parts of the code are displayed here. To request the code for the entire project, contact the author at cko22@bath.edu.

This example demonstrates how to use pybind11 for:

• a C++ project with multiple files;

• storing a C++ Eigen matrix as NumPy arrays.

1. For a multi-file C++ project, we need to include all the source files in setup.py. This example also uses the Eigen library, so its directory is included as well. Paste the following code into setup.py:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'Data_Generation',
        sources=[r'module.cpp', r'Data.cpp', r'Heston.cpp',
                 r'IntegrationScheme.cpp', r'IntegratorImp.cpp',
                 r'NumIntegrator.cpp', r'pdetfunction.cpp',
                 r'pfunction.cpp', r'Range.cpp'],
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                      r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='Data_Generation',
        version='1.0',
        description='Python package with Data_Generation C++ extension (PyBind11)',
        license='MIT',
        ext_modules=[sfc_module],
    )

Then, in a command prompt, go to the folder that contains setup.py and enter the following command:

    python setup.py install

2. The code in module.cpp is:

    // For analytic solution in case of Heston model
    #include "Heston.hpp"
    #include <ctime>
    #include "Data.hpp"
    #include <future>

    // Inputting and Outputting
    #include <iostream>
    #include <vector>

    // pybind11
    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>
    #include <pybind11/stl_bind.h>

    // Eigen
    #include <Eigen/Dense>

    namespace py = pybind11;

    struct CPPData {
        // Input Data
        std::vector<double> spot_price;
        std::vector<double> strike_price;
        std::vector<double> risk_free_rate;
        std::vector<double> dividend_yield;
        std::vector<double> initial_vol;
        std::vector<double> maturity_time;
        std::vector<double> long_term_vol;
        std::vector<double> mean_reversion_rate;
        std::vector<double> vol_vol;
        std::vector<double> price_vol_corr;

        // Output Data
        std::vector<double> option_price;
    };

    // ... do something ...

    Eigen::MatrixXd module() {
        CPPData cppdata;
        simulation(cppdata);  // simulation() is defined in the project's other files

        // Convert each std::vector to an Eigen vector first
        // Input Data
        Eigen::Map<Eigen::VectorXd> eigen_spot_price(cppdata.spot_price.data(), cppdata.spot_price.size());
        Eigen::Map<Eigen::VectorXd> eigen_strike_price(cppdata.strike_price.data(), cppdata.strike_price.size());
        Eigen::Map<Eigen::VectorXd> eigen_risk_free_rate(cppdata.risk_free_rate.data(), cppdata.risk_free_rate.size());
        Eigen::Map<Eigen::VectorXd> eigen_dividend_yield(cppdata.dividend_yield.data(), cppdata.dividend_yield.size());
        Eigen::Map<Eigen::VectorXd> eigen_initial_vol(cppdata.initial_vol.data(), cppdata.initial_vol.size());
        Eigen::Map<Eigen::VectorXd> eigen_maturity_time(cppdata.maturity_time.data(), cppdata.maturity_time.size());
        Eigen::Map<Eigen::VectorXd> eigen_long_term_vol(cppdata.long_term_vol.data(), cppdata.long_term_vol.size());
        Eigen::Map<Eigen::VectorXd> eigen_mean_reversion_rate(cppdata.mean_reversion_rate.data(), cppdata.mean_reversion_rate.size());
        Eigen::Map<Eigen::VectorXd> eigen_vol_vol(cppdata.vol_vol.data(), cppdata.vol_vol.size());
        Eigen::Map<Eigen::VectorXd> eigen_price_vol_corr(cppdata.price_vol_corr.data(), cppdata.price_vol_corr.size());
        // Output Data
        Eigen::Map<Eigen::VectorXd> eigen_option_price(cppdata.option_price.data(), cppdata.option_price.size());

        const int cols = 11;        // No. of columns
        const int rows = SIZE / 4;  // operator reconstructed; garbled in the extracted text
        // Define a matrix that stores training data to be used in Python
        Eigen::MatrixXd training(rows, cols);
        // Initialise each column using the vectors
        training.col(0) = eigen_spot_price;
        training.col(1) = eigen_strike_price;
        training.col(2) = eigen_risk_free_rate;
        training.col(3) = eigen_dividend_yield;
        training.col(4) = eigen_initial_vol;
        training.col(5) = eigen_maturity_time;
        training.col(6) = eigen_long_term_vol;
        training.col(7) = eigen_mean_reversion_rate;
        training.col(8) = eigen_vol_vol;
        training.col(9) = eigen_price_vol_corr;
        training.col(10) = eigen_option_price;

        std::cout << training << std::endl;

        return training;
    }

    PYBIND11_MODULE(Data_Generation, m) {
        m.def("module", &module);

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

Note that the module function's return type is an Eigen matrix; this can then be used directly as a NumPy data type in Python.

3. The C++ function can then be called from Python as shown:

    import Data_Generation as dg
    import mysql.connector
    import pandas as pd
    import sqlalchemy as db
    import time

    # This function calls the analytical Heston option pricer in C++ to
    # simulate 5,000,000 option prices and store them in a MySQL database.

    # ... do something ...

    if __name__ == "__main__":
        t0 = time.time()

        # A new table (optional)
        SQL_clean_table()

        target_size = 5000000   # Final table size
        batch_size = 500000     # Size generated each iteration

        # Get the number of rows existing in the table
        r = SQL_setup(batch_size)

        # Get the number of iterations
        iter = int(target_size / batch_size) - int(r / batch_size)
        print("Iteration(s) required:", iter)
        count = 0

        print("Now Run C++ code ...")
        for i in range(iter):
            count = count + 1
            print("----------------------------------------------------")
            print("Current iteration:", count)
            cpp = dg.module()   # store data from C++

            # ... do something ...

            r1 = SQL_setup(batch_size)
            print("No of rows in SQL Classic Heston Table Now:", r1)
            print("\n")
        t1 = time.time()
        # Calculate total time for the entire program
        total = t1 - t0
        print("THE ENTIRE PROGRAM TAKES", round(total), "s TO COMPLETE")


Appendix B

Asymptotic Expansions for General Stochastic Volatility Models

B.1 Asymptotic Expansions

We present the asymptotic expansions for general stochastic volatility models by (Lorig et al., 2019). Consider a strictly positive spot price S whose risk-neutral dynamics are given by

    S = e^X                                                                (B.1)
    dX = −(1/2) σ²(t, X, Y) dt + σ(t, X, Y) dW₁,   X(0) = x ∈ R            (B.2)
    dY = f(t, X, Y) dt + β(t, X, Y) dW₂,           Y(0) = y ∈ R            (B.3)
    ⟨dW₁, dW₂⟩ = ρ(t, X, Y) dt,                    |ρ| < 1                 (B.4)

The drift of X is selected to be −(1/2)σ² so that S = e^X is a martingale.

Let C(t) be the time-t value of a European call option maturing at time T > t with payoff function P(X(T)). To value a European-style option using risk-neutral pricing we must compute functions of the form

    u(t, x, y) = E[P(X(T)) | X(t) = x, Y(t) = y]

It is well known that the function u satisfies the Kolmogorov backward equation (Lorig et al., 2019)

    (∂_t + A(t)) u(t, x, y) = 0,   u(T, x, y) = P(x)                       (B.5)

where the operator A(t) is given explicitly by

    A(t) = a(t, x, y)(∂²_x − ∂_x) + f(t, x, y) ∂_y + b(t, x, y) ∂²_y + c(t, x, y) ∂_x ∂_y    (B.6)

and where the functions a, b and c are defined as

    a(t, x, y) = (1/2) σ²(t, x, y)
    b(t, x, y) = (1/2) β²(t, x, y)
    c(t, x, y) = ρ(t, x, y) σ(t, x, y) β(t, x, y)


Appendix C

Deep Neural Networks Library in Python: Keras

We show the code snippet for computing the ANN in Keras: the Python code defining the ANN architecture for rough Heston implied volatility approximation.

    # Define the neural network architecture
    import keras
    from keras import backend as K
    keras.backend.set_floatx('float64')
    from keras.callbacks import EarlyStopping

    # Construct the model
    input_layer = keras.layers.Input(shape=(n,))

    hidden_layer1 = keras.layers.Dense(80, activation='elu')(input_layer)
    hidden_layer2 = keras.layers.Dense(80, activation='elu')(hidden_layer1)
    hidden_layer3 = keras.layers.Dense(80, activation='elu')(hidden_layer2)

    output_layer = keras.layers.Dense(100, activation='linear')(hidden_layer3)

    model = keras.models.Model(inputs=input_layer, outputs=output_layer)

    # summarise layers
    model.summary()

    # User-defined metrics: R2 and Max Abs Error
    # R2
    def coeff_determination(y_true, y_pred):
        SS_res = K.sum(K.square(y_true - y_pred))
        SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
        return 1 - SS_res / (SS_tot + K.epsilon())

    # Max Abs Error
    def max_error(y_true, y_pred):
        return K.max(abs(y_true - y_pred))

    earlystop = EarlyStopping(monitor='val_loss',
                              min_delta=0,
                              mode='min',
                              verbose=1,
                              patience=25)

    # Prepare the model for training; pass `opt` so that the specified
    # learning rate is actually used by the optimiser
    opt = keras.optimizers.Adam(learning_rate=0.005)
    model.compile(loss='mse', optimizer=opt,
                  metrics=['mae', coeff_determination, max_error])
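The corresponding training and evaluation calls from Chapter 6.3 might then look as follows (a sketch; X_train, y_train, X_test and y_test are assumed to hold the pre-processed data):

    # Train with a validation split and early stopping on the validation loss
    history = model.fit(X_train, y_train,
                        batch_size=20, epochs=200,
                        validation_split=0.15,
                        callbacks=[earlystop])

    # Evaluate on the unseen test set and generate predictions
    print(model.evaluate(X_test, y_test))
    y_pred = model.predict(X_test)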


Bibliography

Alos, Elisa et al. (June 2017). "Exponentiation of Conditional Expectations Under Stochastic Volatility". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2983180

Andersen, Leif et al. (Jan 2007). "Moment Explosions in Stochastic Volatility Models". In: Finance and Stochastics 11, pp. 29-50. DOI: 10.1007/s00780-006-0011-7

Atkinson, Colin et al. (Jan 2011). "Rational Solutions for the Time-Fractional Diffusion Equation". URL: https://www.researchgate.net/publication/220222966_Rational_Solutions_for_the_Time-Fractional_Diffusion_Equation

Bayer, Christian et al. (Oct 2018). "Deep calibration of rough stochastic volatility models". URL: https://arxiv.org/abs/1810.03399

Black, Fischer et al. (May 1973). "The Pricing of Options and Corporate Liabilities". URL: https://www.cs.princeton.edu/courses/archive/fall09/cos323/papers/black_scholes73.pdf

Carr, Peter and Dilip B. Madan (Mar 1999). "Option valuation using the fast Fourier transform". In: Journal of Computational Finance. URL: https://www.researchgate.net/publication/2519144_Option_Valuation_Using_the_Fast_Fourier_Transform

Chapter 2: Modeling Process, k-fold cross validation. https://bradleyboehmke.github.io/HOML/process.html. Accessed 2020-08-22.

Comte, Fabienne et al. (Oct 1998). "Long Memory In Continuous-Time Stochastic Volatility Models". In: Mathematical Finance 8, pp. 291-323. URL: https://zulfahmed.files.wordpress.com/2015/10/comterenault19981.pdf

Cox, John C. et al. (Jan 1985). "A Theory of the Term Structure of Interest Rates". URL: http://pages.stern.nyu.edu/~dbackus/BCZ/discrete_time/CIR_Econometrica_85.pdf

Create a C++ extension for Python. https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019. Accessed 2020-07-21.

Duffy, Daniel J. (Jan 2018). Financial Instrument Pricing Using C++, Second Edition. John Wiley & Sons. ISBN: 978-0-470-97119-2.

Duffy, Daniel J. et al. (2012). Monte Carlo Frameworks: Building customisable high-performance C++ applications. John Wiley & Sons, Inc. ISBN: 9780470060698.

Dupire, Bruno (1994). "Pricing with a Smile". In: Risk Magazine, pp. 18-20.

Eigen: The Matrix Class. https://eigen.tuxfamily.org/dox/group__TutorialMatrixClass.html. Accessed 2020-07-22.

Eldan, Ronen et al. (May 2016). "The Power of Depth for Feedforward Neural Networks". URL: https://arxiv.org/abs/1512.03965

Euch, Omar El et al. (Sept 2016). "The characteristic function of rough Heston models". URL: https://arxiv.org/pdf/1609.02108.pdf

Euch, Omar El et al. (Mar 2019). "Roughening Heston". URL: https://ssrn.com/abstract=3116887

Fouque, Jean-Pierre et al. (Aug 2012). "Second Order Multiscale Stochastic Volatility Asymptotics: Stochastic Terminal Layer Analysis & Calibration". URL: https://arxiv.org/abs/1208.5802

Gatheral, Jim (2006). The volatility surface: a practitioner's guide. John Wiley & Sons, Inc. ISBN: 978-0-471-79251-2.

Gatheral, Jim et al. (Oct 2014). "Volatility is rough". URL: https://arxiv.org/abs/1410.3394

Gatheral, Jim et al. (Jan 2019). "Rational approximation of the rough Heston solution". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3191578

Hebb, Donald O. (1949). The Organization of Behavior: A Neuropsychological Theory. Psychology Press; 1st edition (12 Jun 2002). ISBN: 978-0805843002.

Heston, Steven (Jan 1993). "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options". URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.3204&rep=rep1&type=pdf

Hinton, Geoffrey et al. (Feb 2015). "Overview of mini-batch gradient descent". URL: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

Hornik et al. (Mar 1989). "Multilayer feedforward networks are universal approximators". URL: https://deeplearning.cs.cmu.edu/F20/document/readings/Hornik_Stinchcombe_White.pdf

Hornik et al. (Jan 1990). "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks". URL: http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/univ_approx.pdf

Horvath, Blanka et al. (Jan 2019). "Deep Learning Volatility". URL: https://ssrn.com/abstract=3322085

Hurst, Harold E. (Jan 1951). "Long-term storage of reservoirs: an experimental study". URL: https://www.jstor.org/stable/2982267#metadata_info_tab_contents

Hutchinson, James M. et al. (July 1994). "A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks". URL: https://alo.mit.edu/wp-content/uploads/2015/06/A-Nonparametric-Approach-to-Pricing-and-Hedging-Derivative-Securities-via-Learning-Networks.pdf

Ioffe, Sergey et al. (Feb 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". URL: https://arxiv.org/abs/1502.03167

Itkin, Andrey (Jan 2010). "Pricing options with VG model using FFT". URL: https://arxiv.org/abs/physics/0503137

Kahl, Christian et al. (Jan 2006). "Not-so-complex Logarithms in the Heston model". URL: http://www2.math.uni-wuppertal.de/~kahl/publications/NotSoComplexLogarithmsInTheHestonModel.pdf

Kingma, Diederik P. et al. (Feb 2015). "Adam: A Method for Stochastic Optimization". URL: https://arxiv.org/abs/1412.6980

Kwok, YueKuen et al. (2012). Handbook of Computational Finance, Chapter 21. Springer-Verlag Berlin Heidelberg. ISBN: 978-3-642-17254-0.

Lewis, Alan (Sept 2001). "A Simple Option Formula for General Jump-Diffusion and Other Exponential Levy Processes". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=282110

Liu, Shuaiqiang et al. (Apr 2019a). "A neural network-based framework for financial model calibration". URL: https://arxiv.org/abs/1904.10523

Liu, Shuaiqiang et al. (Apr 2019b). "Pricing options and computing implied volatilities using neural networks". URL: https://arxiv.org/abs/1901.08943

Lorig, Matthew et al. (Jan 2019). "Explicit implied vols for multifactor local-stochastic vol models". URL: https://arxiv.org/abs/1306.5447v3

Mandara, Dalvir (Sept 2019). "Artificial Neural Networks for Black-Scholes Option Pricing and Prediction of Implied Volatility for the SABR Stochastic Volatility Model". URL: https://www.datasim.nl/application/files/8115/7045/4929/1423101.pdf

McCulloch, Warren et al. (May 1943). "A Logical Calculus of The Ideas Immanent in Nervous Activity". URL: http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf

Mostafa, F. et al. (May 2008). "A neural network approach to option pricing". URL: https://www.witpress.com/elibrary/wit-transactions-on-information-and-communication-technologies/41/18904

Peters, Edgar E. (Jan 1991). Chaos and Order in the Capital Markets: A New View of Cycles, Prices and Market Volatility. John Wiley & Sons. ISBN: 978-0-471-13938-6.

Peters, Edgar E. (Jan 1994). Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons. ISBN: 978-0-471-58524-4. URL: https://www.amazon.co.uk/Fractal-Market-Analysis-Investment-Economics/dp/0471585246

pybind11 Documentation. https://pybind11.readthedocs.io/en/stable/advanced/cast/eigen.html. Accessed 2020-07-22.

Ruf, Johannes et al. (May 2020). "Neural networks for option pricing and hedging: a literature review". URL: https://arxiv.org/abs/1911.05620

Schmelzle, Martin (Apr 2010). "Option Pricing Formulae using Fourier Transform: Theory and Application". URL: https://pfadintegral.com/docs/Schmelzle2010%20Fourier%20Pricing.pdf

Shevchenko, Georgiy (Jan 2015). "Fractional Brownian motion in a nutshell". In: International Journal of Modern Physics: Conference Series 36. URL: https://www.worldscientific.com/doi/abs/10.1142/S2010194515600022

Stone, Henry (July 2019). "Calibrating Rough Volatility Models: A Convolutional Neural Network Approach". URL: https://arxiv.org/abs/1812.05315

Using the C++ eigen library to calculate matrix inverse and determinant. http://people.duke.edu/~ccc14/sta-663-2016/18G_C++_Python_pybind11.html#Using-the-C++-eigen-library-to-calculate-matrix-inverse-and-determinant. Accessed 2020-07-22.

Wilmott, Paul (2006). Paul Wilmott on Quantitative Finance, Vol. 3. John Wiley & Sons, Inc., 2nd Edition. ISBN: 978-0-470-01870-5.

  • Declaration of Authorship
  • Abstract
  • Acknowledgements
  • Projects Overview
    • Projects Overview
      • Introduction
        • Organisation of This Thesis
          • Literature Review
            • Application of ANN in Finance
            • Option Pricing Models
            • The Research Gap
              • Option Pricing Models
                • The Heston Model
                  • Option Pricing Under Heston Model
                  • Heston Model Implied Volatility Approximation
                    • Volatility is Rough
                    • The Rough Heston Model
                      • Rough Heston Pricing Equation
                        • Rational Approximation of Rough Heston Riccati Solution
                        • Option Pricing Using Fast Fourier Transform
                          • Rough Heston Model Implied Volatility
                              • Artificial Neural Networks
• Dissecting the Artificial Neural Networks
  • Neurons
  • Input, Output and Hidden Layers
  • Connections, Weights and Biases
  • Forward and Backward Propagation, Loss Functions
  • Activation Functions
  • Optimisation Algorithms and Learning Rate
  • Epochs, Early Stopping and Batch Size
• Feedforward Neural Networks
  • Deep Neural Networks
• Common Pitfalls and Remedies
  • Loss Function Stagnation
  • Loss Function Fluctuation
  • Over-fitting
  • Under-fitting
• Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation
• Data
  • Data Generation and Storage
    • Heston Call Option Data
    • Heston Implied Volatility Data
    • Rough Heston Call Option Data
    • Rough Heston Implied Volatility Data
  • Data Pre-processing
    • Data Splitting: Train - Validate - Test
    • Data Standardisation and Normalisation
    • Data Partitioning
• Neural Networks Architecture and Experimental Setup
  • Neural Networks Architecture
    • ANN Architecture: Option Pricing under Heston Model
    • ANN Architecture: Implied Volatility Surface of Heston Model
    • ANN Architecture: Option Pricing under Rough Heston Model
    • ANN Architecture: Implied Volatility Surface of Rough Heston Model
  • Performance Metrics and Validation Methods
  • Implementation of ANN in Keras
  • Summary of Neural Networks Architecture
• Results and Discussions
  • Heston Option Pricing ANN Results
  • Heston Implied Volatility ANN Results
  • Rough Heston Option Pricing ANN Results
  • Rough Heston Implied Volatility ANN Results
  • Run-time Performance
• Conclusions and Outlook
• pybind11: A step-by-step tutorial with examples
  • Software Prerequisites
  • Create a Python Project
  • Create a C++ Project
  • Convert the C++ project to extension for Python
  • Make the DLL available to Python
  • Call the DLL from Python
  • Example: Store C++ Eigen Matrix as NumPy Arrays
  • Example: Store Data Generated by Analytical Heston Model as NumPy Arrays
• Asymptotic Expansions for General Stochastic Volatility Models
  • Asymptotic Expansions
• Deep Neural Networks Library in Python: Keras
• Bibliography

List of Tables

5.1 Heston Model Call Option Data
5.2 Heston Model Implied Volatility Data
5.3 Rough Heston Model Call Option Data
5.4 Rough Heston Model Implied Volatility Data
6.1 Summary of Neural Networks Architecture for Each Application
7.1 Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data
7.2 5-fold Cross Validation Results of Heston Option Pricing ANN
7.3 Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data
7.4 5-fold Cross Validation Results of Heston Implied Volatility ANN
7.5 Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data
7.6 5-fold Cross Validation Results of Rough Heston Option Pricing ANN
7.7 Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data
7.8 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN
7.9 Training Time
7.10 Execution Time for Predictions using ANN and Traditional Methods


List of Abbreviations

NN    Neural Networks
ANN   Artificial Neural Networks
DNN   Deep Neural Networks
CNN   Convolutional Neural Networks
SGD   Stochastic Gradient Descent
SV    Stochastic Volatility
CIR   Cox, Ingersoll and Ross
fBm   fractional Brownian motion
FSV   Fractional Stochastic Volatility
RFSV  Rough Fractional Stochastic Volatility
FFT   Fast Fourier Transform
ELU   Exponential Linear Unit
ReLU  Rectified Linear Unit


Chapter 0

Project's Overview

This chapter gives a brief overview of the methodology of the project without delving too much into the details.

0.1 Project's Overview

The supervisor advised dividing the problem into smaller, achievable steps, as (Mandara, 2019) did. The four main steps are:

• Define the objective

• Formulate the procedures

• Implement the procedures

• Evaluate the results

The objective is to investigate the performance of artificial neural networks (ANN) on vanilla option pricing and implied volatility approximation under the rough Heston model. The performance metrics in this context are accuracy and run-time performance.

Figure 1 shows the flow chart of the implementation procedures. The procedure starts with synthetic data generation. We then split the synthetic data set into 3 portions: one for training (in-sample), one for validation, and the last one for testing (out-of-sample). Before the data is fed into the ANN for training, it must be scaled or normalised, as its features have different magnitudes and ranges. In this thesis we use the training method proposed by (Horvath et al., 2019), the image-based implicit training approach, for implied volatility surface approximation. We then test the trained ANN model on an unseen data set to evaluate its accuracy.

The run-time performance of the ANN will be compared with that of traditional approximation methods. This project requires the use of two programming languages: C++ and Python. The data generation process is done in C++. The wide range of machine learning libraries in Python (Keras, TensorFlow, PyTorch) makes it the logical choice for the neural network computations. The experiment starts with the classical Heston model as a concept validation, and then proceeds to the rough Heston model.


[Flow chart omitted: the option pricing models (Heston, rough Heston) feed the data generation step, which produces inputs such as strike price, maturity and volatility; these are pre-processed into a scaled/normalised training set; an ANN architecture is designed and trained; the trained ANN makes predictions whose accuracy (option price and implied volatility) and run-time performance are evaluated to reach conclusions.]

FIGURE 1: Implementation Flow Chart


[Data flow diagram omitted: model parameters are the input to the RoughHestonPricer, which outputs model parameters and option prices into the RoughHestonData store; the training portion of this data, together with the ANN model hyperparameters, is consumed by the KerasANNTrainer, which stores the trained model weights; the trained model produces ANN predictions of option prices, which are compared with the test data (option prices) in the accuracy evaluation step; the final output is the error metrics.]

FIGURE 2: Data flow diagram for ANN on Rough Heston Pricer


Chapter 1

Introduction

Option pricing has always been a major research area in financial mathematics. Various methods and techniques have been developed to price options more accurately and more quickly since the introduction of the well-known Black-Scholes model in 1973 (Black et al., 1973). The Black-Scholes model has an analytical solution and only one parameter, the volatility, that requires calibration. This makes it very easy to implement. Its simplicity, however, comes at the expense of many unrealistic assumptions. One of its biggest flaws is the assumption of a constant volatility term, which simply does not hold in the real market. With this assumption, the Black-Scholes model fails to capture the volatility dynamics exhibited by the market and so is unable to value options accurately.

In 1993, (Heston, 1993) developed a stochastic volatility model in an attempt to better capture the volatility dynamics. The Heston model lets the variance of the underlying asset follow a mean-reverting CIR process driven by a standard Brownian motion (Hurst parameter H = 1/2). Like the Black-Scholes model, the Heston model has a closed-form analytical solution, and this makes it a strong alternative.

Whilst the Heston model is more realistic than the Black-Scholes model, the generated option prices do not match the observed vanilla option prices consistently (Gatheral et al., 2014). Recent research shows that more accurate option prices can be estimated if the volatility process is modelled as a fractional Brownian motion with H < 1/2. When H < 1/2, the volatility process path appears rougher than when H ≥ 1/2, and hence the volatility is described as rough.

(Euch et al., 2019) introduce the rough volatility version of the Heston model, which combines the best of both worlds. However, the calibration of the rough Heston model relies on the notoriously slow Monte Carlo method due to its non-Markovian nature. This obstacle is the reason that rough Heston struggles to find its way into industrial applications. Although there exist many approximation methods for the rough Heston model that help to solve this problem, we are interested in the feasibility of machine learning algorithms.

Recent advancements in machine learning algorithms could potentially provide an alternative means for this problem. Specifically, artificial neural networks (ANN) offer the best hope, because they are capable of learning complex functional relationships between input and output data and then making predictions instantly.

In this project, our objective is to investigate the accuracy and run-time performance of ANN for European call option price predictions and implied volatility approximations under the rough Heston model. We start off by using the Heston model as a concept validation, then proceed to its rough volatility version. For regulatory purposes, we use data simulated by option pricing models for training.

1.1 Organisation of This Thesis

This thesis is organised as follows. In Chapter 2 we review some of the related work and provide justifications for our research. In Chapter 3 we discuss the theory of the option pricing models of our interest. We provide a brief introduction to ANN and some techniques to improve their performance in Chapter 4. We explain the tools we used for the data generation process and some useful techniques to speed up the process in Chapter 5; in the same chapter we also discuss the important data pre-processing procedures. Chapter 6 shows our network architecture and experimental setups. We present the results and discuss our findings in Chapter 7. Finally, we conclude our findings and discuss potential future work in Chapter 8.

[Flow chart omitted: (START) Problem — the rough Heston model sees little industrial application despite giving accurate option prices; Root cause — the slow calibration (pricing and implied volatility approximation) process; Potential solution — apply ANN to pricing and implied volatility approximation; Implementation — ANN computation using Keras in Python; Evaluation — accuracy and speed of ANN predictions; (END) Conclusions — whether ANN is feasible or not.]

FIGURE 1.1: Problem Solving Process


Chapter 2

Literature Review

In this chapter we provide a review of the current literature and highlight the research gap. The focus of our review is the application of ANN to option pricing models. We justify our methodology and our choice of option pricing models, and we also provide some critiques.

2.1 Application of ANN in Finance

Option pricing has always been a hot topic in academia and the financial industry, and researchers have been looking for various means to price options more accurately and quickly. As computational technology advances, the application of ANN has gained popularity for solving complex problems in many different industries. In quantitative finance, the idea of utilising ANN for pricing derivatives comes from its ability to make fast and accurate predictions once it is trained.

The application of ANN in option pricing began with (Hutchinson et al., 1994). The authors were the pioneers in applying ANN to pricing and hedging derivatives. Since then, more than 100 research papers have been published on this topic. It is also worth noting that the complexity of the ANN architectures used in these papers has increased over the years. Whilst there has been much research on the applications of ANN in option pricing, only about 10% of the papers focus on implied volatility (Ruf et al., 2020). The feasibility of using ANN for implied volatility approximation is equally as important as for option pricing, since both are part of the calibration process. Recently, more research papers have focused on implied volatility and calibration, such as (Mostafa et al., 2008), (Liu et al., 2019a), (Liu et al., 2019b) and (Horvath et al., 2019). A key contribution of (Horvath et al., 2019)'s work is the introduction of the image-based implicit training method for implied volatility surface approximation. This method is allegedly robust and fast for approximating the entire implied volatility surface.

A significant portion of the research papers use real market data. Currently there is no standardised framework for deciding the architecture and parameters of an ANN; hence, training the ANN directly with market data might pose some issues when it comes to regulatory compliance. (Horvath et al., 2019) is one of the few papers that use simulated data for ANN training; their simulated data is generated from the Heston and rough Bergomi models. Using simulated data removes the regulatory concerns, as the functional mapping from model parameters to option price is known and tractable.

The application of ANN requires splitting the entire data set into training, validation and testing sets. Most of the papers that use real market data ensure the data splitting process is chronological, to preserve the time series structure of the data. This is not a concern in our case, as we are using simulated data.

The results in (Horvath et al., 2019) show very accurate predictions. However, after examining the source code, we discovered that the ANN is validated and tested on the same data set. Thus, the high accuracy results do not give any information about how well the trained ANN generalises to unseen data. In our experiment we will use different data sets for validation and testing.

Common error metrics used include the mean absolute error (MAE), mean absolute percentage error (MAPE) and mean squared error (MSE). We suggest a new metric, the maximum absolute error, as it is useful from a risk management perspective for quantifying the largest prediction error in the worst case scenario.

2.2 Option Pricing Models

In terms of the choice of option pricing models, over 80% of the papers select the Black-Scholes model or its extensions in their research (Ruf et al., 2020). As mentioned before, there are other models that can give better option prices than Black-Scholes, such as the stochastic volatility models. (Liu et al., 2019b) investigated the application of ANN in the Heston and Bates models.

According to (Gatheral et al., 2014), modelling the volatility as rough volatility can better capture the volatility dynamics in the market. Rough volatility models struggle to find their way into the industry due to their slow calibration process. Hence, it is natural for the direction of ANN research to move towards rough volatility models.

Some of the recent examples include (Stone, 2019), (Bayer et al., 2018) and (Horvath et al., 2019). (Stone, 2019) investigates the calibration of the rough Bergomi model using Convolutional Neural Networks (CNN), whilst (Bayer et al., 2018) study the calibration of the Heston and rough Bergomi models using Deep Neural Networks (DNN). Contrary to these papers, where calibration is performed directly via ANN in one step, (Horvath et al., 2019) split the calibration of the rough Bergomi model into two steps. Firstly, the ANN is trained to learn the pricing equation during a so-called offline session. The trained ANN is then deterministic and stored for the online calibration session later. This approach can speed up the online calibration by a huge margin.

2.3 The Research Gap

In our thesis we will be using the approach in (Horvath et al., 2019); that is, to train the network to learn the pricing and implied volatility functional mappings separately. We leave the model calibration step as the future outlook of this project. We choose to work with the rough Heston model, another popular rough volatility model that has not been investigated yet. We adapt the image-based implicit training method in our research. To re-iterate, we validate and test our ANN with different data sets, to obtain a true measure of the ANN's performance on unseen data.


Chapter 3

Option Pricing Models

In this chapter we discuss the option pricing models of our interest: the Heston and rough Heston models. The idea of modelling volatility as a rough fractional Brownian motion will also be presented. We then derive the pricing and implied volatility equations of these models and explain their implementations for data generation.

The Black-Scholes model is simple but unrealistic for real world applications, as its assumptions simply do not hold. One of its greatest criticisms is the assumption of constant volatility of the underlying asset price. (Dupire, 1994) extended the Black-Scholes framework to give local volatility models: he modelled the volatility as a deterministic function of the underlying price and time, with the function selected such that its outputs match observed European option prices. This extension is time-inhomogeneous, and the dynamics it exhibits are still unrealistic. Thus, it is not able to produce future volatility surfaces that emulate our observations.

Stochastic Volatility (SV) models were then introduced, in which the volatility of the underlying is itself modelled as a stochastic process. Recent research shows that rough volatility can capture the true volatility dynamics better, and this leads to the rough volatility models.

3.1 The Heston Model

One of the most popular SV models is the one proposed by (Heston, 1993); its popularity is due to the fact that its pricing equation for European options is analytically tractable. This property allows calibration procedures to be done efficiently.

The Heston model for a one-dimensional spot price S follows the stochastic process

dS = µS dt + √ν S dW₁   (3.1)

and its instantaneous variance ν follows a Cox, Ingersoll and Ross (CIR) process (Cox et al., 1985):

dν = κ(θ − ν) dt + ξ√ν dW₂   (3.2)
⟨dW₁, dW₂⟩ = ρ dt   (3.3)

where

• µ is the (deterministic) rate of return of the spot

• θ is the long term mean volatility

• ξ is the volatility of volatility

• κ is the mean reversion rate of volatility

• ρ is the correlation between spot and volatility moves

• W₁ and W₂ are Wiener processes

3.1.1 Option Pricing Under Heston Model

In this section we follow (Gatheral, 2006) closely. We start by forming a portfolio Π containing the option being priced V, a quantity −Δ of stock S, and a quantity −Δ₁ of another asset whose value is V₁:

Π = V − ΔS − Δ₁V₁

The change in this portfolio during dt is

dΠ = [∂V/∂t + (1/2)νS²(∂²V/∂S²) + ρξνS(∂²V/∂ν∂S) + (1/2)ξ²ν(∂²V/∂ν²)] dt
     − Δ₁[∂V₁/∂t + (1/2)νS²(∂²V₁/∂S²) + ρξνS(∂²V₁/∂ν∂S) + (1/2)ξ²ν(∂²V₁/∂ν²)] dt
     + [∂V/∂S − Δ₁(∂V₁/∂S) − Δ] dS + [∂V/∂ν − Δ₁(∂V₁/∂ν)] dν

Note that the explicit dependence of S and ν on t has been removed for simplicity. We can make the portfolio instantaneously risk-free by eliminating the dS and dν terms:

∂V/∂S − Δ₁(∂V₁/∂S) − Δ = 0

and

∂V/∂ν − Δ₁(∂V₁/∂ν) = 0

So we are left with

dΠ = [∂V/∂t + (1/2)νS²(∂²V/∂S²) + ρξνS(∂²V/∂ν∂S) + (1/2)ξ²ν(∂²V/∂ν²)] dt
     − Δ₁[∂V₁/∂t + (1/2)νS²(∂²V₁/∂S²) + ρξνS(∂²V₁/∂ν∂S) + (1/2)ξ²ν(∂²V₁/∂ν²)] dt
   = rΠ dt
   = r(V − ΔS − Δ₁V₁) dt


Here we have applied the no-arbitrage principle: the return of a risk-free portfolio must equal the risk-free rate, which we assume to be deterministic for our case.

Rearranging the above equation by collecting V terms on the left-hand side and V₁ terms on the right-hand side gives

[∂V/∂t + (1/2)νS²(∂²V/∂S²) + ρξνS(∂²V/∂ν∂S) + (1/2)ξ²ν(∂²V/∂ν²) + rS(∂V/∂S) − rV] / (∂V/∂ν)
  = [∂V₁/∂t + (1/2)νS²(∂²V₁/∂S²) + ρξνS(∂²V₁/∂ν∂S) + (1/2)ξ²ν(∂²V₁/∂ν²) + rS(∂V₁/∂S) − rV₁] / (∂V₁/∂ν)

Since the two sides depend on different assets, the common ratio must equal some function f(S, ν, t) of S, ν and t. Thus we have

∂V/∂t + (1/2)νS²(∂²V/∂S²) + ρξνS(∂²V/∂ν∂S) + (1/2)ξ²ν(∂²V/∂ν²) + rS(∂V/∂S) − rV = f (∂V/∂ν)

In the case of the Heston model, f is chosen as

f = −(κ(θ − ν) − λ√ν)

where λ(S, ν, t) is called the market price of volatility risk. We do not delve into the choice of f and the concept of λ(S, ν, t) any further here; see (Wilmott, 2006) for his arguments.

(Heston, 1993) assumed in his paper that λ(S, ν, t) is linear in the instantaneous variance ν, in order to retain the form of the equation under the transformation from the statistical measure to the risk-neutral measure. Here we are only interested in pricing the options, so we can set λ(S, ν, t) to zero (Gatheral, 2006).

Hence we arrive at

∂V/∂t + (1/2)νS²(∂²V/∂S²) + ρξνS(∂²V/∂ν∂S) + (1/2)ξ²ν(∂²V/∂ν²) + rS(∂V/∂S) − rV = κ(ν − θ)(∂V/∂ν)   (3.4)

According to (Heston, 1993), the price of an undiscounted European call option C with strike price K and time to maturity T has a solution of the form

C(S, K, ν, T) = (1/2)(F − K) + (1/π) ∫₀^∞ (F·f₁ − K·f₂) du   (3.5)


where

f₁ = Re[e^(−iu ln K) φ(u − i) / (iuF)]
f₂ = Re[e^(−iu ln K) φ(u) / (iu)]
φ(u) = E(e^(iu ln S_T))
F = S e^(µT)

The evaluation of (3.5) requires the computation of Fourier inversion integrals, and it is susceptible to numerical instabilities due to the complex logarithms involved. (Kahl et al., 2006) proposed an integration scheme that computes the Fourier inversion integral by applying transformations based on the asymptotic structure of the integrand. Thus the solution becomes

C(S, K, ν, T) = ∫₀¹ y(x) dx,  x ∈ ℝ   (3.6)

where

y(x) = (1/2)(F − K) + [F·f₁(−ln x / C∞) − K·f₂(−ln x / C∞)] / (x·π·C∞)

The limits at the boundaries of the integral are

lim_{x→0} y(x) = (1/2)(F − K)

and

lim_{x→1} y(x) = (1/2)(F − K) + [F·lim_{u→0} f₁(u) − K·lim_{u→0} f₂(u)] / (π·C∞)

Equation (3.6) provides a more robust pricing method for options with moderate to long maturities or strong mean-reversion. For the full derivation of equation (3.6), see (Kahl et al., 2006). The source code implementing equation (3.6) was provided by the supervisor and is the source code of (Duffy et al., 2012).
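To make the recipe concrete, the following minimal Python sketch prices a Heston call via equation (3.5) with direct numerical quadrature, rather than the Kahl transformation; it is not the C++ implementation used in this project. The characteristic function is the standard "little Heston trap" form, and the integration truncation bound is an assumption chosen for illustration.

    import numpy as np
    from scipy.integrate import quad

    def heston_cf(u, S, r, D, T, v0, kappa, theta, xi, rho):
        # phi(u) = E[exp(iu ln S_T)] under Heston, "little trap" formulation
        d = np.sqrt((kappa - 1j*rho*xi*u)**2 + xi**2*(1j*u + u**2))
        g = (kappa - 1j*rho*xi*u - d) / (kappa - 1j*rho*xi*u + d)
        e = np.exp(-d*T)
        C = (kappa*theta/xi**2)*((kappa - 1j*rho*xi*u - d)*T
                                 - 2*np.log((1 - g*e)/(1 - g)))
        Dv = ((kappa - 1j*rho*xi*u - d)/xi**2)*(1 - e)/(1 - g*e)
        return np.exp(1j*u*(np.log(S) + (r - D)*T) + C + Dv*v0)

    def heston_call(S, K, r, D, T, v0, kappa, theta, xi, rho):
        # Equation (3.5), undiscounted, then discounted at rate r
        F = S*np.exp((r - D)*T)
        lnK = np.log(K)
        phi = lambda u: heston_cf(u, S, r, D, T, v0, kappa, theta, xi, rho)
        f1 = lambda u: (np.exp(-1j*u*lnK)*phi(u - 1j)/(1j*u*F)).real
        f2 = lambda u: (np.exp(-1j*u*lnK)*phi(u)/(1j*u)).real
        integral, _ = quad(lambda u: F*f1(u) - K*f2(u), 0, 200, limit=200)
        return np.exp(-r*T)*(0.5*(F - K) + integral/np.pi)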

3.1.2 Heston Model Implied Volatility Approximation

Before the model is used for pricing options, its parameters need to be calibrated so that it returns current market prices (at least to some degree of accuracy). That is, the unmeasurable model parameters are obtained by calibrating to the implied volatilities observed on the market. This motivates the need for fast implied volatility computation.

Unfortunately, there is no closed-form analytical formula for the implied volatility of the Heston model. There are, however, many closed-form approximations for the Heston implied volatility.

In this project we select the implied volatility approximation method for Heston proposed by (Lorig et al., 2019), which allegedly outperforms other methods such as the one by (Fouque et al., 2012). (Lorig et al., 2019) derive a family of asymptotic expansions for European-style option implied volatility. Their method provides an explicit approximation and requires no special functions, not even numerical integration, and so it is faster to compute than its counterparts.

We follow closely the derivation steps outlined in (Lorig et al., 2019). Consider the Heston model in Section 3.1. According to (Andersen et al., 2007), ρ must be negative to avoid moment explosion. To circumvent this, we apply the change of variables (X(t), V(t)) = (ln S(t), e^(κt)ν). Using the asymptotic expansions for general stochastic volatility models (see Appendix B) and Itô's formula, we have

dX = −(1/2)e^(−κt)V dt + √(e^(−κt)V) dW₁,  X(0) = x = ln s   (3.7)
dV = θκe^(κt) dt + ξ√(e^(κt)V) dW₂,  V(0) = ν = z > 0   (3.8)
⟨dW₁, dW₂⟩ = ρ dt   (3.9)

The parameters ν, κ, θ, ξ, W₁, W₂ play the same roles as in Section 3.1. We then apply the second order differential operator A(t) associated with (X, V):

A(t) = (1/2)e^(−κt)v(∂²_x − ∂_x) + θκe^(κt)∂_v + (1/2)ξ²e^(κt)v ∂²_v + ξρv ∂_x∂_v

Define T as time to maturity and k as log-strike. Using the time-dependent Taylor series expansion of A(T) around (x̄(T), v̄(T)) = (X₀, E[V(T)]) = (x, θ(e^(κT) − 1)) gives (see Appendix B)

σ₀ = √[(θκT − θ + e^(−κT)(θ − v) + v) / (κT)]

σ₁ = ξρ z e^(−κT)(−2θ − vκT − e^(κT)(θ(κT − 2) + v) + κTθ + v) / (√2 κ²σ₀²T^(3/2))

σ₂ = [ξ²e^(−2κT) / (32κ⁴σ₀⁵T³)] [
       −2√2 κσ₀³T^(3/2) z(−θ − 4e^(κT)(θ + κT(v − θ)) + e^(2κT)(θ(5 − 2κT) − 2v) + 2v)
       + κσ₀²T(4z² − 2)(θ + e^(2κT)(−5θ + 2θκT + 8ρ²(θ(κT − 3) + v) + 2v))
       + κσ₀²T(4z² − 2)(4e^(κT)(θ + θκT + ρ²(θ(κT(κT + 4) + 6) − v(κT(κT + 2) + 2)) − κTv) − 2v)
       + 4√2 ρ²σ₀√T z(2z² − 3)(−2θ − vκT − e^(κT)(θ(κT − 2) + v) + κTθ + v)²
       + 4ρ²(4(z² − 3)z² + 3)(−2θ − vκT − e^(κT)(θ(κT − 2) + v) + κTθ + v)² ]
     − σ₁²(4(x − k)² − σ₀⁴T²) / (8σ₀³T)

where

z = (x − k − σ₀²T/2) / (σ₀√(2T))


The explicit expression for σ₃ is too long to be included here; see (Lorig et al., 2019) for the full derivation. The explicit expression for the implied volatility approximation is

σ_implied = σ₀ + σ₁z + σ₂z² + σ₃z³   (3.10)

The C++ source code to implement this is available here: Heston Implied Volatility Approximation.
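For comparison, when no closed-form approximation is available, an implied volatility can be backed out from a price by root-finding on the Black-Scholes formula. A minimal Python sketch (the function names are ours, and the bracket is assumed wide enough):

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import brentq

    def bs_call(S, K, r, T, sigma):
        d1 = (np.log(S/K) + (r + 0.5*sigma**2)*T)/(sigma*np.sqrt(T))
        return S*norm.cdf(d1) - K*np.exp(-r*T)*norm.cdf(d1 - sigma*np.sqrt(T))

    def implied_vol(price, S, K, r, T):
        # Brent's method on sigma; bracket [1e-6, 5] assumed to contain the root
        return brentq(lambda s: bs_call(S, K, r, T, s) - price, 1e-6, 5.0)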


3.2 Volatility is Rough

We start this section by introducing the fractional Brownian motion (fBm), which is a generalisation of Brownian motion.

Definition 3.2.1. Fractional Brownian Motion (fBm) is a centered Gaussian process B_t^H, t ≥ 0, with the autocovariance function

E[B_t^H B_s^H] = (1/2)(|t|^(2H) + |s|^(2H) − |t − s|^(2H))   (3.11)

where H ∈ (0, 1) is the Hurst index or Hurst parameter associated with the fractional Brownian motion.

The Hurst parameter is a measure of the long-term memory of a time series. It was first introduced by (Hurst, 1951) for modelling water levels of the Nile river in order to determine the optimum dam size. As it turns out, it is also suitable for modelling time series data from other natural systems. (Peters, 1991) and (Peters, 1994) are some of the earliest examples of applying this concept to financial time series modelling. A time series can be classified into 3 categories based on the Hurst parameter:

1. 0 < H < 1/2: negative correlation in the increments of the process. This is a mean-reverting process, which implies that an increase in value is more likely to be followed by a decrease, and vice versa. The smaller the value, the greater the mean-reverting effect.

2. H = 1/2: no correlation in the increments of the process (geometric Brownian motion or Wiener process).

3. 1/2 < H < 1: positive correlation in the increments of the process. This is a trend-reinforcing process, which means the direction of the next value is more likely to be the same as the current value. The strength of the trend is greater when H is closer to 1.

Empirical evidence shows that log-volatility time series behave like a fBm with Hurst parameter of order 0.1 (Gatheral et al., 2014). Essentially all the statistical stylised facts of volatility can be captured when modelling it as a rough fBm.

This motivates the use of the fractional stochastic volatility (FSV) model of (Comte et al., 1998). Our model is called the rough fractional stochastic volatility (RFSV) model, to emphasise that H < 1/2. The term 'rough' refers to the roughness of the path of the fBm: as Figure 3.1 shows, the fBm path gets 'rougher' as H decreases.
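This roughness is easy to see numerically: fBm can be simulated exactly on a grid by a Cholesky factorisation of the covariance (3.11). A minimal sketch, O(n³) and intended for illustration only:

    import numpy as np

    def fbm_cholesky(n, H, T=1.0, seed=0):
        # Exact fBm sample on n grid points via Cholesky of covariance (3.11)
        t = np.linspace(T/n, T, n)
        cov = 0.5*(t[:, None]**(2*H) + t[None, :]**(2*H)
                   - np.abs(t[:, None] - t[None, :])**(2*H))
        L = np.linalg.cholesky(cov + 1e-12*np.eye(n))  # jitter for stability
        rng = np.random.default_rng(seed)
        return t, L @ rng.standard_normal(n)

    # Paths with H = 0.1 look visibly rougher than those with H = 0.5 or 0.8.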


FIGURE 3.1: Paths of fBm for different values of H. Adapted from (Shevchenko, 2015).

3.3 The Rough Heston Model

In this section we introduce the rough volatility version of the Heston model and derive its pricing and implied volatility equations.

As stated before, rough volatility models can fit the volatility surface notably well with very few parameters. Hence it is natural to modify the Heston model and consider its rough version.

The rough Heston model for a one-dimensional asset price S with instantaneous variance ν(t) takes the form in (Euch et al., 2016):

dS = S√ν dW₁
ν(t) = ν(0) + (1/Γ(α)) ∫₀ᵗ (t − s)^(α−1) κ(θ − ν(s)) ds + (ξ/Γ(α)) ∫₀ᵗ (t − s)^(α−1) √ν(s) dW₂   (3.12)
⟨dW₁, dW₂⟩ = ρ dt

where

• α = H + 1/2 ∈ (1/2, 1) governs the roughness of the volatility sample paths

• κ is the mean reversion rate

• θ is the long term mean volatility

• ν(0) is the initial volatility

• ξ is the volatility of volatility

• Γ is the Gamma function

• W₁ and W₂ are two Brownian motions with correlation ρ

When α = 1 (H = 1/2), we recover the classical Heston model of Section 3.1.

3.3.1 Rough Heston Pricing Equation

The pricing equation of the rough Heston model is inspired by the classical Heston one. In the classical Heston model, the option price can be obtained by applying Fourier inversion integrals to the characteristic function, which is expressed in terms of the solution of a Riccati equation. The rough Heston model displays a similar structure, but with the Riccati equation replaced by a fractional Riccati equation.

(Euch et al., 2016) derived a characteristic function for the rough Heston model. This result is particularly important, as the non-Markovian nature of the fractional Brownian motion makes other numerical pricing methods (e.g. Monte Carlo) difficult to implement.

Definition 3.3.1. The fractional integral of order r ∈ (0, 1] of a function f is

I^r f(t) = (1/Γ(r)) ∫₀ᵗ (t − s)^(r−1) f(s) ds   (3.13)

whenever the integral exists, and the fractional derivative of order r ∈ (0, 1] is

D^r f(t) = (1/Γ(1 − r)) (d/dt) ∫₀ᵗ (t − s)^(−r) f(s) ds   (3.14)

whenever it exists.

We quote the results from (Euch et al., 2016) directly; for the full proof, see Section 6 of (Euch et al., 2016).

Theorem 3.3.1. For all t ≥ 0, the characteristic function of the log-price X_t = ln(S(t)/S(0)) is

φ_X(a, t) = E[e^(iaX_t)] = exp[κθ I¹h(a, t) + ν(0) I^(1−α)h(a, t)]   (3.15)

where h is the solution of the fractional Riccati equation

D^α h(a, t) = −(1/2)a(a + i) + (iaρξ − κ)h(a, t) + (ξ²/2)h²(a, t),  I^(1−α)h(a, 0) = 0   (3.16)

which admits a unique continuous solution for a in C, a suitable region of the complex plane (see Definition 3.3.2).


Definition 3.3.2. We define the strip in the complex plane relevant to the computation of the characteristic function in Theorem 3.3.1:

C = { z ∈ ℂ : ℜ(z) ≥ 0,  −1/(1 − ρ²) ≤ ℑ(z) ≤ 0 }   (3.17)

where ℜ and ℑ denote the real and imaginary parts respectively.

There is no explicit solution to equation (3.16), so it has to be solved numerically. (Gatheral et al., 2019) present a simple rational approximation to the solution of the rough Heston Riccati equation, which is much faster to evaluate than the numerical schemes.

3.3.1.1 Rational Approximation of the Rough Heston Riccati Solution

Firstly, the short-time series expansion of the solution by (Alos et al., 2017) can be written as

h_s(a, t) = Σ_{j=0}^∞ [Γ(1 + jα)/Γ(1 + (j + 1)α)] β_j(a) ξ^j t^((j+1)α)   (3.18)

with

β₀(a) = −(1/2)a(a + i)

β_k(a) = (1/2) Σ_{i,j=0}^{k−2} 1_{i+j=k−2} β_i(a)β_j(a) [Γ(1 + iα)/Γ(1 + (i + 1)α)] [Γ(1 + jα)/Γ(1 + (j + 1)α)]
         + iρa [Γ(1 + (k − 1)α)/Γ(1 + kα)] β_{k−1}(a)

Again from (Alos et al., 2017), the long-time expansion is

h_l(a, t) = r₋ Σ_{k=0}^∞ [γ_k / (A^k Γ(1 − kα))] (ξt^α)^(−k)   (3.19)

where

γ₁ = −γ₀ = −1

γ_k = −γ_{k−1} + (r₋/(2A)) Σ_{i,j=1}^∞ 1_{i+j=k} γ_iγ_j Γ(1 − kα) / (Γ(1 − iα)Γ(1 − jα))

A = √(a(a + i) − ρ²a²)

r± = −iρa ± A

Now we can construct the global approximation. Using the same techniques as in (Atkinson et al., 2011), the rational approximation of h(a, t) is given by

h^(m,n)(a, t) = [Σ_{i=1}^m p_i(ξt^α)^i] / [Σ_{j=0}^n q_j(ξt^α)^j],  q₀ = 1   (3.20)

Numerical experiments in (Gatheral et al., 2019) showed that h^(3,3) is a good choice for high accuracy and fast computation. Thus we can write explicitly

h^(3,3)(a, t) = [p₁ξt^α + p₂ξ²(t^α)² + p₃ξ³(t^α)³] / [1 + q₁ξt^α + q₂ξ²(t^α)² + q₃ξ³(t^α)³]   (3.21)

Thus from equations (3.18) and (3.19) we have

h_s(a, t) = [Γ(1)/Γ(1 + α)]β₀(a)t^α + [Γ(1 + α)/Γ(1 + 2α)]β₁(a)ξ(t^α)² + [Γ(1 + 2α)/Γ(1 + 3α)]β₂(a)ξ²(t^α)³ + O(t^(4α))
          = b₁t^α + b₂(t^α)² + b₃(t^α)³ + O((t^α)⁴)

and

h_l(a, t) = r₋γ₀/Γ(1) + [r₋γ₁/(AΓ(1 − α)ξ)](1/t^α) + [r₋γ₂/(A²Γ(1 − 2α)ξ²)](1/(t^α)²) + O(1/(t^α)³)
          = g₀ + g₁(1/t^α) + g₂(1/(t^α)²) + O(1/(t^α)³)

Matching the coefficients of equation (3.21) to h_s and h_l forces the coefficients b_i and g_j to satisfy

p₁ξ = b₁   (3.22)
(p₂ − p₁q₁)ξ² = b₂   (3.23)
(p₁q₁² − p₁q₂ − p₂q₁ + p₃)ξ³ = b₃   (3.24)
p₃ξ³ / (q₃ξ³) = g₀   (3.25)
(p₂q₃ − p₃q₂)ξ⁵ / (q₃²ξ⁶) = g₁   (3.26)
(p₁q₃² − p₂q₂q₃ − p₃q₁q₃ + p₃q₂²)ξ⁷ / (q₃³ξ⁹) = g₂   (3.27)


The solution of this system is

p₁ = b₁/ξ   (3.28)

p₂ = [b₁³g₁ + b₁²g₀g₂ + b₁b₂g₀g₁ − b₁b₃g₀g₂ + b₁b₃g₁² + b₂²g₀g₂ − b₂²g₁² + b₂g₀³]
     / [(b₁²g₂ + 2b₁g₀g₁ + b₂g₀g₂ − b₂g₁² + g₀³)ξ²]   (3.29)

p₃ = g₀q₃   (3.30)

q₁ = [b₁²g₁ − b₁b₂g₂ + b₁g₀² − b₂g₀g₁ − b₃g₀g₂ + b₃g₁²]
     / [(b₁²g₂ + 2b₁g₀g₁ + b₂g₀g₂ − b₂g₁² + g₀³)ξ]   (3.31)

q₂ = [b₁²g₀ − b₁b₂g₁ − b₁b₃g₂ + b₂²g₂ + b₂g₀² − b₃g₀g₁]
     / [(b₁²g₂ + 2b₁g₀g₁ + b₂g₀g₂ − b₂g₁² + g₀³)ξ²]   (3.32)

q₃ = [b₁³ + 2b₁b₂g₀ + b₁b₃g₁ − b₂²g₁ + b₃g₀²]
     / [(b₁²g₂ + 2b₁g₀g₁ + b₂g₀g₂ − b₂g₁² + g₀³)ξ³]   (3.33)
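For h^(3,3), only the first three coefficients of each expansion are needed. The following minimal Python sketch assembles equations (3.18)-(3.33) under our reading of the reconstructed formulas above; the variable names are ours:

    import numpy as np
    from scipy.special import gamma

    def h33(a, t, alpha, rho, xi):
        # Rational approximation h^(3,3)(a, t), following (Gatheral et al., 2019)
        A  = np.sqrt(a*(a + 1j) - (rho*a)**2)
        rm = -1j*rho*a - A                                  # r_-
        # short-time: h_s = b1 t^alpha + b2 t^{2 alpha} + b3 t^{3 alpha}
        beta0 = -0.5*a*(a + 1j)
        beta1 = 1j*rho*a*beta0/gamma(1 + alpha)
        beta2 = (0.5*beta0**2/gamma(1 + alpha)**2
                 + 1j*rho*a*beta1*gamma(1 + alpha)/gamma(1 + 2*alpha))
        b1 = beta0/gamma(1 + alpha)
        b2 = beta1*xi*gamma(1 + alpha)/gamma(1 + 2*alpha)
        b3 = beta2*xi**2*gamma(1 + 2*alpha)/gamma(1 + 3*alpha)
        # long-time: h_l = g0 + g1 t^{-alpha} + g2 t^{-2 alpha}
        g0 = rm
        g1 = -rm/(A*gamma(1 - alpha)*xi)
        gamma2 = 1 + rm*gamma(1 - 2*alpha)/(2*A*gamma(1 - alpha)**2)
        g2 = rm*gamma2/(A**2*gamma(1 - 2*alpha)*xi**2)
        # coefficients (3.28)-(3.33)
        den = b1**2*g2 + 2*b1*g0*g1 + b2*g0*g2 - b2*g1**2 + g0**3
        p1 = b1/xi
        p2 = (b1**3*g1 + b1**2*g0*g2 + b1*b2*g0*g1 - b1*b3*g0*g2
              + b1*b3*g1**2 + b2**2*g0*g2 - b2**2*g1**2 + b2*g0**3)/(den*xi**2)
        q1 = (b1**2*g1 - b1*b2*g2 + b1*g0**2 - b2*g0*g1
              - b3*g0*g2 + b3*g1**2)/(den*xi)
        q2 = (b1**2*g0 - b1*b2*g1 - b1*b3*g2 + b2**2*g2
              + b2*g0**2 - b3*g0*g1)/(den*xi**2)
        q3 = (b1**3 + 2*b1*b2*g0 + b1*b3*g1 - b2**2*g1 + b3*g0**2)/(den*xi**3)
        p3 = g0*q3
        y = xi*t**alpha
        return (p1*y + p2*y**2 + p3*y**3)/(1 + q1*y + q2*y**2 + q3*y**3)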

3.3.1.2 Option Pricing Using Fast Fourier Transform

After solving the Riccati equation using the rational approximation (3.21), we can compute the characteristic function φ_X(a, t) in equation (3.15). Once the characteristic function is obtained, many methods are available to compute call option prices; see (Carr and Madan, 1999), (Itkin, 2010), (Lewis, 2001) and (Schmelzle, 2010). In this project we select the Carr and Madan approach, a fast option pricing method that uses the fast Fourier transform (FFT). Suppose k = ln K and C(k, T) is the value of a European call option with strike K and maturity T. Unfortunately, the FFT cannot be applied directly to evaluate the call option price, because the call pricing function is not square-integrable. A key contribution of (Carr and Madan, 1999) is the modification of the call pricing function with respect to k. With this modification, a whole range of option prices can be obtained with one inverse Fourier transform.

Consider the modified call price c(k, T):

c(k, T) = e^(βk)C(k, T),  β > 0   (3.34)

Its Fourier transform is

ψ_X(u, T) = ∫_{−∞}^∞ e^(iuk)c(k, T) dk,  u ∈ ℝ_{>0}   (3.35)

which can be expressed as

ψ_X(u, T) = e^(−rT)φ_X[u − (β + 1)i, T] / (β² + β − u² + i(2β + 1)u)   (3.36)

where r is the risk-free rate and φ_X(·) is the characteristic function defined in equation (3.15).


The purpose of β is to assist the integrability of the modified call pricing function over the negative log-strike axis. Hence a sufficient condition is that ψ_X(0) is finite:

ψ_X(0, T) < ∞   (3.37)
⟹ φ_X[−(β + 1)i, T] < ∞   (3.38)
⟹ exp[θκ I¹h(−(β + 1)i, T) + ν(0)I^(1−α)h(−(β + 1)i, T)] < ∞   (3.39)

Using equation (3.39) we can determine the upper bound on β such that condition (3.37) is satisfied. (Carr and Madan, 1999) suggest that one quarter of the upper bound serves as a good choice for β.

The call price C(k, T) can be recovered by applying the inverse Fourier transform:

C(k, T) = (e^(−βk)/2π) ∫_{−∞}^∞ ℜ[e^(−iuk)ψ_X(u, T)] du   (3.40)
        = (e^(−βk)/π) ∫₀^∞ ℜ[e^(−iuk)ψ_X(u, T)] du   (3.41)

The semi-infinite integral in the call price equation can be evaluated by numerical methods such as the trapezoidal rule and the FFT, as shown in (Kwok et al., 2012):

C(k, T) ≈ (e^(−βk)/π) Σ_{j=1}^N ℜ[e^(−iu_j k)ψ_X(u_j, T)] Δu   (3.42)

where u_j = (j − 1)Δu, j = 1, 2, ..., N.

We end this section with a summary of the steps required to price a call option under the rough Heston model:

1. Compute the solution h of the fractional Riccati equation (3.16) using equation (3.21).

2. Compute the characteristic function in equation (3.15).

3. Determine a suitable value for β using equation (3.39).

4. Apply the Fourier transform in equation (3.36).

5. Evaluate the call option price by implementing the FFT in equation (3.42).
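Steps 4 and 5 condense into a short routine. A minimal Python sketch, assuming phi is a callable returning the log-price characteristic function of equation (3.15) (for rough Heston, built from h33 above); β and the grid parameters are illustrative:

    import numpy as np

    def carr_madan_call(phi, K, T, r, beta=0.75, N=4096, du=0.05):
        u = np.arange(N) * du          # u_j = (j-1) * du
        u[0] = 1e-8                    # avoid the singularity at u = 0
        # equation (3.36): Fourier transform of the damped call price
        psi = (np.exp(-r*T) * phi(u - (beta + 1)*1j)
               / (beta**2 + beta - u**2 + 1j*(2*beta + 1)*u))
        k = np.log(K)
        # equation (3.42): discretised inverse transform
        return np.exp(-beta*k)/np.pi * np.sum(np.real(np.exp(-1j*u*k)*psi)) * du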

3.3.2 Rough Heston Model Implied Volatility

(Euch et al., 2019) present an almost instantaneous and accurate approximation method to compute rough Heston implied volatilities. This method allows us to approximate a rough Heston implied volatility by simply scaling the volatility of volatility parameter ξ and feeding the scaled parameter into the classical Heston implied volatility approximation of Section 3.1.2. For a given maturity T, the scaled volatility of volatility parameter ξ̃ is

ξ̃ = √(3/(2H + 2)) · ξ/Γ(H + 3/2) · 1/T^(1/2−H)   (3.43)
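The scaling itself is a one-liner; a minimal Python sketch of equation (3.43):

    from math import gamma, sqrt

    def scaled_vol_of_vol(xi, H, T):
        # Equation (3.43): effective Heston vol-of-vol for maturity T
        return sqrt(3.0/(2*H + 2)) * xi / (gamma(H + 1.5) * T**(0.5 - H))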


FIGURE 3.2: Implied volatility smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019).


Chapter 4

Artificial Neural Networks

This chapter starts with an introduction to Artificial Neural Networks (ANN). We introduce the terminologies and basic elements of ANN and explain the methods that we used to increase training speed and predictive accuracy. We also discuss the training process for ANN, common pitfalls and their remedies. Finally, we discuss the ANN type of interest and extend the discussion to Deep Neural Networks (DNN).

The concept of ANN has come a long way. It started off with the modelling of a simple neural network after electrical circuits, proposed by Warren McCulloch, a neurologist, and a young mathematician, Walter Pitts, in 1943 (McCulloch et al., 1943). This idea was further strengthened by Donald Hebb (Hebb, 1949).

ANN's ability to learn complex non-linear functional mappings, by deciphering the pattern between independent and dependent variables, has found applications in many fields. With computational resources becoming more accessible and big data sets more available, ANN's popularity has grown exponentially in recent years.

4.1 Dissecting the Artificial Neural Networks

FIGURE 4.1: A Simple Neural Network

Before we delve deeper into the world of ANN, let us explain what parameters and hyperparameters mean.


Parameters refer to the variables that are updated throughout the training process; they include weights and biases. Hyperparameters are the configuration variables that define the architecture of the ANN and stay constant throughout the training process. Hyperparameters include the number of neurons, the number of hidden layers, the activation functions and the number of epochs, etc.

4.1.1 Neurons

Like a biological neuron, an artificial neuron is the fundamental working unit in an ANN. It is essentially a mathematical function that receives multiple inputs and generates an output. More precisely, it takes the weighted sum of the inputs and applies an activation function to the sum to generate an output. In the case of a simple ANN (as shown in Figure 4.1), this will be the network output. In more sophisticated ANNs, the output of a neuron can be the input of other neurons.

4.1.2 Input, Output and Hidden Layers

An input layer is the first layer of an ANN. It receives and sends the training data into the network for further processing. The number of input neurons is equal to the number of input features of the training data.

The output layer is the final layer of the network. It outputs the predictions for a given set of input features, and it can have more than one output neuron.

A hidden layer is a layer between the input and output layers. It contains neuron(s) and receives weighted inputs from the input layer. Whilst an ANN contains only one input layer and one output layer, the number of hidden layers can be more than one. One hidden layer can only be used for solving linearly separable problems; for more complex problems, multiple hidden layers are needed.

Systematic experimentation is required to identify the appropriate number of hidden layers to prevent under-fitting and over-fitting, and thus increase predictive accuracy.

4.1.3 Connections, Weights and Biases

From the input layer to the output layer, each connection is assigned a weight that denotes its relative importance. Random weights are initialised when training begins; as training continues, these weights are updated.

Sometimes a bias (different from the bias term in statistics) is added to a neuron. It is merely a constant input with weight 1. Intuitively speaking, the purpose of a bias is to shift the activation function away from the origin, thereby allowing the neuron to give a non-zero output when the inputs are 0.


4.1.4 Forward and Backward Propagation, Loss Functions

Forward propagation is the passing of input data in the forward direction through the network to generate output. The difference between the output and the target output is then estimated by a loss function, which is a measure of how well an ANN performs with the current set of weights. Typical loss functions for regression problems include the mean squared error (MSE) and the mean absolute error (MAE). They are defined as

Definition 4.1.1.

MSE = (1/n) Σ_{i=1}^n (y_i,predict − y_i,actual)²   (4.1)

Definition 4.1.2.

MAE = (1/n) Σ_{i=1}^n |y_i,predict − y_i,actual|   (4.2)

where y_i,predict is the i-th predicted output and y_i,actual is the i-th actual output.

Subsequently, this information is propagated backward from the output layer to the input layer, and the gradients of the loss function with respect to the weights are computed. The algorithm then adjusts the weights accordingly to minimise the loss function. An iteration is one complete cycle of forward and backward propagation.

ANN training is an unconstrained optimisation problem, with the loss function as the objective function. The algorithm navigates the search space to find optimal weights that minimise the loss function (e.g. the MSE). It can be expressed as

min_w (1/n) Σ_{i=1}^n (y_i,predict − y_i,actual)²   (4.3)

where w is the vector that contains the weights of all the connections within the ANN.

4.1.5 Activation Functions

An activation function is a mathematical function applied to the output of a neuron to determine whether it should be activated (fired) or not, based on the relevance of the neuron's inputs to the prediction. This is analogous to neuronal firing activity in the human brain.

Activation functions can be linear or non-linear. A linear activation function is only suitable when the functional relationship is linear. Non-linear activation functions are used for most applications; in our case they are more suitable, as the pricing and implied volatility functional mappings are complex.

Common activation functions include:


Definition 4.1.3. Rectified Linear Unit (ReLU)

f_ReLU(x) = { 0 if x ≤ 0;  x if x > 0 } ∈ [0, ∞)   (4.4)

Definition 4.1.4. Exponential Linear Unit (ELU)

f_ELU(x) = { α(e^x − 1) if x ≤ 0;  x if x > 0 } ∈ (−α, ∞)   (4.5)

where α > 0. The value of α is usually chosen to be 1 for most applications.

Definition 4.1.5. Linear or identity

f_linear(x) = x ∈ (−∞, ∞)   (4.6)

4.1.6 Optimisation Algorithms and Learning Rate

The learning rate refers to the amount of change applied to the weights in each iteration of the training process. A high learning rate gives a fast training speed but risks divergence, whereas a low learning rate may slow down training.

Common optimisation algorithms for training ANN are Stochastic Gradient Descent (SGD) and its extensions, such as RMSProp and Adam. The SGD pseudocode is the following:

Algorithm 1 Stochastic Gradient Descent
  Initialise a vector of random weights w and learning rate lr
  while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
      w_new = w_current − lr · ∂Loss/∂w_current
    end for
  end while

SGD is popular because it is simple to implement and can provide good results for most applications. In SGD, the learning rate lr is fixed throughout the process. This is an issue, because the appropriate learning rate is difficult to determine: a high learning rate is prone to divergence, whereas a low learning rate can result in a slow training process. Hence, improvements have been made to SGD to make the learning rate adaptive. Root Mean Square Propagation (RMSProp) is one of the extensions that allow the learning rate to vary throughout the training process (Hinton et al., 2015). It is a method that divides the learning rate by an exponential moving average of past squared gradients. The author suggests an exponential decay rate of b = 0.9. Its pseudocode is presented below.

Algorithm 2 Root Mean Square Propagation (RMSProp)
  Initialise a vector of random weights w, learning rate lr, v = 0 and b = 0.9
  while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
      v_current = b·v_previous + (1 − b)·[∂Loss/∂w_current]²
      w_new = w_current − (lr/√(v_current + ε)) · ∂Loss/∂w_current
    end for
  end while

where ε is a small number to prevent division by zero.

A further improved version, called Adaptive Moment Estimation (Adam), was proposed by (Kingma et al., 2015). RMSProp adapts the learning rate using the second moment (the squared gradients) only; Adam improves on this by also using an exponential moving average of the gradients themselves (the first moment) in the update. The advantage of Adam is that it can handle sparse gradients on noisy data. The pseudocode of Adam is given below.

Algorithm 3 Adaptive Moment Estimation (Adam)
  Initialise a vector of random weights w, learning rate lr, v = 0, m = 0, b = 0.9 and b1 = 0.999
  while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
      m_current = b·m_previous + (1 − b)·∂Loss/∂w_current
      v_current = b1·v_previous + (1 − b1)·[∂Loss/∂w_current]²
      m̂_current = m_current / (1 − b^i)
      v̂_current = v_current / (1 − b1^i)
      w_new = w_current − (lr/√(v̂_current + ε)) · m̂_current
    end for
  end while

where b is the exponential decay rate for the first moment and b1 is the exponential decay rate for the second moment.
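In practice these optimisers are not implemented by hand; Keras provides them. A minimal sketch of compiling a network with Adam and the decay rates quoted above (the layer sizes and learning rate are illustrative only, not the architecture used later in this thesis):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.optimizers import Adam

    model = Sequential([
        Dense(64, activation='elu', input_shape=(10,)),  # 10 input features
        Dense(64, activation='elu'),
        Dense(1, activation='linear'),                   # one output, e.g. a price
    ])
    model.compile(optimizer=Adam(learning_rate=1e-3, beta_1=0.9, beta_2=0.999),
                  loss='mse')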

4.1.7 Epochs, Early Stopping and Batch Size

An epoch is one pass of the entire training data set through the ANN. Generally, we want the number of epochs to be as high as possible; however, this may cause the ANN to over-fit. Early stopping is useful here to halt the training before the ANN becomes over-fit: it stops the training process when the loss function shows no improvement. The batch size is the number of input samples processed before the parameters are updated.
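In Keras, early stopping is a callback passed to the training loop. A minimal sketch, assuming model and the data arrays exist as before; the patience, epochs and batch size are illustrative:

    from tensorflow.keras.callbacks import EarlyStopping

    early_stop = EarlyStopping(monitor='val_loss', patience=10,
                               restore_best_weights=True)
    model.fit(x_train, y_train,
              validation_data=(x_val, y_val),
              epochs=500, batch_size=1024,
              callbacks=[early_stop])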

4.2 Feedforward Neural Networks

Feedforward neural networks are the most common type of ANN in practical applications. They are called feedforward because the training data only flows from the input layer to the output layer; there is no feedback loop within the network.

4.2.1 Deep Neural Networks

Deep Neural Networks (DNN) are feedforward neural networks with more than one hidden layer; the depth refers to the number of hidden layers. This is the type of ANN of our interest, and the following theorems provide justifications.

Theorem 4.2.1 (Universal Approximation Theorem, (Hornik et al., 1989)). Let NN^a_{d_in,d_out} be the set of neural networks with activation function f_a: ℝ → ℝ, input dimension d_in ∈ ℕ and output dimension d_out ∈ ℕ. Then, if f_a is continuous and non-constant, NN^a_{d_in,d_out} is dense in the Lebesgue space L^p(µ) for all finite measures µ.

Theorem 4.2.2 (Universal Approximation Theorem for Derivatives, (Hornik et al., 1990)). Let the function to be approximated be F* ∈ C^n, the mapping of the neural network be F: ℝ^{d_0} → ℝ, and NN^a_{d_in,d_out} be the set of single-layer neural networks with activation function f_a: ℝ → ℝ, input dimension d_in ∈ ℕ and output dimension 1. Then, if the (non-constant) activation function is f_a ∈ C^n(ℝ), NN^a_{d_in,1} arbitrarily approximates F* and all its derivatives up to order n.

Theorem 4.2.3 (The Power of Depth of Neural Networks, (Eldan et al., 2016)). There exists a simple function on ℝ^d, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network to more than a certain constant accuracy unless its width is exponential in the dimension.

According to Theorem 4.2.2, it is crucial to have an activation function that is n-times differentiable if the approximation of the n-th derivative of the function F* is required. Theorem 4.2.3 motivates the choice of a deep network: the depth of a network can improve its predictive capabilities. However, it is worth noting that networks deeper than 4 hidden layers do not provide much performance gain (Ioffe et al., 2015).

4.3 Common Pitfalls and Remedies

Here we discuss some common pitfalls in ANN training and the corresponding remedies.


4.3.1 Loss Function Stagnation

During training, the loss function may display only infinitesimal improvements after each epoch.

To resolve this problem:

• Increase the batch size. This could potentially fix the issue, as it helps the ANN model learn the relationships between data samples and data labels better before updating the parameters.

• Increase the learning rate. Too small a learning rate can cause the ANN model to become stuck in local minima.

• Use a deeper network. According to the power of depth theorem, deeper networks tend to perform better than shallower ones. However, if the model already has 4 hidden layers, this method may not be suitable, as there is not much performance gain beyond that (see Section 4.2.1).

• Use a different activation function for the output layer. Make sure the range of the output activation function is appropriate for the problem.

4.3.2 Loss Function Fluctuation

During training, the loss function may fluctuate from epoch to epoch instead of decreasing steadily. Potential fixes include:

• Increase the batch size. The ANN model might not be able to learn the true relationships between input and output data before each update when the batch size is too small.

• Lower the initial learning rate.

4.3.3 Over-fitting

Over-fitting occurs when the ANN model performs well on the training data but fails to generalise to unseen test data.

Potential fixes include:

• Reduce the complexity of the ANN model. Limit the number of hidden layers to 4, as (Eldan et al., 2016) show that not much advantage is achieved beyond that. Furthermore, experiment with narrower (fewer neurons) hidden layers.

• Introduce early stopping. Reduce the number of epochs if additional epochs show only little improvement.

• Regularisation. This can reduce the complexity of the loss function and speed up convergence.

• K-fold cross-validation. This works by splitting the training data into K equal-size portions: K − 1 portions are used for training and 1 portion for validation, repeated K times with a different portion held out each time (see the sketch below).
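A minimal sketch of K-fold cross-validation with scikit-learn, assuming x_data and y_data are the full training arrays and build_model() is a factory of ours returning a freshly compiled Keras model:

    import numpy as np
    from sklearn.model_selection import KFold

    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = []
    for train_idx, val_idx in kf.split(x_data):
        model = build_model()  # fresh model per fold keeps folds independent
        model.fit(x_data[train_idx], y_data[train_idx],
                  epochs=50, batch_size=1024, verbose=0)
        scores.append(model.evaluate(x_data[val_idx], y_data[val_idx], verbose=0))
    print('mean validation loss:', np.mean(scores))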

4.3.4 Under-fitting

Under-fitting occurs when the ANN model fails to make sense of the relationships between data samples and labels.

Potential fixes include:

• Increase the size of the training data. Complex functional mappings require the ANN model to train on more data before it can learn the relationships well.

• Increase the depth of the model. It is very likely that the model is not deep enough for the problem. As a rule of thumb, limit the number of hidden layers to 3-4 for complex problems.

4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation

For the ANN application to implied volatility prediction, we apply the image-based implicit method proposed by (Horvath et al., 2019). It is allegedly a powerful and robust method for approximating implied volatility. Rather than using one implied volatility value as the network output, the image-based implicit method uses the entire implied volatility surface as the output. The implied volatility surface is represented by 10 × 10 pixels, with each pixel corresponding to one maturity and one strike price. The input data corresponding to one implied volatility surface output are all the model parameters except the strike price and maturity; these are implicitly learned by the network. This method offers the following benefits (a network sketch follows the list):

• More information is learned by the network with less input data, since the information of neighbouring volatility points is incorporated in the training process.

• The network learns the mapping between the model parameters and the implied volatility surface directly. The volatility smile and term structure are available simultaneously, so this is more efficient than having only one single implied volatility value as output.

• The neighbouring volatility points can be numerically approximated almost instantly if required.
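A minimal Keras sketch of such a network; the hidden-layer sizes are illustrative, and only the 6-dimensional input and 100-dimensional output reflect the method described above:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    # 6 model parameters in, 10x10 implied volatility surface out
    surface_model = Sequential([
        Dense(128, activation='elu', input_shape=(6,)),
        Dense(128, activation='elu'),
        Dense(100, activation='linear'),   # one neuron per surface pixel
    ])
    surface_model.compile(optimizer='adam', loss='mse')
    # predictions can be reshaped onto the 10x10 grid:
    # surface = surface_model.predict(params).reshape(10, 10)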


[Diagram omitted: the 100 neurons O1, ..., O100 of the network output layer map one-to-one onto the 10 × 10 pixel grid of the implied volatility surface.]

FIGURE 4.2: The pixels of the implied volatility surface are represented by the outputs of the neural network.


Chapter 5

Data

In this chapter we discuss the data for our applications. We explain how the data is generated and some essential data pre-processing procedures.

Simulated data is chosen to train the ANN model mainly for regulatory purposes. One might argue that training the ANN with market data directly would yield better predictions; however, evidence for the stability and robustness of that method is yet to be found. Furthermore, there is no straightforward interpretation between the inputs and outputs of an ANN model trained that way. The main advantage of training the ANN with simulated data is that the functional relationship is known, and so the model is tractable. This property is important, as tractable models tend to be more transparent, which is required by regulations.

For the neural network computations, Python is the logical language choice, as it has many powerful machine learning libraries such as Keras. To obtain good results, we require 5,000,000 rows of data for the ANN experiment. Hence we choose C++ instead of Python for the data generation process, as it is computationally faster. We also use the lightweight, header-only Python library pybind11, which can harmoniously integrate C++ and Python in one project (see Appendix A for a tutorial on pybind11).

5.1 Data Generation and Storage

The data generation process is done entirely in C++, mainly because of its execution speed. Initially we considered applying GPU parallel computing to speed up the process even further; after consulting with the supervisor, we decided to use the std::future class template instead. std::future belongs to the standard C++11 library and is very simple to implement compared to parallel computing. It allows a C++ program to run multiple functions asynchronously on different threads (see page 942 in (Duffy, 2018) for an example). Our machine contains 2 physical cores with 2 threads per physical core. This allows us to run 4 operations asynchronously without overhead costs, and therefore in theory improves the data generation speed four-fold. Note that asynchronous programming is not meant to replace parallel computing completely; in cases where execution speed is absolutely critical, parallel computing should be used. Without asynchronous programming, the data generation process for 5,000,000 rows of Heston option price data would take more than 8 hours; now it takes less than 2 hours to complete.

We follow the steps outlined in Appendix A to wrap the C++ program using the pybind11 Python library. The wrapped C++ function can be called in Python just like any other Python library. The pybind11 library is conceptually very similar to the Boost library; the reason we choose pybind11 is that it is a lightweight, header-only library that is easy to use. This enables us to enjoy both the execution speed of C++ and access to the machine learning libraries of Python at the same time.

After the data is generated, it is stored in MySQL so that we do not need to re-generate the data each time we restart the computer or when the Python IDE crashes unexpectedly. We use the mysql-connector Python library to communicate with MySQL from Python.
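A minimal sketch of the storage step with mysql-connector; the credentials, table and column names are placeholders rather than the actual schema:

    import mysql.connector

    conn = mysql.connector.connect(host='localhost', user='user',
                                   password='password', database='option_data')
    cur = conn.cursor()
    # one illustrative row: (S, K, r, D, v, T, theta, kappa, xi, rho, price)
    rows = [(50.0, 45.0, 0.03, 0.0, 0.3, 1.0, 0.5, 2.0, 0.3, -0.5, 8.21)]
    cur.executemany(
        "INSERT INTO heston_calls (S, K, r, D, v, T, theta, kappa, xi, rho, price) "
        "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", rows)
    conn.commit()
    conn.close()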

5.1.1 Heston Call Option Data

We follow the method and theory described in Chapter 3 to compute call option prices under the Heston model. We only consider call options in this project. To train an ANN that can also predict put option prices, one can simply add an extra input feature that takes only two values: 1 represents a call option and -1 represents a put option.

For this application we generated 5,000,000 rows of data, uniformly distributed over a specified range. In this context, one row of data contains all the input features and the corresponding output. We use 10 model parameters as the input features: spot price (S), strike price (K), risk-free rate (r), dividend yield (D), current/instantaneous volatility (ν), maturity (T), long term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ). In this case there is only one output, the call option price. See Table 5.1 for the respective ranges.

TABLE 5.1: Heston Model Call Option Data

    Model Parameter                       Range
    Input:
      Spot Price (S)                      ∈ [20, 80]
      Strike Price (K)                    ∈ [40, 55]
      Risk-free Rate (r)                  ∈ [0.03, 0.045]
      Dividend Yield (D)                  ∈ [0, 0.1]
      Instantaneous Volatility (ν)        ∈ [0.2, 0.6]
      Maturity (T)                        ∈ [0.03, 2.0]
      Long-term Volatility (θ)            ∈ [0.2, 1.0]
      Mean-reversion Rate (κ)             ∈ [0.1, 1.0]
      Volatility of Volatility (ξ)        ∈ [0.2, 0.5]
      Correlation (ρ)                     ∈ [−0.95, 0.95]
    Output:
      Call Option Price (C)               ∈ ℝ>0


5.1.2 Heston Implied Volatility Data

We generate the Heston implied volatility data with the theory explained in Chapter 3. For the ANN application in implied volatility approximation, we decided to use the image-based implicit method, in which the ANN output layer dimension is 100, with each output representing one pixel on the implied volatility surface.

Using this method, we need 100 times more data in order to have the same number of distinct rows as the normal point-wise method (e.g. Heston call option pricing) does. However, due to hardware constraints, we decided to generate 10,000,000 rows of uniformly distributed data. That translates to only 100,000 distinct rows of data.

We use 6 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ).

Note that strike price and maturity are not used as input features, as mentioned in Chapter 4.4. This is because the shape of one full implied volatility surface depends on the 6 model parameters mentioned above; the strike price and maturity correspond to one specific point on the full surface. Nevertheless, we still require these two parameters to generate implied volatility data. We use the following strike price and maturity values:

• strike price = 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4

• maturity = 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0

In this case there will be 10 × 10 = 100 outputs, i.e. the implied volatility surface. See Table 5.2 for the respective ranges (a short NumPy sketch of the output grid follows the table).

TABLE 5.2: Heston Model Implied Volatility Data

    Model Parameter                       Range
    Input:
      Spot Price (S)                      ∈ [0.2, 2.5]
      Instantaneous Volatility (ν)        ∈ [0.2, 0.6]
      Long-term Volatility (θ)            ∈ [0.2, 1.0]
      Mean-reversion Rate (κ)             ∈ [0.1, 1.0]
      Volatility of Volatility (ξ)        ∈ [0.2, 0.5]
      Correlation (ρ)                     ∈ [−0.95, 0.95]
    Output:
      Implied Volatility Surface (σ_imp)  ∈ ℝ^(10×10)_>0
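A minimal NumPy sketch of the image-based output grid described above: the 10 strikes and 10 maturities span a 10 × 10 surface, which is flattened into the 100-dimensional ANN training label (the zero surface below is a placeholder).

    import numpy as np

    strikes = np.linspace(0.5, 1.4, 10)      # 0.5, 0.6, ..., 1.4
    maturities = np.linspace(0.2, 2.0, 10)   # 0.2, 0.4, ..., 2.0
    K, T = np.meshgrid(strikes, maturities)  # both 10 x 10

    surface = np.zeros_like(K)               # placeholder implied volatilities
    label = surface.reshape(-1)              # one 100-dimensional output row
    print(label.shape)                       # (100,)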

5.1.3 Rough Heston Call Option Data

Again, we follow the theory in Chapter 3 and implement it in C++ to generate call option prices under the rough Heston model.

Similarly, we only consider call options here and generate 5,000,000 rows of data. We use 9 model parameters as the input features: strike price (K), risk-free rate (r), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). There is only one output, the call option price. See Table 5.3 for their respective ranges.

TABLE 5.3: Rough Heston Model Call Option Data

    Model Parameter                       Range
    Input:
      Strike Price (K)                    ∈ [0.5, 5]
      Risk-free Rate (r)                  ∈ [0.2, 0.5]
      Instantaneous Volatility (ν)        ∈ [0, 1.0]
      Maturity (T)                        ∈ [0.1, 3.0]
      Long-term Volatility (θ)            ∈ [0.01, 0.1]
      Mean-reversion Rate (κ)             ∈ [1.0, 4.0]
      Volatility of Volatility (ξ)        ∈ [0.01, 0.5]
      Correlation (ρ)                     ∈ [−0.95, 0.95]
      Hurst Parameter (H)                 ∈ [0.05, 0.1]
    Output:
      Call Option Price (C)               ∈ ℝ>0

5.1.4 Rough Heston Implied Volatility Data

Finally, we generate the rough Heston implied volatility data with the theory explained in Chapter 3. As before, we use the image-based implicit method.

The data size we use is 10,000,000 rows, which translates to 100,000 distinct rows of data. We use 7 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). We use the following strike price and maturity values:

• strike price = 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4

• maturity = 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0

As before, there will be 10 × 10 = 100 outputs, i.e. the rough Heston implied volatility surface. See Table 5.4 for the respective ranges.

TABLE 5.4: Rough Heston Model Implied Volatility Data

    Model Parameter                       Range
    Input:
      Spot Price (S)                      ∈ [0.2, 2.5]
      Instantaneous Volatility (ν)        ∈ [0.2, 0.6]
      Long-term Volatility (θ)            ∈ [0.2, 1.0]
      Mean-reversion Rate (κ)             ∈ [0.1, 1.0]
      Volatility of Volatility (ξ)        ∈ [0.2, 0.5]
      Correlation (ρ)                     ∈ [−0.95, 0.95]
      Hurst Parameter (H)                 ∈ [0.05, 0.1]
    Output:
      Implied Volatility Surface (σ_imp)  ∈ ℝ^(10×10)_>0


5.2 Data Pre-processing

In this section we discuss some necessary data pre-processing procedures: data splitting for validation and evaluation (testing), data standardisation and normalisation, and data partitioning for K-fold cross validation.

5.2.1 Data Splitting: Train - Validate - Test

Generally, not 100% of the available data is used for training; we reserve some of the data for validation and for performance evaluation.

We split the data into three sets: 70% (train), 15% (validate) and 15% (test). The first 70% of the data is used for training, whilst 15% of the data is used for validation during training. The last 15% is for evaluating the trained ANN's predictive accuracy.

The validation data set is used for tuning the hyperparameters of the ANN; this is important to help prevent over-fitting of the network. The test data set is not used during training. It is the unseen (out-of-sample) data used for evaluating the trained ANN's predictions.
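A minimal sketch of the 70/15/15 split with scikit-learn's train_test_split; X and y below are placeholders for the feature matrix and labels.

    import numpy as np
    from sklearn.model_selection import train_test_split

    X, y = np.random.rand(1000, 10), np.random.rand(1000)             # placeholder data
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30)
    X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50)
    print(len(X_train), len(X_val), len(X_test))                      # 700 150 150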

5.2.2 Data Standardisation and Normalisation

Before the data is fed into the ANN for training, it is essential to scale the data to speed up convergence and improve performance.

The magnitude of each data feature is different. It is essential to re-scale the data so that the ANN model will not misinterpret the features' relative importance based on their magnitudes. Typical scaling techniques include standardisation and normalisation.

Data standardisation is defined as

    x_scaled = (x − x_mean) / x_std        (5.1)

where

• x is the data,

• x_mean is the mean of that data feature,

• x_std is the standard deviation of that data feature.

(Horvath et al., 2019) claim that this works especially well for quantities that are positively unbounded, such as implied volatility.

For the other model parameters we apply data normalisation, which is defined as

    x_scaled = (2x − (x_max + x_min)) / (x_max − x_min) ∈ [−1, 1]        (5.2)

where

• x is the data,

• x_max is the maximum value of that data feature,

• x_min is the minimum value of that data feature.

After data scaling (standardisation or normalisation), the ANN is able to learn the true relationships between inputs and outputs without bias from the differences in their magnitudes.
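A minimal NumPy sketch of Equations (5.1) and (5.2), assuming x holds a single data feature.

    import numpy as np

    def standardise(x):
        # Equation (5.1)
        return (x - x.mean()) / x.std()

    def normalise(x):
        # Equation (5.2): maps the feature into [-1, 1]
        return (2 * x - (x.max() + x.min())) / (x.max() - x.min())

    x = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
    print(standardise(x))
    print(normalise(x))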

5.2.3 Data Partitioning

In this section we explain how the data is partitioned for K-fold cross validation.

First, we shuffle the entire training data set. Then the entire data set is divided into K equal-sized portions. Each portion is used once as the test data set while the ANN is trained on the other K − 1 portions. This is repeated K times, with each of the K portions used exactly once as the test data set. The average of the K results is the performance measure of the model.
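A minimal sketch of this partitioning with scikit-learn's KFold; the shuffling and the K equal portions mirror the procedure above, and the mean predictor standing in for the ANN is a placeholder.

    import numpy as np
    from sklearn.model_selection import KFold

    X, y = np.random.rand(100, 6), np.random.rand(100)   # placeholder data
    kf = KFold(n_splits=5, shuffle=True, random_state=0)

    scores = []
    for train_idx, test_idx in kf.split(X):
        # train on the K - 1 remaining portions, test on the held-out portion
        prediction = y[train_idx].mean()                 # placeholder model
        scores.append(((y[test_idx] - prediction) ** 2).mean())
    print(np.mean(scores))                               # average of the K results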

FIGURE 5.1: How the data is partitioned for 5-fold cross validation. Adapted from (Chapter 2 Modeling Process: k-fold cross validation).


Chapter 6

Neural Networks Architecture and Experimental Setup

We start this chapter by discussing the appropriate ANN architecture and experimental setup for our applications. We apply the ANN to four different applications: option pricing under the Heston model, Heston implied volatility surface approximation, option pricing under the rough Heston model, and rough Heston implied volatility surface approximation. We first apply the ANN to the Heston model as a concept validation and then proceed to the rough Heston model.

Subsequently, we discuss the relevant evaluation metrics and validation methods for our ANN models. Finally, we end this chapter with the implementation steps for the ANN in Keras.

6.1 Neural Networks Architecture

Configuring an ANN's architecture is a combination of art and science. It requires experience and systematic experimentation to decide the right parameters and architecture for a specific problem, and for a particular problem there might be multiple appropriate designs. To begin with, we use the network architecture of (Horvath et al., 2019) and adjust the hyperparameters if it performs poorly for our applications.


[Network diagram: input layer I1, ..., I10; three hidden layers of 30 neurons (H1, ..., H30) each; a single output neuron O1.]

FIGURE 6.1: Typical ANN Architecture for Option Pricing Application


[Network diagram: input layer I1, ..., I10; three hidden layers of 30 neurons (H1, ..., H30) each; output layer O1, ..., O100.]

FIGURE 6.2: Typical ANN Architecture for Implied Volatility Surface Approximation

6.1.1 ANN Architecture: Option Pricing under Heston Model

An ANN with 3 hidden layers is appropriate for this application. The number of neurons per hidden layer is 50, with the exponential linear unit (ELU) as the activation function for each hidden layer.

Initially, the rectified linear unit (ReLU) activation function was considered. It is the choice for many applications, as it can overcome the vanishing gradient problem whilst being less computationally expensive than other activation functions. However, it has some major drawbacks:

• The nth derivative of ReLU is not continuous for all n > 0.

• ReLU cannot produce negative outputs.

• For activation x < 0, ReLU has zero gradient.

From Theorem 4.2.2 we know that choosing an n-differentiable activation function is necessary for computing the nth derivatives of the functional mapping. This is crucial for our application because we would be interested in computing option Greeks with the same ANN if it shows promising results for option pricing. For risk management purposes, selecting an activation function for which the option Greeks can be calculated is therefore essential.

The obvious alternative is the ELU (defined in Chapter 4.1.4). For x < 0 it has a smooth output that approaches −α, and for x > 0 the output of the ELU is positively unbounded, which is suitable for our application since the values of option price and implied volatility are also, in theory, positively unbounded.

Since our problem is a regression problem, a linear activation function (defined in Chapter 4.1.5) is appropriate for the output layer, as its output is unbounded. We find that a batch size of 200 and 30 epochs yield good accuracy.

To summarise, the ANN model for Heston option pricing is (a minimal Keras sketch follows the list):

• Input Layer: 10 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function
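A minimal Keras sketch of this configuration, mirroring the style of Appendix C; the training arrays are placeholders.

    import numpy as np
    import keras

    input_layer = keras.layers.Input(shape=(10,))
    h = keras.layers.Dense(50, activation='elu')(input_layer)
    h = keras.layers.Dense(50, activation='elu')(h)
    h = keras.layers.Dense(50, activation='elu')(h)
    output_layer = keras.layers.Dense(1, activation='linear')(h)

    model = keras.models.Model(inputs=input_layer, outputs=output_layer)
    model.compile(loss='mse', optimizer='adam', metrics=['mae'])

    X, y = np.random.rand(1000, 10), np.random.rand(1000, 1)  # placeholder data
    model.fit(X, y, batch_size=200, epochs=30, validation_split=0.15)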

6.1.2 ANN Architecture: Implied Volatility Surface of Heston Model

For approximating the full implied volatility surface of the Heston model, we apply the image-based implicit method (see Chapter 4.4). This implies the output dimension is 100.

We use 3 hidden layers with 80 neurons each. The activation function for the hidden layers is the exponential linear unit (ELU). As before, we use a linear activation function for the output layer. We find that a batch size of 20 and 200 epochs yield good accuracy. Since the number of epochs is relatively high, we introduce an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

To summarise, the ANN model for Heston implied volatility surface approximation is:

• Input Layer: 6 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function


6.1.3 ANN Architecture: Option Pricing under Rough Heston Model

For option pricing under the rough Heston model, the ANN model is:

• Input Layer: 9 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function

As before, we use a batch size of 200 and 30 epochs.

6.1.4 ANN Architecture: Implied Volatility Surface of Rough Heston Model

The ANN model for rough Heston implied volatility surface approximation is:

• Input Layer: 7 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function

We use a batch size of 20 and 200 epochs, with an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

6.2 Performance Metrics and Validation Methods

We require some metrics to evaluate the performance of the ANN in terms of accuracy and speed.

For accuracy, we introduced the MSE and MAE in Chapter 4.1.4. It is also common to use the coefficient of determination (R²) to quantify the performance of a regression model; it is a statistical measure of the proportion of the variance in the dependent variable that is explained by the independent variables. The formula is

    R² = 1 − Σ_{i=1}^{n} (y_{i,actual} − y_{i,predict})² / Σ_{i=1}^{n} (y_{i,actual} − ȳ_actual)²        (6.1)

46 Chapter 6 Neural Networks Architecture and Experimental Setup

We also suggest a user-defined accuracy metric, the Maximum Absolute Error, defined as

    Max Abs Error = max_i |y_{i,actual} − y_{i,predict}|        (6.2)

This metric provides a measure of how large the error could be in the worst-case scenario, which is especially important from a risk management perspective.
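A minimal NumPy sketch of Equations (6.1) and (6.2), assuming y_actual and y_predict are arrays of the same shape.

    import numpy as np

    def r_squared(y_actual, y_predict):
        # Equation (6.1)
        ss_res = np.sum((y_actual - y_predict) ** 2)
        ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)
        return 1 - ss_res / ss_tot

    def max_abs_error(y_actual, y_predict):
        # Equation (6.2)
        return np.max(np.abs(y_actual - y_predict))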

As for the run-time performance of the ANN, we simply use Python's built-in timer function to quantify this property.
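A minimal sketch of this measurement; workload below is a placeholder standing in for a model.predict call or a traditional pricing routine.

    import time

    def workload():
        return sum(i * i for i in range(100000))  # placeholder computation

    t0 = time.time()
    workload()
    t1 = time.time()
    print("Execution time: {:.1f} ms".format((t1 - t0) * 1000))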

In Chapter 5.2.3 we explained the data partitioning process. The motivation for partitioning the data is K-fold cross validation. Cross validation is a statistical technique for assessing how well a predictive model generalises to an independent data set, and it is a good tool for identifying over-fitting.

As explained earlier, the process involves training and testing different permutations of the partitioned data sets multiple times. The indication of a well-generalised model is that the error magnitudes in each round are consistent with each other and with their average. We select K = 5 for our applications.


6.3 Implementation of ANN in Keras

In this section we outline the standard ANN computation process using Keras:

1. Construct the ANN model by specifying the number of input neurons, the number of hidden neurons, the number of hidden layers, the number of output neurons and the activation functions for each layer. Print the model summary to check for errors.

2. Configure the settings of the model using the compile function. We select the MSE as the loss function and specify the MAE, R² and our user-defined maximum absolute error as the metrics. For all of our applications we choose ADAM as the optimisation algorithm. To prevent over-fitting, we use the early stopping function, especially when the number of epochs is high. It is important to note that the validation loss (not the training loss) should be used as the early stopping criterion; the training then stops early if there is no reduction in validation loss.

3. Start training the ANN model. It is important to specify the validation split argument here for splitting the training data into a train set and a validation set. Note that the test set should not be used as the validation set; it should be reserved for evaluating the ANN's predictions after training.

4. When the training is complete, generate the plot of the MSE (loss function) versus the number of epochs to visualise the training process. The MSE should decline gradually and then level off as the number of epochs increases. See Chapter 4.3 for potential training issues and remedies.

5. Apply 5-fold cross validation to check for over-fitting. Evaluate the trained model using the evaluate function.

6. Generate predictions. Visualise the predictions by plotting predicted values versus actual values on the same graph.

See Appendix C for a Python code example; a minimal self-contained sketch of steps 2 to 4 follows.
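The sketch below uses placeholder data; the architecture mirrors Section 6.1.1 and the callback mirrors Appendix C.

    import numpy as np
    import keras
    from keras.callbacks import EarlyStopping
    import matplotlib.pyplot as plt

    X, y = np.random.rand(1000, 10), np.random.rand(1000, 1)      # placeholder data

    model = keras.models.Sequential([
        keras.layers.Dense(50, activation='elu', input_shape=(10,)),
        keras.layers.Dense(50, activation='elu'),
        keras.layers.Dense(50, activation='elu'),
        keras.layers.Dense(1, activation='linear'),
    ])
    model.compile(loss='mse', optimizer='adam', metrics=['mae'])   # step 2

    earlystop = EarlyStopping(monitor='val_loss', mode='min', patience=25)
    history = model.fit(X, y, batch_size=200, epochs=30,
                        validation_split=0.15, callbacks=[earlystop])  # step 3

    plt.plot(history.history['loss'], label='train loss')          # step 4
    plt.plot(history.history['val_loss'], label='validation loss')
    plt.legend()
    plt.show()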


6.4 Summary of Neural Networks Architecture

TABLE 6.1: Summary of Neural Networks Architecture for Each Application

    ANN Model                            Application                               Input Layer  Hidden Layers          Output Layer  Batch Size  Epochs
    Heston Option Pricing ANN            Heston Call Option Pricing                10 neurons   3 layers × 50 neurons  1 neuron      200         30
    Heston Implied Volatility ANN        Heston Implied Volatility Surface         6 neurons    3 layers × 80 neurons  100 neurons   20          200
    Rough Heston Option Pricing ANN      Rough Heston Call Option Pricing          9 neurons    3 layers × 50 neurons  1 neuron      200         30
    Rough Heston Implied Volatility ANN  Rough Heston Implied Volatility Surface   7 neurons    3 layers × 80 neurons  100 neurons   20          200


Chapter 7

Results and Discussions

In this chapter we present our results and discuss our findings.

7.1 Heston Option Pricing ANN Results

Figure 7.1 shows the learning curves of the Heston Option Pricing ANN; the loss function used here is the MSE. As the ANN is trained, both the training loss and validation loss curves decline until they converge. It took fewer than 30 epochs for the ANN to reach a satisfactory level of accuracy. There is only a small gap between the training loss and validation loss, which indicates the ANN model is well trained.

FIGURE 7.1: Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN

Table 7.1 summarises the error metrics of the Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 2.00 × 10^−6, 9.58 × 10^−4, 6.50 × 10^−3 and 0.999 respectively. The low error metrics and high R² value imply that the ANN model generalises well and gives highly accurate predictions for option prices under the Heston model. Figure 7.2 attests to its high accuracy: almost all of the predictions lie on the perfect-prediction line (red line).

TABLE 7.1: Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data

    MSE           MAE           Max Abs Error  R²
    2.00 × 10^−6  9.58 × 10^−4  6.50 × 10^−3   0.999

FIGURE 7.2: Plot of Predicted vs Actual Values for Heston Option Pricing ANN

TABLE 7.2: 5-fold Cross Validation Results of Heston Option Pricing ANN

             MSE           MAE           Max Abs Error
    Fold 1   5.78 × 10^−7  5.45 × 10^−4  2.04 × 10^−3
    Fold 2   9.90 × 10^−7  7.69 × 10^−4  2.40 × 10^−3
    Fold 3   7.15 × 10^−7  6.21 × 10^−4  2.20 × 10^−3
    Fold 4   1.24 × 10^−6  8.67 × 10^−4  2.53 × 10^−3
    Fold 5   6.54 × 10^−7  5.79 × 10^−4  2.21 × 10^−3
    Average  8.36 × 10^−7  6.76 × 10^−4  2.27 × 10^−3

Table 7.2 shows the results from 5-fold cross validation, which again confirm that the ANN model gives accurate predictions and generalises well to unseen data: the values of each fold are consistent with each other and with their averages.


7.2 Heston Implied Volatility ANN Results

Figure 7.3 shows the learning curves of the Heston Implied Volatility ANN; again, we use the MSE as the loss function. We can see that the validation loss curve experiences some fluctuations during training. We experimented with various ANN setups, and this is the least fluctuating learning curve we could achieve. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level, and the gap between validation loss and training loss is negligible. Due to the comparatively more complex ANN architecture, it took about 200 epochs for the ANN to reach a satisfactory level of accuracy.

FIGURE 7.3: Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN

Table 7.3 summarises the error metrics of the Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 6.24 × 10^−4, 1.27 × 10^−2, 3.71 × 10^−1 and 0.999 respectively. The low error metrics and high R² value imply that the ANN model generalises well and gives accurate predictions for the implied volatility surface under the Heston model. Figure 7.4 shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.3: Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data

    MSE           MAE           Max Abs Error  R²
    6.24 × 10^−4  1.27 × 10^−2  3.71 × 10^−1   0.999


FIGURE 7.4: Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.5: Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.5 shows the MAE values of the Heston implied volatility ANN predictions across the entire implied volatility surface. The largest MAE value is 0.0045, and a large area of the surface has MAE values lower than 0.0030. It is worth noting that the accuracy is relatively poor when the strike price is at the extremes and the maturity is small; however, the ANN gives good predictions overall for the whole surface.

The 5-fold cross validation results for the Heston implied volatility ANN model, summarised in Table 7.4, show consistent values, indicating the ANN model is well trained and generalises to unseen test data.

TABLE 7.4: 5-fold Cross Validation Results of Heston Implied Volatility ANN

             MSE           MAE           Max Abs Error
    Fold 1   8.15 × 10^−4  1.13 × 10^−2  3.88 × 10^−1
    Fold 2   4.74 × 10^−4  1.08 × 10^−2  3.13 × 10^−1
    Fold 3   1.37 × 10^−3  1.74 × 10^−2  4.46 × 10^−1
    Fold 4   3.38 × 10^−4  8.61 × 10^−3  2.19 × 10^−1
    Fold 5   4.60 × 10^−4  1.02 × 10^−2  2.67 × 10^−1
    Average  6.92 × 10^−4  1.12 × 10^−2  3.27 × 10^−1

7.3 Rough Heston Option Pricing ANN Results

Figure 7.6 shows the learning curves of the Rough Heston Option Pricing ANN; as before, MSE is used as the loss function. As training progresses, both the training loss and validation loss curves decline until they converge. Since this is a relatively simple model, it took fewer than 30 epochs of training to reach a good level of accuracy. The small discrepancy between the training loss and validation loss indicates the ANN model is well trained.

FIGURE 7.6: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN

Table 7.5 summarises the error metrics of the Rough Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 9.00 × 10^−6, 2.28 × 10^−3, 1.76 × 10^−1 and 0.999 respectively. These metrics imply that the ANN model generalises well and gives highly accurate predictions for option prices under the rough Heston model. Figure 7.7 attests to its high accuracy again: most of the predictions lie on the perfect-prediction line (red line).

TABLE 7.5: Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data

    MSE           MAE           Max Abs Error  R²
    9.00 × 10^−6  2.28 × 10^−3  1.76 × 10^−1   0.999


FIGURE 7.7: Plot of Predicted vs Actual Values for Rough Heston Option Pricing ANN

TABLE 7.6: 5-fold Cross Validation Results of Rough Heston Option Pricing ANN

             MSE           MAE           Max Abs Error
    Fold 1   3.79 × 10^−6  1.35 × 10^−3  5.99 × 10^−3
    Fold 2   2.33 × 10^−6  9.96 × 10^−4  4.89 × 10^−3
    Fold 3   2.06 × 10^−6  9.88 × 10^−4  4.41 × 10^−3
    Fold 4   2.16 × 10^−6  1.08 × 10^−3  4.07 × 10^−3
    Fold 5   1.84 × 10^−6  1.01 × 10^−3  3.74 × 10^−3
    Average  2.44 × 10^−6  1.08 × 10^−3  4.62 × 10^−3

The highly consistent values from 5-fold cross validation again confirm that the ANN model generalises to unseen data.

7.4 Rough Heston Implied Volatility ANN Results

Figure 7.8 shows the learning curves of the Rough Heston Implied Volatility ANN. We can see that the validation loss curve experiences some fluctuations during training; again, we select the least fluctuating setup. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level, and the discrepancy between validation loss and training loss is negligible. Initially the number of training epochs was 200, but training stopped early at around 85 epochs since the validation loss had not improved for 25 epochs. It achieves good accuracy nonetheless.


FIGURE 7.8: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN

Table 7.7 summarises the error metrics of the Rough Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 5.66 × 10^−4, 1.08 × 10^−2, 3.49 × 10^−1 and 0.999 respectively. These metrics show that the ANN model produces accurate predictions on unseen data. This is again confirmed by Figure 7.9, which shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.7: Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data

    MSE           MAE           Max Abs Error  R²
    5.66 × 10^−4  1.08 × 10^−2  3.49 × 10^−1   0.999


FIGURE 7.9: Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.10: Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.10 shows the MAE values of the Rough Heston Implied Volatility ANN predictions across the entire implied volatility surface. The largest MAE value is below 0.0030, and a large area of the surface has MAE values lower than 0.0015. It is worth noting that the accuracy is relatively poor when the strike price and maturity values are small; however, the ANN gives good predictions overall for the whole surface. In fact, it is more accurate than the Heston implied volatility ANN model despite having one extra input and a similar architecture; we attribute this entirely to the stochastic nature of the optimiser.

The 5-fold cross validation results for the rough Heston implied volatility ANN model, summarised in Table 7.8, show consistent values, just like the previous three cases. The consistency indicates the ANN model is well trained and generalises to unseen test data.

TABLE 7.8: 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN

             MSE           MAE           Max Abs Error
    Fold 1   8.15 × 10^−4  1.13 × 10^−2  3.88 × 10^−1
    Fold 2   4.74 × 10^−4  1.08 × 10^−2  3.13 × 10^−1
    Fold 3   1.37 × 10^−3  1.74 × 10^−2  4.46 × 10^−1
    Fold 4   3.38 × 10^−4  8.61 × 10^−3  2.19 × 10^−1
    Fold 5   4.60 × 10^−4  1.02 × 10^−2  2.67 × 10^−1
    Average  6.92 × 10^−4  1.12 × 10^−2  3.27 × 10^−1


7.5 Run-time Performance

In this section we present the training time and the execution speed of the ANN versus traditional methods. The speed tests are run on a machine with an Intel i5-7200U chip (up to 3.10 GHz) and 8 GB of RAM.

Table 7.9 summarises the training time required for each application. Although the training time per epoch for the option pricing applications is higher than that for the implied volatility applications, the resulting total training time is similar for both types of application, because the number of epochs for training the implied volatility ANN models is higher.

We also emphasise that fewer distinct rows of data are available for implied volatility ANN training. Thus the training time is similar to that of option pricing ANN training despite the more complex ANN architecture.

TABLE 7.9: Training Time

    Application                  Training Time per Epoch  Total Training Time
    Heston Option Pricing        30 s                     900 s
    Heston Implied Volatility    5 s                      1000 s
    rHeston Option Pricing       30 s                     900 s
    rHeston Implied Volatility   5 s                      1000 s

TABLE 7.10: Execution Time for Predictions using ANN and Traditional Methods

    Application                  Traditional Methods  ANN
    Heston Option Pricing        88.4 ms              579 ms
    Heston Implied Volatility    23.1 ms              437 ms
    rHeston Option Pricing       65.2 ms              526 ms
    rHeston Implied Volatility   28.7 ms              483 ms

Table 7.10 summarises the execution time for generating predictions using the ANN and traditional methods. Before the experiment, we expected the ANN to generate predictions faster. However, the results show that the ANN is actually up to 7 times slower than the traditional methods for option pricing (both Heston and rough Heston), and up to 18 times slower for implied volatility surface approximation (both Heston and rough Heston). This shows the traditional approximation methods are more efficient.


Chapter 8

Conclusions and Outlook

To summarise, our results show that the ANN has the ability to approximate option prices and implied volatilities under the Heston and rough Heston models to a high degree of accuracy. However, the efficiency of the ANN is not as good as that of the traditional methods, and we conclude that such an approach may therefore be considered an ineffective alternative. There are more efficient methods, such as the numerical integration of the FFT, for option price computation, and the traditional method we used for approximating implied volatility does not even require a numerical scheme, hence its efficiency.

As for future projects, it would be interesting to investigate the proven rough Bergomi model and applications with exotic options such as lookback and digital options. Moreover, it would also be worthwhile to apply gradient-free optimisers for ANN training; some interesting examples include the differential evolution, dual annealing and simplicial homology global optimisers.


Appendix A

pybind11: A step-by-step tutorial with examples

This tutorial assumes basic knowledge of C++ and Python.

Sources: Create a C++ extension for Python; Using the C++ eigen library to calculate matrix inverse and determinant; pybind11 Documentation; Eigen: The Matrix Class.

A.1 Software Prerequisites

• Visual Studio 2017 or later, with both the Desktop Development with C++ and Python Development workloads installed with default options.

• In the Python Development workload, also select the box on the right for Python native development tools. This option sets up most of the configuration required. (This option also includes the C++ workload automatically.)

Source: Create a C++ extension for Python.


A.2 Create a Python Project

1. To create a Python project in Visual Studio, select File > New > Project. Search for Python, select the Python Application template, give it a suitable name and location, and select OK.

2. pybind11 requires that you use a 32-bit Python interpreter (Python 3.6 or above recommended). In the Solution Explorer window of Visual Studio, expand the project node, then expand the Python Environments node. If you don't see a 32-bit environment as the default (either in bold or labeled with global default), then right-click the Python Environments node and select Add Environment, or select Add Environment from the environment drop-down in the Python toolbar.

Once in the Add Environment dialog box, select the Existing environment tab, then select the 32-bit interpreter from the Environment drop-down list.

If you don't have a 32-bit interpreter installed, you can install standard Python interpreters from the Add Environment dialog. Select the Add Environment command in the Python Environments window or the Python toolbar, select the Python installation tab, indicate which interpreters to install, and select Install.


If you already added an environment other than the global default to a project, you may need to activate a newly added environment. Right-click that environment under the Python Environments node and select Activate Environment. To remove an environment from the project, select Remove.


A.3 Create a C++ Project

1. A Visual Studio solution can contain both Python and C++ projects together. To do so, right-click the solution in Solution Explorer and select Add > New Project.

2. Search on C++, select Empty project, specify the name test, and select OK.

3. Create a C++ file in the new project by right-clicking the Source Files node, then select Add > New Item, select C++ File, name it main.cpp, and select OK.

4. Right-click the C++ project in Solution Explorer and select Properties.

5. At the top of the Property Pages dialog that appears, set Configuration to All Configurations and Platform to Win32.


6. Set the specific properties as described in the following table, then select OK.

7. Right-click the C++ project and select Build to test your configurations (both Debug and Release). The .pyd files are located in the solution folder under Debug and Release, not the C++ project folder itself.

8. Add the following code to the C++ project's main.cpp file:

    #include <iostream>

    int main() {
        std::cout << "Hello World from C++" << std::endl;
        return 0;
    }

9. Build the C++ project again to confirm the code is working.

A.4 Convert the C++ project to an extension for Python

To make the C++ DLL into an extension for Python, first modify the exported methods to interact with Python types. Then add a function that exports the module (the module entry point).

1. Install pybind11 using pip:

    pip install pybind11

or

    py -m pip install pybind11

2. At the top of main.cpp, include pybind11.h:

    #include <pybind11/pybind11.h>


3. At the bottom of main.cpp, use the PYBIND11_MODULE macro to define the entry point to the C++ function:

    namespace py = pybind11;

    PYBIND11_MODULE(test, m) {
        m.def("main_func", &main, R"pbdoc(
            pybind11 simple example
        )pbdoc");

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

4. Set the target configuration to Release and build the C++ project to verify your code.

The C++ module may fail to compile for the following reasons:

• Unable to locate Python.h (E1696: cannot open source file "Python.h" and/or C1083: Cannot open include file: "Python.h": No such file or directory): verify that the path in C/C++ > General > Additional Include Directories in the project properties points to your Python installation's include folder. See the table in Step 6.

• Unable to locate Python libraries: verify that the path in Linker > General > Additional Library Directories in the project properties points to the Python installation's libs folder. See the table in Step 6.

• Linker errors related to target architecture: change the C++ target's project architecture to match that of your Python installation. For example, if you're targeting x64 with the C++ project but your Python installation is x86, change the C++ project to target x86.

A.5 Make the DLL available to Python

1. In Solution Explorer, right-click the References node in the Python project, and then select Add Reference. In the dialog that appears, select the Projects tab, select the test project, and then select OK.


2. Run the Visual Studio installer, select Modify, then select Individual Components > Compilers, build tools, and runtimes > Visual C++ 2015.3 v140 toolset. This step is necessary because Python (for Windows) is itself built with Visual Studio 2015 (version 14.0) and expects that those tools are available when building an extension through the method described here. (Note that you may need to install a 32-bit version of Python and target the DLL to Win32, not x64.)


3. Create a file named setup.py in the C++ project by right-clicking the project and selecting Add > New Item. Then select C++ File (.cpp), name the file setup.py, and select OK (naming the file with the .py extension makes Visual Studio recognize it as Python despite using the C++ file template).

Then paste the following code into the setup.py file:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add pybind11 folder and Python include folder
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

4. The setup.py code instructs Python to build the extension using the Visual Studio 2015 C++ toolset when used from the command line.

Open an elevated command prompt, navigate to the folder that contains setup.py, and enter the following command:

    python setup.py install

A.6 Call the DLL from Python

After the DLL is made available to Python, we can now call test.main_func() from Python code.

1. Add the following lines in your .py file to call the method exported from the DLL:


    import test

    if __name__ == "__main__":
        test.main_func()

2. Run the Python program (Debug > Start without Debugging). The output should look like the following:

A.7 Example: Store C++ Eigen Matrix as NumPy Arrays

This example requires the use of Eigen, a C++ template library. Due to its popularity, pybind11 provides transparent conversion and limited mapping support between Eigen and scientific Python linear algebra data types.

1. If you have not already, visit http://eigen.tuxfamily.org/index.php?title=Main_Page and click on zip to download the zip file.

Once the download is complete, decompress the zip file in a suitable location and note down the file path containing the Eigen folders.


2. In the Solution Explorer, right-click the C++ project and select Properties. Navigate to the C/C++ > General tab, add the Eigen path to the Additional Include Directories property, then select OK.

3. Right-click the C++ project and select Build to test your configurations (both Debug and Release).

4. Replace the code in the main.cpp file with the following code:

    #include <iostream>

    // pybind11
    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>

    // Eigen
    #include <Eigen/Dense>

    namespace py = pybind11;

    Eigen::Matrix3f example_mat(const int a, const int b, const int c) {
        Eigen::Matrix3f mat;
        mat << 5 * a, 1 * b, 2 * c,
               2 * a, 4 * b, 2 * c,
               1 * a, 3 * b, 3 * c;
        return mat;
    }

    Eigen::MatrixXd inv(Eigen::MatrixXd xs) {
        return xs.inverse();
    }

    double det(Eigen::MatrixXd xs) {
        return xs.determinant();
    }

    PYBIND11_MODULE(test, m) {
        m.def("example_mat", &example_mat);
        m.def("inv", &inv);
        m.def("det", &det);

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

Note that Eigen arrays are automatically converted to NumPy arrays simply by including the pybind11/eigen.h header (see the includes at the top of the listing in step 4). A C++ function with an Eigen return type can be used directly as a NumPy data type in Python. Build the C++ project to check that the code works correctly.

5. Include the Eigen library's directory. Replace the code in setup.py with the following code:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add pybind11 folder, Python include folder and Eigen folder
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                      r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

As before, in an elevated command prompt, navigate to the folder that contains setup.py and enter the following command:

    python setup.py install

6. Then paste the following code in the Python file:

    import test
    import numpy as np

    if __name__ == "__main__":
        A = test.example_mat(1, 3, 1)
        print("Matrix A is\n", A)
        print("The inverse of A is\n", test.inv(A))
        print("The determinant of A is\n", test.det(A))


The output should show matrix A, its inverse and its determinant.

A.8 Example: Store Data Generated by Analytical Heston Model as NumPy Arrays

Only the relevant parts of the code are displayed here. To request the code for the entire project, contact the author at cko22@bath.edu.

This example demonstrates how to use pybind11 for:

• a C++ project with multiple files;

• storing a C++ Eigen matrix as NumPy arrays.

1. For a multiple-file C++ project, we need to include all the source files in setup.py. This example also uses the Eigen library, so its directory is included as well. Paste the following code in setup.py:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'Data_Generation',
        sources=[r'module.cpp', r'Data.cpp', r'Heston.cpp',
                 r'IntegrationScheme.cpp', r'IntegratorImp.cpp',
                 r'NumIntegrator.cpp', r'pdetfunction.cpp',
                 r'pfunction.cpp', r'Range.cpp'],
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                      r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='Data_Generation',
        version='1.0',
        description='Python package with Data_Generation C++ extension (PyBind11)',
        license='MIT',
        ext_modules=[sfc_module],
    )

Then go to the folder that contains setup.py in a command prompt and enter the following command:

    python setup.py install

2. The code in module.cpp is:

    // For analytic solution in case of Heston model
    #include "Heston.hpp"
    #include <ctime>
    #include "Data.hpp"
    #include <future>

    // Inputting and Outputting
    #include <iostream>
    #include <vector>

    // pybind11
    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>
    #include <pybind11/stl_bind.h>

    // Eigen
    #include <Eigen/Dense>

    namespace py = pybind11;
    using namespace Eigen;  // assumed; needed for the unqualified Map/MatrixXd below

    struct CPPData {
        // Input Data
        std::vector<double> spot_price;
        std::vector<double> strike_price;
        std::vector<double> risk_free_rate;
        std::vector<double> dividend_yield;
        std::vector<double> initial_vol;
        std::vector<double> maturity_time;
        std::vector<double> long_term_vol;
        std::vector<double> mean_reversion_rate;
        std::vector<double> vol_vol;
        std::vector<double> price_vol_corr;

        // Output Data
        std::vector<double> option_price;
    };

    // ... do something ...

    Eigen::MatrixXd module() {
        CPPData cppdata;
        simulation(cppdata);

        // Convert each std::vector to an Eigen vector first
        // Input Data
        Map<Eigen::VectorXd> eigen_spot_price(cppdata.spot_price.data(), cppdata.spot_price.size());
        Map<Eigen::VectorXd> eigen_strike_price(cppdata.strike_price.data(), cppdata.strike_price.size());
        Map<Eigen::VectorXd> eigen_risk_free_rate(cppdata.risk_free_rate.data(), cppdata.risk_free_rate.size());
        Map<Eigen::VectorXd> eigen_dividend_yield(cppdata.dividend_yield.data(), cppdata.dividend_yield.size());
        Map<Eigen::VectorXd> eigen_initial_vol(cppdata.initial_vol.data(), cppdata.initial_vol.size());
        Map<Eigen::VectorXd> eigen_maturity_time(cppdata.maturity_time.data(), cppdata.maturity_time.size());
        Map<Eigen::VectorXd> eigen_long_term_vol(cppdata.long_term_vol.data(), cppdata.long_term_vol.size());
        Map<Eigen::VectorXd> eigen_mean_reversion_rate(cppdata.mean_reversion_rate.data(), cppdata.mean_reversion_rate.size());
        Map<Eigen::VectorXd> eigen_vol_vol(cppdata.vol_vol.data(), cppdata.vol_vol.size());
        Map<Eigen::VectorXd> eigen_price_vol_corr(cppdata.price_vol_corr.data(), cppdata.price_vol_corr.size());
        // Output Data
        Map<Eigen::VectorXd> eigen_option_price(cppdata.option_price.data(), cppdata.option_price.size());

        const int cols = 11;        // No. of columns
        const int rows = SIZE * 4;  // total rows (4 asynchronous tasks; assumed reading of the garbled source)
        // Define a matrix that stores training data to be used in Python
        MatrixXd training(rows, cols);
        // Initialise each column using the vectors
        training.col(0) = eigen_spot_price;
        training.col(1) = eigen_strike_price;
        training.col(2) = eigen_risk_free_rate;
        training.col(3) = eigen_dividend_yield;
        training.col(4) = eigen_initial_vol;
        training.col(5) = eigen_maturity_time;
        training.col(6) = eigen_long_term_vol;
        training.col(7) = eigen_mean_reversion_rate;
        training.col(8) = eigen_vol_vol;
        training.col(9) = eigen_price_vol_corr;
        training.col(10) = eigen_option_price;

        std::cout << training << std::endl;

        return training;
    }

    PYBIND11_MODULE(Data_Generation, m) {
        m.def("module", &module);

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

Note that the module function's return type is an Eigen matrix; this can then be used directly as a NumPy data type in Python.

3. The C++ function can then be called from Python as shown:

    import Data_Generation as dg
    import mysql.connector
    import pandas as pd
    import sqlalchemy as db
    import time

    # This script calls the analytical Heston option pricer in C++ to
    # simulate 5,000,000 option prices and store them in a MySQL database.

    # ... do something ...

    if __name__ == "__main__":
        t0 = time.time()

        # A new table (optional)
        SQL_clean_table()

        target_size = 5000000  # Final table size
        batch_size = 500000    # Size generated each iteration

        # Get the number of rows already in the table
        r = SQL_setup(batch_size)

        # Get the number of iterations
        iter = int(target_size / batch_size) - int(r / batch_size)
        print("Iteration(s) required: ", iter)
        count = 0

        print("Now Run C++ code...")
        for i in range(iter):
            count = count + 1
            print("----------------------------------------------------")
            print("Current iteration: ", count)
            cpp = dg.module()  # store data from C++

            # ... do something ...

            r1 = SQL_setup(batch_size)
            print("No. of rows in SQL Classic Heston Table Now: ", r1)
            print("\n")
        t1 = time.time()
        # Calculate total time for the entire program
        total = t1 - t0
        print("THE ENTIRE PROGRAM TAKES", round(total), "s TO COMPLETE")


Appendix B

Asymptotic Expansions for General Stochastic Volatility Models

B.1 Asymptotic Expansions

We present the asymptotic expansions for general stochastic volatility models by (Lorig et al., 2019). Consider a strictly positive spot price S whose risk-neutral dynamics are given by

    S = e^X,                                                          (B.1)
    dX = −(1/2) σ²(t, X, Y) dt + σ(t, X, Y) dW₁,   X(0) = x ∈ ℝ,      (B.2)
    dY = f(t, X, Y) dt + β(t, X, Y) dW₂,           Y(0) = y ∈ ℝ,      (B.3)
    ⟨dW₁, dW₂⟩ = ρ(t, X, Y) dt,   |ρ| < 1.                            (B.4)

The drift of X is selected to be −σ²/2 so that S = e^X is a martingale.

Let C(t) be the time-t value of a European call option maturing at time T > t with payoff function P(X(T)). To value a European-style option using risk-neutral pricing, we must compute functions of the form

    u(t, x, y) = E[P(X(T)) | X(t) = x, Y(t) = y].

It is well known that the function u satisfies the Kolmogorov backward equation (Lorig et al., 2019)

    (∂_t + A(t)) u(t, x, y) = 0,   u(T, x, y) = P(x),                 (B.5)

where the operator A(t) is given explicitly by

    A(t) = a(t, x, y)(∂²_x − ∂_x) + f(t, x, y) ∂_y + b(t, x, y) ∂²_y + c(t, x, y) ∂_x ∂_y,   (B.6)

and where the functions a, b and c are defined as

    a(t, x, y) = (1/2) σ²(t, x, y),
    b(t, x, y) = (1/2) β²(t, x, y),
    c(t, x, y) = ρ(t, x, y) σ(t, x, y) β(t, x, y).


Appendix C

Deep Neural Networks Library in Python: Keras

We show the code snippet for computing the ANN in Keras: the Python code for defining the ANN architecture for rough Heston implied volatility approximation.

    # Define the neural network architecture
    import keras
    from keras import backend as K
    keras.backend.set_floatx('float64')
    from keras.callbacks import EarlyStopping

    # Construct the model
    input_layer = keras.layers.Input(shape=(n,))

    hidden_layer1 = keras.layers.Dense(80, activation='elu')(input_layer)
    hidden_layer2 = keras.layers.Dense(80, activation='elu')(hidden_layer1)
    hidden_layer3 = keras.layers.Dense(80, activation='elu')(hidden_layer2)

    output_layer = keras.layers.Dense(100, activation='linear')(hidden_layer3)

    model = keras.models.Model(inputs=input_layer, outputs=output_layer)

    # summarize layers
    model.summary()

    # User-defined metrics: R2 and Max Abs Error
    # R2
    def coeff_determination(y_true, y_pred):
        SS_res = K.sum(K.square(y_true - y_pred))
        SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
        return 1 - SS_res / (SS_tot + K.epsilon())

    # Max Error
    def max_error(y_true, y_pred):
        return K.max(abs(y_true - y_pred))

    earlystop = EarlyStopping(monitor='val_loss',
                              min_delta=0,
                              mode='min',
                              verbose=1,
                              patience=25)

    # Prepare the model for training
    opt = keras.optimizers.Adam(learning_rate=0.005)
    model.compile(loss='mse', optimizer='adam',
                  metrics=['mae', coeff_determination, max_error])


Bibliography

Alòs, Elisa et al. (June 2017). "Exponentiation of Conditional Expectations Under Stochastic Volatility". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2983180

Andersen, Leif et al. (Jan 2007). "Moment Explosions in Stochastic Volatility Models". In: Finance and Stochastics 11, pp. 29-50. DOI: 10.1007/s00780-006-0011-7

Atkinson, Colin et al. (Jan 2011). "Rational Solutions for the Time-Fractional Diffusion Equation". URL: https://www.researchgate.net/publication/220222966_Rational_Solutions_for_the_Time-Fractional_Diffusion_Equation

Bayer, Christian et al. (Oct 2018). "Deep calibration of rough stochastic volatility models". URL: https://arxiv.org/abs/1810.03399

Black, Fischer et al. (May 1973). "The Pricing of Options and Corporate Liabilities". URL: https://www.cs.princeton.edu/courses/archive/fall09/cos323/papers/black_scholes73.pdf

Carr, Peter and Dilip B. Madan (Mar 1999). "Option valuation using the fast Fourier transform". In: Journal of Computational Finance. URL: https://www.researchgate.net/publication/2519144_Option_Valuation_Using_the_Fast_Fourier_Transform

Chapter 2 Modeling Process: k-fold cross validation. https://bradleyboehmke.github.io/HOML/process.html. Accessed: 2020-08-22

Comte, Fabienne et al. (Oct 1998). "Long Memory In Continuous-Time Stochastic Volatility Models". In: Mathematical Finance 8, pp. 291-323. URL: https://zulfahmed.files.wordpress.com/2015/10/comterenault19981.pdf

Cox, John C. et al. (Jan 1985). "A Theory of the Term Structure of Interest Rates". URL: http://pages.stern.nyu.edu/~dbackus/BCZ/discrete_time/CIR_Econometrica_85.pdf

Create a C++ extension for Python. https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019. Accessed: 2020-07-21

Duffy, Daniel J. (Jan 2018). Financial Instrument Pricing Using C++, Second Edition. John Wiley & Sons. ISBN: 978-0-470-97119-2

Duffy, Daniel J. et al. (2012). Monte Carlo Frameworks: Building customisable high-performance C++ applications. John Wiley & Sons, Inc. ISBN: 9780470060698

Dupire, Bruno (1994). "Pricing with a Smile". In: Risk Magazine, pp. 18-20

Eigen: The Matrix Class. https://eigen.tuxfamily.org/dox/group__TutorialMatrixClass.html. Accessed: 2020-07-22

Eldan, Ronen et al. (May 2016). "The Power of Depth for Feedforward Neural Networks". URL: https://arxiv.org/abs/1512.03965

Euch, Omar El et al. (Sept 2016). "The characteristic function of rough Heston models". URL: https://arxiv.org/pdf/1609.02108.pdf

Euch, Omar El et al. (Mar 2019). "Roughening Heston". URL: https://ssrn.com/abstract=3116887

Fouque, Jean-Pierre et al. (Aug 2012). "Second Order Multiscale Stochastic Volatility Asymptotics: Stochastic Terminal Layer Analysis & Calibration". URL: https://arxiv.org/abs/1208.5802

Gatheral, Jim (2006). The volatility surface: a practitioner's guide. John Wiley & Sons, Inc. ISBN: 978-0-471-79251-2

Gatheral, Jim et al. (Oct 2014). "Volatility is rough". URL: https://arxiv.org/abs/1410.3394

Gatheral, Jim et al. (Jan 2019). "Rational approximation of the rough Heston solution". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3191578

Hebb, Donald O. (1949). The Organization of Behavior: A Neuropsychological Theory. Psychology Press, 1st edition (12 Jun 2002). ISBN: 978-0805843002

Heston, Steven (Jan 1993). "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options". URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.3204&rep=rep1&type=pdf

Hinton, Geoffrey et al. (Feb 2015). "Overview of mini-batch gradient descent". URL: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

Hornik et al. (Mar 1989). "Multilayer feedforward networks are universal approximators". URL: https://deeplearning.cs.cmu.edu/F20/document/readings/Hornik_Stinchcombe_White.pdf

Hornik et al. (Jan 1990). "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks". URL: http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/univ_approx.pdf

Horvath, Blanka et al. (Jan 2019). "Deep Learning Volatility". URL: https://ssrn.com/abstract=3322085

Hurst, Harold E. (Jan 1951). "Long-term storage of reservoirs: an experimental study". URL: https://www.jstor.org/stable/2982267#metadata_info_tab_contents

Hutchinson, James M. et al. (July 1994). "A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks". URL: https://alo.mit.edu/wp-content/uploads/2015/06/A-Nonparametric-Approach-to-Pricing-and-Hedging-Derivative-Securities-via-Learning-Networks.pdf

Ioffe, Sergey et al. (Feb 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". URL: https://arxiv.org/abs/1502.03167

Itkin, Andrey (Jan 2010). "Pricing options with VG model using FFT". URL: https://arxiv.org/abs/physics/0503137

Kahl, Christian et al. (Jan 2006). "Not-so-complex Logarithms in the Heston model". URL: http://www2.math.uni-wuppertal.de/~kahl/publications/NotSoComplexLogarithmsInTheHestonModel.pdf

Kingma, Diederik P. et al. (Feb 2015). "Adam: A Method for Stochastic Optimization". URL: https://arxiv.org/abs/1412.6980

Kwok, Yue-Kuen et al. (2012). Handbook of Computational Finance, Chapter 21. Springer-Verlag Berlin Heidelberg. ISBN: 978-3-642-17254-0

Lewis, Alan (Sept 2001). "A Simple Option Formula for General Jump-Diffusion and Other Exponential Levy Processes". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=282110

Liu, Shuaiqiang et al. (Apr 2019a). "A neural network-based framework for financial model calibration". URL: https://arxiv.org/abs/1904.10523

Liu, Shuaiqiang et al. (Apr 2019b). "Pricing options and computing implied volatilities using neural networks". URL: https://arxiv.org/abs/1901.08943

Lorig, Matthew et al. (Jan 2019). "Explicit implied vols for multifactor local-stochastic vol models". URL: https://arxiv.org/abs/1306.5447v3

Mandara, Dalvir (Sept 2019). "Artificial Neural Networks for Black-Scholes Option Pricing and Prediction of Implied Volatility for the SABR Stochastic Volatility Model". URL: https://www.datasim.nl/application/files/8115/7045/4929/1423101.pdf

McCulloch, Warren et al. (May 1943). "A Logical Calculus of the Ideas Immanent in Nervous Activity". URL: http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf

Mostafa, F. et al. (May 2008). "A neural network approach to option pricing". URL: https://www.witpress.com/elibrary/wit-transactions-on-information-and-communication-technologies/41/18904

Peters, Edgar E. (Jan 1991). Chaos and Order in the Capital Markets: A New View of Cycles, Prices, and Market Volatility. John Wiley & Sons. ISBN: 978-0-471-13938-6

Peters, Edgar E. (Jan 1994). Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons. ISBN: 978-0-471-58524-4. URL: https://www.amazon.co.uk/Fractal-Market-Analysis-Investment-Economics/dp/0471585246

pybind11 Documentation. https://pybind11.readthedocs.io/en/stable/advanced/cast/eigen.html. Accessed: 2020-07-22

Ruf, Johannes et al. (May 2020). "Neural networks for option pricing and hedging: a literature review". URL: https://arxiv.org/abs/1911.05620

Schmelzle, Martin (Apr 2010). "Option Pricing Formulae using Fourier Transform: Theory and Application". URL: https://pfadintegral.com/docs/Schmelzle2010%20Fourier%20Pricing.pdf

Shevchenko, Georgiy (Jan 2015). "Fractional Brownian motion in a nutshell". In: International Journal of Modern Physics: Conference Series 36. URL: https://www.worldscientific.com/doi/abs/10.1142/S2010194515600022

Stone, Henry (July 2019). "Calibrating Rough Volatility Models: A Convolutional Neural Network Approach". URL: https://arxiv.org/abs/1812.05315

Using the C++ eigen library to calculate matrix inverse and determinant. http://people.duke.edu/~ccc14/sta-663-2016/18G_C++_Python_pybind11.html#Using-the-C++-eigen-library-to-calculate-matrix-inverse-and-determinant. Accessed: 2020-07-22

Wilmott, Paul (2006). Paul Wilmott on Quantitative Finance, Vol. 3. John Wiley & Sons, Inc., 2nd Edition. ISBN: 978-0-470-01870-5

Contents

• Declaration of Authorship
• Abstract
• Acknowledgements
• Project's Overview
  • Project's Overview
• Introduction
  • Organisation of This Thesis
• Literature Review
  • Application of ANN in Finance
  • Option Pricing Models
  • The Research Gap
• Option Pricing Models
  • The Heston Model
    • Option Pricing Under Heston Model
    • Heston Model Implied Volatility Approximation
  • Volatility is Rough
  • The Rough Heston Model
    • Rough Heston Pricing Equation
      • Rational Approximation of Rough Heston Riccati Solution
      • Option Pricing Using Fast Fourier Transform
    • Rough Heston Model Implied Volatility
• Artificial Neural Networks
  • Dissecting the Artificial Neural Networks
    • Neurons
    • Input, Output and Hidden Layers
    • Connections, Weights and Biases
    • Forward and Backward Propagation, Loss Functions
    • Activation Functions
    • Optimisation Algorithms and Learning Rate
    • Epochs, Early Stopping and Batch Size
  • Feedforward Neural Networks
    • Deep Neural Networks
  • Common Pitfalls and Remedies
    • Loss Function Stagnation
    • Loss Function Fluctuation
    • Over-fitting
    • Under-fitting
  • Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation
• Data
  • Data Generation and Storage
    • Heston Call Option Data
    • Heston Implied Volatility Data
    • Rough Heston Call Option Data
    • Rough Heston Implied Volatility Data
  • Data Pre-processing
    • Data Splitting: Train - Validate - Test
    • Data Standardisation and Normalisation
    • Data Partitioning
• Neural Networks Architecture and Experimental Setup
  • Neural Networks Architecture
    • ANN Architecture: Option Pricing under Heston Model
    • ANN Architecture: Implied Volatility Surface of Heston Model
    • ANN Architecture: Option Pricing under Rough Heston Model
    • ANN Architecture: Implied Volatility Surface of Rough Heston Model
  • Performance Metrics and Validation Methods
  • Implementation of ANN in Keras
  • Summary of Neural Networks Architecture
• Results and Discussions
  • Heston Option Pricing ANN Results
  • Heston Implied Volatility ANN Results
  • Rough Heston Option Pricing ANN Results
  • Rough Heston Implied Volatility ANN Results
  • Run-time Performance
• Conclusions and Outlook
• pybind11: A step-by-step tutorial with examples
  • Software Prerequisites
  • Create a Python Project
  • Create a C++ Project
  • Convert the C++ project to extension for Python
  • Make the DLL available to Python
  • Call the DLL from Python
  • Example: Store C++ Eigen Matrix as NumPy Arrays
  • Example: Store Data Generated by Analytical Heston Model as NumPy Arrays
• Asymptotic Expansions for General Stochastic Volatility Models
  • Asymptotic Expansions
• Deep Neural Networks Library in Python: Keras
• Bibliography


List of Abbreviations

NN: Neural Networks
ANN: Artificial Neural Networks
DNN: Deep Neural Networks
CNN: Convolutional Neural Networks
SGD: Stochastic Gradient Descent
SV: Stochastic Volatility
CIR: Cox, Ingersoll and Ross
fBm: fractional Brownian motion
FSV: Fractional Stochastic Volatility
RFSV: Rough Fractional Stochastic Volatility
FFT: Fast Fourier Transform
ELU: Exponential Linear Unit
ReLU: Rectified Linear Unit


Chapter 0

Project's Overview

This chapter intends to give a brief overview of the methodology of the project without delving too much into the details.

0.1 Project's Overview

The supervisor advised dividing the problem into smaller, achievable steps, as (Mandara 2019) did. The four main steps are:

• Define the objective

• Formulate the procedures

• Implement the procedures

• Evaluate the results

The objective is to investigate the performance of artificial neural networks (ANN) on vanilla option pricing and implied volatility approximation under the rough Heston model. The performance metrics in this context are accuracy and run-time performance.

Figure 1 shows the flow chart of the implementation procedures. The procedure starts with synthetic data generation. Then we split the synthetic data set into 3 portions: one for training (in-sample), one for validation, and the last one for testing (out-of-sample). Before the data is fed into the ANN for training, it needs to be scaled or normalised, as it has different magnitudes and ranges. In this thesis we use the training method proposed by (Horvath et al 2019), the image-based implicit training approach, for implied volatility surface approximation. Then we test the trained ANN model on an unseen data set to evaluate its accuracy.

The run-time performance of the ANN will be compared with that of the traditional approximation methods. This project requires the use of two programming languages, C++ and Python. The data generation process is done in C++. The wide range of machine learning libraries (Keras, TensorFlow, PyTorch) in Python makes it the logical choice for the neural network computations. The experiment starts with the classical Heston model as a concept validation and then proceeds to the rough Heston model.


FIGURE 1: Implementation Flow Chart. The flow runs from the option pricing models (Heston, rough Heston) through data generation (strike price, maturity, volatility, etc.) and data pre-processing (scaled/normalised training set) to ANN architecture design and ANN training; the trained ANN then produces predictions, which are evaluated for accuracy (option price and implied volatility) and run-time performance, leading to the conclusions.


FIGURE 2: Data flow diagram for ANN on the Rough Heston Pricer. Input model parameters feed the RoughHestonPricer, which produces rough Heston data (model parameters and option prices); the training data and the ANN model hyperparameters feed the Keras ANN trainer, which outputs trained model weights; the ANN predictions (predicted option prices) are compared against the test data (option prices) in the accuracy evaluation, which outputs the error metrics.


Chapter 1

Introduction

Option pricing has always been a major research area in financial mathematics. Various methods and techniques have been developed to price options more accurately and more quickly since the introduction of the well-known Black-Scholes model in 1973 (Black et al 1973). The Black-Scholes model has an analytical solution and only one parameter, the volatility, that requires calibration. This makes it very easy to implement. Its simplicity, however, comes at the expense of many unrealistic assumptions. One of its biggest flaws is the assumption of a constant volatility term. This assumption simply does not hold in the real market. With this assumption, the Black-Scholes model fails to capture the volatility dynamics exhibited by the market and so is unable to value options accurately.

In 1993, (Heston 1993) developed a stochastic volatility model in an attempt to better capture the volatility dynamics. The Heston model assumes the volatility of the underlying asset to be driven by a standard Brownian motion (Hurst parameter H = 1/2). The Heston model also has a closed-form analytical solution, and this makes it a strong alternative to the Black-Scholes model.

Whilst the Heston model is more realistic than the Black-Scholes model, the generated option prices do not match the observed vanilla option prices consistently (Gatheral et al 2014). Recent research shows that more accurate option prices can be estimated if the volatility process is modelled as a fractional Brownian motion with H < 1/2. When H < 1/2, the volatility process path appears rougher than when H ≥ 1/2, and hence the volatility is described as rough.

(Euch et al 2019) introduce the rough volatility version of the Heston model, which combines the best of both worlds. However, the calibration of the rough Heston model relies on the notoriously slow Monte Carlo method due to its non-Markovian nature. This obstacle is the reason that rough Heston struggles to find its way into industrial applications. Although there exist many approximation methods for the rough Heston model to help solve this problem, we are interested in the feasibility of machine learning algorithms.

Recent advancements in machine learning algorithms could potentially provide an alternative means for this problem. Specifically, artificial neural networks (ANN) offer the best hope, because they are capable of learning complex functional relationships between input and output data and then making predictions instantly.


In this project, our objective is to investigate the accuracy and run-time performance of ANN for European call option price predictions and implied volatility approximations under the rough Heston model. We start off by using the Heston model as a concept validation, then proceed to its rough volatility version. For regulatory purposes, we use data simulated by option pricing models for training.

1.1 Organisation of This Thesis

This thesis is organised as follows. In Chapter 2 we review some of the related work and provide justifications for our research. In Chapter 3 we discuss the theory of the option pricing models of our interest. We provide a brief introduction to ANN and some techniques to improve their performance in Chapter 4. Then we explain the tools we used for the data generation process and some useful techniques to speed up the process in Chapter 5; in the same chapter we also discuss the important data pre-processing procedures. Chapter 6 shows our network architecture and experimental setups. We present the results and discuss our findings in Chapter 7. Finally, we conclude our findings and discuss potential future work in Chapter 8.

FIGURE 1.1: Problem Solving Process. (START) Problem: the rough Heston model has low industrial application despite giving accurate option prices. Root cause: slow calibration (pricing and implied volatility approximation) process. Potential solution: apply ANN to pricing and implied volatility approximation. Implementation: ANN computation using Keras in Python. Evaluation: accuracy and speed of ANN predictions. (END) Conclusions: ANN is feasible or not.


Chapter 2

Literature Review

In this chapter we intend to provide a review of the current literature and highlight the research gap. The focus of our review is the application of ANN to option pricing models. We justify our methodology and our choice of option pricing models, and also provide some critiques.

2.1 Application of ANN in Finance

Option pricing has always been a hot topic in academia and the financial industry. Researchers have been looking for various means to price options more accurately and more quickly. As computational technology advances, the application of ANN has gained popularity in solving complex problems in many different industries. In quantitative finance, the idea of utilising ANN for pricing derivatives comes from its ability to make fast and accurate predictions once it is trained.

The application of ANN in option pricing began with (Hutchinson et al 1994). The authors were the pioneers in applying ANN for pricing and hedging derivatives. Since then, more than 100 research papers have been written on this topic. It is also worth noting that the complexity of the ANN architectures used in these papers has increased over the years. Whilst there has been much research on the applications of ANN in option pricing, only about 10% of the papers focus on implied volatility (Ruf et al 2020). The feasibility of using ANN for implied volatility approximation is equally as important as for option pricing, since both are part of the calibration process. Recently, more research papers focus on implied volatility and calibration, such as (Mostafa et al 2008), (Liu et al 2019a), (Liu et al 2019b) and (Horvath et al 2019). A key contribution of (Horvath et al 2019)'s work is the introduction of the image-based implicit training method for implied volatility surface approximation. This method is allegedly robust and fast for approximating the entire implied volatility surface.

A significant portion of the research papers use real market data in their research. Currently, there is no standardised framework for deciding the architecture and parameters of an ANN. Hence, training the ANN directly with market data might pose some issues when it comes to regulatory compliance. (Horvath et al 2019) is one of the few papers that uses simulated data for ANN training. The simulated data is generated from the Heston and rough Bergomi models. Using simulated data removes the regulatory concerns, as


the functional mapping from model parameters to option price is known and tractable.

The application of ANN requires splitting the entire data set into training, validation and testing sets. Most of the papers that use real market data ensure the data splitting process is chronological, to preserve the time series structure of the data. This is not a concern in our case, as we are using simulated data.

The results in (Horvath et al 2019) show very accurate predictions. However, after examining the source code, we discover that the ANN is validated and tested on the same data set. Thus, the high accuracy results do not give any information about how well the trained ANN generalises to unseen data. In our experiment we will use different data sets for validation and testing.

Common error metrics used include mean absolute error (MAE), mean absolute percentage error (MAPE) and mean squared error (MSE). We suggest a new metric, the maximum absolute error, as it is useful from a risk management perspective for quantifying the largest prediction error in the worst-case scenario.

2.2 Option Pricing Models

In terms of the choice of option pricing models, over 80% of the papers select the Black-Scholes model or its extensions in their research (Ruf et al 2020). As mentioned before, there are other models that can give better option prices than Black-Scholes, such as the stochastic volatility models. (Liu et al 2019b) investigated the application of ANN in the Heston and Bates models.

According to (Gatheral et al 2014), modelling the volatility as rough volatility can better capture the volatility dynamics in the market. Rough volatility models struggle to find their way into the industry due to their slow calibration process. Hence, it is natural for the direction of ANN research to move towards rough volatility models.

Some of the recent examples include (Stone 2019), (Bayer et al 2018) and (Horvath et al 2019). (Stone 2019) investigates the calibration of the rough Bergomi model using Convolutional Neural Networks (CNN), whilst (Bayer et al 2018) study the calibration of the Heston and rough Bergomi models using Deep Neural Networks (DNN). Contrary to these papers, where calibration is performed directly via ANN in one step, (Horvath et al 2019) split the calibration of the rough Bergomi model into two steps. Firstly, the ANN is trained to learn the pricing equation during a so-called offline session. The trained ANN is then deterministic and stored for the online calibration session later. This approach can speed up the online calibration by a huge margin.

2.3 The Research Gap

In our thesis we will be using the approach in (Horvath et al 2019), that is, to train the network to learn the pricing and implied volatility functional


mappings separately. We leave the model calibration step as the future outlook of this project. We choose to work with the rough Heston model, another popular rough volatility model that has not been investigated yet. We adapt the image-based implicit training method in our research. To re-iterate, we validate and test our ANN with different data sets to obtain a true measure of the ANN's performance on unseen data.


Chapter 3

Option Pricing Models

In this chapter we discuss the option pricing models of our interest: the Heston and rough Heston models. The idea of modelling volatility as a rough fractional Brownian motion will also be presented. We then derive their pricing and implied volatility equations and explain their implementations for data generation.

The Black-Scholes model is simple but unrealistic for real-world applications, as its assumptions simply do not hold. One of its greatest criticisms is the assumption of constant volatility of the underlying asset price. (Dupire 1994) extended the Black-Scholes framework to give local volatility models. He modelled the volatility as a deterministic function of the underlying price and time, with the function selected such that its outputs match observed European option prices. This extension is time-inhomogeneous, and the dynamics it exhibits are still unrealistic. Thus, it is not able to produce future volatility surfaces that emulate our observations.

Stochastic Volatility (SV) models were then introduced, in which the volatility of the underlying is itself modelled as a stochastic process. Recent research shows that rough volatility can capture the true volatility dynamics better, and this leads to the rough volatility models.

3.1 The Heston Model

One of the most popular SV models is the one proposed by (Heston 1993); its popularity is due to the fact that its pricing equation for European options is analytically tractable. This property allows calibration procedures to be done efficiently.

The Heston model for a one-dimensional spot price S follows the stochastic process

dS = μS dt + √ν S dW₁   (3.1)

and its instantaneous variance ν follows a Cox, Ingersoll and Ross (CIR) process (Cox et al 1985):

dν = κ(θ − ν) dt + ξ√ν dW₂   (3.2)

⟨dW₁, dW₂⟩ = ρ dt   (3.3)

where

• μ is the (deterministic) rate of return of the spot

• θ is the long-term mean volatility

• ξ is the volatility of volatility

• κ is the mean reversion rate of the volatility

• ρ is the correlation between spot and volatility moves

• W₁ and W₂ are Wiener processes

3.1.1 Option Pricing Under Heston Model

In this section we follow (Gatheral 2006) closely. We start by forming a portfolio Π containing the option being priced V, a quantity −Δ of the stock, and a quantity −Δ₁ of another asset whose value is V₁:

Π = V − ΔS − Δ₁V₁

The change in the portfolio during dt is

dΠ = [∂V/∂t + ½νS² ∂²V/∂S² + ρξνS ∂²V/∂ν∂S + ½ξ²ν ∂²V/∂ν²] dt
− Δ₁ [∂V₁/∂t + ½νS² ∂²V₁/∂S² + ρξνS ∂²V₁/∂ν∂S + ½ξ²ν ∂²V₁/∂ν²] dt
+ [∂V/∂S − Δ₁ ∂V₁/∂S − Δ] dS + [∂V/∂ν − Δ₁ ∂V₁/∂ν] dν

Note that the explicit dependence of S and ν on t has been removed for simplicity. We can make the portfolio instantaneously risk-free by eliminating the dS and dν terms:

∂V/∂S − Δ₁ ∂V₁/∂S − Δ = 0

and

∂V/∂ν − Δ₁ ∂V₁/∂ν = 0

So we are left with

dΠ = [∂V/∂t + ½νS² ∂²V/∂S² + ρξνS ∂²V/∂ν∂S + ½ξ²ν ∂²V/∂ν²] dt
− Δ₁ [∂V₁/∂t + ½νS² ∂²V₁/∂S² + ρξνS ∂²V₁/∂ν∂S + ½ξ²ν ∂²V₁/∂ν²] dt
= rΠ dt
= r(V − ΔS − Δ₁V₁) dt


Here we have applied the no-arbitrage principle: the return of a risk-free portfolio must be equal to the risk-free rate. We assume the risk-free rate to be deterministic in our case.

Rearranging the above equation by collecting the V terms on the left-hand side and the V₁ terms on the right-hand side gives

[∂V/∂t + ½νS² ∂²V/∂S² + ρξνS ∂²V/∂ν∂S + ½ξ²ν ∂²V/∂ν² + rS ∂V/∂S − rV] / (∂V/∂ν)
= [∂V₁/∂t + ½νS² ∂²V₁/∂S² + ρξνS ∂²V₁/∂ν∂S + ½ξ²ν ∂²V₁/∂ν² + rS ∂V₁/∂S − rV₁] / (∂V₁/∂ν)

Since the left-hand side does not depend on V₁ and the right-hand side does not depend on V, both ratios must be equal to some function f(S, ν, t). Thus we have

∂V/∂t + ½νS² ∂²V/∂S² + ρξνS ∂²V/∂ν∂S + ½ξ²ν ∂²V/∂ν² + rS ∂V/∂S − rV = f ∂V/∂ν

In the case of the Heston model, f is chosen as

f = −(κ(θ − ν) − λ√ν)

where λ(S, ν, t) is called the market price of volatility risk. We do not delve into the choice of f and the concept of λ(S, ν, t) any further here; see (Wilmott 2006) for the full arguments.

(Heston 1993) assumed in his paper that λ(S, ν, t) is linear in the instantaneous variance ν, so as to retain the form of the equation under the transformation from the statistical measure to the risk-neutral measure. Here we are only interested in pricing the options, so we can set λ(S, ν, t) to zero (Gatheral 2006).

Hence we arrive at

∂V/∂t + ½νS² ∂²V/∂S² + ρξνS ∂²V/∂ν∂S + ½ξ²ν ∂²V/∂ν² + rS ∂V/∂S − rV = κ(ν − θ) ∂V/∂ν   (3.4)

According to (Heston 1993), the price of an undiscounted European call option C with strike price K and time to maturity T has a solution of the form

C(S, K, ν, T) = ½(F − K) + ∫₀^∞ (F·f₁ − K·f₂) du   (3.5)

where

f₁ = Re[e^{−iu ln K} φ(u − i) / (iuF)]

f₂ = Re[e^{−iu ln K} φ(u) / (iu)]

φ(u) = E(e^{iu ln S_T})

F = S e^{μT}

The evaluation of (3.5) requires the computation of Fourier inversion integrals, which is susceptible to numerical instabilities due to the complex logarithms involved. (Kahl et al 2006) proposed an integration scheme to compute the Fourier inversion integral by applying some transformations to the asymptotic structure. The solution then becomes

C(S, K, ν, T) = ∫₀¹ y(x) dx,  x ∈ ℝ   (3.6)

where

y(x) = ½(F − K) + [F·f₁(−ln x / C∞) − K·f₂(−ln x / C∞)] / (x π C∞)

The limits at the boundaries of the integral are

lim_{x→0} y(x) = ½(F − K)

and

lim_{x→1} y(x) = ½(F − K) + [F·lim_{u→0} f₁(u) − K·lim_{u→0} f₂(u)] / (π C∞)

Equation (3.6) provides a more robust pricing method for options with moderate to long maturities or strong mean reversion. For the full derivation of equation (3.6), see (Kahl et al 2006). The source code implementing equation (3.6) was provided by the supervisor and is the source code for (Duffy et al 2012).

3.1.2 Heston Model Implied Volatility Approximation

Before the model is used for pricing options, the model's parameters need to be calibrated so that it can return current market prices (at least to some degree of accuracy). That is, the unmeasurable model parameters are obtained by calibrating to the implied volatilities observed on the market. This motivates the need for fast implied volatility computation.

Unfortunately, there is no closed-form analytical formula for calculating implied volatility under the Heston model. There are, however, many closed-form approximations for the Heston implied volatility.

In this project we select the implied volatility approximation method for Heston proposed by (Lorig et al 2019), which allegedly outperforms other methods such as the one by (Fouque et al 2012). (Lorig et al 2019) derive a family of asymptotic expansions for European-style option implied volatility. Their method provides an explicit approximation and requires no special functions, not even numerical integration, and so it is faster to compute than its counterparts.

We follow closely the derivation steps outlined in (Lorig et al 2019). Consider the Heston model in Section 3.1. According to (Andersen et al 2007), ρ must be set negative to avoid moment explosion. To circumvent this, we apply the change of variables (X(t), V(t)) = (ln S(t), e^{κt}ν). Using the asymptotic expansions for general stochastic volatility models (see Appendix B) and Ito's formula, we have

dX = −½ e^{−κt}V dt + √(e^{−κt}V) dW₁,  X(0) = x = ln s   (3.7)

dV = θκ e^{κt} dt + ξ √(e^{κt}V) dW₂,  V(0) = ν = z > 0   (3.8)

⟨dW₁, dW₂⟩ = ρ dt   (3.9)

The parameters ν, κ, θ, ξ, W₁, W₂ play the same role as in Section 3.1. Then we apply the second-order differential operator A(t) associated with (X, V):

A(t) = ½ e^{−κt}v (∂²ₓ − ∂ₓ) + θκ e^{κt} ∂ᵥ + ½ ξ² e^{κt}v ∂²ᵥ + ξρv ∂ₓ∂ᵥ

Define T as the time to maturity and k as the log-strike. Using the time-dependent Taylor series expansion of A(T) with (x̄(T), v̄(T)) = (X₀, E[V(T)]) = (x, v + θ(e^{κT} − 1)) gives (see Appendix B)

σ₀ = √[ (−θ + θκT + e^{−κT}(θ − v) + v) / (κT) ]

σ₁ = ξρ z e^{−κT} (−2v̄ − v̄κT − e^{κT}(v̄(κT − 2) + v) + κTv + v) / (√2 κ²σ₀²T^{3/2})

σ₂ = [ξ² e^{−2κT} / (32 κ⁴ σ₀⁵ T³)] ×
[ −2√2 κσ₀³T^{3/2} z (−v̄ − 4e^{κT}(v̄ + κT(v̄ − v)) + e^{2κT}(v̄(5 − 2κT) − 2v) + 2v)
+ κσ₀²T (4z² − 2) (v̄ + e^{2κT}(−5v̄ + 2v̄κT + 8ρ²(v̄(κT − 3) + v) + 2v))
+ κσ₀²T (4z² − 2) (4e^{κT}(v̄ + v̄κT + ρ²(v̄(κT(κT + 4) + 6) − v(κT(κT + 2) + 2)) − κTv) − 2v)
+ 4√2 ρ²σ₀√T z (2z² − 3) (−2v̄ − v̄κT − e^{κT}(v̄(κT − 2) + v) + κTv + v)²
+ 4ρ² (4(z² − 3)z² + 3) (−2v̄ − v̄κT − e^{κT}(v̄(κT − 2) + v) + κTv + v)² ]
− σ₁² (4(x − k)² − σ₀⁴T²) / (8σ₀³T)

where

z = (x − k − σ₀²T/2) / (σ₀√(2T))


The explicit expression for σ₃ is too long to be included here; see (Lorig et al 2019) for the full derivation. The explicit expression for the implied volatility approximation is

σ_implied = σ₀ + σ₁z + σ₂z² + σ₃z³   (3.10)

The C++ source code implementing this approximation is available online: Heston Implied Volatility Approximation.


3.2 Volatility is Rough

We start this section by introducing the fractional Brownian motion (fBm), which is a generalisation of Brownian motion.

Definition 3.2.1. Fractional Brownian motion (fBm) is a centered Gaussian process B^H_t, t ≥ 0, with the autocovariance function

E[B^H_t B^H_s] = ½(|t|^{2H} + |s|^{2H} − |t − s|^{2H})   (3.11)

where H ∈ (0, 1) is the Hurst index or Hurst parameter associated with the fractional Brownian motion.

The Hurst parameter is a measure of the long-term memory of a time series. It was first introduced by (Hurst 1951) for modelling water levels of the Nile river in order to determine the optimum dam size. As it turns out, it is also suitable for modelling time series data from other natural systems. (Peters 1991) and (Peters 1994) are some of the earliest examples of applying this concept to financial time series modelling. A time series can be classified into 3 categories based on the Hurst parameter:

1. 0 < H < 1/2: negative correlation in the increments of the process. This is a mean-reverting process, which implies an increase in value is more likely to be followed by a decrease in value, and vice versa. The smaller the value, the greater the mean-reverting effect.

2. H = 1/2: no correlation in the increments of the process (geometric Brownian motion or Wiener process).

3. 1/2 < H < 1: positive correlation in the increments of the process. This is a trend-reinforcing process, which means the direction of the next value is more likely to be the same as the current value. The strength of the trend is greater when H is closer to 1.

Empirical evidence shows that log-volatility time series behave similarly to a fBm with Hurst parameter of order 0.1 (Gatheral et al 2014). Essentially all the statistical stylised facts of volatility can be captured when modelling it as a rough fBm.

This motivates the use of the fractional stochastic volatility (FSV) model by (Comte et al 1998). Our model is called the rough fractional stochastic volatility (RFSV) model, to emphasise that H < 1/2. The term 'rough' refers to the roughness of the path of the fBm. From Figure 3.1, one can notice that the fBm path gets 'rougher' as H decreases; the sketch below shows one way to reproduce this behaviour.
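As an illustration (this sketch is ours and is not part of the thesis toolchain; the Cholesky method and the parameter values are just one convenient choice), fBm paths can be simulated directly from the covariance in equation (3.11):

    import numpy as np

    def fbm_cholesky(n, H, T=1.0, seed=0):
        # Simulate fBm at n equally spaced times in (0, T] by Cholesky-
        # factorising the covariance E[B^H_t B^H_s] of equation (3.11)
        t = np.linspace(T / n, T, n)
        cov = 0.5 * (t[:, None]**(2 * H) + t[None, :]**(2 * H)
                     - np.abs(t[:, None] - t[None, :])**(2 * H))
        L = np.linalg.cholesky(cov)          # cov is positive definite
        rng = np.random.default_rng(seed)
        return t, L @ rng.standard_normal(n)

    # A rough path (H = 0.1) versus a smoother, trend-reinforcing path (H = 0.8)
    t, rough_path = fbm_cholesky(500, H=0.1)
    t, smooth_path = fbm_cholesky(500, H=0.8)

Plotting the two paths reproduces the qualitative behaviour of Figure 3.1: the smaller the H, the rougher the trajectory.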


FIGURE 3.1: Paths of fBm for different values of H. Adapted from (Shevchenko 2015).

3.3 The Rough Heston Model

In this section we introduce the rough volatility version of the Heston model and derive its pricing and implied volatility equations.

As stated before, rough volatility models can fit the volatility surface notably well with very few parameters. Hence, it is natural to modify the Heston model and consider its rough version.

The rough Heston model for a one-dimensional asset price S with instantaneous variance ν(t) takes the form given in (Euch et al 2016):

dS = S√ν dW₁

ν(t) = ν(0) + (1/Γ(α)) ∫₀ᵗ (t − s)^{α−1} κ(θ − ν(s)) ds + (ξ/Γ(α)) ∫₀ᵗ (t − s)^{α−1} √ν(s) dW₂   (3.12)

⟨dW₁, dW₂⟩ = ρ dt

where

• α = H + 1/2 ∈ (1/2, 1) governs the roughness of the volatility sample paths

• κ is the mean reversion rate

• θ is the long-term mean volatility

• ν(0) is the initial volatility

• ξ is the volatility of volatility

• Γ is the Gamma function

• W₁ and W₂ are two Brownian motions with correlation ρ

When α = 1 (H = 1/2), we recover the classical Heston model of Section 3.1.

3.3.1 Rough Heston Pricing Equation

The pricing equation of the rough Heston model is inspired by the classical Heston one. In the classical Heston model, the option price can be obtained by applying Fourier inversion integrals to the characteristic function, which is expressed in terms of the solution of a Riccati equation. The rough Heston model displays a similar structure, but with the Riccati equation replaced by a fractional Riccati equation.

(Euch et al 2016) derived a characteristic function for the rough Heston model. This result is particularly important, as the non-Markovian nature of the fractional Brownian motion makes other numerical pricing methods (e.g. Monte Carlo) difficult to implement.

Definition 3.3.1. The fractional integral of order r ∈ (0, 1] of a function f is

I^r f(t) = (1/Γ(r)) ∫₀ᵗ (t − s)^{r−1} f(s) ds   (3.13)

whenever the integral exists, and the fractional derivative of order r ∈ (0, 1] is

D^r f(t) = (1/Γ(1 − r)) (d/dt) ∫₀ᵗ (t − s)^{−r} f(s) ds   (3.14)

whenever it exists.

We now quote the results from (Euch et al 2016) directly; for the full proof, see Section 6 of (Euch et al 2016).

Theorem 3.3.1. For all t ≥ 0, the characteristic function of the log-price X_t = ln(S(t)/S(0)) is

φ_X(a, t) = E[e^{iaX_t}] = exp[κθ I¹h(a, t) + ν(0) I^{1−α}h(a, t)]   (3.15)

where h is the solution of the fractional Riccati equation

D^α h(a, t) = −½ a(a + i) + κ(iaρξ − 1) h(a, t) + (ξ²/2) h²(a, t),  I^{1−α}h(a, 0) = 0   (3.16)

which admits a unique continuous solution for a in 𝒞, a suitable region in the complex plane (see Definition 3.3.2).


Definition 3.3.2. We define the strip in the complex plane relevant to the computation of the characteristic function in Theorem 3.3.1:

𝒞 = { z ∈ ℂ : ℜ(z) ≥ 0, −1/(1 − ρ²) ≤ ℑ(z) ≤ 0 }   (3.17)

where ℜ and ℑ denote real and imaginary parts, respectively.

There is no explicit solution to equation (3.16), so it has to be solved numerically. (Gatheral et al 2019) present a simple rational approximation to the solution of the rough Heston Riccati equation, which is inevitably faster than the numerical schemes.

3.3.1.1 Rational Approximation of Rough Heston Riccati Solution

Firstly, the short-time series expansion of the solution by (Alos et al 2017) can be written as

h_s(a, t) = Σ_{j=0}^∞ [Γ(1 + jα) / Γ(1 + (j + 1)α)] β_j(a) ξ^j t^{(j+1)α}   (3.18)

with

β₀(a) = −½ a(a + i)

β_k(a) = ½ Σ_{i,j=0}^{k−2} 1_{i+j=k−2} β_i(a) β_j(a) [Γ(1 + iα) / Γ(1 + (i + 1)α)] [Γ(1 + jα) / Γ(1 + (j + 1)α)]
+ iρa [Γ(1 + (k − 1)α) / Γ(1 + kα)] β_{k−1}(a)

Again from (Alos et al 2017), the long-time expansion is

h_l(a, t) = r₋ Σ_{k=0}^∞ [γ_k (ξ t^α)^{−k}] / [A^k Γ(1 − kα)]   (3.19)

where

γ₁ = −γ₀ = −1

γ_k = −γ_{k−1} + (r₋ / 2A) Σ_{i,j=1}^∞ 1_{i+j=k} γ_i γ_j [Γ(1 − kα) / (Γ(1 − iα) Γ(1 − jα))]

A = √(a(a + i) − ρ²a²)

r± = −iρa ± A

Now we can construct the global approximation. Using the same techniques as in (Atkinson et al 2011), the rational approximation of h(a, t) is given by

h^{(m,n)}(a, t) = [Σ_{i=1}^m p_i (ξt^α)^i] / [Σ_{j=0}^n q_j (ξt^α)^j],  q₀ = 1   (3.20)

Numerical experiments in (Gatheral et al 2019) showed that h^{(3,3)} is a good choice for high accuracy and fast computation. Thus we can write explicitly

h^{(3,3)}(a, t) = [p₁(ξt^α) + p₂(ξt^α)² + p₃(ξt^α)³] / [1 + q₁(ξt^α) + q₂(ξt^α)² + q₃(ξt^α)³]   (3.21)

Thus, from equations (3.18) and (3.19) we have

h_s(a, t) = [Γ(1)/Γ(1 + α)] β₀(a) t^α + [Γ(1 + α)/Γ(1 + 2α)] β₁(a) ξ (t^α)² + [Γ(1 + 2α)/Γ(1 + 3α)] β₂(a) ξ² (t^α)³ + O(t^{4α})

h_s(a, t) = b₁(t^α) + b₂(t^α)² + b₃(t^α)³ + O[(t^α)⁴]

and

h_l(a, t) = r₋γ₀/Γ(1) + [r₋γ₁/(AΓ(1 − α)ξ)] (1/t^α) + [r₋γ₂/(A²Γ(1 − 2α)ξ²)] (1/(t^α)²) + O[1/(t^α)³]

h_l(a, t) = g₀ + g₁ (1/t^α) + g₂ (1/(t^α)²) + O[1/(t^α)³]

Matching the coefficients of equation (3.21) to h_s and h_l forces the coefficients b_i and g_j to satisfy

p₁ξ = b₁   (3.22)

(p₂ − p₁q₁)ξ² = b₂   (3.23)

(p₁q₁² − p₁q₂ − p₂q₁ + p₃)ξ³ = b₃   (3.24)

(p₃ξ³) / (q₃ξ³) = g₀   (3.25)

[(p₂q₃ − p₃q₂)ξ⁵] / (q₃²ξ⁶) = g₁   (3.26)

[(p₁q₃² − p₂q₂q₃ − p₃q₁q₃ + p₃q₂²)ξ⁷] / (q₃³ξ⁹) = g₂   (3.27)


The solution of this system is

p₁ = b₁/ξ   (3.28)

p₂ = [b₁³g₁ + b₁²g₀g₂ + b₁b₂g₀g₁ − b₁b₃g₀g₂ + b₁b₃g₁² + b₂²g₀g₂ − b₂²g₁² + b₂g₀³] / [(b₁²g₂ + 2b₁g₀g₁ + b₂g₀g₂ − b₂g₁² + g₀³)ξ²]   (3.29)

p₃ = g₀q₃   (3.30)

q₁ = [b₁²g₁ − b₁b₂g₂ + b₁g₀² − b₂g₀g₁ − b₃g₀g₂ + b₃g₁²] / [(b₁²g₂ + 2b₁g₀g₁ + b₂g₀g₂ − b₂g₁² + g₀³)ξ]   (3.31)

q₂ = [b₁²g₀ − b₁b₂g₁ − b₁b₃g₂ + b₂²g₂ + b₂g₀² − b₃g₀g₁] / [(b₁²g₂ + 2b₁g₀g₁ + b₂g₀g₂ − b₂g₁² + g₀³)ξ²]   (3.32)

q₃ = [b₁³ + 2b₁b₂g₀ + b₁b₃g₁ − b₂²g₁ + b₃g₀²] / [(b₁²g₂ + 2b₁g₀g₁ + b₂g₀g₂ − b₂g₁² + g₀³)ξ³]   (3.33)

3.3.1.2 Option Pricing Using Fast Fourier Transform

After solving the Riccati equation using the rational approximation (3.21), we can compute the characteristic function φ_X(a, t) in equation (3.15). Once the characteristic function is obtained, many methods are available to compute call option prices; see (Carr and Madan 1999), (Itkin 2010), (Lewis 2001) and (Schmelzle 2010). In this project we select the Carr and Madan approach, a fast option pricing method that uses the fast Fourier transform (FFT). Suppose k = ln K and C(k, T) is the value of a European call option with strike K and maturity T. Unfortunately, the FFT cannot be applied directly to evaluate the call option price, because the call pricing function is not square-integrable. A key contribution of (Carr and Madan 1999) is the modification of the call pricing function with respect to k. With this modification, a whole range of option prices can be obtained with one inverse Fourier transform.

Consider the modified call price c(k, T):

c(k, T) = e^{βk} C(k, T),  β > 0   (3.34)

Its Fourier transform is

ψ_X(u, T) = ∫_{−∞}^{∞} e^{iuk} c(k, T) dk,  u ∈ ℝ₊   (3.35)

which can be expressed as

ψ_X(u, T) = e^{−rT} φ_X(u − (β + 1)i, T) / [β² + β − u² + i(2β + 1)u]   (3.36)

where r is the risk-free rate and φ_X(·) is the characteristic function defined in equation (3.15).


The purpose of β is to assist the integrability of the modified call pricing function over the negative log-strike axis. A sufficient condition is that ψ_X(0, T) be finite:

ψ_X(0, T) < ∞   (3.37)

⟹ φ_X(−(β + 1)i, T) < ∞   (3.38)

⟹ exp[θκ I¹h(−(β + 1)i, T) + ν(0) I^{1−α}h(−(β + 1)i, T)] < ∞   (3.39)

Using equation (3.39), we can determine the upper bound on β such that condition (3.37) is satisfied. (Carr and Madan 1999) suggest that one quarter of the upper bound serves as a good choice for β.

The call price C(k, T) can be recovered by applying the inverse Fourier transform:

C(k, T) = (e^{−βk}/2π) ∫_{−∞}^{∞} ℜ[e^{−iuk} ψ_X(u, T)] du   (3.40)

= (e^{−βk}/π) ∫₀^{∞} ℜ[e^{−iuk} ψ_X(u, T)] du   (3.41)

The semi-infinite integral in the call price equation can be evaluated by numerical methods such as the trapezoidal rule and FFT, as shown in (Kwok et al 2012):

C(k, T) ≈ (e^{−βk}/π) Σ_{j=1}^{N} e^{−iu_j k} ψ_X(u_j, T) Δu   (3.42)

where u_j = (j − 1)Δu, j = 1, 2, ..., N. We end this section with a summary of the steps required to price a call option under the rough Heston model; a small code sketch of steps 4 and 5 follows the list.

1. Compute the solution of the fractional Riccati equation (3.16), h, using equation (3.21).

2. Compute the characteristic function in equation (3.15).

3. Determine a suitable value for β using equation (3.39).

4. Apply the Fourier transform in equation (3.36).

5. Evaluate the call option price by implementing the FFT in equation (3.42).
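To make steps 4 and 5 concrete, here is a minimal Python sketch of the damp-transform-and-sum recipe of equations (3.36) and (3.42). The characteristic function passed in is a stand-in (a Black-Scholes one, purely for demonstration); in the actual pipeline it would be the rough Heston φ_X of equation (3.15), and the damping and grid parameters are illustrative:

    import numpy as np

    def carr_madan_call(phi, k, T, r=0.0, beta=1.5, N=4096, du=0.25):
        # phi: characteristic function of the log-price, phi(u) = E[exp(iu ln S_T)]
        # k: log-strike; beta: damping parameter chosen per equation (3.39)
        u = np.arange(N) * du                      # u_j = (j - 1) * du
        # Equation (3.36): Fourier transform of the damped call price
        psi = np.exp(-r * T) * phi(u - (beta + 1) * 1j) / (
            beta**2 + beta - u**2 + 1j * (2 * beta + 1) * u)
        # Equation (3.42): quadrature sum recovering the call price
        integrand = np.real(np.exp(-1j * u * k) * psi)
        return np.exp(-beta * k) / np.pi * np.sum(integrand) * du

    # Stand-in characteristic function (Black-Scholes), for demonstration only
    S0, sigma, r, T = 100.0, 0.2, 0.0, 1.0
    phi_bs = lambda u: np.exp(1j * u * (np.log(S0) + (r - 0.5 * sigma**2) * T)
                              - 0.5 * sigma**2 * u**2 * T)
    print(carr_madan_call(phi_bs, k=np.log(100.0), T=T, r=r))

A finer grid (larger N, smaller Δu) improves the accuracy of the quadrature at the cost of run time.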

3.3.2 Rough Heston Model Implied Volatility

(Euch et al 2019) present an almost instantaneous and accurate approximation method to compute rough Heston implied volatilities. This method allows us to approximate the rough Heston implied volatility by simply scaling the volatility of volatility parameter ν and feeding this scaled parameter as input into the classical Heston implied volatility approximation of Section 3.1.2. For a given maturity T, the scaled volatility of volatility parameter ν̃ is

ν̃ = √(3/(2H + 2)) · [ν / Γ(H + 3/2)] · (1 / T^{1/2−H})   (3.43)


FIGURE 3.2: Implied volatility smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al 2019).


Chapter 4

Artificial Neural Networks

This chapter starts with an introduction to Artificial Neural Networks (ANN). We introduce the terminologies and basic elements of ANN and explain the methods that we used to increase training speed and predictive accuracy. We also discuss the training process for ANN, common pitfalls and their remedies. Finally, we discuss the ANN type of interest and extend the discussion to Deep Neural Networks (DNN).

The concept of ANN has come a long way. It started off with the modelling of a simple neural network after electrical circuits, proposed by Warren McCulloch, a neurologist, and a young mathematician, Walter Pitts, in 1943 (McCulloch et al 1943). This idea was further strengthened by Donald Hebb (Hebb 1949).

ANN's ability to learn complex, non-linear functional mappings, by deciphering the pattern between independent and dependent variables, has found applications in many fields. With computational resources becoming more accessible and big data sets available, ANN's popularity has grown exponentially in recent years.

4.1 Dissecting the Artificial Neural Networks

FIGURE 4.1: A Simple Neural Network

Before we delve deeper into the world of ANN, let us explain what parameters and hyperparameters mean.


Parameters refer to the variables that are updated throughout the training process; they include weights and biases. Hyperparameters are the configuration variables that define the architecture of the ANN and stay constant throughout the training process. Hyperparameters include the number of neurons, the number of hidden layers, the activation functions and the number of epochs, etc.

4.1.1 Neurons

Like a biological neuron, an artificial neuron is the fundamental working unit in an ANN. It is essentially a mathematical function that receives multiple inputs and generates an output. More precisely, it takes the weighted sum of the inputs and applies an activation function to the sum to generate an output. In the case of a simple ANN (as shown in Figure 4.1) this will be the network output. In more sophisticated ANNs, the output of a neuron can be the input of other neurons.

4.1.2 Input, Output and Hidden Layers

An input layer is the first layer of an ANN. It receives and sends the training data into the network for further processing. The number of input neurons is equal to the number of input features of the training data.

The output layer is the final layer of the network. It outputs the predictions for a given set of input features, and it can have more than one output neuron.

A hidden layer is a layer between the input and output layers. It contains neuron(s) and receives weighted inputs from the input layer. Whilst an ANN contains only one input layer and one output layer, the number of hidden layers can be more than one. One hidden layer can only be used for solving linearly separable problems; for more complex problems, multiple hidden layers are needed.

Systematic experimentation is required to identify the appropriate number of hidden layers to prevent under-fitting and over-fitting, and thus increase predictive accuracy.

4.1.3 Connections, Weights and Biases

From the input layer to the output layer, each connection is assigned a weight that denotes its relative importance. Random weights are initialised when training begins; as training continues, these weights are updated.

Sometimes a bias (different from the bias term in statistics) is added to a neuron. It is merely a constant input with weight 1. Intuitively speaking, the purpose of a bias is to shift the activation function away from the origin, thereby allowing the neuron to give a non-zero output when the inputs are 0.


4.1.4 Forward and Backward Propagation, Loss Functions

Forward propagation is the passing of input data in the forward direction through the network to generate output. The difference between the output and the target output is then estimated by a loss function, which is a measure of how well an ANN performs with the current set of weights. Typical loss functions for regression problems include the mean squared error (MSE) and the mean absolute error (MAE). They are defined as

Definition 4.1.1.

MSE = (1/n) Σ_{i=1}^{n} (y_{i,predict} − y_{i,actual})²   (4.1)

Definition 4.1.2.

MAE = (1/n) Σ_{i=1}^{n} |y_{i,predict} − y_{i,actual}|   (4.2)

where y_{i,predict} is the i-th predicted output and y_{i,actual} is the i-th actual output. Subsequently, this information is propagated backward from the output layer to the input layer, and the gradients of the loss function with respect to the weights are computed. The algorithm then adjusts the weights accordingly to minimise the loss function. An iteration is one complete cycle of forward and backward propagation.

ANN training is an unconstrained optimisation problem with the loss function as the objective function. The algorithm navigates through the search space to find the optimal weights that minimise the loss function (e.g. the MSE). It can be expressed as

min_w (1/n) Σ_{i=1}^{n} (y_{i,predict} − y_{i,actual})²   (4.3)

where w is the vector that contains the weights of all the connections within the ANN.
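As a quick numerical illustration of equations (4.1) and (4.2) (the function names are ours):

    import numpy as np

    def mse(y_predict, y_actual):
        # Equation (4.1): mean squared error
        return np.mean((y_predict - y_actual) ** 2)

    def mae(y_predict, y_actual):
        # Equation (4.2): mean absolute error
        return np.mean(np.abs(y_predict - y_actual))

    y_predict = np.array([1.0, 2.0, 3.0])
    y_actual = np.array([1.5, 1.8, 2.4])
    print(mse(y_predict, y_actual))   # (0.25 + 0.04 + 0.36) / 3 = 0.2167
    print(mae(y_predict, y_actual))   # (0.5 + 0.2 + 0.6) / 3 = 0.4333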

4.1.5 Activation Functions

An activation function is a mathematical function applied to the output of a neuron to determine whether it should be activated (or fired) or not, based on the relevance of the neuron's inputs to the prediction. This is analogous to the neuronal firing activity in the human brain.

Activation functions can be linear or non-linear. A linear activation function is only suitable when the functional relationship is linear; non-linear activation functions are used for most applications. In our case, non-linear activation functions are more suitable, as the pricing and implied volatility functional mappings are complex.

Common activation functions include:


Definition 4.1.3. Rectified Linear Unit (ReLU):

f_ReLU(x) = 0 if x ≤ 0, x if x > 0;  range [0, ∞)   (4.4)

Definition 4.1.4. Exponential Linear Unit (ELU):

f_ELU(x) = α(e^x − 1) if x ≤ 0, x if x > 0;  range (−α, ∞)   (4.5)

where α > 0. The value of α is usually chosen to be 1 for most applications.

Definition 4.1.5. Linear or identity:

f_linear(x) = x;  range (−∞, ∞)   (4.6)
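For concreteness, equations (4.4)-(4.6) translate into a few lines of NumPy (a sketch; names are ours):

    import numpy as np

    def relu(x):
        # Equation (4.4): max(0, x) elementwise
        return np.maximum(0.0, x)

    def elu(x, alpha=1.0):
        # Equation (4.5): alpha * (e^x - 1) for x <= 0, x otherwise
        return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

    def linear(x):
        # Equation (4.6): identity
        return x

    x = np.array([-2.0, -0.5, 0.0, 1.5])
    print(relu(x), elu(x), linear(x))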

4.1.6 Optimisation Algorithms and Learning Rate

The learning rate refers to the size of the change applied to the weights in each iteration of the training process. A high learning rate results in fast training but carries a risk of divergence, whereas a low learning rate is more stable but slows the training down.

Common optimisation algorithms for training ANN are Stochastic Gradient Descent (SGD) and its extensions, such as RMSProp and Adam. The SGD pseudocode is the following:

Algorithm 1: Stochastic Gradient Descent

Initialise a vector of random weights w and learning rate lr
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        w_new = w_current − lr · ∂Loss/∂w_current
    end for
end while

SGD is popular because it is simple to implement and can provide good results for most applications. In SGD, the learning rate lr is fixed throughout the process. This is found to be an issue, because the appropriate learning rate is difficult to determine: selecting a high learning rate is prone to divergence, whereas using a low learning rate can result in a slow training process. Hence, improvements have been made to SGD to make the learning rate adaptive. Root Mean Square Propagation (RMSProp) is one of the extensions that allows the learning rate to vary throughout the training process (Hinton et al 2015). It is a method that divides the learning rate by the exponential moving average of past squared gradients. The author suggests the exponential decay rate b = 0.9, and its pseudocode is presented below:

Algorithm 2: Root Mean Square Propagation (RMSProp)

Initialise a vector of random weights w, learning rate lr, v = 0 and b = 0.9
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        v_current = b·v_previous + (1 − b)[∂Loss/∂w_current]²
        w_new = w_current − (lr/√(v_current + ε)) · ∂Loss/∂w_current
    end for
end while

where ε is a small number to prevent division by zero.

A further improved version, called Adaptive Moment Estimation (Adam), was proposed by (Kingma et al 2015). RMSProp adapts the learning rate using only the second moment of the gradients (the moving average of their squares); Adam improves on this by also using the first moment, the moving average of the gradients themselves, in the adaptation. An advantage of Adam is that it can handle sparse gradients on noisy data. The pseudocode of Adam is given below:

Algorithm 3: Adaptive Moment Estimation (Adam)

Initialise a vector of random weights w, learning rate lr, m = 0, v = 0, b = 0.9 and b1 = 0.999
while Loss > error do
    Randomly shuffle the training data of size n
    for i = 1, 2, ..., n do
        m_current = b·m_previous + (1 − b) ∂Loss/∂w_current
        v_current = b1·v_previous + (1 − b1)[∂Loss/∂w_current]²
        m̂_current = m_current / (1 − b^i)
        v̂_current = v_current / (1 − b1^i)
        w_new = w_current − (lr/(√v̂_current + ε)) · m̂_current
    end for
end while

where b is the exponential decay rate for the first moment and b1 is the exponential decay rate for the second moment.
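To see Algorithm 3 in action, here is a minimal sketch (ours, with a deterministic toy gradient instead of shuffled mini-batches, and illustrative hyperparameter values):

    import numpy as np

    def adam_minimise(grad, w0, lr=0.01, b=0.9, b1=0.999, eps=1e-8, steps=2000):
        # Minimal Adam loop following Algorithm 3
        w = np.asarray(w0, dtype=float)
        m = np.zeros_like(w)   # first-moment estimate
        v = np.zeros_like(w)   # second-moment estimate
        for t in range(1, steps + 1):
            g = grad(w)
            m = b * m + (1 - b) * g
            v = b1 * v + (1 - b1) * g**2
            m_hat = m / (1 - b**t)       # bias correction
            v_hat = v / (1 - b1**t)
            w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w

    # Toy objective f(w) = ||w - 3||^2 with gradient 2(w - 3); converges to [3, 3]
    print(adam_minimise(lambda w: 2 * (w - 3.0), w0=[0.0, 10.0]))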

4.1.7 Epochs, Early Stopping and Batch Size

An epoch is one pass of the entire training data set through the ANN. Generally we want the number of epochs to be as high as possible; however, this may cause the ANN to over-fit. Early stopping is useful


here to stop the training before the ANN becomes over-fit: it works by stopping the training process when the loss function shows no improvement. The batch size is the number of input samples processed before the parameters are updated.
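In Keras, early stopping is available as a built-in callback; a minimal sketch (the monitored quantity, patience, and the commented-out fit call with its arguments are illustrative):

    from tensorflow import keras

    # Stop training once the validation loss has not improved for 10 epochs
    early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                               restore_best_weights=True)
    # model.fit(x_train, y_train, validation_data=(x_val, y_val),
    #           epochs=200, batch_size=1024, callbacks=[early_stop])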

4.2 Feedforward Neural Networks

Feedforward Neural Networks are the most common type of ANN in practical applications. They are called feedforward because the training data only flows from the input layer to the output layer; there is no feedback loop within the network.

4.2.1 Deep Neural Networks

Deep Neural Networks (DNN) are Feedforward Neural Networks with more than one hidden layer; the depth here refers to the number of hidden layers. This is the type of ANN of our interest, and the following theorems provide justifications.

Theorem 4.2.1 ((Hornik et al 1989), Universal Approximation Theorem). Let NN^a_{din,dout} be the set of neural networks with activation function f_a: ℝ → ℝ, input dimension din ∈ ℕ and output dimension dout ∈ ℕ. Then, if f_a is continuous and non-constant, NN^a_{din,dout} is dense in the Lebesgue space L^p(μ) for all finite measures μ.

Theorem 4.2.2 ((Hornik et al 1990), Universal Approximation Theorem for Derivatives). Let the function to be approximated be F* ∈ C^n, the mapping of the neural network be F: ℝ^{d0} → ℝ, and NN^a_{din,dout} be the set of single-layer neural networks with activation function f_a: ℝ → ℝ, input dimension din ∈ ℕ and output dimension 1. Then, if the (non-constant) activation function is f_a ∈ C^n(ℝ), NN^a_{din,1} arbitrarily approximates F* and all its derivatives up to order n.

Theorem 4.2.3 ((Eldan et al 2016), The Power of Depth of Neural Networks). There exists a simple function on ℝ^d, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network to more than a certain constant accuracy unless its width is exponential in the dimension.

According to Theorem 4.2.2, it is crucial to have an activation function that is n-times differentiable if the approximation of the n-th derivative of the function F* is required. Theorem 4.2.3 motivates the choice of a deep network: it states that depth can improve the predictive capability of a network. However, it is worth noting that networks deeper than 4 hidden layers do not provide much performance gain (Ioffe et al 2015).

4.3 Common Pitfalls and Remedies

Here we discuss some common pitfalls in ANN training and the corresponding remedies.


4.3.1 Loss Function Stagnation

During training, the loss function may display only infinitesimal improvements after each epoch.

To resolve this problem:

• Increase the batch size. This could potentially fix the issue, as it helps the ANN model to better learn the relationships between data samples and data labels before updating the parameters.

• Increase the learning rate. Too small a learning rate can cause the ANN model to get stuck in a local minimum.

• Use a deeper network. According to the power-of-depth theorem, deeper networks tend to perform better than shallower ones. However, if the model already has 4 hidden layers, this method may not be suitable, as there is not much performance gain beyond that (see Section 4.2.1).

• Use a different activation function for the output layer. Make sure the range of the output activation function is appropriate for the problem.

4.3.2 Loss Function Fluctuation

During training, the loss function may fluctuate from epoch to epoch instead of decreasing steadily. Potential fixes include:

• Increase the batch size. The ANN model might not be able to capture the true relationships between input and output data before each update when the batch size is too small.

• Lower the initialised learning rate.

4.3.3 Over-fitting

Over-fitting occurs when the ANN model performs well on the training data but fails to generalise to unseen test data.

Potential fixes include:

• Reduce the complexity of the ANN model. Limit the number of hidden layers to 4, as (Eldan et al 2016) show that not much advantage can be achieved beyond 4 hidden layers. Furthermore, experiment with narrower (fewer neurons) hidden layers.

• Introduce early stopping. Reduce the number of epochs if more epochs show only little improvement.

• Regularisation. This can reduce the complexity of the loss function and speed up convergence.

• K-fold cross-validation. This works by splitting the training data into K equal-size portions: K − 1 portions are used for training and 1 portion is used for validation, repeated K times with a different portion used for validation each time.

4.3.4 Under-fitting

Under-fitting occurs when the ANN model fails to make sense of the relationships between the data samples and labels.

Potential fixes include:

• Increase the size of the training data. Complex functional mappings require the ANN model to train on more data before it can learn the relationships well.

• Increase the depth of the model. It is very likely that the model is not deep enough for the problem. As a rule of thumb, use 3-4 hidden layers for complex problems.

4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation

For the ANN application to implied volatility prediction, we apply the image-based implicit method proposed by (Horvath et al 2019). It is allegedly a powerful and robust method for approximating implied volatility. Rather than using one implied volatility value as the network output, the image-based implicit method uses the entire implied volatility surface as the output. The implied volatility surface is represented by 10 × 10 pixels, with each pixel corresponding to one maturity and one strike price. The input data corresponding to one implied volatility surface output are all the model parameters except the strike price and maturity; these are implicitly learned by the network. This method offers the following benefits (a minimal Keras sketch of such a network follows the list):

• More information is learned by the network with less input data, since the information of neighbouring volatility points is incorporated in the training process.

• The network learns the mapping between model parameters and the implied volatility surface directly. The volatility smile and term structure are available simultaneously, so this is more efficient than having only one single implied volatility value as output.

• The neighbouring volatility points can be numerically approximated almost instantly if required.
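To make the output representation concrete, the following is a minimal sketch of how a 100-dimensional network output maps onto the 10 × 10 surface; the prediction vector and the row/column ordering are illustrative assumptions.

import numpy as np

# Hypothetical 100-dimensional network output for one parameter set
pred = np.random.rand(100)

# Reshape into the 10 x 10 implied volatility surface: here rows are taken
# to index maturities and columns strikes (the ordering is an assumption)
surface = pred.reshape(10, 10)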


FIGURE 4.2: The pixels of the implied volatility surface (a 10 × 10 grid of outputs O1 to O100) are represented by the outputs of the neural network.


Chapter 5

Data

In this chapter we discuss the data for our applications. We explain how the data is generated, along with some essential data pre-processing procedures.

Simulated data is chosen to train the ANN model mainly for regulatory reasons. One might argue that training the ANN with market data directly would yield better predictions; however, evidence for the stability and robustness of that approach is still lacking. Furthermore, there is no straightforward interpretation between the inputs and outputs of an ANN model trained in this way. The main advantage of training the ANN with simulated data is that the functional relationship is known, and so the model is tractable. This property is important, as tractable models tend to be more transparent, which is required by regulators.

For the neural network computation, Python is the logical language choice, as it has many powerful machine learning libraries such as Keras. To obtain good results, we require 5,000,000 rows of data for the ANN experiment. Hence, we choose C++ instead of Python for the data generation process, as it is computationally faster. We also use the lightweight header-only Python library pybind11, which can harmoniously integrate C++ and Python in one project (see Appendix A for a tutorial on pybind11).

5.1 Data Generation and Storage

The data generation process is done entirely in C++, mainly because of its execution speed. Initially, we considered applying GPU parallel computing to speed up the process even further. After consulting with the supervisor, we decided to use the std::future class template instead. std::future belongs to the standard C++11 library and is very simple to implement compared to parallel computing. It allows a C++ program to run multiple functions asynchronously on different threads (see page 942 in (Duffy, 2018) for an example). Our machine contains 2 physical cores with 2 threads per physical core. This allows us to run 4 operations asynchronously without overhead costs and therefore, in theory, improves the data generation speed four-fold. Note that asynchronous programming is not meant to replace parallel computing completely: in cases where execution speed is absolutely critical, parallel computing should be used. Without asynchronous programming, the data generation process for 5,000,000 rows of Heston option price data would take more than 8 hours; it now takes less than 2 hours to complete.
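The asynchronous generator itself is written in C++ with std::future; purely as an illustration of the same idea, the sketch below shows a Python analogue using concurrent.futures, where generate_batch is a hypothetical function that prices one batch of options.

from concurrent.futures import ProcessPoolExecutor

def generate_batch(n_rows, seed):
    # Hypothetical stand-in: price n_rows Heston options with a given seed
    return []

if __name__ == "__main__":
    # 4 workers to match the 4 hardware threads mentioned above
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(generate_batch, 1_250_000, s) for s in range(4)]
        rows = [row for f in futures for row in f.result()]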

We follow the steps outlined in Appendix A to wrap the C++ program using the pybind11 Python library. The wrapped C++ function can be called in Python just like any other Python library. pybind11 is conceptually very similar to Boost.Python; the reason we choose pybind11 is that it is a lightweight header-only library that is easy to use. This enables us to enjoy both the execution speed of C++ and access to the machine learning libraries of Python at the same time.

After the data is generated, it is stored in MySQL so that we do not need to re-generate the data each time we restart the computer or when the Python IDE crashes unexpectedly. We use the mysql-connector Python library to communicate with MySQL from Python.
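A minimal sketch of this storage step is shown below, assuming a local MySQL server; the credentials, database and table names are illustrative.

import pandas as pd
import sqlalchemy as db

# Illustrative DataFrame standing in for the generated Heston data
df = pd.DataFrame({"spot_price": [50.0], "strike_price": [45.0],
                   "option_price": [7.2]})

# Connection string uses the mysql-connector driver; credentials are placeholders
engine = db.create_engine("mysql+mysqlconnector://user:password@localhost/heston_db")
df.to_sql("heston_call_option_data", con=engine, if_exists="append", index=False)

# Reload later without re-generating the data
df_back = pd.read_sql("SELECT * FROM heston_call_option_data", con=engine)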

5.1.1 Heston Call Option Data

We follow the method and theory described in Chapter 3 to compute the call option price under the Heston model. We only consider call options in this project. To train an ANN that can also predict put option prices, one can simply add an extra input feature that takes only two values, 1 or -1, with 1 representing a call option and -1 a put option.

For this application, we generated 5,000,000 rows of data, uniformly distributed over a specified range. In this context, one row of data contains all the input features and the corresponding output. We use 10 model parameters as the input features: spot price (S), strike price (K), risk-free rate (r), dividend yield (D), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ). In this case there is only one output: the call option price. See Table 5.1 for their respective ranges.

TABLE 5.1: Heston Model Call Option Data

          Model Parameter                  Range
Input     Spot Price (S)                   ∈ [20, 80]
          Strike Price (K)                 ∈ [40, 55]
          Risk-free Rate (r)               ∈ [0.03, 0.045]
          Dividend Yield (D)               ∈ [0, 0.1]
          Instantaneous Volatility (ν)     ∈ [0.2, 0.6]
          Maturity (T)                     ∈ [0.03, 2.0]
          Long-term Volatility (θ)         ∈ [0.2, 1.0]
          Mean-reversion Rate (κ)          ∈ [0.1, 1.0]
          Volatility of Volatility (ξ)     ∈ [0.2, 0.5]
          Correlation (ρ)                  ∈ [−0.95, 0.95]
Output    Call Option Price (C)            ∈ ℝ>0


5.1.2 Heston Implied Volatility Data

We generate the Heston implied volatility data with the theory explained in Chapter 3. For the ANN application in implied volatility approximation, we decided to use the image-based implicit method, in which the ANN output layer dimension is 100, with each output representing one pixel on the implied volatility surface.

Using this method, we need 100 times more data in order to have the same number of distinct rows as the normal point-wise method (e.g. Heston call option pricing). However, due to hardware constraints, we decided to generate 10,000,000 rows of uniformly distributed data. That translates to only 100,000 distinct rows of data.

We use 6 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ).

Note that strike price and maturity are not used as input features, as mentioned in Chapter 4.4. This is because the shape of one full implied volatility surface depends on the 6 model parameters above; the strike price and maturity merely correspond to one specific point on the full surface. Nevertheless, we still require these two parameters to generate the implied volatility data. We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

In this case there are 10 × 10 = 100 outputs, i.e. the implied volatility surface. See Table 5.2 for their respective ranges.

TABLE 5.2: Heston Model Implied Volatility Data

          Model Parameter                    Range
Input     Spot Price (S)                     ∈ [0.2, 2.5]
          Instantaneous Volatility (ν)       ∈ [0.2, 0.6]
          Long-term Volatility (θ)           ∈ [0.2, 1.0]
          Mean-reversion Rate (κ)            ∈ [0.1, 1.0]
          Volatility of Volatility (ξ)       ∈ [0.2, 0.5]
          Correlation (ρ)                    ∈ [−0.95, 0.95]
Output    Implied Volatility Surface (σimp)  ∈ ℝ^{10×10}_{>0}

5.1.3 Rough Heston Call Option Data

Again, we follow the theory in Chapter 3 and implement it in C++ to generate call option prices under the rough Heston model.

Similarly, we only consider call options here and generate 5,000,000 rows of data. We use 9 model parameters as the input features: strike price (K), risk-free rate (r), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). In this case there is only one output: the call option price. See Table 5.3 for their respective ranges.

TABLE 5.3: Rough Heston Model Call Option Data

          Model Parameter                  Range
Input     Strike Price (K)                 ∈ [0.5, 5]
          Risk-free Rate (r)               ∈ [0.2, 0.5]
          Instantaneous Volatility (ν)     ∈ [0, 1.0]
          Maturity (T)                     ∈ [0.1, 3.0]
          Long-term Volatility (θ)         ∈ [0.01, 0.1]
          Mean-reversion Rate (κ)          ∈ [1.0, 4.0]
          Volatility of Volatility (ξ)     ∈ [0.01, 0.5]
          Correlation (ρ)                  ∈ [−0.95, 0.95]
          Hurst Parameter (H)              ∈ [0.05, 0.1]
Output    Call Option Price (C)            ∈ ℝ>0

5.1.4 Rough Heston Implied Volatility Data

Finally, we generate the rough Heston implied volatility data with the theory explained in Chapter 3. As before, we use the image-based implicit method.

The data size we use is 10,000,000 rows, which translates to 100,000 distinct rows of data. We use 7 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

As before, there are 10 × 10 = 100 outputs, i.e. the rough Heston implied volatility surface. See Table 5.4 for their respective ranges.

TABLE 5.4: Rough Heston Model Implied Volatility Data

          Model Parameter                    Range
Input     Spot Price (S)                     ∈ [0.2, 2.5]
          Instantaneous Volatility (ν)       ∈ [0.2, 0.6]
          Long-term Volatility (θ)           ∈ [0.2, 1.0]
          Mean-reversion Rate (κ)            ∈ [0.1, 1.0]
          Volatility of Volatility (ξ)       ∈ [0.2, 0.5]
          Correlation (ρ)                    ∈ [−0.95, 0.95]
          Hurst Parameter (H)                ∈ [0.05, 0.1]
Output    Implied Volatility Surface (σimp)  ∈ ℝ^{10×10}_{>0}


5.2 Data Pre-processing

In this section we discuss some necessary data pre-processing procedures: data splitting for validation and evaluation (testing), data standardisation and normalisation, and data partitioning for K-fold cross-validation.

5.2.1 Data Splitting: Train - Validate - Test

Generally, not 100% of the available data is used for training; we reserve some of the data for validation and for performance evaluation.

We split the data into three sets: 70% (train), 15% (validate) and 15% (test). The first 70% of the data is used for training, whilst 15% of the data is used for validation during training. The last 15% is for evaluating the trained ANN's predictive accuracy.

The validation data set is used for tuning the hyperparameters of the ANN; this is important to help prevent over-fitting of the network. The test data set is not used during training: it is the unseen (out-of-sample) data used for evaluating the trained ANN's predictions.
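A minimal sketch of this 70/15/15 split in NumPy, with placeholder arrays standing in for the generated data:

import numpy as np

# Placeholder data; shapes are assumptions for illustration only
X = np.random.rand(1000, 10)
y = np.random.rand(1000, 1)

n = len(X)
i1, i2 = int(0.70 * n), int(0.85 * n)
X_train, y_train = X[:i1], y[:i1]       # first 70%: training
X_val, y_val = X[i1:i2], y[i1:i2]       # next 15%: validation during training
X_test, y_test = X[i2:], y[i2:]         # last 15%: out-of-sample evaluation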

5.2.2 Data Standardisation and Normalisation

Before the data is fed into the ANN for training, it is absolutely essential to scale the data to speed up convergence and improve performance.

The magnitude of each data feature is different. It is essential to re-scale the data so that the ANN model will not misinterpret the features' relative importance based on their magnitudes. Typical scaling techniques include standardisation and normalisation.

Data standardisation is defined as

    x_{scaled} = \frac{x - x_{mean}}{x_{std}},    (5.1)

where

• x is the data,
• x_{mean} is the mean of that data feature,
• x_{std} is the standard deviation of that data feature.

(Horvath et al., 2019) claim that this works especially well for quantities that are positively unbounded, such as implied volatility.

For the other model parameters, we apply data normalisation, which is defined as

    x_{scaled} = \frac{2x - (x_{max} + x_{min})}{x_{max} - x_{min}} \in [-1, 1],    (5.2)

where

• x is the data,
• x_{max} is the maximum value of that data feature,
• x_{min} is the minimum value of that data feature.

After data scaling (standardisation or normalisation), the ANN is able to learn the true relationships between inputs and outputs without bias from the difference in their magnitudes.
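A minimal NumPy sketch of equations (5.1) and (5.2), applied column-wise to a feature matrix; the array is illustrative.

import numpy as np

def standardise(x):
    # (5.1): zero mean, unit standard deviation per feature column
    return (x - x.mean(axis=0)) / x.std(axis=0)

def normalise(x):
    # (5.2): rescale each feature column to the interval [-1, 1]
    x_max, x_min = x.max(axis=0), x.min(axis=0)
    return (2 * x - (x_max + x_min)) / (x_max - x_min)

X = np.random.rand(100, 6)   # placeholder feature matrix
X_std, X_norm = standardise(X), normalise(X)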

5.2.3 Data Partitioning

In this section we explain how the data is partitioned for K-fold cross-validation.

First, we shuffle the entire training data set. Then the whole data set is divided into K equal-sized portions. Each portion is used once as the test data set while the ANN is trained on the other K−1 portions; this is repeated K times, with each of the K portions used exactly once as the test data set. The average of the K results is the performance measure of the model. A code sketch follows Figure 5.1.

FIGURE 5.1: How the data is partitioned for 5-fold cross-validation. Adapted from (Chapter 2 Modeling Process: k-fold cross validation).
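A minimal sketch of this procedure using scikit-learn's KFold is shown below; the arrays and the build_model helper (which would return a compiled Keras model) are hypothetical.

from sklearn.model_selection import KFold
import numpy as np

X = np.random.rand(500, 6)     # placeholder features
y = np.random.rand(500, 100)   # placeholder implied volatility surfaces

kf = KFold(n_splits=5, shuffle=True)   # shuffle, then split into K portions
mse_scores = []
for train_idx, test_idx in kf.split(X):
    model = build_model()   # hypothetical helper returning a compiled model
    model.fit(X[train_idx], y[train_idx], epochs=30, batch_size=200, verbose=0)
    mse_scores.append(model.evaluate(X[test_idx], y[test_idx], verbose=0))
print("Average MSE across folds:", np.mean(mse_scores))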


Chapter 6

Neural Networks Architecture and Experimental Setup

We start this chapter by discussing the appropriate ANN architecture and experimental setup for our applications. We apply the ANN to four different applications: option pricing under the Heston model, Heston implied volatility surface approximation, option pricing under the rough Heston model and rough Heston implied volatility surface approximation. We first apply the ANN to the Heston model as a concept validation and then proceed to the rough Heston model.

Subsequently, we discuss the relevant evaluation metrics and validation methods for our ANN models. Finally, we end this chapter with the implementation steps for an ANN in Keras.

6.1 Neural Networks Architecture

Configuring an ANN's architecture is a combination of art and science. It requires experience and systematic experimentation to decide the right parameters and architecture for a specific problem, and for a particular problem there might be multiple appropriate designs. To begin with, we use the network architecture of (Horvath et al., 2019) and adjust the hyperparameters if it performs poorly for our applications.


FIGURE 6.1: Typical ANN Architecture for Option Pricing Application (an input layer, three hidden layers and a single-neuron output layer).


FIGURE 6.2: Typical ANN Architecture for Implied Volatility Surface Approximation (an input layer, three hidden layers and a 100-neuron output layer).

6.1.1 ANN Architecture: Option Pricing under Heston Model

An ANN with 3 hidden layers is appropriate for this application. Each hidden layer has 50 neurons, with the exponential linear unit (ELU) as its activation function.

Initially, the rectified linear unit (ReLU) activation function was considered. It is the choice for many applications, as it can overcome the vanishing gradient problem whilst being less computationally expensive than other activation functions. However, it has some major drawbacks:

• The n-th derivative of ReLU is not continuous for all n > 0.

• ReLU cannot produce negative outputs.

• For activation x < 0, ReLU has zero gradient.

From Theorem 4.2.2 we know that choosing an n-times differentiable activation function is necessary for computing the n-th derivatives of the functional mapping. This is crucial for our application because we would be interested in computing option Greeks with the same ANN if it shows promising results for option pricing. For risk management purposes, selecting an activation function from which option Greeks can be calculated is crucial.

The obvious alternative is the ELU (defined in Chapter 4.1.4). For x < 0, it has a smooth output that approaches −α. For x > 0, the output of the ELU is positively unbounded, which is suitable for our application, as the values of option price and implied volatility are also, in theory, positively unbounded.

Since our problem is a regression problem, a linear activation function (defined in Chapter 4.1.5) is appropriate for the output layer, as its output is unbounded. We find that a batch size of 200 and 30 epochs yield good accuracy.

To summarise, the ANN model for Heston option pricing is:

• Input Layer: 10 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function
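A sketch of this architecture in Keras is shown below, mirroring the style of Appendix C (which lists the implied volatility network); the training call is commented out since the data loading is omitted here.

import keras

# 10 input features, three hidden layers of 50 ELU neurons, one linear output
input_layer = keras.layers.Input(shape=(10,))
h1 = keras.layers.Dense(50, activation='elu')(input_layer)
h2 = keras.layers.Dense(50, activation='elu')(h1)
h3 = keras.layers.Dense(50, activation='elu')(h2)
output_layer = keras.layers.Dense(1, activation='linear')(h3)   # option price

model = keras.models.Model(inputs=input_layer, outputs=output_layer)
model.compile(loss='mse', optimizer='adam')
# model.fit(X_train, y_train, batch_size=200, epochs=30, validation_split=0.15)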

6.1.2 ANN Architecture: Implied Volatility Surface of Heston Model

For approximating the full implied volatility surface of the Heston model, we apply the image-based implicit method (see Chapter 4.4). This implies the output dimension is 100.

We use 3 hidden layers with 80 neurons each. The activation function for the hidden layers is the exponential linear unit (ELU). As before, we use a linear activation function for the output layer. We find that a batch size of 20 and 200 epochs yield good accuracy. Since the number of epochs is relatively high, we introduce an early-stopping feature to stop the training when the validation loss stops improving for 25 epochs.

To summarise, the ANN model for Heston implied volatility surface approximation is:

• Input Layer: 6 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function


6.1.3 ANN Architecture: Option Pricing under Rough Heston Model

For option pricing under the rough Heston model, the ANN model is:

• Input Layer: 9 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function

As before, we use a batch size of 200 and 30 epochs.

6.1.4 ANN Architecture: Implied Volatility Surface of Rough Heston Model

The ANN model for rough Heston implied volatility surface approximation is:

• Input Layer: 7 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function

We use a batch size of 20 and 200 epochs, with an early-stopping feature to stop the training when the validation loss stops improving for 25 epochs.

6.2 Performance Metrics and Validation Methods

We require some metrics to evaluate the performance of the ANN in terms of accuracy and speed.

For accuracy, we introduced the MSE and MAE in Chapter 4.1.4. It is also common to use the coefficient of determination (R^2) to quantify the performance of a regression model: a statistical measure of the proportion of the variance in the dependent variable that is explained by the independent variables. The formula is

    R^2 = 1 - \frac{\sum_{i=1}^{n} (y_{i,actual} - y_{i,predict})^2}{\sum_{i=1}^{n} (y_{i,actual} - \bar{y}_{actual})^2}.    (6.1)

46 Chapter 6 Neural Networks Architecture and Experimental Setup

We also propose a user-defined accuracy metric, the Maximum Absolute Error, defined as

    Max Abs Error = \max_i |y_{i,actual} - y_{i,predict}|.    (6.2)

This metric provides a measure of how large the error could be in the worst-case scenario. It is especially important from a risk management perspective.
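A minimal NumPy sketch of equations (6.1) and (6.2) as they would be evaluated on test predictions; the arrays are illustrative.

import numpy as np

def r_squared(y_actual, y_pred):
    # (6.1): coefficient of determination
    ss_res = np.sum((y_actual - y_pred) ** 2)
    ss_tot = np.sum((y_actual - np.mean(y_actual)) ** 2)
    return 1 - ss_res / ss_tot

def max_abs_error(y_actual, y_pred):
    # (6.2): worst-case absolute error
    return np.max(np.abs(y_actual - y_pred))

y_actual = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.2])
print(r_squared(y_actual, y_pred), max_abs_error(y_actual, y_pred))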

As for the run-time performance of the ANN, we simply use the Python built-in timer function to quantify this property.

In Chapter 5.2.3 we explained the data partitioning process. The motivation for partitioning the data is to perform K-fold cross-validation. Cross-validation is a statistical technique for assessing how well a predictive model generalises to an independent data set, and it is a good tool for identifying over-fitting.

As explained earlier, the process involves training and testing different permutations of the partitioned data sets multiple times. The indication of a well-generalised model is that the error magnitudes in each round are consistent with each other and with their average. We select K = 5 for our applications.


6.3 Implementation of ANN in Keras

In this section we outline the standard ANN computation process using Keras:

1. Construct the ANN model by specifying the number of input neurons, the number of hidden neurons, the number of hidden layers, the number of output neurons and the activation function for each layer. Print the model summary to check for errors.

2. Configure the settings of the model using the compile function. We select the MSE as the loss function, and we specify the MAE, R^2 and our user-defined maximum absolute error as the metrics. For all of our applications we choose ADAM as the optimisation algorithm. To prevent over-fitting, we use the early-stopping function, especially when the number of epochs is high. It is important to note that the validation loss (not the training loss) should be used as the early-stopping criterion: training stops early if there is no reduction in validation loss.

3. Start training the ANN model. It is important to specify the validation split argument here for splitting the training data into a train set and a validation set. Note that the test set should not be used as the validation set; it should be reserved for evaluating the ANN's predictions after training.

4. When the training is complete, generate the plot of the MSE (loss function) versus the number of epochs to visualise the training process. The MSE should decline gradually and then level off as the number of epochs increases. See Chapter 4.3 for potential training issues and remedies.

5. Apply 5-fold cross-validation to check for over-fitting. Evaluate the trained model using the evaluate function.

6. Generate predictions. Visualise the predictions by plotting predicted values versus actual values on the same graph.

See Appendix C for a Python code example.
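As a condensed illustration of steps 2-4, a sketch is shown below (see Appendix C for the full model definition); model, X_train and y_train are assumed to exist already.

from keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt

# Step 2: configure the model (MSE loss, ADAM optimiser, early stopping on
# validation loss)
model.compile(loss='mse', optimizer='adam', metrics=['mae'])
earlystop = EarlyStopping(monitor='val_loss', mode='min', patience=25)

# Step 3: train, holding out part of the training data for validation
history = model.fit(X_train, y_train, batch_size=20, epochs=200,
                    validation_split=0.2, callbacks=[earlystop])

# Step 4: visualise the training process
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('MSE')
plt.legend()
plt.show()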


6.4 Summary of Neural Networks Architecture

TABLE 6.1: Summary of Neural Networks Architecture for Each Application

ANN Model                             Application                               Input Layer   Hidden Layers           Output Layer   Batch Size   Epochs
Heston Option Pricing ANN             Heston Call Option Pricing                10 neurons    3 layers × 50 neurons   1 neuron       200          30
Heston Implied Volatility ANN         Heston Implied Volatility Surface         6 neurons     3 layers × 80 neurons   100 neurons    20           200
Rough Heston Option Pricing ANN       Rough Heston Call Option Pricing          9 neurons     3 layers × 50 neurons   1 neuron       200          30
Rough Heston Implied Volatility ANN   Rough Heston Implied Volatility Surface   7 neurons     3 layers × 80 neurons   100 neurons    20           200


Chapter 7

Results and Discussions

In this chapter we present our results and discuss our findings.

7.1 Heston Option Pricing ANN Results

Figure 7.1 shows the learning curves of the Heston option pricing ANN. The loss function used here is the MSE. As the ANN is trained, both the training loss and validation loss curves decline until they converge to a point. It took fewer than 30 epochs for the ANN to reach a satisfactory level of accuracy. There is only a small gap between the training loss and the validation loss, which indicates the ANN model is well trained.

FIGURE 7.1: Plot of MSE Loss Function vs. No. of Epochs for Heston Option Pricing ANN

Table 7.1 summarises the error metrics of the Heston option pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R^2 are 2.00 × 10^-6, 9.58 × 10^-4, 6.50 × 10^-3 and 0.999 respectively. The low error metrics and high R^2 value imply the ANN model generalises well and gives highly accurate predictions for option prices under the Heston model. Figure 7.2 attests to this high accuracy: almost all of the predictions lie on the perfect-prediction line (red line).

TABLE 7.1: Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data

MSE            MAE            Max Abs Error   R^2
2.00 × 10^-6   9.58 × 10^-4   6.50 × 10^-3    0.999

FIGURE 7.2: Plot of Predicted vs. Actual Values for Heston Option Pricing ANN

TABLE 7.2: 5-fold Cross Validation Results of Heston Option Pricing ANN

          MSE            MAE            Max Abs Error
Fold 1    5.78 × 10^-7   5.45 × 10^-4   2.04 × 10^-3
Fold 2    9.90 × 10^-7   7.69 × 10^-4   2.40 × 10^-3
Fold 3    7.15 × 10^-7   6.21 × 10^-4   2.20 × 10^-3
Fold 4    1.24 × 10^-6   8.67 × 10^-4   2.53 × 10^-3
Fold 5    6.54 × 10^-7   5.79 × 10^-4   2.21 × 10^-3
Average   8.36 × 10^-7   6.76 × 10^-4   2.27 × 10^-3

Table 7.2 shows the results from 5-fold cross-validation, which again confirm that the ANN model gives accurate predictions and generalises well to unseen data. The values of each fold are consistent with each other and with their average.


7.2 Heston Implied Volatility ANN Results

Figure 7.3 shows the learning curves of the Heston implied volatility ANN. Again, we use the MSE as the loss function. We can see that the validation loss curve experiences some fluctuations during training; we experimented with various ANN setups, and this is the least fluctuating learning curve we could achieve. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level, and the gap between validation loss and training loss is negligible. Due to the comparatively more complex ANN architecture, it took about 200 epochs for the ANN to reach a satisfactory level of accuracy.

FIGURE 7.3: Plot of MSE Loss Function vs. No. of Epochs for Heston Implied Volatility ANN

Table 7.3 summarises the error metrics of the Heston implied volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R^2 are 6.24 × 10^-4, 1.27 × 10^-2, 3.71 × 10^-1 and 0.999 respectively. The low error metrics and high R^2 value imply the ANN model generalises well and gives accurate predictions for the implied volatility surface under the Heston model. Figure 7.4 shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.3: Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data

MSE            MAE            Max Abs Error   R^2
6.24 × 10^-4   1.27 × 10^-2   3.71 × 10^-1    0.999


FIGURE 7.4: Heston Implied Volatility Smiles: ANN Predictions vs. Actual Values


FIGURE 7.5: Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.5 shows the MAE values of the Heston implied volatility ANN predictions across the entire implied volatility surface. The largest MAE value is 0.0045, and a large area of the surface has MAE values lower than 0.0030. It is worth noting that the accuracy is relatively poor when the strike price is at the extremes and the maturity is small. However, the ANN gives good predictions overall for the whole surface.

The 5-fold cross-validation results for the Heston implied volatility ANN model show consistent values; the results are summarised in Table 7.4. The consistent values indicate the ANN model is well trained and is able to generalise to unseen test data.

TABLE 7.4: 5-fold Cross Validation Results of Heston Implied Volatility ANN

          MSE            MAE            Max Abs Error
Fold 1    8.15 × 10^-4   1.13 × 10^-2   3.88 × 10^-1
Fold 2    4.74 × 10^-4   1.08 × 10^-2   3.13 × 10^-1
Fold 3    1.37 × 10^-3   1.74 × 10^-2   4.46 × 10^-1
Fold 4    3.38 × 10^-4   8.61 × 10^-3   2.19 × 10^-1
Fold 5    4.60 × 10^-4   1.02 × 10^-2   2.67 × 10^-1
Average   6.92 × 10^-4   1.12 × 10^-2   3.27 × 10^-1

7.3 Rough Heston Option Pricing ANN Results

Figure 7.6 shows the learning curves of the rough Heston option pricing ANN. As before, the MSE is used as the loss function. As training progresses, both the training loss and validation loss curves decline until they converge to a point. Since this is a relatively simple model, it took fewer than 30 epochs of training to reach a good level of accuracy. The small discrepancy between the training loss and validation loss indicates the ANN model is well trained.

FIGURE 7.6: Plot of MSE Loss Function vs. No. of Epochs for Rough Heston Option Pricing ANN

Table 7.5 summarises the error metrics of the rough Heston option pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R^2 are 9.00 × 10^-6, 2.28 × 10^-3, 1.76 × 10^-1 and 0.999 respectively. These metrics imply the ANN model generalises well and gives highly accurate predictions for option prices under the rough Heston model. Figure 7.7 attests to this high accuracy again: most of the predictions lie on the perfect-prediction line (red line).

TABLE 7.5: Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data

MSE            MAE            Max Abs Error   R^2
9.00 × 10^-6   2.28 × 10^-3   1.76 × 10^-1    0.999


FIGURE 7.7: Plot of Predicted vs. Actual Values for Rough Heston Option Pricing ANN

TABLE 7.6: 5-fold Cross Validation Results of Rough Heston Option Pricing ANN

          MSE            MAE            Max Abs Error
Fold 1    3.79 × 10^-6   1.35 × 10^-3   5.99 × 10^-3
Fold 2    2.33 × 10^-6   9.96 × 10^-4   4.89 × 10^-3
Fold 3    2.06 × 10^-6   9.88 × 10^-4   4.41 × 10^-3
Fold 4    2.16 × 10^-6   1.08 × 10^-3   4.07 × 10^-3
Fold 5    1.84 × 10^-6   1.01 × 10^-3   3.74 × 10^-3
Average   2.44 × 10^-6   1.08 × 10^-3   4.62 × 10^-3

The highly consistent values from 5-fold cross-validation again confirm that the ANN model is able to generalise to unseen data.

7.4 Rough Heston Implied Volatility ANN Results

Figure 7.8 shows the learning curves of the rough Heston implied volatility ANN. We can see that the validation loss curve experiences some fluctuations during training; again, we select the least fluctuating setup. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level, and the discrepancy between validation loss and training loss is negligible. Initially the training was set to 200 epochs, but it stopped early at around 85 epochs, since the validation loss had not improved for 25 epochs. It achieves good accuracy nonetheless.


FIGURE 7.8: Plot of MSE Loss Function vs. No. of Epochs for Rough Heston Implied Volatility ANN

Table 7.7 summarises the error metrics of the rough Heston implied volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R^2 are 5.66 × 10^-4, 1.08 × 10^-2, 3.49 × 10^-1 and 0.999 respectively. These metrics show the ANN model produces accurate predictions on unseen data. This is again confirmed by Figure 7.9, which shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities: almost all of the predicted values are consistent with the actual values.

TABLE 7.7: Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data

MSE            MAE            Max Abs Error   R^2
5.66 × 10^-4   1.08 × 10^-2   3.49 × 10^-1    0.999


FIGURE 7.9: Rough Heston Implied Volatility Smiles: ANN Predictions vs. Actual Values


FIGURE 7.10: Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.10 shows the MAE values of the rough Heston implied volatility ANN predictions across the entire implied volatility surface. The largest MAE value is below 0.0030, and a large area of the surface has MAE values lower than 0.0015. It is worth noting that the accuracy is relatively poor when the strike price and maturity values are small. However, the ANN gives good predictions overall for the whole surface. In fact, it is more accurate than the Heston implied volatility ANN model despite having one extra input and a similar architecture; we attribute this entirely to the stochastic nature of the optimiser.

The 5-fold cross-validation results for the rough Heston implied volatility ANN model show consistent values, just like the previous three cases. The results are summarised in Table 7.8. The consistency in values indicates the ANN model is well trained and is able to generalise to unseen test data.

TABLE 7.8: 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN

          MSE            MAE            Max Abs Error
Fold 1    8.15 × 10^-4   1.13 × 10^-2   3.88 × 10^-1
Fold 2    4.74 × 10^-4   1.08 × 10^-2   3.13 × 10^-1
Fold 3    1.37 × 10^-3   1.74 × 10^-2   4.46 × 10^-1
Fold 4    3.38 × 10^-4   8.61 × 10^-3   2.19 × 10^-1
Fold 5    4.60 × 10^-4   1.02 × 10^-2   2.67 × 10^-1
Average   6.92 × 10^-4   1.12 × 10^-2   3.27 × 10^-1


7.5 Run-time Performance

In this section we present the training time and the execution speed of the ANN versus the traditional methods. The speed tests are run on a machine with an Intel i5-7200U chip (up to 3.10 GHz) and 8 GB of RAM.

Table 7.9 summarises the training time required for each application. Although the training time per epoch for the option pricing applications is higher than that for the implied volatility applications, the resulting total training time is similar for both types of application, because the number of epochs for training the implied volatility ANN models is higher.

We also emphasise that the number of distinct rows of data available for implied volatility ANN training is smaller. Thus, the training time is similar to that of the option pricing ANN training despite the more complex ANN architecture.

TABLE 7.9: Training Time

Application                  Training Time per Epoch   Total Training Time
Heston Option Pricing        30 s                      900 s
Heston Implied Volatility    5 s                       1000 s
rHeston Option Pricing       30 s                      900 s
rHeston Implied Volatility   5 s                       1000 s

TABLE 7.10: Execution Time for Predictions using ANN and Traditional Methods

Application                  Traditional Methods   ANN
Heston Option Pricing        8.84 ms               57.9 ms
Heston Implied Volatility    2.31 ms               43.7 ms
rHeston Option Pricing       6.52 ms               52.6 ms
rHeston Implied Volatility   2.87 ms               48.3 ms

Table 7.10 summarises the execution time for generating predictions using the ANN and the traditional methods. Before the experiment, we expected the ANN to generate predictions faster. However, the results show that the ANN is actually up to 7 times slower than the traditional methods for option pricing (both Heston and rough Heston), and up to 18 times slower for implied volatility surface approximation (both Heston and rough Heston). This shows the traditional approximation methods are more efficient.


Chapter 8

Conclusions and Outlook

To summarise, the results show that the ANN is able to approximate option prices and implied volatility under the Heston and rough Heston models to a high degree of accuracy. However, the efficiency of the ANN is not as good as that of the traditional methods; hence we conclude that such an approach may be considered an ineffective alternative. There are more efficient methods, such as the numerical integration of the FFT, for option price computation, and the traditional method we used for approximating implied volatility does not even require any numerical schemes, hence its efficiency.

As for future projects, it would be interesting to investigate the rough Bergomi model and applications to exotic options, such as lookback and digital options. Moreover, it would also be worthwhile to apply gradient-free optimisers for ANN training; some interesting examples include the differential evolution, dual annealing and simplicial homology global optimisers.


Appendix A

pybind11: A step-by-step tutorial with examples

This tutorial assumes basic knowledge of C++ and Python.

Sources: Create a C++ extension for Python; Using the C++ eigen library to calculate matrix inverse and determinant; pybind11 Documentation; Eigen: The Matrix Class.

A.1 Software Prerequisites

• Visual Studio 2017 or later, with both the Desktop Development with C++ and Python Development workloads installed with default options.

• In the Python Development workload, also select the box on the right for Python native development tools. This option sets up most of the configuration required. (This option also includes the C++ workload automatically.)

Source: Create a C++ extension for Python


A.2 Create a Python Project

1. To create a Python project in Visual Studio, select File > New > Project. Search for "Python", select the Python Application template, give it a suitable name and location, and select OK.

2. pybind11 requires that you use a 32-bit Python interpreter (Python 3.6 or above recommended). In the Solution Explorer window of Visual Studio, expand the project node, then expand the Python Environments node. If you don't see a 32-bit environment as the default (either in bold, or labeled with "global default"), then right-click the Python Environments node and select Add Environment, or select Add Environment from the environment drop-down in the Python toolbar.

Once in the Add Environment dialog box, select the Existing environment tab, then select the 32-bit interpreter from the Environment drop-down list.

If you don't have a 32-bit interpreter installed, you can install standard Python interpreters from the Add Environment dialog. Select the Add Environment command in the Python Environments window or the Python toolbar, select the Python installation tab, indicate which interpreters to install, and select Install.


If you already added an environment other than the global default to a project, you may need to activate a newly added environment. Right-click that environment under the Python Environments node and select Activate Environment. To remove an environment from the project, select Remove.


A.3 Create a C++ Project

1. A Visual Studio solution can contain both Python and C++ projects together. To do so, right-click the solution in Solution Explorer and select Add > New Project.

2. Search on "C++", select Empty project, specify the name "test", and select OK.

3. Create a C++ file in the new project by right-clicking the Source Files node, then select Add > New Item, select C++ File, name it main.cpp, and select OK.

4. Right-click the C++ project in Solution Explorer and select Properties.

5. At the top of the Property Pages dialog that appears, set Configuration to All Configurations and Platform to Win32.


6. Set the specific properties as described in the following table, then select OK.

7. Right-click the C++ project and select Build to test your configurations (both Debug and Release). The .pyd files are located in the solution folder under Debug and Release, not the C++ project folder itself.

8. Add the following code to the C++ project's main.cpp file:

#include <iostream>

int main()
{
    std::cout << "Hello World from C++" << std::endl;
    return 0;
}

9. Build the C++ project again to confirm the code is working.

A.4 Convert the C++ project to extension for Python

To make the C++ DLL into an extension for Python, first modify the exported methods to interact with Python types. Then add a function that exports the module.

1. Install pybind11 using pip:

pip install pybind11

or

py -m pip install pybind11

2. At the top of main.cpp, include pybind11.h:

#include <pybind11/pybind11.h>


3. At the bottom of main.cpp, use the PYBIND11_MODULE macro to define the entry point to the C++ function:

namespace py = pybind11;

PYBIND11_MODULE(test, m) {
    m.def("main_func", &main, R"pbdoc(
        pybind11 simple example
    )pbdoc");

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

4. Set the target configuration to Release and build the C++ project to verify your code.

The C++ module may fail to compile for the following reasons:

• Unable to locate Python.h (E1696: cannot open source file "Python.h" and/or C1083: Cannot open include file: "Python.h": No such file or directory): verify that the path in C/C++ > General > Additional Include Directories in the project properties points to your Python installation's include folder. See the table in Step 6.

• Unable to locate Python libraries: verify that the path in Linker > General > Additional Library Directories in the project properties points to the Python installation's libs folder. See the table in Step 6.

• Linker errors related to target architecture: change the C++ target's project architecture to match that of your Python installation. For example, if you're targeting x64 with the C++ project but your Python installation is x86, change the C++ project to target x86.

A.5 Make the DLL available to Python

1. In Solution Explorer, right-click the References node in the Python project, and then select Add Reference. In the dialog that appears, select the Projects tab, select the test project, and then select OK.


2. Run the Visual Studio installer, select Modify, then select Individual Components > Compilers, build tools, and runtimes > Visual C++ 2015.3 v140 toolset. This step is necessary because Python (for Windows) is itself built with Visual Studio 2015 (version 14.0) and expects those tools to be available when building an extension through the method described here. (Note that you may need to install a 32-bit version of Python and target the DLL to Win32, not x64.)


3. Create a file named setup.py in the C++ project by right-clicking the project and selecting Add > New Item. Then select C++ File (.cpp), name the file setup.py, and select OK (naming the file with the .py extension makes Visual Studio recognize it as Python despite using the C++ file template). Then paste the following code into the setup.py file:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'pybind11_test',
    sources=['main.cpp'],
    # add pybind11 folder and Python include folder
    include_dirs=[r'pybind11\include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='pybind11_test',
    version='1.0',
    description='Simple pybind11 example',
    license='MIT',
    ext_modules=[sfc_module],
)

4. The setup.py code instructs Python to build the extension using the Visual Studio 2015 C++ toolset when used from the command line. Open an elevated command prompt, navigate to the folder that contains setup.py, and enter the following command:

python setup.py install

A.6 Call the DLL from Python

After the DLL is made available to Python, we can now call test.main_func() from Python code.

1. Add the following lines in your .py file to call methods exported from the DLL:

import test

if __name__ == "__main__":
    test.main_func()

2. Run the Python program (Debug > Start without Debugging). The output should look like the following:

A.7 Example: Store C++ Eigen Matrix as NumPy Arrays

This example requires the use of Eigen, a C++ template library. Due to its popularity, pybind11 provides transparent conversion and limited mapping support between Eigen and scientific Python linear algebra data types.

1. If you have not already, visit http://eigen.tuxfamily.org/index.php?title=Main_Page and click on "zip" to download the zip file. Once the download is complete, decompress the zip file in a suitable location and note down the file path containing the folders.


2. In the Solution Explorer, right-click the C++ project and select Properties. Navigate to the C/C++ > General tab, add the Eigen path to the Additional Include Directories property, then select OK.

3. Right-click the C++ project and select Build to test your configurations (both Debug and Release).

4. Replace the code in the main.cpp file with the following code:

#include <iostream>

// pybind11
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>

// Eigen
#include <Eigen/Dense>

namespace py = pybind11;

Eigen::Matrix3f example_mat(const int a, const int b, const int c) {
    Eigen::Matrix3f mat;
    mat << 5 * a, 1 * b, 2 * c,
           2 * a, 4 * b, 2 * c,
           1 * a, 3 * b, 3 * c;
    return mat;
}

Eigen::MatrixXd inv(Eigen::MatrixXd xs) {
    return xs.inverse();
}

double det(Eigen::MatrixXd xs) {
    return xs.determinant();
}

PYBIND11_MODULE(test, m) {
    m.def("example_mat", &example_mat);
    m.def("inv", &inv);
    m.def("det", &det);

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

Note that Eigen arrays are automatically converted to NumPy arrays simply by including the pybind11/eigen.h header: a C++ function with an Eigen return type can be used directly as a NumPy data type in Python. Build the C++ project to check the code is working correctly.

5. Include the Eigen library's directory. Replace the code in setup.py with the following code:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'pybind11_test',
    sources=['main.cpp'],
    # add pybind11 folder, Python include folder and Eigen folder
    include_dirs=[r'pybind11\include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                  r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='pybind11_test',
    version='1.0',
    description='Simple pybind11 example',
    license='MIT',
    ext_modules=[sfc_module],
)

As before, in an elevated command prompt, navigate to the folder that contains setup.py and enter the following command:

python setup.py install

6. Then paste the following code in the Python file:

import test
import numpy as np

if __name__ == "__main__":
    A = test.example_mat(1, 3, 1)
    print("Matrix A is:\n", A)
    print("The inverse of A is:\n", test.inv(A))
    print("The determinant of A is:", test.det(A))


The output should be:

A.8 Example: Store Data Generated by Analytical Heston Model as NumPy Arrays

Only the relevant parts of the code are displayed here. To request the code for the entire project, contact the author at cko22@bath.edu.

This example demonstrates how to use pybind11 for:

• a C++ project with multiple files;

• storing a C++ Eigen matrix as NumPy arrays.

1. For a C++ project with multiple files, we need to include all the source files in setup.py. This example also uses the Eigen library, so its directory is included as well. Paste the following code in setup.py:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'Data_Generation',
    sources=[r'module.cpp', r'Data.cpp', r'Heston.cpp',
             r'IntegrationScheme.cpp', r'IntegratorImp.cpp',
             r'NumIntegrator.cpp', r'pdetfunction.cpp',
             r'pfunction.cpp', r'Range.cpp'],
    include_dirs=[r'pybind11\include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                  r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='Data_Generation',
    version='1.0',
    description='Python package with Data_Generation C++ extension (PyBind11)',
    license='MIT',
    ext_modules=[sfc_module],
)

Then go to the folder that contains setup.py in a command prompt and enter the following command:

python setup.py install

2. The code in module.cpp is:

// For analytic solution in case of Heston model
#include "Heston.hpp"
#include <ctime>
#include "Data.hpp"
#include <future>

// Inputting and Outputting
#include <iostream>
#include <vector>

// pybind11
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>
#include <pybind11/stl_bind.h>

// Eigen
#include <Eigen/Dense>

namespace py = pybind11;

struct CPPData {
    // Input Data
    std::vector<double> spot_price;
    std::vector<double> strike_price;
    std::vector<double> risk_free_rate;
    std::vector<double> dividend_yield;
    std::vector<double> initial_vol;
    std::vector<double> maturity_time;
    std::vector<double> long_term_vol;
    std::vector<double> mean_reversion_rate;
    std::vector<double> vol_vol;
    std::vector<double> price_vol_corr;

    // Output Data
    std::vector<double> option_price;
};

// ... do something ...

Eigen::MatrixXd module() {
    CPPData cppdata;
    simulation(cppdata);

    // Convert each std::vector to an Eigen vector first
    // Input Data
    Map<Eigen::VectorXd> eigen_spot_price(cppdata.spot_price.data(), cppdata.spot_price.size());
    Map<Eigen::VectorXd> eigen_strike_price(cppdata.strike_price.data(), cppdata.strike_price.size());
    Map<Eigen::VectorXd> eigen_risk_free_rate(cppdata.risk_free_rate.data(), cppdata.risk_free_rate.size());
    Map<Eigen::VectorXd> eigen_dividend_yield(cppdata.dividend_yield.data(), cppdata.dividend_yield.size());
    Map<Eigen::VectorXd> eigen_initial_vol(cppdata.initial_vol.data(), cppdata.initial_vol.size());
    Map<Eigen::VectorXd> eigen_maturity_time(cppdata.maturity_time.data(), cppdata.maturity_time.size());
    Map<Eigen::VectorXd> eigen_long_term_vol(cppdata.long_term_vol.data(), cppdata.long_term_vol.size());
    Map<Eigen::VectorXd> eigen_mean_reversion_rate(cppdata.mean_reversion_rate.data(), cppdata.mean_reversion_rate.size());
    Map<Eigen::VectorXd> eigen_vol_vol(cppdata.vol_vol.data(), cppdata.vol_vol.size());
    Map<Eigen::VectorXd> eigen_price_vol_corr(cppdata.price_vol_corr.data(), cppdata.price_vol_corr.size());
    // Output Data
    Map<Eigen::VectorXd> eigen_option_price(cppdata.option_price.data(), cppdata.option_price.size());

    const int cols{ 11 };        // No. of columns
    const int rows = SIZE * 4;
    // Define a matrix that stores the training data to be used in Python
    MatrixXd training(rows, cols);
    // Initialise each column using the vectors
    training.col(0) = eigen_spot_price;
    training.col(1) = eigen_strike_price;
    training.col(2) = eigen_risk_free_rate;
    training.col(3) = eigen_dividend_yield;
    training.col(4) = eigen_initial_vol;
    training.col(5) = eigen_maturity_time;
    training.col(6) = eigen_long_term_vol;
    training.col(7) = eigen_mean_reversion_rate;
    training.col(8) = eigen_vol_vol;
    training.col(9) = eigen_price_vol_corr;
    training.col(10) = eigen_option_price;

    std::cout << training << std::endl;

    return training;
}

PYBIND11_MODULE(Data_Generation, m) {
    m.def("module", &module);

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

Note that the module function's return type is an Eigen matrix, which can then be used directly as a NumPy data type in Python.

3. The C++ function can then be called from Python as shown:

import Data_Generation as dg
import mysql.connector
import pandas as pd
import sqlalchemy as db
import time

# This program calls the analytical Heston option pricer in C++ to simulate
# 5,000,000 option prices and store them in a MySQL database.

# ... do something ...

if __name__ == "__main__":
    t0 = time.time()

    # A new table (optional)
    SQL_clean_table()

    target_size = 5000000   # Final table size
    batch_size = 500000     # Size generated each iteration

    # Get the number of rows existing in the table
    r = SQL_setup(batch_size)

    # Get the number of iterations
    iter = int(target_size / batch_size) - int(r / batch_size)
    print("Iteration(s) required:", iter)
    count = 0

    print("Now Run C++ code...")
    for i in range(iter):
        count = count + 1
        print("----------------------------------------------------")
        print("Current iteration:", count)
        cpp = dg.module()   # store data from C++

        # ... do something ...

        r1 = SQL_setup(batch_size)
        print("No. of rows in SQL Classic Heston Table Now:", r1)
        print("\n")
    t1 = time.time()
    # Calculate total time for the entire program
    total = t1 - t0
    print("THE ENTIRE PROGRAM TAKES", round(total), "s TO COMPLETE")


Appendix B

Asymptotic Expansions for General Stochastic Volatility Models

B.1 Asymptotic Expansions

We present the asymptotic expansions for general stochastic volatility models by (Lorig et al., 2019). Consider a strictly positive spot price S whose risk-neutral dynamics are given by

    S = e^X,                                                                       (B.1)
    dX = -\frac{1}{2}\sigma^2(t, X, Y)\,dt + \sigma(t, X, Y)\,dW_1,  X(0) = x \in \mathbb{R},   (B.2)
    dY = f(t, X, Y)\,dt + \beta(t, X, Y)\,dW_2,  Y(0) = y \in \mathbb{R},          (B.3)
    \langle dW_1, dW_2 \rangle = \rho(t, X, Y)\,dt,  |\rho| < 1.                   (B.4)

The drift of X is selected to be -\frac{1}{2}\sigma^2 so that S = e^X is a martingale.

Let C(t) be the time-t value of a European call option maturing at time T > t with payoff function P(X(T)). To value a European-style option using risk-neutral pricing, we must compute functions of the form

    u(t, x, y) := E[P(X(T)) | X(t) = x, Y(t) = y].

It is well known that the function u satisfies the Kolmogorov backward equation (Lorig et al., 2019)

    (\partial_t + \mathcal{A}(t))\,u(t, x, y) = 0,  u(T, x, y) = P(x),             (B.5)

where the operator \mathcal{A}(t) is given explicitly by

    \mathcal{A}(t) = a(t, x, y)(\partial_x^2 - \partial_x) + f(t, x, y)\partial_y + b(t, x, y)\partial_y^2 + c(t, x, y)\partial_x \partial_y,   (B.6)

and where the functions a, b and c are defined as

    a(t, x, y) = \frac{1}{2}\sigma^2(t, x, y),
    b(t, x, y) = \frac{1}{2}\beta^2(t, x, y),
    c(t, x, y) = \rho(t, x, y)\,\sigma(t, x, y)\,\beta(t, x, y).


Appendix C

Deep Neural Networks Library in Python: Keras

We show the code snippet for computing an ANN in Keras: the Python code defining the ANN architecture for rough Heston implied volatility approximation.

# Define the neural network architecture
import keras
from keras import backend as K
keras.backend.set_floatx('float64')
from keras.callbacks import EarlyStopping

# Construct the model
input_layer = keras.layers.Input(shape=(n,))

hidden_layer1 = keras.layers.Dense(80, activation='elu')(input_layer)
hidden_layer2 = keras.layers.Dense(80, activation='elu')(hidden_layer1)
hidden_layer3 = keras.layers.Dense(80, activation='elu')(hidden_layer2)

output_layer = keras.layers.Dense(100, activation='linear')(hidden_layer3)

model = keras.models.Model(inputs=input_layer, outputs=output_layer)

# Summarise layers
model.summary()

# User-defined metrics: R2 and Max Abs Error
# R2
def coeff_determination(y_true, y_pred):
    SS_res = K.sum(K.square(y_true - y_pred))
    SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return (1 - SS_res / (SS_tot + K.epsilon()))

# Max Error
def max_error(y_true, y_pred):
    return (K.max(abs(y_true - y_pred)))

earlystop = EarlyStopping(monitor="val_loss",
                          min_delta=0,
                          mode="min",
                          verbose=1,
                          patience=25)

# Prepare the model for training
opt = keras.optimizers.Adam(learning_rate=0.005)
model.compile(loss='mse', optimizer='adam',
              metrics=['mae', coeff_determination, max_error])


Bibliography

Alos, Elisa et al. (June 2017). "Exponentiation of Conditional Expectations Under Stochastic Volatility". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2983180

Andersen, Leif et al. (Jan 2007). "Moment Explosions in Stochastic Volatility Models". In: Finance and Stochastics 11, pp. 29-50. DOI: 10.1007/s00780-006-0011-7

Atkinson, Colin et al. (Jan 2011). "Rational Solutions for the Time-Fractional Diffusion Equation". URL: https://www.researchgate.net/publication/220222966_Rational_Solutions_for_the_Time-Fractional_Diffusion_Equation

Bayer, Christian et al. (Oct 2018). "Deep calibration of rough stochastic volatility models". URL: https://arxiv.org/abs/1810.03399

Black, Fischer et al. (May 1973). "The Pricing of Options and Corporate Liabilities". URL: https://www.cs.princeton.edu/courses/archive/fall09/cos323/papers/black_scholes73.pdf

Carr, Peter and Dilip B. Madan (Mar 1999). "Option valuation using the fast Fourier transform". In: Journal of Computational Finance. URL: https://www.researchgate.net/publication/2519144_Option_Valuation_Using_the_Fast_Fourier_Transform

Chapter 2 Modeling Process: k-fold cross validation. https://bradleyboehmke.github.io/HOML/process.html. Accessed 2020-08-22

Comte, Fabienne et al. (Oct 1998). "Long Memory In Continuous-Time Stochastic Volatility Models". In: Mathematical Finance 8, pp. 291-323. URL: https://zulfahmed.files.wordpress.com/2015/10/comterenault19981.pdf

Cox, John C. et al. (Jan 1985). "A Theory of the Term Structure of Interest Rates". URL: http://pages.stern.nyu.edu/~dbackus/BCZ/discrete_time/CIR_Econometrica_85.pdf

Create a C++ extension for Python. https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019. Accessed 2020-07-21

Duffy, Daniel J. (Jan 2018). Financial Instrument Pricing Using C++, Second Edition. John Wiley & Sons. ISBN: 978-0-470-97119-2

Duffy, Daniel J. et al. (2012). Monte Carlo Frameworks: Building customisable high-performance C++ applications. John Wiley & Sons, Inc. ISBN: 9780470060698

Dupire, Bruno (1994). "Pricing with a Smile". In: Risk Magazine, pp. 18-20

Eigen: The Matrix Class. https://eigen.tuxfamily.org/dox/group__TutorialMatrixClass.html. Accessed 2020-07-22

Eldan, Ronen et al. (May 2016). "The Power of Depth for Feedforward Neural Networks". URL: https://arxiv.org/abs/1512.03965

Euch, Omar El et al. (Sept 2016). "The characteristic function of rough Heston models". URL: https://arxiv.org/pdf/1609.02108.pdf

Euch, Omar El et al. (Mar 2019). "Roughening Heston". URL: https://ssrn.com/abstract=3116887

Fouque, Jean-Pierre et al. (Aug 2012). "Second Order Multiscale Stochastic Volatility Asymptotics: Stochastic Terminal Layer Analysis & Calibration". URL: https://arxiv.org/abs/1208.5802

Gatheral, Jim (2006). The Volatility Surface: A Practitioner's Guide. John Wiley & Sons, Inc. ISBN: 978-0-471-79251-2

Gatheral, Jim et al. (Oct 2014). "Volatility is rough". URL: https://arxiv.org/abs/1410.3394

Gatheral, Jim et al. (Jan 2019). "Rational approximation of the rough Heston solution". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3191578

Hebb, Donald O. (1949). The Organization of Behavior: A Neuropsychological Theory. Psychology Press, 1st edition (12 Jun 2002). ISBN: 978-0805843002

Heston, Steven (Jan 1993). "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options". URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.3204&rep=rep1&type=pdf

Hinton, Geoffrey et al. (Feb 2015). "Overview of mini-batch gradient descent". URL: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

Hornik et al. (Mar 1989). "Multilayer feedforward networks are universal approximators". URL: https://deeplearning.cs.cmu.edu/F20/document/readings/Hornik_Stinchcombe_White.pdf

Hornik et al. (Jan 1990). "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks". URL: http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/univ_approx.pdf

Horvath, Blanka et al. (Jan 2019). "Deep Learning Volatility". URL: https://ssrn.com/abstract=3322085

Hurst, Harold E. (Jan 1951). "Long-term storage of reservoirs: an experimental study". URL: https://www.jstor.org/stable/2982267

Hutchinson, James M. et al. (July 1994). "A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks". URL: https://alo.mit.edu/wp-content/uploads/2015/06/A-Nonparametric-Approach-to-Pricing-and-Hedging-Derivative-Securities-via-Learning-Networks.pdf

Ioffe, Sergey et al. (Feb 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". URL: https://arxiv.org/abs/1502.03167

Itkin, Andrey (Jan 2010). "Pricing options with VG model using FFT". URL: https://arxiv.org/abs/physics/0503137

Kahl, Christian et al. (Jan 2006). "Not-so-complex Logarithms in the Heston model". URL: http://www2.math.uni-wuppertal.de/~kahl/publications/NotSoComplexLogarithmsInTheHestonModel.pdf

Kingma, Diederik P. et al. (Feb 2015). "Adam: A Method for Stochastic Optimization". URL: https://arxiv.org/abs/1412.6980

Kwok, Yue-Kuen et al. (2012). Handbook of Computational Finance, Chapter 21. Springer-Verlag Berlin Heidelberg. ISBN: 978-3-642-17254-0

Lewis, Alan (Sept 2001). "A Simple Option Formula for General Jump-Diffusion and Other Exponential Levy Processes". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=282110

Liu, Shuaiqiang et al. (Apr 2019a). "A neural network-based framework for financial model calibration". URL: https://arxiv.org/abs/1904.10523

Liu, Shuaiqiang et al. (Apr 2019b). "Pricing options and computing implied volatilities using neural networks". URL: https://arxiv.org/abs/1901.08943

Lorig, Matthew et al. (Jan 2019). "Explicit implied vols for multifactor local-stochastic vol models". URL: https://arxiv.org/abs/1306.5447v3

Mandara, Dalvir (Sept 2019). "Artificial Neural Networks for Black-Scholes Option Pricing and Prediction of Implied Volatility for the SABR Stochastic Volatility Model". URL: https://www.datasim.nl/application/files/8115704549291423101.pdf

McCulloch, Warren et al. (May 1943). "A Logical Calculus of The Ideas Immanent in Nervous Activity". URL: http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf

Mostafa, F. et al. (May 2008). "A neural network approach to option pricing". URL: https://www.witpress.com/elibrary/wit-transactions-on-information-and-communication-technologies/41/18904

Peters, Edgar E. (Jan 1991). Chaos and Order in the Capital Markets: A New View of Cycles, Prices and Market Volatility. John Wiley & Sons. ISBN: 978-0-471-13938-6

Peters, Edgar E. (Jan 1994). Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons. ISBN: 978-0-471-58524-4. URL: https://www.amazon.co.uk/Fractal-Market-Analysis-Investment-Economics/dp/0471585246

pybind11 Documentation. https://pybind11.readthedocs.io/en/stable/advanced/cast/eigen.html. Accessed 2020-07-22

Ruf, Johannes et al. (May 2020). "Neural networks for option pricing and hedging: a literature review". URL: https://arxiv.org/abs/1911.05620

Schmelzle, Martin (Apr 2010). "Option Pricing Formulae using Fourier Transform: Theory and Application". URL: https://pfadintegral.com/docs/Schmelzle%202010%20Fourier%20Pricing.pdf

Shevchenko, Georgiy (Jan 2015). "Fractional Brownian motion in a nutshell". In: International Journal of Modern Physics: Conference Series 36. URL: https://www.worldscientific.com/doi/abs/10.1142/S2010194515600022

Stone, Henry (July 2019). "Calibrating Rough Volatility Models: A Convolutional Neural Network Approach". URL: https://arxiv.org/abs/1812.05315

Using the C++ eigen library to calculate matrix inverse and determinant. http://people.duke.edu/~ccc14/sta-663-2016/18G_C++_Python_pybind11.html. Accessed 2020-07-22

Wilmott, Paul (2006). Paul Wilmott on Quantitative Finance, Vol. 3. John Wiley & Sons, Inc., 2nd Edition. ISBN: 978-0-470-01870-5

Contents

• Declaration of Authorship
• Abstract
• Acknowledgements
• Projects Overview
• Introduction
  • Organisation of This Thesis
• Literature Review
  • Application of ANN in Finance
  • Option Pricing Models
  • The Research Gap
• Option Pricing Models
  • The Heston Model
    • Option Pricing Under Heston Model
    • Heston Model Implied Volatility Approximation
  • Volatility is Rough
  • The Rough Heston Model
    • Rough Heston Pricing Equation
      • Rational Approximation of Rough Heston Riccati Solution
      • Option Pricing Using Fast Fourier Transform
    • Rough Heston Model Implied Volatility
• Artificial Neural Networks
  • Dissecting the Artificial Neural Networks
    • Neurons
    • Input, Output and Hidden Layers
    • Connections, Weights and Biases
    • Forward and Backward Propagation, Loss Functions
    • Activation Functions
    • Optimisation Algorithms and Learning Rate
    • Epochs, Early Stopping and Batch Size
  • Feedforward Neural Networks
    • Deep Neural Networks
  • Common Pitfalls and Remedies
    • Loss Function Stagnation
    • Loss Function Fluctuation
    • Over-fitting
    • Under-fitting
  • Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation
• Data
  • Data Generation and Storage
    • Heston Call Option Data
    • Heston Implied Volatility Data
    • Rough Heston Call Option Data
    • Rough Heston Implied Volatility Data
  • Data Pre-processing
    • Data Splitting: Train - Validate - Test
    • Data Standardisation and Normalisation
    • Data Partitioning
• Neural Networks Architecture and Experimental Setup
  • Neural Networks Architecture
    • ANN Architecture: Option Pricing under Heston Model
    • ANN Architecture: Implied Volatility Surface of Heston Model
    • ANN Architecture: Option Pricing under Rough Heston Model
    • ANN Architecture: Implied Volatility Surface of Rough Heston Model
  • Performance Metrics and Validation Methods
  • Implementation of ANN in Keras
  • Summary of Neural Networks Architecture
• Results and Discussions
  • Heston Option Pricing ANN Results
  • Heston Implied Volatility ANN Results
  • Rough Heston Option Pricing ANN Results
  • Rough Heston Implied Volatility ANN Results
  • Run-time Performance
• Conclusions and Outlook
• pybind11: A step-by-step tutorial with examples
  • Software Prerequisites
  • Create a Python Project
  • Create a C++ Project
  • Convert the C++ project to extension for Python
  • Make the DLL available to Python
  • Call the DLL from Python
  • Example: Store C++ Eigen Matrix as NumPy Arrays
  • Example: Store Data Generated by Analytical Heston Model as NumPy Arrays
• Asymptotic Expansions for General Stochastic Volatility Models
  • Asymptotic Expansions
• Deep Neural Networks Library in Python: Keras
• Bibliography

Chapter 0

Project's Overview

This chapter intends to give a brief overview of the methodology of the project without delving too much into the details.

0.1 Project's Overview

It was advised by the supervisor to divide the problem into smaller achievable steps, like (Mandara, 2019) did. The 4 main steps are:

• Define the objective

• Formulate the procedures

• Implement the procedures

• Evaluate the results

The objective is to investigate the performance of artificial neural networks (ANN) on vanilla option pricing and implied volatility approximation under the rough Heston model. The performance metrics in this context are accuracy and run-time performance.

Figure 1 shows the flow chart of the implementation procedures. The procedure starts with synthetic data generation. Then we split the synthetic data set into 3 portions: one for training (in-sample), one for validation and the last one for testing (out-of-sample). Before the data is fed into the ANN for training, it needs to be scaled or normalised, as it has different magnitudes and ranges. In this thesis we use the training method proposed by (Horvath et al., 2019), the image-based implicit training approach, for implied volatility surface approximation. Then we test the trained ANN model with an unseen data set to evaluate its accuracy.

The run-time performance of the ANN will be compared with that of the traditional approximation methods. This project requires the use of two programming languages: C++ and Python. The data generation process is done in C++. The wide range of machine learning libraries (Keras, TensorFlow, PyTorch) in Python makes it the logical choice for the neural network computations. The experiment starts with the classical Heston model as a concept validation first and then proceeds to the rough Heston model.


[Figure 1: Implementation Flow Chart. Option Pricing Models (Heston, rough Heston) → Data Generation (strike price, maturity, volatility etc.) → Data Pre-processing (scaled/normalised training data) → ANN Architecture Design → ANN Training → Trained ANN → ANN Prediction → ANN Results Evaluation (accuracy of option price and implied vol, and runtime performance) → Conclusions]


[Figure 2: Data flow diagram for ANN on Rough Heston Pricer. Model parameters are input to the RoughHestonPricer, which outputs option prices into a RoughHestonData store; the training data and the ANN model hyperparameters feed the KerasANNTrainer; the trained model weights are stored and used by ANNPredictions to produce predicted option prices, which AccuracyEvaluation compares against the test data (option prices) to output error metrics]


Chapter 1

Introduction

Option pricing has always been a major research area in financial mathematics. Various methods and techniques have been developed to price options more accurately and more quickly since the introduction of the well-known Black-Scholes model in 1973 (Black et al., 1973). The Black-Scholes model has an analytical solution and has only one parameter, volatility, that requires calibration. This makes it very easy to implement. Its simplicity comes at the expense of many unrealistic assumptions. One of its biggest flaws is the assumption of a constant volatility term. This assumption simply does not hold in the real market. With this assumption, the Black-Scholes model fails to capture the volatility dynamics exhibited by the market and so is unable to value options accurately.

In 1993, (Heston, 1993) developed a stochastic volatility model in an attempt to better capture the volatility dynamics. The Heston model assumes the volatility of an underlying asset to follow a geometric Brownian motion (Hurst parameter H = 1/2). Similarly, the Heston model also has a closed-form analytical solution, and this makes it a strong alternative to the Black-Scholes model.

Whilst the Heston model is more realistic than the Black-Scholes model, the generated option prices do not match the observed vanilla option prices consistently (Gatheral et al., 2014). Recent research shows that more accurate option prices can be estimated if the volatility process is modelled as a fractional Brownian motion with H < 1/2. When H < 1/2 the volatility process path appears to be rougher than when H ≥ 1/2, and hence the volatility is described as rough.

(Euch et al., 2019) introduced the rough volatility version of the Heston model, which combines the best of both worlds. However, the calibration of the rough Heston model relies on the notoriously slow Monte Carlo method due to its non-Markovian nature. This obstacle is the reason that rough Heston struggles to find its way into industrial applications. Although there exist many approximation methods for the rough Heston model to help solve this problem, we are interested in the feasibility of machine learning algorithms.

Recent advancements in machine learning algorithms could potentially provide an alternative means for this problem. Specifically, artificial neural networks (ANN) offer the best hope, because they are capable of learning complex functional relationships between input and output data and then making predictions instantly.


In this project, our objective is to investigate the accuracy and run-time performance of ANN on European call option price predictions and implied volatility approximations under the rough Heston model. We start off by using the Heston model as a concept validation, then proceed to its rough volatility version. For regulatory purposes, we use data simulated by option pricing models for training.

1.1 Organisation of This Thesis

This thesis is organised as follows. In Chapter 2 we review some of the related work and provide justifications for our research. In Chapter 3 we discuss the theory of the option pricing models of our interest. We provide a brief introduction to ANN and some techniques to improve their performance in Chapter 4. Then we explain the tools we used for the data generation process and some useful techniques to speed up the process in Chapter 5; in the same chapter we also discuss the important data pre-processing procedures. Chapter 6 shows our network architecture and experimental setups. We present the results and discuss our findings in Chapter 7. Finally, we conclude our findings and discuss potential future work in Chapter 8.

[Figure 1.1: Problem Solving Process. (START) Problem: rough Heston model has low industrial application despite giving accurate option prices → Root cause: slow calibration (pricing & implied vol approximation) process → Potential solution: apply ANN to pricing and implied volatility approximation → Implementation: ANN computation using Keras in Python → Evaluation: accuracy and speed of ANN predictions → (END) Conclusions: ANN is feasible or not]


Chapter 2

Literature Review

In this chapter we intend to provide a review of the current literature and highlight the research gap. The focus of our review is the application of ANN in option pricing models. We justify our methodology and our choice of option pricing models, and we also provide some critiques.

2.1 Application of ANN in Finance

Option pricing has always been a hot topic in academia and the financial industry. Researchers have been looking for various means to price options more accurately and quickly. As computational technology advances, the application of ANN has gained popularity in solving complex problems in many different industries. In quantitative finance, the idea of utilising ANN for pricing derivatives comes from its ability to make fast and accurate predictions once it is trained.

The application of ANN in option pricing began with (Hutchinson et al., 1994). The authors were the pioneers in applying ANN for pricing and hedging derivatives. Since then, there have been more than 100 research papers on this topic. It is also worth noting that the complexity of the ANN architectures used in these papers has increased over the years. Whilst there has been much research on the applications of ANN in option pricing, only about 10% of the papers focus on implied volatility (Ruf et al., 2020). The feasibility of using ANN for implied volatility approximation is equally as important as for option pricing, since they are both part of the calibration process. Recently, more research papers have focused on implied volatility and calibration, such as (Mostafa et al., 2008), (Liu et al., 2019a), (Liu et al., 2019b) and (Horvath et al., 2019). A key contribution of (Horvath et al., 2019)'s work is the introduction of the image-based implicit training method for implied volatility surface approximation. This method is allegedly robust and fast for approximating the entire implied volatility surface.

A significant portion of the research papers use real market data in their research. Currently there is no standardised framework for deciding the architecture and parameters of an ANN. Hence, training the ANN directly with market data might pose some issues when it comes to regulatory compliance. (Horvath et al., 2019) is one of the few papers that uses simulated data for ANN training. The simulated data is generated from the Heston and rough Bergomi models. Using simulated data removes the regulatory concerns, as the functional mapping from model parameters to option price is known and tractable.

The application of ANN requires splitting the entire data set for training, validation and testing. Most of the papers that use real market data ensure the data splitting process is chronological to preserve the time series structure of the data. This is not a concern in our case, as we are using simulated data.

The results in (Horvath et al., 2019) show very accurate predictions. However, after examining the source code, we discovered that the ANN is validated and tested on the same data set. Thus, the high accuracy results do not give any information about how well the trained ANN generalises to unseen data. In our experiment we will use different data sets for validation and testing.

Common error metrics used include the mean absolute error (MAE), mean absolute percentage error (MAPE) and mean squared error (MSE). We suggest a new metric, the maximum absolute error, as it is useful from a risk management perspective for quantifying the largest prediction error in the worst-case scenario.

2.2 Option Pricing Models

In terms of the choice of option pricing models, over 80% of the papers select the Black-Scholes model or its extensions in their research (Ruf et al., 2020). As mentioned before, there are other models that can give better option prices than Black-Scholes, such as the stochastic volatility models. (Liu et al., 2019b) investigated the application of ANN in the Heston and Bates models.

According to (Gatheral et al., 2014), modelling the volatility as rough volatility can better capture the volatility dynamics in the market. Rough volatility models struggle to find their way into the industry due to their slow calibration process. Hence it is natural for the direction of ANN research to move towards rough volatility models.

Some of the recent examples include (Stone, 2019), (Bayer et al., 2018) and (Horvath et al., 2019). (Stone, 2019) investigates the calibration of the rough Bergomi model using Convolutional Neural Networks (CNN), whilst (Bayer et al., 2018) study the calibration of the Heston and rough Bergomi models using Deep Neural Networks (DNN). Contrary to these papers, where calibration is performed directly via ANN in one step, (Horvath et al., 2019) split the calibration of the rough Bergomi model into two steps. Firstly, the ANN is trained to learn the pricing equation during a so-called offline session. Then the trained ANN is deterministic and stored for an online calibration session later. This approach can speed up the online calibration by a huge margin.

2.3 The Research Gap

In our thesis we will be using the approach in (Horvath et al., 2019); that is, to train the network to learn the pricing and implied volatility functional mappings separately. We leave the model calibration step as the future outlook of this project. We choose to work with the rough Heston model, another popular rough volatility model that has not been investigated yet. We adapt the image-based implicit training method in our research. To re-iterate, we validate and test our ANN with different data sets to obtain a true measure of the ANN's performance on unseen data.


Chapter 3

Option Pricing Models

In this chapter we discuss the option pricing models of our interest: the Heston and rough Heston models. The idea of modelling volatility as a rough fractional Brownian motion will also be presented. Then we derive their pricing and implied volatility equations and explain their implementations for data generation.

The Black-Scholes model is simple but unrealistic for real world applications, as its assumptions simply do not hold. One of its greatest criticisms is the assumption of constant volatility of the underlying asset price. (Dupire, 1994) extended the Black-Scholes framework to give local volatility models. He modelled the volatility as a deterministic function of the underlying price and time, with the function selected such that its outputs match observed European option prices. This extension is time-inhomogeneous and the dynamics it exhibits are still unrealistic; thus it is not able to produce future volatility surfaces that emulate our observations.

Stochastic Volatility (SV) models were then introduced, in which the volatility of the underlying is modelled as a geometric Brownian motion. Recent research shows that rough volatility can capture the true volatility dynamics better, and this leads to the rough volatility models.

3.1 The Heston Model

One of the most popular SV models is the one proposed by (Heston, 1993); its popularity is due to the fact that its pricing equation for European options is analytically tractable. This property allows calibration procedures to be done efficiently.

The Heston model for a one-dimensional spot price S follows the stochastic process (a simulation sketch follows the parameter list below)

$$dS = \mu S\,dt + \sqrt{\nu}\,S\,dW_1 \tag{3.1}$$

and its instantaneous variance ν follows a Cox-Ingersoll-Ross (CIR) process (Cox et al., 1985)

$$d\nu = \kappa(\theta - \nu)\,dt + \xi\sqrt{\nu}\,dW_2 \tag{3.2}$$
$$\langle dW_1, dW_2 \rangle = \rho\,dt \tag{3.3}$$

where

• µ is the (deterministic) rate of return of the spot

• θ is the long-term mean volatility

• ξ is the volatility of volatility

• κ is the mean reversion rate of volatility

• ρ is the correlation between spot and volatility moves

• W₁ and W₂ are Wiener processes
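This thesis prices Heston options via the characteristic function rather than by simulation, but as an illustration of the dynamics (3.1)-(3.3), a minimal full-truncation Euler discretisation might look as follows; all parameter values are placeholders chosen only for the example.

import numpy as np

def simulate_heston_path(S0=100.0, v0=0.04, mu=0.0, kappa=2.0,
                         theta=0.04, xi=0.5, rho=-0.7,
                         T=1.0, n_steps=252, seed=42):
    # Full-truncation Euler scheme for the Heston SDEs (3.1)-(3.3)
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    S = np.empty(n_steps + 1)
    v = np.empty(n_steps + 1)
    S[0], v[0] = S0, v0
    for i in range(n_steps):
        z1 = rng.standard_normal()
        # correlate the two Brownian increments with coefficient rho
        z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal()
        vp = max(v[i], 0.0)  # truncate negative variance at zero
        S[i + 1] = S[i] + mu * S[i] * dt + np.sqrt(vp * dt) * S[i] * z1
        v[i + 1] = v[i] + kappa * (theta - vp) * dt + xi * np.sqrt(vp * dt) * z2
    return S, v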

3.1.1 Option Pricing Under Heston Model

In this section we follow (Gatheral, 2006) closely. We start by forming a portfolio Π containing the option being priced V, a quantity −Δ of stock, and a quantity −Δ₁ of another asset whose value is V₁:

$$\Pi = V - \Delta S - \Delta_1 V_1$$

The change in the portfolio during dt is

$$d\Pi = \left[\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial\nu^2}\right]dt - \Delta_1\left[\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V_1}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V_1}{\partial\nu^2}\right]dt + \left[\frac{\partial V}{\partial S} - \Delta_1\frac{\partial V_1}{\partial S} - \Delta\right]dS + \left[\frac{\partial V}{\partial\nu} - \Delta_1\frac{\partial V_1}{\partial\nu}\right]d\nu$$

Note the explicit dependence of S and ν on t has been removed for simplicity. We can make the portfolio instantaneously risk-free by eliminating the dS and dν terms:

$$\frac{\partial V}{\partial S} - \Delta_1\frac{\partial V_1}{\partial S} - \Delta = 0$$

and

$$\frac{\partial V}{\partial\nu} - \Delta_1\frac{\partial V_1}{\partial\nu} = 0$$

So we are left with

$$d\Pi = \left[\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial\nu^2}\right]dt - \Delta_1\left[\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V_1}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V_1}{\partial\nu^2}\right]dt = r\Pi\,dt = r(V - \Delta S - \Delta_1 V_1)\,dt$$


Then we apply the no-arbitrage principle: the return of a risk-free portfolio must be equal to the risk-free rate. We assume the risk-free rate to be deterministic for our case.

Rearranging the above equation by collecting V terms on the left-hand side and V₁ terms on the right-hand side gives

$$\frac{\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial\nu^2} + rS\frac{\partial V}{\partial S} - rV}{\frac{\partial V}{\partial\nu}} = \frac{\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V_1}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V_1}{\partial\nu^2} + rS\frac{\partial V_1}{\partial S} - rV_1}{\frac{\partial V_1}{\partial\nu}}$$

Since the left-hand side involves only V and the right-hand side only V₁, both sides must be equal to some function of the independent variables, f(S, ν, t). Thus we have

$$\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial\nu^2} + rS\frac{\partial V}{\partial S} - rV = f\,\frac{\partial V}{\partial\nu}$$

In the case of the Heston model, f is chosen as

$$f = -\left(\kappa(\theta - \nu) - \lambda\sqrt{\nu}\right)$$

where λ(S, ν, t) is called the market price of volatility risk. We do not delve into the choice of f and the concept of λ(S, ν, t) any further here; see (Wilmott, 2006) for his arguments.

(Heston, 1993) assumed in his paper that λ(S, ν, t) is linear in the instantaneous variance ν_t, to retain the form of the equation under the transformation from the statistical measure to the risk-neutral measure. Here we are only interested in pricing the options, so we can set λ(S, ν, t) to be zero (Gatheral, 2006).

Hence we arrive at

$$\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial\nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial\nu^2} + rS\frac{\partial V}{\partial S} - rV = \kappa(\nu - \theta)\frac{\partial V}{\partial\nu} \tag{3.4}$$

According to (Heston, 1993), the price of an undiscounted European call option C with strike price K and time to maturity T has a solution of the form

$$C(S, K, \nu, T) = \frac{1}{2}(F - K) + \frac{1}{\pi}\int_0^\infty (F \cdot f_1 - K \cdot f_2)\,du \tag{3.5}$$


where

$$f_1 = \Re\left[e^{-iu\ln K}\,\frac{\varphi(u - i)}{iuF}\right]$$
$$f_2 = \Re\left[e^{-iu\ln K}\,\frac{\varphi(u)}{iu}\right]$$
$$\varphi(u) = E(e^{iu\ln S_T})$$
$$F = Se^{\mu T}$$

The evaluation of (3.5) requires the computation of Fourier inversion integrals, and it is susceptible to numerical instabilities due to the complex logarithms involved. (Kahl et al., 2006) proposed an integration scheme that computes the Fourier inversion integral by applying some transformations to the asymptotic structure. Thus the solution becomes

$$C(S, K, \nu, T) = \int_0^1 y(x)\,dx, \quad x \in \mathbb{R} \tag{3.6}$$

where

$$y(x) = \frac{1}{2}(F - K) + \frac{F \cdot f_1\!\left(-\frac{\ln x}{C_\infty}\right) - K \cdot f_2\!\left(-\frac{\ln x}{C_\infty}\right)}{x\,\pi\,C_\infty}$$

The limits at the boundaries of the integral are

$$\lim_{x\to 0} y(x) = \frac{1}{2}(F - K)$$

and

$$\lim_{x\to 1} y(x) = \frac{1}{2}(F - K) + \frac{F\lim_{u\to 0}f_1(u) - K\lim_{u\to 0}f_2(u)}{\pi\,C_\infty}$$

Equation (3.6) provides a more robust pricing method for options with moderate to long maturities or strong mean reversion. For the full derivation of equation (3.6), see (Kahl et al., 2006). The source code to implement equation (3.6) was provided by the supervisor and is the source code for (Duffy et al., 2012).

3.1.2 Heston Model Implied Volatility Approximation

Before the model is used for pricing options, the model's parameters need to be calibrated so that it can return current market prices (at least to some degree of accuracy). That is, the unmeasurable model parameters are obtained by calibrating to implied volatilities that are observed in the market. Hence this motivates the need for fast implied volatility computation.

Unfortunately, there is no closed-form analytical formula for calculating implied volatility under the Heston model. There are, however, many closed-form approximations for Heston implied volatility.

In this project we select the implied volatility approximation method for Heston proposed by (Lorig et al., 2019), which allegedly outperforms other methods such as the one by (Fouque et al., 2012). (Lorig et al., 2019) derive a family of asymptotic expansions for European-style option implied volatility. Their method provides an explicit approximation and requires no special functions, not even numerical integration, and so it is faster to compute than its counterparts.

We follow closely the derivation steps outlined in (Lorig et al., 2019). Consider the Heston model in Section 3.1. According to (Andersen et al., 2007), ρ must be set to negative to avoid moment explosion. To circumvent this, we apply the following change of variables: (X(t), V(t)) = (ln S(t), e^{κt}ν). Using the asymptotic expansions for general stochastic volatility models (see Appendix B) and Itô's formula, we have

$$dX = -\frac{1}{2}e^{-\kappa t}V\,dt + \sqrt{e^{-\kappa t}V}\,dW_1, \quad X(0) = x = \ln s \tag{3.7}$$
$$dV = \theta\kappa e^{\kappa t}\,dt + \xi\sqrt{e^{\kappa t}V}\,dW_2, \quad V(0) = \nu = z > 0 \tag{3.8}$$
$$\langle dW_1, dW_2 \rangle = \rho\,dt \tag{3.9}$$

The parameters ν, κ, θ, ξ, W₁, W₂ play the same roles as in Section 3.1. Then we apply the second-order differential operator $\mathcal{A}(t)$ to (X, V):

$$\mathcal{A}(t) = \frac{1}{2}e^{-\kappa t}v(\partial_x^2 - \partial_x) + \theta\kappa e^{\kappa t}\partial_v + \frac{1}{2}\xi^2 e^{\kappa t}v\,\partial_v^2 + \xi\rho v\,\partial_x\partial_v$$

Define T as time to maturity and k as log-strike. Using the time-dependent Taylor series expansion of $\mathcal{A}(T)$ with $(\bar{x}(T), \bar{v}(T)) = (X_0, E[V(T)]) = (x, \theta(e^{\kappa T} - 1))$ gives (see Appendix B)

$$\sigma_0 = \sqrt{\frac{-\theta + \theta\kappa T + e^{-\kappa T}(\theta - v) + v}{\kappa T}}$$

$$\sigma_1 = \frac{\xi\rho z e^{-\kappa T}\left(-2v - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\right)}{\sqrt{2}\,\kappa^2\sigma_0^2 T^{3/2}}$$

$$\begin{aligned}
\sigma_2 = \frac{\xi^2 e^{-2\kappa T}}{32\kappa^4\sigma_0^5 T^3}\Big[&-2\sqrt{2}\,\kappa\sigma_0^3 T^{3/2} z\left(-v - 4e^{\kappa T}(v + \kappa T(v - v)) + e^{2\kappa T}(v(5 - 2\kappa T) - 2v) + 2v\right)\\
&+ \kappa\sigma_0^2 T(4z^2 - 2)\left(v + e^{2\kappa T}(-5v + 2v\kappa T + 8\rho^2(v(\kappa T - 3) + v) + 2v)\right)\\
&+ \kappa\sigma_0^2 T(4z^2 - 2)\left(4e^{\kappa T}(v + v\kappa T + \rho^2(v(\kappa T(\kappa T + 4) + 6) - v(\kappa T(\kappa T + 2) + 2)) - \kappa T v) - 2v\right)\\
&+ 4\sqrt{2}\,\rho^2\sigma_0\sqrt{T}\,z(2z^2 - 3)\left(-2 - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\right)^2\\
&+ 4\rho^2(4(z^2 - 3)z^2 + 3)\left(-2v - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\right)^2\Big]\\
&- \frac{\sigma_1^2\left(4(x - k)^2 - \sigma_0^4 T^2\right)}{8\sigma_0^3 T}
\end{aligned}$$

where

$$z = \frac{x - k - \sigma_0^2 T/2}{\sigma_0\sqrt{2T}}$$


The explicit expression for σ₃ is too long to be included here; see (Lorig et al., 2019) for the full derivation. The explicit expression for the implied volatility approximation is

$$\sigma_{\text{implied}} = \sigma_0 + \sigma_1 z + \sigma_2 z^2 + \sigma_3 z^3 \tag{3.10}$$

The C++ source code to implement this is available here: Heston Implied Volatility Approximation.
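As an illustration, the leading-order term σ₀ of expansion (3.10) is straightforward to code. The following is a sketch of that single term only, not of the full C++ implementation referenced above; the function name is ours.

import numpy as np

def sigma0(theta, kappa, v, T):
    # Leading-order implied vol term: the square root of the
    # time-averaged expected variance, as in the sigma_0 expression
    num = -theta + theta * kappa * T + np.exp(-kappa * T) * (theta - v) + v
    return np.sqrt(num / (kappa * T))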


3.2 Volatility is Rough

We start this section by introducing the fractional Brownian motion (fBm), which is a generalisation of Brownian motion.

Definition 3.2.1. Fractional Brownian Motion (fBm) is a centered Gaussian process $B^H_t$, $t \ge 0$, with the autocovariance function

$$E[B^H_t B^H_s] = \frac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\right) \tag{3.11}$$

where H ∈ (0, 1) is the Hurst index or Hurst parameter associated with the fractional Brownian motion.

The Hurst parameter is a measure of the long-term memory of a time series. It was first introduced by (Hurst, 1951) for modelling water levels of the Nile river in order to determine the optimum dam size. As it turns out, it is also suitable for modelling time series data from other natural systems. (Peters, 1991) and (Peters, 1994) are some of the earliest examples of applying this concept to financial time series modelling. A time series can be classified into 3 categories based on the Hurst parameter (a simulation sketch follows the list):

1. 0 < H < 1/2: negative correlation in the increments/decrements of the process. A mean-reverting process, which implies an increase in value is more likely followed by a decrease in value, and vice versa. The smaller the value, the greater the mean-reverting effect.

2. H = 1/2: no correlation in the increments/decrements of the process (geometric Brownian motion or Wiener process).

3. 1/2 < H < 1: positive correlation in the increments/decrements of the process. A trend-reinforcing process, which means the direction of the next value is more likely the same as the current value. The strength of the trend is greater when H is closer to 1.

Empirical evidence shows that log-volatility time series behave similarly to a fBm with Hurst parameter of order 0.1 (Gatheral et al., 2014). Essentially all the statistical stylised facts of volatility can be captured when modelling it as a rough fBm.

This motivates the use of the fractional stochastic volatility (FSV) model by (Comte et al., 1998). Our model is called the rough fractional stochastic volatility (RFSV) model to emphasise that H < 1/2. The term 'rough' refers to the roughness of the path of the fBm. From Figure 3.1, one can notice that the fBm path gets 'rougher' as H decreases.


[Figure 3.1: Paths of fBm for different values of H. Adapted from (Shevchenko, 2015)]

3.3 The Rough Heston Model

In this section we introduce the rough volatility version of the Heston model and derive its pricing and implied volatility equations.

As stated before, rough volatility models can fit the volatility surface notably well with very few parameters. Hence it is natural to modify the Heston model and consider its rough version.

The rough Heston model for a one-dimensional asset price S with instantaneous variance ν(t) takes the form in (Euch et al., 2016)

$$dS = S\sqrt{\nu}\,dW_1$$
$$\nu(t) = \nu(0) + \frac{1}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\kappa(\theta - \nu(s))\,ds + \frac{\xi}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\sqrt{\nu(s)}\,dW_2 \tag{3.12}$$
$$\langle dW_1, dW_2 \rangle = \rho\,dt$$

where

bull α = H + 12 isin (12 1) governs the roughness of the volatility sample

paths

bull κ is the mean reversion rate

bull θ is the long term mean volatility

bull ν(0) is the initial volatility

33 The Rough Heston Model 19

bull ξ is the volatility of volatility

bull Γ is the Gamma function

bull W1 and W2 are two Brownian motions with correlation ρ

When α = 1 (H = 0.5), we recover the classical Heston model of Section 3.1.

3.3.1 Rough Heston Pricing Equation

The pricing equation of the rough Heston model is inspired by the classical Heston one. In the classical Heston model, the option price can be obtained by applying Fourier inversion integrals to the characteristic function, which is expressed in terms of the solution of a Riccati equation. The rough Heston model displays a similar structure, but with the Riccati equation replaced by a fractional Riccati equation.

(Euch et al., 2016) derived a characteristic function for the rough Heston model. This result is particularly important, as the non-Markovian nature of the fractional Brownian motion makes other numerical pricing methods (e.g. Monte Carlo) difficult to implement.

Definition 3.3.1. The fractional integral of order r ∈ (0, 1] of a function f is

$$I^r f(t) = \frac{1}{\Gamma(r)}\int_0^t (t - s)^{r - 1}f(s)\,ds \tag{3.13}$$

whenever the integral exists, and the fractional derivative of order r ∈ (0, 1] is

$$D^r f(t) = \frac{1}{\Gamma(1 - r)}\frac{d}{dt}\int_0^t (t - s)^{-r}f(s)\,ds \tag{3.14}$$

whenever it exists.

Then we quote the results from (Euch et al., 2016) directly; for the full proof, see Section 6 of (Euch et al., 2016).

Theorem 3.3.1. For all t ≥ 0, the characteristic function of the log-price $X_t = \ln\frac{S(t)}{S(0)}$ is

$$\varphi_X(a, t) = E[e^{iaX_t}] = \exp\left[\kappa\theta I^1 h(a, t) + \nu(0) I^{1-\alpha}h(a, t)\right] \tag{3.15}$$

where h is the solution of the fractional Riccati equation

$$D^\alpha h(a, t) = -\frac{1}{2}a(a + i) + \kappa(ia\rho\xi - 1)h(a, t) + \frac{\xi^2}{2}h^2(a, t), \quad I^{1-\alpha}h(a, 0) = 0 \tag{3.16}$$

which admits a unique continuous solution for a ∈ C, a suitable region in the complex plane (see Definition 3.3.2).


Definition 3.3.2. We define the strip in the complex plane relevant to the computation of the characteristic function in Theorem 3.3.1:

$$C = \left\{z \in \mathbb{C} : \Re(z) \ge 0,\; -\frac{1}{1 - \rho^2} \le \Im(z) \le 0\right\} \tag{3.17}$$

where ℜ and ℑ denote real and imaginary parts respectively.

There is no explicit solution to equation (3.16), so it has to be solved numerically. (Gatheral et al., 2019) present a simple rational approximation to the solution of the rough Heston Riccati equation, which is inevitably faster than the numerical schemes.

3.3.1.1 Rational Approximation of Rough Heston Riccati Solution

Firstly, the short-time series expansion of the solution by (Alos et al., 2017) can be written as

$$h_s(a, t) = \sum_{j=0}^{\infty}\frac{\Gamma(1 + j\alpha)}{\Gamma[1 + (j + 1)\alpha]}\beta_j(a)\xi^j t^{(j+1)\alpha} \tag{3.18}$$

with

$$\beta_0(a) = -\frac{1}{2}a(a + i)$$
$$\beta_k(a) = \frac{1}{2}\sum_{i,j=0}^{k-2}\mathbb{1}_{i+j=k-2}\,\beta_i(a)\beta_j(a)\,\frac{\Gamma(1 + i\alpha)}{\Gamma[1 + (i + 1)\alpha]}\,\frac{\Gamma(1 + j\alpha)}{\Gamma[1 + (j + 1)\alpha]} + i\rho a\,\frac{\Gamma[1 + (k - 1)\alpha]}{\Gamma(1 + k\alpha)}\,\beta_{k-1}(a)$$

Again from (Alos et al., 2017), the long-time expansion is

$$h_l(a, t) = r_-\sum_{k=0}^{\infty}\frac{\gamma_k\,\xi^{-k}\,t^{-k\alpha}}{A^k\,\Gamma(1 - k\alpha)} \tag{3.19}$$

where

$$\gamma_1 = -\gamma_0 = -1$$
$$\gamma_k = -\gamma_{k-1} + \frac{r_-}{2A}\sum_{i,j=1}^{\infty}\mathbb{1}_{i+j=k}\,\gamma_i\gamma_j\,\frac{\Gamma(1 - k\alpha)}{\Gamma(1 - i\alpha)\Gamma(1 - j\alpha)}$$
$$A = \sqrt{a(a + i) - \rho^2 a^2}$$
$$r_\pm = -i\rho a \pm A$$

Now we can construct the global approximation. Using the same techniques as in (Atkinson et al., 2011), the rational approximation of h(a, t) is given by

$$h^{(m,n)}(a, t) = \frac{\sum_{i=1}^{m}p_i(\xi t^\alpha)^i}{\sum_{j=0}^{n}q_j(\xi t^\alpha)^j}, \quad q_0 = 1 \tag{3.20}$$

Numerical experiments in (Gatheral et al., 2019) showed that h^(3,3) is a good choice for high accuracy and fast computation. Thus we can express explicitly

$$h^{(3,3)}(a, t) = \frac{p_1\xi t^\alpha + p_2(\xi t^\alpha)^2 + p_3(\xi t^\alpha)^3}{1 + q_1\xi t^\alpha + q_2(\xi t^\alpha)^2 + q_3(\xi t^\alpha)^3} \tag{3.21}$$

Thus from equations (3.18) and (3.19) we have

$$h_s(a, t) = \frac{\Gamma(1)}{\Gamma(1 + \alpha)}\beta_0(a)t^\alpha + \frac{\Gamma(1 + \alpha)}{\Gamma(1 + 2\alpha)}\beta_1(a)\xi t^{2\alpha} + \frac{\Gamma(1 + 2\alpha)}{\Gamma(1 + 3\alpha)}\beta_2(a)\xi^2 t^{3\alpha} + \mathcal{O}(t^{4\alpha})$$
$$h_s(a, t) = b_1 t^\alpha + b_2(t^\alpha)^2 + b_3(t^\alpha)^3 + \mathcal{O}[(t^\alpha)^4]$$

and

$$h_l(a, t) = \frac{r_-\gamma_0}{\Gamma(1)} + \frac{r_-\gamma_1}{A\,\Gamma(1 - \alpha)\,\xi}\,\frac{1}{t^\alpha} + \frac{r_-\gamma_2}{A^2\,\Gamma(1 - 2\alpha)\,\xi^2}\,\frac{1}{(t^\alpha)^2} + \mathcal{O}\left[\frac{1}{(t^\alpha)^3}\right]$$
$$h_l(a, t) = g_0 + g_1\frac{1}{t^\alpha} + g_2\frac{1}{(t^\alpha)^2} + \mathcal{O}\left[\frac{1}{(t^\alpha)^3}\right]$$

Matching the coefficients of equation (3.21) to h_s and h_l forces the coefficients b_i and g_j to satisfy

$$p_1\xi = b_1 \tag{3.22}$$
$$(p_2 - p_1q_1)\xi^2 = b_2 \tag{3.23}$$
$$(p_1q_1^2 - p_1q_2 - p_2q_1 + p_3)\xi^3 = b_3 \tag{3.24}$$
$$\frac{p_3\xi^3}{q_3\xi^3} = g_0 \tag{3.25}$$
$$\frac{(p_2q_3 - p_3q_2)\xi^5}{q_3^2\xi^6} = g_1 \tag{3.26}$$
$$\frac{(p_1q_3^2 - p_2q_2q_3 - p_3q_1q_3 + p_3q_2^2)\xi^7}{q_3^3\xi^9} = g_2 \tag{3.27}$$


The solution of this system is

$$p_1 = \frac{b_1}{\xi} \tag{3.28}$$
$$p_2 = \frac{b_1^3g_1 + b_1^2g_0g_2 + b_1b_2g_0g_1 - b_1b_3g_0g_2 + b_1b_3g_1^2 + b_2^2g_0g_2 - b_2^2g_1^2 + b_2g_0^3}{(b_1^2g_2 + 2b_1g_0g_1 + b_2g_0g_2 - b_2g_1^2 + g_0^3)\,\xi^2} \tag{3.29}$$
$$p_3 = g_0q_3 \tag{3.30}$$
$$q_1 = \frac{b_1^2g_1 - b_1b_2g_2 + b_1g_0^2 - b_2g_0g_1 - b_3g_0g_2 + b_3g_1^2}{(b_1^2g_2 + 2b_1g_0g_1 + b_2g_0g_2 - b_2g_1^2 + g_0^3)\,\xi} \tag{3.31}$$
$$q_2 = \frac{b_1^2g_0 - b_1b_2g_1 - b_1b_3g_2 + b_2^2g_2 + b_2g_0^2 - b_3g_0g_1}{(b_1^2g_2 + 2b_1g_0g_1 + b_2g_0g_2 - b_2g_1^2 + g_0^3)\,\xi^2} \tag{3.32}$$
$$q_3 = \frac{b_1^3 + 2b_1b_2g_0 + b_1b_3g_1 - b_2^2g_1 + b_3g_0^2}{(b_1^2g_2 + 2b_1g_0g_1 + b_2g_0g_2 - b_2g_1^2 + g_0^3)\,\xi^3} \tag{3.33}$$
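Putting (3.21) and (3.28)-(3.33) together, the following is a sketch of the rational approximant as a function of precomputed short- and long-time coefficients b = (b1, b2, b3) and g = (g0, g1, g2); the coefficient formulas are transcribed from (3.28)-(3.33), and the function name is ours.

import numpy as np

def h33(b, g, xi, t, alpha):
    # Evaluate the (3,3) rational approximation (3.21) of the rough
    # Heston Riccati solution from the series coefficients b and g
    b1, b2, b3 = b
    g0, g1, g2 = g
    den = b1**2 * g2 + 2 * b1 * g0 * g1 + b2 * g0 * g2 - b2 * g1**2 + g0**3
    p1 = b1 / xi
    p2 = (b1**3 * g1 + b1**2 * g0 * g2 + b1 * b2 * g0 * g1
          - b1 * b3 * g0 * g2 + b1 * b3 * g1**2 + b2**2 * g0 * g2
          - b2**2 * g1**2 + b2 * g0**3) / (den * xi**2)
    q1 = (b1**2 * g1 - b1 * b2 * g2 + b1 * g0**2 - b2 * g0 * g1
          - b3 * g0 * g2 + b3 * g1**2) / (den * xi)
    q2 = (b1**2 * g0 - b1 * b2 * g1 - b1 * b3 * g2 + b2**2 * g2
          + b2 * g0**2 - b3 * g0 * g1) / (den * xi**2)
    q3 = (b1**3 + 2 * b1 * b2 * g0 + b1 * b3 * g1 - b2**2 * g1
          + b3 * g0**2) / (den * xi**3)
    p3 = g0 * q3
    y = xi * t**alpha
    return (p1 * y + p2 * y**2 + p3 * y**3) / (1 + q1 * y + q2 * y**2 + q3 * y**3)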

3.3.1.2 Option Pricing Using Fast Fourier Transform

After solving the Riccati equation using the rational approximation equation (3.21), we can compute the characteristic function φ_X(a, t) in equation (3.15). Once the characteristic function is obtained, many methods are available to compute call option prices; see (Carr and Madan, 1999), (Itkin, 2010), (Lewis, 2001) and (Schmelzle, 2010). In this project we select the Carr and Madan approach, a fast option pricing method that uses the fast Fourier transform (FFT). Suppose k = ln K and C(k, T) is the value of a European call option with strike K and maturity T. Unfortunately, the FFT cannot be applied directly to evaluate the call option price, because the call pricing function is not square-integrable. A key contribution of (Carr and Madan, 1999) is the modification of the call pricing function with respect to k. With this modification, a whole range of option prices can be obtained with one inverse Fourier transform.

Consider the modified call price c(k, T):

$$c(k, T) = e^{\beta k}C(k, T), \quad \beta > 0 \tag{3.34}$$

And its Fourier transform is

$$\psi_X(u, T) = \int_{-\infty}^{\infty}e^{iuk}c(k, T)\,dk, \quad u \in \mathbb{R}_{>0} \tag{3.35}$$

which can be expressed as

$$\psi_X(u, T) = \frac{e^{-rT}\varphi_X[u - (\beta + 1)i, T]}{\beta^2 + \beta - u^2 + i(2\beta + 1)u} \tag{3.36}$$

where r is the risk-free rate and φ_X(·) is the characteristic function defined in equation (3.15).

The purpose of β is to assist the integrability of the modified call pricing function over the negative log-strike axis. Hence a sufficient condition is that ψ_X(0) is finite:

$$\psi_X(0, T) < \infty \tag{3.37}$$
$$\implies \varphi_X[-(\beta + 1)i, T] < \infty \tag{3.38}$$
$$\implies \exp\left[\theta\kappa I^1h(-(\beta + 1)i, T) + \nu(0)I^{1-\alpha}h(-(\beta + 1)i, T)\right] < \infty \tag{3.39}$$

Using equation (3.39), we can determine the upper bound value of β such that condition (3.37) is satisfied. (Carr and Madan, 1999) suggest that one quarter of the upper bound serves as a good choice for β.

The call price C(k, T) can be recovered by applying the inverse Fourier transform:

$$C(k, T) = \frac{e^{-\beta k}}{2\pi}\int_{-\infty}^{\infty}\Re\left[e^{-iuk}\psi_X(u, T)\right]du \tag{3.40}$$
$$= \frac{e^{-\beta k}}{\pi}\int_0^{\infty}\Re\left[e^{-iuk}\psi_X(u, T)\right]du \tag{3.41}$$

The semi-infinite integral in the call price equation can be evaluated by numerical methods such as the trapezoidal rule and the FFT, as shown in (Kwok et al., 2012):

$$C(k, T) \approx \frac{e^{-\beta k}}{\pi}\sum_{j=1}^{N}e^{-iu_jk}\psi_X(u_j, T)\,\Delta u \tag{3.42}$$

where u_j = (j − 1)Δu, j = 1, 2, ..., N.

We end this section with a summary of the steps required to price a call option under the rough Heston model (a sketch of the final discretisation step follows the list):

1. Compute the solution of the fractional Riccati equation (3.16), h, using equation (3.21).

2. Compute the characteristic function in equation (3.15).

3. Determine a suitable value for β using equation (3.39).

4. Apply the Fourier transform in equation (3.36).

5. Evaluate the call option price by implementing the FFT in equation (3.42).
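A minimal sketch of the final discretisation (3.42); psi is assumed to be a user-supplied function implementing ψ_X(u, T) from (3.36), and N, du and beta are illustrative choices rather than values fixed by the thesis.

import numpy as np

def call_price(k, T, psi, beta, N=4096, du=0.25):
    # Approximate C(k, T) via the sum (3.42); psi(u, T) is the Fourier
    # transform (3.36) of the damped call price, evaluated on the grid
    u = np.arange(N) * du                       # u_j = (j - 1) * du
    integrand = np.exp(-1j * u * k) * psi(u, T)
    # keep the real part of the discretised inversion integral
    return np.exp(-beta * k) / np.pi * np.real(integrand.sum() * du)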

3.3.2 Rough Heston Model Implied Volatility

(Euch et al., 2019) present an almost instantaneous and accurate approximation method to compute rough Heston implied volatilities. This method allows us to approximate rough Heston implied volatility by simply scaling the volatility of volatility parameter ν and feeding this scaled parameter as input into the classical Heston implied volatility approximation equation in Section 3.1.2. For a given maturity T, the scaled volatility of volatility parameter ν̃ is

$$\tilde{\nu} = \sqrt{\frac{3}{2H + 2}}\,\frac{\nu}{\Gamma(H + \frac{3}{2})}\,\frac{1}{T^{1/2 - H}} \tag{3.43}$$
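Equation (3.43) is a one-liner in code. A sketch, where gamma is the standard-library Gamma function and the function name is ours:

from math import gamma, sqrt

def scaled_vol_of_vol(nu, H, T):
    # Scale the rough Heston vol-of-vol as in equation (3.43) so it can
    # be fed into the classical Heston implied vol approximation
    return sqrt(3.0 / (2.0 * H + 2.0)) * nu / gamma(H + 1.5) * T**(H - 0.5)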


[Figure 3.2: Implied volatility smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019)]


Chapter 4

Artificial Neural Networks

This chapter starts with an introduction to Artificial Neural Networks (ANN). We introduce the terminologies and basic elements of ANN and explain the methods that we used to increase training speed and predictive accuracy. We also discuss the training process for ANN, common pitfalls and their remedies. Finally, we discuss the ANN type of interest and extend the discussion to Deep Neural Networks (DNN).

The concept of ANN has come a long way. It started off with the modelling of a simple neural network after electrical circuits, which was proposed by Warren McCulloch, a neurologist, and a young mathematician, Walter Pitts, in 1943 (McCulloch et al., 1943). This idea was further strengthened by Donald Hebb (Hebb, 1949).

ANN's ability to learn complex non-linear functional mappings, by deciphering the pattern between independent and dependent variables, has found applications in many fields. With computational resources becoming more accessible and big data sets becoming available, ANN's popularity has grown exponentially in recent years.

4.1 Dissecting the Artificial Neural Networks

[Figure 4.1: A Simple Neural Network]

Before we delve deeper into the world of ANN, let us explain what parameters and hyperparameters mean.

Parameters refer to the variables that are updated throughout the training process; they include weights and biases. Hyperparameters are the configuration variables that define the architecture of the ANN and stay constant throughout the training process. Hyperparameters include the number of neurons, the number of hidden layers, the activation functions and the number of epochs, etc.

4.1.1 Neurons

Like a biological neuron, an artificial neuron is the fundamental working unit in an ANN. It is essentially a mathematical function that receives multiple inputs and generates an output. More precisely, it takes the weighted sum of the inputs and applies the activation function to the sum to generate an output. In the case of a simple ANN (as shown in Figure 4.1), this will be the network output. In more sophisticated ANN, the output of a neuron can be the input of other neurons.

4.1.2 Input, Output and Hidden Layers

The input layer is the first layer of an ANN. It receives and sends the training data into the network for further processing. The number of input neurons is equal to the number of input features of the training data.

The output layer is the final layer of the network. It outputs the predictions for a given set of input features, and it can have more than one output neuron.

A hidden layer is a layer between the input and output layers. It contains neuron(s) and receives weighted inputs from the input layer. Whilst an ANN contains only one input layer and one output layer, the number of hidden layers can be more than one. One hidden layer can only be used for solving linearly separable problems; for more complex problems, multiple hidden layers are needed.

Systematic experimentation is required to identify the appropriate number of hidden layers to prevent under-fitting and over-fitting, and thus to increase predictive accuracy.

4.1.3 Connections, Weights and Biases

From the input layer to the output layer, each connection is assigned a weight that denotes its relative importance. Random weights are initialised when training begins; as training continues, these weights are updated.

Sometimes a bias (different from the bias term in statistics) is added to a neuron. It is merely a constant input with weight 1. Intuitively speaking, the purpose of a bias is to shift the activation function away from the origin, thereby allowing the neuron to give a non-zero output when the inputs are 0.

41 Dissecting the Artificial Neural Networks 27

4.1.4 Forward and Backward Propagation, Loss Functions

Forward propagation is the passing of input data in the forward direction through the network to generate output. Then the difference between the output and the target output is estimated by a loss function; it is a measure of how well an ANN performs with the current set of weights. Typical loss functions for regression problems include the mean squared error (MSE) and the mean absolute error (MAE). They are defined as follows (a NumPy sketch follows the definitions):

Definition 4.1.1.
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i,\text{predict}} - y_{i,\text{actual}}\right)^2 \tag{4.1}$$

Definition 4.1.2.
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_{i,\text{predict}} - y_{i,\text{actual}}\right| \tag{4.2}$$

where $y_{i,\text{predict}}$ is the i-th predicted output and $y_{i,\text{actual}}$ is the i-th actual output.

Subsequently, this information is propagated backward from the output layer to the input layer, and the gradients of the loss function w.r.t. the weights are computed. The algorithm then adjusts the weights accordingly to minimise the loss function. An iteration is one complete cycle of forward and backward propagation.
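Both losses in (4.1) and (4.2) are a single NumPy expression each; a minimal sketch:

import numpy as np

def mse(y_pred, y_true):
    # Mean squared error, equation (4.1)
    return np.mean((y_pred - y_true) ** 2)

def mae(y_pred, y_true):
    # Mean absolute error, equation (4.2)
    return np.mean(np.abs(y_pred - y_true))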

ANN training is an unconstrained optimisation problem with the loss function as the objective function. The algorithm navigates through the search space to find optimal weights that minimise the loss function (e.g. MSE). It can be expressed like so:

$$\min_{w}\;\frac{1}{n}\sum_{i=1}^{n}\left(y_{i,\text{predict}} - y_{i,\text{actual}}\right)^2 \tag{4.3}$$

where w is the vector that contains the weights of all the connections within the ANN.

4.1.5 Activation Functions

An activation function is a mathematical function applied to the output of a neuron to determine whether it should be activated (fired) or not, based on the relevance of the neuron's inputs to the prediction. This is analogous to the neuronal firing activity in the human brain.

Activation functions can be linear or non-linear. A linear activation function is only suitable when the functional relationship is linear. Non-linear activation functions are used for most applications. In our case, non-linear activation functions are more suitable, as the pricing and implied volatility functional mappings are complex.

Common activation functions include (a NumPy sketch follows the definitions):

Definition 4.1.3. Rectified Linear Unit (ReLU)

$$f_{ReLU}(x) = \begin{cases}0 & \text{if } x \le 0\\ x & \text{if } x > 0\end{cases} \;\in [0, \infty) \tag{4.4}$$

Definition 4.1.4. Exponential Linear Unit (ELU)

$$f_{ELU}(x) = \begin{cases}\alpha(e^x - 1) & \text{if } x \le 0\\ x & \text{if } x > 0\end{cases} \;\in (-\alpha, \infty) \tag{4.5}$$

where α > 0. The value of α is usually chosen to be 1 for most applications.

Definition 4.1.5. Linear or identity

$$f_{linear}(x) = x \;\in (-\infty, \infty) \tag{4.6}$$
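The three activation functions in Definitions 4.1.3-4.1.5, sketched in NumPy:

import numpy as np

def relu(x):
    # Rectified Linear Unit, equation (4.4)
    return np.maximum(x, 0.0)

def elu(x, alpha=1.0):
    # Exponential Linear Unit, equation (4.5); alpha = 1 is the common choice
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def linear(x):
    # Identity activation, equation (4.6)
    return x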

4.1.6 Optimisation Algorithms and Learning Rate

The learning rate refers to the amount of change applied to the weights in each iteration of the training process. A high learning rate results in fast training speed, but there is a risk of divergence, whereas a low learning rate may reduce the training speed.

Common optimisation algorithms for training ANN are Stochastic Gradient Descent (SGD) and its extensions, such as RMSProp and Adam. The SGD pseudocode is the following:

Algorithm 1 Stochastic Gradient Descent
    Initialise a vector of random weights w and learning rate lr
    while Loss > error do
        Randomly shuffle the training data of size n
        for i = 1, 2, ..., n do
            w_new = w_current − lr · ∂Loss/∂w_current
        end for
    end while

SGD is popular because it is simple to implement and can provide good results for most applications. In SGD, the learning rate lr is fixed throughout the process. This is found to be an issue, because the appropriate learning rate is difficult to determine: selecting a high learning rate is prone to divergence, whereas using a low learning rate can result in a slow training process. Hence, improvements have been made to SGD to allow the learning rate to be adaptive. Root Mean Square Propagation (RMSProp) is one of the extensions that allows the learning rate to vary throughout the training process (Hinton et al., 2015). It is a method that divides the learning rate by the exponential moving average of past gradients. The author suggests the exponential decay rate to be b = 0.9, and its pseudocode is presented below.


that the learning rate to be variable throughout the training process (Hintonet al 2015) It is a method that divides the learning rate by the exponen-tial moving average of past gradients The author suggests the exponentialdecay rate to be b = 09 and its pseudocode is presented below

Algorithm 2 Root Mean Square Propagation (RMSProp)
    Initialise a vector of random weights w, learning rate lr, v = 0 and b = 0.9
    while Loss > error do
        Randomly shuffle the training data of size n
        for i = 1, 2, ..., n do
            v_current = b · v_previous + (1 − b) · [∂Loss/∂w_current]²
            w_new = w_current − lr/√(v_current + ε) · ∂Loss/∂w_current
        end for
    end while

where ε is a small number to prevent division by zero.

A further improved version was proposed by (Kingma et al., 2015), called Adaptive Moment Estimation (Adam). In RMSProp, the algorithm adapts the learning rate using only the average of the squared gradients (the second moment); Adam improves on this by additionally keeping an average of the first moments of the gradients when adapting the learning rate. An advantage of Adam is that it can handle sparse gradients on noisy data. The pseudocode of Adam is given below, followed by a NumPy sketch of a single update step.

Algorithm 3 Adaptive Moment Estimation (Adam)
    Initialise a vector of random weights w, learning rate lr, v = 0, m = 0, b = 0.9 and b1 = 0.999
    while Loss > error do
        Randomly shuffle the training data of size n
        for i = 1, 2, ..., n do
            m_current = b · m_previous + (1 − b) · ∂Loss/∂w_current
            v_current = b1 · v_previous + (1 − b1) · [∂Loss/∂w_current]²
            m̂_current = m_current / (1 − bⁱ)
            v̂_current = v_current / (1 − b1ⁱ)
            w_new = w_current − lr · m̂_current / (√(v̂_current) + ε)
        end for
    end while

where b is the exponential decay rate for the first moment and b1 is the exponential decay rate for the second moment.
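A single Adam update step written out in NumPy, following Algorithm 3; grad stands for ∂Loss/∂w at the current iterate and t for the iteration counter, both supplied by the caller:

import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b=0.9, b1=0.999, eps=1e-8):
    # One parameter update of Adam: update the biased first and second
    # moment estimates, bias-correct them, then step the weights
    m = b * m + (1.0 - b) * grad
    v = b1 * v + (1.0 - b1) * grad**2
    m_hat = m / (1.0 - b**t)
    v_hat = v / (1.0 - b1**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v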

4.1.7 Epochs, Early Stopping and Batch Size

An epoch is one complete pass of the entire training data set through the ANN. Generally, we want the number of epochs to be as high as possible; however, this may cause the ANN to over-fit. Early stopping is useful here to stop the training before the ANN becomes over-fit: it works by stopping the training process when the loss function shows no improvement. The batch size is the number of input samples processed before the parameters are updated.

4.2 Feedforward Neural Networks

Feedforward Neural Networks are the most common type of ANN in practical applications. They are called feedforward because the training data flows only from the input layer to the output layer; there is no feedback loop within the network.

4.2.1 Deep Neural Networks

Deep Neural Networks (DNN) are Feedforward Neural Networks with more than one hidden layer. The depth here refers to the number of hidden layers. This is the type of ANN of our interest, and the following theorems provide justifications for its use.

Theorem 4.2.1 (Hornik et al., 1989; Universal Approximation Theorem). Let NN^a_{din,dout} be the set of neural networks with activation function f_a : ℝ → ℝ, input dimension din ∈ ℕ and output dimension dout ∈ ℕ. Then, if f_a is continuous and non-constant, NN^a_{din,dout} is dense in the Lebesgue space L^p(µ) for all finite measures µ.

Theorem 4.2.2 (Hornik et al., 1990; Universal Approximation Theorem for Derivatives). Let the function to be approximated be F* ∈ C^n, the mapping of the neural network be F : ℝ^{d0} → ℝ, and NN^a_{din,1} be the set of single-layer neural networks with activation function f_a : ℝ → ℝ, input dimension din ∈ ℕ and output dimension 1. Then, if the (non-constant) activation function satisfies f_a ∈ C^n(ℝ), NN^a_{din,1} arbitrarily approximates F* and all its derivatives up to order n.

Theorem 4.2.3 (Eldan et al., 2016; The Power of Depth of Neural Networks). There exists a simple function on ℝ^d, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network to more than a certain constant accuracy unless its width is exponential in the dimension.

According to Theorem 4.2.2, it is crucial to have an n-times differentiable activation function if the approximation of the n-th derivative of the function F* is required. Theorem 4.2.3 motivates the choice of a deep network: the depth of a network can improve its predictive capabilities. However, it is worth noting that networks deeper than 4 hidden layers do not provide much additional performance gain (Ioffe et al., 2015).

4.3 Common Pitfalls and Remedies

Here we discuss some common pitfalls in ANN training and the corresponding remedies.

4.3.1 Loss Function Stagnation

During training, the loss function may display only infinitesimal improvements after each epoch. To resolve this problem:

• Increase the batch size. This could potentially fix the issue, as a larger batch helps the ANN model learn the relationships between data samples and data labels better before updating the parameters.

• Increase the learning rate. Too small a learning rate can cause the ANN model to become stuck in a local minimum.

• Use a deeper network. According to the power-of-depth theorem, deeper networks tend to perform better than shallower ones. However, if the model already has 4 hidden layers, this method may not be suitable, as there is not much performance gain beyond that (see Chapter 4.2.1).

• Use a different activation function for the output layer. Make sure the range of the output activation function is appropriate for the problem.

4.3.2 Loss Function Fluctuation

During training, the loss function may fluctuate from epoch to epoch rather than decline steadily. Potential fixes include:

• Increase the batch size. The ANN model might not be able to interpret the true relationships between input and output data before each update when the batch size is too small.

• Lower the initialised learning rate.

4.3.3 Over-fitting

Over-fitting occurs when the ANN model performs well on training data but fails to generalise on unseen test data. Potential fixes include:

• Reduce the complexity of the ANN model. Limit the number of hidden layers to 4, as not much advantage is achieved beyond that (Eldan et al., 2016); furthermore, experiment with narrower (fewer neurons) hidden layers.

• Introduce early stopping. Reduce the number of epochs if additional epochs show only little improvement.

• Regularisation. This can reduce the complexity of the loss function and speed up convergence.

• K-fold Cross-validation. This works by splitting the training data into K equal-sized portions: K − 1 portions are used for training and the remaining portion for validation, repeated K times with a different portion used for validation each time.

4.3.4 Under-fitting

Under-fitting occurs when the ANN model fails to make sense of the relationships between data samples and labels. Potential fixes include:

• Increase the size of the training data. Complex functional mappings require the ANN model to train on more data before it can learn the relationships well.

• Increase the depth of the model. It is very likely that the model is not deep enough for the problem. As a rule of thumb, limit the number of hidden layers to 3-4 for complex problems.

4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation

For the ANN's application to implied volatility prediction, we apply the image-based implicit method proposed by (Horvath et al., 2019), allegedly a powerful and robust method for approximating implied volatility. Rather than using one implied volatility value as the network output, the image-based implicit method uses the entire implied volatility surface as the output. The surface is represented by 10 × 10 pixels, each pixel corresponding to one maturity and one strike price. The input data corresponding to one implied volatility surface output consists of all the model parameters except the strike price and maturity, which are learned implicitly by the network. This method offers the following benefits:

• More information is learned by the network with less input data, since the information of neighbouring volatility points is incorporated in the training process.

• The network learns the mapping between the model parameters and the implied volatility surface directly. The volatility smile and term structure are available simultaneously, so this is more efficient than having only one single implied volatility value as output.

• The neighbouring volatility points can be numerically approximated almost instantly if required.

FIGURE 4.2: The pixels of the implied volatility surface are represented by the outputs of the neural network (output layer O1 to O100, arranged as a 10 × 10 grid).
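To make the output convention concrete, the sketch below shows how a flat 100-dimensional network prediction maps back to the 10 × 10 surface; the row/column ordering (rows as maturities, columns as strikes) is an assumption for illustration.

import numpy as np

# Stand-in for one network prediction, i.e. outputs O1 ... O100
flat_prediction = np.random.rand(100)

# Recover the 10 x 10 implied volatility surface
surface = flat_prediction.reshape(10, 10)

# One pixel corresponds to one (maturity, strike) pair
vol_pixel = surface[3, 5]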


Chapter 5

Data

In this chapter we discuss the data for our applications: we explain how the data is generated, together with some essential data pre-processing procedures.

Simulated data is chosen to train the ANN model mainly for regulatory reasons. One might argue that training the ANN directly with market data would yield better predictions; however, evidence for the stability and robustness of that approach is yet to be found, and there is no straightforward interpretation between the inputs and outputs of an ANN model trained in this way. The main advantage of training the ANN with simulated data is that the functional relationship is known, so the model is tractable. This property is important, as tractable models tend to be more transparent, which is required by regulation.

For the neural network computation, Python is the logical language choice, as it has many powerful machine learning libraries such as Keras. To obtain good results, we require 5,000,000 rows of data for the ANN experiment; hence we choose C++ instead of Python for the data generation process, as it is computationally faster. We also use the lightweight header-only library pybind11, which can harmoniously integrate C++ and Python in one project (see Appendix A for a tutorial on pybind11).

5.1 Data Generation and Storage

The data generation process is done entirely in C++, mainly because of its execution speed. Initially we considered applying GPU parallel computing to speed up the process even further; after consulting with the supervisor, we decided to use the std::future class template instead. std::future belongs to the standard C++11 library, and it is very simple to use compared to parallel computing. It allows the C++ program to run multiple functions asynchronously on different threads (see page 942 in (Duffy, 2018) for an example). Our machine contains 2 physical cores with 2 threads per physical core, which allows us to run 4 operations asynchronously without overhead costs and therefore, in theory, improves the data generation speed four-fold. Note that asynchronous programming is not meant to replace parallel computing completely: in cases where execution speed is absolutely critical, parallel computing should be used. Without asynchronous programming, the data generation process for 5,000,000 rows of Heston option price data would take more than 8 hours; now it takes less than 2 hours to complete.
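The C++ source is not reproduced here; as a sketch of the same fan-out pattern, the Python analogue below launches four asynchronous tasks, mirroring the four std::future calls (generate_batch is a hypothetical stand-in for the pricing routine).

from concurrent.futures import ProcessPoolExecutor

def generate_batch(n_rows, seed):
    # Hypothetical stand-in: price n_rows of Heston options and return them
    ...

if __name__ == "__main__":
    # Four tasks run asynchronously, one per hardware thread
    with ProcessPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(generate_batch, 1_250_000, s) for s in range(4)]
        batches = [f.result() for f in futures]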

We follow the steps outlined in Appendix A to wrap the C++ program using the pybind11 library. The wrapped C++ function can then be called in Python just like any other Python library. pybind11 is conceptually very similar to the Boost.Python library; the reason we choose pybind11 is that it is a lightweight, header-only library that is easy to use. This enables us to enjoy both the execution speed of C++ and the access to the machine learning libraries of Python at the same time.

After the data is generated, it is stored in MySQL so that we do not need to re-generate it each time we restart the computer or when the Python IDE crashes unexpectedly. We use the mysql-connector Python library to communicate with MySQL from Python.
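A minimal sketch of this storage step is shown below, assuming a local MySQL instance; the connection string, table name and column labels are placeholders, and batch is a hypothetical array returned by the wrapped C++ module.

import pandas as pd
import sqlalchemy as db

# Placeholder credentials and schema name
engine = db.create_engine(
    "mysql+mysqlconnector://user:password@localhost/heston_db")

# Persist one generated batch so it survives restarts and IDE crashes
df = pd.DataFrame(batch, columns=["S", "K", "r", "D", "nu", "T",
                                  "theta", "kappa", "xi", "rho", "price"])
df.to_sql("heston_call_options", con=engine,
          if_exists="append", index=False)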

5.1.1 Heston Call Option Data

We follow the method and theory described in Chapter 3 to compute the call option price under the Heston model. We only consider call options in this project; to train an ANN that can also predict put option prices, one can simply add an extra input feature that takes only two values, +1 or −1, where +1 represents a call option and −1 a put option.

For this application we generated 5,000,000 rows of data, uniformly distributed over a specified range; one row of data contains all the input features and the corresponding output. We use 10 model parameters as the input features: spot price (S), strike price (K), risk-free rate (r), dividend yield (D), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ). There is a single output: the call option price. See Table 5.1 for the respective ranges.

TABLE 5.1: Heston Model Call Option Data

         Model Parameter                 Range
Input    Spot Price (S)                  ∈ [20, 80]
         Strike Price (K)                ∈ [40, 55]
         Risk-free Rate (r)              ∈ [0.03, 0.045]
         Dividend Yield (D)              ∈ [0, 0.1]
         Instantaneous Volatility (ν)    ∈ [0.2, 0.6]
         Maturity (T)                    ∈ [0.03, 2.0]
         Long-term Volatility (θ)        ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)         ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)    ∈ [0.2, 0.5]
         Correlation (ρ)                 ∈ [−0.95, 0.95]
Output   Call Option Price (C)           ∈ R>0


5.1.2 Heston Implied Volatility Data

We generate the Heston implied volatility data with the theory explained in Chapter 3. For the ANN application in implied volatility approximation, we use the image-based implicit method, in which the ANN output layer dimension is 100, each output representing one pixel on the implied volatility surface.

Using this method, we need 100 times more data in order to have the same number of distinct rows as the normal point-wise method (e.g. Heston call option pricing). Due to hardware constraints, we decided to generate 10,000,000 rows of uniformly distributed data, which translates to only 100,000 distinct rows.

We use 6 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ).

Note that strike price and maturity are not used as input features, as mentioned in Chapter 4.4. This is because the shape of one full implied volatility surface depends on the 6 model parameters above, whereas the strike price and maturity correspond to one specific point on that surface. Nevertheless, we still require these two parameters to generate the implied volatility data; we use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

In this case there are 10 × 10 = 100 outputs, that is, the implied volatility surface. See Table 5.2 for the respective ranges.

TABLE 5.2: Heston Model Implied Volatility Data

         Model Parameter                        Range
Input    Spot Price (S)                         ∈ [0.2, 2.5]
         Instantaneous Volatility (ν)           ∈ [0.2, 0.6]
         Long-term Volatility (θ)               ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)                ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)           ∈ [0.2, 0.5]
         Correlation (ρ)                        ∈ [−0.95, 0.95]
Output   Implied Volatility Surface (σ_imp)     ∈ R^{10×10}>0

5.1.3 Rough Heston Call Option Data

Again, we follow the theory in Chapter 3 and implement it in C++ to generate call option prices under the rough Heston model.

Similarly, we only consider call options here and generate 5,000,000 rows of data. We use 9 model parameters as the input features: strike price (K), risk-free rate (r), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). There is a single output: the call option price. See Table 5.3 for the respective ranges.

TABLE 5.3: Rough Heston Model Call Option Data

         Model Parameter                 Range
Input    Strike Price (K)                ∈ [0.5, 5]
         Risk-free Rate (r)              ∈ [0.2, 0.5]
         Instantaneous Volatility (ν)    ∈ [0, 1.0]
         Maturity (T)                    ∈ [0.1, 3.0]
         Long-term Volatility (θ)        ∈ [0.01, 0.1]
         Mean Reversion Rate (κ)         ∈ [1.0, 4.0]
         Volatility of Volatility (ξ)    ∈ [0.01, 0.5]
         Correlation (ρ)                 ∈ [−0.95, 0.95]
         Hurst Parameter (H)             ∈ [0.05, 0.1]
Output   Call Option Price (C)           ∈ R>0

5.1.4 Rough Heston Implied Volatility Data

Finally, we generate the rough Heston implied volatility data with the theory explained in Chapter 3. As before, we use the image-based implicit method.

We again use 10,000,000 rows of data, which translates to 100,000 distinct rows. We use 7 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

As before, there are 10 × 10 = 100 outputs, that is, the rough Heston implied volatility surface. See Table 5.4 for the respective ranges.

TABLE 5.4: Rough Heston Model Implied Volatility Data

         Model Parameter                        Range
Input    Spot Price (S)                         ∈ [0.2, 2.5]
         Instantaneous Volatility (ν)           ∈ [0.2, 0.6]
         Long-term Volatility (θ)               ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)                ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)           ∈ [0.2, 0.5]
         Correlation (ρ)                        ∈ [−0.95, 0.95]
         Hurst Parameter (H)                    ∈ [0.05, 0.1]
Output   Implied Volatility Surface (σ_imp)     ∈ R^{10×10}>0

5.2 Data Pre-processing

In this section we discuss some necessary data pre-processing procedures: data splitting for validation and evaluation (testing), data standardisation and normalisation, and data partitioning for K-fold cross validation.

5.2.1 Data Splitting: Train - Validate - Test

Generally, not 100% of the available data is used for training; we reserve some of the data for validation and for performance evaluation.

We split the data into three sets: 70% (train), 15% (validate) and 15% (test). The first 70% of the data is used for training, whilst 15% of the data is used for validation during training; the last 15% is for evaluating the trained ANN's predictive accuracy.

The validation data set is used for tuning the hyperparameters of the ANN, which is important to help prevent over-fitting of the network. The test data set is not used during training: it is the unseen (out-of-sample) data used for evaluating the trained ANN's predictions.
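A minimal NumPy sketch of this 70/15/15 split is given below (illustrative; in practice part of the splitting is handled by Keras' validation split argument):

import numpy as np

def train_validate_test_split(X, y, train=0.70, validate=0.15):
    idx = np.random.permutation(len(X))      # shuffle before splitting
    n_tr = int(train * len(X))
    n_va = int((train + validate) * len(X))
    tr, va, te = idx[:n_tr], idx[n_tr:n_va], idx[n_va:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])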

5.2.2 Data Standardisation and Normalisation

Before the data is fed into the ANN for training, it is absolutely essential to scale the data to speed up convergence and improve performance.

The magnitudes of the data features differ, so it is essential to re-scale the data so that the ANN model will not misinterpret the features' relative importance based on their magnitudes. Typical scaling techniques include standardisation and normalisation.

Data standardisation is defined as

    x_scaled = (x − x_mean) / x_std    (5.1)

where

• x is the data,

• x_mean is the mean of that data feature,

• x_std is the standard deviation of that data feature.

(Horvath et al., 2019) claims that this works especially well for quantities that are positively unbounded, such as implied volatility.

For the other model parameters, we apply data normalisation, which is defined as

    x_scaled = (2x − (x_max + x_min)) / (x_max − x_min) ∈ [−1, 1]    (5.2)

where

• x is the data,

• x_max is the maximum value of that data feature,

• x_min is the minimum value of that data feature.

After data scaling (standardisation or normalisation), the ANN is able to learn the true relationships between inputs and outputs without bias from the differences in their magnitudes.
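A minimal NumPy sketch of Equations (5.1) and (5.2), applied column-wise to a feature matrix, is shown below (illustrative; in practice the training-set statistics should also be reused to scale the test set):

import numpy as np

def standardise(x):
    # Equation (5.1): zero mean, unit standard deviation per feature
    return (x - x.mean(axis=0)) / x.std(axis=0)

def normalise(x):
    # Equation (5.2): rescale each feature to [-1, 1]
    x_max, x_min = x.max(axis=0), x.min(axis=0)
    return (2 * x - (x_max + x_min)) / (x_max - x_min)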

5.2.3 Data Partitioning

In this section we explain how the data is partitioned for K-fold cross validation.

First, we shuffle the entire training data set. The data set is then divided into K equal-sized portions: each portion is used once as the test data set while the ANN is trained on the other K − 1 portions, and this is repeated K times with a different portion used each time. The average of the K results is the performance measure of the model.

FIGURE 5.1: How the data is partitioned for 5-fold cross validation. Adapted from (Chapter 2 Modeling Process: k-fold cross validation).
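A sketch of this partitioning with scikit-learn's KFold follows; X and y are the training arrays, while build_and_train and evaluate are hypothetical helpers wrapping the model setup of Chapter 6.

from sklearn.model_selection import KFold
import numpy as np

kf = KFold(n_splits=5, shuffle=True)          # shuffle, then 5 portions
scores = []
for train_idx, test_idx in kf.split(X):
    model = build_and_train(X[train_idx], y[train_idx])
    scores.append(evaluate(model, X[test_idx], y[test_idx]))
print(np.mean(scores))                        # average of the 5 results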


Chapter 6

Neural Networks Architecture and Experimental Setup

We start this chapter by discussing the appropriate ANN architecture and experimental setup for our applications. We apply the ANN to four different applications: option pricing under the Heston model, Heston implied volatility surface approximation, option pricing under the rough Heston model, and rough Heston implied volatility surface approximation. We first apply the ANN to the Heston model as a concept validation, and then proceed to the rough Heston model.

Subsequently, we discuss the relevant evaluation metrics and validation methods for our ANN models. Finally, we end the chapter with the implementation steps for an ANN in Keras.

6.1 Neural Networks Architecture

Configuring an ANN's architecture is a combination of art and science: it requires experience and systematic experiments to decide the right parameters and architecture for a specific problem, and for a particular problem there may be multiple appropriate designs. To begin with, we use the network architecture of (Horvath et al., 2019) and adjust the hyperparameters if it performs poorly for our applications.


FIGURE 6.1: Typical ANN Architecture for Option Pricing Application.


FIGURE 6.2: Typical ANN Architecture for Implied Volatility Surface Approximation.

6.1.1 ANN Architecture: Option Pricing under the Heston Model

An ANN with 3 hidden layers is appropriate for this application. Each hidden layer has 50 neurons, with the exponential linear unit (ELU) as the activation function.

Initially, the rectified linear unit (ReLU) activation function was considered. It is the choice for many applications, as it can overcome the vanishing gradient problem whilst being less computationally expensive than other activation functions. However, it has some major drawbacks:

• The n-th derivative of ReLU is not continuous for all n > 0.

• ReLU cannot produce negative outputs.

• For activation x < 0, ReLU has zero gradient.

From Theorem 4.2.2, we know that choosing an n-times differentiable activation function is necessary for computing the n-th derivatives of the functional mapping. This is crucial for our application, because we would be interested in computing option Greeks with the same ANN if it shows promising results for option pricing; for risk management purposes, selecting an activation function for which option Greeks can be calculated is essential.

The obvious alternative is the ELU (defined in Chapter 4.1.4). For x < 0 it has a smooth output that approaches −α, while for x > 0 the output of the ELU is positively unbounded, which is suitable for our application, as the values of option price and implied volatility are, in theory, also positively unbounded.

Since our problem is a regression problem, a linear activation function (defined in Chapter 4.1.5) is appropriate for the output layer, as its output is unbounded. We found that a batch size of 200 and 30 epochs yield good accuracy.

To summarise, the ANN model for Heston option pricing is:

• Input Layer: 10 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function
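A minimal Keras sketch of this architecture, in the same functional style as the snippet in Appendix C, is shown below (the scaled training arrays are assumed to exist):

import keras

# 10 inputs -> 3 hidden layers of 50 ELU neurons -> 1 linear output
input_layer = keras.layers.Input(shape=(10,))
h1 = keras.layers.Dense(50, activation='elu')(input_layer)
h2 = keras.layers.Dense(50, activation='elu')(h1)
h3 = keras.layers.Dense(50, activation='elu')(h2)
output_layer = keras.layers.Dense(1, activation='linear')(h3)

model = keras.models.Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer='adam', loss='mse')
# model.fit(x_train, y_train, batch_size=200, epochs=30)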

6.1.2 ANN Architecture: Implied Volatility Surface of the Heston Model

For approximating the full implied volatility surface of the Heston model, we apply the image-based implicit method (see Chapter 4.4), which implies that the output dimension is 100.

We use 3 hidden layers with 80 neurons each. The activation function for the hidden layers is the exponential linear unit (ELU); as before, we use a linear activation function for the output layer. We found that a batch size of 20 and 200 epochs yield good accuracy. Since the number of epochs is relatively high, we introduce an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

To summarise, the ANN model for Heston implied volatility surface approximation is:

• Input Layer: 6 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function


6.1.3 ANN Architecture: Option Pricing under the Rough Heston Model

For option pricing under the rough Heston model, the ANN model is:

• Input Layer: 9 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function

As before, we use a batch size of 200 and 30 epochs.

6.1.4 ANN Architecture: Implied Volatility Surface of the Rough Heston Model

The ANN model for rough Heston implied volatility surface approximation is:

• Input Layer: 7 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function

We use a batch size of 20 and 200 epochs, with an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

6.2 Performance Metrics and Validation Methods

We require some metrics to evaluate the performance of the ANN in terms of accuracy and speed.

For accuracy, we introduced the MSE and MAE in Chapter 4.1.4. It is also common to use the coefficient of determination (R²) to quantify the performance of a regression model: a statistical measure of the proportion of the variance in the dependent variable that is explained by the independent variables. The formula is

    R² = 1 − [ Σᵢ₌₁ⁿ (y_actual,i − y_predict,i)² ] / [ Σᵢ₌₁ⁿ (y_actual,i − ȳ_actual)² ]    (6.1)


We also suggest a user-defined accuracy metric, the Maximum Absolute Error, defined as

    Max Abs Error = maxᵢ |y_actual,i − y_predict,i|    (6.2)

This metric provides a measure of how large the error could be in the worst-case scenario, which is especially important from a risk management perspective.
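One way to implement these as user-defined Keras metrics is sketched below (note that Keras evaluates metrics batch-wise, so the reported values are per-batch approximations):

from keras import backend as K

def r_squared(y_true, y_pred):
    # Equation (6.1): coefficient of determination
    ss_res = K.sum(K.square(y_true - y_pred))
    ss_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return 1.0 - ss_res / (ss_tot + K.epsilon())

def max_abs_error(y_true, y_pred):
    # Equation (6.2): worst-case absolute error
    return K.max(K.abs(y_true - y_pred))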

As for the run-time performance of the ANN, we simply use the Python built-in timer function to quantify this property.

In Chapter 5.2.3 we explained the data partitioning process; the motivation for partitioning the data is to perform K-fold cross validation. Cross validation is a statistical technique to assess how well a predictive model generalises to an independent data set, and it is a good tool for identifying over-fitting.

As explained earlier, the process involves training and testing different permutations of the partitioned data sets multiple times. The indication of a well-generalised model is that the error magnitudes in each round are consistent with each other and with their average. We select K = 5 for our applications.


6.3 Implementation of ANN in Keras

In this section we outline the standard ANN computation process using Keras.

1. Construct the ANN model by specifying the number of input neurons, the number of hidden neurons, the number of hidden layers, the number of output neurons and the activation functions for each layer. Print the model summary to check for errors.

2. Configure the settings of the model using the compile function. We select the MSE as the loss function, and we specify the MAE, R² and our user-defined maximum absolute error as the metrics. For all of our applications we choose Adam as the optimisation algorithm. To prevent over-fitting, we use the early stopping function, especially when the number of epochs is high. It is important to note that the validation loss (not the training loss) should be used as the early stopping criterion, so the training stops early if there is no reduction in validation loss.

3. Start training the ANN model. It is important to specify the validation split argument here for splitting the training data into a train set and a validation set. Note that the test set should not be used as the validation set; it should be reserved for evaluating the ANN's predictions after training.

4. When the training is complete, generate the plot of MSE (loss function) versus the number of epochs to visualise the training process. The MSE should decline gradually and then level off as the number of epochs increases. See Chapter 4.3 for potential training issues and remedies.

5. Apply 5-fold cross validation to check for over-fitting, and evaluate the trained model using the evaluate function.

6. Generate predictions. Visualise them by plotting predicted values versus actual values on the same graph.

See Appendix C for a Python code example.
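A condensed sketch of steps 2-4 follows, under the same assumptions as the earlier snippets (model, x_train and y_train already defined; hyperparameters follow the implied volatility setup of Chapter 6):

from keras.callbacks import EarlyStopping
import matplotlib.pyplot as plt

model.compile(optimizer='adam', loss='mse', metrics=['mae'])
early_stop = EarlyStopping(monitor='val_loss', patience=25,
                           restore_best_weights=True)

history = model.fit(x_train, y_train, batch_size=20, epochs=200,
                    validation_split=0.2, callbacks=[early_stop])

# Step 4: visualise the training process
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch'); plt.ylabel('MSE'); plt.legend(); plt.show()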


6.4 Summary of Neural Networks Architecture

TABLE 6.1: Summary of Neural Networks Architecture for Each Application

ANN Model                             Application                               Input Layer   Hidden Layers           Output Layer   Batch Size   Epochs
Heston Option Pricing ANN             Heston Call Option Pricing                10 neurons    3 layers × 50 neurons   1 neuron       200          30
Heston Implied Volatility ANN         Heston Implied Volatility Surface         6 neurons     3 layers × 80 neurons   100 neurons    20           200
Rough Heston Option Pricing ANN       Rough Heston Call Option Pricing          9 neurons     3 layers × 50 neurons   1 neuron       200          30
Rough Heston Implied Volatility ANN   Rough Heston Implied Volatility Surface   7 neurons     3 layers × 80 neurons   100 neurons    20           200


Chapter 7

Results and Discussions

In this chapter we present our results and discuss our findings.

7.1 Heston Option Pricing ANN Results

Figure 7.1 shows the learning curves of the Heston Option Pricing ANN; the loss function used here is the MSE. As the ANN is trained, both the training loss and validation loss curves decline until they converge. It took fewer than 30 epochs for the ANN to reach a satisfactory level of accuracy, and the small gap between the training loss and the validation loss indicates that the model is well trained.

FIGURE 7.1: Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN

Table 7.1 summarises the error metrics of the Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 2.00 × 10⁻⁶, 9.58 × 10⁻⁴, 6.50 × 10⁻³ and 0.999 respectively. The low error metrics and high R² value imply that the ANN model generalises well and gives highly accurate predictions for option prices under the Heston model. Figure 7.2 attests to its high accuracy: almost all of the predictions lie on the perfect-prediction line (red line).

TABLE 7.1: Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
2.00 × 10⁻⁶   9.58 × 10⁻⁴   6.50 × 10⁻³     0.999

FIGURE 7.2: Plot of Predicted vs Actual Values for Heston Option Pricing ANN

TABLE 7.2: 5-fold Cross Validation Results of Heston Option Pricing ANN

          MSE           MAE           Max Abs Error
Fold 1    5.78 × 10⁻⁷   5.45 × 10⁻⁴   2.04 × 10⁻³
Fold 2    9.90 × 10⁻⁷   7.69 × 10⁻⁴   2.40 × 10⁻³
Fold 3    7.15 × 10⁻⁷   6.21 × 10⁻⁴   2.20 × 10⁻³
Fold 4    1.24 × 10⁻⁶   8.67 × 10⁻⁴   2.53 × 10⁻³
Fold 5    6.54 × 10⁻⁷   5.79 × 10⁻⁴   2.21 × 10⁻³
Average   8.36 × 10⁻⁷   6.76 × 10⁻⁴   2.27 × 10⁻³

Table 7.2 shows the results from 5-fold cross validation, which again confirm that the ANN model gives accurate predictions and generalises well to unseen data: the values of each fold are consistent with each other and with their averages.

7.2 Heston Implied Volatility ANN Results

Figure 7.3 shows the learning curves of the Heston Implied Volatility ANN; again, the MSE is used as the loss function. The validation loss curve experiences some fluctuations during training; we experimented with various ANN setups, and this is the least fluctuating learning curve we could achieve. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level, and the gap between validation loss and training loss is negligible. Due to the comparatively more complex ANN architecture, it took about 200 epochs for the ANN to reach a satisfactory level of accuracy.

FIGURE 7.3: Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN

Table 7.3 summarises the error metrics of the Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 6.24 × 10⁻⁴, 1.27 × 10⁻², 3.71 × 10⁻¹ and 0.999 respectively. The low error metrics and high R² value imply that the ANN model generalises well and gives accurate predictions for the implied volatility surface under the Heston model. Figure 7.4 shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.3: Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
6.24 × 10⁻⁴   1.27 × 10⁻²   3.71 × 10⁻¹     0.999

FIGURE 7.4: Heston Implied Volatility Smiles: ANN Predictions vs Actual Values

FIGURE 7.5: Mean Absolute Error of the Full Heston Implied Volatility Surface: ANN Predictions on Unseen Data

Figure 7.5 shows the MAE values of the Heston Implied Volatility ANN predictions across the entire implied volatility surface. The largest MAE value is 0.0045, and a large area of the surface has MAE values lower than 0.0030. The accuracy is relatively poor when the strike price is at the extremes and the maturity is small; however, the ANN gives good predictions overall for the whole surface.

The 5-fold cross validation results for the Heston Implied Volatility ANN model, summarised in Table 7.4, show consistent values, indicating that the model is well-trained and generalises to unseen test data.

TABLE 7.4: 5-fold Cross Validation Results of Heston Implied Volatility ANN

          MSE           MAE           Max Abs Error
Fold 1    8.15 × 10⁻⁴   1.13 × 10⁻²   3.88 × 10⁻¹
Fold 2    4.74 × 10⁻⁴   1.08 × 10⁻²   3.13 × 10⁻¹
Fold 3    1.37 × 10⁻³   1.74 × 10⁻²   4.46 × 10⁻¹
Fold 4    3.38 × 10⁻⁴   8.61 × 10⁻³   2.19 × 10⁻¹
Fold 5    4.60 × 10⁻⁴   1.02 × 10⁻²   2.67 × 10⁻¹
Average   6.92 × 10⁻⁴   1.12 × 10⁻²   3.27 × 10⁻¹

7.3 Rough Heston Option Pricing ANN Results

Figure 7.6 shows the learning curves of the Rough Heston Option Pricing ANN; as before, the MSE is used as the loss function. As training progresses, both the training loss and validation loss curves decline until they converge. Since this is a relatively simple model, it took fewer than 30 epochs of training to reach a good level of accuracy, and the small discrepancy between the training loss and the validation loss indicates the model is well-trained.

FIGURE 7.6: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN

Table 7.5 summarises the error metrics of the Rough Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 9.00 × 10⁻⁶, 2.28 × 10⁻³, 1.76 × 10⁻¹ and 0.999 respectively. These metrics imply that the ANN model generalises well and gives highly accurate predictions for option prices under the rough Heston model. Figure 7.7 attests to its high accuracy again: most of the predictions lie on the perfect-prediction line (red line).

TABLE 7.5: Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
9.00 × 10⁻⁶   2.28 × 10⁻³   1.76 × 10⁻¹     0.999

FIGURE 7.7: Plot of Predicted vs Actual Values for Rough Heston Option Pricing ANN

TABLE 7.6: 5-fold Cross Validation Results of Rough Heston Option Pricing ANN

          MSE           MAE           Max Abs Error
Fold 1    3.79 × 10⁻⁶   1.35 × 10⁻³   5.99 × 10⁻³
Fold 2    2.33 × 10⁻⁶   9.96 × 10⁻⁴   4.89 × 10⁻³
Fold 3    2.06 × 10⁻⁶   9.88 × 10⁻⁴   4.41 × 10⁻³
Fold 4    2.16 × 10⁻⁶   1.08 × 10⁻³   4.07 × 10⁻³
Fold 5    1.84 × 10⁻⁶   1.01 × 10⁻³   3.74 × 10⁻³
Average   2.44 × 10⁻⁶   1.08 × 10⁻³   4.62 × 10⁻³

The highly consistent values from 5-fold cross validation again confirm that the ANN model generalises to unseen data.

7.4 Rough Heston Implied Volatility ANN Results

Figure 7.8 shows the learning curves of the Rough Heston Implied Volatility ANN. The validation loss curve experiences some fluctuations during training; again, we select the least fluctuating setup. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level, and the discrepancy between validation loss and training loss is negligible. Training was initially set to 200 epochs, but it stopped early at around 85 epochs, since the validation loss had not improved for 25 epochs. It achieves good accuracy nonetheless.

FIGURE 7.8: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN

Table 7.7 summarises the error metrics of the Rough Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 5.66 × 10⁻⁴, 1.08 × 10⁻², 3.49 × 10⁻¹ and 0.999 respectively. These metrics show that the ANN model produces accurate predictions on unseen data. This is confirmed by Figure 7.9, which shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.7: Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
5.66 × 10⁻⁴   1.08 × 10⁻²   3.49 × 10⁻¹     0.999

FIGURE 7.9: Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values

FIGURE 7.10: Mean Absolute Error of the Full Rough Heston Implied Volatility Surface: ANN Predictions on Unseen Data

Figure 7.10 shows the MAE values of the Rough Heston Implied Volatility ANN predictions across the entire implied volatility surface. The largest MAE value is below 0.0030, and a large area of the surface has MAE values lower than 0.0015. The accuracy is relatively poor when the strike price and maturity values are small; however, the ANN gives good predictions overall for the whole surface. In fact, it is more accurate than the Heston Implied Volatility ANN model, despite having one extra input and a similar architecture; we attribute this to the stochastic nature of the optimiser.

The 5-fold cross validation results for the Rough Heston Implied Volatility ANN model, summarised in Table 7.8, again show consistent values, just like the previous three cases, indicating that the model is well-trained and generalises to unseen test data.

TABLE 7.8: 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN

          MSE           MAE           Max Abs Error
Fold 1    8.15 × 10⁻⁴   1.13 × 10⁻²   3.88 × 10⁻¹
Fold 2    4.74 × 10⁻⁴   1.08 × 10⁻²   3.13 × 10⁻¹
Fold 3    1.37 × 10⁻³   1.74 × 10⁻²   4.46 × 10⁻¹
Fold 4    3.38 × 10⁻⁴   8.61 × 10⁻³   2.19 × 10⁻¹
Fold 5    4.60 × 10⁻⁴   1.02 × 10⁻²   2.67 × 10⁻¹
Average   6.92 × 10⁻⁴   1.12 × 10⁻²   3.27 × 10⁻¹

7.5 Run-time Performance

In this section we present the training time and the execution speed of the ANN versus the traditional methods. The speed tests are run on a machine with an Intel i5-7200U chip (up to 3.10 GHz) and 8 GB of RAM.

Table 7.9 summarises the training time required for each application. Although the training time per epoch for the option pricing applications is higher than that for the implied volatility applications, the resulting total training time is similar for both types of application, because the implied volatility ANN models are trained for more epochs.

We also emphasise that fewer distinct rows of data are available for implied volatility ANN training; thus its training time is similar to that of the option pricing ANNs despite the more complex architecture.

TABLE 7.9: Training Time

Application                  Training Time per Epoch   Total Training Time
Heston Option Pricing        30 s                      900 s
Heston Implied Volatility    5 s                       1000 s
rHeston Option Pricing       30 s                      900 s
rHeston Implied Volatility   5 s                       1000 s

TABLE 7.10: Execution Time for Predictions using ANN and Traditional Methods

Application                  Traditional Methods   ANN
Heston Option Pricing        8.84 ms               57.9 ms
Heston Implied Volatility    2.31 ms               43.7 ms
rHeston Option Pricing       6.52 ms               52.6 ms
rHeston Implied Volatility   2.87 ms               48.3 ms

Table 7.10 summarises the execution time for generating predictions using the ANN and the traditional methods. Before the experiment we expected the ANN to generate predictions faster; however, the results show that the ANN is actually up to 7 times slower than the traditional methods for option pricing (both Heston and rough Heston), and up to 18 times slower for implied volatility surface approximation (both Heston and rough Heston). This shows the traditional approximation methods are more efficient.


Chapter 8

Conclusions and Outlook

To summarise, the results show that an ANN is able to approximate option prices and implied volatilities under the Heston and rough Heston models to a high degree of accuracy. However, the efficiency of the ANN is not as good as that of the traditional methods; hence we conclude that such an approach may be considered an ineffective alternative. There are more efficient methods, such as numerical integration of the FFT for option price computation, and the traditional method we used for approximating implied volatility does not even require a numerical scheme, hence its efficiency.

As for future projects, it would be interesting to investigate the rough Bergomi model and applications with exotic options, such as lookback and digital options. It would also be worthwhile to apply gradient-free optimisers for ANN training; interesting examples include differential evolution, dual annealing and simplicial homology global optimisation.


Appendix A

pybind11: A step-by-step tutorial with examples

This tutorial assumes basic knowledge of C++ and Python.

Sources: Create a C++ extension for Python; Using the C++ Eigen library to calculate matrix inverse and determinant; pybind11 Documentation; Eigen: The Matrix Class.

A.1 Software Prerequisites

• Visual Studio 2017 or later, with both the Desktop Development with C++ and Python Development workloads installed with default options.

• In the Python Development workload, also select the box on the right for Python native development tools. This option sets up most of the configuration required. (This option also includes the C++ workload automatically.)

Source: Create a C++ extension for Python.


A.2 Create a Python Project

1. To create a Python project in Visual Studio, select File > New > Project. Search for Python, select the Python Application template, give it a suitable name and location, and select OK.

2. pybind11 requires that you use a 32-bit Python interpreter (Python 3.6 or above recommended). In the Solution Explorer window of Visual Studio, expand the project node, then expand the Python Environments node. If you don't see a 32-bit environment as the default (either in bold or labeled with 'global default'), then right-click the Python Environments node and select Add Environment, or select Add Environment from the environment drop-down in the Python toolbar.

Once in the Add Environment dialog box, select the Existing environment tab, then select the 32-bit interpreter from the Environment drop-down list.

If you don't have a 32-bit interpreter installed, you can install standard Python interpreters from the Add Environment dialog: select the Add Environment command in the Python Environments window or the Python toolbar, select the Python installation tab, indicate which interpreters to install, and select Install.

If you already added an environment other than the global default to a project, you may need to activate a newly added environment: right-click that environment under the Python Environments node and select Activate Environment. To remove an environment from the project, select Remove.


A.3 Create a C++ Project

1. A Visual Studio solution can contain both Python and C++ projects together. To do so, right-click the solution in Solution Explorer and select Add > New Project.

2. Search on 'C++', select Empty project, specify the name 'test', and select OK.

3. Create a C++ file in the new project by right-clicking the Source Files node, then selecting Add > New Item; select C++ File, name it main.cpp, and select OK.

4. Right-click the C++ project in Solution Explorer and select Properties.

5. At the top of the Property Pages dialog that appears, set Configuration to All Configurations and Platform to Win32.

6. Set the specific properties as described in the following table, then select OK.

7. Right-click the C++ project and select Build to test your configurations (both Debug and Release). The .pyd files are located in the solution folder under Debug and Release, not the C++ project folder itself.

8. Add the following code to the C++ project's main.cpp file:

#include <iostream>

int main() {
    std::cout << "Hello World from C++" << std::endl;

    return 0;
}

9. Build the C++ project again to confirm the code is working.

A.4 Convert the C++ project to an extension for Python

To make the C++ DLL into an extension for Python, first modify the exported methods to interact with Python types. Then add a function that exports the module.

1. Install pybind11 using pip:

   pip install pybind11

or

   py -m pip install pybind11

2. At the top of main.cpp, include pybind11.h:

   #include <pybind11/pybind11.h>


3. At the bottom of main.cpp, use the PYBIND11_MODULE macro to define the entry point to the C++ function:

namespace py = pybind11;

PYBIND11_MODULE(test, m) {
    m.def("main_func", &main, R"pbdoc(
        pybind11 simple example
    )pbdoc");

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

4. Set the target configuration to Release and build the C++ project to verify your code.

The C++ module may fail to compile for the following reasons:

• Unable to locate Python.h (E1696: cannot open source file "Python.h" and/or C1083: Cannot open include file: "Python.h": No such file or directory): verify that the path in C/C++ > General > Additional Include Directories in the project properties points to your Python installation's include folder. See the table in Step 6.

• Unable to locate Python libraries: verify that the path in Linker > General > Additional Library Directories in the project properties points to the Python installation's libs folder. See the table in Step 6.

• Linker errors related to target architecture: change the C++ target's project architecture to match that of your Python installation. For example, if you're targeting x64 with the C++ project but your Python installation is x86, change the C++ project to target x86.

A.5 Make the DLL available to Python

1. In Solution Explorer, right-click the References node in the Python project, and then select Add Reference. In the dialog that appears, select the Projects tab, select the test project, and then select OK.


2. Run the Visual Studio installer, select Modify, then select Individual Components > Compilers, build tools, and runtimes > Visual C++ 2015.3 v140 toolset. This step is necessary because Python (for Windows) is itself built with Visual Studio 2015 (version 14.0) and expects those tools to be available when building an extension through the method described here. (Note that you may need to install a 32-bit version of Python and target the DLL to Win32, not x64.)


3. Create a file named setup.py in the C++ project by right-clicking the project and selecting Add > New Item. Then select C++ File (.cpp), name the file setup.py, and select OK (naming the file with the .py extension makes Visual Studio recognise it as Python despite using the C++ file template).

Then paste the following code into the setup.py file:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'pybind11_test',
    sources=['main.cpp'],
    # add pybind11 folder and Python include folder
    include_dirs=[r'pybind11\include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='pybind11_test',
    version='1.0',
    description='Simple pybind11 example',
    license='MIT',
    ext_modules=[sfc_module],
)

4. The setup.py code instructs Python to build the extension using the Visual Studio 2015 C++ toolset when used from the command line.

Open an elevated command prompt, navigate to the folder that contains setup.py, and enter the following command:

   python setup.py install

A.6 Call the DLL from Python

After the DLL is made available to Python, we can now call test.main_func() from Python code.

1. Add the following lines to your .py file to call methods exported from the DLL:


import test

if __name__ == "__main__":
    test.main_func()

2. Run the Python program (Debug > Start without Debugging). The output should look like the following:

A.7 Example: Store a C++ Eigen Matrix as NumPy Arrays

This example requires the use of Eigen, a C++ template library. Due to its popularity, pybind11 provides transparent conversion and limited mapping support between Eigen and scientific Python linear algebra data types.

1. If you have not already, visit http://eigen.tuxfamily.org/index.php?title=Main_Page and click on 'zip' to download the zip file. Once the download is complete, decompress the zip file in a suitable location and note down the file path.


2. In the Solution Explorer, right-click the C++ project and select Properties. Navigate to the C/C++ > General tab, add the Eigen path to the Additional Include Directories property, then select OK.

3. Right-click the C++ project and select Build to test your configurations (both Debug and Release).

4. Replace the code in the main.cpp file with the following code:

#include <iostream>

// pybind11
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>

// Eigen
#include <Eigen/Dense>

namespace py = pybind11;

Eigen::Matrix3f example_mat(const int a, const int b, const int c) {
    Eigen::Matrix3f mat;
    mat << 5 * a, 1 * b, 2 * c,
           2 * a, 4 * b, 2 * c,
           1 * a, 3 * b, 3 * c;

    return mat;
}

Eigen::MatrixXd inv(Eigen::MatrixXd xs) {
    return xs.inverse();
}

double det(Eigen::MatrixXd xs) {
    return xs.determinant();
}

PYBIND11_MODULE(test, m) {
    m.def("example_mat", &example_mat);
    m.def("inv", &inv);
    m.def("det", &det);

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

Note that Eigen matrices are automatically converted to NumPy arrays simply by including the pybind11/eigen.h header (see the includes at the top of the listing above); a C++ function with an Eigen return type can be used directly as a NumPy data type in Python. Build the C++ project to check that the code works correctly.

5. Include the Eigen library's directory: replace the code in setup.py with the following code:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'pybind11_test',
    sources=['main.cpp'],
    # add pybind11 folder, Python include folder and Eigen folder
    include_dirs=[r'pybind11\include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                  r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='pybind11_test',
    version='1.0',
    description='Simple pybind11 example',
    license='MIT',
    ext_modules=[sfc_module],
)

As before, in an elevated command prompt, navigate to the folder that contains setup.py and enter the following command:

   python setup.py install

6. Then paste the following code in the Python file:

import test
import numpy as np

if __name__ == "__main__":
    A = test.example_mat(1, 3, 1)
    print("Matrix A is\n", A)
    print("The inverse of A is\n", test.inv(A))
    print("The determinant of A is", test.det(A))


The output should be:

A.8 Example: Store Data Generated by the Analytical Heston Model as NumPy Arrays

Only the relevant part of the code is displayed here. To request the code for the entire project, contact the author at cko22@bath.edu.

This example demonstrates how to use pybind11 for:

• a C++ project with multiple files;

• storing a C++ Eigen matrix as NumPy arrays.

1. For a multi-file C++ project, we need to include all the source files in setup.py. This example also uses the Eigen library, so its directory is included as well. Paste the following code in setup.py:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'Data_Generation',
    sources=[r'module.cpp', r'Data.cpp', r'Heston.cpp',
             r'IntegrationScheme.cpp', r'IntegratorImp.cpp',
             r'NumIntegrator.cpp', r'pdetfunction.cpp',
             r'pfunction.cpp', r'Range.cpp'],
    include_dirs=[r'pybind11\include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                  r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='Data_Generation',
    version='1.0',
    description='Python package with Data_Generation C++ extension (PyBind11)',
    license='MIT',
    ext_modules=[sfc_module],
)

Then go to the folder that contains setup.py in the command prompt and enter the following command:

   python setup.py install

2. The code in module.cpp is:

// For analytic solution in case of Heston model
#include "Heston.hpp"
#include <ctime>
#include "Data.hpp"
#include <future>

// Inputting and Outputting
#include <iostream>
#include <vector>

// pybind11
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>
#include <pybind11/stl_bind.h>

// Eigen
#include <C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7\Eigen\Dense>
#include <Eigen/Dense>

namespace py = pybind11;

struct CPPData {
    // Input Data
    std::vector<double> spot_price;
    std::vector<double> strike_price;
    std::vector<double> risk_free_rate;
    std::vector<double> dividend_yield;
    std::vector<double> initial_vol;
    std::vector<double> maturity_time;
    std::vector<double> long_term_vol;
    std::vector<double> mean_reversion_rate;
    std::vector<double> vol_vol;
    std::vector<double> price_vol_corr;

    // Output Data
    std::vector<double> option_price;
};

// ... do something ...

Eigen::MatrixXd module() {
    CPPData cppdata;
    simulation(cppdata);

    // Convert each std::vector to an Eigen vector first
    // Input Data
    Map<Eigen::VectorXd> eigen_spot_price(cppdata.spot_price.data(), cppdata.spot_price.size());
    Map<Eigen::VectorXd> eigen_strike_price(cppdata.strike_price.data(), cppdata.strike_price.size());
    Map<Eigen::VectorXd> eigen_risk_free_rate(cppdata.risk_free_rate.data(), cppdata.risk_free_rate.size());
    Map<Eigen::VectorXd> eigen_dividend_yield(cppdata.dividend_yield.data(), cppdata.dividend_yield.size());
    Map<Eigen::VectorXd> eigen_initial_vol(cppdata.initial_vol.data(), cppdata.initial_vol.size());
    Map<Eigen::VectorXd> eigen_maturity_time(cppdata.maturity_time.data(), cppdata.maturity_time.size());
    Map<Eigen::VectorXd> eigen_long_term_vol(cppdata.long_term_vol.data(), cppdata.long_term_vol.size());
    Map<Eigen::VectorXd> eigen_mean_reversion_rate(cppdata.mean_reversion_rate.data(), cppdata.mean_reversion_rate.size());
    Map<Eigen::VectorXd> eigen_vol_vol(cppdata.vol_vol.data(), cppdata.vol_vol.size());
    Map<Eigen::VectorXd> eigen_price_vol_corr(cppdata.price_vol_corr.data(), cppdata.price_vol_corr.size());
    // Output Data
    Map<Eigen::VectorXd> eigen_option_price(cppdata.option_price.data(), cppdata.option_price.size());

    const int cols{11};          // No of columns
    const int rows = SIZE / 4;
    // Define a matrix that stores training data to be used in Python
    MatrixXd training(rows, cols);
    // Initialise each column using the vectors
    training.col(0) = eigen_spot_price;
    training.col(1) = eigen_strike_price;
    training.col(2) = eigen_risk_free_rate;
    training.col(3) = eigen_dividend_yield;
    training.col(4) = eigen_initial_vol;
    training.col(5) = eigen_maturity_time;
    training.col(6) = eigen_long_term_vol;
    training.col(7) = eigen_mean_reversion_rate;
    training.col(8) = eigen_vol_vol;
    training.col(9) = eigen_price_vol_corr;
    training.col(10) = eigen_option_price;

    std::cout << training << std::endl;

    return training;
}

PYBIND11_MODULE(Data_Generation, m) {
    m.def("module", &module);

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

Note that the module function's return type is an Eigen matrix; pybind11 exposes this directly as a NumPy array in Python.

3. The C++ function can then be called from Python as shown:

    import Data_Generation as dg
    import mysql.connector
    import pandas as pd
    import sqlalchemy as db
    import time

    # This function calls the analytical Heston option pricer in C++ to
    # simulate 5,000,000 option prices and store them in a MySQL database

    # ... do something ...

    if __name__ == "__main__":
        t0 = time.time()

        # A new table (optional)
        SQL_clean_table()

        target_size = 5000000  # Final table size
        batch_size = 500000    # Size generated each iteration

        # Get the number of rows existing in the table
        r = SQL_setup(batch_size)

        # Get the number of iterations
        iter = int(target_size / batch_size) - int(r / batch_size)
        print("Iteration(s) required: ", iter)
        count = 0

        print("Now Run C++ code: ")
        for i in range(iter):
            count = count + 1
            print("----------------------------------------------------")
            print("Current iteration: ", count)
            cpp = dg.module()  # store data from C++

            # ... do something ...

            r1 = SQL_setup(batch_size)
            print("No. of rows in SQL Classic Heston Table Now: ", r1)
            print("\n")
        t1 = time.time()
        # Calculate total time for entire program
        total = t1 - t0
        print("THE ENTIRE PROGRAM TAKES", round(total), "s TO COMPLETE")
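The "... do something ..." blocks above are elided in the listing. As an illustration only, a minimal sketch of one way the returned NumPy array could be appended to MySQL with pandas and sqlalchemy; the connection string, credentials and table name below are hypothetical, not the ones used in the project:

    import Data_Generation as dg
    import pandas as pd
    import sqlalchemy as db

    # Hypothetical connection string; replace user/password/schema with real ones.
    engine = db.create_engine("mysql+mysqlconnector://user:password@localhost/heston")

    # Column order matches the matrix assembled in module.cpp above.
    cols = ["spot_price", "strike_price", "risk_free_rate", "dividend_yield",
            "initial_vol", "maturity_time", "long_term_vol",
            "mean_reversion_rate", "vol_vol", "price_vol_corr", "option_price"]

    batch = pd.DataFrame(dg.module(), columns=cols)  # Eigen matrix arrives as a NumPy array
    batch.to_sql("classic_heston", engine, if_exists="append", index=False)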


Appendix B

Asymptotic Expansions for General Stochastic Volatility Models

B.1 Asymptotic Expansions

We present the asymptotic expansions for general stochastic volatility models by (Lorig et al., 2019). Consider a strictly positive spot price S whose risk-neutral dynamics are given by

    S = e^X                                                           (B.1)
    dX = −(1/2)σ²(t, X, Y)dt + σ(t, X, Y)dW₁,   X(0) = x ∈ ℝ          (B.2)
    dY = f(t, X, Y)dt + β(t, X, Y)dW₂,   Y(0) = y ∈ ℝ                 (B.3)
    ⟨dW₁, dW₂⟩ = ρ(t, X, Y)dt,   |ρ| < 1                              (B.4)

The drift of X is selected to be −(1/2)σ² so that S = e^X is a martingale.

Let C(t) be the time-t value of a European call option that matures at time T > t with payoff function P(X(T)). To value a European-style option using risk-neutral pricing, we must compute functions of the form

    u(t, x, y) = E[P(X(T)) | X(t) = x, Y(t) = y]

It is well known that the function u satisfies the Kolmogorov backward equation (Lorig et al., 2019)

    (∂_t + A(t)) u(t, x, y) = 0,   u(T, x, y) = P(x)    (B.5)

where the operator A(t) is given explicitly by

    A(t) = a(t, x, y)(∂²_x − ∂_x) + f(t, x, y)∂_y + b(t, x, y)∂²_y + c(t, x, y)∂_x∂_y    (B.6)

and where the functions a b and c are defined as

    a(t, x, y) = (1/2)σ²(t, x, y)
    b(t, x, y) = (1/2)β²(t, x, y)
    c(t, x, y) = ρ(t, x, y)σ(t, x, y)β(t, x, y)


Appendix C

Deep Neural Networks Library in Python: Keras

We show the code snippet for computing the ANN in Keras. The Python code for defining the ANN architecture for the rough Heston implied volatility approximation is:

    # Define the neural network architecture
    import keras
    from keras import backend as K
    keras.backend.set_floatx('float64')
    from keras.callbacks import EarlyStopping

    # Construct the model
    input_layer = keras.layers.Input(shape=(n,))

    hidden_layer1 = keras.layers.Dense(80, activation='elu')(input_layer)
    hidden_layer2 = keras.layers.Dense(80, activation='elu')(hidden_layer1)
    hidden_layer3 = keras.layers.Dense(80, activation='elu')(hidden_layer2)

    output_layer = keras.layers.Dense(100, activation='linear')(hidden_layer3)

    model = keras.models.Model(inputs=input_layer, outputs=output_layer)

    # Summarise layers
    model.summary()

    # User-defined metrics: R2 and Max Abs Error
    # R2
    def coeff_determination(y_true, y_pred):
        SS_res = K.sum(K.square(y_true - y_pred))
        SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
        return (1 - SS_res / (SS_tot + K.epsilon()))

    # Max Error
    def max_error(y_true, y_pred):
        return (K.max(abs(y_true - y_pred)))

    earlystop = EarlyStopping(monitor="val_loss",
                              min_delta=0,
                              mode="min",
                              verbose=1,
                              patience=25)

    # Prepare model for training
    opt = keras.optimizers.Adam(learning_rate=0.005)
    model.compile(loss='mse', optimizer='adam',
                  metrics=['mae', coeff_determination, max_error])
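Training then proceeds with a standard Keras fit call. The following is a minimal sketch; x_train, y_train, the batch size, epoch count and validation split are placeholder assumptions rather than the settings used in the experiments:

    # Placeholder training call; x_train/y_train and all settings are assumptions.
    history = model.fit(x_train, y_train,
                        batch_size=1024,
                        epochs=200,
                        validation_split=0.2,
                        callbacks=[earlystop])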


Bibliography

Alòs, Elisa et al. (June 2017). "Exponentiation of Conditional Expectations Under Stochastic Volatility". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2983180.

Andersen, Leif et al. (Jan. 2007). "Moment Explosions in Stochastic Volatility Models". In: Finance and Stochastics 11, pp. 29-50. DOI: 10.1007/s00780-006-0011-7.

Atkinson, Colin et al. (Jan. 2011). "Rational Solutions for the Time-Fractional Diffusion Equation". URL: https://www.researchgate.net/publication/220222966_Rational_Solutions_for_the_Time-Fractional_Diffusion_Equation.

Bayer, Christian et al. (Oct. 2018). "Deep calibration of rough stochastic volatility models". URL: https://arxiv.org/abs/1810.03399.

Black, Fischer et al. (May 1973). "The Pricing of Options and Corporate Liabilities". URL: https://www.cs.princeton.edu/courses/archive/fall09/cos323/papers/black_scholes73.pdf.

Carr, Peter and Dilip B. Madan (Mar. 1999). "Option valuation using the fast Fourier transform". In: Journal of Computational Finance. URL: https://www.researchgate.net/publication/2519144_Option_Valuation_Using_the_Fast_Fourier_Transform.

Chapter 2 Modeling Process: k-fold cross validation. https://bradleyboehmke.github.io/HOML/process.html. Accessed: 2020-08-22.

Comte, Fabienne et al. (Oct. 1998). "Long Memory In Continuous-Time Stochastic Volatility Models". In: Mathematical Finance 8, pp. 291-323. URL: https://zulfahmed.files.wordpress.com/2015/10/comterenault19981.pdf.

Cox, John C. et al. (Jan. 1985). "A Theory of the Term Structure of Interest Rates". URL: http://pages.stern.nyu.edu/~dbackus/BCZ/discrete_time/CIR_Econometrica_85.pdf.

Create a C++ extension for Python. https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019. Accessed: 2020-07-21.

Duffy, Daniel J. (Jan. 2018). Financial Instrument Pricing Using C++, Second Edition. John Wiley & Sons. ISBN: 978-0-470-97119-2.

Duffy, Daniel J. et al. (2012). Monte Carlo Frameworks: Building customisable high-performance C++ applications. John Wiley & Sons, Inc. ISBN: 9780470060698.

Dupire, Bruno (1994). "Pricing with a Smile". In: Risk Magazine, pp. 18-20.

Eigen: The Matrix Class. https://eigen.tuxfamily.org/dox/group__TutorialMatrixClass.html. Accessed: 2020-07-22.

Eldan, Ronen et al. (May 2016). "The Power of Depth for Feedforward Neural Networks". URL: https://arxiv.org/abs/1512.03965.


Euch, Omar El et al. (Sept. 2016). "The characteristic function of rough Heston models". URL: https://arxiv.org/pdf/1609.02108.pdf.

— (Mar. 2019). "Roughening Heston". URL: https://ssrn.com/abstract=3116887.

Fouque, Jean-Pierre et al. (Aug. 2012). "Second Order Multiscale Stochastic Volatility Asymptotics: Stochastic Terminal Layer Analysis & Calibration". URL: https://arxiv.org/abs/1208.5802.

Gatheral, Jim (2006). The volatility surface: a practitioner's guide. John Wiley & Sons, Inc. ISBN: 978-0-471-79251-2.

Gatheral, Jim et al. (Oct. 2014). "Volatility is rough". URL: https://arxiv.org/abs/1410.3394.

— (Jan. 2019). "Rational approximation of the rough Heston solution". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3191578.

Hebb, Donald O. (1949). The Organization of Behavior: A Neuropsychological Theory. Psychology Press, 1st edition (12 Jun. 2002). ISBN: 978-0805843002.

Heston, Steven (Jan. 1993). "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options". URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.3204&rep=rep1&type=pdf.

Hinton, Geoffrey et al. (Feb. 2015). "Overview of mini-batch gradient descent". URL: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf.

Hornik et al. (Mar. 1989). "Multilayer feedforward networks are universal approximators". URL: https://deeplearning.cs.cmu.edu/F20/document/readings/Hornik_Stinchcombe_White.pdf.

— (Jan. 1990). "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks". URL: http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/univ_approx.pdf.

Horvath, Blanka et al. (Jan. 2019). "Deep Learning Volatility". URL: https://ssrn.com/abstract=3322085.

Hurst, Harold E. (Jan. 1951). "Long-term storage of reservoirs: an experimental study". URL: https://www.jstor.org/stable/2982267#metadata_info_tab_contents.

Hutchinson, James M. et al. (July 1994). "A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks". URL: https://alo.mit.edu/wp-content/uploads/2015/06/A-Nonparametric-Approach-to-Pricing-and-Hedging-Derivative-Securities-via-Learning-Networks.pdf.

Ioffe, Sergey et al. (Feb. 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". URL: https://arxiv.org/abs/1502.03167.

Itkin, Andrey (Jan. 2010). "Pricing options with VG model using FFT". URL: https://arxiv.org/abs/physics/0503137.

Kahl, Christian et al. (Jan. 2006). "Not-so-complex Logarithms in the Heston model". URL: http://www2.math.uni-wuppertal.de/~kahl/publications/NotSoComplexLogarithmsInTheHestonModel.pdf.


Kingma, Diederik P. et al. (Feb. 2015). "Adam: A Method for Stochastic Optimization". URL: https://arxiv.org/abs/1412.6980.

Kwok, YueKuen et al. (2012). Handbook of Computational Finance, Chapter 21. Springer-Verlag Berlin Heidelberg. ISBN: 978-3-642-17254-0.

Lewis, Alan (Sept. 2001). "A Simple Option Formula for General Jump-Diffusion and Other Exponential Lévy Processes". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=282110.

Liu, Shuaiqiang et al. (Apr. 2019a). "A neural network-based framework for financial model calibration". URL: https://arxiv.org/abs/1904.10523.

— (Apr. 2019b). "Pricing options and computing implied volatilities using neural networks". URL: https://arxiv.org/abs/1901.08943.

Lorig, Matthew et al. (Jan. 2019). "Explicit implied vols for multifactor local-stochastic vol models". URL: https://arxiv.org/abs/1306.5447v3.

Mandara, Dalvir (Sept. 2019). "Artificial Neural Networks for Black-Scholes Option Pricing and Prediction of Implied Volatility for the SABR Stochastic Volatility Model". URL: https://www.datasim.nl/application/files/8115/7045/4929/1423101.pdf.

McCulloch, Warren et al. (May 1943). "A Logical Calculus of The Ideas Immanent in Nervous Activity". URL: http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf.

Mostafa, F. et al. (May 2008). "A neural network approach to option pricing". URL: https://www.witpress.com/elibrary/wit-transactions-on-information-and-communication-technologies/41/18904.

Peters, Edgar E. (Jan. 1991). Chaos and Order in the Capital Markets: A New View of Cycles, Prices and Market Volatility. John Wiley & Sons. ISBN: 978-0-471-13938-6.

— (Jan. 1994). Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons. ISBN: 978-0-471-58524-4. URL: https://www.amazon.co.uk/Fractal-Market-Analysis-Investment-Economics/dp/0471585246.

pybind11 Documentation. https://pybind11.readthedocs.io/en/stable/advanced/cast/eigen.html. Accessed: 2020-07-22.

Ruf, Johannes et al. (May 2020). "Neural networks for option pricing and hedging: a literature review". URL: https://arxiv.org/abs/1911.05620.

Schmelzle, Martin (Apr. 2010). "Option Pricing Formulae using Fourier Transform: Theory and Application". URL: https://pfadintegral.com/docs/Schmelzle2010%20Fourier%20Pricing.pdf.

Shevchenko, Georgiy (Jan. 2015). "Fractional Brownian motion in a nutshell". In: International Journal of Modern Physics: Conference Series 36. URL: https://www.worldscientific.com/doi/abs/10.1142/S2010194515600022.

Stone, Henry (July 2019). "Calibrating Rough Volatility Models: A Convolutional Neural Network Approach". URL: https://arxiv.org/abs/1812.05315.

Using the C++ eigen library to calculate matrix inverse and determinant. http://people.duke.edu/~ccc14/sta-663-2016/18G_C++_Python_pybind11.html#Using-the-C++-eigen-library-to-calculate-matrix-inverse-and-determinant. Accessed: 2020-07-22.

Wilmott, Paul (2006). Paul Wilmott on Quantitative Finance, Vol. 3. John Wiley & Sons, Inc., 2nd Edition. ISBN: 978-0-470-01870-5.

• Declaration of Authorship
• Abstract
• Acknowledgements
• Projects Overview
• Introduction
  • Organisation of This Thesis
• Literature Review
  • Application of ANN in Finance
  • Option Pricing Models
  • The Research Gap
• Option Pricing Models
  • The Heston Model
    • Option Pricing Under Heston Model
    • Heston Model Implied Volatility Approximation
  • Volatility is Rough
  • The Rough Heston Model
    • Rough Heston Pricing Equation
      • Rational Approximation of Rough Heston Riccati Solution
      • Option Pricing Using Fast Fourier Transform
    • Rough Heston Model Implied Volatility
• Artificial Neural Networks
  • Dissecting the Artificial Neural Networks
    • Neurons
    • Input, Output and Hidden Layers
    • Connections, Weights and Biases
    • Forward and Backward Propagation, Loss Functions
    • Activation Functions
    • Optimisation Algorithms and Learning Rate
    • Epochs, Early Stopping and Batch Size
  • Feedforward Neural Networks
    • Deep Neural Networks
  • Common Pitfalls and Remedies
    • Loss Function Stagnation
    • Loss Function Fluctuation
    • Over-fitting
    • Under-fitting
  • Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation
• Data
  • Data Generation and Storage
    • Heston Call Option Data
    • Heston Implied Volatility Data
    • Rough Heston Call Option Data
    • Rough Heston Implied Volatility Data
  • Data Pre-processing
    • Data Splitting: Train - Validate - Test
    • Data Standardisation and Normalisation
    • Data Partitioning
• Neural Networks Architecture and Experimental Setup
  • Neural Networks Architecture
    • ANN Architecture: Option Pricing under Heston Model
    • ANN Architecture: Implied Volatility Surface of Heston Model
    • ANN Architecture: Option Pricing under Rough Heston Model
    • ANN Architecture: Implied Volatility Surface of Rough Heston Model
  • Performance Metrics and Validation Methods
  • Implementation of ANN in Keras
  • Summary of Neural Networks Architecture
• Results and Discussions
  • Heston Option Pricing ANN Results
  • Heston Implied Volatility ANN Results
  • Rough Heston Option Pricing ANN Results
  • Rough Heston Implied Volatility ANN Results
  • Run-time Performance
• Conclusions and Outlook
• pybind11: A step-by-step tutorial with examples
  • Software Prerequisites
  • Create a Python Project
  • Create a C++ Project
  • Convert the C++ project to extension for Python
  • Make the DLL available to Python
  • Call the DLL from Python
  • Example: Store C++ Eigen Matrix as NumPy Arrays
  • Example: Store Data Generated by Analytical Heston Model as NumPy Arrays
• Asymptotic Expansions for General Stochastic Volatility Models
  • Asymptotic Expansions
• Deep Neural Networks Library in Python: Keras
• Bibliography

FIGURE 1: Implementation Flow Chart. The chart runs from the option pricing models (Heston, rough Heston) through data generation and data pre-processing (strike price, maturity and volatility, etc.; scaled, normalised training set), ANN architecture design and ANN training, to the trained ANN, ANN prediction, results evaluation (accuracy of option price and implied volatility, and runtime performance), and conclusions.


FIGURE 2: Data flow diagram for the ANN on the rough Heston pricer. Model parameters are input to the RoughHestonPricer, which outputs rough Heston data (model parameters and option prices). The training data feeds the KerasANNTrainer together with the ANN model hyperparameters; the trained model weights are stored and drive the ANNPredictions, whose predicted option prices are compared against the test-data option prices in the AccuracyEvaluation, which outputs the error metrics.


Chapter 1

Introduction

Option pricing has always been a major research area in financial mathematics. Various methods and techniques have been developed to price options more accurately and more quickly since the introduction of the well-known Black-Scholes model in 1973 (Black et al., 1973). The Black-Scholes model has an analytical solution and has only one parameter, volatility, required to be calibrated. This makes it very easy to implement. Its simplicity, however, comes at the expense of many unrealistic assumptions. One of its biggest flaws is the assumption of a constant volatility term. This assumption simply does not hold in the real market. With this assumption, the Black-Scholes model fails to capture the volatility dynamics exhibited by the market and so is unable to value options accurately.

In 1993, (Heston, 1993) developed a stochastic volatility model in an attempt to better capture the volatility dynamics. The Heston model assumes the volatility of an underlying asset to follow a geometric Brownian motion (Hurst parameter H = 1/2). The Heston model also has a closed-form analytical solution, and this makes it a strong alternative to the Black-Scholes model.

Whilst the Heston model is more realistic than the Black-Scholes model, the generated option prices do not match the observed vanilla option prices consistently (Gatheral et al., 2014). Recent research shows that more accurate option prices can be estimated if the volatility process is modelled as a fractional Brownian motion with H < 1/2. When H < 1/2, the volatility process path appears to be rougher than when H ≥ 1/2, and hence the volatility is described as rough.

(Euch et al., 2019) introduce the rough volatility version of the Heston model, which combines the best of both worlds. However, the calibration of the rough Heston model relies on the notoriously slow Monte Carlo method, due to its non-Markovian nature. This obstacle is the reason that rough Heston struggles to find its way into industrial applications. Although there exist many approximation methods for the rough Heston model to help to solve this problem, we are interested in the feasibility of machine learning algorithms.

Recent advancements in machine learning algorithms could potentially provide an alternative means for this problem. Specifically, artificial neural networks (ANN) offer the best hope, because they are capable of learning complex functional relationships between input and output data and then making predictions instantly.


In this project, our objective is to investigate the accuracy and run-time performance of ANN on European call option price predictions and implied volatility approximations under the rough Heston model. We start off by using the Heston model as a concept validation, then proceed to its rough volatility version. For regulatory purposes, we use data simulated by option pricing models for training.

1.1 Organisation of This Thesis

This thesis is organised as follows. In Chapter 2, we review some of the related work and provide justifications for our research. In Chapter 3, we discuss the theory of the option pricing models of our interest. We provide a brief introduction to ANN and some techniques to improve their performance in Chapter 4. Then we explain the tools we used for the data generation process and some useful techniques to speed up the process in Chapter 5; in the same chapter, we also discuss the important data pre-processing procedures. Chapter 6 shows our network architecture and experimental setups. We present the results and discuss our findings in Chapter 7. Finally, we conclude our findings and discuss potential future work in Chapter 8.

FIGURE 1.1: Problem Solving Process. (START) Problem: the rough Heston model has low industrial application despite giving accurate option prices. Root cause: slow calibration (pricing and implied volatility approximation) process. Potential solution: apply ANN to pricing and implied volatility approximation. Implementation: ANN computation using Keras in Python. Evaluation: accuracy and speed of ANN predictions. (END) Conclusions: ANN is feasible or not.


Chapter 2

Literature Review

In this chapter, we provide a review of the current literature and highlight the research gap. The focus of our review is the application of ANN in option pricing models. We justify our methodology and our choice of option pricing models, and we also provide some critiques.

2.1 Application of ANN in Finance

Option pricing has always been a hot topic in academia and the financial industry. Researchers have been looking for various means to price options more accurately and quickly. As computational technology advances, the application of ANN has gained popularity in solving complex problems in many different industries. In quantitative finance, the idea of utilising ANN for pricing derivatives comes from its ability to make fast and accurate predictions once it is trained.

The application of ANN in option pricing began with (Hutchinson et al., 1994). The authors were the pioneers in applying ANN for pricing and hedging derivatives. Since then, there have been more than 100 research papers on this topic. It is also worth noting that the complexity of the ANN architectures used in these papers increases over the years. Whilst there has been much research on the applications of ANN in option pricing, only about 10% of papers focus on implied volatility (Ruf et al., 2020). The feasibility of using ANN for implied volatility approximation is equally as important as for option pricing, since they are both part of the calibration process. Recently, more research papers focus on implied volatility and calibration, such as (Mostafa et al., 2008), (Liu et al., 2019a), (Liu et al., 2019b) and (Horvath et al., 2019). A key contribution of (Horvath et al., 2019)'s work is the introduction of the image-based implicit training method for implied volatility surface approximation. This method is allegedly robust and fast for approximating the entire implied volatility surface.

A significant portion of the research papers use real market data in their research. Currently, there is no standardised framework for deciding the architecture and parameters of an ANN. Hence, training the ANN directly with market data might pose some issues when it comes to regulatory compliance. (Horvath et al., 2019) is one of the few papers that uses simulated data for ANN training. The simulated data is generated from the Heston and rough Bergomi models. Using simulated data removes the regulatory concerns, as the functional mapping from model parameters to option price is known and tractable.

The application of ANN requires splitting the entire data set for training, validation and testing. Most of the papers that use real market data ensure the data splitting process is chronological, to preserve the time series structure of the data. This is not a concern in our case, as we are using simulated data.

The results in (Horvath et al., 2019) show very accurate predictions. However, after examining the source code, we discovered that the ANN is validated and tested on the same data set. Thus, the high accuracy results do not give any information about how well the trained ANN generalises to unseen data. In our experiment, we will use different data sets for validation and testing.

Common error metrics used include the mean absolute error (MAE), mean absolute percentage error (MAPE) and mean squared error (MSE). We suggest a new metric, the maximum absolute error, as it is useful from a risk management perspective for quantifying the largest prediction error in the worst-case scenario.

2.2 Option Pricing Models

In terms of the choice of option pricing models, over 80% of the papers select the Black-Scholes model or its extensions in their research (Ruf et al., 2020). As mentioned before, there are other models that can give better option prices than Black-Scholes, such as the stochastic volatility models. (Liu et al., 2019b) investigated the application of ANN in the Heston and Bates models.

According to (Gatheral et al., 2014), modelling the volatility as rough volatility can capture the volatility dynamics in the market better. Rough volatility models struggle to find their way into the industry due to their slow calibration process. Hence, it is natural for the direction of ANN research to move towards rough volatility models.

Some of the recent examples include (Stone, 2019), (Bayer et al., 2018) and (Horvath et al., 2019). (Stone, 2019) investigates the calibration of the rough Bergomi model using Convolutional Neural Networks (CNN), whilst (Bayer et al., 2018) study the calibration of the Heston and rough Bergomi models using Deep Neural Networks (DNN). Contrary to these papers, where direct calibration is performed via ANN in one step, (Horvath et al., 2019) split the calibration of the rough Bergomi model into two steps. Firstly, the ANN is trained to learn the pricing equation during a so-called offline session. The trained ANN is then deterministic and stored for the online calibration session later. This approach can speed up the online calibration by a huge margin.

2.3 The Research Gap

In our thesis, we will be using the approach in (Horvath et al., 2019), that is, to train the network to learn the pricing and implied volatility functional mappings separately. We leave the model calibration step as the future outlook of this project. We choose to work with the rough Heston model, another popular rough volatility model that has not been investigated yet. We adapt the image-based implicit training method in our research. To re-iterate, we validate and test our ANN with different data sets to obtain a true measure of the ANN's performance on unseen data.


Chapter 3

Option Pricing Models

In this chapter, we discuss the option pricing models of our interest: the Heston and rough Heston models. The idea of modelling volatility as a rough fractional Brownian motion will also be presented. We then derive their pricing and implied volatility equations and explain their implementations for data generation.

The Black-Scholes model is simple but unrealistic for real-world applications, as its assumptions simply do not hold. One of its greatest criticisms is the assumption of constant volatility of the underlying asset price. (Dupire, 1994) extended the Black-Scholes framework to give local volatility models. He modelled the volatility as a deterministic function of the underlying price and time, with the function selected such that its outputs match observed European option prices. This extension is time-inhomogeneous and the dynamics it exhibits are still unrealistic. Thus, it is not able to produce future volatility surfaces that emulate our observations.

Stochastic volatility (SV) models were introduced, in which the volatility of the underlying is modelled as a geometric Brownian motion. Recent research shows that rough volatility can capture the true volatility dynamics better, and this leads to the rough volatility models.

3.1 The Heston Model

One of the most popular SV models is the one proposed by (Heston, 1993); its popularity is due to the fact that its pricing equation for European options is analytically tractable. This property allows calibration procedures to be done efficiently.

The Heston model for a one-dimensional spot price S follows a stochastic process

    dS = μS dt + √ν S dW₁    (3.1)

And its instantaneous variance ν follows a Cox, Ingersoll and Ross (CIR) process (Cox et al., 1985):

    dν = κ(θ − ν)dt + ξ√ν dW₂    (3.2)
    ⟨dW₁, dW₂⟩ = ρ dt            (3.3)

where


• μ is the (deterministic) rate of return of the spot
• θ is the long-term mean volatility
• ξ is the volatility of volatility
• κ is the mean reversion rate of volatility
• ρ is the correlation between spot and volatility moves
• W₁ and W₂ are Wiener processes

3.1.1 Option Pricing Under Heston Model

In this section we follow (Gatheral, 2006) closely. We start by forming a portfolio Π containing the option being priced V, a quantity −Δ of stocks, and a quantity −Δ₁ of another asset whose value is V₁:

    Π = V − ΔS − Δ₁V₁

The change in the portfolio during dt is

    dΠ = [∂V/∂t + (1/2)νS² ∂²V/∂S² + ρξνS ∂²V/∂ν∂S + (1/2)ξ²ν ∂²V/∂ν²] dt
         − Δ₁[∂V₁/∂t + (1/2)νS² ∂²V₁/∂S² + ρξνS ∂²V₁/∂ν∂S + (1/2)ξ²ν ∂²V₁/∂ν²] dt
         + [∂V/∂S − Δ₁ ∂V₁/∂S − Δ] dS
         + [∂V/∂ν − Δ₁ ∂V₁/∂ν] dν

Note the explicit dependence on t, S and ν has been removed for simplicity. We can make the portfolio instantaneously risk-free by eliminating the dS and dν terms:

    ∂V/∂S − Δ₁ ∂V₁/∂S − Δ = 0

and

    ∂V/∂ν − Δ₁ ∂V₁/∂ν = 0

So we are left with

    dΠ = [∂V/∂t + (1/2)νS² ∂²V/∂S² + ρξνS ∂²V/∂ν∂S + (1/2)ξ²ν ∂²V/∂ν²] dt
         − Δ₁[∂V₁/∂t + (1/2)νS² ∂²V₁/∂S² + ρξνS ∂²V₁/∂ν∂S + (1/2)ξ²ν ∂²V₁/∂ν²] dt
       = rΠ dt
       = r(V − ΔS − Δ₁V₁) dt


Then we apply the no-arbitrage principle: the return of a risk-free portfolio must be equal to the risk-free rate. We assume the risk-free rate to be deterministic for our case.

Rearranging the above equation by collecting V terms on the left-hand side and V₁ terms on the right-hand side gives

    [∂V/∂t + (1/2)νS² ∂²V/∂S² + ρξνS ∂²V/∂ν∂S + (1/2)ξ²ν ∂²V/∂ν² + rS ∂V/∂S − rV] / (∂V/∂ν)
    = [∂V₁/∂t + (1/2)νS² ∂²V₁/∂S² + ρξνS ∂²V₁/∂ν∂S + (1/2)ξ²ν ∂²V₁/∂ν² + rS ∂V₁/∂S − rV₁] / (∂V₁/∂ν)

It is obvious that each ratio is equal to some function f(S, ν, t) of S, ν and t. Thus we have

    ∂V/∂t + (1/2)νS² ∂²V/∂S² + ρξνS ∂²V/∂ν∂S + (1/2)ξ²ν ∂²V/∂ν² + rS ∂V/∂S − rV = f ∂V/∂ν

In the case of the Heston model, f is chosen as

    f = −(κ(θ − ν) − λ√ν)

where λ(S, ν, t) is called the market price of volatility risk. We do not delve into the choice of f and the concept of λ(S, ν, t) any further here; see (Wilmott, 2006) for his arguments.

(Heston, 1993) assumed in his paper that λ(S, ν, t) is linear in the instantaneous variance ν, to retain the form of the equation under the transformation from the statistical measure to the risk-neutral measure. Here we are only interested in pricing the options, so we can set λ(S, ν, t) to be zero (Gatheral, 2006).

Hence we arrive at

    ∂V/∂t + (1/2)νS² ∂²V/∂S² + ρξνS ∂²V/∂ν∂S + (1/2)ξ²ν ∂²V/∂ν² + rS ∂V/∂S − rV = κ(ν − θ) ∂V/∂ν    (3.4)

According to (Heston, 1993), the price of an undiscounted European call option C with strike price K and time to maturity T has a solution in the form

    C(S, K, ν, T) = (1/2)(F − K) + ∫₀^∞ (F·f₁ − K·f₂) du    (3.5)


where

    f₁ = Re[e^(−iu ln K) φ(u − i) / (iuF)]
    f₂ = Re[e^(−iu ln K) φ(u) / (iu)]
    φ(u) = E(e^(iu ln S_T))
    F = S e^(μT)

The evaluation of (3.5) requires the computation of Fourier inversion integrals, and it is susceptible to numerical instabilities due to the complex logarithms involved. (Kahl et al., 2006) proposed an integration scheme to compute the Fourier inversion integral by applying some transformations to the asymptotic structure. Thus the solution becomes

    C(S, K, ν, T) = ∫₀¹ y(x) dx,   x ∈ ℝ    (3.6)

where

    y(x) = (1/2)(F − K) + [F·f₁(−ln x / C∞) − K·f₂(−ln x / C∞)] / (x·π·C∞)

The limits at the boundaries of the integral are

    lim(x→0) y(x) = (1/2)(F − K)

and

    lim(x→1) y(x) = (1/2)(F − K) + [F·lim(u→0) f₁(u) − K·lim(u→0) f₂(u)] / (π·C∞)

Equation (3.6) provides a more robust pricing method for options with moderate to long maturities or strong mean-reversion. For the full derivation of equation (3.6), see (Kahl et al., 2006). The source code implementing equation (3.6) was provided by the supervisor and is the source code for (Duffy et al., 2012). A sketch of the final quadrature step follows.
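As an illustration of the integration step only, the following is a minimal sketch that evaluates (3.6) by fixed-order Gauss-Legendre quadrature; the function y (and its constant C∞) is assumed to be implemented from (Kahl et al., 2006), and the quadrature order is an arbitrary choice:

    # Minimal sketch: integrate y over (0, 1) as in equation (3.6).
    # Assumes y is a vectorised implementation of the transformed integrand.
    import numpy as np

    def call_price_eq36(y, order=64):
        nodes, weights = np.polynomial.legendre.leggauss(order)
        x = 0.5 * (nodes + 1.0)          # map nodes from (-1, 1) to (0, 1)
        return 0.5 * np.sum(weights * y(x))

Gauss-Legendre nodes avoid the endpoints x = 0 and x = 1, where y is defined only as a limit.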

3.1.2 Heston Model Implied Volatility Approximation

Before the model is used for pricing options, the model's parameters need to be calibrated so that it can return current market prices (at least to some degree of accuracy). That is, the unmeasurable model parameters are obtained by calibrating to implied volatilities observed in the market. This motivates the need for fast implied volatility computation.

Unfortunately, there is no closed-form analytical formula for calculating Heston model implied volatility. There are, however, many closed-form approximations for Heston implied volatility.

In this project, we select the implied volatility approximation method for Heston proposed by (Lorig et al., 2019), which allegedly outperforms other methods such as the one by (Fouque et al., 2012). (Lorig et al., 2019) derive a family of asymptotic expansions for European-style option implied volatility. Their method provides an explicit approximation and requires no special functions, not even numerical integration, and so it is faster to compute than its counterparts.

We follow closely the derivation steps outlined in (Lorig et al., 2019). Consider the Heston model in Section 3.1. According to (Andersen et al., 2007), ρ must be set negative to avoid moment explosion. To circumvent this, we apply the following change of variables: (X(t), V(t)) = (ln S(t), e^(κt)ν). Using the asymptotic expansions for general stochastic volatility models (see Appendix B) and Itô's formula, we have

    dX = −(1/2)e^(−κt)V dt + √(e^(−κt)V) dW₁,   X(0) = x = ln s    (3.7)
    dV = θκe^(κt) dt + ξ√(e^(κt)V) dW₂,   V(0) = ν = z > 0         (3.8)
    ⟨dW₁, dW₂⟩ = ρ dt                                              (3.9)

The parameters ν, κ, θ, ξ, W₁, W₂ play the same roles as in Section 3.1. Then we apply the second-order differential operator A(t) to (X, V):

    A(t) = (1/2)e^(−κt)v(∂²_x − ∂_x) + θκe^(κt)∂_v + (1/2)ξ²e^(κt)v ∂²_v + ξρν ∂_x∂_v

Define T as time to maturity and k as log-strike. Using the time-dependent Taylor series expansion of A(T) with (x(T), v(T)) = (X₀, E[V(T)]) = (x, θ(e^(κT) − 1)) gives (see Appendix B)

    σ₀ = √[(−θ + θκT + e^(−κT)(θ − v) + v) / (κT)]

    σ₁ = ξρ z e^(−κT)(−2v − vκT − e^(κT)(v(κT − 2) + v) + κTv + v) / (√2 κ²σ₀²T^(3/2))

    σ₂ = [ξ²e^(−2κT) / (32κ⁴σ₀⁵T³)] × [
           −2√2 κσ₀³T^(3/2) z(−v − 4e^(κT)(v + κT(v − v)) + e^(2κT)(v(5 − 2κT) − 2v) + 2v)
           + κσ₀²T(4z² − 2)(v + e^(2κT)(−5v + 2vκT + 8ρ²(v(κT − 3) + v) + 2v))
           + κσ₀²T(4z² − 2)(4e^(κT)(v + vκT + ρ²(v(κT(κT + 4) + 6) − v(κT(κT + 2) + 2)) − κTv) − 2v)
           + 4√2 ρ²σ₀√T z(2z² − 3)(−2 − vκT − e^(κT)(v(κT − 2) + v) + κTv + v)²
           + 4ρ²(4(z² − 3)z² + 3)(−2v − vκT − e^(κT)(v(κT − 2) + v) + κTv + v)² ]
         − σ₁²(4(x − k)² − σ₀⁴T²) / (8σ₀³T)

where

    z = (x − k − σ₀²T/2) / (σ₀√(2T))


The explicit expression for σ₃ is too long to be included here; see (Lorig et al., 2019) for the full derivation. The explicit expression for the implied volatility approximation is

    σ_implied = σ₀ + σ₁z + σ₂z² + σ₃z³    (3.10)

The C++ source code implementing this is available here: Heston Implied Volatility Approximation.
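For illustration, a minimal sketch of the zeroth-order term and the assembly of expansion (3.10); the higher-order coefficients s1, s2, s3 are assumed to be precomputed from the closed-form expressions above, and only σ₀ and z are coded here:

    # Minimal sketch, assuming s1, s2, s3 come from the closed-form expressions
    # in (Lorig et al., 2019).
    import numpy as np

    def sigma0(theta, kappa, v, T):
        return np.sqrt((-theta + theta * kappa * T
                        + np.exp(-kappa * T) * (theta - v) + v) / (kappa * T))

    def implied_vol(x, k, T, s0, s1, s2, s3):
        z = (x - k - s0 ** 2 * T / 2.0) / (s0 * np.sqrt(2.0 * T))
        return s0 + s1 * z + s2 * z ** 2 + s3 * z ** 3   # equation (3.10)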


3.2 Volatility is Rough

We start this section by introducing the fractional Brownian motion (fBm), which is a generalisation of Brownian motion.

Definition 3.2.1. Fractional Brownian Motion (fBm) is a centered Gaussian process B^H_t, t ≥ 0, with the autocovariance function

    E[B^H_t B^H_s] = (1/2)(|t|^(2H) + |s|^(2H) − |t − s|^(2H))    (3.11)

where H ∈ (0, 1) is the Hurst index or Hurst parameter associated with the fractional Brownian motion.
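As an illustration of definition (3.11), a minimal sketch that simulates an fBm path by Cholesky factorisation of this covariance; the method is exact but O(n³), so it is only suitable for small grids, and the grid size and seed below are arbitrary choices:

    # Exact fBm simulation on a regular grid via Cholesky factorisation of (3.11).
    import numpy as np

    def fbm_path(n, T, H, seed=0):
        t = np.linspace(T / n, T, n)
        cov = 0.5 * (t[:, None] ** (2 * H) + t[None, :] ** (2 * H)
                     - np.abs(t[:, None] - t[None, :]) ** (2 * H))
        rng = np.random.default_rng(seed)
        return np.linalg.cholesky(cov) @ rng.standard_normal(n)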

The Hurst parameter is a measure of the long-term memory of a time series. It was first introduced by (Hurst, 1951) for modelling water levels of the Nile river in order to determine the optimum dam size. As it turns out, it is also suitable for modelling time series data from other natural systems. (Peters, 1991) and (Peters, 1994) are some of the earliest examples of applying this concept in financial time series modelling. A time series can be classified into 3 categories based on the Hurst parameter:

1. 0 < H < 1/2: negative correlation in the increments of the process. A mean-reverting process, which implies an increase in value is more likely followed by a decrease in value, and vice versa. The smaller the value, the greater the mean-reverting effect.

2. H = 1/2: no correlation in the increments of the process (geometric Brownian motion or Wiener process).

3. 1/2 < H < 1: positive correlation in the increments of the process. A trend-reinforcing process, which means the direction of the next value is more likely the same as the current value. The strength of the trend is greater when H is closer to 1.

Empirical evidence shows that log-volatility time series behave similarly to an fBm with Hurst parameter of order 0.1 (Gatheral et al., 2014). Essentially all the statistical stylised facts of volatility can be captured when modelling it as a rough fBm.

This motivates the use of the fractional stochastic volatility (FSV) model by (Comte et al., 1998). Our model is called the rough fractional stochastic volatility (RFSV) model to emphasise that H < 1/2. The term 'rough' refers to the roughness of the path of the fBm. From Figure 3.1, one can notice that the fBm path gets 'rougher' as H decreases.


FIGURE 3.1: Paths of fBm for different values of H. Adapted from (Shevchenko, 2015).

3.3 The Rough Heston Model

In this section, we introduce the rough volatility version of the Heston model and derive its pricing and implied volatility equations.

As stated before, rough volatility models can fit the volatility surface notably well with very few parameters. Hence, it is natural to modify the Heston model and consider its rough version.

The rough Heston model for a one-dimensional asset price S with instantaneous variance ν(t) takes the form as in (Euch et al., 2016):

    dS = S√ν dW₁
    ν(t) = ν(0) + (1/Γ(α)) ∫₀ᵗ (t − s)^(α−1) κ(θ − ν(s)) ds
                + (ξ/Γ(α)) ∫₀ᵗ (t − s)^(α−1) √ν(s) dW₂    (3.12)
    ⟨dW₁, dW₂⟩ = ρ dt

where

• α = H + 1/2 ∈ (1/2, 1) governs the roughness of the volatility sample paths
• κ is the mean reversion rate
• θ is the long-term mean volatility
• ν(0) is the initial volatility
• ξ is the volatility of volatility
• Γ is the Gamma function
• W₁ and W₂ are two Brownian motions with correlation ρ

When α = 1 (H = 0.5), we recover the classical Heston model of Section 3.1.

3.3.1 Rough Heston Pricing Equation

The pricing equation of the rough Heston model is inspired by the classical Heston one. In the classical Heston model, the option price can be obtained by applying Fourier inversion integrals to the characteristic function, which is expressed in terms of the solution of a Riccati equation. The rough Heston model displays a similar structure, but with the Riccati equation replaced by a fractional Riccati equation.

(Euch et al., 2016) derived a characteristic function for the rough Heston model. This result is particularly important, as the non-Markovian nature of the fractional Brownian motion makes other numerical pricing methods (e.g. Monte Carlo) difficult to implement.

Definition 3.3.1. The fractional integral of order r ∈ (0, 1] of a function f is

    I^r f(t) = (1/Γ(r)) ∫₀ᵗ (t − s)^(r−1) f(s) ds    (3.13)

whenever the integral exists, and the fractional derivative of order r ∈ (0, 1] is

    D^r f(t) = (1/Γ(1 − r)) (d/dt) ∫₀ᵗ (t − s)^(−r) f(s) ds    (3.14)

whenever it exists.

Then we quote the results from (Euch et al., 2016) directly; for the full proof, see Section 6 of (Euch et al., 2016).

Theorem 3.3.1. For all t ≥ 0, the characteristic function of the log-price X_t = ln(S(t)/S(0)) is

    φ_X(a, t) = E[e^(iaX_t)] = exp[κθ I¹h(a, t) + ν(0) I^(1−α)h(a, t)]    (3.15)

where h is the solution of the fractional Riccati equation

    D^α h(a, t) = −(1/2)a(a + i) + κ(iaρξ − 1)h(a, t) + (ξ²/2)h²(a, t),   I^(1−α)h(a, 0) = 0    (3.16)

which admits a unique continuous solution for a ∈ C, a suitable region in the complex plane (see Definition 3.3.2).


Definition 3.3.2. We define the strip in the complex plane relevant to the computation of the option price characteristic function in Theorem 3.3.1:

    C = { z ∈ ℂ : ℜ(z) ≥ 0, −1/(1 − ρ²) ≤ ℑ(z) ≤ 0 }    (3.17)

where ℜ and ℑ denote real and imaginary parts, respectively.

There is no explicit solution to equation (3.16), so it has to be solved numerically. (Gatheral et al., 2019) present a simple rational approximation to the solution of the rough Heston Riccati equation, which is inevitably faster than the numerical schemes.

3.3.1.1 Rational Approximation of Rough Heston Riccati Solution

Firstly, the short-time series expansion of the solution by (Alòs et al., 2017) can be written as

    h_s(a, t) = Σ_{j=0}^∞ [Γ(1 + jα) / Γ(1 + (j + 1)α)] β_j(a) ξ^j t^((j+1)α)    (3.18)

with

    β₀(a) = −(1/2)a(a + i)

    β_k(a) = (1/2) Σ_{i,j=0}^{k−2} 1_{i+j=k−2} β_i(a)β_j(a) [Γ(1 + iα)/Γ(1 + (i + 1)α)] [Γ(1 + jα)/Γ(1 + (j + 1)α)]
             + iρa [Γ(1 + (k − 1)α)/Γ(1 + kα)] β_{k−1}(a)

Again from (Alòs et al., 2017), the long-time expansion is

    h_l(a, t) = r₋ Σ_{k=0}^∞ γ_k (ξ t^α)^(−k) / (A^k Γ(1 − kα))    (3.19)

where

    γ₁ = −γ₀ = −1

    γ_k = −γ_{k−1} + (r₋/(2A)) Σ_{i,j=1}^∞ 1_{i+j=k} γ_i γ_j Γ(1 − kα) / (Γ(1 − iα)Γ(1 − jα))

    A = √(a(a + i) − ρ²a²)

    r± = −iρa ± A

Now we can construct the global approximation. Using the same techniques as in (Atkinson et al., 2011), the rational approximation of h(a, t) is given by

    h^(m,n)(a, t) = [Σ_{i=1}^m p_i (ξt^α)^i] / [Σ_{j=0}^n q_j (ξt^α)^j],   q₀ = 1    (3.20)

Numerical experiments in (Gatheral et al., 2019) showed that h^(3,3) is a good choice for high accuracy and fast computation. Thus we can write explicitly

    h^(3,3)(a, t) = [p₁ξ(t^α) + p₂ξ²(t^α)² + p₃ξ³(t^α)³] / [1 + q₁ξ(t^α) + q₂ξ²(t^α)² + q₃ξ³(t^α)³]    (3.21)
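A minimal sketch of evaluating (3.21), assuming the (generally complex) coefficients p₁..p₃ and q₁..q₃ have already been solved from the matching conditions derived below:

    # Evaluate the (3,3) rational approximant of the Riccati solution, eq. (3.21).
    def h33(p, q, xi, t, alpha):
        y = xi * t ** alpha
        num = p[0] * y + p[1] * y ** 2 + p[2] * y ** 3
        den = 1.0 + q[0] * y + q[1] * y ** 2 + q[2] * y ** 3
        return num / den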

Thus from equations (3.18) and (3.19) we have

    h_s(a, t) = [Γ(1)/Γ(1 + α)] β₀(a)(t^α) + [Γ(1 + α)/Γ(1 + 2α)] β₁(a)ξ(t^α)²
                + [Γ(1 + 2α)/Γ(1 + 3α)] β₂(a)ξ²(t^α)³ + O(t^(4α))
              = b₁(t^α) + b₂(t^α)² + b₃(t^α)³ + O[(t^α)⁴]

and

    h_l(a, t) = r₋γ₀/Γ(1) + [r₋γ₁/(AΓ(1 − α)ξ)] (1/t^α)
                + [r₋γ₂/(A²Γ(1 − 2α)ξ²)] (1/t^α)² + O[(1/t^α)³]
              = g₀ + g₁(1/t^α) + g₂(1/t^α)² + O[(1/t^α)³]

Matching the coefficients of equation (3.21) to h_s and h_l forces the coefficients b_i and g_j to satisfy

    p₁ξ = b₁                                          (3.22)
    (p₂ − p₁q₁)ξ² = b₂                                (3.23)
    (p₁q₁² − p₁q₂ − p₂q₁ + p₃)ξ³ = b₃                 (3.24)
    p₃ξ³ / (q₃ξ³) = g₀                                (3.25)
    (p₂q₃ − p₃q₂)ξ⁵ / (q₃²ξ⁶) = g₁                    (3.26)
    (p₁q₃² − p₂q₂q₃ − p₃q₁q₃ + p₃q₂²)ξ⁷ / (q₃³ξ⁹) = g₂ (3.27)


The solution of this system is

    p₁ = b₁/ξ    (3.28)

    p₂ = [b₁³g₁ + b₁²g₀² + b₁b₂g₀g₁ − b₁b₃g₀g₂ + b₁b₃g₁² + b₂²g₀g₂ − b₂²g₁² + b₂g₀³]
         / [(b₁²g₂ + 2b₁g₀g₁ + b₂g₀g₂ − b₂g₁² + g₀³)ξ²]    (3.29)

    p₃ = g₀q₃    (3.30)

    q₁ = [b₁²g₁ − b₁b₂g₂ + b₁g₀² − b₂g₀g₁ − b₃g₀g₂ + b₃g₁²]
         / [(b₁²g₂ + 2b₁g₀g₁ + b₂g₀g₂ − b₂g₁² + g₀³)ξ]    (3.31)

    q₂ = [b₁²g₀ − b₁b₂g₁ − b₁b₃g₂ + b₂²g₂ + b₂g₀² − b₃g₀g₁]
         / [(b₁²g₂ + 2b₁g₀g₁ + b₂g₀g₂ − b₂g₁² + g₀³)ξ²]    (3.32)

    q₃ = [b₁³ + 2b₁b₂g₀ + b₁b₃g₁ − b₂²g₁ + b₃g₀²]
         / [(b₁²g₂ + 2b₁g₀g₁ + b₂g₀g₂ − b₂g₁² + g₀³)ξ³]    (3.33)

3.3.1.2 Option Pricing Using Fast Fourier Transform

After solving the Riccati equation using the rational approximation (3.21), we can compute the characteristic function φ_X(a, t) in equation (3.15). Once the characteristic function is obtained, many methods are available to compute call option prices; see (Carr and Madan, 1999), (Itkin, 2010), (Lewis, 2001) and (Schmelzle, 2010). In this project, we select the Carr and Madan approach, which is a fast option pricing method that uses the fast Fourier transform (FFT). Suppose k = ln K and C(k, T) is the value of a European call option with strike K and maturity T. Unfortunately, the FFT cannot be applied directly to evaluate the call option price, because the call pricing function is not square-integrable. A key contribution of (Carr and Madan, 1999) is the modification of the call pricing function with respect to k. With this modification, a whole range of option prices can be obtained with one inverse Fourier transform.

Consider the modified call price c(k, T):

    c(k, T) = e^(βk) C(k, T),   β > 0    (3.34)

And its Fourier transform is

    ψ_X(u, T) = ∫_{−∞}^{∞} e^(iuk) c(k, T) dk,   u ∈ ℝ>0    (3.35)

which can be expressed as

    ψ_X(u, T) = e^(−rT) φ_X[u − (β + 1)i, T] / (β² + β − u² + i(2β + 1)u)    (3.36)

where r is the risk-free rate and φ_X(·) is the characteristic function defined in equation (3.15).


The purpose of β is to assist the integrability of the modified call pricing function over the negative log-strike axis. A sufficient condition is that ψ_X(0) is finite:

    ψ_X(0, T) < ∞                                                            (3.37)
    ⟹ φ_X[−(β + 1)i, T] < ∞                                                 (3.38)
    ⟹ exp[θκ I¹h(−(β + 1)i, T) + ν(0) I^(1−α)h(−(β + 1)i, T)] < ∞           (3.39)

Using equation (3.39), we can determine the upper bound on β such that condition (3.37) is satisfied. (Carr and Madan, 1999) suggest that one quarter of the upper bound serves as a good choice for β.

The call price C(k, T) can be recovered by applying the inverse Fourier transform:

    C(k, T) = (e^(−βk)/2π) ∫_{−∞}^{∞} ℜ[e^(−iuk) ψ_X(u, T)] du    (3.40)
            = (e^(−βk)/π) ∫₀^∞ ℜ[e^(−iuk) ψ_X(u, T)] du           (3.41)

The semi-infinite integral in the call price equation can be evaluated by numerical methods such as the trapezoidal rule and the FFT, as shown in (Kwok et al., 2012):

    C(k, T) ≈ (e^(−βk)/π) Σ_{j=1}^{N} e^(−iu_j k) ψ_X(u_j, T) Δu    (3.42)

where u_j = (j − 1)Δu, j = 1, 2, ..., N. We end this section with a summary of the steps required to price a call option under the rough Heston model (a code sketch follows the list):

1. Compute the solution h of the fractional Riccati equation (3.16) using equation (3.21).

2. Compute the characteristic function in equation (3.15).

3. Determine a suitable value for β using equation (3.39).

4. Apply the Fourier transform in equation (3.36).

5. Evaluate the call option price by implementing the FFT in equation (3.42).
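A minimal sketch of steps 4 and 5 for a single log-strike; phi(u, T) is assumed to be a vectorised implementation of the rough Heston characteristic function (3.15), beta is assumed to satisfy (3.39), and the grid parameters N and du are arbitrary choices. For a whole grid of strikes, the same sum is evaluated with an FFT:

    # Steps 4-5: damped transform (3.36) and the discretised inversion (3.42).
    import numpy as np

    def call_price(phi, k, T, r, beta, N=4096, du=0.25):
        u = np.arange(N) * du                         # u_j = (j - 1) du
        psi = (np.exp(-r * T) * phi(u - (beta + 1) * 1j, T)
               / (beta ** 2 + beta - u ** 2 + 1j * (2 * beta + 1) * u))  # eq. (3.36)
        terms = np.real(np.exp(-1j * u * k) * psi) * du
        terms[0] *= 0.5                               # trapezoidal correction at u = 0
        return np.exp(-beta * k) / np.pi * np.sum(terms)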

3.3.2 Rough Heston Model Implied Volatility

(Euch et al., 2019) present an almost instantaneous and accurate approximation method to compute rough Heston implied volatilities. This method allows us to approximate the rough Heston implied volatility by simply scaling the volatility of volatility parameter ν and feeding this scaled parameter as input into the classical Heston implied volatility approximation of Section 3.1.2. For a given maturity T, the scaled volatility of volatility parameter ν̃ is

    ν̃ = √(3/(2H + 2)) · ν / (Γ(H + 3/2) T^(1/2−H))    (3.43)
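A direct transcription of (3.43) as a helper function:

    # Equation (3.43): scaled vol-of-vol, to be fed into the classical Heston
    # implied volatility approximation of Section 3.1.2.
    from math import gamma, sqrt

    def scaled_vol_of_vol(nu, H, T):
        return sqrt(3.0 / (2.0 * H + 2.0)) * nu / (gamma(H + 1.5) * T ** (0.5 - H))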


FIGURE 3.2: Implied volatility smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019).


Chapter 4

Artificial Neural Networks

This chapter starts with an introduction to artificial neural networks (ANN). We introduce the terminologies and basic elements of ANN and explain the methods that we used to increase training speed and predictive accuracy. We also discuss the training process for ANN, common pitfalls and their remedies. Finally, we discuss the ANN type of interest and extend the discussion to Deep Neural Networks (DNN).

The concept of ANN has come a long way. It started off with modelling a simple neural network after electrical circuits, which was proposed by Warren McCulloch, a neurologist, and a young mathematician, Walter Pitts, in 1943 (McCulloch et al., 1943). This idea was further strengthened by Donald Hebb (Hebb, 1949).

ANN's ability to learn complex non-linear functional mappings, by deciphering the pattern between independent and dependent variables, has found applications in many fields. With computational resources becoming more accessible and the availability of big data sets, ANN's popularity has grown exponentially in recent years.

4.1 Dissecting the Artificial Neural Networks

FIGURE 4.1: A Simple Neural Network

Before we delve deeper into the world of ANN, let us explain what parameters and hyperparameters mean.


Parameters refer to the variables that are updated throughout the training process; they include weights and biases. Hyperparameters are the configuration variables that define the architecture of the ANN and stay constant throughout the training process. Hyperparameters include the number of neurons, hidden layers, activation functions and the number of epochs, etc.

4.1.1 Neurons

Like a biological neuron, an artificial neuron is the fundamental working unit in an ANN. It is essentially a mathematical function that receives multiple inputs and generates an output. More precisely, it takes the weighted sum of the inputs and applies the activation function to the sum to generate an output. In the case of a simple ANN (as shown in Figure 4.1), this will be the network output. In more sophisticated ANN, the output of a neuron can be the input of other neurons.

4.1.2 Input, Output and Hidden Layers

An input layer is the first layer of an ANN. It receives and sends the training data into the network for further processing. The number of input neurons is equal to the number of input features of the training data.

The output layer is the final layer of the network. It outputs the predictions for a given set of input features. It can have more than one output neuron.

A hidden layer is a layer between the input and output layers. It contains neuron(s) and receives weighted inputs from the input layer. Whilst an ANN contains only one input layer and one output layer, the number of hidden layers can be more than one. One hidden layer can only be used for solving linearly separable problems; for more complex problems, multiple hidden layers are needed.

Systematic experimentation is required to identify the appropriate number of hidden layers to prevent under-fitting and over-fitting, and thus to increase predictive accuracy.

4.1.3 Connections, Weights and Biases

From the input layer to the output layer, each connection is assigned a weight that denotes its relative importance. Random weights are initialised when training begins; as training continues, these weights are updated.

Sometimes a bias (different from the bias term in statistics) is added to the neuron. It is merely a constant input with weight 1. Intuitively speaking, the purpose of a bias is to shift the activation function away from the origin, thereby allowing the neuron to give non-zero output when the inputs are 0.
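Putting Sections 4.1.1-4.1.3 together, a single artificial neuron is just a weighted sum plus a bias passed through an activation function. A minimal sketch (the tanh activation here is an arbitrary choice for illustration):

    # One artificial neuron: activation(w . x + b).
    import numpy as np

    def neuron(x, w, b, activation=np.tanh):
        return activation(np.dot(w, x) + b)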


4.1.4 Forward and Backward Propagation, Loss Functions

Forward propagation is the passing of input data in the forward direction through the network to generate output. Then the difference between the output and the target output is estimated by a loss function, which is a measure of how well an ANN performs with the current set of weights. Typical loss functions for regression problems include the mean squared error (MSE) and the mean absolute error (MAE). They are defined as

Definition 4.1.1.

    MSE = (1/n) Σ_{i=1}^{n} (y_i,predict − y_i,actual)²    (4.1)

Definition 4.1.2.

    MAE = (1/n) Σ_{i=1}^{n} |y_i,predict − y_i,actual|    (4.2)

where y_i,predict is the i-th predicted output and y_i,actual is the i-th actual output. Subsequently, this information is propagated backward from the output layer to the input layer, and the gradients of the loss function with respect to the weights are computed. The algorithm then adjusts the weights accordingly to minimise the loss function. An iteration is one complete cycle of forward and backward propagation.
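Definitions (4.1) and (4.2) transcribe directly to NumPy:

    # Loss functions (4.1) and (4.2) for vectors of predictions and targets.
    import numpy as np

    def mse(y_pred, y_actual):
        return np.mean((y_pred - y_actual) ** 2)

    def mae(y_pred, y_actual):
        return np.mean(np.abs(y_pred - y_actual))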

ANN training is an unconstrained optimisation problem with the loss function as the objective function. The algorithm navigates through the search space to find optimal weights that minimise the loss function (e.g. MSE). It can be expressed as

    min_w (1/n) Σ_{i=1}^{n} (y_i,predict − y_i,actual)²    (4.3)

where w is the vector that contains the weights of all the connections within the ANN.

4.1.5 Activation Functions

An activation function is a mathematical function applied to the output of a neuron to determine whether it should be activated (or fired) or not, based on the relevance of the neuron's inputs to the prediction. This is analogous to the neuronal firing activity in the human brain.

Activation functions can be linear or non-linear. A linear activation function is only suitable when the functional relationship is linear. Non-linear activation functions are used for most applications. In our case, non-linear activation functions are more suitable, as the pricing and implied volatility functional mappings are complex.

Common activation functions include:


Definition 4.1.3. Rectified Linear Unit (ReLU)

    f_ReLU(x) = { 0 if x ≤ 0;  x if x > 0 }  ∈ [0, ∞)    (4.4)

Definition 4.1.4. Exponential Linear Unit (ELU)

    f_ELU(x) = { α(eˣ − 1) if x ≤ 0;  x if x > 0 }  ∈ (−α, ∞)    (4.5)

where α > 0.

The value of α is usually chosen to be 1 for most applications

Definition 4.1.5. Linear or identity

    f_linear(x) = x  ∈ (−∞, ∞)    (4.6)
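Definitions (4.4)-(4.6) in NumPy form:

    # Activation functions (4.4)-(4.6).
    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    def elu(x, alpha=1.0):
        return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

    def linear(x):
        return x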

4.1.6 Optimisation Algorithms and Learning Rate

The learning rate refers to the amount of change applied to the weights in each iteration of the training process. A high learning rate results in fast training speed but carries a risk of divergence, whereas a low learning rate may reduce the training speed.

Common optimisation algorithms for training ANN are Stochastic Gradient Descent (SGD) and its extensions, such as RMSProp and Adam. The SGD pseudocode is the following:

Algorithm 1 Stochastic Gradient Descent

    Initialise a vector of random weights w and learning rate lr
    while Loss > error do
        Randomly shuffle the training data of size n
        for i = 1, 2, ..., n do
            w_new = w_current − lr · ∂Loss/∂w_current
        end for
    end while

SGD is popular because it is simple to implement and can provide good results for most applications. In SGD, the learning rate lr is fixed throughout the process. This is found to be an issue, because the appropriate learning rate is difficult to determine: selecting a high learning rate is prone to divergence, whereas using a low learning rate can result in a slow training process. Hence, improvements have been made to SGD to allow the learning rate to be adaptive. Root Mean Square Propagation (RMSProp) is one of the extensions that allows the learning rate to vary throughout the training process (Hinton et al., 2015). It is a method that divides the learning rate by an exponential moving average of past squared gradients. The author suggests the exponential decay rate b = 0.9, and its pseudocode is presented below:

Algorithm 2: Root Mean Square Propagation (RMSProp)

    Initialise a vector of random weights w, a learning rate lr, v = 0 and b = 0.9
    while Loss > error do
        Randomly shuffle the training data of size n
        for i = 1, 2, ..., n do
            v_current = b * v_previous + (1 - b) * [∂Loss/∂w_current]²
            w_new = w_current - lr / sqrt(v_current + ε) * ∂Loss/∂w_current
        end for
    end while

where ε is a small number to prevent division by zero.

A further improved version, called Adaptive Moment Estimation (Adam), was proposed by (Kingma et al., 2015). RMSProp adapts the learning rate using only the second moment of the gradients (the moving average of their squares); Adam improves on this by also maintaining a moving average of the gradients themselves (the first moment) and using both in the update. An advantage of Adam is that it can handle sparse gradients on noisy data. The pseudocode of Adam is given below.

Algorithm 3: Adaptive Moment Estimation (Adam)

    Initialise a vector of random weights w, a learning rate lr, v = 0, m = 0, b = 0.9 and b1 = 0.999
    while Loss > error do
        Randomly shuffle the training data of size n
        for t = 1, 2, ..., n do
            m_current = b * m_previous + (1 - b) * ∂Loss/∂w_current
            v_current = b1 * v_previous + (1 - b1) * [∂Loss/∂w_current]²
            m̂_current = m_current / (1 - bᵗ)
            v̂_current = v_current / (1 - b1ᵗ)
            w_new = w_current - lr / (sqrt(v̂_current) + ε) * m̂_current
        end for
    end while

where b is the exponential decay rate for the first moment, b1 is the exponential decay rate for the second moment, and ε is a small number to prevent division by zero.
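To make the update rules concrete, the following minimal NumPy sketch implements a single parameter update for each of the three algorithms, following the pseudocode above (grad stands for ∂Loss/∂w; the hyperparameter values are illustrative):

    import numpy as np

    lr, eps = 0.001, 1e-8
    b, b1 = 0.9, 0.999  # decay rates for the first and second moments

    def sgd_step(w, grad):
        # fixed learning rate throughout
        return w - lr * grad

    def rmsprop_step(w, grad, v):
        # divide the learning rate by the RMS of past squared gradients
        v = b * v + (1 - b) * grad ** 2
        return w - lr / np.sqrt(v + eps) * grad, v

    def adam_step(w, grad, m, v, t):
        # first- and second-moment estimates with bias correction at step t
        m = b * m + (1 - b) * grad
        v = b1 * v + (1 - b1) * grad ** 2
        m_hat = m / (1 - b ** t)
        v_hat = v / (1 - b1 ** t)
        return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v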

4.1.7 Epochs, Early Stopping and Batch Size

An epoch is one complete pass of the entire training data set through the ANN. Generally, we want the number of epochs to be as high as possible; however, this may cause the ANN to over-fit. Early stopping is useful here to stop the training before the ANN becomes over-fit: it works by stopping the training process when the loss function shows no improvement. The batch size is the number of input samples processed before the model parameters are updated.
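In Keras, which we use later in this project, early stopping and batch size are specified at training time. A minimal sketch (the patience of 25 epochs matches the setting used in Chapter 6; model, x_train and y_train are assumed to exist):

    from tensorflow.keras.callbacks import EarlyStopping

    # stop training once the validation loss has not improved for 25 epochs
    early_stop = EarlyStopping(monitor="val_loss", patience=25)

    # model.fit(x_train, y_train, validation_split=0.15,
    #           epochs=200, batch_size=20, callbacks=[early_stop])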

4.2 Feedforward Neural Networks

Feedforward Neural Networks are the most common type of ANN in practical applications. They are called feedforward because the data only flows from the input layer to the output layer; there is no feedback loop within the network.

4.2.1 Deep Neural Networks

Deep Neural Networks (DNN) are Feedforward Neural Networks with more than one hidden layer; the depth refers to the number of hidden layers. This is the type of ANN of interest in this project, and the following theorems provide justification.

Theorem 4.2.1 (Hornik et al., 1989: Universal Approximation Theorem). Let NN^a_{d_in, d_out} be the set of neural networks with activation function f_a : R → R, input dimension d_in ∈ N and output dimension d_out ∈ N. Then, if f_a is continuous and non-constant, NN^a_{d_in, d_out} is dense in the Lebesgue space L^p(µ) for all finite measures µ.

Theorem 4.2.2 (Hornik et al., 1990: Universal Approximation Theorem for Derivatives). Let the function to be approximated be F* ∈ C^n, let F : R^{d_0} → R be the mapping of the neural network, and let NN^a_{d_in, 1} be the set of single-layer neural networks with activation function f_a : R → R, input dimension d_in ∈ N and output dimension 1. Then, if the (non-constant) activation function satisfies f_a ∈ C^n(R), NN^a_{d_in, 1} arbitrarily approximates F* and all its derivatives up to order n.

Theorem 4.2.3 (Eldan et al., 2016: The Power of Depth of Neural Networks). There exists a simple function on R^d, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network to more than a certain constant accuracy unless its width is exponential in the dimension.

According to Theorem 4.2.2, it is crucial to have an activation function that is n-differentiable if an approximation of the n-th derivative of the function F* is required. Theorem 4.2.3 motivates the choice of a deep network: the depth of a network can improve its predictive capabilities. However, it is worth noting that networks deeper than 4 hidden layers do not provide much performance gain (Ioffe et al., 2015).

4.3 Common Pitfalls and Remedies

Here we discuss some common pitfalls in ANN training and the corresponding remedies.


4.3.1 Loss Function Stagnation

During training, the loss function may display only infinitesimal improvements after each epoch.

To resolve this problem:

• Increase the batch size. This could potentially fix the issue, as it helps the ANN model learn the relationships between data samples and data labels better before updating the parameters.

• Increase the learning rate. Too small a learning rate can cause the ANN model to become stuck in local minima.

• Use a deeper network. According to the power of depth theorem, deeper networks tend to perform better than shallower ones. However, if the ANN model already has 4 hidden layers, this method may not be suitable, as there is not much performance gain beyond that (see Chapter 4.2.1).

• Use a different activation function for the output layer. Make sure the range of the output activation function is appropriate for the problem.

4.3.2 Loss Function Fluctuation

During training, the loss function may fluctuate noticeably from epoch to epoch rather than declining smoothly. Potential fixes include:

• Increase the batch size. When the batch size is too small, the ANN model might not be able to infer the true relationships between input and output data before each update.

• Lower the initialised learning rate.

4.3.3 Over-fitting

Over-fitting occurs when the ANN model performs well on training data but fails to generalise on unseen test data.

Potential fixes include:

• Reduce the complexity of the ANN model. Limit the number of hidden layers to 4, as (Eldan et al., 2016) shows not much advantage can be achieved beyond 4 hidden layers. Furthermore, experiment with narrower (fewer neurons) hidden layers.

• Introduce early stopping. Reduce the number of epochs if additional epochs show only little improvement. It is important that the stopping criterion is based on the validation loss rather than the training loss.

• Regularisation. Adding a penalty term to the loss function discourages overly complex models and can speed up convergence.


• K-fold Cross-validation. This works by splitting the training data into K equal-size portions: K − 1 portions are used for training and 1 portion is used for validation, repeated K times with a different portion used for validation each time.

4.3.4 Under-fitting

Under-fitting occurs when the ANN model fails to make sense of the relationships between data samples and labels.

Potential fixes include:

• Increase the size of the training data. Complex functional mappings require the ANN model to train on more data before it can learn the relationships well.

• Increase the depth of the model. It is very likely that the model is not deep enough for the problem. As a rule of thumb, limit the number of hidden layers to 3-4 for complex problems.

4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation

For the ANN's application to implied volatility prediction, we apply the image-based implicit method proposed by (Horvath et al., 2019). It is allegedly a powerful and robust method for approximating implied volatility. Rather than using one implied volatility value as the network output, the image-based implicit method uses the entire implied volatility surface as the output. The implied volatility surface is represented by 10 × 10 pixels, with each pixel corresponding to one maturity and one strike price. The input data corresponding to one implied volatility surface output comprises all the model parameters except the strike price and maturity, which are implicitly learned by the network. This method offers the following benefits (a small sketch of the output-to-surface mapping is given after the list):

• More information is learned by the network with less input data, as the information of neighbouring volatility points is incorporated in the training process.

• The network learns the mapping from model parameters to the implied volatility surface directly. The volatility smile and term structure are available simultaneously, so it is more efficient than having only one single implied volatility value as output.

• The neighbouring volatility points can be numerically approximated almost instantly if needed.
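A small Python sketch (with hypothetical variable names) of how the 100 network outputs map onto the 10 × 10 surface grid; the strike and maturity grids are the ones listed later in Chapter 5, and the row/column orientation is an assumption for illustration:

    import numpy as np

    strikes = np.array([0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4])
    maturities = np.array([0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0])

    # one row of ANN output: 100 implied volatility pixels (random placeholder)
    y_pred = np.random.rand(100)

    # pixel (i, j) holds the implied volatility at maturity i and strike j
    surface = y_pred.reshape(len(maturities), len(strikes))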


[Figure: the 100 neural network outputs O1-O100 arranged as a 10 × 10 grid representing the implied volatility surface.]

FIGURE 4.2: The pixels of the implied volatility surface are represented by the outputs of the neural network


Chapter 5

Data

In this chapter we discuss the data for our applications. We explain how the data is generated and describe some essential data pre-processing procedures.

Simulated data is chosen to train the ANN model mainly for regulatory reasons. One might argue that training the ANN directly with market data would yield better predictions; however, evidence for the stability and robustness of that approach is still lacking, and there is no straightforward interpretation between the inputs and outputs of an ANN model trained in this way. The main advantage of training the ANN with simulated data is that the functional relationship is known, and so the model is tractable. This property is important, as tractable models tend to be more transparent, which is required by regulators.

For neural network computation, Python is the logical language choice, as it has many powerful machine learning libraries such as Keras. To obtain good results, we require 5,000,000 rows of data for the ANN experiment; hence we choose C++ instead of Python for the data generation process, as it is computationally faster. We also use pybind11, a lightweight header-only library that can harmoniously integrate C++ and Python in one project (see Appendix A for a tutorial on pybind11).

5.1 Data Generation and Storage

The data generation process is done entirely in C++, mainly because of its execution speed. Initially we considered applying GPU parallel computing to speed up the process even further; after consulting with the supervisor, we decided to use the std::future class template instead. std::future belongs to the standard C++11 library and is very simple to implement compared to parallel computing. It allows the C++ program to run multiple functions asynchronously on different threads (see page 942 in (Duffy, 2018) for an example). Our machine contains 2 physical cores with 2 threads per physical core. This allows us to run 4 operations asynchronously without overhead costs and therefore, in theory, improves the data generation speed four-fold. Note that asynchronous programming is not meant to replace parallel computing completely; in cases where execution speed is absolutely critical, parallel computing should be used. Without asynchronous programming, the data generation process for 5,000,000 rows of Heston option price data would take more than 8 hours; now it takes less than 2 hours to complete.

We follow the steps outlined in Appendix A to wrap the C++ program using pybind11. The wrapped C++ function can then be called in Python just like any other Python library. pybind11 is conceptually very similar to the Boost library; we choose pybind11 because it is a lightweight header-only library that is easy to use. This enables us to enjoy both the execution speed of C++ and access to the machine learning libraries of Python at the same time.

After the data is generated, it is stored in MySQL so that we do not need to re-generate it each time we restart the computer or when the Python IDE crashes unexpectedly. We use the mysql-connector Python library to communicate with MySQL from Python.
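A minimal sketch of this storage workflow (credentials, database and table names are hypothetical), using pandas with a SQLAlchemy engine backed by mysql-connector:

    import pandas as pd
    from sqlalchemy import create_engine

    # hypothetical connection string and table name
    engine = create_engine(
        "mysql+mysqlconnector://user:password@localhost/option_data")

    # one illustrative row of generated Heston data (values made up)
    columns = ["S", "K", "r", "D", "nu", "T", "theta", "kappa", "xi", "rho", "C"]
    df = pd.DataFrame([[50.0, 45.0, 0.03, 0.0, 0.3, 1.0,
                        0.4, 0.5, 0.3, -0.5, 7.21]], columns=columns)

    # persist the data, then reload it later without re-generating
    df.to_sql("heston_call_prices", con=engine, if_exists="append", index=False)
    df = pd.read_sql("SELECT * FROM heston_call_prices", con=engine)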

5.1.1 Heston Call Option Data

We follow the method and theory described in Chapter 3 to compute the call option price under the Heston model. We only consider call options in this project. To train an ANN that can also predict put option prices, one can simply add an extra input feature that takes only two values, 1 or -1, with 1 representing a call option and -1 a put option.

For this application we generated 5,000,000 rows of data, uniformly distributed over a specified range. In this context, one row of data contains all the input features and the corresponding output. We use 10 model parameters as the input features: spot price (S), strike price (K), risk-free rate (r), dividend yield (D), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ). In this case there is only one output, the call option price. See Table 5.1 for their respective ranges.

TABLE 5.1: Heston Model Call Option Data

            Model Parameters                      Range
    Input   Spot Price (S)                        ∈ [20, 80]
            Strike Price (K)                      ∈ [40, 55]
            Risk-free Rate (r)                    ∈ [0.03, 0.045]
            Dividend Yield (D)                    ∈ [0, 0.1]
            Instantaneous Volatility (ν)          ∈ [0.2, 0.6]
            Maturity (T)                          ∈ [0.03, 2.0]
            Long-term Volatility (θ)              ∈ [0.2, 1.0]
            Mean Reversion Rate (κ)               ∈ [0.1, 1.0]
            Volatility of Volatility (ξ)          ∈ [0.2, 0.5]
            Correlation (ρ)                       ∈ [−0.95, 0.95]
    Output  Call Option Price (C)                 ∈ R_{>0}


5.1.2 Heston Implied Volatility Data

We generate the Heston implied volatility data with the theory explained in Chapter 3. For the ANN application in implied volatility approximation, we decided to use the image-based implicit method, in which the ANN output layer dimension is 100, with each output representing one pixel on the implied volatility surface.

Using this method, we need 100 times more data in order to have the same number of distinct rows of data as the normal point-wise method (e.g. Heston call option pricing) does. However, due to hardware constraints, we decided to generate 10,000,000 rows of uniformly distributed data. That translates to only 100,000 distinct rows of data.

We use 6 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ).

Note that strike price and maturity are not used as input features, as mentioned in Chapter 4.4. This is because the shape of one full implied volatility surface depends on the 6 model parameters mentioned above; the strike price and maturity correspond to one specific point on the full implied volatility surface. Nevertheless, we still require these two parameters to generate implied volatility data. We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

In this case there are 10 × 10 = 100 outputs, i.e. the implied volatility surface. See Table 5.2 for their respective ranges.

TABLE 5.2: Heston Model Implied Volatility Data

            Model Parameters                      Range
    Input   Spot Price (S)                        ∈ [0.2, 2.5]
            Instantaneous Volatility (ν)          ∈ [0.2, 0.6]
            Long-term Volatility (θ)              ∈ [0.2, 1.0]
            Mean Reversion Rate (κ)               ∈ [0.1, 1.0]
            Volatility of Volatility (ξ)          ∈ [0.2, 0.5]
            Correlation (ρ)                       ∈ [−0.95, 0.95]
    Output  Implied Volatility Surface (σ_imp)    ∈ R^{10×10}_{>0}

5.1.3 Rough Heston Call Option Data

Again, we follow the theory in Chapter 3 and implement it in C++ to generate call option prices under the rough Heston model.

Similarly, we only consider call options here and generate 5,000,000 rows of data. We use 9 model parameters as the input features: strike price (K), risk-free rate (r), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). In this case there is only one output, the call option price. See Table 5.3 for their respective ranges.

TABLE 5.3: Rough Heston Model Call Option Data

            Model Parameters                      Range
    Input   Strike Price (K)                      ∈ [0.5, 5]
            Risk-free Rate (r)                    ∈ [0.2, 0.5]
            Instantaneous Volatility (ν)          ∈ [0, 1.0]
            Maturity (T)                          ∈ [0.1, 3.0]
            Long-term Volatility (θ)              ∈ [0.01, 0.1]
            Mean Reversion Rate (κ)               ∈ [1.0, 4.0]
            Volatility of Volatility (ξ)          ∈ [0.01, 0.5]
            Correlation (ρ)                       ∈ [−0.95, 0.95]
            Hurst Parameter (H)                   ∈ [0.05, 0.1]
    Output  Call Option Price (C)                 ∈ R_{>0}

5.1.4 Rough Heston Implied Volatility Data

Finally, we generate the rough Heston implied volatility data with the theory explained in Chapter 3. As before, we use the image-based implicit method.

We use 10,000,000 rows of data, which translates to 100,000 distinct rows of data. We use 7 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

As before, there are 10 × 10 = 100 outputs, i.e. the rough Heston implied volatility surface. See Table 5.4 for their respective ranges.

TABLE 5.4: Rough Heston Model Implied Volatility Data

            Model Parameters                      Range
    Input   Spot Price (S)                        ∈ [0.2, 2.5]
            Instantaneous Volatility (ν)          ∈ [0.2, 0.6]
            Long-term Volatility (θ)              ∈ [0.2, 1.0]
            Mean Reversion Rate (κ)               ∈ [0.1, 1.0]
            Volatility of Volatility (ξ)          ∈ [0.2, 0.5]
            Correlation (ρ)                       ∈ [−0.95, 0.95]
            Hurst Parameter (H)                   ∈ [0.05, 0.1]
    Output  Implied Volatility Surface (σ_imp)    ∈ R^{10×10}_{>0}


5.2 Data Pre-processing

In this section we discuss some necessary data pre-processing procedures: data splitting for validation and evaluation (testing), data standardisation and normalisation, and data partitioning for K-fold cross validation.

5.2.1 Data Splitting: Train - Validate - Test

Generally, not 100% of the available data is used for training; we reserve some of the data for validation and for performance evaluation.

We split the data into three sets: 70% (train), 15% (validate) and 15% (test). The first 70% of the data is used for training, whilst 15% of the data is used for validation during training. The last 15% is for evaluating the trained ANN's predictive accuracy.

The validation data set is used for tuning the hyperparameters of the ANN; this is important to help prevent over-fitting of the network. The test data set is not used during training: it is the unseen (out-of-sample) data used for evaluating the trained ANN's predictions.
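A minimal sketch of the 70/15/15 split using scikit-learn (illustrative arrays; a fixed random_state could be added for reproducibility):

    import numpy as np
    from sklearn.model_selection import train_test_split

    x, y = np.random.rand(1000, 10), np.random.rand(1000)  # illustrative data

    # carve off 30%, then split that portion half-and-half: 70/15/15
    x_train, x_rest, y_train, y_rest = train_test_split(x, y, test_size=0.30)
    x_val, x_test, y_val, y_test = train_test_split(x_rest, y_rest, test_size=0.50)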

5.2.2 Data Standardisation and Normalisation

Before the data is fed into the ANN for training, it is essential to scale the data to speed up convergence and improve performance.

The magnitudes of the data features differ, so it is essential to re-scale the data to ensure the ANN model does not misinterpret their relative importance based on magnitude. Typical scaling techniques include standardisation and normalisation.

Data standardisation is defined as

    x_{\mathrm{scaled}} = \frac{x - x_{\mathrm{mean}}}{x_{\mathrm{std}}} \qquad (5.1)

where

• x is the data,

• x_mean is the mean of that data feature,

• x_std is the standard deviation of that data feature.

(Horvath et al., 2019) claims that this works especially well for quantities that are positively unbounded, such as implied volatility.

For the other model parameters we apply data normalisation, which is defined as

    x_{\mathrm{scaled}} = \frac{2x - (x_{\mathrm{max}} + x_{\mathrm{min}})}{x_{\mathrm{max}} - x_{\mathrm{min}}} \in [-1, 1] \qquad (5.2)

where

• x is the data,

• x_max is the maximum value of that data feature,

• x_min is the minimum value of that data feature.

After data scaling (standardisation or normalisation), the ANN is able to learn the true relationships between inputs and outputs without bias from the differences in their magnitudes.
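A minimal NumPy sketch of the two scalings defined in equations (5.1) and (5.2); in practice the statistics are computed on the training set only and then re-used to scale the validation and test sets:

    import numpy as np

    x_train = np.random.rand(1000, 6)  # illustrative feature matrix

    # standardisation, eq. (5.1)
    x_mean, x_std = x_train.mean(axis=0), x_train.std(axis=0)
    x_standardised = (x_train - x_mean) / x_std

    # normalisation to [-1, 1], eq. (5.2)
    x_min, x_max = x_train.min(axis=0), x_train.max(axis=0)
    x_normalised = (2 * x_train - (x_max + x_min)) / (x_max - x_min)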

5.2.3 Data Partitioning

In this section we explain how the data is partitioned for K-fold cross validation.

First, we shuffle the entire training data set. Then the data set is divided into K equal-sized portions. Each portion is used once as the test data set while the ANN is trained on the other K − 1 portions; this is repeated K times, with each of the K portions used exactly once as the test data set. The average of the K results is the performance measure of the model.
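A minimal sketch of this partitioning using scikit-learn's KFold (the train/evaluate helpers are hypothetical placeholders):

    import numpy as np
    from sklearn.model_selection import KFold

    x, y = np.random.rand(1000, 6), np.random.rand(1000)  # illustrative data

    kf = KFold(n_splits=5, shuffle=True)
    scores = []
    for train_idx, test_idx in kf.split(x):
        x_train, x_test = x[train_idx], x[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]
        # model = build_and_train(x_train, y_train)      # hypothetical helper
        # scores.append(model.evaluate(x_test, y_test))  # one result per fold

    # the average of the K results is the performance measure of the model
    # print(np.mean(scores))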

FIGURE 5.1: How the data is partitioned for 5-fold cross validation. Adapted from (Chapter 2 Modeling Process: k-fold cross validation).


Chapter 6

Neural Networks Architecture and Experimental Setup

We start this chapter by discussing the appropriate ANN architecture and experimental setup for our applications. We apply ANNs to four different applications: option pricing under the Heston model, Heston implied volatility surface approximation, option pricing under the rough Heston model, and rough Heston implied volatility surface approximation. We first apply the ANN to the Heston model as a concept validation and then proceed to the rough Heston model.

Subsequently, we discuss the relevant evaluation metrics and validation methods for our ANN models. Finally, we end this chapter with the implementation steps for an ANN in Keras.

6.1 Neural Networks Architecture

Configuring an ANN's architecture is a combination of art and science: it requires experience and systematic experimentation to decide the right parameters and architecture for a specific problem, and for a particular problem there might be multiple appropriate designs. To begin with, we use the network architecture of (Horvath et al., 2019) and adjust the hyperparameters where it performs poorly for our applications.


[Figure: fully connected feedforward network with input neurons I1-I10, three hidden layers of neurons H1-H30 each, and a single output neuron O1.]

FIGURE 6.1: Typical ANN Architecture for Option Pricing Application


[Figure: fully connected feedforward network with input neurons I1-I10, three hidden layers of neurons H1-H30 each, and output neurons O1-O100.]

FIGURE 6.2: Typical ANN Architecture for Implied Volatility Surface Approximation

6.1.1 ANN Architecture: Option Pricing under the Heston Model

An ANN with 3 hidden layers is appropriate for this application. Each hidden layer has 50 neurons, with the exponential linear unit (ELU) as the activation function.

Initially, the rectified linear activation function (ReLU) was considered. It is the choice for many applications, as it can overcome the vanishing gradient problem whilst being less computationally expensive than other activation functions. However, it has some major drawbacks:

• The n-th derivative of ReLU is not continuous for all n > 0.

• ReLU cannot produce negative outputs.

• For activation x < 0, ReLU has zero gradient.

From Theorem 4.2.2, we know that choosing an n-differentiable activation function is necessary for computing the n-th derivatives of the functional mapping. This is crucial for our application because we would be interested in computing option Greeks with the same ANN if it shows promising results for option pricing. For risk management purposes, selecting an activation function from which option Greeks can be calculated is essential.

The obvious alternative is the ELU (defined in Chapter 4.1.4). For x < 0 it has a smooth output that approaches −α, and for x > 0 its output is positively unbounded, which suits our application since option prices and implied volatilities are also, in theory, positively unbounded.

Since our problem is a regression problem, a linear activation function (defined in Chapter 4.1.5) is appropriate for the output layer, as its output is unbounded. We find that a batch size of 200 and 30 epochs yield good accuracy.

To summarise, the ANN model for Heston option pricing is as follows (a minimal Keras sketch is given after the list):

• Input Layer: 10 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function
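A minimal Keras sketch of this architecture (an illustrative reconstruction under the settings above, not the exact project code; the loss, metrics and optimiser follow Chapters 6.2 and 6.3):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    model = Sequential([
        Dense(50, activation="elu", input_shape=(10,)),  # hidden layer 1
        Dense(50, activation="elu"),                     # hidden layer 2
        Dense(50, activation="elu"),                     # hidden layer 3
        Dense(1, activation="linear"),                   # output layer
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    model.summary()

    # model.fit(x_train, y_train, validation_split=0.15,
    #           epochs=30, batch_size=200)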

6.1.2 ANN Architecture: Implied Volatility Surface of the Heston Model

For approximating the full implied volatility surface of the Heston model, we apply the image-based implicit method (see Chapter 4.4). This implies that the output dimension is 100.

We use 3 hidden layers with 80 neurons each. The activation function for the hidden layers is the exponential linear unit (ELU); as before, we use a linear activation function for the output layer. We find that a batch size of 20 and 200 epochs yield good accuracy. Since the number of epochs is relatively high, we introduce an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

To summarise, the ANN model for Heston implied volatility surface approximation is:

• Input Layer: 6 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function


6.1.3 ANN Architecture: Option Pricing under the Rough Heston Model

For option pricing under the rough Heston model, the ANN model is:

• Input Layer: 9 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function

As before, we use a batch size of 200 and 30 epochs.

6.1.4 ANN Architecture: Implied Volatility Surface of the Rough Heston Model

The ANN model for rough Heston implied volatility surface approximation is:

• Input Layer: 7 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function

We use a batch size of 20 and 200 epochs, with an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

6.2 Performance Metrics and Validation Methods

We require some metrics to evaluate the performance of the ANN in terms of accuracy and speed.

For accuracy, we introduced the MSE and MAE in Chapter 4.1.4. It is also common to use the coefficient of determination (R²) to quantify the performance of a regression model: a statistical measure of the proportion of the variance in the dependent variable that is explained by the independent variables. The formula is

    R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_{i,\mathrm{actual}} - y_{i,\mathrm{predict}} \right)^2}{\sum_{i=1}^{n} \left( y_{i,\mathrm{actual}} - \bar{y}_{\mathrm{actual}} \right)^2} \qquad (6.1)


We also suggest a user-defined accuracy metric, the maximum absolute error, defined as

    \text{Max Abs Error} = \max_{i} \left| y_{i,\mathrm{actual}} - y_{i,\mathrm{predict}} \right| \qquad (6.2)

This metric provides a measure of how large the error could be in the worst-case scenario, which is especially important from a risk management perspective.

As for the run-time performance of the ANN, we simply use the Python built-in timer function to quantify this property.
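A minimal sketch of these metrics and the timing measurement in Python (illustrative arrays; the model call is a hypothetical placeholder):

    import time
    import numpy as np

    y_actual = np.random.rand(1000)
    y_predict = y_actual + 0.01 * np.random.randn(1000)  # illustrative predictions

    # coefficient of determination, eq. (6.1)
    ss_res = np.sum((y_actual - y_predict) ** 2)
    ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot

    # maximum absolute error, eq. (6.2)
    max_abs_error = np.max(np.abs(y_actual - y_predict))

    # run-time measurement with the built-in timer
    start = time.time()
    # y_predict = model.predict(x_test)  # hypothetical trained model
    elapsed = time.time() - start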

In Chapter 5.2.3 we explained the data partitioning process. The motivation for partitioning the data is K-fold cross validation. Cross validation is a statistical technique to assess how well a predictive model generalises to an independent data set, and it is a good tool for identifying over-fitting.

As explained earlier, the process involves training and testing different permutations of the partitioned data sets multiple times. The indication of a well-generalised model is that the error magnitudes in each round are consistent with each other and with their average. We select K = 5 for our applications.


6.3 Implementation of ANN in Keras

In this section we outline the standard ANN computation process using Keras.

1. Construct the ANN model by specifying the number of input neurons, the number of hidden neurons, the number of hidden layers, the number of output neurons and the activation functions for each layer. Print the model summary to check for errors.

2. Configure the settings of the model using the compile function. We select the MSE as the loss function and specify MAE, R² and our user-defined maximum absolute error as the metrics. For all of our applications we choose Adam as the optimisation algorithm. To prevent over-fitting, we use the early stopping function, especially when the number of epochs is high. It is important to note that the validation loss (not the training loss) should be used as the early stopping criterion; this means the training will stop early if there is no reduction in validation loss.

3. Start training the ANN model. It is important to specify the validation split argument here for splitting the training data into a train set and a validation set. Note that the test set should not be used as the validation set; it should be reserved for evaluating the ANN's predictions after training.

4. When the training is complete, generate the plot of MSE (loss function) versus number of epochs to visualise the training process. The MSE should decline gradually and then level off as the number of epochs increases. See Chapter 4.3 for potential training issues and remedies.

5. Apply 5-fold cross validation to check for over-fitting. Evaluate the trained model using the evaluate function.

6. Generate predictions. Visualise the predictions by plotting predicted values versus actual values on the same graph.

See Appendix C for a Python code example.


6.4 Summary of Neural Networks Architecture

TABLE 6.1: Summary of Neural Networks Architecture for Each Application

ANN Model                             Application                                Input Layer  Hidden Layers          Output Layer  Batch Size  Epochs
Heston Option Pricing ANN             Heston Call Option Pricing                 10 neurons   3 layers × 50 neurons  1 neuron      200         30
Heston Implied Volatility ANN         Heston Implied Volatility Surface          6 neurons    3 layers × 80 neurons  100 neurons   20          200
Rough Heston Option Pricing ANN       Rough Heston Call Option Pricing           9 neurons    3 layers × 50 neurons  1 neuron      200         30
Rough Heston Implied Volatility ANN   Rough Heston Implied Volatility Surface    7 neurons    3 layers × 80 neurons  100 neurons   20          200


Chapter 7

Results and Discussions

In this chapter we present our results and discuss our findings.

7.1 Heston Option Pricing ANN Results

Figure 7.1 shows the learning curves of the Heston option pricing ANN; the loss function used here is the MSE. As the ANN is trained, both the training and validation loss curves decline until they converge. It took fewer than 30 epochs for the ANN to reach a satisfactory level of accuracy, and the small gap between the training and validation loss curves indicates the ANN model is well trained.

FIGURE 7.1: Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN

Table 7.1 summarises the error metrics of the Heston option pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 2.00 × 10⁻⁶, 9.58 × 10⁻⁴, 6.50 × 10⁻³ and 0.999 respectively. The low error metrics and high R² value imply that the ANN model generalises well and gives highly accurate predictions for option prices under the Heston model. Figure 7.2 attests to its high accuracy: almost all of the predictions lie on the perfect-predictions line (red line).

TABLE 7.1: Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error  R²
2.00 × 10⁻⁶   9.58 × 10⁻⁴   6.50 × 10⁻³    0.999

FIGURE 7.2: Plot of Predicted vs Actual values for Heston Option Pricing ANN

TABLE 7.2: 5-fold Cross Validation Results of Heston Option Pricing ANN

          MSE           MAE           Max Abs Error
Fold 1    5.78 × 10⁻⁷   5.45 × 10⁻⁴   2.04 × 10⁻³
Fold 2    9.90 × 10⁻⁷   7.69 × 10⁻⁴   2.40 × 10⁻³
Fold 3    7.15 × 10⁻⁷   6.21 × 10⁻⁴   2.20 × 10⁻³
Fold 4    1.24 × 10⁻⁶   8.67 × 10⁻⁴   2.53 × 10⁻³
Fold 5    6.54 × 10⁻⁷   5.79 × 10⁻⁴   2.21 × 10⁻³
Average   8.36 × 10⁻⁷   6.76 × 10⁻⁴   2.27 × 10⁻³

Table 7.2 shows the results from 5-fold cross validation, which again confirm that the ANN model gives accurate predictions and generalises well to unseen data. The values of each fold are consistent with each other and with their average values.


7.2 Heston Implied Volatility ANN Results

Figure 7.3 shows the learning curves of the Heston implied volatility ANN; again, we use the MSE as the loss function. The validation loss curve experiences some fluctuations during training. We experimented with various ANN setups and this is the least fluctuating learning curve we could achieve. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level and the gap between validation loss and training loss is negligible. Due to the comparatively more complex ANN architecture, it took about 200 epochs for the ANN to reach a satisfactory level of accuracy.

FIGURE 7.3: Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN

Table 7.3 summarises the error metrics of the Heston implied volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 6.24 × 10⁻⁴, 1.27 × 10⁻², 3.71 × 10⁻¹ and 0.999 respectively. The low error metrics and high R² value imply that the ANN model generalises well and gives accurate predictions for the implied volatility surface under the Heston model. Figure 7.4 shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.3: Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error  R²
6.24 × 10⁻⁴   1.27 × 10⁻²   3.71 × 10⁻¹    0.999

FIGURE 7.4: Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.5: Mean Absolute Error of the Full Heston Implied Volatility Surface: ANN Predictions on Unseen Data

Figure 7.5 shows the MAE values of the Heston implied volatility ANN predictions across the entire implied volatility surface. The largest MAE value is 0.0045, and a large area of the surface has MAE values lower than 0.0030. It is worth noting that the accuracy is relatively poor when the strike price is at the extremes and the maturity is small. However, the ANN gives good predictions overall for the whole surface.

The 5-fold cross validation results for the Heston implied volatility ANN model show consistent values; they are summarised in Table 7.4. The consistent values indicate the ANN model is well trained and generalises to unseen test data.

TABLE 7.4: 5-fold Cross Validation Results of Heston Implied Volatility ANN

          MSE           MAE           Max Abs Error
Fold 1    8.15 × 10⁻⁴   1.13 × 10⁻²   3.88 × 10⁻¹
Fold 2    4.74 × 10⁻⁴   1.08 × 10⁻²   3.13 × 10⁻¹
Fold 3    1.37 × 10⁻³   1.74 × 10⁻²   4.46 × 10⁻¹
Fold 4    3.38 × 10⁻⁴   8.61 × 10⁻³   2.19 × 10⁻¹
Fold 5    4.60 × 10⁻⁴   1.02 × 10⁻²   2.67 × 10⁻¹
Average   6.92 × 10⁻⁴   1.12 × 10⁻²   3.27 × 10⁻¹

7.3 Rough Heston Option Pricing ANN Results

Figure 7.6 shows the learning curves of the rough Heston option pricing ANN; as before, MSE is used as the loss function. As training progresses, both the training and validation loss curves decline until they converge. Since this is a relatively simple model, it took fewer than 30 epochs of training to reach a good level of accuracy, and the small discrepancy between the training and validation losses indicates the ANN model is well trained.

FIGURE 7.6: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN

Table 7.5 summarises the error metrics of the rough Heston option pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 9.00 × 10⁻⁶, 2.28 × 10⁻³, 1.76 × 10⁻¹ and 0.999 respectively. These metrics imply that the ANN model generalises well and gives highly accurate predictions for option prices under the rough Heston model. Figure 7.7 attests to its high accuracy again: most of the predictions lie on the perfect-predictions line (red line).

TABLE 7.5: Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error  R²
9.00 × 10⁻⁶   2.28 × 10⁻³   1.76 × 10⁻¹    0.999


FIGURE 7.7: Plot of Predicted vs Actual values for Rough Heston Option Pricing ANN

TABLE 7.6: 5-fold Cross Validation Results of Rough Heston Option Pricing ANN

          MSE           MAE           Max Abs Error
Fold 1    3.79 × 10⁻⁶   1.35 × 10⁻³   5.99 × 10⁻³
Fold 2    2.33 × 10⁻⁶   9.96 × 10⁻⁴   4.89 × 10⁻³
Fold 3    2.06 × 10⁻⁶   9.88 × 10⁻⁴   4.41 × 10⁻³
Fold 4    2.16 × 10⁻⁶   1.08 × 10⁻³   4.07 × 10⁻³
Fold 5    1.84 × 10⁻⁶   1.01 × 10⁻³   3.74 × 10⁻³
Average   2.44 × 10⁻⁶   1.08 × 10⁻³   4.62 × 10⁻³

The highly consistent values from 5-fold cross validation again confirm that the ANN model generalises to unseen data.

7.4 Rough Heston Implied Volatility ANN Results

Figure 7.8 shows the learning curves of the rough Heston implied volatility ANN. The validation loss curve experiences some fluctuations during training; again, we select the least fluctuating setup. Despite the fluctuations, the overall validation loss declines to a satisfactorily low level and the discrepancy between validation and training loss is negligible. The training was initially set to 200 epochs but stopped early at around 85, since the validation loss had not improved for 25 epochs. It achieves good accuracy nonetheless.


FIGURE 7.8: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN

Table 7.7 summarises the error metrics of the rough Heston implied volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 5.66 × 10⁻⁴, 1.08 × 10⁻², 3.49 × 10⁻¹ and 0.999 respectively. These metrics show that the ANN model produces accurate predictions on unseen data. This is again confirmed by Figure 7.9, which shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.7: Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error  R²
5.66 × 10⁻⁴   1.08 × 10⁻²   3.49 × 10⁻¹    0.999

FIGURE 7.9: Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.10: Mean Absolute Error of the Full Rough Heston Implied Volatility Surface: ANN Predictions on Unseen Data

Figure 7.10 shows the MAE values of the rough Heston implied volatility ANN predictions across the entire implied volatility surface. The largest MAE value is below 0.0030, and a large area of the surface has MAE values lower than 0.0015. It is worth noting that the accuracy is relatively poor when the strike price and maturity values are small. However, the ANN gives good predictions overall for the whole surface. In fact, it is more accurate than the Heston implied volatility ANN model despite having one extra input and a similar architecture; we attribute this entirely to the stochastic nature of the optimiser.

The 5-fold cross validation results for the rough Heston implied volatility ANN model show consistent values, just like the previous three cases. The results are summarised in Table 7.8. The consistency in values indicates the ANN model is well trained and generalises to unseen test data.

TABLE 7.8: 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN

          MSE           MAE           Max Abs Error
Fold 1    8.15 × 10⁻⁴   1.13 × 10⁻²   3.88 × 10⁻¹
Fold 2    4.74 × 10⁻⁴   1.08 × 10⁻²   3.13 × 10⁻¹
Fold 3    1.37 × 10⁻³   1.74 × 10⁻²   4.46 × 10⁻¹
Fold 4    3.38 × 10⁻⁴   8.61 × 10⁻³   2.19 × 10⁻¹
Fold 5    4.60 × 10⁻⁴   1.02 × 10⁻²   2.67 × 10⁻¹
Average   6.92 × 10⁻⁴   1.12 × 10⁻²   3.27 × 10⁻¹


7.5 Run-time Performance

In this section we present the training time and the execution speed of the ANN versus the traditional methods. The speed tests are run on a machine with an Intel i5-7200U chip (up to 3.10 GHz) and 8 GB of RAM.

Table 7.9 summarises the training time required for each application. Although the training time per epoch for the option pricing applications is higher than that for the implied volatility applications, the resulting total training time is similar for both types of application, because the number of epochs for training the implied volatility ANN models is higher.

We also emphasise that the number of distinct rows of data available for implied volatility ANN training is smaller; thus the training time is similar to that of option pricing ANN training despite the more complex ANN architecture.

TABLE 7.9: Training Time

Application                   Training Time per Epoch   Total Training Time
Heston Option Pricing         30 s                      900 s
Heston Implied Volatility     5 s                       1000 s
rHeston Option Pricing        30 s                      900 s
rHeston Implied Volatility    5 s                       1000 s

TABLE 7.10: Execution Time for Predictions using ANN and Traditional Methods

Application                   Traditional Methods   ANN
Heston Option Pricing         8.84 ms               57.9 ms
Heston Implied Volatility     2.31 ms               43.7 ms
rHeston Option Pricing        6.52 ms               52.6 ms
rHeston Implied Volatility    2.87 ms               48.3 ms

Table 7.10 summarises the execution time for generating predictions using the ANN and the traditional methods. Before the experiment, we expected the ANN to generate predictions faster. However, the results show that the ANN is actually up to 7 times slower than the traditional methods for option pricing (both Heston and rough Heston) and up to 18 times slower for implied volatility surface approximation (both Heston and rough Heston). This shows that the traditional approximation methods are more efficient.


Chapter 8

Conclusions and Outlook

To summarise, our results show that an ANN can approximate option prices and implied volatilities under the Heston and rough Heston models to a high degree of accuracy. However, the efficiency of the ANN is not as good as that of the traditional methods, so we conclude that such an approach may be considered an ineffective alternative. There are more efficient methods, such as numerical integration of the FFT, for option price computation, and the traditional method we used for approximating implied volatility does not even require any numerical scheme, hence its efficiency.

As for future projects, it would be interesting to investigate applications of the rough Bergomi model to exotic options such as lookback and digital options. Moreover, it would also be worthwhile to apply gradient-free optimisers to ANN training; interesting examples include differential evolution, dual annealing and simplicial homology global optimisers.


Appendix A

pybind11: A step-by-step tutorial with examples

This tutorial assumes basic knowledge of C++ and Python.

Sources: Create a C++ extension for Python; Using the C++ eigen library to calculate matrix inverse and determinant; pybind11 Documentation; Eigen: The Matrix Class.

A.1 Software Prerequisites

• Visual Studio 2017 or later, with both the Desktop Development with C++ and Python Development workloads installed with default options.

• In the Python Development workload, also select the box on the right for Python native development tools. This option sets up most of the configuration required. (This option also includes the C++ workload automatically.)

Source: Create a C++ extension for Python


A.2 Create a Python Project

1. To create a Python project in Visual Studio, select File > New > Project. Search for Python, select the Python Application template, give it a suitable name and location, and select OK.

2. pybind11 requires that you use a 32-bit Python interpreter (Python 3.6 or above recommended). In the Solution Explorer window of Visual Studio, expand the project node, then expand the Python Environments node. If you don't see a 32-bit environment as the default (either in bold or labeled with "global default"), then right-click the Python Environments node and select Add Environment, or select Add Environment from the environment drop-down in the Python toolbar.

Once in the Add Environment dialog box, select the Existing environment tab, then select the 32-bit interpreter from the Environment drop-down list.

If you don't have a 32-bit interpreter installed, you can install standard Python interpreters from the Add Environment dialog: select the Add Environment command in the Python Environments window or the Python toolbar, select the Python installation tab, indicate which interpreters to install, and select Install.

If you have already added an environment other than the global default to a project, you may need to activate a newly added environment: right-click that environment under the Python Environments node and select Activate Environment. To remove an environment from the project, select Remove.


A.3 Create a C++ Project

1. A Visual Studio solution can contain both Python and C++ projects together. To do so, right-click the solution in Solution Explorer and select Add > New Project.

2. Search on "C++", select Empty project, specify the name "test", and select OK.

3. Create a C++ file in the new project by right-clicking the Source Files node, then select Add > New Item, select C++ File, name it main.cpp, and select OK.

4. Right-click the C++ project in Solution Explorer and select Properties.

5. At the top of the Property Pages dialog that appears, set Configuration to All Configurations and Platform to Win32.

6. Set the specific properties as described in the following table, then select OK.

7. Right-click the C++ project and select Build to test your configurations (both Debug and Release). The .pyd files are located in the solution folder under Debug and Release, not the C++ project folder itself.

8. Add the following code to the C++ project's main.cpp file:

    #include <iostream>

    int main() {
        std::cout << "Hello World from C++" << std::endl;
        return 0;
    }

9. Build the C++ project again to confirm the code is working.

A4 Convert the C++ project to extension for Python

To make the C++ DLL into an extension for Python first modify the exportedmethods to interact with Python types Then add a function that exports themodule (main function)

1 Install pybind11 using pip

1 pip install pybind11

or

1 py -m pip install pybind11

2 At the top of maincpp include pybind11h

1 include ltpybind11pybind11hgt


3. At the bottom of main.cpp, use the PYBIND11_MODULE macro to define the entry point to the C++ function:

    namespace py = pybind11;

    PYBIND11_MODULE(test, m) {
        m.def("main_func", &main, R"pbdoc(
            pybind11 simple example
        )pbdoc");

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

4. Set the target configuration to Release and build the C++ project to verify your code.

The C++ module may fail to compile for the following reasons:

• Unable to locate Python.h (E1696: cannot open source file "Python.h" and/or C1083: Cannot open include file: "Python.h": No such file or directory): verify that the path in C/C++ > General > Additional Include Directories in the project properties points to your Python installation's include folder. See the table in Step 6.

• Unable to locate Python libraries: verify that the path in Linker > General > Additional Library Directories in the project properties points to the Python installation's libs folder. See the table in Step 6.

• Linker errors related to target architecture: change the C++ target's project architecture to match that of your Python installation. For example, if you're targeting x64 with the C++ project but your Python installation is x86, change the C++ project to target x86.

A.5 Make the DLL available to Python

1. In Solution Explorer, right-click the References node in the Python project and then select Add Reference. In the dialog that appears, select the Projects tab, select the test project, and then select OK.


2. Run the Visual Studio installer, select Modify, then select Individual Components > Compilers, build tools, and runtimes > Visual C++ 2015.3 v140 toolset. This step is necessary because Python (for Windows) is itself built with Visual Studio 2015 (version 14.0) and expects those tools to be available when building an extension through the method described here. (Note that you may need to install a 32-bit version of Python and target the DLL to Win32, not x64.)


3. Create a file named setup.py in the C++ project by right-clicking the project and selecting Add > New Item. Then select C++ File (.cpp), name the file setup.py, and select OK (naming the file with the .py extension makes Visual Studio recognise it as Python despite using the C++ file template).

Then paste the following code into the setup.py file:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add pybind11 folder and Python include folder
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

4. The setup.py code instructs Python to build the extension using the Visual Studio 2015 C++ toolset when used from the command line.

Open an elevated command prompt, navigate to the folder that contains setup.py, and enter the following command:

    python setup.py install

A.6 Call the DLL from Python

After the DLL is made available to Python, we can call test.main_func() from Python code.

1. Add the following lines to your .py file to call the methods exported from the DLL:

    import test

    if __name__ == "__main__":
        test.main_func()

2. Run the Python program (Debug > Start without Debugging). The output should look like the following:

A.7 Example: Store a C++ Eigen Matrix as NumPy Arrays

This example requires the use of Eigen, a C++ template library. Due to its popularity, pybind11 provides transparent conversion and limited mapping support between Eigen and scientific Python linear algebra data types.

1. If you have not already, visit http://eigen.tuxfamily.org/index.php?title=Main_Page and click on "zip" to download the zip file.

Once the download is complete, decompress the zip file in a suitable location and note down the file path containing the folders.


2. In the Solution Explorer, right-click the C++ project and select Properties. Navigate to the C/C++ > General tab, add the Eigen path to the Additional Include Directories property, then select OK.

3. Right-click the C++ project and select Build to test your configurations (both Debug and Release).

4. Replace the code in the main.cpp file with the following code:

    #include <iostream>

    // pybind11
    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>

    // Eigen
    #include <Eigen/Dense>

    namespace py = pybind11;

    Eigen::Matrix3f example_mat(const int a, const int b, const int c) {
        Eigen::Matrix3f mat;
        mat << 5 * a, 1 * b, 2 * c,
               2 * a, 4 * b, 2 * c,
               1 * a, 3 * b, 3 * c;
        return mat;
    }

    Eigen::MatrixXd inv(Eigen::MatrixXd xs) {
        return xs.inverse();
    }

    double det(Eigen::MatrixXd xs) {
        return xs.determinant();
    }

    PYBIND11_MODULE(test, m) {
        m.def("example_mat", &example_mat);
        m.def("inv", &inv);
        m.def("det", &det);

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

Note that Eigen types are automatically converted to NumPy arrays simply by including the pybind11/eigen.h header (see the includes in Step 4). A C++ function with an Eigen return type can be used directly as a NumPy data type in Python.

Build the C++ project to check the code is working correctly.

5. Include the Eigen library's directory. Replace the code in setup.py with the following code:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add pybind11 folder, Python include folder and Eigen folder
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                      r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

As before, in an elevated command prompt, navigate to the folder that contains setup.py and enter the following command:

    python setup.py install

6. Then paste the following code in the Python file:

    import test
    import numpy as np

    if __name__ == "__main__":
        A = test.example_mat(1, 3, 1)
        print("Matrix A is\n", A)
        print("The inverse of A is\n", test.inv(A))
        print("The determinant of A is", test.det(A))


The output should be

A.8 Example: Store Data Generated by the Analytical Heston Model as NumPy Arrays

Only the relevant part of the code is displayed here. To request the code for the entire project, contact the author at cko22@bath.edu.

This example demonstrates how to use pybind11 for:

• a C++ project with multiple files;

• storing a C++ Eigen matrix as NumPy arrays.

1. For a multiple-file C++ project, we need to include all the source files in setup.py. This example also uses the Eigen library, so its directory is included too. Paste the following code in setup.py:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'Data_Generation',
        sources=[r'module.cpp', r'Data.cpp', r'Heston.cpp',
                 r'IntegrationScheme.cpp', r'IntegratorImp.cpp',
                 r'NumIntegrator.cpp', r'pdetfunction.cpp',
                 r'pfunction.cpp', r'Range.cpp'],
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                      r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='Data_Generation',
        version='1.0',
        description='Python package with Data_Generation C++ extension (PyBind11)',
        license='MIT',
        ext_modules=[sfc_module],
    )

Then go to the folder that contains setup.py in the command prompt and enter the following command:

    python setup.py install

2. The code in module.cpp is:

    // For the analytic solution in the case of the Heston model
    #include "Heston.hpp"
    #include <ctime>
    #include "Data.hpp"
    #include <future>

    // Inputting and Outputting
    #include <iostream>
    #include <vector>

    // pybind11
    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>
    #include <pybind11/stl_bind.h>

    // Eigen
    #include <Eigen/Dense>

    namespace py = pybind11;
    using namespace Eigen;

    struct CPPData {
        // Input Data
        std::vector<double> spot_price;
        std::vector<double> strike_price;
        std::vector<double> risk_free_rate;
        std::vector<double> dividend_yield;
        std::vector<double> initial_vol;
        std::vector<double> maturity_time;
        std::vector<double> long_term_vol;
        std::vector<double> mean_reversion_rate;
        std::vector<double> vol_vol;
        std::vector<double> price_vol_corr;

        // Output Data
        std::vector<double> option_price;
    };

    // ... do something ...

    Eigen::MatrixXd module() {
        CPPData cppdata;
        simulation(cppdata);

        // Convert each std::vector to an Eigen vector first
        // Input Data
        Map<Eigen::VectorXd> eigen_spot_price(cppdata.spot_price.data(), cppdata.spot_price.size());
        Map<Eigen::VectorXd> eigen_strike_price(cppdata.strike_price.data(), cppdata.strike_price.size());
        Map<Eigen::VectorXd> eigen_risk_free_rate(cppdata.risk_free_rate.data(), cppdata.risk_free_rate.size());
        Map<Eigen::VectorXd> eigen_dividend_yield(cppdata.dividend_yield.data(), cppdata.dividend_yield.size());
        Map<Eigen::VectorXd> eigen_initial_vol(cppdata.initial_vol.data(), cppdata.initial_vol.size());
        Map<Eigen::VectorXd> eigen_maturity_time(cppdata.maturity_time.data(), cppdata.maturity_time.size());
        Map<Eigen::VectorXd> eigen_long_term_vol(cppdata.long_term_vol.data(), cppdata.long_term_vol.size());
        Map<Eigen::VectorXd> eigen_mean_reversion_rate(cppdata.mean_reversion_rate.data(), cppdata.mean_reversion_rate.size());
        Map<Eigen::VectorXd> eigen_vol_vol(cppdata.vol_vol.data(), cppdata.vol_vol.size());
        Map<Eigen::VectorXd> eigen_price_vol_corr(cppdata.price_vol_corr.data(), cppdata.price_vol_corr.size());
        // Output Data
        Map<Eigen::VectorXd> eigen_option_price(cppdata.option_price.data(), cppdata.option_price.size());

        const int cols{11};          // no. of columns
        const int rows = SIZE / 4;
        // Define a matrix that stores the training data to be used in Python
        MatrixXd training(rows, cols);
        // Initialise each column using the vectors
        training.col(0) = eigen_spot_price;
        training.col(1) = eigen_strike_price;
        training.col(2) = eigen_risk_free_rate;
        training.col(3) = eigen_dividend_yield;
        training.col(4) = eigen_initial_vol;
        training.col(5) = eigen_maturity_time;
        training.col(6) = eigen_long_term_vol;
        training.col(7) = eigen_mean_reversion_rate;
        training.col(8) = eigen_vol_vol;
        training.col(9) = eigen_price_vol_corr;
        training.col(10) = eigen_option_price;

        std::cout << training << std::endl;

        return training;
    }

    PYBIND11_MODULE(Data_Generation, m) {
        m.def("module", &module);

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

Note that the module function's return type is an Eigen matrix, which can then be used directly as a NumPy data type in Python.

3. The C++ function can then be called from Python as shown:

    import Data_Generation as dg
    import mysql.connector
    import pandas as pd
    import sqlalchemy as db
    import time

    # This function calls the analytical Heston option pricer in C++
    # to simulate 5,000,000 option prices and store them in a MySQL database

    # ... do something ...

    if __name__ == "__main__":
        t0 = time.time()

        # A new table (optional)
        SQL_clean_table()

        target_size = 5000000   # Final table size
        batch_size = 500000     # Size generated in each iteration

        # Get the number of rows that already exist in the table
        r = SQL_setup(batch_size)

        # Get the number of iterations
        iter = int(target_size / batch_size) - int(r / batch_size)
        print("Iteration(s) required:", iter)
        count = 0

        print("Now Run C++ code!")
        for i in range(iter):
            count = count + 1
            print("----------------------------------------------------")
            print("Current iteration:", count)
            cpp = dg.module()   # store data from C++

            # ... do something ...

            r1 = SQL_setup(batch_size)
            print("No of rows in SQL Classic Heston Table Now:", r1)
            print("\n")

        t1 = time.time()
        # Calculate total time for the entire program
        total = t1 - t0
        print("THE ENTIRE PROGRAM TAKES", round(total), "s TO COMPLETE")


Appendix B

Asymptotic Expansions for General Stochastic Volatility Models

B.1 Asymptotic Expansions

We present the asymptotic expansions for general stochastic volatility models by (Lorig et al., 2019). Consider a strictly positive spot price S whose risk-neutral dynamics are given by

    S = e^X    (B.1)

    dX = -\frac{1}{2}\sigma^2(t, X, Y)\,dt + \sigma(t, X, Y)\,dW_1, \quad X(0) = x \in \mathbb{R}    (B.2)

    dY = f(t, X, Y)\,dt + \beta(t, X, Y)\,dW_2, \quad Y(0) = y \in \mathbb{R}    (B.3)

    \langle dW_1, dW_2 \rangle = \rho(t, X, Y)\,dt, \quad |\rho| < 1    (B.4)

The drift of X is selected to be -\frac{1}{2}\sigma^2 so that S = e^X is a martingale.

Let C(t) be the time-t value of a European call option maturing at time T > t with payoff function P(X(T)). To value a European-style option using risk-neutral pricing, we must compute functions of the form

    u(t, x, y) = E[P(X(T)) \mid X(t) = x, Y(t) = y]

It is well known that the function u satisfies the Kolmogorov backward equation (Lorig et al., 2019)

    (\partial_t + \mathcal{A}(t))\,u(t, x, y) = 0, \quad u(T, x, y) = P(x)    (B.5)

where the operator \mathcal{A}(t) is given explicitly by

    \mathcal{A}(t) = a(t, x, y)(\partial_x^2 - \partial_x) + f(t, x, y)\partial_y + b(t, x, y)\partial_y^2 + c(t, x, y)\partial_x\partial_y    (B.6)

and where the functions a, b and c are defined as

    a(t, x, y) = \frac{1}{2}\sigma^2(t, x, y)

    b(t, x, y) = \frac{1}{2}\beta^2(t, x, y)

    c(t, x, y) = \rho(t, x, y)\,\sigma(t, x, y)\,\beta(t, x, y)


Appendix C

Deep Neural Networks Library in Python: Keras

We show the code snippet for computing the ANN in Keras: the Python code for defining the ANN architecture for the rough Heston implied volatility approximation.

    # Define the neural network architecture
    import keras
    from keras import backend as K
    keras.backend.set_floatx('float64')
    from keras.callbacks import EarlyStopping

    # Construct the model
    input_layer = keras.layers.Input(shape=(n,))

    hidden_layer1 = keras.layers.Dense(80, activation='elu')(input_layer)
    hidden_layer2 = keras.layers.Dense(80, activation='elu')(hidden_layer1)
    hidden_layer3 = keras.layers.Dense(80, activation='elu')(hidden_layer2)

    output_layer = keras.layers.Dense(100, activation='linear')(hidden_layer3)

    model = keras.models.Model(inputs=input_layer, outputs=output_layer)

    # Summarise layers
    model.summary()

    # User-defined metrics: R2 and max absolute error
    # R2
    def coeff_determination(y_true, y_pred):
        SS_res = K.sum(K.square(y_true - y_pred))
        SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
        return 1 - SS_res / (SS_tot + K.epsilon())

    # Max error
    def max_error(y_true, y_pred):
        return K.max(abs(y_true - y_pred))

    earlystop = EarlyStopping(monitor='val_loss',
                              min_delta=0,
                              mode='min',
                              verbose=1,
                              patience=25)

    # Prepare the model for training
    opt = keras.optimizers.Adam(learning_rate=0.005)
    model.compile(loss='mse', optimizer=opt,
                  metrics=['mae', coeff_determination, max_error])


Bibliography

Alos, Elisa et al. (June 2017). "Exponentiation of Conditional Expectations Under Stochastic Volatility". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2983180

Andersen, Leif et al. (Jan. 2007). "Moment Explosions in Stochastic Volatility Models". In: Finance and Stochastics 11, pp. 29-50. DOI: 10.1007/s00780-006-0011-7

Atkinson, Colin et al. (Jan. 2011). "Rational Solutions for the Time-Fractional Diffusion Equation". URL: https://www.researchgate.net/publication/220222966_Rational_Solutions_for_the_Time-Fractional_Diffusion_Equation

Bayer, Christian et al. (Oct. 2018). "Deep calibration of rough stochastic volatility models". URL: https://arxiv.org/abs/1810.03399

Black, Fischer et al. (May 1973). "The Pricing of Options and Corporate Liabilities". URL: https://www.cs.princeton.edu/courses/archive/fall09/cos323/papers/black_scholes73.pdf

Carr, Peter and Dilip B. Madan (Mar. 1999). "Option valuation using the fast Fourier transform". In: Journal of Computational Finance. URL: https://www.researchgate.net/publication/2519144_Option_Valuation_Using_the_Fast_Fourier_Transform

"Chapter 2 Modeling Process: k-fold cross validation". https://bradleyboehmke.github.io/HOML/process.html. Accessed 2020-08-22.

Comte, Fabienne et al. (Oct. 1998). "Long Memory In Continuous-Time Stochastic Volatility Models". In: Mathematical Finance 8, pp. 291-323. URL: https://zulfahmed.files.wordpress.com/2015/10/comterenault19981.pdf

Cox, John C. et al. (Jan. 1985). "A Theory of the Term Structure of Interest Rates". URL: http://pages.stern.nyu.edu/~dbackus/BCZ/discrete_time/CIR_Econometrica_85.pdf

"Create a C++ extension for Python". https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019. Accessed 2020-07-21.

Duffy, Daniel J. (Jan. 2018). Financial Instrument Pricing Using C++, Second Edition. John Wiley & Sons. ISBN: 978-0-470-97119-2.

Duffy, Daniel J. et al. (2012). Monte Carlo Frameworks: Building customisable high-performance C++ applications. John Wiley & Sons, Inc. ISBN: 9780470060698.

Dupire, Bruno (1994). "Pricing with a Smile". In: Risk Magazine, pp. 18-20.

"Eigen: The Matrix Class". https://eigen.tuxfamily.org/dox/group__TutorialMatrixClass.html. Accessed 2020-07-22.

Eldan, Ronen et al. (May 2016). "The Power of Depth for Feedforward Neural Networks". URL: https://arxiv.org/abs/1512.03965

Euch, Omar El et al. (Sept. 2016). "The characteristic function of rough Heston models". URL: https://arxiv.org/pdf/1609.02108.pdf

— (Mar. 2019). "Roughening Heston". URL: https://ssrn.com/abstract=3116887

Fouque, Jean-Pierre et al. (Aug. 2012). "Second Order Multiscale Stochastic Volatility Asymptotics: Stochastic Terminal Layer Analysis & Calibration". URL: https://arxiv.org/abs/1208.5802

Gatheral, Jim (2006). The Volatility Surface: A Practitioner's Guide. John Wiley & Sons, Inc. ISBN: 978-0-471-79251-2.

Gatheral, Jim et al. (Oct. 2014). "Volatility is rough". URL: https://arxiv.org/abs/1410.3394

— (Jan. 2019). "Rational approximation of the rough Heston solution". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3191578

Hebb, Donald O. (1949). The Organization of Behavior: A Neuropsychological Theory. Psychology Press, 1st edition (12 June 2002). ISBN: 978-0805843002.

Heston, Steven (Jan. 1993). "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options". URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.3204&rep=rep1&type=pdf

Hinton, Geoffrey et al. (Feb. 2015). "Overview of mini-batch gradient descent". URL: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

Hornik et al. (Mar. 1989). "Multilayer feedforward networks are universal approximators". URL: https://deeplearning.cs.cmu.edu/F20/document/readings/Hornik_Stinchcombe_White.pdf

— (Jan. 1990). "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks". URL: http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/univ_approx.pdf

Horvath, Blanka et al. (Jan. 2019). "Deep Learning Volatility". URL: https://ssrn.com/abstract=3322085

Hurst, Harold E. (Jan. 1951). "Long-term storage of reservoirs: an experimental study". URL: https://www.jstor.org/stable/2982267

Hutchinson, James M. et al. (July 1994). "A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks". URL: https://alo.mit.edu/wp-content/uploads/2015/06/A-Nonparametric-Approach-to-Pricing-and-Hedging-Derivative-Securities-via-Learning-Networks.pdf

Ioffe, Sergey et al. (Feb. 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". URL: https://arxiv.org/abs/1502.03167

Itkin, Andrey (Jan. 2010). "Pricing options with VG model using FFT". URL: https://arxiv.org/abs/physics/0503137

Kahl, Christian et al. (Jan. 2006). "Not-so-complex Logarithms in the Heston model". URL: http://www2.math.uni-wuppertal.de/~kahl/publications/NotSoComplexLogarithmsInTheHestonModel.pdf

Kingma, Diederik P. et al. (Feb. 2015). "Adam: A Method for Stochastic Optimization". URL: https://arxiv.org/abs/1412.6980

Kwok, Yue-Kuen et al. (2012). Handbook of Computational Finance, Chapter 21. Springer-Verlag Berlin Heidelberg. ISBN: 978-3-642-17254-0.

Lewis, Alan (Sept. 2001). "A Simple Option Formula for General Jump-Diffusion and Other Exponential Levy Processes". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=282110

Liu, Shuaiqiang et al. (Apr. 2019a). "A neural network-based framework for financial model calibration". URL: https://arxiv.org/abs/1904.10523

— (Apr. 2019b). "Pricing options and computing implied volatilities using neural networks". URL: https://arxiv.org/abs/1901.08943

Lorig, Matthew et al. (Jan. 2019). "Explicit implied vols for multifactor local-stochastic vol models". URL: https://arxiv.org/abs/1306.5447v3

Mandara, Dalvir (Sept. 2019). "Artificial Neural Networks for Black-Scholes Option Pricing and Prediction of Implied Volatility for the SABR Stochastic Volatility Model". URL: https://www.datasim.nl/application/files/8115/7045/4929/1423101.pdf

McCulloch, Warren et al. (May 1943). "A Logical Calculus of The Ideas Immanent in Nervous Activity". URL: http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf

Mostafa, F. et al. (May 2008). "A neural network approach to option pricing". URL: https://www.witpress.com/elibrary/wit-transactions-on-information-and-communication-technologies/41/18904

Peters, Edgar E. (Jan. 1991). Chaos and Order in the Capital Markets: A New View of Cycles, Prices, and Market Volatility. John Wiley & Sons. ISBN: 978-0-471-13938-6.

— (Jan. 1994). Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons. ISBN: 978-0-471-58524-4. URL: https://www.amazon.co.uk/Fractal-Market-Analysis-Investment-Economics/dp/0471585246

"pybind11 Documentation". https://pybind11.readthedocs.io/en/stable/advanced/cast/eigen.html. Accessed 2020-07-22.

Ruf, Johannes et al. (May 2020). "Neural networks for option pricing and hedging: a literature review". URL: https://arxiv.org/abs/1911.05620

Schmelzle, Martin (Apr. 2010). "Option Pricing Formulae using Fourier Transform: Theory and Application". URL: https://pfadintegral.com/docs/Schmelzle2010%20Fourier%20Pricing.pdf

Shevchenko, Georgiy (Jan. 2015). "Fractional Brownian motion in a nutshell". In: International Journal of Modern Physics: Conference Series 36. URL: https://www.worldscientific.com/doi/abs/10.1142/S2010194515600022

Stone, Henry (July 2019). "Calibrating Rough Volatility Models: A Convolutional Neural Network Approach". URL: https://arxiv.org/abs/1812.05315

"Using the C++ eigen library to calculate matrix inverse and determinant". http://people.duke.edu/~ccc14/sta-663-2016/18G_C++_Python_pybind11.html#Using-the-C++-eigen-library-to-calculate-matrix-inverse-and-determinant. Accessed 2020-07-22.

Wilmott, Paul (2006). Paul Wilmott on Quantitative Finance, Vol. 3. John Wiley & Sons, Inc., 2nd Edition. ISBN: 978-0-470-01870-5.



[FIGURE 2: Data flow diagram for ANN on Rough Heston Pricer. The rough Heston pricer takes model parameters as input and produces the rough Heston data (model parameters and option prices); the Keras ANN trainer consumes the training data and the ANN model hyperparameters to produce trained model weights; the ANN predictions use the trained model weights to output predicted option prices, which are compared against the test data (option prices) in the accuracy evaluation to output error metrics.]


Chapter 1

Introduction

Option pricing has always been a major research area in financial mathematics. Various methods and techniques have been developed to price options more accurately and more quickly since the introduction of the well-known Black-Scholes model in 1973 (Black et al., 1973). The Black-Scholes model has an analytical solution and only one parameter, volatility, that requires calibration. This makes it very easy to implement. Its simplicity, however, comes at the expense of many unrealistic assumptions. One of its biggest flaws is the assumption of a constant volatility term. This assumption simply does not hold in the real market. With this assumption, the Black-Scholes model fails to capture the volatility dynamics exhibited by the market and so is unable to value options accurately.

In 1993, (Heston, 1993) developed a stochastic volatility model in an attempt to better capture the volatility dynamics. The Heston model assumes the volatility of an underlying asset to follow a geometric Brownian motion (Hurst parameter H = 1/2). Similarly, the Heston model has a closed-form analytical solution, and this makes it a strong alternative to the Black-Scholes model.

Whilst the Heston model is more realistic than the Black-Scholes model, the generated option prices do not match the observed vanilla option prices consistently (Gatheral et al., 2014). Recent research shows that more accurate option prices can be estimated if the volatility process is modelled as a fractional Brownian motion with H < 1/2. When H < 1/2, the volatility process path appears to be rougher than when H >= 1/2, and hence the volatility is described as rough.

(Euch et al., 2019) introduce the rough volatility version of the Heston model, which combines the best of both worlds. However, the calibration of the rough Heston model relies on the notoriously slow Monte Carlo method, due to its non-Markovian nature. This obstacle is the reason that rough Heston struggles to find its way into industrial applications. Although there exist many approximation methods for the rough Heston model to help solve this problem, we are interested in the feasibility of machine learning algorithms.

Recent advancements in machine learning algorithms could potentially provide an alternative means for this problem. Specifically, artificial neural networks (ANN) offer the best hope, because they are capable of learning complex functional relationships between input and output data and then making predictions instantly.


In this project, our objective is to investigate the accuracy and run-time performance of ANN on European call option price predictions and implied volatility approximations under the rough Heston model. We start off by using the Heston model as a concept validation, then proceed to its rough volatility version. For regulatory purposes, we use data simulated by option pricing models for training.

1.1 Organisation of This Thesis

This thesis is organised as follows. In Chapter 2 we review some of the related work and provide justifications for our research. In Chapter 3 we discuss the theory of the option pricing models of our interest. We provide a brief introduction to ANN and some techniques to improve their performance in Chapter 4. Then we explain the tools we used for the data generation process and some useful techniques to speed up the process in Chapter 5; in the same chapter we also discuss the important data pre-processing procedures. Chapter 6 shows our network architecture and experimental setups. We present the results and discuss our findings in Chapter 7. Finally, we conclude our findings and discuss potential future work in Chapter 8.

[FIGURE 1.1: Problem Solving Process. (START) Problem: the rough Heston model has low industrial application despite giving accurate option prices. Root cause: slow calibration (pricing and implied volatility approximation) process. Potential solution: apply ANN to pricing and implied volatility approximation. Implementation: ANN computation using Keras in Python. Evaluation: accuracy and speed of ANN predictions. (END) Conclusions: whether ANN is feasible or not.]


Chapter 2

Literature Review

In this chapter we intend to provide a review of the current literature and highlight the research gap. The focus of our review is the application of ANN in option pricing models. We justify our methodology and our choice of option pricing models, and we also provide some critiques.

2.1 Application of ANN in Finance

Option pricing has always been a hot topic in academia and the financial industry. Researchers have been looking for various means to price options more accurately and quickly. As computational technology advances, the application of ANN has gained popularity in solving complex problems in many different industries. In quantitative finance, the idea of utilising ANN for pricing derivatives comes from its ability to make fast and accurate predictions once it is trained.

The application of ANN in option pricing began with (Hutchinson et al., 1994). The authors were the pioneers in applying ANN for pricing and hedging derivatives. Since then, more than 100 research papers have been published on this topic. It is also worth noting that the complexity of the ANN architectures used in these papers has increased over the years. Whilst there has been much research on the applications of ANN in option pricing, only about 10% of papers focus on implied volatility (Ruf et al., 2020). The feasibility of using ANN for implied volatility approximation is equally as important as for option pricing, since they are both part of the calibration process. Recently, more research papers focus on implied volatility and calibration, such as (Mostafa et al., 2008), (Liu et al., 2019a), (Liu et al., 2019b) and (Horvath et al., 2019). A key contribution of (Horvath et al., 2019)'s work is the introduction of the image-based implicit training method for implied volatility surface approximation. This method is allegedly robust and fast for approximating the entire implied volatility surface.

A significant portion of the research papers uses real market data. Currently there is no standardised framework for deciding the architecture and parameters of an ANN; hence, training the ANN directly with market data might pose some issues when it comes to regulatory compliance. (Horvath et al., 2019) is one of the few papers that uses simulated data for ANN training. The simulated data is generated from the Heston and rough Bergomi models. Using simulated data removes the regulatory concerns, as the functional mapping from model parameters to option price is known and tractable.

The application of ANN requires splitting the entire data set for training, validation and testing. Most of the papers that use real market data ensure the data splitting process is chronological, to preserve the time series structure of the data. This is not a concern in our case, as we are using simulated data.

The results in (Horvath et al., 2019) show very accurate predictions. However, after examining the source code, we discovered that the ANN was validated and tested on the same data set. Thus, the high-accuracy results do not give any information about how well the trained ANN generalises to unseen data. In our experiment we will use different data sets for validation and testing.

Common error metrics used include mean absolute error (MAE), mean absolute percentage error (MAPE) and mean squared error (MSE). We suggest a new metric, maximum absolute error, as it is useful from a risk-management perspective for quantifying the largest prediction error in the worst-case scenario.

2.2 Option Pricing Models

In terms of the choice of option pricing models, over 80% of the papers select the Black-Scholes model or its extensions in their research (Ruf et al., 2020). As mentioned before, there are other models that can give better option prices than Black-Scholes, such as stochastic volatility models. (Liu et al., 2019b) investigated the application of ANN in the Heston and Bates models.

According to (Gatheral et al., 2014), modelling the volatility as rough volatility can better capture the volatility dynamics in the market. Rough volatility models struggle to find their way into the industry due to their slow calibration process. Hence, it is natural for the direction of ANN research to move towards rough volatility models.

Some of the recent examples include (Stone, 2019), (Bayer et al., 2018) and (Horvath et al., 2019). (Stone, 2019) investigates the calibration of the rough Bergomi model using Convolutional Neural Networks (CNN), whilst (Bayer et al., 2018) study the calibration of the Heston and rough Bergomi models using Deep Neural Networks (DNN). Contrary to these papers, where direct calibration via ANN is performed in one step, (Horvath et al., 2019) split the calibration of the rough Bergomi model into two steps. First, the ANN is trained to learn the pricing equation during a so-called offline session. Then the trained ANN is deterministic and stored for the online calibration session later. This approach can speed up the online calibration by a huge margin.

2.3 The Research Gap

In our thesis we will be using the approach in (Horvath et al., 2019), that is, to train the network to learn the pricing and implied volatility functional mappings separately. We leave the model calibration step as the future outlook of this project. We choose to work with the rough Heston model, another popular rough volatility model that has not been investigated yet. We adapt the image-based implicit training method in our research. To re-iterate, we validate and test our ANN with different data sets to obtain a true measure of the ANN's performance on unseen data.


Chapter 3

Option Pricing Models

In this chapter we discuss the option pricing models of our interest: the Heston and rough Heston models. The idea of modelling volatility as a rough fractional Brownian motion will also be presented. Then we derive their pricing and implied volatility equations and explain their implementations for data generation.

The Black-Scholes model is simple but unrealistic for real-world applications, as its assumptions simply do not hold. One of its greatest criticisms is the assumption of constant volatility of the underlying asset price. (Dupire, 1994) extended the Black-Scholes framework to give local volatility models. He modelled the volatility as a deterministic function of the underlying price and time. The function is selected such that its outputs match observed European option prices. This extension is time-inhomogeneous and the dynamics it exhibits are still unrealistic; thus, it is not able to produce future volatility surfaces that emulate our observations.

Stochastic Volatility (SV) models were introduced, in which the volatility of the underlying is modelled as a geometric Brownian motion. Recent research shows that rough volatility can capture the true volatility dynamics better, and this leads to the rough volatility models.

3.1 The Heston Model

One of the most popular SV models is the one proposed by (Heston, 1993); its popularity is due to the fact that its pricing equation for European options is analytically tractable. This property allows calibration procedures to be done efficiently.

The Heston model for a one-dimensional spot price S follows a stochastic process

    dS = \mu S\,dt + \sqrt{\nu}\,S\,dW_1    (3.1)

and its instantaneous variance \nu follows a Cox-Ingersoll-Ross (CIR) process (Cox et al., 1985):

    d\nu = \kappa(\theta - \nu)\,dt + \xi\sqrt{\nu}\,dW_2    (3.2)

    \langle dW_1, dW_2 \rangle = \rho\,dt    (3.3)

where

• \mu is the (deterministic) rate of return of the spot

• \theta is the long-term mean volatility

• \xi is the volatility of volatility

• \kappa is the mean reversion rate of volatility

• \rho is the correlation between spot and volatility moves

• W_1 and W_2 are Wiener processes

3.1.1 Option Pricing Under the Heston Model

In this section we follow (Gatheral, 2006) closely. We start by forming a portfolio \Pi containing the option being priced V, a quantity -\Delta of stock, and a quantity -\Delta_1 of another asset whose value is V_1:

    \Pi = V - \Delta S - \Delta_1 V_1

The change in this portfolio during dt is

    d\Pi = \left[ \frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V}{\partial \nu \partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V}{\partial \nu^2} \right] dt
         - \Delta_1 \left[ \frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V_1}{\partial \nu \partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V_1}{\partial \nu^2} \right] dt
         + \left[ \frac{\partial V}{\partial S} - \Delta_1 \frac{\partial V_1}{\partial S} - \Delta \right] dS
         + \left[ \frac{\partial V}{\partial \nu} - \Delta_1 \frac{\partial V_1}{\partial \nu} \right] d\nu

Note the explicit dependence of S and \nu on t has been removed for simplicity. We can make the portfolio instantaneously risk-free by eliminating the dS and d\nu terms:

    \frac{\partial V}{\partial S} - \Delta_1 \frac{\partial V_1}{\partial S} - \Delta = 0

and

    \frac{\partial V}{\partial \nu} - \Delta_1 \frac{\partial V_1}{\partial \nu} = 0

So we are left with

    d\Pi = \left[ \frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V}{\partial \nu \partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V}{\partial \nu^2} \right] dt
         - \Delta_1 \left[ \frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V_1}{\partial \nu \partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V_1}{\partial \nu^2} \right] dt
         = r\Pi\,dt
         = r(V - \Delta S - \Delta_1 V_1)\,dt


Then we apply the no-arbitrage principle: the return of a risk-free portfolio must be equal to the risk-free rate. We assume the risk-free rate to be deterministic for our case.

Rearranging the above equation by collecting V terms on the left-hand side and V_1 terms on the right-hand side gives

    \frac{ \frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V}{\partial \nu \partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV }{ \frac{\partial V}{\partial \nu} }
    = \frac{ \frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V_1}{\partial \nu \partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V_1}{\partial \nu^2} + rS\frac{\partial V_1}{\partial S} - rV_1 }{ \frac{\partial V_1}{\partial \nu} }

It is obvious that the ratio is equal to some function of S, \nu and t, say f(S, \nu, t). Thus we have

    \frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V}{\partial \nu \partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV = f\,\frac{\partial V}{\partial \nu}

In the case of the Heston model, f is chosen as

    f = -\left(\kappa(\theta - \nu) - \lambda\sqrt{\nu}\right)

where \lambda(S, \nu, t) is called the market price of volatility risk. We do not delve into the choice of f and the concept of \lambda(S, \nu, t) any further here; see (Wilmott, 2006) for his arguments.

(Heston, 1993) assumed in his paper that \lambda(S, \nu, t) is linear in the instantaneous variance \nu_t, to retain the form of the equation under the transformation from the statistical measure to the risk-neutral measure. Here we are only interested in pricing the options, so we can set \lambda(S, \nu, t) to zero (Gatheral, 2006).

Hence we arrive at

    \frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2 \frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S \frac{\partial^2 V}{\partial \nu \partial S} + \frac{1}{2}\xi^2\nu \frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV = \kappa(\nu - \theta)\frac{\partial V}{\partial \nu}    (3.4)

According to (Heston, 1993), the price of an undiscounted European call option C with strike price K and time to maturity T has a solution of the form

    C(S, K, \nu, T) = \frac{1}{2}(F - K) + \frac{1}{\pi}\int_0^\infty \left(F f_1 - K f_2\right) du    (3.5)

where

    f_1 = \Re\left[ \frac{e^{-iu\ln K}\,\varphi(u - i)}{iuF} \right]

    f_2 = \Re\left[ \frac{e^{-iu\ln K}\,\varphi(u)}{iu} \right]

    \varphi(u) = E\left(e^{iu\ln S_T}\right)

    F = S e^{\mu T}

The evaluation of (3.5) requires the computation of Fourier inversion integrals, and it is susceptible to numerical instabilities due to the complex logarithms involved. (Kahl et al., 2006) proposed an integration scheme that computes the Fourier inversion integral by applying some transformations to the asymptotic structure. Thus the solution becomes

    C(S, K, \nu, T) = \int_0^1 y(x)\,dx, \quad x \in \mathbb{R}    (3.6)

where

    y(x) = \frac{1}{2}(F - K) + \frac{ F\,f_1\!\left(-\frac{\ln x}{C_\infty}\right) - K\,f_2\!\left(-\frac{\ln x}{C_\infty}\right) }{ x\,\pi\,C_\infty }

The limits at the boundaries of the integral are

    \lim_{x \to 0} y(x) = \frac{1}{2}(F - K)

and

    \lim_{x \to 1} y(x) = \frac{1}{2}(F - K) + \frac{ F \lim_{u \to 0} f_1(u) - K \lim_{u \to 0} f_2(u) }{ \pi\,C_\infty }

Equation (3.6) provides a more robust pricing method for options with moderate to long maturities or strong mean reversion. For the full derivation of equation (3.6), see (Kahl et al., 2006). The source code implementing equation (3.6) was provided by the supervisor and is the source code for (Duffy et al., 2012).
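To make the scheme concrete, the sketch below evaluates the transformed integral (3.6) with SciPy. It is illustrative only (not the supervisor's C++ code) and assumes the caller supplies the functions f1 and f2 (built from the characteristic function) and the constant C_inf:

    import numpy as np
    from scipy.integrate import quad

    def heston_call_kahl(F, K, f1, f2, C_inf):
        """Undiscounted call price via the transformed integral (3.6).

        f1, f2 and C_inf are assumed supplied by the caller; this is a
        sketch of the integration scheme, not the thesis source code."""
        def y(x):
            if x <= 0.0:
                return 0.5 * (F - K)          # boundary limit of y(x) as x -> 0
            u = -np.log(x) / C_inf            # change of variables
            return 0.5 * (F - K) + (F * f1(u) - K * f2(u)) / (x * np.pi * C_inf)

        price, _ = quad(y, 0.0, 1.0)
        return price

Because the integrand is bounded on [0, 1] (its limits at both endpoints are finite), a standard adaptive quadrature such as quad is sufficient here.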

3.1.2 Heston Model Implied Volatility Approximation

Before the model is used for pricing options, the model's parameters need to be calibrated so that it can return current market prices (at least to some degree of accuracy). That is, the unmeasurable model parameters are obtained by calibrating to implied volatilities that are observed in the market. This motivates the need for fast implied volatility computation.

Unfortunately, there is no closed-form analytical formula for calculating implied volatility under the Heston model. There are, however, many closed-form approximations for Heston implied volatility.

In this project we select the implied volatility approximation method for Heston proposed by (Lorig et al., 2019), which allegedly outperforms other methods such as the one by (Fouque et al., 2012). (Lorig et al., 2019) derive a family of asymptotic expansions for European-style option implied volatility. Their method provides an explicit approximation and requires no special functions, not even numerical integration, and so it is faster to compute than its counterparts.

We follow closely the derivation steps outlined in (Lorig et al., 2019). Consider the Heston model in Section 3.1. According to (Andersen et al., 2007), \rho must be set to negative to avoid moment explosion. To circumvent this, we apply the change of variables (X(t), V(t)) = (\ln S(t), e^{\kappa t}\nu). Using the asymptotic expansions for general stochastic volatility models (see Appendix B) and Ito's formula, we have

    dX = -\frac{1}{2}e^{-\kappa t}V\,dt + \sqrt{e^{-\kappa t}V}\,dW_1, \quad X(0) = x = \ln s    (3.7)

    dV = \theta\kappa e^{\kappa t}\,dt + \xi\sqrt{e^{\kappa t}V}\,dW_2, \quad V(0) = \nu = z > 0    (3.8)

    \langle dW_1, dW_2 \rangle = \rho\,dt    (3.9)

The parameters \nu, \kappa, \theta, \xi, W_1, W_2 play the same roles as in Section 3.1. Then we apply the second-order differential operator \mathcal{A}(t) to (X, V):

    \mathcal{A}(t) = \frac{1}{2}e^{-\kappa t}v\,(\partial_x^2 - \partial_x) + \theta\kappa e^{\kappa t}\partial_v + \frac{1}{2}\xi^2 e^{\kappa t}v\,\partial_v^2 + \xi\rho v\,\partial_x\partial_v

Define T as time to maturity and k as log-strike. Using the time-dependent Taylor series expansion of \mathcal{A}(T) about (x(T), v(T)) = (X_0, E[V(T)]) = (x, \theta(e^{\kappa T} - 1)) gives (see Appendix B)

    \sigma_0 = \sqrt{ \frac{ -\theta + \theta\kappa T + e^{-\kappa T}(\theta - v) + v }{ \kappa T } }

    \sigma_1 = \frac{ \xi\rho z\,e^{-\kappa T}\left( -2v - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v \right) }{ \sqrt{2}\,\kappa^2 \sigma_0^2\, T^{3/2} }

    \sigma_2 = \frac{ \xi^2 e^{-2\kappa T} }{ 32\,\kappa^4 \sigma_0^5\, T^3 } \Big[
        -2\sqrt{2}\,\kappa\sigma_0^3 T^{3/2} z \left( -v - 4e^{\kappa T}(v + \kappa T(v - v)) + e^{2\kappa T}(v(5 - 2\kappa T) - 2v) + 2v \right)
        + \kappa\sigma_0^2 T (4z^2 - 2) \left( v + e^{2\kappa T}(-5v + 2v\kappa T + 8\rho^2(v(\kappa T - 3) + v) + 2v) \right)
        + \kappa\sigma_0^2 T (4z^2 - 2) \left( 4e^{\kappa T}(v + v\kappa T + \rho^2(v(\kappa T(\kappa T + 4) + 6) - v(\kappa T(\kappa T + 2) + 2)) - \kappa T v) - 2v \right)
        + 4\sqrt{2}\,\rho^2 \sigma_0 \sqrt{T}\, z (2z^2 - 3) \left( -2 - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v \right)^2
        + 4\rho^2 (4(z^2 - 3)z^2 + 3) \left( -2v - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v \right)^2
        \Big]
        - \frac{ \sigma_1^2 \left( 4(x - k)^2 - \sigma_0^4 T^2 \right) }{ 8\sigma_0^3 T }

where

    z = \frac{ x - k - \sigma_0^2 T / 2 }{ \sigma_0 \sqrt{2T} }


The explicit expression for \sigma_3 is too long to be included here; see (Lorig et al., 2019) for the full derivation. The explicit expression for the implied volatility approximation is

    \sigma_{implied} = \sigma_0 + \sigma_1 z + \sigma_2 z^2 + \sigma_3 z^3    (3.10)

The C++ source code implementing this is available here: Heston Implied Volatility Approximation.
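As an illustration of how the expansion is assembled, the sketch below computes \sigma_0 from the Heston parameters and combines it with the higher-order coefficients via equation (3.10). The values s1, s2, s3 are assumed to have been precomputed from the lengthy expressions above; this is not the thesis C++ code:

    import numpy as np

    def heston_implied_vol(x, k, T, v, theta, kappa, s1, s2, s3):
        """Third-order implied-vol expansion (3.10), given precomputed
        higher-order coefficients s1, s2, s3 (assumed inputs)."""
        # Zeroth-order term
        sigma0 = np.sqrt((-theta + theta * kappa * T
                          + np.exp(-kappa * T) * (theta - v) + v) / (kappa * T))
        # Moneyness variable used by the expansion
        z = (x - k - 0.5 * sigma0**2 * T) / (sigma0 * np.sqrt(2.0 * T))
        return sigma0 + s1 * z + s2 * z**2 + s3 * z**3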


3.2 Volatility is Rough

We start this section by introducing fractional Brownian motion (fBm), which is a generalisation of Brownian motion.

Definition 3.2.1. Fractional Brownian motion (fBm) is a centered Gaussian process B^H_t, t \ge 0, with the autocovariance function

    E[B^H_t B^H_s] = \frac{1}{2}\left( |t|^{2H} + |s|^{2H} - |t - s|^{2H} \right)    (3.11)

where H \in (0, 1) is the Hurst index or Hurst parameter associated with the fractional Brownian motion.
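Definition 3.2.1 translates directly into code; a minimal sketch:

    def fbm_autocovariance(t, s, H):
        """Autocovariance E[B_t^H B_s^H] of fBm, equation (3.11)."""
        return 0.5 * (abs(t)**(2*H) + abs(s)**(2*H) - abs(t - s)**(2*H))

For H = 1/2 this reduces to min(t, s), the covariance of standard Brownian motion.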

The Hurst parameter is a measure of the long-term memory of a time series. It was first introduced by (Hurst, 1951) for modelling water levels of the Nile river in order to determine the optimum dam size. As it turns out, it is also suitable for modelling time series data from other natural systems; (Peters, 1991) and (Peters, 1994) are some of the earliest examples of applying this concept in financial time series modelling. A time series can be classified into 3 categories based on the Hurst parameter:

1. 0 < H < 1/2: Negative correlation in the increments/decrements of the process. A mean-reverting process, which implies an increase in value is more likely to be followed by a decrease in value, and vice versa. The smaller the value, the greater the mean-reverting effect.

2. H = 1/2: No correlation in the increments/decrements of the process (geometric Brownian motion or Wiener process).

3. 1/2 < H < 1: Positive correlation in the increments/decrements of the process. A trend-reinforcing process, which means the direction of the next value is more likely to be the same as the current value. The strength of the trend is greater when H is closer to 1.

Empirical evidence shows that log-volatility time series behave similarly to a fBm with Hurst parameter of order 0.1 (Gatheral et al., 2014). Essentially all the statistical stylised facts of volatility can be captured when modelling it as a rough fBm.

This motivates the use of the fractional stochastic volatility (FSV) model by (Comte et al., 1998). Our model is called the rough fractional stochastic volatility (RFSV) model to emphasise that H < 1/2. The term 'rough' refers to the roughness of the path of the fBm. From Figure 3.1, one can notice that the fBm path gets 'rougher' as H decreases.


FIGURE 3.1: Paths of fBm for different values of H. Adapted from (Shevchenko, 2015).

3.3 The Rough Heston Model

In this section we introduce the rough volatility version of the Heston model and derive its pricing and implied volatility equations.

As stated before, rough volatility models can fit the volatility surface notably well with very few parameters. Hence, it is natural to modify the Heston model and consider its rough version.

The rough Heston model for a one-dimensional asset price S with instantaneous variance \nu(t) takes the form in (Euch et al., 2016):

    dS = S\sqrt{\nu}\,dW_1

    \nu(t) = \nu(0) + \frac{1}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\kappa(\theta - \nu(s))\,ds + \frac{\xi}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\sqrt{\nu(s)}\,dW_2    (3.12)

    \langle dW_1, dW_2 \rangle = \rho\,dt

where

• \alpha = H + 1/2 \in (1/2, 1) governs the roughness of the volatility sample paths

• \kappa is the mean reversion rate

• \theta is the long-term mean volatility

• \nu(0) is the initial volatility

• \xi is the volatility of volatility

• \Gamma is the Gamma function

• W_1 and W_2 are two Brownian motions with correlation \rho

When \alpha = 1 (H = 1/2), we recover the classical Heston model of Section 3.1.

3.3.1 Rough Heston Pricing Equation

The pricing equation of the rough Heston model is inspired by the classical Heston one. In the classical Heston model, the option price can be obtained by applying Fourier inversion integrals to the characteristic function, which is expressed in terms of the solution of a Riccati equation. The rough Heston model displays a similar structure, but with the Riccati equation replaced by a fractional Riccati equation.

(Euch et al., 2016) derived a characteristic function for the rough Heston model. This result is particularly important, as the non-Markovian nature of the fractional Brownian motion makes other numerical pricing methods (e.g. Monte Carlo) difficult to implement.

Definition 3.3.1. The fractional integral of order r \in (0, 1] of a function f is

    I^r f(t) = \frac{1}{\Gamma(r)}\int_0^t (t - s)^{r - 1} f(s)\,ds    (3.13)

whenever the integral exists, and the fractional derivative of order r \in (0, 1] is

    D^r f(t) = \frac{1}{\Gamma(1 - r)}\frac{d}{dt}\int_0^t (t - s)^{-r} f(s)\,ds    (3.14)

whenever it exists.
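For intuition, the fractional integral (3.13) can be approximated with a simple midpoint rule, whose midpoints avoid the integrable singularity at s = t. This is an illustrative sketch only, not the scheme used elsewhere in the thesis; f is assumed to accept NumPy arrays:

    import numpy as np
    from math import gamma

    def fractional_integral(f, t, r, n=1000):
        """Riemann-Liouville fractional integral I^r f(t), eq. (3.13), r in (0, 1]."""
        s = (np.arange(n) + 0.5) * t / n      # midpoints of [0, t]
        ds = t / n
        return ds / gamma(r) * np.sum((t - s)**(r - 1) * f(s))

As a sanity check, for f = 1 the exact value is I^r 1(t) = t^r / \Gamma(r + 1), which the sketch reproduces to a few decimal places.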

Then we quote the results from (Euch et al., 2016) directly; for the full proof, see Section 6 of (Euch et al., 2016).

Theorem 3.3.1. For all t \ge 0, the characteristic function of the log-price X_t = \ln(S(t)/S(0)) is

    \varphi_X(a, t) = E[e^{iaX_t}] = \exp\left[ \kappa\theta\, I^1 h(a, t) + \nu(0)\, I^{1-\alpha} h(a, t) \right]    (3.15)

where h is the solution of the fractional Riccati equation

    D^\alpha h(a, t) = -\frac{1}{2}a(a + i) + (ia\rho\xi - \kappa)\,h(a, t) + \frac{\xi^2}{2}h^2(a, t), \quad I^{1-\alpha}h(a, 0) = 0    (3.16)

which admits a unique continuous solution for a \in \mathcal{C}, a suitable region in the complex plane (see Definition 3.3.2).


Definition 3.3.2. We define the strip in the complex plane relevant to the computation of the characteristic function in Theorem 3.3.1:

    \mathcal{C} = \left\{ z \in \mathbb{C} : \Re(z) \ge 0,\ -\frac{1}{1 - \rho^2} \le \Im(z) \le 0 \right\}    (3.17)

where \Re and \Im denote real and imaginary parts, respectively.

There is no explicit solution to equation (3.16), so it has to be solved numerically. (Gatheral et al., 2019) present a simple rational approximation to the solution of the rough Heston Riccati equation, which is inevitably faster than the numerical schemes.

3.3.1.1 Rational Approximation of the Rough Heston Riccati Solution

Firstly, the short-time series expansion of the solution by (Alos et al., 2017) can be written as

    h_s(a, t) = \sum_{j=0}^{\infty} \frac{\Gamma(1 + j\alpha)}{\Gamma(1 + (j+1)\alpha)}\,\beta_j(a)\,\xi^j\,t^{(j+1)\alpha}    (3.18)

with

    \beta_0(a) = -\frac{1}{2}a(a + i)

    \beta_k(a) = \frac{1}{2}\sum_{i,j=0}^{k-2} 1_{i+j=k-2}\,\beta_i(a)\beta_j(a)\,\frac{\Gamma(1 + i\alpha)}{\Gamma(1 + (i+1)\alpha)}\,\frac{\Gamma(1 + j\alpha)}{\Gamma(1 + (j+1)\alpha)} + i\rho a\,\frac{\Gamma(1 + (k-1)\alpha)}{\Gamma(1 + k\alpha)}\,\beta_{k-1}(a)

Again from (Alos et al., 2017), the long-time expansion is

    h_l(a, t) = r_- \sum_{k=0}^{\infty} \frac{\gamma_k\,\xi^{-k}\,t^{-k\alpha}}{A^k\,\Gamma(1 - k\alpha)}    (3.19)

where

    \gamma_1 = -\gamma_0 = -1

    \gamma_k = -\gamma_{k-1} + \frac{r_-}{2A}\sum_{i,j=1}^{\infty} 1_{i+j=k}\,\gamma_i\gamma_j\,\frac{\Gamma(1 - k\alpha)}{\Gamma(1 - i\alpha)\Gamma(1 - j\alpha)}

    A = \sqrt{a(a + i) - \rho^2 a^2}

    r_\pm = -i\rho a \pm A

Now we can construct the global approximation. Using the same techniques as in (Atkinson et al., 2011), the rational approximation of h(a, t) is given

by

    h^{(m,n)}(a, t) = \frac{ \sum_{i=1}^{m} p_i (\xi t^\alpha)^i }{ \sum_{j=0}^{n} q_j (\xi t^\alpha)^j }, \quad q_0 = 1    (3.20)

Numerical experiments in (Gatheral et al., 2019) showed that h^{(3,3)} is a good choice for high accuracy and fast computation. Thus we can express it explicitly:

    h^{(3,3)}(a, t) = \frac{ p_1 \xi t^\alpha + p_2 \xi^2 t^{2\alpha} + p_3 \xi^3 t^{3\alpha} }{ 1 + q_1 \xi t^\alpha + q_2 \xi^2 t^{2\alpha} + q_3 \xi^3 t^{3\alpha} }    (3.21)

Thus, from equations (3.18) and (3.19), we have

    h_s(a, t) = \frac{\Gamma(1)}{\Gamma(1 + \alpha)}\beta_0(a)\,t^\alpha + \frac{\Gamma(1 + \alpha)}{\Gamma(1 + 2\alpha)}\beta_1(a)\,\xi\,t^{2\alpha} + \frac{\Gamma(1 + 2\alpha)}{\Gamma(1 + 3\alpha)}\beta_2(a)\,\xi^2\,t^{3\alpha} + O(t^{4\alpha})
             = b_1 t^\alpha + b_2 t^{2\alpha} + b_3 t^{3\alpha} + O(t^{4\alpha})

and

    h_l(a, t) = \frac{r_-\gamma_0}{\Gamma(1)} + \frac{r_-\gamma_1}{A\,\Gamma(1 - \alpha)\,\xi}\,\frac{1}{t^\alpha} + \frac{r_-\gamma_2}{A^2\,\Gamma(1 - 2\alpha)\,\xi^2}\,\frac{1}{t^{2\alpha}} + O\!\left(\frac{1}{t^{3\alpha}}\right)
             = g_0 + g_1\,\frac{1}{t^\alpha} + g_2\,\frac{1}{t^{2\alpha}} + O\!\left(\frac{1}{t^{3\alpha}}\right)

Matching the coefficients of equation (3.21) to h_s and h_l forces the coefficients b_i and g_j to satisfy

    p_1\xi = b_1    (3.22)

    (p_2 - p_1 q_1)\,\xi^2 = b_2    (3.23)

    (p_1 q_1^2 - p_1 q_2 - p_2 q_1 + p_3)\,\xi^3 = b_3    (3.24)

    \frac{p_3 \xi^3}{q_3 \xi^3} = g_0    (3.25)

    \frac{(p_2 q_3 - p_3 q_2)\,\xi^5}{q_3^2\,\xi^6} = g_1    (3.26)

    \frac{(p_1 q_3^2 - p_2 q_2 q_3 - p_3 q_1 q_3 + p_3 q_2^2)\,\xi^7}{q_3^3\,\xi^9} = g_2    (3.27)


The solution of this system is

    p_1 = \frac{b_1}{\xi}    (3.28)

    p_2 = \frac{ b_1^3 g_1 + b_1^2 g_0 g_2 + b_1 b_2 g_0 g_1 - b_1 b_3 g_0 g_2 + b_1 b_3 g_1^2 + b_2^2 g_0 g_2 - b_2^2 g_1^2 + b_2 g_0^3 }{ \left( b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3 \right) \xi^2 }    (3.29)

    p_3 = g_0 q_3    (3.30)

    q_1 = \frac{ b_1^2 g_1 - b_1 b_2 g_2 + b_1 g_0^2 - b_2 g_0 g_1 - b_3 g_0 g_2 + b_3 g_1^2 }{ \left( b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3 \right) \xi }    (3.31)

    q_2 = \frac{ b_1^2 g_0 - b_1 b_2 g_1 - b_1 b_3 g_2 + b_2^2 g_2 + b_2 g_0^2 - b_3 g_0 g_1 }{ \left( b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3 \right) \xi^2 }    (3.32)

    q_3 = \frac{ b_1^3 + 2 b_1 b_2 g_0 + b_1 b_3 g_1 - b_2^2 g_1 + b_3 g_0^2 }{ \left( b_1^2 g_2 + 2 b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3 \right) \xi^3 }    (3.33)
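A direct transcription of equations (3.28)-(3.33) and the approximant (3.21) into Python; a sketch assuming the short-time coefficients b1..b3 and long-time coefficients g0..g2 (generally complex) have already been computed from the expansions above:

    def rational_coefficients(b1, b2, b3, g0, g1, g2, xi):
        """p and q coefficients of h^(3,3), transcribed from eqs. (3.28)-(3.33)."""
        den = b1**2*g2 + 2*b1*g0*g1 + b2*g0*g2 - b2*g1**2 + g0**3
        p1 = b1 / xi
        p2 = (b1**3*g1 + b1**2*g0*g2 + b1*b2*g0*g1 - b1*b3*g0*g2 + b1*b3*g1**2
              + b2**2*g0*g2 - b2**2*g1**2 + b2*g0**3) / (den * xi**2)
        q1 = (b1**2*g1 - b1*b2*g2 + b1*g0**2 - b2*g0*g1 - b3*g0*g2 + b3*g1**2) / (den * xi)
        q2 = (b1**2*g0 - b1*b2*g1 - b1*b3*g2 + b2**2*g2 + b2*g0**2 - b3*g0*g1) / (den * xi**2)
        q3 = (b1**3 + 2*b1*b2*g0 + b1*b3*g1 - b2**2*g1 + b3*g0**2) / (den * xi**3)
        p3 = g0 * q3
        return p1, p2, p3, q1, q2, q3

    def h33(p1, p2, p3, q1, q2, q3, xi, t, alpha):
        """Global rational approximation h^(3,3)(a, t), eq. (3.21)."""
        x = xi * t**alpha
        return (p1*x + p2*x**2 + p3*x**3) / (1 + q1*x + q2*x**2 + q3*x**3)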

3.3.1.2 Option Pricing Using the Fast Fourier Transform

After solving the Riccati equation using the rational approximation (3.21), we can compute the characteristic function \varphi_X(a, t) in equation (3.15). Once the characteristic function is obtained, many methods are available to compute call option prices; see (Carr and Madan, 1999), (Itkin, 2010), (Lewis, 2001) and (Schmelzle, 2010). In this project we select the Carr and Madan approach, a fast option pricing method that uses the fast Fourier transform (FFT). Suppose k = \ln K and C(k, T) is the value of a European call option with strike K and maturity T. Unfortunately, the FFT cannot be applied directly to evaluate the call option price, because the call pricing function is not square-integrable. A key contribution of (Carr and Madan, 1999) is the modification of the call pricing function with respect to k. With this modification, a whole range of option prices can be obtained with one inverse Fourier transform.

Consider the modified call price c(k, T):

    c(k, T) = e^{\beta k} C(k, T), \quad \beta > 0    (3.34)

Its Fourier transform is

    \psi_X(u, T) = \int_{-\infty}^{\infty} e^{iuk} c(k, T)\,dk, \quad u \in \mathbb{R}_{>0}    (3.35)

which can be expressed as

    \psi_X(u, T) = \frac{ e^{-rT}\,\varphi_X[u - (\beta + 1)i,\ T] }{ \beta^2 + \beta - u^2 + i(2\beta + 1)u }    (3.36)

where r is the risk-free rate and \varphi_X(\cdot) is the characteristic function defined in equation (3.15).


The purpose of \beta is to assist the integrability of the modified call pricing function over the negative log-strike axis. Hence, a sufficient condition is that \psi_X(0) is finite:

    \psi_X(0, T) < \infty    (3.37)

    \implies \varphi_X[-(\beta + 1)i,\ T] < \infty    (3.38)

    \implies \exp\left[ \theta\kappa\, I^1 h(-(\beta + 1)i,\ T) + \nu(0)\, I^{1-\alpha} h(-(\beta + 1)i,\ T) \right] < \infty    (3.39)

Using equation (3.39), we can determine the upper bound on \beta such that condition (3.37) is satisfied. (Carr and Madan, 1999) suggest that one quarter of the upper bound serves as a good choice for \beta.

The call price C(k, T) can be recovered by applying the inverse Fourier transform:

    C(k, T) = \frac{e^{-\beta k}}{2\pi}\int_{-\infty}^{\infty} \Re\left[ e^{-iuk}\,\psi_X(u, T) \right] du    (3.40)

            = \frac{e^{-\beta k}}{\pi}\int_0^{\infty} \Re\left[ e^{-iuk}\,\psi_X(u, T) \right] du    (3.41)

The semi-infinite integral in the call price equation can be evaluated by numerical methods, such as the trapezoidal rule, and the FFT, as shown in (Kwok et al., 2012):

    C(k, T) \approx \frac{e^{-\beta k}}{\pi}\sum_{j=1}^{N} \Re\left[ e^{-iu_j k}\,\psi_X(u_j, T) \right] \Delta u    (3.42)

where u_j = (j - 1)\Delta u, j = 1, 2, \ldots, N.

We end this section with a summary of the steps required to price a call option under the rough Heston model (a short Python sketch follows the list):

1. Compute the solution h of the fractional Riccati equation (3.16) using equation (3.21).

2. Compute the characteristic function in equation (3.15).

3. Determine a suitable value for \beta using equation (3.39).

4. Apply the Fourier transform in equation (3.36).

5. Evaluate the call option price by implementing the FFT as in equation (3.42).
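The sketch below implements the sum (3.42) directly for a single log-strike; phi(u, T) is the (assumed given) rough Heston characteristic function from equation (3.15). For a whole grid of strikes one would replace the direct sum with numpy.fft, as Carr and Madan intended:

    import numpy as np

    def carr_madan_call(phi, k, T, r, beta, N=4096, du=0.25):
        """European call price at log-strike k via the Carr-Madan sum (3.42)."""
        u = np.arange(N) * du                 # u_j = (j - 1) * du
        u = np.where(u == 0.0, 1e-10, u)      # avoid the singularity at u = 0
        psi = (np.exp(-r * T) * phi(u - (beta + 1) * 1j, T)
               / (beta**2 + beta - u**2 + 1j * (2 * beta + 1) * u))   # eq. (3.36)
        integrand = np.real(np.exp(-1j * u * k) * psi)
        return np.exp(-beta * k) / np.pi * np.sum(integrand) * du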

3.3.2 Rough Heston Model Implied Volatility

(Euch et al., 2019) present an almost instantaneous and accurate approximation method to compute rough Heston implied volatilities. This method allows us to approximate the rough Heston implied volatility by simply scaling the volatility of volatility parameter \nu and feeding the scaled parameter as input into the classical Heston implied volatility approximation of Section 3.1.2. For a given maturity T, the scaled volatility of volatility parameter \hat{\nu} is

    \hat{\nu} = \sqrt{\frac{3}{2H + 2}}\,\frac{\nu}{\Gamma(H + 3/2)}\,\frac{1}{T^{1/2 - H}}    (3.43)
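In code, the scaling is a one-liner; a minimal sketch:

    from math import gamma, sqrt

    def scaled_vol_of_vol(nu, H, T):
        """Effective vol-of-vol for the rough Heston approximation, eq. (3.43)."""
        return sqrt(3.0 / (2.0 * H + 2.0)) * nu / gamma(H + 1.5) / T**(0.5 - H)

The scaled \hat{\nu} is then passed to the classical Heston implied volatility approximation of Section 3.1.2.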


FIGURE 3.2: Implied volatility smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities, green plots are from the calibrated rough Heston model, and dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019).


Chapter 4

Artificial Neural Networks

This chapter starts with an introduction to Artificial Neural Networks (ANN). We introduce the terminology and basic elements of ANN and explain the methods that we used to increase training speed and predictive accuracy. We also discuss the training process for ANN, common pitfalls and their remedies. Finally, we discuss the ANN type of interest and extend the discussion to Deep Neural Networks (DNN).

The concept of ANN has come a long way. It started off with modelling a simple neural network after electrical circuits, which was proposed by Warren McCulloch, a neurologist, and a young mathematician, Walter Pitts, in 1943 (McCulloch et al., 1943). This idea was further strengthened by Donald Hebb (Hebb, 1949).

ANN's ability to learn complex non-linear functional mappings, by deciphering the pattern between independent and dependent variables, has found applications in many fields. With computational resources becoming more accessible and big data sets becoming available, ANN's popularity has grown exponentially in recent years.

4.1 Dissecting the Artificial Neural Networks

FIGURE 4.1: A Simple Neural Network

Before we delve deeper into the world of ANN, let us explain what parameters and hyperparameters mean.


Parameters refer to the variables that are updated throughout the training process; they include weights and biases. Hyperparameters are the configuration variables that define the architecture of the ANN and stay constant throughout the training process. Hyperparameters include the number of neurons, hidden layers, activation functions, the number of epochs, etc.

4.1.1 Neurons

Like a biological neuron, an artificial neuron is the fundamental working unit in an ANN. It is essentially a mathematical function that receives multiple inputs and generates an output. More precisely, it takes the weighted sum of the inputs and applies the activation function to that sum to generate an output. In the case of a simple ANN (as shown in Figure 4.1) this will be the network output. In more sophisticated ANNs, the output of a neuron can be the input of other neurons; a minimal sketch is given below.
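A single artificial neuron, sketched in NumPy under the weighted-sum-plus-activation description above:

    import numpy as np

    def neuron(inputs, weights, bias, activation):
        """One artificial neuron: activation applied to the weighted sum plus bias."""
        return activation(np.dot(weights, inputs) + bias)

    # Example: two inputs through a ReLU-activated neuron
    relu = lambda x: np.maximum(0.0, x)
    output = neuron(np.array([0.5, -1.0]), np.array([0.8, 0.2]), 0.1, relu)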

4.1.2 Input, Output and Hidden Layers

An input layer is the first layer of an ANN. It receives and sends the training data into the network for further processing. The number of input neurons is equal to the number of input features of the training data.

The output layer is the final layer of the network. It outputs the predictions for a given set of input features. It can have more than one output neuron.

A hidden layer is a layer between the input and output layers. It contains neuron(s) and receives weighted inputs from the input layer. Whilst an ANN contains only one input layer and one output layer, the number of hidden layers can be more than one. One hidden layer can only be used for solving linearly separable problems; for more complex problems, multiple hidden layers are needed.

Systematic experimentation is required to identify the appropriate number of hidden layers to prevent under-fitting and over-fitting, and thus to increase predictive accuracy.

4.1.3 Connections, Weights and Biases

From the input layer to the output layer, each connection is assigned a weight that denotes its relative importance. Random weights are initialised when training begins; as training continues, these weights are updated.

Sometimes a bias (different from the bias term in statistics) is added to the neuron. It is merely a constant input with weight 1. Intuitively speaking, the purpose of a bias is to shift the activation function away from the origin, thereby allowing the neuron to give a non-zero output when the inputs are 0.


4.1.4 Forward and Backward Propagation, Loss Functions

Forward propagation is the passing of input data in the forward direction through the network to generate output. Then the difference between the output and the target output is estimated by a loss function; it is a measure of how well an ANN performs with the current set of weights. Typical loss functions for regression problems include the mean squared error (MSE) and the mean absolute error (MAE). They are defined as

Definition 4.1.1.

    MSE = \frac{1}{n}\sum_{i=1}^{n}\left( y_{i,predict} - y_{i,actual} \right)^2    (4.1)

Definition 4.1.2.

    MAE = \frac{1}{n}\sum_{i=1}^{n}\left| y_{i,predict} - y_{i,actual} \right|    (4.2)

where y_{i,predict} is the i-th predicted output and y_{i,actual} is the i-th actual output. Subsequently, this information is propagated backward from the output layer to the input layer, and the gradients of the loss function w.r.t. the weights are computed. The algorithm then makes adjustments to the weights accordingly to minimise the loss function. An iteration is one complete cycle of forward and backward propagation.

ANN training is an unconstrained optimisation problem with the loss function as the objective function. The algorithm navigates the search space to find the optimal weights that minimise the loss function (e.g. MSE). It can be expressed as

$$\min_{w} \frac{1}{n}\sum_{i=1}^{n}\left(y_i^{\text{predict}} - y_i^{\text{actual}}\right)^2 \qquad (4.3)$$

where $w$ is the vector that contains the weights of all the connections within the ANN.

4.1.5 Activation Functions

An activation function is a mathematical function applied to the output of a neuron to determine whether it should be activated (or fired) or not, based on the relevance of the neuron's inputs to the prediction. This is analogous to neuronal firing activity in the human brain.

Activation functions can be linear or non-linear. A linear activation function is only suitable when the functional relationship is linear. Non-linear activation functions are used for most applications. In our case, non-linear activation functions are more suitable, as the pricing and implied volatility functional mappings are complex.

Common activation functions include:


Definition 4.1.3 (Rectified Linear Unit (ReLU))

$$f_{\text{ReLU}}(x) = \begin{cases} 0 & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases} \;\in [0, \infty) \qquad (4.4)$$

Definition 4.1.4 (Exponential Linear Unit (ELU))

$$f_{\text{ELU}}(x) = \begin{cases} \alpha(e^x - 1) & \text{if } x \le 0 \\ x & \text{if } x > 0 \end{cases} \;\in (-\alpha, \infty) \qquad (4.5)$$

where $\alpha > 0$. The value of $\alpha$ is usually chosen to be 1 for most applications.

Definition 4.1.5 (Linear or identity)

$$f_{\text{linear}}(x) = x \;\in (-\infty, \infty) \qquad (4.6)$$
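For concreteness, the three activation functions above can be sketched in Python as follows (a minimal illustration, assuming α = 1 for the ELU):

    import numpy as np

    def f_relu(x):
        return np.where(x > 0, x, 0.0)

    def f_elu(x, alpha=1.0):
        return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

    def f_linear(x):
        return x

    xs = np.array([-2.0, 0.0, 0.5, 2.0])
    print(f_relu(xs))    # [0.  0.  0.5 2. ]
    print(f_elu(xs))     # [-0.8647  0.  0.5  2. ]
    print(f_linear(xs))  # unchanged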

4.1.6 Optimisation Algorithms and Learning Rate

The learning rate refers to the amount of change applied to the weights in each iteration of the training process. A high learning rate results in fast training, but there is a risk of divergence, whereas a low learning rate may slow down training.

Common optimisation algorithms for training ANNs are Stochastic Gradient Descent (SGD) and its extensions, such as RMSProp and Adam. The SGD pseudocode is the following:

Algorithm 1: Stochastic Gradient Descent

    Initialise a vector of random weights w and learning rate lr
    while Loss > error do
        Randomly shuffle the training data of size n
        for i = 1, 2, ..., n do
            w_new = w_current − lr · ∂Loss/∂w_current
        end for
    end while

SGD is popular because it is simple to implement and can provide good results for most applications. In SGD, the learning rate lr is fixed throughout the process. This can be an issue because the appropriate learning rate is difficult to determine: selecting a high learning rate is prone to divergence, whereas using a low learning rate can result in a slow training process. Hence, improvements have been made to SGD to allow the learning rate to be adaptive. Root Mean Square Propagation (RMSProp) is one of the extensions that allows the learning rate to vary throughout the training process (Hinton et al 2015). It is a method that divides the learning rate by the square root of an exponential moving average of past squared gradients. The author suggests the exponential decay rate b = 0.9, and its pseudocode is presented below.

Algorithm 2: Root Mean Square Propagation (RMSProp)

    Initialise a vector of random weights w, learning rate lr, v = 0 and b = 0.9
    while Loss > error do
        Randomly shuffle the training data of size n
        for i = 1, 2, ..., n do
            v_current = b · v_previous + (1 − b) · [∂Loss/∂w_current]²
            w_new = w_current − (lr / √(v_current + ε)) · ∂Loss/∂w_current
        end for
    end while

where ε is a small number to prevent division by zero.

A further improved version, called Adaptive Moment Estimation (Adam), was proposed by (Kingma et al 2015). RMSProp adapts the learning rate based on the second moment of the gradients only; Adam improves on this by also using the exponential moving average of the first moment (the mean) of the gradients in the learning rate adaptation. The advantage of Adam is that it can handle sparse gradients on noisy data. The pseudocode of Adam is given below.

Algorithm 3: Adaptive Moment Estimation (Adam)

    Initialise a vector of random weights w, learning rate lr, v = 0, m = 0, b = 0.9 and b1 = 0.999
    while Loss > error do
        Randomly shuffle the training data of size n
        for i = 1, 2, ..., n do
            m_current = b · m_previous + (1 − b) · ∂Loss/∂w_current
            v_current = b1 · v_previous + (1 − b1) · [∂Loss/∂w_current]²
            m̂_current = m_current / (1 − b^t)
            v̂_current = v_current / (1 − b1^t)
            w_new = w_current − (lr / (√v̂_current + ε)) · m̂_current
        end for
    end while

where b is the exponential decay rate for the first moment, b1 is the exponential decay rate for the second moment, and t is the iteration counter used in the bias corrections.
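The following minimal NumPy sketch (a toy one-parameter problem with made-up numbers, keeping the document's b and b1 notation) shows one possible implementation of the Adam update:

    import numpy as np

    def adam_step(w, grad, m, v, t, lr=0.01, b=0.9, b1=0.999, eps=1e-8):
        m = b * m + (1 - b) * grad           # first moment (mean of gradients)
        v = b1 * v + (1 - b1) * grad ** 2    # second moment (squared gradients)
        m_hat = m / (1 - b ** t)             # bias corrections
        v_hat = v / (1 - b1 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v

    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, 1001):
        grad = 2.0 * (w - 3.0)               # gradient of Loss = (w - 3)^2
        w, m, v = adam_step(w, grad, m, v, t)
    print(w)  # approaches the minimiser 3.0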

4.1.7 Epochs, Early Stopping and Batch Size

The epoch is the number of times that the entire training data set is passed through the ANN. Generally, we want the number of epochs to be as high as possible; however, this may cause the ANN to over-fit. Early stopping is useful here to stop the training before the ANN becomes over-fit: it works by stopping the training process when the loss function shows no improvement. The batch size is the number of input samples processed before the parameters are updated.

4.2 Feedforward Neural Networks

Feedforward Neural Networks are the most common type of ANN in practical applications. They are called feedforward because the training data flows only from the input layer to the output layer; there is no feedback loop within the network.

4.2.1 Deep Neural Networks

Deep Neural Networks (DNN) are Feedforward Neural Networks with more than one hidden layer. The depth here refers to the number of hidden layers. This is the type of ANN of interest here, and the following theorems provide justifications.

Theorem 4.2.1 ((Hornik et al 1989) Universal Approximation Theorem). Let $NN^a_{d_{in},d_{out}}$ be the set of neural networks with activation function $f_a: \mathbb{R} \to \mathbb{R}$, input dimension $d_{in} \in \mathbb{N}$ and output dimension $d_{out} \in \mathbb{N}$. Then, if $f_a$ is continuous and non-constant, $NN^a_{d_{in},d_{out}}$ is dense in the Lebesgue space $L^p(\mu)$ for all finite measures $\mu$.

Theorem 4.2.2 ((Hornik et al 1990) Universal Approximation Theorem for Derivatives). Let the function to be approximated be $F^* \in C^n$, the mapping of the neural network $F: \mathbb{R}^{d_0} \to \mathbb{R}$, and $NN^a_{d_{in},1}$ be the set of single-layer neural networks with activation function $f_a: \mathbb{R} \to \mathbb{R}$, input dimension $d_{in} \in \mathbb{N}$ and output dimension 1. Then, if the (non-constant) activation function $f_a \in C^n(\mathbb{R})$, $NN^a_{d_{in},1}$ arbitrarily approximates $F^*$ and all its derivatives up to order $n$.

Theorem 4.2.3 ((Eldan et al 2016) The Power of Depth of Neural Networks). There exists a simple function on $\mathbb{R}^d$, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network to more than a certain constant accuracy, unless its width is exponential in the dimension.

According to Theorem 4.2.2, it is crucial to have an activation function that is n-differentiable if the approximation of the n-th derivative of the function $F^*$ is required. Theorem 4.2.3 motivates the choice of a deep network: it states that the depth of the network can improve its predictive capabilities. However, it is worth noting that a network deeper than 4 hidden layers does not provide much performance gain (Ioffe et al 2015).

4.3 Common Pitfalls and Remedies

Here we discuss some common pitfalls in ANN training and the corresponding remedies.


4.3.1 Loss Function Stagnation

During training, the loss function may display only infinitesimal improvements after each epoch.

To resolve this problem:

• Increase the batch size. This could potentially fix the issue, as it helps the ANN model better learn the relationships between data samples and data labels before updating the parameters.

• Increase the learning rate. Too small a learning rate can cause the ANN model to get stuck in local minima.

• Use a deeper network. According to the power-of-depth theorem, deeper networks tend to perform better than shallower ones. However, if the ANN model already has 4 hidden layers, this method may not be suitable, as there is not much performance gain beyond that (see Chapter 4.2.1).

• Use a different activation function for the output layer. Make sure the range of the output activation function is appropriate for the problem.

4.3.2 Loss Function Fluctuation

During training, the loss function may fluctuate from epoch to epoch instead of declining steadily. Potential fixes include:

• Increase the batch size. The ANN model might not be able to interpret the true relationships between input and output data before each update when the batch size is too small.

• Lower the initialised learning rate.

4.3.3 Over-fitting

Over-fitting occurs when the ANN model performs well on training data but fails to generalise to unseen test data.

Potential fixes include:

• Reduce the complexity of the ANN model. Limit the number of hidden layers to 4, as (Eldan et al 2016) shows that not much advantage can be achieved beyond 4 hidden layers. Furthermore, experiment with narrower (fewer neurons) hidden layers.

• Introduce early stopping. Reduce the number of epochs if additional epochs show only little improvement.

• Regularisation. This can reduce the complexity of the loss function and speed up convergence.


• K-fold cross-validation. This works by splitting the training data into K equal-sized portions: K−1 portions are used for training and 1 portion is used for validation. This is repeated K times, with a different portion used for validation each time.

4.3.4 Under-fitting

Under-fitting occurs when the ANN model fails to make sense of the relationships between data samples and labels.

Potential fixes include:

• Increase the size of the training data. Complex functional mappings require the ANN model to train on more data before it can learn the relationships well.

• Increase the depth of the model. It is very likely that the model is not deep enough for the problem. As a rule of thumb, limit the number of hidden layers to 3-4 for complex problems.

4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation

For the ANN's application to implied volatility prediction, we apply the image-based implicit method proposed by (Horvath et al 2019). It is allegedly a powerful and robust method for approximating implied volatility. Rather than using one implied volatility value as the network output, the image-based implicit method uses the entire implied volatility surface as the output. The implied volatility surface is represented by 10 × 10 pixels, with each pixel corresponding to one maturity and one strike price. The input data corresponding to one implied volatility surface output is all the model parameters except for the strike price and maturity; these are implicitly learned by the network. This method offers the following benefits:

• More information is learned by the network with less input data. The information of neighbouring volatility points is incorporated in the training process.

• The network learns the mapping between the model parameters and the implied volatility surface directly. The volatility smile and term structure are available simultaneously, so this is more efficient than having only one single implied volatility value as output.

• The neighbouring volatility points can be numerically approximated almost instantly if intended.


FIGURE 4.2: The pixels of the implied volatility surface are represented by the outputs of the neural network.


Chapter 5

Data

In this chapter we discuss the data for our applications. We explain how the data is generated and some essential data pre-processing procedures.

Simulated data is chosen to train the ANN model mainly for regulatory reasons. One might argue that training the ANN with market data directly would yield better predictions; however, evidence for the stability and robustness of that approach is still lacking. Furthermore, there is no straightforward interpretation between the inputs and outputs of an ANN model trained in that way. The main advantage of training the ANN with simulated data is that the functional relationship is known, and so the model is tractable. This property is important, as tractable models tend to be more transparent, which is required by regulations.

For neural network computation, Python is the logical language choice, as it has many powerful machine learning libraries such as Keras. To obtain good results, we require 5,000,000 rows of data for the ANN experiment. Hence we choose C++ instead of Python for the data generation process, as it is computationally faster. We also use the lightweight header-only Python library pybind11, which can harmoniously integrate C++ and Python in one project (see Appendix A for a tutorial on pybind11).

5.1 Data Generation and Storage

The data generation process is done entirely in C++, mainly because of its execution speed. Initially, we considered applying GPU parallel computing to speed up the process even further. After consulting with the supervisor, we decided to use the std::future class template instead. std::future belongs to the standard C++11 library and is very simple to implement compared to parallel computing. It allows the C++ program to run multiple functions asynchronously on different threads (see page 942 in (Duffy 2018) for an example). Our machine contains 2 physical cores with 2 threads per physical core. This allows us to run 4 operations asynchronously without overhead costs and therefore, in theory, improves the data generation speed four-fold. Note that asynchronous programming is not meant to replace parallel computing completely; in cases where execution speed is absolutely critical, parallel computing should be used. Without asynchronous programming, the data generation process for 5,000,000 rows of Heston option price data would take more than 8 hours. Now it takes less than 2 hours to complete.

We follow the steps outlined in Appendix A to wrap the C++ program using the pybind11 Python library. The wrapped C++ function can be called in Python just like any other Python library. The pybind11 library is conceptually very similar to the Boost library; the reason we choose pybind11 is that it is a lightweight header-only library that is easy to use. This enables us to enjoy both the execution speed of C++ and access to the machine learning libraries of Python at the same time.

After the data is generated, it is stored in MySQL so that we do not need to re-generate the data each time we restart the computer or when the Python IDE crashes unexpectedly. We use the mysql connector Python library to communicate with MySQL from Python.

5.1.1 Heston Call Option Data

We follow the method and theory described in Chapter 3 to compute the call option price under the Heston model. We only consider call options in this project. To train an ANN that can also predict put option prices, one can simply add an extra input feature that takes only two values, 1 or -1, with 1 representing a call option and -1 a put option.

For this application, we generated 5,000,000 rows of data that are uniformly distributed over a specified range. In this context, one row of data contains all the input features and the corresponding output. We use 10 model parameters as the input features: spot price (S), strike price (K), risk-free rate (r), dividend yield (D), current/instantaneous volatility (ν), maturity (T), long term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ). In this case there will only be one output, the call option price. See Table 5.1 for their respective ranges.

TABLE 5.1: Heston Model Call Option Data

             Model Parameters                Range
    Input    Spot Price (S)                  ∈ [20, 80]
             Strike Price (K)                ∈ [40, 55]
             Risk-free Rate (r)              ∈ [0.03, 0.045]
             Dividend Yield (D)              ∈ [0, 0.1]
             Instantaneous Volatility (ν)    ∈ [0.2, 0.6]
             Maturity (T)                    ∈ [0.03, 2.0]
             Long Term Volatility (θ)        ∈ [0.2, 1.0]
             Mean Reversion Rate (κ)         ∈ [0.1, 1.0]
             Volatility of Volatility (ξ)    ∈ [0.2, 0.5]
             Correlation (ρ)                 ∈ [−0.95, 0.95]
    Output   Call Option Price (C)           ∈ ℝ>0


5.1.2 Heston Implied Volatility Data

We generate the Heston implied volatility data with the theory explained in Chapter 3. For the ANN application in implied volatility approximation, we decided to use the image-based implicit method, in which the ANN output layer dimension is 100, with each output representing one pixel on the implied volatility surface.

Using this method, we need 100 times more data in order to have the same number of distinct rows of data as the normal point-wise method (e.g. Heston call option pricing). However, due to hardware constraints, we decided to generate 10,000,000 rows of uniformly distributed data. That translates to only 100,000 distinct rows of data.

We use 6 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ).

Note that the strike price and maturity are not used as input features, as mentioned in Chapter 4.4. This is because the shape of one full implied volatility surface depends on the 6 model parameters mentioned above; the strike price and maturity correspond to one specific point on the full implied volatility surface. Nevertheless, we still require these two parameters to generate implied volatility data. We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

In this case there will be 10 × 10 = 100 outputs, that is, the implied volatility surface. See Table 5.2 for their respective ranges.

TABLE 5.2: Heston Model Implied Volatility Data

             Model Parameters                     Range
    Input    Spot Price (S)                       ∈ [0.2, 2.5]
             Instantaneous Volatility (ν)         ∈ [0.2, 0.6]
             Long Term Volatility (θ)             ∈ [0.2, 1.0]
             Mean Reversion Rate (κ)              ∈ [0.1, 1.0]
             Volatility of Volatility (ξ)         ∈ [0.2, 0.5]
             Correlation (ρ)                      ∈ [−0.95, 0.95]
    Output   Implied Volatility Surface (σimp)    ∈ ℝ10×10, >0

5.1.3 Rough Heston Call Option Data

Again, we follow the theory in Chapter 3 and implement it in C++ to generate call option prices under the rough Heston model.

Similarly, we only consider call options here and generate 5,000,000 rows of data. We use 9 model parameters as the input features: strike price (K), risk-free rate (r), current/instantaneous volatility (ν), maturity (T), long term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). In this case there will only be one output, the call option price. See Table 5.3 for their respective ranges.

TABLE 5.3: Rough Heston Model Call Option Data

             Model Parameters                Range
    Input    Strike Price (K)                ∈ [0.5, 5]
             Risk-free Rate (r)              ∈ [0.2, 0.5]
             Instantaneous Volatility (ν)    ∈ [0, 1.0]
             Maturity (T)                    ∈ [0.1, 3.0]
             Long Term Volatility (θ)        ∈ [0.01, 0.1]
             Mean Reversion Rate (κ)         ∈ [1.0, 4.0]
             Volatility of Volatility (ξ)    ∈ [0.01, 0.5]
             Correlation (ρ)                 ∈ [−0.95, 0.95]
             Hurst Parameter (H)             ∈ [0.05, 0.1]
    Output   Call Option Price (C)           ∈ ℝ>0

5.1.4 Rough Heston Implied Volatility Data

Finally, we generate the rough Heston implied volatility data with the theory explained in Chapter 3. As before, we use the image-based implicit method.

The data size we use is 10,000,000 rows, which translates to 100,000 distinct rows of data. We use 7 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

As before, there will be 10 × 10 = 100 outputs, that is, the rough Heston implied volatility surface. See Table 5.4 for their respective ranges.

TABLE 5.4: Rough Heston Model Implied Volatility Data

             Model Parameters                     Range
    Input    Spot Price (S)                       ∈ [0.2, 2.5]
             Instantaneous Volatility (ν)         ∈ [0.2, 0.6]
             Long Term Volatility (θ)             ∈ [0.2, 1.0]
             Mean Reversion Rate (κ)              ∈ [0.1, 1.0]
             Volatility of Volatility (ξ)         ∈ [0.2, 0.5]
             Correlation (ρ)                      ∈ [−0.95, 0.95]
             Hurst Parameter (H)                  ∈ [0.05, 0.1]
    Output   Implied Volatility Surface (σimp)    ∈ ℝ10×10, >0


5.2 Data Pre-processing

In this section we discuss some necessary data pre-processing procedures: data splitting for validation and evaluation (testing), data standardisation and normalisation, and data partitioning for K-fold cross validation.

5.2.1 Data Splitting: Train - Validate - Test

Generally, not 100% of the available data is used for training. We reserve some of the data for validation and for performance evaluation.

We split the data into three sets: 70% (train), 15% (validate) and 15% (test). The first 70% of the data is used for training, whilst 15% of the data is used for validation during training. The last 15% is for evaluating the trained ANN's predictive accuracy.

The validation data set is the data set used for tuning the hyperparameters of the ANN. This is important to help prevent over-fitting of the network. The test data set is not used during training; it is the unseen (out-of-sample) data used for evaluating the trained ANN's predictions.
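A minimal sketch of this 70/15/15 split (with hypothetical arrays X and y standing in for our features and labels):

    import numpy as np

    def train_validate_test_split(X, y, train=0.70, validate=0.15):
        n = len(X)
        i, j = int(n * train), int(n * (train + validate))
        return (X[:i], y[:i]), (X[i:j], y[i:j]), (X[j:], y[j:])

    X = np.random.rand(1000, 10)   # e.g. the 10 Heston input features
    y = np.random.rand(1000, 1)    # e.g. the call option prices
    train_set, val_set, test_set = train_validate_test_split(X, y)
    print(len(train_set[0]), len(val_set[0]), len(test_set[0]))  # 700 150 150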

5.2.2 Data Standardisation and Normalisation

Before the data is fed into the ANN for training, it is absolutely essential to scale the data to speed up convergence and improve performance.

The magnitude of each data feature is different, so it is essential to re-scale the data so that the ANN model will not misinterpret their relative importance based on their magnitudes. Typical scaling techniques include standardisation and normalisation.

Data standardisation is defined as

$$x_{\text{scaled}} = \frac{x - x_{\text{mean}}}{x_{\text{std}}} \qquad (5.1)$$

where
• x is the data
• xmean is the mean of that data feature
• xstd is the standard deviation of that data feature

(Horvath et al 2019) claim that this works especially well for quantities that are positively unbounded, such as implied volatility.

For the other model parameters, we apply data normalisation, which is defined as

$$x_{\text{scaled}} = \frac{2x - (x_{\max} + x_{\min})}{x_{\max} - x_{\min}} \;\in [-1, 1] \qquad (5.2)$$

where
• x is the data
• xmax is the maximum value of that data feature
• xmin is the minimum value of that data feature

After data scaling (standardisation or normalisation), the ANN is able to learn the true relationships between inputs and outputs without bias from the difference in their magnitudes.
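The two scalings can be sketched as follows (a minimal illustration of equations (5.1) and (5.2) on a hypothetical feature column):

    import numpy as np

    def standardise(x):
        # equation (5.1): zero mean, unit standard deviation
        return (x - x.mean()) / x.std()

    def normalise(x):
        # equation (5.2): rescale to the interval [-1, 1]
        return (2.0 * x - (x.max() + x.min())) / (x.max() - x.min())

    x = np.array([0.2, 0.5, 1.1, 2.5])
    print(standardise(x))
    print(normalise(x))   # endpoints map to -1.0 and 1.0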

5.2.3 Data Partitioning

In this section we explain how the data is partitioned for K-fold cross validation.

First, we shuffle the entire training data set. Then the entire data set is divided into K equal-sized portions. Each portion is used once as the test data set, with the ANN trained on the other K−1 portions; this is repeated K times, so that each of the K portions is used exactly once as the test data set. The average of the K results is the performance measure of the model.
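A minimal sketch of this partitioning, here using scikit-learn's KFold helper (an assumption for illustration; any equivalent manual split works):

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.random.rand(100, 6)   # hypothetical training data
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
        # train on K-1 portions, test on the held-out portion
        print(f"Fold {fold}: train={len(train_idx)}, test={len(test_idx)}")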

FIGURE 5.1: How the data is partitioned for 5-fold cross validation. Adapted from (Chapter 2 Modeling Process, k-fold cross validation).


Chapter 6

Neural Networks Architecture and Experimental Setup

We start this chapter by discussing the appropriate ANN architecture and experimental setup for our applications. We apply the ANN to four different applications: option pricing under the Heston model, Heston implied volatility surface approximation, option pricing under the rough Heston model and rough Heston implied volatility surface approximation. We first apply the ANN to the Heston model as a concept validation, and then proceed to the rough Heston model.

Subsequently, we discuss the relevant evaluation metrics and validation methods for our ANN models. Finally, we end this chapter with the implementation steps for the ANN in Keras.

6.1 Neural Networks Architecture

Configuring an ANN's architecture is a combination of art and science. It requires experience and systematic experimentation to decide the right parameters and architecture for a specific problem, and for a particular problem there might be multiple designs that are appropriate. To begin with, we use the network architecture of (Horvath et al 2019) and adjust the hyperparameters if it performs poorly for our applications.


FIGURE 6.1: Typical ANN Architecture for Option Pricing Application


FIGURE 6.2: Typical ANN Architecture for Implied Volatility Surface Approximation

6.1.1 ANN Architecture: Option Pricing under Heston Model

An ANN with 3 hidden layers is appropriate for this application. The number of neurons per hidden layer is 50, with the exponential linear unit (ELU) as the activation function for each hidden layer.

Initially, the rectified linear activation function (ReLU) was considered. It is the choice for many applications, as it can overcome the vanishing gradient problem whilst being less computationally expensive than other activation functions. However, it has some major drawbacks:

• The n-th derivative of ReLU is not continuous for all n > 0.

• The ReLU cannot produce negative outputs.

• For activation x < 0, ReLU has zero gradient.

From Theorem 4.2.2, we know that choosing an n-differentiable activation function is necessary for computing the n-th derivatives of the functional mapping. This is crucial for our application because we would be interested in computing option Greeks using the same ANN if it shows promising results for option pricing. For risk management purposes, selecting an activation function for which option Greeks can be calculated is crucial.

The obvious alternative is the ELU (defined in Chapter 4.1.4). For x < 0 it has a smooth output that approaches −α. When x > 0, the output of the ELU is positively unbounded, which is suitable for our application, as the values of the option price and implied volatility are also, in theory, positively unbounded.

Since our problem is a regression problem, the linear activation function (defined in Chapter 4.1.5) is appropriate for the output layer, as its output is unbounded. We discover that a batch size of 200 and 30 epochs yield good accuracy.

To summarise, the ANN model for Heston option pricing is:

• Input Layer: 10 neurons
• Hidden Layer 1: 50 neurons, ELU as activation function
• Hidden Layer 2: 50 neurons, ELU as activation function
• Hidden Layer 3: 50 neurons, ELU as activation function
• Output Layer: 1 neuron, linear function as activation function
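A sketch of this architecture in Keras (assuming TensorFlow 2.x; the layer sizes follow the list above, everything else is illustrative):

    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(10,)),              # 10 Heston model parameters
        layers.Dense(50, activation="elu"),    # Hidden Layer 1
        layers.Dense(50, activation="elu"),    # Hidden Layer 2
        layers.Dense(50, activation="elu"),    # Hidden Layer 3
        layers.Dense(1, activation="linear"),  # call option price
    ])
    model.summary()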

6.1.2 ANN Architecture: Implied Volatility Surface of Heston Model

For approximating the full implied volatility surface of the Heston model, we apply the image-based implicit method (see Chapter 4.4). This implies the output dimension is 100.

We use 3 hidden layers with 80 neurons each. The activation function for the hidden layers is the exponential linear unit (ELU). As before, we use a linear activation function for the output layer. We discover that a batch size of 20 and 200 epochs yield good accuracy. Since the number of epochs is relatively high, we introduce an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

To summarise, the ANN model for Heston implied volatility surface approximation is:

• Input Layer: 6 neurons
• Hidden Layer 1: 80 neurons, ELU as activation function
• Hidden Layer 2: 80 neurons, ELU as activation function
• Hidden Layer 3: 80 neurons, ELU as activation function
• Output Layer: 100 neurons, linear function as activation function


6.1.3 ANN Architecture: Option Pricing under Rough Heston Model

For option pricing under the rough Heston model, the ANN model is:

• Input Layer: 9 neurons
• Hidden Layer 1: 50 neurons, ELU as activation function
• Hidden Layer 2: 50 neurons, ELU as activation function
• Hidden Layer 3: 50 neurons, ELU as activation function
• Output Layer: 1 neuron, linear function as activation function

As before, we use a batch size of 200 and 30 epochs.

6.1.4 ANN Architecture: Implied Volatility Surface of Rough Heston Model

The ANN model for rough Heston implied volatility surface approximation is:

• Input Layer: 7 neurons
• Hidden Layer 1: 80 neurons, ELU as activation function
• Hidden Layer 2: 80 neurons, ELU as activation function
• Hidden Layer 3: 80 neurons, ELU as activation function
• Output Layer: 100 neurons, linear function as activation function

We use a batch size of 20 and 200 epochs, with an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

6.2 Performance Metrics and Validation Methods

We require some metrics to evaluate the performance of the ANN in terms of accuracy and speed.

For accuracy, we have introduced the MSE and MAE in Chapter 4.1.4. It is also common to use the coefficient of determination (R²) to quantify the performance of a regression model. It is a statistical measure that quantifies the proportion of the variance in the dependent variable that is explained by the independent variables. The formula is

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i^{\text{actual}} - y_i^{\text{predict}}\right)^2}{\sum_{i=1}^{n}\left(y_i^{\text{actual}} - \bar{y}^{\text{actual}}\right)^2} \qquad (6.1)$$


We also suggest a user-defined accuracy metric, the Maximum Absolute Error. It is defined as

$$\text{Max Abs Error} = \max_i\left(\left|y_i^{\text{actual}} - y_i^{\text{predict}}\right|\right) \qquad (6.2)$$

This metric provides a measure of how large the error could be in the worst-case scenario. It is especially important from a risk management perspective.
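In Keras this metric can be written as a short custom function (a sketch assuming TensorFlow 2.x; the name max_abs_error is ours) and passed to model.compile:

    import tensorflow as tf

    def max_abs_error(y_true, y_pred):
        # worst-case absolute deviation over the evaluated batch
        return tf.reduce_max(tf.abs(y_true - y_pred))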

As for the run-time performance of the ANN, we simply use the Python built-in timer function to quantify this property.

In Chapter 5.2.3 we explained the data partitioning process. The motivation for partitioning the data is to perform K-fold cross validation. Cross validation is a statistical technique to assess how well a predictive model generalises to an independent data set. It is a good tool for identifying over-fitting.

As explained earlier, the process involves training and testing different permutations of the partitioned data sets multiple times. The indication of a well generalised model is that the error magnitudes in each round are consistent with each other and with their average. We select K = 5 for our applications.


6.3 Implementation of ANN in Keras

In this section we outline the standard ANN computation process using Keras.

1. Construct the ANN model by specifying the number of input neurons, the number of hidden neurons, the number of hidden layers, the number of output neurons and the activation functions for each layer. Print the model summary to check for errors.

2. Configure the settings of the model by using the compile function. We select the MSE as the loss function, and we specify MAE, R² and our user-defined maximum absolute error as the metrics. For all of our applications we choose Adam as the optimisation algorithm. To prevent over-fitting, we use the early stopping function, especially when the number of epochs is high. It is important to note that the validation loss (not the training loss) should be used as the early stopping criterion; this means the training will stop early if there is no reduction in validation loss.

3. Start training the ANN model. It is important to specify the validation split argument here for splitting the training data into a train set and a validation set. Note that the test set should not be used as the validation set; it should be reserved for evaluating the ANN's predictions after training.

4. When the training is complete, generate the plot of MSE (loss function) versus the number of epochs to visualise the training process. The MSE should decline gradually and then level off as the number of epochs increases. See Chapter 4.3 for potential training issues and remedies.

5. Apply 5-fold cross validation to check for over-fitting. Evaluate the trained model using the evaluate function.

6. Generate predictions. Visualise the predictions by plotting predicted values versus actual values on the same graph.

See Appendix C for Python code example
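For reference, a minimal sketch of steps 2 and 3 (assuming TensorFlow 2.x, the model from Chapter 6.1, the max_abs_error metric from Chapter 6.2, and hypothetical training arrays x_train and y_train):

    from tensorflow.keras.callbacks import EarlyStopping

    model.compile(optimizer="adam", loss="mse",
                  metrics=["mae", max_abs_error])

    early_stop = EarlyStopping(monitor="val_loss",   # stop on validation loss
                               patience=25,
                               restore_best_weights=True)

    history = model.fit(x_train, y_train,
                        validation_split=0.15,       # hold out a validation set
                        batch_size=20, epochs=200,
                        callbacks=[early_stop])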


6.4 Summary of Neural Networks Architecture

TABLE 6.1: Summary of Neural Networks Architecture for Each Application

    ANN Model                             Application                               Input Layer   Hidden Layers           Output Layer   Batch Size   Epochs
    Heston Option Pricing ANN             Heston Call Option Pricing                10 neurons    3 layers × 50 neurons   1 neuron       200          30
    Heston Implied Volatility ANN         Heston Implied Volatility Surface         6 neurons     3 layers × 80 neurons   100 neurons    20           200
    Rough Heston Option Pricing ANN       Rough Heston Call Option Pricing          9 neurons     3 layers × 50 neurons   1 neuron       200          30
    Rough Heston Implied Volatility ANN   Rough Heston Implied Volatility Surface   7 neurons     3 layers × 80 neurons   100 neurons    20           200


Chapter 7

Results and Discussions

In this chapter we present our results and discuss our findings.

7.1 Heston Option Pricing ANN Results

Figure 7.1 shows the learning curves of the Heston Option Pricing ANN. The loss function used here is the MSE. As the ANN is trained, both the training loss and validation loss curves decline until they converge. It took fewer than 30 epochs for the ANN to reach a satisfactory level of accuracy. There is only a small gap between the training loss and the validation loss, which indicates the ANN model is well trained.

FIGURE 7.1: Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN

Table 7.1 summarises the error metrics of the Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 2.00×10⁻⁶, 9.58×10⁻⁴, 6.50×10⁻³ and 0.999 respectively. The low error metrics and high R² value imply that the ANN model generalises well and is able to give highly accurate predictions for option prices under the Heston model. Figure 7.2 attests to its high accuracy: almost all of the predictions lie on the perfect-prediction line (red line).

TABLE 7.1: Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data

    MSE         MAE         Max Abs Error   R²
    2.00×10⁻⁶   9.58×10⁻⁴   6.50×10⁻³       0.999

FIGURE 7.2: Plot of Predicted vs Actual values for Heston Option Pricing ANN

TABLE 7.2: 5-fold Cross Validation Results of Heston Option Pricing ANN

              MSE         MAE         Max Abs Error
    Fold 1    5.78×10⁻⁷   5.45×10⁻⁴   2.04×10⁻³
    Fold 2    9.90×10⁻⁷   7.69×10⁻⁴   2.40×10⁻³
    Fold 3    7.15×10⁻⁷   6.21×10⁻⁴   2.20×10⁻³
    Fold 4    1.24×10⁻⁶   8.67×10⁻⁴   2.53×10⁻³
    Fold 5    6.54×10⁻⁷   5.79×10⁻⁴   2.21×10⁻³
    Average   8.36×10⁻⁷   6.76×10⁻⁴   2.27×10⁻³

Table 7.2 shows the results from 5-fold cross validation, which again confirm that the ANN model can give accurate predictions and generalises well to unseen data. The values of each fold are consistent with each other and with their average.


7.2 Heston Implied Volatility ANN Results

Figure 7.3 shows the learning curves of the Heston Implied Volatility ANN. Again, we use the MSE as the loss function. We can see that the validation loss curve experiences some fluctuations during training. We experimented with various ANN setups, and this is the least fluctuating learning curve we could achieve. Despite the fluctuations, the overall value of the validation loss declines to a satisfactorily low level, and the gap between the validation loss and training loss is negligible. Due to the comparatively more complex ANN architecture, it took about 200 epochs for the ANN to reach a satisfactory level of accuracy.

FIGURE 7.3: Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN

Table 7.3 summarises the error metrics of the Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 6.24×10⁻⁴, 1.27×10⁻², 3.71×10⁻¹ and 0.999 respectively. The low error metrics and high R² value imply that the ANN model generalises well and gives accurate predictions for the implied volatility surface under the Heston model. Figure 7.4 shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities. We can see that almost all of the predicted values are consistent with the actual values.

TABLE 7.3: Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data

    MSE         MAE         Max Abs Error   R²
    6.24×10⁻⁴   1.27×10⁻²   3.71×10⁻¹       0.999


FIGURE 7.4: Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.5: Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.5 shows the MAE values of the Heston Implied Volatility ANN predictions across the entire implied volatility surface. The largest MAE value is 0.0045, and a large area of the surface has MAE values lower than 0.0030. It is worth noting that the accuracy is relatively poor when the strike price is at the extremes and the maturity is small. However, the ANN gives good predictions overall for the whole surface.

The 5-fold cross validation results for the Heston Implied Volatility ANN model show consistent values. The results are summarised in Table 7.4. The consistent values indicate that the ANN model is well-trained and is able to generalise to unseen test data.

TABLE 7.4: 5-fold Cross Validation Results of Heston Implied Volatility ANN

              MSE         MAE         Max Abs Error
    Fold 1    8.15×10⁻⁴   1.13×10⁻²   3.88×10⁻¹
    Fold 2    4.74×10⁻⁴   1.08×10⁻²   3.13×10⁻¹
    Fold 3    1.37×10⁻³   1.74×10⁻²   4.46×10⁻¹
    Fold 4    3.38×10⁻⁴   8.61×10⁻³   2.19×10⁻¹
    Fold 5    4.60×10⁻⁴   1.02×10⁻²   2.67×10⁻¹
    Average   6.92×10⁻⁴   1.12×10⁻²   3.27×10⁻¹

7.3 Rough Heston Option Pricing ANN Results

Figure 7.6 shows the learning curves of the Rough Heston Option Pricing ANN. As before, MSE is used as the loss function. As training progresses, both the training loss and validation loss curves decline until they converge. Since this is a relatively simple model, it took less than 30 epochs of training to reach a good level of accuracy. The small discrepancy between the training loss and validation loss indicates that the ANN model is well-trained.

FIGURE 7.6: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN

Table 7.5 summarises the error metrics of the Rough Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 9.00×10⁻⁶, 2.28×10⁻³, 1.76×10⁻¹ and 0.999 respectively. These metrics imply that the ANN model generalises well and is able to give highly accurate predictions for option prices under the rough Heston model. Figure 7.7 attests to its high accuracy again: most of the predictions lie on the perfect-prediction line (red line).

TABLE 7.5: Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data

    MSE         MAE         Max Abs Error   R²
    9.00×10⁻⁶   2.28×10⁻³   1.76×10⁻¹       0.999


FIGURE 7.7: Plot of Predicted vs Actual values for Rough Heston Option Pricing ANN

TABLE 7.6: 5-fold Cross Validation Results of Rough Heston Option Pricing ANN

              MSE         MAE         Max Abs Error
    Fold 1    3.79×10⁻⁶   1.35×10⁻³   5.99×10⁻³
    Fold 2    2.33×10⁻⁶   9.96×10⁻⁴   4.89×10⁻³
    Fold 3    2.06×10⁻⁶   9.88×10⁻⁴   4.41×10⁻³
    Fold 4    2.16×10⁻⁶   1.08×10⁻³   4.07×10⁻³
    Fold 5    1.84×10⁻⁶   1.01×10⁻³   3.74×10⁻³
    Average   2.44×10⁻⁶   1.08×10⁻³   4.62×10⁻³

The highly consistent values from 5-fold cross validation again confirm that the ANN model is able to generalise to unseen data.

7.4 Rough Heston Implied Volatility ANN Results

Figure 7.8 shows the learning curves of the Rough Heston Implied Volatility ANN. We can see that the validation loss curve experiences some fluctuations during training; again, we select the least fluctuating setup. Despite the fluctuations, the overall value of the validation loss declines to a satisfactorily low level, and the discrepancy between validation loss and training loss is negligible. Initially, the training was set to 200 epochs, but it was stopped early at around epoch 85, since the validation loss had not improved for 25 epochs. It achieves a good accuracy nonetheless.


FIGURE 7.8: Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN

Table 7.7 summarises the error metrics of the Rough Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 5.66×10⁻⁴, 1.08×10⁻², 3.49×10⁻¹ and 0.999 respectively. These metrics show that the ANN model can produce accurate predictions on unseen data. This is again confirmed by Figure 7.9, which shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities. We can see that almost all of the predicted values are consistent with the actual values.

TABLE 7.7: Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data

    MSE         MAE         Max Abs Error   R²
    5.66×10⁻⁴   1.08×10⁻²   3.49×10⁻¹       0.999


FIGURE 7.9: Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.10: Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.10 shows the MAE values of the Rough Heston Implied Volatility ANN predictions across the entire implied volatility surface. The largest MAE value is below 0.0030, and a large area of the surface has MAE values lower than 0.0015. It is worth noting that the accuracy is relatively poor when the strike price and maturity values are small. However, the ANN gives good predictions overall for the whole surface. In fact, it is more accurate than the Heston Implied Volatility ANN model, despite having one extra input and a similar architecture. We attribute this entirely to the stochastic nature of the optimiser.

The 5-fold cross validation results for the Rough Heston Implied Volatility ANN model show consistent values, just like the previous three cases. The results are summarised in Table 7.8. The consistency in values indicates that the ANN model is well-trained and is able to generalise to unseen test data.

TABLE 7.8: 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN

              MSE         MAE         Max Abs Error
    Fold 1    8.15×10⁻⁴   1.13×10⁻²   3.88×10⁻¹
    Fold 2    4.74×10⁻⁴   1.08×10⁻²   3.13×10⁻¹
    Fold 3    1.37×10⁻³   1.74×10⁻²   4.46×10⁻¹
    Fold 4    3.38×10⁻⁴   8.61×10⁻³   2.19×10⁻¹
    Fold 5    4.60×10⁻⁴   1.02×10⁻²   2.67×10⁻¹
    Average   6.92×10⁻⁴   1.12×10⁻²   3.27×10⁻¹


7.5 Run-time Performance

In this section we present the training time and the execution speed of the ANN versus traditional methods. The speed tests are run on a machine with an Intel i5-7200U chip (up to 3.10 GHz) and 8GB of RAM.

Table 7.9 summarises the training time required for each application. We can see that, although the training time per epoch for the option pricing applications is higher than that for the implied volatility applications, the resulting total training time is similar for both types of applications. This is because the number of epochs for training the implied volatility ANN models is higher.

We also emphasise that the number of distinct rows of data available for implied volatility ANN training is smaller. Thus, the training time is similar to that of option pricing ANN training despite the more complex ANN architecture.

TABLE 7.9: Training Time

    Application                  Training Time per Epoch   Total Training Time
    Heston Option Pricing        30 s                      900 s
    Heston Implied Volatility    5 s                       1000 s
    rHeston Option Pricing       30 s                      900 s
    rHeston Implied Volatility   5 s                       1000 s

TABLE 7.10: Execution Time for Predictions using ANN and Traditional Methods

    Application                  Traditional Methods   ANN
    Heston Option Pricing        8.84 ms               57.9 ms
    Heston Implied Volatility    2.31 ms               43.7 ms
    rHeston Option Pricing       6.52 ms               52.6 ms
    rHeston Implied Volatility   2.87 ms               48.3 ms

Table 7.10 summarises the execution time for generating predictions using the ANN and traditional methods. Before the experiment, we expected the ANN to generate predictions faster. However, the results show that the ANN is actually up to 7 times slower than the traditional methods for option pricing (both Heston and rough Heston) and up to 18 times slower than the traditional methods for implied volatility surface approximation (both Heston and rough Heston). This shows that the traditional approximation methods are more efficient.


Chapter 8

Conclusions and Outlook

To summarise, the results show that the ANN has the ability to approximate option prices and implied volatility under the Heston and rough Heston models to a high degree of accuracy. However, the efficiency of the ANN is not as good as that of the traditional methods. Hence, we conclude that such an approach may therefore be considered an ineffective alternative. There are more efficient methods, such as numerical integration of the FFT for option price computation. The traditional method we used for approximating implied volatility does not even require any numerical schemes, hence its efficiency.

As for future projects, it would be interesting to investigate the proven rough Bergomi model and applications with exotic options such as lookback and digital options. Moreover, it would also be worthwhile to apply gradient-free optimisers for ANN training; some interesting examples include differential evolution, dual annealing and simplicial homology global optimisers.


Appendix A

pybind11: A step-by-step tutorial with examples

This tutorial assumes basic knowledge of C++ and Python.

Sources: Create a C++ extension for Python; Using the C++ eigen library to calculate matrix inverse and determinant; pybind11 Documentation; Eigen: The Matrix Class.

A.1 Software Prerequisites

• Visual Studio 2017 or later, with both the Desktop Development with C++ and Python Development workloads installed with default options.

• In the Python Development workload, also select the box on the right for Python native development tools. This option sets up most of the configuration required. (This option also includes the C++ workload automatically.)

Source: Create a C++ extension for Python


A.2 Create a Python Project

1. To create a Python project in Visual Studio, select File > New > Project. Search for Python, select the Python Application template, give it a suitable name and location, and select OK.

2. pybind11 requires that you use a 32-bit Python interpreter (Python 3.6 or above recommended). In the Solution Explorer window of Visual Studio, expand the project node, then expand the Python Environments node. If you don't see a 32-bit environment as the default (either in bold, or labeled with global default), then right-click the Python Environments node and select Add Environment, or select Add Environment from the environment drop-down in the Python toolbar.

Once in the Add Environment dialog box, select the Existing environment tab, then select the 32-bit interpreter from the Environment drop-down list.

If you don't have a 32-bit interpreter installed, you can install standard Python interpreters from the Add Environment dialog. Select the Add Environment command in the Python Environments window or the Python toolbar, select the Python installation tab, indicate which interpreters to install, and select Install.


If you already added an environment other than the global default to a project, you may need to activate a newly added environment. Right-click that environment under the Python Environments node and select Activate Environment. To remove an environment from the project, select Remove.


A.3 Create a C++ Project

1. A Visual Studio solution can contain both Python and C++ projects together. To do so, right-click the solution in Solution Explorer and select Add > New Project.

2. Search on C++, select Empty project, specify the name test, and select OK.

3. Create a C++ file in the new project by right-clicking the Source Files node, then select Add > New Item, select C++ File, name it main.cpp, and select OK.

4. Right-click the C++ project in Solution Explorer and select Properties.

5. At the top of the Property Pages dialog that appears, set Configuration to All Configurations and Platform to Win32.


6. Set the specific properties as described in the following table, then select OK.

7. Right-click the C++ project and select Build to test your configurations (both Debug and Release). The .pyd files are located in the solution folder under Debug and Release, not the C++ project folder itself.

8. Add the following code to the C++ project's main.cpp file:

    #include <iostream>

    int main() {
        std::cout << "Hello World from C++" << std::endl;
        return 0;
    }

9. Build the C++ project again to confirm the code is working.

A.4 Convert the C++ project to extension for Python

To make the C++ DLL into an extension for Python, first modify the exported methods to interact with Python types. Then add a function that exports the module (the main function).

1. Install pybind11 using pip:

    pip install pybind11

or

    py -m pip install pybind11

2. At the top of main.cpp, include pybind11.h:

    #include <pybind11/pybind11.h>


3. At the bottom of main.cpp, use the PYBIND11_MODULE macro to define the entry point to the C++ function:

    namespace py = pybind11;

    PYBIND11_MODULE(test, m) {
        m.def("main_func", &main, R"pbdoc(
            pybind11 simple example
        )pbdoc");

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

4. Set the target configuration to Release and build the C++ project to verify your code.

The C++ module may fail to compile for the following reasons:

• Unable to locate Python.h (E1696: cannot open source file "Python.h" and/or C1083: Cannot open include file: "Python.h": No such file or directory): verify that the path in C/C++ > General > Additional Include Directories in the project properties points to your Python installation's include folder. See the table in Step 6.

• Unable to locate Python libraries: verify that the path in Linker > General > Additional Library Directories in the project properties points to the Python installation's libs folder. See the table in Step 6.

• Linker errors related to target architecture: change the C++ target's project architecture to match that of your Python installation. For example, if you're targeting x64 with the C++ project but your Python installation is x86, change the C++ project to target x86.

A.5 Make the DLL available to Python

1. In Solution Explorer, right-click the References node in the Python project, and then select Add Reference. In the dialog that appears, select the Projects tab, select the test project, and then select OK.


2. Run the Visual Studio installer, select Modify, then select Individual Components > Compilers, build tools, and runtimes > Visual C++ 2015.3 v140 toolset. This step is necessary because Python (for Windows) is itself built with Visual Studio 2015 (version 14.0) and expects that those tools are available when building an extension through the method described here. (Note that you may need to install a 32-bit version of Python and target the DLL to Win32 and not x64.)


3. Create a file named setup.py in the C++ project by right-clicking the project and selecting Add > New Item. Then select C++ File (.cpp), name the file setup.py, and select OK (naming the file with the .py extension makes Visual Studio recognize it as Python despite using the C++ file template).

Then paste the following code into the setup.py file:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++',
                '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add pybind11 folder and Python include folder
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

4. The setup.py code instructs Python to build the extension using the Visual Studio 2015 C++ toolset when used from the command line.

Open an elevated command prompt, navigate to the folder that contains setup.py, and enter the following command:

    python setup.py install

A.6 Call the DLL from Python

After the DLL is made available to Python, we can now call test.main_func() from Python code.

1. Add the following lines in your .py file to call methods exported from the DLL:


    import test

    if __name__ == "__main__":
        test.main_func()

2. Run the Python program (Debug > Start without Debugging). The output should look like the following:

A.7 Example: Store C++ Eigen Matrix as NumPy Arrays

This example requires the use of Eigen, a C++ template library. Due to its popularity, pybind11 provides transparent conversion and limited mapping support between Eigen and scientific Python linear algebra data types.

1. If you have not already, visit http://eigen.tuxfamily.org/index.php?title=Main_Page and click on zip to download the zip file.

Once the download is complete, decompress the zip file in a suitable location. Note down the file path containing the folders as shown below.


2. In the Solution Explorer, right-click the C++ project and select Properties. Navigate to the C/C++ > General tab, add the Eigen path to the Additional Include Directories property, then select OK.

3. Right-click the C++ project and select Build to test your configurations (both Debug and Release).

4. Replace the code in the main.cpp file with the following code:

    #include <iostream>

    // pybind11
    #include <pybind11/pybind11.h>
    #include <pybind11/eigen.h>

    // Eigen
    #include <Eigen/Dense>

    namespace py = pybind11;

    Eigen::Matrix3f example_mat(const int a,
                                const int b, const int c) {
        Eigen::Matrix3f mat;
        mat << 5 * a, 1 * b, 2 * c,
               2 * a, 4 * b, 2 * c,
               1 * a, 3 * b, 3 * c;

        return mat;
    }

    Eigen::MatrixXd inv(Eigen::MatrixXd xs) {
        return xs.inverse();
    }

    double det(Eigen::MatrixXd xs) {
        return xs.determinant();
    }

    PYBIND11_MODULE(test, m) {
        m.def("example_mat", &example_mat);
        m.def("inv", &inv);
        m.def("det", &det);

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

Note that Eigen arrays are automatically converted to NumPy arrays simply by including the pybind11/eigen.h header (see the #include <pybind11/eigen.h> line in the code above). A C++ function that has an Eigen return type can be used directly as a NumPy data type in Python.
Build the C++ project to check that the code works correctly.

5. Include the Eigen library's directory. Replace the code in setup.py with the following code:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++',
                '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add pybind11 folder, Python include folder and Eigen folder
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                      r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

As before, in an elevated command prompt, navigate to the folder that contains setup.py and enter the following command:

python setup.py install

6. Then paste the following code in the Python file:

import test
import numpy as np

if __name__ == '__main__':
    A = test.example_mat(1, 3, 1)
    print("Matrix A is:\n", A)
    print("The inverse of A is:\n", test.inv(A))
    print("The determinant of A is:", test.det(A))


The output should be:

[Figure: console output showing matrix A, its inverse and its determinant]

A.8 Example: Store Data Generated by Analytical Heston Model as NumPy Arrays

Only the relevant parts of the code are displayed here. To request the code for the entire project, contact the author at cko22@bath.edu.

This example demonstrates how to use pybind11 for:

• a C++ project with multiple files;
• storing a C++ Eigen matrix as NumPy arrays.

1. For a C++ project with multiple files, we need to include all the source files in setup.py. This example also uses the Eigen library, so its directory will also be included. Paste the following code in setup.py:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'Data_Generation',
    sources=[r'module.cpp', r'Data.cpp', r'Heston.cpp',
             r'IntegrationScheme.cpp', r'IntegratorImp.cpp',
             r'NumIntegrator.cpp', r'pdetfunction.cpp',
             r'pfunction.cpp', r'Range.cpp'],
    include_dirs=[r'pybind11/include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                  r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='Data_Generation',
    version='1.0',
    description='Python package with Data_Generation C++ extension (PyBind11)',
    license='MIT',
    ext_modules=[sfc_module],
)

Then, in a command prompt, go to the folder that contains setup.py and enter the following command:

python setup.py install

2. The code in module.cpp is:

// For analytic solution in case of Heston model
#include "Heston.hpp"
#include <ctime>
#include "Data.hpp"
#include <future>

// Inputting and Outputting
#include <iostream>
#include <vector>

// pybind11
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>
#include <pybind11/stl_bind.h>

// Eigen
#include <Eigen/Dense>

namespace py = pybind11;
using namespace Eigen;   // for Map, VectorXd, MatrixXd below

struct CPPData {
    // Input Data
    std::vector<double> spot_price;
    std::vector<double> strike_price;
    std::vector<double> risk_free_rate;
    std::vector<double> dividend_yield;
    std::vector<double> initial_vol;
    std::vector<double> maturity_time;
    std::vector<double> long_term_vol;
    std::vector<double> mean_reversion_rate;
    std::vector<double> vol_vol;
    std::vector<double> price_vol_corr;

    // Output Data
    std::vector<double> option_price;
};

// ... do something ...

Eigen::MatrixXd module() {
    CPPData cppdata;
    simulation(cppdata);

    // Convert each std::vector to an Eigen vector first
    // Input Data
    Map<Eigen::VectorXd> eigen_spot_price(cppdata.spot_price.data(), cppdata.spot_price.size());
    Map<Eigen::VectorXd> eigen_strike_price(cppdata.strike_price.data(), cppdata.strike_price.size());
    Map<Eigen::VectorXd> eigen_risk_free_rate(cppdata.risk_free_rate.data(), cppdata.risk_free_rate.size());
    Map<Eigen::VectorXd> eigen_dividend_yield(cppdata.dividend_yield.data(), cppdata.dividend_yield.size());
    Map<Eigen::VectorXd> eigen_initial_vol(cppdata.initial_vol.data(), cppdata.initial_vol.size());
    Map<Eigen::VectorXd> eigen_maturity_time(cppdata.maturity_time.data(), cppdata.maturity_time.size());
    Map<Eigen::VectorXd> eigen_long_term_vol(cppdata.long_term_vol.data(), cppdata.long_term_vol.size());
    Map<Eigen::VectorXd> eigen_mean_reversion_rate(cppdata.mean_reversion_rate.data(), cppdata.mean_reversion_rate.size());
    Map<Eigen::VectorXd> eigen_vol_vol(cppdata.vol_vol.data(), cppdata.vol_vol.size());
    Map<Eigen::VectorXd> eigen_price_vol_corr(cppdata.price_vol_corr.data(), cppdata.price_vol_corr.size());
    // Output Data
    Map<Eigen::VectorXd> eigen_option_price(cppdata.option_price.data(), cppdata.option_price.size());

    const int cols = 11;          // No. of columns
    const int rows = SIZE * 4;    // No. of rows
    // Define a matrix that stores training data to be used in Python
    MatrixXd training(rows, cols);
    // Initialise each column using the Eigen vectors
    training.col(0) = eigen_spot_price;
    training.col(1) = eigen_strike_price;
    training.col(2) = eigen_risk_free_rate;
    training.col(3) = eigen_dividend_yield;
    training.col(4) = eigen_initial_vol;
    training.col(5) = eigen_maturity_time;
    training.col(6) = eigen_long_term_vol;
    training.col(7) = eigen_mean_reversion_rate;
    training.col(8) = eigen_vol_vol;
    training.col(9) = eigen_price_vol_corr;
    training.col(10) = eigen_option_price;

    std::cout << training << std::endl;

    return training;
}

PYBIND11_MODULE(Data_Generation, m) {
    m.def("module", &module);

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

Note that the module function's return type is an Eigen matrix. This can then be used directly as a NumPy data type in Python.

3. The C++ function can then be called from Python as shown:

import Data_Generation as dg
import mysql.connector
import pandas as pd
import sqlalchemy as db
import time

# This program calls the analytical Heston option pricer in C++ to
# simulate 5,000,000 option prices and store them in a MySQL database.
# SQL_clean_table() and SQL_setup() are defined in the omitted sections.

# ... do something ...

if __name__ == '__main__':
    t0 = time.time()

    # A new table (optional)
    SQL_clean_table()

    target_size = 5000000   # Final table size
    batch_size = 500000     # Size generated each iteration

    # Get the number of rows that already exist in the table
    r = SQL_setup(batch_size)

    # Get the number of iterations
    iter = int(target_size / batch_size) - int(r / batch_size)
    print("Iteration(s) required:", iter)
    count = 0

    print("Now Run C++ code")
    for i in range(iter):
        count = count + 1
        print("----------------------------------------------------")
        print("Current iteration:", count)
        cpp = dg.module()   # store data from C++

        # ... do something ...

        r1 = SQL_setup(batch_size)
        print("No. of rows in SQL Classic Heston Table Now:", r1)
        print("\n")
    t1 = time.time()
    # Calculate total time for the entire program
    total = t1 - t0
    print("THE ENTIRE PROGRAM TAKES", round(total), "s TO COMPLETE")


Appendix B

Asymptotic Expansions for General Stochastic Volatility Models

B.1 Asymptotic Expansions

We present the asymptotic expansions for general stochastic volatility models by (Lorig et al., 2019). Consider a strictly positive spot price S whose risk-neutral dynamics are given by

S = e^X,  (B.1)

dX = -\frac{1}{2}\sigma^2(t, X, Y)\,dt + \sigma(t, X, Y)\,dW_1, \quad X(0) = x \in \mathbb{R},  (B.2)

dY = f(t, X, Y)\,dt + \beta(t, X, Y)\,dW_2, \quad Y(0) = y \in \mathbb{R},  (B.3)

\langle dW_1, dW_2 \rangle = \rho(t, X, Y)\,dt, \quad |\rho| < 1.  (B.4)

The drift of X is selected to be $-\frac{1}{2}\sigma^2$ so that $S = e^X$ is a martingale.

Let C(t) be the time-t value of a European call option that matures at time T > t with payoff function P(X(T)). To value a European-style option using risk-neutral pricing, we must compute functions of the form

u(t, x, y) = \mathbb{E}[P(X(T)) \mid X(t) = x, Y(t) = y].

It is well known that the function u satisfies the Kolmogorov backward equation (Lorig et al., 2019)

(\partial_t + \mathcal{A}(t))u(t, x, y) = 0, \quad u(T, x, y) = P(x),  (B.5)

where the operator \mathcal{A}(t) is given explicitly by

\mathcal{A}(t) = a(t, x, y)(\partial_x^2 - \partial_x) + f(t, x, y)\partial_y + b(t, x, y)\partial_y^2 + c(t, x, y)\partial_x\partial_y,  (B.6)

and where the functions a, b and c are defined as

a(t, x, y) = \frac{1}{2}\sigma^2(t, x, y), \quad b(t, x, y) = \frac{1}{2}\beta^2(t, x, y), \quad c(t, x, y) = \rho(t, x, y)\sigma(t, x, y)\beta(t, x, y).


Appendix C

Deep Neural Networks Library in Python: Keras

We show the code snippet for computing the ANN in Keras. The Python code for defining the ANN architecture for the rough Heston implied volatility approximation is as follows:

# Define the neural network architecture
import keras
from keras import backend as K
keras.backend.set_floatx('float64')
from keras.callbacks import EarlyStopping

# Construct the model; n is the number of input features, defined elsewhere
input_layer = keras.layers.Input(shape=(n,))

hidden_layer1 = keras.layers.Dense(80, activation='elu')(input_layer)
hidden_layer2 = keras.layers.Dense(80, activation='elu')(hidden_layer1)
hidden_layer3 = keras.layers.Dense(80, activation='elu')(hidden_layer2)

output_layer = keras.layers.Dense(100, activation='linear')(hidden_layer3)

model = keras.models.Model(inputs=input_layer, outputs=output_layer)

# Summarise layers
model.summary()

# User-defined metrics: R2 and max absolute error
# R2
def coeff_determination(y_true, y_pred):
    SS_res = K.sum(K.square(y_true - y_pred))
    SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return 1 - SS_res / (SS_tot + K.epsilon())

# Max error
def max_error(y_true, y_pred):
    return K.max(K.abs(y_true - y_pred))

earlystop = EarlyStopping(monitor='val_loss',
                          min_delta=0,
                          mode='min',
                          verbose=1,
                          patience=25)

# Prepare the model for training; the optimizer object is passed to
# compile() so that the chosen learning rate is actually used
opt = keras.optimizers.Adam(learning_rate=0.005)
model.compile(loss='mse', optimizer=opt,
              metrics=['mae', coeff_determination, max_error])
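The earlystop callback defined above takes effect only when passed to the training call. A minimal sketch of the training step, assuming the training and validation arrays x_train, y_train, x_val and y_val have been prepared elsewhere (these names, and the batch size and epoch count, are placeholders, not values from the thesis):

    # Train with early stopping on the validation loss
    history = model.fit(x_train, y_train,
                        validation_data=(x_val, y_val),
                        batch_size=1024,
                        epochs=200,
                        callbacks=[earlystop],
                        verbose=1)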


Bibliography

Alos, Elisa, et al. (June 2017). "Exponentiation of Conditional Expectations Under Stochastic Volatility". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2983180.

Andersen, Leif, et al. (Jan 2007). "Moment Explosions in Stochastic Volatility Models". In: Finance and Stochastics 11, pp. 29–50. DOI: 10.1007/s00780-006-0011-7.

Atkinson, Colin, et al. (Jan 2011). "Rational Solutions for the Time-Fractional Diffusion Equation". URL: https://www.researchgate.net/publication/220222966_Rational_Solutions_for_the_Time-Fractional_Diffusion_Equation.

Bayer, Christian, et al. (Oct 2018). "Deep calibration of rough stochastic volatility models". URL: https://arxiv.org/abs/1810.03399.

Black, Fischer, et al. (May 1973). "The Pricing of Options and Corporate Liabilities". URL: https://www.cs.princeton.edu/courses/archive/fall09/cos323/papers/black_scholes73.pdf.

Carr, Peter and Dilip B. Madan (Mar 1999). "Option valuation using the fast Fourier transform". In: Journal of Computational Finance. URL: https://www.researchgate.net/publication/2519144_Option_Valuation_Using_the_Fast_Fourier_Transform.

Chapter 2 Modeling Process: k-fold cross validation. https://bradleyboehmke.github.io/HOML/process.html. Accessed: 2020-08-22.

Comte, Fabienne, et al. (Oct 1998). "Long Memory In Continuous-Time Stochastic Volatility Models". In: Mathematical Finance 8, pp. 291–323. URL: https://zulfahmed.files.wordpress.com/2015/10/comterenault19981.pdf.

Cox, John C., et al. (Jan 1985). "A Theory of the Term Structure of Interest Rates". URL: http://pages.stern.nyu.edu/~dbackus/BCZ/discrete_time/CIR_Econometrica_85.pdf.

Create a C++ extension for Python. https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019. Accessed: 2020-07-21.

Duffy, Daniel J. (Jan 2018). Financial Instrument Pricing Using C++, Second Edition. John Wiley & Sons. ISBN: 978-0-470-97119-2.

Duffy, Daniel J., et al. (2012). Monte Carlo Frameworks: Building customisable high-performance C++ applications. John Wiley & Sons, Inc. ISBN: 9780470060698.

Dupire, Bruno (1994). "Pricing with a Smile". In: Risk Magazine, pp. 18–20.

Eigen: The Matrix Class. https://eigen.tuxfamily.org/dox/group__TutorialMatrixClass.html. Accessed: 2020-07-22.

Eldan, Ronen, et al. (May 2016). "The Power of Depth for Feedforward Neural Networks". URL: https://arxiv.org/abs/1512.03965.

Euch, Omar El, et al. (Sept 2016). "The characteristic function of rough Heston models". URL: https://arxiv.org/pdf/1609.02108.pdf.

— (Mar 2019). "Roughening Heston". URL: https://ssrn.com/abstract=3116887.

Fouque, Jean-Pierre, et al. (Aug 2012). "Second Order Multiscale Stochastic Volatility Asymptotics: Stochastic Terminal Layer Analysis & Calibration". URL: https://arxiv.org/abs/1208.5802.

Gatheral, Jim (2006). The Volatility Surface: A Practitioner's Guide. John Wiley & Sons, Inc. ISBN: 978-0-471-79251-2.

Gatheral, Jim, et al. (Oct 2014). "Volatility is rough". URL: https://arxiv.org/abs/1410.3394.

— (Jan 2019). "Rational approximation of the rough Heston solution". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3191578.

Hebb, Donald O. (1949). The Organization of Behavior: A Neuropsychological Theory. Psychology Press, 1st edition (12 Jun 2002). ISBN: 978-0805843002.

Heston, Steven (Jan 1993). "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options". URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.3204&rep=rep1&type=pdf.

Hinton, Geoffrey, et al. (Feb 2015). "Overview of mini-batch gradient descent". URL: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf.

Hornik, et al. (Mar 1989). "Multilayer feedforward networks are universal approximators". URL: https://deeplearning.cs.cmu.edu/F20/document/readings/Hornik_Stinchcombe_White.pdf.

— (Jan 1990). "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks". URL: http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/univ_approx.pdf.

Horvath, Blanka, et al. (Jan 2019). "Deep Learning Volatility". URL: https://ssrn.com/abstract=3322085.

Hurst, Harold E. (Jan 1951). "Long-term storage of reservoirs: an experimental study". URL: https://www.jstor.org/stable/2982267.

Hutchinson, James M., et al. (July 1994). "A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks". URL: https://alo.mit.edu/wp-content/uploads/2015/06/A-Nonparametric-Approach-to-Pricing-and-Hedging-Derivative-Securities-via-Learning-Networks.pdf.

Ioffe, Sergey, et al. (Feb 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". URL: https://arxiv.org/abs/1502.03167.

Itkin, Andrey (Jan 2010). "Pricing options with VG model using FFT". URL: https://arxiv.org/abs/physics/0503137.

Kahl, Christian, et al. (Jan 2006). "Not-so-complex Logarithms in the Heston model". URL: http://www2.math.uni-wuppertal.de/~kahl/publications/NotSoComplexLogarithmsInTheHestonModel.pdf.

Kingma, Diederik P., et al. (Feb 2015). "Adam: A Method for Stochastic Optimization". URL: https://arxiv.org/abs/1412.6980.

Kwok, Yue-Kuen, et al. (2012). Handbook of Computational Finance, Chapter 21. Springer-Verlag Berlin Heidelberg. ISBN: 978-3-642-17254-0.

Lewis, Alan (Sept 2001). "A Simple Option Formula for General Jump-Diffusion and Other Exponential Levy Processes". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=282110.

Liu, Shuaiqiang, et al. (Apr 2019a). "A neural network-based framework for financial model calibration". URL: https://arxiv.org/abs/1904.10523.

— (Apr 2019b). "Pricing options and computing implied volatilities using neural networks". URL: https://arxiv.org/abs/1901.08943.

Lorig, Matthew, et al. (Jan 2019). "Explicit implied vols for multifactor local-stochastic vol models". URL: https://arxiv.org/abs/1306.5447v3.

Mandara, Dalvir (Sept 2019). "Artificial Neural Networks for Black-Scholes Option Pricing and Prediction of Implied Volatility for the SABR Stochastic Volatility Model". URL: https://www.datasim.nl/application/files/8115704549291423101.pdf.

McCulloch, Warren, et al. (May 1943). "A Logical Calculus of the Ideas Immanent in Nervous Activity". URL: http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf.

Mostafa, F., et al. (May 2008). "A neural network approach to option pricing". URL: https://www.witpress.com/elibrary/wit-transactions-on-information-and-communication-technologies/41/18904.

Peters, Edgar E. (Jan 1991). Chaos and Order in the Capital Markets: A New View of Cycles, Prices and Market Volatility. John Wiley & Sons. ISBN: 978-0-471-13938-6.

— (Jan 1994). Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons. ISBN: 978-0-471-58524-4. URL: https://www.amazon.co.uk/Fractal-Market-Analysis-Investment-Economics/dp/0471585246.

pybind11 Documentation. https://pybind11.readthedocs.io/en/stable/advanced/cast/eigen.html. Accessed: 2020-07-22.

Ruf, Johannes, et al. (May 2020). "Neural networks for option pricing and hedging: a literature review". URL: https://arxiv.org/abs/1911.05620.

Schmelzle, Martin (Apr 2010). "Option Pricing Formulae using Fourier Transform: Theory and Application". URL: https://pfadintegral.com/docs/Schmelzle2010%20Fourier%20Pricing.pdf.

Shevchenko, Georgiy (Jan 2015). "Fractional Brownian motion in a nutshell". In: International Journal of Modern Physics: Conference Series 36. URL: https://www.worldscientific.com/doi/abs/10.1142/S2010194515600022.

Stone, Henry (July 2019). "Calibrating Rough Volatility Models: A Convolutional Neural Network Approach". URL: https://arxiv.org/abs/1812.05315.

Using the C++ eigen library to calculate matrix inverse and determinant. http://people.duke.edu/~ccc14/sta-663-2016/18G_C++_Python_pybind11.html. Accessed: 2020-07-22.

Wilmott, Paul (2006). Paul Wilmott on Quantitative Finance, Vol. 3, 2nd Edition. John Wiley & Sons, Inc. ISBN: 978-0-470-01870-5.

• Declaration of Authorship
• Abstract
• Acknowledgements
• Projects Overview
• Introduction
  • Organisation of This Thesis
• Literature Review
  • Application of ANN in Finance
  • Option Pricing Models
  • The Research Gap
• Option Pricing Models
  • The Heston Model
    • Option Pricing Under Heston Model
    • Heston Model Implied Volatility Approximation
  • Volatility is Rough
  • The Rough Heston Model
    • Rough Heston Pricing Equation
      • Rational Approximation of Rough Heston Riccati Solution
      • Option Pricing Using Fast Fourier Transform
    • Rough Heston Model Implied Volatility
• Artificial Neural Networks
  • Dissecting the Artificial Neural Networks
    • Neurons
    • Input, Output and Hidden Layers
    • Connections, Weights and Biases
    • Forward and Backward Propagation, Loss Functions
    • Activation Functions
    • Optimisation Algorithms and Learning Rate
    • Epochs, Early Stopping and Batch Size
  • Feedforward Neural Networks
  • Deep Neural Networks
  • Common Pitfalls and Remedies
    • Loss Function Stagnation
    • Loss Function Fluctuation
    • Over-fitting
    • Under-fitting
  • Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation
• Data
  • Data Generation and Storage
    • Heston Call Option Data
    • Heston Implied Volatility Data
    • Rough Heston Call Option Data
    • Rough Heston Implied Volatility Data
  • Data Pre-processing
    • Data Splitting: Train - Validate - Test
    • Data Standardisation and Normalisation
    • Data Partitioning
• Neural Networks Architecture and Experimental Setup
  • Neural Networks Architecture
    • ANN Architecture: Option Pricing under Heston Model
    • ANN Architecture: Implied Volatility Surface of Heston Model
    • ANN Architecture: Option Pricing under Rough Heston Model
    • ANN Architecture: Implied Volatility Surface of Rough Heston Model
  • Performance Metrics and Validation Methods
  • Implementation of ANN in Keras
  • Summary of Neural Networks Architecture
• Results and Discussions
  • Heston Option Pricing ANN Results
  • Heston Implied Volatility ANN Results
  • Rough Heston Option Pricing ANN Results
  • Rough Heston Implied Volatility ANN Results
  • Run-time Performance
• Conclusions and Outlook
• pybind11: A step-by-step tutorial with examples
  • Software Prerequisites
  • Create a Python Project
  • Create a C++ Project
  • Convert the C++ project to extension for Python
  • Make the DLL available to Python
  • Call the DLL from Python
  • Example: Store C++ Eigen Matrix as NumPy Arrays
  • Example: Store Data Generated by Analytical Heston Model as NumPy Arrays
• Asymptotic Expansions for General Stochastic Volatility Models
  • Asymptotic Expansions
• Deep Neural Networks Library in Python: Keras
• Bibliography

Chapter 1

Introduction

Option pricing has always been a major research area in financial mathematics. Various methods and techniques have been developed to price options more accurately and more quickly since the introduction of the well-known Black-Scholes model in 1973 (Black et al., 1973). The Black-Scholes model has an analytical solution and only one parameter, the volatility, that requires calibration. This makes it very easy to implement. Its simplicity comes at the expense of many unrealistic assumptions. One of its biggest flaws is the assumption of a constant volatility term. This assumption simply does not hold in the real market. With this assumption, the Black-Scholes model fails to capture the volatility dynamics exhibited by the market and so is unable to value options accurately.

In 1993, (Heston, 1993) developed a stochastic volatility model in an attempt to better capture the volatility dynamics. The Heston model assumes the volatility of an underlying asset to follow a geometric Brownian motion (Hurst parameter H = 1/2). Similarly, the Heston model also has a closed-form analytical solution, and this makes it a strong alternative to the Black-Scholes model.

Whilst the Heston model is more realistic than the Black-Scholes model, the generated option prices do not match the observed vanilla option prices consistently (Gatheral et al., 2014). Recent research shows that more accurate option prices can be estimated if the volatility process is modelled as a fractional Brownian motion with H < 1/2. When H < 1/2, the volatility process path appears rougher than when H ≥ 1/2, and hence the volatility is described as rough.

(Euch et al., 2019) introduce the rough volatility version of the Heston model, which combines the best of both worlds. However, the calibration of the rough Heston model relies on the notoriously slow Monte Carlo method due to its non-Markovian nature. This obstacle is the reason that rough Heston struggles to find its way into industrial applications. Although there exist many approximation methods for the rough Heston model to help solve this problem, we are interested in the feasibility of machine learning algorithms.

Recent advancements of machine learning algorithms could potentially provide an alternative means for this problem. Specifically, artificial neural networks (ANN) offer the best hope because they are capable of learning complex functional relationships between input and output data and then making predictions instantly.


In this project, our objective is to investigate the accuracy and run-time performance of ANN on European call option price predictions and implied volatility approximations under the rough Heston model. We start off by using the Heston model as a concept validation, then proceed to its rough volatility version. For regulatory purposes, we use data simulated by option pricing models for training.

1.1 Organisation of This Thesis

This thesis is organised as follows. In Chapter 2 we review some of the related work and provide justifications for our research. In Chapter 3 we discuss the theory of the option pricing models of our interest. We provide a brief introduction to ANN and some techniques to improve their performance in Chapter 4. Then we explain the tools we used for the data generation process and some useful techniques to speed up the process in Chapter 5. In the same chapter we also discuss the important data pre-processing procedures. Chapter 6 shows our network architecture and experimental setups. We present the results and discuss our findings in Chapter 7. Finally, we conclude our findings and discuss potential future work in Chapter 8.

FIGURE 1.1: Problem Solving Process. (Flowchart: Start, Problem: the rough Heston model sees low industrial application despite giving accurate option prices; Root cause: slow calibration (pricing and implied volatility approximation) process; Potential solution: apply ANN to pricing and implied volatility approximation; Implementation: ANN computation using Keras in Python; Evaluation: accuracy and speed of ANN predictions; End, Conclusions: whether ANN is feasible or not.)


Chapter 2

Literature Review

In this chapter we provide a review of the current literature and highlight the research gap. The focus of our review is the application of ANN to option pricing models. We justify our methodology and our choice of option pricing models, and we also provide some critiques.

2.1 Application of ANN in Finance

Option pricing has always been a hot topic in academia and the financial industry. Researchers have been looking for various means to price options more accurately and quickly. As computational technology advances, the application of ANN has gained popularity in solving complex problems in many different industries. In quantitative finance, the idea of utilising ANN for pricing derivatives comes from its ability to make fast and accurate predictions once it is trained.

The application of ANN in option pricing began with (Hutchinson et al., 1994). The authors were the pioneers in applying ANN for pricing and hedging derivatives. Since then, there have been more than 100 research papers on this topic. It is also worth noting that the complexity of the ANN architectures used in these research papers has increased over the years. Whilst there has been much research on the applications of ANN in option pricing, only about 10% of papers focus on implied volatility (Ruf et al., 2020). The feasibility of using ANN for implied volatility approximation is equally as important as for option pricing, since they are both part of the calibration process. Recently, more research papers focus on implied volatility and calibration, such as (Mostafa et al., 2008), (Liu et al., 2019a), (Liu et al., 2019b) and (Horvath et al., 2019). A key contribution of (Horvath et al., 2019)'s work is the introduction of the image-based implicit training method for implied volatility surface approximation. This method is allegedly robust and fast for approximating the entire implied volatility surface.

A significant portion of the research papers use real market data in their research. Currently, there is no standardised framework for deciding the architecture and parameters of an ANN. Hence, training the ANN directly with market data might pose some issues when it comes to regulatory compliance. (Horvath et al., 2019) is one of the few papers that uses simulated data for ANN training. The simulated data is generated from the Heston and rough Bergomi models. Using simulated data removes the regulatory concerns, as the functional mapping from model parameters to option price is known and tractable.

The application of ANN requires splitting the entire data set for training, validation and testing. Most of the papers that use real market data ensure the data splitting process is chronological, to preserve the time series structure of the data. This is not a concern in our case, as we are using simulated data.

The results in (Horvath et al., 2019) show very accurate predictions. However, after examining the source code, we discover that the ANN is validated and tested on the same data set. Thus, the high accuracy results do not give any information about how well the trained ANN generalises to unseen data. In our experiment we will use different data sets for validation and testing.

Common error metrics used include the mean absolute error (MAE), mean absolute percentage error (MAPE) and mean squared error (MSE). We suggest a new metric, the maximum absolute error, as it is useful from a risk management perspective for quantifying the largest prediction error in the worst-case scenario.

2.2 Option Pricing Models

In terms of the choice of option pricing models, over 80% of the papers select the Black-Scholes model or its extensions in their research (Ruf et al., 2020). As mentioned before, there are other models that can give better option prices than Black-Scholes, such as the stochastic volatility models. (Liu et al., 2019b) investigated the application of ANN in the Heston and Bates models.

According to (Gatheral et al., 2014), modelling the volatility as rough volatility can capture the volatility dynamics in the market better. Rough volatility models struggle to find their way into the industry due to their slow calibration process. Hence, it is natural for the direction of ANN research to move towards rough volatility models.

Some of the recent examples include (Stone, 2019), (Bayer et al., 2018) and (Horvath et al., 2019). (Stone, 2019) investigates the calibration of the rough Bergomi model using Convolutional Neural Networks (CNN), whilst (Bayer et al., 2018) study the calibration of the Heston and rough Bergomi models using Deep Neural Networks (DNN). Contrary to these papers, where direct calibration is performed via ANN in one step, (Horvath et al., 2019) split the calibration of the rough Bergomi model into two steps. Firstly, the ANN is trained to learn the pricing equation during a so-called offline session. The trained ANN is then deterministic and stored for the online calibration session later. This approach can speed up the online calibration by a huge margin.

2.3 The Research Gap

In our thesis we will be using the approach in (Horvath et al., 2019), that is, to train the network to learn the pricing and implied volatility functional mappings separately. We would leave the model calibration step as the future outlook of this project. We choose to work with the rough Heston model, another popular rough volatility model that has not been investigated yet. We would adapt the image-based implicit training method in our research. To re-iterate, we would validate and test our ANN with different data sets to obtain a true measure of the ANN's performance on unseen data.


Chapter 3

Option Pricing Models

In this chapter we discuss the option pricing models of our interest: the Heston and rough Heston models. The idea of modelling volatility as a rough fractional Brownian motion will also be presented. Then we derive their pricing and implied volatility equations and also explain their implementations for data generation.

The Black-Scholes model is simple but unrealistic for real-world applications, as its assumptions simply do not hold. One of its greatest criticisms is the assumption of constant volatility of the underlying asset price. (Dupire, 1994) extended the Black-Scholes framework to give local volatility models. He modelled the volatility as a deterministic function of the underlying price and time. The function is selected such that its outputs match observed European option prices. This extension is time-inhomogeneous and the dynamics it exhibits are still unrealistic. Thus it is not able to produce future volatility surfaces that emulate our observations.

Stochastic Volatility (SV) models were introduced, in which the volatility of the underlying is modelled as a geometric Brownian motion. Recent research shows that rough volatility can capture the true volatility dynamics better, and so this leads to the rough volatility models.

3.1 The Heston Model

One of the most popular SV models is the one proposed by (Heston, 1993); its popularity is due to the fact that its pricing equation for European options is analytically tractable. This property allows calibration procedures to be done efficiently.

The Heston model for a one-dimensional spot price S follows a stochastic process

dS = \mu S\,dt + \sqrt{\nu}\,S\,dW_1,  (3.1)

and its instantaneous variance \nu follows a Cox-Ingersoll-Ross (CIR) process (Cox et al., 1985):

d\nu = \kappa(\theta - \nu)\,dt + \xi\sqrt{\nu}\,dW_2,  (3.2)

\langle dW_1, dW_2 \rangle = \rho\,dt,  (3.3)

where

• \mu is the (deterministic) rate of return of the spot;
• \theta is the long-term mean volatility;
• \xi is the volatility of volatility;
• \kappa is the mean reversion rate of volatility;
• \rho is the correlation between spot and volatility moves;
• W_1 and W_2 are Wiener processes.

A minimal simulation sketch of these dynamics is given below.
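The following is a minimal Euler-Maruyama discretisation sketch of equations (3.1)-(3.3), using full truncation to keep the variance non-negative; all parameter values are illustrative only, not calibrated:

    import numpy as np

    def simulate_heston_path(S0=100.0, v0=0.04, mu=0.0, kappa=2.0,
                             theta=0.04, xi=0.3, rho=-0.7, T=1.0, n_steps=252):
        # Euler-Maruyama discretisation of the Heston SDEs (3.1)-(3.2)
        dt = T / n_steps
        S = np.empty(n_steps + 1)
        v = np.empty(n_steps + 1)
        S[0], v[0] = S0, v0
        for i in range(n_steps):
            z1 = np.random.standard_normal()
            # correlated Brownian increments, equation (3.3)
            z2 = rho * z1 + np.sqrt(1.0 - rho**2) * np.random.standard_normal()
            v_pos = max(v[i], 0.0)  # full truncation keeps the variance usable
            S[i+1] = S[i] + mu * S[i] * dt + np.sqrt(v_pos * dt) * S[i] * z1
            v[i+1] = v[i] + kappa * (theta - v_pos) * dt + xi * np.sqrt(v_pos * dt) * z2
        return S, v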

3.1.1 Option Pricing Under Heston Model

In this section we follow (Gatheral, 2006) closely. We start by forming a portfolio \Pi containing the option being priced V, a quantity -\Delta of the stock, and a quantity -\Delta_1 of another asset whose value is V_1:

\Pi = V - \Delta S - \Delta_1 V_1.

The change in the portfolio value during dt is

d\Pi = \left[\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial \nu^2}\right]dt
- \Delta_1\left[\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V_1}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V_1}{\partial \nu^2}\right]dt
+ \left[\frac{\partial V}{\partial S} - \Delta_1\frac{\partial V_1}{\partial S} - \Delta\right]dS + \left[\frac{\partial V}{\partial \nu} - \Delta_1\frac{\partial V_1}{\partial \nu}\right]d\nu.

Note that the explicit dependence of S and \nu on t has been removed for simplicity. We can make the portfolio instantaneously risk-free by eliminating the dS and d\nu terms:

\frac{\partial V}{\partial S} - \Delta_1\frac{\partial V_1}{\partial S} - \Delta = 0

and

\frac{\partial V}{\partial \nu} - \Delta_1\frac{\partial V_1}{\partial \nu} = 0.

So we are left with

d\Pi = \left[\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial \nu^2}\right]dt
- \Delta_1\left[\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V_1}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V_1}{\partial \nu^2}\right]dt
= r\Pi\,dt
= r(V - \Delta S - \Delta_1 V_1)\,dt.


Then we apply the no-arbitrage principle: the return of a risk-free portfolio must be equal to the risk-free rate. We assume the risk-free rate to be deterministic for our case.

Rearranging the above equation by collecting V terms on the left-hand side and V_1 terms on the right-hand side gives

\frac{\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV}{\frac{\partial V}{\partial \nu}}
= \frac{\frac{\partial V_1}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V_1}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V_1}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V_1}{\partial \nu^2} + rS\frac{\partial V_1}{\partial S} - rV_1}{\frac{\partial V_1}{\partial \nu}}.

It is obvious that the ratio is equal to some function f(S, \nu, t) of S, \nu and t. Thus we have

\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV = f\,\frac{\partial V}{\partial \nu}.

In the case of the Heston model, f is chosen as

f = -(\kappa(\theta - \nu) - \lambda\sqrt{\nu}),

where \lambda(S, \nu, t) is called the market price of volatility risk. We do not delve into the choice of f and the concept of \lambda(S, \nu, t) any further here; see (Wilmott, 2006) for his arguments.

(Heston, 1993) assumed in his paper that \lambda(S, \nu, t) is linear in the instantaneous variance \nu_t, in order to retain the form of the equation under the transformation from the statistical measure to the risk-neutral measure. Here we are only interested in pricing the options, so we can set \lambda(S, \nu, t) to be zero (Gatheral, 2006).

Hence we arrive at

\frac{\partial V}{\partial t} + \frac{1}{2}\nu S^2\frac{\partial^2 V}{\partial S^2} + \rho\xi\nu S\frac{\partial^2 V}{\partial \nu\,\partial S} + \frac{1}{2}\xi^2\nu\frac{\partial^2 V}{\partial \nu^2} + rS\frac{\partial V}{\partial S} - rV = \kappa(\nu - \theta)\frac{\partial V}{\partial \nu}.  (3.4)

According to (Heston, 1993), the undiscounted price C of a European call option with strike price K and time to maturity T has a solution of the form

C(S, K, \nu, T) = \frac{1}{2}(F - K) + \int_0^{\infty} (F f_1 - K f_2)\,du,  (3.5)


where

f_1 = \Re\left[\frac{e^{-iu\ln K}\,\varphi(u - i)}{iuF}\right],
f_2 = \Re\left[\frac{e^{-iu\ln K}\,\varphi(u)}{iu}\right],
\varphi(u) = E(e^{iu\ln S_T}),
F = Se^{\mu T}.

The evaluation of equation (3.5) requires the computation of Fourier inversion integrals, and it is susceptible to numerical instabilities due to the complex logarithms involved. (Kahl et al., 2006) proposed an integration scheme to compute the Fourier inversion integral by applying some transformations to the asymptotic structure. Thus the solution becomes

C(S, K, \nu, T) = \int_0^1 y(x)\,dx, \quad x \in \mathbb{R},  (3.6)

where

y(x) = \frac{1}{2}(F - K) + \frac{F f_1\!\left(-\frac{\ln x}{C_{\infty}}\right) - K f_2\!\left(-\frac{\ln x}{C_{\infty}}\right)}{x\,\pi\,C_{\infty}}.

The limits at the boundaries of the integral are

\lim_{x \to 0} y(x) = \frac{1}{2}(F - K)

and

\lim_{x \to 1} y(x) = \frac{1}{2}(F - K) + \frac{F\lim_{u \to 0} f_1(u) - K\lim_{u \to 0} f_2(u)}{\pi\,C_{\infty}}.

Equation (3.6) provides a more robust pricing method for options with moderate to long maturities or strong mean reversion. For the full derivation of equation (3.6), see (Kahl et al., 2006). The source code implementing equation (3.6) was provided by the supervisor and is the source code for (Duffy et al., 2012).
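For illustration, the finite integral in equation (3.6) can be evaluated with standard Gauss-Legendre quadrature. In the sketch below, y_func is a hypothetical callable standing in for the transformed integrand y(x) (e.g. wrapping the C++ implementation mentioned above):

    import numpy as np

    def price_call_kahl_jaeckel(y_func, n_nodes=64):
        # Gauss-Legendre quadrature of C = integral_0^1 y(x) dx, equation (3.6).
        # y_func is a hypothetical callable implementing the integrand y(x).
        nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
        x = 0.5 * (nodes + 1.0)   # map nodes from [-1, 1] to [0, 1]
        w = 0.5 * weights         # rescale the weights accordingly
        return float(np.sum(w * np.array([y_func(xi) for xi in x])))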

3.1.2 Heston Model Implied Volatility Approximation

Before the model is used for pricing options, the model's parameters need to be calibrated so that it can return current market prices (at least to some degree of accuracy). That is, the unmeasurable model parameters are obtained by calibrating to the implied volatilities observed in the market. Hence, this motivates the need for fast implied volatility computation.

Unfortunately, there is no closed-form analytical formula for calculating implied volatility under the Heston model. There are, however, many closed-form approximations for Heston implied volatility.

In this project we select the implied volatility approximation method for Heston proposed by (Lorig et al., 2019), which allegedly outperforms other methods such as the one by (Fouque et al., 2012). (Lorig et al., 2019) derive a family of asymptotic expansions for European-style option implied volatility. Their method provides an explicit approximation and requires no special functions, not even numerical integration, and so it is faster to compute than its counterparts.

We follow closely the derivation steps outlined in (Lorig et al., 2019). Consider the Heston model of Section 3.1. According to (Andersen et al., 2007), \rho must be set to be negative to avoid moment explosion. To circumvent this, we apply the change of variables (X(t), V(t)) = (\ln S(t), e^{\kappa t}\nu). Using the asymptotic expansions for general stochastic volatility models (see Appendix B) and Itô's formula, we have

dX = -\frac{1}{2}e^{-\kappa t}V\,dt + \sqrt{e^{-\kappa t}V}\,dW_1, \quad X(0) = x = \ln s,  (3.7)

dV = \theta\kappa e^{\kappa t}\,dt + \xi\sqrt{e^{\kappa t}V}\,dW_2, \quad V(0) = \nu = z > 0,  (3.8)

\langle dW_1, dW_2 \rangle = \rho\,dt.  (3.9)

The parameters \nu, \kappa, \theta, \xi, W_1, W_2 play the same roles as in Section 3.1. Then we apply the second-order differential operator \mathcal{A}(t) to (X, V):

\mathcal{A}(t) = \frac{1}{2}e^{-\kappa t}v(\partial_x^2 - \partial_x) + \theta\kappa e^{\kappa t}\partial_v + \frac{1}{2}\xi^2 e^{\kappa t}v\,\partial_v^2 + \xi\rho v\,\partial_x\partial_v.

Define T as the time to maturity and k as the log-strike. Using the time-dependent Taylor series expansion of \mathcal{A}(T) with (x(T), v(T)) = (X_0, E[V(T)]) = (x, \theta(e^{\kappa T} - 1)) gives (see Appendix B)

\sigma_0 = \sqrt{\frac{-\theta + \theta\kappa T + e^{-\kappa T}(\theta - v) + v}{\kappa T}},

\sigma_1 = \frac{\xi\rho z e^{-\kappa T}\left(-2v - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\right)}{\sqrt{2}\,\kappa^2\sigma_0^2 T^{3/2}},

\sigma_2 = \frac{\xi^2 e^{-2\kappa T}}{32\kappa^4\sigma_0^5 T^3}\Big[
-2\sqrt{2}\,\kappa\sigma_0^3 T^{3/2} z\left(-v - 4e^{\kappa T}(v + \kappa T(v - v)) + e^{2\kappa T}(v(5 - 2\kappa T) - 2v) + 2v\right)
+ \kappa\sigma_0^2 T(4z^2 - 2)\left(v + e^{2\kappa T}(-5v + 2v\kappa T + 8\rho^2(v(\kappa T - 3) + v) + 2v)\right)
+ \kappa\sigma_0^2 T(4z^2 - 2)\left(4e^{\kappa T}(v + v\kappa T + \rho^2(v(\kappa T(\kappa T + 4) + 6) - v(\kappa T(\kappa T + 2) + 2)) - \kappa T v) - 2v\right)
+ 4\sqrt{2}\,\rho^2\sigma_0\sqrt{T}\,z(2z^2 - 3)\left(-2 - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\right)^2
+ 4\rho^2(4(z^2 - 3)z^2 + 3)\left(-2v - v\kappa T - e^{\kappa T}(v(\kappa T - 2) + v) + \kappa T v + v\right)^2\Big]
- \frac{\sigma_1^2\left(4(x - k)^2 - \sigma_0^4 T^2\right)}{8\sigma_0^3 T},

where

z = \frac{x - k - \sigma_0^2 T/2}{\sigma_0\sqrt{2T}}.


The explicit expression for \sigma_3 is too long to be included here; see (Lorig et al., 2019) for the full derivation. The explicit expression for the implied volatility approximation is

\sigma_{\text{implied}} = \sigma_0 + \sigma_1 z + \sigma_2 z^2 + \sigma_3 z^3.  (3.10)

The C++ source code implementing this is available here: Heston Implied Volatility Approximation.


3.2 Volatility is Rough

We start this section by introducing the fractional Brownian motion (fBm), which is a generalisation of Brownian motion.

Definition 3.2.1. Fractional Brownian motion (fBm) is a centered Gaussian process B_t^H, t \ge 0, with the autocovariance function

E[B_t^H B_s^H] = \frac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\right),  (3.11)

where H \in (0, 1) is the Hurst index, or Hurst parameter, associated with the fractional Brownian motion.
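For concreteness, the autocovariance (3.11) translates directly into code; a minimal sketch:

    def fbm_autocovariance(t, s, H):
        # E[B^H_t B^H_s] = 0.5 * (|t|^{2H} + |s|^{2H} - |t - s|^{2H}), equation (3.11)
        return 0.5 * (abs(t)**(2*H) + abs(s)**(2*H) - abs(t - s)**(2*H))

    # For H = 0.5 this reduces to the Brownian covariance min(t, s)
    print(fbm_autocovariance(1.0, 2.0, 0.5))  # prints 1.0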

The Hurst parameter is a measure of the long-term memory of a time series. It was first introduced by (Hurst, 1951) for modelling water levels of the Nile river in order to determine the optimum dam size. As it turns out, it is also suitable for modelling time series data from other natural systems. (Peters, 1991) and (Peters, 1994) are some of the earliest examples of applying this concept to financial time series modelling. A time series can be classified into three categories based on the Hurst parameter:

1. 0 < H < 1/2: negative correlation in the increments of the process. A mean-reverting process, which implies an increase in value is more likely to be followed by a decrease in value, and vice versa. The smaller the value, the greater the mean-reverting effect.

2. H = 1/2: no correlation in the increments of the process (geometric Brownian motion or Wiener process).

3. 1/2 < H < 1: positive correlation in the increments of the process. A trend-reinforcing process, which means the direction of the next value is more likely to be the same as the current value. The strength of the trend is greater when H is closer to 1.

Empirical evidence shows that log-volatility time series behave similarly to a fBm with a Hurst parameter of order 0.1 (Gatheral et al., 2014). Essentially all the statistical stylised facts of volatility can be captured when modelling it as a rough fBm.

This motivates the use of the fractional stochastic volatility (FSV) model by (Comte et al., 1998). Our model is called the rough fractional stochastic volatility (RFSV) model to emphasise that H < 1/2. The term 'rough' refers to the roughness of the path of the fBm. From Figure 3.1, one can notice that the fBm path gets 'rougher' as H decreases.


FIGURE 3.1: Paths of fBm for different values of H. Adapted from (Shevchenko, 2015).

3.3 The Rough Heston Model

In this section we introduce the rough volatility version of the Heston model and derive its pricing and implied volatility equations.

As stated before, rough volatility models can fit the volatility surface notably well with very few parameters. Hence it is natural to modify the Heston model and consider its rough version.

The rough Heston model for a one-dimensional asset price S with instantaneous variance \nu(t) takes the form in (Euch et al., 2016):

dS = S\sqrt{\nu}\,dW_1,

\nu(t) = \nu(0) + \frac{1}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\kappa(\theta - \nu(s))\,ds + \frac{\xi}{\Gamma(\alpha)}\int_0^t (t - s)^{\alpha - 1}\sqrt{\nu(s)}\,dW_2,  (3.12)

\langle dW_1, dW_2 \rangle = \rho\,dt,

where

• \alpha = H + 1/2 \in (1/2, 1) governs the roughness of the volatility sample paths;
• \kappa is the mean reversion rate;
• \theta is the long-term mean volatility;
• \nu(0) is the initial volatility;
• \xi is the volatility of volatility;
• \Gamma is the Gamma function;
• W_1 and W_2 are two Brownian motions with correlation \rho.

When \alpha = 1 (H = 0.5), we recover the classical Heston model of Section 3.1.

3.3.1 Rough Heston Pricing Equation

The pricing equation of the rough Heston model is inspired by the classical Heston one. In the classical Heston model, the option price can be obtained by applying Fourier inversion integrals to the characteristic function, which is expressed in terms of the solution of a Riccati equation. The rough Heston model displays a similar structure, but with the Riccati equation replaced by a fractional Riccati equation.

(Euch et al., 2016) derived a characteristic function for the rough Heston model. This result is particularly important, as the non-Markovian nature of the fractional Brownian motion makes other numerical pricing methods (e.g. Monte Carlo) difficult to implement.

Definition 3.3.1. The fractional integral of order r \in (0, 1] of a function f is

I^r f(t) = \frac{1}{\Gamma(r)}\int_0^t (t - s)^{r - 1} f(s)\,ds,  (3.13)

whenever the integral exists, and the fractional derivative of order r \in (0, 1] is

D^r f(t) = \frac{1}{\Gamma(1 - r)}\frac{d}{dt}\int_0^t (t - s)^{-r} f(s)\,ds,  (3.14)

whenever it exists.

Then we quote the results from (Euch et al., 2016) directly; for the full proof, see Section 6 of (Euch et al., 2016).

Theorem 3.3.1. For all t \ge 0, the characteristic function of the log-price X_t = \ln\frac{S(t)}{S(0)} is

\varphi_X(a, t) = E[e^{iaX_t}] = \exp\left[\kappa\theta I^1 h(a, t) + \nu(0) I^{1-\alpha} h(a, t)\right],  (3.15)

where h is the solution of the fractional Riccati equation

D^{\alpha} h(a, t) = -\frac{1}{2}a(a + i) + \kappa(ia\rho\xi - 1)h(a, t) + \frac{\xi^2}{2}h^2(a, t), \quad I^{1-\alpha}h(a, 0) = 0,  (3.16)

which admits a unique continuous solution for a \in \mathcal{C}, a suitable region in the complex plane (see Definition 3.3.2).

Definition 3.3.2. We define the strip in the complex plane relevant to the computation of the option price characteristic function in Theorem 3.3.1:

\mathcal{C} = \left\{z \in \mathbb{C} : \Re(z) \ge 0,\ -\frac{1}{1 - \rho^2} \le \Im(z) \le 0\right\},  (3.17)

where \Re and \Im denote real and imaginary parts, respectively.

There is no explicit solution to equation (3.16), so it has to be solved numerically. (Gatheral et al., 2019) present a simple rational approximation to the solution of the rough Heston Riccati equation, which is inevitably faster than the numerical schemes.

3.3.1.1 Rational Approximation of Rough Heston Riccati Solution

Firstly, the short-time series expansion of the solution by (Alos et al., 2017) can be written as

h_s(a, t) = \sum_{j=0}^{\infty} \frac{\Gamma(1 + j\alpha)}{\Gamma[1 + (j + 1)\alpha]}\,\beta_j(a)\,\xi^j\,t^{(j+1)\alpha},  (3.18)

with

\beta_0(a) = -\frac{1}{2}a(a + i),

\beta_k(a) = \frac{1}{2}\sum_{i,j=0}^{k-2} \mathbf{1}_{i+j=k-2}\,\beta_i(a)\beta_j(a)\,\frac{\Gamma(1 + i\alpha)}{\Gamma[1 + (i + 1)\alpha]}\,\frac{\Gamma(1 + j\alpha)}{\Gamma[1 + (j + 1)\alpha]} + i\rho a\,\frac{\Gamma[1 + (k - 1)\alpha]}{\Gamma(1 + k\alpha)}\,\beta_{k-1}(a).

Again from (Alos et al., 2017), the long-time expansion is

h_l(a, t) = r_- \sum_{k=0}^{\infty} \frac{\gamma_k}{A^k\,\Gamma(1 - k\alpha)\,\xi^k}\,t^{-k\alpha},  (3.19)

where

\gamma_1 = -\gamma_0 = -1,

\gamma_k = -\gamma_{k-1} + \frac{r_-}{2A}\sum_{i,j=1}^{\infty} \mathbf{1}_{i+j=k}\,\gamma_i\gamma_j\,\frac{\Gamma(1 - k\alpha)}{\Gamma(1 - i\alpha)\Gamma(1 - j\alpha)},

A = \sqrt{a(a + i) - \rho^2 a^2},

r_{\pm} = -i\rho a \pm A.

Now we can construct the global approximation. Using the same techniques as in (Atkinson et al., 2011), the rational approximation of h(a, t) is given by

h^{(m,n)}(a, t) = \frac{\sum_{i=1}^{m} p_i(\xi t^{\alpha})^i}{\sum_{j=0}^{n} q_j(\xi t^{\alpha})^j}, \quad q_0 = 1.  (3.20)

Numerical experiments in (Gatheral et al., 2019) showed that h^{(3,3)} is a good choice for high accuracy and fast computation. Thus we can express it explicitly:

h^{(3,3)}(a, t) = \frac{p_1\xi(t^{\alpha}) + p_2\xi^2(t^{\alpha})^2 + p_3\xi^3(t^{\alpha})^3}{1 + q_1\xi(t^{\alpha}) + q_2\xi^2(t^{\alpha})^2 + q_3\xi^3(t^{\alpha})^3}.  (3.21)

Thus from equations (3.18) and (3.19) we have

h_s(a, t) = \frac{\Gamma(1)}{\Gamma(1 + \alpha)}\beta_0(a)(t^{\alpha}) + \frac{\Gamma(1 + \alpha)}{\Gamma(1 + 2\alpha)}\beta_1(a)\xi(t^{\alpha})^2 + \frac{\Gamma(1 + 2\alpha)}{\Gamma(1 + 3\alpha)}\beta_2(a)\xi^2(t^{\alpha})^3 + \mathcal{O}(t^{4\alpha})
= b_1(t^{\alpha}) + b_2(t^{\alpha})^2 + b_3(t^{\alpha})^3 + \mathcal{O}[(t^{\alpha})^4]

and

h_l(a, t) = \frac{r_-\gamma_0}{\Gamma(1)} + \frac{r_-\gamma_1}{A\Gamma(1 - \alpha)\xi}\,\frac{1}{(t^{\alpha})} + \frac{r_-\gamma_2}{A^2\Gamma(1 - 2\alpha)\xi^2}\,\frac{1}{(t^{\alpha})^2} + \mathcal{O}\left[\frac{1}{(t^{\alpha})^3}\right]
= g_0 + g_1\,\frac{1}{(t^{\alpha})} + g_2\,\frac{1}{(t^{\alpha})^2} + \mathcal{O}\left[\frac{1}{(t^{\alpha})^3}\right].

Matching the coefficients of equation (3.21) to h_s and h_l forces the coefficients b_i and g_j to satisfy

p_1\xi = b_1,  (3.22)

(p_2 - p_1 q_1)\xi^2 = b_2,  (3.23)

(p_1 q_1^2 - p_1 q_2 - p_2 q_1 + p_3)\xi^3 = b_3,  (3.24)

\frac{p_3\xi^3}{q_3\xi^3} = g_0,  (3.25)

\frac{(p_2 q_3 - p_3 q_2)\xi^5}{q_3^2\,\xi^6} = g_1,  (3.26)

\frac{(p_1 q_3^2 - p_2 q_2 q_3 - p_3 q_1 q_3 + p_3 q_2^2)\xi^7}{q_3^3\,\xi^9} = g_2.  (3.27)


The solution of this system is

p_1 = \frac{b_1}{\xi},  (3.28)

p_2 = \frac{b_1^3 g_1 + b_1^2 g_0^2 + b_1 b_2 g_0 g_1 - b_1 b_3 g_0 g_2 + b_1 b_3 g_1^2 + b_2^2 g_0 g_2 - b_2^2 g_1^2 + b_2 g_0^3}{(b_1^2 g_2 + 2b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^2},  (3.29)

p_3 = g_0 q_3,  (3.30)

q_1 = \frac{b_1^2 g_1 - b_1 b_2 g_2 + b_1 g_0^2 - b_2 g_0 g_1 - b_3 g_0 g_2 + b_3 g_1^2}{(b_1^2 g_2 + 2b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi},  (3.31)

q_2 = \frac{b_1^2 g_0 - b_1 b_2 g_1 - b_1 b_3 g_2 + b_2^2 g_2 + b_2 g_0^2 - b_3 g_0 g_1}{(b_1^2 g_2 + 2b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^2},  (3.32)

q_3 = \frac{b_1^3 + 2b_1 b_2 g_0 + b_1 b_3 g_1 - b_2^2 g_1 + b_3 g_0^2}{(b_1^2 g_2 + 2b_1 g_0 g_1 + b_2 g_0 g_2 - b_2 g_1^2 + g_0^3)\,\xi^3}.  (3.33)

3.3.1.2 Option Pricing Using Fast Fourier Transform

After solving the Riccati equation using the rational approximation (3.21), we can compute the characteristic function \varphi_X(a, t) in equation (3.15). Once the characteristic function is obtained, many methods are available to compute call option prices; see (Carr and Madan, 1999), (Itkin, 2010), (Lewis, 2001) and (Schmelzle, 2010). In this project we select the Carr and Madan approach, a fast option pricing method that uses the fast Fourier transform (FFT). Suppose k = \ln K and C(k, T) is the value of a European call option with strike K and maturity T. Unfortunately, the FFT cannot be applied directly to evaluate the call option price, because the call pricing function is not square-integrable. A key contribution of (Carr and Madan, 1999) is the modification of the call pricing function with respect to k. With this modification, a whole range of option prices can be obtained with one inverse Fourier transform.

Consider the modified call price c(k, T):

c(k, T) = e^{\beta k}C(k, T), \quad \beta > 0.  (3.34)

Its Fourier transform is

\psi_X(u, T) = \int_{-\infty}^{\infty} e^{iuk}c(k, T)\,dk, \quad u \in \mathbb{R}_{>0},  (3.35)

which can be expressed as

\psi_X(u, T) = \frac{e^{-rT}\varphi_X[u - (\beta + 1)i, T]}{\beta^2 + \beta - u^2 + i(2\beta + 1)u},  (3.36)

where r is the risk-free rate and \varphi_X(\cdot) is the characteristic function defined in equation (3.15).


The purpose of \beta is to assist the integrability of the modified call pricing function over the negative log-strike axis. Hence, a sufficient condition is that \psi_X(0) is finite:

\psi_X(0, T) < \infty  (3.37)

\implies \varphi_X[-(\beta + 1)i, T] < \infty  (3.38)

\implies \exp[\theta\kappa I^1 h(-(\beta + 1)i, T) + \nu(0) I^{1-\alpha} h(-(\beta + 1)i, T)] < \infty.  (3.39)

Using equation (3.39), we can determine the upper bound value of \beta such that condition (3.37) is satisfied. (Carr and Madan, 1999) suggest that one quarter of the upper bound serves as a good choice for \beta.

The call price C(k, T) can be recovered by applying the inverse Fourier transform:

C(k, T) = \frac{e^{-\beta k}}{2\pi}\int_{-\infty}^{\infty} \Re[e^{-iuk}\psi_X(u, T)]\,du  (3.40)

= \frac{e^{-\beta k}}{\pi}\int_0^{\infty} \Re[e^{-iuk}\psi_X(u, T)]\,du.  (3.41)

The semi-infinite integral in the call price equation can be evaluated by numerical methods such as the trapezoidal rule and the FFT, as shown in (Kwok et al., 2012):

C(k, T) \approx \frac{e^{-\beta k}}{\pi}\sum_{j=1}^{N} \Re[e^{-iu_j k}\psi_X(u_j, T)]\,\Delta u,  (3.42)

where u_j = (j - 1)\Delta u, j = 1, 2, \ldots, N.

We end this section with a summary of the steps required to price a call option under the rough Heston model:

1. Compute the solution h of the fractional Riccati equation (3.16) using equation (3.21).
2. Compute the characteristic function in equation (3.15).
3. Determine a suitable value for \beta using equation (3.39).
4. Apply the Fourier transform in equation (3.36).
5. Evaluate the call option price by implementing the FFT in equation (3.42).
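A direct transcription of the discretised sum (3.42) is sketched below; psi is a hypothetical callable implementing equation (3.36) (wrapping the rough Heston characteristic function of equation (3.15)), and the grid spacing and size are placeholders:

    import numpy as np

    def call_price_carr_madan(k, psi, beta, du=0.25, N=4096):
        # Discretised inverse transform, equation (3.42):
        # C(k, T) ~ (e^{-beta k} / pi) * sum_j Re[e^{-i u_j k} psi(u_j)] du
        j = np.arange(1, N + 1)
        u = (j - 1) * du
        vals = np.array([psi(uj) for uj in u])       # psi evaluated on the grid
        total = np.sum(np.real(np.exp(-1j * u * k) * vals)) * du
        return float(np.exp(-beta * k) / np.pi * total)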

3.3.2 Rough Heston Model Implied Volatility

(Euch et al., 2019) present an almost instantaneous and accurate approximation method to compute the rough Heston implied volatility. This method allows us to approximate the rough Heston implied volatility by simply scaling the volatility of volatility parameter \nu and feeding this scaled parameter as input into the classical Heston implied volatility approximation equation of Section 3.1.2. For a given maturity T, the scaled volatility of volatility parameter \tilde{\nu} is

\tilde{\nu} = \sqrt{\frac{3}{2H + 2}}\,\frac{\nu}{\Gamma(H + 3/2)}\,\frac{1}{T^{1/2 - H}}.  (3.43)


FIGURE 3.2: Implied volatility smiles of SPX as of 14th August 2013, with maturity T in years. Red and blue dots represent bid and ask SPX implied volatilities; green plots are from the calibrated rough Heston model; dashed orange lines are from the classical Heston model calibrated to these smiles. Adapted from (Euch et al., 2019).


Chapter 4

Artificial Neural Networks

This chapter starts with an introduction to Artificial Neural Networks (ANN). We introduce the terminologies and basic elements of ANN and explain the methods that we used to increase training speed and predictive accuracy. We also discuss the training process for ANN, common pitfalls and their remedies. Finally, we discuss the ANN type of interest and extend the discussion to Deep Neural Networks (DNN).

The concept of ANN has come a long way. It started off with the modelling of a simple neural network after electrical circuits, proposed by Warren McCulloch, a neurologist, and a young mathematician, Walter Pitts, in 1943 (McCulloch et al., 1943). This idea was further strengthened by Donald Hebb (Hebb, 1949).

ANN's ability to learn complex non-linear functional mappings, by deciphering the pattern between independent and dependent variables, has found applications in many fields. With computational resources becoming more accessible and big data sets becoming available, ANN's popularity has grown exponentially in recent years.

4.1 Dissecting the Artificial Neural Networks

FIGURE 4.1: A Simple Neural Network

Before we delve deeper into the world of ANN, let us explain what parameters and hyperparameters mean.


Parameters refer to the variables that are updated throughout the training process; they include weights and biases. Hyperparameters are the configuration variables that define the architecture of the ANN and stay constant throughout the training process. Hyperparameters include the number of neurons, hidden layers, activation functions, the number of epochs, etc.

4.1.1 Neurons

Like a biological neuron, an artificial neuron is the fundamental working unit in an ANN. It is essentially a mathematical function that receives multiple inputs and generates an output. More precisely, it takes the weighted sum of the inputs and applies the activation function to the sum to generate an output. In the case of a simple ANN (as shown in Figure 4.1), this will be the network output. In more sophisticated ANNs, the output of a neuron can be the input of other neurons.
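As a minimal illustration of this description (the weights, bias and choice of activation here are arbitrary):

    import numpy as np

    def neuron(x, w, b, activation=np.tanh):
        # A single artificial neuron: weighted sum of inputs plus bias,
        # passed through an activation function
        return activation(np.dot(w, x) + b)

    print(neuron(np.array([1.0, 2.0]), np.array([0.5, -0.25]), 0.1))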

4.1.2 Input, Output and Hidden Layers

An input layer is the first layer of an ANN. It receives and sends the training data into the network for further processing. The number of input neurons is equal to the number of input features of the training data.

The output layer is the final layer of the network. It outputs the predictions for a given set of input features, and it can have more than one output neuron.

A hidden layer is a layer between the input and output layers. It contains neuron(s) and receives weighted inputs from the input layer. Whilst an ANN contains only one input layer and one output layer, the number of hidden layers can be more than one. A network without a hidden layer can only solve linearly separable problems; for more complex problems, hidden layers are needed.

Systematic experimentation is required to identify the appropriate number of hidden layers to prevent under-fitting and over-fitting, and thus increase predictive accuracy.

4.1.3 Connections, Weights and Biases

From the input layer to the output layer, each connection is assigned a weight that denotes its relative importance. Random weights are initialised when training begins; as training continues, these weights are updated.

Sometimes a bias (different from the bias term in statistics) is added to the neuron. It is merely a constant input with weight 1. Intuitively speaking, the purpose of a bias is to shift the activation function away from the origin, thereby allowing the neuron to give non-zero output when the inputs are 0.


4.1.4 Forward and Backward Propagation, Loss Functions

Forward propagation is the passing of input data in the forward direction through the network to generate output. Then the difference between the output and the target output is estimated by a loss function; it is a measure of how well an ANN performs with the current set of weights. Typical loss functions for regression problems include the mean squared error (MSE) and the mean absolute error (MAE). They are defined as

Definition 4.1.1.
\[
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_{i,\mathrm{predict}} - y_{i,\mathrm{actual}} \right)^2 \tag{4.1}
\]

Definition 4.1.2.
\[
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_{i,\mathrm{predict}} - y_{i,\mathrm{actual}} \right| \tag{4.2}
\]

where $y_{i,\mathrm{predict}}$ is the i-th predicted output and $y_{i,\mathrm{actual}}$ is the i-th actual output.

Subsequently, this information is propagated backward from the output layer to the input layer, and the gradients of the loss function with respect to the weights are computed. The algorithm then adjusts the weights accordingly to minimise the loss function. An iteration is one complete cycle of forward and backward propagation.

ANN training is an unconstrained optimisation problem with the loss function as the objective function. The algorithm navigates through the search space to find optimal weights that minimise the loss function (e.g. the MSE). It can be expressed as

\[
\min_{w} \frac{1}{n} \sum_{i=1}^{n} \left( y_{i,\mathrm{predict}} - y_{i,\mathrm{actual}} \right)^2 \tag{4.3}
\]

where $w$ is the vector that contains the weights of all the connections within the ANN.

4.1.5 Activation Functions

An activation function is a mathematical function applied to the output of a neuron to determine whether it should be activated (or fired) or not, based on the relevance of the neuron's inputs to the prediction. This is analogous to the neuronal firing activity in the human brain.

Activation functions can be linear or non-linear. A linear activation function is only suitable when the functional relationship is linear; non-linear activation functions are used for most applications. In our case, non-linear activation functions are more suitable, as the pricing and implied volatility functional mappings are complex.

Common activation functions include:


Definition 4.1.3. Rectified Linear Unit (ReLU)
\[
f_{\mathrm{ReLU}}(x) =
\begin{cases}
0 & \text{if } x \le 0 \\
x & \text{if } x > 0
\end{cases}
\;\in [0, \infty) \tag{4.4}
\]

Definition 4.1.4. Exponential Linear Unit (ELU)
\[
f_{\mathrm{ELU}}(x) =
\begin{cases}
\alpha(e^x - 1) & \text{if } x \le 0 \\
x & \text{if } x > 0
\end{cases}
\;\in (-\alpha, \infty) \tag{4.5}
\]

where $\alpha > 0$. The value of $\alpha$ is usually chosen to be 1 for most applications.

Definition 4.1.5. Linear or identity
\[
f_{\mathrm{linear}}(x) = x \;\in (-\infty, \infty) \tag{4.6}
\]
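Definitions 4.1.3-4.1.5 translate directly into code; the following NumPy sketch is illustrative only.

    import numpy as np

    def relu(x):
        # equation (4.4): 0 for x <= 0, x for x > 0
        return np.maximum(0.0, x)

    def elu(x, alpha=1.0):
        # equation (4.5), with the common choice alpha = 1
        return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

    def linear(x):
        # equation (4.6): the identity
        return x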

4.1.6 Optimisation Algorithms and Learning Rate

The learning rate refers to the amount of change applied to the weights in each iteration of the training process. A high learning rate results in fast training speed, but there is a risk of divergence, whereas a low learning rate may reduce the training speed.

Common optimisation algorithms for training an ANN are Stochastic Gradient Descent (SGD) and its extensions, such as RMSProp and Adam. The SGD pseudocode is the following:

Algorithm 1 Stochastic Gradient Descent

    Initialise a vector of random weights w and learning rate lr
    while Loss > error do
        Randomly shuffle the training data of size n
        for i = 1, 2, ..., n do
            w_new = w_current - lr * (∂Loss/∂w_current)
        end for
    end while

SGD is popular because it is simple to implement and can provide good results for most applications. In SGD the learning rate lr is fixed throughout the process. This is found to be an issue because the appropriate learning rate is difficult to determine: selecting a high learning rate is prone to divergence, whereas using a low learning rate can result in a slow training process. Hence, improvements have been made to SGD to allow the learning rate to be adaptive. Root Mean Square Propagation (RMSProp) is one of the extensions that allows the learning rate to vary throughout the training process (Hinton et al 2015). It is a method that divides the learning rate by the exponential moving average of past squared gradients. The author suggests the exponential decay rate to be b = 0.9, and its pseudocode is presented below.

Algorithm 2 Root Mean Square Propagation (RMSProp)

    Initialise a vector of random weights w, learning rate lr, v = 0 and b = 0.9
    while Loss > error do
        Randomly shuffle the training data of size n
        for i = 1, 2, ..., n do
            v_current = b * v_previous + (1 - b) * (∂Loss/∂w_current)^2
            w_new = w_current - (lr / sqrt(v_current + ε)) * (∂Loss/∂w_current)
        end for
    end while

where ε is a small number to prevent division by zero.

A further improved version, called Adaptive Moment Estimation (Adam), was proposed by (Kingma et al 2015). RMSProp adapts the learning rate using the second moment of the gradients only; Adam improves on this by also using the exponential moving average of the first moment of the gradients. An advantage of Adam is that it can handle sparse gradients on noisy data. The pseudocode of Adam is given below.

Algorithm 3 Adaptive Moment Estimation (Adam)

    Initialise a vector of random weights w, learning rate lr, v = 0, m = 0, b = 0.9 and b1 = 0.999
    while Loss > error do
        Randomly shuffle the training data of size n
        for i = 1, 2, ..., n do
            m_current = b * m_previous + (1 - b) * (∂Loss/∂w_current)
            v_current = b1 * v_previous + (1 - b1) * (∂Loss/∂w_current)^2
            m̂_current = m_current / (1 - b^i)
            v̂_current = v_current / (1 - b1^i)
            w_new = w_current - (lr / (sqrt(v̂_current) + ε)) * m̂_current
        end for
    end while

where b is the exponential decay rate for the first moment and b1 is the exponential decay rate for the second moment.
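To make the three update rules concrete, here is a minimal NumPy sketch of one parameter update per optimiser, given the gradient g = ∂Loss/∂w. It is illustrative only and is not the Keras implementation used later.

    import numpy as np

    def sgd_step(w, g, lr=0.01):
        return w - lr * g                                     # Algorithm 1

    def rmsprop_step(w, g, v, lr=0.001, b=0.9, eps=1e-8):
        v = b * v + (1.0 - b) * g**2                          # squared-gradient average
        return w - lr / np.sqrt(v + eps) * g, v               # Algorithm 2

    def adam_step(w, g, m, v, t, lr=0.001, b=0.9, b1=0.999, eps=1e-8):
        m = b * m + (1.0 - b) * g                             # first moment
        v = b1 * v + (1.0 - b1) * g**2                        # second moment
        m_hat = m / (1.0 - b**t)                              # bias corrections, t >= 1
        v_hat = v / (1.0 - b1**t)
        return w - lr / (np.sqrt(v_hat) + eps) * m_hat, m, v  # Algorithm 3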

4.1.7 Epochs, Early Stopping and Batch Size

The number of epochs is the number of times the entire training data set is passed through the ANN. Generally we want the number of epochs to be as high as possible; however, this may cause the ANN to over-fit. Early stopping is useful here to stop the training before the ANN becomes over-fit: it works by stopping the training process when the loss function shows no improvement. The batch size is the number of input samples processed before the parameters are updated.

4.2 Feedforward Neural Networks

Feedforward Neural Networks are the most common type of ANN in practical applications. They are called feedforward because the training data only flows from the input layer to the output layer; there is no feedback loop within the network.

4.2.1 Deep Neural Networks

Deep Neural Networks (DNN) are Feedforward Neural Networks with more than one hidden layer; the depth here refers to the number of hidden layers. This is the type of ANN of interest in this project, and the following theorems provide justification.

Theorem 4.2.1 (Hornik et al 1989, Universal Approximation Theorem). Let $\mathcal{NN}^a_{d_{in},d_{out}}$ be the set of neural networks with activation function $f_a : \mathbb{R} \to \mathbb{R}$, input dimension $d_{in} \in \mathbb{N}$ and output dimension $d_{out} \in \mathbb{N}$. Then, if $f_a$ is continuous and non-constant, $\mathcal{NN}^a_{d_{in},d_{out}}$ is dense in the Lebesgue space $L^p(\mu)$ for all finite measures $\mu$.

Theorem 4.2.2 (Hornik et al 1990, Universal Approximation Theorem for Derivatives). Let the function to be approximated be $F^* \in C^n$, the mapping of the neural network $F : \mathbb{R}^{d_0} \to \mathbb{R}$, and $\mathcal{NN}^a_{d_{in},d_{out}}$ be the set of single-layer neural networks with activation function $f_a : \mathbb{R} \to \mathbb{R}$, input dimension $d_{in} \in \mathbb{N}$ and output dimension 1. Then, if the (non-constant) activation function satisfies $f_a \in C^n(\mathbb{R})$, then $\mathcal{NN}^a_{d_{in},1}$ arbitrarily approximates $F^*$ and all its derivatives up to order $n$.

Theorem 4.2.3 (Eldan et al 2016, The Power of Depth of Neural Networks). There exists a simple function on $\mathbb{R}^d$, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network to more than a certain constant accuracy unless its width is exponential in the dimension.

According to Theorem 4.2.2, it is crucial to have an activation function that is n-differentiable if the approximation of the n-th derivative of the function $F^*$ is required. Theorem 4.2.3 motivates the choice of a deep network: it states that the depth of the network can improve its predictive capabilities. However, it is worth noting that a network deeper than 4 hidden layers does not provide much performance gain (Ioffe et al 2015).

4.3 Common Pitfalls and Remedies

Here we discuss some common pitfalls in ANN training and the corresponding remedies.


4.3.1 Loss Function Stagnation

During training, the loss function may display only infinitesimal improvements after each epoch.

To resolve this problem:

• Increase the batch size. This could potentially fix the issue, as it helps the ANN model learn the relationships between data samples and data labels better before updating the parameters.

• Increase the learning rate. Too small a learning rate can cause the ANN model to get stuck in local minima.

• Use a deeper network. According to the power of depth theorem, deeper networks tend to perform better than shallower ones. However, if the ANN model already has 4 hidden layers, this method may not be suitable, as there is not much performance gain beyond that (see Chapter 4.2.1).

• Use a different activation function for the output layer. Make sure the range of the output activation function is appropriate for the problem.

4.3.2 Loss Function Fluctuation

During training, the loss function may fluctuate from epoch to epoch rather than decline steadily. Potential fixes include:

• Increase the batch size. The ANN model might not be able to interpret the true relationships between input and output data before each update when the batch size is too small.

• Lower the initialised learning rate.

4.3.3 Over-fitting

Over-fitting occurs when the ANN model performs well on training data but fails to generalise on unseen test data.

Potential fixes include:

• Reduce the complexity of the ANN model. Limit the number of hidden layers to 4, as (Eldan et al 2016) shows not much advantage can be achieved beyond 4 hidden layers. Furthermore, experiment with narrower (fewer neurons) hidden layers.

• Introduce early stopping. Reduce the number of epochs if more epochs show only little improvement. It is important to note that the stopping criterion should be based on the validation loss rather than the training loss.

• Regularisation. Penalising large weights reduces the complexity of the model and can speed up convergence.


• K-fold cross-validation. This works by splitting the training data into K equal-size portions: K-1 portions are used for training and 1 portion is used for validation. This is repeated K times, with a different portion used for validation each time.

4.3.4 Under-fitting

Under-fitting occurs when the ANN model fails to make sense of the relationships between data samples and labels.

Potential fixes include:

• Increase the size of the training data. Complex functional mappings require the ANN model to train on more data before it can learn the relationships well.

• Increase the depth of the model. It is very likely that the model is not deep enough for the problem. As a rule of thumb, limit the number of hidden layers to 3-4 for complex problems.

4.4 Application in Finance: ANN Image-based Implicit Method for Implied Volatility Approximation

For ANN's application in implied volatility predictions, we apply the image-based implicit method proposed by (Horvath et al 2019). It is allegedly a powerful and robust method for approximating implied volatility. Rather than using one implied volatility value as the network output, the image-based implicit method uses the entire implied volatility surface as the output. The implied volatility surface is represented by 10 × 10 pixels, with each pixel corresponding to one maturity and one strike price. The input data corresponding to one implied volatility surface output is all the model parameters except for the strike price and maturity; they are implicitly learned by the network. This method offers the following benefits:

• More information is learned by the network with less input data: the information of neighbouring volatility points is incorporated in the training process.

• The network learns the mapping between the model parameters and the implied volatility surface directly. The volatility smile and term structure are available simultaneously, so it is more efficient than having only one single implied volatility value as output.

• The neighbouring volatility points can be numerically approximated almost instantly if required.


FIGURE 4.2 The pixels of the implied volatility surface are represented by the outputs of the neural network.


Chapter 5

Data

In this chapter we discuss the data for our applications. We explain how the data is generated, along with some essential data pre-processing procedures.

Simulated data is chosen to train the ANN model mainly for regulatory reasons. One might argue that training the ANN with market data directly would yield better predictions; however, evidence for the stability and robustness of that approach is still lacking. Furthermore, there is no straightforward interpretation between the inputs and outputs of an ANN model trained that way. The main advantage of training an ANN with simulated data is that the functional relationship is known, and so the model is tractable. This property is important, as tractable models tend to be more transparent, which is required by regulators.

For the neural network computation, Python is the logical language choice, as it has many powerful machine learning libraries such as Keras. To obtain good results, we require 5,000,000 rows of data for the ANN experiment. Hence we choose C++ instead of Python for the data generation process, as it is computationally faster. We also use pybind11, a lightweight, header-only Python library that can harmoniously integrate C++ and Python in one project (see Appendix A for a tutorial on pybind11).

5.1 Data Generation and Storage

The data generation process is done entirely in C++, mainly because of its execution speed. Initially we considered applying GPU parallel computing to speed up the process even further. After consulting with the supervisor, we decided to use the std::future class template instead. std::future belongs to the C++11 standard library and is very simple to implement compared to parallel computing. It allows the C++ program to run multiple functions asynchronously on different threads (see page 942 in (Duffy 2018) for an example). Our machine contains 2 physical cores with 2 threads per physical core. This allows us to run 4 operations asynchronously without overhead costs, and therefore in theory improves the data generation speed four-fold. Note that asynchronous programming is not meant to replace parallel computing completely; in cases where execution speed is absolutely critical, parallel computing should be used. Without asynchronous programming, the data generation process for 5,000,000 rows of Heston option price data would take more than 8 hours. Now it takes less than 2 hours to complete.
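The generator itself is written in C++ with std::future. As a language-neutral illustration of the same idea, the sketch below uses Python's concurrent.futures to run four generation tasks asynchronously; the function and argument names are hypothetical.

    from concurrent.futures import ThreadPoolExecutor

    def generate_chunk(n_rows, seed):
        # placeholder: e.g. call the wrapped C++ generator for n_rows samples
        return []

    # 4 hardware threads -> 4 asynchronous tasks, mirroring the std::future setup
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(generate_chunk, 1_250_000, s) for s in range(4)]
        chunks = [f.result() for f in futures]  # result() blocks until each task finishes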

We follow the steps outlined in Appendix A to wrap the C++ program using the pybind11 Python library. The wrapped C++ function can be called in Python just like any other Python library. pybind11 is conceptually very similar to the Boost library; the reason we choose pybind11 is that it is a lightweight, header-only library that is easy to use. This enables us to enjoy both the execution speed of C++ and access to the machine learning libraries of Python at the same time.

After the data is generated, it is stored in MySQL so that we do not need to re-generate the data each time we restart the computer or when the Python IDE crashes unexpectedly. We use the mysql-connector Python library to communicate with MySQL from Python.
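As an illustration of this storage step, the following minimal sketch writes generated rows into MySQL with mysql-connector; the credentials, database and table names are hypothetical.

    import mysql.connector

    # hypothetical connection details
    cnx = mysql.connector.connect(user="user", password="password",
                                  host="localhost", database="option_data")
    cursor = cnx.cursor()
    cursor.execute("""CREATE TABLE IF NOT EXISTS heston_call
                      (S DOUBLE, K DOUBLE, r DOUBLE, T DOUBLE, price DOUBLE)""")
    rows = [(50.0, 45.0, 0.03, 1.0, 7.21)]  # one hypothetical generated row
    cursor.executemany("INSERT INTO heston_call VALUES (%s, %s, %s, %s, %s)", rows)
    cnx.commit()
    cursor.close()
    cnx.close()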

5.1.1 Heston Call Option Data

We follow the method and theory described in Chapter 3 to compute the call option price under the Heston model. We only consider call options in this project. To train an ANN that can also predict put option prices, one can simply add an extra input feature that takes only two values, 1 or -1, with 1 representing a call option and -1 a put option.

For this application we generated 5,000,000 rows of data, uniformly distributed over a specified range. In this context, one row of data contains all the input features and the corresponding output. We use 10 model parameters as the input features: spot price (S), strike price (K), risk-free rate (r), dividend yield (D), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ). In this case there is only one output, the call option price. See Table 5.1 for their respective ranges.

TABLE 5.1 Heston Model Call Option Data

         Model Parameters                  Range
Input    Spot Price (S)                    ∈ [20, 80]
         Strike Price (K)                  ∈ [40, 55]
         Risk-free Rate (r)                ∈ [0.03, 0.045]
         Dividend Yield (D)                ∈ [0, 0.1]
         Instantaneous Volatility (ν)      ∈ [0.2, 0.6]
         Maturity (T)                      ∈ [0.03, 2.0]
         Long-term Volatility (θ)          ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)           ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)      ∈ [0.2, 0.5]
         Correlation (ρ)                   ∈ [-0.95, 0.95]
Output   Call Option Price (C)             ∈ R>0


5.1.2 Heston Implied Volatility Data

We generate the Heston implied volatility data with the theory explained in Chapter 3. For the ANN application in implied volatility approximation, we decided to use the image-based implicit method, in which the ANN output layer dimension is 100, with each output representing one pixel on the implied volatility surface.

Using this method, we need 100 times more data in order to have the same number of distinct rows of data as the normal point-wise method (e.g. Heston call option pricing) does. However, due to hardware constraints, we decided to generate 10,000,000 rows of uniformly distributed data. That translates to only 100,000 distinct rows of data.

We use 6 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ) and correlation (ρ).

Note that the strike price and maturity are not used as input features, as mentioned in Chapter 4.4. This is because the shape of one full implied volatility surface depends on the 6 model parameters mentioned above; the strike price and maturity correspond to one specific point on the full implied volatility surface. Nevertheless, we still require these two parameters to generate the implied volatility data. We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

In this case there are 10 × 10 = 100 outputs, that is, the implied volatility surface. See Table 5.2 for their respective ranges.

TABLE 5.2 Heston Model Implied Volatility Data

         Model Parameters                  Range
Input    Spot Price (S)                    ∈ [0.2, 2.5]
         Instantaneous Volatility (ν)      ∈ [0.2, 0.6]
         Long-term Volatility (θ)          ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)           ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)      ∈ [0.2, 0.5]
         Correlation (ρ)                   ∈ [-0.95, 0.95]
Output   Implied Volatility Surface (σimp) ∈ R>0^(10×10)

5.1.3 Rough Heston Call Option Data

Again, we follow the theory in Chapter 3 and implement it in C++ to generate call option prices under the rough Heston model.

Similarly, we only consider call options here and generate 5,000,000 rows of data. We use 9 model parameters as the input features: strike price (K), risk-free rate (r), current/instantaneous volatility (ν), maturity (T), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). In this case there is only one output, the call option price. See Table 5.3 for their respective ranges.

TABLE 5.3 Rough Heston Model Call Option Data

         Model Parameters                  Range
Input    Strike Price (K)                  ∈ [0.5, 5]
         Risk-free Rate (r)                ∈ [0.02, 0.05]
         Instantaneous Volatility (ν)      ∈ [0, 1.0]
         Maturity (T)                      ∈ [0.1, 3.0]
         Long-term Volatility (θ)          ∈ [0.01, 0.1]
         Mean Reversion Rate (κ)           ∈ [1.0, 4.0]
         Volatility of Volatility (ξ)      ∈ [0.01, 0.5]
         Correlation (ρ)                   ∈ [-0.95, 0.95]
         Hurst Parameter (H)               ∈ [0.05, 0.1]
Output   Call Option Price (C)             ∈ R>0

5.1.4 Rough Heston Implied Volatility Data

Finally, we generate the rough Heston implied volatility data with the theory explained in Chapter 3. As before, we use the image-based implicit method.

The data size we use is 10,000,000 rows, which translates to 100,000 distinct rows of data. We use 7 model parameters as the input features: spot price (S), current/instantaneous volatility (ν), long-term volatility (θ), mean-reversion rate (κ), volatility of volatility (ξ), correlation (ρ) and Hurst parameter (H). We use the following strike price and maturity values:

• strike price = {0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4}

• maturity = {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}

As before, there are 10 × 10 = 100 outputs, that is, the rough Heston implied volatility surface. See Table 5.4 for their respective ranges.

TABLE 5.4 Rough Heston Model Implied Volatility Data

         Model Parameters                  Range
Input    Spot Price (S)                    ∈ [0.2, 2.5]
         Instantaneous Volatility (ν)      ∈ [0.2, 0.6]
         Long-term Volatility (θ)          ∈ [0.2, 1.0]
         Mean Reversion Rate (κ)           ∈ [0.1, 1.0]
         Volatility of Volatility (ξ)      ∈ [0.2, 0.5]
         Correlation (ρ)                   ∈ [-0.95, 0.95]
         Hurst Parameter (H)               ∈ [0.05, 0.1]
Output   Implied Volatility Surface (σimp) ∈ R>0^(10×10)


5.2 Data Pre-processing

In this section we discuss some necessary data pre-processing procedures: data splitting for validation and evaluation (testing), data standardisation and normalisation, and data partitioning for K-fold cross validation.

5.2.1 Data Splitting: Train - Validate - Test

Generally, not 100% of the available data is used for training; we reserve some of the data for validation and for performance evaluation.

We split the data into three sets: 70% (train), 15% (validate) and 15% (test). The first 70% of the data is used for training, whilst 15% of the data is used for validation during training. The last 15% is for evaluating the trained ANN's predictive accuracy.

The validation data set is used for tuning the hyperparameters of the ANN; this is important to help prevent over-fitting of the network. The test data set is not used during training: it is the unseen (out-of-sample) data that is used for evaluating the trained ANN's predictions.
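A minimal sketch of this 70/15/15 split, assuming scikit-learn is available (the text does not name a particular library for this step):

    import numpy as np
    from sklearn.model_selection import train_test_split

    X, y = np.random.rand(1000, 10), np.random.rand(1000)  # placeholder data
    # 70% train, 30% held out; then split the held-out part half-and-half
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30)
    X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50)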

5.2.2 Data Standardisation and Normalisation

Before the data is fed into the ANN for training, it is absolutely essential to scale the data to speed up convergence and improve performance.

The magnitude of each data feature is different. It is essential to re-scale the data so that the ANN model will not misinterpret the features' relative importance based on their magnitudes. Typical scaling techniques include standardisation and normalisation.

Data standardisation is defined as

\[
x_{\mathrm{scaled}} = \frac{x - x_{\mathrm{mean}}}{x_{\mathrm{std}}} \tag{5.1}
\]

where

• x is the data,

• x_mean is the mean of that data feature,

• x_std is the standard deviation of that data feature.

(Horvath et al 2019) claims that this works especially well for quantities that are positively unbounded, such as implied volatility.

For the other model parameters we apply data normalisation, which is defined as

\[
x_{\mathrm{scaled}} = \frac{2x - (x_{\mathrm{max}} + x_{\mathrm{min}})}{x_{\mathrm{max}} - x_{\mathrm{min}}} \in [-1, 1] \tag{5.2}
\]

where

• x is the data,

• x_max is the maximum value of that data feature,

• x_min is the minimum value of that data feature.

After data scaling (standardisation or normalisation), the ANN is able to learn the true relationships between inputs and outputs without bias from the differences in their magnitudes.
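Equations (5.1) and (5.2) as NumPy helpers (illustrative only):

    import numpy as np

    def standardise(x):
        # equation (5.1), applied column-wise to a feature matrix
        return (x - x.mean(axis=0)) / x.std(axis=0)

    def normalise(x):
        # equation (5.2): maps each feature into [-1, 1]
        x_max, x_min = x.max(axis=0), x.min(axis=0)
        return (2.0 * x - (x_max + x_min)) / (x_max - x_min)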

5.2.3 Data Partitioning

In this section we explain how the data is partitioned for K-fold cross validation.

First, we shuffle the entire training data set. Then the data set is divided into K equal-sized portions. Each portion is used once as the test data set while the ANN is trained on the other K-1 portions; this is repeated K times, with each of the K portions used exactly once as the test data set. The average of the K results is the performance measure of the model.
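A minimal sketch of this partitioning with scikit-learn's KFold (assumed available; illustrative only):

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.random.rand(1000, 6)            # placeholder feature matrix
    kf = KFold(n_splits=5, shuffle=True)   # shuffle, then 5 equal portions
    for train_idx, test_idx in kf.split(X):
        X_train, X_test = X[train_idx], X[test_idx]
        # train on the K-1 portions, evaluate on the held-out portion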

FIGURE 5.1 How the data is partitioned for 5-fold cross validation. Adapted from (Chapter 2 Modeling Process, k-fold cross validation).


Chapter 6

Neural Networks Architecture andExperimental Setup

We start this chapter by discussing the appropriate ANN architecture and experimental setup for our applications. We apply the ANN to four different applications: option pricing under the Heston model, Heston implied volatility surface approximation, option pricing under the rough Heston model, and rough Heston implied volatility surface approximation. We first apply the ANN to the Heston model as a concept validation and then proceed to the rough Heston model.

Subsequently, we discuss the relevant evaluation metrics and validation methods for our ANN models. Finally, we end this chapter with the implementation steps for an ANN in Keras.

6.1 Neural Networks Architecture

Configuring an ANN's architecture is a combination of art and science. It requires experience and systematic experimentation to decide the right parameters and architecture for a specific problem, and for a particular problem there might be multiple designs that are appropriate. To begin with, we use the network architecture of (Horvath et al 2019) and adjust the hyperparameters if it performs poorly for our applications.


FIGURE 6.1 Typical ANN Architecture for Option Pricing Application


FIGURE 6.2 Typical ANN Architecture for Implied Volatility Surface Approximation

6.1.1 ANN Architecture: Option Pricing under Heston Model

An ANN with 3 hidden layers is appropriate for this application. The number of neurons per hidden layer is 50, with the exponential linear unit (ELU) as the activation function for each hidden layer.

Initially, the rectified linear activation function (ReLU) was considered. It is the choice for many applications, as it can overcome the vanishing gradient problem whilst being less computationally expensive than other activation functions. However, it has some major drawbacks:

• The n-th derivative of ReLU is not continuous for all n > 0.

• ReLU cannot produce negative outputs.

• For activation x < 0, ReLU has zero gradient.

From Theorem 4.2.2 we know that choosing an n-differentiable activation function is necessary for computing the n-th derivatives of the functional mapping. This is crucial for our application, because we would be interested in option Greeks computation using the same ANN if it shows promising results for option pricing. For risk management purposes, selecting an activation function for which option Greeks can be calculated is crucial.

The obvious alternative is the ELU (defined in Chapter 4.1.4). For x < 0 it has a smooth output that approaches -α, and for x > 0 the output of ELU is positively unbounded, which is suitable for our application, as the values of option price and implied volatility are also, in theory, positively unbounded.

Since our problem is a regression problem, a linear activation function (defined in Chapter 4.1.5) is appropriate for the output layer, as its output is unbounded. We discover that a batch size of 200 and 30 epochs yield good accuracy.

To summarise, the ANN model for Heston option pricing is as follows (a Keras sketch is given after the list):

• Input Layer: 10 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function
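A minimal Keras sketch of this architecture (illustrative; see Appendix C for the full code):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    model = Sequential([
        Dense(50, activation="elu", input_shape=(10,)),  # hidden layer 1
        Dense(50, activation="elu"),                     # hidden layer 2
        Dense(50, activation="elu"),                     # hidden layer 3
        Dense(1, activation="linear"),                   # output: call option price
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    model.summary()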

6.1.2 ANN Architecture: Implied Volatility Surface of Heston Model

For approximating the full implied volatility surface of the Heston model, we apply the image-based implicit method (see Chapter 4.4). This implies the output dimension is 100.

We use 3 hidden layers with 80 neurons each. The activation function for the hidden layers is the exponential linear unit (ELU), and as before we use a linear activation function for the output layer. We discover that a batch size of 20 and 200 epochs yield good accuracy. Since the number of epochs is relatively high, we introduce an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

To summarise, the ANN model for Heston implied volatility surface approximation is as follows (an early stopping sketch is given after the list):

• Input Layer: 6 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function
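The early stopping behaviour described above can be expressed with Keras's EarlyStopping callback; a minimal sketch (illustrative):

    from tensorflow.keras.callbacks import EarlyStopping

    # stop when the validation loss has not improved for 25 consecutive epochs
    early_stop = EarlyStopping(monitor="val_loss", patience=25,
                               restore_best_weights=True)
    # passed later to model.fit(..., epochs=200, callbacks=[early_stop])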


6.1.3 ANN Architecture: Option Pricing under Rough Heston Model

For option pricing under the rough Heston model, the ANN model is:

• Input Layer: 9 neurons

• Hidden Layer 1: 50 neurons, ELU as activation function

• Hidden Layer 2: 50 neurons, ELU as activation function

• Hidden Layer 3: 50 neurons, ELU as activation function

• Output Layer: 1 neuron, linear function as activation function

As before, we use a batch size of 200 and 30 epochs.

6.1.4 ANN Architecture: Implied Volatility Surface of Rough Heston Model

The ANN model for rough Heston implied volatility surface approximation is:

• Input Layer: 7 neurons

• Hidden Layer 1: 80 neurons, ELU as activation function

• Hidden Layer 2: 80 neurons, ELU as activation function

• Hidden Layer 3: 80 neurons, ELU as activation function

• Output Layer: 100 neurons, linear function as activation function

We use a batch size of 20 and 200 epochs, with an early stopping feature to stop the training when the validation loss stops improving for 25 epochs.

6.2 Performance Metrics and Validation Methods

We require some metrics to evaluate the performance of the ANN in terms of accuracy and speed.

For accuracy, we have introduced the MSE and MAE in Chapter 4.1.4. It is also common to use the coefficient of determination (R²) to quantify the performance of a regression model: it is a statistical measure that quantifies the proportion of the variance in the dependent variables that is explained by the independent variables. The formula is

\[
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_{i,\mathrm{actual}} - y_{i,\mathrm{predict}})^2}{\sum_{i=1}^{n} (y_{i,\mathrm{actual}} - \bar{y}_{\mathrm{actual}})^2} \tag{6.1}
\]


We also suggest a user-defined accuracy metric, the Maximum Absolute Error, defined as

\[
\text{Max Abs Error} = \max_i \left( |y_{i,\mathrm{actual}} - y_{i,\mathrm{predict}}| \right) \tag{6.2}
\]

This metric provides a measure of how large the error could be in the worst-case scenario. It is especially important from a risk management perspective.
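In Keras, equation (6.2) can be supplied as a user-defined metric; a minimal sketch (illustrative):

    import tensorflow.keras.backend as K

    def max_abs_error(y_true, y_pred):
        # equation (6.2): the worst-case absolute prediction error
        return K.max(K.abs(y_true - y_pred))

    # passed later to model.compile(..., metrics=["mae", max_abs_error])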

As for the run-time performance of the ANN, we simply use the Python built-in timer function to quantify this property.

In Chapter 5.2.3 we explained the data partitioning process. The motivation for partitioning the data is to do K-fold cross validation. Cross validation is a statistical technique to assess how well a predictive model generalises to an independent data set, and it is a good tool for identifying over-fitting.

As explained earlier, the process involves training and testing different permutations of the partitioned data sets multiple times. The indication of a well-generalised model is that the error magnitudes in each round are consistent with each other and with their average. We select K = 5 for our applications.


6.3 Implementation of ANN in Keras

In this section we outline the standard ANN computation process using Keras.

1. Construct the ANN model by specifying the number of input neurons, number of hidden neurons, number of hidden layers, number of output neurons and the activation functions for each layer. Print the model summary to check for errors.

2. Configure the settings of the model using the compile function. We select the MSE as the loss function and specify the MAE, R² and our user-defined maximum absolute error as the metrics. For all of our applications we choose Adam as the optimisation algorithm. To prevent over-fitting, we use the early stopping function, especially when the number of epochs is high. It is important to note that the validation loss (not the training loss) should be used as the early stopping criterion; this means the training will stop early if there is no reduction in validation loss.

3. Start training the ANN model. It is important to specify the validation split argument here for splitting the training data into a train set and a validation set. Note that the test set should not be used as the validation set; it should be reserved for evaluating the ANN's predictions after training.

4. When the training is complete, generate the plot of the MSE (loss function) versus the number of epochs to visualise the training process. The MSE should decline gradually and then level off as the number of epochs increases. See Chapter 4.3 for potential training issues and remedies.

5. Apply 5-fold cross validation to check for over-fitting. Evaluate the trained model using the evaluate function.

6. Generate predictions. Visualise the predictions by plotting predicted values versus actual values on the same graph.

See Appendix C for a full Python code example; a condensed sketch of steps 2-6 follows.
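The sketch below is illustrative and reuses the model and data split from the earlier sketches (it assumes those names are in scope):

    from tensorflow.keras.callbacks import EarlyStopping

    early_stop = EarlyStopping(monitor="val_loss", patience=25,
                               restore_best_weights=True)
    history = model.fit(X_train, y_train, batch_size=200, epochs=30,
                        validation_split=0.15, callbacks=[early_stop])  # steps 2-3
    # step 4: history.history["loss"] / ["val_loss"] can be plotted vs epochs
    test_metrics = model.evaluate(X_test, y_test)                       # step 5
    preds = model.predict(X_test)                                       # step 6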


6.4 Summary of Neural Networks Architecture

TABLE 6.1 Summary of Neural Networks Architecture for Each Application

ANN Model                             Application                                Input Layer   Hidden Layers           Output Layer   Batch Size   Epochs
Heston Option Pricing ANN             Heston Call Option Pricing                 10 neurons    3 layers × 50 neurons   1 neuron       200          30
Heston Implied Volatility ANN         Heston Implied Volatility Surface          6 neurons     3 layers × 80 neurons   100 neurons    20           200
Rough Heston Option Pricing ANN       Rough Heston Call Option Pricing           9 neurons     3 layers × 50 neurons   1 neuron       200          30
Rough Heston Implied Volatility ANN   Rough Heston Implied Volatility Surface    7 neurons     3 layers × 80 neurons   100 neurons    20           200


Chapter 7

Results and Discussions

In this chapter we present our results and discuss our findings

7.1 Heston Option Pricing ANN Results

Figure 7.1 shows the learning curves of the Heston Option Pricing ANN; the loss function used here is the MSE. As the ANN is trained, both the training loss and validation loss curves decline until they converge. It took fewer than 30 epochs for the ANN to reach a satisfactory level of accuracy. There is only a small gap between the training loss and validation loss, which indicates the ANN model is well trained.

FIGURE 7.1 Plot of MSE Loss Function vs No. of Epochs for Heston Option Pricing ANN

Table 7.1 summarises the error metrics of the Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 2.00 × 10⁻⁶, 9.58 × 10⁻⁴, 6.50 × 10⁻³ and 0.999 respectively. The low error metrics and high R² value imply the ANN model generalises well and gives highly accurate predictions for option prices under the Heston model. Figure 7.2 attests to its high accuracy: almost all of the predictions lie on the perfect-predictions line (red line).

TABLE 7.1 Error Metrics of Heston Option Pricing ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
2.00 × 10⁻⁶   9.58 × 10⁻⁴   6.50 × 10⁻³     0.999

FIGURE 7.2 Plot of Predicted vs Actual values for Heston Option Pricing ANN

TABLE 7.2 5-fold Cross Validation Results of Heston Option Pricing ANN

          MSE           MAE           Max Abs Error
Fold 1    5.78 × 10⁻⁷   5.45 × 10⁻⁴   2.04 × 10⁻³
Fold 2    9.90 × 10⁻⁷   7.69 × 10⁻⁴   2.40 × 10⁻³
Fold 3    7.15 × 10⁻⁷   6.21 × 10⁻⁴   2.20 × 10⁻³
Fold 4    1.24 × 10⁻⁶   8.67 × 10⁻⁴   2.53 × 10⁻³
Fold 5    6.54 × 10⁻⁷   5.79 × 10⁻⁴   2.21 × 10⁻³
Average   8.36 × 10⁻⁷   6.76 × 10⁻⁴   2.27 × 10⁻³

Table 7.2 shows the results from 5-fold cross validation, which again confirm that the ANN model gives accurate predictions and generalises well to unseen data. The values of each fold are consistent with each other and with their averages.


7.2 Heston Implied Volatility ANN Results

Figure 7.3 shows the learning curves of the Heston Implied Volatility ANN; again, we use the MSE as the loss function. We can see that the validation loss curve experiences some fluctuations during training. We experimented with various ANN setups, and this is the least fluctuating learning curve we could achieve. Despite the fluctuations, the overall value of the validation loss declines to a satisfactorily low level, and the gap between validation loss and training loss is negligible. Due to the comparatively more complex ANN architecture, it took about 200 epochs for the ANN to reach a satisfactory level of accuracy.

FIGURE 7.3 Plot of MSE Loss Function vs No. of Epochs for Heston Implied Volatility ANN

Table 7.3 summarises the error metrics of the Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 6.24 × 10⁻⁴, 1.27 × 10⁻², 3.71 × 10⁻¹ and 0.999 respectively. The low error metrics and high R² value imply the ANN model generalises well and gives accurate predictions for the implied volatility surface under the Heston model. Figure 7.4 shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.3 Accuracy Metrics of Heston Implied Volatility ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
6.24 × 10⁻⁴   1.27 × 10⁻²   3.71 × 10⁻¹     0.999


FIGURE 7.4 Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.5 Mean Absolute Error of Full Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.5 shows the MAE values of the Heston implied volatility ANN predictions across the entire implied volatility surface. The largest MAE value is 0.0045, and a large area of the surface has MAE values lower than 0.0030. It is worth noting that the accuracy is relatively poor when the strike price is at the extremes and the maturity is small; however, the ANN gives good predictions overall for the whole surface.

The 5-fold cross validation results for the Heston implied volatility ANN model show consistent values; the results are summarised in Table 7.4. The consistent values indicate the ANN model is well-trained and is able to generalise to unseen test data.

TABLE 7.4 5-fold Cross Validation Results of Heston Implied Volatility ANN

          MSE           MAE           Max Abs Error
Fold 1    8.15 × 10⁻⁴   1.13 × 10⁻²   3.88 × 10⁻¹
Fold 2    4.74 × 10⁻⁴   1.08 × 10⁻²   3.13 × 10⁻¹
Fold 3    1.37 × 10⁻³   1.74 × 10⁻²   4.46 × 10⁻¹
Fold 4    3.38 × 10⁻⁴   8.61 × 10⁻³   2.19 × 10⁻¹
Fold 5    4.60 × 10⁻⁴   1.02 × 10⁻²   2.67 × 10⁻¹
Average   6.92 × 10⁻⁴   1.12 × 10⁻²   3.27 × 10⁻¹

7.3 Rough Heston Option Pricing ANN Results

Figure 7.6 shows the learning curves of the Rough Heston Option Pricing ANN; as before, the MSE is used as the loss function. As training progresses, both the training loss and validation loss curves decline until they converge. Since this is a relatively simple model, it took fewer than 30 epochs of training to reach a good level of accuracy. The small discrepancy between the training loss and validation loss indicates the ANN model is well-trained.

FIGURE 7.6 Plot of MSE Loss Function vs No. of Epochs for Rough Heston Option Pricing ANN

Table 7.5 summarises the error metrics of the Rough Heston Option Pricing ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 9.00 × 10⁻⁶, 2.28 × 10⁻³, 1.76 × 10⁻¹ and 0.999 respectively. These metrics imply the ANN model generalises well and gives highly accurate predictions for option prices under the rough Heston model. Figure 7.7 attests to its high accuracy again: most of the predictions lie on the perfect-predictions line (red line).

TABLE 7.5 Accuracy Metrics of Rough Heston Option Pricing ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
9.00 × 10⁻⁶   2.28 × 10⁻³   1.76 × 10⁻¹     0.999


FIGURE 7.7 Plot of Predicted vs Actual values for Rough Heston Option Pricing ANN

TABLE 7.6 5-fold Cross Validation Results of Rough Heston Option Pricing ANN

          MSE           MAE           Max Abs Error
Fold 1    3.79 × 10⁻⁶   1.35 × 10⁻³   5.99 × 10⁻³
Fold 2    2.33 × 10⁻⁶   9.96 × 10⁻⁴   4.89 × 10⁻³
Fold 3    2.06 × 10⁻⁶   9.88 × 10⁻⁴   4.41 × 10⁻³
Fold 4    2.16 × 10⁻⁶   1.08 × 10⁻³   4.07 × 10⁻³
Fold 5    1.84 × 10⁻⁶   1.01 × 10⁻³   3.74 × 10⁻³
Average   2.44 × 10⁻⁶   1.08 × 10⁻³   4.62 × 10⁻³

The highly consistent values from the 5-fold cross validation again confirm that the ANN model is able to generalise to unseen data.

7.4 Rough Heston Implied Volatility ANN Results

Figure 7.8 shows the learning curves of the Rough Heston Implied Volatility ANN. We can see that the validation loss curve experiences some fluctuations during training; again, we selected the least fluctuating setup. Despite the fluctuations, the overall value of the validation loss declines to a satisfactorily low level, and the discrepancy between validation loss and training loss is negligible. Initially the number of training epochs was 200, but training stopped early at around 85 epochs, since the validation loss had not improved for 25 epochs. It achieves a good accuracy nonetheless.


FIGURE 7.8 Plot of MSE Loss Function vs No. of Epochs for Rough Heston Implied Volatility ANN

Table 7.7 summarises the error metrics of the Rough Heston Implied Volatility ANN model's predictions on unseen test data. The MSE, MAE, Max Abs Error and R² are 5.66 × 10⁻⁴, 1.08 × 10⁻², 3.49 × 10⁻¹ and 0.999 respectively. These metrics show the ANN model can produce accurate predictions on unseen data. This is again confirmed by Figure 7.9, which shows the ANN-predicted volatility smiles and the actual smiles for 10 different maturities; almost all of the predicted values are consistent with the actual values.

TABLE 7.7 Accuracy Metrics of Rough Heston Implied Volatility ANN Predictions on Unseen Data

MSE           MAE           Max Abs Error   R²
5.66 × 10⁻⁴   1.08 × 10⁻²   3.49 × 10⁻¹     0.999


FIGURE 7.9 Rough Heston Implied Volatility Smiles: ANN Predictions vs Actual Values


FIGURE 7.10 Mean Absolute Error of Full Rough Heston Implied Volatility Surface ANN Predictions on Unseen Data

Figure 7.10 shows the MAE values of the Rough Heston Implied Volatility ANN predictions across the entire implied volatility surface. The largest MAE value is below 0.0030, and a large area of the surface has MAE values lower than 0.0015. It is worth noting that the accuracy is relatively poor when the strike price and maturity values are small; however, the ANN gives good predictions overall for the whole surface. In fact, it is more accurate than the Heston implied volatility ANN model, despite having one extra input and a similar architecture. We think this is entirely due to the stochastic nature of the optimiser.

The 5-fold cross validation results for the rough Heston implied volatility ANN model show consistent values, just like the previous three cases; the results are summarised in Table 7.8. The consistency in values indicates the ANN model is well-trained and is able to generalise to unseen test data.

TABLE 7.8 5-fold Cross Validation Results of Rough Heston Implied Volatility ANN

          MSE           MAE           Max Abs Error
Fold 1    8.15 × 10⁻⁴   1.13 × 10⁻²   3.88 × 10⁻¹
Fold 2    4.74 × 10⁻⁴   1.08 × 10⁻²   3.13 × 10⁻¹
Fold 3    1.37 × 10⁻³   1.74 × 10⁻²   4.46 × 10⁻¹
Fold 4    3.38 × 10⁻⁴   8.61 × 10⁻³   2.19 × 10⁻¹
Fold 5    4.60 × 10⁻⁴   1.02 × 10⁻²   2.67 × 10⁻¹
Average   6.92 × 10⁻⁴   1.12 × 10⁻²   3.27 × 10⁻¹


7.5 Run-time Performance

In this section we present the training time and the execution speed of the ANN versus traditional methods. The speed tests are run on a machine with an Intel i5-7200U chip (up to 3.10 GHz) and 8 GB of RAM.

Table 7.9 summarises the training time required for each application. We can see that although the training time per epoch for the option pricing applications is higher than that for the implied volatility applications, the resulting total training time is similar for both types of applications. This is because the number of epochs for training the implied volatility ANN models is higher.

We also emphasise that the number of distinct rows of data available for implied volatility ANN training is smaller. Thus the training time is similar to that of option pricing ANN training, despite the more complex ANN architecture.

TABLE 7.9 Training Time

Applications                  Training Time per Epoch   Total Training Time
Heston Option Pricing         30 s                      900 s
Heston Implied Volatility     5 s                       1000 s
rHeston Option Pricing        30 s                      900 s
rHeston Implied Volatility    5 s                       1000 s

TABLE 7.10 Execution Time for Predictions using ANN and Traditional Methods

Applications                  Traditional Methods   ANN
Heston Option Pricing         8.84 ms               57.9 ms
Heston Implied Volatility     2.31 ms               43.7 ms
rHeston Option Pricing        6.52 ms               52.6 ms
rHeston Implied Volatility    2.87 ms               48.3 ms

Table 7.10 summarises the execution time for generating predictions using the ANN and traditional methods. Before the experiment we expected the ANN to generate predictions faster. However, the results show that the ANN is actually up to 7 times slower than the traditional methods for option pricing (both Heston and rough Heston), and up to 18 times slower than the traditional methods for implied volatility surface approximation (both Heston and rough Heston). This shows the traditional approximation methods are more efficient.


Chapter 8

Conclusions and Outlook

To summarise, our results show that an ANN is able to approximate option prices and implied volatilities under the Heston and rough Heston models to a high degree of accuracy. However, the efficiency of the ANN is not as good as that of the traditional methods; hence we conclude that such an approach may be considered an ineffective alternative. There are more efficient methods, such as the numerical integration of the FFT, for option price computation, and the traditional method we used for approximating implied volatility does not even require any numerical schemes, hence its efficiency.

As for future projects, it would be interesting to investigate the proven rough Bergomi model and applications with exotic options such as lookback and digital options. Moreover, it would also be worthwhile to apply gradient-free optimisers for ANN training; some interesting examples include differential evolution, dual annealing and simplicial homology global optimisers.


Appendix A

pybind11: A step-by-step tutorial with examples

This tutorial assumes basic knowledge of C++ and Python

Sources: Create a C++ extension for Python; Using the C++ Eigen library to calculate matrix inverse and determinant; pybind11 Documentation; Eigen: The Matrix Class.

A.1 Software Prerequisites

• Visual Studio 2017 or later, with both the Desktop Development with C++ and Python Development workloads installed with default options.

• In the Python Development workload, also select the box on the right for Python native development tools. This option sets up most of the configuration required. (This option also includes the C++ workload automatically.)

Source: Create a C++ extension for Python


A.2 Create a Python Project

1. To create a Python project in Visual Studio, select File > New > Project. Search for Python, select the Python Application template, give it a suitable name and location, and select OK.

2. pybind11 requires that you use a 32-bit Python interpreter (Python 3.6 or above recommended). In the Solution Explorer window of Visual Studio, expand the project node, then expand the Python Environments node. If you don't see a 32-bit environment as the default (either in bold or labeled with "global default"), then right-click the Python Environments node and select Add Environment, or select Add Environment from the environment drop-down in the Python toolbar.

Once in the Add Environment dialog box, select the Existing environment tab, then select the 32-bit interpreter from the Environment drop-down list.

If you don't have a 32-bit interpreter installed, you can install standard Python interpreters from the Add Environment dialog: select the Add Environment command in the Python Environments window or the Python toolbar, select the Python installation tab, indicate which interpreters to install, and select Install.


If you already added an environment other than the global default to a project, you may need to activate a newly added environment: right-click that environment under the Python Environments node and select Activate Environment. To remove an environment from the project, select Remove.


A.3 Create a C++ Project

1. A Visual Studio solution can contain both Python and C++ projects together. To do so, right-click the solution in Solution Explorer and select Add > New Project.

2. Search on "C++", select Empty project, specify the name "test", and select OK.

3. Create a C++ file in the new project by right-clicking the Source Files node, then select Add > New Item, select C++ File, name it main.cpp, and select OK.

4. Right-click the C++ project in Solution Explorer and select Properties.

5. At the top of the Property Pages dialog that appears, set Configuration to All Configurations and Platform to Win32.


6. Set the specific properties as described in the following table, then select OK.

7. Right-click the C++ project and select Build to test your configurations (both Debug and Release). The .pyd files are located in the solution folder under Debug and Release, not the C++ project folder itself.

8. Add the following code to the C++ project's main.cpp file:

    #include <iostream>

    int main() {
        std::cout << "Hello World from C++" << std::endl;
        return 0;
    }

9. Build the C++ project again to confirm the code is working.

A.4 Convert the C++ Project to an Extension for Python

To make the C++ DLL into an extension for Python, first modify the exported methods to interact with Python types, then add a function that exports the module (the main function).

1. Install pybind11 using pip:

    pip install pybind11

or

    py -m pip install pybind11

2. At the top of main.cpp, include pybind11.h:

    #include <pybind11/pybind11.h>


3. At the bottom of main.cpp, use the PYBIND11_MODULE macro to define the entry point to the C++ function:

    namespace py = pybind11;

    PYBIND11_MODULE(test, m) {
        m.def("main_func", &main, R"pbdoc(
            pybind11 simple example
        )pbdoc");

    #ifdef VERSION_INFO
        m.attr("__version__") = VERSION_INFO;
    #else
        m.attr("__version__") = "dev";
    #endif
    }

4. Set the target configuration to Release and build the C++ project to verify your code.

The C++ module may fail to compile for the following reasons:

• Unable to locate Python.h (E1696: cannot open source file "Python.h" and/or C1083: Cannot open include file: "Python.h": No such file or directory): verify that the path in C/C++ > General > Additional Include Directories in the project properties points to your Python installation's include folder. See the table in Step 6.

• Unable to locate Python libraries: verify that the path in Linker > General > Additional Library Directories in the project properties points to the Python installation's libs folder. See the table in Step 6.

• Linker errors related to target architecture: change the C++ target's project architecture to match that of your Python installation. For example, if you're targeting x64 with the C++ project but your Python installation is x86, change the C++ project to target x86.

A.5 Make the DLL Available to Python

1. In Solution Explorer, right-click the References node in the Python project, and then select Add Reference. In the dialog that appears, select the Projects tab, select the test project, and then select OK.


2. Run the Visual Studio installer, select Modify, then select Individual Components > Compilers, build tools, and runtimes > Visual C++ 2015.3 v140 toolset. This step is necessary because Python (for Windows) is itself built with Visual Studio 2015 (version 14.0) and expects that those tools are available when building an extension through the method described here. (Note that you may need to install a 32-bit version of Python and target the DLL to Win32, not x64.)


3. Create a file named setup.py in the C++ project by right-clicking the project and selecting Add > New Item. Then select C++ File (.cpp), name the file setup.py, and select OK (naming the file with the .py extension makes Visual Studio recognize it as Python despite using the C++ file template).

Then paste the following code into the setup.py file:

    import os, sys
    from distutils.core import setup, Extension
    from distutils import sysconfig

    cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

    sfc_module = Extension(
        'pybind11_test',
        sources=['main.cpp'],
        # add pybind11 folder and Python include folder
        include_dirs=[r'pybind11\include',
                      r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include'],
        language='c++',
        extra_compile_args=cpp_args,
    )

    setup(
        name='pybind11_test',
        version='1.0',
        description='Simple pybind11 example',
        license='MIT',
        ext_modules=[sfc_module],
    )

4. The setup.py code instructs Python to build the extension using the Visual Studio 2015 C++ toolset when used from the command line.

Open an elevated command prompt, navigate to the folder that contains setup.py, and enter the following command:

    python setup.py install

A.6 Call the DLL from Python

After the DLL is made available to Python, we can now call test.main_func() from Python code.

1. Add the following lines in your .py file to call the methods exported from the DLL:


import test

if __name__ == "__main__":
    test.main_func()

2. Run the Python program (Debug > Start without Debugging). The output should show the text printed by main_func().

A.7 Example: Store C++ Eigen Matrix as NumPy Arrays

This example requires the use of Eigen, a C++ template library. Due to its popularity, pybind11 provides transparent conversion and limited mapping support between Eigen and scientific Python linear algebra data types.

1. If you have not already, visit http://eigen.tuxfamily.org/index.php?title=Main_Page and click on "zip" to download the zip file.

Once the download is complete, decompress the zip file in a suitable location and note down the path of the folder that contains the Eigen sources.


2. In the Solution Explorer, right-click the C++ project and select Properties. Navigate to the C/C++ > General tab, add the Eigen path to the Additional Include Directories property, then select OK.

3. Right-click the C++ project and select Build to test your configurations (both Debug and Release).

4. Replace the code in the main.cpp file with the following code:

#include <iostream>

// pybind11
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>

// Eigen
#include <Eigen/Dense>

namespace py = pybind11;

Eigen::Matrix3f example_mat(const int a, const int b, const int c) {
    Eigen::Matrix3f mat;
    mat << 5 * a, 1 * b, 2 * c,
           2 * a, 4 * b, 2 * c,
           1 * a, 3 * b, 3 * c;
    return mat;
}

Eigen::MatrixXd inv(Eigen::MatrixXd xs) {
    return xs.inverse();
}

double det(Eigen::MatrixXd xs) {
    return xs.determinant();
}

PYBIND11_MODULE(test, m) {
    m.def("example_mat", &example_mat);
    m.def("inv", &inv);
    m.def("det", &det);

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

Note that Eigen types are automatically converted to NumPy arrays simply by including the pybind11/eigen.h header (see the include directives above). A C++ function that has an Eigen return type can therefore be used directly as a NumPy data type in Python. Build the C++ project to check that the code is working correctly.
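As a quick sanity check of this two-way conversion (a minimal sketch, assuming the module built in this section is importable as test), a NumPy array can be passed straight into the C++ inv function and the result compared against NumPy's own inverse:

import numpy as np
import test

# NumPy -> Eigen on the way in, Eigen -> NumPy on the way out
m = np.array([[4.0, 7.0], [2.0, 6.0]])
assert np.allclose(test.inv(m), np.linalg.inv(m))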

5. Include the Eigen library's directory. Replace the code in setup.py with the following code:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'pybind11_test',
    sources=['main.cpp'],
    # add pybind11 folder, Python include folder and Eigen folder
    include_dirs=[r'pybind11/include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                  r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='pybind11_test',
    version='1.0',
    description='Simple pybind11 example',
    license='MIT',
    ext_modules=[sfc_module],
)

As before, in an elevated command prompt, navigate to the folder that contains setup.py and enter the following command:

python setup.py install

6. Then paste the following code in the Python file:

import test
import numpy as np

if __name__ == "__main__":
    A = test.example_mat(1, 3, 1)
    print("Matrix A is\n", A)
    print("The inverse of A is\n", test.inv(A))
    print("The determinant of A is", test.det(A))


The output should display the matrix A, its inverse, and its determinant.

A.8 Example: Store Data Generated by Analytical Heston Model as NumPy Arrays

Only the relevant parts of the code are displayed here. To request the code for the entire project, contact the author at cko22@bath.edu.

This example demonstrates how to use pybind11 for:

• a C++ project with multiple files;

• storing a C++ Eigen matrix as NumPy arrays.

1. For a multiple-file C++ project, we need to list all the source files in setup.py. This example also uses the Eigen library, so its directory is included as well. Paste the following code in setup.py:

import os, sys
from distutils.core import setup, Extension
from distutils import sysconfig

cpp_args = ['-std=c++11', '-stdlib=libc++', '-mmacosx-version-min=10.7']

sfc_module = Extension(
    'Data_Generation',
    sources=[r'module.cpp', r'Data.cpp', r'Heston.cpp',
             r'IntegrationScheme.cpp', r'IntegratorImp.cpp',
             r'NumIntegrator.cpp', r'pdetfunction.cpp',
             r'pfunction.cpp', r'Range.cpp'],
    include_dirs=[r'pybind11/include',
                  r'C:\Users\Owner\AppData\Local\Programs\Python\Python37-32\include',
                  r'C:\Users\Owner\source\repos\eigen-3.3.7\eigen-3.3.7'],
    language='c++',
    extra_compile_args=cpp_args,
)

setup(
    name='Data_Generation',
    version='1.0',
    description='Python package with Data_Generation C++ extension (PyBind11)',
    license='MIT',
    ext_modules=[sfc_module],
)

Then, in a command prompt, go to the folder that contains setup.py and enter the following command:

python setup.py install

2. The code in module.cpp is:

// For analytic solution in case of Heston model
#include "Heston.hpp"
#include <ctime>
#include "Data.hpp"
#include <future>

// Inputting and Outputting
#include <iostream>
#include <vector>

// pybind11
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>
#include <pybind11/stl_bind.h>

// Eigen (the Eigen directory is supplied via include_dirs in setup.py)
#include <Eigen/Dense>

namespace py = pybind11;
using namespace Eigen;

struct CPPData {
    // Input Data
    std::vector<double> spot_price;
    std::vector<double> strike_price;
    std::vector<double> risk_free_rate;
    std::vector<double> dividend_yield;
    std::vector<double> initial_vol;
    std::vector<double> maturity_time;
    std::vector<double> long_term_vol;
    std::vector<double> mean_reversion_rate;
    std::vector<double> vol_vol;
    std::vector<double> price_vol_corr;

    // Output Data
    std::vector<double> option_price;
};

// ... do something ...

Eigen::MatrixXd module() {
    CPPData cppdata;
    simulation(cppdata);  // fills cppdata; defined elsewhere in the project

    // Convert each std::vector to an Eigen vector first
    // Input Data
    Map<Eigen::VectorXd> eigen_spot_price(cppdata.spot_price.data(), cppdata.spot_price.size());
    Map<Eigen::VectorXd> eigen_strike_price(cppdata.strike_price.data(), cppdata.strike_price.size());
    Map<Eigen::VectorXd> eigen_risk_free_rate(cppdata.risk_free_rate.data(), cppdata.risk_free_rate.size());
    Map<Eigen::VectorXd> eigen_dividend_yield(cppdata.dividend_yield.data(), cppdata.dividend_yield.size());
    Map<Eigen::VectorXd> eigen_initial_vol(cppdata.initial_vol.data(), cppdata.initial_vol.size());
    Map<Eigen::VectorXd> eigen_maturity_time(cppdata.maturity_time.data(), cppdata.maturity_time.size());
    Map<Eigen::VectorXd> eigen_long_term_vol(cppdata.long_term_vol.data(), cppdata.long_term_vol.size());
    Map<Eigen::VectorXd> eigen_mean_reversion_rate(cppdata.mean_reversion_rate.data(), cppdata.mean_reversion_rate.size());
    Map<Eigen::VectorXd> eigen_vol_vol(cppdata.vol_vol.data(), cppdata.vol_vol.size());
    Map<Eigen::VectorXd> eigen_price_vol_corr(cppdata.price_vol_corr.data(), cppdata.price_vol_corr.size());
    // Output Data
    Map<Eigen::VectorXd> eigen_option_price(cppdata.option_price.data(), cppdata.option_price.size());

    const int cols = 11;        // No. of columns
    const int rows = SIZE * 4;  // SIZE is defined elsewhere in the project
    // Define a matrix that stores the training data to be used in Python
    MatrixXd training(rows, cols);
    // Initialise each column using the mapped vectors
    training.col(0) = eigen_spot_price;
    training.col(1) = eigen_strike_price;
    training.col(2) = eigen_risk_free_rate;
    training.col(3) = eigen_dividend_yield;
    training.col(4) = eigen_initial_vol;
    training.col(5) = eigen_maturity_time;
    training.col(6) = eigen_long_term_vol;
    training.col(7) = eigen_mean_reversion_rate;
    training.col(8) = eigen_vol_vol;
    training.col(9) = eigen_price_vol_corr;
    training.col(10) = eigen_option_price;

    std::cout << training << std::endl;

    return training;
}

PYBIND11_MODULE(Data_Generation, m) {
    m.def("module", &module);

#ifdef VERSION_INFO
    m.attr("__version__") = VERSION_INFO;
#else
    m.attr("__version__") = "dev";
#endif
}

Note that the module function's return type is an Eigen matrix. This can then be used directly as a NumPy data type in Python.
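Because the return value arrives in Python as a plain NumPy array, it can be handed straight to pandas for storage. The following is a minimal sketch of that hand-off (the column names mirror the matrix layout above; the table name and connection string are illustrative assumptions, not the project's actual code):

import pandas as pd
import sqlalchemy as db
import Data_Generation as dg

cols = ["spot_price", "strike_price", "risk_free_rate", "dividend_yield",
        "initial_vol", "maturity_time", "long_term_vol",
        "mean_reversion_rate", "vol_vol", "price_vol_corr", "option_price"]

df = pd.DataFrame(dg.module(), columns=cols)  # NumPy array -> DataFrame
engine = db.create_engine("mysql+mysqlconnector://user:password@localhost/heston_db")
df.to_sql("classic_heston", engine, if_exists="append", index=False)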

3. The C++ function can then be called from Python as shown:

import Data_Generation as dg
import mysql.connector
import pandas as pd
import sqlalchemy as db
import time

# This script calls the analytical Heston option pricer in C++ to simulate
# 5,000,000 option prices and store them in a MySQL database.

# ... do something ...

if __name__ == "__main__":
    t0 = time.time()

    # A new table (optional)
    SQL_clean_table()

    target_size = 5000000   # Final table size
    batch_size = 500000     # Size generated each iteration

    # Get the number of rows already in the table
    r = SQL_setup(batch_size)

    # Get the number of iterations
    iter = int(target_size / batch_size) - int(r / batch_size)
    print("Iteration(s) required: ", iter)
    count = 0

    print("Now Run C++ code ...")
    for i in range(iter):
        count = count + 1
        print("----------------------------------------------------")
        print("Current iteration: ", count)
        cpp = dg.module()   # store data from C++

        # ... do something ...

        r1 = SQL_setup(batch_size)
        print("No. of rows in SQL Classic Heston Table Now: ", r1)
        print("\n")
    t1 = time.time()
    # Calculate total time for the entire program
    total = t1 - t0
    print("THE ENTIRE PROGRAM TAKES", round(total), "s TO COMPLETE")
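The SQL helper functions are among the parts elided above. Purely for illustration, a row-counting helper in the spirit of SQL_setup might look like the following sketch (the connection string and table name are hypothetical; the author's actual implementation is not shown):

import sqlalchemy as db

def SQL_setup(batch_size):
    # Hypothetical sketch: report how many rows the results table already
    # holds so the driver can work out how many batches remain.
    engine = db.create_engine(
        "mysql+mysqlconnector://user:password@localhost/heston_db")
    with engine.connect() as conn:
        return conn.execute(db.text("SELECT COUNT(*) FROM classic_heston")).scalar()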


Appendix B

Asymptotic Expansions for General Stochastic Volatility Models

B.1 Asymptotic Expansions

We present the asymptotic expansions for general stochastic volatility models by Lorig et al. (2019). Consider a strictly positive spot price $S$ whose risk-neutral dynamics are given by

\begin{align*}
S &= e^X, \tag{B.1}\\
dX &= -\tfrac{1}{2}\sigma^2(t, X, Y)\,dt + \sigma(t, X, Y)\,dW_1, \quad X(0) = x \in \mathbb{R}, \tag{B.2}\\
dY &= f(t, X, Y)\,dt + \beta(t, X, Y)\,dW_2, \quad Y(0) = y \in \mathbb{R}, \tag{B.3}\\
d\langle W_1, W_2 \rangle &= \rho(t, X, Y)\,dt, \quad |\rho| < 1. \tag{B.4}
\end{align*}

The drift of $X$ is selected to be $-\frac{1}{2}\sigma^2$ so that $S = e^X$ is a martingale.
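To see why this drift makes $S$ a martingale (a one-line check added here for completeness, not part of the original derivation), apply Itô's formula to $S = e^X$:
\[
dS = e^X\left(dX + \tfrac{1}{2}\,d\langle X \rangle\right)
   = e^X\left(-\tfrac{1}{2}\sigma^2\,dt + \sigma\,dW_1 + \tfrac{1}{2}\sigma^2\,dt\right)
   = S\,\sigma(t, X, Y)\,dW_1,
\]
so the $dt$ terms cancel and $S$ has no drift.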

Let $C(t)$ be the time-$t$ value of a European call option maturing at time $T > t$ with payoff function $P(X(T))$. To value a European-style option using risk-neutral pricing, we must compute functions of the form

\[
u(t, x, y) = \mathbb{E}\left[ P(X(T)) \mid X(t) = x,\; Y(t) = y \right].
\]

It is well known that the function $u$ satisfies the Kolmogorov backward equation (Lorig et al., 2019)

\begin{equation*}
\left(\partial_t + \mathcal{A}(t)\right) u(t, x, y) = 0, \qquad u(T, x, y) = P(x), \tag{B.5}
\end{equation*}

where the operator $\mathcal{A}(t)$ is given explicitly by

\begin{equation*}
\mathcal{A}(t) = a(t, x, y)\left(\partial_x^2 - \partial_x\right) + f(t, x, y)\,\partial_y + b(t, x, y)\,\partial_y^2 + c(t, x, y)\,\partial_x \partial_y, \tag{B.6}
\end{equation*}

and where the functions $a$, $b$, and $c$ are defined as

\begin{align*}
a(t, x, y) &= \tfrac{1}{2}\sigma^2(t, x, y),\\
b(t, x, y) &= \tfrac{1}{2}\beta^2(t, x, y),\\
c(t, x, y) &= \rho(t, x, y)\,\sigma(t, x, y)\,\beta(t, x, y).
\end{align*}
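As a concrete instance (our addition for orientation; the expansion itself is from Lorig et al. (2019)), the Heston model fits this framework with $Y$ the instantaneous variance:
\begin{align*}
\sigma(t, x, y) &= \sqrt{y}, & f(t, x, y) &= \kappa(\theta - y), & \beta(t, x, y) &= \xi\sqrt{y},\\
a(t, x, y) &= \tfrac{1}{2}y, & b(t, x, y) &= \tfrac{1}{2}\xi^2 y, & c(t, x, y) &= \rho\,\xi\,y,
\end{align*}
with mean-reversion rate $\kappa$, long-run variance $\theta$, vol-of-vol $\xi$, and constant correlation $\rho$.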


Appendix C

Deep Neural Networks Library in Python: Keras

We show the code snippet for computing the ANN in Keras: the Python code for defining the ANN architecture for the rough Heston implied volatility approximation.

# Define the neural network architecture
import keras
from keras import backend as K
keras.backend.set_floatx('float64')
from keras.callbacks import EarlyStopping

# Construct the model (n is the number of input features)
input_layer = keras.layers.Input(shape=(n,))

hidden_layer1 = keras.layers.Dense(80, activation='elu')(input_layer)
hidden_layer2 = keras.layers.Dense(80, activation='elu')(hidden_layer1)
hidden_layer3 = keras.layers.Dense(80, activation='elu')(hidden_layer2)

output_layer = keras.layers.Dense(100, activation='linear')(hidden_layer3)

model = keras.models.Model(inputs=input_layer, outputs=output_layer)

# Summarise layers
model.summary()

# User-defined metrics: R2 and Max Abs Error
# R2
def coeff_determination(y_true, y_pred):
    SS_res = K.sum(K.square(y_true - y_pred))
    SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return 1 - SS_res / (SS_tot + K.epsilon())

# Max Error
def max_error(y_true, y_pred):
    return K.max(K.abs(y_true - y_pred))

earlystop = EarlyStopping(monitor='val_loss',
                          min_delta=0,
                          mode='min',
                          verbose=1,
                          patience=25)

# Prepare the model for training (pass opt so the 0.005 learning rate is used)
opt = keras.optimizers.Adam(learning_rate=0.005)
model.compile(loss='mse', optimizer=opt,
              metrics=['mae', coeff_determination, max_error])
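The EarlyStopping callback defined above is only wired in at training time. A minimal training call might look like the following sketch (X_train, y_train, X_val, y_val, the epoch limit, and the batch size are placeholders; the actual training settings are described in the main text):

history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=500,
                    batch_size=1024,
                    callbacks=[earlystop],
                    verbose=1)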


Bibliography

Alòs, Elisa et al. (June 2017). "Exponentiation of Conditional Expectations Under Stochastic Volatility". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2983180.

Andersen, Leif et al. (Jan. 2007). "Moment Explosions in Stochastic Volatility Models". In: Finance and Stochastics 11, pp. 29–50. DOI: 10.1007/s00780-006-0011-7.

Atkinson, Colin et al. (Jan. 2011). "Rational Solutions for the Time-Fractional Diffusion Equation". URL: https://www.researchgate.net/publication/220222966_Rational_Solutions_for_the_Time-Fractional_Diffusion_Equation.

Bayer, Christian et al. (Oct. 2018). "Deep calibration of rough stochastic volatility models". URL: https://arxiv.org/abs/1810.03399.

Black, Fischer et al. (May 1973). "The Pricing of Options and Corporate Liabilities". URL: https://www.cs.princeton.edu/courses/archive/fall09/cos323/papers/black_scholes73.pdf.

Carr, Peter and Dilip B. Madan (Mar. 1999). "Option valuation using the fast Fourier transform". In: Journal of Computational Finance. URL: https://www.researchgate.net/publication/2519144_Option_Valuation_Using_the_Fast_Fourier_Transform.

Chapter 2 Modeling Process: k-fold cross validation. https://bradleyboehmke.github.io/HOML/process.html. Accessed: 2020-08-22.

Comte, Fabienne et al. (Oct. 1998). "Long Memory In Continuous-Time Stochastic Volatility Models". In: Mathematical Finance 8, pp. 291–323. URL: https://zulfahmed.files.wordpress.com/2015/10/comterenault19981.pdf.

Cox, John C. et al. (Jan. 1985). "A Theory of the Term Structure of Interest Rates". URL: http://pages.stern.nyu.edu/~dbackus/BCZ/discrete_time/CIR_Econometrica_85.pdf.

Create a C++ extension for Python. https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019. Accessed: 2020-07-21.

Duffy, Daniel J. (Jan. 2018). Financial Instrument Pricing Using C++, Second Edition. John Wiley & Sons. ISBN: 978-0-470-97119-2.

Duffy, Daniel J. et al. (2012). Monte Carlo Frameworks: Building customisable high-performance C++ applications. John Wiley & Sons, Inc. ISBN: 9780470060698.

Dupire, Bruno (1994). "Pricing with a Smile". In: Risk Magazine, pp. 18–20.

Eigen: The Matrix Class. https://eigen.tuxfamily.org/dox/group__TutorialMatrixClass.html. Accessed: 2020-07-22.

Eldan, Ronen et al. (May 2016). "The Power of Depth for Feedforward Neural Networks". URL: https://arxiv.org/abs/1512.03965.

Euch, Omar El et al. (Sept. 2016). "The characteristic function of rough Heston models". URL: https://arxiv.org/pdf/1609.02108.pdf.

— (Mar. 2019). "Roughening Heston". URL: https://ssrn.com/abstract=3116887.

Fouque, Jean-Pierre et al. (Aug. 2012). "Second Order Multiscale Stochastic Volatility Asymptotics: Stochastic Terminal Layer Analysis & Calibration". URL: https://arxiv.org/abs/1208.5802.

Gatheral, Jim (2006). The volatility surface: a practitioner's guide. John Wiley & Sons, Inc. ISBN: 978-0-471-79251-2.

Gatheral, Jim et al. (Oct. 2014). "Volatility is rough". URL: https://arxiv.org/abs/1410.3394.

— (Jan. 2019). "Rational approximation of the rough Heston solution". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3191578.

Hebb, Donald O. (1949). The Organization of Behavior: A Neuropsychological Theory. Psychology Press; 1st edition (12 June 2002). ISBN: 978-0805843002.

Heston, Steven (Jan. 1993). "A Closed-Form Solution for Options with Stochastic Volatility with Applications to Bond and Currency Options". URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.139.3204&rep=rep1&type=pdf.

Hinton, Geoffrey et al. (Feb. 2015). "Overview of mini-batch gradient descent". URL: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf.

Hornik et al. (Mar. 1989). "Multilayer feedforward networks are universal approximators". URL: https://deeplearning.cs.cmu.edu/F20/document/readings/Hornik_Stinchcombe_White.pdf.

— (Jan. 1990). "Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks". URL: http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/univ_approx.pdf.

Horvath, Blanka et al. (Jan. 2019). "Deep Learning Volatility". URL: https://ssrn.com/abstract=3322085.

Hurst, Harold E. (Jan. 1951). "Long-term storage of reservoirs: an experimental study". URL: https://www.jstor.org/stable/2982267#metadata_info_tab_contents.

Hutchinson, James M. et al. (July 1994). "A Nonparametric Approach to Pricing and Hedging Derivative Securities Via Learning Networks". URL: https://alo.mit.edu/wp-content/uploads/2015/06/A-Nonparametric-Approach-to-Pricing-and-Hedging-Derivative-Securities-via-Learning-Networks.pdf.

Ioffe, Sergey et al. (Feb. 2015). "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". URL: https://arxiv.org/abs/1502.03167.

Itkin, Andrey (Jan. 2010). "Pricing options with VG model using FFT". URL: https://arxiv.org/abs/physics/0503137.

Kahl, Christian et al. (Jan. 2006). "Not-so-complex Logarithms in the Heston model". URL: http://www2.math.uni-wuppertal.de/~kahl/publications/NotSoComplexLogarithmsInTheHestonModel.pdf.

Kingma, Diederik P. et al. (Feb. 2015). "Adam: A Method for Stochastic Optimization". URL: https://arxiv.org/abs/1412.6980.

Kwok, Yue-Kuen et al. (2012). Handbook of Computational Finance, Chapter 21. Springer-Verlag Berlin Heidelberg. ISBN: 978-3-642-17254-0.

Lewis, Alan (Sept. 2001). "A Simple Option Formula for General Jump-Diffusion and Other Exponential Levy Processes". URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=282110.

Liu, Shuaiqiang et al. (Apr. 2019a). "A neural network-based framework for financial model calibration". URL: https://arxiv.org/abs/1904.10523.

— (Apr. 2019b). "Pricing options and computing implied volatilities using neural networks". URL: https://arxiv.org/abs/1901.08943.

Lorig, Matthew et al. (Jan. 2019). "Explicit implied vols for multifactor local-stochastic vol models". URL: https://arxiv.org/abs/1306.5447v3.

Mandara, Dalvir (Sept. 2019). "Artificial Neural Networks for Black-Scholes Option Pricing and Prediction of Implied Volatility for the SABR Stochastic Volatility Model". URL: https://www.datasim.nl/application/files/8115/7045/4929/1423101.pdf.

McCulloch, Warren et al. (May 1943). "A Logical Calculus of The Ideas Immanent in Nervous Activity". URL: http://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf.

Mostafa, F. et al. (May 2008). "A neural network approach to option pricing". URL: https://www.witpress.com/elibrary/wit-transactions-on-information-and-communication-technologies/41/18904.

Peters, Edgar E. (Jan. 1991). Chaos and Order in the Capital Markets: A New View of Cycles, Prices, and Market Volatility. John Wiley & Sons. ISBN: 978-0-471-13938-6.

— (Jan. 1994). Fractal Market Analysis: Applying Chaos Theory to Investment and Economics. John Wiley & Sons. ISBN: 978-0-471-58524-4. URL: https://www.amazon.co.uk/Fractal-Market-Analysis-Investment-Economics/dp/0471585246.

pybind11 Documentation. https://pybind11.readthedocs.io/en/stable/advanced/cast/eigen.html. Accessed: 2020-07-22.

Ruf, Johannes et al. (May 2020). "Neural networks for option pricing and hedging: a literature review". URL: https://arxiv.org/abs/1911.05620.

Schmelzle, Martin (Apr. 2010). "Option Pricing Formulae using Fourier Transform: Theory and Application". URL: https://pfadintegral.com/docs/Schmelzle%202010%20Fourier%20Pricing.pdf.

Shevchenko, Georgiy (Jan. 2015). "Fractional Brownian motion in a nutshell". In: International Journal of Modern Physics Conference Series 36. URL: https://www.worldscientific.com/doi/abs/10.1142/S2010194515600022.

Stone, Henry (July 2019). "Calibrating Rough Volatility Models: A Convolutional Neural Network Approach". URL: https://arxiv.org/abs/1812.05315.

Using the C++ eigen library to calculate matrix inverse and determinant. http://people.duke.edu/~ccc14/sta-663-2016/18G_C++_Python_pybind11.html#Using-the-C++-eigen-library-to-calculate-matrix-inverse-and-determinant. Accessed: 2020-07-22.

Wilmott, Paul (2006). Paul Wilmott on Quantitative Finance, Vol. 3. John Wiley & Sons, Inc., 2nd Edition. ISBN: 978-0-470-01870-5.
