Quantitative Finance, 2012, 1–14, iFirst
Quantitative Finance ISSN 1469–7688 print/ISSN 1469–7696 online © 2012 Taylor & Francis
http://www.tandfonline.com
http://dx.doi.org/10.1080/14697688.2012.712212

A new class of Bayesian semi-parametric models with applications to option pricing

MARCIN KACPERCZYK†, PAUL DAMIEN*‡ and STEPHEN G. WALKER§

†Finance Department, Stern School of Business, New York, NY 10012, USA
‡Department of Information, Risk, and Operations Management, McCombs School of Business, University of Texas in Austin, TX 78712, USA
§Institute of Mathematics, Statistics and Actuarial Science, University of Kent, Canterbury CT2 7NF, UK

*Corresponding author. Email: Paul.Damien@mccombs.utexas.edu

(Received 5 April 2011; in final form 3 July 2012)

This paper develops a new family of Bayesian semi-parametric models. A particular member of this family is used to model option prices with the aim of improving out-of-sample predictions. A detailed empirical analysis is made for European index call and put options to illustrate the ideas.

Keywords: Dirichlet process; Beta distribution; Scale mixtures; Options

1. Introduction

Reliable predictions lie at the heart of most investment decisions. But in economic applications, as is well documented, reliable predictions are often difficult to obtain. In this context, stochastic volatility with jumps models lead to useful forecasting models; however, their parametric nature exposes them to some standard criticisms. Consider the following quote of Engle and Gonzalez (1991): ‘If we assume that the mean and variance equations are well specified but we do not know to which probability function they belong, then the ‘closest’ approximation to the true generating mechanism we can think of should come from the data itself. A non-parametric density responds to this concern’.

In this paper, a new class of Bayesian models, a Semi-parametric Scale Mixture of Betas (SSMB), is developed. The proposed setting allows one jointly to model skewness and kurtosis while accounting for the dynamics of the volatility in the time series. Modelling these components is practically important, given substantial empirical evidence that distributions of financial data tend to exhibit such features. For example, Campbell et al. (1997) document the existence of fat tails in monthly returns of the S&P500 index or skewness in the daily data of the same index.

As is well known, parametric model misspecification might produce inconsistent estimators and these models require a fully specified distribution for the error term. Bollerslev et al. (1992) and Bates (1996) provide summaries of the error specifications commonly used in financial data. Despite their well-served purpose, non-parametric approaches are not so well represented in the option pricing literature. Importantly, due to technical difficulties, and often different focus, none of the existing models comprehensively tackles all the issues relevant for efficient option valuation. Specifically, most of the models do not address jointly the presence of jumps and stochastic volatility in the data, i.e. there is no clear counterpart for the parametric Stochastic Volatility with Jumps models. Several studies offer valuable insights into the behaviour of underlying returns using stochastic volatility models with jumps (e.g. Bates 2000, Duffie et al. 2000, Eraker et al. 2003, Maheu and McCurdy 2004). Among the more popular models, Stutzer (1996) assumes i.i.d. structure on the data but does not introduce any dynamics into volatility. On the other hand, non-parametric studies by Derman and Kani (1994) and Rubinstein (1994) are less flexible in modelling skewness and excess kurtosis. Also, since some of the methods rely on both the return and option data they require the presence of a liquid option market (as examples, Ait-Sahalia and Lo 1998, Ait-Sahalia and Duarte 2003). Additionally, to achieve reasonable convergence properties the models need to rely on a long time series of data (Ait-Sahalia and Lo 1998); this, in turn, results in poor estimates of the tails of the distribution.

The modelling problem confronting us in this paper is: ‘How can one reliably price different options given that the cross-section of options is so large and the options are substantially different from each other?’ For example, some models might price in-the-money, long-term call options adequately, but may fail to do so with deep-out-of-the-money, short-term call options. Applying the SSMB model to pricing options leads to significant improvements in accuracy compared to other non-parametric and parametric models. On average, prediction errors are reduced by a factor of up to three for a broad range of option contracts. Overall, the significant reduction in pricing errors suggests that jointly modelling the variance, skewness, and kurtosis better describes the risk-neutral predictive distribution of options.

The rest of the paper proceeds as follows. In section 2, we describe the financial modelling aims of this paper. In section 3 a new family of semi-parametric Bayesian models is introduced. Section 4 discusses the construction and properties of the data. Section 5 provides out-of-sample empirical results for European index options written on the S&P500, using both non-parametric models and the parametric Black–Scholes model. Some conclusions are provided in section 6.

2. Modelling aims

The two major goals of this section are to: (a) describe the time series, while also motivating the reason to find predictive distributions for such data; (b) describe the risk-neutral evaluation method employed in pricing the options.

2.1. The time series

Let $S_t$ denote the value (price) of an asset at time $t$. Let $D_t$ be the dividend value paid by the same company over the period $t-1$ to $t$. From a statistical modelling perspective, it is better to consider the natural log of asset returns, denoted $r_t$, and defined as

\[ r_t = \ln\left(\frac{S_t - S_{t-1} + D_t}{S_{t-1}}\right). \]

It is well known that the value of a European call ($C$) and put option ($P$) can be obtained using the following identities:

\[ C = e^{-\rho\tau} E^{Q}(S_T - K)^{+}, \qquad P = e^{-\rho\tau} E^{Q}(K - S_T)^{+}, \tag{1} \]

where $E^{Q}$ stands for the expectation operator under the risk-neutral measure $Q$, $S_T$ is the price of the underlying asset at maturity $T$, $\rho$ is the annualized spot interest rate, $\tau$ is the time to maturity (in years), $K$ is the option strike price, and $s^{+} \equiv \max(s, 0)$.

In our setting $\rho$ is assumed to be fixed and known, hence $S_T$ is the only unknown random quantity in the above equations. Consequently, to calculate the option price we need to construct the predictive distribution of $S_T$. In our analysis, we will first estimate the predictive distribution of the log return on the underlying asset at time $t$, namely $r_t$, and then obtain the predictive distribution of the asset price at maturity ($S_T$) via the following recursive identity:

\[ S_t = S_{t-1} e^{r_t}, \tag{2} \]

where $t = 1, \ldots, T$. Note, by convention, $S_0$ is the observed value of the asset price at the time of making predictions. Thus, the data which we model are the log-returns of the underlying asset.
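The recursion in (2) and the payoffs inside the expectations in (1) can be sketched in a few lines of Python. This is only an illustration: the function names, the starting price and the three log returns are hypothetical, and a single path is shown, whereas (1) calls for an average over many draws taken under the risk-neutral measure.

```python
import math

def terminal_price(s0, log_returns):
    """Apply S_t = S_{t-1} * exp(r_t) for t = 1, ..., T and return S_T."""
    price = s0
    for r in log_returns:
        price *= math.exp(r)
    return price

def discounted_call_payoff(s_T, strike, rho, tau):
    """exp(-rho * tau) * (S_T - K)^+ for one simulated terminal price."""
    return math.exp(-rho * tau) * max(s_T - strike, 0.0)

def discounted_put_payoff(s_T, strike, rho, tau):
    """exp(-rho * tau) * (K - S_T)^+ for one simulated terminal price."""
    return math.exp(-rho * tau) * max(strike - s_T, 0.0)

# Hypothetical inputs for illustration only (not taken from the paper's data).
s0 = 100.0
simulated_log_returns = [0.01, -0.02, 0.015]
s_T = terminal_price(s0, simulated_log_returns)
call_payoff = discounted_call_payoff(s_T, strike=95.0, rho=0.05, tau=0.25)
```

Because the recursion multiplies exponentials, the terminal price depends on the path only through the sum of the log returns, which is why modelling the log returns is sufficient.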

2.2. Risk-neutral valuation

In the last subsection, we motivated the need to obtain predictive distributions of asset prices. These distributions, however, are derived under a physical probability measure. But to make use of our method for option pricing, the physical measure must be converted into a risk-neutral measure. The main benefit of using a risk-neutral measure stems from the fact that once the risk-neutral probabilities are found, every asset can be priced by simply calculating its expected payoff (that is, discounting as if investors were risk neutral). If we used physical probabilities, every security would require a different adjustment (as they differ in riskiness). Converting the physical probability measure to the risk-neutral one requires the absence of arbitrage. To this end, we follow the framework proposed by Huang and Litzenberger (1988) and Stutzer (1996). Specifically, the canonical valuation method of Stutzer (1996) allows one to convert the predictive distribution from the previous subsection into a risk-neutral distribution. The method, described in detail by Stutzer, utilizes the maximum entropy principle to estimate the unknown martingale measure. Importantly, we can price options under no arbitrage.

We start with a sequence of $M$ draws from the predictive distribution of the index price. Using these values we can construct the $\tau$-period gross returns $\{R_i\}_{i=1}^{M} \equiv \{S_T/S_t\}_{i=1}^{M}$, where $\tau = T - t$ denotes time to maturity of a given contract. The true risk-adjusted density, $\pi^*(i)$, has to satisfy the following:

\[ \sum_{i=1}^{M} \pi^*(i)\,\frac{R_i}{\rho} = 1. \tag{3} \]

As pointed out by Stutzer, the quantity $\pi^*$ can be obtained by solving the following convex minimization problem:

\[ \hat{\pi}^* = \arg\min_{\pi^*(i) > 0,\ \sum_{i=1}^{M} \pi^*(i) = 1} I(\pi^*, \hat{\pi}) \equiv \sum_{i=1}^{M} \pi^*(i)\,\ln\bigl(\pi^*(i)/\hat{\pi}(i)\bigr) \tag{4} \]

subject to (3) holding. The objective function $I(\pi^*, \hat{\pi})$ is the Kullback–Leibler information criterion distance of the positive probabilities $\pi^*$ to the empirical probabilities $\hat{\pi}$, and the equality follows from the fact that $\hat{\pi}(i) = 1/M$ implies that minimizing $I$ is equivalent to constrained maximization of the Shannon entropy. It can be shown that the solution to the above optimization problem is given by

\[ \hat{\pi}^*(i) = \frac{\exp(\gamma^* R_i/\rho)}{\sum_{i=1}^{M} \exp(\gamma^* R_i/\rho)}, \qquad i = 1, \ldots, M. \tag{5} \]

The Lagrange multiplier, $\gamma^*$, is obtained from the following unconstrained convex problem:

\[ \gamma^* = \arg\min_{\gamma} \sum_{i=1}^{M} \exp\{\gamma (R_i/\rho - 1)\}. \tag{6} \]

Having obtained the simulated values of the risk-neutral density we will be able to price call and put options using the following formulae:

\[ C_t = \sum_{i=1}^{M} \frac{(S_{T,i} - K)^{+}}{\rho}\,\hat{\pi}^*(i), \tag{7} \]

and

\[ P_t = \sum_{i=1}^{M} \frac{(K - S_{T,i})^{+}}{\rho}\,\hat{\pi}^*(i). \tag{8} \]

Here $t$ denotes the time of the call/put and $T$ the time of maturity, with $t < T$.
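The steps in equations (3)–(8) can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes uniform empirical weights of 1/M, treats the quantity that divides $R_i$ in (3) as a single gross risk-free return over the life of the contract (called `gross_rf` here), and finds the multiplier by bisection on the derivative of the objective in (6), which is just one convenient way to solve a one-dimensional convex problem. All function names and sample numbers are hypothetical.

```python
import math

def risk_neutral_weights(gross_returns, gross_rf, n_bisect=200):
    """Weights of the form (5); gamma* solves the first-order condition of (6)."""
    x = [r / gross_rf - 1.0 for r in gross_returns]
    if min(x) >= 0.0 or max(x) <= 0.0:
        raise ValueError("returns must lie on both sides of the risk-free return")

    def slope(gamma):
        # Derivative of sum_i exp(gamma * x_i); increasing in gamma by convexity.
        return sum(xi * math.exp(gamma * xi) for xi in x)

    lo, hi = -1.0, 1.0
    while slope(lo) > 0.0:      # widen the bracket until the slope changes sign
        lo *= 2.0
    while slope(hi) < 0.0:
        hi *= 2.0
    for _ in range(n_bisect):   # bisection on the monotone slope
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    gamma_star = 0.5 * (lo + hi)

    # exp(gamma * x_i) differs from exp(gamma * R_i / gross_rf) only by a
    # constant factor, which cancels when the weights are normalised.
    unnormalised = [math.exp(gamma_star * xi) for xi in x]
    total = sum(unnormalised)
    return [w / total for w in unnormalised]

def call_price(terminal_prices, weights, strike, gross_rf):
    """Weighted, discounted call payoffs as in (7)."""
    return sum(max(s - strike, 0.0) / gross_rf * w
               for s, w in zip(terminal_prices, weights))

def put_price(terminal_prices, weights, strike, gross_rf):
    """Weighted, discounted put payoffs as in (8)."""
    return sum(max(strike - s, 0.0) / gross_rf * w
               for s, w in zip(terminal_prices, weights))

# Hypothetical predictive draws of the gross return S_T / S_t.
s_t = 100.0
gross_returns = [0.90, 0.97, 1.02, 1.05, 1.12]
gross_rf = 1.01
weights = risk_neutral_weights(gross_returns, gross_rf)
terminal_prices = [s_t * r for r in gross_returns]
call = call_price(terminal_prices, weights, strike=100.0, gross_rf=gross_rf)
put = put_price(terminal_prices, weights, strike=100.0, gross_rf=gross_rf)
```

A useful sanity check on any such implementation is that the weights reproduce the martingale condition (3), which in turn forces the call and put prices to satisfy put–call parity under the same weights.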

3. Methodology

The primary goal of this section is to introduce a new class of semi-parametric Bayesian models that is subsequently used to obtain predictive distributions of index returns.

3.1. Semi-parametric scale mixture of betas (SSMB)

There are four ideas that we link together to develop the class of models we call the semi-parametric scale mixture of betas (SSMB).

(1) Scale mixture representation; see, for example, Feller (1971).
(2) A non-parametric family of prior distributions, namely the Dirichlet process; see appendix A and Ferguson (1973).
(3) Variance regression; and
(4) Gibbs sampling; see, for example, Smith and Roberts (1993).

3.1.1. Scale mixture representations. Since the scale mixture of uniforms representation is central to the construction of the SSMB family of models, this is first explained in detail. The other features of the construction are then readily tagged on to the scale mixture approach. Since we want a unimodal density, we use uniforms and beta distributions in the mixture distributions, ensuring unimodality. With normal kernels we could get a multimodal density, which does not make sense in our context since outliers will be modelled incorrectly. Also, we want a heavier tail rather than another mode. It is for these reasons that we prefer an SSMB rather than taking a normal kernel.

It is intuitively easier to understand our model by first addressing the kurtosis in the underlying data distribution. With $r$ denoting observed data, and $U$ denoting a latent mixing random variable, Feller's formulation of the conditional distribution of $r$ is given by

\[ f(r \mid U = u) \sim \mathrm{Uniform}(\mu - \sigma\sqrt{u},\ \mu + \sigma\sqrt{u}), \qquad u \sim F, \tag{9} \]

for some distribution function $F$ with support on $(0, \infty)$. As $F$ ranges over all such distribution functions, the density of $r$ ranges over all unimodal and symmetric density functions. Consequently, with a flexible prior on $F$, such a model can capture wide ranges of kurtosis in the data. To ensure maximum flexibility we will model $F$ non-parametrically in the next subsection. We use $\sqrt{u}$ rather than $u$ in the formulation above because we can express higher moments for $r$ in terms of lower moments for $U$. It is easier to understand the impact of this by setting $\mu = 0$ for now. In appendix B, the entire computational form of the model is constructed with $\mu \neq 0$. Now, we have $\mathrm{Var}(r) = \sigma^2 E(U)/3$ and $E(r^4) = \sigma^4 E(U^2)/5$. Hence, we can rewrite the model as $f(r \mid U) = \sigma\sqrt{U}\,(1 - 2\,\mathrm{beta}(1, 1))$, which will suggest the form of generalizations to asymmetric or skewed densities. From a simulation perspective, the notation $f(r \mid U) = \sigma\sqrt{U}\,(1 - 2\,\mathrm{beta}(1, 1))$ means that a draw from $f(r \mid U)$ can be generated as a $\sigma\sqrt{U}\,(1 - 2\,\mathrm{beta}(1, 1))$ random variable. An interesting fact that is used later on is noted here: if $F$ is distributed Gamma(3/2, 1/2), then the distribution of $r$ is Normal$(0, \sigma^2)$.

Apart from endogenizing kurtosis (as we did above), in our option pricing application we are also interested in modelling the extent of skewness. The extension to asymmetric densities is quite straightforward. Since the uniform density is a beta(1, 1) density, we can introduce skewness by having instead a beta(1, $a$) density, for some parameter $a > 0$. The first equation of the model in (9) becomes

\[ f(r \mid U, a) = a^{-1}(1 + a)\,\sigma\sqrt{U}\,\{1/(1 + a) - \mathrm{beta}(1, a)\}. \tag{10} \]

We recover (9) when $a = 1$. We will change $(1 + a)\sigma/a$ to $\sigma$. There is no loss in doing this because it is only a scaling issue; moreover, as a result, later on, the analysis is simplified considerably. We will shorten the writing of the model to

\[ f(r \mid U, a) = \sigma\sqrt{U}\,B(a); \qquad B(a) = 1/(1 + a) - \mathrm{beta}(1, a). \tag{11} \]

The density function for $W = B(a)$ is given by

\[ f_W(w) = a\,\{a/(1 + a) + w\}^{a-1}\,\mathbf{1}\bigl(-a/(1 + a) < w < 1/(1 + a)\bigr). \]
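The scale mixture construction can be checked by simulation. The sketch below draws from the form in (10) with $\mu = 0$, taking $F$ to be Gamma(3/2, 1/2); it assumes the second Gamma argument is a rate, so that $E(U) = 3$ and $\mathrm{Var}(r) = \sigma^2$, which corresponds to a scale of 2 in Python's `random.gammavariate`. With $a = 1$ the sample variance and kurtosis should then be close to the Normal values $\sigma^2$ and 3. Function names, the seed and the sample size are illustrative choices.

```python
import math
import random

def draw_scale_mixture(n, sigma, a, seed=0):
    """Draw n values of r = ((1 + a) / a) * sigma * sqrt(U) * B(a), as in (10)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u = rng.gammavariate(1.5, 2.0)                  # shape 3/2, scale 2 (rate 1/2)
        b = 1.0 / (1.0 + a) - rng.betavariate(1.0, a)   # B(a) = 1/(1+a) - beta(1, a)
        draws.append((1.0 + a) / a * sigma * math.sqrt(u) * b)
    return draws

def sample_moments(xs):
    """Return the sample mean, variance and kurtosis (fourth moment / variance^2)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return mean, m2, m4 / m2 ** 2

# Symmetric case a = 1 with sigma = 1: the draws should look standard Normal.
mean, variance, kurtosis = sample_moments(draw_scale_mixture(100_000, sigma=1.0, a=1.0))
```

Replacing the Gamma draw with a draw from a heavier-tailed mixing distribution raises the kurtosis without introducing a second mode, which is the property the text relies on.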


The moments of this density are readily obtained. To see this, first note that, depending on whether we have left or right skewness, the above approach allows us to have $a = 1 + \epsilon$ or $a = 1 - \epsilon$ for some small $\epsilon > 0$. Now it is easy to verify that

\[ E W^2 = \frac{a}{(1 + a)^2 (2 + a)} \]

and

\[ E W^3 = -\frac{2}{(1 + a)^3} + \frac{6}{(1 + a)^2 (2 + a)} - \frac{6}{(1 + a)(2 + a)(3 + a)}. \]

Defining as usual the skewness as

\[ \mathrm{Sk}(a) = \frac{E W^3}{(E W^2)^{3/2}}, \]

we see that for $a = 1 + \epsilon$ we have $\mathrm{Sk}(a) = -\kappa\epsilon + O(\epsilon^2)$ and for $a = 1 - \epsilon$ we have $\mathrm{Sk}(a) = \kappa\epsilon + O(\epsilon^2)$, where $\kappa$ is a constant; for completeness, one could calculate $\kappa$, which is actually $12^{3/2}/48 \approx 0.866$. Hence, it does not matter whether we use beta(1, $a$) ($a > 1$) or beta($a$, 1) ($a < 1$). To elaborate on this point, if $X$ is beta(1, $a$) then $1 - X$ is beta($a$, 1). Combine this with the notion that for beta(1, $a$) we have essentially symmetric skewness (to first order). Hence, we could address first-order skewness using either beta(1, $a$) or beta($a$, 1). Of course this relies on small skewness. On the other hand, if skewness is large and in one direction, the choice of beta(1, $a$) or beta($a$, 1) would be obvious in order to get the skewness on the correct side. Thus, the mixture of beta distributions captures positive skewness via the parameter $a$.
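The first-order behaviour of the skewness can be verified numerically from the two moment formulas. In the sketch below the last denominator of the third moment is read as $(1+a)(2+a)(3+a)$, which is what the third raw moment of a beta(1, $a$) variable gives; the function names and the value of the small perturbation are illustrative.

```python
def second_moment(a):
    """E W^2 for W = 1/(1 + a) - beta(1, a)."""
    return a / ((1 + a) ** 2 * (2 + a))

def third_moment(a):
    """E W^3 for W = 1/(1 + a) - beta(1, a)."""
    return (-2 / (1 + a) ** 3
            + 6 / ((1 + a) ** 2 * (2 + a))
            - 6 / ((1 + a) * (2 + a) * (3 + a)))

def skewness(a):
    """Sk(a) = E W^3 / (E W^2)^(3/2)."""
    return third_moment(a) / second_moment(a) ** 1.5

# First-order constant quoted in the text: 12^(3/2) / 48, roughly 0.866.
FIRST_ORDER_CONSTANT = 12 ** 1.5 / 48

eps = 1e-3
left = skewness(1 + eps)    # close to -FIRST_ORDER_CONSTANT * eps
right = skewness(1 - eps)   # close to +FIRST_ORDER_CONSTANT * eps
```

At $a = 1$ the three terms of the third moment cancel exactly, so the symmetric uniform case has zero skewness, and small departures from $a = 1$ tilt the density linearly in either direction.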

3.1.2. Bayesian non-parametrics and semi-parametrics. To make use of the scale mixture of betas, recall that we need to specify the distribution function $F$ in (9). In doing so, we will follow a non-parametric Bayesian approach. As noted previously, such an approach allows us to introduce greater flexibility into the choices of $F$; this is critical since the data analysed in this paper have different levels of kurtosis.

A non-parametric scale mixture model is obtained by assigning $F$ a stochastic process prior; here we use the well-known Dirichlet process; see appendix A and Ferguson (1973). $F \sim \mathrm{Dir}(c, F_0)$ means that $F$ is assigned a Dirichlet process prior with expectation $F_0$ and scale parameter $c > 0$. Here $c$ is a measure of strength of belief in the prior choice of $F_0$. Note, as an example, it is possible to centre the location parameter, $F_0$, on any member of the exponential power family. This implies that our scale mixture of beta representation encapsulates all ranges of kurtosis. We use the Dirichlet process for two reasons: (a) the theoretical properties of the process are very appealing; see Ferguson (1973) and appendix A; (b) implementing the overall model is highly simplified; see MacEachern (1998). We note here that the scale mixture representation is such that we actually bypass simulating from the posterior distribution over the unknown $F$, i.e. the computational burden is substantially reduced; see also Brunner and Lo (1989) and appendix B. With $\mathcal{F}_t$ denoting all the information up to and including time $t$, we now have the following hierarchical modelling framework:

\[ f(r_t \mid \mathcal{F}_{t-1}) = \sqrt{U_t}\,\sigma_t\,B_t(a), \]

where

\[ U_1, \ldots, U_t \sim p(u_1, \ldots, u_t). \tag{12} \]

Here $\sigma_t$ is the volatility and it will be described later how this will depend on the past. Finally, $B_t(a)$ are independent and identically distributed copies of $B(a)$. We take $p(u_1, \ldots, u_t)$ to be based on a Dirichlet process prior; see appendix A for details on this prior. Having modelled the kurtosis via the variance of $U_t$, we next model $\sigma_t$.
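Appendix A is not reproduced in this section, so the following is only the textbook Pólya urn (Blackwell–MacQueen) way of generating $u_1, \ldots, u_t$ when their common distribution $F$ has a Dirichlet process prior with scale $c$ and centre $F_0$; it is offered as an aid to intuition and is not taken from the paper. The centre is set to Gamma(3/2, 1/2), read as shape 3/2 and rate 1/2, and the value of $c$, the seed and all names are hypothetical.

```python
import random

def polya_urn_draws(t, c, base_draw, rng):
    """Generate u_1, ..., u_t marginally from a Dirichlet process with scale c."""
    draws = []
    for i in range(t):
        # With probability c / (c + i) take a fresh value from the centre F0;
        # otherwise repeat one of the i earlier values, chosen uniformly.
        if rng.random() < c / (c + i):
            draws.append(base_draw(rng))
        else:
            draws.append(rng.choice(draws))
    return draws

def gamma_base(rng):
    """One draw from the assumed centre F0: Gamma with shape 3/2 and scale 2."""
    return rng.gammavariate(1.5, 2.0)

rng = random.Random(1)
u = polya_urn_draws(250, c=2.0, base_draw=gamma_base, rng=rng)
n_distinct = len(set(u))   # small c gives few distinct values, large c gives many
```

The role of $c$ as a strength-of-belief parameter shows up directly: as $c$ grows, almost every draw comes fresh from $F_0$, while for small $c$ the draws cluster on a handful of shared values.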

A variance regression model. In the above, $\sigma_t$ can be modelled so that it depends on context-specific regressors via the following variance regression:

\[ \sigma_t = \exp\left(\beta_0 + \sum_{k=1}^{M} \beta_k Z_{k,t-1}\right), \tag{13} \]

where $\beta_0, \ldots, \beta_M$ are parameters to be estimated and the $\{Z_{k,t-1}\}$ are observed information (independent variables) up to and including time $t - 1$. For the empirical analysis in this paper, we use the squared past log returns calculated as $Z_t = \{\ln(S_t/S_{t-1})\}^2$, where $S$ denotes the value of the S&P500 index. The motivation for using this variable primarily comes from other studies in finance. For example, it would be similar to a specification of ARCH-type models; see Engle and Gonzalez (1991). Similarly, Ghysels et al. (2006) use lagged squared past returns as their volatility predictor. We emphasize that our model formulation and computational algorithms are not significantly dependent on this particular choice of regressor.
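As a small illustration of the variance regression (13) with a single regressor, the sketch below computes squared log returns from a price series and maps the most recent one to a volatility. The coefficient values and prices are hypothetical placeholders, not estimates from the paper, and the function names are ours.

```python
import math

def squared_log_returns(prices):
    """Z_t = (ln(S_t / S_{t-1}))^2 for consecutive prices."""
    return [math.log(p1 / p0) ** 2 for p0, p1 in zip(prices, prices[1:])]

def volatility(beta0, betas, regressors):
    """sigma_t = exp(beta0 + sum_k beta_k * Z_{k,t-1}); always positive."""
    return math.exp(beta0 + sum(b * z for b, z in zip(betas, regressors)))

# Hypothetical index levels and coefficients, for illustration only.
prices = [1000.0, 1010.0, 995.0, 1002.0]
z = squared_log_returns(prices)
sigma_next = volatility(beta0=-4.5, betas=[10.0], regressors=[z[-1]])
```

The exponential link keeps the volatility positive for any real-valued coefficients, which is what allows unconstrained normal priors to be placed on them.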

An important feature of stock markets is the presence of leverage effects, i.e. volatility is larger when returns are negative. In the current version of the model, we do not explicitly account for such effects, largely because we want to focus on the most novel aspects of our new SSMB model and illustrate their empirical importance. But in general one could model such leverage effects by setting the mean of the parameter $\beta_k$ to be less than zero.

Overall, the class of models in equations (12) and (13), where $\sigma_t$ is modelled as a regression and $F$ is modelled via a Dirichlet process, is what is termed SSMB in this paper. The predictions of future prices, $S_T$, from this class of models are the key inputs needed to price options; see equations (7) and (8).

To complete the Bayesian specification, prior distributions are assigned to $F$, $\beta$, $c$ and $a$.

3.1.3. Prior distributions. The following describes the various priors used in the empirical analysis. Where necessary, a conjugate hyper-prior is used. Given recent advances in Bayesian computation, the practitioner can readily employ non-conjugate prior distributions if needed; for details, see MacEachern (1998) and Mira et al. (2001).

The scale parameter of the Dirichlet process, $c$, is assigned a Gamma($a_0$, $b_0$) hyper-prior distribution. The second parameter is the prior guess at $F_0$, which we will take to be a Gamma(3/2, 1/2) distribution. The reason for this is that under this distribution, and in the symmetric case with $a = 1$, the marginal distribution for $r_t$ will be Normal with mean 0 and variance $\sigma_t^2$. From a finance perspective, the zero-mean excess stock returns assumption may be somewhat restrictive, but is consistent with the common notion in finance that expected returns are difficult to predict on average. Also, it fits well with the standard no-arbitrage argument that expected returns are equal to the risk-free rate. Prior studies have used such specifications, including Merton (1976). From a statistical perspective, in the symmetric case, note that even if one centred the prior for $F_0$ around zero, the variance could be taken to be very large, thus alleviating the zero-mean assumption. In practice, in the absence of strong prior information, it is advisable to set the prior variance to be very large.

We assign a prior distribution to each of the $\beta_k$ in the variance regression (13), which are assumed to be independent normal distributions with zero means and coefficient-specific variances. Finally, the skewness parameter $a$ is assigned a Gamma distribution.

A strength of the Bayesian approach is its ability to incorporate context-specific knowledge in the modelling process. However, for illustrative purposes, all of our prior settings were chosen to reflect vague prior knowledge. Denoting $\pi$ to be a prior distribution, $\pi(\beta_0) = N(0, 10)$, $\pi(\beta_1) = N(0, 10)$. We take $\pi(c)$, the prior for the scale parameter of the Dirichlet process, to be Gamma($a_0$, $b_0$) with $a_0 = b_0 = 0.01$. The skewness parameter $a$ is assigned $\pi(a) = \mathrm{Gamma}(c_0, d_0)$, $c_0 = d_0 = 0.01$.

The computational aspect of SSMB is implicit in its formulation and points to a Markov chain Monte Carlo (MCMC) scheme that could be implemented to obtain posterior and predictive distributions. The Gibbs sampler for the above model is detailed in appendix B. We ran the sampler for one million iterations. Using well-known convergence diagnostics, having ‘burned-in’ the first 500 000 iterates, approximate independence in the sampled variates was obtained by using every 1000th iterate from the chain. The algorithm could take a few days to execute if one analyses thousands of contracts. In practice, one would seldom, if ever, use data going back to 1983 to price an option in the year, say 2010. Typically, the pricing of options, as is well documented, relies on data for no more than one year, and interest is usually on predicting prices of the current options, in which case the results from the analysis in this paper would be quite fast indeed.
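The burn-in and thinning scheme described above can be made concrete with a short sketch. It assumes iterations are numbered from 1 and that thinning starts counting after the burn-in; the text does not state the number of retained draws, and under this reading it works out to 500. The function name is hypothetical.

```python
def retained_iterations(n_iter, burn_in, thin):
    """1-based iteration numbers kept after dropping burn_in and keeping every thin-th."""
    return range(burn_in + thin, n_iter + 1, thin)

# Settings quoted in the text: one million iterations, 500 000 burn-in, every 1000th kept.
kept = retained_iterations(1_000_000, 500_000, 1_000)
n_kept = len(kept)
```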

4. The data

The data consist of European call and put options, written on the S&P500 index, traded on the Chicago Board Options Exchange. The sample covers a period of 20 years from January 1983 through December 2002. The selection of this period has been dictated by the availability of the data. By selecting such a long series of data, covering most of the spikes in the time series of asset returns, the empirical results should not be significantly driven by specific features of the market data.

Details of the adjustments and exclusion criteria made to the data are presented in Bakshi et al. (1997). Here, in table 1, we provide some key features of the data, namely the moneyness and the maturity date for each class of call and put options as well as the average option prices, which is meant to help interpret the subsequently reported pricing errors.

In Panel A (B) of table 1, we report the summary statistics for the call (put) options with the average price and the total number of observations (in parentheses) for each moneyness–maturity category. For example, a call option is considered to be out-of-the-money (OTM) if its moneyness, defined as S/K, does not exceed 0.97, and at-the-money (ATM) if its moneyness falls between 0.97 and 1.03. All other contracts are in-the-money (ITM). In addition, we divide all contracts into three groups of maturities: those with not more than 40 days to maturity, those with maturities between 41 and 70 days, and those with maturity longer than 70 days. Our sample includes a total of 701 600 option observations with 332 856 calls and 368 744 puts. In the call (put) group, OTM and ATM options make up approximately 40.7% (50.9%) and 31.7% (28.8%) of the total sample, respectively. The average call price ranges from $3.07 for short-term, deep OTM options to $114.96 for long-term, deep ITM contracts. In contrast, the smallest price for a put option equals $2.97 while the largest one is $128.12.
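The classification rules for call options can be written down directly. The sketch below follows the thresholds stated in the text; the treatment of a moneyness of exactly 1.03 as ATM is our assumption, as are the function names and the labels used for the three maturity groups. Put options are left out because the text spells out the thresholds for calls only.

```python
def call_moneyness_class(spot, strike):
    """Classify a call by moneyness S/K: OTM up to 0.97, ATM up to 1.03, else ITM."""
    ratio = spot / strike
    if ratio <= 0.97:
        return "OTM"
    if ratio <= 1.03:   # boundary handling at 1.03 is an assumption
        return "ATM"
    return "ITM"

def maturity_group(days_to_expiration):
    """Three maturity groups: at most 40 days, 41 to 70 days, more than 70 days."""
    if days_to_expiration <= 40:
        return "short"
    if days_to_expiration <= 70:
        return "medium"
    return "long"

# Hypothetical contract: index at 96, strike 100, 55 days to expiration.
example = (call_moneyness_class(96.0, 100.0), maturity_group(55))
```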

Since the method used in this paper requires the history of past stock index returns, we use one year of daily observations as an estimation period. The time range of the estimation window changes as we move forward in time. Also, in order to compare the prices obtained from the model to real-time prices, the maturity date of any option cannot exceed December 2002.

In order to evaluate the accuracy of our option pricing model, we select the following two popular non-parametric benchmarks for comparison: the canonical valuation model of Stutzer (1996), and the constrained non-parametric estimator of Ait-Sahalia and Duarte (2003). The reason for working with these non-parametric benchmarks is mainly due to their close relationship to the class of models advocated in this paper. We also provide a comparison to a parametric model, namely the widely used Black–Scholes option pricing model.

5. Empirical results

In section 2, we described the procedure to obtain option prices, noting there that the end point of the analysis was to evaluate equations (7) and (8) for call and put options, respectively. The only random, unknown quantity needed to evaluate these equations was the predictive distribution of the index returns. The SSMB model provides this distribution. In the following discussion, and corresponding tables and graphs, the following abbreviations are used: Black–Scholes model (BS); canonical valuation model of Stutzer (CV); the constrained non-parametric estimator of Ait-Sahalia and Duarte (SC); and SSMB denotes our model.

5.1. Predictions

We begin by presenting our estimation results, which consider the evolution, with respect to maturity, of predicted prices for six different option contracts, namely three calls and three puts. To capture the time-series variation in volatility we consider contracts spread over three different periods: 1992, 1997 and 2002. These three periods arguably have had very different volatility patterns, which allows us to assess the robustness of our model. We compare the predictions from the SSMB model to those from the SC model and the market prices. Figure 1 presents the results.

In most contract configurations, the SSMB model performs better than the SC model. The BS and CV alternatives perform uniformly worse for these contracts, and hence were omitted from the graphs. The outperformance of SSMB is particularly visible for shorter maturities and out-of-the-money contracts. While somewhat informative, figure 1 does not give a precise estimate of the observed improvements. Hence, we now turn to a thorough presentation of our results.

To compare the full-sample efficiency of the SSMB method relative to other benchmarks, as well as across various strike prices and maturities, each contract in the sample is assigned to one of 18 groups sorted by maturity and moneyness. Specifically, all call and put options are divided into six moneyness classes. Next, within each such class, three maturity groups are formed. As a result, each contract is assigned to one of 18 bins. Within each bin the average dollar and percentage errors are calculated. For each group, the accuracy of the SSMB pricing model is compared with that of the CV, SC and BS models. The metric for comparison is the root mean squared error (RMSE), defined as the square root of the mean squared deviation of the model price from the observed price. To calculate percentage errors, the RMSEs are further scaled by the average price of the option in the sample. Tables 2 and 3 present the results for call and put options, respectively.
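The error metrics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bin_errors`, the bin labels and the three sample contracts are hypothetical and are not taken from the paper's data.

```python
import math
from collections import defaultdict


def bin_errors(contracts):
    """Return {bin: (RMSE in dollars, percentage error)}.

    `contracts` is an iterable of (bin_label, model_price, market_price).
    The percentage error scales the RMSE by the average market price in the bin.
    """
    squared_error = defaultdict(float)
    price_total = defaultdict(float)
    count = defaultdict(int)
    for label, model_price, market_price in contracts:
        squared_error[label] += (model_price - market_price) ** 2
        price_total[label] += market_price
        count[label] += 1

    result = {}
    for label, n in count.items():
        rmse = math.sqrt(squared_error[label] / n)
        average_price = price_total[label] / n
        result[label] = (rmse, 100.0 * rmse / average_price)
    return result


# Hypothetical contracts: (maturity/moneyness bin, model price, observed price).
sample = [
    ("call, S/K<0.94, 6-40 days", 3.0, 4.0),
    ("call, S/K<0.94, 6-40 days", 5.0, 4.0),
    ("call, S/K>1.06, 6-40 days", 98.0, 100.0),
]
errors = bin_errors(sample)
```

Because the percentage error divides by the average option price in the bin, cheap out-of-the-money contracts can show very large percentage errors even when the dollar RMSE is small.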

Table 1. Sample properties of S&P500 index options.

                                  Days-to-expiration
Moneyness, S/K        6–40          41–70         >70           Subtotal

Panel A: Call options
<0.94 (OTM)           $3.07         $5.94         $17.82
                      (16,638)      (15,714)      (46,478)      (78,830)
0.94–0.97 (OTM)       $7.30         $15.05        $37.62
                      (20,342)      (12,999)      (22,295)      (55,636)
0.97–1.00 (ATM)       $15.14        $25.31        $47.94
                      (23,902)      (12,976)      (21,871)      (58,749)
1.00–1.03 (ATM)       $28.91        $37.79        $58.05
                      (20,440)      (9,369)       (17,237)      (47,046)
1.03–1.06 (ITM)       $45.50        $51.29        $68.44
                      (14,628)      (6,029)       (11,346)      (32,003)
>1.06 (ITM)           $99.92        $102.56       $114.96
                      (25,951)      (13,381)      (21,260)      (60,592)
Subtotal              (121,901)     (70,468)      (140,487)     (332,856)

Panel B: Put options
<0.94 (ITM)           $128.12       $125.13       $126.89
                      (10,411)      (7,897)       (18,843)      (37,151)
0.94–0.97 (ITM)       $42.35        $45.07        $63.89
                      (11,478)      (6,327)       (14,566)      (32,371)
0.97–1.00 (ATM)       $23.23        $31.64        $46.78
                      (21,190)      (10,961)      (20,423)      (52,574)
1.00–1.03 (ATM)       $12.79        $21.10        $35.22
                      (22,695)      (12,505)      (20,581)      (55,781)
1.03–1.06 (OTM)       $7.51         $14.07        $25.26
                      (19,462)      (10,835)      (16,796)      (47,093)
>1.06 (OTM)           $2.97         $5.03         $11.96
                      (44,514)      (34,358)      (64,902)      (143,774)
Subtotal              (129,750)     (82,883)      (156,111)     (368,744)

This table reports the summary of the data used in the study. The cross-section of the call options has been divided into 18 categories: with respect to expiration date (≤ 40 days, (40, 70) days, and >70 days) and moneyness (out of the money (OTM); at the money (ATM); and in the money (ITM)). Each cell represents the average option price in each maturity-moneyness category along with the number of contracts which were used to calculate the averages (in parentheses). The sample covers the period of 1 January 1983–31 December 2002. Options with maturity less than 6 days, price lower than 0.375 and those violating arbitrage conditions have been excluded from the sample.

The results in table 2 indicate several beneficial aspects of the SSMB approach compared to the two non-parametric approaches (CV and SC): it significantly improves the fit of the volatility structure to the cross-section of option contracts.

First, observe that the average pricing errors across the different maturity/moneyness classes are less variable compared to CV and lower compared to SC. The percentage pricing errors for SSMB vary between 12.98 and 76.29%, while for CV they range between 13.77 and 156.17%. Likewise, the respective range for the SC method is 24.20–100.65%.

Second, observe that SSMB produces errors of a similar magnitude irrespective of the maturity date of the contract. This is in contrast to the CV method in particular, which tends to produce very high errors for contracts with longer time to maturity. Thus, for example, for contracts with low moneyness (less than 0.94) and long maturity (more than 70 days), the average pricing error decreases from 156.17 to 51.06% under the SSMB model. An explanation for this is that the CV approach precludes the existence of volatility clustering; thus the data converge very quickly to normality, which is at odds with observed price patterns. Similar trends are noted across all moneyness series. The SC method does much better than the CV approach in this regard; this point has been detailed in Ait-Sahalia and Lo (1998).

Third, within the various maturity contracts, the pricing errors generally decrease with moneyness both for the CV and SSMB models, but they decrease more for the latter, especially for longer maturities. This is in contrast to the SC model, for which the pricing errors form a U-shaped pattern. Such a pattern is consistent with the intuition provided by Ait-Sahalia and Duarte (2003), who argue that non-parametric methods, like the ones they developed, fail to capture the tails of the cross-section of option contracts. Consequently, the pricing errors under the SC method always increase at extreme moneyness values. This seems to be the case for all maturity classes, where the pricing errors are lowest for at-the-money contracts. Along this dimension of comparison, the SSMB method provides predictions with significantly lower errors for both out-of-the-money and in-the-money contracts. The differences are especially pronounced for deep in-the-money contracts, where the SC method results in errors which are approximately three times larger than those of the SSMB method.

Figure 1. Evolution of prices for SSMB and SC models. This figure depicts the evolution of predicted option prices as a function of time to maturity for six different option contracts: three calls and three puts for the years 1992, 1997 and 2002. The prices are obtained from the SSMB and SC models and are plotted against the market option price.

Finally, we observe the highest average mispricing for short-term out-of-the-money contracts. In this group, the CV approach produces the highest pricing errors (over 137%), followed by the SC approach, producing errors of about 100%, while SSMB has errors of about 76%.

The results for put options, presented in table 3, are generally in line with the findings documented for call options.

While SC and CV methods are based on a non-parametric approach, the BS model is a parametric approach. It is apparent from both tables that the main drawback of the BS model is its poor performance for the very short-term out-of-the-money contracts. At the same time, the BS model does quite well for the in-the-money contracts, especially relative to the SC and CV methods. This result is not surprising in light of the empirical literature that has found that parametric methods, in general, largely fail for the deep out-of-the-money contracts (e.g. Bakshi et al. 1997).

To facilitate additional comparisons, figures 2 and 3 summarize the pricing errors for the different types of pricing models. In these graphs, the x-axis indicates three different categories of option maturity, and the y-axis indicates six different categories of option moneyness, consistent with those in tables 2 and 3.

To understand the apparent improvement in the option pricing, consider figure 4, which depicts (randomly chosen) predictive distributions of the underlying asset (S&P500 index) for four different time periods in the options data. What is striking is that the variance, skewness and kurtosis of these distributions are markedly different. The SSMB model accounts for these differences. This flexibility in distributions is very important for pricing call and put contracts with a spectrum of different strike prices. In fact, this is one of the main reasons that the pricing errors under the SSMB approach are smaller than under other non-parametric approaches.

One could argue that the performance of our method may be sensitive to a particular choice of the sample period. We feel this is not the case because our sample covers a period with tremendous market fluctuations (January 1983 through December 2002). In fact,

Table 2. Out-of-sample pricing errors of the sample of S&P500 index options: call options.

                       Days-to-expiration
                       6–40               41–70              >70
Model  Moneyness, S/K  RMSE     % Error   RMSE     % Error   RMSE     % Error

BS     <0.94           $6.71    218.57    $4.54    76.43     $6.04    33.89
       0.94–0.97       $7.43    101.78    $5.09    33.82     $7.07    18.79
       0.97–1.00       $3.92    25.89     $5.31    20.98     $8.16    17.02
       1.00–1.03       $4.63    16.02     $6.08    16.09     $10.17   17.52
       1.03–1.06       $5.58    12.26     $7.09    13.82     $11.56   16.89
       >1.06           $7.48    7.49      $8.96    8.74      $7.01    6.10

CV     <0.94           $4.21    137.13    $6.28    105.72    $27.83   156.17
       0.94–0.97       $5.55    76.03     $10.74   71.36     $44.72   118.87
       0.97–1.00       $8.02    52.97     $14.17   55.99     $53.05   110.66
       1.00–1.03       $10.66   36.87     $18.06   47.79     $58.66   101.05
       1.03–1.06       $12.51   27.49     $21.63   42.17     $61.83   90.34
       >1.06           $13.76   13.77     $26.09   25.44     $65.36   56.85

SC     <0.94           $3.09    100.65    $5.02    84.51     $9.82    55.11
       0.94–0.97       $4.67    63.97     $7.41    49.24     $9.26    24.61
       0.97–1.00       $9.20    60.77     $11.00   43.46     $11.60   24.20
       1.00–1.03       $16.80   58.11     $16.92   44.77     $15.62   26.91
       1.03–1.06       $26.61   58.48     $24.70   48.16     $21.27   31.08
       >1.06           $63.78   63.83     $58.34   56.88     $41.13   35.78

SSMB   <0.94           $2.34    76.29     $4.20    70.64     $9.10    51.06
       0.94–0.97       $3.22    44.14     $6.10    40.51     $13.11   34.86
       0.97–1.00       $4.98    32.91     $8.01    31.66     $14.88   31.03
       1.00–1.03       $7.39    25.55     $9.76    25.82     $16.02   27.59
       1.03–1.06       $10.25   22.53     $11.67   22.76     $16.47   24.07
       >1.06           $13.44   13.45     $13.31   12.98     $20.54   17.87

This table reports the average out-of-sample pricing errors for call options with different maturities and moneyness levels for the period 1983–2002. The root mean squared errors (RMSEs) and percentage errors have been calculated for the Black–Scholes model (BS), the canonical valuation (CV) of Stutzer (1996), the kernel estimation with shape constraint (SC) of Ait-Sahalia and Duarte (2003), and the Semi-parametric Scale Mixture of Betas (SSMB) model defined in this paper. All options have been divided with respect to their maturities and moneyness levels (defined as the ratio of spot price to strike price) into eighteen groups. RMSE has been calculated as the square root of the mean squared error, while the percentage price error further scales the RMSE by the average price of the option.

Table 3. Out-of-sample pricing errors of the sample of S&P500 index options: put options.

                       Days-to-expiration
                       6–40               41–70              >70
Model  Moneyness, S/K  RMSE     % Error   RMSE     % Error   RMSE     % Error

BS     <0.94           $19.27   14.91     $7.30    5.79      $10.95   8.63
       0.94–0.97       $6.99    16.97     $8.94    19.89     $8.63    13.50
       0.97–1.00       $5.48    25.15     $5.66    17.90     $6.97    14.90
       1.00–1.03       $3.29    27.59     $3.68    17.46     $6.61    18.78
       1.03–1.06       $2.90    40.21     $3.75    26.66     $6.23    24.68
       >1.06           $3.11    105.38    $3.92    77.92     $6.92    57.86

CV     <0.94           $18.44   14.39     $27.43   21.92     $69.51   54.78
       0.94–0.97       $16.74   39.53     $27.81   61.70     $56.95   89.14
       0.97–1.00       $12.79   55.06     $23.28   73.58     $48.94   104.62
       1.00–1.03       $9.87    77.17     $18.54   87.87     $39.17   111.21
       1.03–1.06       $7.62    101.46    $14.31   101.71    $29.58   117.10
       >1.06           $4.35    146.46    $7.05    140.16    $16.02   133.95

SC     <0.94           $90.41   70.57     $54.36   43.44     $27.23   21.46
       0.94–0.97       $18.81   44.42     $12.19   27.05     $11.88   18.59
       0.97–1.00       $9.21    39.65     $8.12    25.66     $9.36    20.01
       1.00–1.03       $5.96    46.60     $6.36    30.14     $8.33    23.65
       1.03–1.06       $5.10    67.91     $6.15    43.71     $8.31    32.90
       >1.06           $4.07    137.04    $5.39    107.16    $6.72    56.19

SSMB   <0.94           $10.90   8.51      $14.01   11.20     $15.52   12.23
       0.94–0.97       $9.55    22.56     $4.87    10.81     $11.57   18.11
       0.97–1.00       $7.39    31.83     $6.10    19.28     $9.23    19.74
       1.00–1.03       $5.30    41.43     $5.85    27.71     $9.71    27.57
       1.03–1.06       $4.97    66.18     $5.64    40.12     $9.65    38.22
       >1.06           $3.54    119.19    $3.73    74.10     $8.84    73.90

This table reports the average out-of-sample pricing errors for put options with different maturities and moneyness levels for the period 1983–2002. The root mean squared errors (RMSEs) and percentage errors have been calculated for the Black–Scholes model (BS), the canonical valuation (CV) of Stutzer (1996), the kernel estimation with shape constraint (SC) of Ait-Sahalia and Duarte (2003), and the Semi-parametric Scale Mixture of Betas (SSMB) model defined in this paper. All options have been divided with respect to their maturities and moneyness levels (defined as the ratio of spot price to strike price) into eighteen groups. RMSE has been calculated as the square root of the mean squared error, while the percentage price error further scales the RMSE by the average price of the option.

Figure 2. Pricing errors for call options. This figure depicts the percentage pricing errors for the cross-section of call options as a function of time to maturity and moneyness for four different models: Black–Scholes (BS), canonical valuation (CV), shape constraint (SC), and Semi-parametric Scale Mixture of Betas (SSMB). Maturity is divided into three bins: <40 days, (40, 70) days, and >70 days. Moneyness is divided into six bins: <0.94, (0.94, 0.97), (0.97, 1.00), (1.00, 1.03), (1.03, 1.06) and >1.06.

we include most of the important spikes in the returns of the S&P500 index.

6. Concluding remarks

In this paper, we develop a new class of Bayesian Semi-parametric Scale Mixture of Beta (SSMB) models, and apply it to pricing the S&P500 index call and put options. The empirical results in this paper show that the SSMB structure offers significant benefits in describing the patterns of volatility in the cross-section of options data when compared to non-Bayesian non-parametric methods. For the short-term, deep out-of-the-money options the parametric Black–Scholes model does very poorly, but, consistent with previous findings, the parametric model does well for long-term, in-the-money contracts.

Figure 3. Pricing errors for put options. This figure depicts the percentage pricing errors for the cross-section of put options as a function of time to maturity and moneyness for four different models: Black–Scholes (BS), canonical valuation (CV), shape constraint (SC), and Semi-parametric Scale Mixture of Betas (SSMB). Maturity is divided into three bins: <40 days, (40, 70) days, and >70 days. Moneyness is divided into six bins: <0.94, (0.94, 0.97), (0.97, 1.00), (1.00, 1.03), (1.03, 1.06) and >1.06.

Figure 4. Predictive distributions of market index prices. This figure depicts the predictive distributions of market index prices based on daily index data for four different time periods in the option data. Each graph includes the respective mean, standard deviation, skewness and kurtosis of the distribution. All distribution functions have been obtained from simulated data using a kernel smoothing approach.

Our model achieves a substantial reduction in pricing errors across various moneyness/maturity classes for both call and put options. In many cases, the pricing errors are at least 50% lower than those obtained using comparable parametric and non-parametric alternatives. It appears that the SSMB model shows promise for pricing options.

But like all statistical models, the SSMB model has its limitations. Our model essentially provides full support on unimodal densities. If the true density is bimodal or multimodal, which is unlikely in our options pricing application, then SSMB would do badly. But in options pricing, it is better to model outliers with heavy tails than to put another density at such points, which may happen if an MDP is used. We also run into trouble if the parametric structure for the variance is wrong, but this would be true of any semi-parametric model. In this paper, essentially, we replace parametric unimodal densities with non-parametric unimodal densities, picking up more possibilities via very high degrees of kurtosis. That such high levels of kurtosis need to be better modelled is exemplified via the predictive densities for both call and put options.

While the statistical estimation of our model offers significant improvements relative to competing alternatives, certain context-specific limitations apply. For example, we do not explicitly model the feedback from the level of returns to volatility, a so-called leverage effect, which is known to help in pricing options. We also use a very parsimonious lag structure in the evolution of both the mean and variance processes; adding other predictors could produce more accurate moment estimates. Finally, we do not directly incorporate information from option markets, which some models consider useful. These extensions will be reported in subsequent research.

References

Ait-Sahalia, Y. and Duarte, J., Nonparametric option pricing under shape restrictions. J. Econometr., 2003, 116, 9–47.

Ait-Sahalia, Y. and Lo, A.W., Nonparametric estimation of state-price densities implicit in financial assets. J. Finan., 1998, 53, 499–547.

Bakshi, G., Cao, C. and Chen, Z., Empirical performance of alternative option pricing models. J. Finan., 1997, 52, 2003–2049.

Bates, D.S., Testing option pricing models. In Handbook of Statistics, edited by G.S. Maddala and C.R. Rao, Vol. 14, 1996 (Elsevier Science B.V: New York).

Bates, D.S., Post-'87 crash fears in the S&P 500 futures option market. J. Econometr., 2000, 94, 181–238.

Black, F. and Scholes, M.S., The pricing of options and corporate liabilities. J. Polit. Econ., 1973, 81, 637–659.

Bollerslev, T., Chou, R.Y. and Kroner, K.F., ARCH modeling in finance. J. Econometr., 1992, 52, 5–59.

Brunner, L.J. and Lo, A.Y., Bayes methods for a symmetric and unimodal density and its mode. Ann. Statist., 1989, 17, 1550–1566.

Campbell, J.Y., Lo, A.W. and MacKinlay, C.A., The Econometrics of Financial Markets, 1997 (Princeton University Press: Princeton, NJ).

Damien, P., Wakefield, J.C. and Walker, S.G., Gibbs sampling for Bayesian nonconjugate and hierarchical models using auxiliary variables. J. Roy. Statist. Soc. B, Statist. Methodol., 1999, 61, 331–344.

Derman, E. and Kani, I., Riding on the smile. Risk, 1994, 7, 32–39.

Duffie, D., Pan, J. and Singleton, K., Transform analysis and asset pricing for affine jump-diffusions. Econometrica, 2000, 68, 1343–1376.

Engle, R.F. and Gonzalez, G., Semi-parametric ARCH models. J. Bus. & Econ. Statist., 1991, 9, 345–359.

Eraker, B., Johannes, M. and Polson, N., The impact of jumps in volatility and returns. J. Finan., 2003, 58, 1269–1300.

Escobar, M.D. and West, M., Bayesian density estimation and inference using mixtures. J. Amer. Statist. Assoc., 1995, 90, 577–588.

Feller, W., An Introduction to Probability Theory and its Applications, Vol. II, 1971 (Wiley: New York).

Ferguson, T.S., A Bayesian analysis of some nonparametric problems. Ann. Statist., 1973, 1, 209–230.

Ghysels, E., Santa-Clara, P. and Valkanov, R., Predicting volatility: How to get most out of returns data sampled at different frequencies. J. Econometr., 2006, 131, 59–95.

Huang, C.F. and Litzenberger, R.H., Foundations for Financial Economics, 1988 (Prentice Hall: Englewood Cliffs, NJ).

MacEachern, S.N., Computational methods for mixture of Dirichlet process models. In Practical Nonparametric and Semiparametric Bayesian Statistics, edited by D. Dey, P. Muller, and D. Sinha, 1998 (Springer: New York).

Maheu, J.M. and McCurdy, T.H., News arrival, jumps dynamics, and volatility components for individual stock returns. J. Finan., 2004, 59, 755–793.

Merton, R.C., Option pricing when underlying stock returns are discontinuous. J. Finan. Econ., 1976, 3, 125–144.

Mira, A., Moller, J. and Roberts, G.O., Perfect slice samplers. J. Roy. Statist. Soc. B, Statist. Methodol., 2001, 63, 593–606.

Rubinstein, M., Implied binomial trees. J. Finan., 1994, 49, 771–818.

Smith, A.F. and Roberts, G.O., Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion). J. Roy. Statist. Soc. B, Statist. Methodol., 1993, 55, 3–24.

Stutzer, M., A simple nonparametric approach to derivative security valuation. J. Finan., 1996, 51, 1633–1652.

Appendix A: The Dirichlet Process (DP)

To motivate the DP, consider a simple example. Suppose $X$ is a random variable which takes the value 1 with probability $p$ and the value 2 with probability $1 - p$. Uncertainty about the unknown distribution function $F$ is equivalent to uncertainty about $(p_1, p_2)$, where $p_1 = p$ and $p_2 = 1 - p$. A Bayesian would put a prior distribution over the two unknown probabilities $p_1$ and $p_2$. Of course here we essentially have only one unknown probability since $p_1$ and $p_2$ must sum to one. A convenient prior distribution is the Beta distribution given, up to proportionality, by

$$f(p_1, p_2) \propto p_1^{\alpha_1 - 1} p_2^{\alpha_2 - 1},$$

where $p_1, p_2, \alpha_1, \alpha_2 \geq 0$, and $p_1 + p_2 = 1$. It is denoted Beta($\alpha_1$, $\alpha_2$). Different prior opinions can be expressed by different choices of $\alpha_1$ and $\alpha_2$. Set $\alpha_i = c q_i$ with $q_i \geq 0$ and $q_1 + q_2 = 1$. We have

$$E(p_i) = q_i$$

and

$$\mathrm{Var}(p_i) = \frac{q_i (1 - q_i)}{c + 1}. \qquad \text{(A1)}$$

If $q_i = 0.5$ and $c = 2$ we obtain a non-informative prior. We denote our prior guess $(q_1, q_2)$ by $F_0$. The interpretation is that the $q_i$ centre the prior and $c$ reflects our degree of belief in the prior: a large value of $c$ implies a small variance, and hence strong prior beliefs.
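As a quick numerical check of equation (A1), the standard Beta mean and variance formulas can be compared with the $(c, q_i)$ parametrization. The sketch below is illustrative only; the function names `beta_moments` and `a1_moments` are assumptions introduced for this example.

```python
def beta_moments(alpha1, alpha2):
    """Mean and variance of p1 when (p1, p2) has a Beta(alpha1, alpha2) prior."""
    total = alpha1 + alpha2
    mean = alpha1 / total
    variance = alpha1 * alpha2 / (total ** 2 * (total + 1.0))
    return mean, variance


def a1_moments(q, c):
    """Mean and variance of p_i as stated in equation (A1)."""
    return q, q * (1.0 - q) / (c + 1.0)


# Non-informative case from the text: q_i = 0.5 and c = 2, i.e. Beta(1, 1).
flat_mean, flat_var = beta_moments(2.0 * 0.5, 2.0 * 0.5)

# A more opinionated prior: c = 10 and q_1 = 0.3, i.e. Beta(3, 7).
strong_mean, strong_var = beta_moments(10.0 * 0.3, 10.0 * 0.7)
```

With $c = 2$ and $q_i = 0.5$ the prior is Beta(1, 1), the uniform distribution on the unit interval, which is why the text calls it non-informative.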

The Beta prior is convenient in the example above. Why? Suppose we obtain a random sample of size $n$ from the distribution $F$. This is a binomial experiment with the value $X = 1$ occurring $n_1$ times (say) and the value $X = 2$ occurring $n_2$ times, where $n_2 = n - n_1$. The posterior distribution of $(p_1, p_2)$ is once again a Beta distribution with parameters updated to Beta($\alpha_1 + n_1$, $\alpha_2 + n_2$). Since the posterior distribution belongs to the same family as the prior distribution, namely the Beta distribution, such a prior-to-posterior analysis is called a conjugate update, with the prior being referred to as a conjugate prior. The above example is the well-known Beta-Binomial model for $p$.

We now generalize the conjugate Beta-Binomial model to the conjugate Multinomial-Dirichlet model. Now the random variable $X$ can take the value $X_i$ with probability $p_i$, $i = 1, \ldots, K$, with $p_i \geq 0$ and $\sum_{i=1}^{K} p_i = 1$. Now, uncertainty about the unknown distribution function $F$ is equivalent to uncertainty about $p = (p_1, \ldots, p_K)$. The conjugate prior distribution in this case is the Dirichlet distribution (not to be confused with the Dirichlet process) given, up to proportionality, by

$$f(p_1, \ldots, p_K) \propto p_1^{\alpha_1 - 1} \cdots p_K^{\alpha_K - 1}, \qquad \text{(A2)}$$

where $\alpha_i, p_i \geq 0$ and $\sum_{i=1}^{K} p_i = 1$. If we set $\alpha_i = c q_i$ then we obtain the same interpretation of the prior, and, in particular, the mean and variance are again given by equation (A1). As before, $(q_1, \ldots, q_K)$ represents our prior guess ($F_0$) and $c$ the certainty in this guess. A random sample from $F$ now constitutes a Multinomial experiment and when this likelihood is combined with the Dirichlet prior, the posterior distribution for $p$ is once again a Dirichlet distribution with parameters $\alpha_i + n_i$, where $n_i$ is the number of observations in the $i$th category.

We now make a jump from a discrete $X$ to a continuous $X$ (by imagining $K \to \infty$). In traditional parametric Bayesian analysis, the distribution of $X$, say $F$, would be assumed to belong to a particular family of continuous probability density functions. For example, if $X$ can take on any real value the family of distributions is often assumed to be the Normal distribution, denoted $N(\mu, \sigma^2)$. A Bayesian analysis would then proceed by first placing a prior distribution on $\mu$ and $\sigma^2$, and then obtaining the resultant posterior distributions of these two finite-dimensional parameters.

We enter the realm of Bayesian non-parametrics when $F$ (in the last paragraph) itself is treated as a random variable; that is, one must now assign a prior distribution to $F$. Since $F$ is infinite-dimensional, we need a stochastic process whose sample paths index the entire space of distribution functions. In the main text of this paper, we noted that our focus is on ensuring greater levels of kurtosis in the conditional distribution of the asset. It is now easy to see that by treating this conditional distribution, $F$, itself as a random quantity, we allow the process that is driving the data to take on any degree of kurtosis. Parametric models impose a fixed degree of kurtosis, including fat-tailed distributions like the Student-t that is used in GARCH models. But the Bayesian non-parametric model, loosely speaking, allows the data to determine the level of kurtosis in the distribution of the asset, which could be much larger than under a parametric model. Indeed, since the focus of this paper is on the predictive distribution of the asset, allowing for larger degrees of kurtosis is critical because uncertainty in forecasts increases over time. Thus, a model that makes minimal assumptions about the distribution of the asset is likely to capture this greater uncertainty over time better.
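The conjugate Multinomial-Dirichlet update described earlier in this appendix amounts to adding the observed category counts to the prior parameters. A minimal sketch, assuming a hypothetical three-category example (the helper names and the numbers are illustrative, not from the paper):

```python
def dirichlet_posterior(alpha, counts):
    """Conjugate update: posterior parameters are alpha_i + n_i."""
    if len(alpha) != len(counts):
        raise ValueError("alpha and counts must have the same length")
    return [a + n for a, n in zip(alpha, counts)]


def dirichlet_mean(alpha):
    """Mean of each p_i under a Dirichlet distribution with parameters alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Prior guess F0 = (q_1, q_2, q_3) held with certainty c, so alpha_i = c * q_i.
c = 4.0
q = [0.25, 0.25, 0.5]
prior = [c * q_i for q_i in q]

# Hypothetical multinomial sample: n_i observations in category i.
counts = [3, 0, 1]
posterior = dirichlet_posterior(prior, counts)
posterior_mean = dirichlet_mean(posterior)
```

With $K = 2$ the same two functions reproduce the Beta-Binomial update, since Beta($\alpha_1$, $\alpha_2$) is the two-category Dirichlet distribution.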

We are now ready to define the Dirichlet process, but first some notation. A partition $B_1, \ldots, B_k$ of the sample space $\Omega$ is such that $\bigcup_{i=1}^{k} B_i = \Omega$ and $B_i \cap B_j = \emptyset$ for all $i \neq j$. That is, we have a group of sets that are disjoint and exhaustive. Stated differently, the sets cover the whole sample space and are non-overlapping.

Definition A.1: A Dirichlet process prior with parameter $\alpha$ generates random probability distributions $F$ such that, for every $k = 1, 2, 3, \ldots$ and partition $B_1, \ldots, B_k$ of $\Omega$, the distribution of $(F(B_1), \ldots, F(B_k))$ is the Dirichlet distribution with parameter $(\alpha(B_1), \ldots, \alpha(B_k))$. Here $\alpha$ is a finite measure on $\Omega$ and so we can put $\alpha(\cdot) = c F_0(\cdot)$, where $c > 0$ and $F_0$ is a probability distribution on $\Omega$.

Example A.2: Consider a random variable $X$ with distribution function $F$ defined on the real line. Now consider the probability $p = \mathrm{pr}(X < x^*)$ and suppose we specify a DP prior with parameter $\alpha$ for $F$. If we put $B_1 = (-\infty, x^*]$ and $B_2 = (x^*, \infty)$, then we see from the above definition that, a priori, $p$ has a Beta distribution with parameters $\alpha_1 = \alpha(B_1) = c F_0(x^*)$ and $\alpha_2 = \alpha(B_2) = c(1 - F_0(x^*))$, where $c = \alpha(-\infty, \infty)$. This prior is such that $E(p) = F_0(x^*)$ and $\mathrm{var}(p) = F_0(x^*)(1 - F_0(x^*))/(c + 1)$, thus showing the link to equation (A1). The variance of $p$, and hence the level of fluctuation of $p$ about $F_0(x^*)$, depends on $c$. In particular, if $c$ is large then we have strong belief in $F_0(x^*)$ and $\mathrm{var}(p)$ is small. Note that this is true for all partitions $B_1, B_2$ of the real line, or, equivalently, all values of $x^*$.
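The two-set partition in Example A.2 can be illustrated numerically. The sketch below assumes, purely for illustration, that the prior guess $F_0$ is the standard normal distribution; the function names are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, used here as an assumed F0."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def dp_marginal_beta(c, f0_cdf, x_star):
    """Beta parameters and prior moments of p = F((-inf, x_star]) under a DP prior.

    The DP parameter is alpha(.) = c * F0(.), so the partition
    B1 = (-inf, x_star], B2 = (x_star, inf) gives Beta(alpha1, alpha2).
    """
    f0 = f0_cdf(x_star)
    alpha1 = c * f0
    alpha2 = c * (1.0 - f0)
    mean = alpha1 / (alpha1 + alpha2)
    variance = f0 * (1.0 - f0) / (c + 1.0)
    return alpha1, alpha2, mean, variance


weak = dp_marginal_beta(c=1.0, f0_cdf=std_normal_cdf, x_star=0.0)
strong = dp_marginal_beta(c=99.0, f0_cdf=std_normal_cdf, x_star=0.0)
```

Both priors are centred at $F_0(0) = 0.5$, but the prior variance shrinks from $0.125$ when $c = 1$ to $0.0025$ when $c = 99$, which is the sense in which a large $c$ expresses strong belief in $F_0$.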

Now consider observations $X_1, \ldots, X_n$ from $F$. Let $F$ be assigned a DP prior, denoted $\mathrm{Dir}(c, F_0)$. Then Ferguson (1973) showed that the posterior process $F$ has parameters given by

$$c + n \quad \text{and} \quad \frac{c F_0 + n F_n}{c + n},$$

where $F_n$ is the empirical distribution function for the data, namely the step function with jumps of $1/n$ at each $X_i$. The classical maximum likelihood estimator is given by $F_n$.

The posterior mean of $F(A)$ for any set $A \subset (-\infty, \infty)$ is given by

$$E\{F(A) \mid \text{data}\} = p_n F_0(A) + (1 - p_n) F_n(A),$$

where $p_n = c/(c + n)$. Thus if $c = 1$ we have a weight of $1/(n + 1)$ on our prior $F_0$ and $n/(n + 1)$ on the 'data' $F_n$. As $c$ increases, the posterior mean is influenced more by the prior, i.e. we have more faith in our prior choice, $F_0$.
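The posterior mean above is a weighted average of the prior guess and the empirical distribution. A minimal sketch, assuming a hypothetical data set of four observations and the set $A = (-\infty, 0]$, for which a standard normal prior guess gives $F_0(A) = 0.5$:

```python
def dp_posterior_mean(c, f0_prob, data, in_set):
    """Posterior mean of F(A) under a Dir(c, F0) prior.

    f0_prob is F0(A); in_set(x) returns True when the observation x lies in A.
    """
    n = len(data)
    if n == 0:
        return f0_prob  # with no data the posterior mean is the prior guess
    fn_prob = sum(1 for x in data if in_set(x)) / n  # empirical F_n(A)
    p_n = c / (c + n)
    return p_n * f0_prob + (1.0 - p_n) * fn_prob


observations = [-1.2, 0.3, 0.8, 1.5]  # hypothetical data


def is_non_positive(x):
    return x <= 0.0


low_c = dp_posterior_mean(1.0, 0.5, observations, is_non_positive)
high_c = dp_posterior_mean(96.0, 0.5, observations, is_non_positive)
```

Here $F_n(A) = 0.25$. With $c = 1$ the weights are $1/5$ on the prior and $4/5$ on the data, giving $0.3$; with $c = 96$ the weights become $0.96$ and $0.04$, giving $0.49$, so the posterior mean moves towards $F_0(A)$ as $c$ grows.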

Appendix B: The Gibbs sampler

It is now standard practice in many MCMC algorithms to use what is known as a slice sampler; see, for example, Damien et al. (1999). The idea underlying the slice sampler is the following.

Suppose we wish to generate random variates from a density $f$ given by

$$f(x) \propto \pi(x)\, l(x),$$

where $\pi$ is a density of known type, and $l$ is a non-negative invertible function (not necessarily a density). Note that this formulation automatically puts us in the domain of Bayesian modelling, where $\pi$ is typically the prior distribution and $l$ is the likelihood function. Since $l$ is invertible, this is akin to saying if $l(x) > u$, where $u$ is an auxiliary random variable, then it is possible to construct the set $A_u = \{x : l(x) > u\}$. Damien et al. prove that it is then possible to construct a Gibbs sampler to generate random variates from $f$, where all but one of the conditional distributions in the Gibbs sampler will be uniform distributions, and the remaining conditional distribution will be a truncated version of $\pi$. The implementation of the slice sampler hinges critically on introducing the auxiliary variable $u$. The variable $u$ itself is like a nuisance parameter. Its value lies in being able to construct the joint density $f(x, u)$ so that sampling from the resulting conditional distributions, $f(x \mid u)$ and $f(u \mid x)$, becomes easy. As demonstrated in Damien et al. (1999), note that one could embed a slice sampler within an overall Gibbs sampling scheme. That is, any or all of the conditional distributions in a Gibbs sampler could be broken into slice samplers. In the Gibbs sampler below, for the first conditional distribution, we use a slice sampler.
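The auxiliary-variable construction can be illustrated on a toy target. The sketch below is not the SSMB sampler; it assumes, purely for illustration, $\pi(x) = e^{-x}$ on $x > 0$ and $l(x) = e^{-x^2/2}$, so that $A_u = (0, \sqrt{-2 \log u})$ and the truncated version of $\pi$ can be sampled by inverting its distribution function.

```python
import math
import random


def slice_sample(n_draws, burn_in=1000, x0=1.0, seed=0):
    """Draw from f(x) proportional to pi(x) * l(x) on x > 0 via an auxiliary u.

    Illustrative (assumed) choices: pi(x) = exp(-x), an Exp(1) density, and
    l(x) = exp(-x**2 / 2), which is decreasing and hence invertible on x > 0.
    """
    rng = random.Random(seed)
    x = x0
    draws = []
    for i in range(burn_in + n_draws):
        # u | x is uniform on (0, l(x)]; 1 - random() lies in (0, 1].
        u = math.exp(-0.5 * x * x) * (1.0 - rng.random())
        # A_u = {x : l(x) > u} = (0, b), where b inverts l at u.
        b = math.sqrt(-2.0 * math.log(u))
        # x | u is pi truncated to (0, b), drawn by inverting its CDF.
        v = rng.random()
        x = -math.log(1.0 - v * (1.0 - math.exp(-b)))
        if i >= burn_in:
            draws.append(x)
    return draws


draws = slice_sample(20000)
sample_mean = sum(draws) / len(draws)
```

For this toy target, $f$ is a normal density with mean $-1$ and unit variance truncated to $x > 0$, whose mean is approximately $0.525$, so `sample_mean` should be close to that value.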

In the SSMB model, the following full conditional densities have to be sampled:

$$\begin{aligned}
& p(u_t \mid \text{everything else}), \quad t = 1, \ldots, T, \\
& p(\beta_k \mid \text{everything else}), \quad k = 1, \ldots, K, \\
& p(a \mid \text{everything else}), \\
& p(c \mid \text{everything else}), \\
& p(\mu \mid \text{everything else}).
\end{aligned}$$

$T$ equals the number of observed returns in the sample and $K$ is the number of independent variables in the variance regression.

The Gibbs sampler successively samples from the full conditional distributions described below. The parameters which are sampled are $\{u_t, a, c, \beta_0, \beta_1, Q_t, \mu\}$. Here $Q_t$ is a latent variable introduced in order to ease the computations in the Gibbs sampler while sampling from the conditional distribution of $u_t$. For clarity of exposition, we show the sampling procedure for the model with one explanatory variable in the variance regression. Also, to simplify exposition, we initially set $\mu = 0$. At the very end, we relax this setting and show that the following Gibbs sampler would change very little as a result.

(1) $p(u_t \mid \ldots)$

To sample this conditional distribution, we use a slice sampling scheme, with $Q_t$ acting as the auxiliary or latent variable. Define the density

$$f^{*}(u) = \frac{e^{-u/2}\,\mathbf{1}(A < u < B)}{\int_A^B e^{-u/2}\,du},$$

where

$$A = \begin{cases} [(1+a)\,r_t/(2\sigma_t)]^2 & \text{if } r_t > 0,\\ \max\{[r_t/(2\sigma_t\tau_t)]^2,\ [(1+a)\,r_t/(2a\sigma_t)]^2\} & \text{if } r_t < 0, \end{cases}$$

and

$$B = \begin{cases} [r_t/(2\sigma_t\tau_t)]^2 & \text{if } r_t > 0,\\ \infty & \text{if } r_t < 0. \end{cases}$$

Here $\tau_t = Q_t^{1/(a-1)} - a/(1+a)$. We take $u_t$ from $f^{*}(u)$ with probability proportional to

$$2c\,(e^{-A/2} - e^{-B/2})/(4\sqrt{\pi}),$$

or take $u_t = u_j$, $j \neq t$ and $A < u_j < B$, with probability proportional to $1/\sqrt{u_j}$.
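The update in step (1) can be sketched as follows, taking the bounds $A$ and $B$ as already computed. This is a minimal illustration rather than the authors' implementation: the function names `sample_f_star` and `update_u_t` are hypothetical, and the draw from $f^{*}$ uses the inverse CDF of an exponential with rate $1/2$ truncated to $(A, B)$, which is one convenient choice.

```python
import math
import random


def sample_f_star(A, B, rng):
    """Inverse-CDF draw from f*(u) ∝ exp(-u/2) on (A, B); B may be math.inf."""
    v = rng.random()
    tail = math.exp(-(B - A) / 2.0)  # equals 0.0 when B is infinite
    return A - 2.0 * math.log(1.0 - v * (1.0 - tail))


def update_u_t(t, u, A, B, c, rng):
    """Draw a fresh value from f* or reuse an existing u_j lying in (A, B)."""
    # Weight attached to a fresh draw from f*
    w_new = 2.0 * c * (math.exp(-A / 2.0) - math.exp(-B / 2.0)) / (4.0 * math.sqrt(math.pi))
    # Weights 1/sqrt(u_j) for existing values u_j, j != t, inside (A, B)
    candidates = [(j, 1.0 / math.sqrt(uj))
                  for j, uj in enumerate(u) if j != t and A < uj < B]
    total = w_new + sum(w for _, w in candidates)
    x = rng.random() * total
    if not candidates or x < w_new:
        return sample_f_star(A, B, rng)
    x -= w_new
    for j, w in candidates:
        if x < w:
            return u[j]
        x -= w
    return u[candidates[-1][0]]  # guard against floating-point fall-through


rng = random.Random(0)
u = [0.5, 2.0, 2.5, 10.0]
print(update_u_t(0, u, 1.0, 3.0, 1.0, rng))
```

Reusing existing values $u_j$ is what produces ties among the $u_t$, so the number of distinct values is typically smaller than the sample size.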

(2) $p(Q_t \mid \ldots)$

We take $Q_t$ to be distributed uniformly on the interval between 0 and

$$\{a/(1+a) + r_t/(2\sigma_t\sqrt{u_t})\}^{a-1}.$$

(3) $p(\beta_0 \mid \ldots)$

Define

$$A = \max_{r_t > 0}\ \log\{(1+a)\,r_t/(2\sqrt{u_t})\} - \beta_1 z_{1t},$$

$$B = \max_{r_t < 0}\ \log\{(1+a)(-r_t)/(2a\sqrt{u_t})\} - \beta_1 z_{1t},$$

$$C = \max_{r_t < 0,\ \tau_t < 0}\ \log\{r_t/(2\tau_t\sqrt{u_t})\} - \beta_1 z_{1t}.$$

Also define

$$D = \min_{r_t > 0,\ \tau_t > 0}\ \log\{r_t/(2\tau_t\sqrt{u_t})\} - \beta_1 z_{1t}.$$

Then we sample $\beta_0$ from the prior restricted to the interval

$$(\max\{A, B, C\},\ D).$$
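The interval in step (3) and the restricted draw can be sketched as below. The sketch makes two assumptions that are not taken from this appendix: the prior for the intercept is normal, and the draw uses the inverse CDF. The function names, and the argument name `tau` for the quantity $Q_t^{1/(a-1)} - a/(1+a)$ defined in step (1), are hypothetical. Empty index sets are treated as giving $-\infty$ for a maximum and $+\infty$ for a minimum.

```python
import math
import random
from statistics import NormalDist


def beta0_interval(r, u, tau, z1, a, beta1):
    """Return (max{A, B, C}, D) from step (3)."""
    lower, upper = [-math.inf], [math.inf]
    for rt, ut, taut, zt in zip(r, u, tau, z1):
        s = 2.0 * math.sqrt(ut)
        if rt > 0:
            lower.append(math.log((1 + a) * rt / s) - beta1 * zt)           # A
            if taut > 0:
                upper.append(math.log(rt / (taut * s)) - beta1 * zt)        # D
        elif rt < 0:
            lower.append(math.log((1 + a) * (-rt) / (a * s)) - beta1 * zt)  # B
            if taut < 0:
                lower.append(math.log(rt / (taut * s)) - beta1 * zt)        # C
    return max(lower), min(upper)


def sample_truncated_normal(lo, hi, mean, sd, rng):
    """Inverse-CDF draw from an (assumed) normal prior restricted to (lo, hi)."""
    prior = NormalDist(mean, sd)
    p_lo, p_hi = prior.cdf(lo), prior.cdf(hi)
    p = p_lo + rng.random() * (p_hi - p_lo)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside its open domain
    return min(max(prior.inv_cdf(p), lo), hi)


r, u = [0.5, -0.3, 0.2], [1.0, 2.0, 0.5]
tau, z1 = [0.1, -0.5, 0.2], [1.0, 0.5, -1.0]
lo, hi = beta0_interval(r, u, tau, z1, a=1.5, beta1=0.2)
print(lo, hi, sample_truncated_normal(lo, hi, 0.0, 1.0, random.Random(0)))
```

The same pattern applies to the slope coefficient in step (4), with the bounds divided by $z_{1t}$ and the direction of each inequality determined by the sign of $z_{1t}$.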


(4) $p(\beta_1 \mid \ldots)$

Define

$$A = \max_{r_t > 0,\ z_{1t} > 0}\ [\log\{(1+a)\,r_t/(2\sqrt{u_t})\} - \beta_0]/z_{1t},$$

$$B = \max_{r_t < 0,\ z_{1t} > 0}\ [\log\{(1+a)(-r_t)/(2a\sqrt{u_t})\} - \beta_0]/z_{1t},$$

$$C = \max_{r_t > 0,\ z_{1t} < 0,\ \tau_t > 0}\ [\log\{r_t/(2\tau_t\sqrt{u_t})\} - \beta_0]/z_{1t},$$

$$D = \max_{r_t < 0,\ z_{1t} > 0,\ \tau_t < 0}\ [\log\{r_t/(2\tau_t\sqrt{u_t})\} - \beta_0]/z_{1t},$$

$$E = \min_{r_t > 0,\ z_{1t} < 0}\ [\log\{(1+a)\,r_t/(2\sqrt{u_t})\} - \beta_0]/z_{1t},$$

$$F = \min_{r_t < 0,\ z_{1t} < 0}\ [\log\{(1+a)(-r_t)/(2a\sqrt{u_t})\} - \beta_0]/z_{1t},$$

$$G = \min_{r_t > 0,\ z_{1t} > 0,\ \tau_t > 0}\ [\log\{r_t/(2\tau_t\sqrt{u_t})\} - \beta_0]/z_{1t},$$

$$H = \min_{r_t < 0,\ z_{1t} < 0,\ \tau_t < 0}\ [\log\{r_t/(2\tau_t\sqrt{u_t})\} - \beta_0]/z_{1t}.$$

Then we sample from the prior for $\beta_1$ restricted to the interval

$$(\max\{A, B, C, D\},\ \min\{E, F, G, H\}).$$

(5) $p(a \mid \ldots)$

Define

$$A = \max_{r_t < 0,\ 2\sigma_t\sqrt{u_t} > 0}\ (-r_t)/(2\sigma_t\sqrt{u_t} + r_t),$$

$$B = \min_{r_t > 0}\ (2\sigma_t\sqrt{u_t})/r_t - 1,$$

$$C = \min_{r_t < 0,\ 2\sigma_t\sqrt{u_t} < 0}\ (-r_t)/(2\sigma_t\sqrt{u_t} + r_t).$$

Then we sample from the density proportional to $\pi(a) \times a^{n}$ for $a$ restricted to the interval

$$(A,\ \min\{B, C\}),$$

where $\pi(a)$ is the prior distribution of $a$, which, as detailed in the main text, is a Gamma($a_0$, $b_0$) distribution.

(6) $p(c \mid \ldots)$

The sampling for $c$ proceeds as follows. In the first step, we sample from the beta distribution for the new latent parameter $\eta \in (0, 1)$:

$$[\eta \mid c] \sim \mathrm{beta}(c + 1,\ n).$$

Then $c$ is sampled from a mixture of gamma distributions, with the weights defined below:

$$[c \mid \eta, h] \sim \pi_\eta\,\mathrm{Gamma}(a_0 + h,\ b_0 - \ln\eta) + (1 - \pi_\eta)\,\mathrm{Gamma}(a_0 + h - 1,\ b_0 - \ln\eta).$$

Here Gamma($a_0$, $b_0$) is the prior distribution for $c$, $h$ is the number of distinct values of the parameter $u$ (usually $h < n$), and $\pi_\eta$ is the solution of the equation

$$\pi_\eta/(1 - \pi_\eta) = (a_0 + h - 1)/\{n\,(b_0 - \ln\eta)\}.$$
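The two-stage update in step (6) can be sketched in a few lines. The sketch assumes the second argument of each gamma distribution is a rate, so it is inverted for Python's scale-parameterised `random.gammavariate`; that reading, the function name `update_c`, and the numerical inputs are assumptions for illustration only.

```python
import math
import random


def update_c(c, n, h, a0, b0, rng):
    """One draw of c given the number h of distinct u values among n observations."""
    # Stage 1: latent variable in (0, 1) from beta(c + 1, n)
    eta = rng.betavariate(c + 1.0, n)
    rate = b0 - math.log(eta)  # assumed rate of both gamma components
    # Stage 2: mixture weight from weight / (1 - weight) = (a0 + h - 1) / (n * rate)
    odds = (a0 + h - 1.0) / (n * rate)
    weight = odds / (1.0 + odds)
    shape = a0 + h if rng.random() < weight else a0 + h - 1.0
    return rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes a scale


rng = random.Random(0)
c = 1.0
for _ in range(5):
    c = update_c(c, n=100, h=5, a0=2.0, b0=1.0, rng=rng)
print(c)
```

Since $\eta < 1$, the quantity $b_0 - \ln\eta$ is always larger than $b_0$, so both gamma components are well defined whenever $a_0 + h - 1 > 0$.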

Recall that the key output from the MCMC scheme is the predictive distribution of the asset returns. The conditional structure of the time series in equations (12) and (13) in the main text provides the intuition behind obtaining out-of-sample forecasts. There is no closed-form expression for the predictive distribution; rather, we approximate it using the sampled values from the Gibbs sampler. The procedure is easy to implement because the predictive distribution for $r$ is constructed by merely sampling the latent variable $u_t$, which is straightforward as shown above. Hence, for the prediction at period $t+1$, the following components are sampled:

$$r_{t+1} = \sigma_n \sqrt{u_{t+1}}\; B_{t+1}(a), \tag{B1}$$

where $\sigma_t$ is defined as

$$\sigma_t = e^{\beta z_t} \tag{B2}$$

and $z_t$ is the value of the covariate at the time of pricing the option, while $u_{t+1}$ is sampled from the mixture of Dirichlet process model. An $r_{t+1}$ can be obtained from each iteration of the Gibbs sampler using the current $(c, \beta, a)$. The above algorithm is extended until the predicted value of $r_T$ is obtained from each iteration of the Gibbs sampler.

Now, if $\mu \neq 0$, as in equation (9), the Gibbs sampler is easily modified as follows. First, for all other parameters, wherever we previously had $r_t$ above, we change it to $r_t - \mu$. For sampling the conditional distribution of $\mu$, given all the other parameters, we note that, for each $t$,

$$(r_t - \mu)/(\sigma\sqrt{u_t})$$

has density $f_W$; see equation (11). Hence,

$$p(\mu \mid \cdots) \propto \pi(\mu) \prod_t \left(\frac{a}{1+a} + \frac{r_t - \mu}{\sigma\sqrt{u_t}}\right) \mathbf{1}\!\left(-\frac{a}{1+a} < \frac{r_t - \mu}{\sigma\sqrt{u_t}} < \frac{1}{1+a}\right),$$

where $\pi(\mu)$ is the prior distribution for $\mu$.
