Option Valuation: A First Course in Financial Mathematics


Option Valuation: A First Course in Financial Mathematics provides a straightforward introduction to the mathematics and models used in the valuation of financial derivatives. It examines the principles of option pricing in detail via standard binomial and stochastic calculus models. Developing the requisite mathematical background as needed, the text introduces probability theory and stochastic calculus at an undergraduate level.

The first nine chapters of the book describe option valuation techniques in discrete time, focusing on the binomial model. The author shows how the binomial model offers a practical method for pricing options using relatively elementary mathematical tools. The binomial model also enables a clear, concrete exposition of fundamental principles of finance, such as arbitrage and hedging, without the distraction of complex mathematical constructs. The remaining chapters illustrate the theory in continuous time, with an emphasis on the more mathematically sophisticated Black–Scholes–Merton model.

Largely self-contained, this classroom-tested text offers a sound introduction to applied probability through a mathematical finance perspective. Numerous examples and exercises help readers gain expertise with financial calculus methods and increase their general mathematical sophistication. The exercises range from routine applications to spreadsheet projects to the pricing of a variety of complex financial instruments. Hints and solutions to odd-numbered problems are given in an appendix.

Finance/Mathematics


Hugo D. Junghenn



CHAPMAN & HALL/CRC Financial Mathematics Series

Aims and scope: The field of financial mathematics forms an ever-expanding slice of the financial sector. This series aims to capture new developments and summarize what is known over the whole spectrum of this field. It will include a broad range of textbooks, reference works and handbooks that are meant to appeal to both academics and practitioners. The inclusion of numerical code and concrete real-world examples is highly encouraged.

Series Editors

M.A.H. Dempster
Centre for Financial Research
Department of Pure Mathematics and Statistics
University of Cambridge

Dilip B. Madan
Robert H. Smith School of Business
University of Maryland

Rama Cont
Center for Financial Engineering
Columbia University
New York

Published Titles

American-Style Derivatives: Valuation and Computation, Jerome Detemple

Analysis, Geometry, and Modeling in Finance: Advanced Methods in Option Pricing, Pierre Henry-Labordère

Credit Risk: Models, Derivatives, and Management, Niklas Wagner

Engineering BGM, Alan Brace

Financial Modelling with Jump Processes, Rama Cont and Peter Tankov

Interest Rate Modeling: Theory and Practice, Lixin Wu

Introduction to Credit Risk Modeling, Second Edition, Christian Bluhm, Ludger Overbeck, and Christoph Wagner

Introduction to Stochastic Calculus Applied to Finance, Second Edition, Damien Lamberton and Bernard Lapeyre

Monte Carlo Methods and Models in Finance and Insurance, Ralf Korn, Elke Korn, and Gerald Kroisandt

Numerical Methods for Finance, John A. D. Appleby, David C. Edelman, and John J. H. Miller

Option Valuation: A First Course in Financial Mathematics, Hugo D. Junghenn

Portfolio Optimization and Performance Analysis, Jean-Luc Prigent

Quantitative Fund Management, M. A. H. Dempster, Georg Pflug, and Gautam Mitra

Risk Analysis in Finance and Insurance, Second Edition, Alexander Melnikov

Robust Libor Modelling and Pricing of Derivative Products, John Schoenmakers

Stochastic Finance: A Numeraire Approach, Jan Vecer

Stochastic Financial Models, Douglas Kennedy

Structured Credit Portfolio Analysis, Baskets & CDOs, Christian Bluhm and Ludger Overbeck

Understanding Risk: The Theory and Practice of Financial Risk Management, David Murphy

Unravelling the Credit Crunch, David Murphy

Proposals for the series should be submitted to one of the series editors above or directly to:

CRC Press, Taylor & Francis Group
4th Floor, Albert House
1-4 Singer Street
London EC2A 4BQ
UK





Hugo D. Junghenn



CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2011 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Version Date: 20150312

International Standard Book Number-13: 978-1-4398-8912-1 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com

and the CRC Press Web site at
http://www.crcpress.com



TO MY FAMILY

Mary, Katie, Patrick, Sadie



Contents

Preface

1 Interest and Present Value
1.1 Compound Interest
1.2 Annuities
1.3 Bonds
1.4 Rate of Return
1.5 Exercises

2 Probability Spaces
2.1 Sample Spaces and Events
2.2 Discrete Probability Spaces
2.3 General Probability Spaces
2.4 Conditional Probability
2.5 Independence
2.6 Exercises

3 Random Variables
3.1 Definition and General Properties
3.2 Discrete Random Variables
3.3 Continuous Random Variables
3.4 Joint Distributions
3.5 Independent Random Variables
3.6 Sums of Independent Random Variables
3.7 Exercises

4 Options and Arbitrage
4.1 Arbitrage
4.2 Classification of Derivatives
4.3 Forwards
4.4 Currency Forwards
4.5 Futures
4.6 Options
4.7 Properties of Options
4.8 Dividend-Paying Stocks
4.9 Exercises

5 Discrete-Time Portfolio Processes
5.1 Discrete-Time Stochastic Processes
5.2 Self-Financing Portfolios
5.3 Option Valuation by Portfolios
5.4 Exercises

6 Expectation of a Random Variable
6.1 Discrete Case: Definition and Examples
6.2 Continuous Case: Definition and Examples
6.3 Properties of Expectation
6.4 Variance of a Random Variable
6.5 The Central Limit Theorem
6.6 Exercises

7 The Binomial Model
7.1 Construction of the Binomial Model
7.2 Pricing a Claim in the Binomial Model
7.3 The Cox-Ross-Rubinstein Formula
7.4 Exercises

8 Conditional Expectation and Discrete-Time Martingales
8.1 Definition of Conditional Expectation
8.2 Examples of Conditional Expectation
8.3 Properties of Conditional Expectation
8.4 Discrete-Time Martingales
8.5 Exercises

9 The Binomial Model Revisited
9.1 Martingales in the Binomial Model
9.2 Change of Probability
9.3 American Claims in the Binomial Model
9.4 Stopping Times
9.5 Optimal Exercise of an American Claim
9.6 Dividends in the Binomial Model
9.7 The General Finite Market Model
9.8 Exercises

10 Stochastic Calculus
10.1 Differential Equations
10.2 Continuous-Time Stochastic Processes
10.3 Brownian Motion
10.4 Variation of Brownian Paths
10.5 Riemann-Stieltjes Integrals
10.6 Stochastic Integrals
10.7 The Ito-Doeblin Formula
10.8 Stochastic Differential Equations
10.9 Exercises

11 The Black-Scholes-Merton Model
11.1 The Stock Price SDE
11.2 Continuous-Time Portfolios
11.3 The Black-Scholes-Merton PDE
11.4 Properties of the BSM Call Function
11.5 Exercises

12 Continuous-Time Martingales
12.1 Conditional Expectation
12.2 Martingales: Definition and Examples
12.3 Martingale Representation Theorem
12.4 Moment Generating Functions
12.5 Change of Probability and Girsanov's Theorem
12.6 Exercises

13 The BSM Model Revisited
13.1 Risk-Neutral Valuation of a Derivative
13.2 Proofs of the Valuation Formulas
13.3 Valuation under P
13.4 The Feynman-Kac Representation Theorem
13.5 Exercises

14 Other Options
14.1 Currency Options
14.2 Forward Start Options
14.3 Chooser Options
14.4 Compound Options
14.5 Path-Dependent Derivatives
14.5.1 Barrier Options
14.5.2 Lookback Options
14.5.3 Asian Options
14.6 Quantos
14.7 Options on Dividend-Paying Stocks
14.7.1 Continuous Dividend Stream
14.7.2 Discrete Dividend Stream
14.8 American Claims in the BSM Model
14.9 Exercises

A Sets and Counting

B Solution of the BSM PDE

C Analytical Properties of the BSM Call Function

D Hints and Solutions to Odd-Numbered Problems

Bibliography

Index

Preface

This text is intended as an introduction to the mathematics and models used in the valuation of financial derivatives. It is designed for an audience with a background in standard multivariable calculus. Otherwise, the book is essentially self-contained: The requisite probability theory is developed from first principles and introduced as needed, and finance theory is explained in detail under the assumption that the reader has no background in the subject.

The book is an outgrowth of a set of notes developed for an undergraduate course in financial mathematics offered at The George Washington University. The course serves mainly majors in mathematics, economics, or finance and is intended to provide a straightforward account of the principles of option pricing. The primary goal of the text is to examine these principles in detail via the standard binomial and stochastic calculus models. Of course, a rigorous exposition of such models requires a coherent development of the requisite mathematical background, and it is an equally important goal to provide this background in a careful manner consistent with the scope of the text. Indeed, it is hoped that the text may serve as an introduction to applied probability (through the lens of mathematical finance).

The book consists of fourteen chapters, the first nine of which develop option valuation techniques in discrete time, the last five describing the theory in continuous time. The emphasis is on two models, the (discrete-time) binomial model and the (continuous-time) Black-Scholes-Merton model. The binomial model serves two purposes: First, it provides a practical way to price options using relatively elementary mathematical tools. Second, it allows a straightforward and concrete exposition of fundamental principles of finance, such as arbitrage and hedging, without the possible distraction of complex mathematical constructs. Many of the ideas that arise in the binomial model foreshadow notions inherent in the more mathematically sophisticated Black-Scholes-Merton model.

Chapter 1 gives an elementary account of present value. Here the focus is on risk-free investments, such as money market accounts and bonds, whose values are determined by an interest rate. Investments of this type provide a way to measure the value of a risky asset, such as a stock or commodity, and mathematical descriptions of such investments form an important component of option pricing techniques.

Chapters 2, 3, and 6 form the core of the general probability portion of the text. The exposition is self-contained and uses only basic combinatorics and elementary calculus. Appendix A provides a brief overview of the elementary set theory and combinatorics used in these chapters. Readers with a good background in probability may safely give this part of the text a cursory reading. While our approach is largely standard, the more sophisticated notions of event σ-field and filtration are introduced early to prepare the reader for the martingale theory developed in later chapters. We have avoided using Lebesgue integration by considering only discrete and continuous random variables.

Chapter 4 describes the most common types of financial derivatives and emphasizes the role of arbitrage in finance theory. The assumption of an arbitrage-free market, that is, one that allows no "free lunch," is crucial in developing useful pricing models. An important consequence of this assumption is the put-call parity formula, which relates the cost of a standard call option to that of the corresponding put.

Discrete-time stochastic processes are introduced in Chapter 5 to provide a rigorous mathematical framework for the notion of a self-financing portfolio. The chapter describes how such portfolios may be used to replicate options in an arbitrage-free market.

Chapter 7 introduces the reader to the binomial model. The main result is the construction of a replicating, self-financing portfolio for a general European claim. The most important consequence is the Cox-Ross-Rubinstein formula for the price of a call option. Chapter 9 considers the binomial model from the vantage point of discrete-time martingale theory, which is developed in Chapter 8, and takes up the more difficult problem of pricing and hedging an American claim.

Chapter 10 gives an overview of Brownian motion, constructs the Ito integral for processes with continuous paths, and uses Ito's formula to solve various stochastic differential equations. Our approach to stochastic calculus builds on the reader's knowledge of classical calculus and emphasizes the similarities and differences between the two theories via the notion of variation of a function.

Chapter 11 uses the tools developed in Chapter 10 to construct the Black-Scholes-Merton PDE, the solution of which leads to the celebrated Black-Scholes formula for the price of a call option. A detailed analysis of the analytical properties of the formula is given in the last section of the chapter. The more technical proofs are relegated to appendices so as not to interrupt the main flow of ideas.

Chapter 12 gives a brief overview of those aspects of continuous-time martingales needed for risk-neutral pricing. The primary result is Girsanov's Theorem, which guarantees the existence of risk-neutral probability measures.

Chapters 13 and 14 provide a martingale approach to option pricing, using risk-neutral probability measures to find the value of a variety of derivatives, including path-dependent options. Rather than being encyclopedic, the material is intended to convey the essential ideas of derivative pricing and to demonstrate the utility and elegance of martingale techniques in this endeavor.

The text contains numerous examples and 200 exercises designed to help the reader gain expertise in the methods of financial calculus and, not incidentally, to increase his or her level of general mathematical sophistication. The exercises range from routine calculations to spreadsheet projects to the pricing of a variety of complex financial instruments. Hints and solutions to the odd-numbered problems are given in Appendix D.

For greater clarity and ease of exposition (and to remain within the intended scope of the text), we have avoided stating results in their most general form. Thus, interest rates are assumed to be constant, paths of stochastic processes are required to be continuous, and financial markets trade in a single risky asset. While these assumptions may be unrealistic, it is our belief that the reader who has obtained a solid understanding of the theory in this simplified setting will have little difficulty in making the transition to more general contexts.

While the text contains numerous examples and problems involving the use of spreadsheets, we have not included any discussion of general numerical techniques, as there are several excellent texts devoted to this subject. Indeed, such a text could be used to good effect in conjunction with the present one.

It is inevitable that any serious development of option pricing methods at the intended level of this book must occasionally resort to invoking a result that falls outside the scope of the text. For the few times that this has occurred, we have tried either to give a sketch of the proof or, failing that, to give references, general or specific, where the reader may find a reasonably accessible proof.

The text is organized to allow as flexible a use as possible. The precursor to the book, in the form of a set of notes, has been successfully tested in the classroom as a single-semester course in discrete-time theory only (Chapters 1–9) and as a one-semester course giving an overview of both discrete-time and continuous-time models (Chapters 1–7, 10, and 11). It may also easily serve as a two-semester course, with Chapters 1–13 forming the core and selections from Chapter 14.

To the students whose sharp eyes caught typos, inconsistencies, and downright errors in the notes leading up to the book: thank you. To the readers of this text: the author would be grateful indeed for similar observations, should the opportunity arise, as well as for suggestions for improvements.

Hugo D. Junghenn
Washington, D.C., USA


Chapter 1

Interest and Present Value

In this chapter, we consider assets whose value is determined by an interest rate. If the asset is guaranteed, as in the case of an insured savings account or a government bond (which, typically, has only a small likelihood of default), the asset is said to be risk-free. Such an asset stands in contrast to a risky asset, for example, a stock or commodity, whose future values cannot be determined with certainty. As we shall see in later chapters, mathematical models that describe the value of a risky asset typically include a component involving a risk-free asset. Therefore, our first goal is to describe how risk-free assets are valued.

1.1 Compound Interest

Interest is a fee paid by one party for the use of cash assets of another. The amount of interest is generally time dependent: the longer a balance is outstanding, the more interest accrues. A familiar example is the interest generated by a money market account. The bank pays the depositor an amount that is usually a fraction of the balance in the account, that fraction given in terms of a prorated annual percentage called the nominal rate.

Consider first an account that pays interest at the discrete times \(n = 1, 2, \ldots\). Suppose the initial deposit is \(A_0\) and the interest rate per period is \(i\). If interest is compounded (that is, each interest payment is added to the balance and itself earns interest in later periods), then after the first period the value of the account is \(A_1 = A_0 + iA_0 = A_0(1 + i)\), after the second period the value is \(A_2 = A_1 + iA_1 = A_1(1 + i) = A_0(1 + i)^2\), and so on. In general, the value of the account at time \(n\) is

\[A_n = A_0(1 + i)^n, \qquad n = 0, 1, 2, \ldots. \tag{1.1}\]

\(A_0\) is called the present value or discounted value of the account and \(A_n\) a future value.

Now suppose that the nominal rate is \(r\) and interest is compounded \(m\) times a year. Then \(i = r/m\), hence the value of the account after \(t\) years is

\[A_t = A_0(1 + r/m)^{mt}. \tag{1.2}\]


The distinction between formulas (1.1) and (1.2) is that the former expresses the value of the account as a function of the number of compounding intervals (that is, at the discrete times \(n\)), while the latter gives the value as a function of continuous time \(t\) (in years).

In contrast to an account earning compound interest, an account drawingsimple interest has time-t value

At = A0(1 + tr).

In this case, interest is calculated only on the initial deposit \(A_0\) and not on the preceding account value.

Example 1.1.1. Table 1.1 gives the value after two years of an account with present value $800. The account is assumed to earn interest at an annual rate of 12%.

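\[
\begin{array}{ll}
\text{Compounding} & \text{Value} \\
\hline
\text{annually} & 800(1.12)^2 = \$1{,}003.52 \\
\text{semiannually} & 800(1.06)^4 = \$1{,}009.98 \\
\text{quarterly} & 800(1.03)^8 = \$1{,}013.42 \\
\text{monthly} & 800(1.01)^{24} = \$1{,}015.79 \\
\text{daily} & 800(1 + .12/365)^{730} = \$1{,}016.96
\end{array}
\]

TABLE 1.1: Account Value in Two Years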

Note that for simple interest the value of the account after two years is 800(1.24) = $992.00.
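A short Python sketch can reproduce Table 1.1 from formula (1.2) and the simple-interest formula. The function names are illustrative, and the daily row assumes a 365-day year (730 periods in two years), which is what the table's exponent implies.

```python
# Future value of an initial deposit a0 after t years.
def compound_value(a0: float, r: float, m: int, t: float) -> float:
    """Formula (1.2): nominal annual rate r compounded m times a year."""
    return a0 * (1 + r / m) ** (m * t)


def simple_value(a0: float, r: float, t: float) -> float:
    """Simple interest: interest accrues on a0 only, A_t = a0 (1 + t r)."""
    return a0 * (1 + t * r)


# Rebuild Table 1.1: $800 at a nominal 12% for two years.
# Assumption: "daily" means 365 compounding periods per year.
periods_per_year = {"annually": 1, "semiannually": 2, "quarterly": 4,
                    "monthly": 12, "daily": 365}
table = {name: round(compound_value(800, 0.12, m, 2), 2)
         for name, m in periods_per_year.items()}
simple = simple_value(800, 0.12, 2)

for name, value in table.items():
    print(f"{name:>12}: ${value:,.2f}")
print(f"{'simple':>12}: ${simple:,.2f}")
```

Each compounded row should match Table 1.1 to the cent, and the last line gives the simple-interest value.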

The above example suggests that compounding more frequently results in a greater return. This can be seen from the fact that the sequence \((1 + r/m)^m\) is increasing in \(m\). To see what happens when \(m \to \infty\), set \(x = m/r\) in (1.2) so that

\[A_t = A_0\left[(1 + 1/x)^x\right]^{rt}.\]

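As \(m \to \infty\), also \(x \to \infty\), and l'Hospital's rule, applied to \(\ln(1 + 1/x)/(1/x)\), shows that \(x\ln(1 + 1/x) \to 1\), hence \((1 + 1/x)^x \to e\). In this way, we arrive at the formula for continuously compounded interest: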

\[A_t = A_0e^{rt}. \tag{1.3}\]

Returning to Example 1.1.1 we see that, if interest is compounded continuously, then the value of the account after two years is \(800e^{(.12)(2)}\) ≈ $1,017.00, not significantly more than for daily compounding.
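The limit argument can be checked numerically. This sketch, which is not from the text, evaluates the growth factor \((1 + r/m)^{mt}\) for the rate and horizon of Example 1.1.1 at increasingly frequent compounding and compares it with \(e^{rt}\). The frequency 10,000 is an arbitrary large value chosen for illustration.

```python
import math

r, t = 0.12, 2  # rate and horizon of Example 1.1.1

# Growth factor (1 + r/m)^(m t) for increasingly frequent compounding.
frequencies = [1, 2, 4, 12, 365, 10_000]
factors = [(1 + r / m) ** (m * t) for m in frequencies]

# Its limit as m -> infinity is e^(r t), formula (1.3).
limit = math.exp(r * t)
continuous = 800 * limit

for m, f in zip(frequencies, factors):
    print(f"m = {m:>6}: factor {f:.6f}")
print(f"limit e^(rt) = {limit:.6f}; $800 grows to ${continuous:,.2f}")
```

The printed factors increase with \(m\) and stay just below \(e^{rt}\), in line with the monotonicity claim above.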

The effective interest rate \(r_e\) is the simple interest rate that produces the same yield in one year as compound interest. If interest is compounded \(m\) times a year, this means that \(A_0(1 + r/m)^m = A_0(1 + r_e)\), hence

\[r_e = (1 + r/m)^m - 1.\]

If interest is compounded continuously, then \(A_0e^r = A_0(1 + r_e)\), so that

\[r_e = e^r - 1.\]

Example 1.1.2. You just inherited $10,000, which you decide to deposit in one of three banks, A, B, or C. Bank A pays 11% compounded semiannually, bank B pays 10.76% compounded monthly, and bank C pays 10.72% compounded continuously. Which bank should you choose?

Solution: We compute the effective rate \(r_e\) for each given interest rate. Rounding, we have

\[
\begin{aligned}
r_e &= (1 + .11/2)^2 - 1 = 0.113025 && \text{for Bank A},\\
r_e &= (1 + .1076/12)^{12} - 1 \approx 0.113068 && \text{for Bank B},\\
r_e &= e^{.1072} - 1 \approx 0.113157 && \text{for Bank C}.
\end{aligned}
\]

Bank C has the highest effective rate and is therefore the best choice.
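The comparison in Example 1.1.2 is easy to automate. In this minimal sketch, the helper `effective_rate` is a hypothetical name, and passing `m=None` is a convention chosen here to mean continuous compounding.

```python
import math
from typing import Optional


def effective_rate(r: float, m: Optional[int]) -> float:
    """Effective annual rate r_e for nominal rate r.

    m is the number of compounding periods per year; m=None stands for
    continuous compounding, where r_e = e^r - 1.
    """
    if m is None:
        return math.exp(r) - 1
    return (1 + r / m) ** m - 1


# The three offers of Example 1.1.2: (nominal rate, periods per year).
offers = {"A": (0.11, 2), "B": (0.1076, 12), "C": (0.1072, None)}
rates = {bank: effective_rate(r, m) for bank, (r, m) in offers.items()}
best = max(rates, key=rates.get)

for bank, rate in rates.items():
    print(f"Bank {bank}: r_e = {rate:.6f}")
print("Best choice:", best)
```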

1.2 Annuities

An annuity is a sequence of periodic payments of a fixed amount, say, \(P\). The payments may take the form of deposits into an account, such as a pension fund or layaway plan, or withdrawals from an account, for example, a trust fund or retirement account.¹ Suppose that the account pays interest at an annual rate \(r\) compounded \(m\) times per year and that a deposit (withdrawal) is made at the end of each compounding interval. We seek the value \(A_n\) of the account at time \(n\), that is, immediately after the \(n\)th payment.

In the case of deposits, \(A_n\) is the sum of the time-\(n\) values of payments 1 through \(n\). Since payment \(j\) accrues interest over \(n - j\) payment periods, its time-\(n\) value is \(P(1 + r/m)^{n-j}\). Thus,

\[A_n = P(1 + x + x^2 + \cdots + x^{n-1}), \qquad x := 1 + \frac{r}{m}.\]

The geometric series sums to \((x^n - 1)/(x - 1)\), hence

\[A_n = P\,\frac{(1 + i)^n - 1}{i}, \qquad i := \frac{r}{m}. \tag{1.4}\]

¹ An account into which periodic deposits are made for the purpose of retiring a debt or purchasing an asset is sometimes called a sinking fund.
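Formula (1.4) can be sanity-checked against the direct sum of the time-\(n\) values of the individual deposits. The inputs below are hypothetical and serve only to exercise the two functions.

```python
def future_value_formula(p: float, i: float, n: int) -> float:
    """Formula (1.4): account value just after the nth end-of-period deposit."""
    return p * ((1 + i) ** n - 1) / i


def future_value_direct(p: float, i: float, n: int) -> float:
    """Sum of time-n values; deposit j earns interest for n - j periods."""
    return sum(p * (1 + i) ** (n - j) for j in range(1, n + 1))


# Hypothetical inputs: $100 per month for 10 years at 6% compounded monthly.
p, i, n = 100.0, 0.06 / 12, 120
print(future_value_formula(p, i, n), future_value_direct(p, i, n))
```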


For withdrawals we argue as follows: Let \(A_0\) be the initial value of the account. The value at the end of period \(n\), just before withdrawal of the \(n\)th payment, is \(A_{n-1}\) plus the interest \(iA_{n-1}\) over that period. Making the withdrawal reduces that value by \(P\), so

\[A_n = aA_{n-1} - P, \qquad a := 1 + i.\]

Iterating, we obtain

\[A_n = a^2A_{n-2} - (1 + a)P = \cdots = a^nA_0 - (1 + a + a^2 + \cdots + a^{n-1})P.\]

Thus,

\[A_n = (1 + i)^nA_0 + P\,\frac{1 - (1 + i)^n}{i} = \frac{(1 + i)^n(iA_0 - P) + P}{i}. \tag{1.5}\]

Now assume that the account is drawn down to zero after \(N\) withdrawals. Setting \(n = N\) and \(A_N = 0\) in (1.5) and solving for \(A_0\) yields

\[A_0 = P\,\frac{1 - (1 + i)^{-N}}{i}. \tag{1.6}\]

This is the initial deposit required to support exactly \(N\) withdrawals of amount \(P\) from, say, a retirement account or trust fund. It may be seen as the sum of the present values of the \(N\) withdrawals.

Solving for P in (1.6) we obtain

\[P = A_0\,\frac{i}{1 - (1 + i)^{-N}}, \tag{1.7}\]

which may be used, for example, to calculate the mortgage payment for a mortgage of size \(A_0\) (see Example 1.2.2, below). Substituting (1.7) into (1.5) we obtain the following formula for the time-\(n\) value of an annuity supporting exactly \(N\) withdrawals:

\[A_n = A_0\,\frac{1 - (1 + i)^{n-N}}{1 - (1 + i)^{-N}}, \qquad n = 0, 1, \ldots, N. \tag{1.8}\]
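The recursion \(A_n = (1 + i)A_{n-1} - P\) and the closed forms (1.5) through (1.8) can be cross-checked numerically. This is a minimal sketch with hypothetical inputs and illustrative function names.

```python
def balance_recursive(a0: float, p: float, i: float, n: int) -> float:
    """Iterate A_n = (1 + i) A_{n-1} - P from the initial value A_0."""
    a = a0
    for _ in range(n):
        a = (1 + i) * a - p
    return a


def balance_formula(a0: float, p: float, i: float, n: int) -> float:
    """Formula (1.5)."""
    return ((1 + i) ** n * (i * a0 - p) + p) / i


def initial_deposit(p: float, i: float, n_total: int) -> float:
    """Formula (1.6): A_0 supporting exactly N withdrawals of size P."""
    return p * (1 - (1 + i) ** (-n_total)) / i


def withdrawal(a0: float, i: float, n_total: int) -> float:
    """Formula (1.7): the payment P that draws A_0 down to zero in N periods."""
    return a0 * i / (1 - (1 + i) ** (-n_total))


def balance_drawdown(a0: float, i: float, n_total: int, n: int) -> float:
    """Formula (1.8): time-n value when exactly N withdrawals exhaust the account."""
    return a0 * (1 - (1 + i) ** (n - n_total)) / (1 - (1 + i) ** (-n_total))


# Hypothetical inputs: $100,000 paid out over N = 120 months at i = 0.5% per month.
a0, i, n_total = 100_000.0, 0.005, 120
p = withdrawal(a0, i, n_total)
print(f"P = {p:.2f}")
for n in (0, 60, 120):
    print(n, balance_recursive(a0, p, i, n), balance_drawdown(a0, i, n_total, n))
```

Computing the balances both ways guards against algebra slips in the closed forms; the recursion is also what a spreadsheet implementation would naturally use.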

Example 1.2.1. (Retirement plan). Suppose you make monthly deposits of size \(P\) into a retirement account with an annual rate \(r\), compounded monthly. After \(t\) years you wish to make monthly withdrawals of size \(Q\) from the account for \(s\) years, drawing down the account to zero. By (1.4) and (1.6) it must then be the case that

\[P\,\frac{(1 + i)^{12t} - 1}{i} = Q\,\frac{1 - (1 + i)^{-12s}}{i}, \qquad i := \frac{r}{12},\]

or

\[\frac{P}{Q} = \frac{1 - (1 + i)^{-12s}}{(1 + i)^{12t} - 1}. \tag{1.9}\]

For a numerical example, suppose that \(t = 40\), \(s = 30\), and \(r = .06\). Then

\[\frac{P}{Q} = \frac{1 - (1.005)^{-360}}{(1.005)^{480} - 1} \approx .084,\]

so that a withdrawal of, say, Q = $5000 during retirement would require monthly deposits of

\[P = (.084)(5000) = \$420.\]

A more realistic analysis takes into account the reduction of purchasing power due to inflation. Suppose that inflation is running at 3% per year, or about .25% per month. This means that goods and services that cost $1 now will cost \((1.0025)^n\) dollars \(n\) months from now. The purchasing power, in today's dollars, of the first withdrawal (made in month 481) is then

\[5000(1.0025)^{-481} \approx \$1504,\]

while that of the last withdrawal (made in month 840) is only

\[5000(1.0025)^{-840} \approx \$614.\]

For the first withdrawal to have the current purchasing power of $5000, \(Q\) would have to be

\[5000(1.0025)^{481} \approx \$16{,}617,\]

which would require monthly deposits of

\[P = (.084)(16{,}617) \approx \$1396.\]

For the last withdrawal to have the current purchasing power of $5000, \(Q\) would have to be

\[5000(1.0025)^{840} \approx \$40{,}724,\]

requiring monthly deposits of

\[P = (.084)(40{,}724) \approx \$3421,\]

more than eight times the amount calculated without considering inflation!
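The figures in Example 1.2.1 can be recomputed as follows. This sketch keeps the unrounded ratio \(P/Q \approx .0838\), so the deposits it prints come out slightly below the text's figures, which multiply by the rounded value .084. The function name is illustrative.

```python
def deposit_ratio(r: float, t: int, s: int) -> float:
    """Formula (1.9): P/Q for t years of monthly deposits, then s years of withdrawals."""
    i = r / 12
    return (1 - (1 + i) ** (-12 * s)) / ((1 + i) ** (12 * t) - 1)


ratio = deposit_ratio(0.06, 40, 30)   # unrounded; the text rounds this to .084
infl = 0.0025                         # monthly inflation, as in the example
first, last = 481, 840                # months of the first and last withdrawals

pv_first = 5000 * (1 + infl) ** (-first)   # today's purchasing power of $5000 at month 481
pv_last = 5000 * (1 + infl) ** (-last)     # ... and at month 840
q_first = 5000 * (1 + infl) ** first       # Q preserving $5000 of purchasing power at month 481
q_last = 5000 * (1 + infl) ** last         # ... and at month 840

print(f"P/Q = {ratio:.5f}")
print(f"purchasing power: first {pv_first:.0f}, last {pv_last:.0f}")
for q in (5000, q_first, q_last):
    print(f"Q = {q:,.0f} needs monthly deposits P = {ratio * q:,.2f}")
```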

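Example 1.2.2. (Amortization). Suppose you take out a 20-year, $200,000 mortgage at an annual rate of 8% compounded monthly. Your monthly mortgage payments \(P\) constitute an annuity with \(A_0 = \$200{,}000\), \(i = .08/12 \approx .0067\), and \(N = 240\). Here \(A_n\) is the amount owed at the end of month \(n\). By (1.7), the mortgage payments are

\[P = 200{,}000\,\frac{.0067}{1 - (1.0067)^{-240}} \approx \$1677.85.\]

(Carrying the unrounded rate \(i = .08/12\) through the computation gives \(P \approx \$1672.88\); the figures below are based on the rounded rate .0067.)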


Now let \(I_n\) and \(P_n\) denote, respectively, the portions of the \(n\)th payment that are interest and principal. Since \(A_{n-1}\) was owed at the end of month \(n - 1\), (1.8) shows that

\[I_n = iA_{n-1} = iA_0\,\frac{1 - (1 + i)^{n-1-N}}{1 - (1 + i)^{-N}},\]

and therefore

\[P_n = P - I_n = iA_0\,\frac{(1 + i)^{n-1-N}}{1 - (1 + i)^{-N}}. \tag{1.10}\]

In particular, from (1.10) we have \(P_1 = \$337.86\) and \(P_{240} = \$1666.70\). Thus only about 20% of the first payment goes to reducing the principal, while almost 100% of the last payment does so.

The sequences \((P_n)\), \((I_n)\), and \((A_n)\) form the basis of what is called the amortization schedule of the mortgage.
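A minimal sketch of the amortization schedule, building \((I_n)\), \((P_n)\), and \((A_n)\) row by row. The function name and its parameters are illustrative assumptions. It uses the unrounded rate \(i = .08/12\), so its payment is about $1672.88 rather than the $1677.85 obtained in Example 1.2.2 with \(i\) rounded to .0067.

```python
def amortization_schedule(a0: float, annual_rate: float, years: int, per_year: int = 12):
    """Return the level payment P and rows (n, I_n, P_n, A_n) of the schedule.

    Uses the unrounded periodic rate i = annual_rate / per_year.
    """
    i = annual_rate / per_year
    n_total = years * per_year
    payment = a0 * i / (1 - (1 + i) ** (-n_total))   # formula (1.7)
    rows, balance = [], a0
    for n in range(1, n_total + 1):
        interest = i * balance            # I_n = i A_{n-1}
        principal = payment - interest    # P_n = P - I_n
        balance -= principal              # A_n = A_{n-1} - P_n
        rows.append((n, interest, principal, balance))
    return payment, rows


payment, rows = amortization_schedule(200_000, 0.08, 20)
print(f"monthly payment: {payment:.2f}")
for n, interest, principal, balance in rows[:2] + rows[-2:]:
    print(f"{n:>3}  I={interest:8.2f}  P={principal:8.2f}  A={balance:10.2f}")
```

The principal portions grow geometrically by the factor \(1 + i\), consistent with (1.10), and they sum to the original loan.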

In the above annuity formulas, the compounding interval and the payment interval are the same, and payment is made at the end of the compounding interval, describing what is called an ordinary annuity. If payment is made at the beginning of the period, as is the case for, say, rents and insurance, one obtains an annuity due, and the formulas change accordingly.

1.3 Bonds

Bonds are financial contracts issued by governments, corporations, and other institutions. The simplest type of bond is the zero-coupon bond. U.S. Treasury bills and U.S. savings bonds are common examples. The purchaser of a bond pays an amount \(B_0\) (which may be determined by bids) and receives a prescribed amount \(F\), the face value of the bond, at a prescribed time \(T\), the maturity date. The value \(B_t\) of the bond at time \(t\) may be expressed in terms of a continuously compounded interest rate \(r\) determined by the equation

B0 = Fe−rT .

Bt is then the face value of the bond discounted to time t:

Bt = Fe−r(T−t) = B0ert, 0 ≤ t ≤ T.

Thus, during the time interval [0, T], the bond acts like a money market account with continuously compounded interest. The time restriction may be theoretically removed as follows: At time T, reinvest the proceeds F from the bond by buying \(F/B_0\) bonds, each for the amount \(B_0\) and each with face value F and maturity date 2T. At time t ∈ [T, 2T] each bond has value \(Fe^{-r(2T-t)} = B_0 e^{-rT} e^{rt}\), so the bond account has value

\[ B_t = (F/B_0) B_0 e^{-rT} e^{rt} = Fe^{-rT} e^{rt} = B_0 e^{rt}, \qquad T \le t \le 2T. \]

Continuing this process, we see that the formula \(B_t = B_0 e^{rt}\) holds for all times t ≥ 0 over which the rate r, determined by the face value of the bond and the bid, is constant.

With a coupon bond, one receives not only the amount F at time T but also a sequence of payments during the life of the bond. Thus, at prescribed times \(t_1 < t_2 < \cdots < t_N\), the bond pays an amount \(C_n\), called a coupon, and at maturity T one receives the face value F. The price of the bond is the total present value

\[ B_0 = \sum_{n=1}^{N} e^{-rt_n} C_n + Fe^{-rT}. \qquad (1.11) \]

Note that this is the initial value of a portfolio consisting of N + 1 zero-coupon bonds with face values \(C_1, C_2, \ldots, C_N\), and F, maturing at times \(t_1, t_2, \ldots, t_N\), and T, respectively.
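A short sketch of (1.11): the hypothetical bond below (face value 1000, maturity 3 years, a coupon of 30 every six months, r = .05) is priced directly and then as the portfolio of zero-coupon bonds just described. The parameters and the function name are illustrative only; the two prices should agree up to floating-point rounding.

```python
import math

def bond_price(r, coupons, F, T):
    """Price of a coupon bond via Equation (1.11).

    r:       continuously compounded interest rate
    coupons: list of (t_n, C_n) pairs, coupon C_n paid at time t_n
    F, T:    face value and maturity date
    """
    return sum(C * math.exp(-r * t) for t, C in coupons) + F * math.exp(-r * T)

# Hypothetical bond: face value 1000, maturity 3 years, coupon of 30 every six months.
r, F, T = 0.05, 1000.0, 3.0
coupons = [(0.5 * k, 30.0) for k in range(1, 7)]
price = bond_price(r, coupons, F, T)

# The same cash flows as N + 1 zero-coupon bonds: face C_n maturing at t_n, and face F at T.
zeros = [bond_price(r, [], C, t) for t, C in coupons] + [bond_price(r, [], F, T)]
print(f"coupon bond: {price:.2f}, portfolio of zeros: {sum(zeros):.2f}")
```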

1.4 Rate of Return

Consider an investment that returns, for an initial payment of P > 0, an amount \(A_n > 0\) at the end of period n, n = 1, 2, . . . , N. The rate of return of the investment is defined to be that periodic interest rate i for which the present value of the sequence of returns equals the initial payment P, that is,

\[ P = \sum_{n=1}^{N} A_n (1+i)^{-n}. \qquad (1.12) \]

Examples of such investments are annuities and coupon bonds. For a coupon bond that pays the amount C at each of the times n = 1, 2, . . . , N and, in addition, pays the face value F at time N, Equation (1.12) reduces to

\[ P = C \, \frac{1 - (1+i)^{-N}}{i} + F(1+i)^{-N}, \]

where \(P = B_0\) is the price of the bond.

To see that Equation (1.12) has a unique solution i > −1, denote the right side by f(i) and note that f is continuous on the interval (−1, ∞) and satisfies

\[ \lim_{i \to \infty} f(i) = 0 \quad \text{and} \quad \lim_{i \to -1^+} f(i) = \infty. \]

Since P > 0, the Intermediate Value Theorem implies that the equation f(i) = P has a solution i > −1. Because f is strictly decreasing, the solution is unique.


A rate of return i may be positive, zero, or negative. If f(0) > P, that is, the sum of the payoffs is greater than the initial investment, then, because f is decreasing, i > 0. Similarly, if f(0) < P, that is, the sum of the payoffs is less than the initial investment, then i < 0.

Example 1.4.1. Suppose you loan a friend $100 and he agrees to pay you $35 at the end of the first year, $37 at the end of the second year, and $39 at the end of the third year, at which time the loan is considered to be paid off. The sum of the payoffs is greater than 100, so the equation

\[ \frac{35}{1+i} + \frac{37}{(1+i)^2} + \frac{39}{(1+i)^3} = 100 \]

has a unique positive solution i. One can use Newton's method to determine i, or one can simply solve the equation by trial and error using a spreadsheet. The latter approach gives i ≈ 0.053, that is, an annual rate of about 5.3%.
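A minimal sketch of the Newton's method route mentioned above, applied to f(i) = P with f the right side of (1.12). Starting at i = 0 is an assumption of this sketch; because f is decreasing and convex on (−1, ∞), the iterates increase toward the root from the left whenever f(0) > P, as in this example.

```python
def present_value(i, returns):
    """f(i) = sum_n A_n (1+i)^(-n), the right side of (1.12)."""
    return sum(A / (1 + i) ** n for n, A in enumerate(returns, start=1))

def d_present_value(i, returns):
    """Derivative f'(i) = -sum_n n A_n (1+i)^(-n-1)."""
    return -sum(n * A / (1 + i) ** (n + 1) for n, A in enumerate(returns, start=1))

def rate_of_return(P, returns, i0=0.0, tol=1e-12, max_iter=100):
    """Solve f(i) = P for i > -1 by Newton's method, starting from i0."""
    i = i0
    for _ in range(max_iter):
        step = (present_value(i, returns) - P) / d_present_value(i, returns)
        i -= step
        if abs(step) < tol:
            return i
    raise RuntimeError("Newton's method did not converge")

i = rate_of_return(100, [35, 37, 39])   # Example 1.4.1
print(f"i = {i:.4f}")
```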


1.5 Exercises

1. Suppose you deposit $1500 in an account paying an annual rate of 6%. Find the value of the account in three years if interest is compounded (a) yearly; (b) quarterly; (c) monthly; (d) daily; (e) continuously.

2. What annual interest rate r would allow you to double your initial deposit in 6 years if interest is compounded quarterly? Continuously?

3. Find the effective interest rate if a nominal rate of 12% is compounded (a) quarterly; (b) monthly; (c) continuously.

4. If you receive 6% interest compounded monthly, about how many years will it take for your investment to triple?

5. If you deposit $400 at the end of each month into an account earning 8% interest compounded monthly, what is the value of the account at the end of 5 years? 10 years?

6. You deposit $700 at the end of each month into an account earning interest at an annual rate of r compounded monthly. Use a spreadsheet to find the value of r that produces an account value of $50,000 in 5 years.

7. You deposit $400 at the end of each month into an account with an annual rate of 6% compounded monthly. Use a spreadsheet to determine the minimum number of payments required for the account to have a value of at least $30,000.

8. Suppose an account offers continuously compounded interest at an annual rate r and that a deposit of size P is made at the end of each month. Show that the value of the account after n deposits is

\[ A_n = P \, \frac{e^{rn/12} - 1}{e^{r/12} - 1}. \]

9. You make an initial deposit of $200,000 into an account paying 6% compounded monthly. If you withdraw $2000 each month, how much will be left in the account after 5 years? 10 years? When will the account be drawn down to zero?

10. An account pays an annual rate of 8% compounded monthly. What lump sum must you deposit into the account now so that in 10 years you can begin to withdraw $4000 each month for the next 20 years, drawing down the account to zero?


11. A trust fund has an initial value of $300,000 and earns interest at an annual rate of 6%, compounded monthly. If a withdrawal of $5000 is made at the end of each month, when will the account fall below $150,000? (Use a spreadsheet.)

12. Referring to Equation (1.5), find the smallest value of \(A_0\) in terms of P and i that will fund a perpetual annuity, that is, an annuity for which \(A_n > 0\) for all n. What is the value of \(A_n\) in this case?

13. Suppose that an account offers continuously compounded interest at an annual rate r and that withdrawals of size P are made at the end of each month. If the initial deposit is \(A_0\) and the account is drawn down to zero after N withdrawals, show that the value of the account after n withdrawals is

\[ A_n = P \, \frac{1 - e^{-r(N-n)/12}}{e^{r/12} - 1}. \]

14. In Example 1.2.1, suppose that t = 30, s = 20, and r = .12. Find the payment amount P for withdrawals Q of $3000 per month. If inflation is running at 2% per year, what value of P will give the first withdrawal the current purchasing power of $3000? The last withdrawal?

15. For a 30-year, $300,000 mortgage, determine the annual rate r you will have to lock in to have payments of $1800 per month.

16. In Example 1.2.2, suppose that you must pay an inspection fee of $1000, a loan initiation fee of $1000, and 2 points, that is, 2% of the nominal loan of $200,000. Effectively, then, you are receiving only $194,000 from the lending institution. Calculate the annual interest rate r′ you will now be paying, given the agreed-upon monthly payments of $1677.85.

17. How large a loan can you take out at an annual rate of 15% if you can afford to pay back $1000 at the end of each month and you want to retire the loan after 5 years?

18. Suppose you take out a 20-year, $300,000 mortgage at 7% and decide after 15 years to pay off the mortgage. How much will you have to pay?

19. You can retire a loan either by paying off the entire amount $8000 now, or by paying $6000 now and $6000 at the end of 10 years. Find a cutoff value \(r_0\) such that if the nominal rate r is less than \(r_0\), then you should pay off the entire loan now, but if \(r > r_0\), then it is preferable to wait. Assume that interest is compounded continuously.

20. You can retire a loan either by paying off the entire amount $8000 now, or by paying $6000 now, $2000 at the end of 5 years, and an additional $2000 at the end of 10 years. Find a cutoff value \(r_0\) such that if the nominal rate r is less than \(r_0\), then you should pay off the entire loan now, but if \(r > r_0\), then it is preferable to wait. Assume that interest is compounded continuously.

21. Suppose you take out a 30-year, $100,000 mortgage at 6%. After 10 years, interest rates go down to 4%, so you decide to refinance the remainder of the loan by taking out a new 20-year mortgage. If the cost of refinancing is 3 points (3% of the new mortgage amount), what are the new payments? What threshold interest rate would make refinancing fiscally unwise? (Assume that the points are rolled in with the new mortgage.)

22. Referring to Example 1.2.2, show that

\[ P_n = (1+i)^{n-1} P_1 \quad \text{and} \quad I_n = \frac{1 - (1+i)^{n-N-1}}{1 - (1+i)^{-N}} \, I_1. \]

23. Referring to Section 1.3, find the time-t value \(B_t\) of a coupon bond for \(t_m \le t < t_{m+1}\), m = 0, 1, 2, . . . , N − 1, where \(t_0 = 0\).

24. Find the rate of return of a 4-year investment that, for an initial investment of $1000, returns $100, $200, and $300 at the end of years 1, 2, and 3, respectively, and, at the end of year 4, returns (a) $350; (b) $400; (c) $550. What would the rates be if the rate of return formula is based on continuously compounded interest?

25. Table 1.2 gives the end-of-year returns for two investment plans based on an initial investment of $10,000. Determine which plan is better.

          Year 1   Year 2   Year 3   Year 4
Plan A    $3000    $5000    $7000    $1000
Plan B    $3500    $4500    $6500    $1500

TABLE 1.2: End-of-Year Returns

26. In Exercise 25, what is the smallest return in year 1 of Plan A that would make Plans A and B equally lucrative? Answer the same question for year 4.


Chapter 2

Probability Spaces

Because financial markets are sensitive to a variety of unpredictable events, the value of a financial asset, such as a stock or commodity, is usually interpreted as a random quantity subject to the laws of probability. In this chapter, we develop the probability theory needed to model the dynamic behavior of asset prices. We assume that the reader is familiar with the notation and terminology of elementary set theory as well as basic combinatorial principles. A review of these concepts may be found in Appendix A.

2.1 Sample Spaces and Events

A probability is a number that expresses the likelihood of occurrence of an event in an experiment. The experiment can be something as simple as the toss of a coin or as complex as the observation of stock prices over time. For our purposes, we shall consider an experiment to be any activity that produces observable outcomes. For example, tossing a die and noting the number of dots appearing on the top face is an experiment whose outcomes are the integers 1 through 6. Observing the average value of a stock over the previous week or noting the first time the stock dips below a prescribed level are experiments whose outcomes are nonnegative real numbers. Throwing a dart is an experiment whose outcomes may be taken as the coordinates of the dart's landing position.

The collection of all outcomes of an experiment is called the sample space of the experiment. In probability theory, one starts with an assignment of probabilities to subsets of the sample space called events. This assignment must satisfy certain axioms and can be quite technical, depending on the sample space and the nature of the events. We begin with the simplest setting, that of a discrete probability space.



2.2 Discrete Probability Spaces

Consider an experiment whose outcomes may be represented by a finite or infinite sequence, say, \(\omega_1, \omega_2, \ldots\). Let \(p_n\) denote the probability of outcome \(\omega_n\). In practice, the determination of \(p_n\) may be based on relative frequency, logical deduction, or analytical methods and may be approximate or theoretical. For example, suppose we take a poll of 1000 people in a certain locality and discover that 200 prefer candidate A and 800 candidate B. If we choose a person at random from the sample, then it is natural to assign a theoretical probability of .2 to the outcome that the person chosen prefers candidate A. If, however, the person is chosen randomly from the entire locality, then pollsters take the probability of the same outcome to be only approximately .2. Similarly, if we flip a coin 10,000 times and find that exactly 5143 heads appear, we might assign the approximate probability of .5143 to the outcome that a single toss produces a head. On the other hand, for the idealized coin, we would assign that same outcome a theoretical probability of .5.

However the probabilities \(p_n\) are determined, they must satisfy the following properties:

(a) \(0 \le p_n \le 1\) for every n, and

(b) \(\sum_n p_n = 1\).

The finite or infinite sequence \((p_1, p_2, \ldots)\) is then called the probability vector for the experiment. The probability P(A) of a subset A of the sample space \(\Omega = \{\omega_1, \omega_2, \ldots\}\) is defined as

\[ P(A) = \sum_{\omega_n \in A} p_n. \qquad (2.1) \]

The sum in (2.1) is either finite or a convergent infinite series. If A = ∅, then the sum is interpreted as having the value zero. The function P is called a probability measure for the experiment, and the pair (Ω, P) is said to be a discrete probability space.

The following proposition summarizes the basic properties of P. We omit the proof.

Proposition 2.2.1. (i) 0 ≤ P(A) ≤ 1; (ii) P(∅) = 0 and P(Ω) = 1; (iii) if \((A_n)\) is a finite or infinite sequence of pairwise disjoint subsets of Ω, then

\[ P\Big(\bigcup_n A_n\Big) = \sum_n P(A_n). \]

Part (iii) of the proposition is called the additivity property of P. As we shall see, it is precisely the property needed to justify taking limits of random quantities.


Example 2.2.2. There are 10 slips of paper in a hat, two of which are labeled with the number 1, three with the number 2, and five with the number 3. A slip is drawn at random from the hat, the label is noted, the slip is returned, and the process is repeated a second time. The sample space of the experiment may be taken to be the set of all ordered pairs (j, k), where j is the number on the first slip and k the number on the second. The event A that the sum of the numbers on the slips equals 4 consists of the outcomes (1, 3), (3, 1), and (2, 2). By relative frequency arguments, the probabilities of these outcomes are (.2)(.5) = .1, (.5)(.2) = .1, and (.3)(.3) = .09, respectively, hence P(A) = .29.
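The computation in Example 2.2.2 can be reproduced by listing the probability vector of the 9 ordered pairs with exact fractions. The sketch assumes, as the relative frequency argument does, that the probability of (j, k) is the product of the single-draw frequencies.

```python
from fractions import Fraction
from itertools import product

# Single-draw frequencies: two 1's, three 2's, and five 3's among 10 slips.
slips = {1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(5, 10)}

# Drawing with replacement: the probability of the ordered pair (j, k) is p_j * p_k.
prob = {(j, k): slips[j] * slips[k] for j, k in product(slips, repeat=2)}

A = [outcome for outcome in prob if sum(outcome) == 4]   # sum of the labels is 4
P_A = sum(prob[outcome] for outcome in A)                # Equation (2.1)
print(sorted(A), P_A)
```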

Example 2.2.3. Toss a fair coin infinitely often (conceptually, but not practically, possible). This produces an infinite sequence of heads H and tails T. Our experiment consists of observing the first time an H occurs. The sample space is then Ω = {0, 1, 2, 3, . . .}, where, for example, the outcome 2 means that the first toss comes up T and the second H, while outcome 0 means that H never appears. To find the probability vector \((p_0, p_1, \ldots)\) for the experiment, we argue as follows: Since on the first toss the outcomes H or T are equally likely, we should set \(p_1 = 1/2\). Similarly, the outcomes HH, HT, TH, TT of the first two tosses are equally likely, hence \(p_2\), the probability that TH occurs, should be 1/4. In general, we see that we should set \(p_n = 2^{-n}\), n ≥ 1. By additivity, the probability that a head eventually comes up is

\[ \sum_{n=1}^{\infty} p_n = \sum_{n=1}^{\infty} 2^{-n} = 1, \]

from which it follows that \(p_0 = 0\). The probability vector for the experiment is therefore

\[ (0, 2^{-1}, 2^{-2}, \ldots). \]

In the important special case where the sample space Ω is finite and each outcome is equally likely, \(p_n = 1/|\Omega|\) and (2.1) reduces to

\[ P(A) = \frac{|A|}{|\Omega|}, \]

where | · | denotes the number of elements in a finite set. The determination of probabilities in this case is then purely a combinatorial problem.

Example 2.2.4. The total number of poker hands is \(\binom{52}{5} = 2{,}598{,}960\). We show that three of a kind (for example, three Jacks, 5, 7) beats two pair. By the multiplication principle (Appendix A), the number of poker hands with three of a kind is

\[ 13 \cdot 4 \cdot \frac{48 \cdot 44}{2} = 54{,}912, \]

corresponding to the process of choosing a denomination for the triple, selecting three cards from that denomination, and then choosing the remaining two cards (in either order, hence the division by 2), avoiding the selected denomination as well as pairs. The probability of getting a hand with three of a kind is therefore

\[ \frac{54{,}912}{2{,}598{,}960} \approx .02113. \]


Similarly, the number of hands with two (distinct) pairs is

\[ \binom{13}{2} \cdot \binom{4}{2}^{2} \cdot 44 = 123{,}552, \]

corresponding to the process of choosing denominations for the pairs, choosing two cards from each of the denominations, and then choosing the remaining card, avoiding the selected denominations. The probability of getting a hand with two pairs is therefore

\[ \frac{123{,}552}{2{,}598{,}960} \approx .04754, \]

more than twice that of getting three of a kind.
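The counts in Example 2.2.4 follow the multiplication principle directly, so they are easy to confirm with `math.comb` (available in Python 3.8 and later). The code simply transcribes the two counting arguments above.

```python
from math import comb

total = comb(52, 5)

# Three of a kind: denomination of the triple, 3 of its 4 suits, then two further cards
# of different denominations avoiding the triple's (48 * 44 ordered choices, divided by 2).
three_kind = 13 * comb(4, 3) * (48 * 44) // 2

# Two pair: two denominations, two suits for each, and a fifth card from the other 11 denominations.
two_pair = comb(13, 2) * comb(4, 2) ** 2 * 44

print(total, three_kind, two_pair)
print(f"{three_kind / total:.5f} {two_pair / total:.5f}")
```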

2.3 General Probability Spaces

For a discrete probability space, we were able to assign a probability to each set of outcomes, that is, to each subset of the sample space Ω. In general probability spaces this may not be possible, and we must confine our assignment of probabilities to a suitably restricted collection of subsets of Ω. To have a useful and robust theory, we require that the collection form a σ-field, defined as follows.

Definition 2.3.1. A collection F of subsets of a set Ω is said to be a σ-field if

(a) ∅, Ω ∈ F;

(b) A ∈ F ⇒ A′ ∈ F; and

(c) for any finite or infinite sequence of members \(A_n\) of F, \(\bigcup_n A_n \in F\).

If Ω is the sample space of an experiment, then F is called an event σ-field for the experiment and members of F are called events.

Property (a) of Definition 2.3.1 asserts that the sure event Ω and the impossible event ∅ are always members of F. Property (c) asserts that F is closed under countable unions. By virtue of (b), (c), and De Morgan's law

\[ \bigcap_n A_n = \Big(\bigcup_n A_n'\Big)', \]

F is also closed under countable intersections.

The trivial collection {∅, Ω} and the collection of all subsets of Ω are examples of σ-fields. The following examples are more interesting.


Example 2.3.2. Let Ω be a finite, nonempty set and P a partition of Ω, that is, a collection of pairwise disjoint, nonempty sets with union Ω. The collection consisting of ∅ and all possible unions of members of P is a σ-field. To illustrate property (b), suppose, for example, that \(P = \{A_1, A_2, A_3, A_4\}\). The complement of \(A_1 \cup A_3\) is then \(A_2 \cup A_4\).

Example 2.3.3. Let A be any collection of subsets of Ω and let \(\{F_\lambda : \lambda \in \Lambda\}\) denote the collection of all σ-fields containing A. The intersection \(F_A\) of the σ-fields \(F_\lambda\) is again a σ-field, called the σ-field generated by A. It is the smallest σ-field containing the members of A. If Ω is finite and A is a partition of Ω, then \(F_A\) is the σ-field of Example 2.3.2. If Ω is an interval of real numbers and A is the collection of all subintervals of Ω, then \(F_A\) is called the Borel σ-field of Ω and its members the Borel sets of Ω.

An event σ-field F may be thought of as representing the available information in an experiment, information that is known only after an outcome of the experiment has been observed. For example, if we are contemplating buying a stock at time t, then the information available to us (barring insider information) is the price history of the stock up to time t. We show later that this information may be conveniently described by a σ-field \(F_t\).

Once a sample space Ω and an event σ-field have been specified, the next step is to assign probabilities. This is done in accordance with the following axioms. (Compare with Proposition 2.2.1.)

Definition 2.3.4. Let Ω be a sample space and F an event σ-field. A probability measure for (Ω, F), or a probability law for the experiment, is a function P which assigns to each event A ∈ F a number P(A), called the probability of A, such that the following properties hold:

(a) 0 ≤ P(A) ≤ 1;

(b) P(Ω) = 1 and P(∅) = 0;

(c) if \(A_1, A_2, \ldots\) is a finite or infinite sequence of pairwise disjoint events, then

\[ P\Big(\bigcup_n A_n\Big) = \sum_n P(A_n). \]

The triple (Ω, F, P) is then called a probability space.

A collection of events is said to be mutually exclusive if P(AB) = 0 for each pair of distinct members A and B in the collection. Pairwise disjoint sets are obviously mutually exclusive, but not conversely. It is clear that the additivity axiom (c) holds for mutually exclusive events as well.

Proposition 2.3.5. A probability measure P has the following properties:

(i) P(A ∪B) = P(A) + P(B)− P(AB);


(ii) if B ⊆ A, then P(A−B) = P(A)− P(B); in particular P(B) ≤ P(A);

(iii) P(A′) = 1− P(A).

Proof. For (i), note that A ∪ B is the union of the pairwise disjoint events AB′, AB, and BA′. Therefore, by additivity,

P(A ∪B) = P(AB′) + P(AB) + P(BA′).

Similarly,

P(A) = P(AB′) + P(AB) and P(B) = P(BA′) + P(AB).

Subtracting the last two equations from the first gives property (i). Property (ii) follows easily from additivity, as does (iii), using P(Ω) = 1.

Part (i) of Proposition 2.3.5 is a special case of the inclusion-exclusion rule. In Exercise 2, we consider versions for three and four events.

By Proposition 2.2.1, a discrete probability space is a probability space in the general sense with event σ-field consisting of all subsets of Ω. The following are examples of probability spaces that are not discrete. In each case, the underlying experiment is seen to result in a continuum of outcomes. Accordingly, the problem of determining the appropriate σ-field of events and assigning suitable probabilities is somewhat more technical.

Example 2.3.6. Consider the experiment of randomly choosing a real number from the interval [0, 1]. If we try to assign probabilities as in the discrete case, we should assume that the outcomes x are equally likely and therefore set P({x}) = p for some p ∈ [0, 1]. However, consider, for example, the event J that the number chosen is less than 1/2. Following the discrete case, the probability of J should then be \(\sum_{x \in [0, 1/2)} p\), which is either 0 or +∞, if it has meaning at all.

To avoid this problem, we take the following more natural approach: Since we expect that half the time the number chosen will lie in the left half of the interval [0, 1], we define P(J) = .5. More generally, for any subinterval I the probability that the selected number x lies in I should be the length of I, which is the theoretical proportion of times x lands in I. Generalizing, one may show that every Borel subset of [0, 1] may be assigned a probability consistent with the axioms of a probability space. Therefore, it is natural to take the event σ-field in this experiment to be the collection of all Borel sets (see Example 2.3.3).

As a concrete example, consider the event A that the selected number x has a decimal expansion \(.d_1d_2d_3\ldots\) with no digit \(d_j\) equal to 3. Set \(A_0 = [0, 1]\). Since \(d_1 \ne 3\), A must be contained in the set \(A_1\) obtained by removing from \(A_0\) the interval [.3, .4). Similarly, since \(d_2 \ne 3\), A is contained in the set \(A_2\) obtained by removing from \(A_1\) the nine intervals of the form \([.d_1 3, .d_1 4)\), \(d_1 \ne 3\). Having obtained \(A_{n-1}\) in this way, we see that A must be contained in the set \(A_n\) obtained by removing from \(A_{n-1}\) the \(9^{n-1}\) intervals of the form \([.d_1d_2\ldots d_{n-1}3, .d_1d_2\ldots d_{n-1}4)\), \(d_j \ne 3\). Since each of these intervals has length \(10^{-n}\), the additivity axiom implies that

\[ P(A_n) = P(A_{n-1}) - 9^{n-1}10^{-n} = P(A_{n-1}) - (.1)(.9)^{n-1}. \]

Summing from 1 to N, we obtain

\[ P(A_N) = 1 - (.1)\sum_{n=1}^{N} (.9)^{n-1} = (.9)^N. \]

Therefore, \(P(A) \le (.9)^N\) for all N, which implies that P(A) = 0. Thus, with probability one, a number chosen randomly from the interval [0, 1] has a digit 3 in its decimal expansion.
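The telescoping step can be checked with exact rational arithmetic: iterating the recursion \(P(A_n) = P(A_{n-1}) - 9^{n-1}10^{-n}\) from \(P(A_0) = 1\) should reproduce \((.9)^N\) exactly. The function name below is illustrative.

```python
from fractions import Fraction

def prob_no_three(N):
    """P(A_N) via the recursion P(A_n) = P(A_{n-1}) - 9^(n-1) 10^(-n), with P(A_0) = 1."""
    p = Fraction(1)
    for n in range(1, N + 1):
        p -= Fraction(9 ** (n - 1), 10 ** n)   # remove the 9^(n-1) intervals of length 10^(-n)
    return p

for N in (1, 5, 20, 100):
    print(N, float(prob_no_three(N)))
```

The printed values shrink geometrically toward zero, which is the content of the conclusion P(A) = 0.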

Example 2.3.7. Consider a dartboard in the shape of a square with the origin of a coordinate system at the lower left corner and the point (1, 1) in the upper right corner. We throw a dart and observe the coordinates (x, y) of the landing spot. (If the dart lands off the board, we ignore the outcome.) The sample space of this experiment is Ω = [0, 1] × [0, 1]. (This is obviously a two-dimensional version of the preceding example.) Consider the region A below the curve \(y = x^2\). The area of A is \(\int_0^1 x^2\,dx = 1/3\), so we would expect that 1/3 of the time the dart will land in A (a fact borne out, for example, by computer simulation). This suggests that we define the probability of the event A to be 1/3. More generally, the probability of any reasonable region is defined as the area of that region. It turns out that probabilities may be assigned to all two-dimensional Borel subsets of Ω in a manner consistent with the axioms.
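The computer simulation mentioned above can be sketched as a simple Monte Carlo estimate. The sketch assumes the landing point is uniformly distributed over the board, which is exactly what defining probability as area expresses; the seed and number of throws are arbitrary choices.

```python
import random

def estimate_area_below_parabola(n_throws, seed=0):
    """Monte Carlo estimate of P(y < x^2) for a dart landing uniformly on [0,1] x [0,1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_throws):
        x, y = rng.random(), rng.random()   # uniform landing spot on the board
        if y < x * x:                       # the dart lands in the region A
            hits += 1
    return hits / n_throws

print(estimate_area_below_parabola(200_000))   # should be close to 1/3
```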

Example 2.3.8. In the coin tossing experiment of Example 2.2.3, we simply noted the first time a head appears, giving us a sample space consisting of the nonnegative integers. If, however, we observe the entire sequence of outcomes, then the sample space consists of all sequences of H's and T's and is no longer discrete. To see why this is the case, replace H and T by the digits 1 and 0, respectively, so that an outcome may be identified with the binary (base 2) expansion of a number in the interval [0, 1]. (For example, the outcome THTHTHTH . . . is identified with the number .01010101 . . . = 1/3.) The sample space of the experiment may therefore be identified with the interval [0, 1], which is uncountable.

To assign probabilities in this experiment, we begin by giving a probability of \(2^{-n}\) to events that prescribe exactly n outcomes. For example, the event A that H appears on the first and third tosses would have probability 1/4. Note that, under the above identification, the event A corresponds to the subset of [0, 1] consisting of all numbers with binary expansion beginning .101 or .111, namely, the union of the intervals [5/8, 3/4) and [7/8, 1). The total length of these intervals is 1/4, suggesting that the natural assignment of probabilities in this example is precisely that of Example 2.3.6 (which is indeed the case).
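The identification of toss prefixes with binary intervals can be made concrete: a prefix of n tosses corresponds to an interval of length \(2^{-n}\). The helper below is a hypothetical illustration (H mapped to 1, T to 0) that recovers the two intervals for the event A and their total length 1/4.

```python
from fractions import Fraction

def prefix_interval(tosses):
    """Interval [lo, lo + 2^-n) of numbers whose binary expansion begins with the
    given n tosses, using H -> 1 and T -> 0."""
    lo = sum(Fraction(1, 2 ** (k + 1)) for k, t in enumerate(tosses) if t == "H")
    return lo, lo + Fraction(1, 2 ** len(tosses))

# Event A: H on the first and third tosses, second toss unrestricted.
intervals = [prefix_interval(("H", second, "H")) for second in "HT"]
print(sorted(intervals), sum(b - a for a, b in intervals))
```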


2.4 Conditional Probability

Suppose we assign probabilities to the events A of an experiment and then learn that an event B has occurred. One would expect that this new information could change the original probabilities P(A). The altered probability of A is called the conditional probability of A given B and is denoted by P(A|B). A precise mathematical definition of P(A|B) is suggested by the following example.

Example 2.4.1. Suppose that in a group of 100 people exactly 40 smoke, and that 15 of the smokers and 5 of the nonsmokers have lung cancer. A person is chosen at random from the group. Let A be the event that the person has lung cancer and B the event that the person does not smoke. Suppose we discover that the person chosen is a nonsmoker, that is, that the event B has occurred. Then, in computing the new probability of A, we should restrict ourselves to the sample space B consisting of people who don't smoke. This gives P(A|B) = |AB|/|B| = 5/60 ≈ .083, considerably smaller than the original probability P(A) = |A|/100 = .2.

Note that in the preceding example

\[ P(A|B) = \frac{|AB|}{|B|} = \frac{|AB|/|\Omega|}{|B|/|\Omega|} = \frac{P(AB)}{P(B)}. \]

This suggests the following general definition of conditional probability.

Definition 2.4.2. Let (Ω, F, P) be a probability space. If A and B are events with P(B) > 0, then the conditional probability of A given B is

\[ P(A|B) = \frac{P(AB)}{P(B)}. \]

P(A|B) is undefined if P(B) = 0.

Example 2.4.3. In the dartboard experiment of Example 2.3.7, we assigned a probability of 1/3 to the event A that the dart lands below the graph of \(y = x^2\). Let B be the event that the dart lands in the left half of the board and C the event that the dart lands in the bottom half. Recalling that probability in this experiment is defined as area, we see that P(B) = P(C) = 1/2, \(P(AB) = \int_0^{1/2} x^2\,dx = 1/24\), and P(BC) = 1/4. Therefore, P(A|B) = 1/12 and P(C|B) = 1/2. Knowledge of the event B changes the probability of A but not of C.

Theorem 2.4.4 (Multiplication Rule for Conditional Probabilities). Suppose that \(A_1, A_2, \ldots, A_n\) are events with \(P(A_1A_2\cdots A_{n-1}) > 0\). Then

\[ P(A_1A_2\cdots A_n) = P(A_1)P(A_2|A_1)P(A_3|A_1A_2)\cdots P(A_n|A_1A_2\cdots A_{n-1}). \qquad (2.2) \]


Proof. (By induction on n.) The condition \(P(A_1A_2\cdots A_{n-1}) > 0\) ensures that the right side of (2.2) is defined. For n = 2, (2.2) follows from the definition of conditional probability. Suppose (2.2) holds for n = k ≥ 2. If \(A = A_1A_2\cdots A_k\), then, by the case n = 2,

\[ P(A_1A_2\cdots A_{k+1}) = P(AA_{k+1}) = P(A)P(A_{k+1}|A), \]

and, by the induction hypothesis,

\[ P(A) = P(A_1)P(A_2|A_1)P(A_3|A_1A_2)\cdots P(A_k|A_1A_2\cdots A_{k-1}). \]

Combining these results yields (2.2) for n = k + 1.

Example 2.4.5. A jar contains 5 red and 6 green marbles. We randomly draw 3 marbles in succession without replacement. Let \(R_1\) denote the event that the first marble is red, \(R_2\) the event that the second marble is red, and \(G_3\) the event that the third marble is green. The probability that the first two marbles are red and the third is green is

\[ P(R_1R_2G_3) = P(R_1)P(R_2|R_1)P(G_3|R_1R_2) = (5/11)(4/10)(6/9) \approx .12. \]
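As a cross-check of the multiplication rule, one can label the 11 marbles, list all 990 equally likely ordered draws of 3 without replacement, and count those that come out red, red, green. Treating ordered draws as equally likely is the modeling assumption behind "randomly draw".

```python
from fractions import Fraction
from itertools import permutations

# Jar of Example 2.4.5: 5 red (R) and 6 green (G) marbles, made distinguishable by index.
jar = ["R"] * 5 + ["G"] * 6

# All ordered draws of 3 marbles without replacement, assumed equally likely.
draws = list(permutations(range(len(jar)), 3))
favorable = sum(1 for d in draws if [jar[k] for k in d] == ["R", "R", "G"])

by_counting = Fraction(favorable, len(draws))
by_multiplication_rule = Fraction(5, 11) * Fraction(4, 10) * Fraction(6, 9)
print(by_counting, by_multiplication_rule, float(by_counting))
```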

Theorem 2.4.6 (Total Probability Law). Let \(B_1, B_2, \ldots\) be a finite or infinite sequence of mutually exclusive events whose union is Ω. If \(P(B_n) > 0\) for every n, then, for any event A,

\[ P(A) = \sum_n P(A|B_n)P(B_n). \]

Proof. The events \(AB_1, AB_2, \ldots\) are mutually exclusive with union A, hence

\[ P(A) = \sum_n P(AB_n) = \sum_n P(A|B_n)P(B_n). \]

Example 2.4.7. (Investor's Ruin) Suppose you own a stock that each day goes up $1 with probability p or down $1 with probability q = 1 − p. Assume that the stock is initially worth x and that you intend to sell the stock as soon as its value is either a or b, whichever comes first, where 0 < a ≤ x ≤ b. What is the probability that you will sell low?

Solution: Let f(x) denote the probability of selling low, that is, of the stock reaching a before b, given that the stock starts out at x. Let \(S_+\) (\(S_-\)) be the event that the stock goes up (down) the next day and A the event of your selling low. Then \(P(A|S_+) = f(x+1)\), since, if the stock goes up, its value the next day is x + 1. Similarly, \(P(A|S_-) = f(x-1)\). By the total probability law,

\[ P(A) = P(A|S_+)P(S_+) + P(A|S_-)P(S_-), \]

or

\[ f(x) = f(x+1)p + f(x-1)q. \]


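Since p + q = 1, writing f(x) = f(x)p + f(x)q and rearranging, the last equation may be written

\[ \Delta f(x) := f(x+1) - f(x) = r\,\Delta f(x-1), \qquad r := q/p. \]

Iterating, we obtain

\[ \Delta f(x+y) = r\,\Delta f(x+y-1) = r^2\,\Delta f(x+y-2) = \cdots = r^y\,\Delta f(x), \]

hence

\[ f(x) - f(a) = \sum_{y=0}^{x-a-1} \Delta f(a+y) = \Delta f(a) \sum_{y=0}^{x-a-1} r^y. \]

Since f(a) = 1, we see that

\[ f(x) = 1 + \Delta f(a) \sum_{y=0}^{x-a-1} r^y. \qquad (2.3) \]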

If p = q, then (2.3) reduces to f(x) = 1 + (x − a)Δf(a). Setting x = b and noting that f(b) = 0, we obtain Δf(a) = −1/(b − a), and hence

\[ f(x) = 1 - \frac{x-a}{b-a} = \frac{b-x}{b-a}. \]

If p ≠ q, then (2.3) becomes

\[ f(x) = 1 + \Delta f(a)\,\frac{r^{x-a} - 1}{r - 1}. \qquad (2.4) \]

Setting x = b, we obtain \(\Delta f(a) = -(r-1)/(r^{b-a} - 1)\), and substituting this into (2.4) gives

\[ f(x) = 1 - \frac{r-1}{r^{b-a} - 1} \cdot \frac{r^{x-a} - 1}{r - 1} = \frac{r^{b-a} - r^{x-a}}{r^{b-a} - 1}. \]
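A small sketch that evaluates the closed form for f(x) with exact fractions and checks it against the two boundary conditions and the recurrence f(x) = f(x + 1)p + f(x − 1)q from which it was derived. The parameters a = 2, b = 10, p = 11/20 are hypothetical, and the function name is illustrative.

```python
from fractions import Fraction

def prob_sell_low(x, a, b, p):
    """f(x) from Example 2.4.7: probability that a +/-1 walk started at x
    reaches a before b, with up-probability p (a < b, integer states)."""
    p = Fraction(p)
    q = 1 - p
    if p == q:
        return Fraction(b - x, b - a)
    r = q / p
    return (r ** (b - a) - r ** (x - a)) / (r ** (b - a) - 1)

a, b, p = 2, 10, Fraction(11, 20)   # hypothetical parameters
f = {x: prob_sell_low(x, a, b, p) for x in range(a, b + 1)}

# Boundary values, and the recurrence from the total probability law at interior points.
assert f[a] == 1 and f[b] == 0
assert all(f[x] == p * f[x + 1] + (1 - p) * f[x - 1] for x in range(a + 1, b))
print({x: round(float(v), 4) for x, v in f.items()})
```

Exact fractions are used so that the recurrence check is an equality rather than a floating-point approximation.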

Example 2.4.7 is a stock market version of what is usually called gambler's ruin. The name comes from the standard formulation of the example, where the stock's value is replaced by the winnings of a gambler. Selling low is then interpreted as the ruin of the gambler.

The stock movement in this example is known as a random walk. We return to this notion later.

2.5 Independence

Definition 2.5.1. Events A and B in a probability space are said to be independent if P(AB) = P(A)P(B).


Note that if P(B) ≠ 0, then independence is equivalent to the statement P(A|B) = P(A), which asserts that the additional information provided by B is irrelevant to A. A similar remark holds if P(A) ≠ 0.

The events B and C of Example 2.4.3 are independent, while A and B are not. Here are some other examples.

Example 2.5.2. Suppose in Example 2.4.5 that we draw two marbles in succession without replacement. Then, by the total probability law, the probability of getting a red marble on the second try is

\[ \begin{aligned} P(R_2) &= P(R_1)P(R_2|R_1) + P(G_1)P(R_2|G_1) \\ &= (5/11)(4/10) + (6/11)(5/10) \\ &= 5/11 \\ &\ne P(R_2|R_1), \end{aligned} \]

hence \(R_1\) and \(R_2\) are not independent. This agrees with our intuition, since drawing without replacement obviously changes the configuration of marbles in the jar. If, on the other hand, we replace the first marble, then \(P(R_2|R_1) = P(R_2)\); the events are independent. Note that in general experiments of this type, \(P(R_2) = P(R_1)\), whether or not the marbles are replaced.

Example 2.5.3. Roll a fair die twice (or, equivalently, toss a pair of distinguishable fair dice once) and observe the number of dots on the upper face on each roll. A typical outcome can be described by the ordered pair (j, k), where j and k are, respectively, the number of dots on the upper face in the first and second rolls. Since the die is fair, each of the 36 outcomes has the same probability. Let A be the event that the sum of the dice is 7, B the event that the sum of the dice is 8, and C the event that the first die is even. Here P(A) = 1/6, P(B) = 5/36, and P(C) = 1/2, so P(AC) = 1/12 = P(A)P(C) but P(BC) = 1/12 ≠ 5/72 = P(B)P(C); the events A and C are independent, but B and C are not.
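Because the 36 outcomes are equally likely, the independence claims in Example 2.5.3 can be checked by direct enumeration, using P(E) = |E|/|Ω| and representing events as Python sets.

```python
from fractions import Fraction
from itertools import product

# The 36 equally likely outcomes (j, k) of two rolls of a fair die.
outcomes = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = |event| / |Omega| for equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

A = {w for w in outcomes if sum(w) == 7}     # sum of the dice is 7
B = {w for w in outcomes if sum(w) == 8}     # sum of the dice is 8
C = {w for w in outcomes if w[0] % 2 == 0}   # first die is even

print(prob(A & C), prob(A) * prob(C))   # equal, so A and C are independent
print(prob(B & C), prob(B) * prob(C))   # unequal, so B and C are not
```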

The definition of independence may be extended in a natural way to more than two events.

Definition 2.5.4. Events in a collection A are independent if for any n and any choice of distinct \(A_1, A_2, \ldots, A_n \in A\),

\[ P(A_1A_2\cdots A_n) = P(A_1)P(A_2)\cdots P(A_n). \]

Example 2.5.5. Toss a fair coin 3 times in succession and let \(A_j\) be the event that the jth toss comes up heads, j = 1, 2, 3. The events \(A_1\), \(A_2\), and \(A_3\) are easily seen to be independent, which explains the use of the phrase independent trials in this and similar examples.
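The phrase "easily seen" in Example 2.5.5 can be backed by enumeration: with the 8 outcomes of three fair tosses taken as equally likely, the sketch checks the product rule of Definition 2.5.4 for every subcollection of two or three of the events. (`math.prod` requires Python 3.8 or later.)

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

outcomes = list(product("HT", repeat=3))   # 8 equally likely outcomes of 3 fair tosses

def prob(event):
    """Probability of a set of outcomes under the uniform measure."""
    return Fraction(len(event), len(outcomes))

# A[j] is the event that toss j + 1 comes up heads (j = 0, 1, 2).
A = [{w for w in outcomes if w[j] == "H"} for j in range(3)]

# Definition 2.5.4: the product rule must hold for every choice of distinct events.
independent = all(
    prob(set.intersection(*group)) == prod(prob(E) for E in group)
    for n in (2, 3)
    for group in combinations(A, n)
)
print(independent)
```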


2.6 Exercises

1. Show that P(A) + P(B)− 1 ≤ P(AB) ≤ P(A ∪B) ≤ P(A) + P(B).

2. Use the inclusion-exclusion rule for two events to prove the corresponding rule for three events:

P(A∪B∪C) = P(A)+P(B)+P(C)−P(AB)−P(AC)−P(BC)+P(ABC).

Formulate and prove an inclusion-exclusion rule for four events.

3. Jack and Jill run up the hill. The probability of Jack reaching the top first is $p$, while that of Jill is $q$. They decide to have a tournament, the grand winner being the first one who wins 3 races. Find the probability that Jill wins the tournament. Assume that there are no ties ($p + q = 1$) and that the races are independent.

4. A full house is a poker hand with 3 cards of one denomination and 2 cards of another, for example, three kings and two jacks. Show that four of a kind beats a full house.

5. Balls are randomly thrown one at a time at a row of 30 open-topped jars numbered 1 to 30. Assuming that each ball lands in some jar, find the smallest number of throws for which there is a better than 60% chance that at least two balls land in the same jar.

6. Toss a coin infinitely often and let $p$ be the probability of a head appearing on any single toss, $0 < p < 1$. For $m \geq 2$, find the probability $P_m$ that

(a) a head appears on a toss that is a multiple of m;

(b) the first head appears on a toss that is a multiple of m.

(For example, in (a) $P_2$ is the probability that a head appears on an even toss.) Show in (b) that

$$\lim_{m\to\infty} P_m = \lim_{p\to 1^-} P_m = 0 \quad\text{and}\quad \lim_{p\to 0^+} P_m = 1/m.$$

7. Toss a coin twice and let $p$ be the probability of a head appearing on a single toss, $0 < p < 1$. Find the probability that (a) both tosses come up heads, given that at least one toss comes up heads; (b) both tosses come up heads, given that the first toss comes up heads. Can the probabilities in (a) and (b) ever be the same?

8. A jar contains $n - 1$ vanilla cookies and one chocolate cookie. You reach into the jar and choose a cookie at random. What is the probability that you will get the chocolate cookie on the $k$th try if you (a) eat, (b) replace, each cookie you select? Show that if $n$ is large compared to $k$, then the ratio of the probabilities in (a) and (b) is approximately 1.

9. A hat contains six slips of paper numbered 1 through 6. A slip is drawn at random, the number is noted, the slip is replaced in the hat, and the procedure is repeated. What is the probability that after three draws the slip numbered 1 was drawn exactly twice, given that the sum of the numbers on the three draws is 8?

10. A jar contains 12 marbles: 3 reds, 4 greens, and 5 yellows. A handful of 6 marbles is drawn at random. Let $A$ be the event that there are at least 3 green marbles and $B$ the event that there is exactly 1 red. Find $P(A \mid B)$. Are the events independent?

11. A number $x$ is chosen randomly from the interval $[0, 1]$. Let $A$ be the event that $x < .5$ and $B$ the event that the second and third digits of the decimal expansion $.d_1d_2d_3\ldots$ of $x$ are 0. Are the events independent? What if the inequality is changed to $x < .49$?

12. Roll a fair die twice. Let $A$ be the event that the first roll comes up odd, $B$ the event that the second roll is odd, and $C$ the event that the sum of the dice is odd. Show that any two of the events $A$, $B$, and $C$ are independent but the events $A$, $B$, and $C$ are not independent.

13. Suppose that $A$ and $B$ are independent events. Show that in each case the given events are independent: (a) $A$ and $B'$; (b) $A'$ and $B$; (c) $A'$ and $B'$.

14. John and Mary order pizzas. The pizza shop offers only plain, anchovy, and sausage pizzas with no multiple toppings. The probability that John gets a plain (resp., anchovy) is .1 (resp., .2) and the probability that Mary gets a plain (resp., anchovy) is .3 (resp., .4). Assuming that John and Mary order independently, use Exercise 13 to find the probability that neither gets a plain but at least one gets an anchovy.

15. The odds for an event $E$ are said to be $r$ to 1 if $E$ is $r$ times as likely to occur as $E'$, that is, $P(E) = rP(E')$. Odds $r$ to $s$ means the same thing as odds $r/s$ to 1, and odds $r$ to $s$ against means the same as odds $s$ to $r$ for. A bet of one dollar on an event $E$ with odds $r$ to $s$ is fair if the bettor wins $s/r$ dollars if $E$ occurs and loses one dollar if $E'$ occurs. (If $E$ occurs, the dollar wager is returned to the bettor.) Show that, if the odds for $E$ are $r$ to $s$, then (a) $P(E) = \dfrac{r}{r+s}$ and (b) a fair bet of one dollar on $E$ returns $1/P(E)$ dollars (including the wager) if $E$ occurs.

16. Consider a race with only three horses, $H_1$, $H_2$, and $H_3$. Suppose that the odds against $H_i$ winning are quoted as $o_i$ to 1. If the odds are based solely on probabilities (determined by, say, statistics on previous races), then, by Exercise 15, the probability that horse $H_i$ wins is $(1 + o_i)^{-1}$. Assuming there are no ties,

$$o := \left((1 + o_1)^{-1}, (1 + o_2)^{-1}, (1 + o_3)^{-1}\right)$$

is then a probability vector. It is typically the case, however, that quoted odds are based on additional factors such as the distribution of wagers made before the race and profit margins for the bookmaker. In this exercise, the reader is asked to use elementary linear algebra to make a connection between quoted odds and betting strategies.

(a) Suppose that for each $i$ a bet of size $b_i$ is made on $H_i$. The bets may be positive, negative, or 0. The vector $b = (b_1, b_2, b_3)$ is called a betting strategy. Show that if horse $H_i$ wins, then the net winnings for the betting strategy $b$ may be expressed as

$$W_b(i) := (o_i + 1)b_i - (b_1 + b_2 + b_3).$$

(b) A betting strategy $b$ is said to be a sure-win strategy, or an arbitrage, if $W_b(i) > 0$ for each $i$. Show that there is a sure-win strategy iff there exist numbers $s_i < 0$ such that the system

$$
\begin{aligned}
-o_1x_1 + x_2 + x_3 &= s_1\\
x_1 - o_2x_2 + x_3 &= s_2\\
x_1 + x_2 - o_3x_3 &= s_3
\end{aligned}
$$

has a solution $x = (x_1, x_2, x_3)$ (that solution being a sure-win betting strategy).

(c) Let $A$ be the coefficient matrix of the system in (b). Show that the determinant of $A$ is $D := 2 + o_1 + o_2 + o_3 - o_1o_2o_3$.

(d) Suppose $D \neq 0$. Show that

$$A^{-1} = \frac{1}{D}\begin{pmatrix} o_2o_3 - 1 & 1 + o_3 & 1 + o_2\\ 1 + o_3 & o_1o_3 - 1 & 1 + o_1\\ 1 + o_2 & 1 + o_1 & o_1o_2 - 1 \end{pmatrix}$$

and that, for any choice of negative numbers $s_1$, $s_2$, and $s_3$, the vector $A^{-1}s^T$ is a sure-win betting strategy, where $s^T$ denotes the transpose of $s := (s_1, s_2, s_3)$.

(e) Show that if $D \neq 0$, then a sure-win betting strategy is

$$b = -\operatorname{sgn}(D)\,\big(1 + o_2 + o_3 + o_2o_3,\; 1 + o_1 + o_3 + o_1o_3,\; 1 + o_1 + o_2 + o_1o_2\big),$$

where $\operatorname{sgn}(D)$ denotes the sign of $D$.

(f) Show that there is a sure-win betting strategy iff $o$ is not a probability vector.

(The assertion in (f) is a special case of the Arbitrage Theorem, a statement and proof of which may be found, for example, in [14].)
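A numerical sanity check of parts (c) through (e) can be reassuring before attempting the proofs; it is not a proof. The Python sketch below uses the hypothetical odds $o = (1, 2, 3)$ and illustrative helper names. Here $D = 2$, the implied probabilities $1/2 + 1/3 + 1/4$ sum to $13/12 \neq 1$, and the strategy of (e) wins the same positive amount whichever horse wins. Negative entries of $b$ are bets of negative size, as permitted in (a).

```python
from fractions import Fraction

def sure_win_strategy(o1, o2, o3):
    """Return D and the betting strategy b of Exercise 16(e); requires D != 0."""
    D = 2 + o1 + o2 + o3 - o1 * o2 * o3
    if D == 0:
        raise ValueError("D = 0: the formula in (e) does not apply")
    sign = 1 if D > 0 else -1
    b = [-sign * (1 + o2 + o3 + o2 * o3),
         -sign * (1 + o1 + o3 + o1 * o3),
         -sign * (1 + o1 + o2 + o1 * o2)]
    return D, b

def net_winnings(o, b):
    """W_b(i) = (o_i + 1) b_i - (b_1 + b_2 + b_3) for i = 1, 2, 3."""
    total = sum(b)
    return [(oi + 1) * bi - total for oi, bi in zip(o, b)]

o = (1, 2, 3)                                   # hypothetical quoted odds against H1, H2, H3
D, b = sure_win_strategy(*o)
implied = sum(Fraction(1, 1 + oi) for oi in o)  # sum of the entries of the vector o
print(D, b, implied)                            # 2 [-12, -8, -6] 13/12
print(net_winnings(o, b))                       # [2, 2, 2]
```

Odds such as $(1, 2, 5)$, whose implied probabilities $1/2 + 1/3 + 1/6$ sum to 1, give $D = 0$, consistent with (f).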

Chapter 3

Random Variables

3.1 Definition and General Properties

We saw in Chapter 2 that outcomes of some experiments may be described by real numbers. Such outcomes are called random variables. For a formal definition, the following notation will be convenient. Given a real-valued function $X$ on $\Omega$ and a set $A$ of real numbers, we shall write $\{X \in A\}$ for the set $\{\omega \in \Omega \mid X(\omega) \in A\}$. Similarly, $\{X < a\}$ denotes the set $\{\omega \in \Omega \mid X(\omega) < a\}$, and if $Y$ is another function on $\Omega$, then $\{X \in A, Y \in B\}$ stands for the set $\{X \in A\} \cap \{Y \in B\}$, $\{X \leq Y\}$ for the set $\{\omega \in \Omega \mid X(\omega) \leq Y(\omega)\}$, and so forth. For probabilities of such events, we shall write, for example, $P(X \in A)$ rather than the more cumbersome notation $P(\{X \in A\})$. The following example should illustrate the idea.

Example 3.1.1. The table below gives the distribution of grade-point averages for a group of 100 students, the first row giving the number of students having the grade-point averages listed in the second row.

| No. of students | 7 | 13 | 19 | 16 | 12 | 10 | 8 | 6 | 5 | 4 |
|---|---|---|---|---|---|---|---|---|---|---|
| Grade-point avg. | 2.1 | 2.3 | 2.5 | 2.7 | 2.9 | 3.1 | 3.3 | 3.5 | 3.7 | 3.9 |

If $X$ denotes the grade-point average of a randomly chosen student, then, in the above notation,

$$
\begin{aligned}
P(2.5 < X \leq 3.3) &= P(X = 2.7) + P(X = 2.9) + P(X = 3.1) + P(X = 3.3)\\
&= .16 + .12 + .10 + .08 = .46.
\end{aligned}
$$
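The arithmetic of Example 3.1.1 amounts to summing the pmf of $X$ over the values in $(2.5, 3.3]$. In the Python sketch below (the helper `prob` is a hypothetical name), grade-point averages are stored as exact fractions so that the endpoint comparisons are not disturbed by floating-point rounding.

```python
from fractions import Fraction

counts = [7, 13, 19, 16, 12, 10, 8, 6, 5, 4]
gpas = [Fraction(g) for g in ("2.1", "2.3", "2.5", "2.7", "2.9",
                              "3.1", "3.3", "3.5", "3.7", "3.9")]
total = sum(counts)  # 100 students

# pmf of X = grade-point average of a randomly chosen student
pmf = {g: Fraction(c, total) for g, c in zip(gpas, counts)}

def prob(lo, hi):
    """P(lo < X <= hi)."""
    return sum((p for g, p in pmf.items() if lo < g <= hi), Fraction(0))

print(prob(Fraction("2.5"), Fraction("3.3")))  # 23/50, that is, .46
```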

Definition 3.1.2. Consider an experiment with sample space $\Omega$ and event $\sigma$-field $\mathcal{F}$. A random variable on $(\Omega, \mathcal{F})$ is a real-valued function $X$ on $\Omega$ such that for each interval $J$ the set $\{X \in J\}$ is a member of $\mathcal{F}$. If there is a possibility of ambiguity, we shall refer to $X$ as an $\mathcal{F}$-random variable or say that $X$ is $\mathcal{F}$-measurable.

Remark 3.1.3. To determine whether a function $X$ is a random variable, it suffices to check that $\{X \in J\} \in \mathcal{F}$ for all intervals $J$ of a particular type. For

27


example, suppose that a function $X$ on $\Omega$ satisfies $\{X \leq a\} \in \mathcal{F}$ for all real numbers $a$. Since

$$\{X \in (a, b)\} = \{X < b\} - \{X \leq a\} = \bigcup_{n=1}^{\infty}\{X \leq b - 1/n\} - \{X \leq a\},$$

our assumption implies that $\{X \in (a, b)\} \in \mathcal{F}$. Similar arguments show that $\{X \in J\} \in \mathcal{F}$ for the remaining intervals $J$. Therefore, $X$ is a random variable.

A simple example of a random variable is the indicator function of an event, which provides a numerical way of expressing occurrence of the event.

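Definition 3.1.4. The indicator function of a subset $A \subseteq \Omega$ is the function $I_A$ on $\Omega$ defined by

$$I_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A, \text{ and}\\ 0 & \text{if } \omega \in A'. \end{cases}$$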

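Proposition 3.1.5. $I_A$ is a random variable iff $A \in \mathcal{F}$.

Proof. If $I_A$ is a random variable, then $A = \{I_A > 0\} \in \mathcal{F}$. Conversely, if $A \in \mathcal{F}$ and $a \in \mathbb{R}$, then

$$\{I_A \leq a\} = \begin{cases} \emptyset & \text{if } a < 0,\\ A' & \text{if } 0 \leq a < 1, \text{ and}\\ \Omega & \text{if } a \geq 1. \end{cases}$$

In each case, $\{I_A \leq a\} \in \mathcal{F}$, hence, by Remark 3.1.3, $I_A$ is a random variable.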

The proof of the following proposition is left to the reader as an exercise.

Proposition 3.1.6. If $A$ and $B$ are subsets of $\Omega$, then (i) $I_{AB} = I_AI_B$; (ii) $I_{A\cup B} = I_A + I_B - I_AI_B$; and (iii) $I_A \leq I_B$ iff $A \subseteq B$.
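Proposition 3.1.6 is left as an exercise (Exercise 2 of Section 3.7), but its three identities are easy to test mechanically on a small finite sample space. The Python sketch below does this for every pair of subsets of a four-point $\Omega$ chosen purely for illustration; the helper names are ours, and a passing check is evidence, not a proof.

```python
from itertools import chain, combinations

omega = [1, 2, 3, 4]  # a small sample space, chosen for illustration

def indicator(A):
    """Return the indicator function I_A of the subset A of omega."""
    return lambda w: 1 if w in A else 0

def subsets(s):
    """All subsets of s, as Python sets."""
    return [set(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

for A in subsets(omega):
    for B in subsets(omega):
        IA, IB = indicator(A), indicator(B)
        IAB, IAuB = indicator(A & B), indicator(A | B)
        assert all(IAB(w) == IA(w) * IB(w) for w in omega)                   # (i)
        assert all(IAuB(w) == IA(w) + IB(w) - IA(w) * IB(w) for w in omega)  # (ii)
        assert all(IA(w) <= IB(w) for w in omega) == (A <= B)                # (iii)
print("Proposition 3.1.6 holds for all", len(subsets(omega)) ** 2, "pairs")
```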

The next theorem describes a simple way of generating random variables.

Theorem 3.1.7. Let $X_1, X_2, \ldots, X_n$ be random variables. If $f(x_1, x_2, \ldots, x_n)$ is a continuous function, then $f(X_1, X_2, \ldots, X_n)$ is a random variable, where

$$f(X_1, X_2, \ldots, X_n)(\omega) := f\big(X_1(\omega), X_2(\omega), \ldots, X_n(\omega)\big).$$

Proof. We sketch the proof for the case $n = 1$, that is, for a single random variable $X$ and a continuous function $f(x)$. For this, we use a standard result from real analysis, which asserts that, because $f$ is continuous, any set $A$ of the form $\{x \mid f(x) < a\}$ is a union of a sequence of pairwise disjoint open intervals $J_n$. It follows that

$$\{f(X) < a\} = \{X \in A\} = \bigcup_n \{X \in J_n\} \in \mathcal{F}.$$

By Remark 3.1.3, $f(X)$ is a random variable.

Random Variables 29

Corollary 3.1.8. Let $X$ and $Y$ be random variables, $\alpha \in \mathbb{R}$, and $p > 0$. Then $X + Y$, $\alpha X$, $XY$, $|X|^p$, and $X/Y$ (provided $Y$ is never 0) are random variables.

Combining Corollary 3.1.8 with Proposition 3.1.5, we obtain

Corollary 3.1.9. A linear combination $\sum_{j=1}^n \alpha_j I_{A_j}$ of indicator functions $I_{A_j}$ with $A_j \in \mathcal{F}$ is a random variable.

Definition 3.1.10. The functions $\max(x, y)$ and $\min(x, y)$ denote, respectively, the larger and smaller of the real numbers $x$ and $y$.

Corollary 3.1.11. max(X,Y ) and min(X,Y ) are random variables.

Proof. The identity

$$\max(x, y) = \frac{|x - y| + x - y}{2} + y$$

shows that $\max(x, y)$ is continuous. Therefore, by Theorem 3.1.7, $\max(X, Y)$ is a random variable. A similar argument shows that $\min(X, Y)$ is a random variable. (Or use the identity $\min(x, y) = -\max(-x, -y)$.)

Definition 3.1.12. The cumulative distribution function (cdf) $F_X$ of a random variable $X$ is defined by

$$F_X(x) = P(X \leq x), \quad x \in \mathbb{R}.$$

Example 3.1.13. The cdf of the number $X$ of heads that come up in three tosses of a fair coin may be described as

$$F_X = \tfrac{1}{8} I_{[0,1)} + \tfrac{1}{2} I_{[1,2)} + \tfrac{7}{8} I_{[2,3)} + I_{[3,\infty)}.$$
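The step function in Example 3.1.13 can be compared with a direct count over the 8 equally likely outcomes. The Python sketch below (function names are illustrative) evaluates both at a handful of points, including the jump points 0, 1, 2, 3, where the cdf is right-continuous.

```python
from fractions import Fraction
from itertools import product

# X = number of heads in three tosses of a fair coin (8 equally likely outcomes).
heads = [w.count("H") for w in product("HT", repeat=3)]

def cdf_by_counting(x):
    """F_X(x) = P(X <= x), computed directly from the outcomes."""
    return Fraction(sum(1 for h in heads if h <= x), len(heads))

def cdf_formula(x):
    """The step function of Example 3.1.13, built from indicators of intervals."""
    def I(lo, hi):
        return 1 if lo <= x < hi else 0  # indicator of [lo, hi)
    return (Fraction(1, 8) * I(0, 1) + Fraction(1, 2) * I(1, 2)
            + Fraction(7, 8) * I(2, 3) + I(3, float("inf")))

for x in (-1, 0, 0.5, 1, 1.9, 2, 2.5, 3, 7):
    assert cdf_by_counting(x) == cdf_formula(x)
print([str(cdf_formula(x)) for x in (-1, 0, 1, 2, 3)])  # ['0', '1/8', '1/2', '7/8', '1']
```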

3.2 Discrete Random Variables

Definition 3.2.1. A random variable $X$ on a probability space $(\Omega, \mathcal{F}, P)$ is said to be discrete if the range of $X$ is countable. The function $p_X$ defined by

$$p_X(x) := P(X = x), \quad x \in \mathbb{R},$$

is called the probability mass function (pmf) of $X$.

Since $p_X(x) > 0$ for at most countably many real numbers $x$, for $A \subseteq \mathbb{R}$ we may write

$$P(X \in A) = \sum_{x \in A} p_X(x),$$

where the sum is either finite or a convergent infinite series (ignoring zero terms). In particular,

$$F_X(x) = \sum_{y \leq x} p_X(y) \quad\text{and}\quad \sum_{x \in \mathbb{R}} p_X(x) = P(X \in \mathbb{R}) = 1.$$

A linear combination of indicator functions of events is an example of a discrete random variable. Here are some important special cases.

Example 3.2.2. (Bernoulli Random Variable). A Bernoulli trial is an experiment with only two outcomes, frequently called success and failure. If $p$ is the probability of a success, then a Bernoulli random variable with parameter $p$ is defined by setting $X = 1$ if the outcome is a success and $X = 0$ if the outcome is a failure. Thus,

$$p_X(1) = p \quad\text{and}\quad p_X(0) = 1 - p.$$

The number of heads that come up on a single toss of a coin is an example of a Bernoulli random variable.

Example 3.2.3. (Binomial Random Variable). Consider an experiment consisting of $N$ independent Bernoulli trials, each with parameter $p$. Let $X$ denote the number of successes in $N$ trials. Since the event $\{X = n\}$ can occur in $\binom{N}{n}$ ways, and since each of these has probability $p^n(1-p)^{N-n}$,

$$p_X(n) = \binom{N}{n} p^n (1-p)^{N-n}, \quad n = 0, 1, 2, \ldots, N.$$

A random variable $X$ with this pmf is called a binomial random variable with parameters $(N, p)$, in symbols $X \sim B(N, p)$. The number of heads occurring in $N$ consecutive coin tosses is an example of a binomial random variable.

Example 3.2.4. (Geometric Random Variable). Suppose that an experiment consists of independent Bernoulli trials with parameter $p$. Let $X$ denote the number of trials until the first success. For the event $\{X = k\}$ to occur, the first $k - 1$ trials must be failures and the $k$th trial a success. Thus,

$$p_X(k) = q^{k-1}p, \quad k = 1, 2, \ldots, \quad q := 1 - p. \tag{3.1}$$

Note that

$$\sum_{k=1}^{\infty} p_X(k) = p\sum_{k=0}^{\infty} q^k = \frac{p}{1 - q} = 1,$$

so $p_X$ is indeed a pmf. A random variable with this distribution is said to be geometric with parameter $p$.
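Two quick checks on (3.1), sketched in Python under illustrative choices ($p = 1/4$, 100,000 samples, a fixed seed; the function names are ours). First, the partial sums of the geometric series telescope to $1 - q^K$, which tends to 1. Second, simulating Bernoulli trials and recording the index of the first success should give frequencies close to $q^{k-1}p$.

```python
import random
from fractions import Fraction

def geometric_pmf(k, p):
    """p_X(k) = q^(k-1) p, k = 1, 2, ..., as in (3.1)."""
    return (1 - p) ** (k - 1) * p

p = Fraction(1, 4)                    # illustrative parameter
q = 1 - p
# Partial sums: sum_{k=1}^{K} q^(k-1) p = 1 - q^K, which tends to 1.
for K in (1, 5, 20):
    assert sum(geometric_pmf(k, p) for k in range(1, K + 1)) == 1 - q**K

def trials_until_first_success(p, rng):
    """Run Bernoulli(p) trials and return the index of the first success."""
    k = 1
    while rng.random() >= p:          # failure occurs with probability 1 - p
        k += 1
    return k

rng = random.Random(2011)             # fixed seed for reproducibility
samples = [trials_until_first_success(float(p), rng) for _ in range(100_000)]
for k in (1, 2, 3):
    freq = samples.count(k) / len(samples)
    print(k, round(freq, 3), float(geometric_pmf(k, p)))
```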


Example 3.2.5. (Hypergeometric Random Variable). Consider a jar with $m$ red marbles and $n$ white marbles. We take a random sample of size $z \leq N := m + n$ from the jar by drawing the marbles in succession without replacement. (Equivalently, we can just draw the $z$ marbles at once.) The sample space $\Omega$ is the collection of all subsets of $z$ marbles, hence $|\Omega| = \binom{N}{z}$. Let $X$ denote the number of red marbles in the sample. The event $\{X = x\}$ can be realized by first choosing $x$ red marbles from the $m$ red marbles and then choosing $z - x$ white marbles from the $n$ white marbles. Thus,

$$P(X = x) = \binom{m}{x}\binom{n}{z - x}\binom{N}{z}^{-1}, \tag{3.2}$$

where $x$ must satisfy $0 \leq x \leq m$ and $0 \leq z - x \leq n$. Letting $p = m/N$ denote the fraction of red marbles in the jar and $q = 1 - p = n/N$ the fraction of white, we can write (3.2) as

$$p_X(x) = \binom{Np}{x}\binom{Nq}{z - x}\binom{N}{z}^{-1}, \quad \max(z - n, 0) \leq x \leq \min(z, m). \tag{3.3}$$

A random variable $X$ with this pmf is called a hypergeometric random variable with parameters $(p, z, N)$. That $p_X$ is indeed a pmf may be established analytically or may be argued on probabilistic grounds.

The experiment described in this example is called sampling without replacement. (Sampling with replacement, that is, replacing each marble before drawing the next, results in the binomial pmf.) For a political setting, let the marbles represent individuals in a population of size $N$ from which a sample of size $z$ is randomly selected and then polled, red marbles representing individuals who favor candidate A for political office, white marbles representing those who favor candidate B. Pollsters use the sample to estimate the proportion of people in the general population who favor candidate A and also to determine the margin of error in doing so. The hypergeometric pmf may be used for this purpose.¹ Marketing specialists apply similar techniques to determine product preferences.

¹ In practice, the more tractable normal distribution is used instead (Example 3.3.3). This is justified by noting that for large $N$ the hypergeometric distribution is nearly binomial and, by the Central Limit Theorem, that a binomial distribution (suitably adjusted) is approximately normal.
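The claim that the hypergeometric distribution is nearly binomial for large $N$ (made precise in Exercise 3 of Section 3.7) can be seen numerically. The Python sketch below fixes a sample size $z = 5$ and red fraction $p = 2/5$, both illustrative, computes (3.2) exactly for jars of size $N = 10, 100, 1000$, and records the largest pointwise gap from the $B(z, p)$ pmf. The function names are ours.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(x, m, n, z):
    """P(X = x) from (3.2): x red marbles in a sample of z drawn without
    replacement from a jar of m red and n white marbles."""
    if not (0 <= x <= m and 0 <= z - x <= n):
        return Fraction(0)
    return Fraction(comb(m, x) * comb(n, z - x), comb(m + n, z))

def binomial_pmf(x, z, p):
    """Sampling with replacement: B(z, p) probability of x red marbles."""
    return comb(z, x) * p**x * (1 - p) ** (z - x)

z, p = 5, Fraction(2, 5)      # sample size and red fraction, chosen for illustration
gaps = []
for N in (10, 100, 1000):
    m = int(N * p)            # number of red marbles, Np
    n = N - m                 # number of white marbles, Nq
    hyper = [hypergeometric_pmf(x, m, n, z) for x in range(z + 1)]
    assert sum(hyper) == 1    # the probabilities (3.2) sum to 1
    gaps.append(max(abs(h - binomial_pmf(x, z, p)) for x, h in enumerate(hyper)))
    print(N, float(gaps[-1]))
```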


3.3 Continuous Random Variables

Definition 3.3.1. A random variable $X$ is said to be continuous if there exists a nonnegative integrable function $f_X$ such that

$$P(X \in J) = \int_J f_X(x)\,dx$$

for all intervals $J$. The function $f_X$ is called the probability density function (pdf) of $X$.

For example, the probability

$$P(a \leq X \leq b) = \int_a^b f_X(x)\,dx$$

is simply the area under the graph of $f_X$ between $a$ and $b$. Setting $a = b$, we see that the probability that $X$ takes on any particular value is zero. The cumulative distribution function of $X$ takes the form

$$F_X(x) = P(X \leq x) = \int_{-\infty}^x f_X(t)\,dt.$$

Differentiating with respect to $x$, we have (at points of continuity of $f_X$)

$$F_X'(x) = f_X(x).$$

Example 3.3.2. (Uniform Random Variable). A random variable $X$ is said to be uniformly distributed on the interval $(\alpha, \beta)$ if its pdf is of the form

$$f_X(x) = (\beta - \alpha)^{-1} I_{(\alpha,\beta)}(x),$$

where $I_{(\alpha,\beta)}$ is the indicator function of the interval $(\alpha, \beta)$. For any subinterval $(a, b)$ of $(\alpha, \beta)$,

$$P(a < X < b) = \int_a^b f_X(x)\,dx = (\beta - \alpha)^{-1}(b - a),$$

which is the (theoretical) fraction of times a number chosen randomly from the interval $(\alpha, \beta)$ lies in the subinterval $(a, b)$.

Example 3.3.3. (Normal Random Variable). Let $\sigma$ and $\mu$ be real numbers with $\sigma > 0$. A random variable $X$ is said to have a normal distribution with parameters $\mu$ and $\sigma^2$, in symbols $X \sim N(\mu, \sigma^2)$, if it has pdf

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}. \tag{3.4}$$


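If $X \sim N(0, 1)$, then $X$ is said to have the standard normal distribution. In this case, we write $\varphi$ for $f_X$ and $\Phi$ for $F_X$. Thus,

$$\varphi(x) := \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \quad\text{and}\quad \Phi(x) := \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x e^{-t^2/2}\,dt.$$

Note that, if $X \sim N(\mu, \sigma^2)$, then

$$F_X(x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right),$$

as may be seen by making a simple substitution in the integral defining $F_X$.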
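The relations above can be checked numerically. The Python sketch below writes $\Phi$ through the error function, using the standard identity $\Phi(x) = \tfrac{1}{2}\big(1 + \operatorname{erf}(x/\sqrt{2})\big)$, and integrates the density (3.4) with Simpson's rule to confirm $F_X(x) = \Phi((x - \mu)/\sigma)$ for illustrative values $\mu = 1$, $\sigma = 2$, $x = 2.5$. It also checks the symmetry $1 - \Phi(-x) = \Phi(x)$ and, by a central difference, $F' = f$. The helper `simpson` is our own illustrative implementation.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal cdf expressed through the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def normal_pdf(x, mu, sigma):
    """Density (3.4) of N(mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + k * h) for k in range(1, n, 2))
    s += 2 * sum(f(a + k * h) for k in range(2, n, 2))
    return s * h / 3

mu, sigma, x = 1.0, 2.0, 2.5
# F_X(x): integrate (3.4) from mu - 12 sigma (the tail below is negligible) up to x.
F_numeric = simpson(lambda t: normal_pdf(t, mu, sigma), mu - 12 * sigma, x)
print(F_numeric, Phi((x - mu) / sigma))                   # F_X(x) = Phi((x - mu) / sigma)
print(1 - Phi(-0.7), Phi(0.7))                            # 1 - Phi(-x) = Phi(x)
h = 1e-5
print((Phi(0.3 + h) - Phi(0.3 - h)) / (2 * h), phi(0.3))  # Phi' = phi at a point
```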

The normal density (3.4) is the familiar bell-shaped curve, with maximum occurring at $x = \mu$. The parameter $\sigma$ controls the spread of the bell: the larger the value of $\sigma$, the flatter the bell. Normal random variables arise in sampling from a large population of independent measurements such as test scores, heights, and so forth. They will figure prominently in later chapters.

Example 3.3.4. (Exponential Random Variable). A random variable $X$ is said to have an exponential distribution with parameter $\lambda$ if it has pdf

$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \geq 0,\\ 0, & \text{if } x < 0. \end{cases} \tag{3.5}$$

The cdf of $X$ is

$$F_X(x) = \int_{-\infty}^x f_X(y)\,dy = \begin{cases} 1 - e^{-\lambda x}, & \text{if } x \geq 0,\\ 0, & \text{if } x < 0. \end{cases}$$

In particular, if $x \geq 0$, then $P(X > x) = 1 - F_X(x) = e^{-\lambda x}$. It follows that, for $s, t \geq 0$,

$$P(X > s + t, X > t) = P(X > s + t) = e^{-\lambda(s+t)} = P(X > t)P(X > s),$$

which may be expressed in terms of conditional probabilities as

$$P(X > s + t \mid X > t) = P(X > s). \tag{3.6}$$

Equation (3.6) is the so-called memoryless feature of an exponential random variable. If we take $X$ to be the lifetime of some instrument, say, a light bulb, then (3.6) asserts (perhaps unrealistically) that the probability of the light bulb lasting at least $s + t$ hours, given that it has already lasted $t$ hours, is the same as the initial probability that it will last at least $s$ hours. Waiting times (for example, of buses and bank clerks) are frequently assumed to be exponential random variables. It may be shown that the exponential distribution is the only continuous distribution for which (3.6) holds.
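The memoryless property (3.6) can be observed in simulated data. The Python sketch below draws exponential lifetimes with `random.Random.expovariate` (whose argument is the rate $\lambda$), keeps the samples that survive past $t$, and compares the fraction of those surviving past $s + t$ with the unconditional fraction surviving past $s$ and with $e^{-\lambda s}$. The values $\lambda = 0.5$, $s = 1$, $t = 2$, the sample size, and the seed are illustrative choices.

```python
import math
import random

lam, s, t = 0.5, 1.0, 2.0              # illustrative rate and times
rng = random.Random(7)                 # fixed seed for reproducibility
lifetimes = [rng.expovariate(lam) for _ in range(200_000)]

survived_t = [x for x in lifetimes if x > t]
# Sample version of P(X > s + t | X > t): among lifetimes exceeding t,
# the fraction that also exceed s + t.
cond = sum(1 for x in survived_t if x > s + t) / len(survived_t)
uncond = sum(1 for x in lifetimes if x > s) / len(lifetimes)
exact = math.exp(-lam * s)             # P(X > s) = e^{-lambda s}

print(round(cond, 3), round(uncond, 3), round(exact, 3))
```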


3.4 Joint Distributions

Definition 3.4.1. The joint probability mass function of discrete random variables $X$ and $Y$ is defined by

$$p_{X,Y}(x, y) = P(X = x, Y = y).$$

Note that the pmf $p_X$ may be recovered from $p_{X,Y}$ by using the identity

$$p_X(x) = P(X = x, Y \in \mathbb{R}) = \sum_y p_{X,Y}(x, y),$$

where the sum is taken over all $y$ in the range of $Y$. A similar identity holds for $p_Y$. In this context, $p_X$ and $p_Y$ are called marginal probability mass functions.

Definition 3.4.2. The joint cumulative distribution function $F_{X,Y}$ of a pair of random variables $X$ and $Y$ is defined by

$$F_{X,Y}(x, y) = P(X \leq x, Y \leq y).$$

$X$ and $Y$ are said to be jointly continuous if there exists a nonnegative integrable function $f_{X,Y}$, called the joint probability density function of $X$ and $Y$, such that

$$F_{X,Y}(x, y) = \int_{-\infty}^x \int_{-\infty}^y f_{X,Y}(s, t)\,dt\,ds. \tag{3.7}$$

It follows from (3.7) that

$$P(X \in J, Y \in K) = \int_K \int_J f_{X,Y}(x, y)\,dx\,dy = \iint_{J\times K} f_{X,Y}(x, y)\,dx\,dy$$

for all intervals $J$ and $K$. More generally,

$$P\big((X, Y) \in A\big) = \iint_A f_{X,Y}(x, y)\,dx\,dy$$

for all sufficiently regular (e.g., Borel) sets $A \subseteq \mathbb{R}^2$. Differentiating (3.7) gives the following useful connection between $f_{X,Y}$ and $F_{X,Y}$:

$$f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\,\partial y} F_{X,Y}(x, y).$$

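Example 3.4.3. Let $X$ and $Y$ be jointly continuous random variables and set $Z = X + Y$. If $A_z = \{(x, y) \mid x + y \leq z\}$, then

$$
\begin{aligned}
F_Z(z) &= P\big((X, Y) \in A_z\big)\\
&= \iint_{x+y\leq z} f_{X,Y}(x, y)\,dx\,dy\\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{z-y} f_{X,Y}(x, y)\,dx\,dy\\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{z} f_{X,Y}(x - y, y)\,dx\,dy.
\end{aligned}
$$

Changing the order of integration in the last expression and differentiating with respect to $z$, we see that

$$f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(z - y, y)\,dy.$$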

The following proposition shows that, by analogy with the discrete case, the pdfs of jointly continuous random variables may be recovered from the joint pdf. In this context, $f_X$ and $f_Y$ are called marginal density functions.

Proposition 3.4.4. If $X$ and $Y$ are jointly continuous random variables, then

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy \quad\text{and}\quad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx.$$

Proof. For any interval $J$,

$$P(X \in J) = P(X \in J, Y \in \mathbb{R}) = \int_J \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy\,dx.$$

The inner integral must therefore be the density $f_X$ of $X$.

Remark 3.4.5. The above definitions and results extend to the case of finitely many random variables $X_1, X_2, \ldots, X_n$. We leave the formulations to the reader.

3.5 Independent Random Variables

Definition 3.5.1. Random variables $X_1, X_2, \ldots$ are said to be independent if the events $\{X_n \in J_n\}$, $n = 1, 2, \ldots$, are independent for any choice of intervals $J_n$.

Proposition 3.5.2. Discrete random variables $X$ and $Y$ are independent iff

$$p_{X,Y}(x, y) = p_X(x)\,p_Y(y) \quad\text{for all } x \text{ and } y.$$


Proof. The proof of the necessity is left to the reader. For the sufficiency, suppose that the equation holds. Then for any intervals $J$ and $K$

$$
\begin{aligned}
P(X \in J, Y \in K) &= \sum_{x\in J,\, y\in K} p_{X,Y}(x, y)\\
&= \sum_{x\in J} p_X(x) \sum_{y\in K} p_Y(y)\\
&= P(X \in J)\,P(Y \in K),
\end{aligned}
$$

where the sums are taken over all $x \in J$ in the range of $X$ and all $y \in K$ in the range of $Y$. Therefore, $X$ and $Y$ are independent.

Proposition 3.5.3. Let $X$ and $Y$ be random variables. Then $X$ and $Y$ are independent iff

$$F_{X,Y}(x, y) = F_X(x)\,F_Y(y) \quad\text{for all } x \text{ and } y.$$

Proof. We give a partial proof of the sufficiency. If the equation holds, then

$$
\begin{aligned}
P(a < X \leq b, c < Y \leq d) &= F_{X,Y}(b, d) - F_{X,Y}(a, d) - F_{X,Y}(b, c) + F_{X,Y}(a, c)\\
&= [F_X(b) - F_X(a)]\,[F_Y(d) - F_Y(c)]\\
&= P(a < X \leq b)\,P(c < Y \leq d).
\end{aligned}
$$

It follows that

$$
\begin{aligned}
P(a \leq X \leq b, c \leq Y \leq d) &= \lim_{n\to\infty} P(a - 1/n < X \leq b, c - 1/n < Y \leq d)\\
&= \lim_{n\to\infty} P(a - 1/n < X \leq b)\,P(c - 1/n < Y \leq d)\\
&= P(a \leq X \leq b)\,P(c \leq Y \leq d).
\end{aligned}
$$

Here we have used the fact that, if $A_1 \supseteq A_2 \supseteq \cdots \supseteq A_n \supseteq \cdots$, then

$$P\Big(\bigcap_{n=1}^{\infty} A_n\Big) = \lim_{n\to\infty} P(A_n).$$

Other intervals are treated in a similar fashion. Therefore, $X$ and $Y$ are independent.

Corollary 3.5.4. Let $X$ and $Y$ be jointly continuous random variables. Then $X$ and $Y$ are independent iff $f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$.

Proof. If $X$ and $Y$ are independent, then, by Proposition 3.5.3,

$$F_{X,Y}(x, y) = \left(\int_{-\infty}^x f_X(s)\,ds\right)\left(\int_{-\infty}^y f_Y(t)\,dt\right) = \int_{-\infty}^x\int_{-\infty}^y f_X(s)\,f_Y(t)\,dt\,ds,$$

which shows that $f_X(x)f_Y(y)$ is the joint density function. Conversely, if the density condition holds, then reversing the argument shows that $F_{X,Y}(x, y) = F_X(x)\,F_Y(y)$.


Proposition 3.5.5. If X and Y are independent random variables and g andh are continuous functions, then g(X) and h(Y ) are independent.

Proof. Let $F$ denote the joint cdf of $g(X)$ and $h(Y)$. As in the proof of Theorem 3.1.7, given real numbers $a$ and $b$ there exist sequences $(J_m)$ and $(K_n)$ of pairwise disjoint open intervals such that

$$\{g(X) < a\} = \bigcup_m \{X \in J_m\} \quad\text{and}\quad \{h(Y) < b\} = \bigcup_n \{Y \in K_n\}.$$

It follows that

$$
\begin{aligned}
P(g(X) < a, h(Y) < b) &= \sum_m\sum_n P(X \in J_m, Y \in K_n)\\
&= \sum_m\sum_n P(X \in J_m)\,P(Y \in K_n)\\
&= P(g(X) < a)\,P(h(Y) < b).
\end{aligned}
$$

In particular, for $k = 1, 2, \ldots$,

$$P\big(g(X) < x + 1/k, h(Y) < y + 1/k\big) = P\big(g(X) < x + 1/k\big)\,P\big(h(Y) < y + 1/k\big),$$

and letting $k \to \infty$ shows that $F(x, y) = F_{g(X)}(x)\,F_{h(Y)}(y)$. Therefore, by Proposition 3.5.3, $g(X)$ and $h(Y)$ are independent.

Remark 3.5.6. The above results have obvious extensions to the case of morethan two random variables. We leave the formulations to the reader.

Definition 3.5.7. Random variables with the same cdf are said to be identically distributed. If the random variables are also independent, then the collection is said to be iid (independent and identically distributed).

Example 3.5.8. For a sequence of independent Bernoulli trials with parameter $p$, let $X_1$ be the number of trials before the first success, and for $k > 1$ let $X_k$ be the number of trials between the $(k-1)$st and $k$th successes. The event $\{X_1 = m, X_2 = n\}$ occurs with probability $q^m p\, q^n p$, $q := 1 - p$, hence

$$P(X_2 = n) = \sum_{m=0}^{\infty} P(X_1 = m, X_2 = n) = p^2 q^n \sum_{m=0}^{\infty} q^m = \frac{p^2 q^n}{1 - q} = pq^n = P(X_1 = n).$$

Therefore, $X_1$ and $X_2$ are identically distributed. Since

$$P(X_1 = m)\,P(X_2 = n) = pq^m\,pq^n = P(X_1 = m, X_2 = n),$$

Proposition 3.5.2 shows that $X_1$ and $X_2$ are independent. An induction argument using similar calculations shows that the sequence $(X_n)$ is iid. Note that $X_n + 1$ is a geometric random variable with parameter $p$.
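A simulation gives a feel for Example 3.5.8. The Python sketch below (the function name, $p = 0.3$, the sample size, and the seed are illustrative choices) records the failure counts $X_1$ and $X_2$ before the first success and between the first and second successes, and compares their sample frequencies with $pq^n$. The last line compares one value of the joint pmf, $P(X_1 = 0, X_2 = 1)$, with the product $p \cdot pq$.

```python
import random

def gaps_between_successes(p, count, rng):
    """Run Bernoulli(p) trials and return X_1, ..., X_count of Example 3.5.8:
    the number of failures before the first success and between later successes."""
    gaps, failures = [], 0
    while len(gaps) < count:
        if rng.random() < p:      # success: record the failures since the last one
            gaps.append(failures)
            failures = 0
        else:
            failures += 1
    return gaps

p, q = 0.3, 0.7                    # illustrative parameter
rng = random.Random(42)            # fixed seed for reproducibility
pairs = [gaps_between_successes(p, 2, rng) for _ in range(100_000)]

def freq(pred):
    """Sample frequency of the event pred(X_1, X_2)."""
    return sum(1 for x1, x2 in pairs if pred(x1, x2)) / len(pairs)

for n in range(3):
    print(n, round(freq(lambda a, b: a == n), 3), round(freq(lambda a, b: b == n), 3), p * q**n)
print(round(freq(lambda a, b: a == 0 and b == 1), 3), p * p * q)  # joint pmf factors
```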


3.6 Sums of Independent Random Variables

If $X$ and $Y$ are independent discrete random variables, then

$$
\begin{aligned}
p_{X+Y}(z) = P(X + Y = z) &= \sum_x P(X = x, Y = z - x)\\
&= \sum_x p_X(x)\,p_Y(z - x).
\end{aligned} \tag{3.8}
$$

The sum in (3.8) is called the convolution of the pmfs $p_X$ and $p_Y$.

Example 3.6.1. Let $X$ and $Y$ be independent binomial random variables with $X \sim B(m, p)$ and $Y \sim B(n, p)$. We show that $Z := X + Y \sim B(m + n, p)$. By (3.8),

$$p_Z(z) = \sum_x \binom{m}{x} p^x q^{m-x} \binom{n}{z - x} p^{z-x} q^{n-(z-x)}, \quad q := 1 - p,$$

where the sum is taken over all integers $x$ satisfying the inequalities $0 \leq x \leq m$ and $0 \leq z - x \leq n$, that is, $\max(z - n, 0) \leq x \leq \min(z, m)$. Simplifying, and using the fact from Example 3.2.5 that the probabilities (3.2) sum to 1, we see that

$$p_Z(z) = p^z q^{m+n-z} \sum_x \binom{m}{x}\binom{n}{z - x} = \binom{m + n}{z} p^z q^{m+n-z}.$$

Therefore, $Z \sim B(m + n, p)$.
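The convolution (3.8) is short to code when a pmf is stored as a dictionary from values to probabilities. The Python sketch below (function names and the parameter $p = 1/3$ are illustrative) convolves the pmfs of $B(3, p)$ and $B(4, p)$ with exact fractions and confirms that the result is the pmf of $B(7, p)$, as Example 3.6.1 shows in general.

```python
from fractions import Fraction
from math import comb

def binomial(N, p):
    """pmf of B(N, p) as a dict {value: probability}."""
    return {k: comb(N, k) * p**k * (1 - p) ** (N - k) for k in range(N + 1)}

def convolve(pX, pY):
    """pmf of X + Y for independent discrete X and Y, as in (3.8):
    p_{X+Y}(z) is the sum over x of p_X(x) p_Y(z - x)."""
    pZ = {}
    for x, px in pX.items():
        for y, py in pY.items():
            pZ[x + y] = pZ.get(x + y, 0) + px * py
    return pZ

p = Fraction(1, 3)                      # illustrative parameter
pZ = convolve(binomial(3, p), binomial(4, p))
print(pZ == binomial(7, p))             # True
```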

Now suppose that $X$ and $Y$ are jointly continuous independent random variables. By Corollary 3.5.4, the joint density of $X$ and $Y$ is $f_X(x)f_Y(y)$. By Example 3.4.3, then,

$$f_{X+Y}(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z - x)\,dx = \int_{-\infty}^{\infty} f_Y(y)\,f_X(z - y)\,dy. \tag{3.9}$$

The integrals in (3.9) are called the convolution of the densities $f_X$ and $f_Y$.

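Example 3.6.2. Let $X$ and $Y$ be independent normal random variables with $X \sim N(\mu, \sigma^2)$ and $Y \sim N(\nu, \tau^2)$. We claim that

$$X + Y \sim N(\mu + \nu, \sigma^2 + \tau^2).$$

To verify this, set $Z = X + Y$ and suppose first that $\mu = \nu = 0$. We need to show that $f_Z = g$, where

$$g(z) = \frac{1}{\varrho\sqrt{2\pi}} \exp\!\left(-\frac{z^2}{2\varrho^2}\right), \quad \varrho^2 := \sigma^2 + \tau^2.$$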


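From (3.9),

$$
\begin{aligned}
f_Z(z) &= a\int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2}\left[\frac{(z - y)^2}{\sigma^2} + \frac{y^2}{\tau^2}\right]\right\} dy\\
&= a\int_{-\infty}^{\infty} \exp\left\{\frac{\tau^2(z - y)^2 + \sigma^2 y^2}{-2\sigma^2\tau^2}\right\} dy,
\end{aligned}
$$

where $a = (2\pi\sigma\tau)^{-1}$. The expression $\tau^2(z - y)^2 + \sigma^2 y^2$ in the second integral may be written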

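$$
\begin{aligned}
\tau^2 z^2 - 2\tau^2 yz + \varrho^2 y^2 &= \varrho^2\left(y^2 - \frac{2\tau^2 yz}{\varrho^2}\right) + \tau^2 z^2\\
&= \varrho^2\left(y - \frac{\tau^2 z}{\varrho^2}\right)^2 + \tau^2 z^2 - \frac{\tau^4 z^2}{\varrho^2}\\
&= \varrho^2\left(y - \frac{\tau^2 z}{\varrho^2}\right)^2 + \frac{\tau^2\sigma^2 z^2}{\varrho^2}.
\end{aligned}
$$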

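Thus, for suitable positive constants $b$ and $c$ (namely $b = \varrho^2/(2\sigma^2\tau^2)$ and $c = \tau^2/\varrho^2$),

$$\frac{\tau^2(z - y)^2 + \sigma^2 y^2}{-2\sigma^2\tau^2} = -b(y - cz)^2 - \frac{z^2}{2\varrho^2}.$$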

It follows that

$$f_Z(z) = a e^{-z^2/2\varrho^2}\int_{-\infty}^{\infty} e^{-b(y - cz)^2}\,dy = a e^{-z^2/2\varrho^2}\int_{-\infty}^{\infty} e^{-bu^2}\,du = k\,g(z)$$

for some constant $k$ and for all $z$. Since $f_Z$ and $g$ are both densities, $k$ must equal 1, verifying the assertion for the case $\mu = \nu = 0$.

For the general case, observe that $X - \mu \sim N(0, \sigma^2)$ and $Y - \nu \sim N(0, \tau^2)$, so by the first part $Z - \mu - \nu \sim N(0, \varrho^2)$. Therefore, $Z \sim N(\mu + \nu, \sigma^2 + \tau^2)$.
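The conclusion of Example 3.6.2 can be checked by evaluating the convolution integral (3.9) numerically and comparing it with the $N(\mu + \nu, \sigma^2 + \tau^2)$ density. In the Python sketch below, the means, variances, integration range, and step count are illustrative assumptions, and `convolution_density` is our own trapezoidal-rule helper.

```python
import math

def normal_pdf(x, mu, var):
    """Density (3.4) of N(mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def convolution_density(z, fX, fY, lo=-40.0, hi=40.0, n=8000):
    """Approximate (3.9), the integral of f_X(x) f_Y(z - x) dx, by the trapezoidal rule."""
    h = (hi - lo) / n
    total = 0.5 * (fX(lo) * fY(z - lo) + fX(hi) * fY(z - hi))
    total += sum(fX(lo + i * h) * fY(z - lo - i * h) for i in range(1, n))
    return total * h

mu, s2, nu, t2 = 1.0, 4.0, -0.5, 2.25    # illustrative means and variances
fX = lambda x: normal_pdf(x, mu, s2)
fY = lambda y: normal_pdf(y, nu, t2)
for z in (-2.0, 0.5, 3.0):
    approx = convolution_density(z, fX, fY)
    exact = normal_pdf(z, mu + nu, s2 + t2)  # density of N(mu + nu, sigma^2 + tau^2)
    print(z, round(approx, 8), round(exact, 8))
```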

Example 3.6.3. Let $S_n$ denote the price of a stock on day $n$. A common model assumes that the ratios $Z_n := S_n/S_{n-1}$, $n \geq 1$, are iid lognormal random variables with parameters $\mu$ and $\sigma^2$, that is, $\ln Z_n \sim N(\mu, \sigma^2)$. Note that $Z_n - 1$ is the fractional increase of the stock from day $n - 1$ to day $n$. The probability that the price of the stock rises on each of the first $n$ days is

$$P(Z_1 > 1, Z_2 > 1, \ldots, Z_n > 1) = P^n(\ln Z_1 > 0) = \left[1 - \Phi\!\left(\frac{-\mu}{\sigma}\right)\right]^n = \Phi^n\!\left(\frac{\mu}{\sigma}\right),$$

where we have used the identity $1 - \Phi(-x) = \Phi(x)$ (Exercise 7). The probability that on day $n$ the price of the stock will be larger than its initial price $S_0$ is

$$
\begin{aligned}
P(S_n > S_0) &= P(Z_1 Z_2\cdots Z_n > 1) = P(\ln Z_1 + \ln Z_2 + \cdots + \ln Z_n > 0)\\
&= 1 - P(\ln Z_1 + \ln Z_2 + \cdots + \ln Z_n < 0)\\
&= 1 - \Phi\!\left(\frac{-n\mu}{\sigma\sqrt{n}}\right) = \Phi\!\left(\frac{\mu\sqrt{n}}{\sigma}\right),
\end{aligned}
$$

the second to last equality since $\ln Z_1 + \ln Z_2 + \cdots + \ln Z_n \sim N(n\mu, n\sigma^2)$ (Example 3.6.2).
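A Monte Carlo simulation makes both formulas of Example 3.6.3 concrete. The Python sketch below draws the log-ratios $\ln Z_k$ directly from $N(\mu, \sigma^2)$ with `random.Random.gauss`; the values $\mu = 0.001$, $\sigma = 0.02$, $n = 5$, the number of paths, and the seed are hypothetical choices, and the sample frequencies should only approximate $\Phi^n(\mu/\sigma)$ and $\Phi(\mu\sqrt{n}/\sigma)$.

```python
import math
import random

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

mu, sigma, n = 0.001, 0.02, 5          # hypothetical daily log-return parameters
rng = random.Random(3)                  # fixed seed for reproducibility
paths, rises_every_day, ends_higher = 100_000, 0, 0
for _ in range(paths):
    logs = [rng.gauss(mu, sigma) for _ in range(n)]  # ln Z_1, ..., ln Z_n
    rises_every_day += all(x > 0 for x in logs)      # Z_k > 1 for every k
    ends_higher += sum(logs) > 0                     # S_n > S_0
print(rises_every_day / paths, Phi(mu / sigma) ** n)
print(ends_higher / paths, Phi(mu * math.sqrt(n) / sigma))
```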


3.7 Exercises

1. Determine the least number of times you need to toss a fair coin to be 99% sure that at least two heads will come up.

2. Prove Proposition 3.1.6.

3. Let $X$ be a hypergeometric random variable with parameters $(p, n, N)$. Show that

$$\lim_{N\to\infty} p_X(k) = \binom{n}{k} p^k q^{n-k}.$$

4. Let $X_1, X_2, \ldots$ be an infinite sequence of random variables such that the limit

$$X(\omega) := \lim_{n\to\infty} X_n(\omega)$$

exists for each $\omega \in \Omega$. Verify that

$$\{X < a\} = \bigcup_{m=1}^{\infty}\bigcup_{n=1}^{\infty}\bigcap_{k\geq n}\{X_k < a - 1/m\},$$

and hence conclude that $X$ is a random variable.

5. Let $Y = aX + b$, where $X$ is a continuous random variable and $a$ and $b$ are constants with $a \neq 0$. Show that

$$f_Y(y) = |a|^{-1} f_X\!\left(\frac{y - b}{a}\right).$$

6. Let $X$ be a random variable with density $f_X$ and set $Y = X^2$. Show that

$$f_Y(y) = \frac{f_X(\sqrt{y}) + f_X(-\sqrt{y})}{2\sqrt{y}}\, I_{(0,+\infty)}(y).$$

In particular, find $f_Y$ if $X$ is uniformly distributed over $(-1, 1)$.

7. Show that $1 - \Phi(x) = \Phi(-x)$. Conclude that $X \sim N(0, 1)$ iff $-X \sim N(0, 1)$.

8. Show that, if $X \sim N(\mu, \sigma^2)$ and $a \neq 0$, then $aX + b \sim N(a\mu + b, a^2\sigma^2)$.

9. In the dartboard of Example 2.3.7, let $Z$ be the distance from the origin to the landing position of the dart. Find $F_Z$ and $f_Z$.

10. Let $Z = X + Y$, where $X$ and $Y$ are independent and uniformly distributed on $(0, 1)$. Show that

$$F_Z(z) = \tfrac{1}{2}z^2 I_{[0,1)}(z) + \left[1 - \tfrac{1}{2}(2 - z)^2\right] I_{[1,2)}(z) + I_{[2,\infty)}(z).$$


11. For this exercise, refer to Example 3.6.3. Let $1 \leq k \leq n$. Find the probability that the stock

(a) increases exactly $k$ times in $n$ days;

(b) increases exactly $k$ consecutive times in $n$ days (decreasing on the other days); and

(c) has a run of exactly $k$ consecutive increases (not necessarily decreasing on the other days), where $k \geq n/2$.

12. Let $X_j$, $j = 1, 2, \ldots, n$, be independent Bernoulli random variables with parameter $p$, set $Y_m := X_1 + X_2 + \cdots + X_m$, and let $1 \leq m < n$ and $1 \leq k \leq n$. Show that

$$P(Y_m = j, Y_n = k) = \binom{m}{j}\binom{n - m}{k - j} p^k q^{n-k},$$

where $\max(0, k - n + m) \leq j \leq \min(m, k)$. Conclude that, for fixed $k$,

$$p_X(j) := P(Y_m = j \mid Y_n = k)$$

is the pmf of a hypergeometric random variable $X$ with parameters $(m/n, k, n)$.

13. Let $X$ and $Y$ be independent random variables. Express the cdfs $F_M$ of $\max(X, Y)$ and $F_m$ of $\min(X, Y)$ in terms of $F_X$ and $F_Y$. Conclude that

$$F_M + F_m = F_X + F_Y.$$

Chapter 4

Options and Arbitrage

Assets traded in financial markets fall into the following main categories: stocks (equity in a corporation or business); bonds (financial contracts issued by governments and corporations); currencies (traded on foreign exchanges); commodities (goods, such as oil, copper, wheat, or electricity); and derivatives.

A derivative is a financial instrument whose value is based on that of an underlying asset such as a stock, commodity, or currency. Derivatives provide a way for investors to reduce the risks associated with investing. The most common derivatives are forwards, futures, and options. In this chapter, we show how the simple assumption of no-arbitrage may be used to derive fundamental properties regarding the value of a derivative. In later chapters, we show how the no-arbitrage principle leads to the notion of replicating portfolio and ultimately to the Cox-Ross-Rubinstein and Black-Scholes option pricing formulas.

A financial market is a system in which finitely many securities S1, S2, . . ., Sd are traded. For ease and clarity of exposition we treat only the case $d = 1$. Thus, we assume that the market allows unlimited trades of a single risky security $S$. With the exception of Sections 4.8, 9.6, and 14.7, we assume that $S$ pays no dividends.

The value of a share of $S$ at time $t$ will be denoted by $S_t$. The term risky refers to the unpredictable nature of $S$ and hence suggests a probabilistic setting for the model. We therefore assume that $S_t$ is a random variable on some probability space $(\Omega, \mathcal{F}, P)$.¹ The set $D$ of indices $t$ is either a discrete set $\{0, 1, \ldots, N\}$ or an interval $[0, T]$. Since the initial value of the security is known at time 0, $S_0$ is assumed to be a constant. The collection $S = (S_t)_{t\in D}$ is called the price process of $S$.

For reasons that will become clear, we also assume that there is available to investors a risk-free money market account $A$ that allows unlimited transactions in the form of deposits or withdrawals (including loans). As we saw in Section 1.3, this is equivalent to the availability of a risk-free bond $B$ that may be purchased or sold (short) in unlimited quantities. We follow the convention that borrowing an amount $A$ is the same as lending (depositing) the amount $-A$, where $A$ may be positive or negative.

If the price model is continuous, the risk-free asset is assumed to earn interest compounded continuously at a constant annual rate \(r\). If the model is discrete, we assume compounding occurs at a rate \(i\) per time interval. For example, if \(B_0\) is the initial value of the bond, then the value \(B_t\) at time \(t\) is given by

\[
B_t =
\begin{cases}
B_0 e^{rt} & \text{in a continuous model } (0 \le t \le T \text{ in years}),\\
B_0 (1 + i)^t & \text{in a discrete model } (t = 0, 1, \dots, N).
\end{cases}
\]

These conventions apply throughout the book.

¹ In this chapter we leave the probability space unspecified. Concrete models are developed in later chapters.
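The two compounding conventions can be sketched in Python as follows. The function names and the sample inputs (a $1000 bond, a 5% annual rate, a 1% per-period rate) are illustrative assumptions and not notation from the text.

```python
import math


def bond_value_continuous(b0: float, r: float, t: float) -> float:
    """B_t = B_0 e^{rt}: continuous compounding at annual rate r, t in years."""
    return b0 * math.exp(r * t)


def bond_value_discrete(b0: float, i: float, t: int) -> float:
    """B_t = B_0 (1 + i)^t: compounding at rate i per period, t = 0, 1, ..., N."""
    return b0 * (1.0 + i) ** t


# Hypothetical inputs: a $1000 bond, 5% continuous rate for 2 years,
# or 1% per period for 12 periods.
print(round(bond_value_continuous(1000.0, 0.05, 2.0), 2))  # 1105.17
print(round(bond_value_discrete(1000.0, 0.01, 12), 2))     # 1126.83
```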

4.1 Arbitrage

An arbitrage is an investment opportunity that arises from mismatched financial instruments. For example, suppose Bank A is willing to lend money at an annual rate of 8%, compounded monthly, and Bank B is offering CDs at an annual rate of 10%, also compounded monthly. An investor recognizes this as an arbitrage opportunity, that is, a sure win. She simply borrows an amount from Bank A at 8% and immediately deposits it into Bank B at 10%. The transaction costs her nothing and results in a positive profit.

Clearly, a market cannot sustain such obvious instances of arbitrage. However, more subtle examples occur, and while their existence may be fleeting, they provide employment for market analysts whose job it is to discover them. (High-speed computers are commonly employed to ferret out and exploit arbitrage opportunities.) Lack of arbitrage in a market, while an idealized condition, is necessary for general economic stability. Moreover, as we shall see, the assumption of no-arbitrage leads to a robust mathematical theory of option pricing.

To construct precise financial models, one needs a mathematical definition of arbitrage. For now, the following will suffice.

Definition 4.1.1. An arbitrage is a trading strategy resulting in a portfolio with zero probability of loss and positive probability of gain.²

² In games of chance, such as roulette, a casino has a statistical arbitrage (the so-called house advantage). Here, in contrast to financial arbitrage, the casino has a positive probability of loss. However, the casino has a positive expected gain, so in the long run the house wins.

A formal definition of portfolio is given in Chapter 5. For the time being, the reader may think of a portfolio as simply a collection of assets.

The following is the first of several examples in the book showing how the assumption of no-arbitrage has concrete mathematical consequences.

Example 4.1.2. Suppose that the initial value of a single share of our security \(S\) is \(S_0\) and that after one time period its value goes up by a factor of \(u\) with probability \(p\) or down by a factor of \(d\) with probability \(q = 1 - p\), where \(0 < d < u\) and \(0 < p < 1\). In the absence of arbitrage, it must then be the case that

\[
d < 1 + i < u, \tag{4.1}
\]

where \(i\) is the interest rate during the time period.

The verification of (4.1) uses the idea of selling a security short, that is, borrowing and selling a security under the agreement that it will be returned at some later date. (You are long in the security if you actually own it.) To see why (4.1) holds, suppose first that \(u \le 1 + i\). We then employ the following trading strategy: At time 0, we sell the stock short for \(S_0\) and invest the money in a risk-free account paying the rate \(i\). This results in a portfolio consisting of \(-1\) shares of \(S\) and an account worth \(S_0\). No wealth was required to construct the portfolio. At time 1, the account is worth \(S_0(1 + i)\), which we use to buy the stock, returning it to the lender. This costs us nothing extra, since the stock is worth at most \(uS_0\), which, by our assumption, is covered by the account. Furthermore, there is a positive probability \(q\) that our gain is the positive amount \((1 + i)S_0 - dS_0\). The strategy therefore constitutes an arbitrage.

If \(1 + i \le d\), we employ the reverse strategy: At time 0 we borrow \(S_0\) dollars and use it to buy one share of \(S\). We now have a portfolio consisting of one share of the stock and an account worth \(-S_0\), and, as before, no wealth was required to construct it. At time 1, we sell the stock and use the money to pay back the loan. This leaves us with at least \(dS_0 - (1 + i)S_0 \ge 0\), and there is a positive probability \(p\) that our gain is the positive amount \(uS_0 - (1 + i)S_0\), again implying an arbitrage. Therefore, (4.1) must hold.
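The case analysis in Example 4.1.2 can be checked mechanically. The sketch below returns the time-1 gains of the zero-cost strategy in the up and down states. It returns None when \(d < 1 + i < u\), where neither strategy applies. The function name `one_period_gains` and all numerical inputs are hypothetical.

```python
from typing import Optional, Tuple


def one_period_gains(s0: float, u: float, d: float, i: float) -> Optional[Tuple[float, float]]:
    """Time-1 gains (up state, down state) of the zero-cost strategy in
    Example 4.1.2, or None when d < 1 + i < u."""
    growth = 1.0 + i
    if u <= growth:
        # Short one share for S0, invest S0 at rate i, buy the share back at time 1.
        return (growth * s0 - u * s0, growth * s0 - d * s0)
    if growth <= d:
        # Borrow S0 at rate i, buy one share, sell it at time 1, repay the loan.
        return (u * s0 - growth * s0, d * s0 - growth * s0)
    return None  # condition (4.1) holds: neither strategy is an arbitrage


# Hypothetical parameters with 5% interest per period.
print(one_period_gains(100.0, 1.2, 0.9, 0.05))   # None, since d < 1 + i < u
print(one_period_gains(100.0, 1.04, 0.9, 0.05))  # u <= 1 + i: gains >= 0, one > 0
```

In both arbitrage branches each returned gain is nonnegative and at least one is strictly positive, which is exactly the requirement of Definition 4.1.1.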

The above example will play a fundamental role in the pricing of derivativesunder the binomial model (Chapter 7).

An important consequence of the assumption of no-arbitrage is the law of one price, which asserts that in an arbitrage-free market two investments A and B with the same value at time \(T\) must have the same value at all times \(t < T\). Indeed, if the time-\(t\) value of A were greater than that of B, one could obtain a positive profit at time \(t\) by taking a short position in A and a long position in B, and with the proceeds from selling B at time \(T\) cover the obligation from shorting A.

For the remainder of the chapter, we assume that the market admits no arbitrage opportunities. Also, for definiteness, we assume a continuous-time price process for an asset.


4.2 Classification of Derivatives

In the sections that follow, we examine various common types of derivatives. As mentioned above, a derivative is a financial contract whose value depends on that of another asset \(S\), called the underlying. The expiration time of the contract is called the maturity or expiration date and is denoted by \(T\). The price process of the underlying is given by \(S = (S_t)_{t=0}^{T}\).

Derivatives fall into four main categories:

European,

American,

path independent, and

path dependent.

A European derivative is a contract that the holder may exercise only at maturity \(T\). By contrast, an American derivative may be exercised at any time \(\tau \le T\). A path-independent derivative has a payoff (that is, value) at exercise time \(\tau\) that depends only on \(S_\tau\), while a path-dependent derivative has a payoff at \(\tau\) that depends on the entire history \(S_{[0,\tau]}\) of the price process.

We begin our analysis with the simplest types of European path-independent derivatives: forwards and futures.

4.3 Forwards

A forward contract between two parties places one party under the obligation to buy an asset at a future time \(T\) for a prescribed price \(K\), the forward price, and requires the other party to sell the asset under those conditions. The party that agrees to buy the asset is said to assume the long position, while the party that will sell the asset has the short position. The payoff for the party in the long position is \(S_T - K\), while that for the party in the short position is \(K - S_T\). The forward price \(K\) is set so that there is no cost to either party to enter the contract. (This is in contrast to options, as we shall see below.)

The following examples illustrate how forwards may be used to hedge against unfavorable changes in commodity prices.

Example 4.3.1. A farmer expects to harvest and sell his wheat crop six months from now. The price of wheat is currently $8.70 per bushel and he predicts a crop of 10,000 bushels. He calculates that he would make an adequate profit even if the price dropped to $8.50. To hedge against a larger drop in price, he takes the short position in a forward contract with \(K\) = $8.50 × 10,000 = $85,000. At time \(T\) (six months from now) he is under the obligation to sell his wheat to the party in the long position for $8.50 per bushel. His payoff is the difference between the forward price \(K\) and the price of wheat at maturity. Thus, if wheat drops to $8.25, his payoff is $.25 per bushel; if it rises to $8.75, his payoff is −$.25 per bushel.

Example 4.3.2. An airline company needs to buy 100,000 gallons of jet fuel in three months. It can buy the fuel now at $4.80 per gallon and pay storage costs, or wait and buy the fuel three months from now. The company decides on the latter strategy, and to hedge against a large increase in the cost of jet fuel, it takes the long position in a forward contract with \(K\) = $4.90 × 100,000 = $490,000. In three months, the airline is obligated to buy, and the fuel company to sell, 100,000 gallons of jet fuel for $4.90 a gallon. The airline's payoff is the difference between the price of fuel at that time and the forward price. If in three months the price rises to, say, $4.96 per gallon, then the airline's payoff is $.06 per gallon. If it falls to $4.83, its payoff is −$.07 per gallon.
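The per-unit payoffs in Examples 4.3.1 and 4.3.2 follow directly from \(S_T - K\) for the long side and \(K - S_T\) for the short side. The following is a minimal sketch. The function name `forward_payoff` is a hypothetical helper.

```python
def forward_payoff(spot_at_maturity: float, forward_price: float, long: bool = True) -> float:
    """Per-unit forward payoff at maturity: S_T - K (long) or K - S_T (short)."""
    payoff = spot_at_maturity - forward_price
    return payoff if long else -payoff


# Example 4.3.1: the farmer is short at K = $8.50 per bushel.
print(round(forward_payoff(8.25, 8.50, long=False), 2))  # 0.25
print(round(forward_payoff(8.75, 8.50, long=False), 2))  # -0.25
print(round(10_000 * forward_payoff(8.25, 8.50, long=False), 2))  # 2500.0 for the crop
# Example 4.3.2: the airline is long at K = $4.90 per gallon.
print(round(forward_payoff(4.96, 4.90), 2))  # 0.06
print(round(forward_payoff(4.83, 4.90), 2))  # -0.07
```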

Since there is no cost to enter a forward contract, the initial value \(F_0\) of the forward is zero. However, as time passes the forward may acquire a nonzero value. Indeed, the no-arbitrage assumption implies that the (long-position) value \(F_t\) of a forward at time \(t\) is given by

\[
F_t = S_t - e^{-r(T-t)}K, \qquad 0 \le t \le T. \tag{4.2}
\]

This is clear for \(t = T\). Suppose, however, that at some time \(t < T\) the inequality \(F_t < S_t - e^{-r(T-t)}K\) holds. We then take a short position on the security and a long position on the forward. This provides us with cash \(S_t - F_t > 0\), which we deposit in a risk-free account at rate \(r\). At maturity, we discharge our obligation to buy the security for the amount \(K\), returning it to the lender. Since our account has grown to \((S_t - F_t)e^{r(T-t)}\), we now have cash in the amount \((S_t - F_t)e^{r(T-t)} - K\). This amount is positive (multiply the assumed inequality by \(e^{r(T-t)}\)), so our strategy constitutes an arbitrage. If \(F_t > S_t - e^{-r(T-t)}K\), we employ the reverse strategy, taking a short position on the forward and a long position on the security. This requires cash in the amount \(S_t - F_t\), which we borrow at the rate \(r\). At time \(T\), we discharge our obligation to sell the security for \(K\) and use this to settle our debt. This leaves us with the positive cash amount \(K - (S_t - F_t)e^{r(T-t)}\), again implying an arbitrage and hence verifying (4.2).
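The two strategies in this argument can be simulated to see the locked-in profit. The sketch below assumes a constant continuously compounded rate. The function name `forward_arbitrage_profit` and the sample numbers are hypothetical.

```python
import math


def forward_arbitrage_profit(s_t: float, f_t: float, k: float, r: float, tau: float) -> float:
    """Cash left at maturity from the strategies in the proof of (4.2).

    s_t: stock price at time t; f_t: quoted long-forward value at time t;
    k: forward price; r: continuously compounded rate; tau = T - t in years.
    Returns 0.0 when f_t equals the no-arbitrage value S_t - e^{-r tau} K.
    """
    fair = s_t - math.exp(-r * tau) * k
    growth = math.exp(r * tau)
    if f_t < fair:
        # Short the stock, buy the forward, invest S_t - F_t; buy the stock for K at T.
        return (s_t - f_t) * growth - k
    if f_t > fair:
        # Sell the forward, buy the stock with borrowed S_t - F_t; deliver it for K at T.
        return k - (s_t - f_t) * growth
    return 0.0


# Hypothetical inputs: S_t = 100, K = 95, r = 5%, six months to maturity.
for quoted in (5.0, 10.0):
    print(quoted, round(forward_arbitrage_profit(100.0, quoted, 95.0, 0.05, 0.5), 4))
```

In either branch the profit works out to \(|S_t - e^{-r(T-t)}K - F_t|\,e^{r(T-t)}\). This is the mispricing carried forward to maturity.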

One can also obtain (4.2) using the law of one price. One investment consists of a long position in the forward. The other consists of a long position in the stock and a short position in a bond with face value \(K\) and maturity \(T\), whose time-\(t\) value is \(e^{-r(T-t)}K\). The investments have the same value at maturity ((4.2) with \(t = T\)) and therefore must have the same value for all \(t \le T\).

Setting \(t = 0\) in (4.2) and solving the resulting equation \(F_0 = 0\) for \(K\), we see that

\[
K = S_0 e^{rT}. \tag{4.3}
\]

Substituting (4.3) into (4.2) yields the alternate formula

\[
F_t = S_t - e^{rt} S_0, \qquad 0 \le t \le T. \tag{4.4}
\]
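Formulas (4.2) through (4.4) translate directly into code. The following is a minimal sketch. The function names and the inputs (\(S_0 = 50\), \(r = 4\%\), one-year maturity) are hypothetical. It checks that the forward price (4.3) gives \(F_0 = 0\) and that (4.2) and (4.4) agree at an intermediate time.

```python
import math


def forward_price(s0: float, r: float, T: float) -> float:
    """(4.3): the forward price K = S_0 e^{rT} that makes F_0 = 0."""
    return s0 * math.exp(r * T)


def forward_value(s_t: float, k: float, r: float, t: float, T: float) -> float:
    """(4.2): long-position value F_t = S_t - e^{-r(T - t)} K."""
    return s_t - math.exp(-r * (T - t)) * k


# Hypothetical inputs: S_0 = 50, r = 4%, one-year maturity.
s0, r, T = 50.0, 0.04, 1.0
K = forward_price(s0, r, T)
print(round(K, 4))                                  # 52.0405
print(abs(forward_value(s0, K, r, 0.0, T)) < 1e-9)  # True: F_0 = 0
# At t = 0.5 with S_t = 53, formulas (4.2) and (4.4) agree.
print(round(forward_value(53.0, K, r, 0.5, T), 4), round(53.0 - math.exp(r * 0.5) * s0, 4))  # 1.9899 1.9899
```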

4.4 Currency Forwards

Currencies are traded over the counter on the foreign exchange market (FX), which determines the relative values of the currencies. The main purpose of the FX is to facilitate trade and investment so that business among international institutions may be conducted efficiently with minimal regard to currency issues. The FX also supports currency speculation, typically through hedge funds. In this case, currencies are bought and sold without the traders actually taking possession of the currency.

An exchange rate specifies the value of one currency relative to another, as expressed, for example, in the equation 1 euro = 1.44226 US dollars. Like stock prices, FX rates can be volatile. Rates may be influenced by various economic factors, including government budget deficits or surpluses, balance of trade levels, inflation levels, political conditions, and market perceptions. Because of this volatility there is a risk associated with currency trading and therefore a need for currency derivatives.

A forward contract whose underlying is a foreign currency is called a currency forward. Consider a currency forward that allows the purchase in US dollars of one euro at time \(T\). Let \(K_t\) denote the forward price of the euro established at time \(t\), and suppose that the time-\(t\) rate of exchange is 1 euro = \(Q_t\) dollars. We let \(r_d\) and \(r_e\) denote, respectively, the dollar and euro interest rates. To establish a formula relating \(K_t\) and \(Q_t\), consider the following possible investment strategies made at time \(t\):

Enter into the forward contract and deposit \(K_t e^{-r_d(T-t)}\) dollars in a US bank. At time \(T\), the value of the account is \(K_t\), which is used to purchase the euro.

Buy \(e^{-r_e(T-t)}\) euros for \(e^{-r_e(T-t)}Q_t\) dollars and deposit the euro amount in a European bank. At time \(T\), the account has exactly one euro.

As both strategies yield the same amount at maturity, the law of one price ensures that they must have the same value at time \(t\), that is,

\[
K_t e^{-r_d(T-t)} = Q_t e^{-r_e(T-t)}.
\]

Solving for \(K_t\), we see that the proper time-\(t\) forward price of a euro is

\[
K_t = Q_t e^{(r_d - r_e)(T-t)}.
\]


In particular, the forward price at time zero is

\[
K = K_0 = Q_0 e^{(r_d - r_e)T}.
\]

This expression should be contrasted with the forward price \(S_0 e^{r_d T}\) of a stock. In the latter case, only one instrument, the dollar account, makes payments. In the case of a currency, both accounts make payments.
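A short sketch of the currency forward price, together with a check that the two strategies above cost the same at time 0, may help. The exchange rate 1.44226 comes from the text. The interest rates, the maturity and the function name are hypothetical, and the rates are assumed to be continuously compounded.

```python
import math


def currency_forward_price(q_t: float, r_d: float, r_e: float, tau: float) -> float:
    """K_t = Q_t e^{(r_d - r_e)(T - t)}, with tau = T - t in years.

    q_t: time-t exchange rate in dollars per euro; r_d, r_e: continuously
    compounded dollar and euro interest rates."""
    return q_t * math.exp((r_d - r_e) * tau)


# Exchange rate from the text; the two rates and the maturity are hypothetical.
q0, r_d, r_e, T = 1.44226, 0.03, 0.02, 0.5
K = currency_forward_price(q0, r_d, r_e, T)

# Time-0 dollar cost of each strategy described in the text.
dollar_route = K * math.exp(-r_d * T)   # deposit enough dollars to pay K at T
euro_route = math.exp(-r_e * T) * q0    # buy e^{-r_e T} euros now and deposit them
print(round(K, 5), math.isclose(dollar_route, euro_route))
```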

4.5 Futures

A futures contract, like a forward, is an agreement between two parties to buy or sell a specified asset for a certain price at a certain time in the future. There are important differences, however. Unlike forward contracts, futures contracts are usually traded on exchanges rather than negotiated by the parties. Also, a futures price, unlike a forward price, is negotiated daily, and the daily differences are received by the long holder of the contract. The process is implemented by brokers via margin accounts, which have the effect of protecting the parties against default.

Example 4.5.1. Suppose the farmer in Example 4.3.1 takes the short position on a futures contract on day 0. On each day \(j\), a futures price \(F_j\) for wheat with delivery date \(T = 180\) days is quoted. The price depends on the current prospects for a good crop, the expected demand for wheat, and so forth. The long holder of the contract (the wheat buyer) receives \(F_1 - F_0\) on day one, \(F_2 - F_1\) on day two, and so on until delivery day \(T\), when he receives \(F_T - F_{T-1}\), where \(F_T\) is the spot price of wheat that day.³ Some of these amounts may be negative, in which case a payout is required. The total amount received by the buyer is

\[
\sum_{j=1}^{T} (F_j - F_{j-1}) = F_T - F_0.
\]

On day \(T\), the buyer has cash in the amount of \(F_T - F_0\), pays \(F_T\), and receives his wheat. The net cost to him is \(F_0\). Since this amount is known on day 0, \(F_0\) acts like a forward price. The difference is that the payoff \(F_T - F_0\) is paid gradually rather than at delivery.
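The telescoping sum in Example 4.5.1 can be checked on a price path. The sketch below assumes a hypothetical sequence of quotes and, like the text, ignores interest earned on the margin account. The function name `daily_settlements` is also hypothetical.

```python
from typing import List, Sequence


def daily_settlements(futures_prices: Sequence[float]) -> List[float]:
    """Amounts F_j - F_{j-1}, j = 1, ..., T, received by the long holder, given
    quotes F_0, F_1, ..., F_T (F_T being the spot price on delivery day)."""
    return [later - earlier for earlier, later in zip(futures_prices, futures_prices[1:])]


# Hypothetical wheat quotes in dollars per bushel; interest is ignored.
quotes = [8.50, 8.62, 8.41, 8.47, 8.30]
flows = daily_settlements(quotes)
total = sum(flows)                 # telescopes to F_T - F_0
net_cost = quotes[-1] - total      # pay the spot price F_T, offset by the settlements
print(round(total, 2), round(net_cost, 2))  # -0.2 8.5
```

The net cost equals the initial quote \(F_0\) whatever path the quotes take in between.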

It may be shown that, under the assumption of constant interest rates, a futures price and a forward price are the same. (See, for example, [7].)

³ The spot price of a commodity is its price ready for delivery.


4.6 Options

Options are derivatives similar to forwards and futures but have the additional feature of limiting an investor's loss to the cost of the option. Specifically, an option is a contract between two parties, the holder and the writer, which gives the former the right, but not the obligation, to buy or sell a particular security under terms specified in the contract. An option has value, since the holder is under no obligation to exercise the contract and could gain from the transaction if she does so. The holder of the option is said to have the long position, while the writer of the option has the short position. Each of the two basic types of options, the call option and the put option, comes in two styles, American and European. We begin with the definition of the European call option.

A European call option is a contract with the following conditions: At a prescribed time \(T\), the holder (that is, buyer) of the option may purchase a prescribed security \(S\) for a prescribed amount \(K\), the strike price. For the holder, the contract is a right, not an obligation. On the other hand, the writer (seller) of the option does have a potential obligation: he must sell the asset if the holder chooses to exercise the option. Since the holder has a right with no obligation, the option has a value and therefore a price, called the premium. The premium must be paid by the holder at the time of opening of the contract. Conversely, the writer, having assumed a financial obligation, must be compensated.

To find the payoff of the option at maturity \(T\), we argue as follows: If \(S_T > K\), the holder will exercise the option and receive the payoff \(S_T - K\). On the other hand, if \(S_T \le K\), the holder will decline to exercise the option, since otherwise he would be paying \(K - S_T\) more than the security is worth. The payoff for the holder of the call option may therefore be expressed as \((S_T - K)^+\), where, for a real number \(x\),

\[
x^+ := \max(x, 0).
\]

The option is said to be in the money at time \(t\) if \(S_t > K\), at the money if \(S_t = K\), and out of the money if \(S_t < K\).

A European put option has a definition analogous to that of a call option: it is a contract that allows the holder, at a prescribed time \(T\) in the future, to sell an asset for a prescribed amount \(K\). The holder is under no obligation to exercise the option, but if she does so the writer must buy the asset. Whereas the holder of a call option is betting that the asset price will rise (the wager being the premium), the holder of a put option is counting on the asset price falling in the hopes of buying low and selling high.

An argument similar to the call option case shows that the payoff of a European put option at maturity is \((K - S_T)^+\). Here the option is in the money at time \(t\) if \(S_t < K\), at the money if \(S_t = K\), and out of the money if \(S_t > K\).

Figure 4.1 shows the payoffs, graphed against \(S_T\), for the holder of a call and the holder of a put. The writer's payoff is the negative of the holder's payoff: the transaction is a zero-sum game.

FIGURE 4.1: Call and Put Option Payoffs (panels: Call Payoff and Put Payoff, each graphed against \(S_T\) with the strike \(K\) marked)
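The payoffs \((S_T - K)^+\) and \((K - S_T)^+\) and the moneyness terminology can be written out as small functions. The following is a minimal sketch. The function names and the string labels are hypothetical choices for illustration.

```python
def call_payoff(s_T: float, k: float) -> float:
    """Holder's payoff (S_T - K)^+ of a European call at maturity."""
    return max(s_T - k, 0.0)


def put_payoff(s_T: float, k: float) -> float:
    """Holder's payoff (K - S_T)^+ of a European put at maturity."""
    return max(k - s_T, 0.0)


def moneyness(s_t: float, k: float, kind: str) -> str:
    """Classify a call or put as in, at, or out of the money at time t."""
    if kind not in ("call", "put"):
        raise ValueError("kind must be 'call' or 'put'")
    if s_t == k:
        return "at the money"
    in_the_money = s_t > k if kind == "call" else s_t < k
    return "in the money" if in_the_money else "out of the money"


# Strike K = 100; the writer's payoff is the negative of each holder payoff.
for s in (80.0, 100.0, 120.0):
    print(s, call_payoff(s, 100.0), put_payoff(s, 100.0), moneyness(s, 100.0, "call"))
```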

Options, like forwards and futures, may be used to hedge against price fluctuations. For instance, the farmer in Example 4.5.1 could buy a put option that guarantees a price \(K\) for his harvest in six months. If the price drops below \(K\), he would exercise the option. Similarly, the airline company in Example 4.3.2 could take the long position in a call option that gives the company the right to buy fuel at a pre-established price \(K\).

Options may also be used for speculation. A third party in the airline example might take the long position in a call option, hoping that the price of fuel will go up. Of course, in contrast to forwards and futures, option-based hedging and speculation strategies have a cost, namely, the price of the option. The determination of that price is a primary goal of this book.

Note that, while the holder of the option has only the price of the option to lose, the writer stands to take a significantly greater loss. To offset this loss, the writer could take the money received for the option to start a portfolio with maturity value sufficient to settle the claim of the holder. Such a portfolio is called a hedging strategy. For example, the writer of a call option could take the long position in one share of the security. This requires borrowing an amount \(c = S_0 - C_0\), the price of the security minus the premium \(C_0\) received from selling the call. At time \(T\), the writer's portfolio is worth \(S_T - ce^{rT}\), the value of the security less the loan repayment. If the option is exercised, the writer can use the portfolio to settle his obligation of \(S_T - K\).⁴ The writer has successfully hedged the short position of the call. In the next chapter, we consider dynamic hedges with time-dependent units of the security and a risk-free bond.

⁴ That \(S_T - ce^{rT} \ge S_T - K\) is a consequence of the put-call parity formula, discussed below.

Put and call options may be used in combinations to achieve a variety of payoffs and hedging effects. For example, consider a portfolio that consists of a long position in a call option with strike price \(K_1\) and a short position in a call option with strike price \(K_2 > K_1\), each with underlying \(S\) and maturity \(T\). Such a portfolio is called a bull spread. Its payoff \((S_T - K_1)^+ - (S_T - K_2)^+\) gains in value as the stock goes up. Reversing positions in the calls produces a bear spread, which benefits from a decrease in stock value. Figure 4.2 shows the payoff of a bull spread graphed against \(S_T\).

FIGURE 4.2: Bull Spread Payoff (payoff against \(S_T\), with \(K_1\), \(K_2\), and the level \(K_2 - K_1\) marked)
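The bull spread payoff \((S_T - K_1)^+ - (S_T - K_2)^+\) can be tabulated to see its shape. It is zero below \(K_1\), rises linearly, and is capped at \(K_2 - K_1\). The following is a minimal sketch with hypothetical strikes \(K_1 = 100\) and \(K_2 = 110\). The bear spread is taken literally as the reversed positions, so its payoff is the negative of the bull spread's. The function names are hypothetical.

```python
def bull_spread_payoff(s_T: float, k1: float, k2: float) -> float:
    """Long a call struck at k1, short a call struck at k2 > k1 (same S and T)."""
    if not k1 < k2:
        raise ValueError("need k1 < k2")
    return max(s_T - k1, 0.0) - max(s_T - k2, 0.0)


def bear_spread_payoff(s_T: float, k1: float, k2: float) -> float:
    """Positions reversed: short the k1 call, long the k2 call."""
    return -bull_spread_payoff(s_T, k1, k2)


print([bull_spread_payoff(s, 100.0, 110.0) for s in (90.0, 100.0, 105.0, 110.0, 130.0)])
# [0.0, 0.0, 5.0, 10.0, 10.0]
```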

Of more relevance to the investor than the payoff is the profit of a portfolio, that is, the payoff minus the cost of starting the portfolio. Consider, for example, the profit from a straddle, which is a portfolio with long positions in a call and a put with the same strike price, maturity, and underlying. The start-up cost of the portfolio is the combined cost \(c\) of the call and the put; hence the profit is

\[
(S_T - K)^+ + (K - S_T)^+ - c = |S_T - K| - c,
\]

graphed in Figure 4.3. The straddle is seen to benefit from a movement in either direction away from the strike price.

FIGURE 4.3: Straddle Profit (profit against \(S_T\), with \(K\), \(K - c\), and \(-c\) marked)
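The straddle profit identity can be checked numerically. The following is a minimal sketch with a hypothetical strike \(K = 100\) and combined premium \(c = 8\), so the break-even points are \(K \pm c\). The function name is also hypothetical.

```python
def straddle_profit(s_T: float, k: float, c: float) -> float:
    """Long call + long put with the same K and T, bought for combined cost c."""
    return max(s_T - k, 0.0) + max(k - s_T, 0.0) - c


# Hypothetical K = 100 and combined premium c = 8.
print([straddle_profit(s, 100.0, 8.0) for s in (80.0, 92.0, 100.0, 108.0, 125.0)])
# [12.0, 0.0, -8.0, 0.0, 17.0]
```

The maximum loss \(c\) occurs at \(S_T = K\), and the profit at \(S_T = 0\) is \(K - c\), matching the values marked in Figure 4.3.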

So far we have considered only European options, characterized by the fact that the contracts may be exercised only at maturity. By contrast, American options may be exercised at any time up to and including the expiration date. American options are more difficult to analyze, as there is the additional complexity introduced by the holder's desire to find the optimal time to exercise the option and the writer's need to construct a hedging portfolio that will cover the claim at any time \(t \le T\). Nevertheless, as we shall see, some properties of American options may be readily deduced from those of their European counterparts. We consider American options in Chapters 9 and 14.

In recent years, an assortment of more complex derivatives has appeared. These are frequently called exotic options and are distinguished by the fact that their payoffs are no longer simple functions of the value of the underlying at maturity. Prominent among these are the path-dependent options, whose payoffs depend not only on the value of the underlying at maturity but also on its earlier values. The main types of path-dependent options are Asian options, lookback options, and barrier options. The payoff of an Asian option depends on an average of the values of \(S\), while that of a lookback option is a function of the maximum or minimum values of \(S\). A barrier option has a payoff that depends on whether \(S_t\) crosses a prescribed level. These options are examined in detail in Chapter 14.
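The following sketch illustrates how path-dependent payoffs use the whole observed path rather than only \(S_T\). The particular variants are illustrative assumptions, not the definitions of Chapter 14:

* an arithmetic-average Asian call that includes \(S_0\) in the average;
* a fixed-strike lookback call;
* an up-and-out barrier call that is knocked out when an observed price reaches the barrier.

The function names and the sample path are hypothetical.

```python
from typing import Sequence


def asian_call_payoff(path: Sequence[float], k: float) -> float:
    """Arithmetic-average Asian call: (mean of observed prices - K)^+."""
    return max(sum(path) / len(path) - k, 0.0)


def lookback_call_payoff(path: Sequence[float], k: float) -> float:
    """Fixed-strike lookback call: (max observed price - K)^+."""
    return max(max(path) - k, 0.0)


def up_and_out_call_payoff(path: Sequence[float], k: float, barrier: float) -> float:
    """Up-and-out call: (S_T - K)^+ unless some observed price reaches the barrier."""
    if max(path) >= barrier:
        return 0.0
    return max(path[-1] - k, 0.0)


# A hypothetical discretely observed path S_0, ..., S_4 and strike K = 100.
path = [100.0, 108.0, 115.0, 104.0, 110.0]
print(round(asian_call_payoff(path, 100.0), 2))    # 7.4
print(lookback_call_payoff(path, 100.0))           # 15.0
print(up_and_out_call_payoff(path, 100.0, 120.0))  # 10.0 (barrier never reached)
print(up_and_out_call_payoff(path, 100.0, 112.0))  # 0.0 (knocked out at 115)
```

A standard call on the same path would pay \(10\). All three exotic payoffs differ from it because they look at earlier prices.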

We end this section with a brief discussion of swaps and swaptions. A swap is a contract between two parties to exchange financial assets or cash streams (payment sequences) at some specified time in the future. The most common of these are currency swaps, exchanges of one currency for another, and interest-rate swaps, where a cash stream with a fixed interest rate is exchanged for one with a floating rate. A swaption is an option on a swap. It gives the holder the right to enter into a swap at some future time. A detailed analysis of swaps and swaptions may be found in [7]. A credit default swap (CDS) is a contract that allows one party to make a series of payments to another party in exchange for a future payoff if a specified loan or bond defaults. A CDS is, in effect, insurance against default but is also used for hedging purposes and by speculators, as was famously the case in the subprime mortgage crisis of 2007 (see [11]).

4.7 Properties of Options

The options considered in this section are assumed to have maturity \(T\), strike price \(K\), and underlying one share of \(S\) with price process \((S_t)_{t=0}^{T}\). We denote the cost of a European (resp., American) call option by \(C^e_0\) (resp., \(C^a_0\)) and that of a European (resp., American) put option by \(P^e_0\) (resp., \(P^a_0\)).

The following proposition asserts that an American call option, despite its greater flexibility, has the same value as that of a comparable European call option.

Proposition 4.7.1. It is never advantageous to exercise an American call option early. In particular, \(C^e_0 = C^a_0\).

Proof. Clearly \(C^a_0 \ge C^e_0\), and the only way to have \(C^a_0 > C^e_0\) is for there to be an advantage to exercising the American option early. We will show that this is not the case.

Suppose an investor holds an American option to buy one share of \(S\) for the price \(K\) at any time up to \(T\). If she exercises the option at some time \(t < T\), she will immediately realize the payoff \(S_t - K\), which, if invested in a risk-free account, will yield the amount \(e^{r(T-t)}(S_t - K)\) at maturity. But suppose instead that she sells the stock short for the amount \(S_t\), invests the proceeds, and then purchases the stock at time \(T\) (returning it to the lender). If \(S_T \le K\), she will pay the market price \(S_T\); if \(S_T > K\), she will exercise the option and pay the amount \(K\). Under this strategy, she will therefore have cash \(e^{r(T-t)}S_t - \min(S_T, K)\) at time \(T\). Since \(\min(S_T, K) \le K \le e^{r(T-t)}K\), this amount is at least as large as \(e^{r(T-t)}(S_t - K)\), so the second strategy is never worse than the first. Since the second strategy did not require that the option be exercised early, there is no advantage in doing so.
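The comparison in this proof can be checked over a grid of terminal prices. The sketch below assumes \(r \ge 0\), which the inequality \(K \le e^{r(T-t)}K\) requires. The function names and the inputs (\(S_t = 120\), \(K = 100\), \(r = 5\%\), half a year remaining) are hypothetical.

```python
import math


def exercise_now_value(s_t: float, k: float, r: float, tau: float) -> float:
    """Exercise at time t and invest S_t - K until T: e^{r tau}(S_t - K)."""
    return math.exp(r * tau) * (s_t - k)


def short_and_hold_value(s_t: float, s_T: float, k: float, r: float, tau: float) -> float:
    """Short the stock at t, invest S_t, and at T buy the share back for
    min(S_T, K), exercising the call only if S_T > K."""
    return math.exp(r * tau) * s_t - min(s_T, k)


# Hypothetical inputs: S_t = 120, K = 100, r = 5%, tau = T - t = 0.5 years.
s_t, k, r, tau = 120.0, 100.0, 0.05, 0.5
early = exercise_now_value(s_t, k, r, tau)
terminal_prices = [float(x) for x in range(0, 301, 10)]
print(all(short_and_hold_value(s_t, s_T, k, r, tau) >= early for s_T in terminal_prices))  # True
```

The gap between the two strategies is \(e^{r(T-t)}K - \min(S_T, K)\). It is never negative when \(r \ge 0\).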

Hereafter, we denote the cost of either a European or an American call option by \(C_0\). The next result uses an arbitrage argument to obtain an important connection between \(P^e_0\) and \(C_0\).

Proposition 4.7.2 (Put-Call Parity Formula). Consider a call option (either European or American) and a European put option, each with strike price \(K\), maturity \(T\), and underlying one share of \(S\). Then

\[
S_0 + P^e_0 - C_0 = Ke^{-rT}. \tag{4.5}
\]

Proof. We show that the assumption \(S_0 + P^e_0 - C_0 \ne Ke^{-rT}\) leads to an arbitrage. Suppose first that \(S_0 + P^e_0 - C_0 < Ke^{-rT}\). We then buy one share of \(S\), buy one put option, and sell one call option. This costs us \(S_0 + P^e_0 - C_0\), which we borrow and repay in the amount \((S_0 + P^e_0 - C_0)e^{rT}\) at maturity. If \(S_T \le K\), then the call option we sold will not be exercised, but we can exercise our put option and sell the security for \(K\). If \(S_T > K\), the put option is worthless, but the call option we sold will be exercised, requiring us to sell the security for \(K\). In either case, we will realize the cash amount \(K\). Since our assumption implies that \(K > (S_0 + P^e_0 - C_0)e^{rT}\), the amount received exceeds the amount owed and our strategy constitutes an arbitrage.

If \(S_0 + P^e_0 - C_0 > Ke^{-rT}\), we use the reverse strategy: sell short one share of the security, sell one put option, and buy one call option. An argument similar to that in the first paragraph shows that this strategy is also an arbitrage.

One can also establish (4.5) by using the law of one price. Consider two portfolios, one consisting of a long holding of the put, the other consisting of a long holding of the call, a bond with face value \(K\) and maturity \(T\), and a short position on one share of the security. The portfolios have initial values \(P^e_0\) and \(C_0 + e^{-rT}K - S_0\), respectively. Since the final values \((K - S_T)^+\) and \((S_T - K)^+ + K - S_T\) are equal, the initial values must also be equal, giving \(P^e_0 = C_0 + Ke^{-rT} - S_0\), which is (4.5).
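Put-call parity can be used in both directions: to price a put from a call, and to detect the arbitrage described in the proof. The following is a minimal sketch assuming continuous compounding. The function names and the quotes (\(S_0 = K = 100\), \(r = 5\%\), \(T = 1\), \(C_0 = 10.45\)) are hypothetical.

```python
import math


def put_from_call(c0: float, s0: float, k: float, r: float, T: float) -> float:
    """European put price implied by (4.5): P_0 = C_0 + K e^{-rT} - S_0."""
    return c0 + k * math.exp(-r * T) - s0


def parity_gap(c0: float, p0: float, s0: float, k: float, r: float, T: float) -> float:
    """S_0 + P_0 - C_0 - K e^{-rT}; zero in an arbitrage-free market."""
    return s0 + p0 - c0 - k * math.exp(-r * T)


def parity_arbitrage_profit(c0: float, p0: float, s0: float, k: float, r: float, T: float) -> float:
    """Profit at T from the strategy in the proof: buy stock and put and sell the
    call if the gap is negative, or take the reverse positions if it is positive."""
    return abs(parity_gap(c0, p0, s0, k, r, T)) * math.exp(r * T)


# Hypothetical quotes: S_0 = 100, K = 100, r = 5%, T = 1 year, C_0 = 10.45.
p0 = put_from_call(10.45, 100.0, 100.0, 0.05, 1.0)
print(round(p0, 4))  # 5.5729
# A put quoted at 5.00 instead would leave a locked-in profit at T:
print(round(parity_arbitrage_profit(10.45, 5.0, 100.0, 100.0, 0.05, 1.0), 4))
```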

Proposition 4.7.2 shows that the price of a European put option may be expressed in terms of the price \(C_0\) of a call. To find \(C_0\), one uses the notion of a self-financing portfolio, described in the next chapter. We shall see later that \(C_0\) depends on a number of factors, including

the initial price of \(S\) (a large \(S_0\) suggests that a large \(S_T\) is likely);

the strike price (the smaller the value of \(K\), the greater the profit for the holder);

the volatility of \(S\);

the expiration date;

the interest rate (which affects the discounted value of the strike price).

The exact quantitative dependence of \(C_0\) on these factors will be examined in detail in the context of the Black-Scholes-Merton model in Chapter 11.

4.8 Dividend-Paying Stocks

In the foregoing, we have assumed that the underlying asset of an option pays no dividends. In this section, we illustrate how dividends can affect option price properties by proving a version of the put-call parity formula for dividend-paying stocks.

We begin by observing that when a stock pays a dividend, the stock's value is immediately reduced by the amount of the dividend. Indeed, suppose the stock pays a dividend \(D_1\) at time \(t_1 > 0\). If the stock's value \(x\) immediately after \(t_1\) were greater than \(S_{t_1} - D_1\), an arbitrageur could buy the stock for \(S_{t_1}\), get the dividend, and then sell the stock for \(x\), realizing a profit of \(D_1 + x - S_{t_1} > 0\). On the other hand, if \(x\) were less than \(S_{t_1} - D_1\), she could sell the stock short for \(S_{t_1}\), buy the stock immediately after \(t_1\) for \(x\), and return it along with the dividend (as is required) for a profit of \(S_{t_1} - x - D_1 > 0\).

To derive a put-call parity formula for a dividend-paying stock, we assume that a dividend \(D_n\) is paid at time \(t_n\), \(n = 1, 2, \dots, N\), where \(0 < t_1 < t_2 < \cdots < t_N \le T\). Let \(D\) denote the present value of the dividend stream:

\[
D := e^{-rt_1}D_1 + e^{-rt_2}D_2 + \cdots + e^{-rt_N}D_N.
\]

Consider two portfolios. The first consists of a long holding of a put with maturity \(T\), strike price \(K\), and underlying one share of the stock. The second consists of:

a long holding of the corresponding call;

a bond with face value \(K\) and maturity \(T\);

bonds with face values \(D_n\) and maturity times \(t_n\), \(n = 1, 2, \dots, N\);

a short position on one share of the stock.

The initial values of the portfolios are \(P^e_0\) and \(C_0 + Ke^{-rT} + D - S_0\), respectively. At maturity, the value of the first portfolio is \((K - S_T)^+\). Assume that the bond payments \(D_n\) are deposited into a risk-free account as they arrive. Recall also that in the short position the stock as well as the dividends with interest must be returned. Then the value of the second portfolio at maturity is

\[
(S_T - K)^+ + K + \sum_{n=1}^{N} D_n e^{r(T-t_n)} - \Big(S_T + \sum_{n=1}^{N} D_n e^{r(T-t_n)}\Big) = (K - S_T)^+.
\]

Since the portfolios have the same value at maturity, the law of one price ensures that the portfolios have the same initial value. It follows that

\[
S_0 + P^e_0 - C_0 = Ke^{-rT} + D,
\]

which is the put-call parity formula for dividend-paying stocks. Note that the formula reduces to the non-dividend-paying version (4.5) if each \(D_n = 0\).
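The dividend version of put-call parity can be checked the same way as (4.5). The sketch below assumes a constant rate \(r\) and dividend dates inside \([0, T]\). It also assumes that each bond payment \(D_n\) is reinvested at \(r\) until \(T\), as in the argument above. It verifies that the second portfolio pays \((K - S_T)^+\) on a grid of terminal prices. The function names, dividend amounts and dates are hypothetical.

```python
import math
from typing import Sequence


def pv_dividends(dividends: Sequence[float], times: Sequence[float], r: float) -> float:
    """D = sum over n of e^{-r t_n} D_n."""
    return sum(d * math.exp(-r * t) for d, t in zip(dividends, times))


def put_from_call_with_dividends(c0: float, s0: float, k: float, r: float, T: float,
                                 dividends: Sequence[float], times: Sequence[float]) -> float:
    """Dividend version of put-call parity: P_0 = C_0 + K e^{-rT} + D - S_0."""
    return c0 + k * math.exp(-r * T) + pv_dividends(dividends, times, r) - s0


def second_portfolio_at_maturity(s_T: float, k: float, dividends: Sequence[float],
                                 times: Sequence[float], r: float, T: float) -> float:
    """Long call + K-bond + dividend bonds (payments reinvested at r) - one share,
    where the short side must also repay the dividends with interest."""
    accumulated = sum(d * math.exp(r * (T - t)) for d, t in zip(dividends, times))
    return max(s_T - k, 0.0) + k + accumulated - (s_T + accumulated)


divs, times = [1.0, 1.0], [0.25, 0.75]  # hypothetical dividends and payment dates
print(all(
    math.isclose(second_portfolio_at_maturity(float(s), 100.0, divs, times, 0.05, 1.0),
                 max(100.0 - s, 0.0), abs_tol=1e-9)
    for s in range(0, 201, 5)
))  # True
```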


4.9 Exercises

1. Show that \(S_0 - Ke^{-rT} \le C_0 \le S_0\) and \(P^e_0 \le Ke^{-rT}\).

2. (Generalization of Exercise 1.) Show that \(C_0 + e^{-rT}\min(S_T, K) \le S_0\) and \(e^{rT}P^e_0 + \min(S_T, K) \le K\).

3. Complete the proof of Proposition 4.7.2 by showing that the assumption \(S_0 + P^e_0 - C_0 > Ke^{-rT}\) leads to an arbitrage.

4. Consider two call options \(C\) and \(C'\) with the same underlying, the same strike price, and with prices \(C_0\) and \(C'_0\), respectively. Suppose that \(C\) has maturity \(T\) and \(C'\) has maturity \(T' < T\). Explain why \(C'_0 \le C_0\).

5. Consider two American put options \(P\) and \(P'\) with the same underlying, the same strike price \(K\), and with prices \(P_0\) and \(P'_0\), respectively. Suppose that \(P\) has maturity \(T\) and that \(P'\) has maturity \(T' < T\). Explain why \(P'_0 \le P_0\).

6. Graph the payoffs against \(S_T\) of a portfolio that is (a) long in a stock \(S\) and short in a call on \(S\); (b) short in \(S\) and long in a call on \(S\); (c) long in \(S\) and short in a put on \(S\); (d) short in \(S\) and long in a put on \(S\). Assume the calls and puts are European with strike price \(K\) and maturity \(T\).

7. Graph the payoffs against \(S_T\) for an investor who (a) buys one call and two puts; (b) buys two calls and one put. Assume the calls and puts are European with strike price \(K\), maturity \(T\), and underlying \(S\). (The portfolios in (a) and (b) are called, respectively, a strip and a strap.) What can you infer about the strategy underlying each portfolio?

8. Let \(0 < K_1 < K_2\). Graph the payoff against \(S_T\) for an investor who (a) holds one call with strike price \(K_1\) and writes one put with strike price \(K_2\); (b) holds one put with strike price \(K_1\) and writes one call with strike price \(K_2\). Assume that the options are European and have the same underlying and maturity \(T\).

9. Graph the payoff against \(S_T\) for an investor who holds one put with strike price \(K_1\) and one call with strike price \(K_2 > K_1\). Assume that the options are European and have the same underlying and same expiration date \(T\). (The portfolio in this exercise is known as a strangle.)

10. Consider two call options with the same underlying and same maturity \(T\), one with price \(C_0\) and strike price \(K\), the other with price \(C'_0\) and strike price \(K' > K\). Give a careful arbitrage argument to show that \(C_0 \ge C'_0\).


11. Consider two European put options with the same underlying and same maturity \(T\), one with price \(P_0\) and strike price \(K\), the other with price \(P'_0\) and strike price \(K' > K\). Give a careful arbitrage argument to show that \(P_0 \le P'_0\).

12. Let C ′0 and C ′′0 denote the costs of call options with strike prices K ′

and K ′′, respectively, where 0 < K ′ < K ′′, and let C0 be the cost of acall option with price K := (K ′ + K ′′)/2. If all options have the samematurity and underlying, show that C0 ≤ (C ′0 + C ′′0 )/2.

Hint: Assume that C0 > (C ′0 + C ′′0 )/2. Write two options with strikeprice K, buy one with strike price K ′ and another with strike price K ′′,giving you cash in the amount 2C0 − C ′0 − C ′′0 > 0. Consider the casesobtained by comparing ST with K, K ′, and K ′′.

The portfolio described in the hint is called a butterfly spread. Assuming the portfolio has a positive cost $c := C'_0 + C''_0 - 2C_0$, graph the profit function of the portfolio.

13. Referring to Exercises 10 and 11, show that

$$\max(P'_0 - P_0,\; C_0 - C'_0) \le (K' - K)e^{-rT}.$$

14. A capped option is a standard option with profit capped at a pre-established amount $A$. The payoff of a capped call option is $\min\big((S_T - K)^+, A\big)$. Which of the following portfolios has time-$t$ value the same as that of a capped call option: strip, strap, straddle, strangle, bull spread, or bear spread?

15. Show that a bull spread can be created from a combination of positions in put options and a bond.

Chapter 5

Discrete-Time Portfolio Processes

The notion of a self-financing, replicating portfolio is a key component of option valuation models. A portfolio is an example of a stochastic process, that is, a random variable changing with time. We begin with a brief discussion of this important notion.

5.1 Discrete-Time Stochastic Processes

Experiments consisting of a sequence of trials may be viewed dynamically, that is, as changing in time. If the outcome of the $n$th trial (the outcome at time $n$) is described by a random variable, then the resulting sequence of random variables provides a mathematical model of the experiment. This idea leads to the formal notion of a stochastic process.

Definition 5.1.1. A (discrete-time) stochastic (or random) process on a probability space is a finite or infinite sequence $X = (X_n) = (X_n)_{n\ge 0}$ of random variables $X_n$.¹ If each $X_n$ is a constant, then the process $X$ is said to be deterministic. If $X^1, X^2, \dots, X^d$ are stochastic processes and $X_n$ is the random vector $(X^1_n, X^2_n, \dots, X^d_n)$, then $(X_n)_{n\ge 0}$ is called a $d$-dimensional stochastic process.

Example 5.1.2. Let $S_n$ denote the price of a stock on day $n$. On day 0, the price of the stock is known, but future prices are not and therefore are usually taken to be random variables. The sequence $S = (S_n)$ is a stochastic process that models the price movement of the stock. A portfolio consisting of $d$ stocks gives rise to a $d$-dimensional stochastic process $(S^1, S^2, \dots, S^d)$, where $S^j_n$ denotes the price of stock $j$ on day $n$.

As noted above, a stochastic process may be thought of as a mathematical description of an experiment evolving in time. A related concept is the evolution or flow of information revealed during the experiment. This is described mathematically by a filtration.

¹ The sequence may begin at indices other than 0.



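Definition 5.1.3. A (discrete-time) filtration on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is a finite or infinite sequence $(\mathcal{F}_n) = (\mathcal{F}_n)_{n\ge 0}$ of σ-fields with

$$\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}_n \subseteq \cdots \subseteq \mathcal{F}.$$

A stochastic process $(X_n)$ is said to be adapted to a filtration $(\mathcal{F}_n)$ if for each $n$ the random variable $X_n$ is $\mathcal{F}_n$-measurable. A $d$-dimensional stochastic process is adapted to a filtration if each component process is adapted.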

A filtration $(\mathcal{F}_n)$ may be thought of as representing the accumulation of information over time. If $(X_n)$ is adapted to the filtration, then $\mathcal{F}_n$ includes all relevant information about $X_n$. The following example illustrates this idea.

Example 5.1.4. A coin is tossed $N$ times and the outcomes heads $H$ or tails $T$ are observed. The sample space consists of all sequences $\omega_1\omega_2\ldots\omega_N$, where $\omega_n = H$ or $T$. For each fixed sequence $\omega = \omega_1\omega_2\ldots\omega_n$ of $H$'s and $T$'s appearing in the first $n$ tosses, let $A_\omega$ denote the set of all sequences of the form $\omega\nu$, where $\nu = \omega_{n+1}\omega_{n+2}\ldots\omega_N$ is a sequence of $H$'s and $T$'s representing the outcomes of the last $N - n$ tosses. For example, if $N = 4$, then $A_{TH} = \{THHH, THHT, THTH, THTT\}$. We show that the sets $A_\omega$ generate a filtration that describes the flow of information during the experiment.

Before the first toss, we have no knowledge of the outcomes of the experiment. The σ-field $\mathcal{F}_0$ corresponding to this void of information is $\mathcal{F}_0 = \{\emptyset, \Omega\}$. After the first toss, we know whether the outcome was $H$ or $T$ but have no information regarding subsequent tosses. The σ-field $\mathcal{F}_1$ describing the information gained on the first toss is $\{\emptyset, \Omega, A_H, A_T\}$. After the second toss, we know which of the outcomes $\omega_1\omega_2 = HH, HT, TH, TT$ has occurred, but we have no knowledge about the impending third toss. The σ-field $\mathcal{F}_2$ describing the information gained on the first two tosses consists of $\emptyset$, $\Omega$, and all unions of the sets $A_{\omega_1\omega_2}$. Continuing in this way, we obtain a filtration $(\mathcal{F}_n)_{n=0}^N$, where $\mathcal{F}_n$ $(n \ge 1)$ is the σ-field consisting of all unions of (that is, generated by) the pairwise disjoint sets $A_\omega$ with $\omega$ of length $n$ (see Example 2.3.2). Note that after $N$ tosses we have complete information, as demonstrated by the fact that $\mathcal{F}_N$ contains all subsets of $\Omega$.

Now let $X_n$ denote the number of heads appearing in the first $n$ tosses. The stochastic process $(X_n)$ is easily seen to be adapted to $(\mathcal{F}_n)$. For example, the event $\{X_4 = 2\}$ is the union of $A_{HHTT}$, $A_{HTHT}$, $A_{HTTH}$, $A_{TTHH}$, $A_{THTH}$, and $A_{THHT}$ and hence is a member of $\mathcal{F}_4$. Conversely, knowing that the event $A_{HTHT}$ has occurred tells us that $X_4 = 2$ (and that $X_1 = X_2 = 1$ and $X_3 = 2$).
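The construction in Example 5.1.4 can be checked by brute force for $N = 4$. The Python sketch below is an illustration only, not from the text; the helper names `atom`, `atoms`, `in_F`, and `level_set` are hypothetical. It represents $\mathcal{F}_n$ by its generating sets $A_\omega$ and tests whether an event is a union of them.

```python
from itertools import product

N = 4
# Sample space: all strings of N tosses, e.g. "HTTH".
omega = ["".join(s) for s in product("HT", repeat=N)]

def atom(prefix):
    """A_prefix: all outcomes whose first len(prefix) tosses equal prefix."""
    return frozenset(w for w in omega if w.startswith(prefix))

def atoms(n):
    """The 2**n pairwise disjoint sets A_w (w of length n) that generate F_n."""
    return [atom("".join(s)) for s in product("HT", repeat=n)]

def in_F(event, n):
    """An event lies in F_n iff it is a union of atoms of F_n, i.e. each atom
    is either contained in the event or disjoint from it."""
    return all(a <= event or not (a & event) for a in atoms(n))

def level_set(n, k):
    """The event {X_n = k}, where X_n = number of heads in the first n tosses."""
    return frozenset(w for w in omega if w[:n].count("H") == k)

print(sorted(atom("TH")))  # ['THHH', 'THHT', 'THTH', 'THTT']
# (X_n) is adapted: every level set {X_n = k} belongs to F_n ...
print(all(in_F(level_set(n, k), n) for n in range(N + 1) for k in range(n + 1)))
# ... but {X_2 = 1} is not yet decided after the first toss, so it is not in F_1.
print(in_F(level_set(2, 1), 1))
```

The atom test ("contained in or disjoint from every generating set") is used because, for a σ-field generated by a finite partition, membership is exactly being a union of partition blocks.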

Filtrations, such as that of Example 5.1.4, which contain only the information generated by a random process are of sufficient importance to warrant special terminology and notation.


Definition 5.1.5. Let $(X_n)_{n\ge 0}$ be a stochastic process. For each $n$, denote by

$$\mathcal{F}^X_n = \sigma(X_j : j \le n)$$

the σ-field generated by all events of the form $\{X_j \in J\}$, where $j \le n$ and $J$ is an arbitrary interval of real numbers. $(\mathcal{F}^X_n)$ is called the natural filtration for $(X_n)$ or the filtration generated by the process $(X_n)$. It is the smallest filtration to which $(X_n)$ is adapted.

A concept related to adaptedness is predictability, defined as follows.

Definition 5.1.6. A stochastic process $X = (X_n)$ is said to be predictable with respect to a filtration $(\mathcal{F}_n)$ if $X_n$ is $\mathcal{F}_{n-1}$-measurable for each $n \ge 1$. A $d$-dimensional stochastic process is predictable if each component process is predictable.

Every predictable process is adapted, but not conversely. The difference may be understood as follows: for an adapted process, $X_n$ is determined by events in $\mathcal{F}_n$; for a predictable process, $X_n$ is already determined by events in $\mathcal{F}_{n-1}$.

Example 5.1.7. Marbles are randomly drawn one at a time without replacement from a jar initially containing red and white marbles in equal numbers. Before each draw a player places a bet on the outcome red or white. By keeping track of the number of marbles of each color in the jar after each draw, the player may make an informed decision as to the size of the wager. For example, if the first $n - 1$ draws resulted in more red marbles than white, the player should bet on white for the $n$th draw, the size of the bet determined by the ratio of white to red marbles left in the jar. As in the coin toss example, the flow of information in this example may be modeled by a filtration $(\mathcal{F}_n)$. Because the player is not prescient, the $n$th wager $X_n$ must be determined by events in $\mathcal{F}_{n-1}$. Therefore, the wager process $(X_n)$ is predictable. On the other hand, if $Y_n$ denotes the number of red marbles in the first $n$ draws, then $(Y_n)$ is adapted to the filtration but is not predictable.

5.2 Self-Financing Portfolios

Suppose that the price of a security $\mathcal{S}$ is given by a stochastic process $S = (S_n)_{n=0}^N$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, where $S_0$ is constant. We may assume that $\mathcal{F} = \mathcal{F}^S_N$, where $(\mathcal{F}^S_n)$ is the natural filtration for $S$. Note that because $S_0$ is constant, $\mathcal{F}^S_0$ is the trivial σ-field $\{\emptyset, \Omega\}$. The filtration $(\mathcal{F}^S_n)$ models the accumulation of stock price information in the discrete time interval $[0, N]$.

Let $\mathcal{B}$ denote a risk-free bond earning compound interest at the rate $i$ per period. We assume that the market allows unlimited transactions in $\mathcal{S}$ and $\mathcal{B}$.


The value of the bond at time 0 is taken to be 1 unit, so that the value of the security is measured in terms of the initial value of the bond. (Later we introduce the notion of discounted price process, which measures the value of the security against the current value of the bond.) In contrast to $S$, the price process $B = (B_n)_{n=0}^N$ of $\mathcal{B}$ is deterministic; indeed, $B_n = (1+i)^n$.

Definition 5.2.1. A portfolio or trading strategy for $(\mathcal{B}, \mathcal{S})$ is a two-dimensional predictable stochastic process $(\phi, \theta) = ((\phi_n, \theta_n))_{n=1}^N$ on $(\Omega, \mathcal{F}, \mathbb{P})$, where $\phi_n$ and $\theta_n$ denote, respectively, the number of units of $\mathcal{B}$ and shares of $\mathcal{S}$ held at time $n$. The value $V_n$ of the portfolio at time $n$ is defined as

$$V_0 = \phi_1 + \theta_1 S_0, \qquad V_n = \phi_n B_n + \theta_n S_n, \quad n = 1, 2, \dots, N.$$

The stochastic process $V = (V_n)_{n=0}^N$ is called the value or wealth process of the portfolio, and $V_0$ is the initial investment or initial wealth.

The idea behind a trading strategy is this: at time 0, the number of units $\phi_1$ of $\mathcal{B}$ and shares $\theta_1$ of $\mathcal{S}$ are chosen to satisfy the initial wealth equation of Definition 5.2.1. These are constants, as the value $S_0$ is known. At time $n \ge 1$, the value of the portfolio before the price $S_n$ is known (and before the bond's new value is noted) is

$$\phi_n B_{n-1} + \theta_n S_{n-1}.$$

When $S_n$ becomes known, the portfolio has value

$$\phi_n B_n + \theta_n S_n. \tag{5.1}$$

At this time, the number of shares of $\mathcal{S}$ and units of $\mathcal{B}$ may be adjusted, the strategy being based on the price history $S_0, S_1, \dots, S_n$ of the stock. Predictability of the portfolio process is the mathematical property underlying this procedure: the new values $\phi_{n+1}$ and $\theta_{n+1}$ are determined using only information provided by $\mathcal{F}^S_n$. After readjustment, the value of the portfolio during the time interval $(n, n+1)$ is

$$\phi_{n+1} B_n + \theta_{n+1} S_n. \tag{5.2}$$

At time $n+1$, the process is repeated.

Now suppose that each readjustment of the portfolio is accomplished without changing its current value, that is, without the removal or infusion of wealth. Shares of $\mathcal{S}$ and units of $\mathcal{B}$ may be bought or sold, but the net value of the transactions is zero. Mathematically, this simply means that the quantities in (5.1) and (5.2) are equal. This is the notion of a self-financing portfolio. Using the notation

$$\Delta x_n := x_{n+1} - x_n,$$

we may express this idea formally as follows.

Definition 5.2.2. A portfolio $(\phi, \theta)$ is said to be self-financing if

$$B_n \Delta\phi_n + S_n \Delta\theta_n = 0, \quad n = 1, 2, \dots, N-1. \tag{5.3}$$


In Theorem 5.2.5 below, we give several alternate ways of characterizing a self-financing portfolio. One of these uses the idea of a discounted process.

Definition 5.2.3. Let $X = (X_n)_{n=0}^N$ be a stochastic process. The discounted process $\tilde{X}$ is defined by

$$\tilde{X}_n = (1+i)^{-n} X_n, \quad n = 0, 1, \dots, N.$$

Remark 5.2.4. The process $\tilde{X}$ measures the current value of $X$ in terms of the current value of the bond $\mathcal{B}$. In this context, the bond process is said to be a numeraire. Converting from one measure of value to another is referred to as a change of numeraire.

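Theorem 5.2.5. For a trading strategy $(\phi, \theta)$ with value process $V$, the following statements are equivalent:

(i) $(\phi, \theta)$ is self-financing;

(ii) $\Delta V_n = \phi_{n+1}\Delta B_n + \theta_{n+1}\Delta S_n$, $n = 0, 1, \dots, N-1$;

(iii) $V$ satisfies the recursion equations

$$\begin{aligned} V_{n+1} &= \theta_{n+1}\left[S_{n+1} - (1+i)S_n\right] + (1+i)V_n \\ &= \theta_{n+1}S_{n+1} + (1+i)\left[V_n - \theta_{n+1}S_n\right], \quad n = 0, 1, \dots, N-1; \end{aligned}$$

(iv) $\Delta\tilde{V}_n = \theta_{n+1}\Delta\tilde{S}_n$, $n = 0, 1, \dots, N-1$;

(v) $\phi_n = V_0 - \sum_{j=0}^{n-1} \tilde{S}_j \Delta\theta_j$, $n = 1, 2, \dots, N$, where $\theta_0 := 0$.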

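Proof. For $n = 0, 1, \dots, N-1$, let

$$Y_n := \phi_{n+1}B_n + \theta_{n+1}S_n - V_n.$$

Then $Y_0 = 0$ by the definition of $V_0$, and, since $V_n = \phi_n B_n + \theta_n S_n$ for $n \ge 1$, we have $Y_n = B_n\Delta\phi_n + S_n\Delta\theta_n$ for $n \ge 1$.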

Rearranging, $V_n = \phi_{n+1}B_n + \theta_{n+1}S_n - Y_n$, hence

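$$\begin{aligned} \Delta V_n &= \phi_{n+1}B_{n+1} + \theta_{n+1}S_{n+1} - \left(\phi_{n+1}B_n + \theta_{n+1}S_n - Y_n\right) \\ &= \phi_{n+1}\Delta B_n + \theta_{n+1}\Delta S_n + Y_n. \end{aligned}$$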

Since the portfolio is self-financing if and only if $Y_n = 0$ for all $n$, (i) and (ii) are seen to be equivalent.

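From the equations $\phi_{n+1}B_n = Y_n + V_n - \theta_{n+1}S_n$ and $B_{n+1} = (1+i)B_n$, we have

$$\begin{aligned} V_{n+1} &= \phi_{n+1}B_{n+1} + \theta_{n+1}S_{n+1} \\ &= (1+i)\left[Y_n + V_n - \theta_{n+1}S_n\right] + \theta_{n+1}S_{n+1} \\ &= \theta_{n+1}\left[S_{n+1} - (1+i)S_n\right] + (1+i)(V_n + Y_n). \end{aligned}$$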

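The last equation shows that (i) and (iii) are equivalent. That (iii) and (iv) are equivalent follows immediately from the definition of discounted process: dividing the first recursion equation in (iii) by $(1+i)^{n+1}$ gives $\tilde{V}_{n+1} = \theta_{n+1}(\tilde{S}_{n+1} - \tilde{S}_n) + \tilde{V}_n$.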


To show that (i) and (v) are equivalent, assume first that $(\phi, \theta)$ is self-financing. By (5.3),

$$B_j \Delta\phi_j = -S_j \Delta\theta_j, \quad j = 1, 2, \dots, N-1.$$

Multiplying by $B_j^{-1} = (1+i)^{-j}$ and summing, we have

$$\sum_{j=1}^{n} \Delta\phi_j = -\sum_{j=1}^{n} \tilde{S}_j \Delta\theta_j, \quad n = 1, 2, \dots, N-1.$$

The left side of this equation collapses to $\phi_{n+1} - \phi_1$, hence

$$\phi_{n+1} = \phi_1 - \sum_{j=1}^{n} \tilde{S}_j \Delta\theta_j, \quad n = 1, 2, \dots, N-1.$$

Finally, noting that $\phi_1 = V_0 - \theta_1 S_0 = V_0 - \tilde{S}_0\Delta\theta_0$ (recalling that $\theta_0 = 0$), we have

$$\phi_{n+1} = V_0 - \sum_{j=0}^{n} \tilde{S}_j \Delta\theta_j, \quad n = 0, 1, \dots, N-1,$$

which is (v). Since the steps in this argument may be reversed, (i) and (v) are equivalent.

Corollary 5.2.6. Given a predictable process $\theta$ and initial wealth $V_0$, there exists a unique predictable process $\phi$ such that the trading strategy $(\phi, \theta)$ is self-financing.

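Proof. By part (v) of the theorem, any such $\phi$ must be given by $\phi_n = V_0 - \sum_{j=0}^{n-1}\tilde{S}_j\Delta\theta_j$, and this process is clearly predictable, since $\phi_n$ depends only on $\tilde{S}_0, \dots, \tilde{S}_{n-1}$ and $\theta_1, \dots, \theta_n$.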

Remarks 5.2.7. The quantity $V_n - \theta_{n+1}S_n$ in part (iii) of Theorem 5.2.5 is the cash left over from the transaction of buying $\theta_{n+1}$ shares of the stock at time $n$ and therefore represents the value of the bond account. Thus, the new value $V_{n+1}$ of the portfolio results precisely from the change in the value of the stock and the growth of the bond account over the time interval $(n, n+1)$.

Part (iv) implies that $\theta$ may be expressed uniquely in terms of $\tilde{V}$ and $\tilde{S}$, and part (v) asserts that the process $\phi$ is completely determined by $\theta$ and the initial wealth $V_0$. It follows that, in the self-financing case, the trading strategy $(\phi, \theta)$ may be determined uniquely from the value process. (See Exercise 4 in this regard.)
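The following Python sketch illustrates Corollary 5.2.6 and part (v) of Theorem 5.2.5 along a single price path. The path, the interest rate $i = 0.05$, the initial wealth $V_0 = 50$, and the rule for $\theta$ are all made-up assumptions; the point is only that $\phi$ computed from (v) makes (5.3) and part (iv) hold up to floating-point rounding.

```python
# Hypothetical data: one realized price path and an arbitrary predictable rule for theta.
i = 0.05
S = [100.0, 110.0, 99.0, 118.8, 106.92]          # S_0, ..., S_N along one path (assumed)
N = len(S) - 1
V0 = 50.0

B = [(1 + i) ** n for n in range(N + 1)]          # bond prices B_n = (1+i)^n
S_disc = [S[n] / B[n] for n in range(N + 1)]      # discounted prices S~_n

# theta[n] (n = 1..N) may use only S_0..S_{n-1}; theta[0] := 0 as in part (v).
theta = [0.0] + [0.5 if S[n - 1] >= S[0] else 0.2 for n in range(1, N + 1)]

# Part (v): phi_n = V0 - sum_{j=0}^{n-1} S~_j (theta_{j+1} - theta_j).
phi = [None] + [V0 - sum(S_disc[j] * (theta[j + 1] - theta[j]) for j in range(n))
                for n in range(1, N + 1)]

# Value process from Definition 5.2.1.
V = [phi[1] + theta[1] * S[0]] + [phi[n] * B[n] + theta[n] * S[n] for n in range(1, N + 1)]
V_disc = [V[n] / B[n] for n in range(N + 1)]

# Residuals of (5.3) for n = 1..N-1 and of part (iv) for n = 0..N-1.
sf = [B[n] * (phi[n + 1] - phi[n]) + S[n] * (theta[n + 1] - theta[n]) for n in range(1, N)]
iv = [(V_disc[n + 1] - V_disc[n]) - theta[n + 1] * (S_disc[n + 1] - S_disc[n]) for n in range(N)]
print(V[0], max(map(abs, sf)), max(map(abs, iv)))   # V0, then two numbers at rounding level
```

Changing the rule for `theta` (as long as `theta[n]` looks only at `S[0..n-1]`) should leave both residual lists at rounding level, which is the content of the corollary.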

5.3 Option Valuation by Portfolios

Self-financing portfolios may be used to establish the fair value of a derivative. To describe the method with sufficient generality, we make the following definition.


Definition 5.3.1. A contingent claim is an $\mathcal{F}^S_N$-random variable $H$. A hedging strategy or hedge for $H$ is a self-financing trading strategy with value process $V$ satisfying $V_N = H$. If a hedge for $H$ exists, then $H$ is said to be attainable and the hedge is called a replicating portfolio. A market is complete if every contingent claim is attainable.

European options are the most common examples of contingent claims. The holder of the option has a claim against the writer, namely, the value (payoff) of the option at maturity. In the case of a call option, that claim is $(S_N - K)^+$; for a put option the claim is $(K - S_N)^+$. Both are obviously $\mathcal{F}^S_N$-random variables.

Recall that an arbitrage is a trading strategy with zero probability of loss and positive probability of net gain. We may now give a precise definition in terms of the value process of a portfolio.

Definition 5.3.2. An arbitrage is a trading strategy $(\phi, \theta)$ whose value process $V$ satisfies $\mathbb{P}(\tilde{V}_N \ge V_0) = 1$ and $\mathbb{P}(\tilde{V}_N > V_0) > 0$.

To see the implications of completeness, suppose that we write a European contract with payoff $H$ in a complete and arbitrage-free market. At maturity we are obligated to cover the claim, which, by assumption, is attainable by a self-financing portfolio with value process $V$. Our strategy is to sell the contract for $V_0$ and use this amount to start the portfolio. At time $N$, our portfolio has value $V_N$, which we use to cover our obligation. The entire transaction costs us nothing since the portfolio is self-financing; we have hedged the short position of the contract. It is natural then to define the time-$n$ value of the contract to be $V_n$; any other value would result in an arbitrage. (This is another instance of the law of one price.) We summarize this discussion in the following theorem.

Theorem 5.3.3. In a complete and arbitrage-free market, the time-$n$ value of a European contingent claim $H$ with maturity $N$ is $V_n$, where $V$ is the value process of a self-financing portfolio with final value $V_N = H$. In particular, the fair price of the claim is $V_0$.

In Chapter 7, we illustrate Theorem 5.3.3 for the special case of a security $\mathcal{S}$ that follows the binomial model.
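As a preview (a sketch, not the book's Chapter 7 treatment), the one-period case can be solved directly: if the stock moves from $S_0$ to $uS_0$ or $dS_0$ and the bond grows by the factor $1 + i$, matching the claim in both states gives two linear equations for $\phi_1$ and $\theta_1$, and Theorem 5.3.3 assigns the claim the price $V_0 = \phi_1 + \theta_1 S_0$. The function name and the parameter values below are hypothetical, chosen with $d < 1 + i < u$ (the usual no-arbitrage condition in this model).

```python
def replicate_one_period(S0, u, d, i, payoff):
    """One-period replication: find bond units phi and shares theta with
        phi*(1+i) + theta*u*S0 = payoff(u*S0)   (up state)
        phi*(1+i) + theta*d*S0 = payoff(d*S0)   (down state)
    and return (phi, theta, V0), where V0 = phi + theta*S0 is the initial wealth."""
    Hu, Hd = payoff(u * S0), payoff(d * S0)
    theta = (Hu - Hd) / ((u - d) * S0)
    phi = (Hu - theta * u * S0) / (1 + i)
    return phi, theta, phi + theta * S0

S0, u, d, i, K = 100.0, 1.2, 0.8, 0.05, 100.0   # hypothetical parameters, d < 1+i < u
call = lambda s: max(s - K, 0.0)                 # H = (S_1 - K)^+
phi, theta, V0 = replicate_one_period(S0, u, d, i, call)
print(theta, phi, V0)   # theta = 1/2, phi = -40/1.05, V0 = 12.5/1.05 (up to rounding)
```

Here the replicating portfolio borrows about 38.10 units of the bond and buys half a share; its initial cost, about 11.90, is the price the theorem assigns to the call.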


5.4 Exercises

1. A hat contains three slips of paper numbered 1, 2, and 3. The slips are randomly drawn from the hat one at a time without replacement. Let $X_n$ denote the number on the $n$th slip drawn. Describe the sample space $\Omega$ of the experiment and the natural filtration $(\mathcal{F}_n)_{n=1}^3$ associated with $(X_n)_{n=1}^3$.

2. Rework Exercise 1 if the second slip is replaced.

3. Complete the proof of Theorem 5.2.5 by showing that (v) implies (i).

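4. Show that for $n = 0, 1, \dots, N-1$,

$$\theta_{n+1} = \frac{\Delta\tilde{V}_n}{\Delta\tilde{S}_n} \quad\text{and}\quad \phi_{n+1} = \frac{\Delta V_n\,\Delta\tilde{S}_n - \Delta\tilde{V}_n\,\Delta S_n}{i(1+i)^{-1}S_{n+1} - iS_n}.$$

These equations explicitly express the trading strategy in terms of the stock price and value processes.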

5. The gain of a portfolio $(\phi, \theta)$ in the time interval $(n-1, n]$ is defined as

$$(\phi_n B_n + \theta_n S_n) - (\phi_n B_{n-1} + \theta_n S_{n-1}) = \phi_n \Delta B_{n-1} + \theta_n \Delta S_{n-1}, \quad n = 1, 2, \dots, N.$$

The gain up to time $n$ is the sum $G_n$ of the gains over the time intervals $(j-1, j]$, $1 \le j \le n$:

$$G_0 = 0 \quad\text{and}\quad G_n = \sum_{j=1}^{n} \left(\phi_j \Delta B_{j-1} + \theta_j \Delta S_{j-1}\right), \quad n = 1, 2, \dots, N.$$

$(G_n)_{n=0}^N$ is called the gains process of the portfolio. Show that the portfolio is self-financing if and only if for each $n$ the time-$n$ value of the portfolio is its initial value plus the gain $G_n$, that is,

$$V_n = V_0 + G_n, \quad n = 1, 2, \dots, N.$$

Chapter 6

Expectation of a Random Variable

Expectation is a probabilistic interpretation and generalization of the notion of weighted average. For example, suppose that we repeatedly toss a pair of distinguishable fair coins. If $X$ denotes the total number of heads that come up on each toss, then, in the long run, $X$ takes on the values 0, 1, and 2 with relative frequencies .25, .5, and .25, respectively. The average value of $X$, that is, the average number of heads in the long run, is therefore $(.25)0 + (.5)1 + (.25)2 = 1$. This idea may be made precise for a general random variable $X$. For our purposes, however, it is sufficient to consider two special cases: discrete and continuous random variables.

6.1 Discrete Case: Definition and Examples

Definition 6.1.1. The expectation (expected value, mean) of a discrete random variable $X$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is defined as

$$\mathbb{E}X = \sum_{x \in \mathbb{R}} x\, p_X(x),$$

provided the sum on the right exists.

Since $X$ is discrete, the expression on the right (ignoring zero terms) is either a finite sum or an infinite series. If the series diverges, then $\mathbb{E}X$ is undefined. For the remainder of the chapter, we shall tacitly assume that all expectations in any given discussion exist.

Example 6.1.2. The table below gives the course grade distribution of a class of 100 students. Let $X$ be the grade of a student chosen at random, where $X = 4$ if the student received an A, $X = 3$ for a B, and so forth.

grade              A    B    C    D    F
no. of students    15   25   40   12   8

From Definition 6.1.1, the class average is seen to be

$$\mathbb{E}X = 4(.15) + 3(.25) + 2(.4) + 1(.12) + 0(.08) = 2.27.$$
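A quick check of the arithmetic in Example 6.1.2, building the pmf from the table and applying Definition 6.1.1 with exact fractions (the dictionary layout is just one convenient choice):

```python
from fractions import Fraction

# Grade points and counts from the table in Example 6.1.2.
counts = {4: 15, 3: 25, 2: 40, 1: 12, 0: 8}   # A=4, B=3, C=2, D=1, F=0
total = sum(counts.values())                  # 100 students

# pmf of X: p_X(x) = (number of students with grade x) / total
pmf = {x: Fraction(c, total) for x, c in counts.items()}

# Definition 6.1.1: E X = sum over x of x * p_X(x)
EX = sum(x * p for x, p in pmf.items())
print(float(EX))  # 2.27
```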

Example 6.1.3. Let $A \in \mathcal{F}$ and $X = I_A$. Since $p_X(1) = \mathbb{P}(A)$, $p_X(0) = \mathbb{P}(A')$, and $p_X(x) = 0$ for all other values of $x$, we see that

$$\mathbb{E}\, I_A = \mathbb{P}(A).$$

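Example 6.1.4. If $X \sim B(n, p)$, then, recalling that $p_X(k) = \binom{n}{k}p^k q^{n-k}$ $(q := 1 - p)$ and using the identity $k\binom{n}{k} = n\binom{n-1}{k-1}$, we have

$$\mathbb{E}X = \sum_{k=1}^{n} k\binom{n}{k}p^k q^{n-k} = np\sum_{k=0}^{n-1}\binom{n-1}{k}p^k q^{n-1-k} = np(p+q)^{n-1} = np.$$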

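Example 6.1.5. Let $X$ be a geometric random variable with parameter $p$. Since $p_X(n) = pq^{n-1}$,

$$\mathbb{E}X = p\sum_{n=1}^{\infty} nq^{n-1} = p\sum_{n=1}^{\infty}\frac{d}{dq}q^n = p\,\frac{d}{dq}\sum_{n=1}^{\infty}q^n = p\,\frac{d}{dq}\left(\frac{q}{1-q}\right) = \frac{p}{(1-q)^2} = \frac{1}{p}.$$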
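Numerically, the partial sums of the series in Example 6.1.5 increase toward $1/p$. The sketch below (with the arbitrary choice $p = 0.2$, so $1/p = 5$; the function name is our own) just evaluates them:

```python
def geometric_mean_partial(p, terms):
    """Partial sum  sum_{n=1}^{terms} n * p * q^(n-1)  of the series for E X."""
    q = 1.0 - p
    return sum(n * p * q ** (n - 1) for n in range(1, terms + 1))

p = 0.2
for terms in (10, 50, 200):
    print(terms, geometric_mean_partial(p, terms))   # approaches 1/p = 5
```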

Remark 6.1.6. In a discrete probability space, the term $x\,p_X(x)$ may be expanded as

$$x\sum_{\omega : X(\omega) = x}\mathbb{P}(\omega) = \sum_{\omega : X(\omega) = x} X(\omega)\mathbb{P}(\omega).$$

Summing over $x$ and noting that the pairwise disjoint sets $\{X = x\}$ partition $\Omega$, we obtain the following useful characterization of expected value in a discrete space:

$$\mathbb{E}X = \sum_{\omega \in \Omega} X(\omega)\mathbb{P}(\omega).$$
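For the two-coin experiment at the start of this chapter, both formulas for $\mathbb{E}X$ can be evaluated side by side. The sketch below assumes each of the four outcomes has probability 1/4.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Two distinguishable fair coins: Omega = {HH, HT, TH, TT}, each with probability 1/4.
omega = ["".join(w) for w in product("HT", repeat=2)]
P = {w: Fraction(1, 4) for w in omega}
X = {w: w.count("H") for w in omega}     # X = number of heads

# Sum over the sample space (Remark 6.1.6).
E_omega = sum(X[w] * P[w] for w in omega)

# Sum over the range of X using the pmf (Definition 6.1.1).
pmf = defaultdict(Fraction)              # missing keys start at Fraction(0)
for w in omega:
    pmf[X[w]] += P[w]
E_pmf = sum(x * px for x, px in pmf.items())

print(E_omega, E_pmf, dict(pmf))
```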

6.2 Continuous Case: Definition and Examples

Definition 6.2.1. The expectation (expected value, mean) of a continuous random variable $X$ on a probability space is defined as

$$\mathbb{E}X = \int_{-\infty}^{\infty} x f_X(x)\,dx.$$

If the integral diverges, then $\mathbb{E}X$ is undefined. As in the discrete case, we shall tacitly assume in what follows that all stated expectations exist.


Example 6.2.2. Let $X$ be uniformly distributed on the interval $(\alpha, \beta)$. Then

$$\mathbb{E}X = (\beta - \alpha)^{-1}\int_{\alpha}^{\beta} x\,dx = \frac{\alpha + \beta}{2}.$$

In particular, the average value of a number selected randomly from the interval $(0, 1)$ is 1/2.

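Example 6.2.3. Let $X \sim N(0, 1)$. Then

$$\mathbb{E}X = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} xe^{-\frac{1}{2}x^2}\,dx = 0,$$

since the integrand is an odd function. More generally, if $X \sim N(\mu, \sigma^2)$, then, substituting $y = (x - \mu)/\sigma$,

$$\begin{aligned} \mathbb{E}X &= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} xe^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx \\ &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}(\sigma y + \mu)e^{-\frac{1}{2}y^2}\,dy \\ &= \frac{\sigma}{\sqrt{2\pi}}\int_{-\infty}^{\infty} ye^{-\frac{1}{2}y^2}\,dy + \frac{\mu}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}y^2}\,dy \\ &= \mu. \end{aligned}$$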

6.3 Properties of Expectation

The following theorem is useful for computing the expectation of more complex random variables.

Theorem 6.3.1 (Law of the Unconscious Statistician I). (i) If $X$ is a discrete random variable and $h(x)$ is any function, then

$$\mathbb{E}h(X) = \sum_x h(x)p_X(x),$$

where the sum is taken over all $x$ in the range of $X$.

(ii) If $X$ is a continuous random variable and $h(x)$ is a continuous function, then

$$\mathbb{E}h(X) = \int_{-\infty}^{\infty} h(x)f_X(x)\,dx.$$

Proof. We prove only (i); for a proof of (ii), see, for example, [5]. For $y$ in the range of $h(X)$, we have

$$\mathbb{P}(h(X) = y) = \sum_x \mathbb{P}(X = x, h(x) = y) = \sum_{x : h(x) = y} p_X(x),$$

hence

$$\mathbb{E}h(X) = \sum_y y\sum_{x : h(x) = y} p_X(x) = \sum_y\sum_{x : h(x) = y} h(x)p_X(x) = \sum_x h(x)p_X(x).$$

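Example 6.3.2. Let $X \sim N(0, 1)$ and let $n$ be a nonnegative integer. By Theorem 6.3.1,

$$\mathbb{E}X^n = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} x^n e^{-x^2/2}\,dx.$$

If $n$ is odd, then the integrand is an odd function, hence $\mathbb{E}X^n = 0$. If $n$ is even,

$$\mathbb{E}X^n = \frac{2}{\sqrt{2\pi}}\int_0^{\infty} x^n e^{-x^2/2}\,dx,$$

and, for $n \ge 2$, an integration by parts (with $u = x^{n-1}$ and $dv = xe^{-x^2/2}\,dx$) yields

$$\mathbb{E}X^n = (n-1)\frac{2}{\sqrt{2\pi}}\int_0^{\infty} x^{n-2}e^{-x^2/2}\,dx = (n-1)\mathbb{E}X^{n-2}.$$

Iterating, we see that for any even positive integer $n$,

$$\mathbb{E}X^n = (n-1)(n-3)\cdots 3\cdot 1.$$

In particular, $\mathbb{E}X^2 = 1$. $\mathbb{E}X^n$ is called the $n$th moment of $X$.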
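The moment formula of Example 6.3.2 can be compared against a direct numerical integration. In this Python sketch, the truncation of the integral to $[-12, 12]$ and the use of Simpson's rule with 20000 subintervals are our own choices, not part of the text; the function names are hypothetical.

```python
import math

def normal_moment_numeric(n, L=12.0, steps=20000):
    """Approximate E X^n for X ~ N(0,1) by Simpson's rule on [-L, L].
    Truncating at |x| = L is an assumption; the neglected tails are negligible
    for the small n used here. steps must be even."""
    h = 2 * L / steps
    f = lambda x: x**n * math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    s = f(-L) + f(L)
    s += sum((4 if k % 2 else 2) * f(-L + k * h) for k in range(1, steps))
    return s * h / 3

def normal_moment_formula(n):
    """Example 6.3.2: 0 for odd n, (n-1)(n-3)...3*1 for even n."""
    if n % 2:
        return 0
    return math.prod(range(n - 1, 0, -2))   # empty product = 1 when n = 0

for n in range(9):
    print(n, normal_moment_formula(n), round(normal_moment_numeric(n), 6))
```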

Theorem 6.3.1 extends to the case of functions of more than one variable. We state a version for two variables.

Theorem 6.3.3 (Law of the Unconscious Statistician II). (i) If $X$ and $Y$ are discrete random variables and $h(x, y)$ is any function, then

$$\mathbb{E}h(X, Y) = \sum_{x, y} h(x, y)p_{X,Y}(x, y),$$

where the sum is taken over all $x$ in the range of $X$ and $y$ in the range of $Y$.

(ii) If $X$ and $Y$ are jointly continuous random variables and $h(x, y)$ is a continuous function, then

$$\mathbb{E}h(X, Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x, y)f_{X,Y}(x, y)\,dx\,dy.$$

Theorem 6.3.4. If $X$ and $Y$ are discrete or jointly continuous random variables and $\alpha, \beta \in \mathbb{R}$, then

(i) (unit property) $\mathbb{E}1 = 1$;

(ii) (linearity) $\mathbb{E}(\alpha X + \beta Y) = \alpha\mathbb{E}X + \beta\mathbb{E}Y$;

(iii) (order property) $X \le Y \Rightarrow \mathbb{E}X \le \mathbb{E}Y$; and

(iv) (absolute value property) $|\mathbb{E}X| \le \mathbb{E}|X|$.

Proof. Part (i) is clear. We prove (ii) only for the discrete case. By Theorem 6.3.3,

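$$\begin{aligned} \mathbb{E}(\alpha X + \beta Y) &= \sum_{x,y}(\alpha x + \beta y)p_{X,Y}(x, y) \\ &= \alpha\sum_x x\sum_y p_{X,Y}(x, y) + \beta\sum_y y\sum_x p_{X,Y}(x, y) \\ &= \alpha\sum_x x\,p_X(x) + \beta\sum_y y\,p_Y(y) \\ &= \alpha\mathbb{E}X + \beta\mathbb{E}Y. \end{aligned}$$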

For the continuous version of (iii), set $Z = Y - X$. Then, by Example 3.4.3, $Z$ is a continuous, nonnegative random variable, hence, for $z < 0$,

$$\int_{-\infty}^{z} f_Z(t)\,dt = \mathbb{P}(Z \le z) = 0.$$

Differentiating with respect to $z$, we see that $f_Z(z) = 0$ for $z < 0$. Therefore, by linearity,

$$\mathbb{E}Y - \mathbb{E}X = \mathbb{E}Z = \int_0^{\infty} zf_Z(z)\,dz \ge 0.$$

Part (iv) follows from (iii) and the inequality $\pm\mathbb{E}X = \mathbb{E}(\pm X) \le \mathbb{E}|X|$.

Theorem 6.3.5. Let $X$ and $Y$ be discrete or jointly continuous independent random variables. Then $\mathbb{E}(XY) = (\mathbb{E}X)(\mathbb{E}Y)$.

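Proof. We prove only the jointly continuous case. By Corollary 3.5.4, $f_{X,Y}(x, y) = f_X(x)f_Y(y)$, hence, by Theorem 6.3.3,

$$\begin{aligned} \mathbb{E}(XY) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xyf_X(x)f_Y(y)\,dx\,dy \\ &= \int_{-\infty}^{\infty} xf_X(x)\,dx\int_{-\infty}^{\infty} yf_Y(y)\,dy \\ &= (\mathbb{E}X)(\mathbb{E}Y). \end{aligned}$$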

6.4 Variance of a Random Variable

Definition 6.4.1. Let $X$ be a discrete or continuous random variable with mean $\mu := \mathbb{E}X$. The variance and standard deviation of $X$ are defined, respectively, as

$$\mathbb{V}X = \mathbb{E}(X - \mu)^2 \quad\text{and}\quad \sigma(X) = \sqrt{\mathbb{V}X}.$$

Variance is a convenient measure of how much, on average, a random variable deviates from its mean. By linearity of expectation, we have the alternate characterization

$$\mathbb{V}X = \mathbb{E}X^2 - 2\mu\mathbb{E}X + \mu^2 = \mathbb{E}X^2 - \mu^2 = \mathbb{E}X^2 - \mathbb{E}^2X,$$

where we have used the shorthand notation $\mathbb{E}^2X$ for $(\mathbb{E}X)^2$.

Theorem 6.4.2. (i) For real numbers $\alpha$ and $\beta$, $\mathbb{V}(\alpha X + \beta) = \alpha^2\mathbb{V}X$.

(ii) If $X$ and $Y$ are independent, then $\mathbb{V}(X + Y) = \mathbb{V}X + \mathbb{V}Y$.

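Proof. By linearity,

$$\mathbb{E}(\alpha X + \beta)^2 = \alpha^2\mathbb{E}X^2 + 2\alpha\beta\mu + \beta^2$$

and

$$\mathbb{E}^2(\alpha X + \beta) = (\alpha\mu + \beta)^2 = \alpha^2\mu^2 + 2\alpha\beta\mu + \beta^2,$$

where $\mu = \mathbb{E}X$. Subtracting these equations proves part (i).

If $X$ and $Y$ are independent, then, by Theorem 6.3.5,

$$\mathbb{E}(X + Y)^2 = \mathbb{E}(X^2 + 2XY + Y^2) = \mathbb{E}X^2 + 2(\mathbb{E}X)(\mathbb{E}Y) + \mathbb{E}Y^2.$$

Also,

$$\mathbb{E}^2(X + Y) = (\mathbb{E}X + \mathbb{E}Y)^2 = \mathbb{E}^2X + 2(\mathbb{E}X)(\mathbb{E}Y) + \mathbb{E}^2Y.$$

Subtracting these equations yields (ii).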

Example 6.4.3. If $X$ is a Bernoulli random variable with parameter $p$, then $\mathbb{E}X = p = \mathbb{E}X^2$, hence $\mathbb{V}X = p(1-p)$.

Example 6.4.4. The variance of a binomial random variable $Y \sim B(n, p)$ may be calculated directly from the definition, but it is easier to use the fact that $Y$ has the same distribution as a sum $X_1 + X_2 + \cdots + X_n$ of independent Bernoulli random variables $X_j$ with parameter $p$. Then, by Theorem 6.4.2 and Example 6.4.3,

$$\mathbb{V}Y = \mathbb{V}X_1 + \mathbb{V}X_2 + \cdots + \mathbb{V}X_n = np(1-p).$$

Example 6.4.5. Let $X_1, X_2, \dots$ be a sequence of iid random variables such that $X_j$ takes on the values 1 and $-1$ with probabilities $p$ and $1-p$, respectively, and set $Y_n = X_1 + X_2 + \cdots + X_n$. Then $\mathbb{E}Y_n = n(2p-1)$. Moreover, $Z_j := (X_j + 1)/2$ is Bernoulli with parameter $p$ and $Y_n = 2\sum_{j=1}^n Z_j - n$, so by Example 6.4.4 and Theorem 6.4.2,

$$\mathbb{V}Y_n = 4np(1-p).$$

The random variable $Y_n$ may be interpreted as the position of a particle moving one step to the right with probability $p$ and one step to the left with probability $1-p$. The stochastic process $(Y_n)_{n\ge 1}$ is called a random walk.
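The mean and variance in Example 6.4.5 can be verified exactly for a small walk by propagating the distribution of $Y_n$ one step at a time. The function name and the values $n = 10$ and $p = 3/5$ below are arbitrary choices.

```python
from fractions import Fraction

def random_walk_distribution(n, p):
    """Exact distribution of Y_n = X_1 + ... + X_n, where X_j = +1 w.p. p, -1 w.p. 1-p.
    Built step by step: dist maps position -> probability."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {}
        for y, prob in dist.items():
            new[y + 1] = new.get(y + 1, 0) + prob * p
            new[y - 1] = new.get(y - 1, 0) + prob * (1 - p)
        dist = new
    return dist

n, p = 10, Fraction(3, 5)
dist = random_walk_distribution(n, p)
mean = sum(y * q for y, q in dist.items())
var = sum(y * y * q for y, q in dist.items()) - mean**2    # V Y = E Y^2 - E^2 Y
print(mean, n * (2 * p - 1))          # both 2
print(var, 4 * n * p * (1 - p))       # both 48/5
```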


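Example 6.4.6. If $X \sim N(\mu, \sigma^2)$, then $Y = (X - \mu)/\sigma \sim N(0, 1)$, hence, by Example 6.3.2, $\mathbb{E}Y = 0$ and $\mathbb{E}Y^2 = 1$. Therefore,

$$\mathbb{V}X = \mathbb{V}(\sigma Y + \mu) = \sigma^2\mathbb{V}Y = \sigma^2.$$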

6.5 The Central Limit Theorem

The Central Limit Theorem (CLT) is one of the most important results in probability theory. It conveys the remarkable fact that the distribution of a (suitably normalized) sum of a large number of iid random variables is approximately that of a standard normal random variable. It explains why data from so many populations exhibit a bell-shaped curve. Proofs of the CLT may be found in standard texts on advanced probability.

Theorem 6.5.1 (Central Limit Theorem). Let $X_1, X_2, \dots$ be a sequence of iid random variables with mean $\mu$ and standard deviation $\sigma$, and let $Y_n = \sum_{j=1}^n X_j$. Then

$$\lim_{n\to\infty}\mathbb{P}\left(\frac{Y_n - n\mu}{\sigma\sqrt{n}} \le x\right) = \Phi(x).$$

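Remarks 6.5.2. It follows from the CLT that

$$\lim_{n\to\infty}\mathbb{P}\left(a \le \frac{Y_n - n\mu}{\sigma\sqrt{n}} \le b\right) = \Phi(b) - \Phi(a). \tag{6.1}$$

In the special case that the random variables $X_j$ are Bernoulli with parameter $p \in (0, 1)$, we have $\mu = p$ and $\sigma = \sqrt{p(1-p)}$ (see Example 6.4.3), $Y_n \sim B(n, p)$, and (6.1) becomes

$$\lim_{n\to\infty}\mathbb{P}\left(a \le \frac{Y_n - np}{\sqrt{np(1-p)}} \le b\right) = \Phi(b) - \Phi(a). \tag{6.2}$$

This result is known as the DeMoivre-Laplace Theorem.

One can use (6.2) to obtain the following approximation for the pmf of the binomial random variable $Y_n$: for $k = 0, 1, \dots, n$, with $q := 1 - p$,

$$\begin{aligned} \mathbb{P}(Y_n = k) &= \mathbb{P}(k - .5 < Y_n < k + .5) \\ &= \mathbb{P}\left(\frac{k - .5 - np}{\sqrt{npq}} < \frac{Y_n - np}{\sqrt{npq}} < \frac{k + .5 - np}{\sqrt{npq}}\right) \\ &\approx \Phi\left(\frac{k + .5 - np}{\sqrt{npq}}\right) - \Phi\left(\frac{k - .5 - np}{\sqrt{npq}}\right). \end{aligned}$$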

(In the first equality, we made a correction to compensate for using a continuous distribution to approximate a discrete one.) In particular, for $p = .5$ we have the approximation

$$\mathbb{P}(Y_n = k) \approx \Phi\left(\frac{2k + 1 - n}{\sqrt{n}}\right) - \Phi\left(\frac{2k - 1 - n}{\sqrt{n}}\right). \tag{6.3}$$

Example 6.5.3. Suppose we flip a fair coin 50 times. By (6.3), the probability that the coin comes up heads 25 times (rounded to four decimal places) is

$$\Phi\left(\frac{1}{\sqrt{50}}\right) - \Phi\left(\frac{-1}{\sqrt{50}}\right) = .1125.$$

The actual probability (rounded to four decimal places) is

$$\mathbb{P}(Y_{50} = 25) = \binom{50}{25}\left(\frac{1}{2}\right)^{50} = .1123.$$


6.6 Exercises

1. A jar contains $r$ red and $w$ white marbles. The marbles are drawn one at a time and replaced. Let $Y$ denote the number of red marbles drawn before the second white one. Find $\mathbb{E}Y$ in terms of $r$ and $w$. (Use Examples 3.5.8 and 6.1.5.)

2. Pockets of a roulette wheel are numbered 1 to 36, of which 18 are red and 18 black. There are also green pockets numbered 0 and 00. If a \$1 bet is placed on black, the gambler wins \$2 (hence has a profit of \$1) if the ball lands on black, and loses \$1 otherwise (similarly for red). Suppose a gambler employs the following betting strategy: she initially bets \$1 on black. If black appears, she takes her profit and quits. If she loses, she bets \$2 on black, quitting and taking her profit of \$1 if she wins, otherwise betting \$4 on the next spin. She continues in this way, quitting if she wins a game, doubling the bet on black otherwise. She decides to quit after the $N$th game, win or lose. Show that her expected profit is $1 - (2p)^N$, where $p = 20/38$ is the probability of losing a single bet on black. What is the expected profit for a roulette wheel with no green pockets? (The general betting strategy of doubling a wager after a loss is called a martingale.)

3. Show that the variance of a geometric random variable $X$ with parameter $p$ is $q/p^2$, where $q = 1 - p$.

4. A Poisson random variable $X$ with parameter $\lambda > 0$ has distribution

$$p_X(n) = \frac{\lambda^n}{n!}e^{-\lambda}, \quad n = 0, 1, 2, \dots.$$

Find the expectation and variance of $X$. (Poisson random variables are used for modeling the random flow of events such as motorists arriving at toll booths or calls arriving at service centers.)

5. A jar contains $r$ red and $w$ white marbles. The marbles are drawn randomly one at a time until the drawing has produced two marbles of the same color. Find the expected number of marbles drawn if, after each draw, the marble is (a) replaced; (b) discarded.

6. A hat contains $a$ slips of paper labeled with the number 1 and $b$ slips labeled 2. Two slips are drawn at random without replacement. Let $X$ be the number on the first slip drawn and $Y$ the number on the second. Show that (a) $X$ and $Y$ are identically distributed, and (b) $\mathbb{E}(XY) \ne (\mathbb{E}X)(\mathbb{E}Y)$.

7. Let $A_1, A_2, \dots, A_n$ be independent events and set $X = \sum_{j=1}^n a_jI_{A_j}$, $a_j \in \mathbb{R}$. Show that

$$\mathbb{V}X = \sum_{j=1}^n a_j^2\,\mathbb{P}(A_j)\mathbb{P}(A_j').$$


8. Let $Y$ be a binomial random variable with parameters $(n, p)$. Find $\mathbb{E}\,2^Y$.

9. Let $X$ and $Y$ be independent and uniformly distributed on $[0, 1]$. Calculate $\mathbb{E}\left(\dfrac{4XY}{X^2 + Y^2 + 1}\right)$.

10. Find E|X| if X ∼ N(0, 1).

11. Let $X$ and $Y$ be independent random variables with $\mathbb{E}X = \mathbb{E}Y = 0$. Show that $\mathbb{E}(X + Y)^2 = \mathbb{E}X^2 + \mathbb{E}Y^2$ and $\mathbb{E}(X + Y)^3 = \mathbb{E}X^3 + \mathbb{E}Y^3$. What can you say about higher powers?

12. Let $X$ and $Y$ be independent jointly continuous random variables, each with an even density function. Show that if $n$ is odd, $\mathbb{E}(X + Y)^n = 0$.

13. Express the integrals (a) $\int_a^b e^{\alpha x}\varphi(x)\,dx$ and (b) $\int_a^b e^{\alpha x}\Phi(x)\,dx$ in terms of $\Phi$.

14. A positive random variable $X$ such that $\ln X \sim N(\mu, \sigma^2)$ is said to be lognormal with parameters $\mu$ and $\sigma^2$. Show that $\mathbb{E}X = e^{\mu + \sigma^2/2}$ and $\mathbb{E}X^2 = e^{2(\mu + \sigma^2)}$.

15. Find VX if X is uniformly distributed on the interval (α, β).

16. Let $X$ and $Y$ be independent random variables with $X$ uniformly distributed on $[0, 1]$ and $Y \sim N(0, 1)$. Find the mean and variance of $Ye^{XY}$.

17. Show that $\mathbb{E}^2X \le \mathbb{E}X^2$.

18. Find the $n$th moment of a random variable $X$ with density $f_X(x) = \frac{1}{2}e^{-|x|}$.

19. Show that, in the notation of Example 3.2.5, the expectation of a hypergeometric random variable with parameters $(p, z, N)$ is $pz$.

20. Find the expected value of the random variable Z in Exercise 3.9.

21. Use the Central Limit Theorem to estimate the probability that the number $Y$ of heads appearing in 100 tosses of a fair coin (a) is exactly 50; (b) lies between 40 and 60. (A spreadsheet with a built-in normal cdf is useful here.) Find the exact probability for part (a).

22. A true-false exam has 54 questions. Use the CLT to approximate the probability of getting a passing score of 35 or more correct answers simply by guessing on each question.

Chapter 7

The Binomial Model

To find an explicit expression for the value of an option, one needs a concrete mathematical model for the value of the underlying asset. In this chapter, we construct the geometric binomial model for stock price movement and use it to determine the value of a general European claim. An important consequence is the Cox-Ross-Rubinstein formula for the price of a call option. The valuation analysis in this chapter is based on the notion of self-financing portfolio described in Chapter 5.

7.1 Construction of the Binomial Model

Consider a (non-dividend-paying) stock $\mathcal{S}$ with current price $S_0$ such that the price changes each time period by a factor $u$ with probability $p$ or by a factor $d$ with probability $q := 1 - p$, where $0 < d < u$. The symbols $u$ and $d$ are meant to suggest the words up and down, and we shall use these terms to describe the price movement from one period to the next, even though $u$ may be less than 1 (the prices drift downward) or $d$ greater than 1 (the prices drift upward).

We model the stock's random behavior as follows: let $\Omega$ be the set of all sequences $\omega = (\omega_1, \omega_2, \dots, \omega_N)$, where $\omega_n = u$ if the stock moves up during the $n$th time period and $\omega_n = d$ if the stock moves down. Thus, $\Omega = \Omega_1 \times \Omega_2 \times \cdots \times \Omega_N$, where $\Omega_n = \{u, d\}$ represents the possible movements at time $n$. Define a probability measure $\mathbb{P}_n$ on $\Omega_n$ by

Pn(ωn) =

p if ωn = u, and

q if ωn = d.

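Using the measures $\mathbb{P}_n$, we define a probability measure $\mathbb{P}$ on subsets $A$ of $\Omega$ by

$$\mathbb{P}(A) = \sum_{\omega \in A} \mathbb{P}_1(\omega_1)\mathbb{P}_2(\omega_2)\cdots\mathbb{P}_N(\omega_N),$$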

where $\omega = (\omega_1, \omega_2, \dots, \omega_N)$. For example, the probability that the stock rises the first period and falls the next is

$$\mathbb{P}_1(u)\mathbb{P}_2(d)\mathbb{P}_3(\Omega_3)\cdots\mathbb{P}_N(\Omega_N) = pq.$$



More generally, if $A_n \subseteq \Omega_n$, $n = 1, 2, \dots, N$, then

$$\mathbb{P}(A_1 \times A_2 \times \cdots \times A_N) = \mathbb{P}_1(A_1)\mathbb{P}_2(A_2)\cdots\mathbb{P}_N(A_N). \tag{7.1}$$

The probability measure $\mathbb{P}$ therefore models the independence of the stock movements. $\mathbb{P}$ is called the product of the measures $\mathbb{P}_n$. As usual, $\mathbb{E}$ denotes the corresponding expectation operator.

The price of the stock at time $n$ is a random variable $S_n$ on $\Omega$ such that

$$S_n(\omega) = \begin{cases} uS_{n-1}(\omega) & \text{if } \omega_n = u, \\ dS_{n-1}(\omega) & \text{if } \omega_n = d. \end{cases}$$

Iterating, we see that

$$S_n(\omega) = \omega_nS_{n-1}(\omega) = \omega_n\omega_{n-1}S_{n-2}(\omega) = \cdots = \omega_n\omega_{n-1}\cdots\omega_1S_0. \tag{7.2}$$

Now let $X_j = 1$ if the stock goes up in the $j$th time period and $X_j = 0$ if the stock goes down. The random variable $Y_n := X_1 + X_2 + \cdots + X_n$ counts the number of upticks of the stock in the time period from 0 to $n$; hence, from (7.2),

$$S_n = u^{Y_n}d^{n-Y_n}S_0 = \left(\frac{u}{d}\right)^{Y_n}d^nS_0. \tag{7.3}$$

Under the probability law $\mathbb{P}$, the $X_j$'s are independent Bernoulli random variables on $\Omega$ with parameter $p$; hence $Y_n$ is a binomial random variable with parameters $(n, p)$. The stochastic process $S := (S_n)_{n=0}^N$ is called a geometric binomial price process.
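To make the construction concrete, here is a minimal Python sketch (our own illustration; the helper names path_measure and price_along_path and the parameter values are hypothetical) that enumerates $\Omega = \{u, d\}^N$, gives each path its product probability as in (7.1), and checks that $Y_N$ is binomial and that the path formula (7.2) agrees with (7.3).

```python
from itertools import product
from math import comb, isclose, prod

def path_measure(N, p):
    """Product measure (7.1): map each omega in {'u','d'}^N to P(omega)."""
    q = 1 - p
    return {w: prod(p if step == "u" else q for step in w)
            for w in product("ud", repeat=N)}

def price_along_path(w, n, S0, u, d):
    """S_n(omega) = omega_n * ... * omega_1 * S0, as in (7.2)."""
    factor = {"u": u, "d": d}
    return S0 * prod(factor[step] for step in w[:n])

# Illustrative parameters (not taken from the text).
N, p, S0, u, d = 3, 0.6, 100.0, 1.2, 0.8
P = path_measure(N, p)
assert isclose(sum(P.values()), 1.0)     # P is a probability measure

# Distribution of Y_N, the number of upticks, under P.
dist_Y = {}
for w, pw in P.items():
    k = w.count("u")
    dist_Y[k] = dist_Y.get(k, 0.0) + pw

for k in sorted(dist_Y):
    # Compare with the binomial probability C(N, k) p^k q^(N-k).
    print(k, dist_Y[k], comb(N, k) * p**k * (1 - p)**(N - k))
```

Enumeration is exponential in $N$, so this is only a check for small trees; later computations use the binomial distribution of $Y_N$ directly.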

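FIGURE 7.1: 3-Step Binomial Tree. [Diagram: from $S_0$ the price moves to $uS_0$ or $dS_0$; after two steps the nodes are $u^2S_0$, $udS_0$, $d^2S_0$; after three steps they are $u^3S_0$, $u^2dS_0$, $ud^2S_0$, $d^3S_0$. Each up-edge is labeled $p$ and each down-edge $q$.]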


Figure 7.1 shows the possible stock movements for three time periods. The probabilities along the edges are conditional probabilities; specifically,

$$\mathbb{P}(S_n = ux \mid S_{n-1} = x) = p \quad\text{and}\quad \mathbb{P}(S_n = dx \mid S_{n-1} = x) = q.$$

Other conditional probabilities may be found by multiplying probabilities along edges and adding. For example,

$$\mathbb{P}(S_n = udx \mid S_{n-2} = x) = 2pq,$$

since there are two paths leading from vertex $x$ to vertex $udx$.

Example 7.1.1. The probability that the price of the stock at time $n$ is larger than its initial price is found from (7.3) by observing that

$$S_n > S_0 \iff \left(\frac{u}{d}\right)^{Y_n}d^n > 1 \iff Y_n > a := \frac{n\ln d}{\ln d - \ln u}.$$

If $d > 1$, $a$ is negative and $\mathbb{P}(S_n > S_0) = 1$. If $d < 1$,

$$\mathbb{P}(S_n > S_0) = \sum_{j=m+1}^{n}\mathbb{P}(Y_n = j) = \sum_{j=m+1}^{n}\binom{n}{j}p^jq^{n-j},$$

where $m := \lfloor a \rfloor$ is the greatest integer less than or equal to $a$. For example, if $n = 100$, $p = .5$, $u = 1.2$, and $d = .8$, then $\mathbb{P}(S_n > S_0) \approx .14$. If $u$ is increased to $1.25$ or if $p$ is increased to $.55$, the probability goes up to $.46$. Similarly, if $d$ is decreased to $.75$ or $p$ is decreased to $.44$, the probability goes down to $.01$.

We model the flow of stock price information by the natural filtration $(\mathcal{F}^S_n)_{n=0}^N$. Note that because $S_0$ is constant, $\mathcal{F}^S_0 = \{\emptyset, \Omega\}$. It is easy to see that for $n \ge 1$ the σ-field $\mathcal{F}^S_n$ consists of $\Omega$, $\emptyset$, and all unions of sets of the form

$$A_\eta := \{\eta\} \times \Omega_{n+1} \times \cdots \times \Omega_N,$$

where $\eta = (\eta_1, \eta_2, \dots, \eta_n)$ represents a particular market scenario up through time $n$. For example, if $N = 4$, $\mathcal{F}^S_2$ is generated by the sets $\{\eta\} \times \Omega_3 \times \Omega_4$, where $\eta = (u, u)$, $(u, d)$, $(d, u)$, or $(d, d)$.

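The following notational convention will be convenient: If $Z$ is an $\mathcal{F}^S_N$-random variable on $\Omega$ that depends only on the first $n$ coordinates of $\omega$, we will suppress the last $N - n$ coordinates in the notation $Z(\omega_1, \omega_2, \dots, \omega_N)$ and write instead $Z(\omega_1, \omega_2, \dots, \omega_n)$. Such a random variable is $\mathcal{F}^S_n$-measurable, since the event $\{Z = z\}$ is of the form $A \times \Omega_{n+1} \times \cdots \times \Omega_N$ and hence is a union of the sets $A_\eta$, where $\eta \in A$. Moreover, since $\mathbb{P}_{n+1}(\Omega_{n+1}) = \cdots = \mathbb{P}_N(\Omega_N) = 1$, Remark 6.1.6 implies the following truncated form of the expectation of $Z$:

$$\begin{aligned}
\mathbb{E}Z &= \sum_{\omega \in \Omega} Z(\omega)\mathbb{P}(\omega) \\
&= \sum_{(\omega_1, \dots, \omega_n)} Z(\omega_1, \dots, \omega_n)\mathbb{P}_1(\omega_1)\cdots\mathbb{P}_n(\omega_n)\mathbb{P}_{n+1}(\Omega_{n+1})\cdots\mathbb{P}_N(\Omega_N) \\
&= \sum_{(\omega_1, \dots, \omega_n)} Z(\omega_1, \dots, \omega_n)\mathbb{P}_1(\omega_1)\cdots\mathbb{P}_n(\omega_n).
\end{aligned} \tag{7.4}$$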


7.2 Pricing a Claim in the Binomial Model

In this section, we determine the fair price of a European claim $H$ when the underlying stock $S$ has a geometric binomial price process $(S_n)_{n=0}^N$, as described in the preceding section. Let $B$ be a risk-free bond with price process $B_n = (1+i)^n$. According to Theorem 5.3.3, if the binomial model is arbitrage-free, then the proper value of the claim at time $n$ is that of a self-financing, replicating portfolio $(\varphi, \theta)$ based on $(B, S)$ with final value $H$. For the time being, however, we do not assume that the model is arbitrage-free.

To construct a self-financing, replicating portfolio, we start by defining $V_N = H$ and then work backward, using the characterization of self-financing portfolios given in part (iii) of Theorem 5.2.5, namely,

$$V_{n+1} = \theta_{n+1}S_{n+1} + (1+i)\left[V_n - \theta_{n+1}S_n\right], \quad n = 0, 1, \dots, N-1. \tag{7.5}$$

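For a given $\omega = (\omega_1, \omega_2, \dots, \omega_n)$, Equation (7.5) evaluated at $\omega$ may be written as a system

$$\begin{aligned}
\theta_{n+1}(\omega)S_{n+1}(\omega, u) + (1+i)\left[V_n(\omega) - \theta_{n+1}(\omega)S_n(\omega)\right] &= V_{n+1}(\omega, u), \\
\theta_{n+1}(\omega)S_{n+1}(\omega, d) + (1+i)\left[V_n(\omega) - \theta_{n+1}(\omega)S_n(\omega)\right] &= V_{n+1}(\omega, d).
\end{aligned}$$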

(Recall our convention of displaying only the time-relevant coordinates of members of $\Omega$.) The idea is to solve the system for $V_n(\omega)$ in terms of $V_{n+1}(\omega, u)$ and $V_{n+1}(\omega, d)$. With $V_N$ already defined as $H$, backward induction may be used to construct a process $V = (V_n)$ and from it a process $(\theta_n)_{n=1}^N$ that satisfies (7.5) and hence generates a self-financing, replicating portfolio for $H$.

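We begin by solving the above system for $\theta_{n+1}(\omega)$. Subtracting the equations and using $S_{n+1}(\omega, \omega_{n+1}) = \omega_{n+1}S_n(\omega)$, we have

$$\theta_{n+1}(\omega) = \frac{V_{n+1}(\omega, u) - V_{n+1}(\omega, d)}{(u - d)S_n(\omega)}, \quad n = 0, 1, \dots, N-1. \tag{7.6}$$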

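Equation (7.6) is referred to as the delta hedging rule. Solving the first equation in the above system for $(1+i)V_n(\omega)$ and using the delta hedging rule, we obtain

$$\begin{aligned}
(1+i)V_n(\omega) &= V_{n+1}(\omega, u) + \theta_{n+1}(\omega)S_n(\omega)(1 + i - u) \\
&= V_{n+1}(\omega, u) + \left[V_{n+1}(\omega, u) - V_{n+1}(\omega, d)\right]\frac{1 + i - u}{u - d} \\
&= V_{n+1}(\omega, u)\frac{1 + i - d}{u - d} + V_{n+1}(\omega, d)\frac{u - 1 - i}{u - d}.
\end{aligned} \tag{7.7}$$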

Equation (7.7) expresses $V_n$ uniquely in terms of $V_{n+1}$, completing the backward induction process.


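By construction, $V = (V_n)_{n=0}^N$ satisfies (7.5), where the process $\theta = (\theta_n)_{n=1}^N$ is defined by (7.6). Now define a process $\varphi = (\varphi_n)_{n=1}^N$ by

$$\varphi_{n+1}(\omega) = (1+i)^{-n}\left[V_n(\omega) - \theta_{n+1}(\omega)S_n(\omega)\right], \quad n = 0, 1, \dots, N-1. \tag{7.8}$$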

Since $\varphi_n$ and $\theta_n$ depend only on the first $n - 1$ time steps, the process $(\varphi, \theta)$ is predictable and hence is a trading strategy. Note that Equations (7.6) and (7.8) define $\theta_1$ and $\varphi_1$ as constants, in accordance with the portfolio theory of Chapter 5. From (7.5) with $n$ replaced by $n - 1$, we have

$$V_n = \theta_nS_n + (1+i)\left[V_{n-1} - \theta_nS_{n-1}\right] = \theta_nS_n + \varphi_nB_n, \quad n = 1, 2, \dots, N,$$

which shows that $V$ is the value process for the portfolio. We have constructed a unique self-financing, replicating strategy for the claim $H$.

The above results are summarized in the following theorem:

Theorem 7.2.1. Given a European claim $H$ in the binomial model, there exists a unique self-financing trading strategy $(\varphi, \theta)$ with value process $V$ satisfying $V_N = H$. Furthermore, $V$ is given by the backward recursion scheme

$$V_n(\omega) = (1+i)^{-1}\left[V_{n+1}(\omega, u)\,p^* + V_{n+1}(\omega, d)\,q^*\right], \qquad \omega = (\omega_1, \omega_2, \dots, \omega_n), \quad n = N-1, N-2, \dots, 0, \tag{7.9}$$

where

$$p^* := \frac{1 + i - d}{u - d} \quad\text{and}\quad q^* := \frac{u - 1 - i}{u - d}. \tag{7.10}$$

The strategy $(\varphi, \theta)$ is expressed in terms of $V$ by (7.6) and (7.8).

It is easy to verify that $(p^*, q^*)$ is a probability vector iff $u$ and $d$ satisfy the inequalities $0 < d < 1 + i < u$. In this case, we may construct a probability measure $\mathbb{P}^*$ on $\Omega$ in exactly the same manner as $\mathbb{P}$ was constructed, but with $p$ and $q$ replaced, respectively, by $p^*$ and $q^*$. Denoting the corresponding expectation operator by $\mathbb{E}^*$, we have the following consequence of Theorem 7.2.1:

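Corollary 7.2.2. If $0 < d < 1 + i < u$, then $(p^*, q^*)$ is a probability vector and the discounted value process $\tilde V = (\tilde V_n)_{n=0}^N$, $\tilde V_n := (1+i)^{-n}V_n$, satisfies

$$\mathbb{E}^*\tilde V_n = \mathbb{E}^*\tilde V_{n+1}, \quad n = 0, 1, \dots, N-1.$$

In particular, $V_0 = (1+i)^{-N}\mathbb{E}^*H$.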


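Proof. Using (7.4) with $\mathbb{P}$ replaced by $\mathbb{P}^*$ and (7.9), we have

$$\begin{aligned}
(1+i)\mathbb{E}^*V_n &= (1+i)\sum_{\omega = (\omega_1, \dots, \omega_n)} V_n(\omega)\mathbb{P}^*_1(\omega_1)\cdots\mathbb{P}^*_n(\omega_n) \\
&= \sum_{\omega = (\omega_1, \dots, \omega_n)} \left[V_{n+1}(\omega, u)\,p^* + V_{n+1}(\omega, d)\,q^*\right]\mathbb{P}^*_1(\omega_1)\cdots\mathbb{P}^*_n(\omega_n) \\
&= \sum_{\omega = (\omega_1, \dots, \omega_{n+1})} V_{n+1}(\omega)\mathbb{P}^*_1(\omega_1)\cdots\mathbb{P}^*_{n+1}(\omega_{n+1}) \\
&= \mathbb{E}^*V_{n+1}.
\end{aligned}$$

Dividing by $(1+i)^{n+1}$ completes the proof.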

Remarks 7.2.3. (a) For $d < 1 + i < u$, the random variables $X_j$ defined in Section 7.1 are still independent Bernoulli random variables with respect to the new probability measure $\mathbb{P}^*$; hence $Y_n = X_1 + X_2 + \cdots + X_n$ is a binomial random variable with parameters $(n, p^*)$. The probabilities $p^*$ and $q^*$ are called the risk-neutral probabilities, and $\mathbb{P}^*$ is the risk-neutral probability measure. Note that, in contrast to the law $\mathbb{P}$, which reflects the perception of the market, $\mathbb{P}^*$ is a purely mathematical construct.

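(b) Using the identity $up^* + dq^* = 1 + i$, we see from (7.3) and independence that

$$\mathbb{E}^*S_n = S_0d^n\,\mathbb{E}^*\left(\frac{u}{d}\right)^{Y_n} = S_0d^n\left[\mathbb{E}^*\left(\frac{u}{d}\right)^{X_1}\right]^n = S_0d^n\left(\frac{u}{d}p^* + q^*\right)^n = (1+i)^nS_0.$$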

Thus, $\mathbb{E}^*S_n$ is the time-$n$ value of a risk-free account earning compound interest at the rate $i$. A similar calculation shows that

$$\mathbb{E}S_n = (up + dq)^nS_0 = (1+j)^nS_0,$$

where $j := up + dq - 1$. Since there is risk involved in buying a stock, one would expect that $j > i$, which is equivalent to $p > p^*$.

Corollary 7.2.4. The binomial model is complete. Moreover, it is arbitrage-free iff the inequalities $d < 1 + i < u$ hold. In this case, the proper price of a claim $H$ is $(1+i)^{-N}\mathbb{E}^*H$.

Proof. Theorem 7.2.1 shows that the model is complete, and we have already seen that the inequalities $d < 1 + i < u$ follow from the assumption that the market is arbitrage-free (Example 4.1.2).

Assume that the inequalities hold. To show that the binomial model is arbitrage-free, suppose there exists a self-financing trading strategy with value process $U = (U_n)$ such that $U_0 = 0$, $U_n \ge 0$ for all $n$, and $\mathbb{P}(U_N > 0) > 0$. Then $\{U_N > 0\} \ne \emptyset$, and since $\mathbb{P}^*(\omega) > 0$ for all $\omega$, it follows that $\mathbb{E}^*U_N > 0$. On the other hand, if we take $H = U_N$ in Theorem 7.2.1 then, by uniqueness, $U$ must be the process $V$ constructed in that theorem; hence, by Corollary 7.2.2, $\mathbb{E}^*U_N = (1+i)^NU_0 = 0$. This contradiction shows that the binomial model must be arbitrage-free. The last assertion of the corollary follows from Theorem 5.3.3.


Since $S_N = u^{Y_N}d^{N-Y_N}S_0$ and $Y_N \sim B(N, p^*)$ under $\mathbb{P}^*$, Corollary 7.2.4 and the law of the unconscious statistician imply the following general formula for the price of a claim.

Corollary 7.2.5. If $d < 1 + i < u$ and the claim $H$ is of the form $f(S_N)$ for some function $f(x)$, then the proper price of the claim is

$$V_0 = (1+i)^{-N}\mathbb{E}^*f(S_N) = (1+i)^{-N}\sum_{j=0}^{N}\binom{N}{j}f\left(S_0u^jd^{N-j}\right)p^{*j}q^{*N-j}.$$

Example 7.2.6. For a forward we take $f(x) = x - K$ in Corollary 7.2.5. Since

$$\begin{aligned}
\mathbb{E}^*f(S_N) &= \sum_{j=0}^{N}\binom{N}{j}\left(u^jd^{N-j}S_0 - K\right)p^{*j}q^{*N-j} \\
&= S_0\sum_{j=0}^{N}\binom{N}{j}(up^*)^j(dq^*)^{N-j} - K\sum_{j=0}^{N}\binom{N}{j}p^{*j}q^{*N-j} \\
&= S_0(up^* + dq^*)^N - K(p^* + q^*)^N \\
&= S_0(1+i)^N - K,
\end{aligned}$$

we see that $V_0 = S_0 - K(1+i)^{-N}$. Recalling that there is no cost to enter the contract, we have $K = S_0(1+i)^N$. This is the discrete-time analog of Equation (4.3), which was obtained by a general arbitrage argument.

7.3 The Cox-Ross-Rubinstein Formula

To apply the results of Section 7.2 to call options, define a function $\Psi$ by

$$\Psi(m, N, p) = \sum_{j=m}^{N}\binom{N}{j}p^j(1-p)^{N-j}, \quad 0 < p < 1, \quad m = 0, 1, 2, \dots, N.$$

Note that $\Psi(m, N, p) = 1 - \Theta(m-1, N, p)$, where $\Theta(\,\cdot\,, N, p)$ is the cdf of a binomial random variable with parameters $(N, p)$. The following result expresses the cost of a call option in terms of $\Psi$.

Theorem 7.3.1 (Cox-Ross-Rubinstein (CRR) Formula). If $d < 1 + i < u$, then the cost $C_0$ of a call option with strike price $K$ to be exercised after $N$ time steps is

$$C_0 = S_0\,\Psi(m, N, \hat p) - (1+i)^{-N}K\,\Psi(m, N, p^*), \tag{7.11}$$

where

$$\hat p := \frac{p^*u}{1+i}, \qquad \hat q := 1 - \hat p = \frac{q^*d}{1+i}, \tag{7.12}$$

and $m$ is the smallest integer $\ge 0$ for which $S_0u^md^{N-m} > K$. Specifically,

$$m := \left(\lfloor a \rfloor + 1\right)^+, \qquad a := \frac{\ln K - \ln\left(d^NS_0\right)}{\ln u - \ln d}.$$

If $m > N$ (which occurs iff $K \ge u^NS_0$), the right side of (7.11) is interpreted as zero.

Proof. By Corollary 7.2.5 applied to the function $f(x) = (x - K)^+$,

$$C_0 = (1+i)^{-N}\sum_{j=0}^{N}\binom{N}{j}\left(S_0u^jd^{N-j} - K\right)^+p^{*j}q^{*N-j}. \tag{7.13}$$

If $S_0u^N \le K$, then $S_0u^jd^{N-j} - K \le S_0u^ju^{N-j} - K \le 0$ for all $j$; hence $C_0 = 0$. Accordingly, the right side of (7.11) is interpreted as zero. Now assume that $S_0u^N > K$. Then there must be a smallest integer $m$ in the set $\{0, 1, \dots, N\}$ for which $S_0u^md^{N-m} > K$. Moreover, since $u/d > 1$, $S_0u^jd^{N-j}$ is increasing in $j$; hence $S_0u^jd^{N-j} > K$ for $j \ge m$. It follows that

$$\begin{aligned}
C_0 &= (1+i)^{-N}\sum_{j=m}^{N}\binom{N}{j}\left(S_0u^jd^{N-j} - K\right)p^{*j}q^{*N-j} \\
&= S_0\sum_{j=m}^{N}\binom{N}{j}\left(\frac{p^*u}{1+i}\right)^j\left(\frac{q^*d}{1+i}\right)^{N-j} - (1+i)^{-N}K\sum_{j=m}^{N}\binom{N}{j}p^{*j}q^{*N-j} \\
&= S_0\sum_{j=m}^{N}\binom{N}{j}\hat p^{\,j}\hat q^{\,N-j} - (1+i)^{-N}K\sum_{j=m}^{N}\binom{N}{j}p^{*j}q^{*N-j}.
\end{aligned}$$

The inequality $u > 1 + i$ implies that $\hat p < 1$. Since $\hat p + \hat q = 1$, (7.11) follows.

Example 7.3.2. Table 7.1 gives the price $C_0$ of a call, as calculated by the CRR formula, and the price $P_0 = C_0 - S_0 + (1+i)^{-N}K$ of the corresponding put (put-call parity formula) for various values of $K$, $u$, and $d$, where $S_0 = \$20.00$ and the nominal rate is taken to be $r = .10$. We consider daily fluctuations of the stock, so $i = .10/365 \approx .00027$. The options are assumed to expire in 90 days. The table shows that $C_0$ typically increases as the spread $u - d$ (volatility of the stock price) increases. The table also shows that $C_0$ decreases as $K$ increases. This is clear from (7.13) and is to be expected, as a larger $K$ results in a smaller payoff for the holder, making the option less attractive.


K        u     d     C0        P0
$18.00   1.1   .9    $8.15     $5.71
$18.00   1.5   .9    $13.89    $11.45
$18.00   1.1   .4    $16.99    $14.55
$18.00   1.5   .4    $19.92    $17.49
$22.00   1.1   .9    $6.87     $8.34
$22.00   1.5   .9    $13.25    $14.71
$22.00   1.1   .4    $16.63    $18.10
$22.00   1.5   .4    $19.92    $21.38

TABLE 7.1: Variation of C0 and P0 with K, u, and d.
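Entries like those of Table 7.1 can be generated by evaluating the binomial sum (7.13) and then applying put-call parity. The Python sketch below does this for the parameters of Example 7.3.2; it is an illustration only (the helper name binomial_price is ours), and we have not checked that it reproduces every digit of the table.

```python
from math import comb

def binomial_price(payoff, S0, N, u, d, i):
    """(1+i)^(-N) E* payoff(S_N), i.e. Corollary 7.2.5."""
    ps = (1 + i - d) / (u - d)
    return sum(comb(N, j) * payoff(S0 * u**j * d**(N - j)) * ps**j * (1 - ps)**(N - j)
               for j in range(N + 1)) / (1 + i) ** N

S0, N, i = 20.0, 90, 0.10 / 365          # parameters of Example 7.3.2
for K in (18.0, 22.0):
    for u, d in ((1.1, .9), (1.5, .9), (1.1, .4), (1.5, .4)):
        C0 = binomial_price(lambda x: max(x - K, 0.0), S0, N, u, d, i)
        P0 = C0 - S0 + K / (1 + i) ** N   # put-call parity
        print(f"K={K:5.2f} u={u} d={d}  C0={C0:6.2f}  P0={P0:6.2f}")
```

Pricing the put directly, with payoff $(K - x)^+$, should give the same $P_0$ up to rounding, which confirms the parity relation used in the table.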


7.4 Exercises

1. Prove Equation 7.1.

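2. Let $\eta \in \Omega_1 \times \Omega_2 \times \cdots \times \Omega_n$. Show that

$$\mathbb{P}\left(\{\eta\} \times \Omega_{n+1} \times \Omega_{n+2} \times \cdots \times \Omega_N\right) = p^{Y_n(\eta)}q^{n - Y_n(\eta)}.$$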

3. For $n = 1, 2, \dots, N$, define a random variable $Z_n$ in the binomial model by $Z_n(\omega) = \omega_n$. Show that

(a) $Z_n = (u - d)X_n + d$;

(b) $S_n = Z_nZ_{n-1}\cdots Z_1S_0$;

(c) $X_n = (S_n - dS_{n-1})/\left[(u - d)S_{n-1}\right]$.

Conclude from (c) that $\mathcal{F}^S_n = \mathcal{F}^X_n$.

4. Find the probability (in terms of $n$ and $p$) that the price of the stock in the binomial model goes down at least twice during the first $n$ time periods.

For the remaining exercises, assume that 0 < d < 1 + i < u.

5. In the one-step binomial model, the Cox-Ross-Rubinstein formula reduces to

$$C_0 = (1+i)^{-1}\left[(S_0u - K)^+p^* + (S_0d - K)^+q^*\right].$$

(a) Show that if $S_0d < K < S_0u$, then

$$\frac{\partial C_0}{\partial u} > 0 \quad\text{and}\quad \frac{\partial C_0}{\partial d} < 0.$$

Conclude that for $u > K/S_0$ and $d < K/S_0$, $C_0$ increases as the spread $u - d$ increases.

(b) Show that $C_0$, as a function of $(u, d)$, is constant for the range of values $u > 1 + i > d \ge K/S_0$.

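6. Suppose in Example 7.1.1 that $d = 1/u$ and $p = .5$. Show that

$$\mathbb{P}(S_n > S_0) = \mathbb{P}(S_n < S_0) = \begin{cases} 1/2 & \text{if } n \text{ is odd}, \\ \tfrac{1}{2}\left[1 - \binom{n}{n/2}2^{-n}\right] & \text{if } n \text{ is even}. \end{cases}$$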

7. Let $S_0 = \$50.00$, $r = .12$, $u = 1.1$, and $d = .9$, and let the length of a time interval be one day. Find the prices of call and put options that expire in 90 days with strike price (a) $K = \$54.00$, (b) $K = \$47.00$. (Use a spreadsheet with a built-in binomial cdf.)


8. Find the probability that, in the binomial model, a call option finishes in the money.

9. A $k$-run of upticks is a sequence of $k$ consecutive stock price increases not contained in any larger such sequence. Show that if $N/2 \le k < N$, then the probability of a $k$-run of upticks in $N$ time periods is

$$p^k\left[2q + (N - k - 1)q^2\right].$$

10. (Cash-or-nothing call option). Let $A$ be a fixed amount of cash. Show that the cost $V_0$ of a claim with payoff $A\,I_{(K,\infty)}(S_N)$ is

$$(1+i)^{-N}A\,\Psi(m, N, p^*),$$

where $m$ is defined as in Theorem 7.3.1.

11. (Asset-or-nothing call option). Show that the cost $V_0$ of a claim with payoff $S_N\,I_{(K,\infty)}(S_N)$ is $S_0\,\Psi(m, N, \hat p)$, where $m$ and $\hat p$ are defined as in Theorem 7.3.1.

12. Show that a portfolio long in an asset-or-nothing call option (Exercise 11) and short in a cash-or-nothing call option with cash $K$ (Exercise 10) has the same time-$n$ value as a portfolio with a long position in a call option with strike price $K$.

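13. Let $1 \le M < N$. Show that the cost $V_0$ of a claim with payoff $(S_N - S_M)^+$ is

$$S_0\left[\Psi(k, L, \hat p) - (1+i)^{-L}\Psi(k, L, p^*)\right],$$

where

$$L := N - M, \qquad k := \left(\lfloor a \rfloor + 1\right)^+, \qquad a := \frac{L\ln d}{\ln d - \ln u},$$

and $\hat p$ is defined as in Theorem 7.3.1.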

14. Show that the cost of a claim with payoff $S_N(S_N - K)$ is

$$V_0 = S_0^2\left(\frac{v}{1+i}\right)^N - KS_0,$$

where $v = (u + d)(1 + i) - ud$.

15. Show that the cost of a claim with payoff $S_N(S_N - K)^+$ is

$$V_0 = S_0^2\left(\frac{v}{1+i}\right)^N\Psi(m, N, \tilde p) - KS_0\,\Psi(m, N, \hat p),$$

where $m$ and $\hat p$ are defined as in Theorem 7.3.1, $v = (u + d)(1 + i) - ud$, and $\tilde p = p^*u^2/v$.


16. Use Exercises 14 and 15 and the law of one price to find the cost $V_0$ of a claim with payoff $S_N(K - S_N)^+$.

17. Show that the price of a claim with payoff $f(S_m, S_n)$, where $1 \le m < n \le N$, is

$$V_0 = \frac{1}{a^N}\sum_{j=0}^{m}\sum_{k=j}^{n-m+j}\binom{m}{j}\binom{n-m}{k-j}p^{*k}q^{*n-k}f\left(S_0u^jd^{m-j}, S_0u^kd^{n-k}\right),$$

where $a := 1 + i$.

18. Referring to Example 7.1.1 with $n = 100$, $p = .5$, and $d = .8$, use a spreadsheet to find the smallest value of $u$ that results in $\mathbb{P}(S_n > S_0) \approx .85$.

19. Consider a claim with payoff $\left[\tfrac{1}{2}(S_1 + S_N) - K\right]^+$. Use Exercise 17 to show that if $N$ is sufficiently large, specifically,

$$u^{N-1} > \frac{2K - S_0d}{S_0d},$$

then there exist nonnegative integers $k_1$ and $k_2$ less than $N$ such that the price of the claim is

$$\begin{aligned}
V_0 &= \frac{(S_0d - 2K)q^*}{2(1+i)^N}\Psi(k_1, N-1, p^*) + \frac{S_0dq^*}{2(1+i)}\Psi(k_1, N-1, \hat p) \\
&\quad + \frac{(S_0u - 2K)p^*}{2(1+i)^N}\Psi(k_2, N-1, p^*) + \frac{S_0up^*}{2(1+i)}\Psi(k_2, N-1, \hat p),
\end{aligned}$$

where $\hat p$ is defined as in Theorem 7.3.1. Show that $k_1$ is the smallest nonnegative integer $k$ such that $S_0d + S_0u^kd^{N-k} > 2K$, and $k_2$ is the smallest nonnegative integer $k$ such that $S_0u + S_0u^{k+1}d^{N-k-1} > 2K$.

Chapter 8

Conditional Expectation and Discrete-Time Martingales

Conditional expectation generalizes the notion of expectation by taking into account the information provided by a given σ-field. The most important application of this notion is in the definition and construction of martingales. In this chapter, we develop the theory of conditional expectation and discrete-time martingales on a finite probability space. These ideas will be applied in the next chapter to place the binomial model in a broader context, leading to the formulation of more general option valuation models.

We assume throughout the chapter that $\Omega$ is a finite sample space, $\mathcal{F}$ is the σ-field of all subsets of $\Omega$, and $\mathbb{P}$ is a probability measure with $\mathbb{P}(\omega) > 0$ for all $\omega \in \Omega$. Note that any real function $X$ on $\Omega$ is an $\mathcal{F}$-random variable. As usual, the expectation of $X$ with respect to $\mathbb{P}$ is denoted by $\mathbb{E}X$.

8.1 Definition of Conditional Expectation

In Example 2.3.2, we observed that a partition $\mathcal{P}$ of $\Omega$ generates a σ-field consisting of $\emptyset$ and all finite unions of members of $\mathcal{P}$. The following lemma asserts that every σ-field of subsets of $\Omega$ arises in this way.

Lemma 8.1.1. Let $\mathcal{G}$ be a σ-field of subsets of $\Omega$. Then $\mathcal{G}$ is generated by a partition of $\Omega$.

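Proof. Let $B_1, B_2, \dots, B_m$ be the distinct members of $\mathcal{G}$. For each $m$-tuple $\varepsilon = (\varepsilon_1, \varepsilon_2, \dots, \varepsilon_m)$ with $\varepsilon_j = \pm 1$, define $B^\varepsilon = B_1^{\varepsilon_1}B_2^{\varepsilon_2}\cdots B_m^{\varepsilon_m}$ (products denote intersections and $B'$ the complement of $B$), where

$$B_j^{\varepsilon_j} = \begin{cases} B_j & \text{if } \varepsilon_j = 1, \\ B_j' & \text{if } \varepsilon_j = -1. \end{cases}$$

The sets $B^\varepsilon$ are pairwise disjoint, since two distinct $\varepsilon$'s will differ in some coordinate $j$, and the intersection of the corresponding $B^\varepsilon$'s will be contained in $B_jB_j' = \emptyset$. Some of the sets $B^\varepsilon$ are empty, but every $B_j$ is a union of those $B^\varepsilon$ for which $\varepsilon_j = 1$. Denoting the nonempty $B^\varepsilon$'s by $A_1, A_2, \dots, A_n$, we see that $\mathcal{G}$ is generated by the partition $\mathcal{P} = \{A_1, A_2, \dots, A_n\}$.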



Remark 8.1.2. Suppose $(\mathcal{F}_n)_{n=1}^N$ is a filtration of σ-fields on $\Omega$ such that $\mathcal{F}_n$ is generated by the partition $\{A_{n,1}, A_{n,2}, \dots, A_{n,m_n}\}$. Since $\mathcal{F}_n \subseteq \mathcal{F}_{n+1}$, each $A_{n,j}$ is a union of some of the sets $A_{n+1,1}, A_{n+1,2}, \dots, A_{n+1,m_{n+1}}$. Therefore, we can assign to each outcome $\omega \in \Omega$ a unique sequence $(j_1, j_2, \dots, j_N)$ with the property that $\omega \in A_{n,j_n}$ for each $n$. This provides a dynamic interpretation of an abstract experiment. (For a coin toss, the sequence is equivalent to a sequence of heads and tails; see Example 5.1.4.) Figure 8.1 illustrates the idea for the case $N = 3$.

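FIGURE 8.1: ω described by the sequence (2, 4, 5). [Diagram: three successively finer partitions of $\Omega$, namely $\{A_{11}, A_{12}\}$, $\{A_{21}, \dots, A_{24}\}$, and $\{A_{31}, \dots, A_{35}\}$; the outcome $\omega$ lies in $A_{12}$, $A_{24}$, and $A_{35}$.]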

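Lemma 8.1.3. Let $\mathcal{G}$ be a σ-field of subsets of $\Omega$, and let $X$ and $Y$ be $\mathcal{G}$-random variables on $\Omega$. If $\{A_1, A_2, \dots, A_n\}$ is a generating partition for $\mathcal{G}$, then

(i) $X \ge Y \iff \mathbb{E}(XI_{A_j}) \ge \mathbb{E}(YI_{A_j})$ for all $j$; and

(ii) $X = Y \iff \mathbb{E}(XI_{A_j}) = \mathbb{E}(YI_{A_j})$ for all $j$.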

Proof. The necessity of (i) is clear, and (ii) follows from (i). To prove the sufficiency of (i), set $Z = X - Y$ and assume that $\mathbb{E}(ZI_{A_j}) \ge 0$ for all $j$. We show that the set $A = \{Z < 0\}$ is empty.

Suppose to the contrary that $A \ne \emptyset$. Since $A \in \mathcal{G}$, there exists a subset $J \subseteq \{1, 2, \dots, n\}$ such that $A = \bigcup_{j \in J}A_j$. Since the sets $A_j$ are pairwise disjoint,

$$\mathbb{E}(ZI_A) = \sum_{j \in J}\mathbb{E}(ZI_{A_j}) \ge 0.$$

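On the other hand, by definition of $A$,

$$\mathbb{E}(ZI_A) = \sum_{\omega \in \Omega}Z(\omega)I_A(\omega)\mathbb{P}(\omega) = \sum_{\omega \in A}Z(\omega)\mathbb{P}(\omega) < 0.$$

This contradiction shows that $A$ must be empty and hence that $X \ge Y$.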

We are now in a position to prove the existence and uniqueness of conditional expectation.


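Theorem 8.1.4. If $X$ is a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$ and $\mathcal{G}$ is a σ-field contained in $\mathcal{F}$, then there exists a unique $\mathcal{G}$-random variable $Y$ such that

$$\mathbb{E}(I_AY) = \mathbb{E}(I_AX) \quad\text{for all } A \in \mathcal{G}. \tag{8.1}$$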

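Proof. Let $\{A_1, A_2, \dots, A_m\}$ be a partition of $\Omega$ generating $\mathcal{G}$ (Lemma 8.1.1). Define a $\mathcal{G}$-random variable $Y$ on $\Omega$ by

$$Y = \sum_{j=1}^{m}a_jI_{A_j}, \qquad a_j := \frac{\mathbb{E}(I_{A_j}X)}{\mathbb{P}(A_j)}.$$

Since $A_jA_k = \emptyset$ for $j \ne k$,

$$I_{A_k}Y = \sum_{j=1}^{m}a_jI_{A_j}I_{A_k} = a_kI_{A_k},$$

so that $\mathbb{E}(I_{A_k}Y) = a_k\mathbb{P}(A_k) = \mathbb{E}(I_{A_k}X)$. Since any member of $\mathcal{G}$ is a disjoint union of sets $A_k$, (8.1) holds. Uniqueness follows from Lemma 8.1.3.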

Definition 8.1.5. The $\mathcal{G}$-random variable $Y$ of Theorem 8.1.4 is called the conditional expectation of $X$ given $\mathcal{G}$ and is denoted by $\mathbb{E}(X \mid \mathcal{G})$. In the special case that $\mathcal{G} = \sigma(X_1, X_2, \dots, X_n)$, $\mathbb{E}(X \mid \mathcal{G})$ is called the conditional expectation of $X$ given $X_1, X_2, \dots, X_n$ and is denoted by $\mathbb{E}(X \mid X_1, X_2, \dots, X_n)$.

Corollary 8.1.6. If $X, X_1, \dots, X_n$ are random variables on $(\Omega, \mathcal{F}, \mathbb{P})$, then there exists a function $g_X(x_1, x_2, \dots, x_n)$ such that

$$\mathbb{E}(X \mid X_1, X_2, \dots, X_n) = g_X(X_1, X_2, \dots, X_n).$$

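Proof. $\sigma(X_1, X_2, \dots, X_n)$ is generated by the partition consisting of sets of the form

$$A(\mathbf{x}) = \{X_1 = x_1, X_2 = x_2, \dots, X_n = x_n\}, \qquad \mathbf{x} := (x_1, x_2, \dots, x_n). \tag{8.2}$$

Define

$$g_X(\mathbf{x}) = \begin{cases} \dfrac{\mathbb{E}(I_{A(\mathbf{x})}X)}{\mathbb{P}(A(\mathbf{x}))} & \text{if } A(\mathbf{x}) \ne \emptyset, \\ 0 & \text{otherwise}. \end{cases}$$

From the proof of Theorem 8.1.4, we have

$$\mathbb{E}(X \mid X_1, X_2, \dots, X_n) = \sum_{\mathbf{x}}g_X(\mathbf{x})I_{A(\mathbf{x})}.$$

If $\omega \in A(\mathbf{x})$, then $\mathbf{x} = (X_1(\omega), X_2(\omega), \dots, X_n(\omega))$; hence

$$\mathbb{E}(X \mid X_1, X_2, \dots, X_n)(\omega) = g_X(\mathbf{x})I_{A(\mathbf{x})}(\omega) = g_X(X_1(\omega), X_2(\omega), \dots, X_n(\omega)).$$

Since $\Omega$ is the union of the sets $A(\mathbf{x})$, the equation holds for all $\omega \in \Omega$.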


8.2 Examples of Conditional Expectation

The conditional expectation operator averages the values of a random variable $X$, taking into account information provided by a σ-field $\mathcal{G}$. It may be viewed as the best prediction of $X$ given $\mathcal{G}$. The following examples illustrate this idea.

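Example 8.2.1. Let $\mathcal{G} = \{\emptyset, \Omega\}$. Since, obviously, $\mathbb{E}(I_A\mathbb{E}X) = \mathbb{E}(I_AX)$ if either $A = \emptyset$ or $A = \Omega$, we have $\mathbb{E}(X \mid \mathcal{G}) = \mathbb{E}(X)$. Thus, the best prediction of $X$ given the information $\mathcal{G}$, which is to say no information at all, is simply the expected value of $X$.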

Example 8.2.2. If $\mathcal{G}$ is the σ-field consisting of all subsets of $\Omega$, then, trivially, $\mathbb{E}(X \mid \mathcal{G}) = X$: the best prediction of $X$ given all possible information is $X$ itself.

Example 8.2.3. Toss a coin $N$ times and observe the outcome heads $H$ or tails $T$ on each toss. Let $p$ be the probability of heads on a single toss, and set $q = 1 - p$. The sample space for the experiment is $\Omega = \Omega_1 \times \Omega_2 \times \cdots \times \Omega_N$, where $\Omega_n = \{H, T\}$ is the set of outcomes of the $n$th toss. The probability law for the experiment may be expressed as

$$\mathbb{P}(\omega) = p^{H(\omega)}q^{T(\omega)}, \qquad \omega = (\omega_1, \omega_2, \dots, \omega_N),$$

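where $H(\omega)$ denotes the number of heads in $\omega$ and $T(\omega)$ the number of tails. Fix $n < N$. For $\omega \in \Omega$ we shall write

$$\omega = (\omega', \omega''), \qquad \omega' \in \Omega_1 \times \Omega_2 \times \cdots \times \Omega_n, \quad \omega'' \in \Omega_{n+1} \times \cdots \times \Omega_N.$$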

Let $\mathcal{G}_n$ denote the σ-field generated by the sets

$$A_{\omega'} = \{(\omega', \omega'') \mid \omega'' \in \Omega_{n+1} \times \cdots \times \Omega_N\}. \tag{8.3}$$

$\mathcal{G}_n$ represents the information generated by the first $n$ tosses of the coin. We claim that for any random variable $X$,

$$\mathbb{E}(X \mid \mathcal{G}_n)(\omega) = \mathbb{E}(X \mid \mathcal{G}_n)(\omega', \omega'') = \sum_\eta p^{H(\eta)}q^{T(\eta)}X(\omega', \eta), \tag{8.4}$$

where the sum on the right is taken over all $\eta \in \Omega_{n+1} \times \cdots \times \Omega_N$. Equation (8.4) asserts that the best prediction of $X$ given the information provided by the first $n$ tosses (the known) is the average of $X$ over the remaining outcomes (the unknown).

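To verify (8.4), denote the sum on the right by $Y(\omega) = Y(\omega', \omega'')$ and note that, since $Y$ depends only on the first $n$ tosses, $Y$ is $\mathcal{G}_n$-measurable. It therefore suffices to show that $\mathbb{E}(YI_A) = \mathbb{E}(XI_A)$ for the sets $A = A_{\omega'}$ defined in (8.3). Noting that $\sum_{\omega''}p^{H(\omega'')}q^{T(\omega'')} = 1$, we have

$$\begin{aligned}
\mathbb{E}(I_AY) &= \sum_{\omega \in A}Y(\omega)\mathbb{P}(\omega) \\
&= \sum_{\omega''}Y(\omega', \omega'')p^{H(\omega')}q^{T(\omega')}p^{H(\omega'')}q^{T(\omega'')} \\
&= p^{H(\omega')}q^{T(\omega')}\sum_{\omega''}p^{H(\omega'')}q^{T(\omega'')}\sum_\eta p^{H(\eta)}q^{T(\eta)}X(\omega', \eta) \\
&= p^{H(\omega')}q^{T(\omega')}\sum_\eta p^{H(\eta)}q^{T(\eta)}X(\omega', \eta) \\
&= \sum_{\omega \in A}p^{H(\omega)}q^{T(\omega)}X(\omega) \\
&= \mathbb{E}(I_AX),
\end{aligned}$$

as required.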

Example 8.2.4. Consider the geometric binomial price process $S$ with its natural filtration $(\mathcal{F}^S_n)$. We show that for $0 \le n < m \le N$ and any real-valued function $f(x)$,

$$\mathbb{E}\left(f(S_m) \mid \mathcal{F}^S_n\right) = \sum_{j=0}^{k}\binom{k}{j}p^jq^{k-j}f\left(u^jd^{k-j}S_n\right), \qquad k := m - n.$$

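Let $U$ denote the sum on the right. Since $U$ is obviously $\mathcal{F}^S_n$-measurable, it suffices to show that $\mathbb{E}(f(S_m)I_A) = \mathbb{E}(UI_A)$ for all $A \in \mathcal{F}^S_n$. For this, we may assume that $A$ is of the form

$$A = \{\eta\} \times \Omega_{n+1} \times \cdots \times \Omega_N,$$

where $\eta \in \Omega_1 \times \cdots \times \Omega_n$, since these sets generate $\mathcal{F}^S_n$. Noting that

$$\mathbb{P}(A) = \mathbb{P}_1(\eta_1)\mathbb{P}_2(\eta_2)\cdots\mathbb{P}_n(\eta_n) \quad\text{and}\quad \sum_{(\omega_{m+1}, \dots, \omega_N)}\mathbb{P}_{m+1}(\omega_{m+1})\mathbb{P}_{m+2}(\omega_{m+2})\cdots\mathbb{P}_N(\omega_N) = 1,$$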

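we have

$$\begin{aligned}
\mathbb{E}\left[f(S_m)I_A\right] &= \sum_{\omega \in A}f(S_m(\omega))\mathbb{P}(\omega) \\
&= \sum_{\omega \in A}f(\omega_{n+1}\cdots\omega_mS_n(\omega))\mathbb{P}(\omega) \\
&= \mathbb{P}(A)\sum f(\omega_{n+1}\cdots\omega_mS_n(\eta))\mathbb{P}_{n+1}(\omega_{n+1})\cdots\mathbb{P}_m(\omega_m),
\end{aligned}$$

where the sum in the last equality is taken over all sequences $(\omega_{n+1}, \dots, \omega_m)$. Collecting together all terms in the last sum for which the sequence $(\omega_{n+1}, \dots, \omega_m)$ has exactly $j$ $u$'s, $j = 0, 1, \dots, k$, we see that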

$$\mathbb{E}\left(f(S_m)I_A\right) = \mathbb{P}(A)\sum_{j=0}^{k}\binom{k}{j}p^jq^{k-j}f\left(u^jd^{k-j}S_n(\eta)\right) = \mathbb{P}(A)U(\eta).$$

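Similarly, for each $j \le k$,

$$\mathbb{E}\left(I_Af(u^jd^{k-j}S_n)\right) = \sum_{\omega \in A}f(u^jd^{k-j}S_n(\omega))\mathbb{P}(\omega) = \mathbb{P}(A)f(u^jd^{k-j}S_n(\eta));$$

hence

$$\mathbb{E}(I_AU) = \mathbb{P}(A)\sum_{j=0}^{k}\binom{k}{j}p^jq^{k-j}f\left(u^jd^{k-j}S_n(\eta)\right) = \mathbb{P}(A)U(\eta).$$

Therefore, $\mathbb{E}(f(S_m)I_A) = \mathbb{E}(I_AU)$, as required.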

8.3 Properties of Conditional Expectation

In the proofs of the following theorems, we rely on the fact that a $\mathcal{G}$-random variable $Y$ is the conditional expectation of $X$ with respect to $\mathcal{G}$ iff

$$\mathbb{E}(YI_A) = \mathbb{E}(XI_A) \quad\text{for all } A \in \mathcal{G}.$$

The first theorem shows that conditional expectation has properties similar to those of ordinary expectation.

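Theorem 8.3.1. Let $X$ and $Y$ be random variables on $(\Omega, \mathcal{F}, \mathbb{P})$ and let $\mathcal{G}$ be a σ-field contained in $\mathcal{F}$. Then

(i) (unit property) $\mathbb{E}(1 \mid \mathcal{G}) = 1$;

(ii) (linearity) $\mathbb{E}(\alpha X + \beta Y \mid \mathcal{G}) = \alpha\mathbb{E}(X \mid \mathcal{G}) + \beta\mathbb{E}(Y \mid \mathcal{G})$, $\alpha, \beta \in \mathbb{R}$;

(iii) (order property) $X \le Y \implies \mathbb{E}(X \mid \mathcal{G}) \le \mathbb{E}(Y \mid \mathcal{G})$; and

(iv) (absolute value property) $|\mathbb{E}(X \mid \mathcal{G})| \le \mathbb{E}(|X| \mid \mathcal{G})$.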

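Proof. We leave the proof of (i) to the reader. For (ii), let $Z$ denote the $\mathcal{G}$-random variable $\alpha\mathbb{E}(X \mid \mathcal{G}) + \beta\mathbb{E}(Y \mid \mathcal{G})$ and let $A \in \mathcal{G}$. By linearity of expectation,

$$\begin{aligned}
\mathbb{E}(I_AZ) &= \alpha\mathbb{E}\left[I_A\mathbb{E}(X \mid \mathcal{G})\right] + \beta\mathbb{E}\left[I_A\mathbb{E}(Y \mid \mathcal{G})\right] \\
&= \alpha\mathbb{E}(I_AX) + \beta\mathbb{E}(I_AY) \\
&= \mathbb{E}\left[I_A(\alpha X + \beta Y)\right],
\end{aligned}$$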

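verifying (ii). Property (iii) follows from Lemma 8.1.3, since for $A \in \mathcal{G}$,

$$\mathbb{E}\left[I_A\mathbb{E}(X \mid \mathcal{G})\right] = \mathbb{E}(I_AX) \le \mathbb{E}(I_AY) = \mathbb{E}\left[I_A\mathbb{E}(Y \mid \mathcal{G})\right].$$

Part (iv) follows from $\pm\mathbb{E}(X \mid \mathcal{G}) = \mathbb{E}(\pm X \mid \mathcal{G}) \le \mathbb{E}(|X| \mid \mathcal{G})$.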

The next theorem shows that known factors may be moved outside the conditional expectation operator.

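Theorem 8.3.2 (Factor Property). Let $X$ and $Y$ be random variables on $(\Omega, \mathcal{F}, \mathbb{P})$ and let $\mathcal{G}$ be a σ-field contained in $\mathcal{F}$. If $X$ is a $\mathcal{G}$-random variable, then $\mathbb{E}(XY \mid \mathcal{G}) = X\mathbb{E}(Y \mid \mathcal{G})$. In particular, $\mathbb{E}(X \mid \mathcal{G}) = X$.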

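Proof. Since the random variable $X\mathbb{E}(Y \mid \mathcal{G})$ is $\mathcal{G}$-measurable, it suffices to show that $\mathbb{E}\left[I_AX\mathbb{E}(Y \mid \mathcal{G})\right] = \mathbb{E}(I_AXY)$ for all $A \in \mathcal{G}$. Let the range of $X$ be $\{x_1, x_2, \dots, x_n\}$ and set $A_j = \{X = x_j\}$. Then $A_j \in \mathcal{G}$ and

$$I_AX = \sum_{j=1}^{n}x_jI_{AA_j}.$$

By linearity,

$$\mathbb{E}\left[I_AX\mathbb{E}(Y \mid \mathcal{G})\right] = \sum_{j=1}^{n}x_j\mathbb{E}\left[I_{AA_j}\mathbb{E}(Y \mid \mathcal{G})\right] = \sum_{j=1}^{n}x_j\mathbb{E}(I_{AA_j}Y) = \mathbb{E}(I_AXY),$$

which verifies the first assertion of the theorem. The last assertion follows by taking $Y = 1$.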

The next theorem shows that if the information provided by $\mathcal{G}$ is independent of that provided by $X$, then the best predictor of $X$ given $\mathcal{G}$ is the same as when no information is given.

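Theorem 8.3.3 (Independence Property). Let $X$ be a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$ and $\mathcal{G}$ a σ-field contained in $\mathcal{F}$. If $X$ is independent of $\mathcal{G}$, that is, if $X$ and $I_A$ are independent for all $A \in \mathcal{G}$, then $\mathbb{E}(X \mid \mathcal{G}) = \mathbb{E}(X)$.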

Proof. Obviously $\mathbb{E}X$ is $\mathcal{G}$-measurable, and by independence

$$\mathbb{E}(I_AX) = (\mathbb{E}I_A)(\mathbb{E}X) = \mathbb{E}\left[I_A\mathbb{E}(X)\right]$$

for all $A \in \mathcal{G}$.

The following theorem asserts that successive predictions of $X$ based on nested levels of information produce the same result as a single prediction using the least information.

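Theorem 8.3.4 (Iterated Conditioning Property). Let $X$ be a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$ and let $\mathcal{G}$ and $\mathcal{H}$ be σ-fields with $\mathcal{H} \subseteq \mathcal{G} \subseteq \mathcal{F}$. Then

$$\mathbb{E}\left[\mathbb{E}(X \mid \mathcal{G}) \mid \mathcal{H}\right] = \mathbb{E}(X \mid \mathcal{H}).$$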


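Proof. Let $Y = \mathbb{E}(X \mid \mathcal{G})$. We need to show that $\mathbb{E}(Y \mid \mathcal{H}) = \mathbb{E}(X \mid \mathcal{H})$, that is,

$$\mathbb{E}\left[I_A\mathbb{E}(Y \mid \mathcal{H})\right] = \mathbb{E}(I_AX) \quad\text{for all } A \in \mathcal{H}.$$

But, by the defining property of conditional expectation with respect to $\mathcal{H}$, the last equation is simply $\mathbb{E}(I_AY) = \mathbb{E}(I_AX)$, which holds by the defining property of conditional expectation with respect to $\mathcal{G}$, since $A \in \mathcal{H} \subseteq \mathcal{G}$.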

8.4 Discrete-Time Martingales

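Definition 8.4.1. A stochastic process $(M_n) = (M_n)_{n=0}^N$ on $(\Omega, \mathcal{F}, \mathbb{P})$ adapted to a filtration $(\mathcal{F}_n)_{n=0}^N$ is said to be a $\left(\mathbb{P}, (\mathcal{F}_n)\right)$-martingale¹ if

$$\mathbb{E}(M_{n+1} \mid \mathcal{F}_n) = M_n, \quad n = 0, 1, \dots, N-1. \tag{8.5}$$

If there is no possibility of ambiguity, we will drop one or both of the components of the prefix $\left(\mathbb{P}, (\mathcal{F}_n)\right)$. For the special case $\mathcal{F}_n = \sigma(M_0, \dots, M_n)$, we will omit reference to the filtration.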

Remarks 8.4.2. (a) Since $M_n$ is $\mathcal{F}_n$-measurable, we can write (8.5) as

$$\mathbb{E}(M_{n+1} - M_n \mid \mathcal{F}_n) = 0, \quad n = 0, 1, \dots, N-1.$$

This has the following gambling interpretation: Let $M_n$ represent the total winnings of a gambler after the $n$th play of a game consisting of $N$ plays. A fair game requires that the best prediction of the gain $M_{n+1} - M_n$ on the next play, based on the information obtained during the first $n$ plays, be zero.

(b) By (8.5) and the iterated conditioning property, martingales satisfy the following multistep property:

$$\mathbb{E}(M_m \mid \mathcal{F}_n) = M_n, \quad 0 \le n \le m \le N.$$

Example 8.4.3. Let $X_0, X_1, \dots, X_N$ be a sequence of independent random variables on $(\Omega, \mathcal{F}, \mathbb{P})$ with mean 1 and set $M_n = X_0X_1\cdots X_n$. By the factor and independence properties,

$$\begin{aligned}
\mathbb{E}(M_{n+1} - M_n \mid M_0, M_1, \dots, M_n) &= M_n\mathbb{E}(X_{n+1} - 1 \mid M_0, M_1, \dots, M_n) \\
&= M_n\mathbb{E}(X_{n+1} - 1) \\
&= 0.
\end{aligned}$$

Therefore, $(M_n)$ is a martingale.

¹ A martingale may begin at indices other than $n = 0$.


Example 8.4.4. Let $X_1, \dots, X_N$ be a sequence of independent random variables on $(\Omega, \mathcal{F}, \mathbb{P})$ with mean $p$ and set $M_n = X_1 + X_2 + \cdots + X_n - np$. By the independence property,

$$\begin{aligned}
\mathbb{E}(M_{n+1} - M_n \mid M_1, \dots, M_n) &= \mathbb{E}(X_{n+1} - p \mid M_1, \dots, M_n) \\
&= \mathbb{E}(X_{n+1} - p) = 0,
\end{aligned}$$

hence $(M_n)$ is a martingale.

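Example 8.4.5. Let $X$ be a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$ and let $(\mathcal{F}_n)$ be a filtration on $\Omega$. Define $M_n = \mathbb{E}(X \mid \mathcal{F}_n)$, $n = 0, 1, \dots, N$. By the iterated conditioning property,

$$\mathbb{E}(M_{n+1} \mid \mathcal{F}_n) = \mathbb{E}\left[\mathbb{E}(X \mid \mathcal{F}_{n+1}) \mid \mathcal{F}_n\right] = \mathbb{E}(X \mid \mathcal{F}_n) = M_n.$$

Therefore, $(M_n)$ is an $(\mathcal{F}_n)$-martingale.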

The following theorem asserts that reducing the amount of information provided by a filtration preserves the martingale property. (The same is not necessarily true if information is increased.)

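Theorem 8.4.6. Let $(\mathcal{G}_n)$ and $(\mathcal{F}_n)$ be filtrations with $\mathcal{G}_n \subseteq \mathcal{F}_n \subseteq \mathcal{F}$, $n = 0, 1, \dots, N$. If $(M_n)$ is adapted to $(\mathcal{G}_n)$ and is an $(\mathcal{F}_n)$-martingale, then it is also a $(\mathcal{G}_n)$-martingale. In particular, an $(\mathcal{F}_n)$-martingale $M = (M_n)$ is an $(\mathcal{F}^M_n)$-martingale.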

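Proof. This follows from the factor and iterated conditioning properties:

$$\mathbb{E}(M_{n+1} \mid \mathcal{G}_n) = \mathbb{E}\left[\mathbb{E}(M_{n+1} \mid \mathcal{F}_n) \mid \mathcal{G}_n\right] = \mathbb{E}(M_n \mid \mathcal{G}_n) = M_n.$$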

The proof of the next theorem is left to the reader.

Theorem 8.4.7. If $(M_n)$ and $(M'_n)$ are $(\mathcal{F}_n)$-martingales and $\alpha, \alpha' \in \mathbb{R}$, then $(\alpha M_n + \alpha'M'_n)$ is an $(\mathcal{F}_n)$-martingale.


8.5 Exercises

1. Let $X$ and $Y$ be discrete random variables. For $p_X(x) > 0$, define

$$g(x) = \sum_y\frac{p_{X,Y}(x, y)}{p_X(x)}\,y,$$

where the sum is taken over all $y$ for which $p_{X,Y}(x, y) > 0$. If $p_X(x) = 0$, $g(x)$ may be defined arbitrarily. Show that $\mathbb{E}(Y \mid X) = g(X)$.

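2. Let $\mathcal{G}$ be a σ-field contained in $\mathcal{F}$, $X$ a $\mathcal{G}$-random variable, and $Y$ an $\mathcal{F}$-random variable independent of $\mathcal{G}$. Show that $\mathbb{E}(XY) = \mathbb{E}(X)\mathbb{E}(Y)$.

Hint: Condition on $\mathcal{G}$.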

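3. Let $\mathcal{G}$ be a σ-field contained in $\mathcal{F}$, $X$ a $\mathcal{G}$-random variable, and $Y$ an $\mathcal{F}$-random variable with $X - Y$ independent of $\mathcal{G}$. Show that, if either $\mathbb{E}X = 0$ or $\mathbb{E}X = \mathbb{E}Y$, then

$$\mathbb{E}(XY) = \mathbb{E}X^2 \quad\text{and}\quad \mathbb{E}(Y - X)^2 = \mathbb{E}Y^2 - \mathbb{E}X^2.$$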

4. Prove Theorem 8.4.7.

5. Verify the multistep property of Remark 8.4.2(b).

6. Show that if $(M_n)$ is a martingale, then $\mathbb{E}(M_n) = \mathbb{E}(M_0)$, $n = 1, 2, \dots, N$.

7. Let $X = (X_n)$ be a sequence of independent random variables on $(\Omega, \mathcal{F}, \mathbb{P})$ with mean 0 and variance $\sigma^2$, and let $Y_n := X_1 + X_2 + \cdots + X_n$. Show that

$$M_n := Y_n^2 - n\sigma^2, \quad n = 1, 2, \dots, N$$

defines an $(\mathcal{F}^X_n)$-martingale.

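8. Let $X = (X_n)$ be a sequence of independent random variables on $(\Omega, \mathcal{F}, \mathbb{P})$ with

$$\mathbb{P}(X_n = 1) = p \quad\text{and}\quad \mathbb{P}(X_n = -1) = q := 1 - p,$$

and set $Y_n = \sum_{j=1}^{n}X_j$. For $a > 0$ define

$$M_n := e^{aY_n}\left(pe^a + qe^{-a}\right)^{-n}, \quad n = 1, 2, \dots, N.$$

Show that $(M_n)$ is an $(\mathcal{F}^X_n)$-martingale.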

9. Let $(X_n)$, $(Y_n)$, and $(\mathcal F^X_n)$ be as in Exercise 8, with $0 < p < 1$, and let $r := qp^{-1}$. Show that $(r^{Y_n})$ is an $(\mathcal F^X_n)$-martingale.


10. Let $(X_n)$ and $(Y_n)$ be sequences of independent random variables on $(\Omega,\mathcal F,\mathbb P)$, each adapted to a filtration $(\mathcal F_n)$, such that, for each $n \ge 1$, $X_n$ and $Y_n$ are independent of each other and also of $\mathcal F_{n-1}$, where $\mathcal F_0 = \{\emptyset, \Omega\}$. Suppose also that $E(X_n) = E(Y_n) = 0$ for all $n$. Set

$$A_n = X_1 + X_2 + \cdots + X_n \quad\text{and}\quad B_n = Y_1 + Y_2 + \cdots + Y_n.$$

Show that $(A_nB_n)$ is an $(\mathcal F_n)$-martingale.

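11. Let $(A_n)_{n=0}^N$ and $(B_n)_{n=0}^N$ be $(\mathcal F_n)_{n=0}^N$-martingales on $(\Omega,\mathcal F,\mathbb P)$ and let $C_n = A_n^2 - B_n$. Show that

$$E\left[(A_m - A_n)^2\mid\mathcal F_n\right] = E\left[C_m - C_n\mid\mathcal F_n\right], \quad 0 \le n \le m \le N.$$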

12. Let $(X_n)$ be a sequence of independent Bernoulli random variables with parameter $p$ and set $Y_k = X_1 + X_2 + \cdots + X_k$. For all cases find (a) $E(Y_m\mid Y_n)$, (b) $E(X_j\mid Y_n)$, (c) $E(X_k\mid X_j)$, and (d) $E(Y_n\mid X_j)$.

Hint: For (a) and $m < n$, use Exercises 1, 3.12, and 6.19.

13. Let $(M_n)$ be an $(\mathcal F_n)$-martingale on $(\Omega,\mathcal F,\mathbb P)$. Show that

$$E\left[(M_n - M_m)M_k\right] = 0, \quad 0 \le k \le m \le n.$$

14. (Doob decomposition.) Let $(\mathcal F_n)_{n=0}^N$ be a filtration on $(\Omega,\mathcal F,\mathbb P)$ and $(X_n)_{n=0}^N$ an adapted process. Define

$$A_0 = 0, \quad A_n = A_{n-1} + E(X_n - X_{n-1}\mid\mathcal F_{n-1}), \quad n = 1, 2, \dots, N.$$

Show that, with respect to $(\mathcal F_n)$, $(A_n)$ is predictable and $(X_n - A_n)$ is a martingale.


Chapter 9

The Binomial Model Revisited

In this chapter, we give a martingale interpretation of the main results of Section 7.2. This will suggest an approach to option pricing that can be applied to general finite models. We also determine the proper price of an American claim, describe optimal strategies for both the writer and the holder of an American put, and consider the effect of dividends in the binomial model.

9.1 Martingales in the Binomial Model

Recall that in the binomial model the security $S$ is assumed each period to go up by a factor of $u$ with probability $p$ or down by a factor of $d$ with probability $q = 1 - p$. The price of $S$ at time $n$ is given by

$$S_n = S_0u^{Y_n}d^{\,n-Y_n} = S_0d^{\,n}\left(\frac{u}{d}\right)^{Y_n}, \tag{9.1}$$

where $Y_n := X_1 + X_2 + \cdots + X_n$, the $X_j$'s being independent Bernoulli random variables with parameter $p$ on the probability space $(\Omega,\mathcal F,\mathbb P)$ constructed in Section 7.1. We model the flow of information by the filtration $(\mathcal F^S_n)_{n=0}^N$, where

$$\mathcal F^S_n = \sigma(S_0, S_1, \dots, S_n) = \sigma(X_1, \dots, X_n) = \mathcal F^X_n.$$

(See Exercise 7.3.) All martingales considered in this chapter are relative to this filtration. Note that $\mathcal F^S_N = \mathcal F$, the σ-field of all subsets of $\Omega$.

We assume throughout that $0 < d < 1 + i < u$, where $i$ is the interest rate per period. By Corollary 7.2.4, this is equivalent to the property that the binomial model is arbitrage-free. As in Section 7.2, $\mathbb P^*$ denotes the risk-neutral probability measure on $\Omega$ defined by the probability vector

$$(p^*, q^*) := \left(\frac{1 + i - d}{u - d},\ \frac{u - 1 - i}{u - d}\right).$$

Recall that for a stochastic process $(X_n)$, the discounted process $\tilde X$ is defined by $\tilde X_n = (1 + i)^{-n}X_n$. The following two theorems provide the key connection between Chapters 7 and 8.



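Theorem 9.1.1. The discounted stock price process $(\tilde S_n)_{n=0}^N$ is a $\mathbb P^*$-martingale.

Proof. Let $v = u/d$. Since $S_{n+1} = dS_nv^{X_{n+1}}$, the factor and independence properties of conditional expectation imply that

$$\begin{aligned}
E^*(S_{n+1}\mid\mathcal F^S_n) &= dS_nE^*(v^{X_{n+1}}\mid\mathcal F^S_n) = dS_nE^*(v^{X_{n+1}})\\
&= d(p^*v + q^*)S_n = (p^*u + q^*d)S_n\\
&= (1 + i)S_n.
\end{aligned}$$

Dividing by $(1 + i)^{n+1}$, we obtain the martingale property $E^*(\tilde S_{n+1}\mid\mathcal F^S_n) = \tilde S_n$.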
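A quick exact check of the identity $p^*u + q^*d = 1 + i$ that drives this proof may be reassuring. The sketch below uses rational arithmetic with the illustrative values $u = 1.5$, $d = 0.9$, $i = 0.10$ and a hypothetical current price $S_n = 20$; the helper name `risk_neutral_probs` is ours, not the text's.

```python
from fractions import Fraction

def risk_neutral_probs(u, d, i):
    """Return (p*, q*) = ((1 + i - d)/(u - d), (u - 1 - i)/(u - d))."""
    p_star = (1 + i - d) / (u - d)
    return p_star, 1 - p_star

# Exact rational arithmetic, so the identity can be checked with ==.
u, d, i = Fraction(3, 2), Fraction(9, 10), Fraction(1, 10)
p_star, q_star = risk_neutral_probs(u, d, i)

S_n = Fraction(20)                       # hypothetical current price
# E*(S_{n+1} | F_n^S) = (p* u + q* d) S_n
expected_next = (p_star * u + q_star * d) * S_n
# Dividing by (1 + i) recovers S_n: one step of the martingale property.
assert expected_next / (1 + i) == S_n
print(p_star, expected_next)             # 1/3 22
```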

Recall that the value process $V = (V_n)_{n=0}^N$ of a self-financing portfolio process $(\phi, \theta)$ satisfies

$$V_{n+1} = \theta_{n+1}S_{n+1} + (1 + i)\left[V_n - \theta_{n+1}S_n\right], \quad n = 0, 1, \dots, N - 1 \tag{9.2}$$

(Theorem 5.2.5). Moreover, the binomial model is complete: given any contingent claim $H$ there exists a (unique) self-financing portfolio with value process $V$ such that $V_N = H$ (Corollary 7.2.4).

Theorem 9.1.2. The discounted value process $(\tilde V_n := (1 + i)^{-n}V_n)_{n=0}^N$ of a self-financing portfolio is a $\mathbb P^*$-martingale. Thus,

$$V_n = (1 + i)^{n-m}E^*(V_m\mid\mathcal F^S_n), \quad 0 \le n \le m \le N. \tag{9.3}$$

In particular, the time-$n$ value of a contingent claim $H$ is

$$V_n = (1 + i)^{n-N}E^*(H\mid\mathcal F^S_n), \quad 0 \le n \le N.$$

Proof. It is clear from the definition of value process that $\tilde V$ is adapted to the filtration $(\mathcal F^S_n)$. Moreover, from (9.2), the predictability of the process $\theta$, and Theorem 9.1.1, we have

$$\begin{aligned}
E^*(V_{n+1}\mid\mathcal F^S_n) &= \theta_{n+1}E^*(S_{n+1}\mid\mathcal F^S_n) + (1 + i)(V_n - \theta_{n+1}S_n)\\
&= \theta_{n+1}(1 + i)S_n + (1 + i)(V_n - \theta_{n+1}S_n)\\
&= (1 + i)V_n.
\end{aligned}$$

Dividing by $(1 + i)^{n+1}$ shows that $\tilde V$ is a martingale. Equation (9.3) follows from the multistep property, and the last assertion of the theorem is a consequence of (9.3) and Theorem 5.3.3.

Combining Theorem 9.1.2 and Example 8.2.4, we have the following result, which will be needed in Section 9.3.


Corollary 9.1.3. Let $0 \le n < m \le N$. If $V_m = f(S_m)$ for some function $f$, then

$$V_n = (1 + i)^{-k}E^*\left(f(S_m)\mid\mathcal F^S_n\right) = (1 + i)^{-k}\sum_{j=0}^{k}\binom{k}{j}p^{*j}q^{*\,k-j}f\left(u^jd^{\,k-j}S_n\right), \quad k := m - n.$$
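As a sanity check on Corollary 9.1.3, the sketch below evaluates the binomial sum directly and compares it with a period-by-period backward recursion under $\mathbb P^*$. The payoff (a European put with strike 18), the starting price 20, the parameters $u = 1.5$, $d = 0.9$, $i = 0.10$, $k = 4$, and the function names are illustrative assumptions, not part of the text.

```python
from math import comb

def value_from_formula(f, s_n, k, u, d, i):
    """Corollary 9.1.3: (1+i)^{-k} * sum_j C(k,j) p*^j q*^(k-j) f(u^j d^(k-j) s_n)."""
    p = (1 + i - d) / (u - d)            # risk-neutral p*
    q = 1 - p
    total = sum(comb(k, j) * p**j * q**(k - j) * f(u**j * d**(k - j) * s_n)
                for j in range(k + 1))
    return total / (1 + i) ** k

def value_by_backward_induction(f, s_n, k, u, d, i):
    """Same value, stepping back one period at a time under P*."""
    p = (1 + i - d) / (u - d)
    q = 1 - p
    # values[j] = claim value at the node reached from s_n by j upticks in k steps
    values = [f(u**j * d**(k - j) * s_n) for j in range(k + 1)]
    for steps in range(k, 0, -1):
        # node j one period earlier has up-child j+1 and down-child j
        values = [(p * values[j + 1] + q * values[j]) / (1 + i) for j in range(steps)]
    return values[0]

def put(s):
    return max(18.0 - s, 0.0)            # illustrative European put payoff, K = 18

direct = value_from_formula(put, 20.0, 4, 1.5, 0.9, 0.10)
recursive = value_by_backward_induction(put, 20.0, 4, 1.5, 0.9, 0.10)
print(round(direct, 4), round(recursive, 4))
```

The two routines agree up to rounding; the recursion is simply the one-step case $m = n + 1$ of the corollary applied repeatedly.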

9.2 Change of Probability

Theorem 9.1.2 expresses $V_n$ as a conditional expectation relative to the risk-neutral probability measure $\mathbb P^*$. It is also possible to express $V_n$ as a conditional expectation relative to the original probability measure $\mathbb P$. This is the content of the following theorem.

Theorem 9.2.1. There exists a positive random variable $Z$ on $(\Omega,\mathcal F,\mathbb P)$ with $E(Z) = 1$ such that

$$V_n = (1 + i)^{n-m}Z_n^{-1}E(Z_mV_m\mid\mathcal F^S_n), \quad 0 \le n \le m \le N,$$

where $Z_n := E(Z\mid\mathcal F^S_n)$.

We give a proof that may be applied to more general settings. The core of the proof consists of the following three lemmas.

Lemma 9.2.2. There exists a unique positive random variable $Z$ on $(\Omega,\mathcal F,\mathbb P)$ with $EZ = 1$ such that, for any random variable $X$,

$$E^*X = E(XZ). \tag{9.4}$$

In particular,

$$\mathbb P^*(A) = E(I_AZ). \tag{9.5}$$

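Proof. Define $Z$ by

$$Z(\omega) = \frac{\mathbb P^*(\omega)}{\mathbb P(\omega)}.$$

Since $0 < p^* < 1$ (by the assumption $d < 1 + i < u$) and, as usual, $0 < p < 1$, every $\omega$ has positive probability under both $\mathbb P$ and $\mathbb P^*$, so $Z$ is well defined and positive. Then, for any random variable $X$ on $(\Omega,\mathcal F,\mathbb P)$,

$$E(XZ) = \sum_\omega X(\omega)Z(\omega)\mathbb P(\omega) = \sum_\omega X(\omega)\mathbb P^*(\omega) = E^*(X).$$

Taking $X = 1$ gives $EZ = 1$. That $Z$ is unique follows from the observation that if $Z$ is a random variable satisfying (9.4), then, taking $A = \{\omega\}$ in (9.5), we have $\mathbb P^*(\omega) = Z(\omega)\mathbb P(\omega)$.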
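On a small binomial tree the Radon–Nikodym derivative of Lemma 9.2.2 can be tabulated path by path. The sketch below assumes $N = 3$, $S_0 = 10$, $u = 1.2$, $d = 0.8$, $i = 0.05$ and a hypothetical real-world uptick probability $p = 0.6$; it checks (9.4) for one particular random variable $X$ (a call payoff with strike 10) and confirms that $EZ = 1$. All names are illustrative.

```python
from itertools import product

N, S0, u, d, i = 3, 10.0, 1.2, 0.8, 0.05    # illustrative tree
p = 0.6                                      # hypothetical real-world uptick probability
p_star = (1 + i - d) / (u - d)               # risk-neutral uptick probability (0.625)

def path_prob(path, prob_up):
    """Probability of a 'u'/'d' path when each uptick has probability prob_up."""
    ups = path.count("u")
    return prob_up**ups * (1 - prob_up) ** (len(path) - ups)

paths = ["".join(w) for w in product("ud", repeat=N)]
# Radon-Nikodym derivative, path by path: Z(omega) = P*(omega) / P(omega)
Z = {w: path_prob(w, p_star) / path_prob(w, p) for w in paths}

def S_N(w):
    return S0 * u ** w.count("u") * d ** w.count("d")

X = {w: max(S_N(w) - 10.0, 0.0) for w in paths}   # an arbitrary random variable

E_star_X = sum(X[w] * path_prob(w, p_star) for w in paths)
E_XZ = sum(X[w] * Z[w] * path_prob(w, p) for w in paths)
E_Z = sum(Z[w] * path_prob(w, p) for w in paths)
print(E_star_X, E_XZ, E_Z)
```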


The random variable $Z$ in Lemma 9.2.2 is called the Radon–Nikodym derivative of $\mathbb P^*$ with respect to $\mathbb P$ and is denoted by $d\mathbb P^*/d\mathbb P$. It provides a connection between the expectations $E^*$ and $E$. The next lemma shows that $Z$ provides an analogous connection between the corresponding conditional expectations.

Lemma 9.2.3. For any σ-fields $\mathcal H \subseteq \mathcal G \subseteq \mathcal F$ and any $\mathcal G$-random variable $X$,

$$E^*(X\mid\mathcal H) = \frac{E(XZ\mid\mathcal H)}{E(Z\mid\mathcal H)}. \tag{9.6}$$

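Proof. Let $Y = E(Z\mid\mathcal H)$. We show first that $Y > 0$. Let $A = \{Y \le 0\}$. Then $A \in \mathcal H$ and

$$E(I_AY) = E(I_AZ) = E^*(I_A) = \mathbb P^*(A),$$

where we have used the defining property of conditional expectations in the first equality and the defining property of the Radon–Nikodym derivative in the second. Since $I_AY \le 0$, the left side is nonpositive, so $\mathbb P^*(A) = 0$. Because $\mathbb P^*$ assigns positive probability to every outcome, $A = \emptyset$ and $Y > 0$.

To verify Equation (9.6), we show that the $\mathcal H$-random variable $U := Y^{-1}E(XZ\mid\mathcal H)$ has the defining property of conditional expectation with respect to $\mathbb P^*$, namely,

$$E^*(I_AU) = E^*(I_AX) \quad\text{for all } A \in \mathcal H.$$

Using the factor property, we have

$$\begin{aligned}
E^*(I_AU) &= E(I_AUZ) = E\left[E(I_AUZ\mid\mathcal H)\right] = E(I_AUY)\\
&= E\left[I_AE(XZ\mid\mathcal H)\right] = E(I_AXZ) = E^*(I_AX),
\end{aligned}$$

as required.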

The following result is a martingale version of the preceding lemma.

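Lemma 9.2.4. Given a filtration $(\mathcal G_n)_{n=0}^N$ contained in $\mathcal F$, set

$$Z_n = E(Z\mid\mathcal G_n), \quad n = 0, 1, \dots, N.$$

Then $(Z_n)$ is a martingale with respect to $(\mathcal G_n)$, and for any $\mathcal G_m$-random variable $X$,

$$E^*(X\mid\mathcal G_n) = Z_n^{-1}E(XZ_m\mid\mathcal G_n), \quad 0 \le n \le m \le N.$$

Proof. That $(Z_n)$ is a martingale is the content of Example 8.4.5. By the iterated conditioning and factor properties, we have for $m \ge n$

$$E(XZ\mid\mathcal G_n) = E\left[E(XZ\mid\mathcal G_m)\mid\mathcal G_n\right] = E\left[XE(Z\mid\mathcal G_m)\mid\mathcal G_n\right] = E(XZ_m\mid\mathcal G_n).$$

Applying Lemma 9.2.3 with $\mathcal G = \mathcal G_m$ and $\mathcal H = \mathcal G_n$, we see that

$$E^*(X\mid\mathcal G_n) = \frac{E(XZ\mid\mathcal G_n)}{E(Z\mid\mathcal G_n)} = \frac{E(XZ_m\mid\mathcal G_n)}{Z_n}.$$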


For the proof of Theorem 9.2.1, take $X = V_m$ and $(\mathcal G_n) = (\mathcal F^S_n)$ in Lemma 9.2.4 so that

$$E^*(V_m\mid\mathcal F^S_n) = Z_n^{-1}E(V_mZ_m\mid\mathcal F^S_n), \quad 0 \le n \le m \le N.$$

The theorem now follows from Equation (9.3).

Remark 9.2.5. The proofs of Lemmas 9.2.2, 9.2.3, and 9.2.4 are completely general, valid for any finite sample space and any pair of probability measures $\mathbb P$ and $\mathbb P^*$ that are equivalent, that is, satisfy $\mathbb P(\omega) > 0$ if and only if $\mathbb P^*(\omega) > 0$. In this general setting, one also has the following converse to Lemma 9.2.2: Given a positive random variable $Z$ with $EZ = 1$, the equation $\mathbb P^*(A) = E(I_AZ)$ defines a probability measure such that (9.4) holds for any random variable $X$.

9.3 American Claims in the Binomial Model

In Theorem 7.2.1, we constructed a hedge that allows the writer of a European claim to cover her obligation at maturity $N$. In this section we devise a hedging strategy for the writer of an American claim. Such a hedge is more complex, as it must cover the writer's obligation at any time $n \le N$.

We assume the payoff at time $n$ is of the form $f(S_n)$, where $f(x)$ is a nonnegative function. (For example, in the case of an American put, $f(x) = (K - x)^+$.) At time $N$, the portfolio needs to cover the amount $f(S_N)$; hence the value process $(V_n)$ of the hedge must satisfy

$$V_N = f(S_N).$$

At time $N - 1$, there are two possibilities: If the claim is exercised, then the amount $f(S_{N-1})$ must be covered, so, in this case, we need

$$V_{N-1} \ge f(S_{N-1}).$$

If the claim is not exercised, then the portfolio must have a value sufficient to cover the claim $V_N$ at time $N$. By risk-neutral pricing, that value is

$$(1 + i)^{-1}E^*(V_N\mid\mathcal F^S_{N-1}).$$

Therefore, in this case, we should have

$$V_{N-1} \ge (1 + i)^{-1}E^*(V_N\mid\mathcal F^S_{N-1}).$$

We can satisfy both cases in an optimal way by requiring that

$$V_{N-1} = \max\left(f(S_{N-1}),\ (1 + i)^{-1}E^*(V_N\mid\mathcal F^S_{N-1})\right).$$

The same argument may be made at each stage of the process. This leads to the backward recursion formula

$$\begin{aligned}
V_N &= f(S_N),\\
V_n &= \max\left\{f(S_n),\ (1 + i)^{-1}E^*(V_{n+1}\mid\mathcal F^S_n)\right\}, \quad n = N - 1, N - 2, \dots, 0.
\end{aligned}\tag{9.7}$$

The process $V$ so defined may be used to construct a self-financing trading strategy exactly as in the proof of Theorem 7.2.1. Thus, we have

Theorem 9.3.1. The process $V = (V_n)$ defined by (9.7) is the value process for a self-financing portfolio with trading strategy $(\phi, \theta)$, where for $n = 1, 2, \dots, N$ and $\omega = (\omega_1, \omega_2, \dots, \omega_{n-1})$,

$$\theta_n(\omega) = \frac{V_n(\omega, u) - V_n(\omega, d)}{S_n(\omega, u) - S_n(\omega, d)} \quad\text{and}\quad \phi_n(\omega) = (1 + i)^{1-n}\left(V_{n-1}(\omega) - \theta_n(\omega)S_{n-1}(\omega)\right).$$

The portfolio covers the claim at any time $n$, that is, $V_N = f(S_N)$ and $V_n \ge f(S_n)$, $n = 1, 2, \dots, N - 1$. Hence, the proper price of the claim is the initial cost $V_0$ of setting up the portfolio.

Remark 9.3.2. In contrast to the case of a European claim, the discounted value process $\tilde V$ is in general not a martingale. However, it follows from (9.7) that $V_n \ge (1 + i)^{-1}E^*(V_{n+1}\mid\mathcal F^S_n)$, and multiplying this inequality by $(1 + i)^{-n}$ yields

$$\tilde V_n \ge E^*(\tilde V_{n+1}\mid\mathcal F^S_n), \quad 0 \le n < N.$$

A process satisfying such an inequality is called a supermartingale. We will return to this notion in the next section.

The following theorem gives an algorithm for constructing $V$ based directly on the backward recursion scheme (9.7).

Theorem 9.3.3. Let $(v_n)$ be the sequence of functions defined by setting

$$\begin{aligned}
v_N(s) &= f(s),\\
v_n(s) &= \max\left(f(s),\ av_{n+1}(us) + bv_{n+1}(ds)\right), \quad n = N - 1, \dots, 0,
\end{aligned}\tag{9.8}$$

where $a = (1 + i)^{-1}p^*$ and $b = (1 + i)^{-1}q^*$. Then

$$V_n = v_n(S_n), \quad n = 0, 1, \dots, N.$$

In particular, for each $\omega$ and $n < N$,

$$V_n(\omega) = \max\left(f\left(u^kd^{\,n-k}S_0\right),\ av_{n+1}\left(u^{k+1}d^{\,n-k}S_0\right) + bv_{n+1}\left(u^kd^{\,n+1-k}S_0\right)\right),$$

where $k := Y_n(\omega)$ is the number of upticks during the first $n$ time periods.

Proof. Clearly, $V_N = v_N(S_N)$. Suppose $V_{n+1} = v_{n+1}(S_{n+1})$. By Corollary 9.1.3 with $m = n + 1$,

$$(1 + i)^{-1}E^*(V_{n+1}\mid\mathcal F^S_n) = av_{n+1}(uS_n) + bv_{n+1}(dS_n).$$

Substituting this expression into (9.7), we see that $V_n = v_n(S_n)$. The conclusion of the theorem now follows by backward induction.

For small values of $N$, the algorithm (9.8) may be readily implemented on a spreadsheet. The following example illustrates the case $N = 4$.

Example 9.3.4. For $N = 4$, (9.8) may be explicitly rendered as

$$\begin{aligned}
v_4(s) &= f(s),\\
v_3(u^3S_0) &= \max\{f(u^3S_0),\ av_4(u^4S_0) + bv_4(u^3dS_0)\},\\
v_3(u^2dS_0) &= \max\{f(u^2dS_0),\ av_4(u^3dS_0) + bv_4(u^2d^2S_0)\},\\
v_3(ud^2S_0) &= \max\{f(ud^2S_0),\ av_4(u^2d^2S_0) + bv_4(ud^3S_0)\},\\
v_3(d^3S_0) &= \max\{f(d^3S_0),\ av_4(ud^3S_0) + bv_4(d^4S_0)\},\\
v_2(u^2S_0) &= \max\{f(u^2S_0),\ av_3(u^3S_0) + bv_3(u^2dS_0)\},\\
v_2(udS_0) &= \max\{f(udS_0),\ av_3(u^2dS_0) + bv_3(ud^2S_0)\},\\
v_2(d^2S_0) &= \max\{f(d^2S_0),\ av_3(ud^2S_0) + bv_3(d^3S_0)\},\\
v_1(uS_0) &= \max\{f(uS_0),\ av_2(u^2S_0) + bv_2(udS_0)\},\\
v_1(dS_0) &= \max\{f(dS_0),\ av_2(udS_0) + bv_2(d^2S_0)\},\\
v_0(S_0) &= \max\{f(S_0),\ av_1(uS_0) + bv_1(dS_0)\}.
\end{aligned}$$

The price $V_0 = v_0(S_0)$ of an American put may be calculated using this scheme. Table 9.1 gives American put prices $P^a_0$ for $S_0 = \$20.00$, $i = .10$, and various values of $K$, $u$, and $d$.

K ($)    u     d     P^a_0 ($)
18.00    1.5   0.9   0.76
18.00    2.0   0.9   1.49
18.00    1.5   0.6   3.18
18.00    2.0   0.6   4.90
22.00    1.5   0.9   2.43
22.00    2.0   0.9   3.21
22.00    1.5   0.6   5.45
22.00    2.0   0.6   6.79

TABLE 9.1: American put prices for $S_0 = \$20$
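The recursion (9.8) translates directly into a few lines of code on the recombining tree, indexing the nodes at time $n$ by the number $j$ of upticks. The sketch below assumes $N = 4$, as in Example 9.3.4; with $K = 18$, $u = 1.5$, $d = 0.9$ it reproduces the first entry of Table 9.1. The function name is ours.

```python
def american_put_price(S0, K, u, d, i, N):
    """American put price via the backward recursion (9.8) on a recombining tree."""
    p_star = (1 + i - d) / (u - d)
    a = p_star / (1 + i)           # a = (1+i)^{-1} p*
    b = (1 - p_star) / (1 + i)     # b = (1+i)^{-1} q*

    def f(s):
        return max(K - s, 0.0)     # put payoff f(x) = (K - x)^+

    # v[j] = v_N(u^j d^(N-j) S0): at maturity the value equals the payoff
    v = [f(S0 * u**j * d**(N - j)) for j in range(N + 1)]
    for n in range(N - 1, -1, -1):
        # node j at time n has up-child j+1 and down-child j at time n+1
        v = [max(f(S0 * u**j * d**(n - j)), a * v[j + 1] + b * v[j])
             for j in range(n + 1)]
    return v[0]

print(f"{american_put_price(20.0, 18.0, 1.5, 0.9, 0.10, 4):.2f}")   # 0.76
```

The other rows of the table can be computed the same way by changing `K`, `u`, and `d`.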

Additional material on American claims in the binomial model, including a version of the hedge that allows consumption at each time $n$, may be found in [16].


9.4 Stopping Times

We have shown how the writer of an American claim may construct a hedge to cover her obligation at any exercise time $n \le N$. The holder of the claim has a different concern, namely, when to exercise the claim to obtain the largest possible payoff. In this section, we develop the tools needed to determine the holder's optimal exercise time.

It is clear that the optimal exercise time of a claim depends on the price of the underlying asset at that time and therefore must be a random variable. Moreover, its value must be determined using only present or past information. This leads to the formal notion of a stopping time.

Definition 9.4.1. Let $(\mathcal F_n) = (\mathcal F_n)_{n=0}^N$ be a filtration on $\Omega$. An $(\mathcal F_n)$-stopping time is a random variable $\tau$ with values in the set $\{0, 1, \dots, N\}$ such that

$$\{\tau = n\} \in \mathcal F_n, \quad n = 0, 1, \dots, N.$$

If there is no possibility of ambiguity, we omit reference to the filtration.

Note that if $\tau$ is a stopping time, then the set $\{\tau \le n\}$, as a union of the sets $\{\tau = j\} \in \mathcal F_j$, $j \le n$, is a member of $\mathcal F_n$. It follows that $\{\tau \ge n + 1\} = \{\tau \le n\}'$ also lies in $\mathcal F_n$.

Example 9.4.2. The first time a stock falls below a value $a$ or exceeds a value $b$ is described mathematically by the formula

$$\tau(\omega) = \begin{cases}\min\{n \mid S_n(\omega) \in A\} & \text{if } \{n \mid S_n(\omega) \in A\} \ne \emptyset,\\ N & \text{otherwise,}\end{cases}$$

where $A = (-\infty, a) \cup (b, \infty)$. That $\tau$ is a stopping time relative to the natural filtration of $(S_n)$ may be seen from the calculations

$$\{\tau = n\} = \{S_n \in A\} \cap \bigcap_{j=0}^{n-1}\{S_j \notin A\}, \quad n < N,$$

and

$$\{\tau = N\} = \bigcap_{j=0}^{N-1}\{S_j \notin A\}.$$
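The point of Example 9.4.2 is that deciding whether $\tau = n$ requires only $S_0, \dots, S_n$. A minimal sketch, with a price path supplied as a Python list and hypothetical band endpoints $a = 8$, $b = 13$, makes this visible: changing prices after the exit time does not change $\tau$. For contrast it also computes the last time the path lies in $A$, which depends on the entire future path. Both function names are ours.

```python
def first_exit_time(prices, a, b):
    """tau = min{n : S_n in A}, A = (-inf, a) U (b, inf); N if the path never enters A."""
    N = len(prices) - 1
    for n, s in enumerate(prices):
        # Only S_0, ..., S_n have been examined when we decide to stop at n.
        if s < a or s > b:
            return n
    return N

def last_time_in_A(prices, a, b):
    """max{n : S_n in A} (N if A is never visited); this needs the whole path."""
    hits = [n for n, s in enumerate(prices) if s < a or s > b]
    return hits[-1] if hits else len(prices) - 1

path = [10, 12, 9, 7, 11, 6]
print(first_exit_time(path, 8, 13))   # 3, since S_3 = 7 < 8
print(last_time_in_A(path, 8, 13))    # 5, since S_5 = 6 < 8
```

Altering `path[4:]` can change `last_time_in_A` but never `first_exit_time`, which is exactly the distinction the definition draws.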

For a related example, consider a process $(I_n)$, say a stock market index like the S&P 500 or the Nikkei 225, and a filtration $(\mathcal F_n)$ to which both $(S_n)$ and $(I_n)$ are adapted. Define

$$\tau(\omega) = \begin{cases}\min\{n \mid S_n(\omega) > I_n(\omega)\} & \text{if } \{n \mid S_n(\omega) > I_n(\omega)\} \ne \emptyset,\\ N & \text{otherwise.}\end{cases}$$

Then $\tau$ is a stopping time, as may be seen from calculations similar to those above. Such a stopping time could result from an investment decision to sell the stock the first time it exceeds the index value.

It is easy to see that the function

$$\tau(\omega) = \begin{cases}\max\{n \mid S_n(\omega) \in A\} & \text{if } \{n \mid S_n(\omega) \in A\} \ne \emptyset,\\ N & \text{otherwise}\end{cases}$$

is not a stopping time. This is a mathematical formulation of the obvious fact that without foresight (or insider information) an investor cannot determine when the stock's value will lie in a set $A$ for the last time.

The constant function $\tau = m$, where $m$ is a fixed integer in $\{0, 1, \dots, N\}$, is easily seen to be a stopping time. Also, if $\tau$ and $\sigma$ are stopping times, then so is $\tau \wedge \sigma$, where

$$(\tau \wedge \sigma)(\omega) := \min(\tau(\omega), \sigma(\omega)).$$

This follows immediately from the calculation

$$\{\tau \wedge \sigma = n\} = \{\tau = n, \sigma \ge n\} \cup \{\sigma = n, \tau \ge n\}.$$

In particular, $\tau \wedge m$ is a stopping time. This observation leads to the notion of a stopped process, an essential tool in determining the optimal time to exercise a claim. To define such a process, we need the notion of an optional stopping of a process.

Definition 9.4.3. Let $(X_n)_{n=0}^N$ be a stochastic process adapted to a filtration $(\mathcal F_n)_{n=0}^N$ and let $\tau$ be a stopping time. The random variable $X_\tau$ defined by

$$X_\tau = \sum_{j=0}^{N} I_{\{\tau = j\}}X_j$$

is called an optional stopping of the process $X$.

From the definition, $X_\tau(\omega) = X_j(\omega)$ for any $\omega$ for which $\tau(\omega) = j$. It follows immediately that if $X_j \le Y_j$ for all $j$, then $X_\tau \le Y_\tau$ and $\tilde X_\tau \le \tilde Y_\tau$, where

$$\tilde X_\tau := (1 + i)^{-\tau}X_\tau.$$

Definition 9.4.4. For a given stopping time $\tau$, the stochastic process $(X_{\tau\wedge n})_{n=0}^N$ is called the stopped process for $\tau$.

For example, if $\tau$ is the first time the stock's value reaches a specified value $a$, then for any scenario $\omega$ in which this happens, $S_{\tau(\omega)}(\omega) = a$ and the path of the stopped price process is

$$S_0(\omega), S_1(\omega), \dots, S_{\tau(\omega)-1}(\omega), a, a, \dots, a.$$

To find the optimal exercise time for an American put, we shall need the following generalization of a martingale:


Definition 9.4.5. A stochastic process $(M_n)$ adapted to a filtration $(\mathcal F_n)$ is said to be a $(\mathbb P, (\mathcal F_n))$-supermartingale, respectively, $(\mathbb P, (\mathcal F_n))$-submartingale, if

$$E(M_{n+1}\mid\mathcal F_n) \le M_n, \quad\text{respectively,}\quad E(M_{n+1}\mid\mathcal F_n) \ge M_n, \quad n = 0, 1, \dots, N - 1.$$

When there is no ambiguity, we will drop one or both of the components of the prefix $(\mathbb P, (\mathcal F_n))$. If $\mathcal F_n = \sigma(M_0, \dots, M_n)$, we omit reference to the filtration.

Note that $(M_n)$ is a supermartingale if and only if $(-M_n)$ is a submartingale, and $(M_n)$ is a martingale if and only if it is both a supermartingale and a submartingale. Moreover, a supermartingale has the multistep property

$$E(M_m\mid\mathcal F_n) \le M_n, \quad 0 \le n < m \le N,$$

as may be seen by iterated conditioning. Submartingales have the analogous property with the inequality reversed.

For a gambling interpretation, suppose that $M_n$ denotes a gambler's accumulated winnings at the completion of the $n$th game. The submartingale property then asserts that the game favors the player, since the best prediction of his gain $M_{n+1} - M_n$ in the next game, relative to the information accrued in the first $n$ games, is nonnegative. Similarly, the supermartingale property describes a game that favors the house.

Proposition 9.4.7 below implies that a fair (unfair) game is still fair (unfair) when stopped according to a rule that does not require prescience. For its proof, we require the following lemma.

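Lemma 9.4.6. Let $(Y_n)$ be a process adapted to a filtration $(\mathcal F_n)$ and let $\tau$ be a stopping time. Then the stopped process $(Y_{\tau\wedge n})$ is adapted to $(\mathcal F_n)$ and

$$Y_{\tau\wedge(n+1)} - Y_{\tau\wedge n} = (Y_{n+1} - Y_n)I_{\{n+1 \le \tau\}}. \tag{9.9}$$

Proof. Note first that

$$Y_{\tau\wedge(n+1)} = Y_0 + \sum_{j=1}^{n+1}(Y_j - Y_{j-1})I_{\{j \le \tau\}}, \quad 0 \le n < N. \tag{9.10}$$

Indeed, because of the indicator functions, the sum on the right in (9.10) may be written

$$\sum_{j=1}^{\tau\wedge(n+1)}(Y_j - Y_{j-1}),$$

which collapses to $Y_{\tau\wedge(n+1)} - Y_0$. Since the terms on the right in (9.10) are $\mathcal F_{n+1}$-measurable (recall that $\{j \le \tau\} = \{\tau \le j - 1\}' \in \mathcal F_{j-1}$), $(Y_{\tau\wedge n})$ is an adapted process. Subtracting from (9.10) the analogous equation with $n$ replaced by $n - 1$ yields (9.9).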

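Proposition 9.4.7. If $(M_n)$ is an $(\mathcal F_n)$-martingale (supermartingale, submartingale) and if $\tau$ is an $(\mathcal F_n)$-stopping time, then the stopped process $(M_{\tau\wedge n})$ is a martingale (supermartingale, submartingale).

Proof. By Lemma 9.4.6, $(M_{\tau\wedge n})$ is adapted to $(\mathcal F_n)$. Also, from (9.9),

$$M_{\tau\wedge(n+1)} - M_{\tau\wedge n} = (M_{n+1} - M_n)I_{\{n+1 \le \tau\}};$$

hence, by the $\mathcal F_n$-measurability of $I_{\{n+1 \le \tau\}}$,

$$E\left(M_{\tau\wedge(n+1)} - M_{\tau\wedge n}\mid\mathcal F_n\right) = I_{\{n+1 \le \tau\}}E(M_{n+1} - M_n\mid\mathcal F_n). \tag{9.11}$$

The conclusion of the proposition is immediate from (9.11). For example, if $(M_n)$ is a supermartingale, then $E(M_{n+1} - M_n\mid\mathcal F_n) \le 0$; hence $E(M_{\tau\wedge(n+1)}\mid\mathcal F_n) \le M_{\tau\wedge n}$.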
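Combined with Theorem 9.1.1, Proposition 9.4.7 says that the stopped discounted price $(\tilde S_{\tau\wedge n})$ is a $\mathbb P^*$-martingale, so its expectation stays equal to $S_0$ for every $n$ (Exercise 6 of Section 8.5). The sketch below checks this by enumerating all $2^N$ paths, using the exit time of Example 9.4.2 with a hypothetical band $[8, 12]$ and illustrative parameters $S_0 = 10$, $u = 1.25$, $d = 0.8$, $i = 0.05$, $N = 4$.

```python
from itertools import product

S0, u, d, i, N = 10.0, 1.25, 0.8, 0.05, 4
p_star = (1 + i - d) / (u - d)
lower, upper = 8.0, 12.0          # hypothetical exit band, as in Example 9.4.2

def expected_stopped_discounted_price(n):
    """E*[(1+i)^{-(tau ^ n)} S_{tau ^ n}], computed by enumerating all 2^N paths."""
    total = 0.0
    for path in product((u, d), repeat=N):
        prices = [S0]
        for factor in path:
            prices.append(prices[-1] * factor)
        # tau = first time the price leaves [lower, upper]; N if it never does
        tau = next((k for k, s in enumerate(prices) if s < lower or s > upper), N)
        t = min(tau, n)
        prob = 1.0
        for factor in path:
            prob *= p_star if factor == u else 1 - p_star
        total += prob * prices[t] / (1 + i) ** t
    return total

values = [expected_stopped_discounted_price(n) for n in range(N + 1)]
print(values)   # every entry should equal S0 up to rounding
```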

9.5 Optimal Exercise of an American Claim

In Section 9.3 we found that the writer of an American claim can cover her obligations with a hedge whose value process $V$ is given by the backward recursive scheme

$$V_N = H_N, \quad V_n = \max\left\{H_n,\ (1 + i)^{-1}E^*(V_{n+1}\mid\mathcal F^S_n)\right\} = v_n(S_n), \quad 0 \le n < N,$$

where $H_n := f(S_n)$ denotes the payoff of the claim at time $n$ and the functions $v_n$ are defined as in Theorem 9.3.3. In terms of the corresponding discounted processes, we have

$$\tilde V_N = \tilde H_N, \quad \tilde V_n = \max\left\{\tilde H_n,\ E^*(\tilde V_{n+1}\mid\mathcal F^S_n)\right\}, \quad 0 \le n < N, \tag{9.12}$$

which expresses $(\tilde V_n)$ as the so-called Snell envelope of the process $(\tilde H_n)$. In this section, we use the value process $V$ to find the optimal time for the holder to exercise the claim. Specifically, we show that the holder should exercise the claim the first time $H_n = V_n$.

For $m = 0, 1, \dots, N$, let $\mathcal T_m$ denote the set of all $(\mathcal F^S_n)$-stopping times with values in the set $\{m, m + 1, \dots, N\}$. Define $\tau_m \in \mathcal T_m$ by

$$\tau_m = \min\{j \ge m \mid V_j = H_j\} = \min\{j \ge m \mid \tilde V_j = \tilde H_j\}.$$

Note that $\tau_m$ is well defined since $V_N = H_N$.

Recall that, because $\tilde V$ is an $(\mathcal F^S_n)$-supermartingale (Remark 9.3.2), so is the stopped process $(\tilde V_{\tau\wedge n})$, where

$$\tilde V_{\tau\wedge n} = (1 + i)^{-\tau\wedge n}V_{\tau\wedge n}$$

(Proposition 9.4.7). The following lemma asserts that, for the special stopping time $\tau = \tau_m$, much more can be said.

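Lemma 9.5.1. For each $m = 0, 1, \dots, N - 1$, the stopped process $(\tilde V_{\tau_m\wedge n})_{n=m}^N$ is an $(\mathcal F^S_n)_{n=m}^N$-martingale.

Proof. By Lemma 9.4.6, $(\tilde V_{\tau_m\wedge n})$ is adapted to the filtration and

$$\tilde V_{\tau_m\wedge(n+1)} - \tilde V_{\tau_m\wedge n} = (\tilde V_{n+1} - \tilde V_n)I_{\{n+1 \le \tau_m\}}.$$

Therefore, it suffices to show that, for $m \le n < N$,

$$E^*\left[(\tilde V_{n+1} - \tilde V_n)I_{\{n+1 \le \tau_m\}}\mid\mathcal F^S_n\right] = 0.$$

Since $I_{\{n+1 \le \tau_m\}}$ and $\tilde V_n$ are $\mathcal F^S_n$-measurable, the last equation is equivalent to

$$I_{\{n+1 \le \tau_m\}}E^*(\tilde V_{n+1}\mid\mathcal F^S_n) = I_{\{n+1 \le \tau_m\}}\tilde V_n. \tag{9.13}$$

To verify (9.13), fix $\omega \in \Omega$ and consider two cases: If $\tau_m(\omega) < n + 1$, then the indicator functions are zero, hence (9.13) is trivially satisfied. If, on the other hand, $\tau_m(\omega) \ge n + 1$, then $\tilde V_n(\omega) \ne \tilde H_n(\omega)$ (because $m \le n < \tau_m(\omega)$), hence, from (9.12), $\tilde V_n(\omega) = E^*(\tilde V_{n+1}\mid\mathcal F^S_n)(\omega)$. Therefore, (9.13) holds at each $\omega$.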

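The following theorem is the main result of the section. It asserts that, after time $m$, the optimal time to exercise an American claim is $\tau_m$, in the sense that the largest expected discounted payoff, given the available information, occurs at that time.

Theorem 9.5.2. For any $m \in \{0, 1, \dots, N\}$,

$$E^*(\tilde H_{\tau_m}\mid\mathcal F^S_m) = \max_{\tau\in\mathcal T_m}E^*(\tilde H_\tau\mid\mathcal F^S_m) = \tilde V_m.$$

In particular,

$$E^*\tilde H_{\tau_0} = \max_{\tau\in\mathcal T_0}E^*\tilde H_\tau = V_0.$$

Proof. Since $(\tilde V_{\tau_m\wedge n})_{n=m}^N$ is a martingale (Lemma 9.5.1) and $\tilde V_{\tau_m} = \tilde H_{\tau_m}$,

$$\tilde V_m = \tilde V_{\tau_m\wedge m} = E^*(\tilde V_{\tau_m\wedge N}\mid\mathcal F^S_m) = E^*(\tilde V_{\tau_m}\mid\mathcal F^S_m) = E^*(\tilde H_{\tau_m}\mid\mathcal F^S_m).$$

Now let $\tau \in \mathcal T_m$. Since $(\tilde V_{\tau\wedge n})$ is a supermartingale (Proposition 9.4.7) and $\tilde V_\tau \ge \tilde H_\tau$,

$$\tilde V_m = \tilde V_{\tau\wedge m} \ge E^*(\tilde V_{\tau\wedge N}\mid\mathcal F^S_m) = E^*(\tilde V_\tau\mid\mathcal F^S_m) \ge E^*(\tilde H_\tau\mid\mathcal F^S_m).$$

Therefore, $\tilde V_m = \max_{\tau\in\mathcal T_m}E^*(\tilde H_\tau\mid\mathcal F^S_m)$, completing the proof.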
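For a small tree, Theorem 9.5.2 can be checked by brute force. Every $(\mathcal F^S_n)$-stopping time can be described by the set of partial histories (nodes of the non-recombining tree) at which the holder stops, and each such set defines a stopping time; with $N = 3$ there are only $2^7$ sets to try. The sketch below uses illustrative put data $S_0 = 10$, $K = 12$, $u = 1.5$, $d = 0.6$, $i = 0.1$ and compares the best expected discounted payoff over all these rules with $V_0$ from the recursion (9.8). The encoding of histories as strings of `'u'`/`'d'` is our own choice.

```python
from itertools import product

S0, K, u, d, i, N = 10.0, 12.0, 1.5, 0.6, 0.1, 3
p_star = (1 + i - d) / (u - d)

def payoff(s):
    return max(K - s, 0.0)                 # American put payoff f(s) = (K - s)^+

def price(history):
    """S_n after a partial history, e.g. 'ud' -> S0 * u * d."""
    return S0 * u ** history.count("u") * d ** history.count("d")

def prob(path):
    """Risk-neutral probability P*(path) of a full path."""
    ups = path.count("u")
    return p_star ** ups * (1 - p_star) ** (len(path) - ups)

def v(history):
    """V_n at the node reached by `history`, via the recursion (9.8)."""
    if len(history) == N:
        return payoff(price(history))
    cont = (p_star * v(history + "u") + (1 - p_star) * v(history + "d")) / (1 + i)
    return max(payoff(price(history)), cont)

# Decision nodes: partial histories of length 0, ..., N-1 (stopping at N is forced).
histories = ["".join(h) for n in range(N) for h in product("ud", repeat=n)]
paths = ["".join(w) for w in product("ud", repeat=N)]

def expected_discounted_payoff(stop_set):
    """E*[(1+i)^(-tau) f(S_tau)] for tau = first n with w[:n] in stop_set, else N."""
    total = 0.0
    for w in paths:
        tau = next((n for n in range(N) if w[:n] in stop_set), N)
        total += prob(w) * payoff(price(w[:tau])) / (1 + i) ** tau
    return total

V0 = v("")
best = max(
    expected_discounted_payoff({h for h, stop in zip(histories, mask) if stop})
    for mask in product((False, True), repeat=len(histories))
)
print(V0, best)   # the two numbers agree, as Theorem 9.5.2 predicts
```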

Remark 9.5.3. For the case $m = 0$, Theorem 9.5.2 asserts that it is optimal to exercise an American claim the first time $f(S_n) = v_n(S_n)$, where

$$\begin{aligned}
v_N(s) &= f(s),\\
v_n(s) &= \max\left[f(s),\ av_{n+1}(us) + bv_{n+1}(ds)\right], \quad n = 0, 1, \dots, N - 1
\end{aligned}$$

(see Theorem 9.3.3). This leads to the following simple algorithm, which may be used to find $\tau_0(\omega)$ for any scenario $\omega = (\omega_1, \omega_2, \dots, \omega_N)$ (here each $\omega_j \in \{u, d\}$ also denotes the corresponding price factor):

if $f(S_0) = V_0$, then $\tau_0(\omega) = 0$;
else if $f(S_0\omega_1) = v_1(S_0\omega_1)$, then $\tau_0(\omega) = 1$;
else if $f(S_0\omega_1\omega_2) = v_2(S_0\omega_1\omega_2)$, then $\tau_0(\omega) = 2$;
⋮
else if $f(S_0\omega_1\cdots\omega_{N-1}) = v_{N-1}(S_0\omega_1\cdots\omega_{N-1})$, then $\tau_0(\omega) = N - 1$;
else $\tau_0(\omega) = N$.

The algorithm also gives the stopped scenarios for which $\tau_0 = 1$, $\tau_0 = 2$, and so forth. For small values of $N$, the algorithm is readily implemented on a spreadsheet by comparing the values of $f$ and $v$ along paths. The next example illustrates this for the case $N = 4$.

Example 9.5.4. Consider an American put that matures in four periods, where $S_0 = \$10.00$ and $i = .1$. Table 9.2 gives optimal exercise scenarios and payoffs (displayed parenthetically) for various values of $K$, $u$, and $d$. The fourth column gives the prices $P^a_0$ of the put. In row 2, for example, we see that the claim should be exercised after one time unit if the stock first goes down, after two time units if the stock goes up then down, and after three time units if the stock goes up twice in succession then down.

K    u    d     P^a_0 ($)   Optimal stopping scenarios and payoffs
20   3    0.3   13.49       d (17.00); udd (17.30); uudd or udud (11.90)
20   2    0.3   11.77       d (17.00); ud (14.00); uud (8.00)
12   3    0.3    7.01       d (9.00); udd (9.30); uudd or udud (3.90)
12   3    0.6    5.13       dd (8.40); dudd or uddd (5.52)
12   2    0.3    6.05       d (9.00); udd (10.20); uudd or udud (8.40)
12   2    0.6    4.04       d (6.00); udd (4.80)
8    3    0.3    4.07       dd (7.10); dud or udd (5.30)
8    3    0.6    2.50       ddd (5.84); uddd, dudd, or ddud (1.52)

TABLE 9.2: Put Prices and Stopping Scenarios

Scenarios missing from the table are either not optimal or result in zero payoffs. For example, in row 1, it is never optimal to exercise at time 2, and the missing optimal scenarios uuu, uudu, uduu all have zero payoffs. In row 4, it is never optimal to exercise at time 1, and the missing optimal scenarios uu, udu, duu, uddu, dudu all have zero payoffs. Note that if an optimal scenario $\omega_1, \omega_2, \dots, \omega_n$ has a zero payoff at time $n$, then there is no hope of ever obtaining a nonzero payoff: all later scenarios $\omega_1, \omega_2, \dots, \omega_n, \omega_{n+1}, \dots$ will also have zero payoffs (see Exercise 5).
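The algorithm of Remark 9.5.3 is easy to script. The sketch below builds $f(S_n)$ and $v_n(S_n)$ on the recombining tree for an American put and then walks along a scenario until the two agree. With the data of row 2 of Table 9.2 ($K = 20$, $u = 2$, $d = .3$, $S_0 = 10$, $i = .1$) it reproduces the listed price and exercise times. The function name and the string encoding of scenarios are our own choices.

```python
def optimal_exercise_time(scenario, S0, K, u, d, i):
    """Return (tau_0(omega), V_0) for an American put along a scenario such as 'udud'."""
    N = len(scenario)
    p_star = (1 + i - d) / (u - d)
    a, b = p_star / (1 + i), (1 - p_star) / (1 + i)

    def f(s):
        return max(K - s, 0.0)

    # intrinsic[n][j] = f(S_n) and value[n][j] = v_n(S_n) at the node with j upticks
    intrinsic = [[f(S0 * u**j * d**(n - j)) for j in range(n + 1)] for n in range(N + 1)]
    value = [None] * (N + 1)
    value[N] = intrinsic[N][:]
    for n in range(N - 1, -1, -1):
        # max() returns its first argument on ties, so value == intrinsic exactly
        # whenever exercising is optimal
        value[n] = [max(intrinsic[n][j], a * value[n + 1][j + 1] + b * value[n + 1][j])
                    for j in range(n + 1)]

    ups = 0                                    # upticks seen so far along the scenario
    for n in range(N):
        if intrinsic[n][ups] == value[n][ups]:  # first time f(S_n) = v_n(S_n)
            return n, value[0][0]
        if scenario[n] == "u":
            ups += 1
    return N, value[0][0]                      # V_N = H_N, so stop at maturity at the latest

# Row 2 of Table 9.2
for scenario in ("dudu", "uddu", "uudd"):
    tau, price = optimal_exercise_time(scenario, 10.0, 20.0, 2.0, 0.3, 0.10)
    print(scenario, tau, round(price, 2))   # dudu 1 11.77 / uddu 2 11.77 / uudd 3 11.77
```

Running the same function on `"uuuu"` stops at time 3 with a zero payoff, matching the remark above that such scenarios are omitted from the table.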


For additional material regarding stopping times and American claims in the binomial model, including the case of path-dependent payoffs, see, for example, [16].

9.6 Dividends in the Binomial Model

So far in this chapter, we have assumed that our stock $S$ pays no dividends. In this case, the binomial price process $(S_n)$ satisfies the recursion equation

$$S_{n+1} = Z_{n+1}S_n, \quad Z_{n+1} := d\left(\frac{u}{d}\right)^{X_{n+1}},$$

where $(X_n)$ is the Bernoulli process defined in Section 7.1. The value process $V$ of a self-financing portfolio based on the stock may then be expressed as

$$V_{n+1} = \theta_{n+1}Z_{n+1}S_n + (1 + i)(V_n - \theta_{n+1}S_n), \quad n = 0, 1, \dots, N - 1.$$

Now suppose that at each of the times $n = 1, 2, \dots, N$ the stock pays a dividend that is a fraction $\delta_n \in (0, 1)$ of the value of $S$ at that time. We assume that $\delta_n$ is a random variable and that the process $(\delta_n)_{n=1}^N$ is adapted to the price process filtration. An arbitrage argument shows that after the dividend is paid the value of the stock is reduced by exactly the amount of the dividend (see Section 4.8). Thus, at time $n + 1$, just after payment of the dividend $\delta_{n+1}Z_{n+1}S_n$, the value of the stock becomes

$$S_{n+1} = (1 - \delta_{n+1})Z_{n+1}S_n, \quad n = 0, 1, \dots, N - 1. \tag{9.14}$$

Since dividends contribute to the portfolio, the value process must satisfy

$$\begin{aligned}
V_{n+1} &= \theta_{n+1}S_{n+1} + (1 + i)(V_n - \theta_{n+1}S_n) + \theta_{n+1}\delta_{n+1}Z_{n+1}S_n\\
&= \theta_{n+1}(1 - \delta_{n+1})Z_{n+1}S_n + (1 + i)(V_n - \theta_{n+1}S_n) + \theta_{n+1}\delta_{n+1}Z_{n+1}S_n\\
&= \theta_{n+1}Z_{n+1}S_n + (1 + i)(V_n - \theta_{n+1}S_n).
\end{aligned}\tag{9.15}$$

Thus, the value process in the dividend-paying case satisfies the same recursion equation as in the non-dividend-paying case. Since the proof of Theorem 7.2.1 relies only on this equation, the conclusion of that theorem holds in the dividend-paying case as well. In particular, given a European claim $H$, there exists a unique self-financing trading strategy $(\phi, \theta)$ with value process $V$ such that $V_N = H$.

One easily checks that in the dividend-paying case the discounted price process is no longer a martingale. (In this connection, see Exercise 6.) Nevertheless, as in the non-dividend-paying case, we have


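Theorem 9.6.1. The discounted value process $(\tilde V_n := (1 + i)^{-n}V_n)_{n=0}^N$ of a self-financing portfolio $(\phi, \theta)$ based on a risk-free bond and the dividend-paying stock is a $(\mathbb P^*, (\mathcal F^S_n))$-martingale. In particular,

$$V_n = (1 + i)^{n-m}E^*(V_m\mid\mathcal F^S_n), \quad 0 \le n \le m \le N, \tag{9.16}$$

and the time-$n$ value of a contingent claim $H$ is

$$V_n = (1 + i)^{n-N}E^*(H\mid\mathcal F^S_n), \quad 0 \le n \le N.$$

Proof. Using (9.15) and noting that $\theta_{n+1}$, $S_n$, and $V_n$ are $\mathcal F^S_n$-measurable, we have

$$E^*(V_{n+1}\mid\mathcal F^S_n) = \theta_{n+1}S_nE^*(Z_{n+1}\mid\mathcal F^S_n) + (1 + i)(V_n - \theta_{n+1}S_n).$$

Since $Z_{n+1}$ is independent of $\mathcal F^S_n$,

$$E^*(Z_{n+1}\mid\mathcal F^S_n) = E^*(Z_{n+1}) = up^* + dq^* = 1 + i. \tag{9.17}$$

Therefore, $E^*(V_{n+1}\mid\mathcal F^S_n) = (1 + i)V_n$, and dividing by $(1 + i)^{n+1}$ shows that $\tilde V$ is a martingale.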
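The only model-specific fact used in this proof is (9.17). The sketch below checks it in exact arithmetic for illustrative parameters $u = 2$, $d = 1/2$, $i = 1/4$ and a hypothetical constant dividend fraction $\delta = 1/10$. It also shows why the discounted ex-dividend price alone drifts downward while price plus dividend does not, and replays one step of the proof for hypothetical holdings $\theta$ and value $V_n$.

```python
from fractions import Fraction

u, d, i = Fraction(2), Fraction(1, 2), Fraction(1, 4)   # illustrative parameters
delta = Fraction(1, 10)                                 # hypothetical constant dividend fraction
p_star = (1 + i - d) / (u - d)
q_star = 1 - p_star

# (9.17): E*(Z_{n+1}) = u p* + d q* = 1 + i
EZ = u * p_star + d * q_star
assert EZ == 1 + i

S_n = Fraction(100)                                     # hypothetical time-n price
# Ex-dividend price (9.14): E*(S_{n+1} | F_n) = (1 - delta) E*(Z_{n+1}) S_n
E_next_ex_div = (1 - delta) * EZ * S_n
# Discounted, it drifts down by the factor (1 - delta), so it is not a martingale.
assert E_next_ex_div / (1 + i) == (1 - delta) * S_n
# Adding back the expected dividend delta * E*(Z_{n+1}) S_n restores the identity.
dividend = delta * EZ * S_n
assert (E_next_ex_div + dividend) / (1 + i) == S_n

# One step of the proof of Theorem 9.6.1, using (9.15).
theta, V_n = Fraction(3, 5), Fraction(70)
E_V_next = theta * S_n * EZ + (1 + i) * (V_n - theta * S_n)
assert E_V_next == (1 + i) * V_n
print(p_star, E_next_ex_div, E_V_next)
```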

9.7 The General Finite Market Model

Many of the ideas in the preceding sections carry over to the case of a general stock price process $S = (S_n)_{n=0}^N$ on an arbitrary finite probability space $(\Omega,\mathcal F,\mathbb P)$, where, as in the binomial model, $\mathcal F$ is the set of all subsets of $\Omega$. Here, the stock is no longer restricted to only two movements at time $n$. Martingales may be used effectively to describe option valuation in this general setting, as illustrated by the following theorem. For its statement, recall that probability measures $\mathbb P^*$ and $\mathbb P$ are equivalent if the measures are positive at exactly the same outcomes $\omega$ (see Remark 9.2.5).

Theorem 9.7.1. If the discounted general price process $(\tilde S_n)$ is a $\mathbb P^*$-martingale for some probability measure $\mathbb P^*$ equivalent to $\mathbb P$, then the discounted value process $(\tilde V_n)$ for any self-financing portfolio is also a $\mathbb P^*$-martingale. In particular, the time-$n$ value of a European claim $H$ with $V_N = H$ is

$$V_n = (1 + i)^{n-N}E^*(H\mid\mathcal F^S_n), \quad 0 \le n \le N,$$

and the fair price of the claim is $V_0 = (1 + i)^{-N}E^*(H)$.

Proof. The proof of the first assertion of the theorem is the same as that of Theorem 9.1.2, as it depends only on the characterization of self-financing portfolios given in (9.2) and the martingale property of $(\tilde S_n)$.

To establish the second assertion it suffices, by Theorem 5.3.3, to show that the existence of $\mathbb P^*$ implies that the market is arbitrage-free. To this end, consider any trading strategy with value process $V$ satisfying $V_0 = 0$ and $\mathbb P(V_N \ge 0) = 1$. Since $\mathbb P^*$ is equivalent to $\mathbb P$, $\mathbb P^*(\omega) = 0$ whenever $V_N(\omega) < 0$; combined with the martingale property for $\tilde V$, this gives

$$\sum_{V_N(\omega) > 0}V_N(\omega)\mathbb P^*(\omega) = E^*(V_N) = (1 + i)^NE^*(\tilde V_N) = (1 + i)^N\tilde V_0 = 0.$$

Since the terms in the sum are nonnegative, each of them is zero, so $V_N(\omega) > 0$ implies that $\mathbb P^*(\omega) = 0$ and hence also $\mathbb P(\omega) = 0$. But then,

$$\mathbb P(V_N > 0) = \sum_{V_N(\omega) > 0}\mathbb P(\omega) = 0.$$

Thus, the market is arbitrage-free.

The proof of Theorem 9.7.1 showed that the existence of a probability measure $\mathbb P^*$ with the stated properties implies that the market is arbitrage-free. The converse of this result is also true, although not as easy to prove. The following theorem summarizes these results, providing a solution to part of the general option-pricing problem.

Theorem 9.7.2 (First Fundamental Theorem of Asset Pricing). The market is arbitrage-free if and only if the discounted price process of the stock is a $\mathbb P^*$-martingale for some probability measure $\mathbb P^*$ equivalent to $\mathbb P$.

The remaining part of the option-pricing problem is to determine conditions under which an arbitrage-free market is complete. The solution is given by the following theorem.

Theorem 9.7.3 (Second Fundamental Theorem of Asset Pricing). A market is arbitrage-free and complete if and only if the discounted price process of the stock is a $\mathbb P^*$-martingale for a unique probability measure $\mathbb P^*$ equivalent to $\mathbb P$.

Proofs of these theorems and a detailed discussion of finite market models may be found in [2, 4].
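A one-period trinomial model is perhaps the simplest finite market in which Theorem 9.7.3 has bite. In the sketch below (all numbers hypothetical) the stock moves by a factor $u$, $m$, or $d$. The equivalent martingale measures form a one-parameter family indexed by $t = \mathbb P^*(\text{middle move})$, so by Theorem 9.7.2 the market is arbitrage-free, while by Theorem 9.7.3 it is not complete. Consistently with this, two members of the family assign different prices to the same call, which could not happen for a replicable claim.

```python
from fractions import Fraction

# Hypothetical one-period trinomial market: S_1 = S_0 * factor, factor in {u, m, d}.
S0, K, i = Fraction(100), Fraction(100), Fraction(1, 20)
u, m, d = Fraction(6, 5), Fraction(1), Fraction(4, 5)

def risk_neutral(t):
    """(p_u, p_m, p_d) with p_m = t solving p_u + p_m + p_d = 1 and p_u u + p_m m + p_d d = 1 + i."""
    p_u = ((1 + i) - t * m - (1 - t) * d) / (u - d)
    p_d = 1 - t - p_u
    return p_u, t, p_d

def is_equivalent(probs):
    """Equivalent to any P charging all three moves iff every entry is positive."""
    return all(q > 0 for q in probs)

def call_price(t):
    """(1+i)^{-1} E*[(S_1 - K)^+] under the measure with p_m = t."""
    probs = risk_neutral(t)
    payoffs = [max(S0 * x - K, 0) for x in (u, m, d)]
    return sum(q * h for q, h in zip(probs, payoffs)) / (1 + i)

for t in (Fraction(1, 4), Fraction(1, 2)):
    probs = risk_neutral(t)
    # Martingale condition for the discounted price: E*(S_1) = (1 + i) S_0.
    assert sum(q * S0 * x for q, x in zip(probs, (u, m, d))) == (1 + i) * S0
    print(t, probs, is_equivalent(probs), call_price(t))
```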


9.8 Exercises

The following exercises refer to a stock $S$ with geometric binomial price process $(S_n)$.

1. Show that the following are stopping times:

(a) $\tau_a :=$ the first time the stock price exceeds the average of all of its previous values;

(b) $\tau_b :=$ the first time the stock price exceeds all of its previous values;

(c) $\tau_c :=$ the first time the stock price exceeds at least one of its previous values.

2. Show that the first time the stock price increases twice in succession is a stopping time.

3. Following the format of Example 9.5.4, find the prices and optimal exercise time scenarios of an American claim with payoffs given by the function $f(x) = x(K - x)^+$. Use the data $S_0 = \$10.00$, $K = \$12.00$, $i = .1$, $u = 1.9$, and $d = .3$.

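4. Referring to Section 9.2, define

$$Z_n = \left(\frac{p^*}{p}\right)^{Y_n}\left(\frac{q^*}{q}\right)^{n-Y_n},$$

so that, in the notation of Lemma 9.2.2, $Z_N = Z$. Use the fact that $(\mathcal F^S_n) = (\mathcal F^X_n)$ to show directly that

$$E(Z_m\mid\mathcal F^S_n) = Z_n, \quad 1 \le n \le m.$$

5. Let $\omega \in \Omega$ and $n = \tau_0(\omega)$. Show that if $f(S_n(\omega)) = 0$, then $f(S_k(\omega)) = 0$ for all $k \ge n$.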

6. Referring to Section 9.6, assume that the process $(\delta_n)$ is predictable with respect to the price process filtration. Define

$$\xi_n := \prod_{j=1}^{n}(1 - \delta_j), \quad n = 1, 2, \dots, N,$$

with $\xi_0 := 1$. Show that the process

$$\hat S_n := (1 + i)^{-n}\xi_n^{-1}S_n, \quad 0 \le n \le N,$$

is a $\mathbb P^*$-martingale.


7. Referring to Section 9.6, assume that $\delta_n = \delta$ for each $n$, where $0 < \delta < 1$ is a constant.

(a) Prove the dividend-paying analog of Corollary 9.1.3, namely,

$$V_n = a^{n-m}\sum_{j=0}^{m-n}\binom{m-n}{j}p^{*j}q^{*\,m-n-j}f\left(b^{m-n}u^jd^{\,m-n-j}S_n\right),$$

where $a := 1 + i$, $b := 1 - \delta$, and $0 \le n \le m \le N$.

(b) Use (a) to show that the cost $C_0$ of a call option in the dividend-paying case is

$$C_0 = S_0(1 - \delta)^N\Psi(m, N, \hat p) - (1 + i)^{-N}K\Psi(m, N, p^*),$$

where $\hat p = (1 + i)^{-1}p^*u$ and $m$ is the smallest nonnegative integer for which $S_0(1 - \delta)^Nu^md^{\,N-m} > K$.

Chapter 10

Stochastic Calculus

Stochastic calculus extends classical differential and integral calculus to functions with a random component arising from indeterminacy or system noise. The fundamental construct is the Itô integral, whose description and analysis, as well as an explication of its role in solving stochastic differential equations (SDEs), are the main goals of this chapter. The principal tool used in determining the solution of an SDE is the Itô–Doeblin formula, a generalization of the chain rule of Newtonian calculus. To put the theory in perspective, we begin with a brief discussion of classical differential equations.

10.1 Differential Equations

An ordinary differential equation (ODE) is an equation involving an unknown function and its derivatives. A first order ODE with initial condition is of the form

$$x' = f(t, x), \quad x(0) = x_0, \tag{10.1}$$

where $f$ is a continuous function of $(t, x)$ and $x_0$ is a given value. The variable $t$ may be thought of as the time variable and $x$ the space variable. A solution of (10.1) is a differentiable function $x = x(t)$ satisfying (10.1) on some open interval $I$ containing 0. Equation (10.1) is frequently written in differential form as

$$dx = f(t, x)\,dt, \quad x(0) = x_0.$$

Explicit solutions of (10.1) are possible only in special cases. For example, if

$$f(t, x) = \frac{h(t)}{g(x)},$$

then (10.1) may be written

$$g(x)x'(t) = h(t), \quad x(0) = x_0;$$

hence a solution $x(t)$ must satisfy $G(x(t)) = H(t) + c$, where $G(x)$ and $H(t)$ are antiderivatives of $g(x)$ and $h(t)$, respectively, and $c$ is an arbitrary constant. The initial condition $x(0) = x_0$ may then be used to determine $c$. The result may be obtained formally by writing the differential equation in separated form as $g(x)\,dx = h(t)\,dt$ and then integrating.

Example 10.1.1. The differential equation $x'(t) = 2t\sec(x(t))$, $x(0) = x_0$, has separated form $\cos x\, dx = 2t\, dt$, which integrates to $\sin x = t^2 + c$, $c = \sin x_0$. The solution may be written $x(t) = \sin^{-1}(t^2 + \sin x_0)$, which is valid for $x_0 \in (-\pi/2, \pi/2)$ and for $t$ sufficiently near 0.
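As a quick numerical sanity check (not part of the original text), the closed-form solution can be compared with a standard numerical integrator. The sketch below uses the classical fourth-order Runge-Kutta method, a technique chosen here only for convenience, with the hypothetical values $x_0 = 0.3$ and $t = 0.5$ (so that $t^2 + \sin x_0 < 1$).

```python
import math


def rk4(f, t0, x0, t1, steps=1000):
    """Classical fourth-order Runge-Kutta for x' = f(t, x) on [t0, t1]."""
    h = (t1 - t0) / steps
    t, x = t0, x0
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h * k1 / 2)
        k3 = f(t + h / 2, x + h * k2 / 2)
        k4 = f(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return x


def rhs(t, x):
    """Right-hand side of Example 10.1.1: x'(t) = 2t sec(x(t))."""
    return 2 * t / math.cos(x)


def closed_form(t, x0):
    """x(t) = arcsin(t^2 + sin x0), valid while |t^2 + sin x0| < 1."""
    return math.asin(t * t + math.sin(x0))


x0, t1 = 0.3, 0.5
print(rk4(rhs, 0.0, x0, t1), closed_form(t1, x0))
```

The two printed values should agree to many decimal places; choosing $t$ so large that $t^2 + \sin x_0 \ge 1$ would leave the interval on which the solution exists.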

Example 10.1.2. Let $x(t)$ be the value at time $t$ of an account earning interest at the variable rate $r(t)$. For small $\Delta t$, the amount $\Delta x = x(t+\Delta t) - x(t)$ earned over the time interval $[t, t+\Delta t]$ is approximately $x(t)r(t)\Delta t$. This leads to the initial value problem

$$dx(t) = x(t)r(t)\,dt, \qquad x(0) = x_0,$$

where $x_0$ is the deposit. The solution is $x(t) = x_0 e^{R(t)}$, where $R(t) = \int_0^t r(\tau)\, d\tau$.
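The solution can be illustrated by stepping the approximation $\Delta x \approx x(t)r(t)\Delta t$ forward on a fine grid and comparing the result with $x_0e^{R(t)}$. The rate $r(t) = 0.03 + 0.01t$, the deposit, and the horizon below are hypothetical choices made only for illustration.

```python
import math


def account_value_euler(x0, r, T, n):
    """Apply Delta x = x(t) r(t) Delta t repeatedly over n equal steps of [0, T]."""
    dt = T / n
    x, t = x0, 0.0
    for _ in range(n):
        x += x * r(t) * dt
        t += dt
    return x


def rate(t):
    """Hypothetical variable interest rate r(t) = 0.03 + 0.01 t."""
    return 0.03 + 0.01 * t


def R(t):
    """R(t) = integral of rate over [0, t], computed by hand for this rate."""
    return 0.03 * t + 0.005 * t * t


x0, T = 1000.0, 5.0
approx = account_value_euler(x0, rate, T, 10_000)
exact = x0 * math.exp(R(T))
print(approx, exact)
```

Refining the grid (larger `n`) brings the stepped value closer to $x_0e^{R(T)}$.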

An ODE is inherently deterministic in the sense that the initial condition $x_0$ and the rate $f(t, x)$ uniquely and completely determine the solution $x(t)$ for all $t$ near 0. There are circumstances, however, under which $f(t, x)$ is not completely determined but rather is subject to random fluctuations that are the result of noise in the system. For example, if $x(t)$ is the size of an investment at time $t$, a model that incorporates random fluctuation of the interest rate is given by

$$x'(t) = [r(t) + \xi(t)]\,x(t), \tag{10.2}$$

where $\xi(t)$ is a random variable. The same differential equation arises if $x(t)$ is the size of a population whose relative growth rate is subject to random fluctuations from environmental changes. Because (10.2) has a random component, one would expect its solution to be a random variable. Equations like this are called stochastic differential equations. In this chapter, we attempt to make these ideas precise and show how to solve such equations.

A partial differential equation (PDE) is an equation involving an unknown function of several variables and its partial derivatives. As we shall see in Chapter 11, a stochastic differential equation can give rise to a PDE whose solution may be used to construct the solution of the original SDE. This method will be used in Chapter 11 to obtain the Black-Scholes option pricing formula.

10.2 Continuous-Time Stochastic Processes

Recall that a discrete-time stochastic process is a sequence of random variables that models an experiment involving a sequence of trials. While useful and effective in many contexts, discrete-time models are not always sufficiently rich to capture all the important features of an experiment. Furthermore, discrete-time processes have inherent mathematical limitations, notably the unavailability of calculus techniques. Continuous-time processes offer a more realistic way to model the dynamics of experiments unfolding in time and allow the introduction of powerful tools from stochastic calculus.

Definition 10.2.1. A (continuous-time) stochastic or random process on a probability space $(\Omega, \mathcal F, \mathbb P)$ is a real-valued function $X$ on $D \times \Omega$, where $D$ is an interval of real numbers, such that the function $X(t, \cdot)$ is an $\mathcal F$-random variable for each $t \in D$. The set $D$ is called the index set for the process, and the functions $X(\cdot, \omega)$, $\omega \in \Omega$, are the paths of the process. If $X$ does not depend on $\omega$, the process is said to be deterministic. If $X_1, X_2, \ldots, X_d$ are stochastic processes with the same index set, then the $d$-tuple $X = (X_1, X_2, \ldots, X_d)$ is called a $d$-dimensional stochastic process.

Depending on context, we will use the notations $X_t$ or $X(t)$ for the random variable $X(t, \cdot)$. We will also denote the process $X$ by $(X_t)_{t \in D}$ or just $(X_t)$, this notation reflecting the interpretation of a stochastic process as a collection of random variables indexed by $D$ and hence as a mathematical description of a random system changing in time. The interval $D$ is usually of the form $[0, T]$ or $[0, \infty)$.

An example of a continuous-time process is the price of a stock, a path being the price history for a particular market scenario. The position at time $t$ of a particle randomly bombarded by the molecules of a liquid in which it is suspended is a three-dimensional stochastic process. Surprisingly, there is an important connection between these seemingly unrelated examples, one that we shall examine later.

As noted above, a continuous-time stochastic process may be viewed as a mathematical description of an evolving experiment. Related to this notion is the flow of information revealed by the experiment. As in the discrete-time case, this evolution of information may be modeled by a filtration.

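Definition 10.2.2. A (continuous-time) filtration on $(\Omega, \mathcal F, \mathbb P)$ is a collection $(\mathcal F_t)_{t \in D}$ of σ-fields $\mathcal F_t$ indexed by members $t$ of an interval $D \subseteq \mathbb R$ such that

$$\mathcal F_s \subseteq \mathcal F_t \subseteq \mathcal F \quad \text{for } s, t \in D \text{ and } s < t.$$

A stochastic process $(X_t)_{t \in D}$ is said to be adapted to a filtration $(\mathcal F_t)_{t \in D}$ if $X_t$ is an $\mathcal F_t$-random variable for each $t \in D$. A $d$-dimensional process is adapted to a filtration if each coordinate process is adapted.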

As with stochastic processes, we frequently omit reference to the symbol $D$ and denote the filtration by $(\mathcal F_t)$.

Associated with each stochastic process $X$ is its natural filtration. It is the smallest filtration to which $X$ is adapted and consists of time-dependent information related solely to $X$.


Definition 10.2.3. Let $(X_t)$ be a stochastic process. For each index $t$ let $\mathcal F^X_t = \sigma(X_s : s \le t)$ denote the intersection of the collection of σ-fields containing all events of the form $\{X_s \in J\}$, where $J$ is an arbitrary interval and $s \le t$. $(\mathcal F^X_t)$ is called the natural filtration for $(X_t)$ or the filtration generated by the process $(X_t)$.

An important example of a natural filtration is the filtration generated by a Brownian motion, the fundamental process used in the construction of the stochastic integral. Brownian motion and its natural filtration, described in the next section, form the basis of the continuous-time pricing models discussed in later chapters.

10.3 Brownian Motion

In 1827, Robert Brown observed that pollen particles suspended in a liquid exhibited highly irregular motion. Later, it was determined that this motion resulted from random collisions of the particles with molecules in the ambient liquid. In 1900, L. Bachelier noted the same irregular variation in stock prices and attempted to describe this behavior mathematically. In 1905, Albert Einstein used Brownian motion, as the phenomenon came to be called, in his work on measuring Avogadro's number. A rigorous mathematical model of Brownian motion was developed in the 1920s by Norbert Wiener. Since then, the mathematics of Brownian motion and its generalizations has become one of the most active and important areas of probability theory.

To gain a better appreciation of its definition, it is instructive to view (one-dimensional) Brownian motion as a limit of random walks in the following sense. Suppose that every $\Delta t$ seconds a particle starting at the origin moves right or left a distance $\Delta x$, each move with probability 1/2. Let $Z^{(n)}_t$ denote the position of the particle at time $t = n\Delta t$, that is, after $n$ moves. We assume that $\Delta t$ and $\Delta x$ are related by the equation $(\Delta x)^2 = \Delta t$.¹ Let $X_j = 1$ if the $j$th move is to the right and $X_j = 0$ otherwise. The $X_j$'s are independent Bernoulli variables with parameter $p = 1/2$, and $Y_n := \sum_{j=1}^{n} X_j \sim B(p, n)$ is the number of moves to the right during the time interval $[0, t]$. Thus

$$Z^{(n)}_t = Y_n\Delta x + (n - Y_n)(-\Delta x) = (2Y_n - n)\Delta x = \left(\frac{Y_n - n/2}{\sqrt{n/4}}\right)\sqrt t,$$

the last equality because $t = n(\Delta x)^2$. By the Central Limit Theorem,

$$\lim_{n\to\infty} \mathbb P\bigl(Z^{(n)}_t \le z\bigr) = \lim_{n\to\infty} \mathbb P\left(\frac{Y_n - n/2}{\sqrt{n/4}} \le \frac{z}{\sqrt t}\right) = \Phi\left(\frac{z}{\sqrt t}\right).$$

Thus, as the step length and step time tend to zero via the relation $\Delta x = \sqrt{\Delta t}$, $Z^{(n)}_t$ tends in distribution to a random variable $Z_t \sim N(0, t)$. A similar argument shows that $Z_t - Z_s$ is the limit in distribution of random walks over the time interval $[s, t]$ and that $Z_t - Z_s \sim N(0, t - s)$.

¹This is needed to produce the desired result that the limit $Z_t$ of the random variables $Z^{(n)}_t$ is $N(0, t)$.
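The limit argument can be illustrated by simulation. The sketch below, which assumes Python's standard `random` module as the source of fair coin flips and uses arbitrary choices of $t$, $n$, the sample size, and the seed, builds $Z^{(n)}_t = (2Y_n - n)\Delta x$ with $\Delta x = \sqrt{t/n}$ and compares its sample mean, sample variance, and empirical distribution function with those of $N(0, t)$.

```python
import math
import random
import statistics


def random_walk_position(t, n, rng):
    """Z_t^(n): n moves of length dx = sqrt(t / n), right or left with probability 1/2."""
    dx = math.sqrt(t / n)                                # (dx)^2 = dt = t / n
    y = sum(1 for _ in range(n) if rng.random() < 0.5)  # Y_n, number of moves to the right
    return (2 * y - n) * dx


def normal_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


rng = random.Random(2011)
t, n, samples = 2.0, 400, 4000
zs = [random_walk_position(t, n, rng) for _ in range(samples)]
z = 0.95
freq = sum(1 for v in zs if v <= z) / samples
print(statistics.fmean(zs), statistics.pvariance(zs))  # near 0 and near t
print(freq, normal_cdf(z / math.sqrt(t)))              # near each other
```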

With these ideas in mind, we make the following definition:

Definition 10.3.1. Let $(\Omega, \mathcal F, \mathbb P)$ be a probability space. A (standard) Brownian motion or Wiener process is a stochastic process $W$ on $(\Omega, \mathcal F)$ that satisfies the following conditions:

(a) $W_0 = 0$;

(b) $W(t) - W(s) \sim N(0, t - s)$, $0 \le s < t$;

(c) the paths of $W$ are continuous; and

(d) $W$ has independent increments; that is, if $0 < t_1 < t_2 < \cdots < t_n$, then the random variables $W(t_1), W(t_2) - W(t_1), \ldots, W(t_n) - W(t_{n-1})$ are independent.

The Brownian filtration on $(\Omega, \mathcal F, \mathbb P)$ is the natural filtration $(\mathcal F^W_t)_{t \ge 0}$.
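Conditions (a), (b), and (d) suggest a simple way to sample a Brownian path at finitely many times: start at 0 and add independent $N(0, \Delta t)$ increments. The following is a minimal sketch under that assumption (the grid size, seed, and number of paths are arbitrary); it checks empirically that $\mathbb V W(1) \approx 1$, that $\mathbb V[W(1) - W(1/2)] \approx 1/2$, and that $W(1/2)$ and $W(1) - W(1/2)$ are uncorrelated. Continuity (c) can only be approximated on a finite grid.

```python
import math
import random
import statistics


def brownian_path(T, n, rng):
    """Values W(t_0), ..., W(t_n) on the grid t_k = k T / n.

    Uses W(0) = 0 and independent Gaussian increments whose variance
    equals the time step.
    """
    dt = T / n
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


rng = random.Random(1827)
paths = [brownian_path(1.0, 50, rng) for _ in range(5000)]
w_half = [p[25] for p in paths]          # W(1/2)
incr = [p[50] - p[25] for p in paths]    # W(1) - W(1/2)

cov = statistics.fmean([x * y for x, y in zip(w_half, incr)]) - (
    statistics.fmean(w_half) * statistics.fmean(incr)
)
print(statistics.pvariance([p[50] for p in paths]))  # about 1
print(statistics.pvariance(incr))                    # about 1/2
print(cov)                                           # about 0
```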

Rigorous proofs of the existence of Brownian motion may be found in advanced texts on probability theory. The most common of these proofs uses Kolmogorov's Extension Theorem; another constructs Brownian motion from wavelets. The interested reader is referred to [18, 19].

Brownian motion has the unusual property that, while the paths are continuous, they are nowhere differentiable. This corresponds to Brown's observation that the paths of the pollen particles seemed to have no tangents. Moreover, Brownian motion looks statistically the same at any scale (compare Exercise 4); that is, it is a fractal. These properties partially account for the usefulness of Brownian motion in modeling systems with noise.

10.4 Variation of Brownian Paths

A useful way to measure the seemingly erratic behavior of Brownian motion is by the variation of its paths. For this we need the notion of $m$th variation of a function.


Definition 10.4.1. Let $\mathcal P = \{a = t_0 < t_1 < \cdots < t_n = b\}$ be a partition of the interval $[a, b]$ and let $m$ be a positive integer. For a real-valued function $f$ on $[a, b]$ define

$$V^{(m)}_{\mathcal P}(f) = \sum_{j=0}^{n-1} |\Delta f_j|^m, \qquad \Delta f_j := f(t_{j+1}) - f(t_j).$$

The function $f$ is said to have bounded (unbounded) $m$th variation on $[a, b]$ if the quantities $V^{(m)}_{\mathcal P}(f)$, taken over all partitions $\mathcal P$, form a bounded (unbounded) set of real numbers.

Example 10.4.2. The continuous function $w$ on $[0, 2/\pi]$ defined by

$$w(t) := \begin{cases} t\sin(1/t) & \text{if } 0 < t \le 2/\pi, \\ 0 & \text{if } t = 0 \end{cases}$$

has unbounded first variation. This can be seen by considering partitions of the form

$$\mathcal P_n = \left\{0, \frac{2}{n\pi}, \frac{2}{(n-1)\pi}, \ldots, \frac{2}{\pi}\right\}$$

and noting that the corresponding sums $\sum_{j=0}^{n-1} |\Delta w_j|$ are the partial sums of a divergent series. By contrast, for each $\varepsilon > 0$ the related function

$$u(t) := \begin{cases} t^{1+\varepsilon}\sin(1/t) & \text{if } 0 < t \le 2/\pi, \\ 0 & \text{if } t = 0, \end{cases}$$

has bounded first variation on every interval $[a, b]$. (Exercise 2(d).)
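The divergence in Example 10.4.2 can be observed numerically by evaluating the sums over the partitions $\mathcal P_n$. The sketch below (with the arbitrary choice $\varepsilon = 0.5$ for $u$) shows the sums for $w$ growing roughly like a multiple of $\ln n$, while those for $u$ level off. It only illustrates the claim on one family of partitions; it is not a proof that $u$ has bounded variation.

```python
import math


def w(t):
    """w(t) = t sin(1/t), w(0) = 0 (Example 10.4.2)."""
    return t * math.sin(1.0 / t) if t > 0 else 0.0


def u(t, eps=0.5):
    """u(t) = t^(1+eps) sin(1/t), u(0) = 0; eps = 0.5 is an arbitrary choice."""
    return t ** (1 + eps) * math.sin(1.0 / t) if t > 0 else 0.0


def first_variation_on_Pn(f, n):
    """Sum of |Delta f_j| over P_n = {0, 2/(n pi), 2/((n-1) pi), ..., 2/pi}."""
    pts = [0.0] + [2.0 / (k * math.pi) for k in range(n, 0, -1)]
    return sum(abs(f(b) - f(a)) for a, b in zip(pts, pts[1:]))


for n in (10, 100, 1_000, 10_000):
    print(n, first_variation_on_Pn(w, n), first_variation_on_Pn(u, n))
```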

If $f$ is a stochastic process, Definition 10.4.1 may be applied to the paths $f(\cdot, \omega)$. In this case, $V^{(m)}_{\mathcal P}(f)$ is a random variable. In particular, for Brownian motion, we have the definition

$$V^{(m)}_{\mathcal P}(W) := \sum_{j=0}^{n-1} |\Delta W_j|^m.$$

It may be shown that the paths of Brownian motion have unbounded first variation in every interval. (See, for example, [6].)

The following theorem describes the key property of Brownian motion that accounts for the primary difference between stochastic calculus and classical calculus.

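Theorem 10.4.3. Let $\mathcal P = \{a = t_0 < t_1 < \cdots < t_n = b\}$ be a partition of the interval $[a, b]$ and set $\|\mathcal P\| = \max_j \Delta t_j$, where $\Delta t_j := t_{j+1} - t_j$. Then

$$\lim_{\|\mathcal P\| \to 0} V^{(2)}_{\mathcal P}(W) = b - a$$

in the mean square sense, that is,

$$\lim_{\|\mathcal P\| \to 0} \mathbb E\left(V^{(2)}_{\mathcal P}(W) - (b - a)\right)^2 = 0.$$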

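Proof. Let

$$A_{\mathcal P} = V^{(2)}_{\mathcal P}(W) - (b - a) = \sum_{j=0}^{n-1} \left[(\Delta W_j)^2 - \Delta t_j\right],$$

so that

$$\mathbb E(A_{\mathcal P}^2) = \sum_{j=0}^{n-1}\sum_{k=0}^{n-1} \mathbb E\left\{\left[(\Delta W_j)^2 - \Delta t_j\right]\left[(\Delta W_k)^2 - \Delta t_k\right]\right\}. \tag{10.3}$$

For $j \ne k$ the two factors are independent by independent increments, and each has mean zero because $\mathbb E(\Delta W_j)^2 = \Delta t_j$; hence these terms of the double sum reduce to zero and

$$\mathbb E(A_{\mathcal P}^2) = \sum_{j=0}^{n-1} \mathbb E\left[(\Delta W_j)^2 - \Delta t_j\right]^2 = \sum_{j=0}^{n-1} \mathbb E(Z_j^2 - 1)^2 (\Delta t_j)^2, \tag{10.4}$$

where

$$Z_j := \frac{\Delta W_j}{\sqrt{\Delta t_j}} \sim N(0, 1).$$

The quantity $c := \mathbb E(Z_j^2 - 1)^2$ is finite and does not depend on $j$ (see Example 6.3.2; in fact $c = \mathbb E Z_j^4 - 2\,\mathbb E Z_j^2 + 1 = 3 - 2 + 1 = 2$). Hence, since $(\Delta t_j)^2 \le \|\mathcal P\|\,\Delta t_j$,

$$\mathbb E(A_{\mathcal P}^2) \le c\,\|\mathcal P\| \sum_{j=0}^{n-1} \Delta t_j = c\,\|\mathcal P\|(b - a).$$

Letting $\|\mathcal P\| \to 0$ forces $\mathbb E(A_{\mathcal P}^2) \to 0$.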

Remarks 10.4.4. The mean square limit $\lim_{\|\mathcal P\| \to 0} V^{(2)}_{\mathcal P}(W)$ is called the quadratic variation of Brownian motion on the interval $[a, b]$. That Brownian motion has nonzero quadratic variation on any interval is in stark contrast to the functions one encounters in Newtonian calculus. (See Exercise 2 in this regard.)

For $m \ge 3$, the mean square limit of $V^{(m)}_{\mathcal P}(W)$ is zero. This follows from the continuity of $W(t)$ and the inequality

$$V^{(m)}_{\mathcal P}(W) = \sum_{j=0}^{n-1} |\Delta W_j|^{m-2}|\Delta W_j|^2 \le \max_j |\Delta W_j|^{m-2}\, V^{(2)}_{\mathcal P}(W).$$
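Theorem 10.4.3 and Remarks 10.4.4 can be illustrated on a single simulated path. The sketch below assumes Gaussian increments from Python's `random` module, samples one path on a fine uniform grid of $[0, 1]$ (grid size and seed are arbitrary), and then coarsens it, so that every partition belongs to the same path. As the mesh shrinks, $V^{(1)}_{\mathcal P}(W)$ grows without bound, $V^{(2)}_{\mathcal P}(W)$ settles near $b - a$, and $V^{(3)}_{\mathcal P}(W)$ tends to 0.

```python
import math
import random


def fine_increments(a, b, n, rng):
    """Independent N(0, dt) increments of one Brownian path on a uniform grid of [a, b]."""
    dt = (b - a) / n
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]


def coarsen(increments, factor):
    """Increments of the same path over the partition that keeps every factor-th grid point."""
    return [sum(increments[i:i + factor]) for i in range(0, len(increments), factor)]


def variation(increments, m):
    """V_P^(m)(W) = sum of |Delta W_j|^m."""
    return sum(abs(d) ** m for d in increments)


rng = random.Random(1905)
a, b = 0.0, 1.0
finest = fine_increments(a, b, 2 ** 18, rng)
results = {}
for factor in (2 ** 10, 2 ** 6, 2 ** 2, 1):
    dW = coarsen(finest, factor)
    results[len(dW)] = tuple(variation(dW, m) for m in (1, 2, 3))
    print(len(dW), results[len(dW)])
```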


10.5 Riemann-Stieltjes Integrals

To motivate the construction of the Ito integral, we first give a brief overview of the Riemann-Stieltjes integral. For details, the reader is referred to [15].

Let $f$ and $w$ be bounded functions on an interval $[a, b]$. A Riemann-Stieltjes sum of $f$ with respect to $w$ is a sum of the form

$$R_{\mathcal P} = \sum_{j=0}^{n-1} f(t_j^*)\,\Delta w_j, \qquad \Delta w_j := w(t_{j+1}) - w(t_j),$$

where $\mathcal P = \{a = t_0 < t_1 < \cdots < t_n = b\}$ is a partition of $[a, b]$ and $t_j^* \in [t_j, t_{j+1}]$, $j = 0, 1, \ldots, n-1$. The Riemann-Stieltjes integral of $f$ with respect to $w$ is defined as the limit

$$\int_a^b f(t)\, dw(t) = \lim_{\|\mathcal P\| \to 0} R_{\mathcal P},$$

where $\|\mathcal P\| = \max_j (t_{j+1} - t_j)$. The limit is required to be independent of the choice of the $t_j^*$'s. The integral may be shown to exist if $f$ is continuous and $w$ has bounded first variation on $[a, b]$. The Riemann integral is obtained as a special case by taking $w(t) = t$. More generally, if $w$ is continuously differentiable, then

$$\int_a^b f\, dw = \int_a^b f(t)\,w'(t)\, dt.$$

The Riemann-Stieltjes integral has many of the familiar properties of the Riemann integral, notably

$$\int_a^b (\alpha f + \beta g)\, dw = \alpha\int_a^b f\, dw + \beta\int_a^b g\, dw \quad\text{and}\quad \int_a^b f\, dw = \int_a^c f\, dw + \int_c^b f\, dw, \quad a < c < b.$$
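A Riemann-Stieltjes sum is easy to compute directly. The sketch below uses a uniform partition (an assumption made for simplicity) and lets the caller choose the intermediate point $t_j^*$. For the hypothetical pair $f(t) = t$, $w(t) = t^2$ on $[0, 1]$, all three choices approach $\int_0^1 t \cdot 2t\,dt = 2/3$, as the formula $\int f\,dw = \int f w'\,dt$ predicts.

```python
def riemann_stieltjes_sum(f, w, a, b, n, rule="left"):
    """R_P = sum of f(t_j*) (w(t_{j+1}) - w(t_j)) over the uniform partition of [a, b].

    rule selects the intermediate point t_j* in [t_j, t_{j+1}]:
    "left", "mid", or "right".
    """
    h = (b - a) / n
    offset = {"left": 0.0, "mid": 0.5, "right": 1.0}[rule]
    total = 0.0
    for j in range(n):
        t_j, t_next = a + j * h, a + (j + 1) * h
        total += f(t_j + offset * h) * (w(t_next) - w(t_j))
    return total


# Hypothetical example: f(t) = t and w(t) = t^2 on [0, 1].
for rule in ("left", "mid", "right"):
    print(rule, riemann_stieltjes_sum(lambda t: t, lambda t: t * t, 0.0, 1.0, 10_000, rule))
```

Because $w$ here has bounded variation, the choice of $t_j^*$ does not affect the limit; Section 10.6 shows that this independence fails when $w$ is replaced by a Brownian path.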

10.6 Stochastic Integrals

For the remainder of the chapter, $W$ denotes a Brownian motion on a probability space $(\Omega, \mathcal F, \mathbb P)$. In this section, we construct the Ito integral

$$I(F) = \int_a^b F(t)\, dW(t), \tag{10.5}$$

where $F(t)$ is a stochastic process on $(\Omega, \mathcal F, \mathbb P)$ with continuous paths. Given such a process $F$, for each $\omega \in \Omega$ we may form the ordinary Riemann integral

$$\int_a^b F(t, \omega)^2\, dt.$$

For technical reasons, we shall assume that the resulting random variable $\int_a^b F(t)^2\, dt$ has finite expectation:

$$\mathbb E\left(\int_a^b F(t)^2\, dt\right) < \infty.$$

To construct the Ito integral, consider sums of the form

$$\sum_{j=0}^{n-1} F(t_j^*, \omega)\,\Delta W_j(\omega), \qquad \Delta W_j := W(t_{j+1}) - W(t_j), \tag{10.6}$$

where $\mathcal P := \{a = t_0 < t_1 < t_2 < \cdots < t_n = b\}$ is a partition of $[a, b]$ and $t_j^* \in [t_j, t_{j+1}]$. In light of the above discussion on the Riemann-Stieltjes integral, it might seem reasonable to define $I(F)(\omega)$ as the limit of these sums as $\|\mathcal P\| \to 0$. However, this fails for several reasons.

First, the paths of $W$ do not have bounded first variation, so we can't expect the sums in (10.6) to converge in the usual sense. What is needed instead is mean square convergence. Second, even with the appropriate mode of convergence, the limit of the sums in (10.6) generally depends on the choice of the intermediate points $t_j^*$. To eliminate this problem, we shall always take the point $t_j^*$ to be the left endpoint $t_j$ of the interval $[t_j, t_{j+1}]$. These restrictions, however, are not sufficient to ensure a useful theory. We shall also require that the random variable $F(s)$ be independent of the increment $W(t) - W(s)$ for $0 \le s < t$ and depend only on the information provided by $W(r)$ for $r \le s$. Both conditions are realized by requiring that the process $F$ be adapted to the Brownian filtration. Under these conditions we define the Ito integral of $F$ in (10.5) to be the limit of the Ito sums

$$I_{\mathcal P}(F) := \sum_{j=0}^{n-1} F(t_j)\,\Delta W_j,$$

where the convergence is in the mean square sense:

$$\lim_{\|\mathcal P\| \to 0} \mathbb E\,|I_{\mathcal P}(F) - I(F)|^2 = 0.$$

It may be shown that this limit exists for all continuous processes $F$ satisfying the conditions described above, hereafter referred to as the usual conditions. In the discussions that follow, we shall assume, usually without explicit mention, that these conditions are met.


Example 10.6.1. If $F(t)$ is deterministic, that is, does not depend on $\omega$, and if $F$ has bounded variation, then the following integration by parts formula is valid:

$$\int_a^b F(t)\, dW(t) = F(b)W(b) - F(a)W(a) - \int_a^b W(t)\, dF(t). \tag{10.7}$$

Here the integral on the right, evaluated at any $\omega$, is interpreted as a Riemann-Stieltjes integral.

To verify (10.7), let $\mathcal P$ be a partition of $[a, b]$ and write

$$\begin{aligned} \sum_{j=0}^{n-1} F(t_j)\,\Delta W_j &= \sum_{j=1}^{n} F(t_{j-1})W(t_j) - \sum_{j=0}^{n-1} F(t_j)W(t_j) \\ &= F(t_{n-1})W(b) - F(a)W(a) - \sum_{j=1}^{n-1} W(t_j)\,\Delta F_{j-1}. \end{aligned} \tag{10.8}$$

As $\|\mathcal P\| \to 0$, the sum in (10.8) converges to the Riemann-Stieltjes integral (both pointwise in $\omega$ and in the mean square sense) and $F(t_{n-1})$ converges to $F(b)$.

If $F'$ is continuous, then (10.7) takes the form

$$\int_a^b F(t)\, dW(t) = F(b)W(b) - F(a)W(a) - \int_a^b W(t)F'(t)\, dt, \tag{10.9}$$

as may be seen by applying the Mean Value Theorem to the increments $\Delta F_j$.

Because $F$ is deterministic, the random variable $\int_a^b F(t)\, dW(t)$ is normal with mean zero and variance $\int_a^b F^2(t)\, dt$ (Corollary 10.6.4, below). It follows from (10.9) that the random variable

$$\int_a^b W(t)F'(t)\, dt + F(a)W(a) - F(b)W(b)$$

is also normal with mean zero and variance $\int_a^b F^2(t)\, dt$. In particular, taking $F(t) = t - b$, we see that

$$\int_a^b [W(t) - W(a)]\, dt$$

is normal with mean 0 and variance $\int_a^b (t - b)^2\, dt = (b - a)^3/3$.
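The last claim can be checked by Monte Carlo. In the sketch below, the interval $[a, b] = [0.5, 2]$, the grid size, the number of paths, and the seed are hypothetical choices; each path of $W(t) - W(a)$ is simulated with independent Gaussian increments, and the time integral is approximated by the trapezoid rule. The sample mean and variance should be close to $0$ and $(b - a)^3/3 = 1.125$.

```python
import math
import random
import statistics


def integral_of_increment(a, b, n, rng):
    """Trapezoid approximation of int_a^b [W(t) - W(a)] dt along one simulated path."""
    h = (b - a) / n
    x = 0.0            # current value of W(t) - W(a), starting at t = a
    total = 0.0
    for _ in range(n):
        x_next = x + rng.gauss(0.0, math.sqrt(h))
        total += 0.5 * (x + x_next) * h
        x = x_next
    return total


rng = random.Random(6)
a, b = 0.5, 2.0                     # hypothetical interval with b - a = 1.5
samples = [integral_of_increment(a, b, 100, rng) for _ in range(4000)]
print(statistics.fmean(samples), statistics.pvariance(samples), (b - a) ** 3 / 3)
```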

Example 10.6.2. Theorem 10.4.3 may be used to derive the formula

$$\int_a^b W(t)\, dW(t) = \frac{W^2(b) - W^2(a)}{2} - \frac{b - a}{2}. \tag{10.10}$$


To verify this, note first that, for any sequence of real numbers $x_j$,

$$2\sum_{j=0}^{n-1} x_j(x_{j+1} - x_j) = x_n^2 - x_0^2 - \sum_{j=0}^{n-1} (x_{j+1} - x_j)^2,$$

as may be seen by direct expansion. In particular, setting $x_j = W(t_j, \omega)$, where $\mathcal P = \{a = t_0 < t_1 < \cdots < t_n = b\}$ is an arbitrary partition of $[a, b]$, we have

$$2\sum_{j=0}^{n-1} W(t_j)\,\Delta W_j = W^2(b) - W^2(a) - V^{(2)}_{\mathcal P}(W).$$

Equation (10.10) is now an immediate consequence of Theorem 10.4.3.

Remarks. The term $\frac12(b - a)$ in (10.10) arises because of the particular choice of $t_j^*$ in the definition of (10.5) as the left endpoint of the interval $[t_j, t_{j+1}]$. If we had instead used midpoints (thus producing what is called the Stratonovich integral), then the term $\frac12(b - a)$ would not appear and the result would conform to the familiar one of classical calculus. The choice of left endpoints is dictated by technical considerations, including the fact that this choice makes the Ito integral a martingale, a result of fundamental importance in both the theory and applications of stochastic calculus. One can also explain the choice of the left endpoint heuristically: Consider the parameter $t$ in $F(t)$ and $W(t)$ to represent time. If $t_j$ represents the present, then we should use the known value $F(t_j)$ in the $j$th term of the approximation (10.6) of the integral rather than a value $F(t_j^*)$, $t_j^* > t_j$, which may be viewed as anticipating the future.
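The dependence on the choice of $t_j^*$ is visible on a single simulated path. The sketch below (grid size and seed are arbitrary) takes $a = 0$, samples $W$ on a grid fine enough to contain the midpoints of the intervals $[t_j, t_{j+1}]$, and compares the left-endpoint (Ito) sum with the midpoint sum: the first is close to $\frac12W^2(b) - \frac12 b$, the second to $\frac12W^2(b)$.

```python
import math
import random


def path_with_midpoints(b, n, rng):
    """Brownian path on [0, b] sampled at 2n equal steps; even indices are the partition points t_j."""
    h = b / (2 * n)
    path = [0.0]
    for _ in range(2 * n):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(h)))
    return path


rng = random.Random(1827)
b, n = 1.0, 100_000
W = path_with_midpoints(b, n, rng)
dW = [W[2 * j + 2] - W[2 * j] for j in range(n)]
ito_left = sum(W[2 * j] * dW[j] for j in range(n))       # t_j* = t_j (left endpoint)
midpoint = sum(W[2 * j + 1] * dW[j] for j in range(n))   # t_j* = midpoint of [t_j, t_{j+1}]
Wb = W[-1]
print(ito_left, Wb ** 2 / 2 - b / 2)   # formula (10.10) with a = 0
print(midpoint, Wb ** 2 / 2)           # classical-calculus answer
```

The two sums differ by approximately $\frac12 b$, which is exactly the extra term in (10.10).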

The crucial step in Example 10.6.2 is the result proved in Theorem 10.4.3 that the sums $\sum_{j=0}^{n-1} (\Delta W_j)^2$ converge in the mean square sense to $b - a$. This fact, which is sometimes written symbolically as

$$(dW)^2 = dt,$$

is largely responsible for the difference between the Ito calculus and Newtonian calculus.

The following theorem summarizes the main properties of the Ito integral. The processes $F$ and $G$ are assumed to satisfy the usual conditions described above.

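Theorem 10.6.3. Let $\alpha, \beta \in \mathbb R$ and $0 \le a < c < b$. Then

(i) $\displaystyle\int_a^b [\alpha F(t) + \beta G(t)]\, dW(t) = \alpha\int_a^b F(t)\, dW(t) + \beta\int_a^b G(t)\, dW(t)$;

(ii) $\displaystyle\int_a^b F(t)\, dW(t) = \int_a^c F(t)\, dW(t) + \int_c^b F(t)\, dW(t)$;

(iii) $\displaystyle\mathbb E\left(\int_a^b F(t)\, dW(t)\right) = 0$;

(iv) $\displaystyle\mathbb E\left(\int_a^b F(t)\, dW(t)\right)^2 = \int_a^b \mathbb E\left(F^2(t)\right) dt$;

(v) $\displaystyle\mathbb E\left(\int_a^b F(t)\, dW(t)\int_a^b G(t)\, dW(t)\right) = \int_a^b \mathbb E\left(F(t)G(t)\right) dt$; and

(vi) $\displaystyle\mathbb E\left(\int_a^b F(t)\, dt\right) = \int_a^b \mathbb E\left(F(t)\right) dt$.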

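Proof. For part (i), observe first that, for any real numbers $x$ and $y$,

$$(x + y)^2 = 2(x^2 + y^2) - (x - y)^2 \le 2(x^2 + y^2).$$

Now set $H = \alpha F + \beta G$ and $X = \alpha I(F) + \beta I(G)$. Applying the above inequality twice and using the fact that $I_{\mathcal P}(H) = \alpha I_{\mathcal P}(F) + \beta I_{\mathcal P}(G)$, we have

$$\begin{aligned} [I(H) - X]^2 &= |I(H) - I_{\mathcal P}(H) + I_{\mathcal P}(H) - X|^2 \\ &\le 2\,|I(H) - I_{\mathcal P}(H)|^2 + 2\,|I_{\mathcal P}(H) - X|^2 \\ &\le 2\,|I(H) - I_{\mathcal P}(H)|^2 + 4\alpha^2\,|I_{\mathcal P}(F) - I(F)|^2 + 4\beta^2\,|I_{\mathcal P}(G) - I(G)|^2. \end{aligned}$$

Taking expectations and letting $\|\mathcal P\| \to 0$ shows that $\mathbb E[I(H) - X]^2 = 0$, so that $I(H) = X$ with probability one, which verifies part (i).

For (ii), note that a partition $\mathcal P$ of $[a, b]$ containing the intermediate point $c$ is the union of partitions $\mathcal P_1$ of $[a, c]$ and $\mathcal P_2$ of $[c, b]$, hence $I_{\mathcal P}(F) = I_{\mathcal P_1}(F) + I_{\mathcal P_2}(F)$. For partitions that do not contain $c$, a relation of this sort holds approximately, the approximation improving as $\|\mathcal P\| \to 0$, so that in the limit one obtains (ii).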

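Part (iii) follows from

$$\mathbb E\, I_{\mathcal P}(F) = \sum_{j=0}^{n-1} \mathbb E\left(F(t_j)\,\Delta W_j\right) = \sum_{j=0}^{n-1} \left(\mathbb E F(t_j)\right)\left(\mathbb E\,\Delta W_j\right) = 0,$$

where we have used the independence of $F(t_j)$ and $\Delta W_j$, together with the fact that mean square convergence of $I_{\mathcal P}(F)$ to $I(F)$ implies convergence of the expectations.

To prove part (iv), note that the terms in the double sum

$$\mathbb E\left[I_{\mathcal P}(F)\right]^2 = \sum_{j=0}^{n-1}\sum_{k=0}^{n-1} \mathbb E\left[F(t_j)\,\Delta W_j\, F(t_k)\,\Delta W_k\right]$$

for which $j \ne k$ evaluate to zero. Indeed, if $j < k$, then $F(t_j)\,\Delta W_j\, F(t_k)$ is $\mathcal F^W_{t_k}$-measurable, and since $\Delta W_k$ is independent of $\mathcal F^W_{t_k}$,² it follows that

$$\mathbb E\left[F(t_j)\,\Delta W_j\, F(t_k)\,\Delta W_k\right] = \mathbb E\left[F(t_j)\,\Delta W_j\, F(t_k)\right]\mathbb E\left(\Delta W_k\right) = 0.$$

Also, because $F(t_j)$ and $\Delta W_j$ are independent,

$$\mathbb E\left(F(t_j)\,\Delta W_j\right)^2 = \mathbb E\left(F^2(t_j)\right)\mathbb E\left(\Delta W_j\right)^2 = \mathbb E\left(F^2(t_j)\right)(t_{j+1} - t_j).$$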

²This assertion requires a conditioning argument based on results from Section 12.1. For a discrete version of the calculation, see Exercise 8.2.


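Therefore,

$$\mathbb E\left[I_{\mathcal P}(F)\right]^2 = \sum_{j=0}^{n-1} \mathbb E\left(F^2(t_j)\right)\Delta t_j,$$

which is a Riemann sum for the integral $\int_a^b \mathbb E\left(F^2(t)\right) dt$. Letting $\|\mathcal P\| \to 0$ yields (iv).

For (v), define

$$[F, G] = \mathbb E\left(\int_a^b F(t)\, dW(t)\int_a^b G(t)\, dW(t)\right) \quad\text{and}\quad \langle F, G\rangle = \int_a^b \mathbb E\left(F(t)G(t)\right) dt.$$

The bracket functions are linear in each argument separately, and by (iv) yield the same value when $F = G$. Since

$$4[F, G] = [F + G, F + G] - [F - G, F - G],$$

with a similar equality holding for $\langle F, G\rangle$, we see that

$$[F, G] = \langle F, G\rangle.$$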

Part (vi) is a consequence of Fubini's Theorem, which gives general conditions under which integral and expectation may be interchanged. A proof may be found in standard texts on real analysis.

Corollary 10.6.4. If $F(t)$ is a deterministic process, then the Ito integral $\int_a^b F(t)\, dW(t)$ is normal with mean zero and variance $\int_a^b F^2(t)\, dt$.

Proof. That $I(F)$ has mean 0 and variance $\int_a^b F^2(t)\, dt$ follows from (iii) and (iv) of the theorem. To see that $I(F)$ is normal, note that $I_{\mathcal P}(F)$, as a sum of independent normal random variables $F(t_j)\,\Delta W_j$, is itself normal (Example 3.6.2). A standard result in probability theory implies that $I(F)$, as a mean square limit of normal random variables, is also normal.
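Parts (iii) and (iv) of Theorem 10.6.3 can be checked by Monte Carlo for the adapted integrand $F = W$ on $[0, b]$, where (iv) predicts $\mathbb E\left(\int_0^b W\,dW\right)^2 = \int_0^b t\,dt = b^2/2$. The sketch below approximates each Ito integral by its Ito sum on a uniform partition; the partition size, the number of paths, and the seed are arbitrary choices, so the agreement is only approximate.

```python
import math
import random
import statistics


def ito_sum_W_dW(b, n, rng):
    """Ito sum I_P(W) = sum of W(t_j) (W(t_{j+1}) - W(t_j)) on a uniform partition of [0, b]."""
    dt = b / n
    w, total = 0.0, 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        total += w * dw        # integrand evaluated at the left endpoint t_j
        w += dw
    return total


rng = random.Random(1063)
b, n, paths = 1.0, 100, 4000
samples = [ito_sum_W_dW(b, n, rng) for _ in range(paths)]
mean = statistics.fmean(samples)
second_moment = statistics.fmean([x * x for x in samples])
print(mean, second_moment, b ** 2 / 2)   # about 0, about b^2 / 2, exact value
```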

10.7 The Ito-Doeblin Formula

Definition 10.7.1. An Ito process is a stochastic process $X$ of the form

$$X_t = X_a + \int_a^t F(s)\, dW(s) + \int_a^t G(s)\, ds, \qquad a \le t \le b, \tag{10.11}$$

where $F$ and $G$ are continuous processes adapted to $(\mathcal F^W_t)$ and

$$\mathbb E\left(\int_a^b F^2(t)\, dt\right) + \mathbb E\left(\int_a^b |G(t)|\, dt\right) < +\infty.$$


Equation (10.11) is usually written in differential notation as

$$dX = F\, dW + G\, dt.$$

For example, if we take $b = t$ in Equation (10.9) and rewrite the resulting equation as

$$F(t)W(t) = F(a)W(a) + \int_a^t F(s)\, dW(s) + \int_a^t W(s)F'(s)\, ds,$$

then $FW$ is seen to be an Ito process with differential

$$d(FW) = F\, dW + WF'\, dt.$$

Similarly, we can rewrite Equation (10.10) as

$$W_t^2 = W_a^2 + 2\int_a^t W(s)\, dW(s) + \int_a^t 1\, ds,$$

which shows that $W^2$ is an Ito process with differential

$$dW^2 = 2W\, dW + dt.$$

Note that if $X$ is a deterministic function with continuous derivative, then $X_t = X_a + \int_a^t X'(s)\, ds$. Thus, by the above convention, $dX = X'(t)\,dt$, in agreement with the classical definition of differential.

The Ito-Doeblin formula, described in various forms below, is useful in generating stochastic differential equations, the subject of the next section. The following theorem gives the simplest version of the formula.

Theorem 10.7.2 (Ito-Doeblin Formula, Version 1). Let $f(x)$ have continuous first and second derivatives. Then the process $f(W)$ has differential

$$df(W) = f'(W)\, dW + \frac12 f''(W)\, dt.$$

In integral form,

$$f(W(t)) = f(W(a)) + \int_a^t f'(W(s))\, dW(s) + \frac12\int_a^t f''(W(s))\, ds.$$

Proof. We give a plausible argument under the assumption that $f$ has Taylor series expansions

$$f(r) - f(s) = \sum_{n=1}^{\infty} \frac{f^{(n)}(s)}{n!}(r - s)^n, \qquad r, s \in \mathbb R.$$

Detailed proofs may be found, for example, in [9, 18].


Let $\mathcal P = \{a = t_0 < t_1 < \cdots < t_n = t\}$ be a partition of $[a, t]$. Then

$$f(W(t)) - f(W(a)) = \sum_{j=0}^{n-1} \left[f(W(t_{j+1})) - f(W(t_j))\right],$$

and for each $j$

$$f(W(t_{j+1})) - f(W(t_j)) = \sum_{k=1}^{\infty} \frac{f^{(k)}(W(t_j))}{k!}(\Delta W_j)^k = f'(W(t_j))\,\Delta W_j + \frac12 f''(W(t_j))(\Delta W_j)^2 + (\Delta W_j)^3 R_j$$

for a suitable remainder term $R_j$. Thus,

$$f(W(t)) - f(W(a)) = A_{\mathcal P} + B_{\mathcal P} + C_{\mathcal P},$$

where

$$A_{\mathcal P} = \sum_{j=0}^{n-1} f'(W(t_j))\,\Delta W_j, \qquad B_{\mathcal P} = \frac12\sum_{j=0}^{n-1} f''(W(t_j))(\Delta W_j)^2, \qquad\text{and}\qquad C_{\mathcal P} = \sum_{j=0}^{n-1} (\Delta W_j)^3 R_j.$$

Now consider the mean square limits of $A_{\mathcal P}$, $B_{\mathcal P}$, and $C_{\mathcal P}$ as $\|\mathcal P\| \to 0$. Clearly, $A_{\mathcal P} \to \int_a^t f'(W(s))\, dW(s)$. Recalling that $\sum_{j=0}^{n-1} (\Delta W_j)^2 \to t - a = \int_a^t 1\, ds$, it is not unreasonable to expect that $B_{\mathcal P} \to \frac12\int_a^t f''(W(s))\, ds$. This is indeed the case and may be proved by methods similar to those used in the proof of Theorem 10.4.3. Finally, using the fact that the order $m$ variation of Brownian motion is zero for $m \ge 3$ (Remarks 10.4.4), one shows that $C_{\mathcal P} \to 0$, completing the argument.

Remark 10.7.3. The integral equation in Theorem 10.7.2 may be expressed as

$$\int_a^t f'(W(s))\, dW(s) = f(W(t)) - f(W(a)) - \frac12\int_a^t f''(W(s))\, ds,$$

which is Ito's version of the fundamental theorem of calculus. The presence of the integral on the right is a consequence of the nonzero quadratic variation of $W$.
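Theorem 10.7.2 can be checked pathwise for $f(x) = e^x$, for which $f' = f'' = f$. The sketch below (step count and seed are arbitrary) accumulates, along one simulated path on $[0, t]$, the left-endpoint sums that approximate $\int_0^t f'(W)\,dW$ and $\frac12\int_0^t f''(W)\,ds$, and compares their total with $f(W(t)) - f(W(0))$. Dropping the $ds$ term, as classical calculus would suggest, leaves a visible discrepancy.

```python
import math
import random


def ito_doeblin_check(t, n, seed):
    """Return f(W(t)) - f(W(0)) and the two discretized integrals, for f = exp."""
    rng = random.Random(seed)
    dt = t / n
    w = 0.0
    ito_sum = 0.0     # sum of f'(W(t_j)) Delta W_j, approximates the dW integral
    drift_sum = 0.0   # sum of (1/2) f''(W(t_j)) Delta t, approximates the ds integral
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        ito_sum += math.exp(w) * dw
        drift_sum += 0.5 * math.exp(w) * dt
        w += dw
    lhs = math.exp(w) - 1.0      # f(W(t)) - f(W(0)), since W(0) = 0
    return lhs, ito_sum, drift_sum


lhs, ito_sum, drift_sum = ito_doeblin_check(1.0, 100_000, seed=10)
print(lhs, ito_sum + drift_sum)   # close to each other
print(lhs, ito_sum)               # differ by roughly drift_sum
```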


Example 10.7.4. Applying Theorem 10.7.2 to the function $f(x) = x^k$, $k \ge 2$, we have

$$dW^k = kW^{k-1}\, dW + \frac12 k(k-1)W^{k-2}\, dt,$$

which has integral form

$$W^k(t) = W^k(a) + k\int_a^t W^{k-1}(s)\, dW(s) + \frac12 k(k-1)\int_a^t W^{k-2}(s)\, ds.$$

Rearranging, we have

$$\int_a^t W^{k-1}(s)\, dW(s) = \frac{W^k(t) - W^k(a)}{k} - \frac{k-1}{2}\int_a^t W^{k-2}(s)\, ds,$$

which may be viewed as an evaluation of the Ito integral on the left. The special case $k = 2$ is the content of Example 10.6.2.

We state without proof three additional versions of the Ito-Doeblin Formula, each of which considers differentials of functions of several variables. A proof of the first may be given along the lines of that of Theorem 10.7.2, using multivariable Taylor series.

Theorem 10.7.5 (Ito-Doeblin Formula, Version 2). Suppose $f(t, x)$ is continuous with continuous partial derivatives $f_t$, $f_x$, and $f_{xx}$. Then, suppressing the variable $t$ in the notation $W(t)$,

$$df(t, W) = f_x(t, W)\, dW + f_t(t, W)\, dt + \frac12 f_{xx}(t, W)\, dt.$$

In integral form,

$$f(t, W(t)) = f(a, W(a)) + \int_a^t f_x(s, W(s))\, dW(s) + \int_a^t \left[f_t(s, W(s)) + \frac12 f_{xx}(s, W(s))\right] ds.$$

Versions 1 and 2 of the Ito-Doeblin Formula deal only with functions of the process $W$. The following version treats functions of a general Ito process.

Theorem 10.7.6 (Ito-Doeblin Formula, Version 3). Suppose $f(t, x)$ is continuous with continuous partial derivatives $f_t$, $f_x$, and $f_{xx}$. Let $X$ be an Ito process with differential

$$dX = F\, dW + G\, dt.$$

Then

$$\begin{aligned} df(t, X) &= f_t(t, X)\, dt + f_x(t, X)\, dX + \frac12 f_{xx}(t, X)\,(dX)^2 \\ &= f_x(t, X)F\, dW + \left[f_t(t, X) + f_x(t, X)G + \frac12 f_{xx}(t, X)F^2\right] dt. \end{aligned}$$


$$\begin{array}{c|cc} \cdot & dt & dW \\ \hline dt & 0 & 0 \\ dW & 0 & dt \end{array}$$

TABLE 10.1: Symbol Table One

Remark 10.7.7. The second equality in the formula may be obtained from the first by substituting $F\, dW + G\, dt$ for $dX$ and using the formal multiplication rules summarized in the above symbol table. The rules reflect the limit properties

$$\sum_{j=0}^{n-1} (\Delta W_j)^2 \to b - a, \qquad \sum_{j=0}^{n-1} (\Delta W_j)\,\Delta t_j \to 0, \qquad\text{and}\qquad \sum_{j=0}^{n-1} (\Delta t_j)^2 \to 0$$

as $\|\mathcal P\| \to 0$. Using the table, we have

$$(dX)^2 = (F\, dW + G\, dt)^2 = F^2\, dt.$$
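The three limits behind Table 10.1 can be observed numerically. In the sketch below (uniform partitions of $[0, 1]$, arbitrary seeds), $\sum(\Delta W_j)^2$ stays near $1$, while $\sum \Delta W_j\,\Delta t_j = \Delta t\, W(1)$ and $\sum(\Delta t_j)^2 = 1/n$ shrink as $n$ grows.

```python
import math
import random


def table_sums(T, n, rng):
    """The three sums behind Table 10.1 on a uniform partition of [0, T]."""
    dt = T / n
    dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
    sum_dW_dW = sum(d * d for d in dW)    # (dW)^2 = dt: the sum tends to T
    sum_dW_dt = sum(d * dt for d in dW)   # dW dt = 0: equals dt * W(T)
    sum_dt_dt = n * dt * dt               # (dt)^2 = 0: equals T^2 / n
    return sum_dW_dW, sum_dW_dt, sum_dt_dt


rng = random.Random(101)
for n in (100, 10_000, 100_000):
    print(n, table_sums(1.0, n, rng))
```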

Example 10.7.8. Let $h(t)$ be a differentiable function and $X$ an Ito process. We calculate $d(hX)$ by applying the above formula to $f(t, x) = h(t)x$. Since $f_t(t, x) = h'(t)x$, $f_x(t, x) = h(t)$, and $f_{xx}(t, x) = 0$, we have

$$d(hX) = h\, dX + h'X\, dt = h\, dX + X\, dh,$$

which conforms to the product rule in classical calculus.

The general version of the Ito-Doeblin Formula allows functions of finitely many Ito processes. We state the formula for the case $n = 2$.

Theorem 10.7.9 (Ito-Doeblin Formula, Version 4). Suppose $f(t, x, y)$ is continuous with continuous partial derivatives $f_t$, $f_x$, $f_y$, $f_{xx}$, $f_{xy}$, and $f_{yy}$. Let $X_j$ be an Ito process with differential

$$dX_j = F_j\, dW_j + G_j\, dt, \qquad j = 1, 2,$$

where $W_1$ and $W_2$ are Brownian motions. Then

$$\begin{aligned} df(t, X_1, X_2) &= f_t(t, X_1, X_2)\, dt + f_x(t, X_1, X_2)\, dX_1 + f_y(t, X_1, X_2)\, dX_2 \\ &\quad + \frac12 f_{xx}(t, X_1, X_2)\,(dX_1)^2 + \frac12 f_{yy}(t, X_1, X_2)\,(dX_2)^2 \\ &\quad + f_{xy}(t, X_1, X_2)\, dX_1 \cdot dX_2. \end{aligned}$$

Remark 10.7.10. The differential $df(t, X_1, X_2)$ may be described in terms of $dt$, $dW_1$, and $dW_2$ by substituting $F_j\, dW_j + G_j\, dt$ for $dX_j$ and using the formal multiplication rules given in Table 10.2. From the table, we see that

$$(dX_1)^2 = F_1^2\, dt, \qquad (dX_2)^2 = F_2^2\, dt, \qquad\text{and}\qquad dX_1 \cdot dX_2 = F_1F_2\, dW_1 \cdot dW_2,$$


$$\begin{array}{c|ccc} \cdot & dt & dW_1 & dW_2 \\ \hline dt & 0 & 0 & 0 \\ dW_1 & 0 & dt & dW_1 \cdot dW_2 \\ dW_2 & 0 & dW_1 \cdot dW_2 & dt \end{array}$$

TABLE 10.2: Symbol Table Two

hence

$$\begin{aligned} df(t, X_1(t), X_2(t)) &= f_t\, dt + f_x\cdot(F_1\, dW_1 + G_1\, dt) + f_y\cdot(F_2\, dW_2 + G_2\, dt) \\ &\quad + \frac12 f_{xx}F_1^2\, dt + \frac12 f_{yy}F_2^2\, dt + f_{xy}F_1F_2\, dW_1 \cdot dW_2 \\ &= f_xF_1\, dW_1 + f_yF_2\, dW_2 + f_{xy}F_1F_2\, dW_1 \cdot dW_2 \\ &\quad + \left[f_t + f_xG_1 + f_yG_2 + \frac12 f_{xx}F_1^2 + \frac12 f_{yy}F_2^2\right] dt, \end{aligned}$$

where the partial derivatives of $f$ are evaluated at $(t, X_1(t), X_2(t))$. The evaluation of the term $dW_1 \cdot dW_2$ depends on how $dW_1$ and $dW_2$ are related. For example, if $W_1$ and $W_2$ are independent, then $dW_1 \cdot dW_2 = 0$. On the other hand, if $W_1$ and $W_2$ are correlated, that is,

$$W_1 = \rho W_2 + \sqrt{1 - \rho^2}\, W_3,$$

where $W_2$ and $W_3$ are independent Brownian motions and $0 < |\rho| \le 1$, then

$$dW_1 \cdot dW_2 = \rho\, dt.$$

(See, for example, [17].)
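The rule $dW_1 \cdot dW_2 = \rho\,dt$ can be illustrated by building the increments of $W_1$ from independent increments of $W_2$ and $W_3$ via $W_1 = \rho W_2 + \sqrt{1 - \rho^2}\,W_3$ and summing $\Delta W_1\,\Delta W_2$ over a fine partition of $[0, T]$. The sketch below uses arbitrary values of $\rho$, $T$, the grid size, and the seeds; the sums should be close to $\rho T$, and close to 0 in the independent case $\rho = 0$.

```python
import math
import random


def cross_variation(rho, T, n, rng):
    """Sum of Delta W1 * Delta W2 over a uniform partition of [0, T], W1 = rho W2 + sqrt(1 - rho^2) W3."""
    dt = T / n
    total = 0.0
    for _ in range(n):
        dw2 = rng.gauss(0.0, math.sqrt(dt))
        dw3 = rng.gauss(0.0, math.sqrt(dt))   # independent of dw2
        dw1 = rho * dw2 + math.sqrt(1.0 - rho * rho) * dw3
        total += dw1 * dw2
    return total


rng = random.Random(17)
for rho in (0.0, 0.5, -0.8):
    print(rho, cross_variation(rho, 1.0, 100_000, rng))   # close to rho * T
```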

Example 10.7.11. We use Theorem 10.7.9 to obtain Ito's product rule for the differentials $dX_j = F_j\, dW_j + G_j\, dt$, $j = 1, 2$. Taking $f(x, y) = xy$, we have $f_x = y$, $f_y = x$, $f_{xy} = 1$, and $f_t = f_{xx} = f_{yy} = 0$, hence

$$d(X_1X_2) = \begin{cases} X_2\, dX_1 + X_1\, dX_2 & \text{if } W_1 \text{ and } W_2 \text{ are independent,} \\ X_2\, dX_1 + X_1\, dX_2 + \rho F_1F_2\, dt & \text{if } W_1 \text{ and } W_2 \text{ are correlated.} \end{cases}$$

Thus, in the independent case, we obtain the familiar product rule of classical calculus.

10.8 Stochastic Differential Equations

Definition 10.8.1. A stochastic differential equation (SDE) is an equation of the form

$$dX(t) = \alpha(t, X(t))\, dW(t) + \beta(t, X(t))\, dt,$$

where $\alpha(t, x)$ and $\beta(t, x)$ are continuous functions. A solution of the SDE is a stochastic process $X$ adapted to $(\mathcal F^W_t)$ and satisfying

$$X(t) = X(0) + \int_0^t \alpha(s, X(s))\, dW(s) + \int_0^t \beta(s, X(s))\, ds, \tag{10.12}$$

where $X(0)$ is a specified random variable.

In certain cases the Ito-Doeblin formula, which generates SDEs from Ito processes, may be used to find an explicit form of the solution (10.12). We illustrate with two general procedures, each based on an Ito process

$$Y(t) = Y(0) + \int_0^t F(s)\, dW(s) + \int_0^t G(s)\, ds. \tag{10.13}$$

The first procedure applies Version 3 of the formula to the process $X(t) = e^{Y(t)}$. With $f(t, y) = e^y$ we have

$$dX = df(t, Y) = f_t(t, Y)\, dt + f_y(t, Y)\, dY + \frac12 f_{yy}(t, Y)\,(dY)^2 = X\, dY + \frac12 X\,(dY)^2,$$

and since $dY = F\, dW + G\, dt$ and $(dY)^2 = F^2\, dt$ we obtain

$$dX = FX\, dW + \left(G + \frac12 F^2\right) X\, dt. \tag{10.14}$$

Equation (10.14) therefore provides a class of SDEs with solutions

$$X(t) = X(0)\exp\left(\int_0^t F(s)\, dW(s) + \int_0^t G(s)\, ds\right). \tag{10.15}$$

Example 10.8.2. Let $\sigma$ and $\mu$ be continuous stochastic processes. Taking $F = \sigma$ and $G = \mu - \frac12\sigma^2$ in (10.14), we obtain the SDE

$$dX = \sigma X\, dW + \mu X\, dt,$$

which, by (10.15), has solution

$$X(t) = X(0)\exp\left(\int_0^t \sigma(s)\, dW(s) + \int_0^t \left[\mu(s) - \frac12\sigma^2(s)\right] ds\right).$$

In case $\mu$ and $\sigma$ are constant, the solution reduces to

$$X(t) = X(0)\exp\left[\sigma W(t) + \left(\mu - \frac12\sigma^2\right)t\right],$$

a process known as geometric Brownian motion. This example will form the basis of discussion in the next chapter.
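For constant $\mu$ and $\sigma$, the closed form can be compared with a direct time-stepping of the SDE. The sketch below uses the Euler-Maruyama scheme, a standard discretization that is not discussed in the text, together with hypothetical parameters ($X(0) = 100$, $\mu = 0.05$, $\sigma = 0.2$, $T = 1$). Both computations use the same Brownian increments, so on a fine grid they should agree closely.

```python
import math
import random


def gbm_euler_vs_exact(x0, mu, sigma, T, n, seed):
    """Euler-Maruyama for dX = sigma X dW + mu X dt, and the exact solution, along one path."""
    rng = random.Random(seed)
    dt = T / n
    x_euler, w = x0, 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x_euler += sigma * x_euler * dw + mu * x_euler * dt   # one Euler-Maruyama step
        w += dw
    # Geometric Brownian motion: X(T) = X(0) exp(sigma W(T) + (mu - sigma^2 / 2) T).
    x_exact = x0 * math.exp(sigma * w + (mu - 0.5 * sigma ** 2) * T)
    return x_euler, x_exact


# Hypothetical parameters, e.g. a stock price with 5% drift and 20% volatility.
print(gbm_euler_vs_exact(100.0, 0.05, 0.2, 1.0, 100_000, seed=11))
```

Omitting the $-\frac12\sigma^2$ correction in the exact formula would produce a systematic mismatch, which is one way to see the Ito-Doeblin term at work.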


The second procedure applies Version 3 of the Ito-Doeblin formula to the process $X(t) = h(t)Y(t)$, where $h(t)$ is a nonzero differentiable function and $Y$ is given by (10.13). By Example 10.7.8,

$$dX = h\, dY + h'Y\, dt = h(F\, dW + G\, dt) + h'Y\, dt.$$

Rearranging, we obtain the SDE

$$dX = hF\, dW + \left(hG + \frac{h'}{h}X\right) dt.$$

The solution $X = hY$ may be written

$$X(t) = h(t)\left(\frac{X(0)}{h(0)} + \int_0^t F(s)\, dW(s) + \int_0^t G(s)\, ds\right).$$

Taking $F = f/h$ and $G = g/h$, where $f$ and $g$ are continuous functions, we obtain the SDE

$$dX = f\, dW + \left(g + \frac{h'}{h}X\right) dt \tag{10.16}$$

with solution

$$X(t) = h(t)\left(\frac{X(0)}{h(0)} + \int_0^t \frac{f(s)}{h(s)}\, dW(s) + \int_0^t \frac{g(s)}{h(s)}\, ds\right). \tag{10.17}$$

Example 10.8.3. Let $\alpha$, $\beta$, and $\sigma$ be constants with $\beta > 0$ and take $f = \sigma$, $g = \alpha$, and $h(t) = \exp(-\beta t)$ in (10.16). Then

$$dX = \sigma\, dW + (\alpha - \beta X)\, dt, \tag{10.18}$$

which, by (10.17), has solution

$$X(t) = e^{-\beta t}\left(X(0) + \frac{\alpha}{\beta}\left(e^{\beta t} - 1\right) + \sigma\int_0^t e^{\beta s}\, dW(s)\right), \tag{10.19}$$

called an Ornstein-Uhlenbeck process. In finance, the SDE in (10.18) is known as the Vasicek equation and is used to describe the evolution of stochastic interest rates (see, for example, [17]). With $\alpha = 0$ and $\sigma > 0$, Equation (10.18) is called a Langevin equation, which plays a central role in statistical mechanics.
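Because the integrand $e^{\beta s}$ in (10.19) is deterministic, Corollary 10.6.4 shows that $\int_0^t e^{\beta s}\,dW(s)$ is normal with mean 0 and variance $\int_0^t e^{2\beta s}\,ds = (e^{2\beta t} - 1)/(2\beta)$. Hence $X(t)$ can be sampled exactly at a fixed time $t$, without time-stepping. The sketch below does this for hypothetical parameter values and compares the sample mean and variance with the values implied by (10.19) and Theorem 10.6.3.

```python
import math
import random
import statistics


def ou_sample(x0, alpha, beta, sigma, t, rng):
    """Draw X(t) from (10.19), sampling the Ito integral as N(0, (e^{2 beta t} - 1) / (2 beta))."""
    var_integral = (math.exp(2 * beta * t) - 1.0) / (2 * beta)
    ito_integral = rng.gauss(0.0, math.sqrt(var_integral))
    return math.exp(-beta * t) * (
        x0 + alpha / beta * (math.exp(beta * t) - 1.0) + sigma * ito_integral
    )


def ou_mean_var(x0, alpha, beta, sigma, t):
    """Mean and variance of X(t) implied by (10.19) and Theorem 10.6.3 (iii), (iv)."""
    mean = math.exp(-beta * t) * x0 + alpha / beta * (1.0 - math.exp(-beta * t))
    var = sigma ** 2 * (1.0 - math.exp(-2 * beta * t)) / (2 * beta)
    return mean, var


# Hypothetical Vasicek-style parameters.
params = dict(x0=1.0, alpha=0.1, beta=2.0, sigma=0.3, t=1.0)
rng = random.Random(1930)
xs = [ou_sample(rng=rng, **params) for _ in range(20_000)]
print(statistics.fmean(xs), statistics.pvariance(xs), ou_mean_var(**params))
```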


10.9 Exercises

1. Find the solution of each of the following ODEs and the largest open interval on which it is defined.

(a) $x' = x^2\sin t$, $x(0) = 1/3$;

(b) $x' = x^2\sin t$, $x(0) = 2$;

(c) $x' = \dfrac{2t + \cos t}{2x}$, $x(0) = 1$;

(d) $x' = \dfrac{x + 1}{\tan t}$, $x(\pi/6) = 1/2$, $0 < t < \pi/2$.

2. The variation of order m of a (deterministic) function f on an interval [a, b] is defined as the limit

$$\lim_{\|\mathcal P\|\to 0} V^{(m)}_{\mathcal P}(f).$$

Prove the following:

(a) If f is a bounded function with zero variation of order m on [a, b], then f has zero variation of order m + 1 on [a, b].

(b) If f is continuous with bounded mth variation on [a, b], then f has zero variation of order k > m on [a, b].

(c) If f has a bounded first derivative on [a, b], then it has bounded first variation and zero variation of order m ≥ 2 on [a, b].

(d) The function $f(t) = t^{1+\varepsilon}\sin(1/t)$, $f(0) = 0$, has bounded first variation and zero variation of order m ≥ 2 on [0, 1].

3. Show that the Riemann-Stieltjes integral $\int_0^{2/\pi} 1\,dw$ does not exist for the function w defined in Example 10.4.2.

4. Show that for any nonzero constant c, $W_1(t) := cW(t/c^2)$ defines a Brownian motion.

5. Show that for r ≤ s ≤ t,

$$W(s) + W(t) \sim N(0, t + 3s) \quad\text{and}\quad W(r) + W(s) + W(t) \sim N(0, t + 3s + 5r).$$

Generalize.

6. Show that for a > 1/2,

$$\lim_{s\to 0} s^aW(1/s) = 0,$$

where the limit is taken in the mean square sense.


7. Use Theorem 10.6.3 to find $V X_t$ if $X_t =$

(a) $\int_0^t \sqrt{s}\,W(s)\,dW(s)$;

(b) $\int_0^t \exp\left(W^2(s)\right)dW(s)$;

(c) $\int_0^t \sqrt{|W(s)|}\,dW(s)$.

8. Show that

$$\int_a^b \left[W(t) - W(b)\right]dt$$

is normal with mean 0 and variance $(b - a)^3/3$. (See Example 10.6.1.)

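9. Use the Ito-Doeblin formulas to show that

(a) $\displaystyle\int_a^t e^{W(s)}\,dW(s) = e^{W(t)} - e^{W(a)} - \frac{1}{2}\int_a^t e^{W(s)}\,ds$;

(b) $\displaystyle 2\int_0^t sW(s)\,dW(s) = tW^2(t) - \frac{t^2}{2} - \int_0^t W^2(s)\,ds$; and

(c) $\displaystyle d\left(\frac{X}{Y}\right) = \frac{X}{Y}\left[\frac{dX}{X} - \frac{dY}{Y} + \left(\frac{dY}{Y}\right)^2 - \frac{dX}{X}\,\frac{dY}{Y}\right]$, where X and Y are Ito processes.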

10. Let X be an Ito process with dX = F dW + G dt. Find the differentials, in terms of dW and dt, of (a) $X^2$; (b) $\ln X$; (c) $tX^2$.

11. Show that, for the process given in Equation (10.19), $\lim_{t\to\infty} EX_t = \alpha/\beta$.

Chapter 11

The Black-Scholes-Merton Model

With the methods of Chapter 10 at our disposal, we are now able to derive the celebrated Black-Scholes formula for the price of a call option. The formula is based on the solution of a partial differential equation arising from an SDE that governs the price of the underlying stock S. We assume throughout that there are no arbitrage opportunities in the market.

11.1 The Stock Price SDE

Let W be a Brownian motion on a probability space (Ω, F, P). The price of a single share of S is assumed to satisfy the SDE

$$\frac{dS}{S} = \sigma\,dW + \mu\,dt, \tag{11.1}$$

where µ and σ are constants called, respectively, the drift and volatility of the stock. Equation (11.1) asserts that the relative change in the stock price has two components: a deterministic part µ dt, which accounts for the general trend of the stock, and a random component σ dW, which reflects the unpredictable nature of S. The volatility is a measure of the riskiness of the stock and its sensitivity to changes in the market. If σ = 0, then (11.1) is an ODE with solution $S_t = S_0e^{\mu t}$.

Equation (11.1) may be written in standard form as

dS = σS dW + µS dt, (11.2)

which is the SDE of Example 10.8.2. The solution there was found to be

$$S_t = S_0\exp\left[\sigma W_t + \left(\mu - \tfrac{1}{2}\sigma^2\right)t\right]. \tag{11.3}$$

The integral version of (11.2) is

$$S_t = S_0 + \int_0^t \sigma S(s)\,dW(s) + \int_0^t \mu S(s)\,ds. \tag{11.4}$$

Taking expectations in (11.4) and using Theorem 10.6.3, we see that

$$ES_t = S_0 + \mu\int_0^t ES(s)\,ds.$$



The function $x(t) := ES_t$ therefore satisfies the ODE $x' = \mu x$, hence $ES_t = S_0e^{\mu t}$. This is the solution of (11.1) for the case σ = 0 and represents the return on a risk-free investment. Thus, taking expectations in (11.4) removes the random component of (11.1).
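A Monte Carlo estimate makes the identity $ES_t = S_0e^{\mu t}$ concrete: sample $W_t \sim N(0, t)$, substitute it into (11.3), and average. The sketch below is written in Python with only the standard library; the function name, parameter values, and seed are arbitrary choices made for illustration.

```python
import math
import random


def mc_mean_stock(s0, mu, sigma, t, n, seed=2):
    """Average n independent samples of S_t = S_0 exp[sigma W_t + (mu - sigma^2/2) t]."""
    rng = random.Random(seed)
    drift = (mu - 0.5 * sigma**2) * t
    vol = sigma * math.sqrt(t)  # sigma * W_t has the law of vol * Z with Z ~ N(0, 1)
    total = 0.0
    for _ in range(n):
        total += s0 * math.exp(vol * rng.gauss(0.0, 1.0) + drift)
    return total / n


s0, mu, sigma, t = 50.0, 0.08, 0.3, 2.0
print(mc_mean_stock(s0, mu, sigma, t, 200_000), s0 * math.exp(mu * t))
```

Dropping the $-\tfrac12\sigma^2$ term from `drift` would inflate the average by the factor $e^{\sigma^2t/2}$, which is one way to see why that correction appears in (11.3).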

Although we won't consider such a general setting, both the drift µ and the volatility σ may be stochastic processes. In this case, the solution to (11.1) is given by

$$S_t = S_0\exp\left(\int_0^t \sigma(s)\,dW(s) + \int_0^t \left[\mu(s) - \tfrac{1}{2}\sigma^2(s)\right]ds\right),$$

as was shown in Example 10.8.2.

11.2 Continuous-Time Portfolios

As in the binomial model, the basic construct in determining the value of a claim is a self-financing, replicating portfolio based on S and a risk-free bond B, the bond account earning interest at a constant annual rate r. The value of the bond at time t is denoted by $B_t$, where we take the initial value $B_0$ to be one unit. Thus, $B_t = e^{rt}$, which is the solution of the ODE

dB = rB dt, B0 = 1.

We assume that the market allows unlimited trading in shares of S and units of B.

The following is the continuous-time analog of Definition 5.2.1.

Definition 11.2.1. A portfolio or trading strategy for (B, S) is a two-dimensional stochastic process $(\varphi, \theta) = (\varphi_t, \theta_t)_{0\le t\le T}$ adapted to the price filtration $(\mathcal F^S_t)_{0\le t\le T}$. The random variables $\varphi_t$ and $\theta_t$ are, respectively, the number of units of B and shares of S held at time t. The value of the portfolio at time t is defined as

$$V_t = \varphi_tB_t + \theta_tS_t, \quad 0 \le t \le T.$$

The process $V = (V_t)_{0\le t\le T}$ is the value or wealth process of the trading strategy, and $V_0$ is the initial investment or initial wealth.

Definition 11.2.2. A portfolio (φ, θ) is self-financing if

dV = φdB + θ dS. (11.5)

To understand the implication of (11.5), consider a discrete version of the portfolio process defined at times $t_0 = 0 < t_1 < t_2 < \cdots < t_n = T$. At time $t_j$, the value of the portfolio before the price $S_j$ is known is

$$\varphi_jB_{j-1} + \theta_jS_{j-1},$$


where, for ease of notation, we have written $S_j$ for $S_{t_j}$, and so forth. After $S_j$ becomes known and the new bond value $B_j$ is noted, the portfolio has value

$$V_j = \varphi_jB_j + \theta_jS_j.$$

At this time, the number of stocks and bonds may be adjusted (based on the information provided by $\mathcal F_j$), but for the portfolio to be self-financing, this restructuring must be accomplished without changing the current value of the portfolio. Thus, the new values $\varphi_{j+1}$ and $\theta_{j+1}$ must satisfy

$$\varphi_{j+1}B_j + \theta_{j+1}S_j = \varphi_jB_j + \theta_jS_j.$$

It follows that

$$\Delta V_j = \varphi_{j+1}B_{j+1} + \theta_{j+1}S_{j+1} - \left(\varphi_{j+1}B_j + \theta_{j+1}S_j\right) = \varphi_{j+1}\Delta B_j + \theta_{j+1}\Delta S_j,$$

which is the discrete version of (11.5), in agreement with Theorem 5.2.5.

As in the discrete case, a portfolio may be used as a hedging strategy, that is, an investment in shares of S and units of B devised to cover the obligation of the writer at maturity T. In this case, the portfolio is said to replicate the claim, the latter formally defined as an $\mathcal F^S_T$-random variable. A market is complete if every claim can be replicated.
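The telescoping argument above can be checked numerically. The Python sketch below (standard library only) rebalances a (B, S) portfolio at random on a discrete grid, always choosing the new bond holding so that rebalancing leaves the portfolio value unchanged, and then compares the terminal value with the initial value plus the accumulated gains $\sum_j(\varphi_{j+1}\Delta B_j + \theta_{j+1}\Delta S_j)$. The geometric Brownian motion price step, the random stock positions, and all names and parameter values are illustrative assumptions.

```python
import math
import random


def self_financing_check(n=250, seed=3):
    """Return (V_n, V_0 + sum of gains) for a randomly rebalanced self-financing portfolio."""
    rng = random.Random(seed)
    r, mu, sigma, T = 0.05, 0.10, 0.20, 1.0
    dt = T / n
    B, S = 1.0, 20.0
    phi, theta = 10.0, 1.0  # holdings (phi_1, theta_1) chosen at time t_0
    v0 = phi * B + theta * S
    gains = 0.0
    for _ in range(n):
        B_next = B * math.exp(r * dt)
        S_next = S * math.exp(sigma * rng.gauss(0.0, math.sqrt(dt)) + (mu - 0.5 * sigma**2) * dt)
        gains += phi * (B_next - B) + theta * (S_next - S)  # phi_{j+1} dB_j + theta_{j+1} dS_j
        value = phi * B_next + theta * S_next                # V_{j+1}, before rebalancing
        # Self-financing rebalance: pick any new stock position, then the bond
        # position that keeps the portfolio value equal to `value`.
        theta = rng.uniform(-2.0, 2.0)
        phi = (value - theta * S_next) / B_next
        B, S = B_next, S_next
    return phi * B + theta * S, v0 + gains


v_n, v0_plus_gains = self_financing_check()
print(v_n, v0_plus_gains)
```

If the rebalancing step instead withdrew cash (for instance, by computing `phi` from `0.99 * value`), the two printed numbers would no longer agree.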

As with discrete-time portfolios, the importance of continuous-time portfolios derives from the law of one price, which implies that in an arbitrage-free market the value of a claim is that of a replicating, self-financing trading strategy. We use this observation in the next section to obtain a formula for the value of a claim with underlying S.

11.3 The Black-Scholes-Merton PDE

To derive the Black-Scholes formula, we begin by assuming the existence of a self-financing portfolio whose value $V_T$ at time T is the payoff $f(S_T)$ of a European claim, where f is a continuous function with suitable growth conditions. For such a portfolio, the value of the claim at any time t ∈ [0, T] is $V_t$. We seek a function v(t, s) such that

$$V_t = v(t, S_t), \quad 0 \le t \le T, \qquad\text{and}\qquad v(T, S_T) = f(S_T).$$

Note that if $S_0 = 0$ then (11.3) implies that the process S is identically zero. In this case, the claim is worthless and $V_t = 0$ for all t. Therefore, v must satisfy the boundary conditions

v(T, s) = f(s), s ≥ 0, and v(t, 0) = 0, 0 ≤ t ≤ T. (11.6)


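To determine v, we begin by applying Version 3 of the Ito-Doeblin formula (Theorem 10.7.6) to the process $V_t = v(t, S_t)$. Using (11.2), we have

$$\begin{aligned} dV &= v_t\,dt + v_s\,dS + \tfrac{1}{2}v_{ss}\,(dS)^2 \\ &= \sigma v_sS\,dW + \left(v_t + \mu v_sS + \tfrac{1}{2}\sigma^2v_{ss}S^2\right)dt, \end{aligned} \tag{11.7}$$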

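where the partial derivatives of v are evaluated at $(t, S_t)$. Additionally, from (11.5) and $\varphi B = V - \theta S$, we have

$$\begin{aligned} dV &= \theta\,dS + \varphi\,dB \\ &= \theta S(\mu\,dt + \sigma\,dW) + r\varphi B\,dt \\ &= \sigma\theta S\,dW + \left[\mu\theta S + r(V - \theta S)\right]dt. \end{aligned} \tag{11.8}$$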

Equating the respective coefficients of dt and dW in (11.7) and (11.8) leads to the equations

$$\mu\theta S + r(V - \theta S) = v_t + \mu v_sS + \tfrac{1}{2}\sigma^2v_{ss}S^2 \quad\text{and}\quad \theta = v_s.$$

Substituting the second equation into the first and simplifying yields the partial differential equation

$$v_t + rsv_s + \tfrac{1}{2}\sigma^2s^2v_{ss} - rv = 0, \quad s > 0,\ 0 \le t < T. \tag{11.9}$$

Equation (11.9) together with the boundary conditions in (11.6) is called the Black-Scholes-Merton (BSM) PDE.

The following theorem gives the solution v(t, s) to (11.9). The assertion of the theorem may be verified directly, but it is instructive to see how the solution may be obtained from that of a simpler PDE. The latter approach is carried out in Appendix B.

Theorem 11.3.1 (General Solution of the BSM PDE). The solution of (11.9) with the boundary conditions (11.6) is given by

$$v(t, s) = e^{-r(T-t)}G(t, s), \quad 0 \le t < T,$$

where

$$G(t, s) := \int_{-\infty}^{\infty} f\left(s\exp\left\{\sigma\sqrt{T-t}\,y + \left(r - \tfrac{1}{2}\sigma^2\right)(T-t)\right\}\right)\phi(y)\,dy.$$
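Theorem 11.3.1 reduces pricing a European claim to a one-dimensional integral against the standard normal density, which is easy to approximate numerically. The Python sketch below (standard library only) truncates the integral to [−10, 10] and applies the trapezoidal rule; the truncation point, grid size, and the name `bsm_value` are assumptions made for illustration. For the payoffs f(x) = x and f(x) = 1 it should return s and $e^{-r(T-t)}$, the values of one share and of a unit zero-coupon bond.

```python
import math


def bsm_value(f, t, s, T, r, sigma, n=4000, y_max=10.0):
    """Approximate v(t, s) = e^{-r(T-t)} G(t, s) from Theorem 11.3.1.

    The integral over y is truncated to [-y_max, y_max] and evaluated by the
    trapezoidal rule with n subintervals.
    """
    tau = T - t
    h = 2.0 * y_max / n
    total = 0.0
    for k in range(n + 1):
        y = -y_max + k * h
        weight = 0.5 if k in (0, n) else 1.0
        density = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)  # phi(y)
        z = s * math.exp(sigma * math.sqrt(tau) * y + (r - 0.5 * sigma**2) * tau)
        total += weight * f(z) * density
    return math.exp(-r * tau) * total * h


print(bsm_value(lambda x: x, 0.0, 20.0, 1.0, 0.05, 0.2))                    # one share: close to 20
print(bsm_value(lambda x: max(x - 18.0, 0.0), 0.0, 20.0, 0.5, 0.06, 0.1))  # a call with K = 18
```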

Having obtained the solution v of the BSM PDE, to complete the circle of ideas we must show that $v(\cdot, S)$ is indeed the value process of a self-financing, replicating trading strategy. This is carried out in the following theorem, whose proof uses $v(\cdot, S)$ to construct the strategy. A martingale proof is given in Chapter 13.

Theorem 11.3.2. Given a European claim with payoff $f(S_T)$, there exists a self-financing replicating strategy for the claim with value process

$$V_t = v(t, S_t), \quad 0 \le t \le T, \tag{11.10}$$

where v(t, s) is the solution of the BSM PDE.


Proof. Define V by (11.10) and define adapted processes θ and φ by

$$\theta(t) = v_s(t, S_t) \quad\text{and}\quad \varphi = B^{-1}(V - \theta S).$$

Then V is the value process of the strategy (φ, θ), and from (11.7) and (11.9)

$$\begin{aligned} dV &= \sigma\theta S\,dW + \left[r(V - \theta S) + \mu\theta S\right]dt \\ &= \theta S(\mu\,dt + \sigma\,dW) + r(V - \theta S)\,dt \\ &= \theta\,dS + \varphi\,dB. \end{aligned}$$

Therefore, (φ, θ) is self-financing. Since v(T, s) = f(s), the strategy replicates the claim.

From Theorem 11.3.2, we obtain the celebrated Black-Scholes option pricing formula:

Corollary 11.3.3. The value at time t ∈ [0, T) of a standard call option with strike price K and maturity T is given by

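$$C_t = S_t\Phi\left(d_1(T-t, S_t, K, \sigma, r)\right) - Ke^{-r(T-t)}\Phi\left(d_2(T-t, S_t, K, \sigma, r)\right), \tag{11.11}$$

where the functions $d_1$ and $d_2$ are defined by

$$d_1(\tau, s, K, \sigma, r) = \frac{\ln(s/K) + \left(r + \tfrac{1}{2}\sigma^2\right)\tau}{\sigma\sqrt{\tau}}$$

and

$$d_2(\tau, s, K, \sigma, r) = \frac{\ln(s/K) + \left(r - \tfrac{1}{2}\sigma^2\right)\tau}{\sigma\sqrt{\tau}} = d_1 - \sigma\sqrt{\tau}.$$

In particular, the cost of the option is

$$C_0 = S_0\Phi\left(d_1(T, S_0, K, \sigma, r)\right) - Ke^{-rT}\Phi\left(d_2(T, S_0, K, \sigma, r)\right). \tag{11.12}$$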

Proof. Taking $f(x) = (x - K)^+$ in Theorem 11.3.1 and applying Theorem 11.3.2, we see that the value of the call option at time t ∈ [0, T) is

$$C_t = e^{-r(T-t)}G(t, S_t),$$

where

$$G(t, s) = \int_{-\infty}^{\infty}\left(s\exp\left\{\sigma\sqrt{\tau}\,y + \left(r - \tfrac{1}{2}\sigma^2\right)\tau\right\} - K\right)^+\phi(y)\,dy, \qquad \tau := T - t.$$

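To evaluate the integral, note that the expression inside the positive part is increasing in y and equals zero at $y = -d_2$, so the integrand vanishes for $y \le -d_2$, where $d_j := d_j(\tau, s, K, \sigma, r)$. Thus,

$$\begin{aligned} G(t, s) &= s\int_{-d_2}^{\infty}\exp\left\{\sigma\sqrt{\tau}\,y + \left(r - \tfrac{1}{2}\sigma^2\right)\tau\right\}\phi(y)\,dy - K\int_{-d_2}^{\infty}\phi(y)\,dy \\ &= \frac{se^{(r - \sigma^2/2)\tau}}{\sqrt{2\pi}}\int_{-d_2}^{\infty}\exp\left\{-\tfrac{1}{2}y^2 + \sigma\sqrt{\tau}\,y\right\}dy - K\left[1 - \Phi(-d_2)\right] \\ &= se^{r\tau}\Phi(d_1) - K\Phi(d_2), \end{aligned} \tag{11.13}$$

the last equality by Exercise 12.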
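As a sanity check on (11.11), one can substitute the call price into the left side of the BSM PDE (11.9) with the partial derivatives replaced by central finite differences; the residual should vanish up to discretization and rounding error. The Python sketch below (standard library only, with Φ computed from `math.erf`) does this at a sample point. The step size h, the evaluation points, and the helper names are illustrative assumptions.

```python
import math


def norm_cdf(x):
    """Standard normal cdf, Phi(x) = (1 + erf(x / sqrt 2)) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def call_value(t, s, K, T, r, sigma):
    """v(t, s) = C(T - t, s, K, sigma, r), the call price of Corollary 11.3.3."""
    tau = T - t
    d1 = (math.log(s / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def bsm_residual(t, s, K=18.0, T=1.0, r=0.06, sigma=0.2, h=1e-3):
    """v_t + r s v_s + (1/2) sigma^2 s^2 v_ss - r v at (t, s), via central differences."""
    def v(tt, ss):
        return call_value(tt, ss, K, T, r, sigma)

    v_t = (v(t + h, s) - v(t - h, s)) / (2 * h)
    v_s = (v(t, s + h) - v(t, s - h)) / (2 * h)
    v_ss = (v(t, s + h) - 2 * v(t, s) + v(t, s - h)) / h**2
    return v_t + r * s * v_s + 0.5 * sigma**2 * s**2 * v_ss - r * v(t, s)


print(bsm_residual(0.3, 20.0))  # small compared with the call price itself
```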


Example 11.3.4. Table 11.1 gives prices C0 and P0 for options based on a stock with price S0 = $20.00. C0 is calculated using (11.12) and P0 is obtained from the put-call parity formula. The table suggests that C0 is increasing in the variables σ, r, and T and decreasing in K. These and other relations will be examined in the next section.

TABLE 11.1: Variation of C0 and P0 with T, K, r and σ (prices in dollars)

T     K     r      σ     C0      P0
.5    18    .06    .1    2.55    0.01
.5    18    .06    .2    2.77    0.24
.5    18    .12    .1    3.05    0.00
.5    18    .12    .2    3.20    0.16
.5    22    .06    .1    0.14    1.49
.5    22    .06    .2    0.61    1.96
.5    22    .12    .1    0.28    1.00
.5    22    .12    .2    0.82    1.54
2     18    .06    .1    4.09    0.06
2     18    .06    .2    4.64    0.61
2     18    .12    .1    5.85    0.01
2     18    .12    .2    6.09    0.25
2     22    .06    .1    1.37    0.89
2     22    .06    .2    2.47    1.99
2     22    .12    .1    2.90    0.21
2     22    .12    .2    3.71    1.02
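Formula (11.12) takes only a few lines to evaluate. The following Python sketch (standard library only, with Φ computed from `math.erf`) recomputes Table 11.1, obtaining P0 from put-call parity as in the text. The function names are illustrative, and the argument order follows the BSM call function C(τ, s, K, σ, r) of Section 11.4. Rounded to cents, the output should reproduce the table entries.

```python
import math


def norm_cdf(x):
    """Standard normal cdf Phi, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call(tau, s, K, sigma, r):
    """BSM call function C(tau, s, K, sigma, r) = s Phi(d1) - K e^{-r tau} Phi(d2)."""
    d1 = (math.log(s / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def bsm_put(tau, s, K, sigma, r):
    """Put price from put-call parity: P = C - s + K e^{-r tau}."""
    return bsm_call(tau, s, K, sigma, r) - s + K * math.exp(-r * tau)


S0 = 20.0
print("T     K     r     sigma  C0     P0")
for T in (0.5, 2.0):
    for K in (18.0, 22.0):
        for r in (0.06, 0.12):
            for sigma in (0.1, 0.2):
                c, p = bsm_call(T, S0, K, sigma, r), bsm_put(T, S0, K, sigma, r)
                print(f"{T:<5} {K:<5.0f} {r:<5} {sigma:<6} {c:<6.2f} {p:.2f}")
```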

11.4 Properties of the BSM Call Function

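The Black-Scholes-Merton (BSM) call function is defined by

$$C = C(\tau, s, K, \sigma, r) = s\Phi(d_1) - Ke^{-r\tau}\Phi(d_2), \quad \tau, s, K, \sigma, r > 0,$$

where

$$d_{1,2} = d_{1,2}(\tau, s, K, \sigma, r) = \frac{\ln(s/K) + \left(r \pm \sigma^2/2\right)\tau}{\sigma\sqrt{\tau}}.$$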

For the sake of brevity, we shall occasionally suppress one or more arguments in the functions C and $d_{1,2}$. By Corollary 11.3.3, $C(T, S_0, K, \sigma, r)$ is the price $C_0$ of a call option with strike price K, maturity T, and underlying stock price $S_0$. In the notation of (11.11), $C(T - t, S_t, K, \sigma, r) = C_t$, the value of the call at time t.

The analogous BSM put function is defined as

$$P = P(\tau, s, K, \sigma, r) = C(\tau, s, K, \sigma, r) - s + Ke^{-r\tau}, \quad \tau, s, K, \sigma, r > 0. \tag{11.14}$$

By the put-call parity relation, $P(T, S_0, K, \sigma, r)$ is the price of the corresponding put option and $P_t := P(T - t, S_t, K, \sigma, r)$ is its value at time t.


We state below two theorems that summarize the analytical properties of the BSM call function. The first expresses various measures of sensitivity of an option price to market parameters in terms of the standard normal cdf and density functions. The second describes the limiting behavior of the price with respect to these parameters. The proofs are given in Appendix C. Analogous properties of the BSM put function may be derived from these theorems using (11.14).

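Theorem 11.4.1 (Growth Rates of C).

(i) $\dfrac{\partial C}{\partial s} = \Phi(d_1)$

(ii) $\dfrac{\partial^2 C}{\partial s^2} = \dfrac{1}{s\sigma\sqrt{\tau}}\,\phi(d_1)$

(iii) $\dfrac{\partial C}{\partial \tau} = \dfrac{\sigma s}{2\sqrt{\tau}}\,\phi(d_1) + Kre^{-r\tau}\Phi(d_2)$

(iv) $\dfrac{\partial C}{\partial \sigma} = s\sqrt{\tau}\,\phi(d_1)$

(v) $\dfrac{\partial C}{\partial r} = K\tau e^{-r\tau}\Phi(d_2)$

(vi) $\dfrac{\partial C}{\partial K} = -e^{-r\tau}\Phi(d_2)$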

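Remarks. (a) The quantities

$$\frac{\partial C}{\partial s}, \quad \frac{\partial^2 C}{\partial s^2}, \quad -\frac{\partial C}{\partial \tau}, \quad \frac{\partial C}{\partial \sigma}, \quad\text{and}\quad \frac{\partial C}{\partial r}$$

are called, respectively, delta, gamma, theta, vega, and rho, and are known collectively as the Greeks. A detailed analysis with concrete examples may be found in [7].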

(b) Theorem 11.4.1 shows that C is increasing in each of the variables s, τ, σ, and r, and decreasing in K. These analytical facts have simple financial explanations. For example, an increase in $S_0$ and/or a decrease in K will likely increase the payoff $(S_T - K)^+$ and therefore should require a higher call price. An increase in T or r decreases the discounted strike price $Ke^{-rT}$, reducing the initial cash needed to cover the strike price at maturity and making the option more attractive.

(c) Since $v(t, s) = C(T - t, s, K, \sigma, r)$, property (i) implies that

$$v_s(t, s) = \Phi\left(d_1(T - t, s, K, \sigma, r)\right).$$

Recalling that $v_s(t, S_t) = \theta_t$ represents the stock holdings in the replicating portfolio at time t, we see that the expression

$$S_t\Phi\left(d_1(T - t, S_t, K, \sigma, r)\right)$$

in the Black-Scholes formula gives the time-t value of the stock holdings, and the difference

$$v(t, S_t) - S_t\Phi\left(d_1(T - t, S_t, K, \sigma, r)\right) = -Ke^{-r(T-t)}\Phi\left(d_2(T - t, S_t, K, \sigma, r)\right)$$

gives the time-t value of the bond holdings. In other words, the portfolio should always be long $\Phi(d_1)$ shares of the stock and short the cash amount $Ke^{-r(T-t)}\Phi(d_2)$.
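The formulas of Theorem 11.4.1 can be spot-checked against central finite differences of C in each of its arguments. The Python sketch below (standard library only) does this for the first-order sensitivities (i) and (iii)-(vi); the evaluation point, the step size h, and the helper names are arbitrary choices made for illustration.

```python
import math


def norm_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal cdf Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def d12(tau, s, K, sigma, r):
    d1 = (math.log(s / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return d1, d1 - sigma * math.sqrt(tau)


def call_price(tau, s, K, sigma, r):
    """BSM call function C(tau, s, K, sigma, r)."""
    d1, d2 = d12(tau, s, K, sigma, r)
    return s * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)


def greeks_closed_form(tau, s, K, sigma, r):
    """First-order partial derivatives of C from Theorem 11.4.1 (i), (iii)-(vi)."""
    d1, d2 = d12(tau, s, K, sigma, r)
    disc = math.exp(-r * tau)
    return {
        "s": norm_cdf(d1),
        "tau": sigma * s / (2 * math.sqrt(tau)) * norm_pdf(d1) + K * r * disc * norm_cdf(d2),
        "sigma": s * math.sqrt(tau) * norm_pdf(d1),
        "r": K * tau * disc * norm_cdf(d2),
        "K": -disc * norm_cdf(d2),
    }


def greeks_numeric(tau, s, K, sigma, r, h=1e-5):
    """The same derivatives by central differences in each argument of C."""
    args = {"tau": tau, "s": s, "K": K, "sigma": sigma, "r": r}
    result = {}
    for name in args:
        up, down = dict(args), dict(args)
        up[name] += h
        down[name] -= h
        result[name] = (call_price(**up) - call_price(**down)) / (2 * h)
    return result


exact = greeks_closed_form(0.5, 20.0, 18.0, 0.2, 0.06)
approx = greeks_numeric(0.5, 20.0, 18.0, 0.2, 0.06)
for name in exact:
    print(f"dC/d{name}: formula {exact[name]:.8f}, finite difference {approx[name]:.8f}")
```

The signs of the five values match Remark (b): positive for s, τ, σ, and r, negative for K.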


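Theorem 11.4.2 (Limiting Behavior of C).

(i) $\lim_{s\to\infty}(C - s) = -Ke^{-r\tau}$

(ii) $\lim_{s\to 0^+} C = 0$

(iii) $\lim_{\tau\to\infty} C = s$

(iv) $\lim_{\tau\to 0^+} C = (s - K)^+$

(v) $\lim_{K\to\infty} C = 0$

(vi) $\lim_{K\to 0^+} C = s$

(vii) $\lim_{\sigma\to\infty} C = s$

(viii) $\lim_{\sigma\to 0^+} C = (s - e^{-r\tau}K)^+$

(ix) $\lim_{r\to\infty} C = s$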

Remarks. As with Theorem 11.4.1, the analytical assertions of Theorem 11.4.2 have simple financial interpretations. For example, part (i) implies that for large $S_0$, $C_0 \approx S_0 - Ke^{-rT}$, which is the initial value of a portfolio with payoff $S_T - K$. This is to be expected, as a larger $S_0$ makes it more likely that the option will be exercised, resulting in precisely this payoff.

Part (iii) asserts that for large T the cost of the option is roughly the same as the initial value of the stock. This can be understood by noting that if T is large the discounted strike price $Ke^{-rT}$ is negligible. If the option finishes in the money, a portfolio consisting of cash in the (small) amount $Ke^{-rT}$ and a long call will have the same maturity value as a portfolio that is long in the stock. The law of one price then dictates that the two portfolios have the same start-up cost, which is roughly that of the call. A similar explanation may be given for (ix).

Part (iv) implies that for $S_0 > K$ and small T the price of the option is approximately the difference between the initial value of the stock and the strike price. This is to be expected, as the holder would likely receive an immediate payoff of $S_0 - K$.

Part (vi) confirms the following argument: for a small strike price (in comparison to $S_0$) the option will almost certainly finish in the money. Therefore, a portfolio long in the option will have about the same payoff as one long in the stock. By the law of one price, the portfolios should have the same initial cost.

Part (viii) asserts that if σ is small and $S_0 > e^{-rT}K$ then the option price is roughly the cost of a bond with face value $S_0e^{rT} - K$. As the stock has little volatility, this is also the expected payoff of the option. Therefore, the option and the bond should have the same price.


11.5 Exercises

In the following exercises, all derivatives are assumed to have underlying S and maturity T. The price process of S is given by (11.3).

1. Suppose S sells for S0 = $50.00. If r = .10 and σ = .2, use the Black-Scholes formula and the put-call parity relation to find the prices of call and put options that expire in 90 days if K = (a) $47.00; (b) $53.00. (Use a spreadsheet with a built-in normal cdf.)

2. Show that the function $v(t, s) = \alpha s + \beta e^{rt}$ satisfies the BSM PDE (11.9), where α and β are constants. What portfolio does the function represent?

3. Show that the BSM put function P is decreasing in s. Calculate

$$\lim_{s\to\infty} P \quad\text{and}\quad \lim_{s\to 0^+} P.$$

4. Show that

$$P_t(s) = Ke^{-r(T-t)}\Phi\left(-d_2(T - t, s)\right) - s\Phi\left(-d_1(T - t, s)\right).$$

5. A cash-or-nothing call option pays a constant amount A if $S_T > K$ and pays nothing otherwise. Use Theorem 11.3.2 to show that the value of the option at time t is

$$V_t = Ae^{-r(T-t)}\Phi\left(d_2(T - t, S_t, K, \sigma, r)\right).$$

6. An asset-or-nothing call option pays the amount $S_T$ if $S_T > K$ and zero otherwise. Use Theorem 11.3.2 to show that the value of the option at time t is

$$V_t = S_t\Phi\left(d_1(T - t, S_t, K, \sigma, r)\right).$$

Use this result together with that of Exercise 5 to show that in the BSM model a portfolio long in an asset-or-nothing call and short in a cash-or-nothing call with cash K has the same time-t value as a standard call option.

7. Use Exercise 6 to find the cost $V_0$ of a claim with payoff $V_T = S_TI_{(K_1, K_2)}(S_T)$, where $0 < K_1 < K_2$.

8. A collar option has payoff

$$V_T = \min\left(\max(S_T, K_1), K_2\right), \quad\text{where } 0 < K_1 < K_2.$$

Show that the value of the option at time t is

$$V_t = K_1e^{-r(T-t)} + C(T - t, S_t, K_1) - C(T - t, S_t, K_2).$$


9. A break forward is a derivative with payoff

$$V_T = \max(S_T, F) - K = (S_T - F)^+ + F - K,$$

where $F = S_0e^{rT}$ is the forward value of the stock and K is initially set at a value that makes the cost of the contract zero. Determine the value $V_t$ of the derivative at time t and find K.

10. Find $d(S^k)$ in terms of dW and dt.

11. Find the probability that a call option with underlying S finishes in the money.

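12. Let p and q be constants with p > 0, and let $x_1$ and $x_2$ be extended real numbers. Verify that

$$\int_{x_1}^{x_2} e^{-px^2 + qx}\,dx = e^{q^2/4p}\sqrt{\frac{\pi}{p}}\left[\Phi\left(\frac{q - 2px_1}{\sqrt{2p}}\right) - \Phi\left(\frac{q - 2px_2}{\sqrt{2p}}\right)\right],$$

where Φ(∞) := 1 and Φ(−∞) := 0.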

13. The elasticity of the call price $C_0 = C(T, s, K, \sigma, r)$ with respect to the stock price s is defined as

$$E_C = \frac{s}{C_0}\,\frac{\partial C_0}{\partial s},$$

which is the percent increase in $C_0$ due to a 1% increase in s. Show that

$$E_C = \frac{s\Phi(d_1)}{s\Phi(d_1) - Ke^{-rT}\Phi(d_2)}, \qquad d_{1,2} := d_{1,2}(T, s, K, \sigma, r).$$

Conclude that $E_C > 1$, implying that the option is more sensitive to change than the underlying stock. Show also that (a) $\lim_{s\to+\infty}E_C = 1$ and (b) $\lim_{s\to 0^+}E_C = +\infty$. Interpret financially.

14. The elasticity of the put price $P_0 = P(T, s, K, \sigma, r)$ with respect to the stock price s is defined as

$$E_P = -\frac{s}{P_0}\,\frac{\partial P_0}{\partial s},$$

which is the percent decrease in $P_0$ due to a 1% increase in s. Show that

$$E_P = \frac{s\Phi(-d_1)}{Ke^{-rT}\Phi(-d_2) - s\Phi(-d_1)}$$

and that (a) $\lim_{s\to+\infty}E_P = +\infty$ and (b) $\lim_{s\to 0^+}E_P = 0$. Interpret financially and compare with Exercise 13.

15. Referring to Theorem 11.3.1, show that

$$G(t, s) = \frac{1}{\sigma\sqrt{2\pi(T - t)}}\int_0^{\infty} f(z)\,e^{-d_2^2/2}\,\frac{dz}{z},$$

where $d_2 = d_2(T - t, s, z, \sigma, r)$.

Chapter 12

Continuous-Time Martingales

In Chapter 9, the main results of option pricing in the binomial model were interpreted in the context of discrete-time martingales. In Chapter 13, we carry out a similar program for the Black-Scholes-Merton model, using continuous-time martingales to find the fair price of a derivative. The current chapter provides the necessary tools to implement this program. The main result is Girsanov's Theorem, which guarantees the existence of risk-neutral probability measures, a fundamental construct in the theory of option valuation.

Throughout the chapter, (Ω, F, P) denotes a fixed probability space with expectation operator E, and W is a Brownian motion on (Ω, F, P).

12.1 Conditional Expectation

Let X be an F-random variable and G a σ-field contained in F. In Theorem 8.1.4 we showed that if Ω is finite and P(ω) > 0 for all ω, then there exists a unique positive G-random variable E(X|G), called the conditional expectation of X given G, such that

$$E\left[I_A\,E(X|\mathcal G)\right] = E(I_AX) \quad\text{for all } A \in \mathcal G. \tag{12.1}$$

Conditional expectation, as defined by (12.1), may be shown to exist in the current general setting as well (provided that EX exists); however, E(X|G) may no longer be unique. Indeed, any G-random variable Y that differs from E(X|G) on a set of probability zero also satisfies (12.1). For this reason, properties involving conditional expectation hold only almost surely, that is, on a set of probability one. For ease of exposition, we will usually omit the qualifier that a given property holds only almost surely.

The following theorem summarizes the properties of conditional expectation that we shall need in the sequel. Most of the proofs are the same as in the finite case (Section 8.3), since they rely essentially on the defining property (12.1) and the (almost sure) uniqueness of conditional expectation.

Theorem 12.1.1 (Properties of Conditional Expectation). Let X and Y be F-random variables with finite expectation. Then



(i) (unit property) E(1|G) = 1;

(ii) (linearity) E(αX + βY |G) = αE(X|G) + βE(Y |G), α, β ∈ R;

(iii) (order property) X ≤ Y ⇒ E(X|G) ≤ E(Y |G);

(iv) (absolute value property) |E(X|G)| ≤ E(|X||G);

(v) (factor property) if X is G-measurable then E(XY |G) = XE(Y |G);

(vi) (independence property) if X and G are independent, that is, if X and $I_A$ are independent for all A ∈ G, then E(X|G) = E(X);

(vii) (iterated conditioning property) if H, G are σ-fields of events such that H ⊆ G ⊆ F, then E[E(X|G)|H] = E(X|H).
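On a finite sample space where G is generated by a partition of Ω into blocks, E(X|G) can be computed by averaging X over each block with weights P(ω). The Python sketch below (standard library only, exact arithmetic via `fractions.Fraction`) builds E(X|G) this way for a two-toss example, checks the defining property (12.1) for every A ∈ G, and checks the iterated conditioning property (vii) with H the trivial σ-field. The sample space, the random variable, and the function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations


def cond_exp(X, P, partition):
    """E(X|G) for G generated by `partition` (disjoint blocks covering the sample space).

    On each block, E(X|G) is the P-weighted average of X over that block.
    """
    result = {}
    for block in partition:
        p_block = sum(P[w] for w in block)
        average = sum(X[w] * P[w] for w in block) / p_block
        for w in block:
            result[w] = average
    return result


def satisfies_12_1(X, P, partition):
    """Check E[I_A E(X|G)] = E[I_A X] for every A in G, i.e. every union of blocks."""
    Y = cond_exp(X, P, partition)
    for k in range(len(partition) + 1):
        for blocks in combinations(partition, k):
            A = set().union(*blocks)
            if sum(Y[w] * P[w] for w in A) != sum(X[w] * P[w] for w in A):
                return False
    return True


# Two fair coin tosses; G carries the information from the first toss only.
P = {w: Fraction(1, 4) for w in ("HH", "HT", "TH", "TT")}
X = {"HH": 9, "HT": 3, "TH": 3, "TT": 1}
G = [("HH", "HT"), ("TH", "TT")]
H = [("HH", "HT", "TH", "TT")]  # trivial sigma-field {empty set, Omega}

Y = cond_exp(X, P, G)
print(Y["HH"], Y["TT"])                        # block averages: 6 and 2
print(satisfies_12_1(X, P, G))                 # True
print(cond_exp(Y, P, H) == cond_exp(X, P, H))  # iterated conditioning: True
```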

12.2 Martingales: Definition and Examples

Definition 12.2.1. A stochastic process $(M_t)_{t\ge0}$ on (Ω, F, P) adapted to a filtration $(\mathcal F_t)_{t\ge0}$ is said to be a $(P, (\mathcal F_t))$-martingale (or, simply, a martingale) if

$$E(M_t|\mathcal F_s) = M_s, \quad 0 \le s \le t. \tag{12.2}$$

An $(\mathcal F^W_t)$-martingale with continuous paths is called a Brownian martingale.

Note that, by linearity and the factor property, (12.2) is equivalent to

$$E(M_t - M_s|\mathcal F_s) = 0, \quad 0 \le s \le t.$$

The following processes are examples of Brownian martingales.

Example 12.2.2. $(W_t)_{t\ge0}$: The independent increment property of Brownian motion implies that $W_t - W_s$ is independent of $\mathcal F^W_s$ for all s ≤ t. By Theorem 12.1.1(vi), $E(W_t - W_s|\mathcal F^W_s) = E(W_t - W_s) = 0$.

Example 12.2.3. $(W_t^2 - t)_{t\ge0}$: For 0 ≤ s ≤ t,

$$W_t^2 = \left[(W_t - W_s) + W_s\right]^2 = (W_t - W_s)^2 + 2W_s(W_t - W_s) + W_s^2.$$

Taking conditional expectations and using linearity and the factor and independence properties yields

$$E(W_t^2|\mathcal F^W_s) = E(W_t - W_s)^2 + 2W_sE(W_t - W_s) + W_s^2 = t - s + W_s^2,$$

that is, $E(W_t^2 - t|\mathcal F^W_s) = W_s^2 - s$.


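Example 12.2.4. $\left(e^{W_t - t/2}\right)_{t\ge0}$: For 0 ≤ s ≤ t, the factor and independence properties imply that

$$E(e^{W_t}|\mathcal F^W_s) = e^{W_s}E(e^{W_t - W_s}|\mathcal F^W_s) = e^{W_s}E(e^{W_t - W_s}) = e^{W_s + (t-s)/2},$$

the last equality by Exercise 6.14. Therefore,

$$E\left(e^{W_t - t/2}\,|\,\mathcal F^W_s\right) = e^{W_s - s/2}.$$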

The above examples are special cases of the following theorem.

Theorem 12.2.5. Every Ito process of the form

$$X_t = X_0 + \int_0^t F(s)\,dW(s)$$

is a Brownian martingale.

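Proof. For 0 ≤ s < t,

$$X_t - X_s = I(F) = \lim_{\|\mathcal P\|\to 0} I_{\mathcal P}(F),$$

where $\mathcal P = \{s = t_0 < t_1 < \cdots < t_n = t\}$ is a partition of [s, t],

$$I(F) = \int_s^t F(u)\,dW(u), \qquad I_{\mathcal P}(F) = \sum_{j=0}^{n-1} F(t_j)\,\Delta_jW,$$

and convergence is in the mean square sense:

$$\lim_{\|\mathcal P\|\to 0} E\left|I_{\mathcal P}(F) - I(F)\right|^2 = 0.$$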

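Now let $A \in \mathcal F^W_s$. Since

$$\left|E\left(I_AI_{\mathcal P}(F)\right) - E\left(I_AI(F)\right)\right| \le E\left|I_A\left(I_{\mathcal P}(F) - I(F)\right)\right| \le E\left|I_{\mathcal P}(F) - I(F)\right|$$

and

$$E^2\left|I_{\mathcal P}(F) - I(F)\right| \le E\left|I_{\mathcal P}(F) - I(F)\right|^2$$

(Exercise 6.17), we see that

$$E\left[I_A(X_t - X_s)\right] = E\left[I_AI(F)\right] = \lim_{\|\mathcal P\|\to 0} E\left[I_AI_{\mathcal P}(F)\right]. \tag{12.3}$$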

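Furthermore, since $s \le t_j$ for all j, linearity, iterated conditioning, the factor property, and independence imply that

$$\begin{aligned} E\left[I_{\mathcal P}(F)\,|\,\mathcal F^W_s\right] &= \sum_{j=0}^{n-1} E\left[F(t_j)\Delta_jW\,|\,\mathcal F^W_s\right] = \sum_{j=0}^{n-1} E\left[E\left(F(t_j)\Delta_jW\,|\,\mathcal F^W_{t_j}\right)|\,\mathcal F^W_s\right] \\ &= \sum_{j=0}^{n-1} E\left[F(t_j)E\left(\Delta_jW\,|\,\mathcal F^W_{t_j}\right)|\,\mathcal F^W_s\right] \\ &= \sum_{j=0}^{n-1} E\left[F(t_j)E(\Delta_jW)\,|\,\mathcal F^W_s\right] = 0. \end{aligned}$$

Therefore,

$$E\left[I_AI_{\mathcal P}(F)\right] = E\left[I_AE\left(I_{\mathcal P}(F)\,|\,\mathcal F^W_s\right)\right] = 0,$$

which, by (12.3), implies that $E[I_A(X_t - X_s)] = 0$ for all $A \in \mathcal F^W_s$. Therefore, $E(X_t - X_s|\mathcal F^W_s) = 0$. For the proof that $X_t$ has continuous paths, the reader is referred to [9].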

Theorem 12.2.5 asserts that Ito processes X with differential dX = F dW are Brownian martingales. This is not true for general Ito processes given by dX = F dW + G dt, as the reader may readily verify.

Example 12.2.6. Let $Y_t$ be the Ito process

$$Y_t = Y_0 + \int_0^t F(s)\,dW(s) - \frac{1}{2}\int_0^t F^2(s)\,ds.$$

By (10.14) with $G = -\tfrac{1}{2}F^2$, the process $X_t = e^{Y_t}$ satisfies the SDE $dX = FX\,dW$. Since there is no dt term, Theorem 12.2.5 implies that $(X_t)$ is a Brownian martingale. In particular, taking $Y_0 = 0$ and F to be a constant α, we see that the process

$$\exp\left(\alpha W_t - \tfrac{1}{2}\alpha^2t\right), \quad t \ge 0,$$

is a Brownian martingale. Example 12.2.4 is the special case α = 1.
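The martingale property of $\exp(\alpha W_t - \tfrac12\alpha^2t)$ can be illustrated by Monte Carlo: fix a value w of $W_s$, simulate $W_t = w + (W_t - W_s)$ with an independent N(0, t − s) increment, and average. By the example above, the average should be close to $\exp(\alpha w - \tfrac12\alpha^2s)$. The Python sketch below uses only the standard library; α, s, t, w, the sample size, the seed, and the function name are arbitrary choices.

```python
import math
import random


def conditional_mean(alpha, s, t, w, n, seed=4):
    """Monte Carlo estimate of E[exp(alpha W_t - alpha^2 t / 2) | W_s = w]."""
    rng = random.Random(seed)
    sd = math.sqrt(t - s)  # W_t - W_s ~ N(0, t - s), independent of W_s
    total = 0.0
    for _ in range(n):
        w_t = w + rng.gauss(0.0, sd)
        total += math.exp(alpha * w_t - 0.5 * alpha**2 * t)
    return total / n


alpha, s, t, w = 1.0, 0.5, 2.0, 0.3
print(conditional_mean(alpha, s, t, w, 200_000), math.exp(alpha * w - 0.5 * alpha**2 * s))
```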

12.3 Martingale Representation Theorem

A martingale of the form

$$X_t = X_0 + \int_0^t F(s)\,dW(s),$$

where $X_0$ is constant, is square integrable, that is, $EX_t^2 < \infty$ for all t. This may be seen by applying Theorem 10.6.3 to the integral terms in

$$X_t^2 = X_0^2 + 2X_0\int_0^t F(s)\,dW(s) + \left(\int_0^t F(s)\,dW(s)\right)^2.$$

It is a remarkable fact that all square-integrable Brownian martingales are Ito processes of the above form. We state this result formally in the following theorem. For a proof, see, for example, [18].

Theorem 12.3.1 (Martingale Representation Theorem). If $(M_t)_{t\ge0}$ is a square-integrable Brownian martingale, then there exists a square-integrable process $(\psi_t)$ adapted to $(\mathcal F^W_t)$ such that

$$M_t = M_0 + \int_0^t \psi(s)\,dW(s), \quad t \ge 0.$$

Example 12.3.2. Let H be an $\mathcal F^W_T$-random variable with $EH^2 < \infty$. Define

$$M_t = E(H|\mathcal F^W_t), \quad 0 \le t \le T.$$

The iterated conditioning property shows that $(M_t)$ is a martingale. To see that $(M_t)$ is square-integrable, note first that

$$H^2 \ge M_t^2 + 2M_t(H - M_t),$$

as may be seen by expanding $(H - M_t)^2 \ge 0$. For each positive integer n define $A_n = \{|M_t| \le n\}$ and note that $A_n \in \mathcal F^W_t$. Since $I_{A_n}M_t^2$ is bounded, it has finite expectation; hence we may condition on the inequality

$$I_{A_n}H^2 \ge I_{A_n}M_t^2 + 2I_{A_n}M_t(H - M_t)$$

to obtain

$$I_{A_n}E(H^2|\mathcal F^W_t) \ge I_{A_n}M_t^2 + 2I_{A_n}M_tE(H - M_t|\mathcal F^W_t) = I_{A_n}M_t^2.$$

Letting n → ∞ and noting that for each ω the sequence $I_{A_n}(\omega)$ eventually equals 1, we see that

$$E(H^2|\mathcal F^W_t) \ge M_t^2.$$

Taking expectations yields $E(H^2) \ge E(M_t^2)$, hence $(M_t)$ is square-integrable.

It may be shown that each $M_t$ may be modified on a set of probability zero so that the resulting process has continuous paths (see [18]). Thus, $(M_t)$ has the representation described in Theorem 12.3.1.


12.4 Moment Generating Functions

The proof of Girsanov's Theorem given in the next section is based on the following important notion from probability theory.

Definition 12.4.1. The moment generating function (mgf) $\varphi_X$ of a random variable X is defined by

$$\varphi_X(\lambda) = E\,e^{\lambda X}$$

for all real numbers λ for which the expectation is finite.

To see how $\varphi_X$ gets its name, expand $e^{\lambda X}$ in a power series and take expectations to obtain

$$\varphi_X(\lambda) = \sum_{n=0}^{\infty}\frac{\lambda^n}{n!}\,EX^n.$$

Differentiating, we have $\varphi_X^{(n)}(0) = EX^n$, the nth moment of X.

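Example 12.4.2. Let X ∼ N(0, 1). Then

$$\varphi_X(\lambda) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{\lambda x - x^2/2}\,dx = \frac{e^{\lambda^2/2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(x-\lambda)^2/2}\,dx = e^{\lambda^2/2}.$$

To find the moments of X write

$$e^{\lambda^2/2} = \sum_{n=0}^{\infty}\frac{\lambda^{2n}}{n!\,2^n}$$

and compare power series to obtain $EX^{2n+1} = 0$ and

$$EX^{2n} = \frac{(2n)!}{n!\,2^n} = (2n-1)(2n-3)\cdots3\cdot1.$$

(See Example 6.3.2, where these moments were found by direct integration.) More generally, if X ∼ N(µ, σ²) then Y := (X − µ)/σ ∼ N(0, 1), hence

$$\varphi_X(\lambda) = E\,e^{\lambda(\sigma Y + \mu)} = e^{\mu\lambda}\varphi_Y(\sigma\lambda) = e^{\mu\lambda + \sigma^2\lambda^2/2}.$$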
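The moment formula of Example 12.4.2 can be checked two ways: the closed form (2n)!/(n! 2^n) should equal the product of odd numbers (2n − 1)(2n − 3)···3·1 exactly, and both should match a numerical integral of $x^{2n}\phi(x)$. The Python sketch below (standard library only; `math.prod` requires Python 3.8 or later) does both. The truncation interval, the grid, and the function names are illustrative choices.

```python
import math


def even_moment(n):
    """E X^{2n} = (2n)! / (n! 2^n) for X ~ N(0, 1), computed exactly."""
    return math.factorial(2 * n) // (math.factorial(n) * 2**n)


def odd_product(n):
    """(2n - 1)(2n - 3) ... 3 * 1; the empty product (n = 0) is 1."""
    return math.prod(range(2 * n - 1, 0, -2))


def moment_by_quadrature(k, x_max=12.0, steps=24_000):
    """E X^k = integral of x^k phi(x), by the trapezoidal rule on [-x_max, x_max]."""
    h = 2 * x_max / steps
    total = 0.0
    for j in range(steps + 1):
        x = -x_max + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * x**k * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2 * math.pi)


print([even_moment(n) for n in range(6)])  # [1, 1, 3, 15, 105, 945]
print(moment_by_quadrature(8), moment_by_quadrature(5))
```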

Moment generating functions derive their importance from the following theorem, which asserts that the distribution of a random variable is completely determined by its mgf. A proof may be found in standard texts on probability.

Theorem 12.4.3. If random variables X and Y have the same mgf, then they have the same cdf.


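Example 12.4.4. Let $X_j \sim N(\mu_j, \sigma_j^2)$, j = 1, 2. If $X_1$ and $X_2$ are independent, then, by Example 12.4.2,

$$\varphi_{X_1+X_2}(\lambda) = E\left(e^{\lambda X_1}e^{\lambda X_2}\right) = \varphi_{X_1}(\lambda)\varphi_{X_2}(\lambda) = e^{\mu_1\lambda + (\sigma_1\lambda)^2/2}e^{\mu_2\lambda + (\sigma_2\lambda)^2/2} = e^{\mu\lambda + (\sigma\lambda)^2/2},$$

where $\mu = \mu_1 + \mu_2$ and $\sigma^2 = \sigma_1^2 + \sigma_2^2$. By Theorem 12.4.3, $X_1 + X_2 \sim N(\mu, \sigma^2)$, a result obtained in Example 3.6.2 with considerably more effort.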

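Definition 12.4.5. The moment generating function $\varphi_X$ of a random vector $X = (X_1, X_2, \ldots, X_n)$ is defined (whenever the expectation exists) by

$$\varphi_X(\lambda) = E\left(e^{\lambda\cdot X}\right), \quad \lambda = (\lambda_1, \lambda_2, \ldots, \lambda_n),$$

where $\lambda\cdot X := \sum_{j=1}^n \lambda_jX_j$.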

The following result generalizes Theorem 12.4.3 to random vectors.

Theorem 12.4.6. If $X = (X_1, X_2, \ldots, X_n)$ and $Y = (Y_1, Y_2, \ldots, Y_n)$ have the same mgf, then they have the same joint cdf.

Corollary 12.4.7. Let $X = (X_1, X_2, \ldots, X_n)$. Then $X_1, X_2, \ldots, X_n$ are independent iff for all λ

$$\varphi_X(\lambda) = \varphi_{X_1}(\lambda_1)\varphi_{X_2}(\lambda_2)\cdots\varphi_{X_n}(\lambda_n). \tag{12.4}$$

Proof. The necessity is clear. To prove the sufficiency, let $Y_1, Y_2, \ldots, Y_n$ be independent random variables with $F_{Y_j} = F_{X_j}$ for all j and set $Y = (Y_1, Y_2, \ldots, Y_n)$.¹ Then $\varphi_{Y_j} = \varphi_{X_j}$ so, by independence and (12.4),

$$\varphi_Y(\lambda) = \varphi_{Y_1}(\lambda_1)\varphi_{Y_2}(\lambda_2)\cdots\varphi_{Y_n}(\lambda_n) = \varphi_X(\lambda).$$

By Theorem 12.4.6, $F_X = F_Y$. Therefore,

$$\begin{aligned} F_X(x_1, x_2, \ldots, x_n) &= F_Y(x_1, x_2, \ldots, x_n) \\ &= F_{Y_1}(x_1)F_{Y_2}(x_2)\cdots F_{Y_n}(x_n) \\ &= F_{X_1}(x_1)F_{X_2}(x_2)\cdots F_{X_n}(x_n), \end{aligned}$$

which shows that $X_1, X_2, \ldots, X_n$ are independent.

¹The random variable $Y_j$ is generally taken to be the jth coordinate function on a new probability space $\mathbb R^n$, where the probability measure is defined so that the sets $(-\infty, x_1] \times (-\infty, x_2] \times \cdots \times (-\infty, x_n]$ have probability $F_{X_1}(x_1)F_{X_2}(x_2)\cdots F_{X_n}(x_n)$.


12.5 Change of Probability and Girsanov's Theorem

In Remark 9.2.5 we observed that if Ω is finite then, given a positive random variable Z with E(Z) = 1, the equation

$$P^*(A) = E(I_AZ), \quad A \in \mathcal F, \tag{12.5}$$

defines a probability measure P* on (Ω, F) such that P*(ω) > 0 iff P(ω) > 0. Conversely, any probability measure P* with this positivity property satisfies (12.5) for some such Z. The positivity property is a special case of the notion of equivalent probability measures, that is, measures having precisely the same sets of probability zero. These ideas carry over to the general setting as follows:

Theorem 12.5.1 (Change of Probability). Let Z be a positive random variable on (Ω, F) with EZ = 1. Then (12.5) defines a probability measure P* on (Ω, F) equivalent to P. Moreover, all probability measures P* equivalent to P arise in this manner and satisfy

$$E^*(X) = E(XZ) \tag{12.6}$$

for all F-random variables X for which E(XZ) is defined, where E* is the expectation operator corresponding to P*.

The proof of Theorem 12.5.1 may be found in advanced texts on probability theory. As in the finite case, the random variable Z is called the Radon-Nikodym derivative of P* with respect to P and is denoted by $\frac{dP^*}{dP}$. The connection between P and P* described in (12.5) and (12.6) is frequently expressed as

$$dP^* = Z\,dP.$$

Replacing X in (12.6) by $XZ^{-1}$, we obtain the companion formulas

$$E(X) = E^*(XZ^{-1}) \quad\text{and}\quad P(A) = E^*(I_AZ^{-1}),$$

that is, $dP = Z^{-1}\,dP^*$.

For an illuminating example, consider the translation Y := X + α of a standard normal random variable X by a real number α. The random variable Y is normal, so one might ask if there is a probability measure P* equivalent to P under which Y is standard normal. To answer this question, suppose that such a probability measure exists and set $Z = \frac{dP^*}{dP}$. Since Z should somehow depend on X, we assume that Z = g(X) for some function g(x) to be determined. By (12.5),

$$P^*(Y \le y) = E\left[I_{\{Y \le y\}}Z\right] = E\left[I_{(-\infty, y]}(X + \alpha)\,g(X)\right],$$

and, since X ∼ N(0, 1) under P, the law of the unconscious statistician implies that

$$P^*(Y \le y) = \int_{-\infty}^{\infty} I_{(-\infty, y]}(x + \alpha)\,g(x)\phi(x)\,dx = \int_{-\infty}^{y-\alpha} g(x)\phi(x)\,dx.$$


If Y is to be standard normal with respect to P∗, we must therefore have∫ y−α

−∞g(x)ϕ(x) dx = Φ(y).

Dierentiating yieldsg(y − α)ϕ(y − α) = ϕ(y)

so that

g(y) =ϕ(y + α)

ϕ(y)= e−αy−α

2/2.

Thus, we are led to the probability measure P∗ dened by

dP∗ = g(X) dP = exp(−αX − 1

2α2)dP.
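As a quick numerical sanity check of this change of measure, the sketch below evaluates $\mathbb{P}^*(Y \le y) = \int_{-\infty}^{y-\alpha} g(x)\varphi(x)\,dx$ with the trapezoid rule and compares it with $\Phi(y)$. The truncation point `lower`, the grid size `n`, and the sample values of $\alpha$ and $y$ are illustrative assumptions, not part of the text.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cdf_Y_under_Pstar(y, alpha, lower=-12.0, n=20_000):
    """Approximate P*(Y <= y) for Y = X + alpha, where dP* = g(X) dP and
    g(x) = exp(-alpha x - alpha^2 / 2).

    The integral from -infinity to y - alpha is truncated at `lower`
    and evaluated with the composite trapezoid rule on n subintervals.
    """
    upper = y - alpha
    h = (upper - lower) / n

    def integrand(x):
        return math.exp(-alpha * x - 0.5 * alpha**2) * phi(x)

    total = 0.5 * (integrand(lower) + integrand(upper))
    for k in range(1, n):
        total += integrand(lower + k * h)
    return total * h


for alpha in (-1.0, 0.5, 2.0):
    for y in (-1.5, 0.0, 1.0):
        approx = cdf_Y_under_Pstar(y, alpha)
        print(f"alpha={alpha:+.1f} y={y:+.1f}  P*(Y<=y)={approx:.8f}  Phi(y)={Phi(y):.8f}")
```

The agreement reflects the identity $g(x)\varphi(x) = \varphi(x+\alpha)$, which is exactly what the derivation above produces.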

One easily checks that $X + \alpha$ is indeed standard normal under $\mathbb{P}^*$. Girsanov's Theorem generalizes this result from a single random variable to an entire process.

Theorem 12.5.2 (Girsanov's Theorem). Let $(W_t)_{0\le t\le T}$ be a Brownian motion on the probability space $(\Omega,\mathcal{F},\mathbb{P})$ and let $\alpha$ be a constant. Define

$$Z_t = \exp\big(-\alpha W_t - \tfrac12\alpha^2 t\big), \quad 0 \le t \le T.$$

Then the process $W^*$ defined by

$$W^*_t := W_t + \alpha t, \quad 0 \le t \le T,$$

is a Brownian motion on the probability space $(\Omega,\mathcal{F},\mathbb{P}^*)$, where $d\mathbb{P}^* = Z_T\,d\mathbb{P}$.

Proof. Note first that $(Z_t)_{0\le t\le T}$ is a $\mathbb{P}$-martingale (Example 12.2.6). In particular, $\mathbb{E}Z_T = \mathbb{E}Z_0 = 1$; hence $\mathbb{P}^*$ is well defined. It is clear that $W^*$ starts at 0 and has continuous paths, so it remains to show that, under $\mathbb{P}^*$, $W^*$ has independent increments and $W^*_t - W^*_s$ is normally distributed with mean 0 and variance $t-s$, $0 \le s < t$.

Let $0 \le t_0 < t_1 < \cdots < t_n \le T$ and define random vectors

$$X = (X_1, X_2, \ldots, X_n) \quad\text{and}\quad X^* = (X^*_1, X^*_2, \ldots, X^*_n),$$

where

$$X_j := W_{t_j} - W_{t_{j-1}} \quad\text{and}\quad X^*_j := W^*_{t_j} - W^*_{t_{j-1}}, \quad j = 1, 2, \ldots, n.$$

The core of the proof is determining the mgf of $X^*$ with respect to $\mathbb{P}^*$. Let $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_n)$. By the factor and independence properties,

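$$\begin{aligned}
\mathbb{E}^* e^{\lambda\cdot X} &= \mathbb{E}\big(e^{\lambda\cdot X}Z_T\big) = \mathbb{E}\big[e^{\lambda\cdot X}\,\mathbb{E}(Z_T\mid\mathcal{F}_{t_n})\big] = \mathbb{E}\big(e^{\lambda\cdot X}Z_{t_n}\big) = \mathbb{E}\exp\big(\lambda\cdot X - \alpha W_{t_n} - \tfrac12\alpha^2t_n\big)\\
&= \mathbb{E}\exp\Big(\sum_{j=1}^n(\lambda_j-\alpha)X_j - \alpha W_{t_0} - \tfrac12\alpha^2t_n\Big)\\
&= \mathbb{E}\exp\big(-\alpha W_{t_0} - \tfrac12\alpha^2t_n\big)\prod_{j=1}^n\mathbb{E}\exp\big((\lambda_j-\alpha)X_j\big).
\end{aligned}$$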


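The factors in the last expression are mgfs of normal random variables; hence, by Example 12.4.2,

$$\mathbb{E}^* e^{\lambda\cdot X} = \exp\Big\{\frac12\Big[\alpha^2(t_0-t_n) + \sum_{j=1}^n(\lambda_j-\alpha)^2(t_j-t_{j-1})\Big]\Big\}.$$

Since

$$\lambda\cdot X^* = \lambda\cdot X + \alpha\sum_{j=1}^n\lambda_j(t_j-t_{j-1}),$$

we see that $\mathbb{E}^* e^{\lambda\cdot X^*} = e^{A/2}$, where

$$A = \alpha^2(t_0-t_n) + \sum_{j=1}^n(\lambda_j-\alpha)^2(t_j-t_{j-1}) + 2\alpha\sum_{j=1}^n\lambda_j(t_j-t_{j-1}) = \sum_{j=1}^n\lambda_j^2(t_j-t_{j-1})$$

(the $\alpha^2$ terms cancel because $\sum_{j=1}^n(t_j-t_{j-1}) = t_n - t_0$).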

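Thus,

$$\mathbb{E}^* e^{\lambda\cdot X^*} = \exp\Big(\frac12\sum_{j=1}^n\lambda_j^2(t_j-t_{j-1})\Big).$$

In particular, $\mathbb{E}^* e^{\lambda_jX^*_j} = e^{(t_j-t_{j-1})\lambda_j^2/2}$, so $W^*_{t_j} - W^*_{t_{j-1}}$ is normally distributed with mean zero and variance $t_j - t_{j-1}$ (Example 12.4.2 and Theorem 12.4.3). Since

$$\mathbb{E}^* e^{\lambda\cdot X^*} = \prod_{j=1}^n\mathbb{E}^* e^{\lambda_jX^*_j},$$

the increments $W^*_{t_j} - W^*_{t_{j-1}}$ are independent (Corollary 12.4.7). Therefore, $W^*$ is a $\mathbb{P}^*$-Brownian motion.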
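The theorem can be illustrated by simulation (this is our own illustration, not part of the proof). The sketch samples two increments of $W$ under $\mathbb{P}$ and computes each $\mathbb{P}^*$-expectation as $\mathbb{E}^*[h] = \mathbb{E}[hZ_T]$, as in (12.6). Up to sampling error, the increments $X^*_1 = W^*_{t_1}$ and $X^*_2 = W^*_T - W^*_{t_1}$ should have mean 0, variances $t_1$ and $T - t_1$, and zero covariance. The values $\alpha = 0.5$, $t_1 = 0.4$, $T = 1$, the path count, and the seed are arbitrary assumptions; matching two moments is of course weaker than the mgf argument above.

```python
import math
import random


def girsanov_increment_moments(alpha, t1, T, n_paths=200_000, seed=12345):
    """Monte Carlo estimates of P*-moments of the increments of W*_t = W_t + alpha t.

    Paths are drawn under P, and each P*-expectation is computed as
    E*[h] = E[h Z_T] with Z_T = exp(-alpha W_T - alpha^2 T / 2), as in (12.6).
    Increments: X1 = W*_{t1} - W*_0 and X2 = W*_T - W*_{t1}.
    """
    rng = random.Random(seed)
    sums = {"1": 0.0, "X1": 0.0, "X2": 0.0, "X1^2": 0.0, "X2^2": 0.0, "X1*X2": 0.0}
    for _ in range(n_paths):
        w1 = rng.gauss(0.0, math.sqrt(t1))            # W_{t1} under P
        wT = w1 + rng.gauss(0.0, math.sqrt(T - t1))   # W_T under P
        z = math.exp(-alpha * wT - 0.5 * alpha**2 * T)
        x1 = w1 + alpha * t1
        x2 = (wT - w1) + alpha * (T - t1)
        sums["1"] += z
        sums["X1"] += z * x1
        sums["X2"] += z * x2
        sums["X1^2"] += z * x1 * x1
        sums["X2^2"] += z * x2 * x2
        sums["X1*X2"] += z * x1 * x2
    return {key: value / n_paths for key, value in sums.items()}


est = girsanov_increment_moments(alpha=0.5, t1=0.4, T=1.0)
for key, value in est.items():
    print(f"E*[{key}] = {value:.4f}")
```

Without the weight $Z_T$, the same samples would give $X^*_1$ a mean of $\alpha t_1 = 0.2$; the weight is what removes the drift.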

Remark 12.5.3. The general version of Girsanov's Theorem allows more than one Brownian motion. It asserts that for independent Brownian motions $W_j$ on $(\Omega,\mathcal{F},\mathbb{P})$ and constants $\alpha_j$, $j = 1, 2, \ldots, d$, there exists a single probability measure $\mathbb{P}^*$ relative to which the processes $W^*_j(t) := W_j(t) + \alpha_jt$, $0 \le t \le T$, are Brownian motions with respect to the filtration generated by the processes $W_j$. The $\alpha_j$'s may even be stochastic processes provided they satisfy the Novikov condition

$$\mathbb{E}\Big[\exp\Big(\frac12\int_0^T\beta(s)\,ds\Big)\Big] < \infty, \qquad \beta := \sum_{j=1}^d\alpha_j^2.$$

In this case, $W^*_j(t)$ is defined as $W_j(t) + \int_0^t\alpha_j(s)\,ds$. (See, for example, [17].)


12.6 Exercises

1. Find the mgfs of (a) a binomial random variable $X$ with parameters $(n, p)$; (b) a geometric random variable $X$ with parameter $p$.

2. Find the mgf of a random variable $X$ uniformly distributed on the interval $[0, 1]$.

3. Show that, for $0 \le s \le t$, (a) $\mathbb{E}(W_sW_t) = s$ and (b) $\mathbb{E}(W_t\mid W_s) = W_s$.

4. Let $X$ and $Y$ be jointly distributed continuous random variables with $f_X(x) > 0$ for all $x$. Show that $\mathbb{E}(Y\mid X) = g(X)$, where

$$g(x) = \int_{-\infty}^{\infty}\frac{f_{X,Y}(x,y)}{f_X(x)}\,y\,dy.$$

The function $\dfrac{f_{X,Y}(x,y)}{f_X(x)}$ is called the conditional density of $Y$ given $X$. What is $g(x)$ if $X$ and $Y$ are independent?

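5. Show that, for $0 < s < t$, the joint density $f_{t,s}$ of $(W_t, W_s)$ is given by

$$f_{t,s}(x,y) := \frac{1}{\sqrt{s(t-s)}}\,\varphi\Big(\frac{x-y}{\sqrt{t-s}}\Big)\,\varphi\Big(\frac{y}{\sqrt s}\Big).$$

6. Use Exercises 4 and 5 to find $\mathbb{E}(W_s\mid W_t)$ for $0 < s < t$.

7. Show that $M := \big(e^{\alpha W_t + h(t)}\big)_{t\ge0}$ is a martingale iff $h(t) = -\alpha^2t/2 + h(0)$.

8. Show that

$$\mathbb{E}\big(W_t^3\mid\mathcal{F}^W_s\big) = W_s^3 + 3(t-s)W_s, \quad 0 \le s \le t.$$

Conclude that $\big(W_t^3 - 3tW_t\big)$ is a martingale.

Hint: Expand $[(W_t - W_s) + W_s]^3$.

9. Find $\mathbb{E}(W_t^2\mid W_s)$ and $\mathbb{E}(W_t^3\mid W_s)$ for $0 < s < t$.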

10. The Hermite polynomials $H_n(x,t)$ are defined by $H_n(x,t) = f^{(n)}_{x,t}(0)$, where $f_{x,t}(\lambda) := f(\lambda, x, t) = \exp\big(\lambda x - \tfrac12\lambda^2t\big)$.

(a) Show that

$$\exp\big(\lambda x - \tfrac12\lambda^2t\big) = \sum_{n=0}^{\infty}H_n(x,t)\frac{\lambda^n}{n!}.$$

(b) Use (a) and the fact that $\big(\exp(\lambda W_t - \tfrac12\lambda^2t)\big)_{t\ge0}$ is a Brownian martingale to show that the process $(H_n(W_t,t))_{t\ge0}$ is a Brownian martingale for each $n$. (Example 12.2.3 and Exercise 8 are the special cases $H_2(W_t,t)$ and $H_3(W_t,t)$, respectively.)

(c) Show that $f^{(n+1)}_{x,t}(\lambda) = (x - \lambda t)f^{(n)}_{x,t}(\lambda) - ntf^{(n-1)}_{x,t}(\lambda)$ and hence that

$$H_{n+1}(x,t) = xH_n(x,t) - ntH_{n-1}(x,t).$$

(d) Use (c) to find explicit representations of the martingales $H_4(W_t,t)$ and $H_5(W_t,t)$.

(e) Use the Ito-Doeblin formula to show that

$$H_n(W_t,t) = \int_0^t nH_{n-1}(W_s,s)\,dW_s.$$

This gives another proof that $(H_n(W_t,t))_{t\ge0}$ is a martingale.

11. Let $(W_t)_{0\le t\le T}$ be a Brownian motion on the probability space $(\Omega,\mathcal{F},\mathbb{P})$ and let $\alpha$ be a constant. Suppose that $X$ is a random variable independent of $W_T$. Show that $X$ has the same cdf under $\mathbb{P}$ as under $\mathbb{P}^*$, where

$$d\mathbb{P}^* = \exp\big(-\alpha W_T - \tfrac12\alpha^2T\big)\,d\mathbb{P}.$$

12. Let $W_1$ and $W_2$ be independent Brownian motions and $0 < |\varrho| < 1$. Show that the process $W = \varrho W_1 + \sqrt{1-\varrho^2}\,W_2$ is a Brownian motion.

Chapter 13

The BSM Model Revisited

The continuous-time martingale theory developed in Chapter 12 is used in the present chapter as an alternative method of determining the fair price of a derivative in the Black-Scholes-Merton model. The last section of the chapter provides the connection, in the form of the Feynman-Kac Representation Theorem, between the martingale approach to option pricing and the PDE approach of Chapter 11.

Throughout the chapter, $(\Omega,\mathcal{F},\mathbb{P})$ denotes a fixed probability space with expectation operator $\mathbb{E}$, and $W$ is a Brownian motion on $(\Omega,\mathcal{F},\mathbb{P})$. As in Chapter 11, our market consists of a risk-free bond $B$ with price process governed by the ODE

$$dB = rB\,dt, \quad B_0 = 1,$$

and a stock $S$ with price process $S$ following the SDE

$$dS = \sigma S\,dW + \mu S\,dt,$$

where $\mu$ and $\sigma$ are constants. As shown in Chapter 10, the solution to the SDE is

$$S_t = S_0\exp\big(\sigma W_t + (\mu - \tfrac12\sigma^2)t\big), \quad 0 \le t \le T.$$

All martingales in this chapter are relative to the filtration $(\mathcal{F}^S_t)_{t=0}^T$. Note that, because $S_t$ and $W_t$ may each be expressed in terms of the other by a continuous function, $\mathcal{F}^S_t = \mathcal{F}^W_t$ for all $t$. As in Chapter 11, we assume throughout that the market is arbitrage-free.

13.1 Risk-Neutral Valuation of a Derivative

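Definition 13.1.1. The probability measure $\mathbb{P}^*$ on $(\Omega,\mathcal{F})$ defined by

$$d\mathbb{P}^* = Z_T\,d\mathbb{P}, \qquad Z_T := e^{-\alpha W_T - \frac12\alpha^2T}, \qquad \alpha := \frac{\mu - r}{\sigma},$$

is called the risk-neutral probability measure for the price process $S$. The corresponding expectation operator is denoted by $\mathbb{E}^*$.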



The following theorem and its corollary are the main results of the chapter. Martingale proofs are given in the next section. Note that the conclusion of the corollary is in agreement with Theorem 11.3.2, obtained by PDE methods.

Theorem 13.1.2. Let $H$ be a claim, that is, an $\mathcal{F}_T$-random variable, with $\mathbb{E}H^2 < \infty$. Then there exists a unique self-financing, replicating strategy for $H$ with value process $V$ such that

$$V_t = e^{-r(T-t)}\,\mathbb{E}^*(H\mid\mathcal{F}_t), \quad 0 \le t \le T. \tag{13.1}$$

Corollary 13.1.3. If $H$ is a European claim of the form $H = f(S_T)$, where $f$ is continuous and $\mathbb{E}H^2 < \infty$, then

$$V_t = e^{-r(T-t)}\,\mathbb{E}^*\big(f(S_T)\mid\mathcal{F}_t\big) = e^{-r(T-t)}G(t,S_t), \quad 0 \le t \le T, \tag{13.2}$$

where

$$G(t,s) := \int_{-\infty}^{\infty} f\Big(s\exp\Big\{\sigma\sqrt{T-t}\,y + \big(r - \tfrac12\sigma^2\big)(T-t)\Big\}\Big)\varphi(y)\,dy. \tag{13.3}$$

As noted in Chapter 11, the no-arbitrage assumption implies that $V_t$ must be the time-$t$ value of the claim.

Example 13.1.4. By Corollary 13.1.3, the time-$t$ value of a forward contract with forward price $K$ is

$$F_t = e^{-r(T-t)}\,\mathbb{E}^*(S_T - K\mid\mathcal{F}_t). \tag{13.4}$$

Because there is no cost in entering a forward contract, $F_0 = 0$, and therefore

$$K = \mathbb{E}^*S_T = e^{rT}\,\mathbb{E}^*\tilde S_T = e^{rT}S_0. \tag{13.5}$$

Here we have used the fact that the discounted price process $\tilde S_t := e^{-rt}S_t$ is a $\mathbb{P}^*$-martingale (Lemma 13.2.3, below). (Recall that Equation (13.5) was obtained in Section 4.3 using a general arbitrage argument.) By the martingale property again,

$$\mathbb{E}^*(S_T\mid\mathcal{F}_t) = e^{rT}\,\mathbb{E}^*(\tilde S_T\mid\mathcal{F}_t) = e^{rT}\tilde S_t = e^{r(T-t)}S_t. \tag{13.6}$$

Substituting (13.5) and (13.6) into (13.4), we see that

$$F_t = e^{-r(T-t)}\big(e^{r(T-t)}S_t - e^{rT}S_0\big) = S_t - e^{rt}S_0,$$

in agreement with Equation (4.4) of Section 4.3.

Example 13.1.5. The time-$t$ value of a call option with strike price $K$ and maturity $T$ is

$$C_t = e^{-r(T-t)}\,\mathbb{E}^*\big[(S_T-K)^+\mid\mathcal{F}_t\big] = e^{-r(T-t)}G(t,S_t), \quad 0 \le t \le T,$$

where $G(t,s)$ is given by (13.3) with $f(x) = (x-K)^+$. Evaluating (13.3) for this $f$ produces the BSM formula, exactly as in Corollary 11.3.3.
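The claim that (13.3) with $f(x) = (x-K)^+$ reproduces the BSM formula is easy to check numerically. The sketch below approximates $G(t,s)$ by a trapezoid rule on a truncated $y$-interval and compares $e^{-r(T-t)}G(t,s)$ with the closed-form call price. The parameter values ($T = 0.5$, $\sigma = 0.20$, $r = 0.10$, $K = 45$, the same numbers used later in Example 14.5.3), the truncation `y_max`, and the grid size are illustrative assumptions.

```python
import math


def phi(y):
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def G(t, s, f, T, sigma, r, y_max=10.0, n=20_000):
    """Trapezoid-rule approximation of (13.3); the y-integral is truncated to [-y_max, y_max]."""
    tau = T - t
    h = 2.0 * y_max / n
    total = 0.0
    for k in range(n + 1):
        y = -y_max + k * h
        weight = 0.5 if k in (0, n) else 1.0
        s_T = s * math.exp(sigma * math.sqrt(tau) * y + (r - 0.5 * sigma**2) * tau)
        total += weight * f(s_T) * phi(y)
    return total * h


def bsm_call(tau, s, K, sigma, r):
    """Closed-form BSM call value with time to maturity tau > 0."""
    d1 = (math.log(s / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * Phi(d1) - K * math.exp(-r * tau) * Phi(d2)


T, sigma, r, K = 0.5, 0.20, 0.10, 45.0


def call_payoff(x):
    return max(x - K, 0.0)


for t, s in ((0.0, 50.0), (0.2, 48.0)):
    via_G = math.exp(-r * (T - t)) * G(t, s, call_payoff, T, sigma, r)
    print(t, s, round(via_G, 6), round(bsm_call(T - t, s, K, sigma, r), 6))
```

Any other continuous payoff `f` with finite second moment can be passed to `G` in the same way.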


13.2 Proofs of the Valuation Formulas

Lemmas 13.2.1 through 13.2.4 in this section are used in the proof of Theorem 13.1.2.

Lemma 13.2.1. Under the risk-neutral probability $\mathbb{P}^*$, the process

$$W^*_t := W_t + \alpha t, \quad 0 \le t \le T, \qquad \alpha := \frac{\mu - r}{\sigma},$$

is a Brownian motion with Brownian filtration $(\mathcal{F}^S_t)$.

Proof. Since $W_t$ and $W^*_t$ differ by the deterministic term $\alpha t$, $\mathcal{F}^{W^*}_t = \mathcal{F}^W_t = \mathcal{F}^S_t$. The conclusion now follows from Girsanov's Theorem.

We omit the straightforward verification of the next lemma.

Lemma 13.2.2. In terms of $W^*$, the price process $S$ is given by

$$S_t = S_0\exp\Big(\sigma W^*_t + \big(r - \tfrac12\sigma^2\big)t\Big), \quad 0 \le t \le T.$$

From Lemma 13.2.2 and Example 12.2.6, we have

Lemma 13.2.3. The discounted price process $\tilde S$, given by

$$\tilde S_t := e^{-rt}S_t = S_0\exp\big(\sigma W^*_t - \tfrac12\sigma^2t\big), \quad 0 \le t \le T,$$

is a $\mathbb{P}^*$-martingale with $d\tilde S = \sigma\tilde S\,dW^*$.

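Lemma 13.2.4. Let $(\phi,\theta)$ be a self-financing portfolio adapted to $(\mathcal{F}^S_t)$ with value process

$$V_t = \phi_tB_t + \theta_tS_t, \quad 0 \le t \le T.$$

Then the discounted value process $\tilde V$, given by $\tilde V_t := e^{-rt}V_t$, is a $\mathbb{P}^*$-martingale.

Proof. By Ito's product rule and the self-financing condition $dV = \phi\,dB + \theta\,dS$,

$$\begin{aligned}
d\tilde V_t &= -re^{-rt}V_t\,dt + e^{-rt}dV_t\\
&= -re^{-rt}\big[\phi_tB_t + \theta_tS_t\big]dt + e^{-rt}\big[r\phi_tB_t\,dt + \theta_t\,dS_t\big]\\
&= -re^{-rt}\theta_tS_t\,dt + e^{-rt}\theta_t\,dS_t\\
&= \theta_t\,d\tilde S_t\\
&= \sigma\theta_t\tilde S_t\,dW^*_t,
\end{aligned}$$

the last two equalities from Ito's product rule applied to $\tilde S_t = e^{-rt}S_t$ and Lemma 13.2.3. It follows from Theorem 12.2.5 that $\tilde V$ is a $\mathbb{P}^*$-martingale.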

The proof of Corollary 13.1.3 uses the following lemma.


Lemma 13.2.5. Let $\mathcal{G}$ be a σ-field contained in $\mathcal{F}$, let $X$ be a $\mathcal{G}$-random variable and $Y$ an $\mathcal{F}$-random variable independent of $\mathcal{G}$. If $g(x,y)$ is a continuous function with $\mathbb{E}^*|g(X,Y)| < \infty$, then

$$\mathbb{E}^*\big[g(X,Y)\mid\mathcal{G}\big] = G(X), \tag{13.7}$$

where $G(x) := \mathbb{E}^*g(x,Y)$.

Proof. We give an outline of the proof. Let $\mathcal{R}$ denote the smallest σ-field of subsets of $\mathbb{R}^2$ containing all rectangles $R = J\times K$, where $J$ and $K$ are intervals. If $g = I_R$, then

$$\mathbb{E}^*\big[g(X,Y)\mid\mathcal{G}\big] = \mathbb{E}^*\big[I_J(X)I_K(Y)\mid\mathcal{G}\big] = I_J(X)\,\mathbb{E}^*\big[I_K(Y)\mid\mathcal{G}\big] = I_J(X)\,\mathbb{E}^*I_K(Y),$$

where we have used the $\mathcal{G}$-measurability of $I_J(X)$ and the independence of $I_K(Y)$ and $\mathcal{G}$. Since

$$G(x) = \mathbb{E}^*\big[I_J(x)I_K(Y)\big] = I_J(x)\,\mathbb{E}^*I_K(Y),$$

we see that (13.7) holds for indicator functions of rectangles. From this it may be shown that (13.7) holds for indicator functions of all members of $\mathcal{R}$ and hence for linear combinations of these indicator functions. Because a continuous function is the limit of a sequence of such linear combinations, (13.7) holds for any function $g$ satisfying the conditions of the lemma.

Remark 13.2.6. For future reference, we note that Lemma 13.2.5 extends to more than one $\mathcal{G}$-measurable random variable $X$. For example, if $X_1$ and $X_2$ are $\mathcal{G}$-measurable and if $g(x_1,x_2,y)$ is continuous with $\mathbb{E}^*|g(X_1,X_2,Y)| < \infty$, then

$$\mathbb{E}^*\big[g(X_1,X_2,Y)\mid\mathcal{G}\big] = G(X_1,X_2),$$

where $G(x_1,x_2) = \mathbb{E}^*g(x_1,x_2,Y)$. The proof is similar to that of Lemma 13.2.5.

Proof of Theorem 13.1.2

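Define a process $V$ by Equation (13.1). Then $\tilde V_T = e^{-rT}H$ and

$$\tilde V_t = e^{-rt}V_t = \mathbb{E}^*\big(\tilde V_T\mid\mathcal{F}_t\big), \quad 0 \le t \le T.$$

By Example 12.3.2, $\tilde V$ is a square-integrable $\mathbb{P}^*$-martingale so that, by the Martingale Representation Theorem (Theorem 12.3.1),

$$\tilde V_t = \tilde V_0 + \int_0^t\psi(s)\,dW^*(s), \quad 0 \le t \le T,$$

for some process $\psi$ adapted to $(\mathcal{F}^S_t)$. Now set

$$\theta = \frac{\psi}{\sigma\tilde S} \quad\text{and}\quad \phi = B^{-1}(V - \theta S).$$

Then $(\phi,\theta)$ is adapted to $(\mathcal{F}_t)$ and $V = \theta S + \phi B$. Furthermore,

$$\begin{aligned}
dV_t &= e^{rt}\,d\tilde V_t + rV_t\,dt\\
&= e^{rt}\psi_t\,dW^*_t + rV_t\,dt\\
&= \frac{e^{rt}\psi_t}{\sigma\tilde S_t}\,d\tilde S_t + rV_t\,dt \qquad\text{(by Lemma 13.2.3)}\\
&= e^{rt}\theta_t\big[-re^{-rt}S_t\,dt + e^{-rt}dS_t\big] + rV_t\,dt\\
&= \theta_t\,dS_t + r\big[V_t - \theta_tS_t\big]dt = \theta_t\,dS_t + \phi_t\,dB_t.
\end{aligned}$$

Therefore, $(\phi,\theta)$ is a self-financing, replicating trading strategy for $H$ with value process $V$.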

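To show uniqueness, suppose that $(\phi',\theta')$ is a self-financing, replicating trading strategy for $H$ based on $S$ and $B$. By Lemma 13.2.4, the discounted value process $\tilde V'$ of the strategy is a martingale; hence

$$\tilde V'_t = \mathbb{E}^*\big(\tilde V'_T\mid\mathcal{F}^S_t\big) = \mathbb{E}^*\big(e^{-rT}H\mid\mathcal{F}^S_t\big) = \tilde V_t.$$

Therefore, $V' = V$.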
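The proof obtains the hedge abstractly from the Martingale Representation Theorem. For a European call, it is a standard result that the stock holding $\theta_t$ is the BSM delta $\Phi(d_1)$; taking that as given, the sketch below (our own illustration, not a construction from the text) runs a self-financing portfolio along paths simulated under $\mathbb{P}$ with drift $\mu \ne r$, rebalancing at finitely many dates. Because trading is discrete rather than continuous, the replication error at $T$ is small but not zero. The drift $\mu = 0.15$, the 250 rebalancing dates, the 200 paths, and the seed are arbitrary assumptions.

```python
import math
import random


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call_and_delta(tau, s, K, sigma, r):
    """BSM call value and delta Phi(d1) for time to maturity tau > 0."""
    d1 = (math.log(s / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * Phi(d1) - K * math.exp(-r * tau) * Phi(d2), Phi(d1)


def hedge_error(S0, K, T, sigma, mu, r, n_steps, rng):
    """Simulate one path of S under P (drift mu) and run a discretely rebalanced
    self-financing portfolio that starts with the call price V_0.

    Returns V_T - (S_T - K)^+, the replication error at maturity.
    """
    dt = T / n_steps
    s = S0
    v, theta = bsm_call_and_delta(T, s, K, sigma, r)
    bond = v - theta * s                      # phi_t * B_t: money held in the bond
    for k in range(1, n_steps + 1):
        z = rng.gauss(0.0, 1.0)
        s *= math.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z)
        bond *= math.exp(r * dt)              # bond account grows at the risk-free rate
        v = bond + theta * s                  # portfolio value before rebalancing
        if k < n_steps:
            _, new_theta = bsm_call_and_delta(T - k * dt, s, K, sigma, r)
            bond -= (new_theta - theta) * s   # self-financing: shares bought with bond money
            theta = new_theta
    return v - max(s - K, 0.0)


rng = random.Random(2024)
errors = [hedge_error(50.0, 45.0, 0.5, 0.20, 0.15, 0.10, 250, rng) for _ in range(200)]
mean_error = sum(errors) / len(errors)
mean_abs_error = sum(abs(e) for e in errors) / len(errors)
print(f"mean error {mean_error:.4f}, mean |error| {mean_abs_error:.4f}")
```

Note that the drift $\mu$ enters only the simulated paths, never the hedge ratios, in line with the fact that (13.1) involves only $\mathbb{P}^*$.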

Proof of Corollary 13.1.3

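By Lemma 13.2.2, we may write

$$S_T = S_t\exp\Big(\sigma\sqrt{T-t}\,Y_t + \big(r - \tfrac12\sigma^2\big)(T-t)\Big), \tag{13.8}$$

where

$$Y_t := \frac{W^*_T - W^*_t}{\sqrt{T-t}}.$$

Now define

$$g(t,x,y) = f\Big(x\exp\Big\{\sigma\sqrt{T-t}\,y + \big(r - \tfrac12\sigma^2\big)(T-t)\Big\}\Big).$$

Since $Y_t \sim N(0,1)$ under $\mathbb{P}^*$, the law of the unconscious statistician implies that $\mathbb{E}^*g(t,x,Y_t) = G(t,x)$. Moreover, from (13.8), $g(t,S_t,Y_t) = f(S_T)$. Since $S_t$ is $\mathcal{F}^S_t$-measurable and, by Lemma 13.2.1, $Y_t$ is independent of $\mathcal{F}^S_t$, Lemma 13.2.5 yields

$$\mathbb{E}^*\big[f(S_T)\mid\mathcal{F}^S_t\big] = \mathbb{E}^*\big[g(t,S_t,Y_t)\mid\mathcal{F}^S_t\big] = G(t,S_t).$$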

13.3 Valuation under P

The following theorem expresses the value process $(V_t)$ in terms of the original probability measure $\mathbb{P}$. It is the continuous-time analog of Theorem 9.2.1.


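Theorem 13.3.1. The time-$t$ value of a claim $H$ with $\mathbb{E}H^2 < \infty$ is given by

$$V_t = e^{-r(T-t)}\frac{\mathbb{E}(HZ_T\mid\mathcal{F}^S_t)}{\mathbb{E}(Z_T\mid\mathcal{F}^S_t)} = e^{-(r+\alpha^2/2)(T-t)}\,\mathbb{E}\big(e^{-\alpha(W_T-W_t)}H\mid\mathcal{F}^S_t\big), \tag{13.9}$$

where

$$Z_T := e^{-\alpha W_T - \frac12\alpha^2T} \quad\text{and}\quad \alpha := \frac{\mu - r}{\sigma}.$$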

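Proof. Since $\mathbb{E}|HZ_T| = \mathbb{E}^*|H|$ is finite, the conditional expectation $\mathbb{E}(HZ_T\mid\mathcal{F}^S_t)$ is defined. Since $V_t = e^{-r(T-t)}\,\mathbb{E}^*(H\mid\mathcal{F}^S_t)$, the first equality in (13.9) is equivalent to

$$\mathbb{E}(HZ_T\mid\mathcal{F}^S_t) = \mathbb{E}^*(H\mid\mathcal{F}^S_t)\,\mathbb{E}(Z_T\mid\mathcal{F}^S_t). \tag{13.10}$$

To verify (13.10), let $A \in \mathcal{F}^S_t$ and set $X_t = \mathbb{E}^*(I_AH\mid\mathcal{F}^S_t)$. Then

$$\mathbb{E}(I_AHZ_T) = \mathbb{E}^*(I_AH) = \mathbb{E}^*X_t = \mathbb{E}[X_tZ_T] = \mathbb{E}\big[\mathbb{E}(X_tZ_T\mid\mathcal{F}^S_t)\big] = \mathbb{E}\big[I_A\,\mathbb{E}^*(H\mid\mathcal{F}^S_t)\,\mathbb{E}(Z_T\mid\mathcal{F}^S_t)\big],$$

establishing (13.10) and hence the first equality in (13.9).

For the second equality we have

$$\frac{\mathbb{E}(HZ_T\mid\mathcal{F}^S_t)}{\mathbb{E}(Z_T\mid\mathcal{F}^S_t)} = \frac{\mathbb{E}(e^{-\alpha W_T}H\mid\mathcal{F}^S_t)}{\mathbb{E}(e^{-\alpha W_T}\mid\mathcal{F}^S_t)},$$

and by the factor and independence properties,

$$\mathbb{E}(e^{-\alpha W_T}\mid\mathcal{F}^S_t) = \mathbb{E}\big(e^{-\alpha(W_T-W_t)}e^{-\alpha W_t}\mid\mathcal{F}^S_t\big) = e^{-\alpha W_t}\,\mathbb{E}\big(e^{-\alpha(W_T-W_t)}\big) = e^{-\alpha W_t}e^{\alpha^2(T-t)/2},$$

the last equality by Exercise 6.14. Therefore

$$\frac{\mathbb{E}(HZ_T\mid\mathcal{F}^S_t)}{\mathbb{E}(Z_T\mid\mathcal{F}^S_t)} = e^{-\alpha^2(T-t)/2}\,\mathbb{E}\big(e^{-\alpha(W_T-W_t)}H\mid\mathcal{F}^S_t\big).$$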
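At $t = 0$ we have $W_0 = 0$ and a trivial σ-field, so the second expression in (13.9) reduces to $V_0 = e^{-(r+\alpha^2/2)T}\,\mathbb{E}\big(e^{-\alpha W_T}H\big)$, an expectation under the original measure $\mathbb{P}$. The Monte Carlo sketch below estimates it for a call, sampling $W_T$ under $\mathbb{P}$ with a drift $\mu \ne r$, and compares the result with the BSM price. The parameter values, path count, and seed are illustrative assumptions, and agreement holds only up to sampling error.

```python
import math
import random


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call(tau, s, K, sigma, r):
    """Closed-form BSM call value with time to maturity tau > 0."""
    d1 = (math.log(s / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * Phi(d1) - K * math.exp(-r * tau) * Phi(d2)


def call_value_under_P(S0, K, T, sigma, mu, r, n_paths=200_000, seed=7):
    """Monte Carlo version of (13.9) at t = 0:

        V_0 = exp(-(r + alpha^2/2) T) * E[exp(-alpha W_T) (S_T - K)^+],

    with W_T sampled under the original measure P,
    S_T = S0 exp(sigma W_T + (mu - sigma^2/2) T) and alpha = (mu - r)/sigma.
    """
    alpha = (mu - r) / sigma
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        w_T = rng.gauss(0.0, math.sqrt(T))
        s_T = S0 * math.exp(sigma * w_T + (mu - 0.5 * sigma**2) * T)
        total += math.exp(-alpha * w_T) * max(s_T - K, 0.0)
    return math.exp(-(r + 0.5 * alpha**2) * T) * total / n_paths


mc = call_value_under_P(50.0, 45.0, 0.5, 0.20, 0.15, 0.10)
print(round(mc, 3), round(bsm_call(0.5, 50.0, 45.0, 0.20, 0.10), 3))
```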

13.4 The Feynman-Kac Representation Theorem

We now have two ways of deriving the Black-Scholes pricing formula, one using PDE techniques and the other using martingale methods. The connection between the two methods is given by the Feynman-Kac Representation Theorem, which gives a probabilistic solution to a class of PDEs. The following version of the theorem is sufficient for our purposes.


Theorem 13.4.1 (Feynman-Kac Representation Theorem). Let $\mu(t,x)$, $\sigma(t,x)$, and $f(x)$ be continuous functions. Suppose that, for $0 \le t \le T$, $X_t$ is the solution of the SDE

$$dX_t = \mu(t,X_t)\,dt + \sigma(t,X_t)\,dW_t \tag{13.11}$$

and $w(t,x)$ is the solution of the boundary value problem

$$w_t(t,x) + \mu(t,x)w_x(t,x) + \tfrac12\sigma^2(t,x)w_{xx}(t,x) = 0, \qquad w(T,x) = f(x). \tag{13.12}$$

If

$$\int_0^T\mathbb{E}\big[\sigma(t,X_t)w_x(t,X_t)\big]^2\,dt < \infty, \tag{13.13}$$

then $w(t,X_t) = \mathbb{E}\big[f(X_T)\mid\mathcal{F}^W_t\big]$, $0 \le t \le T$.

Proof. Since $f(X_T) = w(T,X_T)$, the conclusion of the theorem will follow if we show that $(w(t,X_t))_{t=0}^T$ is a martingale. By Version 3 of the Ito-Doeblin formula,

$$\begin{aligned}
dw(t,X) &= w_t(t,X)\,dt + w_x(t,X)\,dX + \tfrac12w_{xx}(t,X)(dX)^2\\
&= w_t(t,X)\,dt + w_x(t,X)\big[\mu(t,X)\,dt + \sigma(t,X)\,dW\big] + \tfrac12\sigma^2(t,X)w_{xx}(t,X)\,dt\\
&= \big[w_t(t,X) + \mu(t,X)w_x(t,X) + \tfrac12\sigma^2(t,X)w_{xx}(t,X)\big]dt + \sigma(t,X)w_x(t,X)\,dW\\
&= \sigma(t,X)w_x(t,X)\,dW,
\end{aligned}$$

the last equality by (13.12). It follows from (13.13) and Theorem 10.6.3(vi) that $w(t,X_t)$ is an Ito process. An application of Theorem 12.2.5 completes the proof.

Corollary 13.4.2. Suppose that $X_t$ satisfies (13.11) and that $v(t,x)$ is the solution of the boundary value problem

$$v_t(t,x) + \mu(t,x)v_x(t,x) + \tfrac12\sigma^2(t,x)v_{xx}(t,x) - rv(t,x) = 0, \qquad v(T,x) = f(x),$$

where $r$ is a constant and

$$\int_0^T\mathbb{E}\big[\sigma(t,X_t)v_x(t,X_t)\big]^2\,dt < \infty.$$

Then $v(t,X_t) = e^{-r(T-t)}\,\mathbb{E}\big[f(X_T)\mid\mathcal{F}^W_t\big]$, $0 \le t \le T$.

Proof. One easily checks that $w(t,x) := e^{r(T-t)}v(t,x)$ satisfies (13.12) and (13.13).


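Remark. In the derivation of the Black-Scholes formula in Chapter 11, financial considerations led to the PDE

$$v_t + rxv_x + \tfrac12\sigma^2x^2v_{xx} - rv = 0, \quad 0 \le t < T.$$

This is the equation of Corollary 13.4.2 with $\mu(t,x) = rx$ and $\sigma(t,x) = \sigma x$. By Lemma 13.2.2, $dS = rS\,dt + \sigma S\,dW^*$ under $\mathbb{P}^*$, which is (13.11) with $X = S$ and $W^*$ in place of $W$, so the corollary, applied under $\mathbb{P}^*$, gives $v(t,S_t) = e^{-r(T-t)}\,\mathbb{E}^*\big[f(S_T)\mid\mathcal{F}^S_t\big]$. The corollary therefore provides the desired connection between the PDE and martingale methods of option valuation.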
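The connection rests on the BSM call price actually solving this PDE. The sketch below assumes the closed-form call price from Chapter 11 and checks, with central finite differences, that $v_t + rxv_x + \tfrac12\sigma^2x^2v_{xx} - rv$ vanishes at a few points $(t,x)$ up to discretization error. The parameters and the step sizes `h_t`, `h_x` are ad hoc illustrative choices.

```python
import math


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def v(t, x, K=45.0, T=0.5, sigma=0.20, r=0.10):
    """BSM call value at time t < T and stock price x."""
    tau = T - t
    d1 = (math.log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return x * Phi(d1) - K * math.exp(-r * tau) * Phi(d2)


def pde_residual(t, x, sigma=0.20, r=0.10, h_t=1e-4, h_x=1e-2):
    """Central-difference estimate of v_t + r x v_x + (1/2) sigma^2 x^2 v_xx - r v."""
    v_t = (v(t + h_t, x) - v(t - h_t, x)) / (2.0 * h_t)
    v_x = (v(t, x + h_x) - v(t, x - h_x)) / (2.0 * h_x)
    v_xx = (v(t, x + h_x) - 2.0 * v(t, x) + v(t, x - h_x)) / h_x**2
    return v_t + r * x * v_x + 0.5 * sigma**2 * x**2 * v_xx - r * v(t, x)


for t, x in ((0.0, 50.0), (0.25, 45.0), (0.4, 40.0)):
    print(t, x, pde_residual(t, x))
```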


13.5 Exercises

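1. Show that the risk-neutral probability of a call finishing in the money is $\Phi\big(d_2(T, S_0, K, \sigma, r)\big)$.

2. Use Girsanov's Theorem to find a probability measure $\mathbb{P}^{**}$ under which the probability of a call finishing in the money is $\Phi\big(d_1(T, S_0, K, \sigma, r)\big)$. What is $\dfrac{d\mathbb{P}^{**}}{d\mathbb{P}^*}$?

Hint: For the first part, set

$$W^{**}_t := W_t + \beta t, \qquad \beta := \sigma^{-1}\big(\mu - r - \sigma^2\big).$$

For the second part, use

$$\frac{d\mathbb{P}^{**}}{d\mathbb{P}^*} = \frac{d\mathbb{P}^{**}}{d\mathbb{P}}\,\frac{d\mathbb{P}}{d\mathbb{P}^*}.$$

3. Show that the process $e^{-(r+\sigma^2)t}S_t$ is a $\mathbb{P}^{**}$-martingale.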


Chapter 14

Other Options

In Chapter 13, the price of a standard European call option was obtained using the risk-neutral probability measure given by Girsanov's Theorem. There are a variety of other options that may be similarly valued. In this chapter we consider the most common of these.

As in previous chapters, we assume that the markets under consideration are arbitrage-free, so that the value of a claim is that of a self-financing, replicating portfolio. Throughout, $(\Omega,\mathcal{F},\mathbb{P})$ denotes a fixed probability space with expectation operator $\mathbb{E}$. Risk-neutral measures on $\Omega$ will be denoted, as usual, by the generic notation $\mathbb{P}^*$, with $\mathbb{E}^*$ the corresponding expectation operator. As before, $(\mathcal{F}^S_t)$ denotes the natural filtration for the price process $S$ of the underlying asset $S$.

14.1 Currency Options

In this section, we consider derivatives whose underlying is a euro bond. Let $D_t = e^{r_dt}$ and $E_t = e^{r_et}$ denote the price processes of a US dollar bond and a euro bond, respectively, where $r_d$ and $r_e$ are the dollar and euro interest rates, and let $Q$ denote the exchange rate process in dollars per euro (see Section 4.4). To model the volatility of the exchange rate, we take $Q$ to be a geometric Brownian motion given by

$$Q_t = Q_0\exp\Big[\sigma W_t + \big(\mu - \tfrac12\sigma^2\big)t\Big], \quad 0 \le t \le T, \tag{14.1}$$

where $\sigma$ and $\mu$ are constants. Define

$$S_t = Q_tE_t = S_0\exp\Big[\sigma W_t + \big(\mu + r_e - \tfrac12\sigma^2\big)t\Big], \tag{14.2}$$

which is the dollar value of the euro bond at time $t$. Because of the volatility of the exchange rate, from the point of view of the domestic investor, the euro bond is a risky asset. The form of the price process $S$ clearly reflects that view.

Given a claim $H$ with $\mathbb{E}H^2 < \infty$, we can apply the methods of Chapter 13 to construct a self-financing replicating trading strategy $(\phi,\theta)$ for $H$, where $\phi_t$ and $\theta_t$ are, respectively, the number of units of the dollar bond and the euro bond held at time $t$. Set

$$r := r_d - r_e, \qquad \alpha := \frac{\mu - r}{\sigma}, \qquad\text{and}\qquad W^*_t := W_t + \alpha t. \tag{14.3}$$

By Girsanov's Theorem, $W^*$ is a Brownian motion under $\mathbb{P}^*$, where

$$d\mathbb{P}^* := e^{-\alpha W_T - \frac12\alpha^2T}\,d\mathbb{P}.$$

Let $V$ denote the value process of $(\phi,\theta)$, and set $\tilde S := D^{-1}S$ and $\tilde V := D^{-1}V$. Since

$$S_t = S_0\exp\big[\sigma W^*_t + (r_d - \sigma^2/2)t\big],$$

the process $\tilde S$ and hence also $\tilde V$ are $\mathbb{P}^*$-martingales. Risk-neutral pricing and the no-arbitrage assumption therefore imply that the time-$t$ dollar value of $H$ is given by

$$V_t = e^{-r_d(T-t)}\,\mathbb{E}^*(H\mid\mathcal{F}_t), \quad 0 \le t \le T.$$

Example 14.1.1. (Currency Forward). Consider a forward contract for the purchase in dollars of one euro at time $T$. Let $K$ denote the forward price of the euro. At time $T$, the euro costs $Q_T$ dollars; hence the dollar value of the forward at time $t$ is

$$V_t = e^{-r_d(T-t)}\,\mathbb{E}^*(Q_T - K\mid\mathcal{F}_t). \tag{14.4}$$

Because there is no cost to enter a forward contract, $V_0 = 0$ and therefore $K = \mathbb{E}^*Q_T$. Since $Q = D\tilde SE^{-1}$,

$$\mathbb{E}^*(Q_T\mid\mathcal{F}_t) = D_TE_T^{-1}\,\mathbb{E}^*(\tilde S_T\mid\mathcal{F}_t) = D_TE_T^{-1}\tilde S_t = e^{r(T-t)}Q_t \tag{14.5}$$

and in particular

$$K = \mathbb{E}^*Q_T = e^{rT}Q_0. \tag{14.6}$$

Substituting (14.5) and (14.6) into (14.4), we obtain

$$V_t = e^{-r_d(T-t)}\big(e^{r(T-t)}Q_t - e^{rT}Q_0\big) = e^{-r_eT}\big(e^{r_et}Q_t - e^{r_dt}Q_0\big).$$

More generally, suppose that $H = f(Q_T)$, where $f(x)$ is continuous. Set $f_1(x) = f\big(e^{-r_eT}x\big)$ so that $H = f_1(S_T)$. From (14.2), $S$ is the price process of the stock in Chapter 13 with $\mu$ replaced by $\mu + r_e$ and with $r_d$ as the risk-free rate. By Theorem 11.3.2 or Corollary 13.1.3,

$$V_t = e^{-r_d(T-t)}G_1(t,S_t),$$

where

$$G_1(t,s) := \int_{-\infty}^{\infty}f_1\Big(s\exp\Big\{\sigma\sqrt{T-t}\,y + \big(r_d - \tfrac12\sigma^2\big)(T-t)\Big\}\Big)\varphi(y)\,dy.$$

Now define $G(t,s) = G_1\big(t, e^{r_et}s\big)$, so that $G_1(t,S_t) = G(t,Q_t)$. Replacing $s$ in the definition of $G_1(t,s)$ by $e^{r_et}s$, we arrive at the following result:


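Theorem 14.1.2. Let $H = f(Q_T)$, where $f$ is continuous and $\mathbb{E}H^2 < \infty$. Then the value of $H$ at time $t$ is

$$V_t = e^{-r_d(T-t)}G(t,Q_t),$$

where

$$G(t,s) := \int_{-\infty}^{\infty}f\Big(s\exp\Big\{\sigma\sqrt{T-t}\,y + \big(r_d - r_e - \tfrac12\sigma^2\big)(T-t)\Big\}\Big)\varphi(y)\,dy.$$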

Example 14.1.3. (Currency Call Option). Taking $f(x) = (x-K)^+$ in Theorem 14.1.2, we see that the time-$t$ dollar value of an option to buy one euro for $K$ dollars at time $T$ is $C_t = e^{-r_d(T-t)}G(t,Q_t)$, where, as in the proof of Corollary 11.3.3 with $r$ replaced by $r_d - r_e$,

$$G(t,s) = se^{(r_d-r_e)(T-t)}\,\Phi\big(d_1(s,T-t)\big) - K\,\Phi\big(d_2(s,T-t)\big), \qquad d_{1,2}(s,\tau) := \frac{\ln(s/K) + (r_d - r_e \pm \sigma^2/2)\tau}{\sigma\sqrt\tau}.$$

Thus,

$$C_t = e^{-r_e(T-t)}Q_t\,\Phi\big(d_1(Q_t,T-t)\big) - e^{-r_d(T-t)}K\,\Phi\big(d_2(Q_t,T-t)\big).$$

This may also be expressed in terms of the forward price $K_t = e^{(r_d-r_e)(T-t)}Q_t$ of a euro (see Section 4.4). Indeed, since $Q_t = e^{(r_e-r_d)(T-t)}K_t$, we have

$$C_t = e^{-r_d(T-t)}\Big[K_t\,\Phi\big(\hat d_1(K_t,T-t)\big) - K\,\Phi\big(\hat d_2(K_t,T-t)\big)\Big],$$

where

$$\hat d_{1,2}(s,\tau) = \frac{\ln(s/K) \pm \tau\sigma^2/2}{\sigma\sqrt\tau}.$$
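The spot form, the forward form, and the integral of Theorem 14.1.2 should all give the same number. The sketch below evaluates the three for one set of inputs; the exchange rate, strike, volatility, and interest rates are illustrative assumptions, not data from the text, and the integral is approximated by a truncated trapezoid rule.

```python
import math


def phi(y):
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def currency_call(t, Q, K, T, sigma, rd, re):
    """Example 14.1.3, spot form: dollar value at time t of a call on one euro."""
    tau = T - t
    d1 = (math.log(Q / K) + (rd - re + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return math.exp(-re * tau) * Q * Phi(d1) - math.exp(-rd * tau) * K * Phi(d2)


def currency_call_forward_form(t, Q, K, T, sigma, rd, re):
    """The same value written with the forward price K_t = exp((rd - re)(T - t)) Q_t."""
    tau = T - t
    K_t = math.exp((rd - re) * tau) * Q
    d1_hat = (math.log(K_t / K) + 0.5 * sigma**2 * tau) / (sigma * math.sqrt(tau))
    d2_hat = d1_hat - sigma * math.sqrt(tau)
    return math.exp(-rd * tau) * (K_t * Phi(d1_hat) - K * Phi(d2_hat))


def currency_call_quadrature(t, Q, K, T, sigma, rd, re, y_max=10.0, n=20_000):
    """Theorem 14.1.2 with f(x) = (x - K)^+: e^{-rd (T-t)} G(t, Q_t), trapezoid rule in y."""
    tau = T - t
    h = 2.0 * y_max / n
    total = 0.0
    for k in range(n + 1):
        y = -y_max + k * h
        weight = 0.5 if k in (0, n) else 1.0
        q_T = Q * math.exp(sigma * math.sqrt(tau) * y + (rd - re - 0.5 * sigma**2) * tau)
        total += weight * max(q_T - K, 0.0) * phi(y)
    return math.exp(-rd * tau) * total * h


args = (0.0, 1.10, 1.05, 1.0, 0.12, 0.03, 0.01)  # t, Q_t, K, T, sigma, rd, re (illustrative)
print(currency_call(*args), currency_call_forward_form(*args), currency_call_quadrature(*args))
```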

14.2 Forward Start Options

A forward start option is a contract that gives the holder at time $T_0$, for no extra cost, an option with maturity $T > T_0$ and strike price $K = S_{T_0}$. Consider, for example, a forward start call option, whose underlying call has payoff $(S_T - S_{T_0})^+$. Let $V_t$ denote the value of the option at time $t$. At time $T_0$ the strike price $S_{T_0}$ is known; hence, for all later times, the forward start option has the value of the call. Thus, in the notation of Section 11.4,

$$V_t = C(T-t, S_t, S_{T_0}, \sigma, r), \quad T_0 \le t \le T.$$

To find $V_t$ for $t \le T_0$, note that

$$V_{T_0} = C(T-T_0, S_{T_0}, S_{T_0}, \sigma, r) = C(T-T_0, 1, 1, \sigma, r)\,S_{T_0},$$

where the second equality holds because the call price is homogeneous of degree one in the stock price and strike price. This is the value at time $T_0$ of a portfolio consisting of $C(T-T_0, 1, 1, \sigma, r)$ units of the underlying security. Since the values of the forward start option and the portfolio agree at time $T_0$, they must agree at all times $t \le T_0$; hence

$$V_t = C(T-T_0, 1, 1, \sigma, r)\,S_t = \big[\Phi(d_1) - e^{-r(T-T_0)}\Phi(d_2)\big]S_t, \quad 0 \le t \le T_0,$$

where

$$d_{1,2} = d_{1,2}(T-T_0, 1, 1, \sigma, r) = \Big(\frac{r}{\sigma} \pm \frac{\sigma}{2}\Big)\sqrt{T-T_0}.$$

In particular, the initial cost of the forward start call option is

$$V_0 = \big[\Phi(d_1) - e^{-r(T-T_0)}\Phi(d_2)\big]S_0.$$
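A short numerical check of this formula, under illustrative inputs of our choosing: the closed form should equal $C(T-T_0, 1, 1, \sigma, r)\,S_0$ and, by homogeneity, the price today of an at-the-money call with time to maturity $T - T_0$.

```python
import math


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call(tau, s, K, sigma, r):
    """C(tau, s, K, sigma, r) in the notation of Section 11.4."""
    d1 = (math.log(s / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * Phi(d1) - K * math.exp(-r * tau) * Phi(d2)


def forward_start_call(S_t, T0, T, sigma, r):
    """Value at a time t <= T0 of a forward start call: [Phi(d1) - e^{-r(T-T0)} Phi(d2)] S_t."""
    tau = T - T0
    d1 = (r / sigma + sigma / 2.0) * math.sqrt(tau)
    d2 = (r / sigma - sigma / 2.0) * math.sqrt(tau)
    return (Phi(d1) - math.exp(-r * tau) * Phi(d2)) * S_t


S0, T0, T, sigma, r = 50.0, 0.25, 1.0, 0.20, 0.05   # illustrative inputs
print(forward_start_call(S0, T0, T, sigma, r), bsm_call(T - T0, S0, S0, sigma, r))
```

The value depends on the current time $t \le T_0$ only through $S_t$, which is why a static position in the stock replicates it up to $T_0$.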

14.3 Chooser Options

A chooser option gives the holder the right to select at some future date $T_0$ whether the option is to be a call or a put with common exercise price $K$, maturity $T > T_0$, and underlying $S$. Let $V_t$, $C_t$, and $P_t$ denote, respectively, the time-$t$ values of the chooser option, the call, and the put. In the notation of Section 11.4,

$$C_t = C(T-t, S_t, K, \sigma, r) \quad\text{and}\quad P_t = P(T-t, S_t, K, \sigma, r).$$

Since at time $T_0$ the holder will choose the option with the higher value,

$$\begin{aligned}
V_{T_0} &= \max(C_{T_0}, P_{T_0})\\
&= \max\big(C_{T_0},\, C_{T_0} - S_{T_0} + Ke^{-r(T-T_0)}\big)\\
&= C_{T_0} + \big(Ke^{-r(T-T_0)} - S_{T_0}\big)^+,
\end{aligned}$$

where we have used the put-call parity relation in the second equality. The last expression is the value at time $T_0$ of a portfolio consisting of a long call option with strike price $K$ and maturity $T$ and a long put option with strike price $K_1 := Ke^{-r(T-T_0)}$ and maturity $T_0$. Since the values of the chooser option and the portfolio are the same at time $T_0$, they must be the same for all times $t \le T_0$. Thus, using put-call parity again and noting that $K_1e^{-r(T_0-t)} = Ke^{-r(T-t)}$, we have for $0 \le t \le T_0$

$$\begin{aligned}
V_t &= C(T-t, S_t, K, \sigma, r) + P(T_0-t, S_t, K_1, \sigma, r)\\
&= C(T-t, S_t, K, \sigma, r) + C(T_0-t, S_t, K_1, \sigma, r) - S_t + Ke^{-r(T-t)}.
\end{aligned}$$

In particular,

$$V_0 = C(T, S_0, K, \sigma, r) + C(T_0, S_0, K_1, \sigma, r) - S_0 + Ke^{-rT}. \tag{14.7}$$


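To evaluate (14.7) we apply Black-Scholes:

$$C(T, S_0, K, \sigma, r) = S_0\Phi(d_1) - Ke^{-rT}\Phi(d_2) \quad\text{and}\quad C(T_0, S_0, K_1, \sigma, r) = S_0\Phi(\hat d_1) - K_1e^{-rT_0}\Phi(\hat d_2),$$

where

$$d_{1,2} = d_{1,2}(T, S_0, K, \sigma, r) = \frac{\ln(S_0/K) + (r \pm \sigma^2/2)T}{\sigma\sqrt T}$$

and

$$\hat d_{1,2} = d_{1,2}(T_0, S_0, K_1, \sigma, r) = \frac{\ln(S_0/K) + rT \pm \sigma^2T_0/2}{\sigma\sqrt{T_0}}.$$

Substituting into (14.7) and using $K_1e^{-rT_0} = Ke^{-rT}$, we obtain the formula

$$\begin{aligned}
V_0 &= S_0\Phi(d_1) - Ke^{-rT}\Phi(d_2) + S_0\Phi(\hat d_1) - Ke^{-rT}\Phi(\hat d_2) - S_0 + Ke^{-rT}\\
&= S_0\big[\Phi(d_1) + \Phi(\hat d_1) - 1\big] - Ke^{-rT}\big[\Phi(d_2) + \Phi(\hat d_2) - 1\big]\\
&= S_0\big[\Phi(d_1) - \Phi(-\hat d_1)\big] - Ke^{-rT}\big[\Phi(d_2) - \Phi(-\hat d_2)\big].
\end{aligned}$$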
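The final closed form can be checked against (14.7) directly. The sketch below, with illustrative inputs of our choosing, computes both and also confirms that the chooser is worth more than either the call or the put alone (the put value comes from put-call parity).

```python
import math


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call(tau, s, K, sigma, r):
    """C(tau, s, K, sigma, r) in the notation of Section 11.4."""
    d1 = (math.log(s / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * Phi(d1) - K * math.exp(-r * tau) * Phi(d2)


def chooser_value(S0, K, T0, T, sigma, r):
    """Closed form for V_0 derived above, with d_{1,2} and hat-d_{1,2} as in the text."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    dh1 = (math.log(S0 / K) + r * T + 0.5 * sigma**2 * T0) / (sigma * math.sqrt(T0))
    dh2 = dh1 - sigma * math.sqrt(T0)
    return S0 * (Phi(d1) - Phi(-dh1)) - K * math.exp(-r * T) * (Phi(d2) - Phi(-dh2))


def chooser_value_14_7(S0, K, T0, T, sigma, r):
    """Equation (14.7): C(T, S0, K) + C(T0, S0, K1) - S0 + K e^{-rT}, with K1 = K e^{-r(T-T0)}."""
    K1 = K * math.exp(-r * (T - T0))
    return bsm_call(T, S0, K, sigma, r) + bsm_call(T0, S0, K1, sigma, r) - S0 + K * math.exp(-r * T)


S0, K, T0, T, sigma, r = 50.0, 50.0, 0.25, 1.0, 0.20, 0.05   # illustrative inputs
call = bsm_call(T, S0, K, sigma, r)
put = call - S0 + K * math.exp(-r * T)                          # put-call parity
print(chooser_value(S0, K, T0, T, sigma, r), chooser_value_14_7(S0, K, T0, T, sigma, r), call, put)
```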

The value of the option for $T_0 \le t \le T$ is either $C_t$ or $P_t$, depending on whether the call or put was chosen at time $T_0$. To distinguish between the two scenarios, let

$$A = \{C_{T_0} > P_{T_0}\}.$$

Since $I_A = 1$ iff the call was chosen and $I_{A'} = 1$ iff the put was chosen, we have

$$V_t = C_tI_A + P_tI_{A'}, \quad T_0 \le t \le T.$$

In particular, the payoff of the chooser option is

$$V_T = (S_T - K)^+I_A + (K - S_T)^+I_{A'}.$$

14.4 Compound Options

A compound option is a call or put option whose underlying is another call or put option, the latter with underlying $S$. Consider the case of a call-on-call option. Suppose that the underlying call has strike price $K$ and maturity $T$, and that the compound option has strike price $K_0$ and maturity $T_0 < T$. We seek the fair price $V^{cc}_0$ of the compound option at time 0.

The value of the underlying call at time $T_0$ is $C(S_{T_0})$, where, by the Black-Scholes formula,

$$C(s) = C(T-T_0, s, K, \sigma, r) = s\Phi\big(d_1(s)\big) - Ke^{-r(T-T_0)}\Phi\big(d_2(s)\big),$$

$$d_{1,2}(s) = \frac{\ln(s/K) + (r \pm \sigma^2/2)(T-T_0)}{\sigma\sqrt{T-T_0}}.$$

By Corollary 13.1.3 with $f(s) = \big[C(s) - K_0\big]^+$, the cost of the compound option is

$$V^{cc}_0 = e^{-rT_0}\int_{-\infty}^{\infty}\big[C(g(y)) - K_0\big]^+\varphi(y)\,dy, \tag{14.8}$$

where

$$g(y) = S_0\exp\Big\{\sigma\sqrt{T_0}\,y + \big(r - \tfrac12\sigma^2\big)T_0\Big\}.$$

Since $\big[C(g(y)) - K_0\big]^+$ is increasing in $y$,

$$\big[C(g(y)) - K_0\big]^+ = \big[C(g(y)) - K_0\big]I_{(y_0,\infty)}(y),$$

where

$$y_0 := \inf\{y \mid C(g(y)) > K_0\}.$$

Therefore,

$$V^{cc}_0 = e^{-rT_0}\int_{y_0}^{\infty}\Big[g(y)\Phi\big(\hat d_1(y)\big) - Ke^{-r(T-T_0)}\Phi\big(\hat d_2(y)\big)\Big]\varphi(y)\,dy - e^{-rT_0}K_0\Phi(-y_0),$$

where

$$\hat d_1(y) := d_1\big(g(y)\big) = \frac{\ln(S_0/K) + \sigma\sqrt{T_0}\,y + rT + \sigma^2(T - 2T_0)/2}{\sigma\sqrt{T-T_0}}$$

and

$$\hat d_2(y) := d_2\big(g(y)\big) = \frac{\ln(S_0/K) + \sigma\sqrt{T_0}\,y + (r - \sigma^2/2)T}{\sigma\sqrt{T-T_0}}.$$
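The two expressions for $V^{cc}_0$ can be compared numerically. The sketch below computes (14.8) directly by quadrature and then via the $y_0$ formula with $\hat d_1(y)$ and $\hat d_2(y)$; locating $y_0$ by bisection and integrating with the trapezoid rule are our own choices of numerical method, and all inputs are illustrative assumptions.

```python
import math


def phi(y):
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bsm_call(tau, s, K, sigma, r):
    """C(tau, s, K, sigma, r): Black-Scholes call value."""
    d1 = (math.log(s / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * Phi(d1) - K * math.exp(-r * tau) * Phi(d2)


def trapezoid(f, a, b, n=40_000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + k * h) for k in range(1, n)))


def compound_call_on_call(S0, K, T, K0, T0, sigma, r, y_max=10.0):
    """Return (direct, split): (14.8) by quadrature, and the y0-based formula."""
    def g(y):
        return S0 * math.exp(sigma * math.sqrt(T0) * y + (r - 0.5 * sigma**2) * T0)

    def C(s):  # value of the underlying call at T0 when S_{T0} = s
        return bsm_call(T - T0, s, K, sigma, r)

    def payoff_density(y):
        return max(C(g(y)) - K0, 0.0) * phi(y)

    direct = math.exp(-r * T0) * trapezoid(payoff_density, -y_max, y_max)

    # y0 = inf{y : C(g(y)) > K0}; C(g(y)) is increasing in y, so bisection applies.
    lo, hi = -y_max, y_max
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if C(g(mid)) > K0:
            hi = mid
        else:
            lo = mid
    y0 = hi

    den = sigma * math.sqrt(T - T0)

    def split_integrand(y):
        base = math.log(S0 / K) + sigma * math.sqrt(T0) * y
        d1_hat = (base + r * T + sigma**2 * (T - 2.0 * T0) / 2.0) / den
        d2_hat = (base + (r - sigma**2 / 2.0) * T) / den
        return (g(y) * Phi(d1_hat) - K * math.exp(-r * (T - T0)) * Phi(d2_hat)) * phi(y)

    split = math.exp(-r * T0) * (trapezoid(split_integrand, y0, y_max) - K0 * Phi(-y0))
    return direct, split


direct, split = compound_call_on_call(S0=50.0, K=50.0, T=1.0, K0=3.0, T0=0.25, sigma=0.20, r=0.05)
print(direct, split)
```

Since the compound payoff $\big(C_{T_0} - K_0\big)^+$ never exceeds $C_{T_0}$, the result should also lie below the price of the underlying call today.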

14.5 Path-Dependent Derivatives

Recall that a path-dependent derivative is a contract whose payoff depends not just on the value of the underlying at maturity but on the entire history of the asset over the duration of the contract. Because of this dependency, the valuation of path-dependent derivatives is more complex than that of path-independent derivatives.

In this section we consider the most common path-dependent derivatives: barrier options, lookback options, and Asian options.


14.5.1 Barrier Options

The payoff for a barrier option depends on whether the value of the asset has crossed a predetermined level, called a barrier. Because of this added condition, barrier options are generally cheaper than standard options. They are useful because they allow the holder to forgo paying a premium for scenarios deemed unlikely, while still retaining the essential features of a standard option. For example, if an investor believes that a stock will not fall below \$20, he could buy a barrier call option on the stock with payoff $(S_T - K)^+$ if the stock remains above \$20, and zero otherwise.

The payoff for a barrier option has the form $(S_T - K)^+I_A$ if the option is a call and $(K - S_T)^+I_A$ if the option is a put, where $A$ is a barrier event. The indicator function acts as a switch, activating or deactivating the option if the barrier is breached. Barrier events are typically described in terms of the random variables $M^S$ and $m^S$, where for a process $X$

$$M^X := \max\{X_t \mid 0 \le t \le T\} \quad\text{and}\quad m^X := \min\{X_t \mid 0 \le t \le T\}.$$

The most common barrier events are¹

$\{M^S \le c\}$: up-and-out option; deactivated if asset rises above $c$;

$\{m^S \ge c\}$: down-and-out option; deactivated if asset falls below $c$;

$\{M^S \ge c\}$: up-and-in option; activated if asset rises above $c$;

$\{m^S \le c\}$: down-and-in option; activated if asset falls below $c$.

For the first two cases, the so-called knock-out cases, the barrier is set so that the option is initially active, while for the knock-in cases, the option is initially inactive. For example, in the case of an up-and-out option, $S_0 < c$, while for a down-and-in option, $S_0 > c$.

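In this section, we show that the price $C^{do}_0$ of a down-and-out call option is given by the formula

$$C^{do}_0 = S_0\Big[\Phi(d_1) - \Big(\frac{c}{S_0}\Big)^{\frac{2r}{\sigma^2}+1}\Phi(\delta_1)\Big] - Ke^{-rT}\Big[\Phi(d_2) - \Big(\frac{c}{S_0}\Big)^{\frac{2r}{\sigma^2}-1}\Phi(\delta_2)\Big], \tag{14.9}$$

where, with $M := \max(K, c)$,

$$d_{1,2} = \frac{\ln\big(\frac{S_0}{M}\big) + \big(r \pm \frac12\sigma^2\big)T}{\sigma\sqrt T} \quad\text{and}\quad \delta_{1,2} = \frac{\ln\big(\frac{c^2}{S_0M}\big) + \big(r \pm \frac12\sigma^2\big)T}{\sigma\sqrt T}. \tag{14.10}$$

To establish (14.9), note first that the payoff for a down-and-out call is

$$C^{do}_T = (S_T - K)^+I_A, \quad\text{where } A = \{m^S \ge c\}.$$

¹ Strict inequalities may be used here, as well.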


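Since the option is out of the money if $S_T < K$, the payoff may be written

$$C^{do}_T = (S_T - K)I_B, \quad\text{where } B = \{S_T \ge K,\ m^S \ge c\}.$$

By risk-neutral pricing, the cost of the option is therefore

$$C^{do}_0 = e^{-rT}\,\mathbb{E}^*\big[(S_T - K)I_B\big]. \tag{14.11}$$

By Lemma 13.2.2, the value of the underlying at time $t$ may be expressed as

$$S_t = S_0e^{\sigma(W^*_t + \beta t)}, \qquad \beta := \frac{r}{\sigma} - \frac{\sigma}{2}, \quad 0 \le t \le T, \tag{14.12}$$

where $W^*_t$ is a $\mathbb{P}^*$-Brownian motion. By Girsanov's Theorem, $\hat W_t := W^*_t + \beta t$ is a Brownian motion under the probability measure $\hat{\mathbb{P}}$ given by $d\hat{\mathbb{P}} = Z_T\,d\mathbb{P}^*$, where

$$Z_T := e^{-\beta W^*_T - \frac12\beta^2T} = e^{-\beta\hat W_T + \frac12\beta^2T}.$$

Since $S_T = S_0e^{\sigma\hat W_T}$ and $m^S = S_0e^{\sigma m^{\hat W}}$, we may express $B$ as

$$B = \{\hat W_T \ge a,\ m^{\hat W} \ge b\}, \qquad a := \sigma^{-1}\ln(K/S_0), \quad b := \sigma^{-1}\ln(c/S_0). \tag{14.13}$$

Note that the barrier is set so that $b < 0$. Since

$$S_TZ_T^{-1} = S_0e^{\gamma\hat W_T - \frac12\beta^2T}, \qquad \gamma := \sigma + \beta = \frac{r}{\sigma} + \frac{\sigma}{2},$$

Equation (14.11) and the change of measure formula yield

$$C^{do}_0 = e^{-rT}\,\hat{\mathbb{E}}\big[(S_T - K)I_BZ_T^{-1}\big] = e^{-(r+\beta^2/2)T}\Big[S_0\,\hat{\mathbb{E}}\big(e^{\gamma\hat W_T}I_B\big) - K\,\hat{\mathbb{E}}\big(e^{\beta\hat W_T}I_B\big)\Big], \tag{14.14}$$

where $\hat{\mathbb{E}}$ denotes expectation with respect to $\hat{\mathbb{P}}$. It remains then to evaluate $\hat{\mathbb{E}}\big(e^{\lambda\hat W_T}I_B\big)$ for $\lambda = \gamma$ and $\beta$. For this we shall need the following lemma, which we state without proof.²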

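Lemma 14.5.1. The joint density $f_m(x,y)$ of $(\hat W_T, m^{\hat W})$ under $\hat{\mathbb{P}}$ is given by

$$f_m(x,y) = g_m(x,y)I_E(x,y), \tag{14.15}$$

where

$$g_m(x,y) = \frac{2(x-2y)}{T\sqrt{2\pi T}}\exp\Big[-\frac{(x-2y)^2}{2T}\Big] \quad\text{and}\quad E = \{(x,y) \mid y \le 0,\ y \le x\}.$$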

² The derivation of the density formula (14.15) is based on the reflection principle of Brownian motion, which asserts that the first time the process hits a specified nonzero level $l$ it starts anew, and its probabilistic behavior thereafter is invariant under reflection in the horizontal line through $l$. For a detailed account, the reader is referred to [3] or [17].


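From the lemma and (14.13), we see that for any real number $\lambda$,

$$\hat{\mathbb{E}}\big(e^{\lambda\hat W_T}I_B\big) = \hat{\mathbb{E}}\big(e^{\lambda\hat W_T}I_{[a,\infty)\times[b,\infty)}(\hat W_T, m^{\hat W})\big) = \iint_D e^{\lambda x}g_m(x,y)\,dA, \tag{14.16}$$

where

$$D = E\cap\big([a,\infty)\times[b,\infty)\big) = \{(x,y) \mid b \le y \le 0,\ x \ge a,\ x \ge y\}.$$

The integral in (14.16) depends on the relative values of $K$ and $c$ and also of $K$ and $S_0$. To facilitate its evaluation, we prepare the following lemma.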

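Lemma 14.5.2. For any real number $\lambda$ and extended real numbers $x_j$ and $y_j$,

$$\int_{x_1}^{x_2}\int_{y_1}^{y_2}e^{\lambda x}g_m(x,y)\,dy\,dx = I_\lambda(y_2; x_1, x_2) - I_\lambda(y_1; x_1, x_2),$$

where, for $y = y_1$ or $y_2$ and integration variable $x_1 < x < x_2$,

$$I_\lambda(y; x_1, x_2) = \begin{cases} e^{2y\lambda + \lambda^2T/2}\Big[\Phi\Big(\dfrac{2y - x_1 + \lambda T}{\sqrt T}\Big) - \Phi\Big(\dfrac{2y - x_2 + \lambda T}{\sqrt T}\Big)\Big], & y \ne x \text{ real},\\[1ex] I_\lambda(0; x_1, x_2), & y = x,\\[1ex] 0, & y = \pm\infty.\end{cases}$$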

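Proof. A simple substitution yields

$$\int_{y_1}^{y_2}g_m(x,y)\,dy = \frac{1}{\sqrt{2\pi T}}\Big[e^{u(x,y_2)} - e^{u(x,y_1)}\Big],$$

where

$$u(x,y) = \begin{cases} -\Big(\dfrac{x-2y}{\sqrt{2T}}\Big)^2, & y \ne x \text{ real},\\[1ex] u(x,0), & y = x,\\[1ex] -\infty, & y = \pm\infty.\end{cases}$$

Thus,

$$\int_{x_1}^{x_2}\int_{y_1}^{y_2}e^{\lambda x}g_m(x,y)\,dy\,dx = J_\lambda(y_2; x_1, x_2) - J_\lambda(y_1; x_1, x_2),$$

where, for $y = y_1$ or $y_2$ and integration variable $x_1 < x < x_2$,

$$J_\lambda(y; x_1, x_2) = \begin{cases} \dfrac{1}{\sqrt{2\pi T}}\displaystyle\int_{x_1}^{x_2}e^{\lambda x + u(x,y)}\,dx, & y \text{ real and } y \ne x,\\[1ex] J_\lambda(0; x_1, x_2), & y = x,\\[1ex] 0, & y = \pm\infty.\end{cases}$$

It remains to show that $J_\lambda(y; x_1, x_2) = I_\lambda(y; x_1, x_2)$ if $y$ is real and $y \ne x$. Since

$$\lambda x + u(x,y) = \lambda x - \frac{x^2 - 4xy + 4y^2}{2T} = -\frac{x^2}{2T} - \frac{2y^2}{T} + \Big(\lambda + \frac{2y}{T}\Big)x,$$

we have

$$J_\lambda(y; x_1, x_2) = \frac{e^{-2y^2/T}}{\sqrt{2\pi T}}\int_{x_1}^{x_2}e^{-x^2/(2T) + (\lambda + 2y/T)x}\,dx = e^{2y\lambda + \lambda^2T/2}\Big[\Phi\Big(\frac{2y - x_1 + \lambda T}{\sqrt T}\Big) - \Phi\Big(\frac{2y - x_2 + \lambda T}{\sqrt T}\Big)\Big],$$

where for the last equality we used Exercise 11.12 with $p = 1/(2T)$ and $q = \lambda + 2y/T$. Thus, $J_\lambda = I_\lambda$, completing the proof.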
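Because Lemma 14.5.2 is a calculus identity valid for any limits, it can be checked numerically on a finite rectangle. The sketch below compares a midpoint-rule double integral of $e^{\lambda x}g_m(x,y)$ with $I_\lambda(y_2;x_1,x_2) - I_\lambda(y_1;x_1,x_2)$; the rectangle, $\lambda$, $T$, and the grid size are arbitrary illustrative choices.

```python
import math


def Phi(x):
    """Standard normal cdf; math.erf(+/-inf) returns +/-1, so infinite arguments work."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def g_m(x, y, T):
    """The function g_m of Lemma 14.5.1."""
    return 2.0 * (x - 2.0 * y) / (T * math.sqrt(2.0 * math.pi * T)) * math.exp(-((x - 2.0 * y) ** 2) / (2.0 * T))


def I(lam, y, x1, x2, T):
    """I_lambda(y; x1, x2) of Lemma 14.5.2 for real y or y = +/- infinity."""
    if math.isinf(y):
        return 0.0
    rt = math.sqrt(T)
    factor = math.exp(2.0 * y * lam + 0.5 * lam**2 * T)
    return factor * (Phi((2.0 * y - x1 + lam * T) / rt) - Phi((2.0 * y - x2 + lam * T) / rt))


def double_integral(lam, x1, x2, y1, y2, T, n=400):
    """Midpoint-rule value of the integral of e^{lam x} g_m(x, y) over [x1, x2] x [y1, y2]."""
    hx, hy = (x2 - x1) / n, (y2 - y1) / n
    total = 0.0
    for i in range(n):
        x = x1 + (i + 0.5) * hx
        ex = math.exp(lam * x)
        for j in range(n):
            y = y1 + (j + 0.5) * hy
            total += ex * g_m(x, y, T)
    return total * hx * hy


lam, T = 1.2, 0.5
x1, x2, y1, y2 = -0.3, 0.8, -0.5, -0.1          # an arbitrary finite rectangle
numeric = double_integral(lam, x1, x2, y1, y2, T)
closed = I(lam, y2, x1, x2, T) - I(lam, y1, x1, x2, T)
print(numeric, closed)
```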

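It is now a straightforward matter to evaluate (14.16). Suppose first that $K > c$, so that $a > b$. If $K \ge S_0$, then $a \ge 0$ and $D = [a,\infty)\times[b,0]$; hence, by Lemma 14.5.2,

$$\begin{aligned}
\hat{\mathbb{E}}\big(e^{\lambda\hat W_T}I_B\big) &= \int_a^\infty\int_b^0e^{\lambda x}g_m(x,y)\,dy\,dx\\
&= I_\lambda(0; a, \infty) - I_\lambda(b; a, \infty)\\
&= e^{\lambda^2T/2}\Big[\Phi\Big(\frac{-a + \lambda T}{\sqrt T}\Big) - e^{2b\lambda}\Phi\Big(\frac{2b - a + \lambda T}{\sqrt T}\Big)\Big].
\end{aligned}\tag{14.17}$$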

On the other hand, if $K < S_0$, then $a < 0$ and

$$D = \{(x,y) \mid a \le x \le 0,\ b \le y \le x\}\cup\big([0,\infty)\times[b,0]\big)$$

(Figure 14.1), so, again by Lemma 14.5.2,

FIGURE 14.1: $D$ for the case $c < K < S_0$.


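$$\begin{aligned}
\hat{\mathbb{E}}\big(e^{\lambda\hat W_T}I_B\big) &= \int_a^0\int_b^xe^{\lambda x}g_m(x,y)\,dy\,dx + \int_0^\infty\int_b^0e^{\lambda x}g_m(x,y)\,dy\,dx\\
&= I_\lambda(0; a, 0) - I_\lambda(b; a, 0) + I_\lambda(0; 0, \infty) - I_\lambda(b; 0, \infty)\\
&= e^{\lambda^2T/2}\Big[\Phi\Big(\frac{-a + \lambda T}{\sqrt T}\Big) - \Phi\Big(\frac{\lambda T}{\sqrt T}\Big)\Big] - e^{2b\lambda + \lambda^2T/2}\Big[\Phi\Big(\frac{2b - a + \lambda T}{\sqrt T}\Big) - \Phi\Big(\frac{2b + \lambda T}{\sqrt T}\Big)\Big]\\
&\qquad + e^{\lambda^2T/2}\Phi\Big(\frac{\lambda T}{\sqrt T}\Big) - e^{2b\lambda + \lambda^2T/2}\Phi\Big(\frac{2b + \lambda T}{\sqrt T}\Big).
\end{aligned}$$

The last expression reduces to (14.17), which therefore holds for all values of $S_0$ and $K$ with $K > c$.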
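The cancellation can also be confirmed numerically. The sketch below evaluates the four-term expression for the case $c < K < S_0$ and the right side of (14.17) for $\lambda = \beta$ and $\lambda = \gamma$; the values of $S_0$, $K$, $c$, $\sigma$, $r$, $T$ are illustrative assumptions chosen so that $b < a < 0$.

```python
import math


def Phi(x):
    """Standard normal cdf; infinite arguments are handled by math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def I(lam, y, x1, x2, T):
    """I_lambda(y; x1, x2) from Lemma 14.5.2; x1, x2 may be +/- math.inf."""
    if math.isinf(y):
        return 0.0
    rt = math.sqrt(T)
    return math.exp(2.0 * y * lam + 0.5 * lam**2 * T) * (
        Phi((2.0 * y - x1 + lam * T) / rt) - Phi((2.0 * y - x2 + lam * T) / rt)
    )


def formula_14_17(lam, a, b, T):
    """Right side of (14.17)."""
    rt = math.sqrt(T)
    return math.exp(0.5 * lam**2 * T) * (
        Phi((-a + lam * T) / rt) - math.exp(2.0 * b * lam) * Phi((2.0 * b - a + lam * T) / rt)
    )


def split_case(lam, a, b, T):
    """The four-term expression for the case K < S0 (a < 0)."""
    inf = math.inf
    return I(lam, 0.0, a, 0.0, T) - I(lam, b, a, 0.0, T) + I(lam, 0.0, 0.0, inf, T) - I(lam, b, 0.0, inf, T)


S0, K, c, sigma, r, T = 50.0, 45.0, 40.0, 0.20, 0.10, 0.5   # illustrative, c < K < S0
a, b = math.log(K / S0) / sigma, math.log(c / S0) / sigma
for lam in (r / sigma - sigma / 2, r / sigma + sigma / 2):   # lambda = beta, then gamma
    print(lam, split_case(lam, a, b, T), formula_14_17(lam, a, b, T))
```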

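We are now ready to evaluate $C^{do}_0$ for the case $K > c$. Taking $\lambda = \gamma$ and $\beta$ in (14.17), we see from (14.14) that

$$\begin{aligned}
C^{do}_0 &= e^{(\gamma^2-\beta^2-2r)T/2}S_0\Big[\Phi\Big(\frac{-a + \gamma T}{\sqrt T}\Big) - e^{2b\gamma}\Phi\Big(\frac{2b - a + \gamma T}{\sqrt T}\Big)\Big]\\
&\qquad - Ke^{-rT}\Big[\Phi\Big(\frac{-a + \beta T}{\sqrt T}\Big) - e^{2b\beta}\Phi\Big(\frac{2b - a + \beta T}{\sqrt T}\Big)\Big].
\end{aligned}\tag{14.18}$$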

Recalling that

a = σ−1 ln (K/S0), b = σ−1 ln (c/S0), β =r

σ− σ

2, and γ =

r

σ+σ

2,

we have

γ2 − β2 = 2r, e2bγ =

(c

S0

) 2rσ2

+1

, and e2bβ =

(c

S0

) 2rσ2−1

.

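Furthermore, one readily checks that
\[
\begin{aligned}
\frac{-a + \gamma T}{\sqrt T} &= \frac{\ln(S_0/K) + (r + \sigma^2/2)T}{\sigma\sqrt T} = d_1,\\
\frac{-a + \beta T}{\sqrt T} &= \frac{\ln(S_0/K) + (r - \sigma^2/2)T}{\sigma\sqrt T} = d_2,\\
\frac{2b - a + \gamma T}{\sqrt T} &= \frac{\ln\left(c^2/(S_0K)\right) + (r + \sigma^2/2)T}{\sigma\sqrt T} = \delta_1,\\
\frac{2b - a + \beta T}{\sqrt T} &= \frac{\ln\left(c^2/(S_0K)\right) + (r - \sigma^2/2)T}{\sigma\sqrt T} = \delta_2.
\end{aligned}
\]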

Inserting these expressions into (14.18) establishes (14.9) for the case \(K > c\) (with \(M = K\)).

Now suppose \(K \le c\). Then \(a \le b\); hence \(D = \{(x,y) \mid b \le y \le 0,\ x \ge y\}\) (Figure 14.2) and


FIGURE 14.2: \(D\) for the case \(K \le c\).

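\[
\begin{aligned}
E\left(e^{\lambda W_T}I_B\right) &= \int_b^0\int_b^x e^{\lambda x}g_m(x,y)\,dy\,dx + \int_0^\infty\int_b^0 e^{\lambda x}g_m(x,y)\,dy\,dx\\
&= I_\lambda(0;b,0) - I_\lambda(b;b,0) + I_\lambda(0;0,\infty) - I_\lambda(b;0,\infty)\\
&= e^{\lambda^2T/2}\left[\Phi\left(\frac{-b + \lambda T}{\sqrt T}\right) - e^{2b\lambda}\,\Phi\left(\frac{b + \lambda T}{\sqrt T}\right)\right].
\end{aligned}
\tag{14.19}
\]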

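Thus, from (14.14),
\[
C^{do}_0 = e^{(\gamma^2 - \beta^2 - 2r)T/2}S_0\left[\Phi\left(\frac{-b + \gamma T}{\sqrt T}\right) - e^{2b\gamma}\,\Phi\left(\frac{b + \gamma T}{\sqrt T}\right)\right] - e^{-rT}K\left[\Phi\left(\frac{-b + \beta T}{\sqrt T}\right) - e^{2b\beta}\,\Phi\left(\frac{b + \beta T}{\sqrt T}\right)\right]. \tag{14.20}
\]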

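Since
\[
\begin{aligned}
\frac{-b + \gamma T}{\sqrt T} &= \frac{\ln(S_0/c) + (r + \sigma^2/2)T}{\sigma\sqrt T} = d_1,\\
\frac{b + \gamma T}{\sqrt T} &= \frac{\ln(c/S_0) + (r + \sigma^2/2)T}{\sigma\sqrt T} = \delta_1,\\
\frac{-b + \beta T}{\sqrt T} &= \frac{\ln(S_0/c) + (r - \sigma^2/2)T}{\sigma\sqrt T} = d_2,\\
\frac{b + \beta T}{\sqrt T} &= \frac{\ln(c/S_0) + (r - \sigma^2/2)T}{\sigma\sqrt T} = \delta_2,
\end{aligned}
\]
we see that (14.9) holds for the case \(K \le c\) (with \(M = c\)) as well. This completes the derivation of (14.9).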

We remark that the barrier level \(c\) for a down-and-out call is usually set to a value less than \(K\); otherwise, the option could be knocked out even if it expires in the money.

Example 14.5.3. Table 14.1 gives prices \(C^{do}_0\) of down-and-out call options based on a stock that sells for \(S_0 = \$50.00\). The parameters are \(T = .5\), \(r = .10\), and \(\sigma = .20\). The cost of the corresponding standard call is $7.64 for \(K = 45\) and $1.87 for \(K = 55\). Notice that the price of the barrier option decreases as the barrier level increases. This is to be expected, since the higher the barrier, the more likely the option will be knocked out, hence the less attractive the option.

TABLE 14.1: Variation of \(C^{do}_0\) with the barrier level \(c\).

K = 45
c:        39      42      45      47      49      49.99
Cdo0:     $7.64   $7.54   $6.74   $5.15   $2.16   $0.02

K = 55
c:        42      43      45      47      49      49.99
Cdo0:     $1.87   $1.86   $1.81   $1.57   $0.78   $0.01
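Combining (14.18) (case \(K > c\)) and (14.20) (case \(K \le c\)) with the identities for \(d_{1,2}\) and \(\delta_{1,2}\) gives one formula in which \(M = \max(K, c)\). The sketch below is our own illustration, assuming \(c < S_0\) (the option is still alive); the function names are hypothetical. For several entries we checked by hand, it reproduces Table 14.1 up to rounding.

```python
import math


def Phi(x: float) -> float:
    """Standard normal cdf, computed with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S0: float, K: float, r: float, sigma: float, T: float) -> float:
    """Standard Black-Scholes-Merton call price."""
    d1 = (math.log(S0 / K) + (r + sigma ** 2 / 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * Phi(d1) - K * math.exp(-r * T) * Phi(d2)


def down_and_out_call(S0: float, K: float, c: float, r: float, sigma: float, T: float) -> float:
    """C_0^do from (14.18) if K > c and from (14.20) if K <= c; assumes c < S0."""
    M = max(K, c)                      # M = K when K > c, M = c when K <= c
    v = sigma * math.sqrt(T)
    d1 = (math.log(S0 / M) + (r + sigma ** 2 / 2) * T) / v
    d2 = d1 - v
    delta1 = (math.log(c * c / (S0 * M)) + (r + sigma ** 2 / 2) * T) / v
    delta2 = delta1 - v
    k = 2 * r / sigma ** 2
    return (S0 * (Phi(d1) - (c / S0) ** (k + 1) * Phi(delta1))
            - K * math.exp(-r * T) * (Phi(d2) - (c / S0) ** (k - 1) * Phi(delta2)))


# Parameters of Example 14.5.3: S0 = 50, T = .5, r = .10, sigma = .20
for K, barriers in [(45, [39, 42, 45, 47, 49, 49.99]), (55, [42, 43, 45, 47, 49, 49.99])]:
    row = [round(down_and_out_call(50, K, c, 0.10, 0.20, 0.5), 2) for c in barriers]
    print(K, round(bs_call(50, K, 0.10, 0.20, 0.5), 2), row)
```

Writing both cases through \(M = \max(K, c)\) avoids two separate code paths: when \(K \le c\) the expression \(\ln(c^2/(S_0M))\) collapses to \(\ln(c/S_0)\), exactly as in the derivation of (14.20).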

14.5.2 Lookback Options

A lookback option is another example of a path-dependent option, the payoff in this case depending on the maximum or minimum value of the asset over the contract period. There are two main categories of lookback options, floating strike and fixed strike. The holder of a floating strike lookback call option has the right at maturity to buy the stock for its lowest value over the duration of the contract, while the holder of a floating strike lookback put option may sell the stock at its high. The payoffs of lookback call and put options are, respectively, \(S_T - m^S\) and \(M^S - S_T\), where \(m^S\) and \(M^S\) are defined as in Subsection 14.5.1. In the present subsection, we determine the value \(V_t\) of a floating strike lookback call option. Fixed strike lookback options are examined in Exercise 8.

By risk-neutral pricing,

\[
V_t = e^{-r(T-t)}E^*\left(S_T - m^S \mid \mathcal F_t\right) = S_t - e^{-r(T-t)}E^*\left(m^S \mid \mathcal F_t\right),\qquad 0 \le t \le T, \tag{14.21}
\]

where we have used the fact that the discounted asset price is a \(P^*\)-martingale. As in Subsection 14.5.1, the value of the underlying at time \(t\) may be expressed as

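\[
S_t = S_0e^{\sigma W_t},\qquad W_t := W^*_t + \beta t,\qquad \beta := \frac{r}{\sigma} - \frac{\sigma}{2},\qquad 0 \le t \le T,
\]
where \(W^*\) is a \(P^*\)-Brownian motion. To evaluate \(E^*(m^S \mid \mathcal F_t)\), we introduce the following additional notation: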

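For a process \(X\) and for \(t \in [0,T]\), set
\[
m^X_t := \min\{X_u \mid 0 \le u \le t\}\quad\text{and}\quad m^X_{t,T} := \min\{X_u \mid t \le u \le T\}.
\]
Thus, in our earlier notation, \(m^X = m^X_T\). Now let \(t < T\) and set
\[
Y_t = e^{\sigma\min\{W_u - W_t \,\mid\, t \le u \le T\}}.
\]
Since \(S_u = S_te^{\sigma(W_u - W_t)}\),
\[
m^S_{t,T} = \min\{S_te^{\sigma(W_u - W_t)} \mid t \le u \le T\} = S_tY_t,
\]
and therefore \(m^S = \min(m^S_t, m^S_{t,T}) = \min(m^S_t, S_tY_t)\).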

Since \(m^S_t\) is \(\mathcal F_t\)-measurable and \(Y_t\) is independent of \(\mathcal F_t\), we can apply Remark 13.2.6 with \(g(x_1,x_2,y) = \min(x_1, x_2y)\) to conclude that
\[
E^*(m^S \mid \mathcal F_t) = E^*\left(g(m^S_t, S_t, Y_t) \mid \mathcal F_t\right) = G_t(m^S_t, S_t),
\]
where
\[
G_t(m,s) = E^*g(m,s,Y_t) = E^*\min(m, sY_t),\qquad 0 < m \le s. \tag{14.22}
\]

From (14.21) we see that \(V_t\) may now be expressed as
\[
V_t = v_t(m^S_t, S_t),\quad\text{where}\quad v_t(m,s) = s - e^{-r(T-t)}G_t(m,s). \tag{14.23}
\]

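The remainder of the section is devoted to evaluating \(G_t\). To this end, fix \(t\), \(m\), and \(s\) with \(0 < m \le s\) and set
\[
A = \{m^W_\tau \le a\},\qquad \tau := T - t,\qquad a := \frac{1}{\sigma}\ln\left(\frac{m}{s}\right).
\]
Since \(W_u - W_t = W^*_u - W^*_t + \beta(u - t)\) and \(W_{u-t} = W^*_{u-t} + \beta(u - t)\), we see that \(W_u - W_t\) and \(W_{u-t}\) (\(t \le u \le T\)) have the same distribution under \(P^*\) as processes. Therefore \(Y_t\) and \(e^{\sigma m^W_\tau}\) have the same distribution under \(P^*\). It follows that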

\[
G_t(m,s) - m = E^*\left[\min\left(m, se^{\sigma m^W_\tau}\right) - m\right] = E^*\left[\min\left(0, se^{\sigma m^W_\tau} - m\right)\right].
\]

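Noting that \(\min\left(0, se^{\sigma m^W_\tau} - m\right) = se^{\sigma m^W_\tau} - m\) iff \(m^W_\tau \le a\) (and that it is zero otherwise), we see that
\[
G_t(m,s) = E^*\left[\left(se^{\sigma m^W_\tau} - m\right)I_A\right] + m = sE^*\left(e^{\sigma m^W_\tau}I_A\right) + m\left(1 - E^*I_A\right). \tag{14.24}
\]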

It remains then to evaluate \(E^*\left(e^{\sigma m^W_\tau}I_A\right)\) and \(E^*(I_A)\). For this, we shall need the following lemmas.


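Lemma 14.5.4. The joint density \(f_m(x,y)\) of \((W_\tau, m^W_\tau)\) under \(P^*\) is given by
\[
f_m(x,y) = e^{\beta x - \frac12\beta^2\tau}g_m(x,y)I_E(x,y),
\]
where
\[
g_m(x,y) = \frac{2(x - 2y)}{\tau\sqrt{2\pi\tau}}\exp\left(-\frac{(x - 2y)^2}{2\tau}\right)\quad\text{and}\quad E = \{(x,y) \mid y \le 0,\ y \le x\}.
\]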

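Proof. By Girsanov's Theorem, \((W_u)_{0 \le u \le \tau}\) is a Brownian motion under the probability measure \(P\) given by
\[
dP = Z_\tau\,dP^*,\qquad Z_\tau = e^{-\beta W_\tau + \frac12\beta^2\tau}.
\]
By Lemma 14.5.1 with \(T\) replaced by \(\tau\), the joint density of \((W_\tau, m^W_\tau)\) under \(P\) is \(g_m(x,y)I_E(x,y)\), where \(g_m\) and \(E\) are defined as above. The cdf of \((W_\tau, m^W_\tau)\) under \(P^*\) is therefore
\[
\begin{aligned}
P^*\left(W_\tau \le x,\ m^W_\tau \le y\right) &= E^*\left(I_{\{W_\tau \le x,\ m^W_\tau \le y\}}\right) = E\left(I_{\{W_\tau \le x,\ m^W_\tau \le y\}}Z_\tau^{-1}\right)\\
&= E\left(I_{\{W_\tau \le x,\ m^W_\tau \le y\}}e^{\beta W_\tau - \frac12\beta^2\tau}\right)\\
&= \int_{-\infty}^x\int_{-\infty}^y e^{\beta u - \frac12\beta^2\tau}g_m(u,v)I_E(u,v)\,dv\,du,
\end{aligned}
\]
verifying the lemma.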

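Lemma 14.5.5. The density \(f_m\) of \(m^W_\tau\) under \(P^*\) is given by \(f_m(z) = g_m(z)I_{(-\infty,0]}(z)\), where
\[
g_m(z) = \frac{2}{\sqrt\tau}\,\varphi\left(\frac{z - \beta\tau}{\sqrt\tau}\right) + 2\beta e^{2\beta z}\,\Phi\left(\frac{z + \beta\tau}{\sqrt\tau}\right).
\]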

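Proof. By Lemma 14.5.4, the cdf of \(m^W_\tau\) under \(P^*\) is
\[
P^*\left(m^W_\tau \le z\right) = \int_{-\infty}^\infty\int_{-\infty}^z f_m(x,y)\,dy\,dx = e^{-\beta^2\tau/2}\iint_D e^{\beta x}g_m(x,y)\,dA,
\]
where \(D = \{(x,y) \mid y \le \min(0,x,z)\}\). Suppose first that \(z \le 0\). The integral over \(D\) may then be expressed as \(I' + I''\), where
\[
I' = \int_z^\infty\int_{-\infty}^z e^{\beta x}g_m(x,y)\,dy\,dx\quad\text{and}\quad I'' = \int_{-\infty}^z\int_{-\infty}^x e^{\beta x}g_m(x,y)\,dy\,dx
\]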


FIGURE 14.3: \(D = \{(x,y) \mid y \le \min\{x,z\}\}\), \(z \le 0\).

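(Figure 14.3). By Lemma 14.5.2 (with \(T\) replaced by \(\tau\)),
\[
I' = I_\beta(z;z,\infty) - I_\beta(-\infty;z,\infty) = e^{2z\beta + \beta^2\tau/2}\,\Phi\left(\frac{z + \beta\tau}{\sqrt\tau}\right)
\]
and
\[
I'' = I_\beta(0;-\infty,z) - I_\beta(-\infty;-\infty,z) = e^{\beta^2\tau/2}\,\Phi\left(\frac{z - \beta\tau}{\sqrt\tau}\right).
\]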

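Thus, if \(z \le 0\),
\[
P^*\left(m^W_\tau \le z\right) = e^{2\beta z}\,\Phi\left(\frac{z + \beta\tau}{\sqrt\tau}\right) + \Phi\left(\frac{z - \beta\tau}{\sqrt\tau}\right).
\]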

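Differentiating the expression on the right with respect to \(z\) and using the identity
\[
e^{2z\beta}\,\varphi\left(\frac{z + \beta\tau}{\sqrt\tau}\right) = \varphi\left(\frac{z - \beta\tau}{\sqrt\tau}\right)
\]
produces \(g_m(z)\). Since \(P^*\left(m^W_\tau \le z\right) = P^*\left(m^W_\tau \le 0\right)\) for \(z > 0\), the conclusion of the lemma follows.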
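A quick numerical check of Lemma 14.5.5, offered as an illustration of our own (function names and the test parameters \(\beta = 0.25\), \(\tau = 0.5\) are arbitrary choices): the density \(g_m\) should integrate to 1 over \((-\infty, 0]\), since \(m^W_\tau \le 0\) always, and its integral over an interval should match differences of the cdf derived in the proof.

```python
import math


def Phi(x: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def min_cdf(z: float, beta: float, tau: float) -> float:
    """P*(m^W_tau <= z) for W_t = W*_t + beta t, as derived in the proof of Lemma 14.5.5."""
    if z > 0:
        return 1.0
    rt = math.sqrt(tau)
    return math.exp(2 * beta * z) * Phi((z + beta * tau) / rt) + Phi((z - beta * tau) / rt)


def min_density(z: float, beta: float, tau: float) -> float:
    """g_m(z) I_(-inf, 0](z) from Lemma 14.5.5."""
    if z > 0:
        return 0.0
    rt = math.sqrt(tau)
    return ((2 / rt) * phi((z - beta * tau) / rt)
            + 2 * beta * math.exp(2 * beta * z) * Phi((z + beta * tau) / rt))


def integrate(f, lo: float, hi: float, n: int = 100000) -> float:
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


beta, tau = 0.25, 0.5
print(integrate(lambda z: min_density(z, beta, tau), -12.0, 0.0))  # total mass, close to 1
```

Truncating at \(-12\) is harmless here because the remaining tail mass is far below the quadrature error.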

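We are now in a position to evaluate \(E^*\left(e^{\lambda m^W_\tau}I_A\right)\). By Lemma 14.5.5 (noting that \(a \le 0\)), we have
\[
E^*\left(e^{\lambda m^W_\tau}I_A\right) = \int_{-\infty}^a e^{\lambda z}g_m(z)\,dz = \frac{2}{\sqrt\tau}J'_\lambda + 2\beta J''_\lambda, \tag{14.25}
\]
where
\[
J'_\lambda = \int_{-\infty}^a e^{\lambda z}\varphi\left(\frac{z - \beta\tau}{\sqrt\tau}\right)dz = \frac{e^{-\beta^2\tau/2}}{\sqrt{2\pi}}\int_{-\infty}^a e^{-z^2/(2\tau) + (\beta + \lambda)z}\,dz
\]
and
\[
J''_\lambda = \int_{-\infty}^a e^{(\lambda + 2\beta)z}\,\Phi\left(\frac{z + \beta\tau}{\sqrt\tau}\right)dz = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^a e^{(\lambda + 2\beta)z}\int_{-\infty}^{(z + \beta\tau)/\sqrt\tau}e^{-x^2/2}\,dx\,dz.
\]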


By Exercise 11.12,
\[
J'_\lambda = \sqrt\tau\,e^{\lambda\beta\tau + \lambda^2\tau/2}\,\Phi\left(\frac{a - (\lambda + \beta)\tau}{\sqrt\tau}\right). \tag{14.26}
\]

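To evaluate \(J''_\lambda\), we reverse the order of integration: for \(\lambda \ne -2\beta\),
\[
\begin{aligned}
J''_\lambda &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^b e^{-x^2/2}\int_{\sqrt\tau x - \beta\tau}^a e^{(\lambda + 2\beta)z}\,dz\,dx,\qquad b := \frac{a + \beta\tau}{\sqrt\tau},\\
&= \frac{1}{(\lambda + 2\beta)\sqrt{2\pi}}\int_{-\infty}^b e^{-x^2/2}\left(e^{(\lambda + 2\beta)a} - e^{(\lambda + 2\beta)(\sqrt\tau x - \beta\tau)}\right)dx\\
&= \frac{1}{\lambda + 2\beta}\left[e^{(\lambda + 2\beta)a}\,\Phi(b) - e^{\lambda\beta\tau + \lambda^2\tau/2}\,\Phi\left(b - (\lambda + 2\beta)\sqrt\tau\right)\right],
\end{aligned}
\tag{14.27}
\]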

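where to obtain the last equality we used Exercise 11.12 again. From (14.26) and (14.27),
\[
J'_\sigma = \sqrt\tau\,e^{\sigma\beta\tau + \sigma^2\tau/2}\,\Phi\left(\frac{a - (\sigma + \beta)\tau}{\sqrt\tau}\right)
\]
and
\[
J''_\sigma = \frac{1}{\sigma + 2\beta}\left[e^{(\sigma + 2\beta)a}\,\Phi(b) - e^{\sigma\beta\tau + \sigma^2\tau/2}\,\Phi\left(b - (\sigma + 2\beta)\sqrt\tau\right)\right].
\]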

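Recalling that
\[
a = \frac{1}{\sigma}\ln\left(\frac{m}{s}\right),\qquad \beta = \frac{r}{\sigma} - \frac{\sigma}{2},\qquad\text{and}\qquad b = \frac{a + \beta\tau}{\sqrt\tau} = \frac{1}{\sigma\sqrt\tau}\ln\left(\frac{m}{s}\right) + \beta\sqrt\tau,
\]
we see that
\[
\sigma\beta\tau + \frac{\sigma^2\tau}{2} = \tau r,\qquad (\sigma + 2\beta)a = \frac{2r}{\sigma^2}\ln\left(\frac{m}{s}\right),
\]
and
\[
\frac{a - (\sigma + \beta)\tau}{\sqrt\tau} = b - (\sigma + 2\beta)\sqrt\tau = \frac{1}{\sqrt\tau}\left[\frac{1}{\sigma}\ln\left(\frac{m}{s}\right) - \left(\frac{r}{\sigma} + \frac{\sigma}{2}\right)\tau\right].
\]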

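Therefore, setting
\[
\delta_{1,2} = \delta_{1,2}(\tau,m,s) = \frac{\ln(s/m) + (r \pm \sigma^2/2)\tau}{\sigma\sqrt\tau}\quad\text{and}\quad d = d(\tau,m,s) = \frac{\ln(m/s) + (r - \sigma^2/2)\tau}{\sigma\sqrt\tau},
\]
we have
\[
J'_\sigma = \sqrt\tau\,e^{r\tau}\,\Phi(-\delta_1)\quad\text{and}\quad J''_\sigma = \frac{\sigma}{2r}\left[\left(\frac{m}{s}\right)^{2r/\sigma^2}\Phi(d) - e^{r\tau}\,\Phi(-\delta_1)\right].
\]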


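From (14.25), then,
\[
\begin{aligned}
E^*\left(e^{\sigma m^W_\tau}I_A\right) &= \frac{2}{\sqrt\tau}J'_\sigma + 2\beta J''_\sigma\\
&= 2e^{r\tau}\,\Phi(-\delta_1) + \left(1 - \frac{\sigma^2}{2r}\right)\left[\left(\frac{m}{s}\right)^{2r/\sigma^2}\Phi(d) - e^{r\tau}\,\Phi(-\delta_1)\right]\\
&= e^{r\tau}\left(1 + \frac{\sigma^2}{2r}\right)\Phi(-\delta_1) + \left(1 - \frac{\sigma^2}{2r}\right)\left(\frac{m}{s}\right)^{2r/\sigma^2}\Phi(d).
\end{aligned}
\tag{14.28}
\]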

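Similarly, if \(\beta \ne 0\),
\[
J'_0 = \sqrt\tau\,\Phi\left(\frac{a - \beta\tau}{\sqrt\tau}\right) = \sqrt\tau\,\Phi(-\delta_2)
\]
and
\[
J''_0 = \frac{1}{2\beta}\left[e^{2\beta a}\,\Phi(b) - \Phi\left(b - 2\beta\sqrt\tau\right)\right] = \frac{1}{2\beta}\left[\left(\frac{m}{s}\right)^{2r/\sigma^2 - 1}\Phi(d) - \Phi(-\delta_2)\right],
\]
hence
\[
\begin{aligned}
E^*(I_A) &= \frac{2}{\sqrt\tau}J'_0 + 2\beta J''_0 = 2\Phi(-\delta_2) + \left[\left(\frac{m}{s}\right)^{2r/\sigma^2 - 1}\Phi(d) - \Phi(-\delta_2)\right]\\
&= \Phi(-\delta_2) + \left(\frac{m}{s}\right)^{2r/\sigma^2 - 1}\Phi(d).
\end{aligned}
\tag{14.29}
\]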
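Formulas (14.28) and (14.29) can be checked against (14.25) by integrating \(e^{\lambda z}g_m(z)\) numerically over \((-\infty, a]\), with \(\lambda = \sigma\) and \(\lambda = 0\) respectively. The sketch below is our own check, assuming \(0 < m \le s\) and \(r \ne 0\); the function names and parameter values are hypothetical.

```python
import math


def Phi(x: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def g_m(z: float, beta: float, tau: float) -> float:
    """Density of m^W_tau on (-inf, 0] (Lemma 14.5.5)."""
    rt = math.sqrt(tau)
    return (2 / rt) * phi((z - beta * tau) / rt) + 2 * beta * math.exp(2 * beta * z) * Phi((z + beta * tau) / rt)


def closed_forms(m, s, r, sigma, tau):
    """Right-hand sides of (14.28) and (14.29)."""
    v = sigma * math.sqrt(tau)
    delta1 = (math.log(s / m) + (r + sigma ** 2 / 2) * tau) / v
    delta2 = delta1 - v                      # delta_2 = delta_1 - sigma sqrt(tau)
    d = (math.log(m / s) + (r - sigma ** 2 / 2) * tau) / v
    k = 2 * r / sigma ** 2
    e_exp = (math.exp(r * tau) * (1 + sigma ** 2 / (2 * r)) * Phi(-delta1)
             + (1 - sigma ** 2 / (2 * r)) * (m / s) ** k * Phi(d))
    e_ind = Phi(-delta2) + (m / s) ** (k - 1) * Phi(d)
    return e_exp, e_ind


def by_quadrature(m, s, r, sigma, tau, lam, lo=-15.0, n=100000):
    """Midpoint rule for int_{lo}^{a} e^{lam z} g_m(z) dz with a = ln(m/s)/sigma, as in (14.25)."""
    beta = r / sigma - sigma / 2
    a = math.log(m / s) / sigma
    h = (a - lo) / n
    total = 0.0
    for k in range(n):
        z = lo + (k + 0.5) * h
        total += math.exp(lam * z) * g_m(z, beta, tau)
    return total * h


# Hypothetical parameters: m = 45, s = 50, r = 0.1, sigma = 0.3, tau = 0.5
print(closed_forms(45, 50, 0.1, 0.3, 0.5))
print(by_quadrature(45, 50, 0.1, 0.3, 0.5, lam=0.3), by_quadrature(45, 50, 0.1, 0.3, 0.5, lam=0.0))
```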

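The reader may verify that (14.29) also holds if \(\beta = 0\). From (14.24), (14.28), and (14.29),
\[
\begin{aligned}
G_t(m,s) &= s\left[e^{r\tau}\left(1 + \frac{\sigma^2}{2r}\right)\Phi(-\delta_1) + \left(1 - \frac{\sigma^2}{2r}\right)\left(\frac{m}{s}\right)^{2r/\sigma^2}\Phi(d)\right] + m\left[1 - \Phi(-\delta_2) - \left(\frac{m}{s}\right)^{2r/\sigma^2 - 1}\Phi(d)\right]\\
&= se^{r\tau}\left(1 + \frac{\sigma^2}{2r}\right)\Phi\left(-\delta_1(\tau,m,s)\right) + m\,\Phi\left(\delta_2(\tau,m,s)\right) - \frac{s\sigma^2}{2r}\left(\frac{m}{s}\right)^{2r/\sigma^2}\Phi\left(d(\tau,m,s)\right).
\end{aligned}
\tag{14.30}
\]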

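Finally, recalling (14.23), we have \(V_t = v_t(m^S_t, S_t)\), where
\[
\begin{aligned}
v_t(m,s) &= s - e^{-r\tau}G_t(m,s)\\
&= s\,\Phi\left(\delta_1(\tau,m,s)\right) - me^{-r\tau}\,\Phi\left(\delta_2(\tau,m,s)\right) - \frac{s\sigma^2}{2r}\,\Phi\left(-\delta_1(\tau,m,s)\right) + e^{-r\tau}\frac{s\sigma^2}{2r}\left(\frac{m}{s}\right)^{2r/\sigma^2}\Phi\left(d(\tau,m,s)\right).
\end{aligned}
\]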
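The final formula for \(v_t(m,s)\) translates directly into code. The sketch below is ours (hypothetical function name), assumes \(r > 0\), \(\tau = T - t > 0\) and \(0 < m \le s\), and illustrates two easy consequences of \(0 \le G_t(m,s) \le m\): the value lies between \(s - me^{-r\tau}\) and \(s\), and it is homogeneous of degree one in \((m, s)\).

```python
import math


def Phi(x: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lookback_call(m: float, s: float, r: float, sigma: float, tau: float) -> float:
    """v_t(m, s): floating strike lookback call, running minimum m <= s, tau = T - t > 0."""
    v = sigma * math.sqrt(tau)
    delta1 = (math.log(s / m) + (r + sigma ** 2 / 2) * tau) / v
    delta2 = delta1 - v
    d = (math.log(m / s) + (r - sigma ** 2 / 2) * tau) / v
    q = sigma ** 2 / (2 * r)
    return (s * Phi(delta1) - m * math.exp(-r * tau) * Phi(delta2)
            - s * q * Phi(-delta1)
            + math.exp(-r * tau) * s * q * (m / s) ** (2 * r / sigma ** 2) * Phi(d))


# A newly issued contract (m = s = 50) with r = 0.1, sigma = 0.4 and three months to maturity
print(round(lookback_call(50, 50, 0.1, 0.4, 0.25), 2))  # about 8.04
```

At \(t = 0\) one has \(m = s = S_0\), so the price of a newly issued contract is \(v_0(S_0, S_0)\).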


14.5.3 Asian Options

An Asian or average option has a payoff that depends on an average \(A(S)\) of the price process \(S\) of the underlying asset. The most common types of Asian options are

the fixed strike average call with payoff \((A(S) - K)^+\),

the floating strike average call with payoff \((S_T - A(S))^+\),

the fixed strike average put with payoff \((K - A(S))^+\), and

the floating strike average put with payoff \((A(S) - S_T)^+\).

The average \(A(S)\) is typically one of the following, where the discrete times \(t_j\) satisfy \(0 \le t_1 < t_2 < \cdots < t_n \le T\):

discrete arithmetic average: \(A(S) = \dfrac{1}{n}\sum_{j=1}^n S_{t_j}\),

continuous arithmetic average: \(A(S) = \dfrac{1}{T}\int_0^T S_t\,dt\),

discrete geometric average: \(A(S) = \left(\prod_{j=1}^n S_{t_j}\right)^{1/n}\),

continuous geometric average: \(A(S) = \exp\left(\dfrac{1}{T}\int_0^T \ln S_t\,dt\right)\).

In the continuous case, averaging intervals can be of the more general form \([T_0, T]\).

Asian options are usually less expensive than standard options and have the advantage of being less sensitive to manipulation of the underlying asset price on a particular day, that effect being mitigated by the averaging process. They are also useful as hedges for an investment plan consisting of a series of purchases over time of a commodity with changing price.

The fixed strike geometric average option readily lends itself to Black-Scholes-Merton risk-neutral pricing, as the following theorems illustrate.

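Theorem 14.5.6. The cost \(V_0\) of a fixed strike continuous geometric average call option is
\[
V_0 = S_0e^{-rT/2 - \sigma^2T/12}\,\Phi(d_1) - Ke^{-rT}\,\Phi(d_2), \tag{14.31}
\]
where
\[
d_2 := \frac{\ln(S_0/K) + \left(r - \frac12\sigma^2\right)\frac{T}{2}}{\sigma\sqrt{T/3}},\qquad d_1 := d_2 + \sigma\sqrt{T/3}.
\]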


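Proof. By risk-neutral pricing,
\[
V_0 = e^{-rT}E^*\left[A(S) - K\right]^+,\quad\text{where}\quad A(S) = \exp\left(\frac{1}{T}\int_0^T \ln S_t\,dt\right).
\]
From Lemma 13.2.2,
\[
\ln S_t = \ln S_0 + \sigma W^*_t + \left(r - \frac{\sigma^2}{2}\right)t,
\]
hence
\[
\frac{1}{T}\int_0^T \ln S_t\,dt = \ln S_0 + \frac{\sigma}{T}\int_0^T W^*_t\,dt + \left(r - \frac{\sigma^2}{2}\right)\frac{T}{2}.
\]
By Example 10.6.1, \(\int_0^T W^*_t\,dt\) is normal under \(P^*\) with mean zero and variance \(T^3/3\). Therefore
\[
A(S) = S_0\exp\left\{\sigma\sqrt{\frac{T}{3}}\,Z + \left(r - \frac{\sigma^2}{2}\right)\frac{T}{2}\right\},
\]
where \(Z \sim N(0,1)\). It follows that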

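\[
e^{rT}V_0 = \int_{-\infty}^\infty\left(S_0\exp\left\{\sigma\sqrt{\frac{T}{3}}\,z + \left(r - \frac{\sigma^2}{2}\right)\frac{T}{2}\right\} - K\right)^+\varphi(z)\,dz.
\]
Since the integrand is zero when \(z < -d_2\),
\[
\begin{aligned}
e^{rT}V_0 &= S_0\int_{-d_2}^\infty\exp\left\{\sigma\sqrt{\frac{T}{3}}\,z + \left(r - \frac{\sigma^2}{2}\right)\frac{T}{2}\right\}\varphi(z)\,dz - K\int_{-d_2}^\infty\varphi(z)\,dz\\
&= S_0e^{(r - \sigma^2/2)T/2}\,\frac{1}{\sqrt{2\pi}}\int_{-d_2}^\infty\exp\left\{\sigma\sqrt{\frac{T}{3}}\,z - \frac12z^2\right\}dz - K\,\Phi(d_2)\\
&= S_0e^{rT/2 - \sigma^2T/12}\,\Phi(d_1) - K\,\Phi(d_2),
\end{aligned}
\]
the last equality by Exercise 11.12.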
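The proof reduces the price to a one-dimensional Gaussian integral, so (14.31) can be checked by evaluating that integral numerically. A minimal sketch of our own (hypothetical names; the integral is truncated to \([-10, 10]\), which is harmless for the parameters used):

```python
import math


def Phi(x: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def geometric_asian_call(S0, K, r, sigma, T):
    """Closed form (14.31): fixed strike continuous geometric average call."""
    v = sigma * math.sqrt(T / 3)
    d2 = (math.log(S0 / K) + (r - sigma ** 2 / 2) * T / 2) / v
    d1 = d2 + v
    return S0 * math.exp(-r * T / 2 - sigma ** 2 * T / 12) * Phi(d1) - K * math.exp(-r * T) * Phi(d2)


def geometric_asian_call_quad(S0, K, r, sigma, T, L=10.0, n=200000):
    """e^{-rT} int (S0 exp(sigma sqrt(T/3) z + (r - sigma^2/2) T/2) - K)^+ phi(z) dz, midpoint rule."""
    v = sigma * math.sqrt(T / 3)
    mu = (r - sigma ** 2 / 2) * T / 2
    h = 2 * L / n
    total = 0.0
    for k in range(n):
        z = -L + (k + 0.5) * h
        total += max(S0 * math.exp(v * z + mu) - K, 0.0) * phi(z)
    return math.exp(-r * T) * total * h


print(geometric_asian_call(50, 50, 0.05, 0.3, 1.0), geometric_asian_call_quad(50, 50, 0.05, 0.3, 1.0))
```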

To price the discrete geometric average call option, we first establish the following lemma.

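Lemma 14.5.7. Let \(t_0 := 0 < t_1 < t_2 < \cdots < t_n \le T\). The joint density of \((W^*_{t_1}, W^*_{t_2}, \ldots, W^*_{t_n})\) under \(P^*\) is given by
\[
f(x_1,x_2,\ldots,x_n) = \prod_{j=1}^n f_j(x_j - x_{j-1}),
\]
where \(x_0 = 0\) and \(f_j\) is the density of \(W^*_{t_j} - W^*_{t_{j-1}}\):
\[
f_j(x) = \frac{1}{\sqrt{t_j - t_{j-1}}}\,\varphi\left(\frac{x}{\sqrt{t_j - t_{j-1}}}\right) = \frac{1}{\sqrt{2\pi(t_j - t_{j-1})}}\exp\left(-\frac{x^2}{2(t_j - t_{j-1})}\right).
\]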


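Proof. Set \(A = (-\infty,z_1] \times (-\infty,z_2] \times \cdots \times (-\infty,z_n]\) and
\[
B = \{(y_1,y_2,\ldots,y_n) \mid (y_1,\ y_1 + y_2,\ \ldots,\ y_1 + y_2 + \cdots + y_n) \in A\}.
\]
By independent increments,
\[
\begin{aligned}
P^*\left((W^*_{t_1},W^*_{t_2},\ldots,W^*_{t_n}) \in A\right) &= P^*\left((W^*_{t_1},\ W^*_{t_2} - W^*_{t_1},\ \ldots,\ W^*_{t_n} - W^*_{t_{n-1}}) \in B\right)\\
&= \int_B f_1(y_1)f_2(y_2)\cdots f_n(y_n)\,dy.
\end{aligned}
\]
With the substitution \(x_j = y_1 + y_2 + \cdots + y_j\), we obtain
\[
P^*\left((W^*_{t_1},W^*_{t_2},\ldots,W^*_{t_n}) \in A\right) = \int_A f_1(x_1)f_2(x_2 - x_1)\cdots f_n(x_n - x_{n-1})\,dx,
\]
which establishes the lemma.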

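Corollary 14.5.8. \(W^*_{t_1} + W^*_{t_2} + \cdots + W^*_{t_n} \sim N(0,\sigma_n^2)\) under \(P^*\), where
\[
\sigma_n^2 := 1^2\tau_n + 2^2\tau_{n-1} + \cdots + n^2\tau_1,\qquad \tau_j := t_j - t_{j-1}.
\]
Proof. By Lemma 14.5.7,
\[
\begin{aligned}
E^*\left[e^{\lambda(W^*_{t_1} + W^*_{t_2} + \cdots + W^*_{t_n})}\right] &= \int_{-\infty}^\infty\cdots\int_{-\infty}^\infty e^{\lambda(x_1 + \cdots + x_n)}f(x_1,\ldots,x_n)\,dx\\
&= \int_{-\infty}^\infty\cdots\int_{-\infty}^\infty e^{\lambda(x_1 + x_2 + \cdots + x_{n-1})}f_1(x_1)\cdots f_{n-1}(x_{n-1} - x_{n-2})\int_{-\infty}^\infty e^{\lambda x_n}f_n(x_n - x_{n-1})\,dx_n\,dx_{n-1}\cdots dx_1.
\end{aligned}
\]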

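The innermost integral evaluates to
\[
\frac{1}{\sqrt{2\pi\tau_n}}\int_{-\infty}^\infty e^{\lambda x_n - (x_n - x_{n-1})^2/(2\tau_n)}\,dx_n = \frac{e^{\lambda x_{n-1}}}{\sqrt{2\pi\tau_n}}\int_{-\infty}^\infty e^{\lambda x_n - x_n^2/(2\tau_n)}\,dx_n = e^{\lambda x_{n-1} + \tau_n\lambda^2/2}.
\]
Therefore,
\[
\begin{aligned}
E^*\left[e^{\lambda(W^*_{t_1} + W^*_{t_2} + \cdots + W^*_{t_n})}\right] &= e^{\tau_n\lambda^2/2}\int_{-\infty}^\infty\cdots\int_{-\infty}^\infty e^{\lambda(x_1 + \cdots + x_{n-2})}f_1(x_1)\cdots f_{n-2}(x_{n-2} - x_{n-3})\\
&\qquad\times\int_{-\infty}^\infty e^{2\lambda x_{n-1}}f_{n-1}(x_{n-1} - x_{n-2})\,dx_{n-1}\,dx_{n-2}\cdots dx_1.
\end{aligned}
\]
Repeating the argument, we arrive at
\[
E^*\left[e^{\lambda(W^*_{t_1} + W^*_{t_2} + \cdots + W^*_{t_n})}\right] = e^{\sigma_n^2\lambda^2/2}.
\]
The corollary now follows from Theorem 12.4.3 and Example 12.4.2.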

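Theorem 14.5.9. The cost of a fixed strike discrete geometric average call option with payoff \((A(S) - K)^+\), where
\[
A(S) := \left(\prod_{j=0}^n S_{t_j}\right)^{1/(n+1)},\qquad t_j := \frac{jT}{n},\quad j = 0,1,\ldots,n,
\]
is given by
\[
V^{(n)}_0 = S_0\exp\left\{-\left(r + \frac{\sigma^2}{2}\right)\frac{T}{2} + \frac{\sigma^2Ta_n}{2}\right\}\Phi\left(\sigma\sqrt{a_nT} + d_n\right) - Ke^{-rT}\,\Phi(d_n),
\]
where
\[
d_n := \frac{\ln(S_0/K) + \frac12\left(r - \sigma^2/2\right)T}{\sigma\sqrt{a_nT}},\qquad a_n := \frac{2n + 1}{6(n + 1)}.
\]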

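Proof. By Corollary 14.5.8,
\[
\begin{aligned}
A(S) &= S_0\exp\left\{\sigma(n+1)^{-1}\sum_{j=1}^n W^*_{t_j} + \left(r - \tfrac12\sigma^2\right)(n+1)^{-1}\sum_{j=1}^n t_j\right\}\\
&= S_0\exp\left\{\sigma\sigma_n(n+1)^{-1}Z + \left(r - \tfrac12\sigma^2\right)\bar t\right\},
\end{aligned}
\]
where \(Z \sim N(0,1)\),
\[
\sigma_n^2 = \frac{T}{n}\sum_{j=1}^n j^2 = \frac{(n+1)(2n+1)T}{6} = (n+1)^2a_nT,\quad\text{and}\quad \bar t = \frac{1}{n+1}\sum_{j=1}^n t_j = \frac{T}{2}.
\]
Since \(\sigma_n(n+1)^{-1} = \sqrt{a_nT}\), risk-neutral pricing implies that
\[
\begin{aligned}
e^{rT}V^{(n)}_0 &= E^*(A(S) - K)^+ = \int_{-\infty}^\infty\left(S_0\exp\left\{\sigma\sqrt{a_nT}\,z + \left(r - \frac{\sigma^2}{2}\right)\frac{T}{2}\right\} - K\right)^+\varphi(z)\,dz\\
&= S_0e^{(r - \sigma^2/2)T/2}\,\frac{1}{\sqrt{2\pi}}\int_{-d_n}^\infty e^{\sigma\sqrt{a_nT}\,z - \frac12z^2}\,dz - K\int_{-d_n}^\infty\varphi(z)\,dz\\
&= S_0e^{(r - \sigma^2/2)T/2 + \frac12\sigma^2a_nT}\,\Phi\left(\sigma\sqrt{a_nT} + d_n\right) - K\,\Phi(d_n),
\end{aligned}
\]
the last equality by Exercise 11.12.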
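Theorem 14.5.9 is easy to evaluate, and doing so for growing \(n\) illustrates numerically the limit of Exercise 14 (it does not prove it). The sketch below is our own (hypothetical names); it also checks the identity \(\sigma_n^2 = (n+1)^2a_nT\) from Corollary 14.5.8 with equal spacings \(\tau_j = T/n\).

```python
import math


def Phi(x: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def discrete_geometric_asian_call(S0, K, r, sigma, T, n):
    """V_0^(n) of Theorem 14.5.9 (n + 1 dates t_j = jT/n, j = 0, ..., n)."""
    a_n = (2 * n + 1) / (6 * (n + 1))
    v = sigma * math.sqrt(a_n * T)
    d_n = (math.log(S0 / K) + 0.5 * (r - sigma ** 2 / 2) * T) / v
    return (S0 * math.exp(-(r + sigma ** 2 / 2) * T / 2 + sigma ** 2 * T * a_n / 2) * Phi(v + d_n)
            - K * math.exp(-r * T) * Phi(d_n))


def continuous_geometric_asian_call(S0, K, r, sigma, T):
    """(14.31), for comparison."""
    v = sigma * math.sqrt(T / 3)
    d2 = (math.log(S0 / K) + (r - sigma ** 2 / 2) * T / 2) / v
    return S0 * math.exp(-r * T / 2 - sigma ** 2 * T / 12) * Phi(d2 + v) - K * math.exp(-r * T) * Phi(d2)


def sigma_n_squared(n, T):
    """1^2 tau_n + 2^2 tau_{n-1} + ... + n^2 tau_1 with tau_j = T/n (Corollary 14.5.8)."""
    return sum(j * j * (T / n) for j in range(1, n + 1))


for n in (1, 4, 12, 52, 252, 10 ** 6):
    print(n, discrete_geometric_asian_call(50, 50, 0.05, 0.3, 1.0, n))
print("continuous:", continuous_geometric_asian_call(50, 50, 0.05, 0.3, 1.0))
```

Since \(a_n - \tfrac13 = -1/(6(n+1))\), the discrete price approaches (14.31) at rate roughly \(1/n\).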


Arguments similar to those of Theorems 14.5.6 and 14.5.9 may be used to establish formulas for the value of the options at an arbitrary time \(t \in [0,T]\) (see [10]).

In the arithmetic case, \(A(S)\) does not have a lognormal distribution; hence arithmetic average options are more difficult to price. Indeed, there is no known closed-form pricing formula as in the geometric average case. Techniques used to price arithmetic average options typically involve approximation, Monte Carlo simulation, or partial differential equations. Accounts of these approaches along with references may be found in [1, 10, 12, 17].
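Of the techniques just listed, plain Monte Carlo is the simplest to sketch. The code below is our own illustration, not a method from the text: it simulates \(S\) under \(P^*\) on the dates of Theorem 14.5.9 and prices the arithmetic and geometric average calls on the same paths. Because the arithmetic mean dominates the geometric mean path by path, the arithmetic estimate always exceeds the geometric one, and the geometric estimate can be compared with the closed form. Function names, the seed, and the parameters are hypothetical.

```python
import math
import random


def Phi(x: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def discrete_geometric_asian_call(S0, K, r, sigma, T, n):
    """Closed form of Theorem 14.5.9, used here as a check."""
    a_n = (2 * n + 1) / (6 * (n + 1))
    v = sigma * math.sqrt(a_n * T)
    d_n = (math.log(S0 / K) + 0.5 * (r - sigma ** 2 / 2) * T) / v
    return (S0 * math.exp(-(r + sigma ** 2 / 2) * T / 2 + sigma ** 2 * T * a_n / 2) * Phi(v + d_n)
            - K * math.exp(-r * T) * Phi(d_n))


def mc_asian_calls(S0, K, r, sigma, T, n, paths, seed=2024):
    """Return (arithmetic price, geometric price, standard error of the geometric estimate)."""
    rng = random.Random(seed)
    dt = T / n
    drift = (r - sigma ** 2 / 2) * dt
    vol = sigma * math.sqrt(dt)
    disc = math.exp(-r * T)
    arith_sum = geo_sum = geo_sq = 0.0
    for _ in range(paths):
        s = S0
        total, log_total = S0, math.log(S0)          # the average includes t_0 = 0
        for _ in range(n):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            total += s
            log_total += math.log(s)
        arith_sum += disc * max(total / (n + 1) - K, 0.0)
        g = disc * max(math.exp(log_total / (n + 1)) - K, 0.0)
        geo_sum += g
        geo_sq += g * g
    geo_mean = geo_sum / paths
    geo_se = math.sqrt(max(geo_sq / paths - geo_mean ** 2, 0.0) / (paths - 1))
    return arith_sum / paths, geo_mean, geo_se


arith, geo, se = mc_asian_calls(50, 50, 0.05, 0.3, 1.0, 12, 40000)
print(arith, geo, se, discrete_geometric_asian_call(50, 50, 0.05, 0.3, 1.0, 12))
```

A common refinement, not shown here, uses the geometric closed form as a control variate for the arithmetic estimate.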

14.6 Quantos

A quanto is a derivative with underlying asset denominated in one currency and payoff denominated in another. Consider the case of a foreign stock with price process \(S^e\) denominated in euros. Let \(Q\) denote the exchange rate in dollars per euro, so that the stock has dollar price process \(S := S^eQ\). A standard call option on the foreign stock has payoff
\[
(S^e_T - K)^+\ \text{euros} = (S_T - KQ_T)^+\ \text{dollars},
\]
but this value might be adversely affected by the exchange rate. Instead, a domestic investor could buy a quanto call option with payoff \((S^e_T - K)^+\) dollars. Here, the strike price \(K\) is in dollars and \(S^e_T = S_TQ_T^{-1}\) is interpreted as a dollar value (e.g., \(S^e_T = 5\) euros is interpreted as 5 dollars).

More generally, consider a claim with dollar value \(H = f(S^e_T)\), where \(f(x)\) is continuous and \(EH^2 < \infty\). The methods of Chapter 13 may be easily adapted to find the fair price of such a claim. Our point of view is that of a domestic investor, that is, one who constantly computes investment transactions in dollar units.

To begin, assume that \(S\) and \(Q\) are geometric Brownian motion processes, say
\[
S_t = S_0e^{\sigma_1W_1(t) + (\mu_1 - \sigma_1^2/2)t}\quad\text{and}\quad Q_t = Q_0e^{\sigma_2W_2(t) + (\mu_2 - \sigma_2^2/2)t},
\]
where \(\sigma_1\), \(\sigma_2\), \(\mu_1\), and \(\mu_2\) are constants and \(W_1\) and \(W_2\) are Brownian motions on \((\Omega,\mathcal F,P)\). For ease of exposition, we shall take \(W_1\) and \(W_2\) to be independent processes.³ As in Section 14.1, \(D_t = e^{r_dt}\) and \(E_t = e^{r_et}\) denote the price processes of a dollar bond and a euro bond, respectively, where \(r_d\) and \(r_e\) are the dollar and euro interest rates. Set
\[
\alpha_1 := \frac{\mu_1 - r_d}{\sigma_1}\quad\text{and}\quad \alpha_2 := \frac{\mu_2 + r_e - r_d}{\sigma_2}.
\]

³A more realistic model assumes that the processes are correlated, that is, \(W_2 = \varrho W_1 + \sqrt{1 - \varrho^2}\,W_3\), where \(W_1\) and \(W_3\) are independent Brownian motions and \(0 < |\varrho| < 1\). See, for example, [17].


By the general Girsanov's Theorem (Remark 12.5.3), there exists a single probability measure \(P^*\) on \(\Omega\) (the so-called domestic risk-neutral probability measure) relative to which the processes
\[
W^*_1(t) := W_1(t) + \alpha_1t\quad\text{and}\quad W^*_2(t) := W_2(t) + \alpha_2t,\qquad 0 \le t \le T,
\]
are independent Brownian motions with respect to the filtration \((\mathcal G_t)\) generated by \(W_1\) and \(W_2\). In terms of \(W^*_1\) and \(W^*_2\), the process \(S^e = SQ^{-1}\) may be written

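\[
\begin{aligned}
S^e_t &= S^e_0\exp\left\{\sigma_1W_1(t) + \left(\mu_1 - \frac{\sigma_1^2}{2}\right)t - \sigma_2W_2(t) - \left(\mu_2 - \frac{\sigma_2^2}{2}\right)t\right\}\\
&= S^e_0\exp\left\{\sigma_1W^*_1(t) - \sigma_2W^*_2(t) + \left(r_e - \frac{\sigma_1^2}{2} + \frac{\sigma_2^2}{2}\right)t\right\}\\
&= S^e_0\exp\left\{\sigma W^*(t) + \left(r_e + \sigma_2^2 - \frac{\sigma^2}{2}\right)t\right\},
\end{aligned}
\tag{14.32}
\]
where
\[
\sigma := \left(\sigma_1^2 + \sigma_2^2\right)^{1/2}\quad\text{and}\quad W^* := \sigma^{-1}\left(\sigma_1W^*_1 - \sigma_2W^*_2\right).
\]
Since \(W^*_1\) and \(W^*_2\) are independent, \(W^*\) is easily seen to be a Brownian motion with respect to \((\mathcal G_t)\). By the continuous parameter version of Theorem 8.4.6, \(W^*\) is also a Brownian motion with respect to the Brownian filtration \((\mathcal F^{W^*}_t)\).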

Now let \(H\) be a claim of the form \(f(S^e_T)\) and let \((\phi,\theta)\) be a self-financing, replicating trading strategy for \(H\) based on the dollar bond and a risky asset with dollar price process \(X\) given by
\[
X_t := e^{\zeta t}S^e_t = S^e_0\exp\left\{\sigma W^*(t) + \left(r_d - \frac{\sigma^2}{2}\right)t\right\},\qquad \zeta := r_d - r_e - \sigma_2^2.
\]
Let \(V = \phi D + \theta X\) denote the value process of the portfolio and set \(f_1(x) = f\left(e^{-\zeta T}x\right)\), so that \(H = f_1(X_T)\). Note that the processes \(D^{-1}X\) and \(D^{-1}V\) are \(P^*\)-martingales. By Corollary 13.1.3, the value of the claim at time \(t\) is
\[
V_t = e^{-r_d(T-t)}E^*\left(H \mid \mathcal F^{W^*}_t\right) = e^{-r_d(T-t)}G_1(t,X_t),
\]
where
\[
G_1(t,s) := \int_{-\infty}^\infty f_1\left(s\exp\left\{\sigma\sqrt{T-t}\,y + (r_d - \sigma^2/2)(T-t)\right\}\right)\varphi(y)\,dy.
\]
Define \(G(t,s) := G_1\left(t, e^{\zeta t}s\right)\), so that \(G_1(t,X_t) = G(t,S^e_t)\). Replacing \(s\) by \(e^{\zeta t}s\) in the definition of \(G_1(t,s)\), we arrive at the following result:

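Theorem 14.6.1. Let \(H = f(S^e_T)\), where \(f\) is continuous and \(EH^2 < \infty\). Then the dollar value of \(H\) at time \(t\) is \(V_t = e^{-r_d(T-t)}G(t,S^e_t)\), where
\[
G(t,s) := \int_{-\infty}^\infty f\left(s\exp\left\{\sigma\sqrt{T-t}\,y + \left(r_e + \sigma_2^2 - \sigma^2/2\right)(T-t)\right\}\right)\varphi(y)\,dy.
\]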


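Example 14.6.2. (Quanto Call Option). Taking \(f(x) = (x - K)^+\) in Theorem 14.6.1 and using (11.13) with \(r\) replaced by \(r_e + \sigma_2^2\), we obtain
\[
G(t,s) = se^{(r_e + \sigma_2^2)(T-t)}\,\Phi\left(d_1(s,T-t)\right) - K\,\Phi\left(d_2(s,T-t)\right),
\]
where
\[
d_{1,2}(s,\tau) = \frac{\ln(s/K) + \left(r_e + \sigma_2^2 \pm \sigma^2/2\right)\tau}{\sigma\sqrt\tau}.
\]
The dollar value of a quanto call option at time \(t\) is therefore
\[
V_t = e^{-(r_d - r_e - \sigma_2^2)(T-t)}S^e_t\,\Phi\left(d_1(S^e_t, T-t)\right) - e^{-r_d(T-t)}K\,\Phi\left(d_2(S^e_t, T-t)\right).
\]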
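The quanto formula is the Black-Scholes-Merton call with growth rate \(r_e + \sigma_2^2\) in place of \(r\), discounted at \(r_d\). A minimal sketch of our own (hypothetical names, independent \(W_1, W_2\) as in the text) compares it with direct numerical evaluation of the integral in Theorem 14.6.1.

```python
import math


def Phi(x: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def quanto_call(Se, K, rd, re, sigma1, sigma2, tau):
    """Dollar value of the quanto call of Example 14.6.2, with tau = T - t."""
    sigma = math.hypot(sigma1, sigma2)          # sigma = (sigma1^2 + sigma2^2)^(1/2)
    g = re + sigma2 ** 2                        # plays the role of r in (11.13)
    v = sigma * math.sqrt(tau)
    d1 = (math.log(Se / K) + (g + sigma ** 2 / 2) * tau) / v
    d2 = d1 - v
    return math.exp(-(rd - g) * tau) * Se * Phi(d1) - math.exp(-rd * tau) * K * Phi(d2)


def quanto_call_quad(Se, K, rd, re, sigma1, sigma2, tau, L=10.0, n=200000):
    """e^{-rd tau} G(t, Se) from Theorem 14.6.1 with f(x) = (x - K)^+, midpoint rule on [-L, L]."""
    sigma = math.hypot(sigma1, sigma2)
    mu = (re + sigma2 ** 2 - sigma ** 2 / 2) * tau
    v = sigma * math.sqrt(tau)
    h = 2 * L / n
    total = 0.0
    for k in range(n):
        y = -L + (k + 0.5) * h
        total += max(Se * math.exp(v * y + mu) - K, 0.0) * phi(y)
    return math.exp(-rd * tau) * total * h


print(quanto_call(40, 42, 0.03, 0.02, 0.25, 0.1, 0.75), quanto_call_quad(40, 42, 0.03, 0.02, 0.25, 0.1, 0.75))
```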

14.7 Options on Dividend-Paying Stocks

In this section, we determine the price of a claim based on a dividend-paying stock. We consider two cases: continuous dividends and periodic dividends. We begin with the former, which is somewhat easier to model.

14.7.1 Continuous Dividend Stream

Assume our stock pays a dividend of \(\delta S_t\,dt\) in the time interval from \(t\) to \(t + dt\), where \(\delta\) is a constant between 0 and 1. Since dividends reduce the value of the stock, the price process must be adjusted to reflect this reduction. Therefore we have
\[
dS_t = \sigma S_t\,dW_t + \mu S_t\,dt - \delta S_t\,dt = \sigma S_t\,dW_t + (\mu - \delta)S_t\,dt. \tag{14.33}
\]

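Now let \((\phi,\theta)\) be a self-financing trading strategy with value process \(V = \phi B + \theta S\). The definition of self-financing must be modified to take into account the dividend stream, as the change in \(V\) depends not only on changes in the stock and bond values but also on the dividend increment. Thus, we require that
\[
dV_t = \phi_t\,dB_t + \theta_t\,dS_t + \delta\theta_tS_t\,dt.
\]
From (14.33),
\[
dV = \phi\,dB + \theta\left[\sigma S\,dW + (\mu - \delta)S\,dt\right] + \delta\theta S\,dt = \phi\,dB + \theta S\left(\sigma\,dW + \mu\,dt\right). \tag{14.34}
\]
Now set
\[
\tilde\theta_t = e^{-\delta t}\theta_t\quad\text{and}\quad \tilde S_t = e^{\delta t}S_t = S_0\exp\left(\sigma W_t + \left(\mu - \tfrac12\sigma^2\right)t\right).
\]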


Note that \(\tilde S_t\) is the price process of the stock without dividends (or with dividends reinvested) and \(V = \phi B + \tilde\theta\tilde S\) is the value process of a trading strategy \((\phi,\tilde\theta)\) based on the bond and the non-dividend-paying version of our stock. Since \(d\tilde S = \tilde S(\sigma\,dW + \mu\,dt)\), we see from (14.34) that \(dV = \phi\,dB + \tilde\theta\,d\tilde S\), that is, the trading strategy \((\phi,\tilde\theta)\) is self-financing. The results of Chapter 13 therefore apply to \(\tilde S\), and we see that the value of a claim \(H\) is the same as that for a stock without dividends, namely, \(e^{-r(T-t)}E^*(H \mid \mathcal F_t)\), where \(P^*\) is the risk-neutral measure. The difference here is that the discounted process \(e^{-rt}S_t\) is no longer a \(P^*\)-martingale, as seen from the representation of \(S\) as
\[
S_t = e^{-\delta t}\tilde S_t = S_0e^{\sigma W^*_t + (r - \delta - \sigma^2/2)t}. \tag{14.35}
\]

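Now suppose that \(H = f(S_T)\), where \(f(x)\) is continuous and \(E^*H^2 < \infty\). Set \(f_1(x) = f\left(e^{-\delta T}x\right)\), so that \(H = f_1(\tilde S_T)\). By Corollary 13.1.3, the value of the claim at time \(t\) is \(V_t = e^{-r(T-t)}G_1(t,\tilde S_t)\), where
\[
G_1(t,s) := \int_{-\infty}^\infty f_1\left(s\exp\left\{\sigma\sqrt{T-t}\,y + (r - \sigma^2/2)(T-t)\right\}\right)\varphi(y)\,dy.
\]
Define \(G(t,s) = G_1\left(t, e^{\delta t}s\right)\), so that \(G_1(t,\tilde S_t) = G(t,S_t)\). Replacing \(s\) in the definition of \(G_1\) by \(e^{\delta t}s\), we obtain the following result: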

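Theorem 14.7.1. Let \(H = f(S_T)\), where \(f\) is continuous and \(EH^2 < \infty\). Then the value of \(H\) at time \(t\) is \(V_t = e^{-r(T-t)}G(t,S_t)\), where
\[
G(t,s) := \int_{-\infty}^\infty f\left(s\exp\left\{\sigma\sqrt{T-t}\,y + (r - \delta - \sigma^2/2)(T-t)\right\}\right)\varphi(y)\,dy.
\]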

Example 14.7.2. (Call option on a dividend-paying stock). Taking \(f(x) = (x - K)^+\) in Theorem 14.7.1 and using (11.13) with \(r\) replaced by \(r - \delta\), we have
\[
G(t,s) = se^{(r-\delta)(T-t)}\,\Phi\left(d_1(s,T-t)\right) - K\,\Phi\left(d_2(s,T-t)\right),
\]
where
\[
d_{1,2}(s,\tau) = \frac{\ln(s/K) + (r - \delta \pm \sigma^2/2)\tau}{\sigma\sqrt\tau}.
\]
The time-\(t\) value of a call option on a dividend-paying stock is therefore
\[
C_t = e^{-\delta(T-t)}S_t\,\Phi\left(d_1(S_t,T-t)\right) - e^{-r(T-t)}K\,\Phi\left(d_2(S_t,T-t)\right).
\]
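A minimal sketch of the formula for \(C_t\) (our own, hypothetical function name). With \(\delta = 0\) it reduces to the standard call; for the parameters of Example 14.5.3 that is the $7.64 quoted there. It also shows that \(C_t\) equals the standard call evaluated at the reduced spot \(e^{-\delta(T-t)}S_t\), which is what the formula says term by term.

```python
import math


def Phi(x: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def dividend_call(S: float, K: float, r: float, delta: float, sigma: float, tau: float) -> float:
    """C_t of Example 14.7.2 with tau = T - t and continuous dividend rate delta."""
    v = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r - delta + sigma ** 2 / 2) * tau) / v
    d2 = d1 - v
    return math.exp(-delta * tau) * S * Phi(d1) - math.exp(-r * tau) * K * Phi(d2)


# delta = 0 gives the standard call; larger dividend rates lower the call value.
for delta in (0.0, 0.02, 0.05):
    print(delta, dividend_call(50, 45, 0.10, delta, 0.20, 0.5))
```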

14.7.2 Discrete Dividend Stream

Now suppose that our stock pays a dividend only at the discrete times \(t_j\), where \(0 < t_1 < t_2 < \cdots < t_n < T\). Set \(t_0 = 0\) and \(t_{n+1} = T\). Between dividends the stock price is assumed to follow the SDE \(dS_t = \sigma S_t\,dW_t + \mu S_t\,dt\). At each dividend-payment time, the stock value is reduced by the amount of the dividend, which we again assume is a fraction \(\delta \in (0,1)\) of the value of the stock. The price process is no longer continuous but has jumps at the dividend-payment times \(t_j\). This may be modeled by the equations

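\[
\begin{aligned}
S_t &= S_{t_j}e^{\sigma(W_t - W_{t_j}) + (\mu - \sigma^2/2)(t - t_j)}, && t_j \le t < t_{j+1},\ j = 0,1,\ldots,n,\\
S_{t_{j+1}} &= S_{t_j}e^{\sigma(W_{t_{j+1}} - W_{t_j}) + (\mu - \sigma^2/2)(t_{j+1} - t_j)}(1 - \delta), && j = 0,1,\ldots,n-1.
\end{aligned}
\]
Setting \(\tilde S_t = S_0e^{\sigma W_t + (\mu - \sigma^2/2)t}\), we can rewrite these as
\[
\begin{aligned}
S_t &= S_{t_j}\frac{\tilde S_t}{\tilde S_{t_j}}, && t_j \le t < t_{j+1},\ j = 0,1,\ldots,n,\\
S_{t_{j+1}} &= S_{t_j}\frac{\tilde S_{t_{j+1}}}{\tilde S_{t_j}}(1 - \delta), && j = 0,1,\ldots,n-1.
\end{aligned}
\tag{14.36}
\]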

If \(j = n\), the first formula also holds for \(t = t_{n+1}\ (= T)\). Let \((\phi,\theta)\) be a trading strategy based on the stock and the bond. Between dividends, the value process is given by
\[
V_t = \phi_tB_t + \theta_tS_t = \phi_tB_t + \theta_tS_{t_j}\frac{\tilde S_t}{\tilde S_{t_j}},\qquad t_j \le t < t_{j+1}.
\]
At time \(t_{j+1}\) the stock portion of the portfolio decreases in value by the amount
\[
\delta\theta_{t_j}S_{t_j}\frac{\tilde S_{t_{j+1}}}{\tilde S_{t_j}},
\]
but the cash portion increases by the same amount because of the dividend. Since the net change is zero, \(V\) is a continuous process. Moreover, assuming that \((\phi,\theta)\) is self-financing between dividends, we have for \(t_j \le t < t_{j+1}\)
\[
dV_t = \phi_t\,dB_t + \theta_t\left[\sigma S_t\,dW_t + \mu S_t\,dt\right] = \left[r\phi_tB_t + \mu\theta_tS_t\right]dt + \sigma\theta_tS_t\,dW_t.
\]

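From (14.36), for \(1 \le m \le n\) and \(t_m \le t < t_{m+1}\), or \(m = n\) and \(t = T\), we have
\[
\frac{S_t}{S_0} = \frac{S_t}{S_{t_m}}\prod_{j=1}^m\frac{S_{t_j}}{S_{t_{j-1}}} = (1 - \delta)^m\frac{\tilde S_t}{\tilde S_{t_m}}\prod_{j=1}^m\frac{\tilde S_{t_j}}{\tilde S_{t_{j-1}}} = (1 - \delta)^m\frac{\tilde S_t}{S_0},
\]
that is,
\[
S_t = (1 - \delta)^m\tilde S_t,\quad t_m \le t < t_{m+1},\qquad\text{and}\qquad S_T = (1 - \delta)^n\tilde S_T.
\]
In particular, \(S_T\) is the value at time \(T\) of a non-dividend-paying stock with the geometric Brownian motion price process \(X = (1 - \delta)^n\tilde S\). Therefore, we can replicate a claim \(H = f(S_T) = f(X_T)\) with a self-financing strategy based on a stock with this price process and the original bond. By Corollary 13.1.3, the value of the claim at time \(t\) is
\[
V_t = e^{-r(T-t)}G(t,X_t) = e^{-r(T-t)}G\left(t,(1 - \delta)^{n-m}S_t\right),\qquad t_m \le t < t_{m+1},
\]
where \(G(t,s)\) is given by (13.3).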
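For a call, the last formula says: replace \(S_t\) by \((1 - \delta)^{n-m}S_t\), where \(m\) counts the dividends already paid, and use the no-dividend price. The sketch below is ours and rests on an assumption we state loudly: that for \(f(x) = (x - K)^+\) the quantity \(e^{-r(T-t)}G(t,s)\) from (13.3) is the usual Black-Scholes-Merton call price of \(s\) with time to maturity \(T - t\) (as in Example 14.7.2 with \(\delta = 0\)). Function names are hypothetical.

```python
import math


def Phi(x: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S: float, K: float, r: float, sigma: float, tau: float) -> float:
    """Black-Scholes-Merton call; ASSUMED to equal e^{-r tau} G(t, S) for f(x) = (x - K)^+."""
    v = sigma * math.sqrt(tau)
    d1 = (math.log(S / K) + (r + sigma ** 2 / 2) * tau) / v
    return S * Phi(d1) - K * math.exp(-r * tau) * Phi(d1 - v)


def discrete_dividend_call(S_t, t, K, r, sigma, T, div_times, delta):
    """V_t = e^{-r(T-t)} G(t, (1 - delta)^{n-m} S_t), where m = #{j : t_j <= t}."""
    n = len(div_times)
    m = sum(1 for tj in div_times if tj <= t)   # dividends already paid by time t
    return bs_call((1 - delta) ** (n - m) * S_t, K, r, sigma, T - t)


# Two hypothetical dividend dates, each paying 2% of the stock value
print(discrete_dividend_call(50, 0.0, 45, 0.1, 0.2, 0.5, [0.1, 0.3], 0.02))
print(discrete_dividend_call(50, 0.2, 45, 0.1, 0.2, 0.5, [0.1, 0.3], 0.02))
```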


14.8 American Claims in the BSM Model

Recall that the holder of an American claim has the right to exercise the claim at any time \(t \le T\). Many of the results concerning valuation of American claims in the binomial model carry over to the BSM setting. The verifications are more difficult, however, and require advanced techniques from martingale theory. In this section, we give a brief overview of the main ideas, without proofs. For details the reader is referred to [8, 12, 13].

Assume the payoff of a (path-independent) American claim at time \(t\) is of the form \(g(t,S_t)\). The holder will try to choose an exercise time that optimizes his payoff. The exercise time, being a function of the price of the underlying, is a random variable, the determination of which cannot rely on future information. As in the discrete case, such a random variable is called a stopping time, formally defined as a function \(\tau\) on \(\Omega\) with values in \([0,T]\) such that
\[
\{\tau \le t\} \in \mathcal F_t,\qquad 0 \le t \le T.
\]

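Since \(\{\tau > t - 1/n\} = \{\tau \le t - 1/n\}' \in \mathcal F_t\) for any positive integer \(n\), it follows that
\[
\{\tau = t\} = \bigcap_{n=1}^\infty\{t - 1/n < \tau \le t\} \in \mathcal F_t,\qquad 0 \le t \le T.
\]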

Now let \(\mathcal T_{t,T}\) denote the set of all stopping times with values in the interval \([t,T]\), \(0 \le t \le T\). If at time \(t\) the holder of the claim chooses the stopping time \(\tau \in \mathcal T_{t,T}\), her payoff will be \(g(\tau,S_\tau)\). The writer will therefore need a portfolio that covers this payoff. By risk-neutral pricing, the time-\(t\) value of the payoff is \(E^*\left(e^{-r(\tau-t)}g(\tau,S_\tau) \mid \mathcal F_t\right)\). Thus, if the writer is to cover the claim for any choice of stopping time, then the value of the portfolio at time \(t\) should be
\[
V_t = \max_{\tau\in\mathcal T_{t,T}}E^*\left(e^{-r(\tau-t)}g(\tau,S_\tau) \mid \mathcal F_t\right).{}^{4} \tag{14.37}
\]

One can show that a trading strategy \((\phi,\theta)\) exists with value process given by (14.37). With this trading strategy, the writer may hedge the claim, so it is natural to take the value of the claim at time \(t\) to be \(V_t\). In particular, the fair price of the claim is
\[
V_0 = \max_{\tau\in\mathcal T_{0,T}}E^*\left(e^{-r\tau}g(\tau,S_\tau)\right). \tag{14.38}
\]

It may be shown that the price \(V_0\) given by (14.38) guarantees that neither the holder nor the writer of the claim has an arbitrage opportunity. More precisely,

⁴Since the conditional expectations in this equation are defined only up to a set of probability zero, the maximum must be interpreted as the essential supremum, defined in standard texts on real analysis.


it is not possible for the holder to initiate a self-financing trading strategy \((\phi',\theta')\) with initial value
\[
V'_0 := \phi'_0 + \theta'_0S_0 + V_0 = 0
\]
such that the terminal value of the portfolio obtained by exercising the claim at some time \(\tau\) and investing the proceeds in risk-free bonds, namely,
\[
V'_T := e^{r(T-\tau)}\left[\phi'_\tau e^{r\tau} + \theta'_\tau S_\tau + g(\tau,S_\tau)\right],
\]
is nonnegative and has a positive probability of being positive;

for any writer-initiated self-financing trading strategy \((\phi',\theta')\) with initial value
\[
V'_0 := \phi'_0 + \theta'_0S_0 - V_0 = 0,
\]
there exists a stopping time \(\tau\) for which it is not the case that the terminal value of the portfolio resulting from the holder exercising the claim at time \(\tau\), namely,
\[
V'_T := e^{r(T-\tau)}\left(\phi'_\tau e^{r\tau} + \theta'_\tau S_\tau - g(\tau,S_\tau)\right),
\]
is nonnegative and has a positive probability of being positive.

Finally, it may be shown that after time \(t\) the optimal time for the holder to exercise the claim is
\[
\tau_t = \inf\{u \in [t,T] \mid g(u,S_u) = V_u\}.
\]
Explicit formulas are available in the special case of an American put, for which \(g(t,s) = (K - s)^+\).
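The text does not give a computational recipe for (14.38). One standard way to approximate it, swapped in here and not the continuous-time theory just described, is backward induction on a Cox-Ross-Rubinstein tree in the spirit of the binomial model: at each node, take the larger of the exercise value \(g(t,s)\) and the discounted risk-neutral continuation value. A minimal sketch of our own for the American put (hypothetical names; \(S_0 = K = 40\), \(r = 0.06\), \(\sigma = 0.2\), \(T = 1\) is a commonly used test case).

```python
import math


def Phi(x: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_put(S0: float, K: float, r: float, sigma: float, T: float) -> float:
    """European put in the BSM model, a lower bound for the American put."""
    d1 = (math.log(S0 / K) + (r + sigma ** 2 / 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return K * math.exp(-r * T) * Phi(-d2) - S0 * Phi(-d1)


def crr_american_put(S0: float, K: float, r: float, sigma: float, T: float, N: int = 500) -> float:
    """Cox-Ross-Rubinstein approximation of (14.38) with g(t, s) = (K - s)^+."""
    dt = T / N
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)   # risk-neutral probability of an up-move
    disc = math.exp(-r * dt)
    # Payoffs at maturity; index i counts up-moves.
    values = [max(K - S0 * u ** i * d ** (N - i), 0.0) for i in range(N + 1)]
    # Backward induction: at each node keep the larger of exercise and continuation.
    for step in range(N - 1, -1, -1):
        for i in range(step + 1):
            continuation = disc * (p * values[i + 1] + (1 - p) * values[i])
            exercise = K - S0 * u ** i * d ** (step - i)
            values[i] = max(exercise, continuation)
    return values[0]


am = crr_american_put(40.0, 40.0, 0.06, 0.2, 1.0)
eu = bs_put(40.0, 40.0, 0.06, 0.2, 1.0)
print(am, eu, am - eu)  # the difference estimates the early-exercise premium
```

The nodes where exercise wins approximate the region \(\{g(u,S_u) = V_u\}\) that defines the optimal stopping time \(\tau_t\).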

Further Directions

Because of the limited scope of the text, we have described only a few of the many intricate options available in the market. Omissions include

call or put options based on stocks with jumps, thus incorporating into the model the realistic possibility of market shock;

stock index option, where the underlying is a weighted average of stocks with interrelated price processes;

exchange option, giving the holder the right to exchange one risky asset for another;

basket option, the payoff a weighted average of a group of underlying assets;

Bermuda option, similar to an American option but with a finite set of prescribed exercise dates;

Russian option, with payoff the discounted maximum value of the underlying up to exercise time;

rainbow option, the payoff usually based on the maximum or minimum value of a group of correlated assets.

The interested reader may find descriptions of these and other options, as well as expositions of related topics, in [1, 10, 12, 17].


14.9 Exercises

1. Let \(Q\) be the exchange rate process in dollars per euro, as given by (14.1). Show that
\[
Q_t = Q_0\exp\left[\sigma W^*_t + \left(r_d - r_e - \sigma^2/2\right)t\right],
\]
where \(W^*\) is defined in (14.3). Use this to derive the SDEs
\[
\text{(a)}\quad \frac{dQ}{Q} = \sigma\,dW^* + (r_d - r_e)\,dt\qquad\text{and}\qquad \text{(b)}\quad \frac{dQ^{-1}}{Q^{-1}} = -\sigma\,dW^* + (r_e - r_d + \sigma^2)\,dt.
\]
Remark. Since \(Q^{-1}\) is the exchange rate process in euros per dollar, the term \(\sigma^2\) in (b) is at first surprising, as it suggests an asymmetric relationship between the currencies. This phenomenon, known as Siegel's paradox, may be explained by observing that the probability measure \(P^*\) is risk neutral when the dollar bond is the numeraire. This is the appropriate measure when pricing in the domestic currency, and for that reason \(P^*\) is called the domestic risk-neutral probability measure. Both (a) and (b) are derived in this context. When calculating prices in a foreign currency, the foreign risk-neutral probability measure must be used. This is the probability measure under which \(W^*_t - \sigma t\) is a Brownian motion, and it is risk neutral when the euro bond is taken as the numeraire. With respect to this measure, the Ito-Doeblin formula gives the expected form of \(dQ^{-1}\).

2. Let \(V^{cp}_0\) denote the price of a call-on-put option with strike price \(K_0\) and maturity \(T_0\), where the underlying put has strike price \(K\) and matures at time \(T > T_0\). Show that
\[
V^{cp}_0 = e^{-rT_0}\int_{-\infty}^{y_1}\left[P_{T_0}\left(g(y)\right) - K_0\right]\varphi(y)\,dy,
\]
where
\[
y_1 := \sup\{y \mid P_{T_0}(g(y)) > K_0\},\qquad P_{T_0}(s) = Ke^{-r(T-T_0)}\,\Phi\left(-d_2(T - T_0, s)\right) - s\,\Phi\left(-d_1(T - T_0, s)\right),
\]
and \(g\) and \(d_{1,2}\) are defined as in Section 14.4.

3. Let \(V^{pc}_0\) and \(V^{cc}_0\) denote, respectively, the prices of a put-on-call and a call-on-call option with strike price \(K_0\) and maturity \(T_0\), where the underlying call has strike price \(K\) and matures at time \(T > T_0\). Prove the put-on-call, call-on-call parity relation
\[
V^{pc}_0 - V^{cc}_0 + e^{-rT_0}\int_{-\infty}^\infty C\left(g(y)\right)\varphi(y)\,dy = K_0e^{-rT_0},
\]
where \(g\) and \(C\) are defined as in Section 14.4.


4. Let \(C^{di}_t\) and \(C^{do}_t\) denote, respectively, the time-\(t\) values of a down-and-in call option and a down-and-out call option, each with underlying \(S\), strike price \(K\), and barrier level \(c\). Show that \(C^{do}_0 + C^{di}_0 = C_0\), where \(C_0\) is the price of the corresponding standard call option.

5. (Down-and-out forward). Show that the price \(V_0\) of a derivative with payoff \((S_T - K)I_A\), where \(A = \{m^S \ge c\}\), is given by (14.9) and (14.10) with \(M = c\).

6. Let \(C^{do}_t\) and \(P^{do}_t\) denote, respectively, the time-\(t\) values of a down-and-out call option and a down-and-out put option, each with underlying \(S\), strike price \(K\), and barrier level \(c\). Let \(A = \{m^S \ge c\}\). Show that \(C^{do}_0 - P^{do}_0 = V_0\), where \(V_0\) is as in Exercise 5.

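7. (Currency barrier option). Referring to Section 14.1 and Subsection 14.5.1, show that the cost of a down-and-out option to buy one euro for \(K\) dollars at time \(T\) is
\[
C^{do}_0 = e^{-(r_d + \beta^2/2)T}\left[S_0E\left(e^{\gamma W_T}I_B\right) - KE\left(e^{\beta W_T}I_B\right)\right],
\]
where
\[
S = QE,\qquad \beta = \frac{r}{\sigma} - \frac{\sigma}{2},\qquad r = r_d - r_e,\qquad\text{and}\qquad \gamma = \beta + \sigma.
\]
Conclude that
\[
C^{do}_0 = S_0e^{-r_eT}\left[\Phi(d_1) - \left(\frac{c}{S_0}\right)^{2r/\sigma^2 + 1}\Phi(\delta_1)\right] - Ke^{-r_dT}\left[\Phi(d_2) - \left(\frac{c}{S_0}\right)^{2r/\sigma^2 - 1}\Phi(\delta_2)\right],
\]
where \(d_{1,2}\) and \(\delta_{1,2}\) are defined as in (14.10) with \(r = r_d - r_e\).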

8. A fixed strike lookback put option has payoff \((K - m^S)^+\). Show that the value of the option at time \(t\) is
\[
V_t = e^{-r(T-t)}E^*\left[(K - m^S)^+ \mid \mathcal F^S_t\right] = e^{-r(T-t)}\left[K - G_t\left(\min\left(m^S_t, K\right), S_t\right)\right],
\]
where \(G_t(m,s)\) is given by (14.30).

9. Referring to Lemma 14.5.1, use the fact that \(U := -W\) is a \(P\)-Brownian motion to show that the joint density \(f_M(x,y)\) of \((W_T, M^W)\) under \(P\) is given by
\[
f_M(x,y) = -g_m(x,y)\,I_{\{(x,y) \mid y \ge 0,\ y \ge x\}}.
\]


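10. Referring to Subsection 14.5.1, carry out the following steps to find the price \(C^{ui}_0\) of an up-and-in call option for the case \(S_0 < K < c\):

(a) Show that \(C^{ui}_T = (S_T - K)I_B\), where
\[
B = \{W_T \ge a,\ M^W \ge b\},\qquad a := \sigma^{-1}\ln(K/S_0),\qquad b := \sigma^{-1}\ln(c/S_0),
\]
and \(b > a > 0\).

(b) Show that
\[
C^{ui}_0 = e^{-(r + \beta^2/2)T}\left[S_0E\left(e^{(\beta + \sigma)W_T}I_B\right) - KE\left(e^{\beta W_T}I_B\right)\right],\qquad\text{where}\quad \beta := \frac{r}{\sigma} - \frac{\sigma}{2}.
\]

(c) Use Exercise 9 and Lemma 14.5.2 to show that
\[
\begin{aligned}
E\left(e^{\lambda W_T}I_B\right) &= -\int_a^b\int_b^\infty e^{\lambda x}g_m(x,y)\,dy\,dx - \int_b^\infty\int_x^\infty e^{\lambda x}g_m(x,y)\,dy\,dx\\
&= e^{2b\lambda + \lambda^2T/2}\left[\Phi\left(\frac{2b - a + \lambda T}{\sqrt T}\right) - \Phi\left(\frac{b + \lambda T}{\sqrt T}\right)\right] + e^{\lambda^2T/2}\,\Phi\left(\frac{-b + \lambda T}{\sqrt T}\right).
\end{aligned}
\]

(d) Conclude from (b) and (c) that
\[
\begin{aligned}
C^{ui}_0 &= S_0\left(\frac{c}{S_0}\right)^{2r/\sigma^2 + 1}\left[\Phi(e_1) - \Phi(e_3)\right] + S_0\,\Phi(e_5) - Ke^{-rT}\,\Phi(e_6)\\
&\quad - Ke^{-rT}\left(\frac{c}{S_0}\right)^{2r/\sigma^2 - 1}\left[\Phi(e_2) - \Phi(e_4)\right],
\end{aligned}
\]
where
\[
e_{1,2} = \frac{\ln\left(c^2/(KS_0)\right) + (r \pm \sigma^2/2)T}{\sigma\sqrt T},\qquad e_{3,4} = \frac{\ln(c/S_0) + (r \pm \sigma^2/2)T}{\sigma\sqrt T},\qquad e_{5,6} = \frac{\ln(S_0/c) + (r \pm \sigma^2/2)T}{\sigma\sqrt T}.
\]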

11. Find the price of an up-and-in call option for the case \(S_0 < c < K\).

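12. In the notation of Section 14.6 and Subsection 14.5.1, a down-and-out quanto call option has payoff \(V_T = (S^e_T - K)I_B\), where \(S^e_T\) and \(K\) are denominated in dollars. Carry out the following steps to find the cost \(V_0\) of the option for the case \(K > c\):

(a) Replace \(S_t\) in (14.12) by
\[
S^e_t = S^e_0\exp\left[\sigma W^*(t) + \beta t\right],\qquad \beta := r_e + \sigma_2^2 - \tfrac12\sigma^2
\]
(see (14.32)).

(b) Find a formula analogous to (14.18).

(c) Conclude from (b) that
\[
V_0 = S^e_0e^{(s - r_d)T}\left[\Phi(d_1) - \left(\frac{c}{S_0}\right)^{\varrho + 1}\Phi(\delta_1)\right] - Ke^{-r_dT}\left[\Phi(d_2) - \left(\frac{c}{S_0}\right)^{\varrho - 1}\Phi(\delta_2)\right],
\]
where \(s = r_e + \sigma_2^2\), \(\varrho = 2s/\sigma^2\),
\[
d_{1,2} = \frac{\ln(S_0/K) + \left(s \pm \frac12\sigma^2\right)T}{\sigma\sqrt T},\qquad\text{and}\qquad \delta_{1,2} = \frac{\ln\left(c^2/(S_0K)\right) + \left(s \pm \frac12\sigma^2\right)T}{\sigma\sqrt T}.
\]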

13. Let $C^{do}_0$ be the price of a down-and-out barrier call option, as given by (14.9). Show that

$$\lim_{c\to S_0^-}C^{do}_0 = 0\quad\text{and}\quad\lim_{c\to0^+}C^{do}_0 = C_0,$$

where $C_0$ is the cost of a standard call option. Interpret.

14. In the notation of Theorems 14.5.6 and 14.5.9, show that

$$V_0 = \lim_{n\to\infty}V^{(n)}_0.$$

15. Referring to Subsection 14.7.2, show that if dividend payments are made at the equally spaced times

$$t_j = jT/(n+1),\quad j = 1, 2, \dots, n,$$

then

$$S_t = (1-\delta)^{\lfloor (n+1)t/T\rfloor}\,\widetilde S_t\quad\text{and}\quad V_t = e^{-r(T-t)}G\left((1-\delta)^{n-\lfloor (n+1)t/T\rfloor}S(t)\right),\quad 0 \le t < T,$$

where $\lfloor x\rfloor$ denotes the greatest integer in x.

16. Referring to Subsection 14.7.1, show that $\left(e^{(\delta-r)t}S_t\right)$ is a $P^*$-martingale.


17. (Barrier option on a stock with dividends). Find a formula for the price of a down-and-out call option based on a stock that pays a continuous stream of dividends.

18. Referring to Section 14.3, find the probability under the risk-neutral measure $P^*$ that the call is chosen at time $T_0$.

19. Referring to Subsection 14.5.1, find the probability under P that the barrier c is breached.

20. A shout option is a European option that allows the holder to shout to the writer, at some time τ before maturity, her wish to lock in the current price $S_\tau$ of the security. For a call option, the holder's payoff at maturity, assuming that a shout is made, is

$$V_T := \max(S_\tau - K,\ S_T - K),$$

where K is the strike price. (If no shout is made, then the payoff is the usual amount $(S_T - K)^+$.) Show that

$$V_T = S_\tau - K + (S_T - S_\tau)^+$$

and use this to find the value of the shout option at time $t \ge \tau$.

21. A cliquet (or ratchet) option is a derivative with payoff $(S_{T_0} - K)^+$ at time $T_0$ and payoff $(S_{T_j} - S_{T_{j-1}})^+$ at time $T_j$, $j = 1, \dots, n$, where $0 < T_0 < T_1 < \cdots < T_n$. Thus, the strike price of the option is initially set at K, but at times $T_j$, $0 \le j \le n-1$, it is reset to $S_{T_j}$. Find the cost of the option.

22. Use the methods of Subsection 14.5.1 to find the price $V_0$ of a derivative with payoff $V_T = (S_T - m^S)I_A$, where $A = \{m^S \ge c\}$, $c < S_0$.
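The closed form in Exercise 10(d) can be sanity-checked numerically with the minimal Python sketch below. It assumes the standard BSM setting of Subsection 14.5.1 (constant r and σ, no dividends) and computes Φ from `math.erf`. The function names `bsm_call` and `up_and_in_call` are illustrative and do not come from the text. The sketch checks two properties. At c = K, every path that finishes above K has already crossed the barrier, so the price should equal the vanilla call. For c > K, the price should lie strictly between 0 and the vanilla price.

```python
from math import erf, exp, log, sqrt


def Phi(x: float) -> float:
    """Standard normal cdf expressed through math.erf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bsm_call(S0: float, K: float, r: float, sigma: float, T: float) -> float:
    """Standard Black-Scholes-Merton call price on a non-dividend-paying stock."""
    d1 = (log(S0 / K) + (r + sigma**2 / 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * Phi(d1) - K * exp(-r * T) * Phi(d2)


def up_and_in_call(S0: float, K: float, c: float, r: float, sigma: float, T: float) -> float:
    """Closed form of Exercise 10(d); intended for S0 < K <= c."""
    vol = sigma * sqrt(T)

    def e(log_term: float, sign: int) -> float:
        # (log_term + (r +/- sigma^2/2) T) / (sigma sqrt(T))
        return (log_term + (r + sign * sigma**2 / 2) * T) / vol

    e1, e2 = e(log(c**2 / (K * S0)), +1), e(log(c**2 / (K * S0)), -1)
    e3, e4 = e(log(c / S0), +1), e(log(c / S0), -1)
    e5, e6 = e(log(S0 / c), +1), e(log(S0 / c), -1)
    p = 2 * r / sigma**2
    disc_K = K * exp(-r * T)
    return (S0 * (c / S0) ** (p + 1) * (Phi(e1) - Phi(e3))
            + S0 * Phi(e5) - disc_K * Phi(e6)
            - disc_K * (c / S0) ** (p - 1) * (Phi(e2) - Phi(e4)))


vanilla = bsm_call(100, 110, 0.05, 0.2, 1.0)
print(up_and_in_call(100, 110, 110, 0.05, 0.2, 1.0), vanilla)  # barrier at the strike
print(up_and_in_call(100, 110, 130, 0.05, 0.2, 1.0))           # barrier above the strike
```

The same code can be reused to explore Exercise 13-style limits by varying c.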


Appendix A

Sets and Counting

Basic Set Theory

A set is a collection of objects called the members or elements of the set. Abstract sets are usually denoted by capital letters A, B, and so forth. If x is a member of the set A, we write x ∈ A; otherwise, we write x ∉ A. The empty set, denoted by ∅, is the set with no members.

A set can be described by words, by listing its elements, or by set-builder notation. Set-builder notation is of the form {x | P(x)}, which is read "the set of all x such that P(x)," where P(x) is a well-defined property that x must satisfy to belong to the set. For example, the set of all even positive integers can be described as {2, 4, 6, . . .} or as {n | n = 2m for some positive integer m}.

A set A is finite if either A is the empty set or, for some positive integer n, there is a one-to-one correspondence between A and the set {1, 2, . . . , n}. In effect, this means that the members of A may be labeled with the numbers 1, 2, . . . , n, so that A may be described as, say, $\{a_1, a_2, \dots, a_n\}$. A set is countably infinite if its members may be labeled with the positive integers 1, 2, 3, . . .; countable if it is either finite or countably infinite; and uncountable otherwise. The set ℕ of all positive integers is obviously countably infinite, as is the set ℤ of all integers. Less obvious is the fact that the set ℚ of all rational numbers is countably infinite. The set ℝ of all real numbers is uncountable, as is any (nontrivial) interval of real numbers.

Sets A and B are said to be equal, written A = B, if every member of A is a member of B and vice versa. A is a subset of B, written A ⊆ B, if every member of A is a member of B. It follows that A = B iff (read "if and only if") A ⊆ B and B ⊆ A. Note that the empty set is a subset of every set. Hereafter, we shall assume that all sets under consideration in any discussion are subsets of a larger set S, sometimes called the universe (of discourse) for that discussion.



The basic set operations are

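$$\begin{aligned}
A \cup B &= \{x \mid x \in A \text{ or } x \in B\}, &&\text{the union of } A \text{ and } B;\\
A \cap B &= \{x \mid x \in A \text{ and } x \in B\}, &&\text{the intersection of } A \text{ and } B;\\
A' &= \{x \mid x \in S \text{ and } x \notin A\}, &&\text{the complement of } A;\\
A - B &= \{x \mid x \in A \text{ and } x \notin B\}, &&\text{the difference of } A \text{ and } B;\\
A \times B &= \{(x, y) \mid x \in A \text{ and } y \in B\}, &&\text{the product of } A \text{ and } B.
\end{aligned}$$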

Similar definitions may be given for the union, intersection, and product of three or more sets, or even for infinitely many sets. For example,

$$\bigcup_{n=1}^\infty A_n = A_1 \cup A_2 \cup \cdots = \{x \mid x \in A_n \text{ for some } n\},$$

$$\bigcap_{n=1}^\infty A_n = A_1 \cap A_2 \cap \cdots = \{x \mid x \in A_n \text{ for all } n\},$$

$$\prod_{n=1}^\infty A_n = A_1 \times A_2 \times \cdots = \{(a_1, a_2, \dots) \mid a_n \in A_n,\ n = 1, 2, \dots\}.$$

We usually omit the cap symbol in the notation for intersection, writing, for example, ABC instead of A ∩ B ∩ C. Similarly, we write AB′ for A − B, etc.

A collection of sets is said to be pairwise disjoint if AB = ∅ for each pair of distinct members A and B of the collection. A partition of a set S is a collection of pairwise disjoint nonempty sets whose union is S.

Counting Techniques

The number of elements in a finite set A is denoted by |A|. In particular, |∅| = 0. The following result is easily established by mathematical induction.

Theorem A.1. If $A_1, A_2, \dots, A_r$ are pairwise disjoint finite sets, then

$$|A_1 \cup A_2 \cup \cdots \cup A_r| = |A_1| + |A_2| + \cdots + |A_r|.$$

Corollary A.2. If A and B are finite sets, then

$$|A \cup B| = |A| + |B| - |AB|.$$

Proof. Note that

$$A \cup B = AB' \cup A'B \cup AB,\quad A = AB' \cup AB,\quad\text{and}\quad B = A'B \cup AB,$$

where the sets comprising each of these unions are pairwise disjoint. By Theorem A.1,

$$|A \cup B| = |AB'| + |A'B| + |AB|,\quad |A| = |AB'| + |AB|,\quad\text{and}\quad |B| = |A'B| + |AB|.$$


Subtracting the second and third equations from the first and rearranging yields the desired formula.
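Corollary A.2 and the three-piece decomposition used in its proof can be tested on concrete finite sets. The sets below are arbitrary illustrations.

```python
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

# The three pairwise disjoint pieces used in the proof: AB', A'B, and AB
only_A, only_B, both = A - B, B - A, A & B
assert only_A.isdisjoint(only_B) and only_A.isdisjoint(both) and only_B.isdisjoint(both)
assert A | B == only_A | only_B | both

# |A u B| versus |A| + |B| - |AB|
print(len(A | B), len(A) + len(B) - len(A & B))
```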

Theorem A.3. If $A_1, A_2, \dots, A_n$ are finite sets, then

$$|A_1 \times A_2 \times \cdots \times A_n| = |A_1||A_2|\cdots|A_n|.$$

Proof. (By induction on n.) Consider the case n = 2. For each fixed $x_1 \in A_1$ there are $|A_2|$ elements of the form $(x_1, x_2)$, where $x_2$ runs through the set $A_2$. Since all the members of $A_1 \times A_2$ may be listed in this manner, it follows that $|A_1 \times A_2| = |A_1||A_2|$. Now suppose that the assertion of the theorem holds for n = k − 1. Since $A_1 \times A_2 \times \cdots \times A_k = B \times A_k$, where $B = A_1 \times A_2 \times \cdots \times A_{k-1}$, the case n = 2 implies that $|B \times A_k| = |B||A_k|$. But by the induction hypothesis $|B| = |A_1||A_2|\cdots|A_{k-1}|$. Therefore, the assertion holds for n = k.

Theorem A.3 is the basis of the so-called Multiplication Principle, described as follows:

When performing a task requiring r steps, if there are $n_1$ ways to complete step 1, and if for each of these there are $n_2$ ways to complete step 2, . . ., and if for each of these there are $n_r$ ways to complete step r, then there are $n_1n_2\cdots n_r$ ways to perform the task.

Example A.4. How many three-letter sequences can be made from the letters of the word "formula" if no letter is used more than once?

Solution: We have the following three-step process:

Step 1: Select the first letter: 7 choices.

Step 2: Select the second letter: 6 choices, since the letter chosen in Step 1 is no longer available.

Step 3: Select the third letter: 5 choices.

By the multiplication principle, there are a total of 7 · 6 · 5 = 210 possible sequences.

The sequences in Example A.4 are called permutations, formally defined as follows.

Definition A.5. Let n and r be positive integers with 1 ≤ r ≤ n. A permutation of n items taken r at a time is an ordered list of r items chosen from the n items. The number of such lists is denoted by $(n)_r$.

An argument similar to that in Example A.4 shows that

$$(n)_r = n(n-1)(n-2)\cdots(n-r+1).$$


This may be written compactly as

$$(n)_r = \frac{n!}{(n-r)!},$$

where the symbol m!, read "m factorial," is defined as

$$m! = m(m-1)(m-2)\cdots 2\cdot 1.$$

By convention, 0! = 1, a choice that ensures consistency in combinatorial formulas. For example, the number of permutations of n items taken n at a time is, according to the above formula and convention, n!/0! = n!, which is what one obtains by directly applying the multiplication principle.

Example A.6. In how many ways can a group of 4 women and 4 men line up for a photograph if no two adjacent people are of the same gender?

Solution: There are two alternatives, corresponding to the gender of, say, the first person on the left. For each alternative there are 4! arrangements of the women and 4! arrangements of the men. Thus, there are a total of 2(4!)² = 1152 arrangements.

Definition A.7. Let n and r be positive integers with 1 ≤ r ≤ n. A combination of n items taken r at a time is a set of r items chosen from the n items.

In contrast to a permutation, which is an ordered list, a combination may be viewed as an unordered list. Since each set of size r gives rise to r! permutations, the number of combinations of n things taken r at a time is seen to be

$$\frac{(n)_r}{r!} = \frac{n!}{(n-r)!\,r!}.$$

The quotient on the right is called a binomial coefficient and is denoted by $\binom{n}{r}$, read "n choose r." Its name derives from the binomial theorem, which asserts that

$$(a+b)^n = \sum_{j=0}^n\binom{n}{j}a^jb^{n-j}.$$

(This may be proved combinatorially as follows: The expression

$$(a+b)^n = \underbrace{(a+b)(a+b)\cdots(a+b)}_{n\text{ factors}}$$

is the sum of all products of the form $x_1x_2\cdots x_n$, where $x_i$ is the a or b term in the ith factor. Each of these products may be written $a^jb^{n-j}$ for some $0 \le j \le n$. For each j there are exactly $\binom{n}{j}$ such terms, corresponding to the number of ways exactly j of the n factors in $x_1x_2\cdots x_n$ may be chosen as a.)


Example A.8. A restaurateur needs to hire a sauté chef, a fish chef, a vegetable chef, and three grill chefs. If there are 10 applicants equally qualified for the positions, in how many ways can the positions be filled?

Solution: We apply the multiplication principle: First, select a sauté chef: 10 choices; second, select a fish chef: 9 choices; third, select a vegetable chef: 8 choices; finally, select three grill chefs from the remaining 7 applicants: $\binom{7}{3} = 35$ choices. Thus, there are a total of 10 · 9 · 8 · 35 = 25,200 choices.

Example A.9. A bag contains 5 red, 4 yellow, and 3 green marbles. In how many ways is it possible to draw 5 marbles at random from the bag with exactly 2 reds and no more than 1 green?

Solution: We have the following decision scheme:

Case 1: No green marbles.

Step 1: Choose the 2 reds: $\binom{5}{2} = 10$ possibilities.

Step 2: Choose 3 yellows: $\binom{4}{3} = 4$ possibilities.

Case 2: Exactly one green marble.

Step 1: Choose the green: 3 possibilities.

Step 2: Choose the 2 reds: $\binom{5}{2} = 10$ possibilities.

Step 3: Choose 2 yellows: $\binom{4}{2} = 6$ possibilities.

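By the multiplication principle, Case 1 gives 10 · 4 = 40 possibilities and Case 2 gives 3 · 10 · 6 = 180. Thus, there are a total of 40 + 180 = 220 possibilities.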
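Example A.9 treats the marbles as distinguishable, so its answer can be verified by enumerating all $\binom{12}{5} = 792$ ways of drawing five marbles. The list of colors below is an illustrative encoding of the bag.

```python
from itertools import combinations

# 12 distinguishable marbles: indices 0-4 red, 5-8 yellow, 9-11 green
colors = ["R"] * 5 + ["Y"] * 4 + ["G"] * 3

count = 0
for draw in combinations(range(len(colors)), 5):
    drawn = [colors[i] for i in draw]
    if drawn.count("R") == 2 and drawn.count("G") <= 1:
        count += 1
print(count)
```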

Example A.10. How many different 12-letter arrangements of the letters of the word "arrangements" are there?

Solution: Notice that there are duplicate letters, so the answer 12! is incorrect. We proceed as follows:

Step 1: Select positions for the a's: $\binom{12}{2} = 66$ choices.

Step 2: Select positions for the r's: $\binom{10}{2} = 45$ choices.

Step 3: Select positions for the e's: $\binom{8}{2} = 28$ choices.

Step 4: Select positions for the n's: $\binom{6}{2} = 15$ choices.

Step 5: Fill the remaining spots with the letters g, m, t, s: 4! choices.

Thus there are 66 · 45 · 28 · 15 · 24 = 29,937,600 different arrangements.
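The step-by-step count in Example A.10 agrees with the multinomial formula n!/(k₁! k₂! ⋯), where the kᵢ are the letter multiplicities. That formula is a standard equivalent, not the method used in the text. The sketch computes both.

```python
from collections import Counter
from math import comb, factorial, prod


def arrangements(word: str) -> int:
    """Number of distinct rearrangements of `word`: n! / (k1! k2! ...),
    where the k_i are the letter multiplicities."""
    return factorial(len(word)) // prod(factorial(k) for k in Counter(word).values())


# The five steps of Example A.10: a's, r's, e's, n's, then g, m, t, s in 4! ways.
steps = comb(12, 2) * comb(10, 2) * comb(8, 2) * comb(6, 2) * factorial(4)
print(arrangements("arrangements"), steps)
```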

We conclude this section with the following application of the binomial theorem.

Theorem A.11. A set S with n members has $2^n$ subsets (including S and ∅).


Proof. Since $\binom{n}{r}$ gives the number of subsets of size r, the total number of subsets of S is

$$\binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n}.$$

By the binomial theorem, this quantity is $(1+1)^n = 2^n$.

Remark. One can also prove Theorem A.11 by induction on n: The conclusion is obvious if n = 0 or 1. Assume the theorem holds for all sets with n ≥ 1 members, and let S be a set with n + 1 members. Choose a fixed element s from S. This produces two collections of subsets of S: those that contain s and those that don't. The latter are precisely the subsets of S − {s}, and, by the induction hypothesis, there are $2^n$ of these. But the two collections have the same number of subsets, since one collection may be obtained from the other by either removing or adjoining s. Thus, there are a total of $2^n + 2^n = 2^{n+1}$ subsets of S, completing the induction step.

Appendix B

Solution of the BSM PDE

In this appendix, we solve the Black–Scholes–Merton PDE

$$v_t + rsv_s + \tfrac12\sigma^2s^2v_{ss} - rv = 0,\quad s > 0,\ 0 \le t \le T, \qquad\text{(B.1)}$$

with boundary conditions

$$v(T, s) = f(s),\ s \ge 0,\quad\text{and}\quad v(t, 0) = 0,\ 0 \le t \le T, \qquad\text{(B.2)}$$

where f is continuous and satisfies suitable growth conditions (see the footnote on page 217).

Reduction to a Diffusion Equation

As a first step, we simplify Equation (B.1) by making the substitutions

$$s = e^x,\quad t = T - \frac{2\tau}{\sigma^2},\quad\text{and}\quad v(t, s) = u(\tau, x). \qquad\text{(B.3)}$$

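By the chain rule,

$$\begin{aligned}
v_t(t, s) &= u_\tau(\tau, x)\tau_t = -\tfrac12\sigma^2u_\tau(\tau, x),\\
v_s(t, s) &= u_x(\tau, x)x_s = s^{-1}u_x(\tau, x),\\
v_{ss}(t, s) &= s^{-2}\left[u_{xx}(\tau, x) - u_x(\tau, x)\right].
\end{aligned}$$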

Substituting these expressions into (B.1) produces the equation

$$-\tfrac12\sigma^2u_\tau(\tau, x) + ru_x(\tau, x) + \tfrac12\sigma^2\left[u_{xx}(\tau, x) - u_x(\tau, x)\right] - ru(\tau, x) = 0.$$

Dividing by σ²/2 and setting k = 2r/σ², we obtain

$$u_\tau(\tau, x) = (k-1)u_x(\tau, x) + u_{xx}(\tau, x) - ku(\tau, x). \qquad\text{(B.4)}$$

In terms of u, the conditions in (B.2) become

$$u(0, x) = f(e^x)\quad\text{and}\quad\lim_{x\to-\infty}u(\tau, x) = 0,\quad 0 \le \tau \le T\sigma^2/2.$$

Equation (B.4) is an example of a diffusion equation.



Reduction to the Heat Equation

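The diffusion equation (B.4) may be reduced to a simpler form by the substitution

$$u(\tau, x) = e^{ax+b\tau}w(\tau, x) \qquad\text{(B.5)}$$

for suitable constants a and b. To determine a and b, we calculate the partial derivatives

$$\begin{aligned}
u_\tau(\tau, x) &= e^{ax+b\tau}\left[bw(\tau, x) + w_\tau(\tau, x)\right],\\
u_x(\tau, x) &= e^{ax+b\tau}\left[aw(\tau, x) + w_x(\tau, x)\right],\\
u_{xx}(\tau, x) &= e^{ax+b\tau}\left[aw_x(\tau, x) + w_{xx}(\tau, x)\right] + ae^{ax+b\tau}\left[aw(\tau, x) + w_x(\tau, x)\right]\\
&= e^{ax+b\tau}\left[a^2w(\tau, x) + 2aw_x(\tau, x) + w_{xx}(\tau, x)\right].
\end{aligned}$$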

Substituting these expressions into (B.4) and dividing by $e^{ax+b\tau}$ yields

$$bw(\tau, x) + w_\tau(\tau, x) = (k-1)\left[aw(\tau, x) + w_x(\tau, x)\right] + a^2w(\tau, x) + 2aw_x(\tau, x) + w_{xx}(\tau, x) - kw(\tau, x),$$

which simplifies to

$$w_\tau(\tau, x) = \left[a(k-1) + a^2 - k - b\right]w(\tau, x) + \left[2a + k - 1\right]w_x(\tau, x) + w_{xx}(\tau, x).$$

The terms involving w(τ, x) and $w_x(\tau, x)$ may be eliminated by choosing

$$a = \tfrac12(1-k)\quad\text{and}\quad b = a(k-1) + a^2 - k = -\tfrac14(k+1)^2.$$

With these values of a and b, we obtain the PDE

$$w_\tau(\tau, x) = w_{xx}(\tau, x),\quad \tau > 0. \qquad\text{(B.6)}$$

Since $w(0, x) = e^{-ax}u(0, x)$, the boundary condition $u(0, x) = f(e^x)$ becomes

$$w_0(x) := w(0, x) = e^{-ax}f(e^x). \qquad\text{(B.7)}$$

Equation (B.6) is the well-known heat equation of mathematical physics.
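The choice of a and b can be checked with exact rational arithmetic. For each sample value of k, the sketch confirms that these values remove both the w and $w_x$ terms. This is a spot check at the listed values, not a proof. Python's `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def coefficients(k: Fraction):
    """a and b as chosen in the text, plus the w and w_x coefficients they produce."""
    a = (1 - k) / 2
    b = a * (k - 1) + a**2 - k
    w_coeff = a * (k - 1) + a**2 - k - b  # coefficient of w(tau, x)
    wx_coeff = 2 * a + k - 1              # coefficient of w_x(tau, x)
    return a, b, w_coeff, wx_coeff


for k in [Fraction(1, 3), Fraction(5, 2), Fraction(7)]:
    a, b, w_coeff, wx_coeff = coefficients(k)
    print(k, a, b, b == -(k + 1) ** 2 / 4, w_coeff, wx_coeff)
```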

Solution of the Heat Equation

To solve the heat equation, we begin with the observation that the function

$$\kappa(\tau, x) = \frac{1}{2\sqrt{\pi\tau}}e^{-x^2/4\tau} = \frac{1}{\sqrt{2\tau}}\varphi\left(\frac{x}{\sqrt{2\tau}}\right)$$


is a solution of (B.6), as may be readily verified. The function κ is called the kernel of the heat equation. To construct a solution of (B.6) that satisfies (B.7), we form the convolution of κ with $w_0$:

$$w(\tau, x) = \int_{-\infty}^\infty w_0(y)\kappa(\tau, x-y)\,dy. \qquad\text{(B.8)}$$

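Differentiating inside the integral,¹ we obtain

$$\begin{aligned}
w_\tau(\tau, x) &= \int_{-\infty}^\infty w_0(y)\kappa_\tau(\tau, x-y)\,dy,\\
w_x(\tau, x) &= \int_{-\infty}^\infty w_0(y)\kappa_x(\tau, x-y)\,dy,\quad\text{and}\\
w_{xx}(\tau, x) &= \int_{-\infty}^\infty w_0(y)\kappa_{xx}(\tau, x-y)\,dy.
\end{aligned}$$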

Since $\kappa_\tau = \kappa_{xx}$, we see that w satisfies (B.6). Also,

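$$w(\tau, x) = \frac{1}{\sqrt{2\tau}}\int_{-\infty}^\infty w_0(y)\varphi\left(\frac{y-x}{\sqrt{2\tau}}\right)dy = \int_{-\infty}^\infty w_0\left(x + z\sqrt{2\tau}\right)\varphi(z)\,dz;$$

hence,

$$\lim_{\tau\to0^+}w(\tau, x) = \int_{-\infty}^\infty\lim_{\tau\to0^+}w_0\left(x + z\sqrt{2\tau}\right)\varphi(z)\,dz = \int_{-\infty}^\infty w_0(x)\varphi(z)\,dz = w_0(x).$$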

Therefore, w has a continuous extension to $\mathbb{R}^+ \times \mathbb{R}$ that satisfies the initial condition (B.7).
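The statement that κ solves the heat equation (B.6) can be spot-checked with central finite differences. The residual $\kappa_\tau - \kappa_{xx}$ should be close to zero at any point with τ > 0. The step size and the sample points are arbitrary choices.

```python
from math import exp, pi, sqrt


def kappa(tau: float, x: float) -> float:
    """Heat kernel exp(-x^2 / (4 tau)) / (2 sqrt(pi tau))."""
    return exp(-x * x / (4 * tau)) / (2 * sqrt(pi * tau))


def residual(tau: float, x: float, h: float = 1e-3) -> float:
    """Central-difference estimate of kappa_tau - kappa_xx at (tau, x)."""
    k_tau = (kappa(tau + h, x) - kappa(tau - h, x)) / (2 * h)
    k_xx = (kappa(tau, x + h) - 2 * kappa(tau, x) + kappa(tau, x - h)) / h**2
    return k_tau - k_xx


points = [(1.0, 0.0), (1.0, 0.7), (2.0, -1.5)]
for tau, x in points:
    print(tau, x, residual(tau, x))
```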

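From (B.7) we see that the solution w in (B.8) may now be written

$$w(\tau, x) = \frac{1}{2\sqrt{\tau\pi}}\int_{-\infty}^\infty f(e^y)\exp\left\{-ay - \frac12\left(\frac{y-x}{\sqrt{2\tau}}\right)^2\right\}dy.$$

Rewriting the exponent in the integrand as $-ax + a^2\tau - \frac{1}{4\tau}(y - x + 2a\tau)^2$ and making the substitution $z = e^y$, we obtain

$$w(\tau, x) = \frac{e^{-ax+a^2\tau}}{2\sqrt{\tau\pi}}\int_0^\infty f(z)\exp\left\{-\frac{1}{4\tau}\left(\ln z - x + 2a\tau\right)^2\right\}\frac{dz}{z}. \qquad\text{(B.9)}$$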

¹ That limit operations such as differentiation may be moved inside the integral is justified by a theorem of real analysis, which is applicable in the current setting provided that $w_0$ does not grow too rapidly. In the case of a call option, for example, one can show that $|w_0(x)| \le Me^{N|x|}$ for suitable positive constants M and N and for all x. This inequality is sufficient to ensure that the appropriate integrals converge, allowing the interchange of limit operation and integral.


Back to the BSM PDE

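The final step is to unravel the substitutions that led to the heat equation. From (B.3), (B.5), and (B.9),

$$v(t, s) = u(\tau, x) = e^{ax+b\tau}w(\tau, x) = \frac{e^{(a^2+b)\tau}}{2\sqrt{\tau\pi}}\int_0^\infty f(z)\exp\left\{-\frac{1}{4\tau}\left(\ln z - x + 2a\tau\right)^2\right\}\frac{dz}{z}.$$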

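Recalling that

$$k = \frac{2r}{\sigma^2},\quad a = \frac{1-k}{2} = \frac{\sigma^2 - 2r}{2\sigma^2},\quad b = -\frac{(k+1)^2}{4},\quad\text{and}\quad\tau = \frac{\sigma^2(T-t)}{2},$$

we have

$$(a^2+b)\tau = \frac{(k-1)^2 - (k+1)^2}{4}\,\tau = -k\tau = -r(T-t)$$

and

$$2a\tau = \frac{\sigma^2 - 2r}{\sigma^2}\cdot\frac{\sigma^2(T-t)}{2} = \frac{(\sigma^2 - 2r)(T-t)}{2} = -\left(r - \tfrac12\sigma^2\right)(T-t).$$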

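Since x = ln s, we obtain the following solution for the general Black–Scholes–Merton PDE:

$$v(t, s) = \frac{e^{-r\tau}}{\sigma\sqrt{2\pi\tau}}\int_0^\infty f(z)\exp\left\{-\frac12\left(\frac{\ln(z/s) - (r - \sigma^2/2)\tau}{\sigma\sqrt\tau}\right)^2\right\}\frac{dz}{z},$$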

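where now τ := T − t. Making the substitution

$$y = \frac{\ln(z/s) - (r - \sigma^2/2)\tau}{\sigma\sqrt\tau}$$

and noting that

$$z = s\exp\left\{y\sigma\sqrt\tau + (r - \sigma^2/2)\tau\right\}\quad\text{and}\quad\frac{dz}{z} = \sigma\sqrt\tau\,dy,$$

we arrive at

$$v(t, s) = e^{-r\tau}\int_{-\infty}^\infty f\left(s\exp\left\{\sigma\sqrt\tau\,y + (r - \sigma^2/2)\tau\right\}\right)\varphi(y)\,dy.$$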

It may be shown that the solution to the BSM PDE is unique within a class of functions that do not grow too rapidly. (See, for example, [18].)
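The final integral representation can be compared numerically with the familiar closed form of the call. The sketch applies a midpoint rule to the integral for the call payoff $f(z) = (z - K)^+$, truncated to an interval [−L, L]. The grid size, the truncation, and the parameter values are assumptions chosen for illustration.

```python
from math import erf, exp, log, pi, sqrt


def phi(y: float) -> float:
    return exp(-y * y / 2) / sqrt(2 * pi)


def Phi(x: float) -> float:
    return 0.5 * (1 + erf(x / sqrt(2)))


def v_integral(f, s, tau, r, sigma, n=20_000, L=10.0):
    """Midpoint rule for e^{-r tau} * integral over [-L, L] of
    f(s exp{sigma sqrt(tau) y + (r - sigma^2/2) tau}) phi(y) dy."""
    h = 2 * L / n
    total = 0.0
    for i in range(n):
        y = -L + (i + 0.5) * h
        z = s * exp(sigma * sqrt(tau) * y + (r - sigma**2 / 2) * tau)
        total += f(z) * phi(y)
    return exp(-r * tau) * total * h


def bsm_call(s, k, tau, r, sigma):
    d1 = (log(s / k) + (r + sigma**2 / 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return s * Phi(d1) - k * exp(-r * tau) * Phi(d2)


K = 100.0


def call_payoff(z: float) -> float:
    return max(z - K, 0.0)


numeric = v_integral(call_payoff, s=105.0, tau=0.75, r=0.04, sigma=0.3)
closed = bsm_call(105.0, K, tau=0.75, r=0.04, sigma=0.3)
print(numeric, closed)
```

Other continuous payoffs f can be passed to `v_integral` in the same way.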

Appendix C

Analytical Properties of the BSM Call Function

Recall that the Black–Scholes–Merton call function is defined as

$$C = C(\tau, s, k, \sigma, r) = s\Phi(d_1) - ke^{-r\tau}\Phi(d_2),\quad \tau, s, k, \sigma, r > 0,$$

where

$$d_{1,2} = d_{1,2}(\tau, s, k, \sigma, r) = \frac{\ln(s/k) + (r \pm \sigma^2/2)\tau}{\sigma\sqrt\tau}.$$

C(τ, s, k, σ, r) is the price of a call option with strike price k, maturity τ, and underlying stock price s. In this appendix, we prove Theorems 11.4.1 and 11.4.2, which summarize the main analytical properties of C.

Preliminary Lemmas

Lemma C.1.

$$\int_{-d_2}^\infty e^{\sigma\sqrt\tau z}\varphi(z)\,dz = e^{\sigma^2\tau/2}\Phi(d_1).$$

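Proof. The integral may be written

$$\begin{aligned}
\frac{1}{\sqrt{2\pi}}\int_{-d_2}^\infty e^{\sigma\sqrt\tau z - z^2/2}\,dz &= \frac{e^{\sigma^2\tau/2}}{\sqrt{2\pi}}\int_{-d_2}^\infty e^{-(z-\sigma\sqrt\tau)^2/2}\,dz\\
&= \frac{e^{\sigma^2\tau/2}}{\sqrt{2\pi}}\int_{-d_2-\sigma\sqrt\tau}^\infty e^{-x^2/2}\,dx\\
&= e^{\sigma^2\tau/2}\,\Phi\left(d_2 + \sigma\sqrt\tau\right),
\end{aligned}$$

where we have made the substitution $x = z - \sigma\sqrt\tau$ and used $\int_{-c}^\infty\varphi(x)\,dx = \Phi(c)$. Since $d_2 + \sigma\sqrt\tau = d_1$, the conclusion of the lemma follows. (Alternatively, one could use Exercise 11.12.)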

Lemma C.2.

$$\int_{-d_2}^\infty ze^{\sigma\sqrt\tau z}\varphi(z)\,dz = e^{\sigma^2\tau/2}\left[\sigma\sqrt\tau\,\Phi(d_1) + \varphi(d_1)\right].$$



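Proof. Arguing as in the proof of Lemma C.1, we have

$$\begin{aligned}
\int_{-d_2}^\infty ze^{\sigma\sqrt\tau z}\varphi(z)\,dz &= \frac{e^{\sigma^2\tau/2}}{\sqrt{2\pi}}\int_{-d_2}^\infty ze^{-(z-\sigma\sqrt\tau)^2/2}\,dz\\
&= \frac{e^{\sigma^2\tau/2}}{\sqrt{2\pi}}\int_{-d_1}^\infty\left(x + \sigma\sqrt\tau\right)e^{-x^2/2}\,dx\\
&= \frac{e^{\sigma^2\tau/2}}{\sqrt{2\pi}}\int_{-d_1}^\infty xe^{-x^2/2}\,dx + \frac{\sigma\sqrt\tau\,e^{\sigma^2\tau/2}}{\sqrt{2\pi}}\int_{-d_1}^\infty e^{-x^2/2}\,dx\\
&= \frac{e^{\sigma^2\tau/2}}{\sqrt{2\pi}}\int_{d_1^2/2}^\infty e^{-y}\,dy + \sigma\sqrt\tau\,e^{\sigma^2\tau/2}\left[1 - \Phi(-d_1)\right]\\
&= e^{\sigma^2\tau/2}\varphi(d_1) + \sigma\sqrt\tau\,e^{\sigma^2\tau/2}\Phi(d_1).
\end{aligned}$$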

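Lemma C.3. For positive τ, s, k, σ, r, define

$$g(z) = g(\tau, s, k, \sigma, r, z) := se^{\sigma\sqrt\tau z + (r - \sigma^2/2)\tau} - k.$$

Then

$$C = e^{-r\tau}\int_{-d_2}^\infty g(z)\varphi(z)\,dz = e^{-r\tau}\int_{-\infty}^\infty g^+(z)\varphi(z)\,dz.$$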

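Proof. By Lemma C.1,

$$\int_{-d_2}^\infty g(z)\varphi(z)\,dz = se^{(r-\sigma^2/2)\tau}\int_{-d_2}^\infty e^{\sigma\sqrt\tau z}\varphi(z)\,dz - k\int_{-d_2}^\infty\varphi(z)\,dz = se^{r\tau}\Phi(d_1) - k\Phi(d_2) = e^{r\tau}C.$$

Since g is increasing in z and $g(-d_2) = 0$, the function $g^+$ equals 0 if $z \le -d_2$ and equals g otherwise; the assertion follows.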

Lemma C.4. With g as in Lemma C.3,

$$\frac{\partial}{\partial x}\int_{-d_2(\tau,s,k,\sigma,r)}^\infty g(\tau, s, k, \sigma, r, z)\varphi(z)\,dz = \int_{-d_2}^\infty g_x(z)\varphi(z)\,dz,$$

where x denotes any of the variables τ, s, k, σ, r.

Proof. Suppose that x = s. Fix the variables τ, k, σ, and r, and define

$$h(s) = d_2(\tau, s, k, \sigma, r)\quad\text{and}\quad F(s_1, s_2) = \int_{-h(s_1)}^\infty g(\tau, s_2, k, \sigma, r, z)\varphi(z)\,dz.$$

By the chain rule for functions of several variables, the left side of the equation in the assertion of the lemma for x = s is

$$\frac{d}{ds}F(s, s) = F_1(s, s) + F_2(s, s).$$


Since

$$F_1(s_1, s_2) = g\left(\tau, s_2, k, \sigma, r, -h(s_1)\right)\varphi\left(-h(s_1)\right)h'(s_1)$$

and $g\left(\tau, s, k, \sigma, r, -h(s)\right) = 0$, we see that $F_1(s, s) = 0$. Noting that

$$F_2(s_1, s_2) = \int_{-h(s_1)}^\infty g_s(\tau, s_2, k, \sigma, r, z)\varphi(z)\,dz,$$

we now have

$$\frac{d}{ds}F(s, s) = F_2(s, s) = \int_{-d_2(\tau,s,k,\sigma,r)}^\infty g_s(\tau, s, k, \sigma, r, z)\varphi(z)\,dz,$$

which is the assertion of the lemma for the case x = s. A similar argument works for the variables τ, k, σ, and r.


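(i) $\dfrac{\partial C}{\partial s} = \Phi(d_1)$: By Lemmas C.1, C.3, and C.4,

$$\frac{\partial C}{\partial s} = e^{-r\tau}\int_{-d_2}^\infty g_s(z)\varphi(z)\,dz = e^{-\tau\sigma^2/2}\int_{-d_2}^\infty e^{\sigma\sqrt\tau z}\varphi(z)\,dz = \Phi(d_1).$$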

(ii) $\dfrac{\partial^2C}{\partial s^2} = \dfrac{\varphi(d_1)}{s\sigma\sqrt\tau}$: This follows from the chain rule and part (i).

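(iii) $\dfrac{\partial C}{\partial\tau} = \dfrac{\sigma s}{2\sqrt\tau}\varphi(d_1) + kre^{-r\tau}\Phi(d_2)$: By Lemmas C.3 and C.4,

$$\frac{\partial C}{\partial\tau} = e^{-r\tau}\int_{-d_2}^\infty g_\tau(z)\varphi(z)\,dz - re^{-r\tau}\int_{-d_2}^\infty g(z)\varphi(z)\,dz = A - B,\ \text{say}.$$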

By Lemma C.3,

$$B = rC = rs\Phi(d_1) - rke^{-r\tau}\Phi(d_2),$$


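and by Lemmas C.1 and C.2,

$$\begin{aligned}
A &= se^{-\sigma^2\tau/2}\left[\frac{\sigma}{2\sqrt\tau}\int_{-d_2}^\infty ze^{\sigma\sqrt\tau z}\varphi(z)\,dz + \left(r - \frac{\sigma^2}{2}\right)\int_{-d_2}^\infty e^{\sigma\sqrt\tau z}\varphi(z)\,dz\right]\\
&= se^{-\sigma^2\tau/2}\left[\frac{\sigma}{2\sqrt\tau}e^{\sigma^2\tau/2}\left(\sigma\sqrt\tau\,\Phi(d_1) + \varphi(d_1)\right) + \left(r - \frac{\sigma^2}{2}\right)e^{\sigma^2\tau/2}\Phi(d_1)\right]\\
&= \frac{s\sigma}{2\sqrt\tau}\varphi(d_1) + rs\Phi(d_1).
\end{aligned}$$

Subtracting B from A gives the asserted formula.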

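(iv) $\dfrac{\partial C}{\partial\sigma} = s\sqrt\tau\,\varphi(d_1)$: By Lemmas C.1–C.4,

$$\begin{aligned}
\frac{\partial C}{\partial\sigma} &= e^{-r\tau}\int_{-d_2}^\infty g_\sigma(z)\varphi(z)\,dz\\
&= se^{-\sigma^2\tau/2}\int_{-d_2}^\infty\left(z\sqrt\tau - \sigma\tau\right)e^{\sigma\sqrt\tau z}\varphi(z)\,dz\\
&= se^{-\sigma^2\tau/2}\left[\sqrt\tau\,e^{\sigma^2\tau/2}\left(\sigma\sqrt\tau\,\Phi(d_1) + \varphi(d_1)\right) - \sigma\tau e^{\sigma^2\tau/2}\Phi(d_1)\right]\\
&= s\sqrt\tau\,\varphi(d_1).
\end{aligned}$$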

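(v) $\dfrac{\partial C}{\partial r} = k\tau e^{-r\tau}\Phi(d_2)$: By Lemmas C.1, C.3, and C.4,

$$\begin{aligned}
\frac{\partial C}{\partial r} &= e^{-r\tau}\int_{-d_2}^\infty g_r(z)\varphi(z)\,dz - \tau e^{-r\tau}\int_{-d_2}^\infty g(z)\varphi(z)\,dz\\
&= e^{-r\tau}\int_{-d_2}^\infty g_r(z)\varphi(z)\,dz - \tau C\\
&= \tau se^{-\sigma^2\tau/2}\int_{-d_2}^\infty e^{\sigma\sqrt\tau z}\varphi(z)\,dz - \tau s\Phi(d_1) + k\tau e^{-r\tau}\Phi(d_2)\\
&= k\tau e^{-r\tau}\Phi(d_2).
\end{aligned}$$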

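(vi) $\dfrac{\partial C}{\partial k} = -e^{-r\tau}\Phi(d_2)$: By Lemmas C.3 and C.4,

$$\frac{\partial C}{\partial k} = e^{-r\tau}\int_{-d_2}^\infty g_k(z)\varphi(z)\,dz = -e^{-r\tau}\int_{-d_2}^\infty\varphi(z)\,dz = -e^{-r\tau}\Phi(d_2).$$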


The proofs of the limit formulas make use of

$$\lim_{z\to\infty}\Phi(z) = 1,\qquad\lim_{z\to-\infty}\Phi(z) = 0,$$

and the limit properties of $d_{1,2}$.


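(i) $\lim_{s\to+\infty}\left[C(\tau, s, k, \sigma, r) - (s - ke^{-r\tau})\right] = 0$: Note first that

$$C(\tau, s, k, \sigma, r) - s + ke^{-r\tau} = s\left(\Phi(d_1) - 1\right) - ke^{-r\tau}\left(\Phi(d_2) - 1\right).$$

Since $\lim_{s\to\infty}d_1 = \lim_{s\to\infty}d_2 = \infty$, we have $\lim_{s\to\infty}\Phi(d_{1,2}) = 1$. It remains to show that $\lim_{s\to\infty}s\left(\Phi(d_1) - 1\right) = 0$ or, by l'Hospital's rule applied to $\left(\Phi(d_1) - 1\right)/s^{-1}$,

$$\lim_{s\to+\infty}s^2\frac{\partial}{\partial s}\Phi(d_1) = 0. \qquad(\dagger)$$

Since $\partial d_1/\partial s = (s\sigma\sqrt\tau)^{-1}$,

$$s^2\frac{\partial}{\partial s}\Phi(d_1) = s^2\varphi(d_1)\frac{\partial d_1}{\partial s} = \frac{se^{-d_1^2/2}}{\sigma\sqrt{2\pi\tau}}.$$

Now, $d_1$ is of the form $(\ln(s/k) + b)/a$ for suitable constants a and b; hence $s = ke^{\ln(s/k)} = ke^{ad_1 - b}$. It follows that

$$\lim_{s\to+\infty}se^{-d_1^2/2} = ke^{-b}\lim_{s\to+\infty}e^{ad_1 - d_1^2/2} = 0,$$

verifying (†) and completing the proof of (i).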

(ii) $\lim_{s\to0^+}C(\tau, s, k, \sigma, r) = 0$: Immediate from $\lim_{s\to0^+}d_{1,2} = -\infty$.

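(iii) $\lim_{\tau\to\infty}C(\tau, s, k, \sigma, r) = s$: Since $\lim_{\tau\to\infty}d_1 = +\infty$, the first term $s\Phi(d_1)$ tends to s. The second term satisfies $0 \le ke^{-r\tau}\Phi(d_2) \le ke^{-r\tau}$, so it tends to 0. (The limit of $d_2$ itself depends on the sign of $r - \sigma^2/2$.)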

(iv) $\lim_{\tau\to0^+}C(\tau, s, k, \sigma, r) = (s-k)^+$: Since $\lim_{\tau\to0^+}g^+ = (s-k)^+$, Lemma C.3 implies that

$$\lim_{\tau\to0^+}C(\tau, s, k, \sigma, r) = \int_{-\infty}^\infty(s-k)^+\varphi(z)\,dz = (s-k)^+.$$

(v) $\lim_{k\to\infty}C(\tau, s, k, \sigma, r) = 0$: This follows from Lemma C.3 and from $\lim_{k\to\infty}g^+ = 0$.

(vi) $\lim_{k\to0^+}C(\tau, s, k, \sigma, r) = s$: Follows from $\lim_{k\to0^+}d_1 = +\infty$.

(vii) $\lim_{\sigma\to\infty}C(\tau, s, k, \sigma, r) = s$: Immediate from $\lim_{\sigma\to\infty}d_{1,2} = \pm\infty$.

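(viii) $\lim_{\sigma\to0^+}C(\tau, s, k, \sigma, r) = (s - e^{-r\tau}k)^+$: Since $\lim_{\sigma\to0^+}g^+ = (se^{r\tau} - k)^+$, by Lemma C.3 we have

$$\lim_{\sigma\to0^+}C(\tau, s, k, \sigma, r) = e^{-r\tau}\int_{-\infty}^\infty(se^{r\tau} - k)^+\varphi(z)\,dz = (s - e^{-r\tau}k)^+.$$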

(ix) $\lim_{r\to\infty}C(\tau, s, k, \sigma, r) = s$: Follows immediately from $\lim_{r\to\infty}d_1 = +\infty$ and $\lim_{r\to\infty}e^{-r\tau} = 0$.


Appendix D

Hints and Solutions to Odd-Numbered Problems

Chapter 1

1. Rounding to two decimal places,

(a) $1500(1 + .06)^3 = \$1786.52$;

(b) $1500(1 + .06/4)^{12} = \$1793.43$;

(c) $1500(1 + .06/12)^{36} = \$1795.02$;

(d) $1500(1 + .06/365)^{3\cdot365} = \$1795.80$;

(e) $1500e^{3(.06)} = \$1795.83$.

3. (a) 12.55%; (b) 12.68%; (c) 12.75%.

5. $A_5 = \$29{,}391$; $A_{10} = \$73{,}178$.

7. n = 64 is the smallest value satisfying

$$400\,\frac{(1.005)^n - 1}{.005} \ge 30{,}000.$$

9. $A_5 = \$130{,}229.97$ and $A_{10} = \$36{,}120.65$. The account will be drawn down to zero after 139 withdrawals. (The last withdrawal will be $\$1{,}941.85$.)

11. n = 39.

13. The time-n value of the withdrawal made at time n + j is $Pe^{-rj/12}$, where j = 1, 2, . . . , N − n. Add these to obtain the desired result.

15. The rate i = r/12 must satisfy

$$1800 = \frac{300{,}000\,i}{1 - (1+i)^{-360}}.$$

This gives r ≈ .06.

17. $A_0 = \$42{,}035$.



19. Paying $\$6000$ now and investing $\$2000$ for 10 years gives $\$2000e^{10r}$ with which to pay off the remaining $\$6000$. The rate $r_0$ that would allow you to cover the $\$6000$ exactly satisfies the equation $2000e^{10r_0} = 6000$, which has solution $r_0 = (\ln 3)/10 \approx 0.11$.

21. Your current monthly payments for the 6% mortgage are $P \approx \$600$. After 10 years you still owe $A_{120} \approx \$83{,}686$. You must finance the amount $(1.03)A_{120}$ for 20 years. Payments for the new 4% mortgage are therefore $Q \approx \$522$. The monthly rate for which $Q = \$600$ is approximately .0047. Therefore, an annual mortgage rate above 12(.47%) = 5.64% would make refinancing unwise.

23. $B_t = \sum_{n=m+1}^N e^{-r(t_n - t)}C_n + Fe^{-r(T-t)}$.

25. The rate of return for Plan A is 22.68% while that of Plan B is 22.82%. Therefore, Plan B is slightly better.
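The numerical answers to Exercises 1, 7, and 15 can be reproduced in a few lines of Python. The bisection bracket for the monthly rate in Exercise 15 is an assumption chosen to contain the root.

```python
from math import exp

P = 1500
answers = {  # Exercise 1, parts (a)-(e)
    "a": P * (1 + .06) ** 3,
    "b": P * (1 + .06 / 4) ** 12,
    "c": P * (1 + .06 / 12) ** 36,
    "d": P * (1 + .06 / 365) ** (3 * 365),
    "e": P * exp(3 * .06),
}
for part, value in answers.items():
    print(part, round(value, 2))

# Exercise 7: smallest n with 400((1.005)^n - 1)/.005 >= 30,000
n = 1
while 400 * ((1.005) ** n - 1) / .005 < 30_000:
    n += 1
print("Exercise 7:", n)


# Exercise 15: solve 1800 = 300000 i / (1 - (1 + i)^(-360)) for i by bisection
def payment(i: float) -> float:
    return 300_000 * i / (1 - (1 + i) ** -360)


lo, hi = 1e-6, 0.02  # payment(lo) < 1800 < payment(hi)
for _ in range(100):
    mid = (lo + hi) / 2
    if payment(mid) < 1800:
        lo = mid
    else:
        hi = mid
print("Exercise 15: r =", round(12 * lo, 4))
```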

Chapter 2

1. Use the inclusion-exclusion rule.

3. Let $A_j$ be the event that Jill wins in j races, j = 3, 4, 5. $A_3$ occurs in only one way, $A_4$ in 3 ways, and $A_5$ in 6 ways. Therefore, Jill wins with probability

$$P(A_3) + P(A_4) + P(A_5) = q^3 + 3pq^3 + 6p^2q^3 = q^3(1 + 3p + 6p^2).$$

5. The probability $p_n$ that at least two out of n balls land in the same jar is

$$1 - \frac{30\cdot29\cdots(30-n+1)}{30^n}.$$

Since $p_7 \approx .53$ and $p_8 \approx .64$, at least 8 throws are needed.

7. Let C be the event that both tosses are heads, A the event that at least one toss comes up heads, and B the event that the first toss comes up heads. Then

$$P(C|A) = \frac{p}{2-p}\quad\text{and}\quad P(C|B) = p.$$

The probabilities are not the same, since p/(2 − p) = p implies p = 0 or 1.

9. Let A be the event that the slip numbered 1 was drawn twice and B the event that the sum of the numbers on the three slips drawn is 8. Then $P(AB) = 3/6^3$ and $P(B) = 21/6^3$, so $P(A|B) = 3/21$.


11. P(A) = .5, $P(B) = (.1)^2$, and $P(AB) = (.5)(.1)^2$. Therefore, the events are independent. Changing the inequality to x < .49 makes the events dependent.

13. For (c),

$$P(A'B') = 1 - P(A \cup B) = 1 - \left[P(A) + P(B) - P(AB)\right] = \left[1 - P(A)\right]\left[1 - P(B)\right] = P(A')P(B').$$

15. (a) Let x = r/s. Then P(E) = xP(E′) = x(1 − P(E)), hence

$$P(E) = \frac{x}{x+1} = \frac{r}{r+s}.$$

(b) If E occurs, then the bettor receives

$$1 + \frac{s}{r} = \frac{r+s}{r} = \frac{1}{P(E)}.$$

Chapter 3

1. The number of heads in n tosses is a binomial random variable $Y_n$ with parameters (n, .5); hence the smallest n for which $P(Y_n \ge 2) \ge .99$ satisfies

$$P(Y_n = 0) + P(Y_n = 1) = 2^{-n}(1+n) \le .01.$$

Therefore, n = 11.

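3. $p_X(k) = \binom{n}{k}A_NB_N$, where

$$A_N = \left(\frac{Np}{N}\right)\left(\frac{Np-1}{N-1}\right)\cdots\left(\frac{Np-k+1}{N-k+1}\right) \to p^k$$

and

$$B_N = \left(\frac{Nq}{N-k}\right)\left(\frac{Nq-1}{N-k-1}\right)\cdots\left(\frac{Nq-n+k+1}{N-n+1}\right) \to q^{n-k}$$

as N → ∞.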

5. For a > 0,

$$F_Y(y) = P\left(X \le \frac{y-b}{a}\right) = F_X\left(\frac{y-b}{a}\right).$$

Differentiating yields the desired result in this case.

7. Since $[\Phi(x) + \Phi(-x)]' = 0$, $\Phi(x) + \Phi(-x) = 2\Phi(0) = 1$. If X ∼ N(0, 1),

$$P(-X \le x) = P(X \ge -x) = 1 - P(X \le -x) = 1 - \Phi(-x) = \Phi(x),$$

so −X ∼ N(0, 1).


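9. $F_Z(r) = 0$ for r ≤ 0. For r > 0,

$$F_Z(r) = \begin{cases} 1, & r \ge \sqrt2,\\[4pt] \dfrac{\pi r^2}{4}, & 0 \le r \le 1,\\[4pt] r^2\arcsin\left(\dfrac1r\right) - \dfrac{\pi r^2}{4} + \sqrt{r^2-1}, & 1 \le r \le \sqrt2. \end{cases}$$

Therefore, $f_Z = g_ZI_{[0,\sqrt2]}$, where

$$g_Z(r) = \begin{cases} \dfrac{\pi r}{2}, & 0 \le r \le 1,\\[4pt] 2r\arcsin\left(\dfrac1r\right) - \dfrac{\pi r}{2}, & 1 \le r \le \sqrt2. \end{cases}$$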

11. Let $\alpha = \Phi(\mu\sigma^{-1})$.

(a) There are $\binom{n}{k}$ choices of times for the k increases, and each of these has probability

$$P(Z_1 > 1, Z_2 > 1, \dots, Z_k > 1, Z_{k+1} < 1, \dots, Z_n < 1).$$

Therefore, the desired probability is

$$\binom{n}{k}\alpha^k(1-\alpha)^{n-k}.$$

(b) The k consecutive increases can start at times 1, 2, . . . , n − k + 1; hence the required probability is $(n-k+1)\alpha^k(1-\alpha)^{n-k}$.

(c) Assuming that k < n, the event in question is the union of the mutually exclusive events

$$\{Z_1 > 1, \dots, Z_k > 1, Z_{k+1} < 1\},\quad \{Z_{n-k} < 1, Z_{n-k+1} > 1, \dots, Z_n > 1\},\quad\text{and}\quad \{Z_j < 1, Z_{j+1} > 1, \dots, Z_{j+k} > 1, Z_{j+k+1} < 1\},$$

where j = 1, 2, . . . , n − k − 1. Therefore, the required probability is $2\alpha^k(1-\alpha) + (n-k-1)\alpha^k(1-\alpha)^2$.

13. By independence,

P(max(X,Y ) ≤ z) = P(X ≤ z, Y ≤ z) = P(X ≤ z)P(Y ≤ z)

and

P(min(X,Y ) > z) = P(X > z, Y > z) = P(X > z)P(Y > z).
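The search for the smallest n in Exercise 1 can be automated with exact binomial coefficients:

```python
from math import comb


def prob_at_most_one_head(n: int) -> float:
    """P(Y_n = 0) + P(Y_n = 1) for Y_n ~ binomial(n, 1/2)."""
    return (comb(n, 0) + comb(n, 1)) / 2**n


n = 1
while prob_at_most_one_head(n) > 0.01:
    n += 1
print(n, prob_at_most_one_head(n - 1), prob_at_most_one_head(n))
```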

Chapter 4

1. Suppose that $C_0 > S_0$. We then buy the security for $S_0$, write a call option, and place the profit $C_0 - S_0$ into a risk-free account yielding $e^{rT}(C_0 - S_0)$ at time T. If $S_T > K$, we must sell the security for K. If $S_T \le K$, we sell the security for $S_T$. In any case, the total proceeds from these transactions are $e^{rT}(C_0 - S_0) + \min\{S_T, K\}$, giving an arbitrage. Therefore, $C_0 \le S_0$. The other inequalities follow from the put-call parity formula.

3. If $S_0 + P^e_0 - C_0 > Ke^{-rT}$, we sell short one share of the security, sell a put option, and buy a call option. We deposit the resulting cash $S_0 + P^e_0 - C_0$ in a risk-free account. If $S_T < K$, the call option we bought is worthless and the put option we sold will be exercised, requiring us to buy the security for the amount K. If $S_T \ge K$, the put option we sold is worthless but we can exercise our call option and buy the security for K. Since each case requires a cash outlay of K, the transactions give us a positive profit of $(S_0 + P^e_0 - C_0)e^{rT} - K$, contradicting the no-arbitrage assumption.

5. P can be exercised at any time in the interval [0, T], while P′ can be exercised only at times in the subinterval [0, T′]. This gives P greater flexibility and hence greater value.

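[FIGURE D.1: Exercise 7. Panels (a) and (b): Strip Payoff and Strap Payoff, each plotted against $S_T$, with axis marks at K and 2K.]

[FIGURE D.2: Exercise 9. Strangle Payoff plotted against $S_T$, with axis marks at $K_1$ and $K_2$.]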

11. If $P_0 > P'_0$, buy the lower-priced option and sell the higher-priced one for a cash profit of $P_0 - P'_0$. The three possibilities at maturity, $S_T < K$, $K \le S_T \le K'$, and $K' < S_T$, result in the respective payoffs $K' - K$, $K' - S_T$, and 0. Therefore, the profit is at least $P_0 - P'_0 > 0$, giving an arbitrage. That $P_0 \le P'_0$ is to be expected, since a smaller strike price gives a smaller payoff.


13. By Exercises 10, 11, and put-call parity,

$$0 \le C_0 - C'_0 = P_0 - P'_0 + (K' - K)e^{-rT}$$

and

$$0 \le P'_0 - P_0 = C'_0 - C_0 + (K' - K)e^{-rT}.$$

15. Consider a portfolio that is long in a put with strike price $K_1$, short in a put with strike price $K_2 > K_1$, and long in a bond with face value $F := K_2 - K_1$. The payoff is

$$(K_1 - S_T)^+ - (K_2 - S_T)^+ + F = \begin{cases} 0 & \text{if } S_T \le K_1,\\ S_T - K_1 & \text{if } K_1 \le S_T \le K_2,\\ K_2 - K_1 & \text{if } S_T > K_2 \end{cases} \;=\; (S_T - K_1)^+ - (S_T - K_2)^+,$$

which is the payoff of a bull spread.

Chapter 5

1. The sample space consists of the permutations of 1, 2, and 3. $\mathcal{F}_1$ is generated by the sets {(1, 2, 3), (1, 3, 2)}, {(2, 1, 3), (2, 3, 1)}, and {(3, 1, 2), (3, 2, 1)}; $\mathcal{F}_2 = \mathcal{F}_3$ contains all subsets of Ω.

3. By (v),

$$\phi_{n+1} = V_0 - \sum_{j=0}^n S_j\Delta\theta_j\quad\text{and}\quad\phi_n = V_0 - \sum_{j=0}^{n-1}S_j\Delta\theta_j.$$

Subtracting yields $\Delta\phi_n = -S_n\Delta\theta_n$.

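5. If the portfolio is self-financing, then $\Delta V_{j-1} = \phi_j\Delta B_{j-1} + \theta_j\Delta S_{j-1}$; hence

$$G_n = \sum_{j=1}^n\Delta V_{j-1} = V_n - V_0.$$

Conversely, if $V_n = V_0 + G_n$ for all n, then

$$\Delta V_n = \Delta G_n = \sum_{j=1}^{n+1}\left(\phi_j\Delta B_{j-1} + \theta_j\Delta S_{j-1}\right) - \sum_{j=1}^n\left(\phi_j\Delta B_{j-1} + \theta_j\Delta S_{j-1}\right) = \phi_{n+1}\Delta B_n + \theta_{n+1}\Delta S_n;$$

hence the portfolio is self-financing.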


Chapter 6

1. Let $X_1$ be the number of red marbles drawn before the first white one and $X_2$ the number of reds between the first two whites. Then $Y = X_1 + X_2$, and $X_1 + 1$ and $X_2 + 1$ are geometric with parameter p = w/(r + w) (Example 3.5.8). Therefore, $EY = EX_1 + EX_2 = (2/p) - 2 = 2r/w$ (Example 6.1.5).

3. By Example 6.1.5, EX = 1/p. Also,

$$E[X(X-1)] = \sum_{n=2}^\infty n(n-1)q^{n-1}p = pq\sum_{n=2}^\infty\frac{d^2}{dq^2}q^n = pq\frac{d^2}{dq^2}\frac{q^2}{1-q} = \frac{2q}{p^2}.$$

Therefore,

$$VX = E[X(X-1)] + EX - E^2X = (2q + p - 1)/p^2 = q/p^2.$$

5. Let N = r + w. The number X of marbles drawn is either 2 or 3. If X = 2, then the marbles are either both red or both white; hence

$$P(X = 2) = \begin{cases} \dfrac{r^2 + w^2}{N^2} & \text{for (a)},\\[6pt] \dfrac{r(r-1) + w(w-1)}{N(N-1)} & \text{for (b)}. \end{cases}$$

The event {X = 3} consists of the outcomes RWR, RWW, WRR, and WRW; hence

$$P(X = 3) = \begin{cases} \dfrac{2r^2w + 2rw^2}{N^3} & \text{for (a)},\\[6pt] \dfrac{2r(r-1)w + 2w(w-1)r}{N(N-1)(N-2)} & \text{for (b)}. \end{cases}$$

Therefore, for case (a)

$$EX = 2\cdot\frac{r^2 + w^2}{N^2} + 6\cdot\frac{r^2w + rw^2}{N^3}$$

and for case (b)

$$EX = 2\cdot\frac{r(r-1) + w(w-1)}{N(N-1)} + 6\cdot\frac{r(r-1)w + w(w-1)r}{N(N-1)(N-2)}.$$

7. For any $A \in \mathcal{F}$,

$$VI_A = EI_A^2 - E^2I_A = P(A) - P^2(A) = P(A)P(A');$$

hence the desired result follows from independence and Theorem 6.4.2.


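9. Since $f_{X,Y}(x, y) = I_{[0,1]}(x)I_{[0,1]}(y)$,

$$\begin{aligned}
E\left(\frac{4XY}{X^2 + Y^2 + 1}\right) &= \int_0^1\int_0^1\frac{4xy}{x^2 + y^2 + 1}\,dy\,dx = \int_0^1 2x\left[\ln(x^2 + 2) - \ln(x^2 + 1)\right]dx\\
&= \int_2^3\ln u\,du - \int_1^2\ln u\,du = \ln(27/16).
\end{aligned}$$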

11. By linearity and independence,

$$E(X+Y)^2 = EX^2 + EY^2 + 2(EX)(EY) = EX^2 + EY^2$$

and

$$E(X+Y)^3 = EX^3 + EY^3 + 3(EX^2)(EY) + 3(EX)(EY^2) = EX^3 + EY^3.$$

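13. For (a), complete the square to obtain

$$\int_a^b e^{\alpha x}\varphi(x)\,dx = e^{\alpha^2/2}\int_a^b\varphi(x - \alpha)\,dx = e^{\alpha^2/2}\left[\Phi(b - \alpha) - \Phi(a - \alpha)\right].$$

For (b), integrate by parts and use (a) to obtain

$$\int_a^b e^{\alpha x}\Phi(x)\,dx = \frac1\alpha e^{\alpha x}\Phi(x)\Big|_a^b - \frac1\alpha\int_a^b e^{\alpha x}\varphi(x)\,dx = \frac1\alpha\left(e^{\alpha x}\Phi(x) - e^{\alpha^2/2}\Phi(x - \alpha)\right)\Big|_a^b.$$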

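15. $VX = EX^2 - E^2X$, where $EX = \frac12(\alpha + \beta)$ (Example 6.2.2). Since

$$EX^2 = (\beta - \alpha)^{-1}\int_\alpha^\beta x^2\,dx = \tfrac13(\alpha^2 + \alpha\beta + \beta^2),$$

$$VX = \tfrac13(\alpha^2 + \alpha\beta + \beta^2) - \tfrac14(\alpha + \beta)^2 = \tfrac1{12}(\alpha - \beta)^2.$$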

17. $E^2X = EX^2 - VX \le EX^2$, since $VX \ge 0$.

19. Referring to Example 3.2.5, the expectation is

$$EX = \binom{N}{z}^{-1}\sum_x \binom{m}{x}\binom{n}{z - x}x,$$

where $\max(z - n, 1) \le x \le \min(z, m)$. Show that this may be written as

$$m\binom{N}{z}^{-1}\sum_x \binom{m - 1}{x - 1}\binom{n}{z - x} = m\binom{N}{z}^{-1}\binom{N - 1}{z - 1} = \frac{mz}{N},$$

where $\max(z - 1 - n, 0) \le x - 1 \le \min(z - 1, m - 1)$.
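Here is a direct check of the identity, assuming the usual hypergeometric setting with $N = m + n$, which is what the step to $\binom{N-1}{z-1}$ uses. Exact rational arithmetic avoids rounding, and the parameter triples are arbitrary.

```python
from fractions import Fraction
from math import comb

def hypergeometric_mean(m, n, z):
    """E X for X = number of type-1 items in a sample of size z drawn without
    replacement from m type-1 and n type-2 items (N = m + n), summed directly."""
    N = m + n
    lo, hi = max(z - n, 0), min(z, m)
    total = sum(comb(m, x) * comb(n, z - x) * x for x in range(lo, hi + 1))
    return Fraction(total, comb(N, z))

for m, n, z in [(5, 7, 4), (3, 10, 6), (8, 2, 9)]:
    N = m + n
    print(hypergeometric_mean(m, n, z),
          Fraction(m * comb(N - 1, z - 1), comb(N, z)),   # m C(N-1, z-1) / C(N, z)
          Fraction(m * z, N))                             # m z / N
```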


21. (a) By Equation (6.3),

$$P(Y = 50) \approx \Phi(.1) - \Phi(-.1) \approx 0.07966.$$

The exact probability is

$$\binom{100}{50}2^{-100} = 0.07959.$$

(b) By Equation (6.2) with $p = .5$,

$$P(40 < Y < 60) = P\left(-2 < \frac{2Y - 100}{10} < 2\right) \approx \Phi(2) - \Phi(-2) \approx .95.$$
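The numbers in (a) and (b) can be reproduced with the standard library alone. To get the exact binomial value for comparison, this sketch reads $P(40 < Y < 60)$ as $P(41 \le Y \le 59)$. No continuity correction is applied, matching the approximation above.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# (a) P(Y = 50) for Y ~ B(100, 1/2): normal approximation versus exact value.
approx_a = Phi(0.1) - Phi(-0.1)
exact_a = math.comb(100, 50) / 2 ** 100

# (b) P(40 < Y < 60), i.e. 41 <= Y <= 59: Phi(2) - Phi(-2) versus the exact sum.
approx_b = Phi(2) - Phi(-2)
exact_b = sum(math.comb(100, k) for k in range(41, 60)) / 2 ** 100

print(f"(a) approx {approx_a:.5f}, exact {exact_a:.5f}")
print(f"(b) approx {approx_b:.4f}, exact {exact_b:.4f}")
```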

Chapter 7

1. If $A$ denotes the Cartesian product, then

$$P(A) = \sum_{\omega\in A} P_1(\omega_1)P_2(\omega_2)\cdots P_N(\omega_N) = \sum_{\omega_1\in A_1}\cdots\sum_{\omega_N\in A_N} P_1(\omega_1)\cdots P_N(\omega_N) = P_1(A_1)P_2(A_2)\cdots P_N(A_N).$$

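3. Part (a) follows from the definitions of $Z_n$ and $X_n$, and (b) is a restatement of (7.2). For (c), use $S_n(\omega) = \omega_nS_{n-1}(\omega)$.

Part (c) implies that $X_n$ is $\mathcal F^S_n$-measurable, hence, by definition of $\mathcal F^X_n$, $\mathcal F^X_n \subseteq \mathcal F^S_n$. On the other hand, (7.3) implies that $S_n$ is $\mathcal F^X_n$-measurable, hence $\mathcal F^S_n \subseteq \mathcal F^X_n$.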

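5. If $S_0d < K < S_0u$, then

$$C_0 = (1 + i)^{-1}(S_0u - K)p^* = \frac{(1 + i - d)(S_0u - K)}{(1 + i)(u - d)},$$

hence

$$\frac{\partial C_0}{\partial u} = \frac{(1 + i - d)(K - S_0d)}{(1 + i)(u - d)^2} > 0 \quad\text{and}\quad \frac{\partial C_0}{\partial d} = \frac{(1 + i - u)(S_0u - K)}{(1 + i)(u - d)^2} < 0.$$

If $d \ge K/S_0$, then

$$C_0 = \frac{(S_0u - K)p^* + (S_0d - K)q^*}{1 + i} = S_0 - \frac{K}{1 + i}.$$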
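Central differences can confirm the signs of the two partial derivatives. The parameter values are arbitrary, chosen so that $S_0d < K < S_0u$ in the first case and $d \ge K/S_0$ in the second, and the helper names are hypothetical.

```python
def call_one_period(S0, K, u, d, i):
    """One-period binomial call: C0 = [(S0 u - K)^+ p* + (S0 d - K)^+ q*] / (1 + i)."""
    p_star = (1 + i - d) / (u - d)
    q_star = 1 - p_star
    return (max(S0 * u - K, 0.0) * p_star + max(S0 * d - K, 0.0) * q_star) / (1 + i)

def central_diff(g, x, h=1e-6):
    return (g(x + h) - g(x - h)) / (2 * h)

S0, K, u, d, i = 100.0, 100.0, 1.2, 0.9, 0.05   # S0 d < K < S0 u
dC_du = central_diff(lambda uu: call_one_period(S0, K, uu, d, i), u)
dC_dd = central_diff(lambda dd: call_one_period(S0, K, u, dd, i), d)
dC_du_formula = (1 + i - d) * (K - S0 * d) / ((1 + i) * (u - d) ** 2)
dC_dd_formula = (1 + i - u) * (S0 * u - K) / ((1 + i) * (u - d) ** 2)
print(dC_du, dC_du_formula)
print(dC_dd, dC_dd_formula)

# Case d >= K / S0: the call is always exercised and C0 = S0 - K / (1 + i).
print(call_one_period(100.0, 80.0, 1.2, 0.9, 0.05), 100.0 - 80.0 / 1.05)
```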


7. (a) call: $17.56; put: $19.98; (b) call: $19.70; put: $15.33.

9. Since $k \ge N/2$, there can be only one $k$-run. Let $A$ denote the event that there was a $k$-run of $u$'s, and $A_j$ the event that the run started at time $j = 1, 2, \ldots, N - k + 1$, assuming that $k < N - 1$. The events $A_j$ are mutually exclusive with union $A$, $P(A_1) = P(A_{N-k+1}) = p^kq$, and $P(A_j) = p^kq^2$, $j = 2, \ldots, N - k$. Therefore, $P(A) = p^kq\,(2 + (N - k - 1)q)$. The formula still holds if $k = N - 1$.
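The formula can be checked by enumerating all $2^N$ paths. The sketch assumes that a $k$-run means a maximal block of exactly $k$ consecutive up-moves, which is what the factors of $q$ on either side of the run encode. It uses exact fractions, and $p = 2/5$ is an arbitrary choice.

```python
from fractions import Fraction
from itertools import product

def has_run_of_exactly(path, k):
    """True if path contains a maximal block of exactly k consecutive 'u' steps."""
    run = 0
    for step in path + ("d",):   # the trailing 'd' closes a run that ends at time N
        if step == "u":
            run += 1
        else:
            if run == k:
                return True
            run = 0
    return False

def run_probability_brute(N, k, p):
    q = 1 - p
    total = Fraction(0)
    for path in product("ud", repeat=N):
        if has_run_of_exactly(path, k):
            ups = path.count("u")
            total += p ** ups * q ** (N - ups)
    return total

def run_probability_formula(N, k, p):
    q = 1 - p
    return p ** k * q * (2 + (N - k - 1) * q)

p = Fraction(2, 5)
for N in range(3, 11):
    for k in range((N + 1) // 2, N):   # N/2 <= k <= N - 1
        assert run_probability_brute(N, k, p) == run_probability_formula(N, k, p)
print("formula agrees with enumeration for N = 3, ..., 10")
```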

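11. By Corollary 7.2.5 with $f(x) = xI_{(K,\infty)}(x)$,

$$V_0 = (1 + i)^{-N}\sum_{j=m}^{N}\binom{N}{j}S_0u^jd^{N-j}p^{*j}q^{*N-j} = S_0\sum_{j=m}^{N}\binom{N}{j}\hat p^{\,j}\hat q^{\,N-j},$$

where $\hat p := up^*/(1 + i)$ and $\hat q := dq^*/(1 + i) = 1 - \hat p$.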
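Here is a numerical comparison of the two expressions for $V_0$, assuming $\hat p = up^*/(1+i)$ and taking $m$ as the smallest $j$ with $S_0u^jd^{N-j} > K$. The parameter sets are arbitrary.

```python
from math import comb

def asset_or_nothing_direct(S0, K, u, d, i, N):
    """V0 = (1+i)^(-N) E*[S_N 1{S_N > K}] from the risk-neutral binomial law."""
    p_star = (1 + i - d) / (u - d)
    q_star = 1 - p_star
    total = 0.0
    for j in range(N + 1):
        S_N = S0 * u ** j * d ** (N - j)
        if S_N > K:
            total += comb(N, j) * p_star ** j * q_star ** (N - j) * S_N
    return total / (1 + i) ** N

def asset_or_nothing_closed(S0, K, u, d, i, N):
    """V0 = S0 * sum_{j>=m} C(N, j) p_hat^j q_hat^(N-j) with p_hat = u p*/(1+i)."""
    p_star = (1 + i - d) / (u - d)
    p_hat = u * p_star / (1 + i)
    q_hat = 1 - p_hat   # equals d q*/(1+i)
    # m = smallest j with S0 u^j d^(N-j) > K (N + 1 if there is none)
    m = next((j for j in range(N + 1) if S0 * u ** j * d ** (N - j) > K), N + 1)
    return S0 * sum(comb(N, j) * p_hat ** j * q_hat ** (N - j) for j in range(m, N + 1))

args = (100.0, 105.0, 1.1, 0.95, 0.02, 10)
print(asset_or_nothing_direct(*args), asset_or_nothing_closed(*args))
```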

13. Since

$$(S_N - S_M)^+ = \left(S_0u^{Y_N}d^{N-Y_N} - S_0u^{Y_M}d^{M-Y_M}\right)^+ = S_0u^{Y_M}d^{M-Y_M}\left(u^{Y_N-Y_M}d^{L-(Y_N-Y_M)} - 1\right)^+,$$

Corollary 7.2.2 and independence imply that

$$(1 + i)^NV_0 = E^*(S_N - S_M)^+ = E^*\left(S_0u^{Y_M}d^{M-Y_M}\right)E^*\left(u^{Y_N-Y_M}d^{L-(Y_N-Y_M)} - 1\right)^+ = E^*(S_M)\,E^*\left(u^{Y_N-Y_M}d^{L-(Y_N-Y_M)} - 1\right)^+. \qquad (\alpha)$$

By Remark 7.2.3(b),

$$E^*(S_M) = (1 + i)^MS_0. \qquad (\beta)$$

Since $Y_N - Y_M = X_{M+1} + \cdots + X_N \sim B(p^*, L)$,

$$E^*\left(u^{Y_N-Y_M}d^{L-(Y_N-Y_M)} - 1\right)^+ = E^*\left(u^{Y_L}d^{L-Y_L} - 1\right)^+.$$

The last expression is $(1 + i)^L$ times the cost of a call option with maturity $L$, strike price one unit, and initial stock value one unit. Therefore, by the CRR formula,

$$E^*\left(u^{Y_N-Y_M}d^{L-(Y_N-Y_M)} - 1\right)^+ = (1 + i)^L\Psi(k, L, \hat p) - \Psi(k, L, p^*), \qquad (\gamma)$$

where $k$ is the smallest nonnegative integer for which $u^kd^{L-k} > 1$. The desired expression for $V_0$ now follows from $(\alpha)$, $(\beta)$, and $(\gamma)$.
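A brute-force pass over all $2^N$ paths can check the result. The sketch takes $L = N - M$, uses the binomial upper tail $\Psi(k, n, p) = \sum_{j=k}^{n}\binom{n}{j}p^j(1-p)^{n-j}$, and takes $\hat p = up^*/(1+i)$ as in the CRR formula. Parameter values are arbitrary.

```python
from itertools import product
from math import comb

def psi(k, n, p):
    """Binomial upper tail: sum_{j=k}^{n} C(n, j) p^j (1 - p)^(n - j) (0 if k > n)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

def forward_diff_call_brute(S0, u, d, i, M, N):
    """(1+i)^(-N) E*[(S_N - S_M)^+] by enumerating all up/down paths."""
    p_star = (1 + i - d) / (u - d)
    total = 0.0
    for ups in product((True, False), repeat=N):
        S_M = S0
        for up in ups[:M]:
            S_M *= u if up else d
        S_N = S_M
        for up in ups[M:]:
            S_N *= u if up else d
        n_up = sum(ups)
        total += p_star ** n_up * (1 - p_star) ** (N - n_up) * max(S_N - S_M, 0.0)
    return total / (1 + i) ** N

def forward_diff_call_formula(S0, u, d, i, M, N):
    """(1+i)^(-N) * E*(S_M) * [(1+i)^L Psi(k, L, p_hat) - Psi(k, L, p*)], L = N - M."""
    L = N - M
    p_star = (1 + i - d) / (u - d)
    p_hat = u * p_star / (1 + i)
    k = next((j for j in range(L + 1) if u ** j * d ** (L - j) > 1), L + 1)
    gamma = (1 + i) ** L * psi(k, L, p_hat) - psi(k, L, p_star)
    return (1 + i) ** (-N) * (1 + i) ** M * S0 * gamma

args = (100.0, 1.15, 0.9, 0.03, 3, 8)
print(forward_diff_call_brute(*args), forward_diff_call_formula(*args))
```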


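15. By Corollary 7.2.5 with $f(x) = x(x - K)^+$,

$$\begin{aligned}
V_0 &= (1 + i)^{-N}\sum_{j=m}^{N}\binom{N}{j}S_0u^jd^{N-j}\left(S_0u^jd^{N-j} - K\right)p^{*j}q^{*N-j}\\
&= \frac{S_0^2}{(1 + i)^N}\sum_{j=m}^{N}\binom{N}{j}(u^2p^*)^j(d^2q^*)^{N-j} - \frac{KS_0}{(1 + i)^N}\sum_{j=m}^{N}\binom{N}{j}(up^*)^j(dq^*)^{N-j}\\
&= S_0^2\left(\frac{v}{1 + i}\right)^N\sum_{j=m}^{N}\binom{N}{j}\tilde p^{\,j}\tilde q^{\,N-j} - KS_0\sum_{j=m}^{N}\binom{N}{j}\hat p^{\,j}\hat q^{\,N-j},
\end{aligned}$$

where $\tilde p := p^*u^2/v$ and $\tilde q := q^*d^2/v$. Since $u^2p^* + d^2q^* = v$, we have $\tilde q = 1 - \tilde p$, so $(\tilde p, \tilde q)$ is a probability vector and the desired formula follows.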
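As in Exercise 11, the closed form can be compared with a direct risk-neutral expectation. The sketch assumes $v = u^2p^* + d^2q^*$, takes $\Psi$ as the binomial upper tail, and uses arbitrary parameters.

```python
from math import comb

def psi(k, n, p):
    """Binomial upper tail: sum_{j=k}^{n} C(n, j) p^j (1 - p)^(n - j)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

def power_call_direct(S0, K, u, d, i, N):
    """(1+i)^(-N) E*[S_N (S_N - K)^+] computed from the risk-neutral binomial law."""
    p_star = (1 + i - d) / (u - d)
    total = 0.0
    for j in range(N + 1):
        S_N = S0 * u ** j * d ** (N - j)
        total += comb(N, j) * p_star ** j * (1 - p_star) ** (N - j) * S_N * max(S_N - K, 0.0)
    return total / (1 + i) ** N

def power_call_closed(S0, K, u, d, i, N):
    """S0^2 (v/(1+i))^N Psi(m, N, p_tilde) - K S0 Psi(m, N, p_hat)."""
    p_star = (1 + i - d) / (u - d)
    q_star = 1 - p_star
    v = u * u * p_star + d * d * q_star
    p_tilde = p_star * u * u / v      # then tilde q = q* d^2 / v = 1 - p_tilde
    p_hat = u * p_star / (1 + i)      # then hat q = d q* / (1 + i) = 1 - p_hat
    m = next((j for j in range(N + 1) if S0 * u ** j * d ** (N - j) > K), N + 1)
    return S0 ** 2 * (v / (1 + i)) ** N * psi(m, N, p_tilde) - K * S0 * psi(m, N, p_hat)

args = (100.0, 110.0, 1.1, 0.92, 0.03, 9)
print(power_call_direct(*args), power_call_closed(*args))
```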

17. Use Exercise 3.12 and the law of the unconscious statistician.

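19. By Exercise 17 with $f(x, y) = \left(\frac12(x + y) - K\right)^+$, $m = 1$, and $n = N$, we have

$$(1 + i)^NV_0 = \tfrac12(A_0 + A_1),$$

where

$$A_0 := \sum_{k=0}^{N-1}\binom{N-1}{k}p^{*k}q^{*N-k}\left(S_0d + S_0u^kd^{N-k} - 2K\right)^+ \quad\text{and}\quad A_1 := \sum_{k=1}^{N}\binom{N-1}{k-1}p^{*k}q^{*N-k}\left(S_0u + S_0u^kd^{N-k} - 2K\right)^+.$$

The hypothesis implies that $S_0d + S_0u^kd^{N-k} > 2K$ for $k = N - 1$, hence there exists a smallest integer $k_1 \ge 0$ such that $S_0d + S_0u^{k_1}d^{N-k_1} > 2K$. Since $S_0u + S_0u^{k+1}d^{N-k-1} > S_0d + S_0u^{N-1}d$ for $k = N - 1$, there exists a smallest integer $k_2 \ge 0$ such that $S_0u + S_0u^{k_2+1}d^{N-k_2-1} > 2K$. Therefore, using $(up^*)^k(dq^*)^{N-1-k} = (1 + i)^{N-1}\hat p^{\,k}\hat q^{\,N-1-k}$,

$$\begin{aligned}
A_0 &= \sum_{k=k_1}^{N-1}\binom{N-1}{k}p^{*k}q^{*N-k}\left(S_0d + S_0u^kd^{N-k} - 2K\right)\\
&= (S_0d - 2K)q^*\sum_{k=k_1}^{N-1}\binom{N-1}{k}p^{*k}q^{*N-1-k} + S_0dq^*\sum_{k=k_1}^{N-1}\binom{N-1}{k}(up^*)^k(dq^*)^{N-1-k}\\
&= (S_0d - 2K)q^*\Psi(k_1, N - 1, p^*) + (1 + i)^{N-1}S_0dq^*\Psi(k_1, N - 1, \hat p)
\end{aligned}$$

and

$$\begin{aligned}
A_1 &= p^*\sum_{k=k_2}^{N-1}\binom{N-1}{k}p^{*k}q^{*N-1-k}\left(S_0u + S_0u^{k+1}d^{N-1-k} - 2K\right)\\
&= (S_0u - 2K)p^*\sum_{k=k_2}^{N-1}\binom{N-1}{k}p^{*k}q^{*N-1-k} + S_0up^*\sum_{k=k_2}^{N-1}\binom{N-1}{k}(up^*)^k(dq^*)^{N-1-k}\\
&= (S_0u - 2K)p^*\Psi(k_2, N - 1, p^*) + (1 + i)^{N-1}S_0up^*\Psi(k_2, N - 1, \hat p).
\end{aligned}$$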
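This is a brute-force check over all $2^N$ paths under the same conventions: $\Psi$ is the binomial upper tail and $\hat p = up^*/(1+i)$. The sketch puts the factor $(1+i)^{N-1}$ in front of the $\Psi(\cdot, N-1, \hat p)$ terms, which is what the identity $(up^*)^k(dq^*)^{N-1-k} = (1+i)^{N-1}\hat p^{\,k}\hat q^{\,N-1-k}$ produces. The parameters are arbitrary but satisfy $S_0d + S_0u^{N-1}d > 2K$.

```python
from itertools import product
from math import comb

def psi(k, n, p):
    """Binomial upper tail: sum_{j=k}^{n} C(n, j) p^j (1 - p)^(n - j)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

def avg_call_brute(S0, K, u, d, i, N):
    """(1+i)^(-N) E*[((S_1 + S_N)/2 - K)^+] by enumerating all up/down paths."""
    p_star = (1 + i - d) / (u - d)
    total = 0.0
    for ups in product((True, False), repeat=N):
        n_up = sum(ups)
        S_1 = S0 * (u if ups[0] else d)
        S_N = S0 * u ** n_up * d ** (N - n_up)
        prob = p_star ** n_up * (1 - p_star) ** (N - n_up)
        total += prob * max((S_1 + S_N) / 2 - K, 0.0)
    return total / (1 + i) ** N

def avg_call_formula(S0, K, u, d, i, N):
    """V0 = (A0 + A1) / (2 (1+i)^N) with A0, A1 as in the solution."""
    p_star = (1 + i - d) / (u - d)
    q_star = 1 - p_star
    p_hat = u * p_star / (1 + i)
    # k1, k2 exist under the hypothesis S0 d + S0 u^(N-1) d > 2K.
    k1 = next(k for k in range(N) if S0 * d + S0 * u ** k * d ** (N - k) > 2 * K)
    k2 = next(k for k in range(N) if S0 * u + S0 * u ** (k + 1) * d ** (N - k - 1) > 2 * K)
    A0 = ((S0 * d - 2 * K) * q_star * psi(k1, N - 1, p_star)
          + (1 + i) ** (N - 1) * S0 * d * q_star * psi(k1, N - 1, p_hat))
    A1 = ((S0 * u - 2 * K) * p_star * psi(k2, N - 1, p_star)
          + (1 + i) ** (N - 1) * S0 * u * p_star * psi(k2, N - 1, p_hat))
    return (A0 + A1) / (2 * (1 + i) ** N)

args = (100.0, 110.0, 1.2, 0.9, 0.05, 6)
print(avg_call_brute(*args), avg_call_formula(*args))
```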

Chapter 8

1. For any real number $a$,

$$E\left[g(X)I_{\{X=a\}}\right] = \sum_x g(x)I_{\{a\}}(x)p_X(x) = g(a)p_X(a) \quad\text{and}\quad E\left[YI_{\{X=a\}}\right] = \sum_y p_{X,Y}(a, y)\,y.$$

3. Since $XY = X^2 + X(Y - X)$, conditioning on $\mathcal G$ yields

$$E(XY) = E\left[X^2 + XE(Y - X\,|\,\mathcal G)\right] = E\left[X^2 + XE(Y - X)\right] = EX^2 + E(X)E(Y - X) = EX^2.$$

Therefore, $E(Y - X)^2 = EY^2 - 2EX^2 + EX^2 = EY^2 - EX^2$.

5. By the iterated conditioning property, if $m > n$, then

$$E(M_m\,|\,\mathcal F_n) = E\left[E(M_m\,|\,\mathcal F_{m-1})\,|\,\mathcal F_n\right] = E(M_{m-1}\,|\,\mathcal F_n).$$

7. Since $M_{n+1} - M_n = X_{n+1}^2 + 2X_{n+1}Y_n - \sigma^2$,

$$E\left(M_{n+1} - M_n\,|\,\mathcal F^X_n\right) = EX_{n+1}^2 + 2Y_nE\left(X_{n+1}\,|\,\mathcal F^X_n\right) - \sigma^2 = 0.$$

9. Since $M_{n+1} = M_nr^{X_{n+1}}$,

$$E\left(M_{n+1}\,|\,\mathcal F^X_n\right) = M_nE\left(r^{X_{n+1}}\right) = M_n\left(p\,\frac{q}{p} + q\,\frac{p}{q}\right) = M_n.$$


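11. Expanding $(A_m - A_n)^2$, we have for $n \le m$

$$\begin{aligned}
E\left((A_m - A_n)^2\,|\,\mathcal F_n\right) &= E\left(A_m^2\,|\,\mathcal F_n\right) + E\left(A_n^2\,|\,\mathcal F_n\right) - 2E\left(A_mA_n\,|\,\mathcal F_n\right)\\
&= E\left(A_m^2\,|\,\mathcal F_n\right) + A_n^2 - 2A_nE\left(A_m\,|\,\mathcal F_n\right)\\
&= E\left(A_m^2\,|\,\mathcal F_n\right) + A_n^2 - 2A_n^2\\
&= E\left(B_m\,|\,\mathcal F_n\right) + E\left(C_m\,|\,\mathcal F_n\right) - B_n - C_n = E\left(C_m - C_n\,|\,\mathcal F_n\right).
\end{aligned}$$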

13. Condition on $\mathcal F_k$.

Chapter 9

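1. For (a) let

$$A_k = \{S_k \le (S_0 + S_1 + \cdots + S_{k-1})/k\}, \quad k = 1, 2, \ldots, N - 1.$$

Then $A_k \in \mathcal F^S_k$ and

$$\{\tau_a = n\} = A_1A_2\cdots A_{n-1}A_n' \in \mathcal F^S_n, \quad n = 1, 2, \ldots, N - 1, \quad\text{and}\quad \{\tau_a = N\} = A_1A_2\cdots A_{N-1} \in \mathcal F^S_{N-1} \subseteq \mathcal F^S_N.$$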

3. Price: $20.91. Optimal exercise time scenarios: d ($27.00), ud ($35.91);uudd ($28.43).

5. We show by induction on $k$ that

$$v_k(S_k(\omega)) = f(S_k(\omega)) = 0 \qquad (\dagger)$$

for all $k \ge n$ $(= \tau_0(\omega))$. By definition of $\tau_0$, $(\dagger)$ holds for $k = n$. Suppose $(\dagger)$ holds for arbitrary $k \ge n$. Since

$$v_k(S_k(\omega)) = \max\left(f(S_k(\omega)),\ av_{k+1}(S_k(\omega)u) + bv_{k+1}(S_k(\omega)d)\right)$$

and all terms comprising the expression on the right of this equation are nonnegative, $v_{k+1}(S_{k+1}(\omega)) = 0$. Since

$$v_{k+1}(S_{k+1}(\omega)) = \max\left(f(S_{k+1}(\omega)),\ av_{k+2}(S_{k+1}(\omega)u) + bv_{k+2}(S_{k+1}(\omega)d)\right),$$

$f(S_{k+1}(\omega)) = 0$. Therefore, $(\dagger)$ holds for $k + 1$.


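7. The proof of (a) is a straightforward modification of that of Corollary 9.1.3. To find $C_0$ take $n = 0$, $m = N$, and $f(x) = (x - K)^+$ in (a). Then

$$C_0 = a^{-N}\sum_{j=0}^{N}\binom{N}{j}p^{*j}q^{*N-j}\left(b^Nu^jd^{N-j}S_0 - K\right)^+ = \left(\frac{b}{a}\right)^NS_0\sum_{j=m}^{N}\binom{N}{j}p^{*j}q^{*N-j}u^jd^{N-j} - \frac{K}{a^N}\sum_{j=m}^{N}\binom{N}{j}p^{*j}q^{*N-j}.$$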

Chapter 10

1. (a) $x^{-2}\,dx = \sin t\,dt \Rightarrow x^{-1} = \cos t + c \Rightarrow x = (\cos t + c)^{-1}$; $x(0) = 1/3 \Rightarrow c = 2$. Therefore, $x(t) = (\cos t + 2)^{-1}$, $-\infty < t < \infty$.

(b) $x(0) = 2 \Rightarrow c = -1/2 \Rightarrow x(t) = (\cos t - 1/2)^{-1}$, $-\pi/3 < t < \pi/3$.

(c) $2x\,dx = (2t + \cos t)\,dt \Rightarrow x^2 = t^2 + \sin t + c$. $x(0) = 1 \Rightarrow c = 1 \Rightarrow x(t) = \sqrt{t^2 + \sin t + 1}$, valid for all $t$ (positive root because $x(0) > 0$).

(d) $(x + 1)^{-1}\,dx = \cot t\,dt \Rightarrow \ln|x + 1| = \ln|\sin t| + c \Rightarrow x + 1 = \pm e^c\sin t$; $x(\pi/6) = 1/2 \Rightarrow x + 1 = \pm 3\sin t$. The positive sign is chosen because $x(\pi/6) + 1 > 0$. Therefore, $x(t) = 3\sin t - 1$.

3. Use the partitions Pn described in the example to construct Riemann-Stieltjes sums that do not converge.

5. For the first assertion use the identity

$$W(s) + W(t) = W(t) - W(s) + 2W(s),$$

independence, and Example 3.6.2.

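7. By Theorem 10.6.3, $X_t = \int_0^t F(s)\,dW(s)$ has mean zero and variance

$$VX_t = \int_0^t E\left(F^2(s)\right)ds.$$

(a) $E\left(sW_s^2\right) = s^2$, hence $VX_t = \int_0^t s^2\,ds = t^3/3$.

(b) Since $W(s) \sim N(0, s)$,

$$E\exp\left(2W_s^2\right) = \frac{1}{\sqrt{2\pi s}}\int_{-\infty}^{\infty} e^{2x^2}e^{-x^2/2s}\,dx = \frac{1}{\sqrt{2\pi s}}\int_{-\infty}^{\infty} e^{-\alpha x^2/2}\,dx,$$

where $\alpha = s^{-1} - 4$. If $s \ge 1/4$, then $\alpha \le 0$ and the integral diverges. Therefore, $VX_t = +\infty$ for $t > 1/4$. If $s < 1/4$, then, making the substitution $y = \sqrt{\alpha}\,x$, we have

$$E\exp\left(2W_s^2\right) = \frac{1}{\sqrt{2\pi s\alpha}}\int_{-\infty}^{\infty} e^{-y^2/2}\,dy = \frac{1}{\sqrt{s\alpha}} = (1 - 4s)^{-1/2},$$

so that, for $t \le 1/4$,

$$VX_t = \int_0^t (1 - 4s)^{-1/2}\,ds = \tfrac12\left[1 - \sqrt{1 - 4t}\right].$$

(c) For $s > 0$,

$$E|W_s| = \frac{2}{\sqrt{2\pi s}}\int_0^{\infty} xe^{-x^2/2s}\,dx = \sqrt{\frac{2s}{\pi}},$$

hence

$$VX_t = \sqrt{\frac{2}{\pi}}\int_0^t \sqrt{s}\,ds = \frac23\sqrt{\frac{2}{\pi}}\,t^{3/2}.$$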
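Simple midpoint-rule quadrature can check the expectations in (b) and (c) and the closed form for $VX_t$ in (b). The step counts, the truncation width, and the sample values of $s$ and $t$ are arbitrary choices in this sketch.

```python
import math

def expect_normal(g, s, n=20000, width=12.0):
    """Approximate E g(W_s), W_s ~ N(0, s), by the midpoint rule on +-width standard deviations."""
    sd = math.sqrt(s)
    a = -width * sd
    h = 2 * width * sd / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += g(x) * math.exp(-x * x / (2 * s))
    return total * h / math.sqrt(2 * math.pi * s)

def integrate(f, a, b, n=20000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# (b): E exp(2 W_s^2) = (1 - 4s)^(-1/2), finite only for s < 1/4.
print(expect_normal(lambda x: math.exp(2 * x * x), 0.1), (1 - 4 * 0.1) ** -0.5)
# (c): E|W_s| = sqrt(2s/pi).
print(expect_normal(abs, 0.5), math.sqrt(2 * 0.5 / math.pi))
# (b): VX_t = integral_0^t (1 - 4s)^(-1/2) ds = (1 - sqrt(1 - 4t)) / 2.
t = 0.2
print(integrate(lambda u: (1 - 4 * u) ** -0.5, 0.0, t), 0.5 * (1 - math.sqrt(1 - 4 * t)))
```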

9. (a) Use Version 1 with $f(x) = e^x$.

(b) From Version 2, $d(tW^2) = 2tW\,dW + (W^2 + t)\,dt$.

(c) Use Version 4 with $f(t, x, y) = x/y$. Since $f_t = 0$, $f_x = 1/y$, $f_y = -x/y^2$, $f_{xx} = 0$, $f_{xy} = -1/y^2$, and $f_{yy} = 2x/y^3$, we have

$$d\left(\frac{X}{Y}\right) = \frac{dX}{Y} - \frac{X}{Y^2}\,dY + \frac{X}{Y^3}(dY)^2 - \frac{1}{Y^2}\,dX\cdot dY.$$

Factoring out $X/Y$ gives the desired result.

11. Taking expectations in (10.19) gives

$$EX_t = e^{-\beta t}\left(EX_0 + \frac{\alpha}{\beta}\left(e^{\beta t} - 1\right)\right).$$

Chapter 11

1. (a) call: $4.65; put: $0.50; (b) call: $1.27; put: $2.98.

3. $\dfrac{\partial P}{\partial s} = \dfrac{\partial C}{\partial s} - 1 = \Phi(d_1) - 1 < 0$, $\lim_{s\to\infty} P = 0$, and $\lim_{s\to0+} P = Ke^{-r\tau}$.

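5. Taking $f(z) = AI_{(K,\infty)}(z)$ in Theorem 11.3.2 yields

$$V_t = e^{-r(T-t)}G(t, S_t),$$

where, as in the proof of Corollary 11.3.3,

$$G(t, s) = \int_{-\infty}^{\infty} AI_{(K,\infty)}\left(s\exp\left\{\sigma\sqrt{T - t}\,y + (r - \sigma^2/2)(T - t)\right\}\right)\varphi(y)\,dy = A\Phi\left(d_2(T - t, s, K, \sigma, r)\right).$$

Indeed, the integrand is nonzero exactly when $y > -d_2(T - t, s, K, \sigma, r)$.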
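Evaluating the integral for $G(t, s)$ by quadrature shows which of $A\Phi(d_1)$ and $A\Phi(d_2)$ is the right closed form. The indicator is nonzero exactly when $y > -d_2$. The parameters below are arbitrary and the quadrature settings are illustrative.

```python
import math

def phi(y):
    """Standard normal density."""
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def d1_d2(tau, s, K, sigma, r):
    d1 = (math.log(s / K) + (r + sigma ** 2 / 2) * tau) / (sigma * math.sqrt(tau))
    return d1, d1 - sigma * math.sqrt(tau)

def cash_or_nothing_quadrature(tau, s, K, sigma, r, A, n=100_000, width=10.0):
    """e^{-r tau} * integral of A 1{s exp(sigma sqrt(tau) y + (r - sigma^2/2) tau) > K} phi(y) dy (midpoint rule)."""
    h = 2 * width / n
    total = 0.0
    for k in range(n):
        y = -width + (k + 0.5) * h
        z = s * math.exp(sigma * math.sqrt(tau) * y + (r - sigma ** 2 / 2) * tau)
        if z > K:
            total += phi(y)
    return A * math.exp(-r * tau) * total * h

tau, s, K, sigma, r, A = 1.0, 100.0, 100.0, 0.2, 0.05, 1.0
d1, d2 = d1_d2(tau, s, K, sigma, r)
quad = cash_or_nothing_quadrature(tau, s, K, sigma, r, A)
print(quad, A * math.exp(-r * tau) * Phi(d2), A * math.exp(-r * tau) * Phi(d1))
```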

7. Since $V_T = S_TI_{(K_1,\infty)}(S_T) - S_TI_{[K_2,\infty)}(S_T)$, $V_0$ is the difference in the prices of two asset-or-nothing options.

9. $V_t = C(T - t, S_t, F) + (F - K)e^{-r(T-t)}$ and, in particular, $V_0 = C_0 + (F - K)e^{-rT}$, where $C_0$ is the cost of a call option on the stock with strike price $F$. Therefore, $K = F + e^{rT}C_0$.

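11. $S_T > K$ iff $\sigma W_T + (\mu - \sigma^2/2)T > \ln(K/S_0)$, hence the desired probability is

$$1 - \Phi\left(\frac{\ln(K/S_0) - (\mu - \sigma^2/2)T}{\sigma\sqrt T}\right) = \Phi\left(\frac{\ln(S_0/K) + (\mu - \sigma^2/2)T}{\sigma\sqrt T}\right).$$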

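13. The expression for $E_C$ follows from Theorem 11.4.1(i) and the Black–Scholes formula. To verify the limits, write

$$E_C^{-1} = 1 - \alpha\,\frac{\Phi(d_2)}{s\Phi(d_1)}, \qquad \alpha := Ke^{-rT},$$

and note that (a) follows from $\lim_{s\to\infty}\Phi(d_{1,2}) = 1$. For (b), apply l'Hospital's Rule to obtain

$$\alpha\lim_{s\to0+}\left(1 - E_C^{-1}\right)^{-1} = \lim_{s\to0+}\frac{s\Phi(d_1)}{\Phi(d_2)} = \lim_{s\to0+}\frac{s\varphi(d_1)(\beta s)^{-1} + \Phi(d_1)}{\varphi(d_2)(\beta s)^{-1}} = \lim_{s\to0+}\frac{s\varphi(d_1)}{\varphi(d_2)}\left[1 + \beta\,\frac{\Phi(d_1)}{\varphi(d_1)}\right], \qquad \beta := \sigma\sqrt T.$$

Since

$$d_2^2 - d_1^2 = (d_2 - d_1)(d_2 + d_1) = -\beta(d_2 + d_1) = 2[\ln(K/s) - rT],$$

we have

$$\frac{s\varphi(d_1)}{\varphi(d_2)} = s\exp\left[\tfrac12\left(d_2^2 - d_1^2\right)\right] = s\exp\left[\ln(K/s) - rT\right] = \alpha,$$

hence

$$\lim_{s\to0+}\left(1 - E_C^{-1}\right)^{-1} = \alpha^{-1}\lim_{s\to0+}\frac{s\Phi(d_1)}{\Phi(d_2)} = 1 + \beta\lim_{s\to0+}\frac{\Phi(d_1)}{\varphi(d_1)}.$$

By l'Hospital's Rule,

$$\lim_{s\to0+}\frac{\Phi(d_1)}{\varphi(d_1)} = \lim_{s\to0+}\frac{\varphi(d_1)(\beta s)^{-1}}{\varphi(d_1)(-d_1)(\beta s)^{-1}} = -\lim_{s\to0+}\frac{1}{d_1} = 0.$$

Therefore, $\lim_{s\to0+}\left(1 - E_C^{-1}\right)^{-1} = 1$, which implies (b).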

15. Make the substitution

$$z = s\exp\left\{\sigma\sqrt{T - t}\,y + \left(r - \tfrac12\sigma^2\right)(T - t)\right\}.$$

Chapter 12

1. (a) $Ee^{\lambda X} = \left(pe^\lambda + q\right)^n$.

(b) $Ee^{\lambda X} = pe^\lambda\left(1 - qe^\lambda\right)^{-1}$.

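3. For (a),

$$E(W_sW_t) = E\left[E\left(W_sW_t\,|\,\mathcal F^W_s\right)\right] = E\left[W_sE\left(W_t\,|\,\mathcal F^W_s\right)\right] = E\left(W_s^2\right) = s,$$

and for (b),

$$E(W_t - W_s\,|\,W_s) = E\left[E\left(W_t - W_s\,|\,\mathcal F^W_s\right)\,|\,W_s\right] = E\left[E(W_t - W_s)\,|\,W_s\right] = 0.$$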

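5. Let $A = \{(u, v) \mid v \le y,\ u + v \le x\}$ and

$$f(x, y) = \frac{1}{\sqrt{s(t - s)}}\,\varphi\left(\frac{x}{\sqrt{t - s}}\right)\varphi\left(\frac{y}{\sqrt s}\right).$$

By independent increments,

$$P(W_t \le x, W_s \le y) = P\left((W_t - W_s, W_s) \in A\right) = \iint_A f(u, v)\,du\,dv = \int_{-\infty}^{y}\int_{-\infty}^{x} f(u - v, v)\,du\,dv.$$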

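7. $M$ is a martingale iff for all $0 \le s \le t$,

$$E\left(e^{\alpha[W(t) - W(s)]}\,|\,\mathcal F_s\right) = e^{h(s) - h(t)}.$$

By independence and Exercise 6.14,

$$E\left(e^{\alpha[W(t) - W(s)]}\,|\,\mathcal F_s\right) = E\left(e^{\alpha[W(t) - W(s)]}\right) = e^{\alpha^2(t - s)/2}.$$

Therefore, $M$ is a martingale iff $h(t) - h(s) = \alpha^2(s - t)/2$.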


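9. By Example 12.2.3 and iterated conditioning,

$$E(W_t^2\,|\,W_s) = E\left[E\left(W_t^2 - t\,|\,\mathcal F^W_s\right)\,|\,W_s\right] + t = E\left(W_s^2 - s\,|\,W_s\right) + t = W_s^2 + t - s.$$

Similarly, by Exercise 8,

$$E(W_t^3 - 3tW_t\,|\,W_s) = E\left[E\left(W_t^3 - 3tW_t\,|\,\mathcal F^W_s\right)\,|\,W_s\right] = E\left(W_s^3 - 3sW_s\,|\,W_s\right) = W_s^3 - 3sW_s.$$

Therefore, by Exercise 3,

$$E(W_t^3\,|\,W_s) = 3tE(W_t\,|\,W_s) + W_s^3 - 3sW_s = W_s^3 + 3(t - s)W_s.$$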

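11. For any $x$,

$$P^*(X \le x) = E^*I_{(-\infty,x]}(X) = e^{-\frac12\alpha^2T}E\left(I_{(-\infty,x]}(X)e^{-\alpha W_T}\right) = e^{-\frac12\alpha^2T}E\left(I_{(-\infty,x]}(X)\right)E\left(e^{-\alpha W_T}\right) = P(X \le x),$$

the last equality from Exercise 6.14.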

Chapter 13

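1. By Lemma 13.2.2, the call finishes in the money iff

$$W^*_T > \sigma^{-1}\left[\ln(K/S_0) - \left(r - \tfrac12\sigma^2\right)T\right].$$

Therefore, the $P^*$-probability that the call finishes in the money is

$$1 - \Phi\left(\frac{\ln(K/S_0) - \left(r - \frac12\sigma^2\right)T}{\sigma\sqrt T}\right) = \Phi\left(d_2(T, S_0, K, \sigma, r)\right).$$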

3. This follows from $e^{-(r+\sigma^2)t}S_t = S_0e^{\sigma W^{**}_t - \frac12\sigma^2t}$ and Example 12.2.6.

Chapter 14

1. Parts (a) and (b) follow from the Ito-Doeblin formula applied to

$$f(t, x) = \exp\left[\sigma x + (r_d - r_e - \sigma^2)t\right] \quad\text{and}\quad f(t, x) = \exp\left[-\sigma x - (r_d - r_e - \sigma^2)t\right],$$

respectively.

3. Use Equation (14.8), its analog for a put-on-call option, and the identity $\left(K_0 - C(s)\right)^+ - \left(C(s) - K_0\right)^+ + C(s) = K_0$.

5. As a first step,

$$V_0 = e^{-rT}E\left[(S_T - K)I_AZ_T^{-1}\right] = e^{-(r+\beta^2/2)T}\left[S_0E\left(e^{\gamma W_T}I_A\right) - KE\left(e^{\beta W_T}I_A\right)\right],$$

where $E\left(e^{\lambda W_T}I_A\right)$ is given by (14.19). An obvious modification of (14.16) shows that

$$E\left(e^{\lambda W_T}I_A\right) = \iint_D e^{\lambda x}g_m(x, y)\,dA, \qquad D := \{(x, y) \mid b \le y \le 0,\ x \ge y\}.$$

Since $D$ has the same form as in Figure 14.2, the integral evaluates to (14.19) and hence, as in the text, leads to (14.9) and (14.10) with $M = c$.

7. From (14.11), the cost of the option is

$$C^{do}_0 = e^{-r_dT}E^*\left[(S_T - K)I_B\right],$$

where $S$ is given by (14.12) with $r = r_d - r_e$. The calculations leading to Equation (14.9) yield, as in the text,

$$C^{do}_0 = e^{-\frac12(2r_d + \beta^2 - \gamma^2)T}S_0\left[\Phi(d_1) - e^{2b\gamma}\Phi(\delta_1)\right] - Ke^{-r_dT}\left[\Phi(d_2) - e^{2b\beta}\Phi(\delta_2)\right].$$

Since $2r_d + \beta^2 - \gamma^2 = 2r_e$, the desired conclusion follows.

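9. Since $M^W = -m^U$,

$$P(W_T \le x, M^W \le y) = P(U_T \ge -x, m^U \ge -y) = \int_{-x}^{\infty}\int_{-y}^{\infty} f_m(u, v)\,dv\,du.$$

Therefore, if $-y < 0$ and $-y < -x$,

$$f_M(x, y) = \frac{\partial^2}{\partial x\,\partial y}\int_{-x}^{\infty}\int_{-y}^{\infty} f_m(u, v)\,dv\,du = g_m(-x, -y),$$

and $f_M(x, y) = 0$ otherwise. Since $g_m(-x, -y) = -g_m(x, y)$, the formula follows.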


11. Since $\{S_T \ge K, M^S \ge c\} = \{S_T \ge K\}$, the price is that of a standard call.

13. The first assertion follows from $\lim_{c\to S_0-}\delta_{1,2} = d_{1,2}$ and the second from $\lim_{c\to0+}\delta_{1,2} = -\infty$, the latter implying that $\lim_{c\to0+}\Phi(\delta_{1,2}) = 0$.

15. If $mT/(n + 1) \le t < (m + 1)T/(n + 1)$, then $m \le t(n + 1)/T < m + 1$, hence $m = \lfloor t(n + 1)/T\rfloor$.

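17. By (14.11) and (14.35),

$$C^{do}_0 = e^{-rT}E^*\left[(S_T - K)I_B\right],$$

where

$$S_t = S_0e^{\sigma(W^*_t + \beta t)}, \qquad \beta := \frac{r - \delta}{\sigma} - \frac{\sigma}{2}.$$

With this change in $\beta$,

$$C^{do}_0 = e^{-\delta T}S_0\left[\Phi(d_1) - \left(\frac{c}{S_0}\right)^{\frac{2(r-\delta)}{\sigma^2} + 1}\Phi(\delta_1)\right] - Ke^{-rT}\left[\Phi(d_2) - \left(\frac{c}{S_0}\right)^{\frac{2(r-\delta)}{\sigma^2} - 1}\Phi(\delta_2)\right],$$

where

$$d_{1,2} = \frac{\ln(S_0/M) + \left(r - \delta \pm \frac12\sigma^2\right)T}{\sigma\sqrt T} \quad\text{and}\quad \delta_{1,2} = \frac{\ln\left(c^2/(S_0M)\right) + \left(r - \delta \pm \frac12\sigma^2\right)T}{\sigma\sqrt T}.$$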
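The formula can be sanity-checked through the same limiting cases used in Exercise 13. As $c \to 0+$ it should reduce to the Black–Scholes call on a stock with dividend yield $\delta$, and as $c \to S_0-$ it should tend to zero. The sketch assumes $M = \max(K, c)$ as in the text. The Python argument `delta` is the dividend yield, while `e1` and `e2` stand for $\delta_{1,2}$. Parameters are arbitrary.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def call_with_yield(S0, K, T, sigma, r, delta):
    """Black-Scholes call on a stock paying a continuous dividend yield delta."""
    sT = sigma * math.sqrt(T)
    d1 = (math.log(S0 / K) + (r - delta + sigma ** 2 / 2) * T) / sT
    return S0 * math.exp(-delta * T) * Phi(d1) - K * math.exp(-r * T) * Phi(d1 - sT)

def down_and_out_call(S0, K, c, T, sigma, r, delta):
    """Down-and-out call with barrier c < S0 and dividend yield delta; M = max(K, c)."""
    M = max(K, c)
    sT = sigma * math.sqrt(T)
    drift = (r - delta + sigma ** 2 / 2) * T
    d1 = (math.log(S0 / M) + drift) / sT
    e1 = (math.log(c * c / (S0 * M)) + drift) / sT    # delta_1 in the text
    d2, e2 = d1 - sT, e1 - sT                         # d_2 and delta_2
    k = 2 * (r - delta) / sigma ** 2
    return (S0 * math.exp(-delta * T) * (Phi(d1) - (c / S0) ** (k + 1) * Phi(e1))
            - K * math.exp(-r * T) * (Phi(d2) - (c / S0) ** (k - 1) * Phi(e2)))

S0, K, T, sigma, r, delta = 100.0, 105.0, 1.0, 0.2, 0.05, 0.02
print(down_and_out_call(S0, K, 1e-6, T, sigma, r, delta), call_with_yield(S0, K, T, sigma, r, delta))
print(down_and_out_call(S0, K, S0 * (1 - 1e-9), T, sigma, r, delta))
print(down_and_out_call(S0, K, 90.0, T, sigma, r, delta))
```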

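19. The desired probability is $1 - P(C)$, where

$$C := \{m^S \ge c\} = \{m^{\hat W} \ge b\}, \qquad b := \sigma^{-1}\ln(c/S_0).$$

To find $P(C)$, recall that the measures $P^*$, $\hat P$ and the processes $W^*$, $\hat W$ are defined by

$$dP^* = e^{-\alpha W_T - \frac12\alpha^2T}\,dP, \qquad d\hat P = e^{-\beta W^*_T - \frac12\beta^2T}\,dP^*,$$

$$W^*_T = W_T + \alpha T, \quad \alpha = \frac{\mu - r}{\sigma}, \quad\text{and}\quad \hat W_T = W^*_T + \beta T = W_T + (\alpha + \beta)T, \quad \beta = \frac{r}{\sigma} - \frac{\sigma}{2}.$$

It follows that $dP = U\,d\hat P$, where

$$U := e^{\lambda\hat W_T - \frac12\lambda^2T}, \qquad \lambda := \alpha + \beta = \frac{\mu}{\sigma} - \frac{\sigma}{2}.$$

Therefore,

$$P(C) = \hat E(I_CU) = e^{-\lambda^2T/2}\iint_D e^{\lambda x}g_m(x, y)\,dA,$$

where $D = \{(x, y) \mid b \le y \le 0,\ x \ge y\}$ (see (14.16)). This is the region of integration described in Figure 14.2, so by (14.19),

$$P(C) = \Phi\left(\frac{-b + \lambda T}{\sqrt T}\right) - e^{2b\lambda}\Phi\left(\frac{b + \lambda T}{\sqrt T}\right).$$

Since

$$\frac{\pm b + \lambda T}{\sqrt T} = \frac{\pm\ln(c/S_0) + (\mu - \sigma^2/2)T}{\sigma\sqrt T} \quad\text{and}\quad 2b\lambda = \left(\frac{2\mu}{\sigma^2} - 1\right)\ln(c/S_0),$$

the desired probability is

$$1 - \left[\Phi(d_1) - \left(\frac{c}{S_0}\right)^{\frac{2\mu}{\sigma^2} - 1}\Phi(d_2)\right], \qquad\text{where}\quad d_{1,2} = \frac{\pm\ln(S_0/c) + (\mu - \sigma^2/2)T}{\sigma\sqrt T}.$$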
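The final expression passes a simple check. When $\mu = \sigma^2/2$ the log-price $\sigma W_t$ has no drift, and the reflection principle gives $P(\min_{t \le T} W_t < b) = 2\Phi(b/\sqrt T)$ for $b < 0$. The sketch also checks that the probability tends to 1 as $c \to S_0-$ and decreases as the drift grows. The parameters are arbitrary.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def prob_min_below(S0, c, T, mu, sigma):
    """P(min_{0<=t<=T} S_t < c) for S_t = S0 exp(sigma W_t + (mu - sigma^2/2) t), 0 < c < S0."""
    sT = sigma * math.sqrt(T)
    nu = (mu - sigma ** 2 / 2) * T
    d1 = (math.log(S0 / c) + nu) / sT
    d2 = (-math.log(S0 / c) + nu) / sT
    return 1 - (Phi(d1) - (c / S0) ** (2 * mu / sigma ** 2 - 1) * Phi(d2))

# Driftless log-price (mu = sigma^2/2): compare with the reflection principle,
# using b = ln(c/S0)/sigma, so that b/sqrt(T) = ln(c/S0)/(sigma sqrt(T)).
S0, c, T, sigma = 100.0, 80.0, 2.0, 0.3
print(prob_min_below(S0, c, T, sigma ** 2 / 2, sigma),
      2 * Phi(math.log(c / S0) / (sigma * math.sqrt(T))))
```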

21. The total payoff is that of a portfolio consisting of a call option maturing at time $T_0$ and $n$ forward start options maturing at times $T_1, T_2, \ldots, T_n$. The cost of the cliquet is then the sum of the costs of these options, which may be obtained by using the results of Section 14.2.


Bibliography

[1] Bellalah, M., 2009, Exotic Derivatives, World Scientific, London.

[2] Bingham, N. H. and R. Kiesel, 2004, Risk-Neutral Valuation, Springer, New York.

[3] Etheridge, A., 2002, A Course in Financial Calculus, Cambridge University Press, Cambridge.

[4] Elliott, R. J. and P. E. Kopp, 2005, Mathematics of Financial Markets, Springer, New York.

[5] Grimmett, G. R. and D. R. Stirzaker, 1992, Probability and Random Processes, Oxford Science Publications, Oxford.

[6] Hida, T., 1980, Brownian Motion, Springer, New York.

[7] Hull, J. C., 2000, Options, Futures, and Other Derivatives, Prentice-Hall, Englewood Cliffs, N.J.

[8] Karatzas, I. and S. Shreve, 1998, Methods of Mathematical Finance, Springer, New York.

[9] Kuo, H., 2006, Introduction to Stochastic Integration, Springer, New York.

[10] Kwok, Y., 2008, Mathematical Models of Financial Derivatives, Springer, New York.

[11] Lewis, M., 2010, The Big Short, W. W. Norton, New York.

[12] Musiela, M. and M. Rutkowski, 1997, Martingale Methods in Financial Modelling, Springer, New York.

[13] Myneni, R., 1992, The pricing of the American option, Ann. Appl. Prob. 2, 1–23.

[14] Ross, S. M., 2011, An Elementary Introduction to Mathematical Finance, Cambridge University Press, Cambridge.

[15] Rudin, W., 1976, Principles of Mathematical Analysis, McGraw-Hill, New York.



[16] Shreve, S. E., 2004, Stochastic Calculus for Finance I, Springer, New York.

[17] Shreve, S. E., 2004, Stochastic Calculus for Finance II, Springer, New York.

[18] Steele, J. M., 2001, Stochastic Calculus and Financial Applications, Springer, New York.

[19] Yeh, J., 1973, Stochastic Processes and the Wiener Integral, Marcel Dekker, New York.
