Empirical Finance


Executive MSc in Investment and Risk Management Programme

Prof. Robert L. Kimmel

+65 6631 8579

EDHEC Business School

24-27 Mar 2011

22-24 Aug 2011

Singapore Campus

Kimmel (EDHEC Business School) Empirical Finance SingaporeMar/Aug 2011 1 / 563



Introduction


This course is about Empirical Finance.

What do the available data tell us about financial markets, and do they support or contradict the various theories we have developed to explain the behaviour of financial markets?

We will focus mainly on pricing, that is, how prices of financial assets are determined. It is possible to focus on other aspects of financial markets, e.g., trading volume.

The course will discuss both econometric techniques and the actual empirical findings.




Basic Principles


Probability and Distributions

Why is there even a subject matter called Empirical Finance?

1 Astronomers can predict the positions of the planets, and phenomena such as eclipses, with extreme accuracy, centuries in advance.

2 Meteorologists can predict the weather a few days in advance.

3 Can stock market analysts predict stock prices ten minutes in advance?

Humans have essentially no effect on the motion of the planets, and only (possibly) a very long-term effect on the weather. Prices of financial assets, however, are set on a minute-to-minute basis by people.

How do they decide what the prices of financial assets should be?




The extent to which financial markets incorporate available information into asset prices (the degree of market efficiency) is very hotly debated, in both academic and industry circles.

There is no question, though, that events nobody knows about yet can't be incorporated into asset prices.

The evolution of the macroeconomy, technological progress, and societal evolution are all very hard to predict, even by people who spend their whole lives studying such things. They are best modelled as random processes.

If the fundamental economic processes that affect asset prices are random, then the asset prices themselves are also random.






The fact that security prices are random has profound implications for investors: much of financial theory involves the investor's problem of trading off risk and average return.

However, it also has profound implications for those who study financial markets. Financial theories are generally about relations between average returns and various measures of risk. If we observe that the average returns of securities differ from what is predicted by a theory, what conclusion do we draw?

1 The theory is wrong.

2 The theory is right, but its predictions are not met exactly because of the random variation in asset prices.

Which is it?

Probability and statistics are absolutely fundamental to the study of financial markets.





Example: suppose there are three assets, X, Y, and Z. We have developed an economic theory that tells us what (on average) the returns of the assets ought to be. We then get a sample of monthly returns (annualised) of the three assets, over the last 20-year period. The results are as follows.

Asset                                      X      Y      Z
Average return (predicted)                 8%     10%    12%
Average return (observed)                  6%     16%    14%
Standard deviation of return (observed)    25%    40%    60%

How do the predictions of the theory hold up? Do you have enough information to tell?






A probability distribution specifies the likelihood of each possible outcome of a random process. Distributions can be discrete or continuous.

When a random variable has a discrete probability distribution, there are either finitely many outcomes, or countably many.

Consider a six-sided die, each side labelled with a number from one to six. If each side is equally likely to come up when the die is rolled, then the probabilities $p_1, \ldots, p_6$ are all equal to 1/6.

Probabilities (in a discrete probability distribution) must satisfy two properties:

1 The probabilities must be zero or positive.

2 The probabilities must add up to one.

Do the probabilities specified above satisfy both of these constraints?






Figure: Probability Distribution of Six-sided Die Throw





A discrete probability distribution can have infinitely many outcomes, each with positive probability.

Suppose we throw a coin with a heads side and a tails side. The coin is fair, meaning each side has a probability of 1/2. Suppose we throw this coin repeatedly, and call $X$ the number of throws until the first head. What is the probability distribution of $X$?

There is a 1/2 probability that the first throw will be heads, so $p_1 = 1/2$. The probability that the second throw will be the first head is 1/4, so $p_2 = 1/4$. More generally, $p_i = (1/2)^i$. There is no limit to the value of $i$; it is possible (although not likely) that it will take a million, a billion, a trillion trillion trillion, etc. throws.

Do these probabilities satisfy the two rules?





Each of the probabilities is clearly greater than zero, so we have no problem with negative probabilities. Do they add up to one?

$$\sum_{i=1}^{\infty} p_i = \sum_{i=1}^{\infty} \left(\frac{1}{2}\right)^i = 1$$

(For justification of the last step, see any reference on geometric infinite series.)

The probabilities are non-negative, and add up to one; they are valid probabilities. More generally, any distribution with

$$p_i = (1-p)^{i-1} p$$

for some $p \in (0, 1]$ is called a geometric distribution.
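As a quick check that the geometric probabilities really sum to one, the Python sketch below adds up the first n terms exactly with the standard-library `fractions` module and compares each partial sum with the closed form $1 - (1-p)^n$, which tends to one as n grows. The helper names `geometric_pmf` and `partial_sum` are illustrative choices, not part of the course material.

```python
from fractions import Fraction


def geometric_pmf(i, p):
    """P(X = i) = (1 - p)**(i - 1) * p, for i = 1, 2, 3, ..."""
    return (1 - p) ** (i - 1) * p


def partial_sum(n, p):
    """Sum of the first n geometric probabilities, computed exactly."""
    return sum(geometric_pmf(i, p) for i in range(1, n + 1))


half = Fraction(1, 2)  # fair coin: p_i = (1/2)**i
for n in (1, 2, 5, 10, 50):
    s = partial_sum(n, half)
    # The probability mass still missing after n terms is exactly (1 - p)**n
    print(n, float(s), s == 1 - (1 - half) ** n)
```

Using exact fractions rather than floats makes the comparison with the closed form an exact equality test instead of a tolerance check.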





Figure: Probability Distribution of First Head in Coin Throw Example





Continuous probability distributions have uncountably infinitely many possible outcomes.

Example: what is the amount of rainfall in the centre of Singapore on 22 June 2011, measured in millimetres?

This quantity could take any non-negative value: it could be zero (no rainfall at all), or any positive number. (Since water consists of molecules, the amount of rainfall is actually a discrete quantity; however, it is very well approximated by a continuous distribution.)

Continuous probability distributions are specified by a probability density function.





Example: the random variable $X$ has a uniform probability distribution on the interval [0, 1]. Then $X$ has the probability density function $f_X(x) = 1$.

The density function does not specify the probability of each outcome; each particular outcome is infinitely improbable (i.e., has probability 0). But ranges of outcomes have positive probability; what is the probability that $X$ falls in the interval [0.2, 0.3]?

$$P(0.2 \le X \le 0.3) = \int_{0.2}^{0.3} f_X(x)\,dx = \int_{0.2}^{0.3} 1\,dx = x\Big|_{0.2}^{0.3} = 0.1$$

Probability density functions must satisfy two rules:

1 They must be non-negative.

2 They must integrate to one.

Does this uniform probability distribution satisfy these constraints?





The uniform probability density on [0, 1] is obviously positive on this range. It also integrates to one:

$$\int_0^1 f_X(x)\,dx = \int_0^1 1\,dx = 1$$

Note that this integral is only taken over the range of possible values [0, 1]. We can instead take the probability density to be defined as 0 outside this range:

$$f_X(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & x < 0 \text{ or } x > 1 \end{cases}$$

We can then just integrate over the entire real line $(-\infty, +\infty)$, and the value of the integral is still one.





More generally, a uniform distribution can be defined on any range [a, b], with $b > a$:

$$f_X(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & x < a \text{ or } x > b \end{cases}$$

Note that the probability density satisfies the two requirements; it is non-negative, and it integrates to one.





Figure: Uniform Distribution on [0, 1]





Another example: the exponential distribution, with probability density function defined on the interval $[0, +\infty)$:

$$f_X(x) = \lambda e^{-\lambda x}, \quad \lambda > 0$$

Note that this is not a single distribution, but a family of many distributions, indexed by the parameter $\lambda$.

The exponential distribution has many applications; for example, it is used to model the time until a radioactive particle decays. It is sometimes used to model time to default in credit risk applications.

Does the exponential distribution satisfy the two requirements for a valid probability distribution?
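One way to check the integration requirement for the exponential density is numerically. The sketch below (Python, standard library only) integrates $\lambda e^{-\lambda x}$ with a simple midpoint rule, assuming $\lambda = 0.5$ and truncating the integral at $x = 80$, where the neglected tail probability $e^{-40}$ is negligible; the same routine also approximates the mean, which should be close to $1/\lambda = 2$. Non-negativity holds by inspection, since $\lambda > 0$ and the exponential function is positive. The function names are illustrative.

```python
import math


def exp_pdf(x, lam):
    """Exponential density: lam * exp(-lam * x) for x >= 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def midpoint_integral(f, a, b, n=200_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


lam = 0.5     # assumed parameter value
upper = 80.0  # truncation point: the ignored tail mass is exp(-lam * upper) = exp(-40)

total = midpoint_integral(lambda x: exp_pdf(x, lam), 0.0, upper)
mean = midpoint_integral(lambda x: x * exp_pdf(x, lam), 0.0, upper)
print(f"integral of density ~ {total:.8f}")
print(f"mean ~ {mean:.6f} (1/lambda = {1 / lam})")
```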





Figure: Exponential Distribution with $\lambda = 0.5$





Another example: the normal, or Gaussian, distribution. This distribution is defined for all real numbers (positive, zero, and negative), and has the density function:

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad \sigma > 0$$

Despite its somewhat odd appearance, the normal distribution arises in a very natural way in many, many applications, and is one of the most fundamental continuous distributions there is. It is often used to model returns of financial assets.

Note that the Gaussian distribution is actually a family of distributions, indexed by $\mu$ and $\sigma$. More on these parameters later.

Does the Gaussian distribution satisfy the two requirements for a valid probability distribution?





Figure: Gaussian Distribution with $\mu = 0.1$ and $\sigma = 0.25$





We will often use summary statistics, which capture some (but not all) of the information in the probability distribution of a random variable. One of the most important is the mean, or expected value. This is just the average outcome, weighted by probabilities.

$$E[X] = \sum_{i=1}^{N} x_i p_i$$

where $x_i$ is the value of a particular outcome, and $p_i$ is its probability. The sum must be taken across all possible outcomes (the number of outcomes being denoted by $N$ here).





For a random variable with a continuous distribution, the mean is an integral over all possible outcomes (weighted by probability).

$$E[X] = \int_{-\infty}^{+\infty} x f_X(x)\,dx$$

The expected values of the die and coin throw examples are 3.5 and 2, respectively. The uniform distribution on [a, b] has an expected value of $(a+b)/2$. The exponential distribution has a mean of $1/\lambda$. The normal (Gaussian) distribution has a mean of $\mu$.

When there are infinitely many possible outcomes, the expected value may not even exist: what is the expected value of a random variable that has value 2 with probability 1/2, 4 with probability 1/4, etc.? The expected value also does not even have to be one of the possible outcomes: in the die throw example, the mean is 3.5, but no throw ever has this value.
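The expected values quoted above can be reproduced with exact rational arithmetic. In this sketch (the helper names are hypothetical), the coin example is truncated after 50 throws, so its mean comes out as $2 - 52/2^{50}$ rather than exactly 2. The last computation shows why the doubling example has no expected value: each outcome contributes exactly 1 to the sum, so the partial sums grow without bound.

```python
from fractions import Fraction


def expectation(pmf, g=lambda x: x):
    """E[g(X)] for a discrete distribution given as {outcome: probability}."""
    return sum(p * g(x) for x, p in pmf.items())


die = {x: Fraction(1, 6) for x in range(1, 7)}
die_mean = expectation(die)


def first_head_pmf(n_max):
    """Coin example truncated at n_max throws: P(X = i) = (1/2)**i."""
    return {i: Fraction(1, 2**i) for i in range(1, n_max + 1)}


coin_partial = expectation(first_head_pmf(50))  # equals 2 - 52 / 2**50

# Value 2**i with probability (1/2)**i: every term contributes exactly 1,
# so the partial sums grow without bound and E[X] does not exist.
doubling_partial = [
    sum(Fraction(1, 2**i) * 2**i for i in range(1, n + 1)) for n in (10, 100, 1000)
]
print(die_mean, float(coin_partial), doubling_partial)
```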





For a random variable $X$, any function $g(X)$ of $X$ is also a random variable, and we can contemplate its expected value. For example, if $X$ is the value of a die throw (1 through 6, with equal probability), what is the expected value of the squared outcome?

From the definition of an expected value:

$$E[X^2] = \sum_{i=1}^{6} x_i^2 p_i = \frac{1}{6}(1)^2 + \ldots + \frac{1}{6}(6)^2 = \frac{91}{6}$$

Similarly, $E[X^3] = 441/6$ and $E[X^4] = 2275/6$. (Try it.)

When there are infinitely many possible outcomes, the expected value of $X$ or a particular function of $X$ may not exist. However, for the coin throwing example, $E[X^n]$ is well-defined for any integer $n \ge 0$. Can you find $E[X]$ and $E[X^2]$?
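The die moments stated above can be verified exactly; a short sketch using Python's `fractions` module (the function name `raw_moment` is illustrative):

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}


def raw_moment(pmf, k):
    """E[X**k] for a discrete distribution {outcome: probability}."""
    return sum(p * x**k for x, p in pmf.items())


moments = {k: raw_moment(die, k) for k in range(1, 5)}
for k, m in moments.items():
    print(f"E[X^{k}] = {m}")
```

Note that `Fraction` reduces automatically, so $441/6$ is displayed in lowest terms as $147/2$.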




We care not just about the expected value (or average outcome), but also how large deviations from the average tend to be. The variance of a random variable is one such measure. For discrete and continuous random variables, respectively, the variance is:

$$\mathrm{Var}[X] = \sum_{i=1}^{N} p_i (x_i - E[X])^2$$

$$\mathrm{Var}[X] = \int_{-\infty}^{+\infty} f_X(x)\,(x - E[X])^2\,dx$$

In both cases, we can express the variance as an expected value:

$$\mathrm{Var}[X] = E\left[(X - E[X])^2\right] = E[X^2] - (E[X])^2$$

The last step follows from the definitions of expected value and variance, although the algebra is tedious.



What is the variance of $X$ in the die throw example? One method: go straight to the definition of variance:

$$\mathrm{Var}[X] = \sum_{i=1}^{N} p_i (x_i - E[X])^2 = \frac{1}{6}(1 - 3.5)^2 + \ldots + \frac{1}{6}(6 - 3.5)^2 = \frac{35}{12}$$

Another method: find the variance in terms of quantities we have already calculated:

$$\mathrm{Var}[X] = E[X^2] - (E[X])^2 = \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12}$$

Both methods give the same answer, which is not a coincidence.

What is the variance in the coin throwing example?
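Both routes to the die variance can be checked in exact arithmetic; this Python sketch computes the variance once from the definition and once as $E[X^2] - (E[X])^2$:

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(p * x for x, p in die.items())
# Method 1: straight from the definition, sum of p_i * (x_i - E[X])**2
var_definition = sum(p * (x - mean) ** 2 for x, p in die.items())
# Method 2: E[X^2] - (E[X])^2
second_moment = sum(p * x**2 for x, p in die.items())
var_shortcut = second_moment - mean**2
print(var_definition, var_shortcut)
```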



When there are infinitely many outcomes, variance (like expected value) may not exist. For example, a Student's t distribution with 2 degrees of freedom has an expected value of 0, but its variance does not exist.

For most distributions we deal with, both mean and variance are well-defined. For the exponential distribution, the variance is:

$$\mathrm{Var}[X] = \frac{1}{\lambda^2}$$

(Can you prove it?)

For the normal (Gaussian) distribution, the variance is:

$$\mathrm{Var}[X] = \sigma^2$$

(Proof of this result is more difficult.)





Variance is, by construction, zero or positive. (It is only zero if the random variable is always equal to its mean.) It is never negative.

The mean, or expected value, of a random variable can be expressed in the same units as the random variable itself; however, variance is not so convenient. For example, suppose the annual return of a security has a normal distribution, with $\mu = 0.1$ and $\sigma = 0.4$. Then the mean (or average) return is 0.1, or 10%, but its variance is 0.16; the units are percent squared per year squared. We therefore will often use standard deviation instead of variance:

$$\mathrm{SD}[X] \equiv \sqrt{\mathrm{Var}[X]}$$

Standard deviation, like variance, is always zero or positive, but is in the same units as the original random variable. In the example above, the standard deviation of the security's return is 40% per year.





In financial and economic applications, mean and variance are used all the time. Less often, so-called higher order moments are used, e.g., the third and fourth (centred) moments:

$$E\left[(X - E[X])^3\right] = E[X^3] - 3E[X^2]E[X] + 2(E[X])^3$$

$$E\left[(X - E[X])^4\right] = E[X^4] - 4E[X^3]E[X] + 6E[X^2](E[X])^2 - 3(E[X])^4$$

Like variance, these quantities are not in the most convenient units, so they are often converted to dimensionless quantities, skewness and kurtosis.





Skewness and kurtosis are defined as:

$$\mathrm{Skew}[X] \equiv \frac{E\left[(X - E[X])^3\right]}{(\mathrm{Var}[X])^{3/2}}$$

$$\mathrm{Kurt}[X] \equiv \frac{E\left[(X - E[X])^4\right]}{(\mathrm{Var}[X])^2} - 3$$

The kurtosis (sometimes called excess kurtosis) has 3 subtracted out to make a normal distribution have a kurtosis of 0; any distribution with positive kurtosis is therefore more kurtotic than a normal distribution.

Skewness is related to the symmetry of a distribution, and kurtosis is related to the probability of extreme values.

Skewness can take any value, positive or negative. Any symmetric distribution (e.g., the normal distribution, the uniform distribution, or the die throwing example) has skewness of zero.





If a distribution has most of the probability near the mean, but also has a small probability of extremely high values, then it will have positive skewness. If the extreme values are low instead of high, then the skewness will be negative.

Income distributions in most countries have positive skewness: most people earn an amount around the median, but a very small number of people typically earn very high incomes.

The skewness of the exponential distribution is 2; the skewness of the distribution in the coin throwing example is $3/\sqrt{2}$. (Can you derive these results?)





Kurtosis has to do with the probability of extreme observations. If a random variable is almost always close to the mean, but with some small probability can take on a very large value (above or below the mean), then the distribution has high kurtosis.

The lowest possible value of kurtosis is $-2$; there is no maximum value of kurtosis. It is possible for the skewness and the kurtosis of a distribution not to exist.

The exponential distribution has a kurtosis of 6; the uniform distribution has a kurtosis of $-1.2$. The Gaussian distribution has a kurtosis of zero. The coin throwing example has a kurtosis of 6.5, and the die throwing example has a kurtosis of $-222/175$. (Can you derive these results?)
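The die and coin figures above can be checked against the definitions of skewness and kurtosis. The Python sketch below keeps the central moments exact with fractions; as an assumption made for computability, the coin example is truncated at 200 throws, which leaves out probability $2^{-200}$ and does not affect the printed values at floating-point precision. The exponential and uniform results are continuous cases and are not covered here; function names are illustrative.

```python
import math
from fractions import Fraction


def central_moment(pmf, k):
    """E[(X - E[X])**k] for a discrete distribution {outcome: probability}."""
    mean = sum(p * x for x, p in pmf.items())
    return sum(p * (x - mean) ** k for x, p in pmf.items())


def skewness(pmf):
    return float(central_moment(pmf, 3)) / float(central_moment(pmf, 2)) ** 1.5


def excess_kurtosis(pmf):
    # Exact rational arithmetic, including the final subtraction of 3
    return central_moment(pmf, 4) / central_moment(pmf, 2) ** 2 - 3


die = {x: Fraction(1, 6) for x in range(1, 7)}
# Coin example: throws until first head, truncated at 200 throws (assumption)
coin = {i: Fraction(1, 2**i) for i in range(1, 201)}

print("die: ", skewness(die), excess_kurtosis(die))
print("coin:", skewness(coin), float(excess_kurtosis(coin)), 3 / math.sqrt(2))
```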



Figure: Exponential vs. Gaussian Distribution

Figure: Exponential vs. Gaussian Distribution, Right Tail

Figure: Gaussian vs. Student's t Distribution


Estimation and Inference

Problem: we do not know the distribution of random events.

1 For the coin throwing example, it seems like the probability of heads is 0.5. Are you sure? Maybe it is a trick coin.

2 For a security return, we know the future return is random (i.e., we cannot predict it in advance with perfect accuracy). But what is its probability distribution?

If we have historical data (e.g., we have observed the coin being thrown repeatedly, or we have historical returns for a security), we can use these data to learn something about the probabilities of different outcomes. (Is there an implicit assumption here?)

Estimation of the entire probability distribution of a random variable is a very difficult problem. (Easy for some special cases, like the coin throwing example.) We will focus on estimating quantities such as the mean and variance of a random variable.





How do we estimate the mean (expected value) of a random variable, such as the outcome of a coin throw, or the future return of a security?

An extremely general method: take the sample average of the available observations. Suppose we have observed $N$ realisations of the random variable $X$, denoted by $X_1, \ldots, X_N$. Then we can estimate the average with:

$$\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i$$

Is this a good way to estimate the expected value of a random variable?





Example: probability of heads with a coin throw.

Call the value of a coin throw $X = 1$ if it comes up heads, and $X = 0$ otherwise. Call $p$ the probability of heads. Then:

$$E[X] = \sum_{i=1}^{2} x_i p_i = p \cdot 1 + (1 - p) \cdot 0 = p$$

So estimating the expected value of $X$ is the same thing as estimating the probability of heads. Estimate the sample mean by throwing the coin $N$ times, counting each heads as 1, and each tails as 0. Count up the number of heads, and divide by $N$. This is $\bar{X}$, the sample mean.

Will the sample average be equal to the true average (i.e., the expected value)?





Example: expected return of a security.

Collect historical returns for the last $N$ months. Add them all up, and divide by $N$:

$$\bar{R} = \frac{1}{N} \sum_{i=1}^{N} R_i$$

This method is very commonly used to estimate expected returns of broadly diversified portfolios; it is used less often to try to estimate the expected returns of individual securities. (Any idea why?)

Will the sample average return be equal to the true expected return?

What are the statistical properties of the sample mean?





First, we will need a few basic results. Let $X$ and $Y$ be random variables, and let $a$, $b$, and $c$ be constants. Then:

$$E[X + Y] = E[X] + E[Y]$$

$$E[aX] = a\,E[X]$$

$$E[a + bX + cY] = a + b\,E[X] + c\,E[Y]$$

These results are true for both discrete and continuous random variables, and follow directly from the definition of expected value. (The derivation is a little tedious, though.)





The first two results are just special cases of the third, which can be generalized; let $X_1, \ldots, X_N$ be random variables, and let $a_0, \ldots, a_N$ be constants. Then:

$$E\left[a_0 + \sum_{i=1}^{N} a_i X_i\right] = a_0 + \sum_{i=1}^{N} a_i E[X_i]$$

This last result will be extremely useful in analysing the statistical properties of the sample mean.



Note that the sample mean is itself a random variable; sometimes it will be higher than the true mean, and sometimes it will be lower. We can find its expected value, just like we can with any other random variable (using the fact that each observation $X_i$ has the same distribution as $X$, so $E[X_i] = E[X]$):

$$E[\bar{X}] = E\left[\frac{1}{N}\sum_{i=1}^{N} X_i\right] = E\left[\sum_{i=1}^{N} \frac{1}{N} X_i\right] = \sum_{i=1}^{N} \frac{1}{N} E[X_i] = \sum_{i=1}^{N} \frac{1}{N} E[X] = E[X]$$

So the expected value of the sample average is equal to the true average: if you estimate the true mean with the sample mean, then on average, you will get it right!

We would also like to examine how precise the estimate tends to be: how much can the sample average deviate from the true average? However, we need some additional tools first.



Let $X$ and $Y$ be random variables. The joint distribution tells us the probabilities of different possible outcomes of $X$ and of $Y$ individually, but it also tells us how $X$ and $Y$ are related. Suppose there are $M$ possible values of $X$, and $N$ possible values of $Y$. Then the joint probability $p_{i,j}$ is the probability that $X$ will take the value $x_i$, and $Y$ will simultaneously take the value $y_j$.

The joint probabilities of $X$ and $Y$ must satisfy the same two restrictions that all probabilities must satisfy: they must be non-negative, and they must add up to one.

We can also consider the probabilities of either $X$ or $Y$, considered alone. For example, let $p^{(X)}_1, \ldots, p^{(X)}_M$ be the probabilities of the $M$ possible values of $X$, and let $p^{(Y)}_1, \ldots, p^{(Y)}_N$ be the probabilities of the $N$ possible values of $Y$. Then these two sets of probabilities are called the marginal probabilities of $X$ and $Y$.



There is a relation between the marginal probabilities and the joint probabilities. Specifically:

$$p^{(X)}_i = \sum_{j=1}^{N} p_{i,j} \qquad p^{(Y)}_j = \sum_{i=1}^{M} p_{i,j}$$

Suppose $X$ and $Y$ can each take on the values -1, 0, or +1, and do so with the following probabilities:

            X = -1    X = 0    X = +1
Y = -1       0.20      0.10     0.00
Y = 0        0.20      0.05     0.20
Y = +1       0.10      0.00     0.15

What are the marginal probabilities of $X$ and $Y$?
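The row and column sums of a joint table like this one are easy to mechanise. The Python sketch below encodes the table above as a dictionary keyed by $(x, y)$ (a layout chosen here for illustration), checks that the joint probabilities add up to one, and sums over the other variable to get each marginal distribution.

```python
from fractions import Fraction

# Joint probabilities from the table above: one row per value of Y,
# columns in the order X = -1, 0, +1
rows = {
    -1: ["0.20", "0.10", "0.00"],
    0: ["0.20", "0.05", "0.20"],
    +1: ["0.10", "0.00", "0.15"],
}
x_values = (-1, 0, +1)
joint = {(x, y): Fraction(p) for y, row in rows.items() for x, p in zip(x_values, row)}

assert sum(joint.values()) == 1  # a valid joint distribution

# Marginal of X: sum over y for each x; marginal of Y: sum over x for each y
marginal_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in x_values}
marginal_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in rows}
print({x: float(p) for x, p in marginal_x.items()})
print({y: float(p) for y, p in marginal_y.items()})
```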


We can also specify the joint probability density function $f_{X,Y}(x, y)$ for two random variables with a continuous distribution.

The probability that $X \in [a, b]$ and $Y \in [c, d]$ is:

$$P(a \le X \le b,\ c \le Y \le d) = \int_a^b \int_c^d f_{X,Y}(x, y)\,dy\,dx$$

In either the discrete or the continuous case, expected values are defined analogously to the case of a single random variable:

$$E[g(X, Y)] = \sum_{i=1}^{M} \sum_{j=1}^{N} p_{i,j}\, g(x_i, y_j)$$

$$E[g(X, Y)] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\, g(x, y)\,dy\,dx$$





We say the discrete random variables $X$ and $Y$ are independent if:

$$p_{i,j} = p^{(X)}_i\, p^{(Y)}_j \quad \text{for all } i, j$$

If $X$ and $Y$ are continuous, then they are independent if:

$$f_{X,Y}(x, y) = f_X(x)\, f_Y(y) \quad \text{for all } x, y$$

Intuitively, $X$ and $Y$ are independent if knowledge of $X$ tells you nothing about the probability of different outcomes of $Y$, and vice versa.



We define the covariance between $X$ and $Y$ as:

$$\mathrm{Cov}[X, Y] \equiv E\left[(X - E[X])(Y - E[Y])\right] = E[XY] - E[X]E[Y]$$

Covariance is a measure of how the two random variables are related; e.g., if it is positive, then when $X$ is above its mean value, $Y$ also tends to be above its mean value.

If two random variables are independent, then their covariance is zero. (Proof?) However, it is possible for random variables to have a covariance of zero, but not be independent.

Other useful properties of covariance are:

$$\mathrm{Cov}[X, Y] = \mathrm{Cov}[Y, X] \qquad \mathrm{Cov}[X, X] = \mathrm{Var}[X]$$

These follow immediately from the definition.
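To make the two expressions for covariance concrete, the Python sketch below applies both to the joint table of $X$ and $Y$ from the marginal-probability example (re-entered so the block runs on its own), and also forms the correlation $\mathrm{Cov}[X, Y]/(\mathrm{SD}[X]\,\mathrm{SD}[Y])$. The helper name `expect` is illustrative.

```python
import math
from fractions import Fraction

# Joint probabilities for the X, Y example table: one row per value of Y,
# columns in the order X = -1, 0, +1
rows = {
    -1: ["0.20", "0.10", "0.00"],
    0: ["0.20", "0.05", "0.20"],
    +1: ["0.10", "0.00", "0.15"],
}
joint = {(x, y): Fraction(p) for y, row in rows.items() for x, p in zip((-1, 0, 1), row)}


def expect(g):
    """E[g(X, Y)] = sum over all cells of p_ij * g(x_i, y_j)."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
cov_definition = expect(lambda x, y: (x - ex) * (y - ey))  # E[(X - E[X])(Y - E[Y])]
cov_shortcut = expect(lambda x, y: x * y) - ex * ey        # E[XY] - E[X]E[Y]
var_x = expect(lambda x, y: (x - ex) ** 2)
var_y = expect(lambda x, y: (y - ey) ** 2)
corr = float(cov_shortcut) / math.sqrt(float(var_x * var_y))
print(cov_definition, cov_shortcut, round(corr, 4))
```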




The units of covariance are not particularly useful, so one may prefer correlation:

$$\mathrm{Corr}[X, Y] \equiv \frac{\mathrm{Cov}[X, Y]}{\mathrm{SD}[X]\,\mathrm{SD}[Y]}$$

Correlation is not well-defined if either $X$ or $Y$ has a standard deviation of zero. But otherwise, correlation is dimensionless, and is bounded between its maximum value of +1 and its minimum value of -1. Correlation and covariance have the same sign: they are both positive, both negative, or both zero.

If two random variables have a correlation of zero, we say they are uncorrelated. This does not necessarily mean that they are independent!



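Example: $X$ and $Y$ have a bivariate normal distribution:

$$f_{X,Y}(x, y) = \frac{1}{2\pi\sqrt{\sigma_X^2\sigma_Y^2(1-\rho^2)}} \exp\left(-\frac{(x-\mu_X)^2\sigma_Y^2 - 2(x-\mu_X)(y-\mu_Y)\rho\sigma_X\sigma_Y + (y-\mu_Y)^2\sigma_X^2}{2\left[\sigma_X^2\sigma_Y^2(1-\rho^2)\right]}\right)$$

This distribution has the following properties:

$$E[X] = \mu_X \qquad E[Y] = \mu_Y$$

$$\mathrm{Var}[X] = \sigma_X^2 \qquad \mathrm{Var}[Y] = \sigma_Y^2 \qquad \mathrm{Corr}[X, Y] = \rho$$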





Note that, if $\rho = 0$, then $X$ and $Y$ are independent. (Can you show it?) For this particular distribution, $X$ and $Y$ are independent if and only if they are uncorrelated.

This result does not generalise to other distributions! It is not true even for normal distributions; $X$ and $Y$ can each have a marginal normal distribution and a correlation of zero, but not be independent. (Can you construct an example?)
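One standard construction, offered here as an illustration rather than as the course's intended answer: let $X$ be standard normal, let $S$ be an independent random sign ($+1$ or $-1$, each with probability 1/2), and set $Y = SX$. Then $Y$ is also standard normal and $\mathrm{Cov}[X, Y] = E[S]E[X^2] = 0$, yet $Y^2 = X^2$, so the two are clearly dependent. The Python simulation sketch below (the seed and sample size are arbitrary choices) shows a sample correlation near zero between $X$ and $Y$, and a correlation of one between their squares.

```python
import math
import random


def corr(a, b):
    """Sample correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    return cov / math.sqrt(va * vb)


rng = random.Random(2011)  # fixed seed so the run is reproducible
n = 100_000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Independent random sign S = +1 or -1, each with probability 1/2
signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
# Y = S * X is also standard normal, and Cov[X, Y] = E[S] E[X^2] = 0 ...
ys = [s * x for s, x in zip(signs, xs)]

corr_xy = corr(xs, ys)
# ... but Y^2 = X^2 exactly, so knowing X pins down |Y|: not independent
corr_squares = corr([x * x for x in xs], [y * y for y in ys])
print(round(corr_xy, 4), round(corr_squares, 4))
```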



Figure: Two Standard Gaussian Distributions, Zero Correlation

Figure: Two Standard Gaussian Distributions, Correlation of +0.5

Figure: Two Standard Gaussian Distributions, Correlation of -0.5





The following properties of variance follow from the definition. (Can you derive them?) Let $X$ and $Y$ be random variables, and let $a$, $b$, and $c$ be constants. Then:

$$\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X, Y]$$

$$\mathrm{Var}[aX] = a^2\,\mathrm{Var}[X]$$

$$\mathrm{Var}[a + bX + cY] = b^2\,\mathrm{Var}[X] + c^2\,\mathrm{Var}[Y] + 2bc\,\mathrm{Cov}[X, Y]$$

The first two are special cases of the third.





More generally, if $X_1, \ldots, X_N$ are random variables and $a_0, \ldots, a_N$ are constants:

$$\mathrm{Var}\left[a_0 + \sum_{i=1}^{N} a_i X_i\right] = \sum_{i=1}^{N} a_i^2\,\mathrm{Var}[X_i] + 2\sum_{i=1}^{N}\sum_{j=i+1}^{N} a_i a_j\,\mathrm{Cov}[X_i, X_j]$$

The presence of the covariance terms has very profound implications for portfolio choice. What is the above result if the $X_1, \ldots, X_N$ are all uncorrelated with each other?
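The double sum above translates directly into code. The Python sketch below uses an illustrative function name and hypothetical covariance numbers; the second example shows the uncorrelated case, where every cross term vanishes and, with equal weights $1/N$ and equal variances $\sigma^2$, the variance of the combination shrinks to $\sigma^2/N$.

```python
def var_linear_combination(a, cov):
    """Var[a_0 + sum_i a_i X_i] from weights a = [a_1, ..., a_N] and the
    covariance matrix cov[i][j] = Cov[X_i, X_j]. The constant a_0 is not
    needed, since adding a constant has no effect on variance."""
    n = len(a)
    own = sum(a[i] ** 2 * cov[i][i] for i in range(n))
    cross = sum(a[i] * a[j] * cov[i][j] for i in range(n) for j in range(i + 1, n))
    return own + 2 * cross


# Hypothetical two-asset example: variances 0.04 and 0.09, covariance 0.01
cov2 = [[0.04, 0.01], [0.01, 0.09]]
print(var_linear_combination([0.5, 0.5], cov2))

# N uncorrelated assets with equal variance sigma2 and equal weights 1/N:
# the cross terms vanish and the variance falls to sigma2 / N
N, sigma2 = 10, 0.04
cov_uncorr = [[sigma2 if i == j else 0.0 for j in range(N)] for i in range(N)]
print(var_linear_combination([1 / N] * N, cov_uncorr))
```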





At this point, it may be useful to specify some properties of covariances. Let $X$, $Y$, $U$, and $V$ be random variables, and let $a$, $b$, $c$, $d$, $f$, and $g$ be constants. Then:

$$\mathrm{Cov}[a + bX + cY,\ d + fU + gV] = bf\,\mathrm{Cov}[X, U] + bg\,\mathrm{Cov}[X, V] + cf\,\mathrm{Cov}[Y, U] + cg\,\mathrm{Cov}[Y, V]$$

For both variances and covariances, adding a constant to the arguments has no effect.





The previous result may also provide some insight into why constants that appear multiplicatively inside a variance must be squared when they are taken outside:

$$\mathrm{Var}[bX] = \mathrm{Cov}[bX, bX] = b^2\,\mathrm{Cov}[X, X] = b^2\,\mathrm{Var}[X]$$

We will state and use a number of statistical results in this section and the next without proof; if you want to fill in the proofs, the above property of covariance will often be useful. This result generalizes to arbitrary linear combinations of random variables in the obvious way.



We can now further analyse the statistical properties of the sample mean. Specifically, we would like to find its variance. At this point, we assume the $X_1, \ldots, X_N$ are independent of each other. (Is this a reasonable assumption?) Independence makes all the covariance terms zero, so:

$$\mathrm{Var}[\bar{X}] = \mathrm{Var}\left[\frac{1}{N}\sum_{i=1}^{N} X_i\right] = \frac{1}{N^2}\sum_{i=1}^{N}\mathrm{Var}[X_i] = \frac{1}{N}\mathrm{Var}[X]$$

The standard deviation of the sample mean is:

$$\mathrm{SD}[\bar{X}] = \sqrt{\mathrm{Var}[\bar{X}]} = \frac{1}{\sqrt{N}}\,\mathrm{SD}[X]$$

From the above results, we can reach the not very surprising conclusion that the more observations we have, the better an estimate of the true mean $\bar{X}$ is. On average, it is right; furthermore, the more observations we have, the less likely $\bar{X}$ is to deviate widely from the true mean.
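A small simulation can make the $1/\sqrt{N}$ result tangible. The Python sketch below assumes independent throws of a fair die (true mean 3.5, variance 35/12) and uses an arbitrary sample size $N = 25$; it draws many samples, computes each sample mean, and compares the spread of those sample means with $\mathrm{SD}[X]/\sqrt{N}$.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed for reproducibility
N = 25           # observations per sample (hypothetical choice)
trials = 10_000  # number of independent samples

sample_means = [
    sum(rng.randint(1, 6) for _ in range(N)) / N for _ in range(trials)
]

sd_single_throw = math.sqrt(35 / 12)           # SD[X] for one die throw
predicted_sd = sd_single_throw / math.sqrt(N)  # SD[X] / sqrt(N)
print(f"mean of sample means: {statistics.mean(sample_means):.4f} (E[X] = 3.5)")
print(f"SD of sample means:   {statistics.pstdev(sample_means):.4f} (predicted {predicted_sd:.4f})")
```

Increasing `N` fourfold should roughly halve the simulated spread, which is the $1/\sqrt{N}$ behaviour derived above.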




Example: coin throwing.

Recall our method of estimating the probability a coin comes up heads: throw the coin $N$ times, count the number of heads, and divide by $N$. The resulting number (which is the sample mean) is an estimate of the probability of heads.

On average, the sample mean is an accurate estimate of the true mean. But if you throw a coin 1,000 times, will it always come up heads 500 times, even if it is a fair coin? Suppose it comes up heads 550 times: is this evidence that it is a trick coin?

Recall that heads receives a value of 1, and tails receives a value of 0. The average value is $p$, where $p$ is the probability of heads.




What is the variance of a single coin throw?

$$E[X^2] = p(1)^2 + (1 - p)(0)^2 = p$$

$$\mathrm{Var}[X] = E[X^2] - (E[X])^2 = p - p^2 = p(1 - p)$$

What is the variance of the sample average?

$$\mathrm{Var}[\bar{X}] = \frac{1}{N}\mathrm{Var}[X] = \frac{p(1 - p)}{N}$$

We don't know the value of $p$, so we don't know the variance of the sample mean.





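However, note that $p(1 - p)$ takes a maximum value of 1/4 at $p = 1/2$. So we know for sure that:

$$\mathrm{Var}[\bar{X}] \le \frac{1}{4N} \qquad \mathrm{SD}[\bar{X}] \le \frac{1}{2\sqrt{N}}$$

For $N = 1{,}000$ and a fair coin, we have $E[\bar{X}] = 0.5$ and $\mathrm{SD}[\bar{X}] \approx 0.01581$.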
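A quick numerical check of the worst-case bound, as a Python sketch: scanning $p$ over a grid confirms that $p(1 - p)$ peaks at $p = 1/2$, and for $N = 1{,}000$ the bound $1/(2\sqrt{N})$ evaluates to about 0.01581. The function name is illustrative.

```python
import math


def sd_sample_mean(p, n):
    """SD of the sample proportion of heads: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


n = 1000
bound = 1 / (2 * math.sqrt(n))  # worst case, reached at p = 1/2

grid = [k / 100 for k in range(101)]
worst_p = max(grid, key=lambda p: p * (1 - p))
print(worst_p, round(bound, 5), round(sd_sample_mean(0.5, n), 5))
```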




Suppose after 1,000 throws, we observe heads 550 times. Is the coin fair? The sample mean $\bar{X}$ is 0.55. If the coin is fair, then $p = 0.5$, and $E[\bar{X}] = 0.5$ and $\mathrm{SD}[\bar{X}] \approx 0.01581$. There are two possibilities:

1 The coin is not fair, and comes up heads more often than tails.

2 The coin is fair, but came up heads more often than tails just due to chance.

Which is it?

When data are generated by a random process, we can never know anything with absolute certainty. However, we may be able to come to a conclusion with high probability.



We now construct a test statistic, of the form:

$$Z = \frac{\bar{X} - \mu_0}{\sigma}$$

where $\bar{X}$ is the sample mean (i.e., the mean estimated from the data), $\mu_0$ is the hypothesized mean (in this case, 0.5, since we are testing whether the coin is fair), and $\sigma$ is the standard deviation of the quantity being tested. Since 550 throws out of 1,000 came up heads, $\bar{X} = 0.55$, vs. the hypothesized value of $\mu_0 = 0.5$. We have calculated $\sigma = 0.01581$. So the test statistic is:

$$Z = \frac{\bar{X} - \mu_0}{\sigma} = \frac{0.55 - 0.50}{0.01581} = 3.16$$

Intuitively, the observed outcome (550 heads) is 3.16 standard deviations above the mean outcome, if the coin were fair. Could this have happened by chance?


Certainly 550 heads could have happened by chance; 600 heads, 900 heads, or 999 heads, or even 1,000 heads could have happened by chance. But how likely is it? We can get some idea of how probable an outcome is, due to chance, even if the hypothesis being tested is true, using a result known as Chebyshev's inequality.

This result states that a random variable takes values at least $k$ standard deviations away from the mean with a probability that is at most $1/k^2$. For $k \le 1$, it tells us the probability is at most 1, but we knew that already, since nothing can happen with probability greater than one. But for two standard deviations, Chebyshev's inequality tells us that such outcomes can happen with probability of at most 1/4; depending on the actual distribution, the true probability might be smaller. Outcomes three standard deviations away from the mean happen with probability of at most 1/9, etc.
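To see how conservative Chebyshev's bound can be, the Python sketch below compares $1/k^2$ with the two-sided tail probability of a normal distribution, $P(|Z| \ge k)$, computed with `math.erfc`. The normal column anticipates the approximation introduced a little later and applies only if the statistic is (approximately) normal; Chebyshev's bound holds for any distribution with a finite variance.

```python
import math


def chebyshev_bound(k):
    """Upper bound on P(|X - mean| >= k SD) for any distribution with finite variance."""
    return min(1.0, 1.0 / k**2)


def normal_two_sided_tail(k):
    """P(|Z| >= k) for a standard normal Z, via the complementary error function."""
    return math.erfc(k / math.sqrt(2))


for k in (1.0, 2.0, 3.0, 3.16):
    print(f"k = {k}: Chebyshev <= {chebyshev_bound(k):.4f}, normal = {normal_two_sided_tail(k):.5f}")
```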



In this case, the probability of getting a realised value of $\bar{X}$ that is $k = 3.16$ standard deviations away from the mean is at most $1/k^2 = 0.10$. So 550 heads could have occurred by chance, even if the coin is fair; but the probability that the outcome would be 50 or more heads away from the expected value of 500 is at most 0.10.

Are you willing to conclude that the coin is not fair, based on this test? If not, how extreme would the outcome have to be in order to convince you that the coin is not fair?

In fact, the actual probability of an outcome at least as extreme as 550 heads, assuming the coin is fair, is quite a bit smaller than 0.10. The exact distribution of the outcome is known in this case; it is called the binomial distribution. However, the binomial distribution is a bit unwieldy for large values of $N$, so we will resort to an approximation.



Central Limit Theorem: when the number of observations is large, the distribution of the sample mean $\bar{X}$ is approximately normal, regardless of the distribution of $X$. (Requires independent observations from a distribution with a finite mean and variance.)

If a random variable has a normal distribution, then any linear function of that random variable also has a normal distribution. (Can you prove it?) The sample mean, $\bar{X}$, has a normal distribution (approximately) by the central limit theorem. Recall the test statistic:

$$Z = \frac{\bar{X} - \mu_0}{\sigma}$$

The test statistic $Z$ is a linear function of $\bar{X}$ (note the other quantities in the expression above are not random), and therefore also has approximately a normal distribution.



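What are the mean and standard deviation of the test statistic $Z$? (Assume the hypothesis, that $E[X] = 0.5$, is true.)

$$E[Z] = E\left[\frac{\bar{X} - \mu_0}{\sigma}\right] = \frac{E[\bar{X}] - \mu_0}{\sigma} = \frac{\mu_0 - \mu_0}{\sigma} = 0$$

$$\mathrm{Var}[Z] = \mathrm{Var}\left[\frac{\bar{X} - \mu_0}{\sigma}\right] = \frac{1}{\sigma^2}\mathrm{Var}[\bar{X} - \mu_0] = \frac{1}{\sigma^2}\mathrm{Var}[\bar{X}] = \frac{\sigma^2}{\sigma^2} = 1$$

$$\mathrm{SD}[Z] = \sqrt{\mathrm{Var}[Z]} = \sqrt{1} = 1$$

The test statistic $Z$ thus has approximately a normal distribution, with mean of 0 and variance of 1. (This is not a coincidence: the test statistic was designed to have these properties.)

We can now use the test statistic to determine how likely an outcome of 550 heads is, if the coin is fair.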




69/562

Basic properties of a normal distribution:

1 The realised value is within one standard deviation of the mean with probability 0.683.

2 The realised value is within two standard deviations of the mean with probability 0.954.

3 The realised value is within three standard deviations of the mean with probability 0.997.

These statistics are determined by integrating the density function for the normal distribution over the appropriate range.
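The three probabilities above can be reproduced without tables. A minimal sketch in Python, assuming only the standard library: for any normal random variable, the probability of landing within $k$ standard deviations of the mean equals $\operatorname{erf}(k/\sqrt{2})$, and `math.erf` evaluates that function numerically. The helper name `prob_within` is our own, chosen for illustration.

```python
import math

def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    # P(|X - mu| <= k * sigma) = erf(k / sqrt(2)) for every normal distribution
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Rounded to three decimals these agree with the list above.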




For example, to find the second result, we can calculate:

$$\Pr(\mu - 2\sigma \le X \le \mu + 2\sigma) = \int_{\mu - 2\sigma}^{\mu + 2\sigma} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx$$

The integral above cannot be found in closed form; however, it can be evaluated numerically. (A closed-form approximation that is known to be accurate to at least 15 decimal places does exist.)



Many books have tables of the value of integrals of the normal density function for different ranges, and many software packages can also calculate it. By any of these methods, we can determine that an observation at least 3.16 standard deviations from the mean occurs with probability of only 0.00159.

In other words, if you were to throw a fair coin 1000 times, the combined probability that you would get either

1 550 heads or more
2 450 heads or fewer

is only 0.00159, and the probability that the number of heads will fall between 450 and 550 is 0.99841. (These probabilities are based on an approximation, that the sample mean has a normal distribution. The approximation is fairly accurate in this case.)



[Figure: Coin Throw Example, 1,000,000 Trials, 1,000 Throws Each Trial]

[Figure: Coin Throw Example, Standardised Distribution]
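A sketch of the kind of simulation behind these figures, under assumptions of our own: Python's standard `random` module, a fixed seed, and 100,000 trials rather than 1,000,000 to keep the run time short. Each call to `random.getrandbits(1000)` supplies 1,000 independent fair bits, so counting the one-bits gives the number of heads in one trial of 1,000 throws.

```python
import random

def simulate_heads(n_trials: int, n_throws: int = 1000, seed: int = 2011) -> list[int]:
    """Number of heads in each of n_trials independent trials of n_throws fair coin throws."""
    rng = random.Random(seed)
    # Each bit of getrandbits(n_throws) is an independent fair coin throw (1 = heads)
    return [bin(rng.getrandbits(n_throws)).count("1") for _ in range(n_trials)]

heads = simulate_heads(100_000)
mean = sum(heads) / len(heads)
sd = (sum((h - mean) ** 2 for h in heads) / (len(heads) - 1)) ** 0.5
# Share of trials whose number of heads is at least 50 away from 500
extreme = sum(abs(h - 500) >= 50 for h in heads) / len(heads)
print(f"mean={mean:.1f}  sd={sd:.2f}  share at least 50 from 500: {extreme:.5f}")
```

The simulated mean and standard deviation should land close to 500 and $\sqrt{1000 \times 0.25} \approx 15.81$; the share of extreme trials depends on the seed but should be of the order of 0.002.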



Since the distribution of $\bar{X}$ is approximately normal for a large number of coin throws, the probability that the number of heads would differ from the mean value by at least 50 is approximately 0.00159. The true value (based on the exact distribution of the number of heads, which in this example is binomial) is 0.00173; the assumption of normality leads to some inaccuracy, but not too much.
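Both numbers can be computed directly. A minimal sketch, assuming a fair coin and Python's `math.comb` (Python 3.8+): sum the binomial probabilities of every outcome at least 50 heads away from 500, and compare that with the normal approximation $2(1 - \Phi(50/\sqrt{250}))$, written here with `math.erfc`. The function names are ours.

```python
import math

def exact_two_sided(n: int, deviation: int, p: float = 0.5) -> float:
    """Exact P(|heads - n*p| >= deviation) when heads ~ Binomial(n, p)."""
    mean = n * p
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k - mean) >= deviation
    )

def normal_two_sided(n: int, deviation: int, p: float = 0.5) -> float:
    """Normal approximation to the same probability (no continuity correction)."""
    z = deviation / math.sqrt(n * p * (1 - p))
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))

exact = exact_two_sided(1000, 50)
approx = normal_two_sided(1000, 50)
print(f"exact:  {exact:.5f}")
print(f"normal: {approx:.5f}")
```

The exact figure should reproduce the 0.00173 above; the normal approximation comes out at roughly 0.0016, noticeably below the exact value.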

So, if the coin were fair, the expected number of heads would be 500, and a realised value as far away as 550 would occur with probability of less than 0.002; the probability that the number of heads would be closer to 500 is more than 0.998.

Does 550 heads seem very likely to occur just by chance? Are you willing to declare that the coin is not fair?

Whether we use the approximate probability of 0.00159 (based on the normal approximation) or the exact probability of 0.00173 (based on the binomial distribution), this number has a name: it is often called the p-value. A p-value is simply the probability that, under the hypothesis being tested, data as extreme as what has been observed would occur just by chance. The p-value in this example is rather extreme: a result this extreme (50 or more heads away from the expected value of 500) should occur just by chance, if the coin were fair, fewer than two times out of a thousand. If the coin were fair, we have just observed quite a remarkable coincidence. It is possible that the coin is fair, but it doesn't seem very likely.

We will now try to formalise this idea.



We have an hypothesis: the coin is fair, and the probability of heads is 0.5.

We also have evidence: 550 heads out of 1,000 coin throws.

There are two types of errors we can make here:

1 Type I Error: we reject the hypothesis (that is, conclude that the coin is not fair) when it is in fact fair.

2 Type II Error: we fail to reject the hypothesis (concluding the coin is fair) when it is in fact not fair.

It is impossible to avoid both types of errors completely. All we can do is

trade the probability of one off against the other.

The nearly universal convention in finance and economics (which is completely arbitrary) is to set the probability of a Type I Error at 0.05.

Hypothesis: the coin is fair (the probability of heads is 0.5).

Evidence: 550 heads from 1,000 coin throws.

If the hypothesis is true, the probability of getting a deviation from the mean this large is only 0.00159 (using the normal approximation; the exact p-value is 0.00173).

Since this probability is less than 0.05, we reject the hypothesis, and conclude the coin is not fair.

Could we have just made a Type I error?



Yes, we could have just made a Type I error. The only way to avoid Type I errors (incorrect rejection of an hypothesis that is true) is never to reject any hypothesis. If one takes that approach, one is likely to commit quite a lot of Type II errors (failure to reject an hypothesis which is false).

When the hypothesis is true, if we use a cut-off of 0.05 (as we did in this example), we are likely to reject the hypothesis (incorrectly) one time in every twenty. If this risk of Type I error is unacceptably large, we can lower our cut-off; for example, we could reject the hypothesis only if the p-value is less than 0.02. Then we will only commit a Type I error one time in every fifty, which is an improvement. However, this comes at a price: the probability of a Type II error goes up. We will fail to reject an hypothesis that is false more often, if we decrease our cut-off value. There is no way around this trade-off.
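To put numbers on the Type I side of the trade-off, the sketch below (our own illustration, Python standard library only) computes the exact probability of a Type I error in the coin experiment: the coin really is fair, and we reject whenever the two-sided normal-approximation p-value falls below a chosen cut-off. Because the number of heads is discrete, the realised rates are close to the nominal cut-offs rather than exactly equal to them.

```python
import math

def type_one_error_rate(alpha: float, n: int = 1000) -> float:
    """Exact P(reject) for a fair coin when we reject if the normal p-value < alpha."""
    sd_heads = math.sqrt(n * 0.25)              # SD of the number of heads for a fair coin
    rate = 0.0
    for k in range(n + 1):
        z = abs(k - n / 2) / sd_heads
        p_value = math.erfc(z / math.sqrt(2.0))  # two-sided normal p-value
        if p_value < alpha:
            rate += math.comb(n, k) / 2**n       # Binomial(n, 0.5) probability of k heads
    return rate

rate05 = type_one_error_rate(0.05)
rate02 = type_one_error_rate(0.02)
print(f"cut-off 0.05: Type I error rate {rate05:.4f}")
print(f"cut-off 0.02: Type I error rate {rate02:.4f}")
```

Lowering the cut-off lowers this rate, at the cost (not computed here, since it depends on how unfair the coin actually is) of more Type II errors.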



One could take the approach of trying to assess how costly Type I and Type II errors are, and changing the cut-off value accordingly. For example, consider a medical test that is designed to detect the early stages of a curable disease. If our hypothesis is "the patient is healthy", then a Type I error is a false positive: concluding that the patient is sick, when in fact the patient is healthy. A Type II error is a false negative: failure to detect the disease, when the patient in fact has it.

If the test is very sensitive, there will be very few false negatives (very few Type II errors), but there will also be a lot of false positives (lots of Type I errors). If the test is adjusted so that it is not so sensitive, then there will be fewer false positives, but more false negatives. So how sensitive should we make the test?



If we conclude that the cost of a Type II error is very high (a sick patient fails to get treatment, wrongly believing s/he is healthy), whereas the Type I error is less costly (a healthy patient has some rather anxious moments, and undergoes some additional testing/treatment before it is realised that there was a false positive), then we should make the test very sensitive. If the costs are different (for example, maybe the disease is not so serious, and the treatment is expensive, painful, and largely ineffective), then we should make the test less sensitive.

This type of analysis is used frequently in some disciplines, such as engineering. It has largely gone out of fashion in financial analysis, where arbitrary benchmarks (such as 0.05 probability of a Type I error) are commonplace.


Basic Principles Testing Pricing Models

Returning to the three securities mentioned earlier:


Asset                                      X      Y      Z
Average return (predicted)                 8%    10%    12%
Average return (observed)                  6%    16%    14%
Standard deviation of return (observed)   25%    40%    60%

Recall that the observed quantities were estimated from 20 years of monthly returns data. Can we safely conclude that the securities do not conform to the predictions of the theory?

This problem is much more difficult than the coin throwing example.


Assume the predictions of the model are correct; then the deviations of the observed average returns from the predicted average returns are just due to the random variation of the data. We already know:

$$E[\bar{X}] = 8\% \qquad E[\bar{Y}] = 10\% \qquad E[\bar{Z}] = 12\%$$

But we need to know the standard deviations as well:

$$\mathrm{SD}[\bar{X}] = \,? \qquad \mathrm{SD}[\bar{Y}] = \,? \qquad \mathrm{SD}[\bar{Z}] = \,?$$

There were 20 years of monthly data, so $N = 240$, and $\sqrt{240} \approx 15.49$. Therefore:

$$\mathrm{SD}[\bar{X}] = \frac{\mathrm{SD}[X]}{15.49} \qquad \mathrm{SD}[\bar{Y}] = \frac{\mathrm{SD}[Y]}{15.49} \qquad \mathrm{SD}[\bar{Z}] = \frac{\mathrm{SD}[Z]}{15.49}$$

The problem is that we do not know the standard deviations of $X$, $Y$, and $Z$; we can only estimate them from the data. Estimates were included in the table, but how these were determined was not specified.

The usual way of estimating the variance of a random variable (which can then be used to estimate the variance of the sample average) is as follows:

$$s^2_{XX} = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2$$

Note that, in order to calculate $s^2_{XX}$, we must first calculate $\bar{X}$. The presence of $N-1$ (instead of $N$) in the denominator may seem puzzling; this is a correction to account for the fact that the mean is not known exactly, but must be estimated with $\bar{X}$.

The sample variance $s^2_{XX}$ is itself a random variable; what are its statistical properties?


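We have all the tools we need to find its mean and variance, although the algebra can be tedious.

$$
\begin{aligned}
E\left[s^2_{XX}\right] &= E\left[\frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2\right]
= \frac{1}{N-1}\sum_{i=1}^{N}\left(E\left[X_i^2\right] - 2E\left[X_i\bar{X}\right] + E\left[\bar{X}^2\right]\right)\\
&= \frac{1}{N-1}\sum_{i=1}^{N}\left(\mathrm{Var}[X] + E[X]^2 - \frac{2}{N}\mathrm{Var}[X] - 2E[X]^2 + \frac{1}{N}\mathrm{Var}[X] + E[X]^2\right)\\
&= \mathrm{Var}[X]
\end{aligned}
$$

Can you fill in the missing steps?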
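The result can also be checked by simulation. A minimal sketch, assuming normally distributed data with variance 1 and a deliberately small sample size $N = 5$ (both our choices, to make the bias visible): average the sample variance over many samples, once with the $N-1$ divisor and once with the $N$ divisor. The first average should sit near the true variance of 1, the second near $(N-1)/N = 0.8$.

```python
import random

def average_variance_estimates(n: int, reps: int, seed: int = 1) -> tuple[float, float]:
    """Average of the sample variance over many samples, using divisors n - 1 and n."""
    rng = random.Random(seed)
    total_unbiased = total_biased = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]   # true variance is 1
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)          # sum of squared deviations
        total_unbiased += ss / (n - 1)
        total_biased += ss / n
    return total_unbiased / reps, total_biased / reps

unbiased, biased = average_variance_estimates(n=5, reps=50_000)
print(f"divisor N-1: {unbiased:.3f}   divisor N: {biased:.3f}")
```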


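The following results can also be derived, with considerable difficulty:

$$\mathrm{Var}\left[s^2_{XX}\right] = \left(\mathrm{SD}[X]\right)^4\left(\frac{2}{N-1} + \frac{\mathrm{Kurt}[X]}{N}\right)$$

$$\mathrm{Cov}\left[\bar{X}, s^2_{XX}\right] = \frac{\mathrm{Skew}[X]\left(\mathrm{SD}[X]\right)^3}{N}$$

If $X$ happens to have a normal distribution, then its skewness and (excess) kurtosis are each equal to zero, the sample mean and variance are uncorrelated with each other, and the variance of $s^2_{XX}$ has a very simple form, $2\left(\mathrm{SD}[X]\right)^4/(N-1)$.

We will not prove these results, but if $X$ has a normal distribution, then $\bar{X}$ also has a normal distribution, and $(N-1)\,s^2_{XX}/\mathrm{Var}[X]$ has a chi-square distribution with $N-1$ degrees of freedom.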
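A rough Monte Carlo check of these two formulas, under assumptions of our own choosing: $N = 10$, and two illustrative distributions with variance 1, a standard normal and an exponential with mean 1 (skewness 2, excess kurtosis 6). For the normal case the formulas give $\mathrm{Var}[s^2_{XX}] = 2/9 \approx 0.222$ and zero covariance; for the exponential case they give $2/9 + 6/10 \approx 0.822$ and $\mathrm{Cov}[\bar{X}, s^2_{XX}] = 2/10 = 0.2$. Simulation noise is larger in the exponential case because of its long right tail.

```python
import random

def sample_var(xs):
    """Sample variance with divisor len(xs) - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_cov(xs, ys):
    """Sample covariance with divisor len(xs) - 1."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def simulate(draw, n, reps):
    """Simulated Var[s^2] and Cov[sample mean, s^2] for samples of size n drawn by draw()."""
    means, s2s = [], []
    for _ in range(reps):
        xs = [draw() for _ in range(n)]
        means.append(sum(xs) / n)
        s2s.append(sample_var(xs))
    return sample_var(s2s), sample_cov(means, s2s)

rng = random.Random(7)
N, REPS = 10, 40_000
normal = simulate(lambda: rng.gauss(0.0, 1.0), N, REPS)
expo = simulate(lambda: rng.expovariate(1.0), N, REPS)
print("normal:      Var[s2] = %.3f  Cov = %.3f" % normal)
print("exponential: Var[s2] = %.3f  Cov = %.3f" % expo)
```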


Returning to the example, consider security X. We have a theory that predicts its expected return is 8%, but when we estimate the mean with $\bar{X}$, it is 6%. The estimated standard deviation (we will use the notation $s_X$) is 25%.

We would like to construct a test statistic:

$$Z = \frac{\bar{X} - \mu_0}{\mathrm{SD}[\bar{X}]} = \frac{\sqrt{N}\left(\bar{X} - \mu_0\right)}{\mathrm{SD}[X]}$$

If the hypothesis is correct, then the expected value of $\bar{X}$ is $\mu_0 = 8\%$ and its standard deviation is $\mathrm{SD}[X]/\sqrt{240}$ (recall that there are 240 monthly observations). The test statistic then has a mean of zero and a standard deviation of one. If $X$ has a normal distribution, then $Z$ also has a normal distribution; even if $X$ isn't normal, then by the central limit theorem, $Z$ is approximately normal for large $N$.


[Figure: Z-statistic for Stock Return Example, 1,000,000 Trials]

The test statistic $Z$ is therefore ideal, except for one little problem: it is infeasible. We don't know $\mathrm{SD}[X]$, and can only estimate it. Note that this situation is different from the coin throwing example: there, under the hypothesis (that the coin is fair, and the probability of heads is 1/2), we knew the standard deviation of a coin throw. Here, we don't; the hypothesis tells us what the value of the mean ought to be, but is silent with respect to the variance and standard deviation.

Instead, we must use the estimated standard deviation, rather than the actual, to form our test statistic:

$$t = \frac{\sqrt{N}\left(\bar{X} - \mu_0\right)}{s_X}$$


Because the standard deviation used in our test statistic is estimated, the distribution of the test statistic is not normal, even if $X$ is. Under the assumption of normality for $X$, the test statistic $t$ has a Student's t distribution with $N-1$ degrees of freedom. The t distribution approaches a standard normal distribution (i.e., a normal distribution with a mean of zero and a standard deviation of one) as the degrees of freedom become large. When many data are observed, the uncertainty in the estimate of the mean remains much larger than the uncertainty in the estimate of the standard deviation, and the t statistic approaches the distribution it would have if the standard deviation were known with certainty: a standard normal. When the number of data observations is small, though, the deviation from normality can be very significant.
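A small simulation makes the point about small samples concrete. This sketch rests on assumptions of our own (normal data, a true null hypothesis, Python's standard library) and counts how often $|t|$ exceeds 1.96, the two-sided 5% cut-off of the standard normal. With $N = 240$ the share should be close to 0.05; with $N = 5$ (4 degrees of freedom) it is roughly 0.12, because the t distribution has fatter tails.

```python
import math
import random

def share_beyond_196(n: int, reps: int, seed: int = 3) -> float:
    """Share of simulated t-statistics with |t| > 1.96 when the null hypothesis is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]   # true mean is 0, so mu_0 = 0 is correct
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        t = math.sqrt(n) * xbar / s
        hits += abs(t) > 1.96
    return hits / reps

small_n = share_beyond_196(5, 20_000)
large_n = share_beyond_196(240, 5_000)
print("N = 5:  ", small_n)
print("N = 240:", large_n)
```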


[Figure: T Distribution with Various Degrees of Freedom]

[Figure: T-statistic for Stock Return Example, 1,000,000 Trials]

[Figure: T-statistic with Non-Gaussian Returns, 1,000,000 Trials, T = 240]


The test statistic for security X is then:

$$t = \frac{\sqrt{N}\left(\bar{X} - \mu_0\right)}{s_X} = \frac{\sqrt{240}\,(6\% - 8\%)}{25\%} \approx -1.24$$

Since the number of degrees of freedom is quite large, we can simply treat the t-statistic as if it were normally distributed. A test statistic of $-1.24$ corresponds to a (two-sided) p-value of approximately 0.215; that is, if the hypothesis were true, there is still a probability of 0.215 that the sample average return of the security would differ from the hypothesized value by at least 2%.

If we use the 0.05 cut-off for p-values, as is common practice in finance, we cannot reject the hypothesis that $E[X] = 8\%$. The risk that we are making a Type I error is too high.

Do the other securities provide evidence against the model?
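One way to answer is to repeat the calculation for all three securities. A minimal sketch, using only the figures from the table (hypothesized means, sample means, estimated standard deviations), $N = 240$, and, as above, a normal approximation for the two-sided p-value; the dictionary layout is simply our choice.

```python
import math

N = 240
# asset: (predicted mean, observed mean, estimated SD), taken from the table
assets = {"X": (0.08, 0.06, 0.25), "Y": (0.10, 0.16, 0.40), "Z": (0.12, 0.14, 0.60)}

results = {}
for name, (mu0, xbar, s) in assets.items():
    t = math.sqrt(N) * (xbar - mu0) / s
    p_value = math.erfc(abs(t) / math.sqrt(2.0))   # two-sided, normal approximation
    results[name] = (t, p_value)
    print(f"{name}: t = {t:+.2f}, p = {p_value:.3f}")
```

On these numbers only security Y, taken on its own, has a p-value below 0.05.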


Basic Principles Multivariate Tests

Is there anything wrong with what we are doing here?


It doesn't make any sense to test the securities one at a time. Suppose the model we are testing is actually true: it correctly describes the expected returns of all securities. If we go out and test its predictions one security at a time, then for each test we conduct, there is a 0.05 probability (assuming 95% confidence) of a Type I error. If, for example, we test a model for Japanese stock returns, and decide to conduct a statistical test for each of the 225 stocks in the Nikkei 225 index, that is 225 chances to have a Type I error. How likely is it that at least some of the stocks will appear to violate the predictions of the model, just by chance, even though the model is true?
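A back-of-the-envelope answer, under an assumption the text does not make: that the 225 tests are statistically independent. Returns of stocks in one index are in fact correlated, so the first figure is only indicative; the expected number of false rejections, by contrast, does not depend on independence.

```python
n_tests = 225
alpha = 0.05

# Assumes independent tests (an illustrative simplification; real stock returns are correlated)
p_at_least_one = 1 - (1 - alpha) ** n_tests
# Linearity of expectation: no independence assumption needed here
expected_false_rejections = n_tests * alpha

print(f"P(at least one Type I error) = {p_at_least_one:.6f}")
print(f"Expected number of false rejections = {expected_false_rejections:.2f}")
```

Even though the model is true, a handful of "violations" (about 11 on average) should be expected.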


What we really ought to do is perform a single statistical test of all the securities simultaneously. For example, we could consider a test statistic along the lines of the following:

$$F = t_X^2 + t_Y^2 + t_Z^2 = \frac{\left(\bar{R}_X - \mu_{0,X}\right)^2}{\hat{\sigma}^2_{\bar{R}_X}} + \frac{\left(\bar{R}_Y - \mu_{0,Y}\right)^2}{\hat{\sigma}^2_{\bar{R}_Y}} + \frac{\left(\bar{R}_Z - \mu_{0,Z}\right)^2}{\hat{\sigma}^2_{\bar{R}_Z}}$$

Intuitively, this statistic has some advantages: it is big when the t-statistics for the individual assets are big, it places more weight on violations of the theory's predictions for assets which have small standard deviations, etc. It also seems like it has a distribution that can be calculated: it is the sum of three squared t distributions. But are these t distributions independent?


The test statistic just proposed doesn't work if we can't be sure that the returns of the three assets are independent (or at least uncorrelated). We can fix this defect, but first, we will need to be able to estimate covariances from historical data. The usual way of estimating the covariance between $X$ and $Y$ is:

$$s^2_{XY} = \frac{1}{T-1}\sum_{t=1}^{T}\left(X_t - \bar{X}\right)\left(Y_t - \bar{Y}\right)$$

This estimator is unbiased, i.e., $E\left[s^2_{XY}\right] = \mathrm{Cov}[X, Y]$. Derivation of its variance (and covariance with other statistics) is very difficult.

The $T-1$ divisor, instead of $T$, is often a point of confusion. $T-1$ is used to make our estimate unbiased. Some just use $T$, but if you estimate covariance (or variance) this way, then your estimate is biased; it tends to be a little too close to zero, on average. For large $T$, it doesn't matter very much.


Some software products are quite inconsistent about which divisor they use, $T-1$ or $T$. For example, a spreadsheet product produced by a software company based in Redmond, Washington, USA, uses $T-1$ in the VAR function, but $T$ in the COVAR function. Therefore, even though $\mathrm{Cov}[X, X] = \mathrm{Var}[X]$ by definition, this software package returns different values for VAR(A1:A10) and COVAR(A1:A10,A1:A10). When you have a piece of software do these sorts of calculations for you, make sure it is doing what you think it is doing.

When we need to estimate a correlation from historical data, we will do so as follows:

$$\hat{\rho} = \frac{s^2_{XY}}{s_X s_Y}$$

The little hat over the $\rho$ indicates that the quantity is the estimated, rather than true, correlation.


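We now return to the problem of constructing a joint test statistic. For convenience, we will call the assets $X_1, \ldots, X_N$. It is convenient to arrange the means of the assets in a column vector, and the variances and covariances in a matrix:

$$\mu = \begin{pmatrix} E[X_1] \\ \vdots \\ E[X_N] \end{pmatrix} \qquad \Sigma = \begin{pmatrix} \mathrm{Var}[X_1] & \cdots & \mathrm{Cov}[X_1, X_N] \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}[X_N, X_1] & \cdots & \mathrm{Var}[X_N] \end{pmatrix}$$

The sample equivalents are:

$$\hat{\mu} = \begin{pmatrix} \bar{X}_1 \\ \vdots \\ \bar{X}_N \end{pmatrix} \qquad \hat{\Sigma} = \begin{pmatrix} s^2_{11} & \cdots & s^2_{1N} \\ \vdots & \ddots & \vdots \\ s^2_{N1} & \cdots & s^2_{NN} \end{pmatrix}$$

where, through a slight abuse of previous notation, $s^2_{ij}$ is the sample covariance of $X_i$ and $X_j$.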


We will need three linear algebra operations to construct a reasonable test statistic: matrix multiplication, matrix transposition, and matrix inversion.

In case these operations are not familiar, we will start with multiplication of a row vector by a column vector. To perform this operation, we just multiply each element in one of the vectors by its corresponding element in the other vector, and add the products all up:

$$\begin{pmatrix} x_1 & \cdots & x_N \end{pmatrix}\begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix} = \sum_{i=1}^{N} x_i y_i$$

The number of elements in the two vectors must be the same; otherwise the product is undefined.


More generally, we can find the product of any two matrices, provided the number of columns in the first matrix is equal to the number of rows in the second matrix. The product of a $K \times M$ matrix and an $M \times N$ matrix is a $K \times N$ matrix. The element in row $i$ and column $j$ of the product is row $i$ of the first matrix multiplied by column $j$ of the second matrix:

$$\begin{pmatrix} x_{11} & \cdots & x_{1M} \\ \vdots & \ddots & \vdots \\ x_{K1} & \cdots & x_{KM} \end{pmatrix}\begin{pmatrix} y_{11} & \cdots & y_{1N} \\ \vdots & \ddots & \vdots \\ y_{M1} & \cdots & y_{MN} \end{pmatrix} = \begin{pmatrix} \sum_{m=1}^{M} x_{1m} y_{m1} & \cdots & \sum_{m=1}^{M} x_{1m} y_{mN} \\ \vdots & \ddots & \vdots \\ \sum_{m=1}^{M} x_{Km} y_{m1} & \cdots & \sum_{m=1}^{M} x_{Km} y_{mN} \end{pmatrix}$$

The inner dimensions of the two matrices must match, or the product is undefined.


Many of the rules of ordinary multiplication do not apply to matrix multiplication; for example, matrix multiplication is not commutative.

A numeric example of matrix multiplication:

$$\begin{pmatrix} 3 & 5 & -2 \\ 4 & 1 & 0 \end{pmatrix}\begin{pmatrix} 6 & 1 \\ -8 & 4 \\ 2 & 1 \end{pmatrix} = \begin{pmatrix} -26 & 21 \\ 16 & 8 \end{pmatrix}$$

Given the large number of operations involved, it is not a bad idea to have a computer available before multiplying even relatively modestly sized matrices together. For example, to multiply a $5 \times 8$ matrix by an $8 \times 3$ matrix requires 120 multiplications and 105 additions.
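A plain-Python sketch of the definition (no libraries; the function name `matmul` and the list-of-rows representation are our choices). It also counts the scalar multiplications and additions, so both the numeric example and the 120/105 operation count can be checked.

```python
def matmul(A, B):
    """Multiply a K x M matrix A by an M x N matrix B, both given as lists of rows.

    Returns the product and the number of scalar multiplications and additions used.
    """
    K, M, N = len(A), len(B), len(B[0])
    if any(len(row) != M for row in A):
        raise ValueError("inner dimensions do not match")
    mults = adds = 0
    C = [[0] * N for _ in range(K)]
    for i in range(K):
        for j in range(N):
            # Row i of A times column j of B
            total = A[i][0] * B[0][j]
            mults += 1
            for m in range(1, M):
                total += A[i][m] * B[m][j]
                mults += 1
                adds += 1
            C[i][j] = total
    return C, mults, adds

C, _, _ = matmul([[3, 5, -2], [4, 1, 0]], [[6, 1], [-8, 4], [2, 1]])
print(C)                        # [[-26, 21], [16, 8]]
_, mults, adds = matmul([[1] * 8] * 5, [[1] * 3] * 8)
print(mults, adds)              # 120 105
```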



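Transpose is a very simple operation, usually denoted by either a T or a prime superscript, i.e., $C^T$ or $C'$. The matrix is flipped around, so that the rows become columns and the columns become rows:

$$\begin{pmatrix} x_{11} & \cdots & x_{1N} \\ \vdots & \ddots & \vdots \\ x_{M1} & \cdots & x_{MN} \end{pmatrix}^T = \begin{pmatrix} x_{11} & \cdots & x_{M1} \\ \vdots & \ddots & \vdots \\ x_{1N} & \cdots & x_{MN} \end{pmatrix}$$

A numeric example:

$$\begin{pmatrix} 1 & 3 & -2 \\ -8 & 0 & 4 \end{pmatrix}^T = \begin{pmatrix} 1 & -8 \\ 3 & 0 \\ -2 & 4 \end{pmatrix}$$

It doesn't get much easier than matrix transposition.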



Matrix operations can be used to avoid cumbersome algebraic expressions involving large numbers of assets. For example, consider $N$ assets, with returns $R_1, \ldots, R_N$, and a portfolio with share $a_1$ invested in the first asset, $a_2$ invested in the second asset, and so on, up to $a_N$ invested in asset $N$. (The weights $a_i$ should add up to one.) What is the variance of the return of this portfolio?

$$\mathrm{Var}\left[a_1 R_1 + \cdots + a_N R_N\right] = \sum_{i=1}^{N}\sum_{j=1}^{N} a_i a_j \,\mathrm{Cov}\left[R_i, R_j\right]$$

Arranging the $a_1, \ldots, a_N$ in a column vector $a$, the returns $R_1, \ldots, R_N$ in a column vector $R$, and the variances and covariances of returns in a matrix $\Sigma$, we can express the above as:

$$\mathrm{Var}\left[a^T R\right] = a^T \Sigma a$$

(Try it!) This expression is valid for any number of assets.


Numeric example: suppose the returns of three assets have the covariance matrix:

$$\Sigma = \begin{pmatrix} 0.040 & 0.012 & 0.020 \\ 0.012 & 0.090 & 0.036 \\ 0.020 & 0.036 & 0.160 \end{pmatrix}$$

What is the variance of the return of a portfolio that is 0.2 invested in the first asset, 0.6 in the second asset, and 0.1 invested in the third asset?

$$\begin{pmatrix} 0.2 \\ 0.6 \\ 0.1 \end{pmatrix}^T \begin{pmatrix} 0.040 & 0.012 & 0.020 \\ 0.012 & 0.090 & 0.036 \\ 0.020 & 0.036 & 0.160 \end{pmatrix}\begin{pmatrix} 0.2 \\ 0.6 \\ 0.1 \end{pmatrix} = 0.0436$$
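The numeric example can be checked both ways, as the double sum and as $a^T \Sigma a$. A minimal plain-Python sketch (function names ours), using the weights exactly as given in the example:

```python
def portfolio_variance_double_sum(a, sigma):
    """Var[a1 R1 + ... + aN RN] = sum over i, j of a_i a_j Cov[R_i, R_j]."""
    n = len(a)
    return sum(a[i] * a[j] * sigma[i][j] for i in range(n) for j in range(n))

def portfolio_variance_matrix(a, sigma):
    """a^T Sigma a, computed as the row vector a^T times the column vector Sigma a."""
    sigma_a = [sum(row[j] * a[j] for j in range(len(a))) for row in sigma]
    return sum(a_i * v for a_i, v in zip(a, sigma_a))

sigma = [[0.040, 0.012, 0.020],
         [0.012, 0.090, 0.036],
         [0.020, 0.036, 0.160]]
a = [0.2, 0.6, 0.1]
v1 = portfolio_variance_double_sum(a, sigma)
v2 = portfolio_variance_matrix(a, sigma)
print(round(v1, 6), round(v2, 6))   # 0.0436 0.0436
```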



Matrix inversion, usually denoted by a $-1$ superscript, as in $C^{-1}$, is a rather difficult operation. The inverse of a matrix satisfies the condition:

$$C C^{-1} = C^{-1} C = I$$

where $I$ is the identity matrix, which has 1 for each element on the diagonal, and 0 everywhere else:

$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$

If a matrix is not square (i.e., does not have the same number of rows and columns), it does not have an inverse.


Square matrices may or may not have inverses, although covariance matrices usually do. Specifically, every matrix that is the covariance matrix of some set of random variables $R$ is automatically positive semidefinite:

$$\mathrm{Var}\left[a^T R\right] = a^T \Sigma a \ge 0 \quad \forall a$$

Such a matrix is also positive definite if it satisfies the stronger condition:

$$\mathrm{Var}\left[a^T R\right] = a^T \Sigma a > 0 \quad \forall a \ne 0$$

A covariance matrix has an inverse if and only if it is positive definite. That is, if the only portfolio of assets that is risk-free (i.e., has variance of zero) is the portfolio with weight zero on every asset, then the covariance matrix of the asset returns is positive definite, and therefore invertible.



Numeric examples: matrix inversion is actually rather easy for diagonal matrices, i.e., those in which the off-diagonal elements are all zero:

$$\begin{pmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} 0.2 & 0.0 & 0.0 \\ 0.0 & 0.5 & 0.0 \\ 0.0 & 0.0 & 1.0 \end{pmatrix}$$

Note that the inverse is also diagonal, and the elements are just the reciprocals of the elements in the original matrix.

Things are a bit more complicated in general:

$$\begin{pmatrix} 3 & 6 & 1 \\ 4 & 7 & -2 \\ 6 & 13 & 0 \end{pmatrix}^{-1} = \begin{pmatrix} 1.6250 & 0.8125 & -1.1875 \\ -0.7500 & -0.3750 & 0.6250 \\ 0.6250 & -0.1875 & -0.1875 \end{pmatrix}$$

(Try verifying the inverses.)
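Inverting by hand is tedious. The sketch below uses Gauss-Jordan elimination with partial pivoting (a standard textbook method, not necessarily what any particular software package does) together with Python's `fractions.Fraction`, so that the result is exact and can be compared entry by entry with the inverses shown above. The function name `invert` is ours.

```python
from fractions import Fraction

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination, using exact fractions."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("only square matrices have inverses")
    # Build the augmented matrix [A | I]
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the row with the largest entry in this column to the top
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]                 # scale pivot row so pivot = 1
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]   # clear the column
    return [row[n:] for row in aug]                          # right half is now A^{-1}

inv = invert([[3, 6, 1], [4, 7, -2], [6, 13, 0]])
for row in inv:
    print([float(x) for x in row])
```

Applied to the diagonal example, the same function returns the reciprocals 1/5, 1/2 and 1 on the diagonal.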


Recall the example of the three securities, which were used to test a model of expected returns. We have no information on the covariances between the three asset returns; suppose these are all estimated at exactly zero (not very likely, but assume so for purposes of the discussion). We can arrange the sample mean returns in a vector, and the hypothesized mean returns in another vector:

$$\hat{\mu} = \begin{pmatrix} 6\% \\ 16\% \\ 14\% \end{pmatrix} \qquad \mu_0 = \begin{pmatrix} 8\% \\ 10\% \\ 12\% \end{pmatrix}$$

The estimated variances and covariances can be arranged in a matrix:

$$\hat{\Sigma} = \begin{pmatrix} 0.0625 & 0 & 0 \\ 0 & 0.16 & 0 \\ 0 & 0 & 0.36 \end{pmatrix}$$



The proposed joint test statistic can be expressed as:

$$F = \left(\hat{\mu} - \mu_0\right)^T \hat{\Sigma}^{-1} \left(\hat{\mu} - \mu_0\right)$$

At an intuitive level, this test statistic has some good properties. When any of the assets have an estimated expected return that is far from the hypothesized value, this tends to make the test statistic large. Furthermore, it gives more weight to assets whose mean is estimated more accurately. If an asset has a small (estimated) variance of return, when $\hat{\Sigma}$ is inverted, the corresponding element is large, giving more weight to the deviation of this asset's average return from the hypothesized value. Assets with large variance of return require larger differences between the observed and hypothesized returns to have the same effect on the test statistic.



This test statistic works just as well when the asset returns are correlated; the only modification we will make is to add a scaling factor:

$$F = \frac{T(T-N)}{N(T-1)}\left(\hat{\mu} - \mu_0\right)^T \hat{\Sigma}^{-1} \left(\hat{\mu} - \mu_0\right)$$

where $T$ is (as before) the number of observations, and $N$ is the number of assets. Under an assumption of normality (the asset returns have the multivariate normal distribution), this test statistic has an F distribution.

An F distribution has two degrees of freedom parameters; the first is $N$, and the second is $T-N$. This is sometimes written $F_{N,T-N}$. Tables of the F distribution are widely available in statistics books and other references; many software packages can calculate them.
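For the zero-covariance case of the running example, $\hat{\Sigma}$ is diagonal, so the quadratic form reduces to a sum of squared deviations each divided by its variance. A minimal sketch with $T = 240$ and $N = 3$; the resulting value (about 2.38) is our own calculation under that zero-covariance assumption and is not reported in the text.

```python
T, N = 240, 3
mu_hat = [0.06, 0.16, 0.14]          # sample mean returns
mu_0 = [0.08, 0.10, 0.12]            # hypothesized mean returns
variances = [0.0625, 0.16, 0.36]     # diagonal of Sigma-hat; covariances assumed zero

# With a diagonal Sigma-hat, the quadratic form is a variance-weighted sum of squared deviations
quad_form = sum((m - m0) ** 2 / v for m, m0, v in zip(mu_hat, mu_0, variances))
F = T * (T - N) / (N * (T - 1)) * quad_form
print(f"quadratic form = {quad_form:.6f}, F = {F:.3f}")
```

The quadratic form here equals $(t_X^2 + t_Y^2 + t_Z^2)/T$, which links this statistic back to the sum of squared t-statistics proposed earlier.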



[Figure: F-statistic for Stock Return Example, 1,000,000 Trials]

When $T$ is very large, the assumption of multivariate normality is not particularly important. Recall that, for our application, the first degrees of freedom parameter is $N$, and the second is $T-N$. As $d_2$ approaches $+\infty$, $d_1$ times an $F_{d_1,d_2}$ random variable approaches a chi-square distribution with $d_1$ degrees of freedom; since $d_2 = T-N$ approaches $+\infty$ as $T$ becomes very large, $N$ times the test statistic has approximately a chi-square distribution with $N$ degrees of freedom for very large $T$. Moreover, the test statistic approaches this limiting distribution, for very large $T$, even if the data are not multivariate normally distributed.



[Figure: Chi-square Distribution with Various Degrees of Freedom]

[Figure: F Distribution and Limiting Chi-square Distribution]



A test procedure is therefore:

1 Estimate the sample means, sample variances, and sample covariances of the asset returns from historical data.
2 Arrange the sample means into a vector, and the sample variances and covariances into a matrix.
3 Also arrange the hypothesized values of the mean returns into a vector.
4 Calculate the test statistic F.
5 Determine the p-value of this statistic, using tables from a book, software, or some other source.
6 If the p-value is small enough (e.g., smaller than 0.05 for a 95% confidence test), then reject the hypothesis that the model is correct.



Numeric example: suppose the (estimated) covariance matrix for the three assets is:

$$\hat{\Sigma} = \begin{pmatrix} 0.0625 & -0.0200 & 0.0300 \\ -0.0200 & 0.1600 & 0.0240 \\ 0.0300 & 0.0240 & 0.3600 \end{pmatrix}$$

(Are these numbers consistent with the standard deviations reported earlier?)

Can we reject, with 95% confidence, the predictions of the model?



The test statistic is:

$$F = \frac{240\,(240-3)}{3\,(240-1)}\begin{pmatrix} 6\% - 8\% \\ 16\% - 10\% \\ 14\% - 12\% \end{pmatrix}^T \begin{pmatrix} 0.0625 & -0.0200 & 0.0300 \\ -0.0200 & 0.1600 & 0.0240 \\ 0.0300 & 0.0240 & 0.3600 \end{pmatrix}^{-1} \begin{pmatrix} 6\% - 8\% \\ 16\% - 10\% \\ 14\% - 12\% \end{pmatrix} \approx 2.066$$

Under the hypothesis, this statistic has an F distribution with 3 and 237 degrees of freedom. Many tables for the F distribution do not actually show p-values for different values of the F statistic, but rather a single cut-off value of the statistic for tests of different confidence levels. From a table for 95% confidence tests, we find that the cut-off value for an F distribution with 3 and 120 degrees of freedom is 2.6802, and for 3 and infinitely many degrees of freedom, it is 2.6049. For 3 and 237 degrees of freedom, it must be somewhere in between.


If the F-statistic is above the cut-off value (somewhere between 2.6049 and 2.6802), then the p-value is below 0.05, and we can reject the hypothesis (correctness of the model) with 95% confidence. If the F-statistic is below the cut-off value, then the p-value is above 0.05, and we cannot reject the hypothesis. (Recall that this does not mean the hypothesis is true; it means we have not found sufficient evidence to conclude that the hypothesis is false.)

The F-statistic is 2.066, which is well below the cut-off value, so we cannot reject the hypothesis with 95% confidence. (We cannot reject it with 90% confidence either: the p-value is 0.1053.)

So despite the fact that a t-test rejects the hypothesis for one of the assets individually, a joint test based on an F-statistic fails to reject the hypothesis. We have not seen enough evidence to convince us, with 95% confidence, that the model is false.
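A sketch that reproduces the F-statistic of about 2.066 from the covariance matrix above, under implementation choices of our own: Gaussian elimination (with partial pivoting) to compute $\hat{\Sigma}^{-1}(\hat{\mu} - \mu_0)$ without forming the inverse, and, because Python's standard library has no F distribution, the large-$T$ chi-square limit for the p-value ($N F$ compared with a chi-square distribution with $N$ degrees of freedom, via the series for the regularised incomplete gamma function). That approximation gives a p-value of about 0.10, slightly below the exact F-based 0.1053, since the F distribution has somewhat heavier tails than its chi-square limit.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    x = [0.0] * n
    for i in reversed(range(n)):               # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def chi2_sf(x, k):
    """P(chi-square with k degrees of freedom > x), from the lower incomplete gamma series."""
    s, z = k / 2.0, x / 2.0
    term = 1.0 / math.gamma(s + 1.0)
    total, n = term, 0
    while term > 1e-17 * total:
        n += 1
        term *= z / (s + n)
        total += term
    return 1.0 - math.exp(s * math.log(z) - z) * total

T, N = 240, 3
d = [0.06 - 0.08, 0.16 - 0.10, 0.14 - 0.12]     # mu_hat - mu_0
sigma_hat = [[0.0625, -0.0200, 0.0300],
             [-0.0200, 0.1600, 0.0240],
             [0.0300, 0.0240, 0.3600]]
quad_form = sum(di * xi for di, xi in zip(d, solve(sigma_hat, d)))
F = T * (T - N) / (N * (T - 1)) * quad_form
p_approx = chi2_sf(N * F, N)                      # large-T chi-square approximation
print(f"F = {F:.3f}, approximate p-value = {p_approx:.3f}")
```

For an exact F-based p-value, a statistics library or an F table (as described above) is still needed.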



It is worthwhile in a discussion of hypothesis testing to warn against the dangers of data mining.

In some disciplines, data mining is considered a good thing; one can even take a course to learn how to do it. In finance and economics, if someone tells you that you are data mining, that person is not paying you a compliment.

What is data mining? Recall that, even if an hypothesis is true, there is a certain probability of committing a Type I error (rejecting the hypothesis even when it is true). For example, suppose you believe that the level of the high tide has an effect on stock market returns. The reality is that your theory is wrong, and the tides have no effect on the stock market; however, you don't know this.



So, you gather some data on the tides and the stock market, and perform a statistical test of your hypothesis. Following common practice, you reject the hypothesis "the tides have no effect on the stock market" if the p-value of your statistical test is 0.05 or less. There is then a one in twenty chance that you will reject the hypothesis, and conclude that the tides do have an effect on the stock market (even though they don't).

Data mining refers to the practice of performing statistical test after statistical test, until finding one that rejects, and then reporting only the last test. This is a recipe for finding spurious results: chances are good that the result you report will be a Type I error, rather than a legitimate result.
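A small simulation of the tides story, with everything invented for illustration: 20 candidate "predictors" that are pure noise, a return series that is also pure noise, and a correlation test of each predictor against the returns (t-statistic $\hat{\rho}\sqrt{T-2}/\sqrt{1-\hat{\rho}^2}$, with a normal approximation for the p-value). A researcher who tries the predictors one after another and reports the first "significant" one will succeed most of the time; with 20 independent tests the chance is $1 - 0.95^{20} \approx 0.64$.

```python
import math
import random

def correlation(xs, ys):
    """Sample correlation coefficient (the divisor cancels, so none is applied)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def mined_study(rng, n_obs=120, n_predictors=20, alpha=0.05):
    """True if at least one pure-noise predictor 'significantly' predicts pure-noise returns."""
    returns = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    for _ in range(n_predictors):
        predictor = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        r = correlation(predictor, returns)
        t = r * math.sqrt(n_obs - 2) / math.sqrt(1.0 - r * r)
        if math.erfc(abs(t) / math.sqrt(2.0)) < alpha:   # two-sided normal p-value
            return True                                  # report this one, ignore the rest
    return False

rng = random.Random(42)
studies = 500
share = sum(mined_study(rng) for _ in range(studies)) / studies
print(f"share of studies reporting a 'significant' effect: {share:.2f}")
```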



The pressure to find results is enormous, both in academic and industry circles. Failure to find a result may mean no publication in academia, and no clients in industry. The incentives to engage in data mining are huge, and many engage in it, either fully aware of what they are doing, or having successfully deluded themselves into believing that what they are doing is legitimate.

A rule of thumb is the following: if you can't think of a reasonable economic story for the statistical result you have found, that should be a warning sign that the result is the product of data mining.


Testing the CAPM



Testing the CAPM Conditional Probabilities

We need to look at the relation between the returns of multiple securities; the notion of conditional probabilities is absolutely central to the analysis.

The probability of an event very likely depends on how much information one has. For example, it is much easier to forecast the value of a stock (or the weather, or an election) one day in advance than it is three years in advance. The reason is, over the past three years, a great deal has happened that affects the value of the stock (or the weather, or the outcome of the election). However, if you are making your forecast one day in advance, then you know almost everything that will affect the variable you are forecasting during the last three years; the only information you are missing pertains to the one remaining day. If you are making your forecast three years in advance, you are doing so with much less information.


Probabilities therefore depend on an information set; people with different information have different probabilities for the same event. In some contexts, the idea of the information set is left implicit; however, we will sometimes need to make it explicit.

We will often deal with the situation of two distinct information sets, with one being a strict subset of the other. Probabilities based on the more informative information set are then called conditional probabilities, and those based on the less informative information set are called unconditional or marginal probabili
